Making a People Search Tool in 2022 Using BeautifulSoup

Hello and welcome back, everyone. In this part of the blog, I am going to share how we can create our own people search tool using BeautifulSoup and Python. We will use the Google search engine to find people's LinkedIn profiles based on a keyword, and we will use GitHub's search to find people on GitHub.

Introduction

Why do we need a people finder?

An easier way to find people has many benefits, and hiring companies may gain the most from it. A list of professionals and their public profiles, organized by expertise and experience, is rich data. So let's try to build one such dataset for ourselves. A people search tool is a great resource to have within reach.

How will we do it?

We will automate searches on a search engine and on a search portal like GitHub, store the results in a dataframe, and then save them to a file. First, we will run a Google search to find LinkedIn profiles based on a keyword. Then we will run a people search with GitHub's own search.

Installing Dependencies

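For this purpose, we are going to use BeautifulSoup, a Python library. The code in this post also relies on the html5lib parser, Requests, Pandas and Seaborn, so we install them as well.

!pip install beautifulsoup4 html5lib requests pandas seaborn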

Importing Libraries

import pandas as pd
import requests
import urllib3
from bs4 import BeautifulSoup as BS
import time
import bs4
import warnings
warnings.filterwarnings("ignore")

There are a bunch of libraries we will use:

  • Pandas to make a dataframe later on.
  • Requests to make HTTP requests.
  • urllib3, the HTTP client library that Requests is built on.
  • BeautifulSoup to parse the page and search over it.
  • time to measure scrape time.
  • warnings to suppress warnings.

Google Search

Let's use Google search to make our first people search. Head over to Google.com and make a first search: linkedin google engineer.

Whenever we search for something on Google, it puts our query into https://google.com/search?q= and shows the list of results. Note that the results mostly depend on location.
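
As a small aside, a query that contains spaces or special characters should be URL-encoded before it goes into the q parameter. The sketch below shows one way to do that with the standard library. The helper name build_search_url is our own and is not part of any library.

```python
from urllib.parse import urlencode

def build_search_url(query, base="https://google.com/search"):
    """Return the search URL for `query` with the query string URL-encoded."""
    return f"{base}?{urlencode({'q': query})}"

print(build_search_url("linkedin google engineer"))
# https://google.com/search?q=linkedin+google+engineer
```

Requests performs a similar encoding on its own when it is handed a URL with spaces, so this mostly matters when the URL is printed, logged or passed to another tool.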


query = "linkedin google engineer"
url = f"https://google.com/search?q={query}"

# Fetch the results page with Requests (it URL-encodes the space in the query)
# and parse the HTML with the html5lib parser.
web_page = requests.get(url)
soup = BS(web_page.content, 'html5lib')
# soup

Getting All URLs

The search result page contains many elements, and at this moment we only need the links. So let's find all links using the a element.

urls = soup.find_all("a")
# urls

Getting only URLs that will be relevant

Many of those links are not relevant to us at this moment, for example Google's Sign In page or Privacy Policy, so let's do a simple check. For each link that passes, let's put the name, the profile URL and the role in a dictionary.

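profiles = {"names": [], "urls": [], "roles": []}

for url in urls:
    href = url.get("href")

    # Skip anchors without an href and anything that is not a LinkedIn profile result.
    if not href or "/url?q=" not in href or "linkedin.com/in" not in href:
        continue
    if "accounts.google.com" in href or "policies.google.com" in href:
        continue

    # Result links look like /url?q=<target>&...; keep only <target>.
    nhref = href.split("=")[1].split("&")[0]

    print(url.text, nhref)

    # Titles look like "Name - Role - Company ..."; skip titles without a role part.
    parts = url.text.split(" - ")
    if len(parts) < 2:
        continue
    profiles["names"].append(parts[0].strip())
    profiles["roles"].append(parts[1].strip())
    profiles["urls"].append(nhref)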
Akshay Miterani - Software Engineer - Google - LinkedInin.linkedin.com › akshay-mite... https://in.linkedin.com/in/akshay-miterani-108827105
David Garry - Software Engineer - Google | LinkedInwww.linkedin.com › davidgar... https://www.linkedin.com/in/davidgarry1
Betty Chen - Software Engineer - Google | LinkedInwww.linkedin.com › bettyjxch... https://www.linkedin.com/in/bettyjxchen
Risab Manandhar - Software Engineer - Google - LinkedInwww.linkedin.com › risab-ma... https://www.linkedin.com/in/risab-manandhar
Hai Bi - Software Engineer - Google | LinkedInwww.linkedin.com › ... https://www.linkedin.com/in/hai-bi-b6a10010
Sabbir Yousuf Sanny - Software Engineer - Google | LinkedInwww.linkedin.com › ... https://www.linkedin.com/in/sabbir-yousuf-sanny-11aa7a21
Delia Lazarescu - Software Engineer - Google - LinkedInca.linkedin.com › delialazarescu https://ca.linkedin.com/in/delialazarescu
Shailee Patel - Software Engineer - Google | LinkedInwww.linkedin.com › shailee26 https://www.linkedin.com/in/shailee26
Sahil Gaba - Software Engineer - Google | LinkedInwww.linkedin.com › gabag26 https://www.linkedin.com/in/gabag26
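
Splitting the href on "=" works for the links above, but it would cut the target short if the profile URL itself contained an "=". A slightly more defensive sketch parses the redirect link with the standard library instead. The helper name extract_target and the sa and ved values in the sample link are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs

def extract_target(href):
    """Return the destination of a Google '/url?q=...' redirect link, or None."""
    if not href:
        return None
    parsed = urlparse(href)
    if parsed.path != "/url":
        return None
    # parse_qs maps each query parameter to a list of decoded values.
    targets = parse_qs(parsed.query).get("q")
    return targets[0] if targets else None

print(extract_target("/url?q=https://www.linkedin.com/in/davidgarry1&sa=U&ved=0"))
# https://www.linkedin.com/in/davidgarry1
```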

Dataframe of the results

Dataframes make data analysis in Pandas easy. So let's make one out of the dictionary above.

pd.DataFrame(profiles)
   names                urls                                               roles
0  Akshay Miterani      https://in.linkedin.com/in/akshay-miterani-108...  Software Engineer
1  David Garry          https://www.linkedin.com/in/davidgarry1            Software Engineer
2  Betty Chen           https://www.linkedin.com/in/bettyjxchen            Software Engineer
3  Risab Manandhar      https://www.linkedin.com/in/risab-manandhar        Software Engineer
4  Hai Bi               https://www.linkedin.com/in/hai-bi-b6a10010        Software Engineer
5  Sabbir Yousuf Sanny  https://www.linkedin.com/in/sabbir-yousuf-sann...  Software Engineer
6  Delia Lazarescu      https://ca.linkedin.com/in/delialazarescu          Software Engineer
7  Shailee Patel        https://www.linkedin.com/in/shailee26              Software Engineer
8  Sahil Gaba           https://www.linkedin.com/in/gabag26                Software Engineer
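
To keep the results around after the session ends, the dataframe can be written to a file. The sketch below is one possible way to do it with Pandas. The helper name save_profiles, the file name profiles.csv and the sample row are placeholders, not real scraped data.

```python
import io
import pandas as pd

def save_profiles(profiles, target):
    """Write a dict of equal-length lists to `target` (a path or file-like object) as CSV."""
    df = pd.DataFrame(profiles)
    # index=False leaves out the 0, 1, 2, ... row labels.
    df.to_csv(target, index=False)
    return df

# Placeholder row, only to show the shape of the data.
sample = {
    "names": ["Jane Doe"],
    "urls": ["https://www.linkedin.com/in/janedoe"],
    "roles": ["Software Engineer"],
}

buffer = io.StringIO()          # in a notebook, pass "profiles.csv" instead
save_profiles(sample, buffer)
print(buffer.getvalue())
```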

Pros and Cons of Using Google Search

  • It is easy to make GET requests. However, Google might ask for a security check if we make too many requests, and at that point scraping fails.
  • People search is easy because Google's crawlers already hold a list of results matching our query, so very little work is needed to find the right information. But it might be tough to get more information by visiting the LinkedIn profile itself.
  • Only with some luck can we visit a person's LinkedIn profile without logging in. So relying on Google Search to collect details from LinkedIn profiles is not very fruitful.
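
One way to lower the chance of hitting the security check is to slow down and retry with growing pauses. The sketch below is a minimal version of that idea and not something the code in this post does. It assumes a blocked request comes back with a status code other than 200, and the name fetch_with_backoff and its parameters are our own.

```python
import time

def fetch_with_backoff(fetch, url, retries=3, delay=2.0, sleep=time.sleep):
    """Call fetch(url) until it returns status 200, waiting longer after each failure.

    `fetch` is any callable that returns an object with a `status_code`
    attribute, for example `requests.get`.
    """
    response = None
    for attempt in range(retries):
        response = fetch(url)
        if response.status_code == 200:
            return response
        # Wait delay, 2*delay, 4*delay, ... before the next attempt.
        if attempt < retries - 1:
            sleep(delay * 2 ** attempt)
    # All attempts failed; hand back the last response so the caller can inspect it.
    return response

# Usage inside the notebook (needs network access):
# web_page = fetch_with_backoff(requests.get, url)
```

Passing the fetch function in as an argument keeps the helper independent of Requests and makes it easy to try out with a fake function.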

GitHub Search

In the part above, we scraped some LinkedIn profiles from Google Search, but we were unable to get people's portfolios. It is quite common for tech people to have a portfolio and a GitHub account, so let's use GitHub's search to find people based on a keyword. Many people put their location, the company they work for, their Twitter handle and their portfolio in their GitHub profile, and those are the fields we want to scrape.

The URL to get results is https://github.com/search?q=[QUERY]&type=users&p=[PAGE], where QUERY is the search query, type is users, and p is the page number.


query = "google engineer"
url = f"https://github.com/search?q={query}&type=users"

print(f"URL : {url}")

web_page = requests.get(url)
soup = BS(web_page.content, 'html5lib')

# The pagination widget stores the page count in data-total-pages;
# fall back to None if the element is missing (e.g. when the request is blocked).
current = soup.find("em", class_="current")
pages = current.get("data-total-pages") if current else None

max_page = 5

pages
URL : https://github.com/search?q=google engineer&type=users

'100'

In the result above, we made a GET request for google engineer users and received a web page. Upon inspecting the page in the browser, we can see its elements, such as div and a, where our desired information lives. In the same way, we searched for the em element with the class current, which carries a data-total-pages attribute. Calling get on it returns the attribute's value. It seems that there are 100 pages of results.

Now let's loop over those pages to get information about each user, such as the name and the profile URL.
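
The attribute lookup can be tried offline on a small piece of HTML. The sketch below is only an illustration: the sample markup is a simplified stand-in modeled on the description above, not GitHub's real page, and the helper name total_pages is our own.

```python
from bs4 import BeautifulSoup

def total_pages(html):
    """Return the page count from the pagination widget, or 1 if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    current = soup.find("em", class_="current")
    if current is None or current.get("data-total-pages") is None:
        return 1
    return int(current["data-total-pages"])

# Simplified, hypothetical markup.
sample = '<div class="pagination"><em class="current" data-total-pages="100">1</em></div>'
print(total_pages(sample))               # 100
print(total_pages("<p>no results</p>"))  # 1
```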


github_profiles = {"name": [], "urls": []}

if pages:
    pages = int(pages)
    print(f"Total Pages: {pages}")

    # Visit pages 1..max_page (or fewer if the search has fewer pages).
    for page in range(1, min(pages, max_page) + 1):
        url = f"https://github.com/search?q={query}&type=users&p={page}"
        print(f"\n Current URL: {url} \n")

        web_page = requests.get(url)
        soup = BS(web_page.content, 'html5lib')

        # Each user result is an <a class="mr-1"> whose href is "/<username>".
        for a in soup.find_all("a", class_="mr-1"):
            # href already starts with "/", hence the double slash in the printed URLs.
            gurl = "https://github.com/" + a.get("href")
            gname = a.text

            print(gname, gurl)

            github_profiles["name"].append(gname)
            github_profiles["urls"].append(gurl)
Total Pages: 100

 Current URL: https://github.com/search?q=google engineer&type=users&p=1 

Seth Vargo https://github.com//sethvargo
Kevin Naughton Jr. https://github.com//kdn251
Miguel Ángel Durán https://github.com//midudev
Jose Alcérreca https://github.com//JoseAlcerreca
Shubham Mathur https://github.com//googleknight
Dan Field https://github.com//dnfield
Nick Bourdakos https://github.com//bourdakos1
Mark https://github.com//MarkEdmondson1234

 Current URL: https://github.com/search?q=google engineer&type=users&p=2 

Pierfrancesco Soffritti https://github.com//PierfrancescoSoffritti
Parker Moore https://github.com//parkr
Gokmen Goksel https://github.com//gokmen
Justin Poehnelt https://github.com//jpoehnelt
Sanket Singh https://github.com//singhsanket143
Shanqing Cai https://github.com//caisq
Adam Silverstein https://github.com//adamsilverstein
Mizux https://github.com//Mizux
Valerii Iatsko https://github.com//viatsko
Zulkarnine Mahmud https://github.com//zulkarnine

 Current URL: https://github.com/search?q=google engineer&type=users&p=3 

Yacine Rezgui https://github.com//yrezgui
Zulkarnine Mahmud https://github.com//zulkarnine
Google https://github.com//Google987
Prateek Narang https://github.com//prateek27
Rakina Zata Amni https://github.com//rakina
Sriram Sundarraj https://github.com//ssundarraj
Irene Ros https://github.com//iros
Clément Mihailescu https://github.com//clementmihailescu
Márton Braun https://github.com//zsmb13
Andrey Kulikov https://github.com//andkulikov

 Current URL: https://github.com/search?q=google engineer&type=users&p=4 

Kate Lovett https://github.com//Piinks
Faisal Abid https://github.com//FaisalAbid
Viktor Turskyi https://github.com//koorchik
Milad Naseri https://github.com//mmnaseri
Rahul Ravikumar https://github.com//tikurahul
Robert Kubis https://github.com//hostirosti
Corey Lynch https://github.com//coreylynch
Emma Twersky https://github.com//twerske
Shivam Goyal https://github.com//ShivamGoyal1899
Abhinay Omkar https://github.com//abhiomkar

 Current URL: https://github.com/search?q=google engineer&type=users&p=5 

Marek Siarkowicz https://github.com//serathius
Mais Alheraki https://github.com//pr-Mais
Abhinay Omkar https://github.com//abhiomkar
Imaculate https://github.com//imaculate
Zhixun Tan https://github.com//phisiart
Jafer Haider https://github.com//itsjafer
Christie Wilson https://github.com//bobcatfish
Jason Feinstein https://github.com//jasonwyatt
Ryan Sepassi https://github.com//rsepassi
Nick Rout https://github.com//ricknout

In the example above, we looped over 5 pages and stored the info in the dictionary github_profiles.

Now we open a person's profile and extract information like the Twitter handle and portfolio.
Let's select the last profile URL.
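
The output above lists some people more than once (Zulkarnine Mahmud, for example, shows up on pages 2 and 3), so it can help to drop repeated profile URLs before going further. A minimal sketch of that step follows. The helper name dedupe_profiles is our own.

```python
def dedupe_profiles(profiles):
    """Drop repeated profile URLs, keeping the first occurrence and the original order."""
    seen = set()
    unique = {"name": [], "urls": []}
    for name, url in zip(profiles["name"], profiles["urls"]):
        if url in seen:
            continue
        seen.add(url)
        unique["name"].append(name)
        unique["urls"].append(url)
    return unique

sample = {
    "name": ["Gokmen Goksel", "Zulkarnine Mahmud", "Zulkarnine Mahmud"],
    "urls": [
        "https://github.com//gokmen",
        "https://github.com//zulkarnine",
        "https://github.com//zulkarnine",
    ],
}
print(dedupe_profiles(sample)["name"])
# ['Gokmen Goksel', 'Zulkarnine Mahmud']
```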

gurl
'https://github.com//ricknout'

Let's fetch that URL and parse it with BeautifulSoup.


# Fetch the profile page of the last user from the loop above.
web_page = requests.get(gurl)
soup = BS(web_page.content, 'html5lib')
# soup

Just like last time, we look for the class that holds our information. For the headline, we can do it like below.

headline = soup.find_all("div",class_="p-note user-profile-bio mb-3 js-user-profile-bio f4")[0].text
headline
'Android Developer Relations Engineer at Google 🇿🇦'
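
A string with several class names passed to class_ only matches when the class attribute equals that exact string, so the lookup above breaks as soon as the page adds or reorders a class. Searching for a single class is less brittle. The sketch below illustrates this on a simplified, hypothetical snippet. The helper name extract_headline is our own.

```python
from bs4 import BeautifulSoup

def extract_headline(html):
    """Return the bio text of a profile page, or None when no bio element is found."""
    soup = BeautifulSoup(html, "html.parser")
    # A single class name matches any div that has it among its classes.
    bio = soup.find("div", class_="user-profile-bio")
    return bio.get_text(strip=True) if bio else None

# Simplified, hypothetical markup with the classes in a different order.
sample = '<div class="f4 user-profile-bio p-note"><div>Android Developer Relations Engineer at Google</div></div>'
print(extract_headline(sample))
# Android Developer Relations Engineer at Google
```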

For the followers and following counts, we can do something like below.

followers = soup.find_all("a",class_="Link--secondary no-underline no-wrap")[0].text.strip().split("\n")[0]
following = soup.find_all("a",class_="Link--secondary no-underline no-wrap")[1].text.strip().split("\n")[0]
followers,following
('510', '29')

For information like the Twitter handle and the portfolio URL, we can do something like below.

vcard = soup.find_all("ul",class_="vcard-details")[0].text
vcard = [v.strip() for v in vcard.strip().split("\n") if len(v.strip())>0]

vcard
['@google', 'Cape Town, South Africa', 'ricknout.dev', 'Twitter', '@ricknout']

But more easily, we can find this information using the itemprop attribute assigned to each list item.

vcard = soup.find_all("ul",class_="vcard-details")[0]

portfolio=None
home=None
work=None
twitter=None
for vc in vcard.find_all("li"):
    item=vc.get("itemprop")
    if item=="url":
        portfolio=vc.text.strip()
    if item=="homeLocation":
        home=vc.text.strip()
    if item=="worksFor":
        work=vc.text.strip()
    if item=="twitter":
        twitter=vc.text.strip()

portfolio,home,work,twitter
('ricknout.dev',
 'Cape Town, South Africa',
 '@google',
 'Twitter\n\n      @ricknout')
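
The same lookup can be written with a small mapping from itemprop values to field names, which also makes it easy to keep only the handle from the Twitter item. This is a sketch on simplified, hypothetical markup, and the names FIELDS and parse_vcard are our own.

```python
from bs4 import BeautifulSoup

# itemprop value -> name of the field we want to fill
FIELDS = {"url": "portfolio", "homeLocation": "home", "worksFor": "work", "twitter": "twitter"}

def parse_vcard(html):
    """Return portfolio, home, work and twitter from a vcard list (None when missing)."""
    soup = BeautifulSoup(html, "html.parser")
    details = dict.fromkeys(FIELDS.values())
    vcard = soup.find("ul", class_="vcard-details")
    if vcard is None:
        return details
    for li in vcard.find_all("li"):
        field = FIELDS.get(li.get("itemprop"))
        strings = list(li.stripped_strings)
        if field and strings:
            # The last text fragment is the value, e.g. "@ricknout" after "Twitter".
            details[field] = strings[-1]
    return details

# Simplified, hypothetical markup.
sample = """
<ul class="vcard-details">
  <li itemprop="worksFor"><span>@google</span></li>
  <li itemprop="homeLocation"><span>Cape Town, South Africa</span></li>
  <li itemprop="url"><a href="https://ricknout.dev">ricknout.dev</a></li>
  <li itemprop="twitter"><span>Twitter</span> <a href="https://twitter.com/ricknout">@ricknout</a></li>
</ul>
"""
print(parse_vcard(sample))
```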

Now let's combine the code above to work as a whole.


query = "google engineer"
url = f"https://github.com/search?q={query}&type=users"

print(f"URL : {url}")

web_page = requests.get(url)
soup = BS(web_page.content, 'html5lib')

# Total page count from the pagination widget; None if the widget is missing.
current = soup.find("em", class_="current")
pages = current.get("data-total-pages") if current else None

max_page = 5

github_profiles = {"name":[], "urls":[], "portfolio":[],"headline":[],
                   "followers":[],"following":[],
                   "home":[], "work":[], "twitter":[]}

if pages:
    pages = int(pages)
    print(f"Total Pages: {pages}. Running upto {max_page}.")

    # Visit pages 1..max_page (or fewer if the search has fewer pages).
    for page in range(1, min(pages, max_page) + 1):
        url = f"https://github.com/search?q={query}&type=users&p={page}"
        print(f"\n Current URL: {url} \n")

        web_page = requests.get(url)
        osoup = BS(web_page.content, 'html5lib')

        for a in osoup.find_all("a", class_="mr-1"):
            # href already starts with "/", hence the double slash in the URLs.
            gurl = "https://github.com/" + a.get("href")
            gname = a.text

            print(f"Got: {gname}, {gurl}")

            github_profiles["name"].append(gname)
            github_profiles["urls"].append(gurl)

            # Open the profile page of this user.
            web_page = requests.get(gurl)
            soup = BS(web_page.content, 'html5lib')

            # Bio shown under the name; None when the element is missing.
            bio = soup.find("div", class_="p-note user-profile-bio mb-3 js-user-profile-bio f4")
            github_profiles["headline"].append(bio.text if bio else None)

            # The first two matching links hold the followers and following counts.
            counts = soup.find_all("a", class_="Link--secondary no-underline no-wrap")
            followers = counts[0].text.strip().split("\n")[0] if len(counts) > 0 else None
            following = counts[1].text.strip().split("\n")[0] if len(counts) > 1 else None

            github_profiles["followers"].append(followers)
            github_profiles["following"].append(following)

            # Profile details are list items tagged with itemprop attributes.
            vcard = soup.find("ul", class_="vcard-details")

            portfolio = home = work = twitter = None
            for vc in (vcard.find_all("li") if vcard else []):
                item = vc.get("itemprop")
                if item == "url":
                    portfolio = vc.text.strip()
                elif item == "homeLocation":
                    home = vc.text.strip()
                elif item == "worksFor":
                    work = vc.text.strip()
                elif item == "twitter":
                    # Text is "Twitter\n\n      @handle"; keep only the handle.
                    twitter = vc.text.strip().split("\n")[-1].strip()

            github_profiles["portfolio"].append(portfolio)
            github_profiles["home"].append(home)
            github_profiles["work"].append(work)
            github_profiles["twitter"].append(twitter)
URL : https://github.com/search?q=google engineer&type=users
Total Pages: 100. Running upto 5.

 Current URL: https://github.com/search?q=google engineer&type=users&p=1 

Got: Seth Vargo, https://github.com//sethvargo
Got: Kevin Naughton Jr., https://github.com//kdn251
Got: Miguel Ángel Durán, https://github.com//midudev
Got: Jose Alcérreca, https://github.com//JoseAlcerreca
Got: Shubham Mathur, https://github.com//googleknight
Got: Nick Bourdakos, https://github.com//bourdakos1
Got: Mark, https://github.com//MarkEdmondson1234
Got: Dan Field, https://github.com//dnfield
Got: Pierfrancesco Soffritti, https://github.com//PierfrancescoSoffritti

 Current URL: https://github.com/search?q=google engineer&type=users&p=2 

Got: Parker Moore, https://github.com//parkr
Got: Gokmen Goksel, https://github.com//gokmen
Got: Sanket Singh, https://github.com//singhsanket143
Got: Justin Poehnelt, https://github.com//jpoehnelt
Got: Shanqing Cai, https://github.com//caisq
Got: Valerii Iatsko, https://github.com//viatsko
Got: Mizux, https://github.com//Mizux
Got: Gabriela D'Ávila Ferrara, https://github.com//gabidavila
Got: Zulkarnine Mahmud, https://github.com//zulkarnine

 Current URL: https://github.com/search?q=google engineer&type=users&p=3 

Got: Adam Silverstein, https://github.com//adamsilverstein
Got: Yacine Rezgui, https://github.com//yrezgui
Got: Google, https://github.com//Google987
Got: Prateek Narang, https://github.com//prateek27
Got: Rakina Zata Amni, https://github.com//rakina
Got: Clément Mihailescu, https://github.com//clementmihailescu
Got: Faisal Abid, https://github.com//FaisalAbid
Got: Kate Lovett, https://github.com//Piinks
Got: Andrey Kulikov, https://github.com//andkulikov
Got: Márton Braun, https://github.com//zsmb13

 Current URL: https://github.com/search?q=google engineer&type=users&p=4 

Got: Kate Lovett, https://github.com//Piinks
Got: Faisal Abid, https://github.com//FaisalAbid
Got: Viktor Turskyi, https://github.com//koorchik
Got: Milad Naseri, https://github.com//mmnaseri
Got: Rahul Ravikumar, https://github.com//tikurahul
Got: Robert Kubis, https://github.com//hostirosti
Got: Corey Lynch, https://github.com//coreylynch
Got: Emma Twersky, https://github.com//twerske
Got: Shivam Goyal, https://github.com//ShivamGoyal1899
Got: Abhinay Omkar, https://github.com//abhiomkar

 Current URL: https://github.com/search?q=google engineer&type=users&p=5 

Got: Marek Siarkowicz, https://github.com//serathius
Got: Greg Spencer, https://github.com//gspencergoog
Got: Mais Alheraki, https://github.com//pr-Mais
Got: Jafer Haider, https://github.com//itsjafer
Got: Zhixun Tan, https://github.com//phisiart
Got: Imaculate, https://github.com//imaculate
Got: Christie Wilson, https://github.com//bobcatfish
Got: Jason Feinstein, https://github.com//jasonwyatt
Got: Nick Rout, https://github.com//ricknout
Got: Joe Stanton, https://github.com//JoeStanton

Turn Result into Dataframe

Tabular data is easier to work with for data analysis. So let's convert the dictionary above into a dataframe.

df = pd.DataFrame(github_profiles)
df
name urls portfolio headline followers following home work twitter
0 Seth Vargo https://github.com//sethvargo https://www.sethvargo.com Engineer @google 3.2k 5 Pittsburgh, PA @google @sethvargo
1 Kevin Naughton Jr. https://github.com//kdn251 youtube.com/kevinnaughtonjr Software Engineer @google 3.8k 9 New York, New York Google @kevinnaughtonjr
2 Miguel Ángel Durán https://github.com//midudev https://midu.dev Software Engineer\n\nGitHub Star 🌟\nGoogle Dev... 6.3k 10 Barcelona @AdevintaSpain @midudev
3 Jose Alcérreca https://github.com//JoseAlcerreca twitter.com/ppvi Android Developer Relations Engineer @ Google 2.3k 0 Madrid, Spain @google @ppvi
4 Shubham Mathur https://github.com//googleknight https://googleknight.github.io Software engineer II @ MDL Bangalore\n 29 40 Bangalore, India Mckinsey & Company None
5 Nick Bourdakos https://github.com//bourdakos1 None Software Engineer @google 480 8 New York City @google None
6 Mark https://github.com//MarkEdmondson1234 https://code.markedmondson.me/ Data Engineer @iihnordic \nGoogle Developer E... 783 117 Copenhagen @iihnordic @HoloMarkeD
7 Dan Field https://github.com//dnfield None Software Engineer @google for @flutter 928 0 None @google None
8 Pierfrancesco Soffritti https://github.com//PierfrancescoSoffritti https://pierfrancescosoffritti.com/ Software engineer @google 566 40 London, UK @google @psoffritti
9 Parker Moore https://github.com//parkr https://byparker.com 🍩 🌎 Senior Engineer. Currently: @google. Forme... 1.3k 316 USA Google None
10 Gokmen Goksel https://github.com//gokmen None Software Engineer @google 396 58 San Francisco @google @gokmen
11 Sanket Singh https://github.com//singhsanket143 None SDE @google | SDE @linkedin | Google Summer Of... 1.7k 13 India @google None
12 Justin Poehnelt https://github.com//jpoehnelt https://justin.poehnelt.com @google, @googleworkspace Developer Relations... 307 9 United States @google @jpoehnelt
13 Shanqing Cai https://github.com//caisq https://caisq.github.io/ Software Engineer @ Google Research 343 59 None Google None
14 Valerii Iatsko https://github.com//viatsko None UI Engineer @ Google 346 55 None Google None
15 Mizux https://github.com//Mizux http://www.mizux.net OSS Release Engineer @google 167 81 Tours, France @google None
16 Gabriela D'Ávila Ferrara https://github.com//gabidavila https://gabi.dev Developer Relations Engineer @google 247 25 New Jersey @google @GoogleCloudPlatform @gabidavila
17 Zulkarnine Mahmud https://github.com//zulkarnine www.zulkarnine.com Software Engineer at Google 564 0 None Google None
18 Adam Silverstein https://github.com//adamsilverstein http://www.earthbound.com Developer Relations Engineer @ Google 153 6 Colorado, USA Google None
19 Yacine Rezgui https://github.com//yrezgui https://yrezgui.com Creative software engineer.\nDeveloper advocat... 513 143 London, UK Google None
20 Google https://github.com//Google987 youtube.com/alittlecoding Software Engineer\nYoutube: a little coding 13 11 India None @arif_decrypted
21 Prateek Narang https://github.com//prateek27 www.prateeknarang.com Software Engineer-III at Google, Udemy Instruc... 2.4k 4 Hyderabad Google India None
22 Rakina Zata Amni https://github.com//rakina None Software Engineer @google @chromium 🇮🇩 🇯🇵👩‍💻 307 22 Tokyo, Japan @google None
23 Clément Mihailescu https://github.com//clementmihailescu algoexpert.io/clem Co-Founder & CEO, AlgoExpert | Ex-Google & Ex-... 7.5k 2 None AlgoExpert @clemmihai
24 Faisal Abid https://github.com//FaisalAbid http://www.FaisalAbid.com @google Developer Expert, Entrepreneur, and En... 505 30 Toronto @eirene-cremations @bitstrapped @Shopistry @FaisalAbid
25 Kate Lovett https://github.com//Piinks None Software Engineer at @google for @flutter 745 10 Nashville, TN Google @k8lovett
26 Andrey Kulikov https://github.com//andkulikov http://linkedin.com/in/andkulikov/ Software Engineer at Google 378 0 London @google None
27 Márton Braun https://github.com//zsmb13 https://zsmb.co/ Android Developer Relations Engineer @google, ... 418 7 Budapest, Hungary @google @zsmb13
28 Kate Lovett https://github.com//Piinks None Software Engineer at @google for @flutter 745 10 Nashville, TN Google @k8lovett
29 Faisal Abid https://github.com//FaisalAbid http://www.FaisalAbid.com @google Developer Expert, Entrepreneur, and En... 505 30 Toronto @eirene-cremations @bitstrapped @Shopistry @FaisalAbid


df.shape
(48, 9)

We were able to scrape about 50 profiles. They carry lots of rich information, like a person's portfolio, profile headline and Twitter handle.

Some more cleaning is needed too, for example on the followers and following counts, which are strings such as 3.2k.

df["followers"] = df.followers.apply(lambda x: 1000*float(x.replace("k","")) if "k" in x else float(x))
df.head()
name urls portfolio headline followers following home work twitter
0 Seth Vargo https://github.com//sethvargo https://www.sethvargo.com Engineer @google 3200.0 5 Pittsburgh, PA @google @sethvargo
1 Kevin Naughton Jr. https://github.com//kdn251 youtube.com/kevinnaughtonjr Software Engineer @google 3800.0 9 New York, New York Google @kevinnaughtonjr
2 Miguel Ángel Durán https://github.com//midudev https://midu.dev Software Engineer\n\nGitHub Star 🌟\nGoogle Dev... 6300.0 10 Barcelona @AdevintaSpain @midudev
3 Jose Alcérreca https://github.com//JoseAlcerreca twitter.com/ppvi Android Developer Relations Engineer @ Google 2300.0 0 Madrid, Spain @google @ppvi
4 Shubham Mathur https://github.com//googleknight https://googleknight.github.io Software engineer II @ MDL Bangalore\n 29.0 40 Bangalore, India Mckinsey & Company None
df["following"] = df.following.apply(lambda x: 1000*float(x.replace("k","")) if "k" in x else float(x))
df.head()
name urls portfolio headline followers following home work twitter
0 Seth Vargo https://github.com//sethvargo https://www.sethvargo.com Engineer @google 3200.0 5.0 Pittsburgh, PA @google @sethvargo
1 Kevin Naughton Jr. https://github.com//kdn251 youtube.com/kevinnaughtonjr Software Engineer @google 3800.0 9.0 New York, New York Google @kevinnaughtonjr
2 Miguel Ángel Durán https://github.com//midudev https://midu.dev Software Engineer\n\nGitHub Star 🌟\nGoogle Dev... 6300.0 10.0 Barcelona @AdevintaSpain @midudev
3 Jose Alcérreca https://github.com//JoseAlcerreca twitter.com/ppvi Android Developer Relations Engineer @ Google 2300.0 0.0 Madrid, Spain @google @ppvi
4 Shubham Mathur https://github.com//googleknight https://googleknight.github.io Software engineer II @ MDL Bangalore\n 29.0 40.0 Bangalore, India Mckinsey & Company None
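
The two lambdas above do the same conversion, so it can be pulled into one helper that also tolerates missing values and returns whole numbers. This is a sketch and not the code that produced the tables above. The name parse_count is our own, and the "m" suffix for millions is an assumption, since only "k" appears in the scraped data.

```python
def parse_count(text):
    """Convert a count such as '3.2k' or '29' to an int; None stays None."""
    if text is None:
        return None
    text = text.strip().lower().replace(",", "")
    # "m" is an assumption; only "k" shows up in the data above.
    multipliers = {"k": 1_000, "m": 1_000_000}
    if text and text[-1] in multipliers:
        return int(round(float(text[:-1]) * multipliers[text[-1]]))
    return int(text)

print([parse_count(x) for x in ["3.2k", "29", "6.3k", None]])
# [3200, 29, 6300, None]
```

With Pandas, the helper could be applied to a column of such strings in the same way as the lambdas, for example with apply(parse_count).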

Plotting Followers

We will use Seaborn, a library built on top of Matplotlib, to style the histogram that Pandas draws.

import seaborn as sns
sns.set()

df.followers.hist()

It seems that most people have very few followers.

Finding Contact Details

If a profile has a portfolio, there is a high chance that the portfolio has a contact page too. So we can scrape that portfolio as well and collect such information.
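
As a rough illustration of that idea, the sketch below pulls email-like strings out of a page's HTML with a regular expression. It is a simple heuristic that will miss obfuscated addresses and can match strings that are not real addresses. The names EMAIL_RE and find_emails and the sample addresses are made up.

```python
import re

# Loose pattern for email-like strings; not a full address validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def find_emails(text):
    """Return the unique email-like strings in `text`, in order of first appearance."""
    found = []
    for match in EMAIL_RE.findall(text):
        if match not in found:
            found.append(match)
    return found

sample = 'Contact: <a href="mailto:jane@example.com">jane@example.com</a> or team@example.org'
print(find_emails(sample))
# ['jane@example.com', 'team@example.org']
```

In the notebook, the input would be the text of a fetched portfolio page, for example web_page.text after a requests.get call on the portfolio URL.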

Pros and Cons of Using GitHub Search

  • It is quite easy to search for tech people based on their skill set, but people who work in tech and do not have a GitHub profile cannot be found this way.
  • Getting contact details is only possible if a person has a portfolio and that portfolio lists them. Either way, it is easier than getting the information from a LinkedIn profile.
  • Sometimes the results might not be shown if GitHub suspects that something is wrong with our requests.

For more blogs from us about BeautifulSoup, please follow the link.
