A LinkedIn Profile Scraper

Project description

LinkedIn Employee Extraction and Gender eStimation (LEEGS)

Installation

$ pip install leegs  

See Options

$ leegs --help  

Run Challenges 1 & 2 for 20 employees

$ leegs hubspot -at 20

How It Works

Leegs scrapes LinkedIn profiles and then uses a machine learning model to estimate the gender of each employee. This can be useful in diversity hiring initiatives. It completes the job by breaking it into smaller tasks, like an assembly line. First, leegs goes to the company's profile page and scrapes the links to the profiles of all employees who have public profiles on LinkedIn. Then, it uses multiple threads to crawl the profile links concurrently and extract employee information. Finally, it sends the profile picture of each employee who has one through DeepFace, an open source gender detection library (a smaller open source project, not the Meta one).
Under the hood, most of the speedup comes from employee_scraper.py, which runs a browser in each of several threads to parallelize employee data extraction. The employee load is split between the threads, and each thread logs in to a different account so that no single account gets flagged. This allows me to keep extracting lots of data without needing loads of accounts. If an account does get banned, its thread is removed and the profile links it couldn't crawl are spread among the other threads. This ensures that leegs can still complete the job even if one piece is faulty.
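The redistribution idea can be sketched with a shared work queue. This is an illustration only, not the code in employee_scraper.py: the names crawl, scrape_profile and AccountBanned are hypothetical, and the real scraper drives browsers rather than calling a plain function.

```python
import queue
import threading


class AccountBanned(Exception):
    """Hypothetical signal that an account's login has been blocked."""


def crawl(links, accounts, scrape_profile):
    """Crawl `links` with one thread per account.

    `scrape_profile(account, link)` is a caller-supplied function (an
    assumption of this sketch) that returns the extracted data or raises
    AccountBanned. Returns (results, unfinished_links).
    """
    pending = queue.Queue()
    for link in links:
        pending.put(link)

    results = {}
    active = list(accounts)      # accounts that have not been banned
    lock = threading.Lock()      # guards `results` and `active`

    def worker(account):
        while True:
            try:
                link = pending.get_nowait()
            except queue.Empty:
                return                      # nothing left to do
            try:
                data = scrape_profile(account, link)
            except AccountBanned:
                pending.put(link)           # hand the link back to the others
                with lock:
                    active.remove(account)  # retire this account's thread
                return
            with lock:
                results[link] = data

    # Re-run rounds so links returned late by a banned worker are still
    # picked up, as long as at least one account survives.
    while not pending.empty() and active:
        threads = [threading.Thread(target=worker, args=(a,)) for a in list(active)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    unfinished = []
    while not pending.empty():
        unfinished.append(pending.get_nowait())
    return results, unfinished
```

Because every thread pulls from the same queue, a banned account costs only the link it was working on, which goes back on the queue. Links end up unfinished only if every account is banned.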
The image prediction library used is DeepFace. It is a tool for creating image detection models or using pre-trained ones. I chose to use retina net. It is a convolutional neural network trained on over a million images of more than 1,000 different people to predict whether an image shows a male or a female. The model outputs a prediction for each image, and the leegs scraper keeps track of the predicted gender for each employee.
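As a rough illustration of the prediction step, the sketch below calls DeepFace.analyze with the gender action and records one label per employee. It is not taken from the leegs source: the function names and the photos mapping are hypothetical, and using detector_backend="retinaface" is only a guess at what "retina net" refers to.

```python
from collections import Counter


def deepface_gender(photo_path):
    """Return DeepFace's gender label for one image.

    DeepFace is imported lazily so the rest of the sketch can run
    without it installed.
    """
    from deepface import DeepFace

    analysis = DeepFace.analyze(
        img_path=photo_path,
        actions=["gender"],
        detector_backend="retinaface",  # assumption, see the note above
    )
    # Newer DeepFace releases return one dict per detected face,
    # older ones return a single dict.
    first = analysis[0] if isinstance(analysis, list) else analysis
    return first.get("dominant_gender", first.get("gender"))


def estimate_genders(photos, predict=deepface_gender):
    """Map employee name -> predicted label, or None if no face was found.

    `photos` maps a (hypothetical) employee name to an image path.
    """
    labels = {}
    for name, path in photos.items():
        try:
            labels[name] = predict(path)
        except ValueError:
            # DeepFace raises ValueError when it cannot detect a face.
            labels[name] = None
    return labels


def tally(labels):
    """Count how often each label (including None) was predicted."""
    return Counter(labels.values())
```

Passing the predictor in as an argument keeps the bookkeeping separate from the model, so the tally logic can be tried with a stand-in function before DeepFace is downloaded.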

Full CLI functionality

get info & gender detection on one employee

$ leegs employee -al https://www.linkedin.com/in/<employeeID> -p photos

get hubspot employee data for 20 employees without downloading their profile pics

crawl through a file of profile links (must be a .txt file)

$ leegs employee -f filename.txt -p photos
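The layout leegs expects inside the .txt file is not spelled out here. Assuming one profile URL per line, a links file could be checked before a run with a small helper like the one below. The helper name load_profile_links and the one-link-per-line format are assumptions of this sketch, not part of leegs.

```python
from pathlib import Path


def load_profile_links(path):
    """Read profile links from a text file, one per line (assumed format).

    Blank lines and repeated links are skipped. A line that does not look
    like a LinkedIn profile URL raises ValueError.
    """
    links = []
    seen = set()
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        link = raw.strip()
        if not link or link in seen:
            continue
        if "linkedin.com/in/" not in link:
            raise ValueError(f"not a LinkedIn profile link: {link!r}")
        seen.add(link)
        links.append(link)
    return links
```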

Download files

Source Distribution

leegs-1.3.0.tar.gz (11.9 kB)

Built Distribution

leegs-1.3.0-py3-none-any.whl (13.7 kB, Python 3)
