A LinkedIn Profile Scraper
LinkedIn Employee Extraction and Gender eStimation (LEEGS)
Installation
$ pip install leegs
See Options
$ leegs --help
Run Challenges 1 & 2 for 20 employees
$ leegs hubspot -at 20
How It Works
Leegs scrapes LinkedIn profiles and then uses AI to estimate the gender of each employee, which can be useful in diversity hiring initiatives. It breaks the job into smaller tasks, like an assembly line. First, leegs visits the company's LinkedIn page and scrapes the links to every employee who has a public profile. Next, it uses multiple threads to crawl those profile links concurrently and extract employee information. Finally, it runs every available profile picture through DeepFace, an open-source gender detection library (a smaller open-source project, not Meta's DeepFace).
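The assembly-line flow described above can be sketched as three chained stages. This is a minimal sketch, not leegs' actual code: `run_pipeline`, `scrape_links`, `crawl_profile`, and `predict_gender` are hypothetical names, and the real stages drive browsers rather than plain Python functions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_pipeline(
    company: str,
    scrape_links: Callable[[str], List[str]],   # hypothetical stage 1
    crawl_profile: Callable[[str], Dict],        # hypothetical stage 2
    predict_gender: Callable[[str], str],        # hypothetical stage 3
    workers: int = 4,
) -> List[Dict]:
    # Stage 1: collect the public profile URLs from the company page.
    links = scrape_links(company)

    # Stage 2: crawl the profiles concurrently; map() keeps input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        employees = list(pool.map(crawl_profile, links))

    # Stage 3: estimate gender only for employees that have a photo.
    for emp in employees:
        photo = emp.get("photo")
        emp["gender"] = predict_gender(photo) if photo else None
    return employees
```

Each stage only consumes the previous stage's output, so any one of them can be swapped out or tested in isolation.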
Under the hood, most of the speedup comes from employee_scraper.py, which runs browsers in multiple threads to parallelize employee data extraction. Because the employee load is split between threads, each thread can log in with a different account, so no single account does all of the crawling and each one is less likely to get flagged. This lets me keep extracting lots of data without needing many accounts. If an account does get banned, its thread is removed and the profile links it couldn't crawl are redistributed among the remaining threads. This means leegs can still complete the job when one piece fails, as long as at least one account keeps working.
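The ban-and-redistribute behavior can be sketched with one thread per account. This is a minimal sketch under stated assumptions, not the code in employee_scraper.py: `crawl_with_accounts`, `AccountBanned`, and the `crawl(account, link)` callback are hypothetical, and instead of per-thread lists it uses one shared pool of pending links, so a banned thread's unfinished link is simply handed back for the surviving threads to pick up.

```python
import threading
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class AccountBanned(Exception):
    """Hypothetical exception a crawler raises when its account is flagged."""


def crawl_with_accounts(
    links: List[str],
    accounts: List[str],
    crawl: Callable[[str, str], Dict],
) -> Tuple[List[Dict], List[str]]:
    """Crawl links with one thread per account; returns (results, uncrawled)."""
    pending: Deque[str] = deque(links)  # shared pool of links not yet crawled
    results: List[Dict] = []
    in_flight = 0                       # links currently being crawled
    cond = threading.Condition()

    def worker(account: str) -> None:
        nonlocal in_flight
        while True:
            with cond:
                # Nothing to take yet, but a thread that gets banned
                # might still hand its link back, so wait.
                while not pending and in_flight > 0:
                    cond.wait()
                if not pending:
                    return  # all work is done
                link = pending.popleft()
                in_flight += 1
            banned = False
            data = None
            try:
                data = crawl(account, link)
            except AccountBanned:
                banned = True
            finally:
                # Runs on success, ban, or any other error (which drops the link).
                with cond:
                    in_flight -= 1
                    if banned:
                        pending.append(link)  # give it to a surviving thread
                    elif data is not None:
                        results.append(data)
                    cond.notify_all()
            if banned:
                return  # retire this thread and its account

    threads = [threading.Thread(target=worker, args=(a,)) for a in accounts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Anything left here could not be crawled because every account was banned.
    return results, list(pending)
```

Threads only exit when no links are pending and none are in flight, so a healthy thread never quits while a banned one is still about to return work to the pool.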
The image prediction model comes from DeepFace, a lightweight Python framework for face recognition and facial attribute analysis (age, gender, emotion, and race) that lets you use pre-trained models. I chose to use the RetinaFace ("retina net") backend. The gender model is a convolutional neural network trained on over a million images of more than 1,000 different people to predict whether a face is male or female. It outputs a prediction for each image, and leegs records the predicted gender for each employee.
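The per-employee bookkeeping can be sketched as follows. This is a hedged sketch, not leegs' code: `deepface_gender` and `tally_genders` are hypothetical helpers. `deepface_gender` assumes the `deepface` package is installed and calls `DeepFace.analyze` with `actions=["gender"]` and `detector_backend="retinaface"`; because the return shape has changed across DeepFace releases (a list of dicts with `dominant_gender` in newer versions, a single dict with a `gender` label in older ones), it handles both, and it imports DeepFace lazily so the tally logic can run with any predictor.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Tuple


def deepface_gender(img_path: str) -> str:
    """Hypothetical wrapper around DeepFace's gender analysis."""
    from deepface import DeepFace  # lazy import: requires `pip install deepface`

    out = DeepFace.analyze(
        img_path=img_path,
        actions=["gender"],
        detector_backend="retinaface",
        enforce_detection=False,  # don't raise if no face is found
    )
    face = out[0] if isinstance(out, list) else out
    # Newer releases report "dominant_gender"; older ones put the label in "gender".
    return face.get("dominant_gender", face.get("gender"))


def tally_genders(
    photos: Dict[str, Optional[str]],
    predict: Callable[[str], str] = deepface_gender,
) -> Tuple[Dict[str, Optional[str]], Counter]:
    """Predict a gender per employee (None when there is no photo) and count them."""
    per_employee = {
        name: (predict(photo) if photo else None) for name, photo in photos.items()
    }
    counts = Counter(g for g in per_employee.values() if g is not None)
    return per_employee, counts
```

Passing the predictor in as a parameter keeps the counting logic independent of the model, so it can be exercised without downloading DeepFace's weights.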
Full CLI functionality
Get info & gender detection on one employee
$ leegs employee -al https://www.linkedin.com/in/<employeeID> -p photos
Get HubSpot employee data for 20 employees without downloading their profile pics
Crawl through a file of profile links (must be a .txt file)
$ leegs employee -f filename.txt -p photos
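A links file for `-f` could be read like this. This is a minimal sketch assuming one profile URL per line; `load_profile_links` is a hypothetical helper, and skipping blank lines and duplicate URLs is an assumption rather than documented leegs behavior. Only the `.txt` requirement comes from the README.

```python
from pathlib import Path
from typing import List


def load_profile_links(path: str) -> List[str]:
    """Hypothetical loader: one LinkedIn profile URL per line of a .txt file."""
    p = Path(path)
    if p.suffix.lower() != ".txt":
        raise ValueError(f"expected a .txt file, got {p.name}")
    links: List[str] = []
    seen = set()
    for line in p.read_text(encoding="utf-8").splitlines():
        url = line.strip()
        if not url or url in seen:
            continue  # skip blank lines and duplicate URLs
        seen.add(url)
        links.append(url)
    return links
```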
File details
Details for the file leegs-1.3.0.tar.gz.
File metadata
- Download URL: leegs-1.3.0.tar.gz
- Upload date:
- Size: 11.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.10.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8a11572b9965ae5ce883d266bd138d6a9e529fa4c9e46904b0e0adb6a7b72fa2
MD5 | 0ee0bafddde4f3326efa5458569a97f2
BLAKE2b-256 | dfc1ea2aad96ca6af54e122d23aa89837d06b12b4e7ba3c2ba09850439bdfea6
File details
Details for the file leegs-1.3.0-py3-none-any.whl.
File metadata
- Download URL: leegs-1.3.0-py3-none-any.whl
- Upload date:
- Size: 13.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.10.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1f21c402f66bd2a3f8a88b9e6557465cecdcf4ac2e28680b5ec34b72914fc933
MD5 | 7c0a1b4b78c0cf511de4ac8c0bd17303
BLAKE2b-256 | 20a71e5a5a0bad8777868391a6d90752c82c38a65902213bddcdec8757a9d675
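A downloaded release file can be checked against the SHA256 digests listed above. This is a small sketch using only the standard library's `hashlib`; `file_digest_hex` and `verify_leegs_download` are hypothetical helper names, and the file is read in chunks so large downloads do not have to fit in memory.

```python
import hashlib
import os

# SHA256 digests copied from the file details tables for leegs 1.3.0.
EXPECTED_SHA256 = {
    "leegs-1.3.0.tar.gz": "8a11572b9965ae5ce883d266bd138d6a9e529fa4c9e46904b0e0adb6a7b72fa2",
    "leegs-1.3.0-py3-none-any.whl": "1f21c402f66bd2a3f8a88b9e6557465cecdcf4ac2e28680b5ec34b72914fc933",
}


def file_digest_hex(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in 64 KiB chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_leegs_download(path: str) -> bool:
    """True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(os.path.basename(path))
    return expected is not None and file_digest_hex(path) == expected
```

pip can enforce the same check at install time through `--hash=sha256:<digest>` entries in a requirements file.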