Staff scraper library for LinkedIn
StaffSpy is a staff scraper library for LinkedIn.
Features
- Scrapes staff from a company on LinkedIn
- Obtains skills, experiences, certifications & more
- Aggregates the employees in a Pandas DataFrame
Video Guide for StaffSpy - updated for release v0.1.4
Installation
pip install -U staffspy
Python version >= 3.10 required
Usage
from pathlib import Path

from staffspy import scrape_staff

# Keep the session file next to this script so later runs can reuse the login
session_file = Path(__file__).resolve().parent / "session.pkl"

staff = scrape_staff(
    company_name="openai",
    search_term="software engineer",  # optional
    location="london",  # optional
    extra_profile_data=True,  # fetch all past experiences, schools, & skills
    max_results=50,  # can go up to 1000
    session_file=str(session_file),  # save browser cookies
    log_level=1,
)

# Write the resulting DataFrame to a CSV file
filename = "staff.csv"
staff.to_csv(filename, index=False)
On the first run, a browser window opens so you can sign in to LinkedIn. Press Enter after signing in to begin scraping.
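Since the sign-in prompt only matters when no saved session exists, a script can check for the session file before scraping. The sketch below assumes that an existing, non-empty session file means the saved cookies will be reused; that is inferred from the `session_file` description rather than confirmed library behavior, and `needs_manual_login` is a hypothetical helper name, not part of StaffSpy.

```python
from pathlib import Path


def needs_manual_login(session_file: Path) -> bool:
    """Return True when no usable saved session appears to exist.

    Assumption: a missing or empty file means the browser sign-in is needed.
    """
    return not session_file.is_file() or session_file.stat().st_size == 0


if __name__ == "__main__":
    session_file = Path("session.pkl")
    if needs_manual_login(session_file):
        print("No saved session found: expect a browser window for LinkedIn sign-in.")
    else:
        print(f"Saved session found at {session_file}")
```

This is mainly useful for unattended runs, where a script may prefer to stop early rather than wait at a prompt nobody will answer.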
Partial Output
name | position | profile_id | first_name | last_name | potential_email | company | school | location | followers | connections | premium |
---|---|---|---|---|---|---|---|---|---|---|---|
Andrei Gheorghe | Product Engineer | idevelop | Andrei | Gheorghe | andrei.gheorghe@openai.com | OpenAI | Universitatea „Politehnica” din București | London, England, United Kingdom | 723 | 704 | FALSE |
Douglas Li | @ OpenAI UK, previously at Meta | dougli | Douglas | Li | douglas.li@openai.com | OpenAI | Washington University in St. Louis | London, England, United Kingdom | 533 | 401 | TRUE |
Javier Sierra | Software Engineer | javiersierra2102 | Javier | Sierra | javier.sierra@openai.com | OpenAI | Hult International Business School | London, England, United Kingdom | 726 | 717 | FALSE |
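Once `staff.csv` has been written, the rows can be read back with the standard library alone. This is a minimal sketch, assuming the CSV uses the column names shown in the table above and stores `premium` as a true/false string; `parse_staff_csv` and `premium_members` are hypothetical helper names, not part of StaffSpy.

```python
import csv
import io


def parse_staff_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per staff row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def premium_members(rows: list[dict[str, str]]) -> list[str]:
    """Return the names of rows whose premium flag reads as true (any letter case)."""
    return [
        row["name"]
        for row in rows
        if (row.get("premium") or "").strip().lower() == "true"
    ]


if __name__ == "__main__":
    # Inline sample using a subset of the columns from the table above
    sample = (
        "name,position,profile_id,premium\n"
        'Douglas Li,"@ OpenAI UK, previously at Meta",dougli,TRUE\n'
        "Javier Sierra,Software Engineer,javiersierra2102,FALSE\n"
    )
    print(premium_members(parse_staff_csv(sample)))
```

To read the real file, pass `Path("staff.csv").read_text(encoding="utf-8")` to `parse_staff_csv`.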
Parameters for scrape_staff()
├── company_name (str):
| company identifier on LinkedIn
| e.g. openai from https://www.linkedin.com/company/openai
Optional
├── search_term (str):
| staff title to search for
| e.g. software engineer
|
├── location (str):
| location where the staff reside
| e.g. london
│
├── extra_profile_data (bool):
| fetches educations, experiences, skills, certifications (Default is False)
│
├── max_results (int):
| number of staff to fetch; the default and maximum is 1000 per search, a limit imposed by LinkedIn
│
├── session_file (str):
| file path to save session cookies, so only one manual login is needed.
| can use multiple profiles this way
│
├── log_level (int):
| Controls the verbosity of the runtime printouts
| (0 prints only errors, 1 is info, 2 is all logs. Default is 0.)
Staff Schema
Staff
├── search_term
├── id
├── name
├── bio
|
├── position
├── profile_id
├── profile_link
├── first_name
├── last_name
├── potential_email
├── estimated_age
|
├── followers
├── connections
├── mutual_connections
|
├── location
├── company_1
├── company_2
├── company_3
├── school_1
├── school_2
|
├── influencer
├── creator
├── premium
├── profile_photo
|
├── skills
│ ├── name
│ └── endorsements
├── experiences
│ ├── from_date
│ ├── to_date
│ ├── duration
│ ├── title
│ ├── company
│ ├── location
│ └── emp_type
├── certifications
│ ├── title
│ ├── issuer
│ ├── date_issued
│ ├── cert_id
│ └── cert_link
└── schools
    ├── years
    ├── school
    └── degree
LinkedIn notes:
- only 1000 max results per search
- extra_profile_data adds O(n) runtime, where n is the number of staff fetched
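Because each search returns at most 1000 results, rows gathered from several searches may overlap. The sketch below merges row dicts and drops repeats by `profile_id`, a field listed in the schema. It assumes `profile_id` is unique per person, and `merge_staff_rows` is a hypothetical helper, not part of StaffSpy.

```python
from typing import Iterable


def merge_staff_rows(*searches: Iterable[dict]) -> list[dict]:
    """Merge rows from several searches, keeping the first row seen per profile_id."""
    seen: set[str] = set()
    merged: list[dict] = []
    for rows in searches:
        for row in rows:
            profile_id = row.get("profile_id")
            if not profile_id:
                merged.append(row)  # no id to compare on, so keep the row as-is
                continue
            if profile_id in seen:
                continue  # already collected from an earlier search
            seen.add(profile_id)
            merged.append(row)
    return merged


if __name__ == "__main__":
    engineers = [{"profile_id": "dougli", "search_term": "software engineer"}]
    londoners = [{"profile_id": "dougli"}, {"profile_id": "idevelop"}]
    print(len(merge_staff_rows(engineers, londoners)))
```

With the DataFrames themselves, `pandas.concat` followed by `drop_duplicates(subset="profile_id")` does the same job.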
Frequently Asked Questions
Q: Can I get my account banned?
A: It is a possibility, although there are no recorded incidents. Let me know if you are the first.
Q: Why did I scrape 999 staff members, with 869 of them hidden as LinkedIn Members?
A: It means LinkedIn considers your account low quality. It is not clear exactly how LinkedIn classifies accounts, but an unverified email and a number of other factors go into it.
Q: Encountering issues with your queries?
A: If problems persist, submit an issue.
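To see how much of a result set is hidden, as in the second question above, the share of placeholder names can be computed. This sketch assumes hidden profiles appear with the literal name "LinkedIn Member", which is inferred from the FAQ wording and not confirmed by the source; `hidden_share` is a hypothetical helper.

```python
HIDDEN_NAME = "LinkedIn Member"  # assumption: placeholder name used for hidden profiles


def hidden_share(names: list[str]) -> float:
    """Return the fraction of names equal to the hidden-profile placeholder."""
    if not names:
        return 0.0
    hidden = sum(1 for name in names if name.strip() == HIDDEN_NAME)
    return hidden / len(names)


if __name__ == "__main__":
    # Same proportions as the FAQ example: 869 hidden out of 999
    names = [HIDDEN_NAME] * 869 + ["Visible Person"] * 130
    print(f"{hidden_share(names):.1%} of results are hidden")
```

With a DataFrame, the names can be passed as `staff["name"].tolist()`.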