
Staff scraper library for LinkedIn, customized for specific needs. Do not use this unless you need the enhanced functionality. Original author: Cullen Watson <cullen@bunsly.com>. GitHub: https://github.com/cullenwatson/StaffSpy

Project description


StaffSpy is a staff scraper library for LinkedIn.

NOTE

⚠️ This package is an enhanced version of StaffSpy by cullenwatson, made for my specific needs. ⚠️

Refer to the original if you need the general version. The enhancement is listed below in the Features section; if you need it, you can use this package too.

This is not well tested and may not work as expected, so use it at your own risk. It works for me.

If you want to use it, just replace staffspy with staffspy_enhanced.

Features

  • Scrapes staff from a company on LinkedIn
  • Obtains skills, experiences, certifications & more
  • Aggregates the employees in a Pandas DataFrame
  • Fetches company details (enhancement)

Video Guide for StaffSpy - updated for release v0.1.4

Installation

pip install -U staffspy

Python version >= 3.10 required
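Since Python 3.10 or newer is required, a script built on the package can fail fast with a clear message on an older interpreter instead of crashing later. A minimal standard-library sketch; check_python_version is a hypothetical helper, not part of StaffSpy:

```python
import sys

# Minimum interpreter version stated in this README.
MIN_VERSION = (3, 10)


def check_python_version(version_info=sys.version_info) -> None:
    """Raise SystemExit if the interpreter is older than MIN_VERSION."""
    if tuple(version_info[:2]) < MIN_VERSION:
        raise SystemExit(
            f"Python {MIN_VERSION[0]}.{MIN_VERSION[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )


if __name__ == "__main__":
    check_python_version()  # checks the running interpreter
```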

Usage

from staffspy import scrape_staff
from pathlib import Path
session_file = Path(__file__).resolve().parent / "session.pkl"

staff = scrape_staff(
    company_name="openai",
    search_term="software engineer",
    location="london",
    extra_profile_data=True, # fetch all past experiences, schools, & skills

    # login credentials (remove these to sign in with browser)
    username="myemail@gmail.com",
    password="mypassword",
    capsolver_api_key="CAP-6D6A8CE981803A309A0D531F8B4790BC", # only needed if hit with a captcha on sign-in

    max_results=50, # can go up to 1000
    session_file=str(session_file), # save login cookies to only log in once (lasts a week or so)
    log_level=1, # 0 = errors only, 1 = info, 2 = all logs
)
filename = "staff.csv"
staff.to_csv(filename, index=False)

Two login methods

Requests login

If you pass in a username and password, StaffSpy signs in through the LinkedIn API (you should disable 2FA for this method). If the sign-in is hit with a captcha, you need to pass capsolver_api_key so that the third-party service (capsolver.com) can solve it.

Browser login

If that fails or you would rather use a browser, install StaffSpy's browser add-on:

pip install "staffspy[browser]"

Do not pass the username and password params; a browser will then open on the first sign-in so you can log in to LinkedIn. Press Enter after signing in to begin scraping.
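To avoid hard-coding credentials in the usage script, the login arguments can be read from environment variables and left out entirely when they are not set, which (as described above) makes StaffSpy open the browser login instead. The variable names LINKEDIN_USERNAME, LINKEDIN_PASSWORD and CAPSOLVER_API_KEY and the login_kwargs helper are hypothetical choices for this sketch, not part of StaffSpy:

```python
import os
from collections.abc import Mapping


def login_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Build the credential arguments for scrape_staff() from environment variables.

    Hypothetical helper: returns username/password (plus capsolver_api_key if
    set) when both credentials are present, selecting the requests login.
    Otherwise returns an empty dict, so no credentials are passed and the
    browser login is used (requires the staffspy[browser] add-on).
    """
    username = env.get("LINKEDIN_USERNAME")
    password = env.get("LINKEDIN_PASSWORD")
    if not (username and password):
        return {}  # browser login: omit username/password entirely
    kwargs = {"username": username, "password": password}
    api_key = env.get("CAPSOLVER_API_KEY")
    if api_key:
        kwargs["capsolver_api_key"] = api_key  # only used if a captcha appears
    return kwargs
```

The result can be unpacked into the call, e.g. scrape_staff(company_name="openai", session_file=str(session_file), **login_kwargs()).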

Output

profile_id name first_name last_name location age position followers connections premium company past_company1 past_company2 school extra_school skill1 skill2 skill3 is_connection premium creator potential_email profile_link profile_photo
javiersierra2102 Javier Sierra Javier Sierra London, England, United Kingdom 39 Software Engineer 735 725 FALSE OpenAI Meta Oculus VR Hult International Business School Universidad Simón Bolívar Java JavaScript C++ FALSE FALSE FALSE javier.sierra@openai.com, jsierra@openai.com https://www.linkedin.com/in/javiersierra2102 https://media.licdn.com/dms/image/C4D03AQHEyUg1kGT08Q/profile-displayphoto-shrink_800_800/0/1516504680512?e=1727913600&v=beta&t=3enCmNDBtJ7LxfbW6j1hDD8qNtHjO2jb2XTONECxUXw
dougli Douglas Li Douglas Li London, England, United Kingdom 37 @ OpenAI UK, previously at Meta 583 401 FALSE OpenAI Shift Lab Facebook Washington University in St. Louis Java Python JavaScript FALSE TRUE FALSE douglas.li@openai.com, dli@openai.com https://www.linkedin.com/in/dougli https://media.licdn.com/dms/image/D4E03AQETmRyb3_GB8A/profile-displayphoto-shrink_800_800/0/1687996628597?e=1727913600&v=beta&t=HRYGJ4RxsTMcPF1YcSikXlbz99hx353csho3PWT6fOQ
nkartashov Nick Kartashov Nick Kartashov London, England, United Kingdom 33 Software Engineer 2186 2182 TRUE OpenAI Google DeepMind St. Petersburg Academic University Bioinformatics Institute Teamwork Java Haskell FALSE FALSE FALSE nick.kartashov@openai.com, nkartashov@openai.com https://www.linkedin.com/in/nkartashov https://media.licdn.com/dms/image/D4E03AQEjOKxC5UgwWw/profile-displayphoto-shrink_800_800/0/1680706122689?e=1727913600&v=beta&t=m-JnG9nm0zxp1Z7njnInwbCoXyqa3AN-vJZntLfbzQ4
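In the sample output, the potential_email column packs several guessed addresses into one comma-separated string. A minimal sketch that reads back the staff.csv written in the usage example with the standard csv module and splits that column into a list per profile. It assumes the file has a profile_id column and an email column named potential_email as in the table above; the schema below calls the field potential_emails, so the column name is a parameter. emails_by_profile is a hypothetical helper:

```python
import csv
from typing import TextIO


def emails_by_profile(csv_file: TextIO, column: str = "potential_email") -> dict[str, list[str]]:
    """Map each profile_id to its list of guessed email addresses.

    Assumes the email column holds addresses separated by commas,
    e.g. "javier.sierra@openai.com, jsierra@openai.com".
    """
    result: dict[str, list[str]] = {}
    for row in csv.DictReader(csv_file):
        raw = row.get(column) or ""
        # Split on commas, trim surrounding spaces, drop empty fragments.
        result[row["profile_id"]] = [e.strip() for e in raw.split(",") if e.strip()]
    return result
```

Usage: with open("staff.csv", newline="", encoding="utf-8") as f: emails = emails_by_profile(f)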

Parameters for scrape_staff()

├── company_name (str):
│    company identifier on LinkedIn; if no company with that ID exists, it searches for a company by that name instead
│    e.g. openai from https://www.linkedin.com/company/openai

Optional
├── search_term (str):
│    staff title to search for
│    e.g. software engineer
│
├── location (str):
│    location where the staff reside
│    e.g. london
│
├── extra_profile_data (bool):
│    fetches educations, experiences, skills, and certifications (default False)
│
├── max_results (int):
│    number of staff to fetch; the default and maximum is 1000, a per-search limit imposed by LinkedIn
│
├── session_file (str):
│    file path in which to save session cookies, so only one manual login is needed;
│    using a separate file per account lets you work with multiple profiles
│
├── username (str):
│    LinkedIn account email
│
├── password (str):
│    LinkedIn account password
│
├── capsolver_api_key (str):
│    solves the captcha via capsolver.com if one appears on login
│
├── log_level (int):
│    controls the verbosity of the runtime printouts
│    (0 prints only errors, 1 is info, 2 is all logs; default is 0)
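Because each session_file stores the cookies of one login, keeping one file per account is what makes multiple profiles possible. A small sketch of one way to derive a separate path per account; the sessions directory layout and the session_path helper are hypothetical conventions, not something StaffSpy prescribes:

```python
import re
from pathlib import Path


def session_path(account: str, base_dir: Path = Path("sessions")) -> Path:
    """Return a per-account session file path.

    Hypothetical helper: "Jane.Doe@example.com" becomes
    sessions/jane_doe_example_com.pkl. Runs of non-alphanumeric
    characters are replaced with underscores to give a safe file name.
    """
    safe_name = re.sub(r"[^A-Za-z0-9]+", "_", account).strip("_").lower()
    base_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    return base_dir / f"{safe_name}.pkl"
```

Pass the result as session_file=str(session_path("myemail@gmail.com")), matching the str() conversion used in the usage example.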

Staff Schema

Staff
├── Personal Information
│   ├── search_term
│   ├── id
│   ├── name
│   ├── first_name
│   ├── last_name
│   ├── location
│   └── bio
│
├── Professional Details
│   ├── position
│   ├── profile_id
│   ├── profile_link
│   ├── potential_emails
│   └── estimated_age
│
├── Social Connectivity
│   ├── followers
│   ├── connections
│   └── mutuals_count
│
├── Employment History
│   ├── company
│   ├── past_company1
│   ├── past_company2
│   ├── school
│   ├── extra_school
│   ├── top_skill_1
│   ├── top_skill_2
│   └── top_skill_3
│
├── Status
│   ├── influencer
│   ├── creator
│   ├── premium
│   └── is_connection
│
├── Visuals
│   └── profile_photo
│
├── Skills
│   ├── name
│   └── endorsements
│
├── Experiences
│   ├── from_date
│   ├── to_date
│   ├── duration
│   ├── title
│   ├── company
│   ├── location
│   └── emp_type
│
├── Certifications
│   ├── title
│   ├── issuer
│   ├── date_issued
│   ├── cert_id
│   └── cert_link
│
└── Educational Background
    ├── years
    ├── school
    └── degree

LinkedIn notes

- a search returns at most 1000 results
- extra_profile_data makes additional requests for every profile fetched, so it increases runtime by O(n) in the number of staff
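Because LinkedIn caps each search at 1000 results, one way to cover a larger company is to run several narrower searches (for example one per search_term and location pair) and merge the results, dropping profiles that turn up in more than one search. The sketch below only builds the per-search arguments and de-duplicates rows by profile_id; search_plan and dedupe_by_profile are hypothetical helpers, and the actual scrape_staff() calls are left to the caller:

```python
from collections.abc import Iterable
from itertools import product

LINKEDIN_SEARCH_CAP = 1000  # per-search limit noted above


def search_plan(company: str, terms: Iterable[str], locations: Iterable[str],
                max_results: int = LINKEDIN_SEARCH_CAP) -> list[dict]:
    """One scrape_staff() keyword set per (search_term, location) pair."""
    limit = min(max_results, LINKEDIN_SEARCH_CAP)
    return [
        {"company_name": company, "search_term": term,
         "location": loc, "max_results": limit}
        for term, loc in product(terms, locations)
    ]


def dedupe_by_profile(rows: Iterable[dict]) -> list[dict]:
    """Keep the first row for each profile_id, preserving order.

    Rows with an empty or missing profile_id are all kept,
    since they cannot be matched reliably.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for row in rows:
        pid = row.get("profile_id")
        if pid:
            if pid in seen:
                continue  # already collected from an earlier search
            seen.add(pid)
        unique.append(row)
    return unique
```

If each search result is kept as a DataFrame, pandas.concat followed by DataFrame.drop_duplicates(subset="profile_id") gives a similar merge.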

Frequently Asked Questions


Q: Can I get my account banned?
A: It is a possibility, although there are no recorded incidents. Let me know if you are the first.


Q: Scraped 999 staff members, but 869 are hidden LinkedIn Members?
A: It means LinkedIn considers your account low quality. It is not clear exactly how they classify accounts, but an unverified email, a new account, few connections, and a number of other factors go into it.


Q: Encountering issues with your queries?
A: If problems persist, submit an issue.




Download files


Source Distribution

staffspy_enhanced-0.2.4.tar.gz (21.1 kB)

Uploaded Source

Built Distribution

staffspy_enhanced-0.2.4-py3-none-any.whl (23.9 kB)

Uploaded Python 3

File details

Details for the file staffspy_enhanced-0.2.4.tar.gz.

File metadata

  • Download URL: staffspy_enhanced-0.2.4.tar.gz
  • Size: 21.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for staffspy_enhanced-0.2.4.tar.gz
Algorithm Hash digest
SHA256 dc5742b8161eff3912649bace8b7f31ec18a0515aaf6de596985f211e6acadcb
MD5 8c16a843aab976703560a0bdca1b98f6
BLAKE2b-256 e6ee866774a930a56ba618ed3f1470277472054fe36d9c446e236ce0224f1fcd


File details

Details for the file staffspy_enhanced-0.2.4-py3-none-any.whl.


File hashes

Hashes for staffspy_enhanced-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 5914f7fdd99dc9276694c5aa81913c56298abb24ccc8adec94660c2d564ee902
MD5 55ace84055e0a916a173d4f693bd0e2c
BLAKE2b-256 ef14a6859db27015fd6e7f70c0e90951f962360ebb9ba3c4cfcc69a7733836fd

