staffspy

Staff scraper library for LinkedIn

These details have not been verified by PyPI

Project description

StaffSpy is a staff scraper library for LinkedIn.

Why pay $100/mo for LSN (LinkedIn Sales Navigator) when you could do it for free and get a nice CSV to go along with it?

Features

Scrapes staff from a company on LinkedIn
Obtains skills, experiences, certifications & more
Or fetch individual users / comments on posts
Scrape your own LinkedIn connections with details
Aggregates the employees in a Pandas DataFrame

Video Guide for StaffSpy - updated for release v0.2.18

Installation

pip install -U staffspy[browser]

Python version >= 3.10 required

Usage

from staffspy import LinkedInAccount, SolverType, DriverType, BrowserType

account = LinkedInAccount(
    # if issues with webdriver, specify its exact location
    # driver_type=DriverType(
    #     browser_type=BrowserType.CHROME,
    #     executable_path="/Users/pc/chromedriver-mac-arm64/chromedriver"
    # ),

    session_file="session.pkl", # save login cookies to only log in once (lasts a week or so)
    log_level=1, # 0 for no logs
)

# search by company
staff = account.scrape_staff(
    company_name="openai",
    search_term="software engineer",
    location="london",
    extra_profile_data=True, # fetch all past experiences, schools, & skills
    max_results=50, # can go up to 1000
    # block=True # if you want to block the user after scraping, to exclude from future search results
)
# or fetch by user ids
users = account.scrape_users(
    user_ids=['williamhgates', 'rbranson', 'jeffweiner08']
)

# fetch all comments on two of Bill Gates' posts 
comments = account.scrape_comments(
    ['7252421958540091394','7253083989547048961']
)

# fetch company details
companies = account.scrape_companies(
    company_names=['openai', 'microsoft']
)

# fetch connections
connections = account.scrape_connections(
    extra_profile_data=True,
    max_results=50
)

staff.to_csv("staff.csv", index=False)
users.to_csv("users.csv", index=False)
comments.to_csv("comments.csv", index=False)
companies.to_csv("companies.csv", index=False)
connections.to_csv("connections.csv", index=False)

Browser login

If you would rather use a browser to log in, install the browser add-on for StaffSpy.

pip install staffspy[browser]

If you do not pass the username & password params, a browser will open on the first sign-in so you can log in to LinkedIn. Press Enter after signing in to begin scraping.

Output

profile_id	name	first_name	last_name	location	age	position	followers	connections	company	past_company1	past_company2	school1	school2	skill1	skill2	skill3	is_connection	premium	creator	potential_email	profile_link	profile_photo
javiersierra2102	Javier Sierra	Javier	Sierra	London, England, United Kingdom	39	Software Engineer	735	725	OpenAI	Meta	Oculus VR	Hult International Business School	Universidad Simón Bolívar	Java	JavaScript	C++	FALSE	FALSE	FALSE	javier.sierra@openai.com, jsierra@openai.com	https://www.linkedin.com/in/javiersierra2102	https://media.licdn.com/dms/image/C4D03AQHEyUg1kGT08Q/profile-displayphoto-shrink_800_800/0/1516504680512?e=1727913600&v=beta&t=3enCmNDBtJ7LxfbW6j1hDD8qNtHjO2jb2XTONECxUXw
dougli	Douglas Li	Douglas	Li	London, England, United Kingdom	37	@ OpenAI UK, previously at Meta	583	401	OpenAI	Shift Lab	Facebook	Washington University in St. Louis		Java	Python	JavaScript	FALSE	TRUE	FALSE	douglas.li@openai.com, dli@openai.com	https://www.linkedin.com/in/dougli	https://media.licdn.com/dms/image/D4E03AQETmRyb3_GB8A/profile-displayphoto-shrink_800_800/0/1687996628597?e=1727913600&v=beta&t=HRYGJ4RxsTMcPF1YcSikXlbz99hx353csho3PWT6fOQ
nkartashov	Nick Kartashov	Nick	Kartashov	London, England, United Kingdom	33	Software Engineer	2186	2182	OpenAI	Google	DeepMind	St. Petersburg Academic University	Bioinformatics Institute	Teamwork	Java	Haskell	FALSE	FALSE	FALSE	nick.kartashov@openai.com, nkartashov@openai.com	https://www.linkedin.com/in/nkartashov	https://media.licdn.com/dms/image/D4E03AQEjOKxC5UgwWw/profile-displayphoto-shrink_800_800/0/1680706122689?e=1727913600&v=beta&t=m-JnG9nm0zxp1Z7njnInwbCoXyqa3AN-vJZntLfbzQ4
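The potential_email column holds several guessed addresses in one comma-separated cell. A minimal sketch of splitting that cell after reading a saved CSV is shown here, using only the standard library. The sample rows are placeholders in the same shape as staff.csv, not real output, and the column name potential_email is taken from the header row above.

```python
import csv
import io


def split_emails(rows):
    """Yield (profile_id, [emails]) pairs from an iterable of CSV row dicts."""
    for row in rows:
        raw = row.get("potential_email") or ""
        # The cell looks like "a@x.com, b@x.com"; an empty cell gives [].
        emails = [e.strip() for e in raw.split(",") if e.strip()]
        yield row["profile_id"], emails


# Placeholder data shaped like staff.csv (hypothetical people and addresses).
sample = io.StringIO(
    "profile_id,name,potential_email\n"
    'jdoe,Jane Doe,"jane.doe@example.com, jdoe@example.com"\n'
    "asmith,Alex Smith,\n"
)
result = dict(split_emails(csv.DictReader(sample)))
print(result)
```

To run this against a real export, pass `csv.DictReader(open("staff.csv", newline=""))` instead of the in-memory sample.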

Parameters for `LinkedInAccount()`

Optional
├── session_file (str):
│    file path to save session cookies, so only one manual login is needed.
│    multiple profiles can be used this way
│
│ For automated login
├── username (str):
│    LinkedIn account email
│
├── password (str):
│    LinkedIn account password
│
├── driver_type (DriverType):
│    signs in with the given BrowserType (Chrome, Firefox) and executable_path
│
├── solver_service (SolverType):
│    solves the captcha using the desired service: either CapSolver or 2Captcha (the worse of the two)
│
├── solver_api_key (str):
│    API key for the solver provider
│
└── log_level (int):
     controls the verbosity of the runtime printouts
     (0 prints only errors, 1 is info, 2 is all logs. Default is 0.)
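One way to keep credentials out of source code is to build the constructor arguments from environment variables. The sketch below is an illustration only: the variable names LINKEDIN_USERNAME and LINKEDIN_PASSWORD and the helper login_kwargs_from_env are hypothetical, while the keyword names (session_file, log_level, username, password) come from the parameter list above.

```python
import os


def login_kwargs_from_env(env=None, session_file="session.pkl"):
    """Build keyword arguments for LinkedInAccount() from environment variables.

    LINKEDIN_USERNAME and LINKEDIN_PASSWORD are hypothetical variable names.
    If either is missing, no credentials are passed, which per the README
    means a browser opens for a manual sign-in instead.
    """
    env = os.environ if env is None else env
    kwargs = {"session_file": session_file, "log_level": 1}
    username = env.get("LINKEDIN_USERNAME")
    password = env.get("LINKEDIN_PASSWORD")
    if username and password:
        kwargs["username"] = username
        kwargs["password"] = password
    return kwargs


# Usage (requires staffspy to be installed):
#   from staffspy import LinkedInAccount
#   account = LinkedInAccount(**login_kwargs_from_env())
```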

Parameters for `scrape_staff()`

Optional
├── company_name (str):
│    company identifier on LinkedIn; if no company with that id exists, StaffSpy searches for the company by name
│    e.g. openai from https://www.linkedin.com/company/openai
│
├── search_term (str):
│    staff title to search for
│    e.g. software engineer
│
├── location (str):
│    location where the staff reside
│    e.g. london
│
├── extra_profile_data (bool):
│    fetches educations, experiences, skills, certifications (default False)
│
├── max_results (int):
│    number of staff to fetch; default and max is 1000 per search, a limit imposed by LinkedIn
│
└── block (bool):
     whether to block the user after scraping

Parameters for `scrape_users()`

├── user_ids (list):
│    user ids to scrape from
│    e.g. dougmcmillon from https://www.linkedin.com/in/dougmcmillon
│
└── block (bool):
     whether to block the user after scraping
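Both user ids and company identifiers are the path segment that follows /in/ or /company/ in a LinkedIn URL. The helper below is a small sketch for pulling that segment out of a pasted URL. The function name linkedin_slug is hypothetical and not part of StaffSpy.

```python
from urllib.parse import urlparse


def linkedin_slug(url, kind):
    """Return the identifier that follows /<kind>/ in a LinkedIn URL path.

    kind is "in" for profile URLs or "company" for company pages.
    """
    # Drop empty segments so a trailing slash does not matter.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 2 and parts[0] == kind:
        return parts[1]
    raise ValueError(f"not a LinkedIn /{kind}/ URL: {url!r}")


print(linkedin_slug("https://www.linkedin.com/in/dougmcmillon", "in"))
print(linkedin_slug("https://www.linkedin.com/company/openai/", "company"))
```

The returned strings can then be passed in the user_ids list or as company_name.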

Parameters for `scrape_comments()`

└── post_ids (list):
     post ids to scrape from
     e.g. 7252381444906364929 from https://www.linkedin.com/posts/williamhgates_technology-transformtheeveryday-activity-7252381444906364929-Bkls
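In the example URL above, the post id is the run of digits after "activity-". A minimal sketch of extracting it with a regular expression follows. The helper name post_id_from_url is hypothetical, and the pattern assumes post URLs keep the "activity-<digits>" form shown in the example.

```python
import re

# Assumes the id always appears as "activity-<digits>" in the post URL.
_ACTIVITY_RE = re.compile(r"activity-(\d+)")


def post_id_from_url(url):
    """Return the numeric post id embedded in a LinkedIn post URL."""
    match = _ACTIVITY_RE.search(url)
    if match is None:
        raise ValueError(f"no activity id found in {url!r}")
    return match.group(1)


url = (
    "https://www.linkedin.com/posts/"
    "williamhgates_technology-transformtheeveryday-activity-7252381444906364929-Bkls"
)
print(post_id_from_url(url))
```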

Parameters for `scrape_companies()`

└── company_names (list):
     list of company names to scrape details from
     e.g. ['openai', 'microsoft', 'google']

Parameters for `scrape_connections()`

├── max_results (int):
│    maximum number of connections to fetch (default is all)
│
└── extra_profile_data (bool):
     gets all profile info

LinkedIn notes

- at most 1000 results per search
- extra_profile_data fetches additional data for each profile, so the added runtime grows linearly with the number of results (O(n))
- if rate limited, the program will stop scraping
- if using non-browser sign-in, turn off 2FA

Frequently Asked Questions

Q: Can I get my account banned?
A: It is a possibility, although there are no recorded incidents. Let me know if you are the first. However, to protect you, the code refuses to run while LinkedIn is blocking you.

Q: Scraped 999 staff members, with 869 hidden LinkedIn Members?
A: It means LinkedIn has flagged your account as low quality. It is unclear exactly how they classify this, but an unverified email, a new account, few connections and a number of other factors go into it.

Q: How to get around the 1000 search limit result?
A: Check the examples folder. We can block the user after searching and try many different locations and search terms to maximize results.
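The idea of trying many locations and search terms can be sketched as a loop over every combination, keeping one row per profile_id. This is not the code from the examples folder. The helper scrape_grid and the scrape callable are hypothetical, and in practice the callable would wrap account.scrape_staff(...) and return a list of row dicts.

```python
from itertools import product


def scrape_grid(scrape, search_terms, locations):
    """Call scrape(search_term, location) for every pair and dedupe by profile_id.

    scrape is any callable returning an iterable of dicts with a "profile_id" key.
    """
    seen = {}
    for term, loc in product(search_terms, locations):
        for row in scrape(term, loc):
            # Keep the first row seen for each profile.
            seen.setdefault(row["profile_id"], row)
    return list(seen.values())


# Usage (assumes an authenticated `account` as in the Usage section):
#   def scrape(term, loc):
#       df = account.scrape_staff(company_name="openai", search_term=term, location=loc)
#       return df.to_dict("records")
#   rows = scrape_grid(scrape, ["software engineer", "researcher"], ["london", "dublin"])
```

Each call is still capped at 1000 results, so narrower terms and locations give each search more room to return people the others missed.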

Q: Exception: driver not found for selenium?
A: You need chromedriver installed (not the Chrome browser itself): https://googlechromelabs.github.io/chrome-for-testing/#stable

Q: Encountering issues with your queries?
A: If problems persist, submit an issue.

Staff Schema

Staff
├── Personal Information
│   ├── search_term
│   ├── id
│   ├── name
│   ├── first_name
│   ├── last_name
│   ├── location
│   └── bio
│
├── Professional Details
│   ├── position
│   ├── profile_id
│   ├── profile_link
│   ├── potential_emails
│   └── estimated_age
│
├── Social Connectivity
│   ├── followers
│   ├── connections
│   └── mutuals_count
│
├── Status
│   ├── influencer
│   ├── creator
│   ├── premium
│   ├── open_to_work
│   ├── is_hiring
│   └── is_connection
│
├── Visuals
│   ├── profile_photo
│   └── banner_photo
│
├── Skills
│   ├── name
│   └── endorsements
│
├── Experiences
│   ├── from_date
│   ├── to_date
│   ├── duration
│   ├── title
│   ├── company
│   ├── location
│   └── emp_type
│
├── Certifications
│   ├── title
│   ├── issuer
│   ├── date_issued
│   ├── cert_id
│   └── cert_link
│
└── Educational Background
    ├── years
    ├── school
    └── degree

Project details

Release history

Version	Date
0.2.25	Jan 6, 2025
0.2.24	Jan 1, 2025
0.2.23	Dec 31, 2024 (this version)
0.2.22	Dec 31, 2024
0.2.20	Nov 11, 2024
0.2.19	Nov 5, 2024
0.2.18	Oct 20, 2024
0.2.17	Oct 20, 2024
0.2.16	Oct 16, 2024
0.2.14	Oct 16, 2024
0.2.13	Sep 29, 2024
0.2.12	Sep 13, 2024
0.2.11	Aug 21, 2024
0.2.10	Aug 7, 2024
0.2.9	Aug 4, 2024
0.2.8	Aug 2, 2024
0.2.7	Aug 2, 2024
0.2.6	Aug 2, 2024
0.2.5	Jul 30, 2024
0.2.4	Jul 30, 2024
0.2.3	Jul 29, 2024
0.2.2	Jul 26, 2024
0.2.1	Jul 26, 2024
0.2.0	Jul 26, 2024
0.1.17	Jul 21, 2024
0.1.16	Jul 21, 2024
0.1.15	Jul 21, 2024
0.1.14	Jul 19, 2024
0.1.13	Jul 16, 2024
0.1.12	Jul 16, 2024
0.1.11	Jul 16, 2024
0.1.10	Jul 16, 2024
0.1.9	Jul 16, 2024
0.1.8	Jun 19, 2024
0.1.7	Jun 19, 2024
0.1.6	Jun 9, 2024
0.1.5	Jun 5, 2024
0.1.4	Jun 3, 2024
0.1.3	Jun 3, 2024
0.1.2	Jun 3, 2024
0.1.1	Jun 3, 2024
0.1.0	Jun 1, 2024

Download files

Source Distribution

staffspy-0.2.23.tar.gz (27.7 kB)

Uploaded Dec 31, 2024 Source

Built Distribution

staffspy-0.2.23-py3-none-any.whl (32.8 kB)

Uploaded Dec 31, 2024 Python 3

File details

Details for the file staffspy-0.2.23.tar.gz.

File metadata

Download URL: staffspy-0.2.23.tar.gz
Upload date: Dec 31, 2024
Size: 27.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for staffspy-0.2.23.tar.gz
Algorithm	Hash digest
SHA256	`984f3045eeae0f04263d0d548830eed4c56a684d80fd248764713088adf55396`
MD5	`8240c47669c730c2eb516dd50dfc170a`
BLAKE2b-256	`44571dc85c9e01366cc51f68b9893d9a84079dd9a4d424679adb90d204cc7f53`

File details

Details for the file staffspy-0.2.23-py3-none-any.whl.

File metadata

Download URL: staffspy-0.2.23-py3-none-any.whl
Upload date: Dec 31, 2024
Size: 32.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for staffspy-0.2.23-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9a2b25aef8c66792ac094f62175cf6f8288f393bc772e7df4f62142a4d4e443d`
MD5	`88443979c29f808ea4ddfa6e2baff5fd`
BLAKE2b-256	`dc0de087730fad33116c8c91b37c9205a7ded1a8453176c0f29c5fda0722510f`
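A downloaded archive can be checked against the SHA256 digests listed above before installing it. The sketch below uses only the standard library. The helper names sha256_of_file and matches_published_hash are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, with the wheel digest listed above:
#   matches_published_hash(
#       "staffspy-0.2.23-py3-none-any.whl",
#       "9a2b25aef8c66792ac094f62175cf6f8288f393bc772e7df4f62142a4d4e443d",
#   )
```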
