Scrapes user data from Linkedin

Linkedin Scraper

Scrapes Linkedin User Data

Installation
Setup
Usage
API
- Person
- Company
Contribution

Installation

pip3 install --user linkedin_scraper

Versions 2.0.0 and earlier were published as linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper

Setup

First, set the location of your chromedriver:

export CHROMEDRIVER=~/chromedriver
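
If you build the driver yourself, the same variable can be read from Python. The following is a minimal sketch, assuming Selenium 4 and its `Service` class; `chromedriver_path` and `build_driver` are hypothetical helpers and are not part of linkedin_scraper.

```python
import os


def chromedriver_path(default=None):
    """Return the expanded CHROMEDRIVER path, or `default` if it is unset."""
    raw = os.environ.get("CHROMEDRIVER")
    if not raw:
        return default
    # Expand "~" so a value like "~/chromedriver" becomes an absolute path.
    return os.path.expanduser(raw)


def build_driver():
    """Create a Chrome driver, using CHROMEDRIVER when it is set."""
    # Imported lazily so chromedriver_path() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = chromedriver_path()
    if path is None:
        # Let Selenium locate the driver on its own.
        return webdriver.Chrome()
    return webdriver.Chrome(service=Service(executable_path=path))
```

The resulting driver can then be passed to Person or Company through the driver argument.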

Sponsor

Scrape public LinkedIn profile data at scale with Proxycurl APIs.

• Scraping public profiles is battle-tested in court in the hiQ vs. LinkedIn case.
• GDPR, CCPA, SOC2 compliant
• High rate limit - 300 requests/minute
• Fast - APIs respond in ~2s
• Fresh data - 88% of data is scraped in real time; the other 12% is no older than 29 days
• High accuracy
• Tons of data points returned per profile

Built for developers, by developers.

Usage

To use it, just create the class.

Sample Usage

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
person = Person("https://www.linkedin.com/in/joey-sham-aa2a50122", driver=driver)

NOTE: The account used to log in should have its language set to English to make sure everything works as expected.

User Scraping

from linkedin_scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

Company Scraping

from linkedin_scraper import Company
company = Company("https://ca.linkedin.com/company/google")

Job Scraping

from linkedin_scraper import Job, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
input("Press Enter")
job = Job("https://www.linkedin.com/jobs/collections/recommended/?currentJobId=3456898261", driver=driver, close_on_complete=False)

Job Search Scraping

from linkedin_scraper import JobSearch, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
input("Press Enter")
job_search = JobSearch(driver=driver, close_on_complete=False, scrape=False)
# job_search contains jobs from your logged in front page:
# - job_search.recommended_jobs
# - job_search.still_hiring
# - job_search.more_jobs

job_listings = job_search.search("Machine Learning Engineer") # returns the list of `Job` from the first page
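
To run several searches with the same logged-in JobSearch object, the search call can be wrapped in a loop. This is a minimal sketch that relies only on the search method shown above; `search_many` is a hypothetical helper, not part of the library.

```python
def search_many(job_search, queries):
    """Run each query through job_search.search and collect the results."""
    results = {}
    for query in queries:
        # search() returns the list of Job objects from the first results page.
        results[query] = job_search.search(query)
    return results
```

For example, search_many(job_search, ["Machine Learning Engineer", "Data Engineer"]) returns a dict that maps each query to its list of jobs.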

Scraping sites where login is required first

Run ipython or python
In ipython/python, run the following code (you can modify it if you need to specify your driver)

from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver, scrape=False)

Log in to LinkedIn
[OPTIONAL] Log out of LinkedIn
In the same ipython/python session, run

person.scrape()

The reason is that LinkedIn has recently blocked people from viewing certain profiles without having previously signed in. Setting scrape=False prevents the profile from being scraped automatically, but Chrome still opens the LinkedIn page. You can log in and log out; the cookie stays in the browser and it won't affect your profile views. When you then run person.scrape(), it scrapes the profile and closes the browser. To keep the browser open so you can scrape other profiles, run it as

person.scrape(close_on_complete=False)

so it doesn't close.

NOTE: For version >= 2.1.0, scraping can also occur while logged in. Beware that users will be able to see that you viewed their profile.
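
Keeping the browser open makes it possible to scrape several profiles with one driver. The following is a minimal sketch using only the scrape=False and close_on_complete=False options described here; `scrape_profiles` and its `person_cls` parameter are hypothetical and exist only to make the example easy to try with a stand-in class.

```python
def scrape_profiles(urls, driver, person_cls=None):
    """Scrape each profile URL with the same driver, leaving the browser open."""
    if person_cls is None:
        # Imported lazily so a stand-in class can be supplied instead.
        from linkedin_scraper import Person
        person_cls = Person

    people = []
    for url in urls:
        # scrape=False defers scraping until scrape() is called explicitly.
        person = person_cls(url, driver=driver, scrape=False)
        # close_on_complete=False keeps the browser open for the next URL.
        person.scrape(close_on_complete=False)
        people.append(person)
    return people
```

Because the browser is never closed by the helper, call driver.quit() yourself once all profiles are done.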

Scraping sites and login automatically

From version 2.4.0 on, actions is a part of the library that allows signing in to LinkedIn first. The email and password can be passed as arguments to the function. If they are not provided, both will be prompted for in the terminal.

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()
email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)
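
To avoid hardcoding credentials in a script, they can be read from the environment before calling actions.login. This is a minimal sketch; the variable names LINKEDIN_EMAIL and LINKEDIN_PASSWORD and the helper `load_credentials` are assumptions made for illustration, not something the library defines.

```python
import os


def load_credentials(env=None):
    """Return (email, password) from the environment, or (None, None)."""
    if env is None:
        env = os.environ
    # Hypothetical variable names; choose whatever suits your setup.
    email = env.get("LINKEDIN_EMAIL")
    password = env.get("LINKEDIN_PASSWORD")
    # Return both values or neither, so a half-configured environment
    # is treated the same as an unconfigured one.
    if email and password:
        return email, password
    return None, None
```

The result can be passed on as actions.login(driver, email, password); when both values are None, the login falls back to the terminal prompt described above.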

API

Person

A Person object can be created with the following inputs:

Person(linkedin_url=None, name=None, about=[], experiences=[], educations=[], interests=[], accomplishments=[], company=None, job_title=None, driver=None, scrape=True)

`linkedin_url`

This is the LinkedIn URL of the person's profile

`name`

This is the name of the person

`about`

This is the small paragraph about the person

`experiences`

These are the person's past experiences. A list of linkedin_scraper.scraper.Experience

`educations`

These are the person's past educations. A list of linkedin_scraper.scraper.Education

`interests`

These are the person's interests. A list of linkedin_scraper.scraper.Interest

`accomplishments`

These are the person's accomplishments. A list of linkedin_scraper.scraper.Accomplishment

`company`

This is the most recent company or institution they have worked at.

`job_title`

This is the most recent job title they have.

`driver`

This is the driver used to scrape the LinkedIn profile. A driver using Chrome is created by default. However, if a driver is passed in, that will be used instead.

For example

driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver)

`scrape`

When this is True, the scraping happens automatically. To scrape later instead, set it to False and call the scrape() function on the Person object.

`scrape(close_on_complete=True)`

This is the meat of the code: calling this function scrapes the profile. If close_on_complete is True (which it is by default), the browser closes upon completion. If you want to scrape other profiles, set it to False so you can keep using the same driver.
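
Once a profile has been scraped, its data can be collected into a plain dict for further processing. This is a minimal sketch that assumes the scraped values are exposed as attributes with the same names as the constructor inputs listed above; `person_to_dict` is a hypothetical helper, and attributes that are missing are returned as None.

```python
PERSON_FIELDS = (
    "linkedin_url",
    "name",
    "about",
    "experiences",
    "educations",
    "interests",
    "accomplishments",
    "company",
    "job_title",
)


def person_to_dict(person):
    """Collect the documented Person fields into a dict."""
    # getattr with a default keeps the helper tolerant of missing attributes.
    return {field: getattr(person, field, None) for field in PERSON_FIELDS}
```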

Company

Company(linkedin_url=None, name=None, about_us=None, website=None, headquarters=None, founded=None, company_type=None, company_size=None, specialties=None, showcase_pages=[], affiliated_companies=[], driver=None, scrape=True, get_employees=True)

`linkedin_url`

This is the LinkedIn URL of the company's page

`name`

This is the name of the company

`about_us`

The description of the company

`website`

The website of the company

`headquarters`

The headquarters location of the company

`founded`

When the company was founded

`company_type`

The type of the company

`company_size`

How many people are employed at the company

`specialties`

What the company specializes in

`showcase_pages`

Pages that the company owns to showcase their products

`affiliated_companies`

Other companies that are affiliated with this one

`driver`

This is the driver used to scrape the LinkedIn company page. A driver using Chrome is created by default. However, if a driver is passed in, that will be used instead.

`get_employees`

Whether to get all the employees of the company

For example

driver = webdriver.Chrome()
company = Company("https://ca.linkedin.com/company/google", driver=driver)

`scrape(close_on_complete=True)`

This is the meat of the code: calling this function scrapes the company. If close_on_complete is True (which it is by default), the browser closes upon completion. If you want to scrape other companies, set it to False so you can keep using the same driver.
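
A scraped company can be saved as JSON in a similar way. This is a minimal sketch that assumes the scraped values are available as attributes named like the Company inputs listed above; `company_to_json` is a hypothetical helper. Empty fields are skipped, and nested objects are converted with str() since their exact structure is not described here.

```python
import json

COMPANY_FIELDS = (
    "linkedin_url",
    "name",
    "about_us",
    "website",
    "headquarters",
    "founded",
    "company_type",
    "company_size",
    "specialties",
    "showcase_pages",
    "affiliated_companies",
)


def company_to_json(company):
    """Serialize the non-empty documented Company fields to a JSON string."""
    data = {}
    for field in COMPANY_FIELDS:
        value = getattr(company, field, None)
        # Skip fields that are missing or were left empty by the scrape.
        if value is None or value == "" or value == []:
            continue
        data[field] = value
    # default=str turns objects that json cannot encode into their str() form.
    return json.dumps(data, indent=2, sort_keys=True, default=str)
```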

Contribution

Project details

Release history

Version	Released
2.11.4 (this version)	Sep 20, 2024
2.11.3	Sep 19, 2024
2.11.2	Jul 4, 2023
2.11.1	May 10, 2023
2.11.0	Feb 19, 2023
2.10.1	Feb 19, 2023
2.10.0	Feb 19, 2023
2.9.2	Feb 13, 2023
2.9.1	Nov 2, 2022
2.9.0	Jul 10, 2021
2.8.2	Apr 21, 2021
2.8.1	Apr 18, 2021
2.8.0	Apr 10, 2021
2.7.7	Apr 10, 2021
2.7.6	Apr 10, 2021
2.7.5	Mar 9, 2021
2.7.4	Mar 8, 2021
2.7.3	Mar 8, 2021
2.7.2	Jan 27, 2021
2.7.1	Dec 12, 2020
2.7.0	Nov 22, 2020
2.6.1	Nov 16, 2020
2.6.0	Nov 3, 2020
2.5.5	Nov 3, 2020
2.5.4	Nov 3, 2020
2.5.3	Nov 3, 2020
2.5.2	Aug 19, 2020
2.5.1	Jul 22, 2020
2.5.0	Jul 3, 2020
2.4.6	Jun 10, 2020
2.4.5	Jun 6, 2020
2.4.4	May 31, 2020
2.4.3	Feb 2, 2020
2.4.2	Dec 22, 2019
2.4.1	Sep 28, 2019
2.4.0	Jun 27, 2019
2.3.2	Jun 26, 2019
2.3.1	Jun 26, 2019
2.3.0	Mar 3, 2019
2.2.0	Apr 9, 2018
2.1.1	Apr 9, 2018
2.1.0	Feb 10, 2018
2.0.1	Jan 2, 2018
2.0.0	Dec 21, 2017

Download files

Source Distribution

linkedin_scraper-2.11.4.tar.gz (30.2 kB), uploaded Sep 20, 2024, Source

Built Distribution

linkedin_scraper-2.11.4-py3-none-any.whl (29.0 kB), uploaded Sep 20, 2024, Python 3

Hashes for linkedin_scraper-2.11.4.tar.gz

Algorithm	Hash digest
SHA256	`bb85185caf3df4e524c2aa08a9afcf5df6b1a5049368d373faa6744ad756d813`
MD5	`a991dc9c9f90d86de691389c4e0d9a2a`
BLAKE2b-256	`12623285cefc79effa524238bb1684fcb09ad43728ceb568a15384d993d76ea0`

Hashes for linkedin_scraper-2.11.4-py3-none-any.whl

Algorithm	Hash digest
SHA256	`947c85cf92d51061bb89fa1ffc4a39c8bb569f73de160a5c16be3e09e780a8a9`
MD5	`a5e32f61317d9721f89c6d31752ba1b2`
BLAKE2b-256	`febb71bd106774cbffa7c8f918300a92effb68d036eb44d9f7677da3789a81d4`
