Linkedin Scraper

Scrapes Linkedin User Data

Installation

pip3 install --user linkedin_scraper

Versions 2.0.0 and earlier were published as linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper
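Because the package has been published under two names, it can help to check which one is present in the current environment. The following is a minimal sketch using only the standard library (Python 3.8+); the helper names installed_version and detect_installed are illustrative and not part of linkedin_scraper.

```python
from importlib import metadata

# Distribution names mentioned in the installation notes.
CANDIDATES = ("linkedin_scraper", "linkedin_user_scraper")


def installed_version(dist_name):
    """Return the installed version string for dist_name, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def detect_installed(candidates=CANDIDATES):
    """Map each candidate distribution name to its version (or None)."""
    return {name: installed_version(name) for name in candidates}


if __name__ == "__main__":
    for name, version in detect_installed().items():
        print(f"{name}: {version or 'not installed'}")
```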

Setup

First, set the location of your chromedriver:

export CHROMEDRIVER=~/chromedriver
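Before starting a scrape, it may be worth confirming that the variable is set and points at a real file. This is a small sketch under that assumption; resolve_chromedriver is a hypothetical helper, not something the library provides.

```python
import os


def resolve_chromedriver(env=None):
    """Return the absolute chromedriver path from CHROMEDRIVER, or raise."""
    env = os.environ if env is None else env
    raw = env.get("CHROMEDRIVER")
    if not raw:
        raise RuntimeError("CHROMEDRIVER is not set; export it before scraping")
    # expanduser handles a literal "~" in case the shell did not expand it.
    path = os.path.abspath(os.path.expanduser(raw))
    if not os.path.isfile(path):
        raise FileNotFoundError(f"CHROMEDRIVER points to a missing file: {path}")
    return path
```

Calling resolve_chromedriver() with no arguments reads the real environment; passing a dict makes the check easy to try without changing your shell.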

Usage

To use it, create an instance of the Person or Company class.

Sample Usage

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it prompts for them in the terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)

NOTE: The account used to log in should have its language set to English to make sure everything works as expected.

User Scraping

from linkedin_scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

Company Scraping

from linkedin_scraper import Company
company = Company("https://ca.linkedin.com/company/google")

Scraping sites where login is required first

  1. Run ipython or python
  2. In ipython/python, run the following code (modify it if you need to specify your driver)
from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver, scrape=False)
  3. Log in to Linkedin
  4. [OPTIONAL] Log out of Linkedin
  5. In the same ipython/python session, run
person.scrape()

The reason is that LinkedIn has blocked people from viewing certain profiles without having previously signed in. Setting scrape=False means the profile isn't scraped automatically, but Chrome still opens the LinkedIn page. You can log in and log out; the cookie stays in the browser and it won't affect your profile views. Then, when you run person.scrape(), it scrapes the profile and closes the browser. If you want to keep the browser open so you can scrape other profiles, run it as

person.scrape(close_on_complete=False)

so it doesn't close.

NOTE: For version >= 2.1.0, scraping can also occur while logged in. Beware that users will be able to see that you viewed their profile.
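Keeping the browser open with close_on_complete=False is mainly useful when scraping several profiles in a row. The sketch below shows one way to do that; scrape_profiles is a hypothetical helper, and the Person class is passed in as person_cls so the loop can be tried without a browser.

```python
def scrape_profiles(urls, driver, person_cls):
    """Scrape several profiles with one driver, leaving the browser open."""
    people = []
    for url in urls:
        # scrape=False defers scraping so close_on_complete can be set explicitly.
        person = person_cls(url, driver=driver, scrape=False)
        person.scrape(close_on_complete=False)
        people.append(person)
    return people


# Intended usage (requires selenium, chromedriver and a logged-in session):
#   from linkedin_scraper import Person
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   people = scrape_profiles(["https://www.linkedin.com/in/..."], driver, Person)
#   driver.quit()  # close the browser yourself once you are done
```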

Scraping sites and login automatically

From version 2.4.0 on, actions is a part of the library that allows signing into Linkedin first. The email and password can be passed as arguments to the function. If they are not provided, both will be prompted for in the terminal.

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()
email = "some-email@email.address"
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it prompts for them in the terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)
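To avoid hardcoding credentials in a script, they could be read from the environment and handed to actions.login. This is a sketch only: the variable names LINKEDIN_EMAIL and LINKEDIN_PASSWORD and the helper credentials_from_env are assumptions for illustration, not something the library defines.

```python
import os


def credentials_from_env(env=None):
    """Return (email, password) from the environment; missing values are None."""
    env = os.environ if env is None else env
    # LINKEDIN_EMAIL / LINKEDIN_PASSWORD are hypothetical variable names.
    email = env.get("LINKEDIN_EMAIL") or None
    password = env.get("LINKEDIN_PASSWORD") or None
    return email, password


# Intended usage (requires selenium and linkedin_scraper):
#   email, password = credentials_from_env()
#   actions.login(driver, email, password)
# Per the description above, values that are not provided should be
# prompted for in the terminal.
```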

API

Person

A Person object can be created with the following inputs:

Person(linkedin_url=None, name=None, about=[], experiences=[], educations=[], interests=[], accomplishments=[], company=None, job_title=None, driver=None, scrape=True)

linkedin_url

This is the linkedin url of their profile

name

This is the name of the person

about

This is the small paragraph about the person

experiences

These are the person's past experiences. A list of linkedin_scraper.scraper.Experience

educations

This is the person's past education. A list of linkedin_scraper.scraper.Education

interests

These are the person's interests. A list of linkedin_scraper.scraper.Interest

accomplishments

These are the person's accomplishments. A list of linkedin_scraper.scraper.Accomplishment

company

This is the most recent company or institution they have worked at.

job_title

This is the most recent job title they have.

driver

This is the driver used to scrape the Linkedin profile. A driver using Chrome is created by default. However, if a driver is passed in, that will be used instead.

For example

driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver)

scrape

When this is True, scraping happens automatically when the object is created. To scrape later, set it to False and call the scrape() function of the Person object.

scrape(close_on_complete=True)

This is the core of the code: calling this function scrapes the profile. If close_on_complete is True (the default), the browser closes upon completion. If you want to scrape other profiles afterwards, set it to False so you can keep using the same driver.
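Once a profile has been scraped, the inputs listed above can be collected into a compact summary. The sketch below assumes the scraped values are exposed as attributes with the same names as the constructor inputs; person_summary and PERSON_FIELDS are hypothetical names used for illustration.

```python
# Field names taken from the Person(...) signature documented above.
PERSON_FIELDS = (
    "linkedin_url", "name", "about", "experiences", "educations",
    "interests", "accomplishments", "company", "job_title",
)


def person_summary(person):
    """Build a small dict describing a scraped person-like object."""
    summary = {}
    for field in PERSON_FIELDS:
        # Assumption: attributes mirror the constructor input names.
        value = getattr(person, field, None)
        # List-valued fields are reported as counts to keep the summary short.
        summary[field] = len(value) if isinstance(value, list) else value
    return summary
```

Using getattr with a default keeps the helper from failing if a given version of the library does not set one of these attributes.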

Company

Company(linkedin_url=None, name=None, about_us=None, website=None, headquarters=None, founded=None, company_type=None, company_size=None, specialties=None, showcase_pages=[], affiliated_companies=[], driver=None, scrape=True, get_employees=True)

linkedin_url

This is the linkedin url of the company's page

name

This is the name of the company

about_us

The description of the company

website

The website of the company

headquarters

The headquarters location of the company

founded

When the company was founded

company_type

The type of the company

company_size

How many people are employed at the company

specialties

What the company specializes in

showcase_pages

Pages that the company owns to showcase their products

affiliated_companies

Other companies that are affiliated with this one

driver

This is the driver used to scrape the Linkedin company page. A driver using Chrome is created by default. However, if a driver is passed in, that will be used instead.

get_employees

Whether to get all the employees of the company

For example

driver = webdriver.Chrome()
company = Company("https://ca.linkedin.com/company/google", driver=driver)

scrape(close_on_complete=True)

This is the core of the code: calling this function scrapes the company. If close_on_complete is True (the default), the browser closes upon completion. If you want to scrape other companies afterwards, set it to False so you can keep using the same driver.
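When only the basic company details are needed, get_employees=False from the signature above can be passed to skip the employee listing. The following is a sketch under two assumptions: that the scraped values are exposed as attributes named like the constructor inputs, and that company_cls is the Company class (passed in so the helper can be tried without a browser). scrape_company_overview is a hypothetical name.

```python
# Subset of the Company(...) inputs documented above.
OVERVIEW_FIELDS = ("name", "website", "headquarters", "founded", "company_size")


def scrape_company_overview(url, driver, company_cls):
    """Scrape a company without its employees and return a few basic fields."""
    # scrape is left at its default (True), so scraping runs on construction.
    company = company_cls(url, driver=driver, get_employees=False)
    # Assumption: attributes mirror the constructor input names.
    return {field: getattr(company, field, None) for field in OVERVIEW_FIELDS}


# Intended usage (requires selenium and linkedin_scraper):
#   from linkedin_scraper import Company
#   from selenium import webdriver
#   info = scrape_company_overview(
#       "https://ca.linkedin.com/company/google", webdriver.Chrome(), Company)
```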

Contribution

Buy Me A Coffee
