Skip to main content

Scraping API for LinkedIn, Built on the back of linkedin_api by Tom Quirk

Project description

LI scrAPI for Python

API using the current endpoints on linkedin solely for scraping data without the official api. Based entirely on linkedin-api by Tom Quirk

Why use this library rather than the other one?

  • Async support
  • Types powered by Pydantic
  • HTTPX so we have http2 support as well
  • You don't need to interact with linkedIn but rather want data

Caution: This library is not officially supported by LinkedIn. Using it might violate LinkedIn's Terms of Service. Use it at your own risk.

Installation

Python >= 3.10 required

Quick Start

Script Client

linkedin = LinkedInScriptApi(credentials["username"], credentials["password"])
jobs = linkedin.search_jobs("Software", total_jobs = 10_000)

Async Version

session = AsyncClient()
client = AsyncLinkedInClient(session=session)
linkedin = AsyncLinkedIn(client)
await linkedin.authenticate(credentials["username"], credentials["password"])
await linkedin.get_profile_privacy_settings("khalid-a-53a190142")
profile = await linkedin.search_people(current_company=[CompanyID.GOOGLE], past_companies=[CompanyID.APPLE], include_private_profiles=True)
company = await linkedin.get_company_updates(public_id="google")
await linkedin.get_organization("google")
jobs = await linkedin.search_jobs(
    "Software Engineer",
    sort_by=SortBy.DATE,
    location=GeoID.USA,
    remote=[LocationType.ONSITE],
    limit=10,
)
if jobs:
    for job in jobs.elements:
        job_complete = await linkedin.get_job(job.tracking_urn.split(":")[-1])
        job_skills = await linkedin.get_job_skills(job.tracking_urn.split(":")[-1])
    print(job_complete)
await linkedin.search({"keywords": "software"})
res = await linkedin.search_people(keywords="software",include_private_profiles=True)
await linkedin._close()

Sync Version

session = Client()
client = LinkedInClient(session=session)
linkedin = LinkedIn(client)
linkedin.authenticate(credentials["username"], credentials["password"])

linkedin.get_profile_privacy_settings("khalid-a-53a190142")
profile = linkedin.search_people(current_company=[CompanyID.GOOGLE], past_companies=[CompanyID.APPLE], include_private_profiles=True)
company = linkedin.get_company_updates(public_id="google")
linkedin.get_organization("google")
jobs = linkedin.search_jobs(
    "Software Engineer",
    sort_by=SortBy.DATE,
    location=GeoID.USA,
    remote=[LocationType.ONSITE],
    limit=10,
)
if jobs:
    for job in jobs.elements:
        job_complete = linkedin.get_job(job.tracking_urn.split(":")[-1])
        job_skills = linkedin.get_job_skills(job.tracking_urn.split(":")[-1])
    print(job_complete)
linkedin.search({"keywords": "software"})
res = linkedin.search_people(keywords="software",include_private_profiles=True)
linkedin._close()
session = Client()
client = LinkedInClient(session=session)
linkedin = LinkedIn(client)
linkedin.authenticate(credentials["username"], credentials["password"])

linkedin.get_profile_privacy_settings("khalid-a-53a190142")
profile = linkedin.search_people(current_company=[CompanyID.GOOGLE], past_companies=[CompanyID.APPLE], include_private_profiles=True)
company = linkedin.get_company_updates(public_id="google")
linkedin.get_organization("google")
jobs = linkedin.search_jobs(
    "Software Engineer",
    sort_by=SortBy.DATE,
    location=GeoID.USA,
    remote=[LocationType.ONSITE],
    limit=10,
)
if jobs:
    for job in jobs.elements:
        job_complete = linkedin.get_job(job.tracking_urn.split(":")[-1])
        job_skills = linkedin.get_job_skills(job.tracking_urn.split(":")[-1])
    print(job_complete)
linkedin.search({"keywords": "software"})
res = linkedin.search_people(keywords="software",include_private_profiles=True)
linkedin._close()        session = Client()
client = LinkedInClient(session=session)
linkedin = LinkedIn(client)
linkedin.authenticate(credentials["username"], credentials["password"])

linkedin.get_profile_privacy_settings("khalid-a-53a190142")
profile = linkedin.search_people(current_company=[CompanyID.GOOGLE], past_companies=[CompanyID.APPLE], include_private_profiles=True)
company = linkedin.get_company_updates(public_id="google")
linkedin.get_organization("google")
jobs = linkedin.search_jobs(
    "Software Engineer",
    sort_by=SortBy.DATE,
    location=GeoID.USA,
    remote=[LocationType.ONSITE],
    limit=10,
)
if jobs:
    for job in jobs.elements:
        job_complete = linkedin.get_job(job.tracking_urn.split(":")[-1])
        job_skills = linkedin.get_job_skills(job.tracking_urn.split(":")[-1])
    print(job_complete)
linkedin.search({"keywords": "software"})
res = linkedin.search_people(keywords="software",include_private_profiles=True)
linkedin._close()

Documentation

The examples give a quick run down of the documentation if this project takes off or gets some traction I'll make dedicated docs. The code as well has sufficient doc strings and types to get an idea of how to interact with the code

Disclaimer

This library is not endorsed or supported by LinkedIn. It is an unofficial library intended for educational purposes and personal use only. By using this library, you agree to not hold the author or contributors responsible for any consequences resulting from its usage.

Contributing

Any and all contributions are helpful, if you have discovered various IDs LinkedIn uses for anything of interest make a PR and add it to the query options.

If you feel we need a new method or something to pull from LinkedIn then the following would be very helpful:

  1. Add the method to the LinkedIn Interface
  2. Supply the logic to both the sync and async classes
  3. Add mock tests for the assumed LinkedIn response
  4. add the method to the script if necessary
  5. Make a PR and lets merge it in!

Development

Development installation

TODO

Troubleshooting

I keep getting a CHALLENGE

Linkedin will throw you a curve ball in the form of a Challenge URL which requires Javascript to solve. Your best chance at resolution is to in on your browser use a separate library like browser-cookie3, getting the cookie from your browser and passing it to the API.

Search problems

  • Mileage may vary when searching general keywords like "software" using the standard search method. They've recently added some smarts around search whereby they group results by people, company, jobs etc. if the query is general enough. Try to use an entity-specific search method (i.e. search_people) where possible. Likewise if there is something you feel that should be supported please request it with a curl statement to build the request

How it works

This project attempts to provide a simple Python interface for the Linkedin API.

Do you mean the legit Linkedin API?

NO! To retrieve structured data, the Linkedin Website uses a service they call Voyager. Voyager endpoints give us access to pretty much everything we could want from Linkedin: profiles, companies, connections, messages, etc. - anything that you can see on linkedin.com, we can get from Voyager.

How does it work?

Deep dive

Voyager endpoints look like this:

https://www.linkedin.com/voyager/api/identity/profileView/tom-quirk

Or, more clearly

 ___________________________________ _______________________________
|             base path             |            resource           |
https://www.linkedin.com/voyager/api /identity/profileView/tom-quirk

They are authenticated with a simple cookie, which we send with every request, along with a bunch of headers.

To get a cookie, we POST a given username and password (of a valid Linkedin user account) to https://www.linkedin.com/uas/authenticate.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

li_scrapi-1.0.0.tar.gz (33.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

li_scrapi-1.0.0-py3-none-any.whl (41.0 kB view details)

Uploaded Python 3

File details

Details for the file li_scrapi-1.0.0.tar.gz.

File metadata

  • Download URL: li_scrapi-1.0.0.tar.gz
  • Upload date:
  • Size: 33.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Darwin/21.6.0

File hashes

Hashes for li_scrapi-1.0.0.tar.gz
Algorithm Hash digest
SHA256 13ff91df1fc00475e3d5d1818a4ffe56102eebd005988f0405b2f4ac28928613
MD5 40adb158b512cb90248da461a7fda04c
BLAKE2b-256 f67f1207f8c5bc9b7ef3314e16ae6eac618c10dc483074657cea1f1cebfd2add

See more details on using hashes here.

File details

Details for the file li_scrapi-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: li_scrapi-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 41.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Darwin/21.6.0

File hashes

Hashes for li_scrapi-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fd7fcd15c6f80572dbca62695b0fa05404298e3085d88b02ed9ae5b6670f6608
MD5 61bc9fac69683213aa4e9cce2a66b861
BLAKE2b-256 01abc8bdee63cf67935c53b14d18754d823de51d4fc55bc59b8b5fe982f91b96

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page