Scraping API for LinkedIn, built on the back of linkedin_api by Tom Quirk
LI scrAPI for Python
An API that uses LinkedIn's current endpoints solely for scraping data, without going through the official API. Based entirely on linkedin-api by Tom Quirk.
Why use this library rather than the other one?
- Async support
- Types powered by Pydantic
- HTTPX, so HTTP/2 is supported as well
- You don't need to interact with LinkedIn; you just want the data
Caution: This library is not officially supported by LinkedIn. Using it might violate LinkedIn's Terms of Service. Use it at your own risk.
Installation
Python >= 3.10 required
Quick Start
Script Client
linkedin = LinkedInScriptApi(credentials["username"], credentials["password"])
jobs = linkedin.search_jobs("Software", total_jobs=10_000)
Async Version
session = AsyncClient()
client = AsyncLinkedInClient(session=session)
linkedin = AsyncLinkedIn(client)
await linkedin.authenticate(credentials["username"], credentials["password"])
await linkedin.get_profile_privacy_settings("khalid-a-53a190142")
profile = await linkedin.search_people(current_company=[CompanyID.GOOGLE], past_companies=[CompanyID.APPLE], include_private_profiles=True)
company = await linkedin.get_company_updates(public_id="google")
await linkedin.get_organization("google")
jobs = await linkedin.search_jobs(
    "Software Engineer",
    sort_by=SortBy.DATE,
    location=GeoID.USA,
    remote=[LocationType.ONSITE],
    limit=10,
)
if jobs:
    for job in jobs.elements:
        job_complete = await linkedin.get_job(job.tracking_urn.split(":")[-1])
        job_skills = await linkedin.get_job_skills(job.tracking_urn.split(":")[-1])
        print(job_complete)
await linkedin.search({"keywords": "software"})
res = await linkedin.search_people(keywords="software", include_private_profiles=True)
await linkedin._close()
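Both calls inside the loop above derive the job ID by taking the text after the last colon of job.tracking_urn. A minimal sketch of that step as a helper follows; the name urn_id and the example URN are hypothetical and not part of this library.

```python
def urn_id(urn: str) -> str:
    """Return the text after the last colon of a URN-style string.

    Hypothetical helper: it mirrors job.tracking_urn.split(":")[-1]
    from the example, so the split is written only once.
    """
    return urn.rsplit(":", 1)[-1]


# Made-up URN, used only to illustrate the shape of the input.
print(urn_id("urn:li:jobPosting:1234567890"))
```

With a helper like this, the loop body could compute job_id = urn_id(job.tracking_urn) once and pass it to both get_job and get_job_skills.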
Sync Version
session = Client()
client = LinkedInClient(session=session)
linkedin = LinkedIn(client)
linkedin.authenticate(credentials["username"], credentials["password"])
linkedin.get_profile_privacy_settings("khalid-a-53a190142")
profile = linkedin.search_people(current_company=[CompanyID.GOOGLE], past_companies=[CompanyID.APPLE], include_private_profiles=True)
company = linkedin.get_company_updates(public_id="google")
linkedin.get_organization("google")
jobs = linkedin.search_jobs(
    "Software Engineer",
    sort_by=SortBy.DATE,
    location=GeoID.USA,
    remote=[LocationType.ONSITE],
    limit=10,
)
if jobs:
    for job in jobs.elements:
        job_complete = linkedin.get_job(job.tracking_urn.split(":")[-1])
        job_skills = linkedin.get_job_skills(job.tracking_urn.split(":")[-1])
        print(job_complete)
linkedin.search({"keywords": "software"})
res = linkedin.search_people(keywords="software", include_private_profiles=True)
linkedin._close()
Documentation
The examples above give a quick rundown of the API. If this project takes off or gets some traction, I'll write dedicated docs. The code also has enough docstrings and types to give you an idea of how to interact with it.
Disclaimer
This library is not endorsed or supported by LinkedIn. It is an unofficial library intended for educational purposes and personal use only. By using this library, you agree to not hold the author or contributors responsible for any consequences resulting from its usage.
Contributing
Any and all contributions are helpful. If you have discovered IDs that LinkedIn uses for anything of interest, make a PR and add them to the query options.
If you feel we need a new method, or something new to pull from LinkedIn, the following would be very helpful:
- Add the method to the LinkedIn interface
- Supply the logic to both the sync and async classes
- Add mock tests for the assumed LinkedIn response
- Add the method to the script client if necessary
- Make a PR and let's merge it in!
Development
Development installation
TODO
Troubleshooting
I keep getting a CHALLENGE
LinkedIn will throw you a curve ball in the form of a Challenge URL, which requires JavaScript to solve. Your best chance at resolution is to log in on your browser, then use a separate library like browser-cookie3 to get the cookie from your browser and pass it to the API.
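As a rough sketch of the cookie step, the helper below reduces a standard http.cookiejar.CookieJar (the type browser-cookie3 returns) to a plain dict of LinkedIn cookies. The function name linkedin_cookies is hypothetical, and how this library expects to receive the cookies is not shown here, so treat the hand-off as an assumption to check against the code.

```python
from http.cookiejar import CookieJar


def linkedin_cookies(jar: CookieJar) -> dict[str, str]:
    """Return name -> value for cookies set on linkedin.com or a subdomain."""
    result: dict[str, str] = {}
    for cookie in jar:
        # Cookie domains may carry a leading dot, e.g. ".linkedin.com".
        domain = cookie.domain.lstrip(".")
        if domain == "linkedin.com" or domain.endswith(".linkedin.com"):
            result[cookie.name] = cookie.value
    return result


# Usage, assuming browser-cookie3 is installed and you are logged in to
# LinkedIn in Chrome:
#
#   import browser_cookie3
#   jar = browser_cookie3.chrome(domain_name="linkedin.com")
#   cookies = linkedin_cookies(jar)
```

One plausible route from there is to build the HTTPX session with those cookies, for example Client(cookies=cookies), before handing it to the client class, but that is untested here.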
Search problems
- Mileage may vary when searching general keywords like "software" using the standard search method. LinkedIn recently added some smarts around search whereby results are grouped by people, company, jobs, etc. if the query is general enough. Try to use an entity-specific search method (e.g. search_people) where possible. Likewise, if there is something you feel should be supported, please request it and include a curl statement so the request can be built.
How it works
This project attempts to provide a simple Python interface for the LinkedIn API.
Do you mean the legit LinkedIn API?
NO! To retrieve structured data, the LinkedIn website uses a service they call Voyager. Voyager endpoints give us access to pretty much everything we could want from LinkedIn: profiles, companies, connections, messages, etc. Anything that you can see on linkedin.com, we can get from Voyager.
Deep dive
Voyager endpoints look like this:
https://www.linkedin.com/voyager/api/identity/profileView/tom-quirk
Or, more clearly
___________________________________ _______________________________
| base path | resource |
https://www.linkedin.com/voyager/api /identity/profileView/tom-quirk
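The base path and resource split shown above can be sketched in a few lines of Python. The names VOYAGER_BASE, voyager_url and profile_view_url are hypothetical and only illustrate how such a URL is assembled; they are not taken from this library.

```python
from urllib.parse import quote

# Base path shared by every Voyager endpoint, as shown in the diagram.
VOYAGER_BASE = "https://www.linkedin.com/voyager/api"


def voyager_url(resource: str) -> str:
    """Join the Voyager base path and a resource path into one URL."""
    return f"{VOYAGER_BASE}/{resource.lstrip('/')}"


def profile_view_url(public_id: str) -> str:
    """Build the profileView URL for a public profile ID."""
    # quote() percent-encodes characters that are unsafe in a path segment.
    return voyager_url(f"/identity/profileView/{quote(public_id, safe='')}")


print(profile_view_url("tom-quirk"))
# prints https://www.linkedin.com/voyager/api/identity/profileView/tom-quirk
```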
Requests to these endpoints are authenticated with a simple cookie, which we send with every request along with a bunch of headers.
To get a cookie, we POST the username and password of a valid LinkedIn user account to https://www.linkedin.com/uas/authenticate.
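The two steps described here, a credential POST followed by cookie-bearing requests, might look roughly like the sketch below. It only builds the request objects with the standard library and sends nothing. The form field names session_key and session_password, the cookie name li_at, and both function names are assumptions for illustration, not confirmed details of this library.

```python
from urllib.parse import urlencode
from urllib.request import Request

AUTH_URL = "https://www.linkedin.com/uas/authenticate"


def build_auth_request(username: str, password: str) -> Request:
    """Build (but do not send) the POST that exchanges credentials for a cookie."""
    # Assumed form field names; check the library source for the real ones.
    body = urlencode(
        {"session_key": username, "session_password": password}
    ).encode()
    return Request(AUTH_URL, data=body, method="POST")


def build_voyager_request(url: str, cookies: dict[str, str]) -> Request:
    """Build (but do not send) a GET that carries the session cookies."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": cookie_header})


auth = build_auth_request("user@example.com", "secret")
print(auth.get_method(), auth.full_url)

# "li_at" is used here only as a placeholder cookie name.
profile = build_voyager_request(
    "https://www.linkedin.com/voyager/api/identity/profileView/tom-quirk",
    {"li_at": "cookie-value"},
)
print(profile.get_header("Cookie"))
```

In the library itself this work is done through the HTTPX session rather than urllib; the standard library is used here only to keep the sketch dependency-free.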