LinkedIn Scraper

Async LinkedIn scraper built with Playwright for extracting profile, company, and job data from LinkedIn.

⚠️ Breaking Changes in v3.0.0

Version 3.0.0 introduces breaking changes and is NOT backwards compatible with previous versions.

What Changed:

  • Playwright instead of Selenium - Complete rewrite using Playwright for better performance and reliability
  • Async/await throughout - All methods are now async and require await
  • New package structure - Imports have changed (e.g., from linkedin_scraper import PersonScraper)
  • Updated data models - Using Pydantic models instead of simple objects
  • Different API - Method signatures and return types have changed

Migration Guide:

Before (v2.x with Selenium):

from linkedin_scraper import Person

person = Person("https://linkedin.com/in/username", driver=driver)
print(person.name)

After (v3.0+ with Playwright):

import asyncio
from linkedin_scraper import BrowserManager, PersonScraper

async def main():
    async with BrowserManager() as browser:
        await browser.load_session("session.json")
        scraper = PersonScraper(browser.page)
        person = await scraper.scrape("https://linkedin.com/in/username")
        print(person.name)

asyncio.run(main())

If you need the old Selenium-based version:

pip install linkedin-scraper==2.11.2

Quick Testing

To verify that everything works, clone this repo and install it in editable mode:

git clone https://github.com/joeyism/linkedin_scraper.git
cd linkedin_scraper
pip3 install -e .

then run

python3 samples/create_session.py
python3 samples/scrape_company.py
python3 samples/scrape_person.py

and you will see the scraping in action.


Features

  • Person Profiles - Scrape comprehensive profile information

    • Basic info (name, headline, location, about)
    • Work experience with details
    • Education history
    • Skills and accomplishments
  • Company Pages - Extract company information

    • Company overview and details
    • Industry and size
    • Headquarters location
  • Company Posts - Scrape posts from company pages

    • Post content and text
    • Reactions, comments, reposts counts
    • Posted date and images
  • Job Listings - Scrape job postings

    • Job details and requirements
    • Company information
    • Application links
  • Async/Await - Modern async Python with Playwright

  • Type Safety - Full Pydantic models for all data

  • Progress Callbacks - Track scraping progress

  • Session Management - Reuse authenticated sessions

Installation

pip install linkedin-scraper

Install Playwright browsers:

playwright install chromium

Quick Start

Basic Usage

import asyncio
from linkedin_scraper import BrowserManager, PersonScraper

async def main():
    # Initialize browser
    async with BrowserManager(headless=False) as browser:
        # Load authenticated session
        await browser.load_session("session.json")
        
        # Create scraper
        scraper = PersonScraper(browser.page)
        
        # Scrape a profile
        person = await scraper.scrape("https://linkedin.com/in/williamhgates/")
        
        # Access data
        print(f"Name: {person.name}")
        print(f"Headline: {person.headline}")
        print(f"Location: {person.location}")
        print(f"Experiences: {len(person.experiences)}")
        print(f"Education: {len(person.educations)}")

asyncio.run(main())

Company Scraping

import asyncio
from linkedin_scraper import BrowserManager, CompanyScraper

async def scrape_company():
    async with BrowserManager(headless=False) as browser:
        await browser.load_session("session.json")
        
        scraper = CompanyScraper(browser.page)
        company = await scraper.scrape("https://linkedin.com/company/microsoft/")
        
        print(f"Company: {company.name}")
        print(f"Industry: {company.industry}")
        print(f"Size: {company.company_size}")
        print(f"About: {company.about_us[:200]}...")

asyncio.run(scrape_company())

Job Scraping

import asyncio
from linkedin_scraper import BrowserManager, JobSearchScraper

async def search_jobs():
    async with BrowserManager(headless=False) as browser:
        await browser.load_session("session.json")
        
        scraper = JobSearchScraper(browser.page)
        jobs = await scraper.search(
            keywords="Python Developer",
            location="San Francisco",
            limit=10
        )
        
        for job in jobs:
            print(f"{job.title} at {job.company}")
            print(f"Location: {job.location}")
            print(f"Link: {job.linkedin_url}")
            print("---")

asyncio.run(search_jobs())

Company Posts Scraping

import asyncio
from linkedin_scraper import BrowserManager, CompanyPostsScraper

async def scrape_company_posts():
    async with BrowserManager(headless=False) as browser:
        await browser.load_session("session.json")
        
        scraper = CompanyPostsScraper(browser.page)
        posts = await scraper.scrape(
            "https://linkedin.com/company/microsoft/",
            limit=10
        )
        
        for post in posts:
            print(f"Posted: {post.posted_date}")
            print(f"Text: {post.text[:200]}...")
            print(f"Reactions: {post.reactions_count}")
            print(f"Comments: {post.comments_count}")
            print(f"URL: {post.linkedin_url}")
            print("---")

asyncio.run(scrape_company_posts())

Authentication

LinkedIn requires authentication. You need to create a session file first:

Option 1: Manual Login Script

import asyncio
from linkedin_scraper import BrowserManager, wait_for_manual_login

async def create_session():
    async with BrowserManager(headless=False) as browser:
        # Navigate to LinkedIn
        await browser.page.goto("https://www.linkedin.com/login")
        
        # Wait for manual login (opens browser)
        print("Please log in to LinkedIn...")
        await wait_for_manual_login(browser.page, timeout=300)
        
        # Save session
        await browser.save_session("session.json")
        print("✓ Session saved!")

asyncio.run(create_session())
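The exact contents of session.json depend on what BrowserManager writes; as a dependency-free sketch, the helper below decides whether an existing session file looks fresh enough to reuse before you bother re-running the login flow. It assumes only that the session file is JSON on disk; the `session_is_fresh` name and the 24-hour default are illustrative, not part of this package.

```python
import json
import os
import time

def session_is_fresh(path: str, max_age_hours: float = 24.0) -> bool:
    """Return True if `path` exists, parses as JSON, and was modified recently.

    This only checks file freshness, not whether LinkedIn still accepts the
    cookies inside it - a missing, stale, or malformed file means "log in again".
    """
    if not os.path.exists(path):
        return False
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)  # a malformed session file should trigger a re-login
    except (json.JSONDecodeError, OSError):
        return False
    age_seconds = time.time() - os.path.getmtime(path)
    return age_seconds <= max_age_hours * 3600

# Example: only run the manual-login flow when no usable session exists
if not session_is_fresh("session.json"):
    print("No fresh session found - run create_session() first")
```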

Option 2: Programmatic Login

import asyncio
from linkedin_scraper import BrowserManager, login_with_credentials
import os

async def login():
    async with BrowserManager(headless=False) as browser:
        # Login with credentials
        await login_with_credentials(
            browser.page,
            username=os.getenv("LINKEDIN_EMAIL"),
            password=os.getenv("LINKEDIN_PASSWORD")
        )
        
        # Save session for reuse
        await browser.save_session("session.json")

asyncio.run(login())

Progress Tracking

Track scraping progress with callbacks:

import asyncio
from linkedin_scraper import BrowserManager, ConsoleCallback, PersonScraper

async def scrape_with_progress():
    callback = ConsoleCallback()  # Prints progress to console
    
    async with BrowserManager(headless=False) as browser:
        await browser.load_session("session.json")
        
        scraper = PersonScraper(browser.page, callback=callback)
        person = await scraper.scrape("https://linkedin.com/in/williamhgates/")

asyncio.run(scrape_with_progress())

Custom Callbacks

from linkedin_scraper import ProgressCallback

class MyCallback(ProgressCallback):
    async def on_start(self, scraper_type: str, url: str):
        print(f"Starting {scraper_type} scraping: {url}")
    
    async def on_progress(self, message: str, percent: int):
        print(f"[{percent}%] {message}")
    
    async def on_complete(self, scraper_type: str, url: str):
        print(f"Completed {scraper_type}: {url}")
    
    async def on_error(self, error: Exception):
        print(f"Error: {error}")
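The ProgressCallback subclass above requires the package; to illustrate the same pattern in isolation, here is a self-contained version that records events instead of printing them. Both `CollectingCallback` and the `fake_scrape` driver are hypothetical stand-ins, not part of this package - the point is only the shape of the async hook methods.

```python
import asyncio

class CollectingCallback:
    """Stand-in for ProgressCallback: records every event for later inspection."""

    def __init__(self):
        self.events = []

    async def on_start(self, scraper_type: str, url: str):
        self.events.append(("start", scraper_type, url))

    async def on_progress(self, message: str, percent: int):
        self.events.append(("progress", message, percent))

    async def on_complete(self, scraper_type: str, url: str):
        self.events.append(("complete", scraper_type, url))

    async def on_error(self, error: Exception):
        self.events.append(("error", str(error)))

async def fake_scrape(callback):
    # Simulates the sequence of calls a scraper would make into its callback
    await callback.on_start("person", "https://linkedin.com/in/example")
    for pct, msg in [(25, "basic info"), (75, "experiences")]:
        await callback.on_progress(msg, pct)
    await callback.on_complete("person", "https://linkedin.com/in/example")

callback = CollectingCallback()
asyncio.run(fake_scrape(callback))
print(len(callback.events))  # 4 recorded events
```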

Data Models

All scraped data is returned as Pydantic models:

Person

from typing import List, Optional
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    headline: Optional[str]
    location: Optional[str]
    about: Optional[str]
    linkedin_url: str
    experiences: List[Experience]
    educations: List[Education]
    skills: List[str]
    accomplishments: Optional[Accomplishment]

Company

class Company(BaseModel):
    name: str
    industry: Optional[str]
    company_size: Optional[str]
    headquarters: Optional[str]
    founded: Optional[str]
    specialties: List[str]
    about: Optional[str]
    linkedin_url: str

Job

class Job(BaseModel):
    title: str
    company: str
    location: Optional[str]
    description: Optional[str]
    employment_type: Optional[str]
    seniority_level: Optional[str]
    linkedin_url: str

Post

class Post(BaseModel):
    linkedin_url: Optional[str]
    urn: Optional[str]
    text: Optional[str]
    posted_date: Optional[str]
    reactions_count: Optional[int]
    comments_count: Optional[int]
    reposts_count: Optional[int]
    image_urls: List[str]
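Because these are Pydantic 2 models, they can be serialized with model_dump() / model_dump_json() for storage. As a dependency-free sketch of that workflow, here is an analogous JSON round trip using a stdlib dataclass whose fields mirror a subset of the Post model above; `PostRecord` itself is a hypothetical stand-in, not the package's class.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class PostRecord:
    """Dataclass stand-in mirroring part of the Post model's fields."""
    text: Optional[str] = None
    posted_date: Optional[str] = None
    reactions_count: Optional[int] = None
    comments_count: Optional[int] = None
    image_urls: List[str] = field(default_factory=list)

posts = [
    PostRecord(text="Launch day!", posted_date="2024-05-01", reactions_count=120),
    PostRecord(text="We're hiring", comments_count=8),
]

# Serialize to JSON and back - the same dict shape model_dump() would give you
payload = json.dumps([asdict(p) for p in posts])
restored = [PostRecord(**d) for d in json.loads(payload)]
print(restored[0].reactions_count)  # 120
```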

Advanced Usage

Browser Configuration

browser = BrowserManager(
    headless=False,  # Show browser window
    slow_mo=100,     # Slow down operations (ms)
    viewport={"width": 1920, "height": 1080},
    user_agent="Custom User Agent"
)

Error Handling

from linkedin_scraper import (
    AuthenticationError,
    RateLimitError,
    ProfileNotFoundError
)

try:
    person = await scraper.scrape(url)
except AuthenticationError:
    print("Not logged in - session expired")
except RateLimitError:
    print("Rate limited by LinkedIn")
except ProfileNotFoundError:
    print("Profile not found or private")

Best Practices

  1. Rate Limiting - Add delays between requests

    import asyncio
    await asyncio.sleep(2)  # 2 second delay
    
  2. Session Reuse - Save and reuse sessions to avoid frequent logins

  3. Error Handling - Always handle exceptions (rate limits, auth errors, etc.)

  4. Headless Mode - Use headless=False during development and headless=True in production

  5. Respect LinkedIn - Don't scrape aggressively, respect rate limits
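The rate-limiting advice above can be made concrete with a small backoff schedule. `compute_delay` below is plain Python, not part of this package; its result could feed `asyncio.sleep` between requests, with `attempt` incremented each time a RateLimitError is caught.

```python
import random

def compute_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with jitter: ~2s, ~4s, ~8s, ... capped at ~60s.

    `attempt` starts at 0; the random jitter spreads requests out so retries
    from multiple runs don't all hit LinkedIn at the same instant.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.5)

# e.g. between profile scrapes:  await asyncio.sleep(compute_delay(attempt))
for attempt in range(4):
    print(f"attempt {attempt}: sleep ~{compute_delay(attempt):.1f}s")
```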

Requirements

  • Python 3.8+
  • Playwright
  • Pydantic 2.0+
  • aiofiles
  • python-dotenv (optional, for credentials)

License

Apache License 2.0 - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Disclaimer

This tool is for educational purposes only. Make sure to comply with LinkedIn's Terms of Service and use responsibly. The authors are not responsible for any misuse of this tool.
