A professional CLI tool for scraping LinkedIn profiles via Google Search (linkedin-spider was taken)
LinkedIn Spider
A professional CLI tool for scraping LinkedIn profiles via Google Search
PyPI Package Name: This project is available on PyPI as linkedin-tarantula because linkedin-spider was already taken by another project. The GitHub repository remains linkedin-spider.
Overview
LinkedIn Spider is a powerful, user-friendly command-line tool that helps you collect and analyze LinkedIn profiles at scale. By leveraging Google Search instead of direct LinkedIn scraping, it significantly reduces the risk of account restrictions while providing comprehensive profile data.
Features
- Smart Search - Find profiles via Google Search to avoid LinkedIn rate limits
- Beautiful CLI - Interactive arrow-key menu navigation with ASCII art
- Data Export - Export to CSV, JSON, or Excel formats
- Secure - Environment-based configuration for credentials
- VPN Support - Optional IP rotation for enhanced privacy
- Fast & Efficient - Progress tracking and batch processing
- Anti-Detection - Random delays, user agents, and human-like behavior
- CAPTCHA Handler - Automatic CAPTCHA detection with auto-resume
- Interactive Menu - Navigate with arrow keys (↑↓) and Enter
Installation
Option 1: PyPI Installation (Recommended)
Note: The PyPI package is named linkedin-tarantula (not linkedin-spider) because the latter name was already taken.
# Install from PyPI
pip install linkedin-tarantula
# Or with Excel export support
pip install "linkedin-tarantula[excel]"
This installs the linkedin-spider command globally.
Option 2: Quick Install from Source
# Clone the repository
git clone https://github.com/alexcolls/linkedin-spider.git
cd linkedin-spider
# Run the installation script
./install.sh
The installation script provides three options:
- System Installation - Installs globally as the linkedin-spider command
- Development Installation - Installs locally with Poetry for testing
- Both - Installs both system and development modes
Option 3: Development Installation
# Install from source with Poetry
poetry install
# Optional: Install with Excel support
poetry install -E excel
# Activate the virtual environment
poetry shell
Option 4: Install from GitHub (Direct)
# Install directly from GitHub
pip install git+https://github.com/alexcolls/linkedin-spider.git
# Or with Excel support
pip install "linkedin-spider[excel] @ git+https://github.com/alexcolls/linkedin-spider.git"
Configuration
1. Environment Variables
cp .env.sample .env
# Edit .env with your LinkedIn credentials
2. Configuration File
Edit config.yaml for advanced settings (delays, VPN, export format, etc.)
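The environment-based credential setup described above could be consumed along the following lines. This is only a sketch: the README does not list the variable names used in .env.sample, so LINKEDIN_EMAIL and LINKEDIN_PASSWORD are hypothetical placeholders, and the sketch assumes the values from .env have already been loaded into the process environment by some other means.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical variable names; check .env.sample for the real ones.
REQUIRED_KEYS = ("LINKEDIN_EMAIL", "LINKEDIN_PASSWORD")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the required credentials from the environment.

    Raises RuntimeError naming every missing or empty variable, so a
    misconfigured .env fails early instead of partway through a run.
    """
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {key: source[key] for key in REQUIRED_KEYS}
```

Passing a plain dictionary as env makes the function easy to exercise without touching the real environment.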
Usage
Quick Start
# If installed from PyPI (pip install linkedin-tarantula)
linkedin-spider
# If installed from source with system mode
linkedin-spider
# If installed with development mode
./run.sh
# Or with Poetry directly
poetry run python -m linkedin_spider
Interactive Mode
The CLI provides an interactive menu with ASCII art and arrow-key navigation:
linkedin-spider # or ./run.sh for development
Navigation:
- Use ↑↓ arrow keys to navigate
- Press Enter to select
- Or type the number directly
Menu options:
- Search & Collect Profile URLs
- Scrape Profile Data
- Auto-Connect to Profiles
- View/Export Results
- Configure Settings
- Help
- Exit
Command-Line Mode
# Search for profiles
linkedin-spider search "Python Developer" "San Francisco" --max-pages 10
# Scrape profiles from file
linkedin-spider scrape --urls data/profile_urls.txt --output results --format csv
# Show version
linkedin-spider version
Uninstallation
To remove LinkedIn Spider from your system:
./uninstall.sh
This will:
- Remove the system command (if installed)
- Clean up Poetry virtual environments
- Optionally remove .env and data files
Key Features Explained
CAPTCHA Handling
LinkedIn Spider automatically detects and handles Google CAPTCHA challenges:
- Automatic Detection: Instantly detects when CAPTCHA appears
- Clear Instructions: Shows what to do in the terminal
- Auto-Resume: Automatically continues when CAPTCHA is solved (no manual Enter press needed!)
- Progress Updates: Shows elapsed time every 10 seconds
- Smart Polling: Checks every 2 seconds for resolution
- Timeout Protection: 5-minute maximum wait with fallback
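The timing behavior listed above (check every 2 seconds, report progress every 10 seconds, give up after 5 minutes) can be sketched as a generic polling loop. This is an illustration under stated assumptions, not the project's actual implementation: the function name, its parameters, and the is_resolved callback are all hypothetical, and how the tool really detects a solved CAPTCHA is not described in this README.

```python
import time
from typing import Callable


def wait_until_resolved(
    is_resolved: Callable[[], bool],  # hypothetical check supplied by the caller
    poll_interval: float = 2.0,       # "checks every 2 seconds"
    progress_interval: float = 10.0,  # "elapsed time every 10 seconds"
    timeout: float = 300.0,           # "5-minute maximum wait"
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    report: Callable[[str], None] = print,
) -> bool:
    """Poll is_resolved() until it returns True or the timeout elapses.

    Returns True if the condition was met, False on timeout, so the
    caller can choose its own fallback.
    """
    start = clock()
    next_report = progress_interval
    while True:
        if is_resolved():
            return True
        elapsed = clock() - start
        if elapsed >= timeout:
            return False
        if elapsed >= next_report:
            report(f"Still waiting for CAPTCHA... {int(elapsed)}s elapsed")
            next_report += progress_interval
        sleep(poll_interval)
```

The clock, sleep, and report parameters are injectable only so the loop can be tested without waiting in real time.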
Data Directory
All data is saved in the data/ folder in your current working directory:
- Profile URLs: data/profile_urls.txt
- Exported profiles: data/profiles_YYYYMMDD_HHMMSS.csv/json/xlsx
- Logs: logs/linkedin-spider.log
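The export naming pattern above, profiles_YYYYMMDD_HHMMSS with a format-specific extension, can be reproduced with a small helper. This is a sketch only: export_path and its parameters are hypothetical names, not part of the tool's API, and it assumes the timestamp is taken from local time.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

ALLOWED_FORMATS = {"csv", "json", "xlsx"}


def export_path(
    fmt: str,
    data_dir: Path = Path("data"),
    now: Optional[datetime] = None,
) -> Path:
    """Build a path such as data/profiles_20240102_030405.csv."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"Unsupported export format: {fmt!r}")
    # %Y%m%d_%H%M%S yields the YYYYMMDD_HHMMSS stamp used in the file names.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return data_dir / f"profiles_{stamp}.{fmt}"
```

Because the stamp is zero-padded and ordered from year down to second, the resulting file names sort chronologically in a directory listing.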
Legal & Ethical Considerations
- Terms of Service: This tool is for educational purposes. Always comply with LinkedIn's Terms of Service.
- Rate Limiting: Use appropriate delays to avoid overwhelming servers.
- Privacy: Respect privacy. Only collect publicly available information.
- Usage: Use this tool responsibly and ethically.
License
MIT License - see LICENSE file for details.
Acknowledgments
Built with Selenium, Typer, Rich, and Poetry.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Show Your Support
If this project helped you, please consider:
- Starring the repository
- Reporting bugs
- Suggesting features
- Contributing code
- Sharing with others
Professional Network Profile Scraper
Weaving Through Networks
Made with ❤️ and Python
Get LinkedIn profiles at scale
© 2022 LinkedIn Spider | MIT License