Powerful Twitter/X scraping tool with Selenium
🐦 Panen Tweet — Twitter/X Scraper
Panen Tweet is a Python tool for scraping tweet data from Twitter/X based on keywords, date ranges, language, and tweet types. Suitable for research, data analysis, or thesis purposes.
📋 Table of Contents
- Prerequisites
- Installation
- Getting Auth Token
- Usage
- Output Format (CSV)
- Complete Parameters
- Tips & Tricks
- Troubleshooting
- Disclaimer & Legal
✅ Prerequisites
Before you start, make sure you have:
- Python 3.7 or newer → Download here
- Google Chrome installed on your computer
- An active Twitter/X account
Check your Python version:
python --version
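The same prerequisites can also be checked from Python. The following is a minimal sketch, not part of Panen Tweet: the helper names and the list of Chrome executable names are assumptions, and Chrome may be installed somewhere that is not on your PATH (common on Windows and macOS).

```python
import shutil
import sys

# Minimum version taken from the prerequisites list above.
MIN_PYTHON = (3, 7)

# Assumed executable names; they vary by platform and package source.
CHROME_CANDIDATES = ("google-chrome", "google-chrome-stable", "chrome", "chromium")


def python_is_supported(version_info=None):
    """Return True if the given (or current) interpreter is at least MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def find_chrome():
    """Return the path of the first Chrome-like executable on PATH, or None."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


print("Python OK:", python_is_supported())
print("Chrome on PATH:", find_chrome() or "not found (it may still be installed elsewhere)")
```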
Installation
Method 1: From PyPI (Recommended)
pip install panen-tweet
Method 2: From Source Code (GitHub)
git clone https://github.com/Dhaniaaa/panen-tweet.git
cd panen-tweet
pip install -e .
Special: Google Colab or Linux Server (VPS)
On Google Colab and Linux servers, Google Chrome is not installed by default. Run these commands first (the leading ! is notebook syntax; omit it in a regular shell):
# 1. Install the library
!pip install panen-tweet
# 2. Install Google Chrome (only needed once)
!panen-tweet install-chrome
Getting Auth Token
What is an auth_token? It is the value of a browser cookie that proves you are logged in to Twitter/X. This tool needs the token to access tweet data.
How to Get the Token (Step-by-Step):
- Open your browser (Chrome or Firefox) and log in to x.com
- Press F12 to open Developer Tools
- Click the Application tab (Chrome) or Storage tab (Firefox)
- In the left panel, click Cookies → select https://x.com
- Find the row named auth_token
- Click the row, then copy the value in the right column
🖼️ The token is a long string of characters, for example:
1a2b3c4d5e6f7a8b9c0d...
TOKEN SECURITY — MUST READ!
This token is the full access key to your Twitter/X account.
- ❌ DO NOT share the token with anyone
- ❌ DO NOT hardcode the token directly in your Python file
- ❌ DO NOT commit/push files containing the token to GitHub
- ✅ Store the token in a .env file (see the guide in SECURITY.md)
- ✅ If the token is leaked, immediately change your Twitter/X password
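One practical consequence of the rules above is that the token should never appear in full in logs or notebook output. A minimal sketch of a masking helper follows; `mask_token` is a hypothetical name, not something Panen Tweet provides.

```python
def mask_token(token, visible=4):
    """Return the token with everything after the first `visible` characters hidden."""
    if not token:
        return "<not set>"
    if len(token) <= visible:
        # Too short to show a prefix safely; hide it completely.
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)


# Safe to print: only the first four characters are shown.
print("Using token:", mask_token("1a2b3c4d5e6f7a8b9c0d"))
```

Printing the masked value is usually enough to confirm that the right token was loaded without exposing it.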
Usage
There are 3 ways to use Panen Tweet. Choose the one that best suits your needs.
Method 1: Command Line Interface (CLI) — Easiest for Beginners
After installation, simply run:
panen-tweet
The program will guide you interactively. You will be asked to enter:
| No. | Question | Example Input |
|---|---|---|
| 1 | Auth token | (paste token from browser) |
| 2 | Search keyword/topic | jakarta flood |
| 3 | Max tweets per session | 100 |
| 4 | Start date | 2024-01-01 |
| 5 | End date | 2024-01-07 |
| 6 | Interval days per session | 1 (1 = per day) |
| 7 | Language code | id (Indonesian), en (English), or leave blank for all |
| 8 | Tweet type | 1 (Top) or 2 (Latest) |
Example terminal output:
TWITTER/X SCRAPER - PANEN TWEET
================================
Enter your auth_token: <paste_token_here>
1. Enter search keyword/topic: jakarta flood
2. How many MAXIMUM tweets to scrape PER SESSION? 100
3. Enter overall START DATE (YYYY-MM-DD): 2024-01-01
4. Enter overall END DATE (YYYY-MM-DD): 2024-01-07
5. How many interval days per session? (1 = per day): 1
6. Enter language code (id / en / ja / etc, or leave blank): en
7. Select tweet type (1 for Top, 2 for Latest): 2
The scraped results are automatically saved to a CSV file, for example:
tweets_jakartaflood_latest_20240101-20240107.csv
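If you save results yourself through the library, you may want file names that follow the same pattern. The sketch below reproduces the example name above; the pattern is inferred from that single example, and `build_output_filename` is a hypothetical helper, so the real CLI may treat other keywords differently.

```python
import datetime


def build_output_filename(keyword, search_type, start_date, end_date):
    """Build a name like tweets_<keyword>_<type>_<start>-<end>.csv.

    Assumption: whitespace is removed from the keyword, as in the example.
    """
    slug = "".join(keyword.split())
    return "tweets_{}_{}_{}-{}.csv".format(
        slug,
        search_type,
        start_date.strftime("%Y%m%d"),
        end_date.strftime("%Y%m%d"),
    )


name = build_output_filename(
    "jakarta flood", "latest", datetime.date(2024, 1, 1), datetime.date(2024, 1, 7)
)
print(name)
```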
Method 2: As a Python Library
Suitable if you want to integrate it into your own notebook or script.
from panen_tweet import TwitterScraper
import datetime
import os

# ✅ Safe way: read token from environment variable
# Run this in terminal first: export TWITTER_AUTH_TOKEN="yourtoken"
auth_token = os.getenv('TWITTER_AUTH_TOKEN')
if not auth_token:
    raise ValueError("Token is not set! See SECURITY.md for instructions.")

# Initialize scraper
scraper = TwitterScraper(
    auth_token=auth_token,
    scroll_pause_time=5,  # Pause between scrolls (seconds) - increase if connection is slow
    headless=True         # True = without browser GUI | False = show browser
)

# Run scraping
df = scraper.scrape_with_date_range(
    keyword="jakarta flood",
    target_per_session=100,
    start_date=datetime.datetime(2024, 1, 1),
    end_date=datetime.datetime(2024, 1, 7),
    interval_days=1,
    lang=None,            # Use language code like 'en' or 'id', or None for all languages
    search_type='latest'  # 'top' or 'latest'
)

# Save to CSV
if df is not None:
    scraper.save_to_csv(df, "scraping_results.csv")
    print(f"✅ Successfully scraped {len(df)} tweets!")
    print(df.head())
else:
    print("❌ No data was successfully scraped.")
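The CLI accepts dates as YYYY-MM-DD text, while the library call above takes datetime objects. If your script reads dates from user input or a config file, a small conversion step helps catch mistakes early. This is a sketch using only the standard library; `parse_date` and `parse_date_range` are hypothetical helpers, not part of the package.

```python
import datetime

DATE_FORMAT = "%Y-%m-%d"


def parse_date(text):
    """Parse 'YYYY-MM-DD' into a datetime at midnight; raises ValueError on bad input."""
    return datetime.datetime.strptime(text.strip(), DATE_FORMAT)


def parse_date_range(start_text, end_text):
    """Parse both dates and reject ranges where the end precedes the start."""
    start = parse_date(start_text)
    end = parse_date(end_text)
    if end < start:
        raise ValueError(
            "end date {} is before start date {}".format(end_text, start_text)
        )
    return start, end


start_date, end_date = parse_date_range("2024-01-01", "2024-01-07")
print(start_date, end_date)
```

The resulting values can be passed as `start_date` and `end_date` to `scrape_with_date_range()`.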
Method 3: Using a .env File for Token Security
This is the safest way to store the token without risking an upload to GitHub.
Step 1 — Install python-dotenv:
pip install python-dotenv
Step 2 — Create a .env file in your project folder:
TWITTER_AUTH_TOKEN=your_token_here
Step 3 — Load it in your Python code:
from dotenv import load_dotenv
import os
load_dotenv() # Read the .env file
auth_token = os.getenv('TWITTER_AUTH_TOKEN')
The .env file is automatically included in .gitignore, so it will not be uploaded to GitHub.
Or if you prefer to use the terminal directly without a .env file:
Windows PowerShell:
$env:TWITTER_AUTH_TOKEN = "your_token_here"
panen-tweet
Linux / Mac:
export TWITTER_AUTH_TOKEN="your_token_here"
panen-tweet
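When you use a .env file in your own project, it is worth confirming that your own .gitignore really lists it. The sketch below is a deliberately simple text check and an assumption about what is useful here: it only looks for a literal `.env` entry and does not evaluate glob patterns or negations the way Git does.

```python
def is_listed_in_gitignore(gitignore_text, entry=".env"):
    """Return True if `entry` appears as its own line in the .gitignore text."""
    for raw_line in gitignore_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in (entry, "/" + entry):
            return True
    return False


example = "# secrets\n.env\n__pycache__/\n"
print(is_listed_in_gitignore(example))
```

For an authoritative answer inside a repository, `git check-ignore -v .env` reports which rule, if any, ignores the file.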
Output Format (CSV)
Scraping results are automatically saved in CSV format with the following columns:
| Column | Description |
|---|---|
| username | Display name of the user |
| handle | Twitter account name (@username) |
| timestamp | Time the tweet was posted (ISO 8601 format) |
| tweet_text | Text content of the tweet |
| url | Direct link to the tweet |
| reply_count | Number of replies |
| retweet_count | Number of retweets |
| like_count | Number of likes |
Example CSV content:
username,handle,timestamp,tweet_text,url,reply_count,retweet_count,like_count
Budi Santoso,@budisant,2024-01-01T10:30:00.000Z,"Severe flood in Jakarta!",https://x.com/budisant/status/123,5,10,25
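To illustrate the column layout, the sample row above can be read back with Python's standard csv module. This is a sketch under two assumptions taken from that one sample: the count columns hold plain integers, and timestamps always have the form shown (milliseconds and a trailing Z). `read_tweets` is a hypothetical helper.

```python
import csv
import datetime
import io

COUNT_COLUMNS = ("reply_count", "retweet_count", "like_count")
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # assumed from the sample row


def read_tweets(file_obj):
    """Read tweet rows, converting counts to int and timestamps to UTC datetimes."""
    rows = []
    for row in csv.DictReader(file_obj):
        for column in COUNT_COLUMNS:
            row[column] = int(row[column])
        parsed = datetime.datetime.strptime(row["timestamp"], TIMESTAMP_FORMAT)
        row["timestamp"] = parsed.replace(tzinfo=datetime.timezone.utc)
        rows.append(row)
    return rows


SAMPLE = (
    "username,handle,timestamp,tweet_text,url,reply_count,retweet_count,like_count\n"
    'Budi Santoso,@budisant,2024-01-01T10:30:00.000Z,"Severe flood in Jakarta!",'
    "https://x.com/budisant/status/123,5,10,25\n"
)

tweets = read_tweets(io.StringIO(SAMPLE))
print(tweets[0]["handle"], tweets[0]["like_count"], tweets[0]["timestamp"])
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.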
Complete Parameters
TwitterScraper()
TwitterScraper(
    auth_token=None,      # (REQUIRED) Token from browser cookie
    scroll_pause_time=5,  # Pause between scrolls, in seconds (default: 5)
    headless=True         # True = without browser GUI | False = show browser
)
scrape_with_date_range()
scraper.scrape_with_date_range(
    keyword="",              # (REQUIRED) Search keyword
    target_per_session=100,  # Target number of tweets per session (default: 100)
    start_date=datetime,     # (REQUIRED) Start date, format: datetime(YYYY, M, D)
    end_date=datetime,       # (REQUIRED) End date, format: datetime(YYYY, M, D)
    interval_days=1,         # Interval days per session (1 = scraping per day)
    lang=None,               # Language code: 'en', 'id', 'ja', 'es', etc. None for all.
    search_type='top'        # 'top' = top tweets | 'latest' = latest tweets
)
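To make the relationship between `start_date`, `end_date`, and `interval_days` concrete, here is a rough sketch of how a date range could be cut into sessions and turned into search queries. It is an illustration only: whether Panen Tweet treats the end date as exclusive, and whether it builds queries this way, are assumptions. The `since:`, `until:`, and `lang:` operators are part of X's advanced search syntax; both function names are hypothetical.

```python
import datetime


def split_into_sessions(start_date, end_date, interval_days=1):
    """Split [start_date, end_date) into consecutive windows of interval_days."""
    if interval_days < 1:
        raise ValueError("interval_days must be at least 1")
    step = datetime.timedelta(days=interval_days)
    sessions = []
    window_start = start_date
    while window_start < end_date:
        # The last window is shortened so it never runs past end_date.
        window_end = min(window_start + step, end_date)
        sessions.append((window_start, window_end))
        window_start = window_end
    return sessions


def build_query(keyword, window_start, window_end, lang=None):
    """Compose a search string for one session window."""
    parts = [
        keyword,
        "since:" + window_start.strftime("%Y-%m-%d"),
        "until:" + window_end.strftime("%Y-%m-%d"),
    ]
    if lang:
        parts.append("lang:" + lang)
    return " ".join(parts)


sessions = split_into_sessions(
    datetime.datetime(2024, 1, 1), datetime.datetime(2024, 1, 7), interval_days=1
)
for window_start, window_end in sessions:
    print(build_query("jakarta flood", window_start, window_end, lang="id"))
```

Under these assumptions, a smaller `interval_days` produces more, narrower sessions, each with its own `target_per_session` budget.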
Tips & Tricks
Collecting Many Tweets
- Use interval_days=1 to scrape per day for more detailed results
- Do not set target_per_session too high (recommended 50–200)
- Increase scroll_pause_time to allow more loading time if your connection is slow
Avoiding Rate Limits
Rate limiting means Twitter/X restricts access when scraping runs too fast.
- Use a scroll_pause_time of at least 5 seconds
- Do not run more than one scraping process simultaneously
- Add a pause of a few minutes between large sessions
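If you drive several large runs from your own script, the pause between them can be automated. The sketch below is one possible approach, not a feature of Panen Tweet: `run_sessions` and `session_pause_seconds` are hypothetical helpers, the 120-second base is an arbitrary example of "a few minutes", and the random variation is an added assumption so that pauses are not perfectly regular.

```python
import random
import time


def session_pause_seconds(base_seconds=120.0, jitter_fraction=0.25, rng=random):
    """Return base_seconds varied randomly by up to +/- jitter_fraction."""
    spread = base_seconds * jitter_fraction
    return base_seconds + rng.uniform(-spread, spread)


def run_sessions(sessions, scrape_one, sleep=time.sleep, base_seconds=120.0):
    """Call scrape_one for each session, pausing between (not after) sessions."""
    results = []
    for index, session in enumerate(sessions):
        results.append(scrape_one(session))
        if index < len(sessions) - 1:
            sleep(session_pause_seconds(base_seconds))
    return results
```

Here `scrape_one` would be your own function that wraps a call to `scraper.scrape_with_date_range(...)` for one keyword or date window; the `sleep` argument exists so the pause can be replaced during testing.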
Available Language Codes
| Code | Language |
|---|---|
| id | Indonesian |
| en | English |
| ja | Japanese |
| es | Spanish |
| fr | French |
| ko | Korean |
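The CLI lets you leave the language blank for all languages, while the library expects `lang=None` for the same effect. When passing user input to the library, a small normalization step bridges the two. This is a sketch with a hypothetical helper; the rule that codes are two or three ASCII letters is an assumption based on the table above.

```python
import re


def normalize_lang(raw):
    """Map blank input to None and tidy a language code such as ' EN ' to 'en'."""
    if raw is None:
        return None
    code = raw.strip().lower()
    if not code:
        return None  # blank means "all languages"
    if not re.fullmatch(r"[a-z]{2,3}", code):
        raise ValueError("unexpected language code: {!r}".format(raw))
    return code


print(normalize_lang(" EN "), normalize_lang(""))
```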
Troubleshooting
❌ Error: WebDriver not found
Chrome is not detected or ChromeDriver does not match.
Solution:
- Make sure Google Chrome is installed
- The package will automatically download the appropriate ChromeDriver
❌ Error: Auth token invalid
The token you entered is invalid or expired.
Solution:
- Reopen x.com in your browser
- Log in again if necessary
- Retrieve the auth_token value again from the Developer Tools → Cookies tab
- Make sure there are no trailing spaces when copying and pasting
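Stray whitespace or quotes picked up while copying are easy to remove in code before the token reaches the scraper. A minimal sketch, assuming a valid token never contains whitespace or surrounding quotes; `clean_token` is a hypothetical helper, and it cannot tell whether a token has expired.

```python
def clean_token(raw):
    """Strip surrounding whitespace and quotes; reject empty or broken values."""
    token = raw.strip().strip("'\"").strip()
    if not token:
        raise ValueError("auth_token is empty")
    if any(ch.isspace() for ch in token):
        raise ValueError("auth_token contains whitespace; copy it again")
    return token


print(len(clean_token("  1a2b3c4d5e6f7a8b9c0d\n")))
```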
❌ Error: No tweets found
No tweets were found for the parameters you entered.
Solution:
- Check your internet connection
- Try more common/popular keywords
- Check the date range — there might genuinely be no tweets in that period
- Ensure the auth_token is still valid
Browser does not appear
This is normal — the default mode is headless=True (without browser GUI).
If you want to see the scraping process visually:
scraper = TwitterScraper(auth_token=token, headless=False)
Requirements
- Python 3.7+
- Google Chrome (latest version)
- Dependencies (automatically installed with the package):
  - pandas >= 2.0.0
  - selenium >= 4.0.0
  - webdriver-manager >= 4.0.0
Disclaimer & Legal
This tool was created for educational and scientific research purposes.
By using this tool, you agree to comply with:
- Twitter/X Terms of Service
- Twitter/X Developer Agreement
- Platform rate limiting rules and robots.txt
- Privacy rights and copyrights of other users
The developer is not responsible for any misuse of this tool.
Contributing
Contributions are very welcome! How to contribute:
- Fork this repository
- Create a new branch: git checkout -b feature/new-feature
- Commit changes: git commit -m 'Add a new feature'
- Push to the branch: git push origin feature/new-feature
- Create a Pull Request
License
MIT License — see the LICENSE file for full details.
Support & Contact
- Report Bugs: GitHub Issues
- PyPI Package: pypi.org/project/panen-tweet
- Email: ramadhanigb19@gmail.com
Special Thanks To
- Selenium — Web automation framework
- webdriver-manager — Automatic ChromeDriver management
- pandas — Data processing
Made with ❤️ for the data science & research community
⭐ If this project is helpful, give it a star on GitHub!