Powerful Twitter/X scraping tool with Selenium

Maintainers

ramadhanigb1997

Project description

🐦 Panen Tweet — Twitter/X Scraper

Panen Tweet is a Python tool for scraping tweet data from Twitter/X by keyword, date range, language, and tweet type. It is suited to research, data analysis, and thesis work.

✅ Prerequisites

Before you start, make sure you have:

Python 3.7 or newer (available from python.org)
Google Chrome installed on your computer
An active Twitter/X account

Check your Python version:

python --version
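
The prerequisites can also be checked from Python. The following is a minimal sketch, not part of Panen Tweet: the function names and the list of Chrome executable names are assumptions, and Chrome installed outside PATH (common on Windows and macOS) will be reported as not found even though it is present.

```python
import shutil
import sys

# Executable names are assumptions; they vary by platform and install method.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chrome",
    "chromium",
    "chromium-browser",
)


def python_is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the interpreter meets the minimum version listed above."""
    return tuple(version_info[:2]) >= minimum


def find_chrome(candidates=CHROME_CANDIDATES):
    """Return the path of the first Chrome-like executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


print("Python OK:", python_is_supported())
print("Chrome on PATH:", find_chrome() or "not found (it may still be installed elsewhere)")
```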

Installation

Method 1: From PyPI (Recommended)

pip install panen-tweet

Method 2: From Source Code (GitHub)

git clone https://github.com/DhaniAAA/panen-tweet.git
cd panen-tweet
pip install -e .

Special: Google Colab or Linux Server (VPS)

On Google Colab and Linux servers, Google Chrome is not installed by default. Run these commands first (the leading ! is notebook syntax for Colab; omit it in a regular server shell):

# 1. Install the library
!pip install panen-tweet

# 2. Install Google Chrome (only needed once)
!panen-tweet install-chrome

Getting Auth Token

What is an auth_token? It is the value of a browser cookie that proves you are logged in to Twitter/X. The tool needs this token to access tweet data.

How to Get the Token (Step-by-Step):

Open your browser (Chrome or Firefox) and log in to x.com
Press F12 to open Developer Tools
Click the Application tab (Chrome) or Storage tab (Firefox)
In the left panel, click Cookies → select https://x.com
Find the row named auth_token
Click the row, then copy the value in the right column

The token is a long string of characters, for example: 1a2b3c4d5e6f7a8b9c0d...

TOKEN SECURITY — MUST READ!

This token is the full access key to your Twitter/X account.

❌ DO NOT share the token with anyone
❌ DO NOT hardcode the token directly in your Python file
❌ DO NOT commit/push files containing the token to GitHub
✅ Store the token in a .env file (see the guide in SECURITY.md)
✅ If the token is leaked, immediately change your Twitter/X password
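
Two small habits help with the rules above: trim the pasted value before using it, and never print the full token in logs or notebook output. The helpers below are a hypothetical sketch (clean_token and mask_token are not part of Panen Tweet).

```python
def clean_token(raw):
    """Strip surrounding whitespace and quotes that often sneak in when copy-pasting."""
    return raw.strip().strip('"').strip("'")


def mask_token(token, visible=4):
    """Return a log-safe form that shows only the first few characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return token[:visible] + "*" * (len(token) - visible)
```

For example, print(mask_token(clean_token(pasted_value))) confirms that a token was loaded without exposing it in a shared notebook.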

Usage

There are 3 ways to use Panen Tweet. Choose the one that best suits your needs.

Method 1: Command Line Interface (CLI) — Easiest for Beginners

After installation, simply run:

panen-tweet

The program will guide you interactively. You will be asked to enter:

No.	Question	Example Input
1	Auth token	(paste token from browser)
2	Search keyword/topic	`jakarta flood`
3	Max tweets per session	`100`
4	Start date	`2024-01-01`
5	End date	`2024-01-07`
6	Interval days per session	`1` (1 = per day)
7	Language code	`id` (Indonesian), `en` (English), or leave blank for all
8	Tweet type	`1` (Top) or `2` (Latest)

Example terminal output:

TWITTER/X SCRAPER - PANEN TWEET
================================
Enter your auth_token: <paste_token_here>

1. Enter search keyword/topic: jakarta flood
2. How many MAXIMUM tweets to scrape PER SESSION? 100
3. Enter overall START DATE (YYYY-MM-DD): 2024-01-01
4. Enter overall END DATE (YYYY-MM-DD): 2024-01-07
5. How many interval days per session? (1 = per day): 1
6. Enter language code (id / en / ja / etc, or leave blank): en
7. Select tweet type (1 for Top, 2 for Latest): 2

The scraped results are saved automatically to a CSV file, for example: tweets_jakartaflood_latest_20240101-20240107.csv
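
The example file name suggests a pattern of keyword (spaces removed), tweet type, and date range. The sketch below reproduces that pattern so a script can predict the file to look for. The pattern is inferred from the single example above and is an assumption; build_output_filename is a hypothetical helper, and the tool's handling of uppercase or punctuation in keywords may differ.

```python
import datetime
import re


def build_output_filename(keyword, search_type, start_date, end_date):
    """Guess the CSV name from the inputs, following the one documented example."""
    # Assumption: non-alphanumeric characters are dropped and the keyword is lowercased.
    slug = re.sub(r"[^0-9A-Za-z]+", "", keyword).lower()
    return "tweets_{}_{}_{:%Y%m%d}-{:%Y%m%d}.csv".format(
        slug, search_type, start_date, end_date
    )
```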

Method 2: As a Python Library

Use this method to integrate the scraper into your own notebook or script.

from panen_tweet import TwitterScraper
import datetime
import os

# ✅ Safe way: read token from environment variable
# Run this in terminal first: export TWITTER_AUTH_TOKEN="yourtoken"
auth_token = os.getenv('TWITTER_AUTH_TOKEN')

if not auth_token:
    raise ValueError("Token is not set! See SECURITY.md for instructions.")

# Initialize scraper
scraper = TwitterScraper(
    auth_token=auth_token,
    scroll_pause_time=5,  # Pause between scrolls (seconds) - increase if connection is slow
    headless=True         # True = without browser GUI | False = show browser
)

# Run scraping
df = scraper.scrape_with_date_range(
    keyword="jakarta flood",
    target_per_session=100,
    start_date=datetime.datetime(2024, 1, 1),
    end_date=datetime.datetime(2024, 1, 7),
    interval_days=1,
    lang=None,            # Use language code like 'en' or 'id', or None for all languages
    search_type='latest'  # 'top' or 'latest'
)

# Save to CSV
if df is not None:
    scraper.save_to_csv(df, "scraping_results.csv")
    print(f"✅ Successfully scraped {len(df)} tweets!")
    print(df.head())
else:
    print("❌ No data was successfully scraped.")

Method 3: Using a `.env` File for Token Security

This is the safest way to store the token, because it keeps the token out of the source files you might push to GitHub.

Step 1 — Install python-dotenv:

pip install python-dotenv

Step 2 — Create a .env file in your project folder:

TWITTER_AUTH_TOKEN=your_token_here

Step 3 — Load it in your Python code:

from dotenv import load_dotenv
import os

load_dotenv()  # Read the .env file
auth_token = os.getenv('TWITTER_AUTH_TOKEN')

The project's .gitignore already lists .env, so the file is not uploaded to GitHub from a clone of this repository. If you work in your own repository, add .env to its .gitignore yourself.
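
Whether .env is actually ignored depends on the .gitignore of the repository you are working in. The sketch below is a rough pre-commit check under a simplifying assumption: it only looks for a literal .env entry and does not implement full gitignore pattern matching (wildcards, negations, nested files). The function name is hypothetical; git check-ignore .env remains the authoritative test.

```python
from pathlib import Path


def env_is_ignored(gitignore_path=".gitignore", entry=".env"):
    """Return True if the .gitignore file contains a literal line for `entry`."""
    path = Path(gitignore_path)
    if not path.is_file():
        return False
    for line in path.read_text(encoding="utf-8").splitlines():
        pattern = line.strip()
        # Skip blank lines and comments.
        if not pattern or pattern.startswith("#"):
            continue
        # Treat ".env" and "/.env" as equivalent for this simple check.
        if pattern.lstrip("/") == entry:
            return True
    return False
```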

Or if you prefer to use the terminal directly without a .env file:

Windows PowerShell:

$env:TWITTER_AUTH_TOKEN = "your_token_here"
panen-tweet

Linux / Mac:

export TWITTER_AUTH_TOKEN="your_token_here"
panen-tweet

Output Format (CSV)

Scraping results are automatically saved in CSV format with the following columns:

Column	Description
`username`	Display name of the user
`handle`	Twitter account name (`@username`)
`timestamp`	Time the tweet was posted (ISO 8601 format)
`tweet_text`	Text content of the tweet
`url`	Direct link to the tweet
`reply_count`	Number of replies
`retweet_count`	Number of retweets
`like_count`	Number of likes

Example CSV content:

username,handle,timestamp,tweet_text,url,reply_count,retweet_count,like_count
Budi Santoso,@budisant,2024-01-01T10:30:00.000Z,"Severe flood in Jakarta!",https://x.com/budisant/status/123,5,10,25
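
To work with the file outside pandas, the columns above can be read with the standard library. This is a minimal sketch using the example row; it assumes the count columns hold plain integers and that timestamps always follow the format shown (millisecond precision, trailing Z). load_tweets is a hypothetical helper, not part of Panen Tweet.

```python
import csv
import datetime
import io

SAMPLE = """username,handle,timestamp,tweet_text,url,reply_count,retweet_count,like_count
Budi Santoso,@budisant,2024-01-01T10:30:00.000Z,"Severe flood in Jakarta!",https://x.com/budisant/status/123,5,10,25
"""

COUNT_COLUMNS = ("reply_count", "retweet_count", "like_count")


def load_tweets(handle):
    """Parse rows from an open CSV file, converting timestamps and counts."""
    rows = []
    for row in csv.DictReader(handle):
        # The trailing "Z" marks UTC, so attach the UTC timezone explicitly.
        row["timestamp"] = datetime.datetime.strptime(
            row["timestamp"], "%Y-%m-%dT%H:%M:%S.%fZ"
        ).replace(tzinfo=datetime.timezone.utc)
        for column in COUNT_COLUMNS:
            row[column] = int(row[column])
        rows.append(row)
    return rows


tweets = load_tweets(io.StringIO(SAMPLE))
print(tweets[0]["handle"], tweets[0]["timestamp"].isoformat(), tweets[0]["like_count"])
```

For a real file, pass open("scraping_results.csv", newline="", encoding="utf-8") instead of the in-memory sample.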

Complete Parameters

`TwitterScraper()`

TwitterScraper(
    auth_token=None,        # (REQUIRED) Token from browser cookie
    scroll_pause_time=5,    # Pause between scrolls, in seconds (default: 5)
    headless=True           # True = without browser GUI | False = show browser
)

`scrape_with_date_range()`

scraper.scrape_with_date_range(
    keyword="",             # (REQUIRED) Search keyword
    target_per_session=100, # Target number of tweets per session (default: 100)
    start_date=datetime,    # (REQUIRED) Start date, format: datetime(YYYY, M, D)
    end_date=datetime,      # (REQUIRED) End date, format: datetime(YYYY, M, D)
    interval_days=1,        # Interval days per session (1 = scraping per day)
    lang=None,              # Language code: 'en', 'id', 'ja', 'es', etc. None for all.
    search_type='top'       # 'top' = top tweets | 'latest' = latest tweets
)
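
To estimate how many sessions a given date range and interval_days will produce, the range can be split into windows. The sketch below shows one plausible splitting, not the library's actual implementation: it assumes half-open windows and an exclusive end date, and session_windows is a hypothetical name.

```python
import datetime


def session_windows(start_date, end_date, interval_days=1):
    """Split [start_date, end_date) into consecutive windows of interval_days."""
    if interval_days < 1:
        raise ValueError("interval_days must be at least 1")
    if end_date <= start_date:
        raise ValueError("end_date must be after start_date")
    step = datetime.timedelta(days=interval_days)
    windows = []
    cursor = start_date
    while cursor < end_date:
        # The last window is shortened so it never runs past end_date.
        window_end = min(cursor + step, end_date)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows
```

Under these assumptions, 2024-01-01 to 2024-01-07 with interval_days=1 gives six windows, so target_per_session=100 puts an upper bound of 600 tweets on the run.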

Tips & Tricks

Collecting Many Tweets

Use interval_days=1 to scrape per day for more detailed results
Do not set target_per_session too high (recommended 50–200)
Increase scroll_pause_time to allow more loading time if your connection is slow

Avoiding Rate Limits

A rate limit is a restriction Twitter/X applies when requests arrive too quickly.

Use a scroll_pause_time of at least 5 seconds
Do not run more than one scraping process simultaneously
Add a pause of a few minutes between large sessions
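
The advice on pausing between large sessions can be sketched as a small runner that executes jobs one at a time and sleeps between them. This is illustrative only: run_with_pauses is a hypothetical helper, and the 180-second default is an assumed reading of "a few minutes", not a documented threshold.

```python
import time


def run_with_pauses(jobs, pause_seconds=180, sleep=time.sleep):
    """Run callables sequentially, sleeping between consecutive jobs."""
    results = []
    for index, job in enumerate(jobs):
        # No pause before the first job; pause before every later one.
        if index > 0:
            sleep(pause_seconds)
        results.append(job())
    return results
```

Each job could be a lambda that wraps one scraper.scrape_with_date_range(...) call for a different keyword. Because the jobs run in sequence, this also keeps to the rule of one scraping process at a time.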

Available Language Codes

Code	Language
`id`	Indonesian
`en`	English
`ja`	Japanese
`es`	Spanish
`fr`	French
`ko`	Korean

Troubleshooting

❌ Error: `WebDriver not found`

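Chrome was not detected, or the ChromeDriver version does not match the installed Chrome.

Solution:

Make sure Google Chrome is installed (on Google Colab or a Linux server, run panen-tweet install-chrome)
The package downloads a matching ChromeDriver automatically, so no manual driver setup should be needed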

❌ Error: `Auth token invalid`

The token you entered is invalid or expired.

Solution:

Reopen x.com in your browser
Log in again if necessary
Retrieve the auth_token value again from the Developer Tools → Cookies tab
Make sure there are no trailing spaces when copying and pasting

❌ Error: `No tweets found`

No tweets were found for the parameters you entered.

Solution:

Check your internet connection
Try more common/popular keywords
Check the date range — there might genuinely be no tweets in that period
Ensure the auth_token is still valid
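
Some causes of an empty result can be ruled out before a session starts. The following is a hypothetical pre-flight check (check_search_params is not part of Panen Tweet); it only catches obviously inconsistent inputs and cannot tell whether tweets exist for the query.

```python
import datetime


def check_search_params(start_date, end_date, search_type, now=None):
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    now = now or datetime.datetime.now()
    problems = []
    if start_date > end_date:
        problems.append("start_date is after end_date")
    if start_date > now:
        problems.append("start_date is in the future")
    if search_type not in ("top", "latest"):
        problems.append("search_type must be 'top' or 'latest'")
    return problems
```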

Browser does not appear

This is normal — the default mode is headless=True (without browser GUI).

If you want to see the scraping process visually:

scraper = TwitterScraper(auth_token=token, headless=False)

Requirements

Python 3.7+ (note that the pandas >= 2.0.0 dependency listed below itself requires Python 3.8 or newer)
Google Chrome (latest version)
Dependencies (automatically installed with the package):
- pandas >= 2.0.0
- selenium >= 4.0.0
- webdriver-manager >= 4.0.0

Disclaimer & Legal

This tool was created for educational and scientific research purposes.

By using this tool, you agree to comply with:

Twitter/X Terms of Service
Twitter/X Developer Agreement
Platform rate limiting rules and robots.txt
Privacy rights and copyrights of other users

The developer is not responsible for any misuse of this tool.

Contributing

Contributions are very welcome! How to contribute:

Fork this repository
Create a new branch: git checkout -b feature/new-feature
Commit changes: git commit -m 'Add a new feature'
Push to the branch: git push origin feature/new-feature
Create a Pull Request

License

MIT License — see the LICENSE file for full details.

Support & Contact

Report Bugs: GitHub Issues
PyPI Package: pypi.org/project/panen-tweet
Email: ramadhanigb19@gmail.com

Special Thanks To

Selenium — Web automation framework
webdriver-manager — Automatic ChromeDriver management
pandas — Data processing

Made with ❤️ for the data science & research community

⭐ If this project is helpful, give it a star on GitHub!

Release history

This version

1.1.1

Jun 15, 2026

1.1.0

Feb 16, 2026

1.0.5

Jan 27, 2026

1.0.4

Jan 27, 2026

1.0.3

Jan 27, 2026

1.0.2

Jan 27, 2026

1.0.1

Jan 27, 2026

Download files

Source Distribution

panen_tweet-1.1.1.tar.gz (22.6 kB view details)

Uploaded Jun 15, 2026 Source

Built Distribution

panen_tweet-1.1.1-py3-none-any.whl (15.7 kB view details)

Uploaded Jun 15, 2026 Python 3

File details

Details for the file panen_tweet-1.1.1.tar.gz.

File metadata

Download URL: panen_tweet-1.1.1.tar.gz
Upload date: Jun 15, 2026
Size: 22.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for panen_tweet-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`7c98453065d5956911676a09533bdea2dd82bc02e65fe09a44229da5620ec6fb`
MD5	`aa62a671081e88a1f7d54de145a120e8`
BLAKE2b-256	`b8e75a0b0925d5a69ab27ffc0f24b6d2d548f29def979e26ba76bda0124fb69d`
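
A downloaded file can be compared against the published SHA256 digest with the standard library. This is a minimal sketch; the function names are hypothetical, and it assumes the file has already been downloaded to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True if the file's SHA256 equals the published digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, expected_hex would be the SHA256 value from the table above, and path would point to the downloaded panen_tweet-1.1.1.tar.gz.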

Provenance

The following attestation bundles were made for panen_tweet-1.1.1.tar.gz:

Publisher: workflows.yaml on DhaniAAA/panen-tweet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: panen_tweet-1.1.1.tar.gz
- Subject digest: 7c98453065d5956911676a09533bdea2dd82bc02e65fe09a44229da5620ec6fb
- Sigstore transparency entry: 1822053277
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: DhaniAAA/panen-tweet@d74ddd7b9e3ccb329c72a7426aa08be977dc0371
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/DhaniAAA
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflows.yaml@d74ddd7b9e3ccb329c72a7426aa08be977dc0371
- Trigger Event: release

File details

Details for the file panen_tweet-1.1.1-py3-none-any.whl.

File metadata

Download URL: panen_tweet-1.1.1-py3-none-any.whl
Upload date: Jun 15, 2026
Size: 15.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for panen_tweet-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b89acca142d9a7c1724019b04b236817c194590c1f0d7f5e069036eec6222954`
MD5	`23832b4d40d57415ae0a4c88bf6245f7`
BLAKE2b-256	`e158016ccaa23fecad66a83d4f82322d1e6f14ac223cda675f114fbeca1e3361`

Provenance

The following attestation bundles were made for panen_tweet-1.1.1-py3-none-any.whl:

Publisher: workflows.yaml on DhaniAAA/panen-tweet

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: panen_tweet-1.1.1-py3-none-any.whl
- Subject digest: b89acca142d9a7c1724019b04b236817c194590c1f0d7f5e069036eec6222954
- Sigstore transparency entry: 1822053307
- Sigstore integration time: Jun 15, 2026
Source repository:
- Permalink: DhaniAAA/panen-tweet@d74ddd7b9e3ccb329c72a7426aa08be977dc0371
- Branch / Tag: refs/tags/v1.1.1
- Owner: https://github.com/DhaniAAA
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflows.yaml@d74ddd7b9e3ccb329c72a7426aa08be977dc0371
- Trigger Event: release
