Scrape Twitter profiles and tweets without authentication or API keys, using Nitter.
Project description
Nitter Scraper
This library is a workaround for anyone who enjoyed the simplicity of the twitter-scraper library and needs a quick replacement until it comes back up. Nitter Scraper works by running a local Docker container instance of Nitter and scraping tweets and profile information from it. The API is designed to match the original twitter-scraper as closely as possible, to minimize refactoring of existing Twitter scraping projects.
Use Nitter Scraper with docker
If you have Docker installed, you can use the NitterScraper context manager. It takes care of starting and destroying a local Nitter Docker container instance. If you don't have Docker installed, you can use the get_tweets and get_profile functions to scrape from https://www.nitter.net.
Basic Usage
from nitter_scraper import NitterScraper

with NitterScraper(host="0.0.0.0", port=8008) as nitter:
    profile = nitter.get_profile("dgnsrekt")
    print(profile.json(indent=4))
Output
2020-09-11 10:11:08.533 | INFO | nitter_scraper.nitter:get_client:35 - Docker connection successful.
2020-09-11 10:11:10.227 | INFO | nitter_scraper.nitter:start:102 - Running container admiring_panini 368437c322.
{
    "username": "DGNSREKT",
    "name": "DGNSREKT",
    "profile_photo": "/pic/profile_images%2F1304387984440152064%2FiWh8NV8M.png",
    "tweets_count": 2897,
    "following_count": 905,
    "followers_count": 119,
    "likes_count": 4994,
    "is_verified": false,
    "banner_photo": "/pic/profile_banners%2F2474416796%2F1599825305%2F1500x500",
    "biography": "BITCOIN IS DEAD AGAIN. :(",
    "user_id": 2474416796,
    "location": "Moon",
    "website": "https://github.com/dgnsrekt"
}
2020-09-11 10:11:11.007 | INFO | nitter_scraper.nitter:stop:105 - Stopping container admiring_panini 368437c322.
2020-09-11 10:11:16.346 | INFO | nitter_scraper.nitter:stop:108 - Container admiring_panini 368437c322 Destroyed.
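If you don't have Docker available, the same profile lookup works directly against https://www.nitter.net with the module-level get_profile function (a minimal sketch of the no-Docker path mentioned above):

from nitter_scraper import get_profile

# Scrapes https://www.nitter.net instead of a local Nitter container.
profile = get_profile("dgnsrekt")
print(profile.json(indent=4))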
Prerequisites
- Docker
- Python ^3.7
Install
pip install nitter-scraper
Examples
Scrape Users Tweets
from nitter_scraper import get_tweets
from nitter_scraper import NitterScraper
from pprint import pprint

users = ["dgnsrekt"]

print("Scraping with local nitter docker instance.")

with NitterScraper(host="0.0.0.0", port=8008) as nitter:
    for user in users:
        for tweet in nitter.get_tweets(user, pages=2):
            print()
            pprint(tweet.dict())
            print(tweet.json(indent=4))

print("Scraping from https://www.nitter.net.")

for user in users:
    for tweet in get_tweets(user, pages=2):
        print()
        pprint(tweet.dict())
        print(tweet.json(indent=4))
Scrape User Profiles
from nitter_scraper import NitterScraper
from nitter_scraper import get_profile
from pprint import pprint

users = ["dgnsrekt"]

print("Scraping with local nitter docker instance.")

with NitterScraper(host="0.0.0.0", port=8008) as nitter:
    for user in users:
        profile = nitter.get_profile(user, not_found_ok=True)
        print(profile)
        pprint(profile.dict())
        print(profile.json(indent=4))

print("Scraping from https://www.nitter.net.")

for user in users:
    profile = get_profile(user, not_found_ok=True)
    print(profile)
    pprint(profile.dict())
    print(profile.json(indent=4))
Profile & Tweet objects
The Profile and Tweet classes inherit from pydantic's BaseModel, which makes it easy to convert them to a dict or JSON.
profile = get_profile("dgnsrekt")
profile.json(indent=4)
profile.dict()
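Because they are pydantic (v1-style) models, the JSON Schemas documented below can also be regenerated from the models themselves with schema_json():

from nitter_scraper import get_profile

profile = get_profile("dgnsrekt")

# pydantic's schema_json() emits the model's JSON Schema, matching the
# Profile Schema documented below.
print(profile.schema_json(indent=4))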
Profile Schema
These are the fields used in the Profile class, which is returned by the get_profile() method.
{
    "title": "Profile",
    "type": "object",
    "properties": {
        "username": {
            "title": "Username",
            "type": "string"
        },
        "name": {
            "title": "Name",
            "type": "string"
        },
        "profile_photo": {
            "title": "Profile Photo",
            "type": "string"
        },
        "tweets_count": {
            "title": "Tweets Count",
            "type": "integer"
        },
        "following_count": {
            "title": "Following Count",
            "type": "integer"
        },
        "followers_count": {
            "title": "Followers Count",
            "type": "integer"
        },
        "likes_count": {
            "title": "Likes Count",
            "type": "integer"
        },
        "is_verified": {
            "title": "Is Verified",
            "type": "boolean"
        },
        "banner_photo": {
            "title": "Banner Photo",
            "type": "string"
        },
        "biography": {
            "title": "Biography",
            "type": "string"
        },
        "user_id": {
            "title": "User Id",
            "type": "integer"
        },
        "location": {
            "title": "Location",
            "type": "string"
        },
        "website": {
            "title": "Website",
            "type": "string"
        }
    },
    "required": [
        "username",
        "name",
        "profile_photo",
        "tweets_count",
        "following_count",
        "followers_count",
        "likes_count",
        "is_verified"
    ]
}
Tweet Schema
These are the fields used in the Tweet class, which is returned by the get_tweets() method.
{
    "title": "Tweet",
    "type": "object",
    "properties": {
        "tweet_id": {
            "title": "Tweet Id",
            "type": "integer"
        },
        "tweet_url": {
            "title": "Tweet Url",
            "type": "string"
        },
        "username": {
            "title": "Username",
            "type": "string"
        },
        "is_retweet": {
            "title": "Is Retweet",
            "type": "boolean"
        },
        "is_pinned": {
            "title": "Is Pinned",
            "type": "boolean"
        },
        "time": {
            "title": "Time",
            "type": "string",
            "format": "date-time"
        },
        "text": {
            "title": "Text",
            "type": "string"
        },
        "replies": {
            "title": "Replies",
            "type": "integer"
        },
        "retweets": {
            "title": "Retweets",
            "type": "integer"
        },
        "likes": {
            "title": "Likes",
            "type": "integer"
        },
        "entries": {
            "$ref": "#/definitions/Entries"
        }
    },
    "required": [
        "tweet_id",
        "tweet_url",
        "username",
        "is_retweet",
        "is_pinned",
        "time",
        "text",
        "replies",
        "retweets",
        "likes",
        "entries"
    ],
    "definitions": {
        "Entries": {
            "title": "Entries",
            "type": "object",
            "properties": {
                "hashtags": {
                    "title": "Hashtags",
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "cashtags": {
                    "title": "Cashtags",
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "urls": {
                    "title": "Urls",
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "photos": {
                    "title": "Photos",
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                },
                "videos": {
                    "title": "Videos",
                    "type": "array",
                    "items": {
                        "type": "string"
                    }
                }
            },
            "required": [
                "hashtags",
                "cashtags",
                "urls",
                "photos",
                "videos"
            ]
        }
    }
}
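As a small illustration of the Entries sub-model above, here is a sketch that filters tweets by cashtag using fields from the Tweet schema:

from nitter_scraper import get_tweets

# Print the URL and cashtags of every tweet on the first page that
# contains at least one cashtag.
for tweet in get_tweets("dgnsrekt", pages=1):
    if tweet.entries.cashtags:
        print(tweet.tweet_url, tweet.entries.cashtags)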
Limitations
- So far I've been able to pull about 800 tweets per user without issue.
- A way to scrape trends has not been implemented.
- birthday and is_private are not implemented in the profile.
- If the user does not have a banner, the user_id cannot be scraped (see the sketch after this list).
- The user_id cannot be scraped from tweets.
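Because user_id is not a required field in the Profile schema, it's worth guarding against a missing value when an account has no banner (a sketch, assuming user_id comes back as None in that case):

from nitter_scraper import get_profile

profile = get_profile("dgnsrekt", not_found_ok=True)

# user_id is optional in the Profile schema, so check it before relying on it.
if profile is not None and profile.user_id is None:
    print(f"user_id unavailable for @{profile.username}")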
More docs coming soon.
TODO
- Explain args
- More tests
Contact Information
Telegram / Twitter / Tradingview / Discord: @dgnsrekt
Email: dgnsrekt@pm.me
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file nitter_scraper-0.4.2.tar.gz.
File metadata
- Download URL: nitter_scraper-0.4.2.tar.gz
- Upload date:
- Size: 11.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.10 CPython/3.8.2 Linux/5.4.0-7634-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | f1ea23d33742c01731fb2f0b006710feaa0b19aa79ab34fc35f90b8a7fd15b1e
MD5 | 7c559704871511bdd341d91eda517655
BLAKE2b-256 | 0198dd18e1f4b7d6d8fc776d3e67d792f2e85ef2c0462aeafb3dfddef9eb75ff
File details
Details for the file nitter_scraper-0.4.2-py3-none-any.whl.
File metadata
- Download URL: nitter_scraper-0.4.2-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.10 CPython/3.8.2 Linux/5.4.0-7634-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 99f41a6962d3aa7e7b8b71cfde9fbb9e143da5d2ed3c008e77e01e8f79cfd4b8
MD5 | 7a1ac2c25f03c7f2650637b1c56a9777
BLAKE2b-256 | f18cfaa091e4168937f41e545cf774c10884fb5ebf80ac9c823920c098f8cb73