PyAsyncTracker

Asynchronous scraping library for torrent trackers.
PyAsyncTracker is an asynchronous library for scraping torrent tracker data. It provides tools to fetch and analyze torrent seeders, leechers, and completed downloads across multiple trackers efficiently using modern asynchronous techniques.
Features
- UDP and HTTP Tracker Support: Scrapes torrent data from both UDP and HTTP trackers.
- Asynchronous Operations: Uses Python's `asyncio` for non-blocking I/O.
- Batch Processing: Scrapes multiple info hashes across multiple trackers efficiently.
- Data Analysis Functions: Includes utility functions to analyze and summarize scraped data.
Installation
Install PyAsyncTracker using pip:
```bash
pip install pyasynctracker
```
Usage
Scrape Info Hashes
Scrape torrent data asynchronously from multiple trackers for given info hashes.
```python
from pyasynctracker import scrape_info_hashes

async def main():
    info_hashes = [
        "2b66980093bc11806fab50cb3cb41835b95a0362",
        "706440a3f8fdac91591d6007c4314f3274317f85",
    ]
    trackers = [
        "http://bttracker.debian.org:6969/announce",
        "udp://tracker.openbittorrent.com:80/announce",
        "udp://tracker.opentrackr.org:1337/announce",
    ]
    results = await scrape_info_hashes(info_hashes, trackers)
    print(results)
    # {
    #     '706440a3f8fdac91591d6007c4314f3274317f85': [
    #         {'tracker_url': 'http://bttracker.debian.org:6969/announce', 'seeders': 168, 'peers': 1, 'complete': 769},
    #         {'tracker_url': 'udp://tracker.opentrackr.org:1337/announce', 'seeders': 5, 'peers': 0, 'complete': 20}
    #     ],
    #     '2b66980093bc11806fab50cb3cb41835b95a0362': [
    #         {'tracker_url': 'http://bttracker.debian.org:6969/announce', 'seeders': 1022, 'peers': 2, 'complete': 14920},
    #         {'tracker_url': 'udp://tracker.opentrackr.org:1337/announce', 'seeders': 25, 'peers': 0, 'complete': 184}
    #     ]
    # }

# Run the async main function using asyncio
if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```
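The return value is a plain dictionary keyed by info hash, with one stats dict per responding tracker, so it can be post-processed with ordinary Python. As a minimal sketch, here is one way to total the seeders reported for each hash (the `total_seeders` helper below is hypothetical, not part of PyAsyncTracker):

```python
# Sketch only: `total_seeders` is a hypothetical helper, not a PyAsyncTracker API.
def total_seeders(results):
    """Sum the seeder counts reported by all responding trackers for each info hash."""
    totals = {}
    for info_hash, tracker_stats in results.items():
        # Each entry is a dict with 'tracker_url', 'seeders', 'peers', and 'complete'.
        totals[info_hash] = sum(stats["seeders"] for stats in tracker_stats)
    return totals

# Called on the `results` shown above, this would return:
# {'706440a3f8fdac91591d6007c4314f3274317f85': 173, '2b66980093bc11806fab50cb3cb41835b95a0362': 1047}
```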
Batch Scrape Info Hashes
Batch-scrape torrent data from a structured input of (info hash, trackers) pairs. The function groups info hashes by their associated trackers and performs the scraping in batches.
```python
from pyasynctracker import batch_scrape_info_hashes

async def main():
    data_list = [
        ("2b66980093bc11806fab50cb3cb41835b95a0362", ["http://bttracker.debian.org:6969/announce"]),
        ("706440a3f8fdac91591d6007c4314f3274317f85", ["udp://tracker.opentrackr.org:1337/announce"]),
    ]
    results = await batch_scrape_info_hashes(data_list)
    print(results)

if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
```
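Results from either scraping function can be passed to `find_max_seeders`, described in the next section. A minimal sketch combining the two steps (the `report_max_seeders` wrapper is just an illustration, not part of the library):

```python
from pyasynctracker import batch_scrape_info_hashes, find_max_seeders

# Sketch: scrape in batches, then reduce to the highest seeder count per info hash.
async def report_max_seeders(data_list):
    results = await batch_scrape_info_hashes(data_list)
    return find_max_seeders(results)
```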
Find Maximum Seeders
Analyze the results from scraping functions to find the maximum number of seeders for each info hash.
```python
from pyasynctracker import find_max_seeders

# Assuming `results` is populated by scrape_info_hashes or batch_scrape_info_hashes
max_seeders = find_max_seeders(results)
print(max_seeders)
# {'2b66980093bc11806fab50cb3cb41835b95a0362': 1022, '706440a3f8fdac91591d6007c4314f3274317f85': 168}
```
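Because `find_max_seeders` returns a plain dict mapping each info hash to its highest reported seeder count, ranking or filtering torrents is ordinary dictionary work. A small sketch (not part of the library) that sorts info hashes by that count:

```python
# Sketch: rank info hashes by their maximum reported seeder count.
# Operates on the dict returned by find_max_seeders; nothing here is PyAsyncTracker API.
ranked = sorted(max_seeders.items(), key=lambda item: item[1], reverse=True)
for info_hash, seeders in ranked:
    print(f"{info_hash}: {seeders} seeders")
```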
Contributing
Contributions to PyAsyncTracker are welcome! Please fork the repository and submit a pull request with your proposed changes. Follow the project's coding standards and write tests for new features.
License
PyAsyncTracker is released under the MIT License. See the LICENSE file for more details.