Python tomography tool
Project description
Pytomo is a YouTube crawler designed to infer network information from YouTube video downloads.
Usage
./start_crawl.py [-r max_rounds] [-u max_crawled_url] [-p max_per_url] [-P max_per_page] [-t time_frame] [-n ping_packets] [-D download_time] [-B buffering_video_duration] [-M min_playout_buffer_size] [-x] [-L log_level]
Options:
  -h, --help                    show this help message and exit
  -r MAX_ROUNDS                 Max number of rounds to perform (default 50)
  -u MAX_CRAWLED_URL            Max number of urls to visit (default 50000)
  -p MAX_PER_URL                Max number of related urls from each page (default 2)
  -P MAX_PER_PAGE               Max number of related videos from each page (default 30)
  -t TIME_FRAME                 Timeframe for the most popular videos to fetch at start of crawl; put 'today', 'week', 'month' or 'all_time' (default 'week')
  -n PING_PACKETS               Number of packets to be sent for each ping (default 3)
  -D DOWNLOAD_TIME              Download time for the video in seconds (default 30.000000)
  -B BUFFERING_VIDEO_DURATION   Buffering video duration in seconds (default 3.000000)
  -M MIN_PLAYOUT_BUFFER_SIZE    Minimum Playout Buffer Size in seconds (default 1.000000)
  -x                            Do NOT store public IP address of the machine in the logs
  -L LOG_LEVEL                  The log level setting for the Logging module. Choose from: 'DEBUG', 'INFO', 'WARNING', 'ERROR' and 'CRITICAL' (default 'DEBUG')
  --http-proxy=PROXIES          in case of http proxy to reach Internet (default None)
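To make the option types and defaults listed above easier to check at a glance, here is a minimal sketch of an equivalent command-line parser. It is not Pytomo's own parser: the use of argparse, the function name build_parser and the destination names (for example hide_public_ip and proxies) are assumptions made for illustration; only the flags, allowed values and defaults come from the help text above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented start_crawl.py options."""
    parser = argparse.ArgumentParser(prog="start_crawl.py")
    # Crawl size limits (defaults taken from the help text above)
    parser.add_argument("-r", dest="max_rounds", type=int, default=50)
    parser.add_argument("-u", dest="max_crawled_url", type=int, default=50000)
    parser.add_argument("-p", dest="max_per_url", type=int, default=2)
    parser.add_argument("-P", dest="max_per_page", type=int, default=30)
    # Timeframe of the most popular videos fetched at the start of the crawl
    parser.add_argument("-t", dest="time_frame", default="week",
                        choices=["today", "week", "month", "all_time"])
    # Measurement parameters
    parser.add_argument("-n", dest="ping_packets", type=int, default=3)
    parser.add_argument("-D", dest="download_time", type=float, default=30.0)
    parser.add_argument("-B", dest="buffering_video_duration", type=float,
                        default=3.0)
    parser.add_argument("-M", dest="min_playout_buffer_size", type=float,
                        default=1.0)
    # Privacy and logging (destination names are assumptions)
    parser.add_argument("-x", dest="hide_public_ip", action="store_true")
    parser.add_argument("-L", dest="log_level", default="DEBUG",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR",
                                 "CRITICAL"])
    parser.add_argument("--http-proxy", dest="proxies", default=None)
    return parser


if __name__ == "__main__":
    # Example: a short crawl of today's popular videos without logging the IP
    args = build_parser().parse_args(["-r", "5", "-t", "today", "-x"])
    print(args)
```

With this sketch, any option left off the command line falls back to the default shown in the help text, and an invalid value for -t or -L is rejected before the crawl would start.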
Installation-free
To make the package usable without installation, we provide binary executables for Linux (32 and 64 bits), Windows, and Mac OS X. The binary files were generated with Pyinstaller (version 1.5RC1).
If you have Python installed, you can directly run the start_crawl.py script at the root of the package or the pytomo script in the bin directory.
External Resources
We based lib_youtube_download on the YouTube Download script: we simplified it as much as possible and included only the classes we needed (and only YouTube video retrieval).
The dns module is taken from the DNS Python Package: we only modified rdata so that Pyinstaller includes all needed modules.
The extraction of metadata from video files is an adaptation of the Kaa Metadata Python Package: it has been modified to be independent of Kaa-base (and thus pure Python and portable).