Python tomography tool

Project description

Pytomo is a YouTube crawler designed to gather network information from YouTube video downloads.

Usage

./start_crawl.py [-r max_rounds] [-u max_crawled_url] [-p max_per_url] [-P max_per_page] [-t time_frame] [-n ping_packets] [-D download_time] [-B buffering_video_duration] [-M min_playout_buffer_size] [-x] [-L log_level]:

Options:
  -h, --help            show this help message and exit
  -r MAX_ROUNDS         Max number of rounds to perform (default 50)
  -u MAX_CRAWLED_URL    Max number of urls to visit (default 50000)
  -p MAX_PER_URL        Max number of related urls from each page (default 2)
  -P MAX_PER_PAGE       Max number of related videos from each page (default
                        30)
  -t TIME_FRAME         Timeframe for the most popular videos to fetch at
                        start of crawl put 'today', 'week', 'month' or
                        'all_time' (default 'week')
  -n PING_PACKETS       Number of packets to be sent for each ping (default 3)
  -D DOWNLOAD_TIME      Download time for the video in seconds (default 30.000000)
  -B BUFFERING_VIDEO_DURATION
                        Buffering video duration in seconds (default 3.000000)
  -M MIN_PLAYOUT_BUFFER_SIZE
                        Minimum Playout Buffer Size in seconds (default 1.000000)
  -x                    Do NOT store public IP address of the machine in the
                        logs
  -L LOG_LEVEL          The log level setting for the Logging module. Choose
                        from: 'DEBUG', 'INFO', 'WARNING', 'ERROR' and
                        'CRITICAL' (default 'DEBUG')
  --http-proxy=PROXIES  in case of http proxy to reach Internet (default None)
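
As an illustration of how the options above combine, here is a minimal sketch that assembles a start_crawl.py command line from Python. The helper name build_crawl_command and its parameters are hypothetical and not part of Pytomo; only the flags themselves come from the help text above. The sketch builds the argument list and prints it; it does not start a crawl.

```python
import shlex

# Flags that take a value, as listed in the help text above.
VALUE_FLAGS = {"-r", "-u", "-p", "-P", "-t", "-n", "-D", "-B", "-M", "-L"}


def build_crawl_command(options, script="./start_crawl.py",
                        hide_public_ip=False, http_proxy=None):
    """Return an argv list for start_crawl.py.

    Hypothetical helper for illustration only; it is not shipped with Pytomo.
    """
    argv = [script]
    for flag, value in options.items():
        if flag not in VALUE_FLAGS:
            raise ValueError("unknown option: %s" % flag)
        argv.extend([flag, str(value)])
    if hide_public_ip:
        # -x: do NOT store the machine's public IP address in the logs.
        argv.append("-x")
    if http_proxy is not None:
        argv.append("--http-proxy=%s" % http_proxy)
    return argv


# Example: a short crawl seeded with today's most popular videos,
# 10 seconds of download per video, INFO logging, no public IP in logs.
command = build_crawl_command(
    {"-r": 5, "-t": "today", "-D": 10, "-L": "INFO"},
    hide_public_ip=True,
)
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed to subprocess.run(command) from the directory that contains start_crawl.py, assuming the script is executable there.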

Installation-free

To offer an installation-free package, we provide binary executables for Linux (32-bit and 64-bit), Windows, and Mac OS X. The binaries were generated with PyInstaller (version 1.5RC1).

If you have Python installed, you can directly run the start_crawl.py script in the root directory or the pytomo script in the bin directory.

External Resources

We based lib_youtube_download on the YouTube Download script: we simplified it as much as possible and included only the classes we needed (and only YouTube video retrieval).

The dns module is taken from the DNS Python Package: we only modified rdata so that PyInstaller includes all needed modules.

The extraction of metadata from video files is adapted from the Kaa Metadata Python Package: it has been modified to be independent of Kaa-base (and is thus pure Python and portable).

Project details


This version

1.0.8
