ctdl · PyPI

Bulk file downloader on any topic.

Project description

content-downloader

content-downloader, a.k.a. ctdl, is a Python package with a command line utility and a desktop GUI for downloading files on any topic in bulk.

Features

ctdl can be used as a command line utility as well as a desktop GUI.
ctdl fetches file links related to a search query from Google Search.
Files can be downloaded in parallel using multithreading.
ctdl is Python 2 as well as Python 3 compatible.
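
The multithreaded download feature can be pictured with a thread pool. The following is a minimal sketch, not ctdl's actual implementation: the function name `download_parallel_sketch`, the `fetch` callable, and the `max_workers` parameter are all hypothetical, and `concurrent.futures` is part of the standard library only on Python 3.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_parallel_sketch(urls, fetch, max_workers=4):
    """Run fetch(url) for every URL on a thread pool (illustrative only).

    Returns a dict mapping each URL to its result, or to the exception
    it raised, so one failed download does not abort the others.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front, then collect results as they finish.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if a single URL fails
                results[url] = exc
    return results
```

Passing the download function in as `fetch` keeps the sketch independent of any particular HTTP library, which also makes it easy to try out with a stub.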

Installation

To install content-downloader, run:

$ pip install ctdl

There seem to be some issues with parallel progress bars in tqdm, which have been resolved in this pull request. Until it is merged, please use my patch by running this command:

$ pip install -U git+https://github.com/nikhilkumarsingh/tqdm

Desktop GUI usage

To use the ctdl desktop GUI, open a terminal and run this command:

$ ctdl-gui

Command line usage

$ ctdl [-h] [-f FILE_TYPE] [-l LIMIT] [-d DIRECTORY] [-p] [-a] [-t]
       [-minfs MIN_FILE_SIZE] [-maxfs MAX_FILE_SIZE] [-nr]
       [query]

Optional arguments are:

-f FILE_TYPE : set the file type (can take values like ppt, pdf, xml, etc.). Default: pdf
-l LIMIT : specify the number of files to download. Default: 10
-d DIRECTORY : specify the directory where files will be stored. Default: a directory with the same name as the search query, in the current directory.
-p : download files in parallel.
-a : list the available file types.
-t : list the potential high-threat file types.
-minfs MIN_FILE_SIZE : specify the minimum file size to download, in kilobytes (KB). Default: 0
-maxfs MAX_FILE_SIZE : specify the maximum file size to download, in kilobytes (KB). Default: -1 (no maximum file size)
-nr : prevent download redirects. Default: False
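
To make the options and their defaults concrete, here is a rough sketch of an `argparse` parser that mirrors the usage line and defaults listed above. It is an illustration only and not ctdl's real parser: the `dest` names (`file_type`, `no_redirects`, and so on) and the helper `resolve_directory` are assumptions made for this example.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="ctdl")
    parser.add_argument("query", nargs="?", help="search query")
    parser.add_argument("-f", dest="file_type", default="pdf")
    parser.add_argument("-l", dest="limit", type=int, default=10)
    parser.add_argument("-d", dest="directory", default=None)
    parser.add_argument("-p", dest="parallel", action="store_true")
    parser.add_argument("-a", dest="list_available", action="store_true")
    parser.add_argument("-t", dest="list_threats", action="store_true")
    parser.add_argument("-minfs", dest="min_file_size", type=int, default=0)
    parser.add_argument("-maxfs", dest="max_file_size", type=int, default=-1)
    parser.add_argument("-nr", dest="no_redirects", action="store_true")
    return parser


def resolve_directory(args):
    """Fall back to a directory named after the query when -d is not given."""
    return args.directory if args.directory else args.query
```

For example, `build_parser().parse_args(["-f", "ppt", "-l", "3", "health"])` yields `file_type == "ppt"`, `limit == 3` and `query == "health"`, with every other option left at its documented default.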

Examples

To get the list of available file types:

$ ctdl -a

To get the list of potential high-threat file types:

$ ctdl -t

To download pdf files on the topic ‘python’:

$ ctdl python

This is the default behaviour, which downloads 10 pdf files into a folder named ‘python’ in the current directory.

To download 3 ppt files on ‘health’:

$ ctdl -f ppt -l 3 health

To explicitly specify the download folder:

$ ctdl -d /home/nikhil/Desktop/ml-pdfs machine-learning

To download files in parallel:

$ ctdl -f pdf -p python

To search for and download, in parallel, 10 PDF files containing the text “python” and “algorithm”, without allowing any URL redirects, and with a file size between 10,000 KB (10 MB) and 100,000 KB (100 MB):

$ ctdl -f pdf -l 10 -minfs 10000 -maxfs 100000 -nr -p "python algorithm"
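
The size bounds in the last example can be read as a simple range check. The sketch below is an assumption about how such a filter might behave, not ctdl's own code: the function name `size_within_limits` is hypothetical, the bounds are treated as inclusive, and one KB is taken as 1000 bytes to match the "10,000 KB (10 MB)" wording.

```python
def size_within_limits(size_bytes, min_kb=0, max_kb=-1, bytes_per_kb=1000):
    """Return True if a file of size_bytes passes the -minfs/-maxfs bounds.

    max_kb=-1 means "no maximum", matching the documented default.
    bytes_per_kb=1000 is an assumption; pass 1024 for binary kilobytes.
    """
    size_kb = size_bytes / float(bytes_per_kb)
    if size_kb < min_kb:
        return False
    if max_kb != -1 and size_kb > max_kb:
        return False
    return True
```

With `-minfs 10000 -maxfs 100000`, a 50 MB file passes this check while a 5 MB or a 200 MB file does not; with the defaults, any size passes.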

Usage in Python files

from ctdl import ctdl

ctdl.download_content(
    file_type='ppt',
    limit=5,
    directory='/home/nikhil/Desktop/ml-pdfs',
    query='machine learning using python')
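
Building on the call above, a small wrapper can run one download per topic. This is only a sketch: `download_topics` is a hypothetical helper, and it assumes the function passed as `download` accepts the four keyword arguments shown in the example (`file_type`, `limit`, `directory`, `query`).

```python
import os


def download_topics(topics, download, base_dir, file_type="pdf", limit=10):
    """Call download(...) once per topic, giving each topic its own folder.

    `download` is expected to accept the keyword arguments file_type,
    limit, directory and query.
    """
    directories = []
    for topic in topics:
        # Folder name derived from the query, e.g. "machine learning" -> "machine-learning".
        directory = os.path.join(base_dir, topic.replace(" ", "-"))
        download(file_type=file_type, limit=limit,
                 directory=directory, query=topic)
        directories.append(directory)
    return directories
```

With ctdl installed, this could presumably be used as `download_topics(['machine learning', 'health'], ctdl.download_content, '/home/nikhil/Desktop')`.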

TODO

[X] Prompt the user before downloading potentially harmful files
[X] Create ctdl GUI
[ ] Implement unit testing
[ ] Use the DuckDuckGo API as an option

Want to contribute?

Clone the repository

$ git clone http://github.com/nikhilkumarsingh/content-downloader

Install dependencies

$ pip install -r requirements.txt

Note: There seem to be some issues with the current version of tqdm. If you do not get the expected progress bar behaviour, try this patch:

$ pip uninstall tqdm
$ pip install git+https://github.com/nikhilkumarsingh/tqdm

In ctdl/ctdl.py, remove the . prefix from .downloader and .utils in the following imports, so that they change from:

    from .downloader import download_series, download_parallel
    from .utils import FILE_EXTENSIONS, THREAT_EXTENSIONS

to:

    from downloader import download_series, download_parallel
    from utils import FILE_EXTENSIONS, THREAT_EXTENSIONS

Run the Python file directly with python ctdl/ctdl.py ___ (instead of with ctdl ___).

Release history

1.5.0 (this version): Jun 7, 2017
1.4.6: May 28, 2017
1.4.5: May 26, 2017
1.4.4: May 26, 2017
1.4.3: May 26, 2017
1.4.2: May 23, 2017
1.4.1: May 23, 2017
1.4: May 23, 2017
1.3: May 22, 2017
1.2: May 21, 2017
1.1: May 21, 2017
1.0.1: May 21, 2017
1.0.0: May 21, 2017
1.0: May 21, 2017

Download files

Source Distribution

ctdl-1.5.0.tar.gz (226.6 kB)

Uploaded Jun 7, 2017 Source

File details

Details for the file ctdl-1.5.0.tar.gz.

File metadata

Download URL: ctdl-1.5.0.tar.gz
Upload date: Jun 7, 2017
Size: 226.6 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for ctdl-1.5.0.tar.gz
Algorithm	Hash digest
SHA256	`1709abdcd1c20e8fdec99dab8945b3447890905e8d25c8f9eb946d727cee28ce`
MD5	`1784218c762e776ca790a4446bd52ba2`
BLAKE2b-256	`8a6764fe6758103e97287be924b57054bc98d780fec5b47b156fcbeb1f8ce633`
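
A downloaded copy of the archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard `hashlib` module; the helper names `sha256_of` and `matches_published_hash` are made up for this example, and the local file path is whatever the archive was saved as.

```python
import hashlib

# SHA256 digest published above for ctdl-1.5.0.tar.gz.
EXPECTED_SHA256 = "1709abdcd1c20e8fdec99dab8945b3447890905e8d25c8f9eb946d727cee28ce"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a local file's digest with the published one."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_published_hash("ctdl-1.5.0.tar.gz")` on the downloaded file should return True only if its contents match the published digest.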

