A lightweight Python package for downloading thousands of files concurrently.

Downloader light

A lightweight Python library that lets you download and process files concurrently.
This package was developed to allow serverless deployment.

Dependencies

Installed automatically with pip

  • requests
  • pysftp

Installation

pip install downloader-light

Usage examples

Download and upload files to AWS S3

For this to work, the AWS CLI must be configured.

from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
    {
        'url': 'https://www.example.com/path/to/image.jpg', # Required
        'destination': 'some/key/image.jpg' # S3 key - Required
    },{
        'url': 'https://www.example.com/path/to/image2.jpg',
        'destination': 'some/key/image2.jpg'
    }
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True, # If True, uploads files to S3 with multithreading
    stateless=False, # If False, generates and stores md5 hashes of files in a file
    state_id='flux_states', # Name of the file where hashes will be stored (states.txt); optional
    bulksize=200 # Number of concurrent downloads
)
downloader.process(queue)
stats = downloader.get_stats() # Returns a dict with information about the process
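
The queue above is a plain list of dicts, so it can be generated from a list of URLs. The following is a minimal sketch of one way to do that; `build_queue` and its `prefix` parameter are hypothetical helpers written for illustration and are not part of the package.

```python
from posixpath import basename
from urllib.parse import urlsplit


def build_queue(urls, prefix):
    """Turn a list of URLs into queue entries with 'url' and 'destination' keys.

    Hypothetical helper: the destination is the given prefix plus the
    last path segment of each URL.
    """
    queue = []
    for url in urls:
        # Use the last path segment of the URL as the file name.
        filename = basename(urlsplit(url).path)
        queue.append({
            'url': url,
            'destination': f"{prefix.rstrip('/')}/{filename}",
        })
    return queue


queue = build_queue(
    [
        'https://www.example.com/path/to/image.jpg',
        'https://www.example.com/path/to/image2.jpg',
    ],
    prefix='some/key',
)
```

The resulting list has the same shape as the hand-written queue and could be passed to `downloader.process(queue)`.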

Download files with states

Loading states is useful when you don't want to download the same file twice.

from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
...
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,
    stateless=False,
    state_id='filename'
)

# You can add a callback function if needed
# This function will be called after each bulk is processed
def callback(responses):
    # response: {
    #    'destination': destination of the file can be local or can be S3 key,
    #    'url': URL from where the file was downloaded,
    #    'httpcode': HTTP code returned by the server,
    #    'status': True|False,
    #    'content-type': MIME type of the downloaded resource, e.g. image/jpeg
    # }
    # responses: response[]

    pass # Your logic

downloader.set_callback(callback)

downloader.load_states('filename') # This will load states from "filename.txt"
downloader.process(queue)
stats = downloader.get_stats() # Statistics
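
As an illustration of what the callback body might do, the sketch below collects failed downloads using only the response fields documented above. The `failed` list and the sample responses are assumptions made for this example; they are not produced by the library.

```python
failed = []  # collects entries that did not download successfully


def callback(responses):
    """Record every response whose 'status' is False."""
    for response in responses:
        if not response['status']:
            failed.append({
                'url': response['url'],
                'httpcode': response['httpcode'],
            })


# Sample data in the documented response shape, for illustration only.
callback([
    {
        'destination': 'some/key/image.jpg',
        'url': 'https://www.example.com/path/to/image.jpg',
        'httpcode': 200,
        'status': True,
        'content-type': 'image/jpeg',
    },
    {
        'destination': 'some/key/image2.jpg',
        'url': 'https://www.example.com/path/to/image2.jpg',
        'httpcode': 404,
        'status': False,
        'content-type': 'text/html',
    },
])
print(failed)
```

Because the callback runs after each bulk, a list like `failed` accumulates across bulks and can be inspected once `process()` has returned.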

ElasticDownloader

Lets you download or retrieve files from FTP, SFTP and HTTP/S servers easily.

Examples

Downloading file from FTP

from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.download(uri, localpath='/tmp/myfile.csv') # localpath is optional
# download() returns False on error, or the local path of the downloaded file on success.
print(res) # /tmp/myfile.csv

Retrieving binary content of file from FTP

from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'

retriever = ElasticDownloader()
res = retriever.retrieve(uri) # Return type: io.BytesIO | False

if res is not False: # retrieve() returns False on error
    with open('/tmp/myfile.csv', 'wb') as f:
        f.write(res.getvalue())

ElasticDownloader handles FTP, SFTP and HTTP URIs automatically. Use the download method to save a file locally, and the retrieve method to get the binary content of a file.
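
Handling several protocols from a single URI relies on the scheme, credentials, host and path all being encoded in the URI itself. The sketch below shows how those parts can be read with the standard library; it is not the package's implementation, and `describe_uri` is a hypothetical helper used only for illustration.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a URI into the parts a downloader would need to pick a protocol.

    Hypothetical helper, not part of the package.
    """
    parts = urlsplit(uri)
    return {
        'scheme': parts.scheme,      # 'ftp', 'sftp', 'http' or 'https'
        'host': parts.hostname,
        'user': parts.username,      # None when the URI has no credentials
        'password': parts.password,
        'path': parts.path,
    }


info = describe_uri('ftp://user:password@ftp.server.com/path/to/file.csv')
print(info['scheme'], info['host'], info['path'])
```

A dispatcher could then choose an FTP, SFTP or HTTP client based on the `scheme` value.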

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

downloader_light-0.0.2-py3-none-any.whl (10.4 kB view details)

Uploaded Python 3

File details

Details for the file downloader_light-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: downloader_light-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 10.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.5

File hashes

Hashes for downloader_light-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 dcdff4bf37184835a778a028b742d89162ac5d82b4cfc5ee9eed7961dc385be1
MD5 2f05a3b2cb7ebd86c6de3bbe9e1bf2ae
BLAKE2b-256 2fcfc30800526b2eb73a4f35c2a1dd4b89a8e0b4c758c3df712542bcc043f16d
