Lightweight Python package for downloading thousands of files concurrently.
Downloader light
Lightweight Python library that lets you download and process files concurrently.
This package was developed to allow serverless deployment.
Dependencies
Installed automatically with pip
- requests
- pysftp
Installation
pip install downloader-light
Usage examples
Download files and upload them to AWS S3. For this to work, the AWS CLI must be configured.
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
    {
        'url': 'https://www.example.com/path/to/image.jpg',  # Required
        'destination': 'some/key/image.jpg'  # S3 key - Required
    },
    {
        'url': 'https://www.example.com/path/to/image2.jpg',
        'destination': 'some/key/image2.jpg'
    }
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,  # If True, uploads files to S3 using multiple threads
    stateless=False,  # If False, generates MD5 hashes of the files and stores them in a state file
    state_id='flux_states',  # Optional: name of the file where the hashes are stored
    bulksize=200  # Number of concurrent downloads
)
downloader.process(queue)
stats = downloader.get_stats()  # Returns a dict with information about the process
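The `bulksize` option splits the queue into groups that are downloaded concurrently. As a rough illustration of that idea, here is a minimal sketch using the standard library's `ThreadPoolExecutor`. It is not downloader-light's actual implementation; `process_in_bulks`, `fetch` and `store` are hypothetical names, and the stats keys are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def process_in_bulks(
    queue: List[Dict[str, str]],
    fetch: Callable[[str], bytes],        # hypothetical: downloads a URL, returns its bytes
    store: Callable[[str, bytes], None],  # hypothetical: saves bytes under a destination key
    bulksize: int = 200,
) -> Dict[str, int]:
    """Process queue items in bulks of `bulksize`, downloading each bulk concurrently."""
    stats = {'success': 0, 'failed': 0}

    def handle(item: Dict[str, str]) -> bool:
        try:
            store(item['destination'], fetch(item['url']))
            return True
        except Exception:
            # In this sketch any error simply counts as a failed item
            return False

    for start in range(0, len(queue), bulksize):
        bulk = queue[start:start + bulksize]
        # One worker per item, so every download in the bulk runs concurrently
        with ThreadPoolExecutor(max_workers=len(bulk)) as pool:
            for ok in pool.map(handle, bulk):
                stats['success' if ok else 'failed'] += 1
    return stats
```

Waiting for each bulk to finish before starting the next one caps the number of open connections at `bulksize`, which keeps memory and socket usage predictable in a constrained (e.g. serverless) environment.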
Download files with states
Loading states is useful when you don't want to download the same file twice.
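The state file format is not documented here, so the following is only a sketch of the general idea, assuming a state is a set of MD5 hex digests stored one per line in `<state_id>.txt`. Whether the library hashes the URL or the file content is also not stated; this sketch hashes the content. `read_state_file`, `write_state_file` and `is_new_content` are hypothetical helpers, not methods of the library.

```python
import hashlib
import os
from typing import Set


def read_state_file(state_id: str) -> Set[str]:
    """Return the MD5 digests stored in '<state_id>.txt' (assumed format: one per line)."""
    path = f'{state_id}.txt'
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


def write_state_file(state_id: str, hashes: Set[str]) -> None:
    """Write one MD5 digest per line to '<state_id>.txt'."""
    with open(f'{state_id}.txt', 'w') as f:
        f.write('\n'.join(sorted(hashes)))


def is_new_content(content: bytes, hashes: Set[str]) -> bool:
    """Record the content's MD5 digest; return False if it was already known."""
    digest = hashlib.md5(content).hexdigest()
    if digest in hashes:
        return False
    hashes.add(digest)
    return True
```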
from blackfeed.downloader import Downloader
from blackfeed.adapter.s3 import S3Adapter

queue = [
    ...
]

downloader = Downloader(
    S3Adapter(bucket='bucketname'),
    multi=True,
    stateless=False,
    state_id='filename'
)

# You can add a callback function if needed
# This function is called after each bulk is processed
def callback(responses):
    # Each response is a dict:
    # {
    #     'destination': destination of the file (local path or S3 key),
    #     'url': URL the file was downloaded from,
    #     'httpcode': HTTP status code returned by the server,
    #     'status': True | False,
    #     'content-type': MIME type of the downloaded resource, e.g. image/jpeg
    # }
    # responses: list of response dicts
    pass  # Your logic

downloader.set_callback(callback)
downloader.load_states('filename')  # Loads states from "filename.txt"
downloader.process(queue)
stats = downloader.get_stats()  # Statistics
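The callback receives a list of response dicts with the keys documented in the comments above. As an illustration only (collecting failures into a module-level `failed_urls` list is an assumption, not library behavior), a callback could gather the URLs that failed so they can be retried later:

```python
from typing import Any, Dict, List

# Hypothetical accumulator for URLs that could not be downloaded
failed_urls: List[str] = []


def callback(responses: List[Dict[str, Any]]) -> None:
    """Collect the URLs of unsuccessful downloads from one processed bulk."""
    for response in responses:
        if not response['status']:
            failed_urls.append(response['url'])
            print(f"{response['httpcode']} {response['url']}")
```

It would be registered with `downloader.set_callback(callback)`, and `failed_urls` could then be turned back into queue items for a second pass.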
ElasticDownloader
Lets you download or retrieve files from FTP, SFTP and HTTP/S servers.
Examples
Downloading a file from FTP
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'
retriever = ElasticDownloader()
res = retriever.download(uri, localpath='/tmp/myfile.csv')  # localpath is optional
# .download() returns the local path of the downloaded file on success, or False on error
print(res)  # /tmp/myfile.csv
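The return convention of `.download()` (the local path on success, `False` on error) can be mirrored with the standard library. This is a hedged sketch, not ElasticDownloader's implementation: `download_to_path` is a hypothetical name, and `urllib.request` handles HTTP/S, FTP and `file://` URIs but not SFTP.

```python
import shutil
import urllib.request
from typing import Union


def download_to_path(uri: str, localpath: str) -> Union[str, bool]:
    """Download `uri` to `localpath`; return the path on success, False on any error."""
    try:
        with urllib.request.urlopen(uri) as response, open(localpath, 'wb') as out:
            # Stream the body to disk instead of loading it fully into memory
            shutil.copyfileobj(response, out)
    except (OSError, ValueError):
        # URLError/HTTPError are OSError subclasses; ValueError covers malformed URIs
        return False
    return localpath
```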
Retrieving the binary content of a file from FTP
from blackfeed.elasticdownloader import ElasticDownloader

uri = 'ftp://user:password@ftp.server.com/path/to/file.csv'
retriever = ElasticDownloader()
res = retriever.retrieve(uri)  # Returns io.BytesIO on success, False on error
if res is not False:
    with open('/tmp/myfile.csv', 'wb') as f:
        f.write(res.getvalue())
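To show what retrieving FTP content into memory can involve, here is a minimal sketch built on the standard library's `ftplib`. It is not how ElasticDownloader is implemented; `FtpTarget`, `parse_ftp_uri` and `retrieve_ftp` are hypothetical names, and defaulting to port 21 and an anonymous login when the URI omits them are assumptions.

```python
import io
from ftplib import FTP, all_errors
from typing import NamedTuple, Union
from urllib.parse import unquote, urlparse

# ftplib's own error tuple plus ValueError for malformed or non-FTP URIs
_RETRIEVE_ERRORS = all_errors + (ValueError,)


class FtpTarget(NamedTuple):
    host: str
    port: int
    user: str
    password: str
    path: str


def parse_ftp_uri(uri: str) -> FtpTarget:
    """Split 'ftp://user:password@host[:port]/path' into its parts."""
    parts = urlparse(uri)
    if parts.scheme != 'ftp':
        raise ValueError(f'not an FTP URI: {uri}')
    return FtpTarget(
        host=parts.hostname or '',
        port=parts.port or 21,  # assumption: standard FTP port when none is given
        user=unquote(parts.username or 'anonymous'),
        password=unquote(parts.password or ''),
        path=unquote(parts.path),
    )


def retrieve_ftp(uri: str) -> Union[io.BytesIO, bool]:
    """Return the file's binary content as io.BytesIO, or False on error."""
    buffer = io.BytesIO()
    try:
        target = parse_ftp_uri(uri)
        with FTP() as ftp:
            ftp.connect(target.host, target.port)
            ftp.login(target.user, target.password)
            # RETR streams the file; each received block is appended to the buffer
            ftp.retrbinary(f'RETR {target.path}', buffer.write)
    except _RETRIEVE_ERRORS:
        return False
    buffer.seek(0)
    return buffer
```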
ElasticDownloader handles FTP, SFTP and HTTP URIs automatically. Use the download method to save a file locally, and the retrieve method to get the binary content of a file.
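Handling several protocols behind two methods usually comes down to dispatching on the URI scheme. The sketch below illustrates that pattern with `urllib.parse.urlparse`; `pick_handler` and the handler mapping are hypothetical, and this is not necessarily how ElasticDownloader chooses its backend.

```python
from typing import Callable, Dict
from urllib.parse import urlparse


def pick_handler(uri: str, handlers: Dict[str, Callable[[str], object]]) -> Callable[[str], object]:
    """Return the handler registered for the URI's scheme (e.g. ftp, sftp, http, https)."""
    scheme = urlparse(uri).scheme.lower()
    try:
        return handlers[scheme]
    except KeyError:
        raise ValueError(f'unsupported scheme {scheme!r} in {uri}') from None
```

Keeping the scheme-to-handler mapping in one place means a new protocol can be supported by registering one more handler, without changing the public `download`/`retrieve` interface.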
Download files
Built Distribution
File details
Details for the file downloader_light-0.0.2-py3-none-any.whl.
File metadata
- Download URL: downloader_light-0.0.2-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | dcdff4bf37184835a778a028b742d89162ac5d82b4cfc5ee9eed7961dc385be1
MD5 | 2f05a3b2cb7ebd86c6de3bbe9e1bf2ae
BLAKE2b-256 | 2fcfc30800526b2eb73a4f35c2a1dd4b89a8e0b4c758c3df712542bcc043f16d
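A downloaded wheel can be checked against the SHA256 digest listed above with `hashlib`. The helper name `sha256_of` is hypothetical; the file is read in chunks so large files need not fit in memory.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published for downloader_light-0.0.2-py3-none-any.whl
expected = 'dcdff4bf37184835a778a028b742d89162ac5d82b4cfc5ee9eed7961dc385be1'
# After downloading the wheel:
# sha256_of('downloader_light-0.0.2-py3-none-any.whl') == expected
```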