A library for efficiently loading data into Python
Project description
Source: https://github.com/stestagg/pytubes
Pytubes is a library that optimizes loading datasets into memory.
At its core is a set of specialized C++ classes that can be chained together to load and manipulate data using a standard iterator pattern. Around this is a Cython extension module that makes defining and configuring a tube simple and straightforward.
Simple Example
>>> from tubes import Each
>>> import glob
>>> tube = (Each(glob.glob("*.json"))  # Iterate over some filenames
...     .read_files()                  # Read each file, chunk by chunk
...     .split()                       # Split the file, line-by-line
...     .json()                        # parse json
...     .get('country_code', 'null'))  # extract field named 'country_code'
>>> set(tube)                          # collect results in a set
{'A1', 'AD', 'AE', 'AF', 'AG', 'AL', 'AM', 'AO', 'AP', ...}
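To make the data flow concrete, here is a rough pure-Python sketch of what the tube above computes, using only the standard library. It illustrates the intended result, not how pytubes works internally. The helper name `country_codes` is hypothetical, and the sketch assumes that the `'null'` default stands for a JSON null (Python `None`) and that blank lines can be skipped.

```python
import json


def country_codes(paths):
    """Lazily yield the 'country_code' field of every JSON line in the given files."""
    for path in paths:                           # iterate over some filenames
        with open(path, "rb") as handle:         # read each file
            for line in handle:                  # split the file line by line
                line = line.strip()
                if not line:
                    continue                     # assumption: ignore blank lines
                record = json.loads(line)        # parse JSON
                yield record.get("country_code")  # None when the field is missing
```

Calling `set(country_codes(glob.glob("*.json")))` would then collect the distinct values, much like `set(tube)` in the example.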
More Complex Example
>>> from tubes import Each
>>> import glob
>>> x = (Each(glob.glob('*.jsonz'))
...     .map_files()
...     .gunzip()
...     .split(b'\n')
...     .json()
...     .enumerate()
...     .skip_unless(lambda x: x.slot(1).get('country_code', '""').to(str).equals('GB'))
...     .multi(lambda x: (
...         x.slot(0),
...         x.slot(1).get('timestamp', 'null'),
...         x.slot(1).get('country_code', 'null'),
...         x.slot(1).get('url', 'null'),
...         x.slot(1).get('file', '{}').get('filename', 'null'),
...         x.slot(1).get('file', '{}').get('project'),
...         x.slot(1).get('details', '{}').get('installer', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('python', 'null'),
...         x.slot(1).get('details', '{}').get('system', 'null'),
...         x.slot(1).get('details', '{}').get('system', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('cpu', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('lib', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('version', 'null'),
...     ))
... )
>>> print(list(x)[-3])
(15612767, '2017-12-14 09:33:31 UTC', 'GB',
 '/packages/29/9b/25ef61e948321296f029f53c9f67cc2b54e224db509eb67ce17e0df6044a/certifi-2017.11.5-py2.py3-none-any.whl',
 'certifi-2017.11.5-py2.py3-none-any.whl', 'certifi', 'pip', '2.7.5',
 {'name': 'Linux', 'release': '2.6.32-696.10.3.el6.x86_64'},
 'Linux', 'x86_64', 'glibc', '2.17')
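The chained `.get(key, default)` calls above walk into nested JSON objects and substitute a default when a level is missing. A plain-Python sketch of that kind of lookup might look like the following. `nested_get` is a hypothetical helper, not part of pytubes, and the sketch assumes that the `'{}'` and `'null'` defaults denote an empty JSON object and a JSON null respectively.

```python
def nested_get(record, *keys, default=None):
    """Follow keys into nested dicts, returning default as soon as one is missing."""
    current = record
    for key in keys:
        # Stop early if this level is not an object or lacks the key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


record = {"details": {"installer": {"name": "pip"}, "python": "2.7.5"}}
print(nested_get(record, "details", "installer", "name"))      # pip
print(nested_get(record, "details", "distro", "libc", "lib"))  # None
```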
Download files
Source Distribution
pytubes-0.7.4.tar.gz (5.3 MB)
Built Distributions
Hashes for pytubes-0.7.4-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 388b4d08c553206ab5470535f0a88a7b33eb1404546dfbb03fe04c8d943ad799
MD5 | 742409435b5fee76324b153458c39e64
BLAKE2b-256 | c6e967ee98ebd440c3b9d5e2ca8582b1457d73f6a268c1a60939143769467607
Hashes for pytubes-0.7.4-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 9a97ca0fd6d0205f6e12fa9c5f30a9437c91519f36f8d96ab0230b371e66f142
MD5 | b3765342dfbe82bd4c225678785027df
BLAKE2b-256 | 51190465f686d0eb93db617fbc943b8271ef8bd32ddf2b1937ad24a4acc998d6
Hashes for pytubes-0.7.4-cp38-cp38-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 84a68de31e457a4578e592a720d7e3a2cd70a787bdfb5f2a4aed48f6290367a5
MD5 | 8bccce3b656c82e1703ec727429c8cc7
BLAKE2b-256 | ffdfcc535e0cd0a5fd266faf587c4c657dd9d28b8e822cb55573ff9b889c2075
Hashes for pytubes-0.7.4-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 42567a69e85a083c179d86de059957c57539e64a6238e732d413ad4111dc2862
MD5 | ebdb2ae4b70cb8a7978877f343cc5874
BLAKE2b-256 | eef94b622d2665bb76aca93b84053656ec8bed63a665bad0f0b2d29041426e63
Hashes for pytubes-0.7.4-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 9289faa6d4e9f604264a97053b0188e09aa59622d1b5fd1bc1dbf3b3a348d843
MD5 | a3e609acae94a5bd0c71980d5a5dd1a8
BLAKE2b-256 | 6a510c1d8240e6426239c64a4a60fb5d7d5aff8b334c6f12ef7368e837d6de09
Hashes for pytubes-0.7.4-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 79a345313f55b4540595be18161c4d0ad9c2ca4dee62073876deacacc9c2377d
MD5 | 929136548bf0b1a91da91d3a94cfac93
BLAKE2b-256 | a432ddb983be8f3a5e5640aa7dabe5cf4ff209d5a877534538cd168ddd448ec2
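The SHA256 digests listed above can be used to check a downloaded file before installing it. Here is a minimal sketch using the standard library's `hashlib`; the function name `sha256_matches` and the assumption that the file has already been downloaded to a local path are illustrative only.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=1 << 20):
    """Return True if the file's SHA256 digest equals expected_hex."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, `sha256_matches("pytubes-0.7.4-cp38-cp38-win_amd64.whl", "388b4d08c553206ab5470535f0a88a7b33eb1404546dfbb03fe04c8d943ad799")` should return `True` for an intact copy of that wheel.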