A library for efficiently loading data into Python
Project description
Source: https://github.com/stestagg/pytubes
Pytubes is a library that optimizes loading datasets into memory.
At its core is a set of specialized C++ classes that can be chained together to load and manipulate data using a standard iterator pattern. Around this core is a Cython extension module that makes defining and configuring a tube simple and straightforward.
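The chained-iterator idea can be illustrated in plain Python with generators. This is only a conceptual sketch and not how pytubes is implemented (its stages are C++ classes); the function names `split_lines`, `parse_json`, and `get_field` are hypothetical and exist only for this illustration.

```python
import json
from typing import Any, Iterable, Iterator


def split_lines(chunks: Iterable[str]) -> Iterator[str]:
    """Yield non-empty lines from each text chunk.

    Simplification: assumes every chunk ends on a line boundary.
    """
    for chunk in chunks:
        for line in chunk.splitlines():
            if line:
                yield line


def parse_json(lines: Iterable[str]) -> Iterator[Any]:
    """Parse each line as a JSON document."""
    for line in lines:
        yield json.loads(line)


def get_field(records: Iterable[dict], key: str, default: Any = None) -> Iterator[Any]:
    """Pull one field out of each record, falling back to a default."""
    for record in records:
        yield record.get(key, default)


# Each stage consumes the previous one lazily, one item at a time.
chunks = ['{"country_code": "GB"}\n{"country_code": "FR"}\n', '{"other": 1}\n']
pipeline = get_field(parse_json(split_lines(chunks)), "country_code")
result = list(pipeline)
```

Nothing is read or parsed until the final `list(...)` call pulls items through the chain, which is the property the iterator pattern provides.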
Simple Example
>>> from tubes import Each
>>> import glob
>>> tube = (Each(glob.glob("*.json"))   # Iterate over some filenames
...     .read_files()                   # Read each file, chunk by chunk
...     .split()                        # Split the file, line-by-line
...     .json()                         # Parse JSON
...     .get('country_code', 'null'))   # Extract field named 'country_code'
>>> set(tube)                           # Collect results in a set
{'A1', 'AD', 'AE', 'AF', 'AG', 'AL', 'AM', 'AO', 'AP', ...}
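For comparison, roughly the same result can be produced with only the standard library. This is a sketch under two assumptions that the example does not state outright: the input files hold one JSON object per line, and the `'null'` default stands for JSON null (Python `None`). The function name `country_codes` is hypothetical.

```python
import json
from typing import Iterable, Optional, Set


def country_codes(paths: Iterable[str]) -> Set[Optional[str]]:
    """Collect the distinct 'country_code' values from line-delimited JSON files."""
    codes: Set[Optional[str]] = set()
    for path in paths:
        with open(path, "r", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # ignore blank lines
                record = json.loads(line)
                # Assumption: a missing field is reported as None.
                codes.add(record.get("country_code"))
    return codes
```

A call such as `country_codes(glob.glob("*.json"))` would mirror the tube above, with every step running in the Python interpreter rather than in compiled code.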
More Complex Example
>>> from tubes import Each
>>> import glob
>>> x = (Each(glob.glob('*.jsonz'))
...     .map_files()
...     .gunzip()
...     .split(b'\n')
...     .json()
...     .enumerate()
...     .skip_unless(lambda x: x.slot(1).get('country_code', '""').to(str).equals('GB'))
...     .multi(lambda x: (
...         x.slot(0),
...         x.slot(1).get('timestamp', 'null'),
...         x.slot(1).get('country_code', 'null'),
...         x.slot(1).get('url', 'null'),
...         x.slot(1).get('file', '{}').get('filename', 'null'),
...         x.slot(1).get('file', '{}').get('project'),
...         x.slot(1).get('details', '{}').get('installer', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('python', 'null'),
...         x.slot(1).get('details', '{}').get('system', 'null'),
...         x.slot(1).get('details', '{}').get('system', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('cpu', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('lib', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('version', 'null'),
...     ))
... )
>>> print(list(x)[-3])
(15,612,767, '2017-12-14 09:33:31 UTC', 'GB', '/packages/29/9b/25ef61e948321296f029f53c9f67cc2b54e224db509eb67ce17e0df6044a/certifi-2017.11.5-py2.py3-none-any.whl', 'certifi-2017.11.5-py2.py3-none-any.whl', 'certifi', 'pip', '2.7.5', {'name': 'Linux', 'release': '2.6.32-696.10.3.el6.x86_64'}, 'Linux', 'x86_64', 'glibc', '2.17')
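The shape of this pipeline (number every record, keep only the `'GB'` ones, then pull out nested fields with defaults) can be sketched in plain Python. This is an illustration, not the pytubes implementation. It assumes the `.jsonz` files are gzip-compressed, newline-delimited JSON and that a `'null'` default means `None`. The helpers `dig` and `gb_rows` are hypothetical, and only a few of the fields are extracted to keep the sketch short.

```python
import gzip
import json
from typing import Any, Iterable, Iterator, Tuple


def dig(record: Any, *keys: str) -> Any:
    """Follow a chain of keys through nested dicts, returning None if any is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


def gb_rows(paths: Iterable[str]) -> Iterator[Tuple[Any, ...]]:
    """Yield (index, timestamp, country_code, filename, installer) for 'GB' records."""
    index = 0  # running count across all files, assigned before filtering
    for path in paths:
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # assumption: blank lines are not counted
                record = json.loads(line)
                current_index = index
                index += 1
                if record.get("country_code") != "GB":
                    continue
                yield (
                    current_index,
                    dig(record, "timestamp"),
                    dig(record, "country_code"),
                    dig(record, "file", "filename"),
                    dig(record, "details", "installer", "name"),
                )
```

Because the index is assigned before the filter runs, the numbers in the output refer to positions in the full input stream, matching the order of `.enumerate()` and `.skip_unless(...)` in the tube above.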