A library for efficiently loading data into Python
Project description
Source: https://github.com/stestagg/pytubes
Pytubes is a library that optimizes loading datasets into memory.
At its core is a set of specialized C++ classes that can be chained together to load and manipulate data using a standard iterator pattern. Around this is a Cython extension module that makes defining and configuring a tube simple and straightforward.
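The chaining idea can be illustrated in plain Python. The sketch below is not pytubes code: `Chain` is a hypothetical stand-in class, and its method names only echo the tube steps used in the examples on this page. It shows how each step can wrap the previous one lazily, so nothing is processed until the final iterator is consumed. As a simplification, it splits each chunk independently and does not handle lines that span two chunks.

```python
import json


class Chain:
    """Hypothetical stand-in for a tube: wraps an iterable and adds chainable steps."""

    def __init__(self, source):
        self._source = source

    def __iter__(self):
        return iter(self._source)

    def split(self, sep="\n"):
        # Lazily break each incoming chunk into pieces, dropping empty pieces.
        return Chain(piece for chunk in self for piece in chunk.split(sep) if piece)

    def json(self):
        # Parse each incoming string as JSON.
        return Chain(json.loads(item) for item in self)

    def get(self, key, default=None):
        # Extract one field from each parsed object.
        return Chain(item.get(key, default) for item in self)


chunks = ['{"country_code": "GB"}\n{"country_code": "AD"}\n', '{"other": 1}\n']
codes = list(Chain(chunks).split().json().get("country_code"))
```

Each method returns a new `Chain` around a generator expression, which is one common way to express a pipeline with the iterator protocol. The real library implements its steps in C++ rather than with Python generators.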
Simple Example
>>> from tubes import Each
>>> import glob
>>> tube = (Each(glob.glob("*.json"))    # Iterate over some filenames
...     .read_files()                    # Read each file, chunk by chunk
...     .split()                         # Split the file, line-by-line
...     .json()                          # parse json
...     .get('country_code', 'null'))    # extract field named 'country_code'
>>> set(tube)                            # collect results in a set
{'A1', 'AD', 'AE', 'AF', 'AG', 'AL', 'AM', 'AO', 'AP', ...}
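For comparison, here is a rough pure-Python version of the same pipeline using only the standard library. It is a sketch under a few assumptions: every non-blank line holds one JSON object, blank lines can be skipped, and a missing `country_code` maps to `None` (the Python counterpart of the `'null'` default). The helper name `country_codes` is made up for this illustration.

```python
import json


def country_codes(paths):
    """Yield the 'country_code' value of every JSON line in the given files."""
    for path in paths:
        with open(path, "rb") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # assumption: blank lines carry no record
                # dict.get returns None when the key is missing,
                # mirroring the 'null' default in the tube above.
                yield json.loads(line).get("country_code")
```

Calling `set(country_codes(glob.glob("*.json")))` would then collect the distinct codes, much like `set(tube)` does.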
More Complex Example
>>> from tubes import Each
>>> import glob
>>> x = (Each(glob.glob('*.jsonz'))
...     .map_files()
...     .gunzip()
...     .split(b'\n')
...     .json()
...     .enumerate()
...     .skip_unless(lambda x: x.slot(1).get('country_code', '""').to(str).equals('GB'))
...     .multi(lambda x: (
...         x.slot(0),
...         x.slot(1).get('timestamp', 'null'),
...         x.slot(1).get('country_code', 'null'),
...         x.slot(1).get('url', 'null'),
...         x.slot(1).get('file', '{}').get('filename', 'null'),
...         x.slot(1).get('file', '{}').get('project'),
...         x.slot(1).get('details', '{}').get('installer', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('python', 'null'),
...         x.slot(1).get('details', '{}').get('system', 'null'),
...         x.slot(1).get('details', '{}').get('system', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('cpu', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('lib', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('version', 'null'),
...     ))
... )
>>> print(list(x)[-3])
(15,612,767,
 '2017-12-14 09:33:31 UTC',
 'GB',
 '/packages/29/9b/25ef61e948321296f029f53c9f67cc2b54e224db509eb67ce17e0df6044a/certifi-2017.11.5-py2.py3-none-any.whl',
 'certifi-2017.11.5-py2.py3-none-any.whl',
 'certifi',
 'pip',
 '2.7.5',
 {'name': 'Linux', 'release': '2.6.32-696.10.3.el6.x86_64'},
 'Linux',
 'x86_64',
 'glibc',
 '2.17')
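The enumerate, filter, and nested-lookup steps in this example can be pictured in plain Python as follows. This is only an illustrative sketch over already-parsed dictionaries, not how pytubes works internally. `nested_get` and `gb_rows` are hypothetical helpers, and only three of the thirteen output fields are reproduced to keep it short.

```python
def nested_get(record, *keys, default=None):
    """Follow keys through nested dicts, returning default when any level is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def gb_rows(records):
    """Yield (index, timestamp, libc name) for records whose country_code is 'GB'."""
    # Numbering happens before filtering, so the index counts every record,
    # matching the order of .enumerate() and .skip_unless() above.
    for index, record in enumerate(records):
        if record.get("country_code") != "GB":
            continue
        yield (
            index,
            record.get("timestamp"),
            nested_get(record, "details", "distro", "libc", "lib"),
        )
```

Chained `.get(key, '{}')` calls in the tube play the role of `nested_get` here: an empty-object default lets the next lookup proceed even when an intermediate level is absent.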
Download files
File details
Details for the file pytubes-0.8.1.tar.gz.
File metadata
- Download URL: pytubes-0.8.1.tar.gz
- Upload date:
- Size: 5.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 202752d29875a574145efe91f7a505a274a678aaeff7ac532de6cf013fb8846e
MD5 | 5e4dc5750aad5e710278c248d2caa58d
BLAKE2b-256 | f834282e016959a5e2cf87d269c9cb1e683f6e9732cb3832572068777a1ff87f
File details
Details for the file pytubes-0.8.1-cp38-cp38-win_amd64.whl.
File metadata
- Download URL: pytubes-0.8.1-cp38-cp38-win_amd64.whl
- Upload date:
- Size: 4.3 MB
- Tags: CPython 3.8, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 87fbe177811b53996506fb8ae941efee5685000964fc82b34f1a1f0a33c15c17
MD5 | b7f43a27daa036dc44bbb3b8b22d3e15
BLAKE2b-256 | 9625f24c5c7b1e35bb6f3f14e9a317506209882dc0a934c96197d5f417d45e96
File details
Details for the file pytubes-0.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: pytubes-0.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 8.5 MB
- Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3547945795b4ad0e13170586b8178a8b587f92067a68e66510a8a693c84c0b2f
MD5 | 533d131ded9819d0507f57a8193cf495
BLAKE2b-256 | ed3fe6a8d212dffa502155e92f3d3ab83b6dd0cc799b160e0352b5675040face
File details
Details for the file pytubes-0.8.1-cp38-cp38-macosx_10_14_x86_64.whl.
File metadata
- Download URL: pytubes-0.8.1-cp38-cp38-macosx_10_14_x86_64.whl
- Upload date:
- Size: 4.3 MB
- Tags: CPython 3.8, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 38aa884389c53a99bacd64acbc32ea64653475d51752e0f883d79e32b9fa4874
MD5 | fa8d420aabae20be80e60500d4e508f0
BLAKE2b-256 | 53de3cbf28766f069caf148187b64e0c234087c077f3286ab014822d7f90de4f
File details
Details for the file pytubes-0.8.1-cp37-cp37m-win_amd64.whl.
File metadata
- Download URL: pytubes-0.8.1-cp37-cp37m-win_amd64.whl
- Upload date:
- Size: 4.3 MB
- Tags: CPython 3.7m, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 175ebb019d216f90583d79e1898b9ed4cbf68d2c419db2d4dd388616ae2db934
MD5 | 15695e9c65cc0309d19c1b717e89a034
BLAKE2b-256 | 64a82c1119d2e1c011dc452f7420dbe9b4b18dead2f0786d2e1395b8a6a7e915
File details
Details for the file pytubes-0.8.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: pytubes-0.8.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 8.4 MB
- Tags: CPython 3.7m, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 47c600bb4d4815c1591124d38b0f6f1aef0530710004d0efff974d65e48857c7
MD5 | c2f1b06ff2f85361b59d2259f28114bc
BLAKE2b-256 | 0a3842840629d33b35bd5e046def71c55ad14df4a6c80050269e761950811f8f
File details
Details for the file pytubes-0.8.1-cp37-cp37m-macosx_10_14_x86_64.whl.
File metadata
- Download URL: pytubes-0.8.1-cp37-cp37m-macosx_10_14_x86_64.whl
- Upload date:
- Size: 4.3 MB
- Tags: CPython 3.7m, macOS 10.14+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 820912f75088f376f0be073d09904abff18b278582a46f583c77866099a0ddcd
MD5 | 2de71b67f21f466b1e711a30c9e3bd14
BLAKE2b-256 | c5db0ec298ee75e42c3c5ce9b27552adbec87b5ae4a43db05d38fd9cb707e674