A library for efficiently loading data into Python
Project description
Source: https://github.com/stestagg/pytubes
Pytubes is a library that optimizes loading datasets into memory.
At its core is a set of specialized C++ classes that can be chained together to load and manipulate data using a standard iterator pattern. Around this is a Cython extension module that makes defining and configuring a tube simple and straightforward.
Simple Example
>>> from tubes import Each
>>> import glob
>>> tube = (Each(glob.glob("*.json"))    # Iterate over some filenames
...     .read_files()                    # Read each file, chunk by chunk
...     .split()                         # Split the file, line-by-line
...     .json()                          # parse json
...     .get('country_code', 'null'))    # extract field named 'country_code'
>>> set(tube)                            # collect results in a set
{'A1', 'AD', 'AE', 'AF', 'AG', 'AL', 'AM', 'AO', 'AP', ...}
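For comparison, the same chained-iterator idea can be sketched in plain Python with generators. This is only an illustration of the pattern, not how pytubes is implemented (its stages are C++ classes). The helper names `iter_lines`, `parse_json`, `get_field` and `country_codes` are hypothetical, and the sketch assumes that the `'null'` default in the tube means JSON null, so it uses `None`.

```python
import json
from typing import Any, Iterable, Iterator


def iter_lines(paths: Iterable[str]) -> Iterator[bytes]:
    """Yield every non-empty line of every file, one at a time."""
    for path in paths:
        with open(path, "rb") as handle:
            for line in handle:
                line = line.rstrip(b"\r\n")
                if line:
                    yield line


def parse_json(lines: Iterable[bytes]) -> Iterator[Any]:
    """Parse each line as a standalone JSON document."""
    for line in lines:
        yield json.loads(line)


def get_field(records: Iterable[dict], name: str, default: Any = None) -> Iterator[Any]:
    """Extract one field from each record, falling back to a default."""
    for record in records:
        yield record.get(name, default)


def country_codes(paths: Iterable[str]) -> Iterator[Any]:
    """Chain the stages; nothing is read until the result is iterated."""
    return get_field(parse_json(iter_lines(paths)), "country_code")
```

Calling `set(country_codes(glob.glob("*.json")))` would then collect the distinct values, with each stage pulling items lazily from the one before it.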
More Complex Example
>>> from tubes import Each
>>> import glob
>>> x = (Each(glob.glob('*.jsonz'))
...     .map_files()
...     .gunzip()
...     .split(b'\n')
...     .json()
...     .enumerate()
...     .skip_unless(lambda x: x.slot(1).get('country_code', '""').to(str).equals('GB'))
...     .multi(lambda x: (
...         x.slot(0),
...         x.slot(1).get('timestamp', 'null'),
...         x.slot(1).get('country_code', 'null'),
...         x.slot(1).get('url', 'null'),
...         x.slot(1).get('file', '{}').get('filename', 'null'),
...         x.slot(1).get('file', '{}').get('project'),
...         x.slot(1).get('details', '{}').get('installer', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('python', 'null'),
...         x.slot(1).get('details', '{}').get('system', 'null'),
...         x.slot(1).get('details', '{}').get('system', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('cpu', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('lib', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('version', 'null'),
...     ))
... )
>>> print(list(x)[-3])
(15,612,767, '2017-12-14 09:33:31 UTC', 'GB', '/packages/29/9b/25ef61e948321296f029f53c9f67cc2b54e224db509eb67ce17e0df6044a/certifi-2017.11.5-py2.py3-none-any.whl', 'certifi-2017.11.5-py2.py3-none-any.whl', 'certifi', 'pip', '2.7.5', {'name': 'Linux', 'release': '2.6.32-696.10.3.el6.x86_64'}, 'Linux', 'x86_64', 'glibc', '2.17')
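The shape of this pipeline (decompress, number every record, keep only matching ones, then pull nested fields into a tuple) can be sketched with the standard library as follows. This is a rough plain-Python analogue, not the pytubes implementation. `nested_get`, `iter_gzip_lines` and `gb_rows` are hypothetical helper names, only a few of the fields are extracted, and numbering is assumed to start at 0 as with Python's built-in `enumerate`.

```python
import gzip
import json
from typing import Any, Iterable, Iterator, Tuple


def nested_get(record: Any, *keys: str, default: Any = None) -> Any:
    """Walk nested dicts, returning the default as soon as a key is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def iter_gzip_lines(paths: Iterable[str]) -> Iterator[bytes]:
    """Yield the decompressed lines of each gzip file in turn."""
    for path in paths:
        with gzip.open(path, "rb") as handle:
            yield from handle


def gb_rows(paths: Iterable[str]) -> Iterator[Tuple[Any, ...]]:
    """Number all records first, then keep only those with country_code 'GB'."""
    for index, line in enumerate(iter_gzip_lines(paths)):
        record = json.loads(line)
        if record.get("country_code") != "GB":
            continue
        yield (
            index,
            record.get("timestamp"),
            nested_get(record, "file", "filename"),
            nested_get(record, "details", "installer", "name"),
        )
```

Because the numbering is applied before the filter, the first element of each tuple is the record's position in the full input rather than its position among the matching rows, which mirrors placing `.enumerate()` before `.skip_unless(...)` in the tube.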
Download files
Source Distribution
pytubes-0.8.0.tar.gz (5.3 MB)
Built Distributions
Hashes for pytubes-0.8.0-cp38-cp38-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | f0d694b8664a9454ebe7d364d829d9d7fbc86df80ed7adf9db02ca4460d03155
MD5 | 1ff547ec47ec3d89f2a9bb9238c63bc3
BLAKE2b-256 | 2e779e26781205b65841c3dd64f9252f215e2357745bb4f4d8ffda0b3857911a

Hashes for pytubes-0.8.0-cp38-cp38-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | c8caac8ccd1de91c2239b89da73ae3d5c0e294dd40d57a53d55b90ce02bf1622
MD5 | f113fa4b65e362f3e4b1f3879473f95c
BLAKE2b-256 | ed4952eba89b57bf234739b6c02079e157f060a3a3c58690091e96056accac2d

Hashes for pytubes-0.8.0-cp38-cp38-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | da4980b705bdae80261f15e898def88708b7db2c97a1d5e7dc8ee603a99dc4c7
MD5 | be0e38e6268bb5ee01272479d8538b16
BLAKE2b-256 | 51bb646f66a11e3a416329f240ebeccc665d0da8bef7558a03de0822d4354ab4

Hashes for pytubes-0.8.0-cp37-cp37m-win_amd64.whl
Algorithm | Hash digest
---|---
SHA256 | 49a3291e39ddeab3ad0ef8b4c4d32a7b1954bf1de59d98ae0e8edb440ec12d69
MD5 | caa3e627e636309bb8b13800176d339b
BLAKE2b-256 | 0bc9e3c56e5ba5f54872247161d87523c95ef9f36c5e2e483e9bc1c825f1418b

Hashes for pytubes-0.8.0-cp37-cp37m-manylinux2010_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 847c0355390b1cb03892a6b1d9cd5a229f1380ceabe9794b7e565970f0f8d505
MD5 | c476886417ab40886f77cb5e56e102c0
BLAKE2b-256 | 7e4a32dac29012211906a27f2bad5c373cc256faa47e27dc42786d413a9779e2

Hashes for pytubes-0.8.0-cp37-cp37m-macosx_10_13_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 8e2b9bb5467c1ee4d3ee081719ab1473b5b25caa68042d1476b2e6953072816e
MD5 | 290f1173553237596abd5d78e1963228
BLAKE2b-256 | e669007d9c85a7127ccbd197ccb4f2ae9485f17a517bf6e5b314685323f2b8d6