A library for efficiently loading data into Python
pytubes
=======
.. image:: https://travis-ci.org/stestagg/pytubes.svg?branch=master
   :target: https://travis-ci.org/stestagg/pytubes

.. image:: https://readthedocs.org/projects/pytubes/badge/
   :target: https://pytubes.readthedocs.io/en/latest/

Source: https://github.com/stestagg/pytubes
Pytubes is a library that optimizes loading datasets into memory.
At its core is a set of specialized C++ classes that can be chained together to load and manipulate data using a standard iterator pattern. A Cython extension module wraps these classes, so that defining and configuring a tube is simple and straightforward.
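
To illustrate the chained iterator pattern described above, here is a minimal pure-Python sketch built from generators. It does not use pytubes and is not how the library is implemented; the function names (``split_lines``, ``parse_json``, ``get_field``) are hypothetical, and the sketch assumes every chunk ends on a line boundary.

```python
import json


def split_lines(chunks):
    """Yield each non-empty line from an iterable of text chunks."""
    for chunk in chunks:
        for line in chunk.splitlines():
            if line:
                yield line


def parse_json(lines):
    """Yield one decoded JSON value per input line."""
    for line in lines:
        yield json.loads(line)


def get_field(records, name, default=None):
    """Yield one field from each record, or a default if it is missing."""
    for record in records:
        yield record.get(name, default)


# Hypothetical input: two chunks of newline-delimited JSON.
chunks = [
    '{"country_code": "GB"}\n{"country_code": "DE"}\n',
    '{"other": 1}\n',
]

# Each stage pulls items from the previous one only when asked,
# so nothing is read or parsed until the pipeline is consumed.
pipeline = get_field(parse_json(split_lines(chunks)), "country_code")
result = list(pipeline)
```

Each stage only holds one item at a time, which is the property that lets a chain of this kind process inputs larger than memory.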
Simple Example
--------------
>>> from tubes import Each
>>> import glob
>>> tube = (Each(glob.glob("*.json"))  # Iterate over some filenames
...     .read_files()                  # Read each file, chunk by chunk
...     .split()                       # Split the file, line-by-line
...     .json()                        # Parse JSON
...     .get('country_code', 'null'))  # Extract the field named 'country_code'
>>> set(tube) # collect results in a set
{'A1', 'AD', 'AE', 'AF', 'AG', 'AL', 'AM', 'AO', 'AP', ...}
More Complex Example
--------------------
>>> from tubes import Each
>>> import glob
>>> x = (Each(glob.glob('*.jsonz'))
...     .map_files()
...     .gunzip()
...     .split(b'\n')
...     .json()
...     .enumerate()
...     .skip_unless(lambda x: x.slot(1).get('country_code', '""').to(str).equals('GB'))
...     .multi(lambda x: (
...         x.slot(0),
...         x.slot(1).get('timestamp', 'null'),
...         x.slot(1).get('country_code', 'null'),
...         x.slot(1).get('url', 'null'),
...         x.slot(1).get('file', '{}').get('filename', 'null'),
...         x.slot(1).get('file', '{}').get('project'),
...         x.slot(1).get('details', '{}').get('installer', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('python', 'null'),
...         x.slot(1).get('details', '{}').get('system', 'null'),
...         x.slot(1).get('details', '{}').get('system', '{}').get('name', 'null'),
...         x.slot(1).get('details', '{}').get('cpu', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('lib', 'null'),
...         x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('version', 'null'),
...     ))
... )
>>> print(list(x)[-3])
(15612767, '2017-12-14 09:33:31 UTC', 'GB', '/packages/29/9b/25ef61e948321296f029f53c9f67cc2b54e224db509eb67ce17e0df6044a/certifi-2017.11.5-py2.py3-none-any.whl', 'certifi-2017.11.5-py2.py3-none-any.whl', 'certifi', 'pip', '2.7.5', {'name': 'Linux', 'release': '2.6.32-696.10.3.el6.x86_64'}, 'Linux', 'x86_64', 'glibc', '2.17')
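
The repeated ``.get(key, default)`` chains in the example above walk into nested JSON objects and fall back to a default when a key is absent. As a rough plain-Python analogy (not the pytubes API), the helper below does the same kind of lookup on an already-decoded ``dict``. The name ``nested_get`` and the sample record are hypothetical. The sketch uses ordinary Python values as defaults, whereas the defaults in the pytubes example appear to be JSON text such as ``'null'`` and ``'{}'``.

```python
def nested_get(record, *keys, default=None):
    """Follow keys into nested dicts, returning default if any step is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical record shaped like the rows in the example above.
record = {
    "country_code": "GB",
    "details": {
        "system": {"name": "Linux"},
        "distro": {"libc": {"lib": "glibc", "version": "2.17"}},
    },
}

libc_name = nested_get(record, "details", "distro", "libc", "lib")
installer = nested_get(record, "details", "installer", "name")
```

Here ``libc_name`` is ``'glibc'``, while ``installer`` falls back to ``None`` because the record has no ``installer`` entry.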
Contents
--------
.. role:: noshow

:noshow:`If you're viewing this on GitHub, please visit the docs at: https://pytubes.readthedocs.io/en/latest/`

.. toctree::
   :maxdepth: 3
   :name: mastertoc

   intro_usage
   tubes
   performance
   detail