A library for efficiently loading data into Python

Project description

Source: https://github.com/stestagg/pytubes

Pytubes is a library that optimizes loading datasets into memory.

At its core is a set of specialized C++ classes that can be chained together to load and manipulate data using a standard iterator pattern. Around this is a Cython extension module that makes defining and configuring a tube simple and straightforward.
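To illustrate the chained iterator pattern in general terms, here is a minimal pure-Python sketch built from generators. It does not use pytubes and is not how the C++ classes are implemented; the names split_lines, parse_json and get_field are hypothetical and exist only for this illustration.

```python
import json


def split_lines(chunks):
    """Yield complete lines from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split(b"\n")
        yield from lines
    if buffer:
        yield buffer


def parse_json(lines):
    """Parse each non-empty line as a JSON document."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def get_field(records, name, default=None):
    """Extract one field from each record, falling back to a default."""
    for record in records:
        yield record.get(name, default)


# Two chunks whose boundary falls in the middle of a line.
chunks = [b'{"country_code": "GB"}\n{"coun', b'try_code": "US"}\n{}\n']

# Each stage wraps the previous one; nothing runs until the result is consumed.
pipeline = get_field(parse_json(split_lines(chunks)), "country_code")
result = list(pipeline)
print(result)
```

Each stage pulls items from the one before it only when asked, so the whole dataset never needs to be materialized between steps. A tube chain reads the same way, but expresses the stages as method calls.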

Simple Example

>>> from tubes import Each
>>> import glob
>>> tube = (Each(glob.glob("*.json"))   # Iterate over some filenames
        .read_files()                   # Read each file, chunk by chunk
        .split()                        # Split the file, line-by-line
        .json()                         # parse json
        .get('country_code', 'null'))   # extract field named 'country_code'
>>> set(tube)                           # collect results in a set
{'A1', 'AD', 'AE', 'AF', 'AG', 'AL', 'AM', 'AO', 'AP', ...}

More Complex Example

>>> from tubes import Each
>>> import glob
>>> x = (Each(glob.glob('*.jsonz'))     # Iterate over some filenames
        .map_files()                    # Memory-map each file
        .gunzip()                       # Decompress the gzipped contents
        .split(b'\n')                   # Split into lines
        .json()                         # Parse each line as json
        .enumerate()                    # Pair each record with its index: slot 0 is the index, slot 1 the record
        # Keep only records whose country_code is 'GB'
        .skip_unless(lambda x: x.slot(1).get('country_code', '""').to(str).equals('GB'))
        # Build one tuple of fields per record
        .multi(lambda x: (
            x.slot(0),
            x.slot(1).get('timestamp', 'null'),
            x.slot(1).get('country_code', 'null'),
            x.slot(1).get('url', 'null'),
            x.slot(1).get('file', '{}').get('filename', 'null'),
            x.slot(1).get('file', '{}').get('project'),
            x.slot(1).get('details', '{}').get('installer', '{}').get('name', 'null'),
            x.slot(1).get('details', '{}').get('python', 'null'),
            x.slot(1).get('details', '{}').get('system', 'null'),
            x.slot(1).get('details', '{}').get('system', '{}').get('name', 'null'),
            x.slot(1).get('details', '{}').get('cpu', 'null'),
            x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('lib', 'null'),
            x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('version', 'null'),
        ))
    )
>>> print(list(x)[-3])
(15612767, '2017-12-14 09:33:31 UTC', 'GB', '/packages/29/9b/25ef61e948321296f029f53c9f67cc2b54e224db509eb67ce17e0df6044a/certifi-2017.11.5-py2.py3-none-any.whl', 'certifi-2017.11.5-py2.py3-none-any.whl', 'certifi', 'pip', '2.7.5', {'name': 'Linux', 'release': '2.6.32-696.10.3.el6.x86_64'}, 'Linux', 'x86_64', 'glibc', '2.17')
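The nested .get() chains in this example pass defaults such as 'null' and '{}', which look like JSON text rather than Python values. The plain-Python sketch below shows one way to read that pattern, assuming a missing key yields the parsed default so that the next lookup in the chain still has something to work on. The helper get is hypothetical and is not the pytubes implementation.

```python
import json


def get(value, key, default_json="null"):
    """Return value[key], or the parsed JSON default when the key is missing.

    Hypothetical helper: mimics the shape of the chained .get() calls above.
    """
    if isinstance(value, dict) and key in value:
        return value[key]
    return json.loads(default_json)


record = json.loads(
    '{"country_code": "GB", "details": {"system": {"name": "Linux"}}}'
)

# Present at every level: the chain walks down to the leaf value.
system_name = get(get(get(record, "details", "{}"), "system", "{}"), "name")

# 'distro' is missing, so '{}' stands in and the chain ends at the 'null' default.
libc = get(
    get(get(get(record, "details", "{}"), "distro", "{}"), "libc", "{}"),
    "lib",
)

print(system_name, libc)
```

Under this reading, an intermediate default of '{}' keeps a deep lookup from failing partway, while the final 'null' becomes None in the output tuple.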

Download files

Source Distribution

  • pytubes-0.8.1.tar.gz (5.3 MB): Source

Built Distributions

  • pytubes-0.8.1-cp38-cp38-win_amd64.whl (4.3 MB): CPython 3.8, Windows x86-64
  • pytubes-0.8.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.5 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
  • pytubes-0.8.1-cp38-cp38-macosx_10_14_x86_64.whl (4.3 MB): CPython 3.8, macOS 10.14+ x86-64
  • pytubes-0.8.1-cp37-cp37m-win_amd64.whl (4.3 MB): CPython 3.7m, Windows x86-64
  • pytubes-0.8.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.4 MB): CPython 3.7m, manylinux: glibc 2.17+ x86-64
  • pytubes-0.8.1-cp37-cp37m-macosx_10_14_x86_64.whl (4.3 MB): CPython 3.7m, macOS 10.14+ x86-64
