
pandas readers on steroids (remote files, glob patterns, cache, etc.)

Project description


Pea Kina aka 'Giant Panda'

A wrapper around the pandas library that detects the separator, encoding and type of a file. It can also read a group of files whose names match a pattern (Python regex or glob). It can read both local and remote files (HTTP/HTTPS, FTP/FTPS/SFTP or S3/S3N/S3A).
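The separator detection mentioned above can be approximated with the standard library's csv.Sniffer (a simplified sketch of the idea, not peakina's actual implementation):

```python
import csv

# Simplified sketch of separator detection (not peakina's actual code):
# csv.Sniffer inspects a sample of the file and guesses the dialect.
sample = "a;b\n0;0\n0;1\n"
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
print(dialect.delimiter)
```

peakina applies the same kind of heuristic automatically, so you rarely need to pass a separator yourself.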

The supported file types are CSV, Excel, JSON, Parquet and XML.

:information_source: If the desired type is not yet supported, feel free to open an issue or to directly open a PR with the code!

Please read the documentation for more information.

Installation

pip install peakina

Usage

Consider a file file.csv:

a;b
0;0
0;1

Just type:

>>> import peakina as pk
>>> pk.read_pandas('file.csv')
   a  b
0  0  0
1  0  1
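Any extra keyword arguments are forwarded to the underlying pandas reader. For this CSV, the call is roughly equivalent to the following plain pandas call (a sketch, assuming the detected separator is ';'):

```python
import io

import pandas as pd

# What peakina roughly does under the hood for this file, once the
# ';' separator has been detected (simplified sketch):
df = pd.read_csv(io.StringIO("a;b\n0;0\n0;1\n"), sep=";")
print(df)
```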

Or, for files on an FTPS server:

  • my_data_2015.csv
  • my_data_2016.csv
  • my_data_2017.csv
  • my_data_2018.csv

You can just type:

>>> pk.read_pandas(r'ftps://<path>/my_data_\d{4}\.csv$', match='regex', dtype={'a': 'str'})
    a   b     __filename__
0  '0'  0  'my_data_2015.csv'
1  '0'  1  'my_data_2015.csv'
2  '1'  0  'my_data_2016.csv'
3  '1'  1  'my_data_2016.csv'
4  '3'  0  'my_data_2017.csv'
5  '3'  1  'my_data_2017.csv'
6  '4'  0  'my_data_2018.csv'
7  '4'  1  'my_data_2018.csv'
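The match='regex' option filters the files found at the location by name; the selection itself boils down to a standard Python regular expression test (hypothetical file listing below):

```python
import re

# How a 'regex' match pattern selects files (hypothetical file listing):
filenames = ["my_data_2015.csv", "my_data_2016.csv", "README.txt"]
matched = [f for f in filenames if re.search(r"my_data_\d{4}\.csv$", f)]
print(matched)
```

Each matching file is read and concatenated into a single dataframe, with the __filename__ column recording which file each row came from.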

Using cache

You may want to keep the last result in cache, to avoid downloading and extracting the file if it didn't change:

>>> from peakina.cache import Cache
>>> cache = Cache.get_cache('memory')  # in-memory cache
>>> df = pk.read_pandas('file.csv', expire=3600, cache=cache)

In this example, the resulting dataframe is fetched from the cache unless the modification time of file.csv has changed on disk, or the cache entry is older than 1 hour.

For persistent caching, use: cache = Cache.get_cache('hdf', cache_dir='/tmp')
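The invalidation rule described above (reuse the cached dataframe only while the file is unchanged and the entry is fresh) can be sketched as follows; is_cache_valid is a hypothetical helper for illustration, not part of peakina's API:

```python
import os
import tempfile
import time


def is_cache_valid(cached_at, cached_mtime, path, expire):
    """Reuse the cache only if the file is unchanged and the entry is fresh.

    Hypothetical helper illustrating the expiry logic; not peakina's API.
    """
    return (
        os.path.getmtime(path) == cached_mtime
        and (time.time() - cached_at) < expire
    )


# Demo on a temporary file: a fresh entry is valid, an old one is not.
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
mtime = os.path.getmtime(path)
fresh = is_cache_valid(time.time(), mtime, path, expire=3600)
stale = is_cache_valid(time.time() - 7200, mtime, path, expire=3600)
```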

Use only downloading feature

If you just want to download a file without converting it to a pandas dataframe:

>>> uri = 'https://i.imgur.com/V9x88.jpg'
>>> f = pk.fetch(uri)
>>> f.get_str_mtime()
'2012-11-04T17:27:14Z'
>>> with f.open() as stream:
...     print('Image size:', len(stream.read()), 'bytes')
...
Image size: 60284 bytes
