Generate Pandas data frames, load and extract data, based on JSON Table Schema descriptors.

Generate and load Pandas data frames based on JSON Table Schema descriptors.

Version v0.2 contains breaking changes:
  • removed Storage(prefix=) argument (was a stub)
  • renamed Storage(tables=) to Storage(dataframes=)
  • renamed Storage.tables to Storage.buckets
  • changed Storage.read to read into memory
  • added Storage.iter to yield row by row
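The difference between reading into memory and yielding row by row can be sketched as follows. This is a minimal illustration only: SketchStorage is a hypothetical stand-in written for this example, not the jsontableschema_pandas Storage class, and it keeps rows in plain lists instead of data frames.

```python
class SketchStorage:
    """Hypothetical stand-in; not the jsontableschema_pandas Storage class."""

    def __init__(self):
        # Bucket name -> list of row tuples.
        self._buckets = {}

    def write(self, bucket, rows):
        # Append rows to the bucket, creating it on first use.
        self._buckets.setdefault(bucket, []).extend(rows)

    def iter(self, bucket):
        # Generator: hands out one row at a time.
        for row in self._buckets[bucket]:
            yield row

    def read(self, bucket):
        # Materializes every row into a list held in memory.
        return list(self.iter(bucket))


storage = SketchStorage()
storage.write('data', [(1, 'a'), (2, 'b')])

rows = storage.read('data')      # whole bucket at once
row_iter = storage.iter('data')  # lazy, row by row
first = next(row_iter)

print(rows)
print(first)
```

Code that previously relied on read being lazy would, under this model, switch to iter.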

Getting Started

Installation

$ pip install datapackage
$ pip install jsontableschema-pandas

Example

You can load resources from a data package as Pandas data frames with the datapackage.push_datapackage function:

>>> import datapackage

>>> data_url = 'http://data.okfn.org/data/core/country-list/datapackage.json'
>>> storage = datapackage.push_datapackage(data_url, 'pandas')

>>> storage.buckets
['data___data']

>>> type(storage['data___data'])
<class 'pandas.core.frame.DataFrame'>

>>> storage['data___data'].head()
             Name Code
0     Afghanistan   AF
1   Åland Islands   AX
2         Albania   AL
3         Algeria   DZ
4  American Samoa   AS

You can also pull existing data frames into a data package:

>>> datapackage.pull_datapackage('/tmp/datapackage.json', 'country_list', 'pandas', dataframes={
...     'data': storage['data___data'],
... })
Storage

The package implements the Tabular Storage interface.

A storage instance can be created directly:

>>> from jsontableschema_pandas import Storage

>>> storage = Storage()

Storage works as a container for Pandas data frames. You can define a new data frame inside the storage using the storage.create method:

>>> storage.create('data', {
...     'primaryKey': 'id',
...     'fields': [
...         {'name': 'id', 'type': 'integer'},
...         {'name': 'comment', 'type': 'string'},
...     ]
... })

>>> storage.buckets
['data']

>>> storage['data'].shape
(0, 0)

Use storage.write to populate the data frame with data:

>>> storage.write('data', [(1, 'a'), (2, 'b')])

>>> storage['data']
id comment
1        a
2        b
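In the call above, each row is a plain tuple whose values follow the order of the fields in the descriptor. The sketch below illustrates that positional matching with the two field types used here. cast_row and CASTS are hypothetical helpers introduced for this example; they are not part of jsontableschema_pandas, and the library's own type handling may differ.

```python
descriptor = {
    'primaryKey': 'id',
    'fields': [
        {'name': 'id', 'type': 'integer'},
        {'name': 'comment', 'type': 'string'},
    ],
}

# Simplified cast table covering only the two types used in this example.
CASTS = {'integer': int, 'string': str}


def cast_row(descriptor, row):
    """Pair each value with its field by position and cast it (hypothetical helper)."""
    fields = descriptor['fields']
    if len(row) != len(fields):
        raise ValueError('row has %d values, schema has %d fields'
                         % (len(row), len(fields)))
    return {
        field['name']: CASTS[field['type']](value)
        for field, value in zip(fields, row)
    }


# The second row carries its id as text to show the integer cast.
records = [cast_row(descriptor, row) for row in [(1, 'a'), ('2', 'b')]]
print(records)
```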

You can also use tabulator to populate the data frame from an external data file:

>>> import tabulator

>>> with tabulator.Stream('data/comments.csv', headers=1) as stream:
...     storage.write('data', stream)

>>> storage['data']
id comment
1        a
2        b
1     good

As you can see, subsequent writes append new rows after the existing ones; a row whose id repeats an existing value is added, not merged.
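The append behaviour shown above can be reproduced with plain pandas. This is a sketch that assumes appending works like pandas.concat on frames indexed by the primary key; it does not show how Storage.write is implemented internally.

```python
import pandas as pd

# First batch, indexed by the primary key column as in the example above.
first_batch = pd.DataFrame({'id': [1, 2], 'comment': ['a', 'b']}).set_index('id')

# Second batch, standing in for the rows read from the CSV file.
second_batch = pd.DataFrame({'id': [1], 'comment': ['good']}).set_index('id')

# Concatenation keeps every row; a repeated key is neither merged nor replaced.
combined = pd.concat([first_batch, second_batch])

print(combined)
print(combined.index.is_unique)  # False: id 1 appears twice
```

If unique keys matter for later processing, checking combined.index.is_unique after each write is one way to detect duplicates early.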

Contributing

Please read the contribution guidelines:

How to Contribute

Thanks!

Files for jsontableschema-pandas, version 0.5.0