
Tools for documentation-aware data reading, writing, and analysis


=======
MetaCSV
=======


.. image:: https://travis-ci.org/delgadom/metacsv.svg?branch=master
    :target: https://travis-ci.org/delgadom/metacsv

.. image:: https://badge.fury.io/py/metacsv.svg
    :target: https://badge.fury.io/py/metacsv

.. image:: https://coveralls.io/repos/github/delgadom/metacsv/badge.svg?branch=master
    :target: https://coveralls.io/github/delgadom/metacsv?branch=master


``metacsv`` - Tools for documentation-aware data reading, writing, and analysis

See the full documentation at ReadTheDocs_.

.. _ReadTheDocs: http://metacsv.rtfd.org

Overview
=========

Read in CSV data with a yaml-compliant header directly into
a ``pandas`` ``Series``, ``DataFrame``, or ``Panel`` or an ``xarray``
``DataArray`` or ``Dataset``.

Data specification
----------------------------

Metadata can be specified in a yaml-formatted header, with the document-separator string
``---`` on its own line above and below the yaml block. Only one yaml block is allowed. If
the separator string is not the first non-whitespace line in the file, the file's entire
contents are passed to the csv reader. The yaml data can be arbitrarily complex.

.. code-block:: python

    >>> import metacsv, io
    >>> doc = io.StringIO('''
    ... ---
    ... author: A Person
    ... date: 2000-01-01
    ... variables:
    ...     pop:
    ...         name: Population
    ...         unit: millions
    ...     gdp:
    ...         name: Product
    ...         unit: 2005 $Bn
    ... ---
    ... region,year,pop,gdp
    ... USA,2010,309.3,13599.3
    ... USA,2011,311.7,13817.0
    ... CAN,2010,34.0,1240.0
    ... CAN,2011,34.3,1276.7
    ... ''')
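
The header rule described above (a ``---`` line as the first non-whitespace line opens the
yaml block, the next ``---`` line closes it, and everything else is CSV) can be sketched with
the standard library alone. This is a minimal illustration, not MetaCSV's own parser:
``split_metacsv`` is a hypothetical helper, it returns the yaml text unparsed rather than
loading it, and raising ``ValueError`` on an unclosed header is an assumption of the sketch.

.. code-block:: python

    import csv
    import io


    def split_metacsv(text):
        """Split MetaCSV-style text into (yaml_header, csv_body).

        Hypothetical helper: if the first non-whitespace line is '---',
        everything up to the next '---' line is the yaml header. Otherwise
        the whole text is treated as CSV and the header is None.
        """
        lines = text.splitlines()
        # Skip leading blank lines to find the first non-whitespace line.
        start = 0
        while start < len(lines) and not lines[start].strip():
            start += 1
        if start == len(lines) or lines[start].strip() != '---':
            return None, text
        for end in range(start + 1, len(lines)):
            if lines[end].strip() == '---':
                header = '\n'.join(lines[start + 1:end])
                body = '\n'.join(lines[end + 1:])
                return header, body
        # Assumption of this sketch: an unclosed header is an error.
        raise ValueError('yaml header opened with --- but never closed')


    # A shortened version of the example file above.
    example = '''
    ---
    author: A Person
    date: 2000-01-01
    ---
    region,year,pop,gdp
    USA,2010,309.3,13599.3
    CAN,2010,34.0,1240.0
    '''

    header, body = split_metacsv(example)
    rows = list(csv.reader(io.StringIO(body)))
    print(header)
    print(rows[0])  # ['region', 'year', 'pop', 'gdp']

In practice the returned header text would be handed to a yaml loader and the body to a CSV
reader such as ``pandas.read_csv``.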


Special attributes
~~~~~~~~~~~~~~~~~~~~~~~

The ``coords`` and ``variables`` attributes are reserved keywords: rather than being passed
through to the MetaCSV object's ``attrs`` attribute like other header entries, they are
interpreted specially.

``variables`` describes columns in the resulting ``DataFrame`` or ``Data variables``
in the resulting ``xarray.Dataset``. It is not used when the CSV has only one
column and the argument ``squeeze=True`` is passed to ``read_csv``.

``coords`` describes indices in the resulting ``DataFrame``/``Series``, or
``Coordinates`` in the resulting ``xarray.Dataset``/``xarray.DataArray``. Coordinates
are categorical or independent variables that index the object's ``values``.
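
To make the distinction concrete, the sketch below takes an already-parsed header (a plain
``dict``, as a yaml loader would produce) and separates the reserved ``coords`` and
``variables`` keys from everything else, which becomes ``attrs``. The function name
``split_special_attributes`` is hypothetical, and MetaCSV's internal handling may differ.

.. code-block:: python

    RESERVED_KEYS = ('coords', 'variables')


    def split_special_attributes(header):
        """Separate reserved MetaCSV keys from ordinary attributes.

        Hypothetical sketch: returns (coords, variables, attrs), where attrs
        holds every header key that is not reserved. Missing reserved keys
        come back as None.
        """
        coords = header.get('coords')
        variables = header.get('variables')
        attrs = {k: v for k, v in header.items() if k not in RESERVED_KEYS}
        return coords, variables, attrs


    # Header matching the yaml example above; the date is kept as a plain
    # string here for simplicity.
    header = {
        'author': 'A Person',
        'date': '2000-01-01',
        'variables': {
            'pop': {'name': 'Population', 'unit': 'millions'},
            'gdp': {'name': 'Product', 'unit': '2005 $Bn'},
        },
    }

    coords, variables, attrs = split_special_attributes(header)
    print(attrs)              # {'author': 'A Person', 'date': '2000-01-01'}
    print(sorted(variables))  # ['gdp', 'pop']
    print(coords)             # None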



Using MetaCSV-formatted files in Python
--------------------------------------------

Read MetaCSV-formatted data into python using pandas-like syntax:

.. code-block:: python

    >>> df = metacsv.read_csv(doc, index_col=[0, 1])
    >>> df
    <metacsv.core.containers.DataFrame (4, 2)>
                   pop      gdp
    region year
    USA    2010  309.3  13599.3
           2011  311.7  13817.0
    CAN    2010   34.0   1240.0
           2011   34.3   1276.7

    Coordinates
      * region     (region) object CAN, USA
      * year       (year) int64 2010, 2011
    Variables
        pop
        gdp
    Attributes
        date:      2000-01-01
        author:    A Person
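
Here ``index_col=[0, 1]`` follows the ``pandas.read_csv`` convention: the first two columns
(``region`` and ``year``) become a two-level row index and the remaining columns hold the
data. A standard-library sketch of that reshaping, keyed by ``(region, year)`` tuples, is
below. ``index_rows`` is a hypothetical helper; unlike ``pandas`` it leaves index values as
strings and simply converts every data column to ``float``.

.. code-block:: python

    import csv
    import io


    def index_rows(csv_text, index_col):
        """Key each CSV row by the tuple of its index columns.

        Hypothetical sketch of index_col=[0, 1]: returns (value_names, table),
        where table maps an index tuple to {column name: float value}.
        """
        reader = csv.reader(io.StringIO(csv_text))
        names = next(reader)
        value_names = [n for i, n in enumerate(names) if i not in index_col]
        table = {}
        for row in reader:
            if not row:
                continue  # skip blank lines
            key = tuple(row[i] for i in index_col)
            table[key] = {
                n: float(v) for i, (n, v) in enumerate(zip(names, row))
                if i not in index_col
            }
        return value_names, table


    body = '''region,year,pop,gdp
    USA,2010,309.3,13599.3
    USA,2011,311.7,13817.0
    CAN,2010,34.0,1240.0
    CAN,2011,34.3,1276.7
    '''

    value_names, table = index_rows(body, index_col=[0, 1])
    print(value_names)             # ['pop', 'gdp']
    print(table[('CAN', '2011')])  # {'pop': 34.3, 'gdp': 1276.7}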

Exporting MetaCSV data to other formats
-----------------------------------------------

CSV
~~~~~~~~~

A MetaCSV ``Series`` or ``DataFrame`` can be written as a yaml-prefixed CSV using
the same ``to_csv`` syntax as its ``pandas`` counterpart:

.. code-block:: python

    >>> df.attrs['new attribute'] = 'changed in python!'
    >>> # includes changes to data, attributes, variables, and coordinates
    ... df.to_csv('my_new_data.csv')
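
The written file has the same layout as the input: a yaml block between ``---`` lines,
followed by ordinary CSV. A minimal standard-library sketch of that layout is below.
``write_metacsv`` is a hypothetical helper that handles only flat, scalar attributes written
as unquoted ``key: value`` lines; nested values such as ``variables`` would need a proper
yaml serializer.

.. code-block:: python

    import csv
    import io


    def write_metacsv(attrs, columns, rows):
        """Return MetaCSV-style text: a '---'-delimited header, then CSV.

        Hypothetical sketch: attrs must be a flat dict of scalars, because
        values are written as plain 'key: value' lines without yaml quoting.
        """
        out = io.StringIO()
        out.write('---\n')
        for key, value in attrs.items():
            out.write('{}: {}\n'.format(key, value))
        out.write('---\n')
        writer = csv.writer(out, lineterminator='\n')
        writer.writerow(columns)
        writer.writerows(rows)
        return out.getvalue()


    text = write_metacsv(
        {'author': 'A Person', 'new attribute': 'changed in python!'},
        ['region', 'year', 'pop', 'gdp'],
        [['USA', 2010, 309.3, 13599.3], ['CAN', 2010, 34.0, 1240.0]],
    )
    print(text)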




pandas
~~~~~~~~~~~~~~~

Calling ``to_pandas()`` strips the coordinates and MetaCSV attributes from a MetaCSV container, returning a plain ``pandas`` object:

.. code-block:: python

    >>> df.to_pandas()
                   pop      gdp
    region year
    USA    2010  309.3  13599.3
           2011  311.7  13817.0
    CAN    2010   34.0   1240.0
           2011   34.3   1276.7



xarray/netCDF
~~~~~~~~~~~~~~~

``xarray`` provides a pandas-like interface for operating on indexed ``ndarray`` data. It
is modeled on the ``netCDF`` data storage format used frequently in climate science, but
is useful for many applications with higher-dimensional data.


.. code-block:: python

    >>> ds = df.to_xarray()
    >>> ds
    <xarray.Dataset>
    Dimensions:  (region: 2, year: 2)
    Coordinates:
      * region   (region) object 'USA' 'CAN'
      * year     (year) int64 2010 2011
    Data variables:
        pop      (region, year) float64 309.3 311.7 34.0 34.3
        gdp      (region, year) float64 1.36e+04 1.382e+04 1.24e+03 1.277e+03
    Attributes:
        date:    2000-01-01
        author:  A Person
    >>> ds.to_netcdf('my_netcdf_data.nc')
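
Conceptually, the conversion turns each index level into a dimension: the long
``(region, year)`` table becomes, for each variable, a 2-D grid with one axis per
coordinate, as in the ``pop (region, year)`` line above. The sketch below performs that
unstacking with the standard library only. ``unstack_2d`` is a hypothetical helper, not
part of MetaCSV or ``xarray``; it keeps coordinate labels in order of first appearance and
fills missing combinations with ``None``.

.. code-block:: python

    def unstack_2d(records):
        """Unstack (dim0, dim1, value) records into labels plus a 2-D grid.

        Hypothetical sketch: returns (dim0_labels, dim1_labels, grid), where
        grid[i][j] is the value at (dim0_labels[i], dim1_labels[j]), or None
        if that combination is missing.
        """
        dim0, dim1, lookup = [], [], {}
        for a, b, value in records:
            if a not in dim0:
                dim0.append(a)
            if b not in dim1:
                dim1.append(b)
            lookup[(a, b)] = value
        grid = [[lookup.get((a, b)) for b in dim1] for a in dim0]
        return dim0, dim1, grid


    # (region, year, pop) records from the example data.
    pop_records = [
        ('USA', 2010, 309.3),
        ('USA', 2011, 311.7),
        ('CAN', 2010, 34.0),
        ('CAN', 2011, 34.3),
    ]

    regions, years, pop = unstack_2d(pop_records)
    print(regions)  # ['USA', 'CAN']
    print(years)    # [2010, 2011]
    print(pop)      # [[309.3, 311.7], [34.0, 34.3]]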

Others
~~~~~~~~~

Currently, MetaCSV only supports conversion back to CSV and, through the
``xarray`` module, to netCDF. However, feel free to suggest
additional features and to contribute your own!


TODO
============

* Make ``coords`` and ``attrs`` persistent across slicing operations
  (try ``df['pop'].to_xarray()`` from the example above and watch it
  fail...)

* Improve hooks between ``pandas`` and ``metacsv``:

  - update ``coord`` names on ``df.index.names`` assignment
  - update ``coords`` on stack/unstack
  - update ``coords`` on

* Handle attributes indexed by coord/variable names --> assign to
  coord/variable-specific ``attrs``

* Let's start an issue tracker and get rid of this section!

* Should we rethink "special attributes," e.g. coords? Maybe these should
  have some special prefix like ``_coords`` when included in yaml headers to
  avoid confusion with other generic attributes...

* Allow special attributes (``coords``, ``variables``) in ``read_csv`` call

* Allow external file headers

* Write tests

* Write documentation

* Maybe steal xarray's coordinate handling and save ourselves a whole lotta work?


Feature Requests
==================
* Create syntax for ``multi-csv`` --> ``Panel`` or combining using filename regex
* Eventually? allow for on-disk manipulation of many/large files with dask/xarray
* Eventually? add xml, SQL, other structured syntax language conversions



============== ==========================================================
Python support Python 2.7, >= 3.3
Source         https://github.com/delgadom/metacsv
Docs           http://metacsv.rtfd.org
Changelog      http://metacsv.readthedocs.org/en/latest/history.html
API            http://metacsv.readthedocs.org/en/latest/api.html
Issues         https://github.com/delgadom/metacsv/issues
Travis         http://travis-ci.org/delgadom/metacsv
Test coverage  https://coveralls.io/r/delgadom/metacsv
pypi           https://pypi.python.org/pypi/metacsv
Ohloh          https://www.ohloh.net/p/metacsv
License        `BSD`_.
git repo       .. code-block:: bash

                   $ git clone https://github.com/delgadom/metacsv.git
install dev    .. code-block:: bash

                   $ git clone https://github.com/delgadom/metacsv.git metacsv
                   $ cd ./metacsv
                   $ virtualenv .env
                   $ source .env/bin/activate
                   $ pip install -e .
tests          .. code-block:: bash

                   $ python setup.py test
============== ==========================================================

.. _BSD: http://opensource.org/licenses/BSD-3-Clause
.. _Documentation: http://metacsv.readthedocs.org/en/latest/
.. _API: http://metacsv.readthedocs.org/en/latest/api.html


=========
Changelog
=========

Here you can find the recent changes to MetaCSV.

.. changelog::
    :version: dev
    :released: Ongoing

    .. change::
        :tags: docs

        Updated CHANGES.

.. changelog::
    :version: 0.0.1
    :released: 2016-05-04

    .. change::
        :tags: project

        First release on PyPI.

