Python reader/writer for CSV files with YAML header information.
CSVY for Python
CSV is a popular format for storing tabular data and is used in many disciplines. Metadata about the contents of the file is often included in the header, but it rarely follows a machine-readable format, and sometimes it is not even human-readable! In some cases, such information is provided in a separate file, which is not ideal because it is easy for data and metadata to get separated.
CSVY is a small Python package to handle CSV files in which the metadata in the header is formatted in YAML. It supports reading and writing tabular data held in numpy arrays, pandas DataFrames and nested lists, with the metadata stored in a standard Python dictionary. Ultimately, it aims to incorporate information about the CSV dialect used and a Table Schema specifying the contents of each column, to aid reading and interpreting the data.
Installation
pycsvy is available on PyPI, so installing it is as easy as:
pip install pycsvy
In order to support reading into numpy arrays or into pandas DataFrames, you will need to install those two packages, too.
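If you are unsure whether those optional packages are present in your environment, a quick standard-library check is sketched below. The helper name optional_backends is hypothetical and not part of pycsvy; it only asks Python whether the top-level numpy and pandas packages can be found.

```python
import importlib.util


def optional_backends() -> dict[str, bool]:
    """Report whether the optional numpy/pandas packages are importable (hypothetical helper)."""
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in ("numpy", "pandas")}


for name, available in optional_backends().items():
    print(f"{name}: {'available' if available else 'missing, try: pip install ' + name}")
```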
Usage
In the simplest case, to save some data contained in the variable data and some metadata contained in the dictionary metadata into a CSVY file named important_data.csv (the extension is not relevant), just do the following:
import csvy
csvy.write("important_data.csv", data, metadata)
The resulting file will have the YAML-formatted header between --- markers, optionally with a comment character at the start of each header line. It could look something like the following:
---
name: my-dataset
title: Example file of csvy
description: Show a csvy sample file.
encoding: utf-8
schema:
  fields:
  - name: Date
    type: object
  - name: WTI
    type: number
---
Date,WTI
1986-01-02,25.56
1986-01-03,26.00
1986-01-06,26.53
1986-01-07,25.85
1986-01-08,25.87
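As a rough illustration of this layout, here is a minimal standard-library sketch that assembles CSVY text by hand. It is not how pycsvy is implemented: format_csvy is a hypothetical helper, the header is assumed to be already serialized to YAML text (pycsvy uses yaml.safe_dump for that), and the comment prefix is assumed to apply to the --- markers as well as to the header lines.

```python
import csv
import io


def format_csvy(header_yaml: str, rows: list[list[object]], comment: str = "") -> str:
    """Build CSVY text: YAML header between '---' markers, then CSV rows (hypothetical helper)."""
    out = io.StringIO()
    # Assumption: every header line, including both '---' markers, gets the comment prefix.
    header_lines = ["---", *header_yaml.rstrip("\n").splitlines(), "---"]
    for line in header_lines:
        out.write(f"{comment}{line}\n")
    # The tabular part is plain CSV written with the standard library.
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()


header = "name: my-dataset\nencoding: utf-8\n"
rows = [["Date", "WTI"], ["1986-01-02", 25.56], ["1986-01-03", "26.00"]]
print(format_csvy(header, rows, comment="# "))
```

With comment="" (the default) the output has the same shape as the sample file above.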
For reading the information back:
import csvy
# To read into a numpy array
data, metadata = csvy.read_to_array("important_data.csv")
# To read into a pandas DataFrame
data, metadata = csvy.read_to_dataframe("important_data.csv")
The appropriate writer/reader will be selected based on the type of data:
- numpy array: np.savetxt and np.loadtxt
- pandas DataFrame: pd.DataFrame.to_csv and pd.read_csv
- nested lists: csv.writer and csv.reader
Options can be passed to the tabular data writer/reader by setting the csv_options dictionary. Likewise, you can set the yaml_options dictionary with whatever options you want to pass to the yaml.safe_load and yaml.safe_dump functions, which read and write the YAML-formatted header, respectively.
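Passing options this way amounts to forwarding a dictionary as keyword arguments. A minimal sketch for the nested-list case, where the options end up in csv.writer; write_rows is a hypothetical helper, and the keys used (delimiter, lineterminator) are standard csv formatting parameters, not anything specific to pycsvy.

```python
import csv
import io
from typing import Any, Optional


def write_rows(rows: list[list[Any]], csv_options: Optional[dict[str, Any]] = None) -> str:
    """Write nested lists as CSV, forwarding csv_options to csv.writer (hypothetical helper)."""
    csv_options = csv_options or {}
    out = io.StringIO()
    # Each key in csv_options becomes a keyword argument of csv.writer,
    # e.g. delimiter, quotechar or lineterminator.
    writer = csv.writer(out, **csv_options)
    writer.writerows(rows)
    return out.getvalue()


rows = [["Date", "WTI"], ["1986-01-02", 25.56]]
print(repr(write_rows(rows, {"delimiter": ";", "lineterminator": "\n"})))
# 'Date;WTI\n1986-01-02;25.56\n'
```

yaml_options would be forwarded the same way to yaml.safe_dump, for example {"sort_keys": False} to keep the header keys in their original order (sort_keys is a PyYAML dump option).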
Finally, you can control the character(s) used to indicate comments by setting the comment keyword when writing a file. By default, there is no comment character (""). During reading, the comment character is detected automatically.
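One plausible way to detect the comment character automatically is to take whatever precedes the opening --- marker on the first line. The sketch below does that with the standard library only; split_csvy is hypothetical, it is not necessarily how pycsvy does it, and it returns the header as raw YAML text that you would still pass to yaml.safe_load.

```python
import csv
import io


def split_csvy(text: str) -> tuple[str, str, list[list[str]]]:
    """Split CSVY text into (comment, header_yaml, rows) (hypothetical helper).

    Assumes the first line is the opening '---' marker, possibly preceded by
    the comment characters, and that header lines carry the same prefix.
    """
    lines = text.splitlines()
    first = lines[0].rstrip()
    if not first.endswith("---"):
        raise ValueError("missing opening '---' marker")
    # Whatever precedes the opening marker is taken to be the comment prefix.
    comment = first[: -len("---")]
    header = []
    for i, line in enumerate(lines[1:], start=1):
        body = line[len(comment):] if line.startswith(comment) else line
        if body.rstrip() == "---":
            # Closing marker found: everything after it is CSV data.
            data_lines = lines[i + 1:]
            break
        header.append(body)
    else:
        raise ValueError("missing closing '---' marker")
    rows = list(csv.reader(io.StringIO("\n".join(data_lines))))
    return comment, "\n".join(header), rows


comment, header, rows = split_csvy("# ---\n# name: my-dataset\n# ---\nDate,WTI\n1986-01-02,25.56\n")
print(repr(comment), repr(header), rows)
```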
Contributors ✨
Thanks goes to these wonderful people (emoji key):
Diego Alonso Álvarez 🚇 🤔 🚧 ⚠️ 🐛 💻
Alex Dewar 🤔 ⚠️ 💻
Adrian D'Alessandro 🐛 💻 📖
This project follows the all-contributors specification. Contributions of any kind welcome!