Lightweight csv read/write, keeping track of csv dialect and other metadata.
CSVMeta
CSVMeta is an extremely lightweight Python package designed to work with csv files and attached metadata. It writes data to an arbitrary folder such as mydata.csv/ and creates two internal files:
- mydata.csv/data.csv: the usual csv file.
- mydata.csv/metadata.json: metadata about the csv file, such as the csv dialect.
When reading data from mydata.csv/, it uses dialect information from the metadata file to read the csv data correctly. The metadata file can also be used to store additional information about the data, such as a data schema and a header indicator.
The package has no external dependencies beyond Python's standard library and is tested with Python 3.7+ on Linux, Windows, and macOS.
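To make the folder layout concrete, here is a minimal sketch that reproduces it with the standard library only. It is not CSVMeta's implementation: the helper name write_folder_sketch is hypothetical, and the real metadata file holds more keys than the single dialect entry written here.

```python
import csv
import json
import tempfile
from pathlib import Path


def write_folder_sketch(folder, rows, dialect):
    """Hypothetical helper: mimic the data.csv + metadata.json layout."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings itself.
    with open(folder / "data.csv", "w", newline="", encoding="utf-8") as f:
        csv.writer(f, **dialect).writerows(rows)
    # Store the dialect next to the data so a reader can reuse it later.
    with open(folder / "metadata.json", "w", encoding="utf-8") as f:
        json.dump({"dialect": dialect}, f, indent=2)
    return folder


rows = [["name", "age"], ["Nicole", 43]]
dialect = {"delimiter": ";", "quotechar": '"', "lineterminator": "\n"}

with tempfile.TemporaryDirectory() as tmp:
    folder = write_folder_sketch(Path(tmp) / "mydata.csv", rows, dialect)
    layout = sorted(p.name for p in folder.iterdir())
    raw_text = (folder / "data.csv").read_text(encoding="utf-8")

print(layout)
print(raw_text)
```

Keeping the dialect beside the data is what lets a later read parse the file without guessing the delimiter or quoting rules.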
Installation
pip install csvmeta
Usage
Reading and Writing Data
Input and output data formats for the read and write functions are modelled on Python's csv module: data to write should be a sequence of rows (a list of lists, for example), and data read is returned as a list of rows in which every value is a string. The data header is always returned as the first row.
import csvmeta as csvm
data = [
['name', 'age', 'state'],
['Nicole', 43, 'CA'],
['John', 28, 'DC']
]
# Write data to a csv file folder
csvm.write('mydata.csv', data)
# Read data from a csv file folder
data = csvm.read('mydata.csv')
## [
## ['name', 'age', 'state'],
## ['Nicole', '43', 'CA'],
## ['John', '28', 'DC']
## ]
##
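The string values in the output above follow from how Python's csv module works: csv stores text only, so numbers written to a file come back as strings. The sketch below shows the same round trip with the standard library alone, using an in-memory buffer instead of a file; converting the age column back to int is an illustrative choice, not something the package does.

```python
import csv
import io

data = [
    ["name", "age", "state"],
    ["Nicole", 43, "CA"],
    ["John", 28, "DC"],
]

# Write to an in-memory buffer; newline="" leaves line endings to csv.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(data)

# Read the same text back: every value is now a string.
buffer.seek(0)
round_tripped = list(csv.reader(buffer))

# Cast the columns you need, skipping the header row.
ages = [int(row[1]) for row in round_tripped[1:]]

print(round_tripped)
print(ages)
```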
Reading and Writing Metadata
Metadata is stored in a json file in the csv folder. The metadata file is created automatically when writing data, and only the dialect object is used when reading data. The dialect object is a dictionary of csv dialect parameters, such as delimiter, quotechar, and lineterminator. See the csv module documentation for more information.
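As an illustration of why the stored dialect matters, the following sketch parses the same text twice with the standard library: once with a dialect dictionary loaded from JSON and once with the csv defaults. The sample text and the pipe-delimited dialect are made up for the example, and this shows the general technique rather than CSVMeta's internals.

```python
import csv
import io
import json

# A metadata document as it might be stored on disk (illustrative values).
metadata_text = json.dumps(
    {"dialect": {"delimiter": "|", "quotechar": "'", "lineterminator": "\n"}}
)
csv_text = "name|note\nNicole|'likes | pipes'\n"

# The dialect dictionary maps directly onto csv formatting parameters.
dialect = json.loads(metadata_text)["dialect"]
rows = list(csv.reader(io.StringIO(csv_text), **dialect))

# Without the dialect, the default comma delimiter fails to split the fields.
naive_rows = list(csv.reader(io.StringIO(csv_text)))

print(rows)
print(naive_rows)
```

Note that csv.reader accepts lineterminator but ignores it when parsing; the parameter only affects writers.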
Arbitrary metadata can be added to the metadata file by passing keyword arguments to the write function. We recommend setting the header keyword argument to True if the first row of the data is a header row, and setting the schema keyword argument to a list of column names and data types. The frictionless tabular data resource standard is a good reference for metadata schemas.
Metadata can be read using the metadata() function:
import csvmeta as csvm
data = [
['name', 'age', 'state'],
['Nicole', 43, 'CA'],
['John', 28, 'DC']
]
# Write data and metadata to a csv file folder
csvm.write(
'mydata.csv',
data,
header=True,
schema=['name', 'age', 'state'],
dialect={
'delimiter': ',',
'quotechar': '"',
'lineterminator': '\n'
},
description='This is an example dataset.'
)
# Read metadata from a csv file folder
csvm.metadata('mydata.csv')
## {
## "name": "mydata.csv",
## "path": "data.csv",
## "mediatype": "text/csv",
## "dialect": {
## "delimiter": ",",
## "quotechar": "\"",
## "lineterminator": "\n"
## },
## "header": true,
## "schema": [
## "name",
## "age",
## "state"
## ],
## "description": "This is an example dataset."
## }
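The schema in the example above lists column names only. If the schema also records data types, it can drive type conversion after reading. The sketch below assumes a schema written as a list of name/type field descriptors in the style of the Frictionless Table Schema; the apply_schema helper and the CASTERS table are hypothetical and not part of CSVMeta, and the metadata dictionary is built by hand to stand in for a metadata file that contains such a schema.

```python
# Map Frictionless-style type names to Python constructors (assumed subset).
CASTERS = {"string": str, "integer": int, "number": float}


def apply_schema(rows, metadata):
    """Hypothetical helper: cast string rows using metadata['schema']."""
    fields = metadata["schema"]
    header, body = rows[0], rows[1:]
    names = [field["name"] for field in fields]
    # Fail early if the file and its schema have drifted apart.
    if header != names:
        raise ValueError(f"header {header} does not match schema {names}")
    casts = [CASTERS[field["type"]] for field in fields]
    return [[cast(value) for cast, value in zip(casts, row)] for row in body]


# Stand-ins for a metadata file and for rows read back as strings.
metadata = {
    "header": True,
    "schema": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "integer"},
        {"name": "state", "type": "string"},
    ],
}
rows = [
    ["name", "age", "state"],
    ["Nicole", "43", "CA"],
    ["John", "28", "DC"],
]

typed_rows = apply_schema(rows, metadata)
print(typed_rows)
```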
Reading to Pandas DataFrame
import csvmeta as csvm
import pandas as pd

data = [
    ['name', 'age', 'state'],
    ['Nicole', 43, 'CA'],
    ['John', 28, 'DC']
]
# Write data and metadata to a csv file folder
csvm.write('mydata.csv', data, header=True)
# Read the rows and the metadata back
data = csvm.read('mydata.csv')
metadata = csvm.metadata('mydata.csv')
# Use the first row as column names only if the metadata marks it as a header
if metadata.get("header", False):
    df = pd.DataFrame(data[1:], columns=data[0])
else:
    df = pd.DataFrame(data)
df
##      name age state
## 0  Nicole  43    CA
## 1    John  28    DC
Links and References
- CSV Module Documentation
- Frictionless Tabular Data Resource Standard
- Common Format and MIME Type for Comma-Separated Values (CSV) Files
- CSV on the Web
Changelog
1.1.2 (2023-11-25)
- Make DEFAULT_DIALECT an explicit dictionary specification rather than "unix".
- Add DEFAULT_DIALECT to tests.
1.1.1 (2023-11-25)
- Change Iterable typing to Sequence to account for order and allow multiple passes over data.
- Improve tests.
1.1.0 (2023-11-25)
- Fix read function return type: it now returns a list of lists instead of a generator.
1.0.0 (2023-11-25)
- Initial release.