Utility functions for reading and writing CSV files with metadata headers.
Project description
libtabular
A general purpose library for reading and writing tabular data (CSV, TSV, gsheets, ods, xlsx).
Pitch
Imagine a csv.DictReader
-like API you can use to "open" and "read" any source
of tabular data (CSV, TSV, gsheets, ods, xlsx) without having to worry about a
million libraries and authentication APIs.
Tabular data with metadata headers
The main "new feature" that libtabular
provides is a way to parse "metadata headers"
in tabular data (e.g. CSV) automatically. These "CSV metadata headers" are directly
analogous to the YAML headers that sometimes appear in Markdown files used in
static site generators.
Example
Using libtabular
, you could "extract" the data and metadata from this source
file using a few commands:
>>> from libtabular import fromcsvwithheader
>>> table = fromcsvwithheader('samples/minimal.csv')
>>> table.metadata
{'key1': 'value1',
'key2': 'value2',
'title': 'Minimal sample document',
'description': 'This is a sample document that consists of four sections',
'doc_id': 'Sample-doc-001',
'comment': 'This is not part of document metadata; just a comment...'}
>>> list(table.dicts())
[{'section_id': '002',
'slug': 'dataformat',
'title': 'CSV files with metadata',
'description': 'Description of the CSV-with-metadata-header data format',
'url': 'https://github.com/rocdata/libtabular/blob/main/docs/dataformat.md'},
{'section_id': '003',
'slug': 'tutorial',
'title': 'Tutorial',
'description': 'Hands-on examples of using libtabular',
'url': 'https://github.com/rocdata/libtabular/blob/main/docs/tutorial.md'},
{'section_id': '004',
'slug': 'backends',
'title': 'Backends',
'description': 'Description of integrations to various spreadsheets formats and APIs ',
'url': 'https://github.com/rocdata/libtabular/blob/main/docs/backends.md'}]
>>> table.header
('section_id', 'slug', 'title', 'description', 'url')
Why is this needed?
Recent work on a repository of curriculum documents, see rocdata.global, requires an easy-to-use process for import and export of curriculum data like:
- Curriculum standards documents (excel sheets that specify what students should be learning)
- Content collections data (excel sheets that consists of links to useful learning resources)
- Content correlations data (excel sheets that contain "links" between curriculum standards and relevant learning resources)
The spreadsheet/CSV format is a natural choice for teachers and administrators, who have experience working with this file type, so it is worth developing tools that facilitate reading and writing tabular data:
- Curriculum bodies and ministries of education can publish curriculum standards documents information in machine-readable formats (instead of publishing PDFs, publish spreadsheets).
- Teachers can download standards data in easy-to-use spreadsheet formats (use standards for your grade level to plan your lessons).
- Curriculum experts and teachers can download blank templates with appropriate headers to fill in when need to specify standards documents or content correlations.
Related projects
csv
in stdlibpandas.read_csv
which is a few more bells and whistles.petl
has a lot of functionality for loading CSV, TSV, Excel, and a bunch of other format. Thepetl
library also supports convenient transformation of columns. There is even a PR for integration with google sheets (not merged).pyexcel
is a general-purpose backend for all kinds of spreadsheet formats (csv, xlsx, ods, etc.)- For other spreadsheet Python libs, see http://www.python-excel.org/
TODOs
- Add
fromxlsxwithheader
for parsing Excel files - Add
fromodswithheader
based onpyexcel
- Add minimal tests to check all source formats result in same data
- Add
prependheader(metadata, header, data)
function to export in this format, ideally generic workflow that works for any output format (csv,ods,xlsx).
Roadmap
- Add tree-parsing logic utils? (libtree ;)
- Includes statements (include another .CSV file as a node in current location)
- Optional enhancements for templates: add formatting of header etc. (only for Excel and ODS)
Ideas
- Investigate CSVW standard and libraries, specifically options for validation.
- Investigate
tablib
as an alternative base topetl
(although seems less versatile).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file libtabular-0.0.3.tar.gz
.
File metadata
- Download URL: libtabular-0.0.3.tar.gz
- Upload date:
- Size: 5.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.6.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 95368354408938613c11c25399fa237f45ae059404f39d1c8727ff8bc9f8d4d0 |
|
MD5 | 7340d8b2378f387ffefccabd815007aa |
|
BLAKE2b-256 | d0f3ed324819051c9dcf258474105ad2af0286e99dbcf99a8c9d8f3b970ae6cf |