Utility functions for reading and writing CSV files with metadata headers.

libtabular

A general-purpose library for reading and writing tabular data (CSV, TSV, Google Sheets, ODS, XLSX).

Pitch

Imagine a csv.DictReader-like API you can use to "open" and "read" any source of tabular data (CSV, TSV, gsheets, ods, xlsx) without having to worry about a million libraries and authentication APIs.
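For reference, this is the standard-library csv.DictReader pattern the pitch refers to. It reads one plain CSV (no metadata header) and nothing else; the column names and values are borrowed from the example below purely for illustration.

```python
import csv
import io

# In-memory stand-in for a CSV file; for a real file use open(path, newline="").
source = io.StringIO(
    "section_id,slug,title\n"
    "002,dataformat,CSV files with metadata\n"
    "003,tutorial,Tutorial\n"
)

# DictReader takes the first row as field names and yields one dict per data row.
reader = csv.DictReader(source)
rows = list(reader)

print(reader.fieldnames)              # ['section_id', 'slug', 'title']
print([row["slug"] for row in rows])  # ['dataformat', 'tutorial']
```

The goal described above is to keep this "open, then iterate over dicts" shape while swapping in other sources (TSV, Google Sheets, ODS, XLSX) behind it.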

Tabular data with metadata headers

The main "new feature" that libtabular provides is a way to parse "metadata headers" in tabular data (e.g. CSV) automatically. These "CSV metadata headers" are directly analogous to the YAML headers that sometimes appear in Markdown files used in static site generators.
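The exact file layout is specified in docs/dataformat.md. The sketch below assumes a simple version of it: leading rows are key/value pairs, the first blank row ends the metadata block, and the next row is the column header. The function name split_metadata_header is hypothetical and not part of libtabular.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_metadata_header(text: str) -> Tuple[Dict[str, str], List[str], List[Dict[str, str]]]:
    """Split CSV text into (metadata, header, rows).

    ASSUMED layout (hypothetical, not necessarily libtabular's exact spec):
    leading key,value rows form the metadata block, the first blank row
    ends it, and the row after that is the column header.
    """
    rows = list(csv.reader(io.StringIO(text)))
    metadata: Dict[str, str] = {}
    i = 0
    # Consume key/value rows until the first blank row (no non-empty cells).
    while i < len(rows) and any(cell.strip() for cell in rows[i]):
        key = rows[i][0]
        value = rows[i][1] if len(rows[i]) > 1 else ""
        metadata[key] = value
        i += 1
    i += 1  # skip the blank separator row
    if i >= len(rows):
        return metadata, [], []
    header = rows[i]
    # Pair each remaining non-blank row with the header, like csv.DictReader.
    data = [dict(zip(header, row)) for row in rows[i + 1:] if any(row)]
    return metadata, header, data


sample = (
    "title,Minimal sample document\n"
    "doc_id,Sample-doc-001\n"
    "\n"
    "section_id,slug\n"
    "002,dataformat\n"
    "003,tutorial\n"
)
metadata, header, data = split_metadata_header(sample)
print(metadata)  # {'title': 'Minimal sample document', 'doc_id': 'Sample-doc-001'}
print(header)    # ['section_id', 'slug']
print(data[0])   # {'section_id': '002', 'slug': 'dataformat'}
```

Treating a row of empty cells (",,") the same as an empty line matters in practice, because spreadsheet exports often pad blank rows with delimiters.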

Example

Minimal sample data format

Using libtabular, you could "extract" the data and metadata from the sample file samples/minimal.csv with a few commands:

>>> from libtabular import fromcsvwithheader

>>> table = fromcsvwithheader('samples/minimal.csv')

>>> table.metadata
{'key1': 'value1',
 'key2': 'value2',
 'title': 'Minimal sample document',
 'description': 'This is a sample document that consists of four sections',
 'doc_id': 'Sample-doc-001',
 'comment': 'This is not part of document metadata; just a comment...'}

>>> list(table.dicts())
[{'section_id': '002',
  'slug': 'dataformat',
  'title': 'CSV files with metadata',
  'description': 'Description of the CSV-with-metadata-header data format',
  'url': 'https://github.com/rocdata/libtabular/blob/main/docs/dataformat.md'},
 {'section_id': '003',
  'slug': 'tutorial',
  'title': 'Tutorial',
  'description': 'Hands-on examples of using libtabular',
  'url': 'https://github.com/rocdata/libtabular/blob/main/docs/tutorial.md'},
 {'section_id': '004',
  'slug': 'backends',
  'title': 'Backends',
  'description': 'Description of integrations to various spreadsheets formats and APIs  ',
  'url': 'https://github.com/rocdata/libtabular/blob/main/docs/backends.md'}]


>>> table.header
('section_id', 'slug', 'title', 'description', 'url')
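The example touches three parts of the returned object: a metadata dict, a header tuple, and a dicts() method yielding one dict per row. The class below is a hypothetical stand-in with that shape, e.g. for writing tests against code that consumes such tables; it is not libtabular's actual return type.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class TableWithMetadata:
    """Hypothetical stand-in mirroring the attributes used in the example above."""
    metadata: Dict[str, str]
    header: Tuple[str, ...]
    rows: List[Tuple[str, ...]] = field(default_factory=list)

    def dicts(self) -> Iterator[Dict[str, str]]:
        # A generator: wrap it in list(...) to materialize all rows at once.
        for row in self.rows:
            yield dict(zip(self.header, row))


table = TableWithMetadata(
    metadata={"doc_id": "Sample-doc-001"},
    header=("section_id", "slug"),
    rows=[("002", "dataformat"), ("003", "tutorial")],
)
print(table.metadata["doc_id"])            # Sample-doc-001
print([d["slug"] for d in table.dicts()])  # ['dataformat', 'tutorial']
```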

Why is this needed?

Recent work on a repository of curriculum documents (see rocdata.global) requires an easy-to-use process for importing and exporting curriculum data such as:

Curriculum standards documents (Excel sheets that specify what students should be learning)
Content collections data (Excel sheets that consist of links to useful learning resources)
Content correlations data (Excel sheets that contain "links" between curriculum standards and relevant learning resources)

The spreadsheet/CSV format is a natural choice for teachers and administrators, who already have experience working with this file type. Tools that facilitate reading and writing tabular data therefore have several uses:

Curriculum bodies and ministries of education can publish curriculum standards in machine-readable formats (spreadsheets instead of PDFs).
Teachers can download standards data in easy-to-use spreadsheet formats (e.g., to use the standards for their grade level when planning lessons).
Curriculum experts and teachers can download blank templates with the appropriate headers to fill in when they need to specify standards documents or content correlations.

Related projects

csv in the Python standard library
pandas.read_csv, which offers a few more bells and whistles.
petl has a lot of functionality for loading CSV, TSV, Excel, and a bunch of other formats. The petl library also supports convenient transformation of columns. There is even a PR for integration with Google Sheets (not merged).
pyexcel is a general-purpose backend for all kinds of spreadsheet formats (csv, xlsx, ods, etc.)
For other Python spreadsheet libraries, see http://www.python-excel.org/

TODOs

Add fromxlsxwithheader for parsing Excel files
Add fromodswithheader based on pyexcel
Add minimal tests to check that all source formats produce the same data
Add a prependheader(metadata, header, data) function for exporting in this format, ideally as a generic workflow that works for any output format (CSV, ODS, XLSX).
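Since prependheader() is only planned, its behaviour is not specified yet. A minimal CSV-only sketch, assuming the layout of key/value rows followed by one blank row and then the table (an assumption, not the documented format), could look like this. Writing through the csv module rather than joining strings means metadata values containing commas or quotes are escaped correctly, and reading the result back with csv.reader is the kind of round-trip check the "same data" TODO calls for.

```python
import csv
import io
from typing import Iterable, Mapping, Sequence


def prependheader(metadata: Mapping[str, str], header: Sequence[str],
                  data: Iterable[Sequence[str]]) -> str:
    """Hypothetical CSV-only sketch of the planned prependheader() function.

    Emits an ASSUMED layout: one key,value row per metadata entry,
    a blank separator row, then the column header and the data rows.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for key, value in metadata.items():
        writer.writerow([key, value])
    writer.writerow([])  # blank row separating metadata from the table
    writer.writerow(header)
    writer.writerows(data)
    return buf.getvalue()


text = prependheader(
    {"title": "Minimal sample document", "doc_id": "Sample-doc-001"},
    ["section_id", "slug"],
    [["002", "dataformat"], ["003", "tutorial"]],
)
print(text)
```

A generic version for ODS or XLSX would need a different writer per format; only the row sequence (metadata, blank, header, data) would stay the same.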

Roadmap

Add tree-parsing logic utils? (libtree ;)
Include statements (include another CSV file as a node at the current location)
Optional enhancements for templates: add header formatting, etc. (Excel and ODS only)

Ideas

Investigate the CSVW standard and libraries, specifically their options for validation.
Investigate tablib as an alternative base to petl (although it seems less versatile).
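One possible fit for CSVW: its dialect descriptions include skipRows and headerRowCount properties, so a metadata document can tell a CSVW-aware tool to skip a leading metadata block. The helper below is a hypothetical sketch that assumes the metadata block is followed by exactly one blank row; it only builds the JSON description and performs no validation itself.

```python
import json
from typing import Any, Dict, Sequence


def csvw_metadata(url: str, header: Sequence[str], n_metadata_rows: int) -> Dict[str, Any]:
    """Hypothetical helper: describe a CSV-with-metadata-header file as CSVW metadata."""
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "url": url,
        # Skip the metadata block plus one blank separator row (ASSUMED layout),
        # then treat the next row as the column header.
        "dialect": {"skipRows": n_metadata_rows + 1, "headerRowCount": 1},
        "tableSchema": {
            "columns": [{"name": name, "titles": name} for name in header]
        },
    }


doc = csvw_metadata("example.csv", ["section_id", "slug", "title"], n_metadata_rows=2)
print(json.dumps(doc, indent=2))
```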

Release history

0.0.3 (this version): Feb 6, 2021
0.0.2: Feb 2, 2021
0.0.1: Feb 2, 2021

Download files

Source Distribution

libtabular-0.0.3.tar.gz (5.8 kB)

Uploaded Feb 6, 2021

File details

Details for the file libtabular-0.0.3.tar.gz.

File metadata

Download URL: libtabular-0.0.3.tar.gz
Upload date: Feb 6, 2021
Size: 5.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/52.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.6.9

File hashes

Hashes for libtabular-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`95368354408938613c11c25399fa237f45ae059404f39d1c8727ff8bc9f8d4d0`
MD5	`7340d8b2378f387ffefccabd815007aa`
BLAKE2b-256	`d0f3ed324819051c9dcf258474105ad2af0286e99dbcf99a8c9d8f3b970ae6cf`
