Tabulates structured data into a mergeable CSV format
OpenTabulate
OpenTabulate is a Python package designed to organize, tabulate, and process structured data. It currently aims to be a data processing framework for the Linkable Open Data Environment, an exploratory project by the Data Exploration and Integration Lab (DEIL) within the Center for Special Business Projects (CSBP) at Statistics Canada. OpenTabulate offers:
- automated data retrieval,
- a systematic way of organizing and retrieving data using source files (inspired by OpenAddresses),
- tabulation of data into a standardized CSV format that is suitable for merging and linkage,
- various methods to process data, including address parsing, cleaning and reformatting.
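To make the idea of a standardized, mergeable CSV concrete, here is a minimal sketch in plain Python. It does not use OpenTabulate's API: the column names, the `COLUMN_MAP` dictionary, and the `tabulate_rows` and `to_csv` helpers are all hypothetical, and only illustrate mapping differently named source fields onto one shared set of output columns.

```python
import csv
import io

# Hypothetical standardized output columns (not OpenTabulate's actual schema).
STANDARD_COLUMNS = ["name", "city", "postal_code"]

# Hypothetical per-source mapping: standardized column -> field name in the raw data.
COLUMN_MAP = {
    "source_a": {"name": "BusinessName", "city": "Municipality", "postal_code": "PostCode"},
    "source_b": {"name": "title", "city": "town"},  # this source has no postal code
}


def tabulate_rows(source_id, records):
    """Map raw records from one source onto the standardized columns."""
    mapping = COLUMN_MAP[source_id]
    rows = []
    for record in records:
        row = {}
        for column in STANDARD_COLUMNS:
            raw_field = mapping.get(column)
            # Columns the source cannot provide are left empty.
            value = record.get(raw_field, "") if raw_field else ""
            row[column] = value.strip()
        rows.append(row)
    return rows


def to_csv(rows):
    """Serialize standardized rows to CSV text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=STANDARD_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Because every source is written with the same header, the outputs of different sources can be concatenated or linked on shared columns without per-file special cases.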
OpenTabulate's API defines several classes and methods that, when put together, form a processing pipeline. This reduces the processing procedure to a sequence of class method invocations. Moreover, this design makes it easy to add, modify, and remove code.
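The following sketch shows one possible shape for such a pipeline. `ExamplePipeline` and its methods are hypothetical stand-ins written for illustration; they are not OpenTabulate classes, and the real API may be organized differently.

```python
class ExamplePipeline:
    """Hypothetical stand-in for a processing pipeline; not an OpenTabulate class."""

    def __init__(self, records):
        self.records = list(records)

    def clean(self):
        # Strip surrounding whitespace from every string value.
        self.records = [
            {key: value.strip() if isinstance(value, str) else value
             for key, value in record.items()}
            for record in self.records
        ]
        return self

    def drop_empty(self, key):
        # Discard records whose value for `key` is missing or empty.
        self.records = [r for r in self.records if r.get(key)]
        return self

    def reformat(self, key, func):
        # Apply `func` to the value of `key` wherever that key is present.
        self.records = [
            {**r, key: func(r[key])} if key in r else r
            for r in self.records
        ]
        return self


# Each stage is a single method call, so a stage can be added or removed
# by adding or deleting one line.
result = (
    ExamplePipeline([{"name": "  cafe one "}, {"name": "   "}])
    .clean()
    .drop_empty("name")
    .reformat("name", str.title)
    .records
)
```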
Requirements
A basic setup of the data processing software will at least require
Optional dependencies
To process sources with the full_addr key, an address parser is required. Below are the currently supported address parsers.
Installation
Be sure to have a Python package manager that can access the Python Package Index. For example, if you have pip, run
$ pip install opentabulate
After installing the package, initialize the OpenTabulate environment by running
$ opentab --initialize
which creates ~/.opentabulate and other subdirectories.
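A quick way to confirm that initialization succeeded is to check for that directory from Python. This is a minimal sketch using only the standard library; the helper names and the optional `home` argument are hypothetical conveniences, and the check looks only at ~/.opentabulate itself, not at the subdirectories.

```python
from pathlib import Path


def opentabulate_dir(home=None):
    """Return the expected environment directory, ~/.opentabulate by default."""
    # `home` is a hypothetical override, useful when the home directory differs.
    base = Path(home) if home is not None else Path.home()
    return base / ".opentabulate"


def is_initialized(home=None):
    """Report whether the environment directory exists."""
    return opentabulate_dir(home).is_dir()
```

Calling `is_initialized()` with no arguments checks the current user's home directory.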
Documentation
Please see our GitHub wiki.
Issues
You can post questions, enhancement requests, and bugs in Issues.
Hashes for opentabulate-1.0.0b1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 8f39b5f57402fe5de8c1ad2ca9418dcc637b111b180adedae2df5b7fb30a10c1 |
MD5 | 105b03a36318809c5191e332e7c1b8aa |
BLAKE2b-256 | 1dab872ba74ad12e8a30c453e6956a2ac0b3fbaa2578798069ddd1a16471fea5 |