Summary
pytablereader is a Python library that loads structured table data from files, strings, or URLs in various data formats: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.
Features
- Extract structured tabular data from various data formats:
    - CSV / Tab separated values (TSV) / Space separated values (SSV)
    - Microsoft Excel™ file
    - HTML
    - JSON
    - Line-delimited JSON (LDJSON) / NDJSON / JSON Lines
    - Markdown
    - MediaWiki
    - SQLite database file
- Supported data sources are:
    - Files on a local file system
    - Accessible URLs
    - str instances
- Loaded table data can be used as:
    - pandas.DataFrame instance
    - dict instance
Examples
Load a CSV table
- Sample Code:
    import pytablereader as ptr
    import pytablewriter as ptw

    # prepare data ---
    file_path = "sample_data.csv"
    csv_text = "\n".join([
        '"attr_a","attr_b","attr_c"',
        '1,4,"a"',
        '2,2.1,"bb"',
        '3,120.9,"ccc"',
    ])

    with open(file_path, "w") as f:
        f.write(csv_text)

    # load from a csv file ---
    loader = ptr.CsvTableFileLoader(file_path)
    for table_data in loader.load():
        print("\n".join([
            "load from file",
            "==============",
            "{:s}".format(ptw.dump_tabledata(table_data)),
        ]))

    # load from a csv text ---
    loader = ptr.CsvTableTextLoader(csv_text)
    for table_data in loader.load():
        print("\n".join([
            "load from text",
            "==============",
            "{:s}".format(ptw.dump_tabledata(table_data)),
        ]))
- Output:
    load from file
    ==============
    .. table:: sample_data

        ======  ======  ======
        attr_a  attr_b  attr_c
        ======  ======  ======
             1     4.0  a
             2     2.1  bb
             3   120.9  ccc
        ======  ======  ======

    load from text
    ==============
    .. table:: csv2

        ======  ======  ======
        attr_a  attr_b  attr_c
        ======  ======  ======
             1     4.0  a
             2     2.1  bb
             3   120.9  ccc
        ======  ======  ======
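In this output the attr_b value 4 is printed as 4.0, because the same column also holds 2.1 and 120.9. The following is a minimal standard-library sketch of that kind of column-wise type inference. It is not pytablereader's implementation (the library relies on DataProperty to extract data types), and the helper names `infer_column_type` and `convert_table` are hypothetical.

```python
import csv
import io


def infer_column_type(values):
    """Return int if every value parses as int, float if every value
    parses as float, and str otherwise."""
    for candidate in (int, float):
        try:
            for value in values:
                candidate(value)
        except ValueError:
            continue  # at least one value failed; try the next, wider type
        return candidate
    return str


def convert_table(csv_text):
    """Parse CSV text and convert every column to its inferred type."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    headers, body = rows[0], rows[1:]
    # Transpose the rows so that each column can be inspected as a whole.
    types = [infer_column_type(column) for column in zip(*body)]
    converted = [
        [convert(cell) for convert, cell in zip(types, row)]
        for row in body
    ]
    return headers, converted


csv_text = "\n".join([
    '"attr_a","attr_b","attr_c"',
    '1,4,"a"',
    '2,2.1,"bb"',
    '3,120.9,"ccc"',
])

headers, rows = convert_table(csv_text)
print(headers)
for row in rows:
    print(row)
```

With the sample data, the first row comes out as `[1, 4.0, 'a']`: attr_a stays an integer column, attr_b is widened to float, and attr_c remains text.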
Get loaded table data as pandas.DataFrame instance
- Sample Code:
    import pytablereader as ptr

    loader = ptr.CsvTableTextLoader(
        "\n".join([
            "a,b",
            "1,2",
            "3.3,4.4",
        ]))
    for table_data in loader.load():
        print(table_data.as_dataframe())
- Output:
         a    b
    0    1    2
    1  3.3  4.4
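The examples above pick the loader class by hand: `CsvTableFileLoader` for a path and `CsvTableTextLoader` for CSV text. A small wrapper could make that choice automatically. The sketch below is one possible approach, not part of pytablereader; `csv_loader_name` and `load_csv_tables` are hypothetical helpers, and the rule "an existing file path means file, anything else means text" is an assumption made for illustration.

```python
import os


def csv_loader_name(source):
    """Pick a loader class name for a str source.

    An existing file path selects the file loader; any other string
    is treated as CSV text.
    """
    if os.path.isfile(source):
        return "CsvTableFileLoader"
    return "CsvTableTextLoader"


def load_csv_tables(source):
    """Load tables from a CSV file path or from CSV text."""
    # Imported lazily so that csv_loader_name() works without the package.
    import pytablereader as ptr

    loader_class = getattr(ptr, csv_loader_name(source))
    return loader_class(source).load()


print(csv_loader_name("a,b\n1,2"))
```

With pytablereader installed, `for table_data in load_csv_tables("sample_data.csv"): ...` would iterate over the tables in the same way as the loops above. Note that a mistyped file name is silently treated as CSV text under this rule, so stricter callers may prefer to choose the loader explicitly.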
For more information
More examples are available at https://pytablereader.rtfd.io/en/latest/pages/examples/index.html
Installation
pip install pytablereader
Some formats require additional dependency packages, which you can install as follows:
- Excel
pip install pytablereader[excel]
- Google Sheets
pip install pytablereader[gs]
- MediaWiki
pip install pytablereader[mediawiki]
- SQLite
pip install pytablereader[sqlite]
- Load from URLs
pip install pytablereader[url]
- All of the extra dependencies
pip install pytablereader[all]
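To see which extras are still missing from the current environment, one option is to probe for a module that each extra is expected to provide. This is a sketch only: the module names in `EXTRA_MODULES` are assumptions that do not come from this page, so check the package metadata for the authoritative dependency list. `missing_extras` is a hypothetical helper.

```python
import importlib.util

# Assumed mapping from extra name to one module that the extra should make
# importable. These module names are illustrative guesses, not confirmed
# by the installation notes above.
EXTRA_MODULES = {
    "excel": "xlrd",
    "gs": "gspread",
    "mediawiki": "pypandoc",
    "sqlite": "simplesqlite",
    "url": "requests",
}


def missing_extras(extra_modules=EXTRA_MODULES):
    """Return the sorted names of extras whose probe module is not found."""
    return sorted(
        extra
        for extra, module in extra_modules.items()
        if importlib.util.find_spec(module) is None
    )


# Print an install command for every extra that appears to be missing.
for extra in missing_extras():
    print("pip install pytablereader[{}]".format(extra))
```

The check only looks up top-level modules and imports nothing, so it is cheap to run before choosing which extras to install.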
Dependencies
Python 2.7+ or 3.4+
Mandatory Python packages
DataProperty (Used to extract data types)
Optional Python packages
Optional packages (other than Python packages)
libxml2 (faster HTML conversion)
pandoc (required when loading MediaWiki file)
Test dependencies
Documentation