pytablereader
Summary
A Python library to load structured table data from files, strings, or URLs in various data formats: CSV / Excel / Google-Sheets / HTML / JSON / LDJSON / LTSV / Markdown / SQLite / TSV.
Features
- Extract structured tabular data from various data formats:
  - CSV / Tab separated values (TSV) / Space separated values (SSV)
  - Microsoft Excel™ file
  - HTML
  - JSON
  - Line-delimited JSON (LDJSON) / NDJSON / JSON Lines
  - Markdown
  - MediaWiki
  - SQLite database file
- Supported data sources are:
  - Files on a local file system
  - Accessible URLs
  - str instances
- Loaded table data can be used as:
  - pandas.DataFrame instance
  - dict instance
Examples
Load a CSV table
- Sample Code:
    import pytablereader as ptr
    import pytablewriter as ptw

    # prepare data ---
    file_path = "sample_data.csv"
    csv_text = "\n".join([
        '"attr_a","attr_b","attr_c"',
        '1,4,"a"',
        '2,2.1,"bb"',
        '3,120.9,"ccc"',
    ])

    with open(file_path, "w") as f:
        f.write(csv_text)

    # load from a csv file ---
    loader = ptr.CsvTableFileLoader(file_path)
    for table_data in loader.load():
        print("\n".join([
            "load from file",
            "==============",
            "{:s}".format(ptw.dump_tabledata(table_data)),
        ]))

    # load from a csv text ---
    loader = ptr.CsvTableTextLoader(csv_text)
    for table_data in loader.load():
        print("\n".join([
            "load from text",
            "==============",
            "{:s}".format(ptw.dump_tabledata(table_data)),
        ]))
- Output:
    load from file
    ==============
    .. table:: sample_data

        ======  ======  ======
        attr_a  attr_b  attr_c
        ======  ======  ======
             1     4.0  a
             2     2.1  bb
             3   120.9  ccc
        ======  ======  ======

    load from text
    ==============
    .. table:: csv2

        ======  ======  ======
        attr_a  attr_b  attr_c
        ======  ======  ======
             1     4.0  a
             2     2.1  bb
             3   120.9  ccc
        ======  ======  ======
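In the output above, the value `4` in `attr_b` is printed as `4.0` because the other values in that column are floats. The following is a minimal standard-library sketch of that kind of column-wise type promotion. It is not pytablereader's implementation (the library delegates type extraction to DataProperty); `parse_cell` and `load_csv_text` are hypothetical helper names used only for illustration, and the sketch assumes every row has as many cells as the header.

```python
import csv
import io


def parse_cell(text):
    """Return an int, a float, or the original string, trying in that order."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def load_csv_text(csv_text):
    """Parse CSV text into (headers, rows), promoting mixed int/float columns."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    rows = [[parse_cell(cell) for cell in row] for row in reader]

    # If a column is entirely numeric and contains at least one float,
    # convert every value in that column to float.
    for col in range(len(headers)):
        values = [row[col] for row in rows]
        all_numeric = all(isinstance(v, (int, float)) for v in values)
        has_float = any(isinstance(v, float) for v in values)
        if all_numeric and has_float:
            for row in rows:
                row[col] = float(row[col])
    return headers, rows


csv_text = "\n".join([
    '"attr_a","attr_b","attr_c"',
    '1,4,"a"',
    '2,2.1,"bb"',
    '3,120.9,"ccc"',
])
headers, rows = load_csv_text(csv_text)
print(headers)
for row in rows:
    print(row)
```

With the sample data, `attr_a` stays an integer column, `attr_b` becomes a float column, and `attr_c` remains text.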
Get loaded table data as pandas.DataFrame instance
- Sample Code:
    import pytablereader as ptr

    loader = ptr.CsvTableTextLoader(
        "\n".join([
            "a,b",
            "1,2",
            "3.3,4.4",
        ]))
    for table_data in loader.load():
        print(table_data.as_dataframe())
- Output:
         a    b
    0    1    2
    1  3.3  4.4
For more information
More examples are available at https://pytablereader.rtfd.io/en/latest/pages/examples/index.html
Installation
pip install pytablereader
Some formats require additional dependency packages, which you can install as follows:
- Excel
pip install pytablereader[excel]
- Google Sheets
pip install pytablereader[gs]
- MediaWiki
pip install pytablereader[mediawiki]
- SQLite
pip install pytablereader[sqlite]
- All of the extra dependencies
pip install pytablereader[all]
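To see which optional dependencies are already present before choosing an extra, a small check like the one below can help. This is a sketch under stated assumptions: the module names in `OPTIONAL_MODULES` (`xlrd`, `pypandoc`, `simplesqlite`) are guesses at what each extra provides and are not taken from this page, so adjust them to match the packages the extras actually install. `missing_modules` is a hypothetical helper, and the code requires Python 3.

```python
import importlib.util

# Assumed mapping of feature -> importable top-level module names.
# These names are illustrative assumptions, not confirmed by this document.
OPTIONAL_MODULES = {
    "excel": ["xlrd"],
    "mediawiki": ["pypandoc"],
    "sqlite": ["simplesqlite"],
    "pandas": ["pandas"],
}


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


for feature, modules in OPTIONAL_MODULES.items():
    missing = missing_modules(modules)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print("{:10s} {}".format(feature, status))
```

Any feature reported as missing can then be installed with the matching `pip install pytablereader[...]` command listed above.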
Dependencies
Python 2.7+ or 3.4+
Mandatory Python packages
DataProperty (Used to extract data types)
Optional Python packages
- Excel
- MediaWiki
- SQLite
- pandas
required to get table data as a pandas data frame
Optional packages (other than Python packages)
libxml2 (faster HTML conversion)
pandoc (required when loading MediaWiki file)
Test dependencies
Documentation
Hashes for pytablereader-0.22.2-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b8d05ee798f849fa8ad95bb61ecdaad4b43f5acada655be944332d664ce9ff1e
MD5 | 369a4cc77af531ccc7cd54c843a4bd2a
BLAKE2b-256 | b0eda0416ef497aa5ac989724f6af1cd183142401348095ab0c3ef5beb5e5697