A blazingly fast spreadsheet parser for .xlsx files
SheetReader Python Bindings
SheetReader is a fast parser for Excel spreadsheet files (.xlsx). This repository contains the Python bindings; the core library is implemented in C++.
Quickstart
SheetReader is available on PyPI and can be installed with pip:
pip install pysheetreader
After successful installation, spreadsheets can be loaded:
import pysheetreader as sr
sheet = sr.read_xlsx("my_favorite_sheet.xlsx")
To convert a spreadsheet into a pandas DataFrame:
import pysheetreader as sr
import pandas as pd
sheet = sr.read_xlsx("my_favorite_sheet.xlsx")
df = pd.DataFrame.from_dict(sheet[0])
Parameters:
| Parameter | Type | Description | Default |
|---|---|---|---|
| path | string | The path of the .xlsx file to parse. | - |
| sheet | integer or string | The sheet of the file to parse, given either as an index (starting at 1) or as a name. | 1 |
| headers | boolean | Whether to interpret the first parsed row as headers. | True |
| skip_rows | integer | How many rows to skip before parsing data. | 0 |
| skip_columns | integer | How many columns to skip before parsing data. | 0 |
| num_threads | integer | How many threads to use for parsing. Use -1 for automatic threading. | -1 |
| col_types | dict or list | How to interpret parsed data, either by name (dict) or by position (list). Types: numeric, text, logical, date, skip, guess. | None |
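The parameters above can be combined in a single call. The following is a minimal sketch rather than part of the library: `build_read_options` and `read_with_options` are hypothetical helpers introduced here for illustration, and the sketch assumes that `read_xlsx` accepts the parameter names from the table as keyword arguments. The helper only checks the options against the documented type names before they are passed on.

```python
from typing import Any, Dict, List, Optional, Union

# Column type names as listed in the parameter table.
ALLOWED_COL_TYPES = {"numeric", "text", "logical", "date", "skip", "guess"}

ColTypes = Union[Dict[str, str], List[str]]


def build_read_options(
    sheet: Union[int, str] = 1,
    headers: bool = True,
    skip_rows: int = 0,
    skip_columns: int = 0,
    num_threads: int = -1,
    col_types: Optional[ColTypes] = None,
) -> Dict[str, Any]:
    """Hypothetical helper (not part of pysheetreader): collect and check options.

    Defaults mirror the parameter table.
    """
    if skip_rows < 0 or skip_columns < 0:
        raise ValueError("skip_rows and skip_columns must be non-negative")
    if col_types is not None:
        # A dict maps column names to types; a list gives types by position.
        names = col_types.values() if isinstance(col_types, dict) else col_types
        unknown = sorted(set(names) - ALLOWED_COL_TYPES)
        if unknown:
            raise ValueError(f"unknown column types: {unknown}")
    return {
        "sheet": sheet,
        "headers": headers,
        "skip_rows": skip_rows,
        "skip_columns": skip_columns,
        "num_threads": num_threads,
        "col_types": col_types,
    }


def read_with_options(path: str, **options: Any):
    """Hypothetical wrapper around read_xlsx.

    Assumption: read_xlsx takes the table's parameter names as keywords.
    """
    import pysheetreader as sr  # imported lazily so the helpers work without it

    return sr.read_xlsx(path, **options)


# Example: read the sheet named "Sales", skip two leading rows,
# and declare the types of the first two columns by position.
options = build_read_options(
    sheet="Sales",
    skip_rows=2,
    col_types=["numeric", "text"],
)
```

With the package installed and a file on disk, `read_with_options("my_favorite_sheet.xlsx", **options)` would forward these settings to `read_xlsx`, provided the keyword assumption holds.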
Build Instructions
First clone the repository together with its submodules, which contain the sheetreader-core dependency:
git clone --recurse-submodules https://github.com/polydbms/sheetreader-python.git
To build from source, this repository provides a pyproject.toml.
The SheetReader wheel file can be generated through:
python -m build .
or installed with pip through:
pip install .
More resources
SheetReader is part of the PolyDB Project. We also provide bindings/extensions for several other environments:
- R language: Load spreadsheets into dataframes, also available via CRAN.
- PostgreSQL FDW: Foreign data wrapper for PostgreSQL that lets you register spreadsheets as foreign tables.
- DuckDB Extension: Extension for DuckDB that allows loading spreadsheets into tables. Also available as a community extension.
Paper
SheetReader was published in the journal Information Systems. Cite as:
@article{DBLP:journals/is/GavriilidisHZM23,
author = {Haralampos Gavriilidis and
Felix Henze and
Eleni Tzirita Zacharatou and
Volker Markl},
title = {SheetReader: Efficient Specialized Spreadsheet Parsing},
journal = {Inf. Syst.},
volume = {115},
pages = {102183},
year = {2023},
url = {https://doi.org/10.1016/j.is.2023.102183},
doi = {10.1016/J.IS.2023.102183},
timestamp = {Mon, 26 Jun 2023 20:54:32 +0200},
biburl = {https://dblp.org/rec/journals/is/GavriilidisHZM23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}