
Read tables from messy spreadsheets.


fuzzytable is a set of tools for extracting tabular data out of messy spreadsheets.

This library is meant for projects that rely on spreadsheet data that has been handled by many people. In such data, headers are often missing or misspelled, values are incorrectly formatted, the table sits on the wrong worksheet, or you don't know the correct spreadsheet name.

fuzzytable lets you extract that data quickly instead of laboriously checking and cleaning it ahead of time. After extraction, you can inspect the FuzzyTable attributes to determine, for example, which fields were found and how closely each desired header matches the actual header.
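To give a feel for what "how closely a header matches" can mean, here is a minimal sketch built on difflib.SequenceMatcher from the standard library. It is an illustration only: the helper names `normalize` and `header_ratio` are hypothetical, the normalization step is an assumption, and none of this is claimed to be the algorithm fuzzytable itself uses.

```python
from difflib import SequenceMatcher


def normalize(header):
    """Lowercase a header and drop spaces and underscores (an assumed cleanup step)."""
    return header.lower().replace(' ', '').replace('_', '')


def header_ratio(desired, actual):
    """Return a similarity score between 0.0 and 1.0 for two header strings."""
    return SequenceMatcher(None, normalize(desired), normalize(actual)).ratio()


# Compare each desired field name with a header as it might appear in a messy file.
exact = header_ratio('first_name', 'first name')   # identical after normalization
loose = header_ratio('last_name', 'GivenName')     # only partly similar
unrelated = header_ratio('color', 'go')            # barely similar

print(exact, loose, unrelated)
```

A threshold such as the min_ratio argument shown in the usage example below would then decide whether a partial match like the second one is accepted.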

Installation

pip install fuzzytable

Example Usage

Here's a light-hearted demo. To read this messy file using, say, the csv module, we'd have to first:

  • Delete rows 1 and 2.
  • Delete columns A and B.
  • Rename the headers.

A         B    C          D           E
These     are  not        the         droids
you       are  looking    for.        He
can       go   c o l o r  first name  GivenName
about     his  Gold       C           3PO
business  .    Blue       R2          D2
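For comparison, the manual cleanup listed above could be sketched with the csv module roughly as follows. The file contents are inlined as a string so the sketch is self-contained. Their exact comma-separated layout is an assumption based on the table, and `read_droids_manually` is a hypothetical helper, not part of fuzzytable.

```python
import csv
import io

# Assumed contents of the messy file, one line per spreadsheet row.
MESSY_CSV = """\
These,are,not,the,droids
you,are,looking,for.,He
can,go,c o l o r,first name,GivenName
about,his,Gold,C,3PO
business,.,Blue,R2,D2
"""


def read_droids_manually(text):
    rows = list(csv.reader(io.StringIO(text)))
    # Skip rows 1 and 2 (junk) and row 3 (the original, messy header row).
    data_rows = rows[3:]
    # Replace the messy headers of columns C, D and E with clean names.
    names = ['color', 'first_name', 'last_name']
    # Drop columns A and B from every data row, then pair values with the names.
    return [dict(zip(names, row[2:])) for row in data_rows]


records = read_droids_manually(MESSY_CSV)
for droid in records:
    print(f"{droid['first_name']}-{droid['last_name']} is {droid['color']}.")
```

Every step here hard-codes something about this particular file (which rows to skip, which columns to drop, what the headers should be), so it breaks as soon as the layout shifts.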

Let's instead leverage the FuzzyTable class.

>>> from fuzzytable import FuzzyTable

>>> droids = FuzzyTable(
...     path='droids.csv',
...     fields=['first_name', 'last_name', 'color'],
...     approximate_match=True,
...     min_ratio=.3
... )

Now let's play with the data we've extracted.

>>> droids['color']
['Gold', 'Blue']

>>> for droid in droids.records:
...     print(f"{droid['first_name']}-{droid['last_name']} is {droid['color']}.")
C-3PO is Gold.
R2-D2 is Blue.

>>> droids.fields['first_name'].col_num
3

>>> droids.sheet.header_row
2

Supported Formats

  • Excel (.xlsx, .xlsm, .xltx, .xltm)
  • csv (.csv)

Basically, anything that can be read by the openpyxl or csv modules.
