Read tables from messy spreadsheets.

Project description

fuzzytable is a set of tools for extracting tabular data out of messy spreadsheets.

This library is for projects that rely on spreadsheet data that has passed through many hands. Headers are often missing or misspelled, the data is incorrectly formatted, the table is on the wrong worksheet, or you don't know the correct spreadsheet name.

fuzzytable lets you extract that data quickly instead of laboriously QC'ing it ahead of time. After extraction, you can inspect the FuzzyTable attributes to determine, for example, which fields were found and how closely each desired header matches the actual header.

Installation

pip install fuzzytable

Example Usage

Here's a light-hearted demo. To read this messy file using, say, the csv module, we'd have to first:

  • Delete rows 1 and 2.
  • Delete columns A and B.
  • Rename the headers.

     A         B     C          D           E
1    These     are   not        the         droids
2    you       are   looking    for.        He
3    can       go    c o l o r  first name  GivenName
4    about     his   Gold       C           3PO
5    business  .     Blue       R2          D2
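
For comparison, the three manual steps listed above might look roughly like this with the standard csv module. This is only a sketch: the contents of droids.csv are assumed to match the sheet shown above cell for cell, and the text is read from an in-memory string so the snippet runs on its own.

```python
import csv
import io

# Assumed contents of droids.csv, one csv line per spreadsheet row.
MESSY_CSV = """\
These,are,not,the,droids
you,are,looking,for.,He
can,go,c o l o r,first name,GivenName
about,his,Gold,C,3PO
business,.,Blue,R2,D2
"""

rows = list(csv.reader(io.StringIO(MESSY_CSV)))

# Step 1: delete rows 1 and 2 (everything above the header row).
rows = rows[2:]

# Step 2: delete columns A and B.
rows = [row[2:] for row in rows]

# Step 3: rename the headers by hand.
rows[0] = ["color", "first_name", "last_name"]

# Build one dict per droid from the cleaned-up rows.
header, *data = rows
records = [dict(zip(header, values)) for values in data]

for droid in records:
    print(f"{droid['first_name']}-{droid['last_name']} is {droid['color']}.")
```

Every index and header name here is hard-coded, so the script breaks as soon as someone adds a row, moves a column, or retypes a header.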

Let's instead leverage the FuzzyTable class.

>>> from fuzzytable import FuzzyTable

>>> droids = FuzzyTable(
...     path='droids.csv',
...     fields=['first_name', 'last_name', 'color'],
...     approximate_match=True,
...     min_ratio=.3
... )
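
To give a feel for what approximate_match and min_ratio control, here is a minimal sketch of ratio-based header matching. It is not fuzzytable's implementation: it uses the standard library's difflib.SequenceMatcher as a stand-in, and the normalize and match_headers helpers are hypothetical names invented for this illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop spaces/underscores so 'c o l o r' compares equal to 'color'."""
    return text.lower().replace(" ", "").replace("_", "")


def match_headers(fields, headers, min_ratio=0.3):
    """Greedily pair each field with the most similar header not yet taken.

    Hypothetical helper for illustration only; not part of fuzzytable.
    Returns {field: (header, ratio)} for pairs scoring at least min_ratio.
    """
    # Score every field/header pair, best scores first.
    scored = sorted(
        (
            (SequenceMatcher(None, normalize(f), normalize(h)).ratio(), f, h)
            for f in fields
            for h in headers
        ),
        reverse=True,
    )
    matches = {}
    used_headers = set()
    for ratio, field, header in scored:
        if ratio < min_ratio:
            break  # all remaining pairs score even lower
        if field in matches or header in used_headers:
            continue  # each field and each header is used at most once
        matches[field] = (header, ratio)
        used_headers.add(header)
    return matches


matches = match_headers(
    fields=["first_name", "last_name", "color"],
    headers=["c o l o r", "first name", "GivenName"],
    min_ratio=0.3,
)
for field, (header, ratio) in matches.items():
    print(f"{field!r} -> {header!r} (ratio {ratio:.2f})")
```

Because each header can be claimed only once, 'first name' goes to first_name and last_name falls through to 'GivenName', which clears the 0.3 threshold only because of the shared "name" substring. A stricter min_ratio would leave last_name unmatched.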

Now let's play with the data we've extracted.

>>> droids['color']
['Gold', 'Blue']

>>> for droid in droids.records:
...     print(f"{droid['first_name']}-{droid['last_name']} is {droid['color']}.")
C-3PO is Gold.
R2-D2 is Blue.

>>> droids.fields['first_name'].col_num
3

>>> droids.sheet.header_row
2

Supported Formats

  • Excel (.xlsx, .xlsm, .xltx, .xltm)
  • csv (.csv)

Basically, anything that can be read by the openpyxl or csv modules.
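
As a rough illustration of that split, the sketch below picks a reader based on the file extension. The read_rows helper is hypothetical and is not how fuzzytable is known to work internally; it assumes openpyxl is installed when an Excel file is passed, and it reads only the workbook's active worksheet.

```python
import csv
from pathlib import Path

# Extensions from the supported-formats list above.
EXCEL_SUFFIXES = {".xlsx", ".xlsm", ".xltx", ".xltm"}


def read_rows(path):
    """Return the rows of a csv file, or of an Excel file's active worksheet.

    Hypothetical helper for illustration only; not part of fuzzytable.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix in EXCEL_SUFFIXES:
        # Third-party dependency, imported lazily so csv-only use needs no extra install.
        import openpyxl

        workbook = openpyxl.load_workbook(path, read_only=True, data_only=True)
        try:
            return [list(row) for row in workbook.active.iter_rows(values_only=True)]
        finally:
            workbook.close()

    if suffix == ".csv":
        with path.open(newline="") as f:
            return [row for row in csv.reader(f)]

    raise ValueError(f"Unsupported file type: {suffix!r}")
```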

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for fuzzytable, version 0.16
Filename                          Size      File type  Python version
fuzzytable-0.16-py3-none-any.whl  70.8 kB   Wheel      py3
fuzzytable-0.16.tar.gz            329.4 kB  Source     None
