Read tables from messy spreadsheets.

Overview

fuzzytable is a set of tools for extracting tabular data out of messy spreadsheets.

This library was developed to meet the needs of projects relying on spreadsheet data that has been handled by many people. Headers are often missing or misspelled. The data is incorrectly formatted. The table is on the wrong worksheet, or you don't know the correct spreadsheet name. And so on.

fuzzytable lets you extract that data quickly instead of arduously QC'ing it ahead of time. After extraction, you can query the FuzzyTable attributes to determine, for example, which fields were found and how closely each desired header matches the actual header.

Installation

pip install fuzzytable

Example Usage

Here's a light-hearted demo. To read this messy file using, say, the csv module, we'd have to first:

  • Delete rows 1 and 2.
  • Delete columns A and B.
  • Rename the headers.

A         B     C          D           E
These     are   not        the         droids
you       are   looking    for.        He
can       go    c o l o r  first name  GivenName
about     his   Gold       C           3PO
business  .     Blue       R2          D2
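
For comparison, the manual cleanup listed above might look roughly like the sketch below. It uses only the standard-library csv module. The `RAW` string is a hypothetical in-memory stand-in for droids.csv, with the cell boundaries assumed from the sample table, and the new header names are typed in by hand.

```python
import csv
import io

# Hypothetical stand-in for droids.csv; cell boundaries are assumed
# from the sample table, not taken from a real file.
RAW = """\
These,are,not,the,droids
you,are,looking,for.,He
can,go,c o l o r,first name,GivenName
about,his,Gold,C,3PO
business,.,Blue,R2,D2
"""

rows = list(csv.reader(io.StringIO(RAW)))

rows = rows[2:]                     # delete rows 1 and 2
rows = [row[2:] for row in rows]    # delete columns A and B
rows[0] = ["color", "first_name", "last_name"]  # rename the headers by hand

# Turn the cleaned rows into one dict per record.
header, *data = rows
records = [dict(zip(header, values)) for values in data]

for record in records:
    print(record)
```

Every step here hard-codes something about this particular file (how many rows to skip, which columns to drop, what each header should be called), so it breaks as soon as the layout shifts.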

Let's instead leverage the FuzzyTable class.

>>> from fuzzytable import FuzzyTable

>>> droids = FuzzyTable(
...     path='droids.csv',
...     fields=['first_name', 'last_name', 'color'],
...     approximate_match=True,
...     min_ratio=.3
... )
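
To get a feel for what `approximate_match` and `min_ratio` might be doing, here is a minimal sketch of approximate header matching. It is not fuzzytable's implementation, which this page does not describe. It swaps in `difflib.SequenceMatcher` from the standard library as the similarity measure, and the helper names `normalize`, `similarity`, and `match_headers` are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and keep only letters and digits."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def similarity(field, header):
    """Similarity ratio between 0.0 and 1.0 (difflib, not fuzzytable)."""
    return SequenceMatcher(None, normalize(field), normalize(header)).ratio()


def match_headers(fields, headers, min_ratio):
    """Greedily pair each field with at most one header, best scores first."""
    scored = sorted(
        ((similarity(f, h), f, h) for f in fields for h in headers),
        key=lambda item: item[0],
        reverse=True,
    )
    matches = {}
    used_headers = set()
    for ratio, field, header in scored:
        if ratio < min_ratio:
            break  # every remaining pair scores below the threshold
        if field in matches or header in used_headers:
            continue  # this field or header is already taken
        matches[field] = header
        used_headers.add(header)
    return matches


# Header row of the sample sheet, including the junk cells in columns A and B.
headers = ["can", "go", "c o l o r", "first name", "GivenName"]
matches = match_headers(["first_name", "last_name", "color"], headers, min_ratio=0.3)
print(matches)
```

Under this scoring, `last_name` resembles "first name" more than it resembles "GivenName", so the one-to-one assignment matters: the exact matches claim their headers first, and `last_name` falls back to "GivenName" only because the threshold is low enough to admit it.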

Now let's play with the data we've extracted.

>>> droids['color']
['Gold', 'Blue']

>>> for droid in droids.records:
...     print(f"{droid['first_name']}-{droid['last_name']} is {droid['color']}.")
C-3PO is Gold.
R2-D2 is Blue.

>>> droids.fields['first_name'].col_num
3

>>> droids.sheet.header_row
2
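
Judging by the sample sheet, these positions look zero-based: "first name" sits in column D, and the header row is the third row. That reading is an assumption drawn from the example, not something this page states. Under it, a small hypothetical helper (not part of fuzzytable) can translate a column index back into a spreadsheet letter:

```python
def column_letter(index):
    """Convert a zero-based column index to a spreadsheet letter (0 -> 'A')."""
    letters = ""
    index += 1  # switch to one-based for the base-26 conversion
    while index:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


# Assuming col_num is zero-based, 3 corresponds to column D.
print(column_letter(3))
```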

Supported Formats

  • Excel (.xlsx, .xlsm, .xltx, .xltm)
  • csv (.csv)

Basically, anything that can be read by the openpyxl or csv modules.
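
If you want to check a path before handing it over, the extension list above can be turned into a small guard. This is a sketch only: `reader_for` is a hypothetical helper, not part of fuzzytable, and it merely reports which module would be expected to read the file.

```python
from pathlib import Path

# Extensions taken from the list of supported formats above.
EXCEL_SUFFIXES = {".xlsx", ".xlsm", ".xltx", ".xltm"}
CSV_SUFFIXES = {".csv"}


def reader_for(path):
    """Return the name of the module expected to read this file."""
    suffix = Path(path).suffix.lower()
    if suffix in EXCEL_SUFFIXES:
        return "openpyxl"
    if suffix in CSV_SUFFIXES:
        return "csv"
    raise ValueError(f"Unsupported file extension: {suffix!r}")


print(reader_for("droids.csv"))
print(reader_for("Droids.XLSX"))
```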
