Skip to main content

A Python toolbox for dealing with some of the oddities that Excel can introduce into your dates.

Project description

ExcellAint

A Python toolbox for dealing with some of the oddities that Excel can introduce into your dates.

** Under Construction : Library may not work as described **


One of the major issues with trying to import data from Excel into Python is that Excel routinely fucks dates, due to its decision to store dates as floats and then present them in whichever format it is set to. This can lead to all sorts of downstream issues, most commonly (in my experience) the switching of month and day. For example, you might wind up with something like the following:

Date Very Important Data
... ...
01/11/2020 abc
01/12/2020 ijk
13/01/2020 pqrs
14/01/2020 xyz
... ...

The goal of excellaint is to provide some tools which aim to fix these issues.

Installation

pip install excellaint

Usage

Currently, excellaint exposes a configuration class, accessible through excellaint.config, and a single function: excellaint.parse_datetime_column

Configuration is global and aims to take as much of the hassle of setting up how data ought to be treated out of the function call. It is assumed that generally, the datetime issues excellaint addresses will be common to all excel spreadsheets in a dataset, as they are generally a function of locale.

As such, it is recommended to set up the configuration at the start of your script, like so:

import excellaint as ea
import pandas as pd

ea.config.date_sep = "/"
ea.config.datetime_sep = " "


test_file = "./test/test_data/2_digit_yr.xlsx"

test_df = pd.read_excel(test_file)

print(test_df)
          mangled_dates  row_id
0        01/01/20 12:00       0
1        01/01/20 01:00       1
2        01/01/20 02:00       2
3        01/01/20 03:00       3
4        01/01/20 04:00       4
...                 ...     ...
17539  31/12/2021 19:00   17539
17540  31/12/2021 20:00   17540
17541  31/12/2021 21:00   17541
17542  31/12/2021 22:00   17542
17543  31/12/2021 23:00   17543

[17544 rows x 2 columns]

The package should be relatively performant - future versions will be quite significantly faster as the current approach is not very sophisticated.

%%timeit 
df = ea.parse_datetime_column(test_df,"mangled_dates")
print(df)
shape: (17_544, 2)
┌────────┬─────────────────────┐
│ row_id ┆ Datetime            │
│ ---    ┆ ---                 │
│ i64    ┆ datetime[μs]        │
╞════════╪═════════════════════╡
│ 0      ┆ 2020-01-01 12:00:00 │
│ 1      ┆ 2020-01-01 01:00:00 │
│ 2      ┆ 2020-01-01 02:00:00 │
│ 3      ┆ 2020-01-01 03:00:00 │
│ 4      ┆ 2020-01-01 04:00:00 │
│ …      ┆ …                   │
│ 17539  ┆ 2021-12-31 19:00:00 │
│ 17540  ┆ 2021-12-31 20:00:00 │
│ 17541  ┆ 2021-12-31 21:00:00 │
│ 17542  ┆ 2021-12-31 22:00:00 │
│ 17543  ┆ 2021-12-31 23:00:00 │
└────────┴─────────────────────┘

63 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In the future, options to use per-call configuration will be added - for now if it needs to change multiple times in a script you will need to set it each time.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

excellaint-0.1.0.tar.gz (8.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

excellaint-0.1.0-py3-none-any.whl (8.9 kB view details)

Uploaded Python 3

File details

Details for the file excellaint-0.1.0.tar.gz.

File metadata

  • Download URL: excellaint-0.1.0.tar.gz
  • Upload date:
  • Size: 8.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for excellaint-0.1.0.tar.gz
Algorithm Hash digest
SHA256 799bb0fc2ec1a9c98790407ef64a90a797cd9502fef555858dbe0f30c851101e
MD5 b0630543ecf1c678b49277a05b9eed2f
BLAKE2b-256 dbe632ee0bd2afb8c94d071ff36814efc01a1828d644ea336ac39b5c9b1c0c9e

See more details on using hashes here.

File details

Details for the file excellaint-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: excellaint-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for excellaint-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 41e3c21e0681a8e5ca21fdb29aa68f83ca7b39fe24279e633156816c6119f561
MD5 ff0bcb570208e7c799a39a35dcd0cccd
BLAKE2b-256 171df53aae6cf8e6d516c2dc51168930418d5c5c2e084df363a67aa2adaf7751

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page