A fast and memory-efficient way to load large CSV files (time-series data) into Pandas

fast_csv_loader.py

The csv_loader function efficiently loads a portion of a large CSV file containing time-series data into a pandas DataFrame.

The function allows:

  • Loading the last N lines from the end of the file.
  • Loading the last N lines up to a specific date.
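The idea of reading only the tail of a file can be sketched as follows. This is a minimal illustration using the standard library only, not the actual csv_loader implementation. The helper name read_last_lines and its parameters are hypothetical, and it assumes the first line is a header and every row ends with a newline.

```python
import os


def read_last_lines(path, n, chunk_size=6 * 1024):
    """Return the header line plus the last n data lines of a CSV file as bytes.

    Hypothetical helper for illustration; not part of fast_csv_loader.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    with open(path, "rb") as f:
        header = f.readline()
        data_start = f.tell()  # first byte after the header line
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""

        # Walk backwards one chunk at a time. Stop once the buffer holds more
        # than n newlines (so the last n lines are complete) or once all data
        # rows have been read.
        while pos > data_start and buf.count(b"\n") <= n:
            step = min(chunk_size, pos - data_start)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf

    lines = buf.splitlines(keepends=True)
    return header + b"".join(lines[-n:])
```

The returned bytes could then be wrapped in io.BytesIO and passed to pandas.read_csv, so that only the requested rows are parsed.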

It can load any type of time-series data, whether timezone-aware or naive, daily or intraday.

It is useful for loading large datasets that may not fit entirely into memory. It also reduces execution time when iterating over or loading a large number of CSV files.

Supports Python >= 3.8

Install

pip install fast-csv-loader

Documentation

https://bennythadikaran.github.io/fast_csv_loader/

Performance

Loading a portion of a large file is significantly faster than loading the entire file into memory. The files used in the test were not particularly large. You may need to tweak the chunk_size parameter for your use case.

It is slower for smaller files, or if you're loading nearly the entire file.

I chose a 6 KB chunk size based on testing with my specific requirements. Your requirements may differ.
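As a rough aid when tuning chunk_size, the arithmetic below estimates how many chunk reads it takes to cover the requested lines. It is only a back-of-the-envelope sketch: the helper estimate_reads is hypothetical, and it assumes the loader reads fixed-size chunks from the end of the file and that you know the approximate line length of your data.

```python
import math


def estimate_reads(avg_line_bytes, n_lines, chunk_size=6 * 1024):
    """Estimate how many chunk reads are needed to cover n_lines.

    Hypothetical helper for illustration; not part of fast_csv_loader.
    """
    needed_bytes = avg_line_bytes * n_lines
    return max(1, math.ceil(needed_bytes / chunk_size))


# 160 lines of about 50 bytes each is 8000 bytes, which needs two 6 KB chunks.
print(estimate_reads(50, 160))

# Shorter lines of about 30 bytes each fit into a single 6 KB chunk.
print(estimate_reads(30, 160))
```

If the estimate is well above one or two reads, a larger chunk_size may be worth testing. If the chunk is far larger than the bytes needed, a smaller one may do.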

csv_loader vs pandas.read_csv

[Figure: Execution time - last 160 lines]

[Figure: Execution time - last 160 lines up to 1st Jan 2023]

To run this performance test:

py tests/run.py

At a minimum, the CSV file must contain a Date column and at least one other column, and each line must end with a newline character so that the file can be parsed and loaded correctly.

Date,Price\n
2023-12-01,200\n
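A quick pre-flight check of these requirements might look like the sketch below. The helper check_csv_format is hypothetical and not part of the package. It assumes the date column is literally named "Date" and that columns are comma-separated, and it only inspects the header and the final byte of the file.

```python
import os


def check_csv_format(path, date_column="Date"):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper for illustration; not part of fast_csv_loader.
    """
    problems = []

    with open(path, "rb") as f:
        # The header must name the date column and at least one other column.
        header = f.readline().decode().strip().split(",")
        if date_column not in header:
            problems.append(f"missing '{date_column}' column")
        if len(header) < 2:
            problems.append("fewer than two columns")

        # The last byte of a non-empty file should be a newline character.
        f.seek(0, os.SEEK_END)
        if f.tell() > 0:
            f.seek(-1, os.SEEK_END)
            if f.read(1) != b"\n":
                problems.append("file does not end with a newline")

    return problems
```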

Unit Test

To run the test:

py tests/test_csv_loader.py
