A fast and memory-efficient way to load large CSV files (time-series data) into pandas
fast_csv_loader.py
The csv_loader function efficiently loads a portion of a large CSV file containing time-series data into a pandas DataFrame.
The function allows:
- Loading the last N lines from the end of the file.
- Loading the last N lines from a specific date.
It handles any type of time series: timezone-aware or naive timestamps, and daily or intraday data.
It is useful for loading large datasets that may not fit entirely into memory. It also improves program execution time when iterating over or loading a large number of CSV files.
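The general idea of loading the last N lines without reading the whole file can be sketched as follows. This is a minimal illustration using only the standard library, not the library's actual implementation; the function name `tail_lines` and its parameters are hypothetical, and the 6 KB default simply mirrors the chunk size mentioned in this README.

```python
import os
from typing import List


def tail_lines(path: str, n: int, chunk_size: int = 6 * 1024) -> List[bytes]:
    """Return the header line plus the last n data lines of a CSV file.

    Hypothetical helper: reads backwards from the end of the file in
    fixed-size chunks, so only the tail of the file is held in memory.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    with open(path, "rb") as f:
        header = f.readline()      # first line holds the column names
        data_start = f.tell()      # never read back past the header
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        buf = b""

        # Prepend chunks until the buffer holds more than n newlines.
        # The extra newline guarantees the oldest kept line is complete.
        while pos > data_start and buf.count(b"\n") <= n:
            step = min(chunk_size, pos - data_start)
            pos -= step
            f.seek(pos)
            buf = f.read(step) + buf

    lines = buf.splitlines(keepends=True)
    return [header] + lines[-n:]
```

The returned lines can be joined and handed to `pandas.read_csv` through an `io.BytesIO` buffer, for example with `index_col="Date"` and `parse_dates=True`, so that only the tail of the file is ever parsed.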
Supports Python >= 3.8
Install
pip install fast-csv-loader
Documentation
https://bennythadikaran.github.io/fast_csv_loader/
Performance
Loading a portion of a large file is significantly faster than loading the entire file into memory. The files used in the test were not particularly large. You may need to tweak the chunk_size parameter for your use case.
It is slower for small files, or when loading nearly the entire file.
I chose a 6 KB chunk size based on testing with my specific requirements. Your requirements may differ.
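One rough way to reason about chunk_size is to estimate how many reads are needed to cover the lines you want. The sketch below is only back-of-the-envelope arithmetic under the assumption that the loader reads fixed-size chunks from the end of the file; `estimate_reads` and the line counts and line lengths used here are hypothetical, not measurements from this project.

```python
import math


def estimate_reads(n_lines: int, avg_line_bytes: int, chunk_size: int = 6 * 1024) -> int:
    """Estimate how many chunk reads cover n_lines of avg_line_bytes each."""
    return math.ceil(n_lines * avg_line_bytes / chunk_size)


# Hypothetical example: 160 lines of roughly 60 bytes each (9600 bytes total).
print(estimate_reads(160, 60))                   # 2 reads with a 6 KB chunk
print(estimate_reads(160, 60, chunk_size=1024))  # 10 reads with a 1 KB chunk
```

A chunk that is too small means many seeks and reads; one that is too large reads far more of the file than is needed, which matters most when looping over many files.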
csv_loader vs pandas.read_csv
To run this performance test:
py tests/run.py
At minimum, the CSV file must contain a Date column and at least one other column, and each line must end with a newline character, for the file to parse and load correctly.
Date,Price\n
2023-12-01,200\n
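A quick pre-flight check for this layout might look like the sketch below. It is an illustration only and not part of the library; `check_csv_layout` is a hypothetical helper, and the column name "Date" is taken from the example above.

```python
import os


def check_csv_layout(path: str, date_column: str = "Date") -> None:
    """Raise ValueError if the file does not match the minimum layout."""
    with open(path, "rb") as f:
        # The first line must name the date column and at least one other column.
        header = f.readline().decode().rstrip("\r\n").split(",")
        if date_column not in header:
            raise ValueError(f"missing {date_column!r} column in header")
        if len(header) < 2:
            raise ValueError("expected at least one column besides the date")

        # The file must end with a newline character.
        f.seek(-1, os.SEEK_END)
        if f.read(1) != b"\n":
            raise ValueError("file does not end with a newline")
```

Running a check like this once, when files are written or downloaded, is cheaper than discovering a malformed file in the middle of a batch load.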
Unit Test
To run the test:
py tests/test_csv_loader.py