A small library for taking the transpose of arbitrarily large .csvs
transposecsv: A small Python library to transpose large csv files that can't fit in memory.
Suppose you have a p x m matrix, where your original data is m samples with p features, or equivalently m points in p-dimensional space. We want the columns to correspond to the features, that is, we'd like to work with the m x p data matrix. This small library performs that transposition on arbitrarily large csv files.
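As a tiny in-memory illustration of the p x m to m x p change (toy data and standard-library calls only; this is not the library's own code), consider p = 2 features stored as rows and m = 3 samples stored as columns:

```python
import csv
import io

# Hypothetical toy data: p = 2 features (rows) by m = 3 samples (columns).
featurewise = "1,2,3\n4,5,6\n"

rows = list(csv.reader(io.StringIO(featurewise)))  # 2 rows, 3 columns
transposed = [list(col) for col in zip(*rows)]     # 3 rows, 2 columns

# Serialize the transposed rows back to csv text.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(transposed)
result = out.getvalue()
print(result)
```

After the transpose, each row is one sample and each column is one feature. This approach needs the whole file in memory, which is exactly what the library is meant to avoid for large inputs.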
It works in the following way:
- Read in chunks that fit in memory
- Transpose those in memory (which is fast)
- Write each transposed chunk to a .csv file
- Use paste to join the files horizontally (columnwise). This is why we don't need to save the index, since it will be the same as the columns of the original file.
This process outputs the m x p matrix, as desired. This is particularly useful for single-cell data, where expression matrices are often uploaded genewise, but you may want to work with machine learning models that learn cellwise :).
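The steps above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's implementation: the function name transpose_in_chunks and its parameters are hypothetical, the input is assumed to be rectangular with no newlines inside quoted fields, and the horizontal join is done in Python line by line rather than by calling paste.

```python
import contextlib
import csv
import os
import tempfile


def transpose_in_chunks(src, dst, chunksize=2, sep=","):
    """Transpose the csv at `src` into `dst`, holding `chunksize` rows at a time.

    Hypothetical helper for illustration; not part of transposecsv.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        chunk_paths = []

        def flush(chunk):
            # Transpose one chunk in memory and write it to its own file.
            path = os.path.join(tmpdir, f"chunk_{len(chunk_paths)}.csv")
            with open(path, "w", newline="") as f:
                writer = csv.writer(f, delimiter=sep, lineterminator="\n")
                writer.writerows(zip(*chunk))
            chunk_paths.append(path)

        # Read the source a few rows at a time so only one chunk is in memory.
        with open(src, newline="") as f:
            chunk = []
            for row in csv.reader(f, delimiter=sep):
                chunk.append(row)
                if len(chunk) == chunksize:
                    flush(chunk)
                    chunk = []
            if chunk:
                flush(chunk)

        # Join the chunk files horizontally, one output line at a time.
        # Every chunk file has one line per column of the original file.
        with contextlib.ExitStack() as stack:
            files = [stack.enter_context(open(p, newline="")) for p in chunk_paths]
            with open(dst, "w", newline="") as out:
                for lines in zip(*files):
                    out.write(sep.join(line.rstrip("\n") for line in lines) + "\n")


# Usage on a toy 5 x 3 file with a chunk size of 2 rows.
with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "genewise.csv")
    dst = os.path.join(workdir, "cellwise.csv")
    with open(src, "w", newline="") as f:
        f.write("1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15\n")
    transpose_in_chunks(src, dst, chunksize=2)
    with open(dst) as f:
        transposed_text = f.read()

print(transposed_text)
```

Because each transposed chunk has one line per original column, the chunk files line up row for row, so joining them side by side reproduces the full transpose without ever storing a row index.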
Installation
To install, run pip install transposecsv
How to use
The transpose operation is contained in a lazily-loaded Transpose class, so nothing is computed on initialization; the work starts when you call compute(). For example:
from transposecsv import Transpose

transpose = Transpose(
    file_name='massive_dataset.csv',
    write_path='massive_dataset_T.csv',
    chunksize=400,  # Number of rows to read in at each iteration
    # The remaining options can be left at their defaults:
    # insep=',',
    # outsep=',',
    # save_chunks=False,
    # quiet=False,
)

# Nothing has been read yet; this call runs the transpose.
transpose.compute()
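To make the lazy behaviour concrete, here is a generic sketch of the pattern. LazyTranspose is a hypothetical stand-in written for this illustration; it is not the library's Transpose class and makes no claim about how that class is implemented internally.

```python
class LazyTranspose:
    """Hypothetical stand-in showing the lazy pattern; not the real Transpose."""

    def __init__(self, file_name, write_path, chunksize=400):
        # Only record the configuration; no file is opened here.
        self.file_name = file_name
        self.write_path = write_path
        self.chunksize = chunksize
        self.computed = False

    def compute(self):
        # The expensive work would happen here, on explicit request.
        self.computed = True
        return self


job = LazyTranspose("massive_dataset.csv", "massive_dataset_T.csv")
before = job.computed   # constructing the object did no work
job.compute()
after = job.computed    # work happens only once compute() is called
print(before, after)
```

Separating configuration from execution means an object can be created cheaply and the long-running step triggered only when you are ready for it.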
Then, to upload the result to S3, run:

transpose.upload(
    bucket='braingeneersdev',
    endpoint_url='https://s3.nautilus.optiputer.net',
    aws_secret_key_id=secret,
    aws_secret_access_key=access,
    remote_name='jlehrer/massive_dataset_T.csv'
)