A tool for managing large datasets
# shapeshifter Python Module
The official repository for the shapeshifter Python module, which allows for:
* Transforming tabular data sets from one format to another.
* Querying large data sets to filter out useful data.
* Selecting additional columns/features to include in the resulting data set.
* Merging data sets of various formats into a single file.
* Gzipping resulting data sets and reading gzipped input files.
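The merging and gzip features listed above can be illustrated with a minimal, standard-library-only sketch. It is not shapeshifter's API or implementation. It assumes two small hypothetical data sets (one CSV, one JSON) that share a `Sample` ID column, inner-joins them on that column, writes the result as a gzip-compressed TSV, and reads it back. The helpers `merge_on_key`, `write_gzipped_tsv`, and `read_gzipped_tsv` are hypothetical names used only for illustration.

```python
import csv
import gzip
import io
import json
import os
import tempfile

# Hypothetical inputs: a CSV data set and a JSON data set sharing a "Sample" ID column.
csv_text = "Sample,Age\nS1,45\nS2,38\n"
json_text = '[{"Sample": "S1", "Sex": "M"}, {"Sample": "S2", "Sex": "F"}]'


def merge_on_key(left, right, key):
    """Inner-join two lists of row dicts on a shared key column (hypothetical helper)."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


def write_gzipped_tsv(rows, path):
    """Write row dicts to a gzip-compressed TSV file (hypothetical helper)."""
    with gzip.open(path, "wt", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_gzipped_tsv(path):
    """Read a gzip-compressed TSV file back into a list of row dicts (hypothetical helper)."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


left = list(csv.DictReader(io.StringIO(csv_text)))
right = json.loads(json_text)
merged = merge_on_key(left, right, "Sample")

# Write to a temporary directory so the example leaves no files behind in the working directory.
out_path = os.path.join(tempfile.mkdtemp(), "merged.tsv.gz")
write_gzipped_tsv(merged, out_path)
print(read_gzipped_tsv(out_path))
```

All values read back from the TSV are strings; a real converter would also need to restore column types.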
See the [shapeshifter command-line tool](https://github.com/srp33/ShapeShifter-CLI), which combines the features of
shapeshifter with the ease and speed of the command line!
Basic use is described below; see the full documentation on [Read the Docs](https://shapeshifter.readthedocs.io/en/latest/).
## Install
`pip3 install shapeshifter`
## Basic Use
After installing, import the ShapeShifter class with `from shapeshifter import ShapeShifter`. A ShapeShifter object
represents the file to be transformed, and the transformation is performed with its `export_filter_results` method.
Here is a simple example in which a file called `input_file.tsv` is transformed into an HDF5 file called
`output_file.h5`, while filtering the data on sex and age:
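```python
from shapeshifter import ShapeShifter

# Create a ShapeShifter object for the file to be transformed
my_shapeshifter = ShapeShifter("input_file.tsv")
# Export rows with Sex == 'M' and Age > 40 to HDF5; file types are inferred from the extensions
my_shapeshifter.export_filter_results("output_file.h5", filters="Sex == 'M' and Age > 40")
```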
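The filter string in this example reads like a Python boolean expression over column names. As a rough illustration of which rows it selects (not of how shapeshifter actually parses or evaluates filters), the sketch below applies the same condition in plain Python to a small hypothetical stand-in for `input_file.tsv`. Values read from a TSV are strings, so `Age` has to be converted to a number before the comparison.

```python
import csv
import io

# Hypothetical stand-in for input_file.tsv.
tsv_text = "Sample\tSex\tAge\nS1\tM\t45\nS2\tF\t52\nS3\tM\t38\n"


def filter_rows(rows):
    """Keep rows matching the example filter: Sex == 'M' and Age > 40."""
    # Convert Age from its string form before comparing numerically.
    return [row for row in rows if row["Sex"] == "M" and int(row["Age"]) > 40]


rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
print([row["Sample"] for row in filter_rows(rows)])  # → ['S1']
```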
Note that the input and output file types were not stated explicitly; shapeshifter inferred them from the file
extensions. If necessary, they can be specified explicitly with `input_file_type` and `output_file_type`.
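Extension-based inference can be pictured as a lookup table with an explicit override. The sketch below is hypothetical: the `EXTENSION_TYPES` mapping, the type names, and the `infer_file_type` helper are assumptions for illustration, not shapeshifter's actual rules. It strips a trailing `.gz` before looking at the extension, mirroring the ability to work with gzipped files, and lets an explicitly passed type win, in the spirit of `input_file_type` and `output_file_type`.

```python
import os

# Hypothetical extension-to-type table; shapeshifter's real mapping and type names may differ.
EXTENSION_TYPES = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".json": "json",
    ".xlsx": "excel",
    ".h5": "hdf5",
    ".parquet": "parquet",
    ".msgpack": "msgpack",
    ".dta": "stata",
    ".pkl": "pickle",
    ".db": "sqlite",
    ".arff": "arff",
    ".gct": "gct",
}


def infer_file_type(path, explicit_type=None):
    """Return (file_type, is_gzipped) for path; an explicitly given type always wins."""
    is_gzipped = path.lower().endswith(".gz")
    # Drop the ".gz" suffix so "data.csv.gz" is recognized as CSV.
    stem = path[:-3] if is_gzipped else path
    if explicit_type is not None:
        return explicit_type, is_gzipped
    extension = os.path.splitext(stem)[1].lower()
    if extension not in EXTENSION_TYPES:
        raise ValueError(f"Cannot infer file type from {path!r}; pass it explicitly.")
    return EXTENSION_TYPES[extension], is_gzipped


print(infer_file_type("input_file.tsv"))   # → ('tsv', False)
print(infer_file_type("output_file.h5"))   # → ('hdf5', False)
print(infer_file_type("data.csv.gz"))      # → ('csv', True)
print(infer_file_type("data.txt", "tsv"))  # → ('tsv', False)
```

Raising an error for unknown extensions, rather than guessing, is one reasonable design choice: it pushes the caller to state the type explicitly when the extension is ambiguous.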
## Contributing
We welcome contributions that help make shapeshifter compatible with additional file formats. If you are
interested in contributing, please follow the instructions on the [project wiki](https://github.com/srp33/ShapeShifter/wiki).
## Currently Supported Formats
#### Input Formats:
* CSV
* TSV
* JSON
* Excel
* HDF5
* Parquet
* MsgPack
* Stata
* Pickle
* SQLite
* ARFF
* GCT
* Kallisto
* GEO
#### Output Formats:
* CSV
* TSV
* JSON
* Excel
* HDF5
* Parquet
* MsgPack
* Stata
* Pickle
* SQLite
* ARFF
* GCT
* RMarkdown
* JupyterNotebook
## Future Formats to Support
We are working hard to expand ShapeShifter to work with even more file formats! Expect the following formats to be
included in future releases:
* Fixed-width files (fwf)
* Genomic Data Commons clinical XML