Data processing module implemented with numpy

carefree-data

carefree-data implements a data processing module with numpy.

Update 2021.02.04

carefree-data now uses datatable as its backend, which significantly improves performance when reading data from files!

Why carefree-data?

carefree-data is a data processing module capable of handling 'dirty' and 'messy' datasets.

For tabular datasets, carefree-data is able to:

- Elegantly deal with data pre-processing:
  - A Recognizer recognizes whether a column is STRING, NUMERICAL or CATEGORICAL.
  - A Converter converts a column into a friendly format (["one", "two"] -> [0, 1]).
  - A Processor further processes columns (OneHot, Normalize, MinMax, ...).
  - All of these transforms can be inverted! (See test_recover_labels and test_recover_features in tests/unittests/test_tabular.py.)
  - All of these procedures run AUTOMATICALLY!
- Handle datasets saved in files (.txt, .csv):
  - For .txt files, " " is the default delimiter.
  - For .csv files, "," is the default delimiter, and the first row is skipped by default.
  - The delimiter, the label index, and whether to skip the first row can all be set manually.
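To make the Recognizer → Converter → Processor idea and its inverse transforms more concrete, here is a minimal pure-Python sketch. It is not carefree-data's implementation or API: the names `recognize`, `Converter` and `MinMax`, and the recognition heuristic (at most 10 distinct values means CATEGORICAL), are assumptions made for illustration only.

```python
from typing import List, Sequence

# Column types named after the ones the Recognizer distinguishes (illustrative only).
STRING, NUMERICAL, CATEGORICAL = "STRING", "NUMERICAL", "CATEGORICAL"


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def recognize(column: Sequence[str], max_categories: int = 10) -> str:
    """Guess a column's type with a simple, assumed heuristic."""
    n_unique = len(set(column))
    if all(_is_number(v) for v in column):
        # Integer-valued columns with few distinct values are treated as categorical.
        if n_unique <= max_categories and all(float(v).is_integer() for v in column):
            return CATEGORICAL
        return NUMERICAL
    # Non-numeric columns: few distinct values -> categorical, otherwise free text.
    return CATEGORICAL if n_unique <= max_categories else STRING


class Converter:
    """Map distinct values to integer codes, e.g. ["one", "two"] -> [0, 1]."""

    def fit(self, column: Sequence[str]) -> "Converter":
        self.values = sorted(set(column))  # sorted so the mapping is deterministic
        self.codes = {v: i for i, v in enumerate(self.values)}
        return self

    def transform(self, column: Sequence[str]) -> List[int]:
        return [self.codes[v] for v in column]

    def inverse(self, codes: Sequence[int]) -> List[str]:
        return [self.values[c] for c in codes]


class MinMax:
    """Scale numbers into [0, 1] and map them back."""

    def fit(self, column: Sequence[float]) -> "MinMax":
        self.low = min(column)
        # A constant column has zero range; fall back to 1.0 to avoid dividing by zero.
        self.span = (max(column) - self.low) or 1.0
        return self

    def transform(self, column: Sequence[float]) -> List[float]:
        return [(x - self.low) / self.span for x in column]

    def inverse(self, scaled: Sequence[float]) -> List[float]:
        return [x * self.span + self.low for x in scaled]


# A categorical column is encoded and then recovered by the inverse transform.
raw = ["one", "two", "two", "one"]
converter = Converter().fit(raw)
codes = converter.transform(raw)        # [0, 1, 1, 0]
restored = converter.inverse(codes)     # ["one", "two", "two", "one"]

# A numerical column is min-max scaled.
column = ["1.5", "2.0", "3.5"]
kind = recognize(column)                # "NUMERICAL": the values are not integers
numbers = [float(v) for v in column]
scaler = MinMax().fit(numbers)
scaled = scaler.transform(numbers)      # [0.0, 0.25, 1.0]
```

OneHot or Normalize processors would follow the same fit / transform / inverse pattern; keeping the fitted statistics on the object is what makes the inverse possible.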

Pandas-free

There is one more thing we'd like to mention: carefree-data is 'Pandas-free'. Pandas is an open source library that provides easy-to-use data structures for structured datasets. Although it is widely used in almost every popular Machine Learning and Deep Learning library, we decided to move away from it, for the following reasons:

- carefree-data wants full control over the data, and Pandas is not flexible enough.
- carefree-data needs higher performance. Pandas is fast, but not as fast as pure numpy (and sometimes cython) code on some critical code paths.
- Pandas provides many powerful functions, but carefree-data doesn't need that many, which makes Pandas a little 'heavy' for carefree-data.

In short, Pandas is a more general library, which is why we've written our own code to cover our needs instead of using it directly.

Currently, carefree-data only supports tabular datasets.

Installation

carefree-data requires Python 3.8 or higher.

pip install carefree-data

or, to install from source:

git clone https://github.com/carefree0910/carefree-data.git
cd carefree-data
pip install -e .

Basic Usage

Get scikit-learn datasets

from cfdata.tabular import TabularDataset

# load the Iris dataset (one of the scikit-learn datasets)
iris = TabularDataset.iris()

Read from array / dataset

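from cfdata.tabular import TabularData, TabularDataset

iris = TabularDataset.iris()
x, y = iris.xy  # features and labels of the dataset
# reading the raw arrays should give the same result as reading the dataset itself
assert TabularData().read(x, y) == TabularData.from_dataset(iris)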

Read from file

from cfdata.tabular import TabularData

file = "/path/to/your/file"
data = TabularData().read(file)
# the data processed during `read` should match transforming the same file again
assert data.processed == data.transform(file)
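The file-reading defaults described earlier (a space delimiter for .txt files; a comma delimiter and a skipped header row for .csv files) can be sketched in plain Python as follows. This only illustrates those rules and is not how `TabularData().read` works internally: `read_table`, its parameters, and using the last column as the default label are hypothetical choices.

```python
import os
from typing import List, Optional, Tuple


def read_table(
    path: str,
    delimiter: Optional[str] = None,
    skip_first: Optional[bool] = None,
    label_index: int = -1,
) -> Tuple[List[List[str]], List[str]]:
    """Split a delimited text file into feature rows and labels (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    # Defaults: "," plus a skipped header for .csv, " " otherwise (including .txt).
    if delimiter is None:
        delimiter = "," if ext == ".csv" else " "
    if skip_first is None:
        skip_first = ext == ".csv"
    with open(path, encoding="utf-8") as f:
        # Drop blank lines, e.g. a trailing newline at the end of the file.
        lines = [line.strip() for line in f if line.strip()]
    if skip_first:
        lines = lines[1:]
    x: List[List[str]] = []
    y: List[str] = []
    for line in lines:
        # Assumption: runs of spaces count as a single delimiter in space-separated files.
        fields = line.split() if delimiter == " " else line.split(delimiter)
        y.append(fields.pop(label_index))  # assumed default: label in the last column
        x.append(fields)
    return x, y
```

Passing `delimiter`, `skip_first` or `label_index` explicitly overrides the extension-based defaults, mirroring the manual settings mentioned above.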

License

carefree-data is MIT licensed, as found in the LICENSE file.

Release history

- 0.2.9 (this version): Aug 16, 2022
- 0.2.8: Jun 16, 2022
- 0.2.7: Jun 15, 2022
- 0.2.6.1: May 1, 2022
- 0.2.5: Jul 4, 2021
- 0.2.4.2: Mar 7, 2021
- 0.2.3: Feb 4, 2021
- 0.2.2: Dec 12, 2020
- 0.2.1: Nov 22, 2020
- 0.1.7: Sep 21, 2020
- 0.1.6: Aug 25, 2020
- 0.1.5: Aug 2, 2020
- 0.1.4: Jul 28, 2020
- 0.1.3: Jun 28, 2020
- 0.1.2: Jun 8, 2020
- 0.1.1: Jun 4, 2020

Download files

Source Distribution

carefree-data-0.2.9.tar.gz (35.5 kB, source), uploaded Aug 16, 2022

File details

Details for the file carefree-data-0.2.9.tar.gz.

File metadata

Download URL: carefree-data-0.2.9.tar.gz
Upload date: Aug 16, 2022
Size: 35.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.13

File hashes

Hashes for carefree-data-0.2.9.tar.gz
Algorithm	Hash digest
SHA256	`28a24125b6efedd10eeab466a3bb65833046835798db23c44ab177eb8df7e79e`
MD5	`812c539ad338d13fcf1b77317f8e75e1`
BLAKE2b-256	`7ea4f518261e4b61d105dd22db20e45dbb9935fbf33c682645bc4f75bb62da04`
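A downloaded archive can be checked against the SHA256 digest listed above using the standard library's `hashlib`. The helper names below (`sha256_of`, `verify_sha256`) are illustrative; the expected digest is the one published for carefree-data-0.2.9.tar.gz.

```python
import hashlib

# SHA256 digest published for carefree-data-0.2.9.tar.gz.
EXPECTED_SHA256 = "28a24125b6efedd10eeab466a3bb65833046835798db23c44ab177eb8df7e79e"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For an intact download, `verify_sha256("carefree-data-0.2.9.tar.gz")` should return True. pip can perform the same check in hash-checking mode when a requirements file pins `carefree-data==0.2.9 --hash=sha256:<digest>`.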
