Data type system for different data structures (arrays, lists of dictionaries, etc.).
DataTypeSystem
This Python package provides a type system for different data structures that are coercible to full arrays. It is a Python translation of the code of the Raku package "Data::TypeSystem", [AAp1].
Installation
Install from GitHub
pip install -e git+https://github.com/antononcube/Python-packages.git#egg=DataTypeSystem-antononcube\&subdirectory=DataTypeSystem
From PyPI
pip install DataTypeSystem
Usage examples
The type system conventions follow those of Mathematica's Dataset; see the presentation "Dataset improvements".
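To make the type constructor names in the outputs concrete, here is a minimal, simplified sketch of recursive type deduction in plain Python. It is not the DataTypeSystem implementation. The class names mirror the printed types (`Atom`, `Vector`, `Tuple`, `Struct`, `Assoc`), but the rules are assumptions: a mapping with uniform key and value types becomes an `Assoc`, otherwise a `Struct` with sorted keys; an iterable with one element type becomes a `Vector`, otherwise a `Tuple`. The function name `sketch_deduce_type` is hypothetical, and its results need not match the package's output exactly.

```python
from collections.abc import Iterable, Mapping
from dataclasses import dataclass
from typing import Any


# Hypothetical type constructors named after the types printed by deduce_type;
# an illustrative sketch, not the DataTypeSystem classes.
@dataclass(frozen=True)
class Atom:
    py_type: type  # a scalar's Python type


@dataclass(frozen=True)
class Vector:
    elem: Any  # the single type shared by all elements
    count: int


@dataclass(frozen=True)
class Tuple:
    elems: tuple  # one type per position


@dataclass(frozen=True)
class Struct:
    keys: tuple  # field names in sorted order
    types: tuple  # field types aligned with keys


@dataclass(frozen=True)
class Assoc:
    key: Any
    value: Any
    count: int


def sketch_deduce_type(data: Any) -> Any:
    """Simplified recursive type deduction under assumed rules."""
    if isinstance(data, Mapping):
        key_types = {type(k) for k in data}
        value_types = [sketch_deduce_type(v) for v in data.values()]
        if len(key_types) == 1 and len(set(value_types)) == 1:
            # Uniform keys and values: a homogeneous association.
            return Assoc(Atom(next(iter(key_types))), value_types[0], len(data))
        # Otherwise: a struct whose fields are listed in sorted key order.
        keys = tuple(sorted(data, key=str))
        return Struct(keys, tuple(sketch_deduce_type(data[k]) for k in keys))
    if isinstance(data, Iterable) and not isinstance(data, (str, bytes)):
        elem_types = [sketch_deduce_type(v) for v in data]
        if len(set(elem_types)) == 1:
            # One element type: a homogeneous vector.
            return Vector(elem_types[0], len(elem_types))
        return Tuple(tuple(elem_types))
    return Atom(type(data))


record = {"sex": "male", "age": 22.0, "class": 3, "survived": 0}
print(sketch_deduce_type(record))           # a Struct with keys ('age', 'class', 'sex', 'survived')
print(sketch_deduce_type(record.values()))  # a Tuple of str, float, int, int atoms
print(sketch_deduce_type([record] * 3))     # a Vector of that Struct, with count 3
```

Note that applied to the whole column-oriented dataset, these simplified rules would produce a `Struct` of `Assoc`s (the columns hold different value types), whereas the package reports an `Assoc`; the package evidently unifies types differently.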
Here we get the Titanic dataset, keep the "sex", "age", "pclass", and "survived" columns, rename "pclass" to "class", and show the dataset's dimensions:
import pandas
dfTitanic = pandas.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
dfTitanic = dfTitanic[["sex", "age", "pclass", "survived"]]
dfTitanic = dfTitanic.rename(columns={"pclass": "class"})
dfTitanic.shape
(891, 4)
Here is a sample of the dataset's records:
from DataTypeSystem import *
dfTitanic.sample(3)
| | sex | age | class | survived |
|---|---|---|---|---|
| 555 | male | 62.0 | 1 | 0 |
| 278 | male | 7.0 | 3 | 0 |
| 266 | male | 16.0 | 3 | 0 |
Here is the type of a single record:
deduce_type(dfTitanic.iloc[12].to_dict())
Struct([age, class, sex, survived], [float, int, str, int])
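The `Struct` pairs each key with the type of its value and lists the keys alphabetically rather than in column order. As a minimal sketch, assuming two records have the same `Struct` exactly when their sorted (key, type) pairs match, such a signature can flag records whose types differ from the majority. The helpers `struct_signature` and `deviating_records` are hypothetical, not DataTypeSystem functions, and the records are illustrative.

```python
from collections import Counter


def struct_signature(record: dict) -> tuple:
    """Sorted (key, type name) pairs: a hypothetical stand-in for a Struct type."""
    return tuple((key, type(record[key]).__name__) for key in sorted(record))


def deviating_records(records: list) -> list:
    """Indices of records whose signature differs from the most common one."""
    signatures = [struct_signature(r) for r in records]
    majority, _ = Counter(signatures).most_common(1)[0]
    return [i for i, sig in enumerate(signatures) if sig != majority]


records = [
    {"sex": "male", "age": 22.0, "class": 3, "survived": 0},
    {"sex": "female", "age": 38.0, "class": 1, "survived": 1},
    {"sex": "female", "age": "unknown", "class": 3, "survived": 1},  # "age" is a str here
]
print(struct_signature(records[0]))
print(deviating_records(records))  # [2]: only the third record's "age" is a str
```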
Here is the type of a single record's values:
deduce_type(dfTitanic.iloc[12].to_dict().values())
Tuple([Atom(<class 'str'>), Atom(<class 'float'>), Atom(<class 'int'>), Atom(<class 'int'>)])
Here is the type of the whole dataset:
deduce_type(dfTitanic.to_dict())
Assoc(Atom(<class 'str'>), Assoc(Atom(<class 'int'>), Atom(<class 'str'>), 891), 4)
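With its default `orient="dict"`, pandas' `DataFrame.to_dict()` returns a nested dictionary `{column -> {index -> value}}`, which is why the outer `Assoc` has 4 string keys and each inner `Assoc` has 891 integer keys. The sketch below mirrors that layout with a small hand-written dictionary (three illustrative rows instead of 891) and reports the key types and counts at both levels; `assoc_shape` is a hypothetical helper, not a DataTypeSystem function.

```python
def assoc_shape(nested: dict) -> dict:
    """Summarize key types and counts of a {column -> {index -> value}} dictionary."""
    outer_key_types = {type(k).__name__ for k in nested}
    inner_key_types = {type(k).__name__ for col in nested.values() for k in col}
    inner_counts = {len(col) for col in nested.values()}
    return {
        "outer_key_types": sorted(outer_key_types),
        "outer_count": len(nested),
        "inner_key_types": sorted(inner_key_types),
        "inner_counts": sorted(inner_counts),
    }


# Same layout as DataFrame.to_dict(), but with shortened illustrative columns.
columns = {
    "sex": {0: "male", 1: "female", 2: "female"},
    "age": {0: 22.0, 1: 38.0, 2: 26.0},
    "class": {0: 3, 1: 1, 2: 3},
    "survived": {0: 0, 1: 1, 2: 1},
}
print(assoc_shape(columns))
```

A single value in `inner_counts` means every column has the same number of rows, matching the uniform count (891) reported for the inner `Assoc`.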
Here is the type of the "values only" records, i.e. the row records with their row-index keys dropped:
valArr = dfTitanic.transpose().to_dict().values()
deduce_type(valArr)
Vector(Struct([age, class, sex, survived], [float, int, str, int]), 891)
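`dfTitanic.transpose().to_dict()` maps each row index to that row's `{column -> value}` dictionary, so taking `.values()` drops the row-index keys and leaves a sequence of records; in pandas, `dfTitanic.to_dict(orient="records")` yields the same records as a list. The pure-Python sketch below performs that transposition on a column-oriented dictionary and checks that every record has the same keys, a precondition for a `Vector` of a single `Struct`. The name `columns_to_records` is hypothetical, and the data is illustrative.

```python
def columns_to_records(columns: dict) -> list:
    """Turn {column -> {index -> value}} into one {column -> value} dict per index."""
    # Follow the index order of the first column (all columns share the same index).
    first = next(iter(columns.values()), {})
    return [{col: values[idx] for col, values in columns.items()} for idx in first]


columns = {
    "sex": {0: "male", 1: "female", 2: "female"},
    "age": {0: 22.0, 1: 38.0, 2: 26.0},
    "class": {0: 3, 1: 1, 2: 3},
    "survived": {0: 0, 1: 1, 2: 1},
}
records = columns_to_records(columns)
print(len(records), records[0])
print(all(r.keys() == records[0].keys() for r in records))
```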
References
[AAp1] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.