
Data type system for different data structures (arrays, lists of dictionaries, etc.).


DataTypeSystem

This Python package provides a type system for different data structures that are coercible to full arrays. It is a Python translation of the code of the Raku package "Data::TypeSystem", [AAp1].


Installation

Install from GitHub

pip install -e git+https://github.com/antononcube/Python-packages.git#egg=DataTypeSystem-antononcube\&subdirectory=DataTypeSystem

Install from PyPI

pip install DataTypeSystem

Usage examples

The type system conventions follow those of Mathematica's Dataset -- see the presentation "Dataset improvements".

Here we get the Titanic dataset, keep the "sex", "age", "pclass", and "survived" columns, rename "pclass" to "class", and show the dataset's dimensions:

import pandas

dfTitanic = pandas.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv')
dfTitanic = dfTitanic[["sex", "age", "pclass", "survived"]]
dfTitanic = dfTitanic.rename(columns={"pclass": "class"})
dfTitanic.shape
(891, 4)

Here is a sample of the dataset's records:

from DataTypeSystem import *

dfTitanic.sample(3)
      sex   age  class  survived
555  male  62.0      1         0
278  male   7.0      3         0
266  male  16.0      3         0

Here is the type of a single record:

deduce_type(dfTitanic.iloc[12].to_dict())
Struct([age, class, sex, survived], [float, int, str, int])
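
To make the record type above easier to read, here is a standalone sketch that builds a similar description from a plain dictionary. The function name `sketch_struct_type` is hypothetical and is not part of DataTypeSystem; the sketch only assumes that keys are listed in sorted order, as in the output above, and the record values are made up for illustration.

```python
def sketch_struct_type(record):
    """Describe a dict as Struct([keys], [value type names]).

    Hypothetical helper for illustration only; this is not how
    DataTypeSystem's deduce_type is implemented.
    """
    keys = sorted(record)  # assumption: keys are reported in sorted order
    type_names = [type(record[k]).__name__ for k in keys]
    return f"Struct([{', '.join(keys)}], [{', '.join(type_names)}])"


# Illustrative values, not taken from the dataset
record = {"sex": "male", "age": 39.0, "class": 3, "survived": 0}
print(sketch_struct_type(record))
# Struct([age, class, sex, survived], [float, int, str, int])
```

The description pairs each key with the type of its value, which is the information the Struct output carries.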

Here is the type of a single record's values:

deduce_type(dfTitanic.iloc[12].to_dict().values())
Tuple([Atom(<class 'str'>), Atom(<class 'float'>), Atom(<class 'int'>), Atom(<class 'int'>)])

Here is the type of the whole dataset:

deduce_type(dfTitanic.to_dict())
Assoc(Atom(<class 'str'>), Assoc(Atom(<class 'int'>), Atom(<class 'str'>), 891), 4)
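
The nested Assoc above can be read as "keys of one type, values of one type, and a count". The following standalone sketch shows one way such a summary might be computed for a nested dictionary. The function `sketch_assoc_type` and its fallback label are hypothetical; the sketch assumes an association is reported only when all keys share a type and all values share a type, and it does not reproduce the package's actual rules.

```python
def sketch_assoc_type(obj):
    """Summarise nested dicts as Assoc(key type, value type, size).

    Hypothetical helper for illustration only; not DataTypeSystem's code.
    """
    if isinstance(obj, dict):
        key_types = {sketch_assoc_type(k) for k in obj}
        val_types = {sketch_assoc_type(v) for v in obj.values()}
        # Assumption: homogeneous keys and values collapse into one Assoc
        if len(key_types) == 1 and len(val_types) == 1:
            return f"Assoc({key_types.pop()}, {val_types.pop()}, {len(obj)})"
        return "Assoc(heterogeneous)"  # placeholder label for this sketch
    return f"Atom({type(obj)})"


# Column-oriented data in the shape produced by DataFrame.to_dict();
# the column names and values are made up
columns = {
    "sex": {0: "male", 1: "female"},
    "port": {0: "S", 1: "C"},
}
print(sketch_assoc_type(columns))
# Assoc(Atom(<class 'str'>), Assoc(Atom(<class 'int'>), Atom(<class 'str'>), 2), 2)
```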

Here is the type of the dataset's records without their row-index keys (the "values only" records):

valArr = dfTitanic.transpose().to_dict().values()
deduce_type(valArr)
Vector(Struct([age, class, sex, survived], [float, int, str, int]), 891)
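
The Vector result above says that all 891 records share one record type. Here is a standalone sketch of that idea: records with identical struct descriptions collapse into a single Vector with a count. The helper names are hypothetical, and the Tuple fallback for mixed records is an assumption of this sketch rather than documented package behavior.

```python
def sketch_struct_type(record):
    """Describe a dict as Struct([sorted keys], [value type names])."""
    keys = sorted(record)
    type_names = [type(record[k]).__name__ for k in keys]
    return f"Struct([{', '.join(keys)}], [{', '.join(type_names)}])"


def sketch_vector_type(records):
    """Collapse same-typed records into Vector(type, count).

    Hypothetical helper for illustration only; not DataTypeSystem's code.
    """
    records = list(records)  # accept dict views and other iterables
    descriptions = [sketch_struct_type(r) for r in records]
    if len(set(descriptions)) == 1:
        return f"Vector({descriptions[0]}, {len(records)})"
    # Assumption: records of differing types are listed one by one
    return f"Tuple([{', '.join(descriptions)}])"


# Illustrative records, not taken from the dataset
rows = {
    0: {"sex": "male", "age": 22.0, "class": 3, "survived": 0},
    1: {"sex": "female", "age": 38.0, "class": 1, "survived": 1},
}
print(sketch_vector_type(rows.values()))
# Vector(Struct([age, class, sex, survived], [float, int, str, int]), 2)
```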

References

[AAp1] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.

