TAB-analysis: a tool to analyse tabular and multi-dimensional structures
TAB-analysis analyzes and measures the relationships between Fields in any tabular Dataset.
The TAB-analysis tool is part of the Environmental Sensing Project
For more information, see the user guide or the GitHub repository.
What is TAB-analysis?
Principles
Each field in a dataset has global properties (e.g. the number of different values). The relationships between two fields can also be characterized in a similar way (e.g. number of pairs of values from the two different fields).
Analyzing these properties gives us a measure of the entire dataset.
The TAB-analysis module carries out these measurements and analyses. It also identifies data that does not respect a given relationship.
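To make the two kinds of measurement concrete, here is a minimal plain-Python sketch that counts the different values of each field and the different pairs of values between two fields. It does not use the TAB-analysis API; the helper names `distinct_values` and `distinct_pairs` are hypothetical, and the data is the price list used in the Examples section.

```python
# Price list from the Examples section, stored column by column.
tabular = {
    "plants": ["fruit"] * 4 + ["vegetable"] * 4,
    "quantity": ["1 kg", "10 kg"] * 4,
    "product": ["apple", "apple", "orange", "orange",
                "peppers", "peppers", "carrot", "carrot"],
    "price": [1, 10, 2, 20, 1.5, 15, 0.5, 5],
}


def distinct_values(data, field):
    """Number of different values in one field (hypothetical helper)."""
    return len(set(data[field]))


def distinct_pairs(data, field_a, field_b):
    """Number of different (value_a, value_b) pairs (hypothetical helper)."""
    return len(set(zip(data[field_a], data[field_b])))


# Global property of each field.
field_counts = {name: distinct_values(tabular, name) for name in tabular}
# Property of the relationship between two fields.
pair_count = distinct_pairs(tabular, "plants", "product")

print(field_counts)
print(pair_count)
```

Comparing the pair count with the two field counts is what allows a relationship to be characterized.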
Examples
Here is a price list of different foods based on packaging.
'plants' | 'quantity' | 'product' | 'price' |
---|---|---|---|
'fruit' | '1 kg' | 'apple' | 1 |
'fruit' | '10 kg' | 'apple' | 10 |
'fruit' | '1 kg' | 'orange' | 2 |
'fruit' | '10 kg' | 'orange' | 20 |
'vegetable' | '1 kg' | 'peppers' | 1.5 |
'vegetable' | '10 kg' | 'peppers' | 15 |
'vegetable' | '1 kg' | 'carrot' | 0.5 |
'vegetable' | '10 kg' | 'carrot' | 5 |
In this example, we observe two kinds of relationships:
- classification ("derived" relationship): between 'plants' and 'product' (each product belongs to exactly one plant category)
- crossing ("crossed" relationship): between 'product' and 'quantity' (all the combinations of the two fields are present).
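The two relationships can be recognized from simple counts. The sketch below is an illustration under assumptions, not the library's implementation: the function `classify` is hypothetical, and the extra labels `'coupled'` and `'linked'` are used here only to cover the remaining cases.

```python
plants = ["fruit"] * 4 + ["vegetable"] * 4
quantity = ["1 kg", "10 kg"] * 4
product = ["apple", "apple", "orange", "orange",
           "peppers", "peppers", "carrot", "carrot"]


def classify(col_a, col_b):
    """Classify the relationship between two columns (hypothetical helper)."""
    n_a, n_b = len(set(col_a)), len(set(col_b))
    n_pairs = len(set(zip(col_a, col_b)))
    if n_pairs == max(n_a, n_b):
        # Each value of the larger field is tied to a single value of the other.
        return "coupled" if n_a == n_b else "derived"
    if n_pairs == n_a * n_b:
        # Every combination of the two fields is present.
        return "crossed"
    return "linked"  # assumption: label for everything in between


plants_product = classify(plants, product)
quantity_product = classify(quantity, product)
print(plants_product, quantity_product)
```

With 2 plant categories, 4 products and 4 distinct pairs, 'plants' and 'product' fall in the first case; with 2 quantities, 4 products and 8 distinct pairs, 'quantity' and 'product' fall in the second.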
Because 'quantity' and 'product' are crossed, this Dataset can be represented as a matrix with 'quantity' ['1 kg', '10 kg'] along one dimension and 'product' ['apple', 'orange', 'peppers', 'carrot'] along the other.
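As a rough illustration of that matrix view, the following plain-Python sketch arranges the prices in a 2 x 4 grid. It is independent of TAB-analysis; the variable names are chosen for this example only.

```python
quantity = ["1 kg", "10 kg"] * 4
product = ["apple", "apple", "orange", "orange",
           "peppers", "peppers", "carrot", "carrot"]
price = [1, 10, 2, 20, 1.5, 15, 0.5, 5]

# Distinct values of each dimension, in first-seen order.
rows = list(dict.fromkeys(quantity))
cols = list(dict.fromkeys(product))

# One cell per (quantity, product) pair.
cell = {(q, p): v for q, p, v in zip(quantity, product, price)}

# The fields are crossed, so every cell of the grid is filled exactly once.
assert len(cell) == len(rows) * len(cols) == len(price)

matrix = [[cell[(q, p)] for p in cols] for q in rows]
for q, line in zip(rows, matrix):
    print(q, line)
```

The grid can only be filled completely because the relationship is crossed; a missing combination would raise a `KeyError` when building `matrix`.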
In [1]: # creation of the `analysis` object
from tab_dataset import Sdataset
from tab_analysis import AnaDataset
tabular = {'plants': ['fruit', 'fruit','fruit', 'fruit','vegetable','vegetable','vegetable','vegetable' ],
'quantity': ['1 kg' , '10 kg', '1 kg', '10 kg', '1 kg', '10 kg', '1 kg', '10 kg' ],
'product': ['apple', 'apple', 'orange', 'orange', 'peppers', 'peppers', 'carrot', 'carrot' ],
'price': [1, 10, 2, 20, 1.5, 15, 0.5, 5 ]}
analysis = AnaDataset(Sdataset.ntv(tabular).to_analysis(True))
# `analysis` is also available from pandas data
import pandas as pd
import ntv_pandas as npd
analysis = pd.DataFrame(tabular).npd.analysis()
In [2]: # each relationship is evaluated and measured
analysis.get_relation('plants', 'product').typecoupl
Out[2]: 'derived'
In [3]: analysis.get_relation('quantity', 'product').typecoupl
Out[3]: 'crossed'
In [4]: # the 'distance' between two Fields is measured (number of codec links to change for the Fields to be coupled)
analysis.get_relation('quantity', 'product').distance
Out[4]: 6
In [5]: # the dataset can be represented as a 'derived tree'
print(analysis.tree())
Out[5]: -1: root-derived (8)
1 : quantity (6 - 2)
2 : product (4 - 4)
0 : plants (2 - 2)
3 : price (0 - 8)
In [6]: # 'partitions' are found (partitions are multi-dimensional data)
analysis.partitions(mode='id')
Out[6]: [['product', 'quantity'], ['price']]
In [7]: # the `field_partition` method returns the main structure of the dataset
analysis.field_partition(mode='id')
Out[7]: {'primary': ['quantity', 'product'],
'secondary': ['plants'],
'unique': [],
'variable': ['price']}
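The distance of 6 reported above for 'quantity' and 'product' can be reproduced by hand. The formula in this sketch is an assumption inferred from the values shown in this example (6 for 'quantity'/'product', and 2 for 'plants' under 'product' in the tree); it is not quoted from the library, and the function name `distance` is hypothetical.

```python
plants = ["fruit"] * 4 + ["vegetable"] * 4
quantity = ["1 kg", "10 kg"] * 4
product = ["apple", "apple", "orange", "orange",
           "peppers", "peppers", "carrot", "carrot"]


def distance(col_a, col_b):
    """Assumed distance: extra pairs beyond the minimum, plus the size gap."""
    n_a, n_b = len(set(col_a)), len(set(col_b))
    n_pairs = len(set(zip(col_a, col_b)))
    extra_pairs = n_pairs - max(n_a, n_b)  # 0 when one field derives from the other
    size_gap = abs(n_a - n_b)              # 0 when both fields have as many values
    return extra_pairs + size_gap


d_quantity_product = distance(quantity, product)
d_plants_product = distance(plants, product)
print(d_quantity_product, d_plants_product)
```

Under this reading, the distance is zero only when the two fields have the same number of values and each value of one is tied to a single value of the other.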
Uses
A TAB-analysis object is initialized by a set of properties (a dict with specific keys). It can therefore be used from any tabular data manager (e.g. pandas).
Possible uses are as follows:
- validation of a dataset against a data model,
- quality indicators for a dataset,
- analysis of datasets,
and in connection with the tabular application:
- error detection and correction,
- generation of optimized data formats
- interface to specific applications
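As an illustration of checking a dataset against a data model, the sketch below flags rows that break an expected "derived" relationship (each product should belong to a single plant category). It is a plain-Python sketch and not the TAB-analysis API: the function `derived_violations` is hypothetical, and the last row of the sample data has been altered on purpose.

```python
# Price list with one deliberate error: the last 'carrot' row is tagged 'fruit'.
plants = ["fruit", "fruit", "fruit", "fruit",
          "vegetable", "vegetable", "vegetable", "fruit"]
product = ["apple", "apple", "orange", "orange",
           "peppers", "peppers", "carrot", "carrot"]


def derived_violations(parent, child):
    """Row indices whose child value is tied to more than one parent value."""
    parents_of = {}
    for p, c in zip(parent, child):
        parents_of.setdefault(c, set()).add(p)
    return [i for i, c in enumerate(child) if len(parents_of[c]) > 1]


bad_rows = derived_violations(plants, product)
print(bad_rows)
```

Both 'carrot' rows are reported, since the check alone cannot tell which of the two plant categories is the wrong one.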