
TAB-analysis: a tool to analyse tabular and multi-dimensional structures


TAB-analysis analyzes and measures the relationships between Fields in any tabular Dataset.

The TAB-analysis tool is part of the Environmental Sensing Project.

For more information, see the user guide or the GitHub repository.

What is TAB-analysis?

Principles

Each field in a dataset has global properties (e.g. the number of different values). The relationship between two fields can be characterized in a similar way (e.g. the number of distinct pairs of values drawn from the two fields).

Taken together, these field-level and pair-level properties give a measure of the structure of the entire dataset.

The TAB-analysis module carries out these measurements and analyses. It also identifies data that does not respect given relationships.
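The two kinds of property mentioned above can be computed by hand on the price list from the Examples section. The following is only a minimal sketch in plain Python, not the TAB-analysis implementation; the helper names `distinct_count` and `pair_count` are hypothetical.

```python
# Column-oriented copy of the price list used in the Examples section.
tabular = {
    'plants':   ['fruit', 'fruit', 'fruit', 'fruit',
                 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
    'quantity': ['1 kg', '10 kg', '1 kg', '10 kg',
                 '1 kg', '10 kg', '1 kg', '10 kg'],
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5],
}


def distinct_count(columns, field):
    """Number of different values taken by one field (hypothetical helper)."""
    return len(set(columns[field]))


def pair_count(columns, field_a, field_b):
    """Number of different (value_a, value_b) pairs for two fields."""
    return len(set(zip(columns[field_a], columns[field_b])))


print(distinct_count(tabular, 'plants'))           # 2
print(distinct_count(tabular, 'product'))          # 4
print(pair_count(tabular, 'plants', 'product'))    # 4
print(pair_count(tabular, 'quantity', 'product'))  # 8
```

Comparing the pair count with the two individual counts is what makes it possible to tell relationships apart: here 'plants' and 'product' produce no more pairs than 'product' has values, whereas 'quantity' and 'product' produce every possible combination.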

Examples

Here is a price list of different foods based on packaging.

'plants'     'quantity'  'product'  'price'
'fruit'      '1 kg'      'apple'    1
'fruit'      '10 kg'     'apple'    10
'fruit'      '1 kg'      'orange'   2
'fruit'      '10 kg'     'orange'   20
'vegetable'  '1 kg'      'peppers'  1.5
'vegetable'  '10 kg'     'peppers'  15
'vegetable'  '1 kg'      'carrot'   0.5
'vegetable'  '10 kg'     'carrot'   5

In this example, we observe two kinds of relationships:

  • classification ("derived" relationship): between 'plants' and 'product' (each product belongs to exactly one plant category)
  • crossing ("crossed" relationship): between 'product' and 'quantity' (all the combinations of the two fields are present).
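The two relationships listed above can be told apart by counting distinct values and distinct pairs. The sketch below is an illustration under stated assumptions, not the library's algorithm: the function `relation_kind` is hypothetical, and the labels 'coupled' and 'linked' are names chosen here for the remaining cases.

```python
tabular = {
    'plants':   ['fruit', 'fruit', 'fruit', 'fruit',
                 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
    'quantity': ['1 kg', '10 kg', '1 kg', '10 kg',
                 '1 kg', '10 kg', '1 kg', '10 kg'],
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5],
}


def relation_kind(columns, field_a, field_b):
    """Classify the relationship between two fields (hypothetical helper)."""
    n_a = len(set(columns[field_a]))
    n_b = len(set(columns[field_b]))
    n_pairs = len(set(zip(columns[field_a], columns[field_b])))
    if n_pairs == n_a == n_b:
        return 'coupled'   # assumed label: one-to-one correspondence
    if n_pairs == max(n_a, n_b):
        return 'derived'   # each value of the finer field maps to one value of the other
    if n_pairs == n_a * n_b:
        return 'crossed'   # every combination of values is present
    return 'linked'        # assumed label: anything in between


print(relation_kind(tabular, 'plants', 'product'))    # derived
print(relation_kind(tabular, 'quantity', 'product'))  # crossed
```

The 'derived' test is checked before the 'crossed' test so that a field with a single value, which satisfies both conditions, is reported as derived.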

This Dataset can be represented as a matrix with 'quantity' ['1 kg', '10 kg'] on one axis and 'product' ['apple', 'orange', 'peppers', 'carrot'] on the other.
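As an illustration of this matrix view, the prices can be rearranged by hand into a 2 x 4 grid. This is a plain-Python sketch, independent of the TAB-analysis API; the variable names `price_of` and `matrix` are hypothetical.

```python
tabular = {
    'quantity': ['1 kg', '10 kg', '1 kg', '10 kg',
                 '1 kg', '10 kg', '1 kg', '10 kg'],
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5],
}

quantities = ['1 kg', '10 kg']                       # matrix rows
products = ['apple', 'orange', 'peppers', 'carrot']  # matrix columns

# Index each price by its (quantity, product) pair. This only works because
# the two fields are crossed: every pair occurs exactly once.
price_of = {
    (quantity, product): price
    for quantity, product, price in zip(
        tabular['quantity'], tabular['product'], tabular['price'])
}

matrix = [[price_of[(q, p)] for p in products] for q in quantities]

for quantity, row in zip(quantities, matrix):
    print(quantity, row)
# 1 kg [1, 2, 1.5, 0.5]
# 10 kg [10, 20, 15, 5]
```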

In [1]: # creation of the `analysis` object 
        from tab_dataset import Sdataset
        from tab_analysis import AnaDataset
        tabular = {'plants':   ['fruit', 'fruit','fruit',   'fruit','vegetable','vegetable','vegetable','vegetable' ],
                   'quantity': ['1 kg' , '10 kg', '1 kg',   '10 kg',  '1 kg',    '10 kg',   '1 kg',     '10 kg'     ], 
                   'product':  ['apple', 'apple', 'orange', 'orange', 'peppers', 'peppers', 'carrot',   'carrot'    ], 
                   'price':    [1,       10,      2,        20,       1.5,       15,        0.5,        5           ]}
        analysis = AnaDataset(Sdataset.ntv(tabular).to_analysis(True))

In [2]: # each relationship is evaluated and measured 
        analysis.get_relation('plants', 'product').typecoupl
Out[2]: 'derived'

In [3]: analysis.get_relation('quantity', 'product').typecoupl
Out[3]: 'crossed'

In [4]: # the 'distance' between two Fields is measured (number of codec links to change for the Fields to be coupled)
        analysis.get_relation('quantity', 'product').distance
Out[4]: 6

In [5]: # the dataset can be represented as a 'derived tree'
        print(analysis.tree())
Out[5]: -1: root-derived (8)
           1 : quantity (6 - 2)
           2 : product (4 - 4)
              0 : plants (2 - 2)
           3 : price (0 - 8)

In [6]: # 'partitions' are found (partitions are multi-dimensional data)
        analysis.partitions(mode='id')
Out[6]: [['product', 'quantity'], ['price']]

In [7]: # the `field_partition` method returns the main structure of the dataset
        analysis.field_partition(mode='id')
Out[7]: {'primary': ['quantity', 'product'],
         'secondary': ['plants'],
         'unique': [],
         'variable': ['price']}
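One way to read the partitions reported in the session above is as groups of fields whose value combinations index every record exactly once. The sketch below rests on that assumption, which is an interpretation and not a definition taken from the library; `is_partition` and `find_partitions` are hypothetical helpers, and the search is limited to groups of at most two fields.

```python
from itertools import combinations
from math import prod

tabular = {
    'plants':   ['fruit', 'fruit', 'fruit', 'fruit',
                 'vegetable', 'vegetable', 'vegetable', 'vegetable'],
    'quantity': ['1 kg', '10 kg', '1 kg', '10 kg',
                 '1 kg', '10 kg', '1 kg', '10 kg'],
    'product':  ['apple', 'apple', 'orange', 'orange',
                 'peppers', 'peppers', 'carrot', 'carrot'],
    'price':    [1, 10, 2, 20, 1.5, 15, 0.5, 5],
}


def is_partition(columns, fields):
    """Assumed criterion: the fields are fully crossed and index each record once."""
    n_rows = len(next(iter(columns.values())))
    n_tuples = len(set(zip(*(columns[f] for f in fields))))
    n_combinations = prod(len(set(columns[f])) for f in fields)
    return n_tuples == n_rows == n_combinations


def find_partitions(columns, max_size=2):
    """Brute-force search over groups of up to `max_size` fields."""
    names = list(columns)
    return [
        list(group)
        for size in range(1, max_size + 1)
        for group in combinations(names, size)
        if is_partition(columns, group)
    ]


print(find_partitions(tabular))  # [['price'], ['quantity', 'product']]
```

On this dataset the sketch finds the same two groups as Out[6], in a different order. That agreement on one example does not show that the library applies the same rule.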

Uses

A TAB-analysis object is initialized from a set of properties (a dict with specific keys). It can therefore be used with any tabular data manager (e.g. pandas).

Possible uses are as follows:

  • control of a dataset in relation to a data model,
  • quality indicators of a dataset,
  • analysis of datasets,

and in connection with the tabular application:

  • error detection and correction,
  • generation of optimized data formats,
  • interface to specific applications.
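Controlling a dataset against a data model and detecting errors both come down to finding the records that break an expected relationship. The following is a minimal sketch of that idea for a "derived" relationship, written in plain Python; the function `derived_violations` is hypothetical and is not part of the TAB-analysis API. The sample data is the price list from the Examples section with one deliberately corrupted 'plants' value.

```python
from collections import defaultdict

# Price list with an error injected on purpose: record 5 labels
# 'peppers' as 'fruit' instead of 'vegetable'.
corrupted = {
    'plants':  ['fruit', 'fruit', 'fruit', 'fruit',
                'vegetable', 'fruit', 'vegetable', 'vegetable'],
    'product': ['apple', 'apple', 'orange', 'orange',
                'peppers', 'peppers', 'carrot', 'carrot'],
}


def derived_violations(columns, child, parent):
    """Return indexes of records whose `child` value maps to several `parent` values."""
    parents_of = defaultdict(set)
    for child_value, parent_value in zip(columns[child], columns[parent]):
        parents_of[child_value].add(parent_value)
    ambiguous = {value for value, parents in parents_of.items() if len(parents) > 1}
    return [i for i, value in enumerate(columns[child]) if value in ambiguous]


print(derived_violations(corrupted, 'product', 'plants'))  # [4, 5]
```

The check reports both 'peppers' records because, without further information, it cannot tell which of the two conflicting 'plants' values is the wrong one.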
