Manage a list of names with several properties and (overlapping) order criteria

Project description

carbonium

Easily manage a list of names with several properties and (overlapping) order criteria.

Installation

Installing carbonium is as easy as running pip install carbonium.

Usage

As a first step, define a name list:

name_list = [
    {
        "domains": ["raw", "output"],
        "name": "var1",
        "alias": "column_1_name",
        "output_order": 1,
        "filling_value": 10,
    },
    {
        "domains": ["raw"],
        "name": "var2",
        "alias": "column_2_name",
        "output_order": 2,
    },
    {
        "domains": ["new", "output"],
        "name": "new_var",
        "alias": "new_column_name",
    },
]

Each name definition is a dictionary that contains a few common, mandatory keys plus other keys that are domain-specific or name-specific.

There are only three mandatory keys:

  • domains, a list of strings, each representing a domain
  • name, a string that uniquely identifies the name
  • alias, a string that can be used to refer to the name in contexts where it goes by this alternative string.
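
The three mandatory keys can be checked before the list is handed to the library. The following is a minimal sketch in plain Python; find_problems is a hypothetical helper written for illustration and is not part of carbonium.

```python
# Hypothetical helper, not part of carbonium: it only inspects plain dictionaries.
MANDATORY_KEYS = ("domains", "name", "alias")


def find_problems(name_list):
    """Return a list of messages describing missing keys and duplicated names."""
    problems = []
    seen = set()
    for position, definition in enumerate(name_list):
        # Every definition must carry the three mandatory keys.
        for key in MANDATORY_KEYS:
            if key not in definition:
                problems.append(f"entry {position}: missing '{key}'")
        # A name must identify its definition uniquely.
        name = definition.get("name")
        if name in seen:
            problems.append(f"entry {position}: duplicated name '{name}'")
        if name is not None:
            seen.add(name)
    return problems


good = [{"domains": ["raw"], "name": "var1", "alias": "column_1_name"}]
bad = good + [{"domains": ["raw"], "name": "var1"}]

print(find_problems(good))  # []
print(find_problems(bad))
```

The second call reports both the missing alias and the repeated name for the entry at position 1.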

Each name therefore belongs to one or more domains. Domains are used to select names (for example, all names belonging to a given domain). Names that belong to the same domain should have the same optional attributes.
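
To make the idea of domain-based selection concrete, here is a small sketch that works directly on a list of dictionaries. The functions names_in_domain and all_domains are assumptions made for illustration; they mimic the selection described above and are not carbonium's implementation.

```python
# A reduced copy of the name list, kept short for the example.
name_list = [
    {"domains": ["raw", "output"], "name": "var1", "alias": "column_1_name"},
    {"domains": ["raw"], "name": "var2", "alias": "column_2_name"},
    {"domains": ["new", "output"], "name": "new_var", "alias": "new_column_name"},
]


def names_in_domain(definitions, domain):
    """Return the names whose 'domains' list contains the given domain."""
    return [d["name"] for d in definitions if domain in d["domains"]]


def all_domains(definitions):
    """Return the set of every domain mentioned in the definitions."""
    return {domain for d in definitions for domain in d["domains"]}


print(names_in_domain(name_list, "raw"))     # ['var1', 'var2']
print(names_in_domain(name_list, "output"))  # ['var1', 'new_var']
print(sorted(all_domains(name_list)))        # ['new', 'output', 'raw']
```

Because var1 lists both raw and output, it is returned by both selections: domains may overlap.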

After defining the name list, you can instantiate the Structure class:

from carbonium import Structure

structure = Structure(name_list)

Internally, the Structure class iterates over the name definitions and instantiates a Name object for each one. After this step you can access each Name and its properties through the structure object, and you can also use the properties and methods of the Structure class itself.
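
The pattern described here (one object per definition, reachable as an attribute of the container) can be sketched in a few lines. MiniName and MiniStructure below are hypothetical stand-ins written for illustration; they are not carbonium's Name and Structure classes and may differ from them in every detail.

```python
class MiniName:
    """Hypothetical stand-in for a name object: wraps one definition."""

    def __init__(self, definition):
        self._definition = dict(definition)

    def __getattr__(self, key):
        # Invoked only when normal attribute lookup fails.
        try:
            return self.__dict__["_definition"][key]
        except KeyError:
            raise AttributeError(key) from None

    def get(self, key, default=None):
        """Return an optional property, or the default when it is absent."""
        return self._definition.get(key, default)


class MiniStructure:
    """Hypothetical container: builds one MiniName per definition."""

    def __init__(self, name_list):
        self._names = {}
        for definition in name_list:
            self._names[definition["name"]] = MiniName(definition)

    def __getattr__(self, key):
        # Lets callers write structure.var1 instead of structure.get("var1").
        try:
            return self.__dict__["_names"][key]
        except KeyError:
            raise AttributeError(key) from None

    @property
    def names(self):
        return list(self._names)

    def get(self, name):
        return self._names[name]


mini = MiniStructure([
    {"domains": ["raw"], "name": "var1", "alias": "column_1_name", "output_order": 1},
    {"domains": ["new"], "name": "new_var", "alias": "new_column_name"},
])

print(mini.names)                            # ['var1', 'new_var']
print(mini.var1.alias)                       # column_1_name
print(mini.get("new_var").get("output_order"))  # None
```

Attribute access suits properties that every name in a domain is expected to have, while get with a default suits the optional ones.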

print(structure.names)
# returns:  ['var1', 'var2', 'new_var']

print(structure.domains)
# returns: {'new', 'output', 'raw'}

print(
    structure.var1.name,
    structure.var1.domains,
    structure.var1.output_order
)

Calling structure.var1.name gives you the string associated with var1, and so on for the other properties.

ordered_raw_columns = [
    (
        i,
        structure.get(i).output_order,
        structure.get(i).get("filling_value")
    )
    for i in structure.get_names('raw')
]

ordered_raw_columns = sorted(
    ordered_raw_columns,
    key=lambda x: x[1]
)

In this example, all the names belonging to the raw domain are extracted together with some of their properties. In this way, the same name can be used in different domains or contexts by referring to the contextually relevant properties.
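
An order criterion may be defined only for some of the names in a domain. The sketch below, written in plain Python over dictionaries, sorts the names of a domain by an optional key and places names without that key last. The function ordered_names and the choice to put unordered names at the end are assumptions for illustration, not behavior of carbonium.

```python
import math

name_list = [
    {"domains": ["new", "output"], "name": "new_var", "alias": "new_column_name"},
    {"domains": ["raw"], "name": "var2", "alias": "column_2_name", "output_order": 2},
    {"domains": ["raw", "output"], "name": "var1", "alias": "column_1_name", "output_order": 1},
]


def ordered_names(definitions, domain, order_key):
    """Return the names of a domain sorted by order_key, missing keys last."""
    selected = [d for d in definitions if domain in d["domains"]]
    # math.inf sorts after every finite order value; the sort is stable,
    # so names without the key keep their relative position.
    selected.sort(key=lambda d: d.get(order_key, math.inf))
    return [d["name"] for d in selected]


print(ordered_names(name_list, "raw", "output_order"))     # ['var1', 'var2']
print(ordered_names(name_list, "output", "output_order"))  # ['var1', 'new_var']
```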

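import pandas as pd

# Columns are named after the aliases defined in name_list.
df = pd.DataFrame([
    {"column_1_name": 100, "column_2_name": 200},
    {"column_2_name": 220},
])

# Fill missing values in the raw columns that define a filling_value.
for name in structure.get_names('raw'):
    alias = structure.get(name).alias
    filling = structure.get(name).get("filling_value")
    if filling is not None:
        df[alias] = df[alias].fillna(filling)

# Create the columns that belong to the new domain.
for name in structure.get_names('new'):
    alias = structure.get(name).alias
    df[alias] = "arbitrary"

# Write out the aliased columns of the names in the output domain.
output_columns = [
    structure.get(name).alias for name in structure.get_names('output')
]
df[output_columns].to_parquet('output.parquet')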

As you can see, you can affect different columns by changing only the taxonomy described in name_list, without modifying the code.
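
The same point can be shown without pandas. In the sketch below the processing function stays fixed and only the taxonomy changes between the two calls. fill_missing is a hypothetical helper written for illustration, and rows are plain dictionaries keyed by alias.

```python
def fill_missing(rows, name_list, domain):
    """Fill absent aliased columns with each name's optional filling_value."""
    defaults = {
        d["alias"]: d["filling_value"]
        for d in name_list
        if domain in d["domains"] and "filling_value" in d
    }
    # Values already present in a row win over the defaults.
    return [{**defaults, **row} for row in rows]


rows = [{"column_1_name": 100, "column_2_name": 200}, {}]

taxonomy = [
    {"domains": ["raw"], "name": "var1", "alias": "column_1_name", "filling_value": 10},
    {"domains": ["raw"], "name": "var2", "alias": "column_2_name"},
]
first = fill_missing(rows, taxonomy, "raw")

# Only the taxonomy is edited: var2 now has a filling value too.
taxonomy[1]["filling_value"] = 0
second = fill_missing(rows, taxonomy, "raw")

print(first[1])   # {'column_1_name': 10}
print(second[1])  # {'column_1_name': 10, 'column_2_name': 0}
```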

Download files

Source Distribution

carbonium-0.10.4.tar.gz (6.3 kB view details)

Uploaded Source

File details

Details for the file carbonium-0.10.4.tar.gz.

File metadata

  • Download URL: carbonium-0.10.4.tar.gz
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.9.1 tqdm/4.28.1 CPython/3.7.1

File hashes

Hashes for carbonium-0.10.4.tar.gz
Algorithm Hash digest
SHA256 f5c2db0b5d118f024f5e72dc9186a2d7c2cf194a99ac0ddc9d69756322fa4d1c
MD5 e8bcb5910029e465dd3e3a3baaee1af4
BLAKE2b-256 8e4b78250631f8a912fb76e98a2bdbb52265ce7765aebe01ff97080058eb1f80
