diselect

smart and convenient dict flatten library for complex container nested with dict and list

These details have not been verified by PyPI

Project links

Homepage

Project description

1. Introduction

- A smart and convenience single function for extracting container value consisting of list and dict

- Query a container with a complex structure, mainly generated from json, and flatten it into a dict with a single structure.

- Get freedom from code that indexes complex json data one by one and writes for loops like this below...

# extracting from json loads data..

sample_from_json = json.loads('sample.json')

count = sample_from_json['count']
data_list = sample_from_json.get('data_list')

for cityinfo in data_list:
    for key, value in cityinfo.items():
        if key == 'city':
            city_name = value['names']['en']
        if key == 'subdivisions':
            subdiv = []
            for subsubdivision in value:
                sv = subsubdivision['names']['en']
                subdiv.append(sv)
                ...
                ...
            ...
        ...
        ...
    ...
    ...
    OMG...
    ...
    ....

2. Installation and Usage

Made in Python 3.8 or later

pip install diselect

from diselect import diselect

# example
# list of dict in dict in list in dict in list of dict in....
sample_from_json = {
    'count': 1,
    'date': '2022-5-31',
    'data_list': [
        {
            "city": {
                "names": {
                    "en": "Songpa-gu"
                }
            },
            "continent": {
                "code": "AS",
                "names": {"pt-BR": "Ásia", "de": "Asien", "en": "Asia",}
            },
            "country": {
                "iso_code": "KR", 
                "names": {
                    "de": "Südkorea",
                    "en": "South Korea",
                }
            },
            "location": {"latitude": 37.5013, "longitude": 127.1188, "time_zone": "Asia/Seoul"},

            # multiple childerns of list
            "subdivisions": [
                {
                    "iso_code": "11",
                    "names": {"zh-CN": "首尔特别市", "en": "Seoul", "ja": "ソウル特別市"}
                },
                {
                    "iso_code": "12",
                    "names": {"en": "Hangang"}
                }
            ],
            
        },
        {
            "city": {
                "names": {
                    "en": "Songpa-gu2"
                }
            },
            "continent": {
                "code": "AS2",
                "names": {"pt-BR": "Ásia2", "de": "Asien", "en": "Asia2",}
            },
            "country": {
                "iso_code": "KR2", 
                "names": {
                    "de": "Südkorea2",
                    "en": "South Korea2",
                }
            },
            "location": {"latitude": 37.5013, "longitude": 127.1188, "time_zone": "Asia/Seoul2"},

            # multiple childerns of list
            "subdivisions": [
                {
                    "iso_code": "112",
                    "names": {"zh-CN": "首尔特别市", "en": "Seoul2", "ja": "ソウル特別市"}
                },
                {
                    "iso_code": "122",
                    "names": {"en": "Hangang2"}
                }
            ],
            
        },
    ]
}

# Useage 1) Specify only the column name
# When taking the highest values in container without the risk of duplication

query_only_key = ['count', 'date'] # key name to column
for r in diselect(sample_from_json, query_only_key):
    print(r)

# results {'count': 1, 'date': '2022-5-31'}

# Useage 2) Extract nested values
# parent paths tuple keys of target 'terminal' value
# If there are few parental generations, duplicate matching may occur.
# Exception when duplicate occurs

query_deep_path = [('city', 'names', 'en'), ('country', 'names', 'en')] # en is key of terminal value
for r in diselect(sample_from_json, query_deep_path):
    print(r)

# results 
# {('city', 'names', 'en'): 'Songpa-gu', ('country', 'names', 'en'): 'South Korea'}
# {('city', 'names', 'en'): 'Songpa-gu2', ('country', 'names', 'en'): 'South Korea2'}

# Useage 3) Aliasing query to column name
# Change the query to an usable column name

query_aliases = {
    ('city', 'names', 'en'): 'city_name',
    ('country', 'names', 'en'): 'country_name',
    ('subdivisions', 'names', 'en'): 'subdivision_name'
}
# or
query_aliases = [
    {('city', 'names', 'en'): 'city_name'},
    {('country', 'names', 'en'): 'country_name'},
    {('subdivisions', 'names', 'en'): 'subdivision_names'}
]

for r in diselect(sample_from_json, query_aliases):
    print(r)

# results:
# {'city_name': 'Songpa-gu', 'country_name': 'South Korea', 'subdivision_names': ['Seoul', 'Hangang']}
# {'city_name': 'Songpa-gu2', 'country_name': 'South Korea2', 'subdivision_names': ['Seoul2', 'Hangang2']}
# multiple children values of subdivision_names has coaleased to list ['Seoul', 'Hangang']

# Useage 4) join listed children values
# pass tuple value of aliase and function

query_aliases_and_join_children = {
    ('city', 'names', 'en'): 'city_name',
    ('country', 'names', 'en'): 'country_name',
    ('subdivisions', 'names', 'en'): ('subdivision_names', ','.join), # alias, join function
}

for r in diselect(sample_from_json, query_aliases_and_join_children):
    print(r)

# results
# {'city_name': 'Songpa-gu', 'country_name': 'South Korea', 'subdivision_names': 'Seoul,Hangang'}
# {'city_name': 'Songpa-gu2', 'country_name': 'South Korea2', 'subdivision_names': 'Seoul2,Hangang2'}
# Soule, Hangang has joined with sep ','

query_aliases_and_join_children = {
    ('city', 'names', 'en'): 'city_name',
    ('country', 'names', 'en'): 'country_name',
    ('subdivisions', 'names', 'en'): [
        'subdivision_names',
        ','.join, str.upper # alias, chaining function
    ]
}

for r in diselect(sample_from_json, query_aliases_and_join_children):
    print(r)
# results
# {'city_name': 'Songpa-gu', 'country_name': 'South Korea', 'subdivision_names': 'SEOUL,HANGANG'}
# {'city_name': 'Songpa-gu2', 'country_name': 'South Korea2', 'subdivision_names': 'SEOUL2,HANGANG2'}

# Useage 5) merge muliple select
 
query = {
    (('continent', 'names', 'en'), ('country', 'names', 'en'), ('city', 'names', 'en')):[
        'address',
        '/' # if str, be a shorcut of join function
    ],
    (('latitude',), ('longitude',)): [
        'coordinate'
    ]
}
for r in diselect(sample_from_json, query):
    print(r)

# {'address': 'Asia/South Korea/Songpa-gu', 'coordinate': [37.5013, 127.1188]}
# {'address': 'Asia2/South Korea2/Songpa-gu2', 'coordinate': [37.5013, 127.1188]}

# appling functions to coordinate...
query = {
    (('continent', 'names', 'en'), ('country', 'names', 'en'), ('city', 'names', 'en')):[ #tuple of multiple paths,
        'address', '/'
    ],
    (('latitude',), ('longitude',)): [ 
        'coordinate',
        str,    # convert individual float type elements to str for join
        ','     
    ]
}
for r in diselect(sample_from_json, query):
    print(r)

# {'address': 'Asia/South Korea/Songpa-gu', 'coordinate': '37.5013,127.1188'}
# {'address': 'Asia2/South Korea2/Songpa-gu2', 'coordinate': '37.5013,127.1188'}

# 4) Summary
query = {
    ('city', 'names', 'en'): 'city_name',
    ('continent', 'code'): 'continent_code',
    ('continent', 'names', 'en'): 'continent_name',
    ('country', 'iso_code'): 'country_code',
    ('country', 'names', 'en'): 'country_name',
    ('location', 'time_zone'): 'timezone',
    (('latitude',), ('longitude',)): [
        'coordinate',
        str, ','
    ],
    ('subdivisions', 'names', 'en'): [
        'subdivision_name',
        ',', str.upper
    ]
}

for r in diselect(container=sample_from_json, query=query):
    print(r)

# {'city_name': 'Songpa-gu', 'continent_code': 'AS', 'continent_name': 'Asia', 'country_code': 'KR', 'country_name': 'South Korea', 'timezone': 'Asia/Seoul', 'coordinate': '37.5013,127.1188', 'subdivision_name': 'SEOUL,HANGANG'}
# {'city_name': 'Songpa-gu2', 'continent_code': 'AS2', 'continent_name': 'Asia2', 'country_code': 'KR2', 'country_name': 'South Korea2', 'timezone': 'Asia/Seoul2', 'coordinate': '37.5013,127.1188', 'subdivision_name': 'SEOUL2,HANGANG2'}

3. Arguments

1. container

> nested with dict and list complex data

2. query

query1 = {
    key1, key2,
    {(key3, key2): alias},
    {(key4, key5): (alias2, apply)},
}
query2 = [
    'column1', 'column2',
    {
        ('path1', 'path2'): 'alias1',
        ('patt1', 'path2', 'path3'): ('alias2', dateutil.parser.parse),
    },
    'column4'
]

non-overlapping 'minimum' path of value item (need not be fullpath)
parents path lists key of target 'terminal' value (target value must be scalar value, like str, int...)
More detail is better to avoid duplication (...great-grandparent, grandparent, parent)
You can mix dict and tuple
The results column order of the output matches the order of the query
alias: column name representing the query
apply: function to be applied to value

3. caution

If there is no query matching the key path of the container, a warning is output and it does not appear into the result column.
If the matching of the query is duplicated, an exception is raised and a more detailed query is required.
Consider the data structure of the container. Suggested queries are aggregated by matching top-level keys of matched with query.

# date and count in the presented example data are single entities as top-level keys.
  # 'count': 1,
  # 'date': '2022-5-31',
  # 'data_list': [ ...
# but data_list is multiple row value
# Querying data from both tendencies at the same time leads to unpredictable behavior.

greedy_query = [
    # query for top level single context value
    'count', 'date', 
    # query for row values
    {
        ('city', 'names', 'en'): 'city_name',
        ('continent', 'code'): 'continent_code',
        ('continent', 'names', 'en'): 'continent_name',
        ('country', 'iso_code'): 'country_code',
        ('country', 'names', 'en'): 'country_name',
        ('location', 'time_zone'): 'timezone',
        ('subdivisions', 'names', 'en'): ('subdivision_name', ','), 
    }
]

for r in diselect(sample_from_json, greedy_query):
    print(r)

# results
# {'count': 1, 'date': '2022-5-31', 'city_name': ['Songpa-gu', 'Songpa-gu2'], 'continent_code': ['AS', 'AS2'], 'continent_name': ['Asia', 'Asia2'], 'country_code': ['KR', 'KR2'], 'country_name': ['South Korea', 'South Korea2'], 'timezone': ['Asia/Seoul', 'Asia/Seoul2'], 'subdivision_name': 'Seoul,Hangang,Seoul2,Hangang2'}

# The data is organized vertically with the top keys count and date. Maybe this is what you want.
# This can be used as a trick to get the column dataset


## Tip. separate query by structure for get two of them both
query_context = ['count', 'date']

query_list = {
    ('city', 'names', 'en'): 'city_name',
    ('continent', 'code'): 'continent_code',
    ('continent', 'names', 'en'): 'continent_name',
    ('country', 'iso_code'): 'country_code',
    ('country', 'names', 'en'): 'country_name',
    ('location', 'time_zone'): 'timezone',
    ('subdivisions', 'names', 'en'): ('subdivision_name', ','), 
}



[context_data] = list(diselect(sample_from_json, query_context)) # may one
count = context_data['count']
date = context_data['date']

# or may be simple and better just direct indexing when values are easy to access
count = sample_from_json['count']
date = sample_from_json['date']

data_list = list(diselect(sample_from_json, query_list)) # many

4. More Useages

1. typing values

- value typing via apply function

import dateutil

data = [
    {
        'place_id': 142213,
        'visit_count': '5',
        'visit_date': '2022/2/21',
        'rating': '2.5',
    },
    {
        'place_id': 154321,
        'visit_count': '12',
        'visit_date': '2022.3.7.',
        'rating': '4.5',
    },
]

parsed = diselect(data,
{
    'place_id': ('place_id', str),
    'visit_count': ('visit_count', int),
    'rating': ('point', float),
    'visit_date': ('visit_count', dateutil.parser.parse),
})
for row in parsed:
    print(row)
# results
# {'place_id': '142213', 'visit_count': datetime.datetime(2022, 2, 21, 0, 0), 'point': 2.5}
# {'place_id': '154321', 'visit_count': datetime.datetime(2022, 3, 7, 0, 0), 'point': 4.5}

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

1.0.12

Jan 15, 2024

1.0.11

Aug 1, 2022

1.0.10

Jul 27, 2022

1.0.9

Jun 20, 2022

1.0.8

Jun 19, 2022

1.0.7

Jun 19, 2022

1.0.6

Jun 19, 2022

1.0.5

Jun 14, 2022

1.0.4

Jun 13, 2022

1.0.3

Jun 8, 2022

1.0.2

Jun 8, 2022

1.0.1

Jun 6, 2022

1.0.0

Jun 6, 2022

0.0.3

May 31, 2022

0.0.1

May 31, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

diselect-1.0.12.tar.gz (13.0 kB view details)

Uploaded Jan 15, 2024 Source

File details

Details for the file diselect-1.0.12.tar.gz.

File metadata

Download URL: diselect-1.0.12.tar.gz
Upload date: Jan 15, 2024
Size: 13.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.11

File hashes

Hashes for diselect-1.0.12.tar.gz
Algorithm	Hash digest
SHA256	`654813bd9b9188437f361172c4912386b32fe098ceecb7a99a852483c7cc1351`
MD5	`b845bf7078b7c2a53162a1c9123df5cf`
BLAKE2b-256	`a085035c1403d7395ce7a45463ebb4b18277552efe4a91ab705bde6cd1c73a64`

See more details on using hashes here.

diselect 1.0.12

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

1. Introduction

- A smart and convenience single function for extracting container value consisting of list and dict

- Query a container with a complex structure, mainly generated from json, and flatten it into a dict with a single structure.

- Get freedom from code that indexes complex json data one by one and writes for loops like this below...

2. Installation and Usage

3. Arguments

1. container

2. query

3. caution

4. More Useages

1. typing values

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes