Keeping track of aliases
A very small Python package for keeping track of aliases.
Installation
$ pip install aliases
Getting Started
Keeping track of aliases in your data can be annoying. This small package provides three classes (AliasSpace, AliasAwareString and AliasAwareDict) that help with the bookkeeping that comes with aliases occurring in your data. There are also pandas accessors that enforce aliases on an entire pandas Series or DataFrame at once.
An AliasSpace object keeps track of existing aliases. As input, it accepts a dictionary in which each string (the “preferred” form) points to a list of all its aliases. The str method of the space turns a regular string into an AliasAwareString object.
>>> from aliases import AliasSpace  # assumed top-level import path
>>> s = AliasSpace(
...     {
...         "The Netherlands": ["NL", "Netherlands", "Holland"],
...         "The Hague": ["Den Haag", "'s-Gravenhage"],
...         "Amsterdam": ["Adam"],
...     },
...     case_sensitive=False,
... )
>>>
>>> s.str("nl")
<'nl' in AliasSpace>
The preferred form of an AliasAwareString is called its representative.
>>> s.str("nl").representative
'The Netherlands'
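Conceptually, resolving a representative amounts to a reverse lookup from every alias to its preferred form. The following is a minimal sketch under that assumption, not the package's actual implementation; the class `SimpleAliasSpace`, its `representative` method and its `case_sensitive=True` default are hypothetical. It also assumes that a string without a known alias is its own representative, which matches how `Belgium` passes through unchanged in the pandas example.

```python
from __future__ import annotations


class SimpleAliasSpace:
    """Hypothetical stand-in for AliasSpace: maps every alias to its preferred form."""

    def __init__(self, aliases: dict[str, list[str]], case_sensitive: bool = True) -> None:
        # Assumption: case-sensitive by default (the real default is not documented here).
        self.case_sensitive = case_sensitive
        self._lookup: dict[str, str] = {}
        for preferred, alternatives in aliases.items():
            # The preferred form counts as an alias of itself.
            for name in [preferred, *alternatives]:
                key = self._key(name)
                if self._lookup.get(key, preferred) != preferred:
                    raise ValueError(f"{name!r} is an alias of two different representatives")
                self._lookup[key] = preferred

    def _key(self, name: str) -> str:
        # Fold case only when matching should be case-insensitive.
        return name if self.case_sensitive else name.casefold()

    def representative(self, name: str) -> str:
        # Assumption: unknown strings are their own representative.
        return self._lookup.get(self._key(name), name)


space = SimpleAliasSpace(
    {"The Netherlands": ["NL", "Netherlands", "Holland"], "Amsterdam": ["Adam"]},
    case_sensitive=False,
)
print(space.representative("nl"))       # The Netherlands
print(space.representative("Belgium"))  # Belgium
```

Building the reverse table once makes each lookup a single dictionary access, and rejecting an alias that points at two different preferred forms keeps the mapping unambiguous.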
AliasAwareString objects with the same representative are considered equal and have the same hash.
>>> s.str("holland") == s.str("NL")
True
>>>
>>> data = {s.str("holland"): 12345}
>>> data[s.str("nl")]
12345
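Equal representatives implying equal hashes is what makes alias-aware strings usable as dictionary keys. A minimal sketch of that idea could look like the following, assuming a plain case-insensitive lookup table; `SketchAliasString` and `LOOKUP` are hypothetical names and not part of the aliases package.

```python
from __future__ import annotations

# Hypothetical lookup table: case-folded alias -> representative.
LOOKUP = {
    "the netherlands": "The Netherlands",
    "nl": "The Netherlands",
    "netherlands": "The Netherlands",
    "holland": "The Netherlands",
}


class SketchAliasString:
    """Hypothetical alias-aware string; equality and hashing follow the representative."""

    def __init__(self, value: str, lookup: dict[str, str]) -> None:
        self.value = value
        self._lookup = lookup

    @property
    def representative(self) -> str:
        # Case-insensitive lookup; unknown strings represent themselves.
        return self._lookup.get(self.value.casefold(), self.value)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SketchAliasString):
            return NotImplemented
        return self.representative == other.representative

    def __hash__(self) -> int:
        # Objects that compare equal must hash equally, so hash the representative.
        return hash(self.representative)

    def __repr__(self) -> str:
        return f"<{self.value!r} -> {self.representative!r}>"


data = {SketchAliasString("holland", LOOKUP): 12345}
print(data[SketchAliasString("nl", LOOKUP)])  # 12345
```

Defining `__hash__` alongside `__eq__` matters: a class that only overrides `__eq__` becomes unhashable in Python.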
Because equality and hashing follow the representative, alias-aware strings can be used as dictionary keys without worrying about which alias was used to store or retrieve a value. Still, manually casting every key to an AliasAwareString quickly becomes tedious. The AliasAwareDict solves this: it accepts plain strings as keys and treats aliases of each other as the same key. It is created with the dict method of the space.
>>> data = s.dict(holland=12345)
>>> data['nl']
12345
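One way to get this behavior is to normalize every key to its representative before it reaches the underlying storage. The sketch below does that with `collections.UserDict`; `SketchAliasDict` and `LOOKUP` are hypothetical names, and this illustrates the idea under that assumption rather than how AliasAwareDict is actually built.

```python
from __future__ import annotations

from collections import UserDict
from typing import Any

# Hypothetical lookup table: case-folded alias -> representative.
LOOKUP = {
    "the netherlands": "The Netherlands",
    "nl": "The Netherlands",
    "holland": "The Netherlands",
}


class SketchAliasDict(UserDict):
    """Hypothetical alias-aware dict: every string key is stored under its representative."""

    def __init__(self, lookup: dict[str, str], *args: Any, **kwargs: Any) -> None:
        # Set the lookup first: UserDict.__init__ inserts initial items via __setitem__.
        self._lookup = lookup
        super().__init__(*args, **kwargs)

    def _normalize(self, key: str) -> str:
        # Case-insensitive lookup; unknown keys are used as-is.
        return self._lookup.get(key.casefold(), key)

    def __setitem__(self, key: str, value: Any) -> None:
        self.data[self._normalize(key)] = value

    def __getitem__(self, key: str) -> Any:
        return self.data[self._normalize(key)]

    def __delitem__(self, key: str) -> None:
        del self.data[self._normalize(key)]

    def __contains__(self, key: object) -> bool:
        return isinstance(key, str) and self._normalize(key) in self.data


data = SketchAliasDict(LOOKUP, holland=12345)
print(data["nl"])  # 12345
```

Storing keys by representative means iteration yields preferred forms only, so writing to `"NL"` after `"holland"` overwrites the same entry instead of creating a second one.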
Finally, when pandas is installed, the aliases package registers accessors for Series and DataFrames, so aliases can be enforced on a whole column or table at once. The following example was the original motivation for building this package:
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "Country": ["NL", "Netherlands", "Belgium"],
...         "City": ["Den Haag", "amsterdam", "Brussel"],
...         "SomeData": [10, 11, 12],
...     }
... )
>>> df
       Country       City  SomeData
0           NL   Den Haag        10
1  Netherlands  amsterdam        11
2      Belgium    Brussel        12
>>>
>>> df.Country.aliases.representative(space=s)
0    The Netherlands
1    The Netherlands
2            Belgium
Name: Country, dtype: object
>>>
>>> df.aliases.representative(space=s, missing=pd.NA)
           Country       City  SomeData
0  The Netherlands  The Hague        10
1  The Netherlands  Amsterdam        11
2             <NA>       <NA>        12
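pandas lets third-party code add such namespaces through its public extension API. To illustrate the mechanism, here is a minimal sketch that uses `pd.api.extensions.register_series_accessor` to map each string value to its representative. The accessor name `alias_sketch`, the `LOOKUP` table and the `missing` handling are assumptions made for this example (the name is chosen so it does not clash with the real `aliases` accessor), not a description of how the package implements its own accessors.

```python
from __future__ import annotations

from typing import Any

import pandas as pd

# Hypothetical lookup table: case-folded alias -> representative.
LOOKUP = {
    "the netherlands": "The Netherlands",
    "nl": "The Netherlands",
    "netherlands": "The Netherlands",
    "holland": "The Netherlands",
}

_KEEP = object()  # sentinel: keep unknown values unchanged


@pd.api.extensions.register_series_accessor("alias_sketch")
class AliasSketchAccessor:
    """Hypothetical Series accessor; the real package provides its own `aliases` accessor."""

    def __init__(self, series: pd.Series) -> None:
        self._series = series

    def representative(self, lookup: dict[str, str], missing: Any = _KEEP) -> pd.Series:
        def convert(value: Any) -> Any:
            if not isinstance(value, str):
                return value  # non-string values are left untouched
            rep = lookup.get(value.casefold())
            if rep is not None:
                return rep
            # Unknown string: keep it, or replace it with `missing` if one was given.
            return value if missing is _KEEP else missing

        return self._series.map(convert)


countries = pd.Series(["NL", "Netherlands", "Belgium"], name="Country")
print(countries.alias_sketch.representative(LOOKUP).tolist())
# ['The Netherlands', 'The Netherlands', 'Belgium']
```

A DataFrame version of this sketch could apply the same conversion column by column, for example `df.apply(lambda col: col.alias_sketch.representative(LOOKUP, missing=pd.NA))`; because non-string values are skipped, a numeric column such as SomeData is left as it is.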
Documentation
Coming soon…