An auto mapper that accepts a list of strings and a list of objects of the form {'code', 'name'}, and returns a list of objects in which each 'code' is mapped to the most similar strings from the list of strings.
Project description
Auto Mapper
A package that maps a set of strings to a set of objects, where each object has the form:
{
'code': 'unique_code',
'name': 'given name'
}
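To make the expected input shape explicit, here is a minimal sketch of the field object as a Python `TypedDict`, plus a small validator. Both `Field` and `validate_fields` are hypothetical helpers for illustration, not part of auto-mapper. The uniqueness check assumes that 'code' really is unique, as the 'unique_code' placeholder above suggests.

```python
from typing import List, TypedDict, cast


class Field(TypedDict):
    """One target object: a unique code plus a human-readable name."""
    code: str
    name: str


def validate_fields(fields: List[dict]) -> List[Field]:
    """Hypothetical helper: check that each item has string 'code' and 'name'
    keys and that no 'code' appears twice."""
    seen = set()
    for i, item in enumerate(fields):
        for key in ("code", "name"):
            if not isinstance(item.get(key), str):
                raise ValueError(f"fields[{i}] is missing a string {key!r}")
        if item["code"] in seen:
            raise ValueError(f"duplicate code {item['code']!r} at fields[{i}]")
        seen.add(item["code"])
    return cast(List[Field], fields)
```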
How it works
The process starts with text cleaning (that is, text preprocessing), done with the help of external libraries (NLTK in the current release).
Cleaning turns each string into a list of tokens. Running a text-similarity algorithm on the resulting token vectors lets us map fields (the objects' names) to the most similar columns (the input strings).
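The first phase can be sketched as follows, under loudly stated assumptions: the README does not name the similarity algorithm, so Jaccard overlap of token sets is used as a stand-in, and a tiny regex tokenizer with a hand-picked stop-word list replaces NLTK. The threshold value (0.5) and the greedy one-column-per-field strategy are also assumptions, not auto-mapper's actual behaviour.

```python
import re
from typing import Dict, List, Set

# Illustrative stop-word list; NLTK's English stop-word list is much larger.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def clean(text: str) -> List[str]:
    """Split camelCase, lower-case, tokenize on non-alphanumerics, drop stop words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # "locationName" -> "location Name"
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def jaccard(a: List[str], b: List[str]) -> float:
    """Set overlap |A & B| / |A | B|, in [0, 1]."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def map_by_similarity(columns: List[str], fields: List[dict],
                      threshold: float = 0.5) -> Dict[str, str]:
    """Greedily map each field code to its most similar unused column,
    keeping the match only if its score reaches the threshold."""
    result: Dict[str, str] = {}
    used: Set[str] = set()
    for field in fields:
        field_tokens = clean(field["name"])
        best, best_score = None, -1.0
        for col in columns:
            if col in used:
                continue
            score = jaccard(field_tokens, clean(col))
            if score > best_score:  # the first column wins on ties
                best, best_score = col, score
        if best is not None and best_score >= threshold:
            result[field["code"]] = best
            used.add(best)
    return result
```

With the README's own inputs this pass maps nothing: 'location names' and 'Location Name' share only one of three distinct tokens ('names' differs from 'name'), giving a score of 1/3, and 'Town' shares no token with 'city'. Those leftovers are exactly what the later phases target.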
A second mapping phase applies an additional text-processing step, stemming, combined with a lower similarity threshold, to the fields that are still unmapped.
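A minimal sketch of this second pass, assuming Jaccard similarity on stemmed token sets and a hypothetical lower threshold of 0.3. `crude_stem` is a deliberately simple suffix stripper standing in for a real NLTK stemmer such as `PorterStemmer`; the package's actual stemmer and thresholds are not documented here.

```python
import re
from typing import Dict, List, Set


def crude_stem(token: str) -> str:
    """Strip a few common English suffixes (crude stand-in for a real stemmer)."""
    if token.endswith("ss"):  # keep words like "address" intact
        return token
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemmed_tokens(text: str) -> Set[str]:
    """Lower-case, tokenize on non-alphanumerics, then stem each token."""
    return {crude_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())}


def second_pass(unmapped_columns: List[str], unmapped_fields: List[dict],
                threshold: float = 0.3) -> Dict[str, str]:
    """Retry the leftovers on stemmed tokens with a more permissive threshold."""
    result: Dict[str, str] = {}
    remaining = list(unmapped_columns)
    for field in unmapped_fields:
        f = stemmed_tokens(field["name"])
        best, best_score = None, -1.0
        for col in remaining:
            c = stemmed_tokens(col)
            union = f | c
            score = len(f & c) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = col, score
        if best is not None and best_score >= threshold:
            result[field["code"]] = best
            remaining.remove(best)  # each column is used at most once
    return result
```

On the README's inputs, stemming reduces 'names' to 'name', so 'location names' now matches 'Location Name' exactly, while 'Town' and 'city' still share no token and are left for the semantic step.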
The final step measures the semantic similarity between the remaining unmapped fields and columns. Thanks to Datamuse and their great API, we were able to externalize this operation.
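A hedged sketch of how a semantic check could be delegated to Datamuse. It uses the public `/words` endpoint with the `ml` ("means like") and `max` parameters; whether auto-mapper queries `ml` or another Datamuse constraint is an assumption. The matching rule in the hypothetical `semantic_match` (a column matches if any of its tokens appears among the related words) is also illustrative only. Network access is isolated in `related_words` so the parsing logic can be checked offline.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

DATAMUSE_URL = "https://api.datamuse.com/words"


def build_query(word: str, max_results: int = 20) -> str:
    """Build a Datamuse 'means like' (ml) query URL for a word or phrase."""
    return f"{DATAMUSE_URL}?{urllib.parse.urlencode({'ml': word, 'max': max_results})}"


def parse_response(body: str) -> List[Tuple[str, int]]:
    """Extract (word, score) pairs from a Datamuse JSON response body."""
    return [(item["word"], item.get("score", 0)) for item in json.loads(body)]


def related_words(word: str, max_results: int = 20,
                  timeout: float = 5.0) -> List[Tuple[str, int]]:
    """Fetch words Datamuse considers close in meaning (requires network access)."""
    with urllib.request.urlopen(build_query(word, max_results), timeout=timeout) as resp:
        return parse_response(resp.read().decode("utf-8"))


def semantic_match(column: str, related: List[Tuple[str, int]]) -> bool:
    """Hypothetical rule: True if any token of the column name is a related word."""
    words = {w.lower() for w, _ in related}
    return any(tok in words for tok in column.lower().split())
```

For example, `semantic_match("city", related_words("Town"))` would decide the `town` field from the usage example below; the result depends entirely on what Datamuse returns at query time.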
Installation
First install the package, then run a setup script that downloads the necessary NLTK packages:
$ pip install auto-mapper
$ setup-nltk
NOTE:
If you are using a virtual environment, activate it before running the NLTK setup, since the packages are downloaded into the environment folder.
Usage
It's pretty straightforward
>>> from mapper import AutoMapper
>>> mapper = AutoMapper()
>>> cols = ['city', 'Location Name']
>>> fields = [{'code': 'loc_name', 'name': 'location names'}, {'code': 'town', 'name': 'Town'}]
>>> mapping_result, unmapped_columns_indices, unmapped_fields_indices = mapper.map(column_names=cols, fields=fields)
>>> print(mapping_result)
[{'source': ['city'], 'target': 'town'}, {'source': ['Location Name'], 'target': 'loc_name'}]
>>> print(unmapped_columns_indices)
set()
>>> print(unmapped_fields_indices)
set()
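The three return values can be post-processed with plain Python. This sketch relies only on the `mapping_result` shape shown above (a list of `{'source': [...], 'target': code}` dicts) and assumes, from their names, that the two `unmapped_*_indices` sets hold integer positions into `cols` and `fields`. All three helpers are hypothetical, not part of the package.

```python
from typing import Dict, Iterable, List


def column_to_code(mapping_result: List[dict]) -> Dict[str, str]:
    """Build a rename map {column name: field code}, e.g. for DataFrame columns."""
    return {col: item["target"] for item in mapping_result for col in item["source"]}


def code_to_columns(mapping_result: List[dict]) -> Dict[str, List[str]]:
    """Build the reverse lookup {field code: [column names]}."""
    return {item["target"]: list(item["source"]) for item in mapping_result}


def unmapped_names(names: List, indices: Iterable[int]) -> List:
    """Resolve a set of unmapped indices back to the original items, in order."""
    return [names[i] for i in sorted(indices)]
```

With the output above, `column_to_code(mapping_result)` gives `{'city': 'town', 'Location Name': 'loc_name'}`, and `unmapped_names(cols, unmapped_columns_indices)` gives `[]`.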
Download files
Source Distribution
Built Distribution
File details
Details for the file auto-mapper-0.1.2.tar.gz.
File metadata
- Download URL: auto-mapper-0.1.2.tar.gz
- Upload date:
- Size: 9.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5c02d1921e072e700c16a38bb55aaec3c608265c31743338b9f1aa3e9ba91bf9
MD5 | 2dba30039517f5a9e605f38fd335742b
BLAKE2b-256 | 741c19e8adbcbc3c5ba679d4ce660c93ce2f7fd85da1f88dc717a2188b92dcf4
File details
Details for the file auto_mapper-0.1.2-py3-none-any.whl.
File metadata
- Download URL: auto_mapper-0.1.2-py3-none-any.whl
- Upload date:
- Size: 12.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6bdef520dff11642a3e8c1651b82ffee9eb85800eb648b66fd243b79a6097461
MD5 | bb4471ec19c853ab4a270ceb536a104f
BLAKE2b-256 | 4958c13940b3461cf600c76aba64d381b47c5d096f7f20239511354dad96d207