An auto mapper that accepts a list of strings and a list of objects of the form {'code', 'name'}, and returns a list of objects in which each 'code' is mapped to the most similar strings from the list of strings.

Project description

Auto Mapper

A package that maps a set of strings to another set of objects where each object is described as:

{
  'code': 'unique_code',
  'name': 'given name'
}
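
As a small illustration of this object format, the sketch below checks that every entry carries both keys and that codes are not repeated. The helper name `check_fields` is hypothetical and not part of the package; whether the package performs a similar check internally is not stated here.

```python
from typing import Dict, List


def check_fields(fields: List[Dict[str, str]]) -> None:
    """Hypothetical helper: raise ValueError on a missing key or a repeated code."""
    seen = set()
    for field in fields:
        missing = {"code", "name"} - set(field)
        if missing:
            raise ValueError(f"missing keys: {sorted(missing)}")
        if field["code"] in seen:
            raise ValueError(f"duplicate code: {field['code']}")
        seen.add(field["code"])


# A well-formed list passes silently.
check_fields([
    {"code": "loc_name", "name": "location names"},
    {"code": "town", "name": "Town"},
])
```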

How it works

The process starts with text cleaning (that is, text preprocessing), with the help of certain dependencies (NLTK for the current release).
The cleaning operation transforms each string into a list of tokens. By running a text similarity algorithm on the resulting token vectors, we are able to map certain fields to the most similar columns.
A second mapping phase applies an additional text processing step, stemming, combined with a lower similarity threshold, to the fields that are still unmapped.
The final step measures the semantic similarity between the remaining unmapped fields and columns. Thanks to Datamuse and their great API, we were able to externalize this operation.
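
The first two phases can be sketched roughly as follows. This is only an illustration under stated assumptions, not the package's actual implementation: it uses Jaccard similarity over token sets, a regex tokenizer, and a crude trailing-"s" stripper in place of a real NLTK stemmer, and the threshold values 0.9 and 0.6 are made up. The names `tokenize`, `crude_stem`, `jaccard` and `map_pass` are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Rough stand-in for a real stemmer (assumption): strip a trailing 's'."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token


def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def map_pass(columns, fields, threshold, stem=False):
    """For each field, keep the best-scoring column if it reaches the threshold."""
    def prep(text):
        tokens = tokenize(text)
        return [crude_stem(t) for t in tokens] if stem else tokens

    mapping = {}
    for field in fields:
        best_col, best_score = None, 0.0
        for col in columns:
            score = jaccard(prep(col), prep(field["name"]))
            if score > best_score:
                best_col, best_score = col, score
        if best_col is not None and best_score >= threshold:
            mapping[field["code"]] = best_col
    return mapping


columns = ["city", "Location Name"]
fields = [
    {"code": "loc_name", "name": "location names"},
    {"code": "town", "name": "Town"},
]

# Phase 1: strict threshold on the cleaned tokens.
first = map_pass(columns, fields, threshold=0.9)
# Phase 2: stemming plus a lower threshold, only on fields still unmapped.
remaining = [f for f in fields if f["code"] not in first]
second = map_pass(columns, remaining, threshold=0.6, stem=True)
print(first, second)
```

In this sketch "location names" only matches "Location Name" once stemming is applied, while "Town" and "city" share no tokens at all and stay unmapped. That last kind of pair is the case the semantic similarity step is meant to handle.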

Installation

First install the package, then run the setup script, which downloads the necessary NLTK packages:

$ pip install auto-mapper
$ setup-nltk

NOTE: if you are using a virtual environment, activate it before running the NLTK setup, because the packages are downloaded to the environment folder.

Usage

It's pretty straightforward

>>> from mapper import AutoMapper
>>> mapper = AutoMapper()
>>> cols = ['city', 'Location Name']
>>> fields = [{'code': 'loc_name', 'name': 'location names'}, {'code': 'town', 'name': 'Town'}]
>>> mapping_result, unmapped_columns_indices, unmapped_fields_indices = mapper.map(column_names=cols, fields=fields)
>>> print(mapping_result)
[{'source': ['city'], 'target': 'town'}, {'source': ['Location Name'], 'target': 'loc_name'}]
>>> print(unmapped_columns_indices)
set()
>>> print(unmapped_fields_indices)
set()
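
If a flat column-to-code lookup is more convenient, the result shown above can be reshaped with a dictionary comprehension. This is a minimal sketch that assumes the result always has the 'source' and 'target' structure printed in the example; the variable name `column_to_code` is just illustrative.

```python
# Result copied from the usage example above.
mapping_result = [
    {"source": ["city"], "target": "town"},
    {"source": ["Location Name"], "target": "loc_name"},
]

# One entry per source column, pointing at the field code it was mapped to.
column_to_code = {
    column: entry["target"]
    for entry in mapping_result
    for column in entry["source"]
}
print(column_to_code)
```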
