
An auto mapper that accepts a list of strings and a list of objects of the form {'code', 'name'}, and returns a list of objects in which each 'code' is mapped to the most similar strings from the list of strings.

Project description

Auto Mapper

A package that maps a set of strings to another set of objects where each object is described as:

{
  'code': 'unique_code',
  'name': 'given name'
}
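
The placeholder 'unique_code' suggests that codes are expected to be unique. As a minimal sketch under that assumption, the input list can be checked before it is passed to the mapper. The `Field` type and the `check_fields` helper are hypothetical names used for illustration and are not part of the package.

```python
from typing import TypedDict


class Field(TypedDict):
    """Shape of a target object as described above (hypothetical type name)."""
    code: str
    name: str


def check_fields(fields):
    """Raise ValueError if a field lacks 'code' or 'name', or if a code repeats."""
    seen = set()
    for field in fields:
        missing = {"code", "name"} - set(field)
        if missing:
            raise ValueError(f"field {field!r} is missing {sorted(missing)}")
        if field["code"] in seen:
            raise ValueError(f"duplicate code: {field['code']!r}")
        seen.add(field["code"])
    return fields


fields = check_fields([
    {"code": "loc_name", "name": "location names"},
    {"code": "town", "name": "Town"},
])
```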

How it works

The process starts with text cleaning (that is, text preprocessing), done with the help of external dependencies (NLTK in the current release).
The cleaning operation transforms each string into a list of tokens. Running a text similarity algorithm on the resulting token vectors maps some of the fields to their most similar columns.
A second mapping phase applies an additional text processing step, stemming, combined with a lower similarity threshold, to the fields that are still unmapped.
The final step measures the semantic similarity between the remaining unmapped fields and columns. Thanks to Datamuse and their great API, we were able to externalize this operation.
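
The steps above can be sketched with the standard library alone. This is an illustration, not the package's implementation: Jaccard similarity stands in for the unnamed similarity algorithm, `crude_stem` stands in for an NLTK stemmer, and the stop-word list and both thresholds are made-up values. The Datamuse query uses the API's `ml` ("means like") and `max` parameters, though which query the package actually sends is an assumption.

```python
import json
import re
import urllib.parse
import urllib.request

# Hypothetical stop words and thresholds, chosen for illustration only.
STOP_WORDS = {"the", "of", "a", "an"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def crude_stem(token):
    """Rough stand-in for a real stemmer: strip a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def jaccard(a, b):
    """Share of distinct tokens that the two token lists have in common."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def best_matches(columns, fields, threshold, stem=False):
    """Map each field code to its most similar column if the score passes."""
    def prep(text):
        tokens = tokenize(text)
        return [crude_stem(t) for t in tokens] if stem else tokens

    result = {}
    for field in fields:
        scored = [(jaccard(prep(col), prep(field["name"])), col) for col in columns]
        score, col = max(scored, default=(0.0, None))
        if col is not None and score >= threshold:
            result[field["code"]] = col
    return result


def two_pass_map(columns, fields, strict=0.8, relaxed=0.5):
    """Pass 1: plain tokens, strict threshold. Pass 2: stems, lower threshold."""
    mapped = best_matches(columns, fields, strict)
    left_fields = [f for f in fields if f["code"] not in mapped]
    left_columns = [c for c in columns if c not in mapped.values()]
    mapped.update(best_matches(left_columns, left_fields, relaxed, stem=True))
    return mapped


def datamuse_url(word, limit=10):
    """Build a Datamuse 'means like' query URL for the semantic step."""
    query = urllib.parse.urlencode({"ml": word, "max": limit})
    return f"https://api.datamuse.com/words?{query}"


def related_words(word):
    """Fetch words Datamuse considers close in meaning (needs network access)."""
    with urllib.request.urlopen(datamuse_url(word), timeout=10) as response:
        return {item["word"] for item in json.load(response)}
```

With the columns and fields from the Usage section, `two_pass_map` only pairs 'loc_name' with 'Location Name', and only in the second pass, once stemming reduces "names" to "name". 'town' and 'city' share no tokens, which is the kind of pair the semantic step is meant to catch, for example by testing whether one word appears in `related_words` of the other.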

Installation

First install the package, then run the setup script, which downloads the necessary NLTK packages:

$ pip install auto-mapper
$ setup-nltk

NOTE: if you are using a virtual environment, make sure it is activated before running the NLTK setup, because the script downloads the packages to the environment folder.

Usage

It's pretty straightforward

>>> from mapper import AutoMapper
>>> mapper = AutoMapper()
>>> cols = ['city', 'Location Name']
>>> fields = [{'code': 'loc_name', 'name': 'location names'}, {'code': 'town', 'name': 'Town'}]
>>> mapping_result, unmapped_columns_indices, unmapped_fields_indices = mapper.map(column_names=cols, fields=fields)
>>> print(mapping_result)
[{'source': ['city'], 'target': 'town'}, {'source': ['Location Name'], 'target': 'loc_name'}]
>>> print(unmapped_columns_indices)
set()
>>> print(unmapped_fields_indices)
set()
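
Each entry in the result holds a list of source columns and one target code, so a common follow-up is to flatten it into a lookup from column name to field code. A minimal sketch, assuming the result format shown above; `columns_to_codes` is a hypothetical helper, not part of the package.

```python
def columns_to_codes(mapping_result):
    """Flatten AutoMapper-style results into {column name: field code}."""
    lookup = {}
    for entry in mapping_result:
        for column in entry["source"]:
            lookup[column] = entry["target"]
    return lookup


# Result copied from the session above.
mapping_result = [
    {"source": ["city"], "target": "town"},
    {"source": ["Location Name"], "target": "loc_name"},
]
lookup = columns_to_codes(mapping_result)
print(lookup)  # {'city': 'town', 'Location Name': 'loc_name'}
```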

Download files

Source Distribution

auto-mapper-0.1.2.tar.gz (9.5 kB)

Built Distribution

auto_mapper-0.1.2-py3-none-any.whl (12.2 kB, Python 3)
