Map, Clean, Merge. For mapping data from one schema to another.

mcm-core
========

Core MCM - Map Clean Merge




Overview
-----------

MCM has two main pieces, a reader and a mapper.

- Reader
    * Reads CSV files and returns a generator of ``DictCSVReader``-parsed rows.
    * Optionally chunks the rows into groups of a specified size.
- Mapper
    * Can build a probabilistic column mapping given a schema and some raw data.
    * Will substitute saved values for a suggested mapping (e.g. pulling a previous mapping from a DB).
    * Totally flexible: you pass a callable which takes the raw data and returns a mapping.
    * Will clean data based on a Cleaner object for a given type. The type is inferred from the mapping schema.
    * Ability to set "initial_data".
        * This is useful if you always need to set some information in the object that you're mapping data into.
    * Concatenates rows together with a specified delimiter character.
    * Data which doesn't match a given schema's mapping is still saved. It's put in a dictionary called ``extra_data``.
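
To make the "probabilistic column mapping" and "saved values" ideas more concrete, here is a minimal standalone sketch that scores raw column names against schema fields by string similarity. It does not use or reproduce MCM's own implementation: ``suggest_mapping``, ``apply_saved``, the ``threshold`` parameter, and the use of ``difflib`` are all assumptions made for illustration.

```python
import difflib


def _normalize(name):
    # Lowercase and treat underscores as spaces so 'Zip Code' ~ 'zip_code'.
    return name.lower().replace('_', ' ').strip()


def suggest_mapping(raw_columns, schema_fields, threshold=0.6):
    """Suggest {raw_column: (schema_field, score)} by string similarity.

    Hypothetical helper for illustration; not part of mcm.
    """
    suggestions = {}
    for raw in raw_columns:
        best_field, best_score = None, 0.0
        for field in schema_fields:
            # SequenceMatcher.ratio() returns a similarity in [0, 1].
            score = difflib.SequenceMatcher(
                None, _normalize(raw), _normalize(field)
            ).ratio()
            if score > best_score:
                best_field, best_score = field, score
        # Columns without a sufficiently close field stay unmapped (None).
        if best_score < threshold:
            best_field = None
        suggestions[raw] = (best_field, round(best_score, 2))
    return suggestions


def apply_saved(suggestions, saved):
    """Prefer previously saved choices over freshly computed suggestions."""
    merged = dict(suggestions)
    for raw, field in saved.items():
        if raw in merged:
            merged[raw] = (field, 1.0)
    return merged


suggested = suggest_mapping(['Name', 'Zip Code', 'Foo'], ['name', 'zip_code'])
final = apply_saved(suggested, {'Foo': 'name'})
```

Because the scoring function is just a callable that takes raw data and returns a mapping, it could be swapped for any other strategy, such as a lookup against mappings stored in a database.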


Installing
----------

Once it's hosted on PyPI:
```bash
pip install mcm-core
```

Integration
-----------

```python
from mcm import cleaners, mapper, reader

# Here our mapping is just a dictionary where our keys are raw data representations
# and our values are our normalized attributes that we're mapping to.
mapping = {'Thing': 'thing_1', 'Other thing': 'thing_2'}

# model_class can be any type of object.
model_class = object

# Reading and mapping from a CSV file, simple case.
# 'data.csv' is a placeholder path; substitute your own file.
with open('data.csv') as csv_file_handle:
    parser = reader.MCMParser(csv_file_handle)
    mapped_objs = list(parser.map_rows(mapping, model_class))
```
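
For intuition about what mapping a row involves, the following standalone sketch applies the same ``mapping`` dictionary to CSV rows using only the standard library. It is not MCM's code: ``chunked``, ``map_row``, the ``initial_data`` argument shape, and the sample CSV are hypothetical, and the real ``map_rows`` may behave differently. It illustrates chunked reading, initial data, and keeping unmapped columns in ``extra_data``.

```python
import csv
import io
from itertools import islice


def chunked(rows, size):
    """Yield lists of at most `size` rows from any iterable of rows."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def map_row(row, mapping, initial_data=None):
    """Map one raw row; unmapped columns are kept under 'extra_data'."""
    mapped = dict(initial_data or {})
    mapped['extra_data'] = {}
    for column, value in row.items():
        if column in mapping:
            mapped[mapping[column]] = value
        else:
            mapped['extra_data'][column] = value
    return mapped


mapping = {'Thing': 'thing_1', 'Other thing': 'thing_2'}
# In-memory stand-in for a CSV file handle.
raw_csv = io.StringIO('Thing,Other thing,Unmapped\na,b,c\nd,e,f\ng,h,i\n')

rows = csv.DictReader(raw_csv)
chunks = [
    [map_row(row, mapping, initial_data={'source': 'example'}) for row in chunk]
    for chunk in chunked(rows, 2)
]
```

Here each mapped row is a plain dictionary; with MCM you would instead pass a ``model_class`` and let the parser populate instances of it.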


Developing
----------

1. Clone.
2. Create a virtualenv; if you use virtualenv wrapper you'll need to
3. Run ``python setup.py develop`` to hardlink your files into your env.


Testing
-------

Unfortunately, there are some directory path issues still baked in.
To run tests you have to be in the ``tests`` directory:

```bash
cd tests && nosetests
```
