Map, Clean, Merge. For mapping data from one schema to another.
mcm-core
========
Core MCM - Map Clean Merge
Overview
-----------
MCM has two main pieces, a reader and a mapper.
- Reader
    * Reads CSV files and returns a generator of DictCSVReader parsed rows.
    * Optionally chunks the rows into groupings of a specified size.
- Mapper
    * Can build a probabilistic column mapping given a schema and some raw data.
    * Will substitute saved values for a suggested mapping (e.g. pulling a previous mapping from a DB).
    * Fully flexible: you pass a callable that takes the raw data and returns a mapping.
    * Will clean data based on a Cleaner object for a given type. The type is inferred from the mapping schema.
    * Ability to set "initial_data"
        * This is useful if you always need to set some information in the object that you're mapping data into.
    * Concatenates rows together with a specified delimiter character.
    * Data that doesn't match a given schema's mapping is still saved. It's put in a dictionary called ``extra_data``.
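
To make the mapper behaviour above more concrete, here is a minimal standalone sketch using only the standard library. It is not MCM's implementation: `suggest_mapping` and `map_row` are hypothetical helpers, the use of `difflib` string similarity as the "probabilistic" score is an assumption, and so are the `threshold`, `cleaners`, and `initial_data` parameter names.

```python
import difflib


def suggest_mapping(raw_columns, schema_fields, threshold=0.6):
    """Hypothetical helper: pair each raw column with its closest schema field.

    Returns {raw_column: (schema_field, score)}. Columns whose best score
    falls below `threshold` are left out of the suggestion.
    """
    suggestions = {}
    if not schema_fields:
        return suggestions
    for raw in raw_columns:
        # Score every schema field by string similarity (an assumed metric).
        scored = [
            (difflib.SequenceMatcher(None, raw.lower(), field.lower()).ratio(), field)
            for field in schema_fields
        ]
        score, field = max(scored)
        if score >= threshold:
            suggestions[raw] = (field, score)
    return suggestions


def map_row(row, mapping, cleaners=None, initial_data=None):
    """Hypothetical helper: apply `mapping` to one parsed row.

    `cleaners` maps a target attribute to a callable that cleans its value.
    Columns without a mapping are kept under the ``extra_data`` key.
    """
    cleaners = cleaners or {}
    mapped = dict(initial_data or {})  # values that are always set
    mapped["extra_data"] = {}
    for column, value in row.items():
        target = mapping.get(column)
        if target is None:
            mapped["extra_data"][column] = value
        else:
            clean = cleaners.get(target, lambda v: v)
            mapped[target] = clean(value)
    return mapped
```

For example, `map_row({'Thing': ' 3 ', 'Unknown': 'y'}, {'Thing': 'thing_1'}, cleaners={'thing_1': lambda v: int(v.strip())})` would store the cleaned integer under `thing_1` and keep the `Unknown` column under `extra_data`.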
Installing
----------
Once it's hosted on PyPI:
```bash
pip install mcm-core
```
Integration
-----------
```python
from mcm import cleaners, mapper, reader

# Here our mapping is just a dictionary where our keys are raw data representations
# and our values are our normalized attributes that we're mapping to.
mapping = {'Thing': 'thing_1', 'Other thing': 'thing_2'}

# model_class can be any type of object.
model_class = object

# Reading and mapping from a CSV file, simple case.
# csv_file_handle is assumed to be an already-open file object for your CSV.
parser = reader.MCMParser(csv_file_handle)
mapped_objs = list(parser.map_rows(mapping, model_class))
```
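
If you want to see what chunked reading and a dictionary mapping amount to without installing the package, the following is a rough standard-library sketch. It does not use or reproduce MCM's code: `read_chunks` is a hypothetical helper, `csv.DictReader` stands in for the reader's row parsing, and the in-memory CSV data is made up for illustration.

```python
import csv
import io
from itertools import islice


def read_chunks(csv_file, chunk_size):
    """Hypothetical helper: yield lists of up to `chunk_size` parsed rows."""
    rows = csv.DictReader(csv_file)
    while True:
        chunk = list(islice(rows, chunk_size))
        if not chunk:
            return
        yield chunk


# Illustrative data only; in practice this would be an open CSV file.
csv_file_handle = io.StringIO("Thing,Other thing\n1,a\n2,b\n3,c\n")
mapping = {'Thing': 'thing_1', 'Other thing': 'thing_2'}

# Rename each raw column to its normalized attribute, chunk by chunk.
mapped_chunks = [
    [{mapping[column]: value for column, value in row.items()} for row in chunk]
    for chunk in read_chunks(csv_file_handle, chunk_size=2)
]
```

Because `read_chunks` is a generator, only one chunk of rows is held in memory at a time while it is being consumed, which is the usual motivation for chunking large files.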
Developing
----------
1. Clone.
2. Create a virtualenv; if you use virtualenv wrapper you'll need to
3. Run ``python setup.py develop`` to link your files into your env.
Testing
-------
Unfortunately, there are some directory path issues still baked in.
To run tests you have to be in the ``tests`` directory:
```bash
cd tests && nosetests
```