Skip to main content

A ETL framework to convert data

Project description

PyPI Version Build Status

Data-Conversion is a framework to convert data from origin style to target style easily. With custom settings, data-conversion can read data from MongoDB, convert data by MAPPING Rules in settings, and save to destination collection in MongoDB.

How to Install

Install by pip:

$ pip install data-conversion

How to Use

First, you should create a new settings file, for example, settings_release.py. Then, define custom settings like Setting Template File settings.py in data_conversion/settings.py, whose arguments also describe below. Finally, run asynchronously:

$ etl async settings_release.py

or run synchronously:

$ etl sync settings_release.py

Settings

Argument Description Value Example
MONGODB_HOST Host of MongoDB which store origin data ‘127.0.0.1’
MONGODB_PORT Port of MongoDB which store origin data 27017
MONGODB_USERNAME Username of MongoDB which store origin data None / ‘admin’
MONGODB_PASSWORD Password of MongoDB which store origin data None / ‘123456’
MONGODB_AUTHDB DB of authorization which store username and password ‘admin’
MONGODB_DB DB of MongoDB which store origin data and will store result data ‘data’
MONGODB_SRC_COLL Source Collection of MongoDB which store origin data ‘src_coll’
MONGODB_DST_COLL Destination Collection of MongoDB which will store result data ‘dst_coll’
MONGODB_DST_COLL_INDEX Destination Collection Index of MongoDB which store result data [([(‘url’, pymongo.ASCENDING)], {‘unique’:True}), ([(‘domain’, pymongo.ASCENDING)], {})]
MONGODB_ERROR_COLL Error Collection of MongoDB which will store error data when convert raise exception ‘error_coll’
MONGODB_ERROR_COLL_INDEX Collection Index of Error Collection of MongoDB [([(‘url’, pymongo.ASCENDING)], {‘unique’: True})]
SRC_COLL_QUERY Query condition to select documents to be converted { ‘filter’: {}, ‘projection’: None, ‘start’: 0, ‘limit’: 1000 }
WRITE_CONDITION_DICT write to dst_coll which collection.update({CONDITION}, {$set:{dst_document}}, upsert=True) {‘$set’: [‘url’]}
MAPPING list to mapper, rules of conversion [Mapper(‘url’, ‘url’, str, None)] // src_key, dst_key, dst_type, custom_convert_function
OPERATE_MAPPING_DICT dict to mapper, rules of conversion {‘$set’:MAPPING, ‘$push’: MAPPING2, ‘$addToSet’: MAPPING3}
PROCESS_NUM Number of process to run conversion 1
CONCURRENT_PER_PROCESS number of concurrent group to run in one process 100
LOG_LEVEL Level of logging logging.INFO

Settings explain

The most important part in settings is MAPPING. MAPPING contains a list of Mapper, which is a namedtuple (src_key, dst_key, dst_type, custom_convert). dst_type and custom_convert can be None if you want to preserve origin type and value.

Now, we support ‘$set’, ‘$push’, ‘$addToSet’ operation when update document, if you want to add each array element to an existed array, please add ‘$each_’ by custom_convert_function. .. _$each https://docs.mongodb.com/manual/reference/operator/update/addToSet/#each-modifier

Exception Handling

Exception occured in convert function will be save into error collection which defined in settings.

If you want to record the key of document which excpetion raise, you can raise ValueError('key') contains key as an argument.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
data-conversion-0.0.4.tar.gz (8.0 kB) Copy SHA256 hash SHA256 Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page