A ETL framework to convert data
Project description
Data-Conversion is a framework to convert data from origin style to target style easily. With custom settings, data-conversion can read data from MongoDB, convert data by MAPPING Rules in settings, and save to destination collection in MongoDB.
How to Install
Install by pip:
$ pip install data-conversion
How to Use
First, you should create a new settings file, for example, settings_release.py. Then, define custom settings like Setting Template File settings.py in data_conversion/settings.py, whose arguments also describe below. Finally, run asynchronously:
$ etl async settings_release.py
or run synchronously:
$ etl sync settings_release.py
Settings
Argument |
Description |
Value Example |
MONGODB_HOST |
Host of MongoDB which store origin data |
‘127.0.0.1’ |
MONGODB_PORT |
Port of MongoDB which store origin data |
27017 |
MONGODB_USERNAME |
Username of MongoDB which store origin data |
None / ‘admin’ |
MONGODB_PASSWORD |
Password of MongoDB which store origin data |
None / ‘123456’ |
MONGODB_AUTHDB |
DB of authorization which store username and password |
‘admin’ |
MONGODB_DB |
DB of MongoDB which store origin data and will store result data |
‘data’ |
MONGODB_SRC_COLL |
Source Collection of MongoDB which store origin data |
‘src_coll’ |
MONGODB_DST_COLL |
Destination Collection of MongoDB which will store result data |
‘dst_coll’ |
MONGODB_DST_COLL_INDEX |
Destination Collection Index of MongoDB which store result data |
[([(‘url’, pymongo.ASCENDING)], {‘unique’:True}), ([(‘domain’, pymongo.ASCENDING)], {})] |
MONGODB_ERROR_COLL |
Error Collection of MongoDB which will store error data when convert raise exception |
‘error_coll’ |
MONGODB_ERROR_COLL_INDEX |
Collection Index of Error Collection of MongoDB |
[([(‘url’, pymongo.ASCENDING)], {‘unique’: True})] |
SRC_COLL_QUERY |
Query condition to select documents to be converted |
{ ‘filter’: {}, ‘projection’: None, ‘start’: 0, ‘limit’: 1000 } |
WRITE_CONDITION_DICT |
write to dst_coll which collection.update({CONDITION}, {$set:{dst_document}}, upsert=True) |
{‘$set’: [‘url’]} |
MAPPING |
list to mapper, rules of conversion |
[Mapper(‘url’, ‘url’, str, None)] // src_key, dst_key, dst_type, custom_convert_function |
OPERATE_MAPPING_DICT |
dict to mapper, rules of conversion |
{‘$set’:MAPPING, ‘$push’: MAPPING2, ‘$addToSet’: MAPPING3} |
PROCESS_NUM |
Number of process to run conversion |
1 |
CONCURRENT_PER_PROCESS |
number of concurrent group to run in one process |
100 |
LOG_LEVEL |
Level of logging |
logging.INFO |
Settings explain
The most important part in settings is MAPPING. MAPPING contains a list of Mapper, which is a namedtuple (src_key, dst_key, dst_type, custom_convert). dst_type and custom_convert can be None if you want to preserve origin type and value.
Now, we support ‘$set’, ‘$push’, ‘$addToSet’ operation when update document, if you want to add each array element to an existed array, please add ‘$each_’ by custom_convert_function. .. _$each https://docs.mongodb.com/manual/reference/operator/update/addToSet/#each-modifier
Exception Handling
Exception occured in convert function will be save into error collection which defined in settings.
If you want to record the key of document which excpetion raise, you can raise ValueError('key') contains key as an argument.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file data-conversion-0.0.4.tar.gz
.
File metadata
- Download URL: data-conversion-0.0.4.tar.gz
- Upload date:
- Size: 8.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0918ac699d6dea6b1c09a0b7ffe4495f8d5533d5f1844ab66b25806bf4a94f4a |
|
MD5 | b1147123357bbc59e3f69efe91c01739 |
|
BLAKE2b-256 | 8d06da62579e18adb5cfdf9d5f627a2760a316dbf294e593a120bcfc42b69cb8 |