Ambry Partition Message Pack File
This module supports the Ambry ETL framework by providing a file format for storing data and a collection of import routines for other file formats
The Message Pack Rows (MPR) file format consists of a compressed collection of arrays, in message pack, followed by a dictionary of metadata, also in Message Pack. The format efficiently stores tabular data and associates it with metadata, with a few special features for use with data that can come from a variety of sources.
For instance, data a Fixed Width text file may not have a the column titles – headers – in the first row, so the file can store a schema in metadata. Other files, such as those that originate in Excel, may not have their headers on the first so the MPR file can specify a later row to be the start of data.
This module also includes classes for guessing the datatypes of each column, determining where the first row of data begins, and computing statistics.
The module installs a command line program ampr which can be used to inspect MPR files. Run ambry -h for help.
Parameters that can be set on a source file.
$ wget https://github.com/Kozea/Multicorn/archive/v1.2.3.zip $ unzip v1.2.3.zip $ cd Multicorn-1.2.3 $ make && sudo make install
Postgres FDW implementation does not work under virtual environment. You have to install ambry_sources to global environment and create *.pth files for ambry_sources and multicorn in the site-packages of your virtual environment. Create multicorn.pth file containing path to the multicorn package. Example (use your own path instead): /usr/local/lib/python2.7/dist-packages/multicorn-1.2.3_dev-py2.7-linux-i686.egg Add ambry_sources.pth file containing path to the ambry_sources package. Example (use your own path instead): /usr/local/lib/python2.7/dist-packages/ambry_sources
$ git clone email@example.com:CivicKnowledge/ambry_sources.git $ cd ambry_sources $ pip install -r requirements.txt $ python setup.py test
Ignoring slow tests while developing (requires pytest installation). .. code-block:: bash
py.test tests/test_sources -k-slow
The package defines two extras, geo, for geographic file formats, and fdw, for the Foreign Data Wrappers. To install these extras in develop, run from the root of the distribution:
pip install -e .[geo,fdw]
ambry_sources gives read permission to each member of the group of the user who executes ambry_sources. So, to allow postgres read mpr files while executing queries you need to add postgres user to group of the user who executes ambry_sources. Here is an example for debian (ubuntu).
# add postgres user the executor group $ sudo usermod -a -G `id -g -n` postgres
log_min_messages = debug1
import logging import ambry_sources logger = logging.getLogger(ambry_sources.med.postgresql.__name__) logger.setLevel(logging.DEBUG) # Now use ambry_sources.med.postgres # ...