Project Description

This module supports the Ambry ETL framework by providing a file format for storing data and a collection of import routines for other file formats.

The Message Pack Rows (MPR) file format consists of a compressed collection of arrays, encoded in Message Pack, followed by a dictionary of metadata, also in Message Pack. The format efficiently stores tabular data and associates it with metadata, with a few special features for data that can come from a variety of sources.

For instance, data from a fixed-width text file may not have the column titles (headers) in the first row, so the file can store a schema in metadata. Other files, such as those that originate in Excel, may not have their headers on the first row, so the MPR file can specify a later row as the start of data.
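As a rough illustration of the idea, and not the ambry_sources API, the sketch below separates headers from data given the indexes of the header rows and the index of the first data row. The function name split_rows and its parameters are assumptions made for this example.

```python
def split_rows(rows, header_lines, start_line):
    """Return (headers, data_rows) for a list of raw rows.

    header_lines: indexes of the rows that hold column titles.
    start_line: index of the first row of data.
    """
    header_rows = [rows[i] for i in header_lines]
    width = max(len(r) for r in header_rows)
    headers = []
    for col in range(width):
        # Join the non-empty pieces of a multi-row header into one title.
        parts = [str(r[col]).strip() for r in header_rows
                 if col < len(r) and str(r[col]).strip()]
        headers.append(" ".join(parts))
    return headers, rows[start_line:]


rows = [
    ["Quarterly report", ""],  # title row, not part of the table
    ["Region", "Sales"],       # first header row
    ["", "(USD)"],             # second header row
    ["North", 120],
    ["South", 95],
]
headers, data = split_rows(rows, header_lines=[1, 2], start_line=3)
```

Here the title row is skipped, the two header rows are merged column by column, and everything from row 3 onward is treated as data.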

This module also includes classes for guessing the datatypes of each column, determining where the first row of data begins, and computing statistics.
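To give a feel for what datatype guessing involves, here is a minimal sketch that picks the narrowest of int, float or str that fits every non-empty value in a column. It is a hypothetical helper written for this example and does not reflect how the classes in this module are implemented.

```python
def guess_type(values):
    """Guess the narrowest type (int, float or str) that fits all values."""
    candidates = [int, float]
    seen_value = False
    for value in values:
        text = str(value).strip()
        if not text:
            continue  # Empty cells say nothing about the type.
        seen_value = True
        # Keep only the types that can still parse this value.
        remaining = []
        for candidate in candidates:
            try:
                candidate(text)
            except ValueError:
                continue
            remaining.append(candidate)
        candidates = remaining
        if not candidates:
            return str
    # A column with no values at all falls back to str.
    return candidates[0] if seen_value else str
```

A single value that fails to parse as a number is enough to make the whole column a string column in this sketch; a real guesser would probably tolerate some noise.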

Command Line Interface

The module installs a command line program, ampr, which can be used to inspect MPR files. Run ampr -h for help.

Source File Configuration

Parameters that can be set on a source file.

  • url. The URL of the source file. If the URL has a fragment ( ‘#’ ), the fragment names a file inside a zip archive.
  • segment. A number that indicates which worksheet to use in an Excel spreadsheet.
  • header_lines. A comma-separated list of line numbers that should be used for the column headers.
  • urltype. If zip, indicates that the URL points to a zip file. Use this for zip files whose names don’t end in a ‘zip’ extension.
  • filetype. A file extension to use for the file.
  • encoding. A Python encoding name. If missing, defaults to ‘ascii’; it is most often set to ‘utf8’.
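The following sketch shows one way these parameters could be interpreted, using only the standard library. The function parse_source and the archive_member key are hypothetical names chosen for this example; they are not part of ambry_sources.

```python
from urllib.parse import urldefrag


def parse_source(url, header_lines="", encoding=None):
    """Normalize a few source-file parameters into a plain dict."""
    # The fragment after '#' names a file inside a zip archive.
    base_url, fragment = urldefrag(url)
    return {
        "url": base_url,
        "archive_member": fragment or None,
        # "0,1" -> [0, 1]; an empty string gives an empty list.
        "header_lines": [int(n) for n in header_lines.split(",") if n.strip()],
        # Fall back to 'ascii' when no encoding is given.
        "encoding": encoding or "ascii",
    }


source = parse_source("http://example.com/data.zip#table.csv", header_lines="0,1")
```

The example URL is made up; any URL with a fragment is split the same way.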

Multicorn install

$ wget https://github.com/Kozea/Multicorn/archive/v1.2.3.zip
$ unzip v1.2.3.zip
$ cd Multicorn-1.2.3
$ make && sudo make install

Virtualenv hint

The Postgres FDW implementation does not work under a virtual environment. You have to install ambry_sources into the global environment and create *.pth files for ambry_sources and multicorn in the site-packages directory of your virtual environment.

Create a multicorn.pth file containing the path to the multicorn package. Example (use your own path instead):

/usr/local/lib/python2.7/dist-packages/multicorn-1.2.3_dev-py2.7-linux-i686.egg

Add an ambry_sources.pth file containing the path to the ambry_sources package. Example (use your own path instead):

/usr/local/lib/python2.7/dist-packages/ambry_sources
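If you prefer to script this step, a small helper along these lines could write the .pth files. The function write_pth is a hypothetical name for this sketch, and the paths in the usage comment are the example paths from above, which you should replace with your own.

```python
import os


def write_pth(site_packages, name, package_path):
    """Write <name>.pth into site_packages, pointing at package_path."""
    pth_file = os.path.join(site_packages, name + ".pth")
    with open(pth_file, "w") as f:
        # A .pth file holds one path per line.
        f.write(package_path + "\n")
    return pth_file


# Usage (adjust both directories to your system):
# site_packages = "/path/to/venv/lib/python2.7/site-packages"
# write_pth(site_packages, "multicorn",
#           "/usr/local/lib/python2.7/dist-packages/multicorn-1.2.3_dev-py2.7-linux-i686.egg")
# write_pth(site_packages, "ambry_sources",
#           "/usr/local/lib/python2.7/dist-packages/ambry_sources")
```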

Running tests

$ git clone git@github.com:CivicKnowledge/ambry_sources.git
$ cd ambry_sources
$ pip install -r requirements.txt
$ python setup.py test

To skip slow tests while developing (requires pytest):

$ py.test tests/test_sources -k-slow

Installing Extras in Development

The package defines two extras: geo, for geographic file formats, and fdw, for the Foreign Data Wrappers. To install these extras in development mode, run from the root of the distribution:

pip install -e .[geo,fdw]

Making MPR files readable by the postgres user

ambry_sources gives read permission on MPR files to every member of the group of the user who runs ambry_sources. So, to allow postgres to read MPR files while executing queries, you need to add the postgres user to the group of the user who runs ambry_sources. Here is an example for Debian (Ubuntu).

# add the postgres user to the executor's group
$ sudo usermod -a -G `id -g -n` postgres
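To confirm that a given MPR file is actually group-readable, a quick check like the one below may help. It only inspects the file's permission bits with the standard library; group_can_read is a hypothetical helper for this sketch, and it does not verify that postgres is a member of the group.

```python
import os
import stat


def group_can_read(path):
    """Return True if the file's group has read permission."""
    mode = os.stat(path).st_mode
    # S_IRGRP is the group-read bit of the file mode.
    return bool(mode & stat.S_IRGRP)
```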

Debugging postgres FDW

  1. Set the postgres log level to debug by changing log_min_messages to DEBUG1:
log_min_messages = debug1
  2. Set the ambry_sources.med.postgresql logger to the DEBUG level:
import logging
# Import the submodule explicitly so the attribute lookup below works.
import ambry_sources.med.postgresql
logger = logging.getLogger(ambry_sources.med.postgresql.__name__)
logger.setLevel(logging.DEBUG)
# Now use ambry_sources.med.postgresql
# ...
  3. Restart postgres and run the code. Check both the postgres and ambry_sources log files.

Release History

0.1.10

This version

0.1.6

0.1.5

0.1.4

0.1.1

0.1.0

0.0.8

0.0.6

0.0.5

Download Files

File Name                     Size      File Type   Upload Date
ambry-sources-0.1.10.tar.gz   72.5 kB   Source      Dec 13, 2015
