pysemantic

A traits-based data validation and data cleaning module for pandas data structures.

Dependencies

  • Traits
  • PyYAML
  • pandas
  • docopt

Quick Start

Installing with pip

Run:

$ pip install pysemantic

Installing from source

You can install pysemantic by cloning this repository, installing the dependencies and running:

$ python setup.py install

in the root directory of your local clone.

Usage

Create an empty file named pysemantic.conf in your home directory. This can be as simple as running:

$ touch ~/pysemantic.conf

After installing pysemantic, you should have a command line script called semantic. Try it out by running:

$ semantic list

This should print nothing, which means that you don’t have any projects registered under pysemantic yet. A _project_ in pysemantic is just a collection of _datasets_. pysemantic manages your datasets the way an IDE manages source code files: an IDE groups files under different projects, and each project has its own tree structure, build toolchains, requirements, etc. Similarly, each pysemantic project groups a set of datasets and manages them according to their respective user-defined specifications. Projects are uniquely identified by their names.

For now, let’s add and configure a demo project called, simply, “pysemantic_demo”. You can create a project and register it with pysemantic using the add subcommand of the semantic script as follows:

$ semantic add pysemantic_demo

As you can see, this does not match the supported usage of the add subcommand: we additionally need a file containing the specifications for this project. (Note that this file is referred to throughout the documentation interchangeably as a specfile or a data dictionary.) Before we create this file, let’s download the well-known Fisher iris dataset, which we will use as the sample dataset for this demo. You can download it here.

Once the dataset is downloaded, fire up your favourite text editor and create a file named demo_specs.yaml. Fill it up with the following content.

iris:
  path: /absolute/path/to/iris.csv
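If you would rather not type the absolute path by hand, the file can also be generated. The following is a minimal sketch using only the Python standard library; the helper name write_specfile is hypothetical and not part of pysemantic, and it assumes the dataset path contains no characters that need quoting in YAML.

```python
from pathlib import Path


def write_specfile(csv_path, spec_path, dataset_name="iris"):
    """Write a minimal specfile that maps dataset_name to the absolute path of csv_path.

    Hypothetical helper for illustration; not part of the pysemantic API.
    """
    # Expand "~" and turn a relative path into an absolute one.
    absolute_csv = Path(csv_path).expanduser().resolve()
    # Same two-line layout as the hand-written specfile above.
    content = f"{dataset_name}:\n  path: {absolute_csv}\n"
    Path(spec_path).write_text(content)
    return content
```

Calling write_specfile("iris.csv", "demo_specs.yaml") from the directory that holds the downloaded dataset should produce a file equivalent to the one shown above.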

Now we can use this file as the data dictionary of the pysemantic_demo project. Let’s tell pysemantic that we want to do so, by running the following command:

$ semantic add pysemantic_demo /path/to/demo_specs.yaml

We’re all set. To see how we did, start a Python interpreter and type the following statements:

>>> from pysemantic import Project
>>> demo = Project("pysemantic_demo")
>>> iris = demo.load_dataset("iris")

Voila! The Python object named iris is actually a pandas DataFrame containing the iris dataset! Well, nothing really remarkable so far. In fact, we cloned and installed a module, wrote two seemingly unnecessary files, and typed three lines of Python code to do something that could have been achieved by simply writing:

>>> import pandas
>>> iris = pandas.read_csv("/path/to/iris.csv")

Most datasets, however, are not as well-behaved as this one. In fact, they can be a nightmare to deal with. pysemantic can be far more intricate and far smarter than this when dealing with mangled, badly encoded, ugly data with inconsistent data types. Check the IPython notebooks in the examples to see how to use pysemantic for such data.

Release History

  • v0.1.1 (this version)
  • v0.1

Download Files

  • pysemantic-0.1.1.tar.gz (370.8 kB), source distribution, uploaded Jul 1, 2015