pysemantic

A traits-based data validation and data cleaning module for pandas data structures.

Dependencies

Traits
PyYAML
pandas
docopt

Quick Start

Installing with pip

Run:

$ pip install pysemantic

Installing from source

You can install pysemantic by cloning this repository, installing the dependencies and running:

$ python setup.py install

in the root directory of your local clone.

Usage

Create an empty file named pysemantic.conf in your home directory. This can be as simple as running:

$ touch ~/pysemantic.conf
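
On systems without a touch command (Windows, for example), the same empty file can be created from Python. This is a minimal sketch using only the standard library; the helper name ensure_conf and its home parameter are hypothetical, introduced here for illustration, and are not part of pysemantic.

```python
from pathlib import Path


def ensure_conf(home=None):
    """Create an empty pysemantic.conf in the given home directory.

    `home` is a hypothetical parameter for illustration; it defaults to
    the current user's home directory.
    """
    home_dir = Path(home) if home is not None else Path.home()
    conf_path = home_dir / "pysemantic.conf"
    # touch() creates the file if it is missing and leaves the contents
    # of an existing file untouched.
    conf_path.touch(exist_ok=True)
    return conf_path
```

Calling ensure_conf() with no arguments has the same effect as the shell command above.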

After installing pysemantic, you should have a command line script called semantic. Try it out by running:

$ semantic list

This should do nothing. This means that you don’t have any projects registered under pysemantic. A _project_ in pysemantic is just a collection of _datasets_. pysemantic manages your datasets like an IDE manages source code files, in that it groups them under different projects, and each project has its own tree structure, build toolchains, requirements, etc. Similarly, different pysemantic projects group under them a set of datasets, and manage them depending on their respective user-defined specifications. Projects are uniquely identified by their names.

For now, let’s add and configure a demo project called, simply, “pysemantic_demo”. You can create a project and register it with pysemantic using the add subcommand of the semantic script as follows:

$ semantic add pysemantic_demo

As you can see, this does not fit the supported usage of the add subcommand. We additionally need a file containing the specifications for this project. (Note that this file, containing the specifications, is referred to throughout the documentation interchangeably as a specfile or a data dictionary.) Before we create this file, let’s download the well-known Fisher iris dataset, which we will use as the sample dataset for this demo. You can download it here.

Once the dataset is downloaded, fire up your favourite text editor and create a file named demo_specs.yaml. Fill it with the following content:

iris:
  path: /absolute/path/to/iris.csv
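
Since the example specfile uses an absolute path, it can be convenient to generate the file from a script instead of typing the path by hand. The following is a minimal sketch using only the standard library; write_demo_spec is a hypothetical helper, not part of pysemantic, and it writes exactly the two-line structure shown above.

```python
import os


def write_demo_spec(csv_path, spec_path="demo_specs.yaml"):
    """Write a minimal specfile for the iris dataset (hypothetical helper)."""
    # Resolve to an absolute path, as in the example specfile.
    abs_csv = os.path.abspath(csv_path)
    with open(spec_path, "w") as fout:
        fout.write("iris:\n")
        fout.write("  path: {}\n".format(abs_csv))
    return abs_csv
```

This sketch does no quoting, so a path containing characters that are special in YAML (such as a colon followed by a space) would need to be quoted by hand.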

Now we can use this file as the data dictionary of the pysemantic_demo project. Let’s tell pysemantic that we want to do so, by running the following command:

$ semantic add pysemantic_demo /path/to/demo_specs.yaml

We’re all set. To see how we did, start a Python interpreter and type the following statements:

>>> from pysemantic import Project
>>> demo = Project("pysemantic_demo")
>>> iris = demo.load_dataset("iris")

Voila! The Python object named iris is actually a pandas DataFrame containing the iris dataset! Well, nothing really remarkable so far. In fact, we cloned and installed a module, wrote two seemingly unnecessary files, and typed three lines of Python code to do something that could have been achieved by simply writing:

>>> import pandas
>>> iris = pandas.read_csv("/path/to/iris.csv")

Most datasets, however, are not as well behaved as this one. In fact, they can be a nightmare to deal with. pysemantic can be far more intricate and far smarter than this when dealing with mangled, badly encoded, ugly data with inconsistent data types. Check the IPython notebooks in the examples to see how to use pysemantic for such data.
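
To make "inconsistent data types" concrete, the sketch below shows the kind of hand-written check that such data otherwise requires. It uses only the standard library and is not pysemantic's API; the column names and sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample: the second data row has a non-numeric sepal length.
RAW = """sepal_length,species
5.1,setosa
n/a,setosa
6.3,virginica
"""


def split_valid_rows(text, numeric_column):
    """Separate rows whose `numeric_column` parses as a float from the rest."""
    good, bad = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            row[numeric_column] = float(row[numeric_column])
        except ValueError:
            # Keep the offending row as-is so it can be inspected later.
            bad.append(row)
        else:
            good.append(row)
    return good, bad


good, bad = split_valid_rows(RAW, "sepal_length")
print(len(good), "valid rows;", len(bad), "rejected")
```

Writing and maintaining checks like this for every column of every dataset is the repetitive work that a per-project data dictionary is meant to centralise.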

Release history

v0.1.1 (this version): Jul 1, 2015
v0.1: Jul 1, 2015

Download files


Source Distribution

pysemantic-0.1.1.tar.gz (370.8 kB)

Uploaded Jul 1, 2015 Source

File details

Details for the file pysemantic-0.1.1.tar.gz.

File metadata

Download URL: pysemantic-0.1.1.tar.gz
Upload date: Jul 1, 2015
Size: 370.8 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for pysemantic-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`b3bf42b9e66afd1d04fd84ca9dfab2dd806b250d9bc8e623bfab422bf3484bc7`
MD5	`330ce56736e14ccb19bba764c2a3588d`
BLAKE2b-256	`ce721661c0b85d22875478051016c4cca39a38fa65e7445b4514e5d07fe81a99`

