
Utility library and scripts for simpler data-processing tasks

Project description

Author: Craig Kelly

Introduction

Source Note: The authoritative version of this file is the Markdown version. The RST version is automatically created from the Markdown by pandoc.

This is an Apache-licensed library and set of command-line tools for simple data-processing tasks and pipelines. It is meant to be used alongside tools like dmk, on the assumption that serious work will be done with serious tools (like jupyterlab and scipy).

If it feels like a mishmash of functionality, that’s because it is. This is mainly a collection of odds and ends that keeps getting used in projects in a very specific analytics and data science team.

Installing

The normal way:

$ pip install datasimple

However, we use Python 3 and prefer user installs, so on a system like Ubuntu you probably want:

$ python3 -m pip install --user --upgrade datasimple

HOWEVER, the CORRECT usage is a Pipfile controlled by pipenv.
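As a rough illustration of the Pipfile approach, the sketch below writes a minimal Pipfile that lists datasimple as a dependency. The scratch directory, the unpinned `"*"` version, and the choice of the default PyPI index are assumptions made for the example, not requirements of this project.

```shell
# Sketch only: create a scratch project directory (hypothetical location).
workdir="$(mktemp -d)"

# Write a minimal Pipfile that declares datasimple as a dependency.
# The "*" version specifier is an assumption; pin a version if you need one.
cat > "$workdir/Pipfile" <<'EOF'
[[source]]
url = "https://pypi.org/simple"
verify_ssl = true
name = "pypi"

[packages]
datasimple = "*"
EOF

echo "Wrote $workdir/Pipfile"
# Next step (not run here): cd "$workdir" && pipenv install
```

Running `pipenv install` in that directory should then create the virtualenv and a Pipfile.lock for you.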

See below (in Hacking) for installing in development mode if you need to make source code changes.

What you get

The datasimple library and some handy scripts (see ./bin). Of note is a class designed to help you write scripts to convert anything to Excel spreadsheets. (Once again, this is functionality we need for a particular business environment. It is expressly NOT an endorsement of Excel for data science.)

Requirements

This is Python 3. Don’t submit requests for Python 2 compatibility.

See setup.py for dependencies (which will get installed automatically when you install this package with pip).

Hacking

You should be developing in a virtualenv. Since you are probably forced to work in a Vagrant Ubuntu VM on a Windows machine, and you’ll want to use the shared /vagrant folder, you might want to consider using pipenv and pyenv with the virtualenv plugin.

Use make test for testing (which will also handle linting). In fact, see the Makefile for what we automate with this project.
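A development-mode setup might look roughly like the sketch below. It assumes you are in the root of a source checkout (where setup.py lives) and uses the standard library venv module with pip's editable install. The `.venv` directory name is an arbitrary choice for the example, and pipenv or pyenv users would substitute their own equivalents.

```shell
# Sketch only: run from the root of a datasimple source checkout.
if [ -f setup.py ]; then
    python3 -m venv .venv                 # create an isolated environment in ./.venv
    . .venv/bin/activate                  # activate it for this shell session
    python -m pip install --editable .    # development ("editable") install
    make test                             # run the project's tests and linting
    status="installed"
else
    # Not in a checkout: do nothing rather than touch the current directory.
    echo "No setup.py here; run this from the repository root." >&2
    status="skipped"
fi
```

With an editable install, changes you make to the source are picked up without reinstalling the package.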

Contributing

The following guidelines are used when accepting external contributions:

  • ./lint should not find any issues.

  • There should be appropriate tests added to the appropriate module in ./tests.

  • There should be an existing and compelling use case.

The ./lint script in the root of this repo uses pylama, which you must install. Currently it also expects a pylama linter plugin called “quotes”. See Craig (the maintainer) for this plugin. NOTE: if even ONE PERSON contacts me I’ll make that plugin public :)

If you don’t currently have pylama installed, you can get the latest installed for your user with python3 -m pip install --user --upgrade pylama.

You should also test using the ./test script in the root of this repo. It runs tests using nosetests. Our setup also requires the package nose-exclude. However, the test script delegates via setup.py so you shouldn’t need to worry about this.

Note that both pylama and nosetests have configuration specified in setup.cfg.

Download files

Source distribution: datasimple-1.0.7.tar.gz (29.1 kB)

Built distribution: datasimple-1.0.7-py2.py3-none-any.whl (35.9 kB)
