
Visions


And these visions of data types, they kept us up past the dawn.

The Semantic Data Library

Visions provides a set of tools for defining and using semantic data types.

  • Semantic type detection & inference on sequence data.

  • Automated data processing.

  • Completely customizable. Visions makes it easy to build and modify semantic data types for domain-specific purposes.

  • Out-of-the-box support for multiple backend implementations including pandas, spark, numpy, and python.

  • A robust set of default types and typesets covering the most common use cases.

Check out the complete documentation here.

Installation

Source code is available on GitHub, and binary installers are available via pip.

# Pip
pip install visions

Complete installation instructions (including extras) are available in the docs.

Quick Start Guide

If you want to play immediately, check out the examples folder in the source repository. Otherwise, let's get some data:

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.head(2)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S
2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C

The most important abstraction in visions is the Type: each type represents a semantic notion about data. You have access to a range of well-tested types like Integer, Float, and File covering the most common software development use cases. Types can be bundled together into typesets. Behind the scenes, visions builds a traversable graph for any collection of types.

from visions import types, typesets

# CompleteSet is one of the builtin typesets
typeset = typesets.CompleteSet()
typeset.plot_graph()

Note: Plots require pygraphviz to be installed.

Because of the special relationship between types these graphs can be used to detect the type of your data or infer a more appropriate one.

# Detection looks like this
typeset.detect_type(df)

# While inference looks like this
typeset.infer_type(df)

# Inference works well even if we monkey with the data, say by converting everything to strings
typeset.infer_type(df.astype(str))
>> {
    'PassengerId': Integer,
    'Survived': Integer,
    'Pclass': Integer,
    'Name': String,
    'Sex': String,
    'Age': Float,
    'SibSp': Integer,
    'Parch': Integer,
    'Ticket': String,
    'Fare': Float,
    'Cabin': String,
    'Embarked': String
}
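
To make the difference between detection and inference concrete, here is a minimal pure-Python sketch of the idea. It does not use visions and is not how visions is implemented: the type names, the `CONTAINS` and `RELATIONS` tables, and both functions are hypothetical stand-ins for a typeset's graph. Detection asks which type a sequence already belongs to, while inference keeps following relations to a more specific type for as long as the data allows.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def _can_parse(cast: Callable, values: Sequence) -> bool:
    """Return True if every value can be converted with `cast`."""
    try:
        for value in values:
            cast(value)
    except (TypeError, ValueError):
        return False
    return True


# Hypothetical membership tests, one per toy type (checked in this order).
CONTAINS: Dict[str, Callable[[Sequence], bool]] = {
    "String": lambda vs: all(isinstance(v, str) for v in vs),
    "Float": lambda vs: all(isinstance(v, float) for v in vs),
    "Integer": lambda vs: all(isinstance(v, int) and not isinstance(v, bool) for v in vs),
}

# Hypothetical graph edges: source type -> [(target type, test, transformer)].
RELATIONS: Dict[str, List[Tuple[str, Callable, Callable]]] = {
    "String": [("Float", lambda vs: _can_parse(float, vs), lambda vs: [float(v) for v in vs])],
    "Float": [("Integer", lambda vs: all(v.is_integer() for v in vs), lambda vs: [int(v) for v in vs])],
    "Integer": [],
}


def detect_type(values: Sequence) -> str:
    """Detection: report the type the data currently has, without casting."""
    for name, contains in CONTAINS.items():
        if contains(values):
            return name
    return "Generic"


def infer_type(values: Sequence) -> str:
    """Inference: start at the detected type and walk relations while they apply."""
    current = detect_type(values)
    moved = True
    while moved:
        moved = False
        for target, applies, transform in RELATIONS.get(current, []):
            if applies(values):
                values, current, moved = transform(values), target, True
                break
    return current


print(detect_type(["1", "2"]))  # the data is stored as strings
print(infer_type(["1", "2"]))   # but every value is integer-like
```

In this sketch the strings `"1"` and `"2"` are detected as `String` but inferred as `Integer`, mirroring the `df.astype(str)` example above.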

Visions solves many of the most common problems of working with tabular data. For example, sequences of integers are still recognized as integers whether they have trailing decimal 0's from being cast to float, missing values, or something else altogether. Much of this cleaning is performed automatically, so visions can also return nicely cleaned and processed data.

cleaned_df = typeset.cast_to_inferred(df)
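
As a rough illustration of the integer case described above, the following standalone sketch recognizes a float sequence with trailing decimal zeros and missing values as integer-like and casts it. The helper names are hypothetical and the logic is an assumption about the general technique, not the code visions runs internally.

```python
import math
from typing import List, Optional, Sequence


def _is_missing(value) -> bool:
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def looks_like_integers(values: Sequence) -> bool:
    """True if at least one value is present and all present values are whole numbers."""
    present = [v for v in values if not _is_missing(v)]
    return bool(present) and all(float(v).is_integer() for v in present)


def cast_to_integers(values: Sequence) -> List[Optional[int]]:
    """Cast whole-number floats to int, keeping missing values as None."""
    if not looks_like_integers(values):
        raise ValueError("values are not integer-like")
    return [None if _is_missing(v) else int(v) for v in values]


ages = [22.0, float("nan"), 38.0]
print(cast_to_integers(ages))
```

Missing entries are kept as `None` here so that the cast does not have to invent a value for them.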

This is only a small taste of what visions can do, which also includes building your own domain-specific types and typesets. Please check out the API documentation or the examples/ directory for more info!

Supported frameworks

Thanks to its dispatch-based implementation, Visions is able to exploit framework-specific capabilities offered by libraries like pandas and spark. Currently it works with the following backends by default:

  • Pandas (feature complete)
  • Numpy (boolean, complex, date time, float, integer, string, time deltas, objects)
  • Spark (boolean, categorical, date, date time, float, integer, numeric, object, string)
  • Python (string, float, integer, date time, time delta, boolean, categorical, object, complex - other datatypes are untested)

If you're using pandas it will also take advantage of parallelization tools like swifter if available.

It also offers a simple annotation-based API for registering new implementations as needed. For example, if you wished to extend the categorical data type to include a Dask-specific implementation, you might do something like:

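import dask.dataframe as dd  # import the submodule explicitly; a bare `import dask` may not load it
from pandas.api import types as pdt
from visions.types.categorical import Categorical


# Register a Dask-specific membership test for the Categorical type
@Categorical.contains_op.register
def categorical_contains(series: dd.Series, state: dict) -> bool:
    return pdt.is_categorical_dtype(series.dtype)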
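
For readers unfamiliar with dispatch-based registration, the sketch below shows the general pattern using `functools.singledispatch` from the Python standard library. This is an analogy only: the function `contains_integer` is hypothetical, and visions' own dispatch machinery is not shown here and may work differently.

```python
from functools import singledispatch


@singledispatch
def contains_integer(sequence) -> bool:
    """Fallback used when no backend-specific implementation is registered."""
    raise NotImplementedError(f"No implementation for {type(sequence).__name__}")


@contains_integer.register
def _(sequence: list) -> bool:
    # Plain Python lists: inspect every element (bool is excluded on purpose).
    return all(isinstance(v, int) and not isinstance(v, bool) for v in sequence)


@contains_integer.register
def _(sequence: range) -> bool:
    # A range only ever yields integers, so no element-wise check is needed.
    return True


print(contains_integer([1, 2, 3]))
print(contains_integer(range(3)))
```

The appeal of this design is that each backend can answer the same question in whatever way is cheapest for it, and supporting a new container type only requires registering one more function.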

Contributing and support

Contributions to visions are welcome. For more information, please visit the community contributions page and join us on Slack. The GitHub issue tracker is used for reporting bugs, feature requests, and support questions.

Also, please check out some of the other companies and packages using visions.

If you're currently using visions or would like to be featured here please let us know.

Acknowledgements

This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.
