
Visions



And these visions of data types, they kept us up past the dawn.

The Semantic Data Library

Visions provides a set of tools for defining and using semantic data types.

  • Semantic type detection & inference on sequence data.

  • Automated data processing.

  • Completely customizable. Visions makes it easy to build and modify semantic data types for domain-specific purposes.

  • Out-of-the-box support for multiple backend implementations including pandas, spark, numpy, and python.

  • A robust set of default types and typesets covering the most common use cases.

Check out the complete documentation here.

Installation

Source code is available on GitHub, and binary installers are available via pip.

# Pip
pip install visions

Complete installation instructions (including extras) are available in the docs.

Quick Start Guide

If you want to play immediately, check out the examples folder. Otherwise, let's get some data:

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv")
df.head(2)
PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S
2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Thayer) | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C

The most important abstraction in visions is the Type - a Type represents a semantic notion about data. You have access to a range of well-tested types like Integer, Float, and File covering the most common software development use cases. Types can be bundled together into typesets. Behind the scenes, visions builds a traversable graph for any collection of types.

from visions import types, typesets

# CompleteSet is a builtin typeset (StandardSet is the basic one)
typeset = typesets.CompleteSet()
typeset.plot_graph()

Note: Plots require pygraphviz to be installed.

Because of the special relationships between types, these graphs can be used to detect the type of your data or to infer a more appropriate one.

# Detection looks like this
typeset.detect_type(df)

# While inference looks like this
typeset.infer_type(df)

# Inference works well even if we monkey with the data, say by converting everything to strings
typeset.infer_type(df.astype(str))
>> {
    'PassengerId': Integer,
    'Survived': Integer,
    'Pclass': Integer,
    'Name': String,
    'Sex': String,
    'Age': Float,
    'SibSp': Integer,
    'Parch': Integer,
    'Ticket': String,
    'Fare': Float,
    'Cabin': String,
    'Embarked': String
}
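To make the difference between detection and inference concrete, here is a minimal pure-Python sketch of a type graph. It is not how visions is implemented: the names `CONTAINS`, `RELATIONS`, `detect_type`, and `infer_type` below are hypothetical stand-ins operating on plain lists. Detection only asks which type the values already belong to, while inference keeps following graph edges as long as a lossless conversion is possible.

```python
from typing import Callable, Sequence


def is_string(values: Sequence) -> bool:
    return all(isinstance(v, str) for v in values)


def is_integer(values: Sequence) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly
    return all(isinstance(v, int) and not isinstance(v, bool) for v in values)


def is_float(values: Sequence) -> bool:
    return all(isinstance(v, float) for v in values)


def parses_as(caster: Callable) -> Callable[[Sequence], bool]:
    """Build a check that succeeds when every value survives `caster`."""
    def check(values: Sequence) -> bool:
        try:
            for v in values:
                caster(v)
        except (TypeError, ValueError):
            return False
        return True
    return check


# Nodes of the toy graph: type name -> membership test (hypothetical)
CONTAINS = {"String": is_string, "Integer": is_integer, "Float": is_float}

# Edges of the toy graph: source -> [(target, edge applies?, converter)]
RELATIONS = {
    "String": [("Integer", parses_as(int), int), ("Float", parses_as(float), float)],
    "Float": [("Integer", lambda vs: all(v.is_integer() for v in vs), int)],
    "Integer": [],
}


def detect_type(values: Sequence) -> str:
    """Detection: report the type the values have right now."""
    for name, contains in CONTAINS.items():
        if contains(values):
            return name
    return "Generic"


def infer_type(values: Sequence) -> str:
    """Inference: walk the graph while some edge's conversion applies."""
    current = detect_type(values)
    values = list(values)
    while current in RELATIONS:
        for target, applies, convert in RELATIONS[current]:
            if applies(values):
                values = [convert(v) for v in values]
                current = target
                break
        else:
            break  # no outgoing edge applies, so we have reached the final type
    return current


print(detect_type(["1.0", "2.0"]))  # String
print(infer_type(["1.0", "2.0"]))   # Integer (String -> Float -> Integer)
```

In this sketch the strings "1.0" and "2.0" are detected as String but inferred as Integer, which mirrors the `df.astype(str)` case above.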

Visions solves many of the most common problems of working with tabular data. For example, sequences of integers are still recognized as integers whether they have trailing decimal 0's from being cast to float, missing values, or something else altogether. Much of this cleaning is performed automatically, so you get cleaned and processed data as well.

cleaned_df = typeset.cast_to_inferred(df)
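As a rough illustration of the kind of cleaning described above, the sketch below casts whole-valued floats to integers while preserving missing values. The helper `cast_to_optional_int` is hypothetical and works on plain Python lists; it does not show what `cast_to_inferred` does internally.

```python
import math
from typing import List, Optional, Sequence


def cast_to_optional_int(values: Sequence[Optional[float]]) -> List[Optional[int]]:
    """Cast whole-valued floats to int, keeping missing values as None."""
    cleaned: List[Optional[int]] = []
    for v in values:
        if v is None or (isinstance(v, float) and math.isnan(v)):
            cleaned.append(None)       # missing values stay missing
        elif float(v).is_integer():
            cleaned.append(int(v))     # 3.0 -> 3, trailing decimal zero dropped
        else:
            raise ValueError(f"{v!r} is not a whole number")
    return cleaned


print(cast_to_optional_int([1.0, float("nan"), 3.0, None]))  # [1, None, 3, None]
```

Raising on a non-whole value such as 2.5 is a design choice for this sketch: a sequence containing it should not be treated as integers at all.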

This is only a small taste of everything visions can do, including building your own domain-specific types and typesets, so please check out the API documentation or the examples/ directory for more info!

Supported frameworks

Thanks to its dispatch-based implementation, Visions is able to exploit framework-specific capabilities offered by libraries like pandas and spark. Currently it works with the following backends by default:

  • Pandas (feature complete)
  • Numpy (boolean, complex, date time, float, integer, string, time deltas, objects)
  • Spark (boolean, categorical, date, date time, float, integer, numeric, object, string)
  • Python (string, float, integer, date time, time delta, boolean, categorical, object, complex - other datatypes are untested)

If you're using pandas, it will also take advantage of parallelization tools like swifter if available.

It also offers a simple annotation-based API for registering new implementations as needed. For example, if you wished to extend the categorical data type to include a Dask-specific implementation, you might do something like:

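from visions.types.categorical import Categorical
import dask.dataframe as dd  # a bare "import dask" does not reliably expose dask.dataframe
import pandas as pd


# Register a Dask-specific membership test for the Categorical type
@Categorical.contains_op.register
def categorical_contains(series: dd.Series, state: dict) -> bool:
    # isinstance replaces is_categorical_dtype, which recent pandas releases deprecate
    return isinstance(series.dtype, pd.CategoricalDtype)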
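The idea behind this kind of annotation-driven registration can be sketched with the standard library's `functools.singledispatch`. This is a stand-in for illustration only and is not claimed to be the mechanism visions uses. The `contains_integer` function and the `ColumnStore` class are hypothetical names invented for the example.

```python
from functools import singledispatch
from typing import Any


@singledispatch
def contains_integer(sequence: Any) -> bool:
    """Fallback used when no backend-specific implementation is registered."""
    raise NotImplementedError(f"No implementation for {type(sequence).__name__}")


@contains_integer.register
def _(sequence: list) -> bool:
    # Plain Python backend: inspect every element
    return all(isinstance(v, int) and not isinstance(v, bool) for v in sequence)


class ColumnStore:
    """Hypothetical third-party container that records its own dtype."""

    def __init__(self, values: list, dtype: str) -> None:
        self.values = values
        self.dtype = dtype


@contains_integer.register
def _(sequence: ColumnStore) -> bool:
    # Backend-specific shortcut: trust the stored dtype instead of scanning values
    return sequence.dtype == "int"


print(contains_integer([1, 2, 3]))                     # True
print(contains_integer(ColumnStore([1, 2], "int")))    # True
```

The type annotation on the first argument selects the implementation, so supporting a new backend means registering one more function rather than editing existing ones.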

Contributing and support

Contributions to visions are welcome. For more information, please visit the community contributions page and join us on Slack. The GitHub issue tracker is used for reporting bugs, feature requests, and support questions.

Also, please check out some of the other companies and packages using visions including:

If you're currently using visions or would like to be featured here please let us know.

Acknowledgements

This package is part of the dylan-profiler project. The package is a core component of pandas-profiling. More information can be found here. This work was partially supported by SIDN Fonds.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

visions-0.8.1.tar.gz (566.4 kB view details)

Uploaded Source

Built Distribution

visions-0.8.1-py3-none-any.whl (105.4 kB view details)

Uploaded Python 3

File details

Details for the file visions-0.8.1.tar.gz.

File metadata

  • Download URL: visions-0.8.1.tar.gz
  • Upload date:
  • Size: 566.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for visions-0.8.1.tar.gz
Algorithm Hash digest
SHA256 37f55c37d7bcf124054e612850b28a4f0bab9b200733cedd8f1781c3ac5cc3f0
MD5 2456e88c82a6a72d5c1c5ead9675b0f7
BLAKE2b-256 9cb3254fb50734da453cfb660812994f5652a6bf2b3d5300dc413865cd701470

See more details on using hashes here.

File details

Details for the file visions-0.8.1-py3-none-any.whl.

File metadata

  • Download URL: visions-0.8.1-py3-none-any.whl
  • Upload date:
  • Size: 105.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

File hashes

Hashes for visions-0.8.1-py3-none-any.whl
Algorithm Hash digest
SHA256 aca16b66c93acf6c39d3b6b952429947605203e02c0678a42ea06257fcbb1211
MD5 7f6285fc7ab75076b08dc528de6ac0b6
BLAKE2b-256 90364a0d674198adabadba21eb4048df5cc2e25a4ecff38d75e974d51a83fda2

