
YData SDK lets you use the *Data-Centric* tools from the YData ecosystem to accelerate AI development.

Project description

YData SDK



🎊 YData SDK for improved data quality everywhere!

ydata-sdk v0.1.0 is here! Create a YData account so you can start using it today!


Documentation | More on YData

Overview

The YData SDK is an ecosystem of methods that lets users adopt a Data-Centric approach to AI development through a Python interface. The solution includes a set of integrated components for data ingestion, standardized data quality evaluation and data improvement (such as synthetic data generation), enabling iterative improvement of the datasets used in high-impact business applications.

Synthetic data can be used as a Machine Learning performance enhancer, either to augment real data or to mitigate the presence of bias in it. It can also serve as a Privacy Enhancing Technology, enabling data-sharing initiatives or even fueling testing environments.

Under the hood, the YData SDK relies on a set of algorithms and metrics built on statistical and deep learning techniques that help you accelerate your data preparation.

What you can expect:

YData SDK is composed of the following main modules:

  • Datasources

    • YData SDK includes several connectors for easy integration with existing data sources. It supports several storage types, such as filesystems and RDBMS. Check the list of connectors.
    • The SDK's Datasources run on top of Dask, which lets them handle not only small workloads but also larger volumes of data.
  • Synthesizers

    • A simplified interface to train a generative model that learns, in a data-driven manner, the behavior, patterns and distribution of the original data. Optimize your model for privacy or utility use cases.
    • From a trained synthesizer, you can generate synthetic samples on demand and set the number of records to produce.
  • Synthetic data quality report (coming soon)

    • An extensive synthetic data quality report that measures 3 dimensions: privacy, utility and fidelity of the generated data. The report can be downloaded in PDF format for ease of sharing and compliance purposes or as a JSON to enable the integration in data flows.
  • Profiling (coming soon)

    • A set of metrics and algorithms that summarizes dataset quality in three main dimensions: warnings, univariate analysis and a multivariate perspective.
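The fit-then-sample workflow described under Synthesizers (train on real data once, then generate as many records as needed) can be illustrated with a deliberately simple stand-in. The sketch below is a hypothetical toy, not the YData SDK API and not its modeling approach: `IndependentMarginalSynthesizer`, `fit`, `sample` and `n_samples` are names invented for this example. It only learns each column's empirical value frequencies, so unlike the statistical and deep learning models the SDK describes, it ignores correlations between columns. Consult the SDK documentation for the real `RegularSynthesizer` interface.

```python
import random
from collections import Counter


class IndependentMarginalSynthesizer:
    """Hypothetical toy synthesizer (NOT the YData SDK API).

    Learns each column's empirical distribution separately, so it
    preserves per-column frequencies but ignores cross-column patterns.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._marginals = {}

    def fit(self, rows):
        # rows: list of dicts mapping column name -> value
        if not rows:
            raise ValueError("cannot fit on an empty dataset")
        for col in rows[0].keys():
            counts = Counter(row[col] for row in rows)
            values = list(counts)
            weights = [counts[v] for v in values]
            self._marginals[col] = (values, weights)
        return self

    def sample(self, n_samples):
        # The caller chooses how many synthetic records to generate;
        # each column is drawn independently from its learned frequencies.
        return [
            {
                col: self._rng.choices(values, weights=weights)[0]
                for col, (values, weights) in self._marginals.items()
            }
            for _ in range(n_samples)
        ]


# Usage: fit once on (made-up) real records, then sample on demand.
real = [
    {"city": "Porto", "plan": "pro"},
    {"city": "Lisbon", "plan": "free"},
    {"city": "Porto", "plan": "free"},
]
synth = IndependentMarginalSynthesizer(seed=0).fit(real)
fake = synth.sample(5)
```

Separating `fit` from `sample` mirrors the idea in the text: the expensive learning step happens once, while generation is cheap and can be repeated with any record count.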

Supported data formats

  • Tabular: the RegularSynthesizer is well suited to synthesizing high-dimensional, time-independent data with high-quality results.
  • Time-Series: the TimeSeriesSynthesizer is well suited to synthesizing both regularly and unevenly spaced time-series, from smart-sensor readings to stock prices.
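To make the regular versus unevenly spaced distinction concrete: a series is regularly spaced when every gap between consecutive timestamps is the same. The helper below, `is_regularly_spaced`, is a hypothetical plain-Python sketch, not part of the YData SDK, and it assumes the timestamps are already sorted in ascending order.

```python
from datetime import datetime, timedelta


def is_regularly_spaced(timestamps, tolerance=timedelta(0)):
    """Return True if all consecutive gaps equal the first gap (within tolerance).

    Hypothetical helper; assumes `timestamps` is sorted ascending.
    """
    if len(timestamps) < 3:
        return True  # at most one gap, so there is nothing to compare
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    first = gaps[0]
    return all(abs(gap - first) <= tolerance for gap in gaps)


# A sensor reading every 5 minutes versus readings at irregular moments.
start = datetime(2023, 1, 1)
regular = [start + timedelta(minutes=5 * i) for i in range(4)]
irregular = [start, start + timedelta(minutes=3), start + timedelta(minutes=11)]
```

The `tolerance` parameter lets small jitter (for example, a sensor that fires a few milliseconds late) still count as regular.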



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

ydata_sdk-0.2.0-py310-none-any.whl (107.1 kB)

Uploaded Python 3.10

ydata_sdk-0.2.0-py39-none-any.whl (106.5 kB)

Uploaded Python 3.9

ydata_sdk-0.2.0-py38-none-any.whl (106.6 kB)

Uploaded Python 3.8

File details

Details for the file ydata_sdk-0.2.0-py310-none-any.whl.

File metadata

  • Download URL: ydata_sdk-0.2.0-py310-none-any.whl
  • Upload date:
  • Size: 107.1 kB
  • Tags: Python 3.10
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for ydata_sdk-0.2.0-py310-none-any.whl
Algorithm Hash digest
SHA256 32e6b1f19685ca62cab5143bf39c4ed85a31ed7912e0a19f9f0cd0213100da79
MD5 1a83c7eecb7b9d1e48d11432c8e35ae1
BLAKE2b-256 b370ac2b94f039790d92115b2be89125328ff1079e22be828a9ba2e4259f8be1

See more details on using hashes here.

File details

Details for the file ydata_sdk-0.2.0-py39-none-any.whl.

File metadata

  • Download URL: ydata_sdk-0.2.0-py39-none-any.whl
  • Upload date:
  • Size: 106.5 kB
  • Tags: Python 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for ydata_sdk-0.2.0-py39-none-any.whl
Algorithm Hash digest
SHA256 0db6d77c9c08035b7a726c8b9db7024d4729d2ca34aeed79eec8534b4ab0478a
MD5 1e54d018c7d4eaf4ba931d560a2185e2
BLAKE2b-256 e17a000c450f8c650740621f87a497b1521b74a404c041833dd78f5f0c0b37a3


File details

Details for the file ydata_sdk-0.2.0-py38-none-any.whl.

File metadata

  • Download URL: ydata_sdk-0.2.0-py38-none-any.whl
  • Upload date:
  • Size: 106.6 kB
  • Tags: Python 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.11.2

File hashes

Hashes for ydata_sdk-0.2.0-py38-none-any.whl
Algorithm Hash digest
SHA256 212da767ef1d3f077f02a7cb42bc0f7ffbba82b77e5a5e333381b2778e3a3b4b
MD5 993949a791b4e009e7cc21f64c557f0e
BLAKE2b-256 49315a61f28a5e9338ab3da9e680c8c581cbe7399ab303c6255e92ba859a4eab
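The published digests let you confirm that a downloaded wheel was not corrupted or tampered with. The sketch below uses only Python's standard `hashlib` to compute a file's SHA256 digest and compare it with the value listed on this page; the function names `sha256_of` and `verify_wheel` are illustrative, and the wheel is assumed to be in the current directory. pip can also enforce this automatically through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Compute a file's SHA256 hex digest, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path, expected_sha256):
    """Return True if the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Published SHA256 for ydata_sdk-0.2.0-py38-none-any.whl (from the table above).
EXPECTED_PY38 = "212da767ef1d3f077f02a7cb42bc0f7ffbba82b77e5a5e333381b2778e3a3b4b"

if __name__ == "__main__":
    wheel = Path("ydata_sdk-0.2.0-py38-none-any.whl")
    if wheel.exists():
        print("OK" if verify_wheel(wheel, EXPECTED_PY38) else "HASH MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size, which matters little for a 107 kB wheel but makes the helper reusable for larger artifacts.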

