YData SDK lets you use the *Data-Centric* tools from the YData ecosystem to accelerate AI development.
YData SDK
🎊 YData SDK for improved data quality everywhere!
ydata-sdk v0.1.0 is here! Create a YData account so you can start using it today!
Overview
The YData SDK is an ecosystem of methods that lets users adopt a Data-Centric approach to AI development through a Python interface. It includes a set of integrated components for data ingestion, standardized data quality evaluation, and data improvement (such as synthetic data generation), which together support iterative improvement of the datasets used in high-impact business applications.
Synthetic data can be used as a Machine Learning performance enhancer, to augment real data or to mitigate the presence of bias in it. It can also be used as a Privacy Enhancing Technology, to enable data-sharing initiatives or to fuel testing environments.
Under the hood, the YData SDK provides a set of algorithms and metrics, based on statistical and deep learning techniques, that help you accelerate your data preparation.
What you can expect:
YData SDK is composed of the following main modules:
- Datasources
  - The YData SDK includes several connectors for easy integration with existing data sources. It supports several storage types, such as filesystems and RDBMS. Check the list of connectors.
  - The SDK's Datasources run on top of Dask, which lets them handle not only small workloads but also larger volumes of data.
- Synthesizers
  - A simplified interface to train a generative model that learns, in a data-driven manner, the behavior, patterns and distribution of the original data. Optimize your model for privacy or utility use cases.
  - From a trained synthesizer, you can generate synthetic samples as needed and parametrise the number of records to produce.
- Synthetic data quality report (coming soon)
  - An extensive synthetic data quality report that measures three dimensions of the generated data: privacy, utility and fidelity. The report can be downloaded as a PDF, for ease of sharing and compliance purposes, or as JSON, to enable integration into data flows.
- Profiling (coming soon)
  - A set of metrics and algorithms that summarizes dataset quality along three main dimensions: warnings, univariate analysis and a multivariate perspective.
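The train-then-sample workflow described for Synthesizers can be illustrated with a small, self-contained sketch. This is not the YData SDK API: `ToyGaussianSynthesizer`, its `fit` and `sample` methods, the `n_samples` parameter and the example columns are all hypothetical names chosen for illustration, and the "model" is just an independent Gaussian per column rather than a deep generative model.

```python
import random
import statistics


class ToyGaussianSynthesizer:
    """Hypothetical stand-in for a synthesizer (NOT the YData SDK API).

    It "learns" each numeric column's mean and standard deviation and
    samples every column independently from a Gaussian, so it ignores
    correlations that a real generative model would capture.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._params = {}  # column name -> (mean, standard deviation)

    def fit(self, rows):
        """Estimate per-column statistics from a list of dict records."""
        if not rows:
            raise ValueError("cannot fit on an empty dataset")
        for column in rows[0]:
            values = [row[column] for row in rows]
            self._params[column] = (
                statistics.fmean(values),
                statistics.pstdev(values),
            )
        return self

    def sample(self, n_samples):
        """Generate n_samples synthetic records from the fitted statistics."""
        if not self._params:
            raise RuntimeError("call fit() before sample()")
        return [
            {
                column: self._rng.gauss(mean, std)
                for column, (mean, std) in self._params.items()
            }
            for _ in range(n_samples)
        ]


# Made-up example records, for illustration only.
real = [
    {"age": 34.0, "income": 52000.0},
    {"age": 45.0, "income": 61000.0},
    {"age": 29.0, "income": 48000.0},
    {"age": 52.0, "income": 75000.0},
]

synth = ToyGaussianSynthesizer(seed=0).fit(real)
synthetic = synth.sample(n_samples=5)  # the caller chooses how many records
```

The point of the sketch is the shape of the interaction: fit once on real data, then request as many synthetic records as needed.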
Supported data formats
- Tabular: The RegularSynthesizer is suited to synthesizing high-dimensional, time-independent data with high-quality results.
- Time-Series: The TimeSeriesSynthesizer is suited to synthesizing both regularly and irregularly spaced time-series, from smart-sensor readings to stock data.
Download files
Built Distributions
Hashes for ydata_sdk-0.2.0-py310-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 32e6b1f19685ca62cab5143bf39c4ed85a31ed7912e0a19f9f0cd0213100da79
MD5 | 1a83c7eecb7b9d1e48d11432c8e35ae1
BLAKE2b-256 | b370ac2b94f039790d92115b2be89125328ff1079e22be828a9ba2e4259f8be1
Hashes for ydata_sdk-0.2.0-py39-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0db6d77c9c08035b7a726c8b9db7024d4729d2ca34aeed79eec8534b4ab0478a
MD5 | 1e54d018c7d4eaf4ba931d560a2185e2
BLAKE2b-256 | e17a000c450f8c650740621f87a497b1521b74a404c041833dd78f5f0c0b37a3
Hashes for ydata_sdk-0.2.0-py38-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 212da767ef1d3f077f02a7cb42bc0f7ffbba82b77e5a5e333381b2778e3a3b4b
MD5 | 993949a791b4e009e7cc21f64c557f0e
BLAKE2b-256 | 49315a61f28a5e9338ab3da9e680c8c581cbe7399ab303c6255e92ba859a4eab
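One way to check a downloaded wheel against the digests listed above is to recompute them locally with Python's standard `hashlib` module. The sketch below assumes that BLAKE2b-256 means BLAKE2b with a 32-byte digest; the helper names `file_digests` and `matches_published` are hypothetical, as is the local file path in the usage note.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return hex digests of a file for the three algorithms listed above."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # Assumption: BLAKE2b-256 is BLAKE2b truncated to a 32-byte digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as handle:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, published):
    """Check that every published digest equals the locally computed one."""
    actual = file_digests(path)
    return all(
        actual[name] == digest.strip().lower()
        for name, digest in published.items()
    )
```

For example, `matches_published("ydata_sdk-0.2.0-py310-none-any.whl", {"SHA256": "32e6b1f19685ca62cab5143bf39c4ed85a31ed7912e0a19f9f0cd0213100da79"})` should return `True` for an intact copy of that wheel saved in the current directory. SHA256 is the digest to rely on; MD5 is not collision-resistant and is useful only as a checksum against accidental corruption.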