Synthetic data generation tools for financial markets
Project description
DataHub
Synthetic data generation
DataHub is a set of Python libraries dedicated to producing synthetic data for use in tests, machine learning training, statistical analysis, and other use cases. DataHub uses existing datasets to generate synthetic models. If no existing data is available, it uses user-provided scripts and data rules to generate synthetic data from out-of-the-box helper datasets.
Synthetic datasets are simply artificially manufactured sets, produced to a desired degree of accuracy. Real data does play a part in synthetic generation, depending on the realism you require. The product roadmap details the functionality planned in this respect.
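The rule-based mode described above (user-provided rules plus helper datasets, used when no existing data is available) can be sketched in plain Python and pandas. This is not DataHub's API: the helper table `HELPER_CURRENCIES`, the function `generate_trades`, and the column schema are all hypothetical, chosen only to show rules producing a consistent synthetic data frame.

```python
import random

import pandas as pd

# Hypothetical "helper dataset": a small lookup table standing in for the
# out-of-the-box reference data a generator might ship with.
HELPER_CURRENCIES = {"US": "USD", "GB": "GBP", "JP": "JPY", "DE": "EUR"}


def generate_trades(n_rows: int, seed: int = 42) -> pd.DataFrame:
    """Generate synthetic trades from simple rules (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    rows = []
    for i in range(n_rows):
        country = rng.choice(sorted(HELPER_CURRENCIES))
        rows.append(
            {
                "trade_id": i + 1,
                "country": country,
                # Rule: the currency must be consistent with the country.
                "currency": HELPER_CURRENCIES[country],
                # Rule: notional is a positive amount rounded to whole units.
                "notional": round(rng.uniform(1_000, 1_000_000)),
                # Rule: side is BUY or SELL with equal probability.
                "side": rng.choice(["BUY", "SELL"]),
            }
        )
    return pd.DataFrame(rows)


trades = generate_trades(5)
print(trades)
```

Seeding the generator means the same rules always yield the same rows, which is usually what you want when synthetic data feeds automated tests.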
DataHub's core is predominantly based around pandas data frames and object generation. A common question is: now that I have a data frame of synthetic data, what do I do with it? The pandas library offers a wide array of options here, so for the time being, sinking data to databases is out of scope for the core library. However, see the examples in the test folder for some common patterns.
Note: As we build out a config-based synthetic spec generator, we will bring this back into scope. Please see our roadmap/issue list and get involved in the discussion.
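Since sinking data is left to pandas itself, here is a minimal sketch of two standard pandas options for a synthetic data frame: serialising to CSV and writing to a database with `DataFrame.to_sql`. The table name `trades` and the in-memory SQLite database are illustrative assumptions; this is not taken from DataHub's test-folder examples.

```python
import sqlite3

import pandas as pd

# A tiny stand-in for a synthetic data frame produced by a generator.
df = pd.DataFrame(
    {
        "trade_id": [1, 2, 3],
        "currency": ["USD", "GBP", "EUR"],
        "notional": [250_000.0, 1_000_000.0, 75_000.0],
    }
)

# Option 1: serialise to CSV text (pass a path instead to write a file).
csv_text = df.to_csv(index=False)

# Option 2: sink to a database. An in-memory SQLite database keeps the
# example self-contained; to_sql also accepts SQLAlchemy connectables.
conn = sqlite3.connect(":memory:")
df.to_sql("trades", conn, index=False, if_exists="replace")

# Read the table back to confirm the round trip.
round_trip = pd.read_sql_query("SELECT * FROM trades ORDER BY trade_id", conn)
conn.close()

print(csv_text)
print(round_trip)
```

For a production database you would typically pass a SQLAlchemy engine instead of a `sqlite3` connection; the `to_sql` call itself stays the same.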
Key documents
- For information on how to get started with DataHub, see our Getting Started Guide
- For more technical information about DataHub and how to customize it, see the Developer Guide
- For a high-level roadmap, see Road Map
Overview of Synthetic data
- Synthetic data is information that is artificially manufactured rather than generated by real-world events.
- Synthetic data is created algorithmically and can be used as a stand-in for test datasets of production data.
- Real data does play a part in synthetic data generation, depending on how realistic you want the output to be.
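To illustrate how real data can drive algorithmic generation, here is a minimal sketch that fits a very simple model (normally distributed, independent log returns) to a price series and samples a synthetic path from it. The price values are placeholders rather than market data, and the model is a deliberately simple parametric choice made for illustration; it is not DataHub's method.

```python
import numpy as np

# Placeholder "real" closing prices; in practice these would come from an
# existing dataset. These values are illustrative, not market data.
real_prices = np.array([100.0, 101.2, 100.7, 102.3, 103.0, 102.1, 104.5, 105.2])

# Fit a very simple model: log returns assumed i.i.d. normal.
log_returns = np.diff(np.log(real_prices))
mu = log_returns.mean()
sigma = log_returns.std(ddof=1)


def synthetic_path(n_steps: int, start: float, seed: int = 0) -> np.ndarray:
    """Simulate a price path whose log returns are drawn from N(mu, sigma)."""
    rng = np.random.default_rng(seed)
    shocks = rng.normal(mu, sigma, size=n_steps)
    # Exponentiating the cumulative log returns gives a strictly positive path
    # that begins exactly at `start`.
    return start * np.exp(np.concatenate(([0.0], np.cumsum(shocks))))


path = synthetic_path(250, start=real_prices[-1])
print(path[:5])
```

The more of the real data's structure the model captures (volatility clustering, correlations between instruments, and so on), the more realistic the output; this sketch only matches the mean and spread of the returns.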
License
Copyright 2020 Citigroup
Distributed under the Apache License, Version 2.0.
SPDX-License-Identifier: Apache-2.0
Download files
Source Distribution: datahub_core-grovesy-0.9.10.tar.gz
Built Distribution: datahub_core_grovesy-0.9.10-py3-none-any.whl
File details
Details for the file datahub_core-grovesy-0.9.10.tar.gz
File metadata
- Download URL: datahub_core-grovesy-0.9.10.tar.gz
- Upload date:
- Size: 502.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | d6a5a541f5a1aa4e51c8fd24f5f1c300161adde72b8553003285c6974b650bf5
MD5 | 54b8d197bc4802b73d0b247a62e871a1
BLAKE2b-256 | af717b5840add5afbceb0192325f48f93a280b4bba8cc43f6beb37e6c161567c
File details
Details for the file datahub_core_grovesy-0.9.10-py3-none-any.whl
File metadata
- Download URL: datahub_core_grovesy-0.9.10-py3-none-any.whl
- Upload date:
- Size: 432.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5e539e84052b72985411d39e2ba327babd52e3223b3c7cb998fb2c29c5b7bc27
MD5 | aec781cfb2b3f0c5f0724129da0593d8
BLAKE2b-256 | f4896cb252defb5029605bc391fb32be25a10180880fc92254fc7e8baa4e620c