
Synthetic data generation tools for financial markets

Project description

DataHub


Synthetic data generation

DataHub is a set of Python libraries dedicated to producing synthetic data for use in tests, machine learning training, statistical analysis, and other use cases. DataHub uses existing datasets to generate synthetic models. If no existing data is available, it uses user-provided scripts and data rules to generate synthetic data, drawing on out-of-the-box helper datasets.
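To illustrate the rules-based path (no source dataset available), here is a minimal sketch in plain Python. It is not DataHub's API: the field names, the value pools standing in for "helper datasets", and the `generate_trades` function are all hypothetical and chosen only to show the idea of per-field generation rules.

```python
import random
from datetime import date, timedelta

# Hypothetical helper datasets: small value pools the rules draw from.
CURRENCIES = ["USD", "EUR", "GBP", "JPY"]
SIDES = ["BUY", "SELL"]


def generate_trades(n, seed=None):
    """Return n synthetic trade records as dicts (hypothetical schema)."""
    rng = random.Random(seed)  # a seed makes the output reproducible for tests
    start = date(2020, 1, 1)
    trades = []
    for i in range(n):
        trades.append({
            "trade_id": i + 1,
            "currency": rng.choice(CURRENCIES),
            "side": rng.choice(SIDES),
            # a multiple of 1,000 between 1,000 and 1,000,000 inclusive
            "notional": rng.randint(1, 1000) * 1000,
            # a date within the first 365 days from the start date
            "trade_date": start + timedelta(days=rng.randint(0, 364)),
        })
    return trades


if __name__ == "__main__":
    for trade in generate_trades(3, seed=42):
        print(trade)
```

Each field is driven by its own small rule, which keeps the rules easy to read and change independently.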

Synthetic datasets are simply artificially manufactured sets, produced to a desired degree of accuracy. Real data does play a part in synthetic generation, depending on the realism you require. The product roadmap details the functionality planned in this respect.
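One simple way real data can raise realism is to estimate summary statistics from a real sample and draw synthetic values from a fitted distribution. The sketch below is a generic illustration, not necessarily how DataHub models data: it assumes a normal distribution is adequate (a strong simplification that ignores skew and correlations), and the `fit_normal`/`synthesize` names and sample values are hypothetical.

```python
import random
import statistics


def fit_normal(sample):
    """Estimate mean and sample standard deviation from observed values."""
    return statistics.mean(sample), statistics.stdev(sample)


def synthesize(sample, n, seed=None):
    """Draw n synthetic values from a normal distribution fitted to `sample`."""
    mu, sigma = fit_normal(sample)
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


if __name__ == "__main__":
    # Illustrative values standing in for real observations.
    observed_prices = [101.2, 99.8, 100.5, 102.1, 98.9, 100.0]
    print([round(x, 2) for x in synthesize(observed_prices, 5, seed=7)])
```

A more realistic model would need richer structure (for example, joint distributions across columns), which is the trade-off between realism and reliance on real data described above.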

DataHub's core is predominantly based around pandas data frames and object generation. A common question is: now that I have a data frame of synthetic data, what do I do with it? The pandas library comes with an array of options here, so for the time being, sinking (writing) data to databases is out of scope for the core library. However, see the examples in the test folder for some common patterns.
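As a generic illustration of leaning on pandas for output (the test folder remains the reference for DataHub's own patterns), here is a minimal sketch that sinks a data frame into an in-memory SQLite database with pandas' `DataFrame.to_sql` and reads it back. The table name and columns are hypothetical.

```python
import sqlite3

import pandas as pd

# Illustrative synthetic frame standing in for generator output (hypothetical schema).
synthetic = pd.DataFrame({
    "trade_id": [1, 2, 3],
    "currency": ["USD", "EUR", "GBP"],
    "notional": [1000, 250000, 75000],
})

# pandas accepts a sqlite3 connection directly; other databases usually
# go through a SQLAlchemy engine passed as `con`.
conn = sqlite3.connect(":memory:")
synthetic.to_sql("trades", conn, index=False, if_exists="replace")

# Read the table back to confirm the round trip.
round_trip = pd.read_sql("SELECT * FROM trades ORDER BY trade_id", conn)
conn.close()
print(round_trip)
```

The same frame could equally be written out with other pandas writers such as `to_csv`, which is why the core library leaves this step to pandas.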

Note: As we build out a config-based synthetic spec generator, we will bring this back into scope. Please see our roadmap/issue list and get involved in the discussion.

Key documents

  1. For information on how to get started with DataHub see our Getting Started Guide
  2. For more technical information about DataHub and how to customize it, see the Developer Guide
  3. For a high-level road map see Road Map

Overview of Synthetic data

  • Synthetic data is information that is artificially manufactured rather than generated by real-world events.
  • Synthetic data is created algorithmically and can be used as a stand-in for production data in test datasets.
  • Real data does play a part in synthetic data generation, depending on how realistic you want the output to be.

License

Copyright 2020 Citigroup

Distributed under the Apache License, Version 2.0.

SPDX-License-Identifier: Apache-2.0

Download files


Source Distribution

datahub_core-grovesy-0.9.10.tar.gz (502.4 kB view details)

Built Distribution

datahub_core_grovesy-0.9.10-py3-none-any.whl (432.0 kB view details)

File details

Details for the file datahub_core-grovesy-0.9.10.tar.gz.

File metadata

  • Download URL: datahub_core-grovesy-0.9.10.tar.gz
  • Upload date:
  • Size: 502.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2

File hashes

Hashes for datahub_core-grovesy-0.9.10.tar.gz
Algorithm Hash digest
SHA256 d6a5a541f5a1aa4e51c8fd24f5f1c300161adde72b8553003285c6974b650bf5
MD5 54b8d197bc4802b73d0b247a62e871a1
BLAKE2b-256 af717b5840add5afbceb0192325f48f93a280b4bba8cc43f6beb37e6c161567c
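To check a downloaded archive against the SHA256 digest listed above, a minimal sketch using Python's standard `hashlib` is shown below. The helper names `sha256_of` and `verify` are hypothetical; the expected digest is copied from the table above for the source distribution.

```python
import hashlib

# SHA256 published above for datahub_core-grovesy-0.9.10.tar.gz
EXPECTED_SDIST_SHA256 = "d6a5a541f5a1aa4e51c8fd24f5f1c300161adde72b8553003285c6974b650bf5"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SDIST_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

After downloading, `verify("datahub_core-grovesy-0.9.10.tar.gz")` should return `True`; the wheel can be checked the same way by passing its SHA256 digest as `expected`.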

File details

Details for the file datahub_core_grovesy-0.9.10-py3-none-any.whl.

File metadata

  • Download URL: datahub_core_grovesy-0.9.10-py3-none-any.whl
  • Upload date:
  • Size: 432.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.1.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.2

File hashes

Hashes for datahub_core_grovesy-0.9.10-py3-none-any.whl
Algorithm Hash digest
SHA256 5e539e84052b72985411d39e2ba327babd52e3223b3c7cb998fb2c29c5b7bc27
MD5 aec781cfb2b3f0c5f0724129da0593d8
BLAKE2b-256 f4896cb252defb5029605bc391fb32be25a10180880fc92254fc7e8baa4e620c