Create synthetic sequence data.

sequgen

Purpose

Programmatically generate synthetic sequence data such as time series, strings, DNA, etc. Sequence data generation is fully controlled by the user. sequgen does not build models from real-world sequence data.

Install

pip3 install sequgen

Usage example

This example generates a time series with three channels: (1) a normal peak, that is, a bell-shaped curve, (2) Gaussian noise, and (3) the sum of the first two channels. The time axis is abstract: it runs from 0 to 20 in 100 equal intervals (101 points). The peak is centered at a location sampled between 8 and 12; its standard deviation is sampled between 1 and 2 and its height between 4 and 5. The Gaussian noise uses the default values (mean 0, standard deviation 1). After creating the three channels, the code plots each one in its own subplot:

from matplotlib import pyplot as plt
import numpy
from sequgen.deterministic.normal_peak import normal_peak
from sequgen.stochastic.gaussian import gaussian
from sequgen.parameter_space import ParameterSpace
from sequgen.dimension import Dimension

time_axis = numpy.linspace(start=0, stop=20, num=101)
parameter_space_0 = ParameterSpace([
    Dimension("location", 8, 12),
    Dimension("stddev", 1, 2),
    Dimension("height", 4, 5),
])

channel_1 = normal_peak(time_axis, **parameter_space_0.sample())
channel_2 = gaussian(time_axis)
channel_3 = channel_1 + channel_2
channels = { "channel 1: normal peak": channel_1,
             "channel 2: gaussian noise": channel_2,
             "channel 3: combined": channel_3 }

# one subplot per channel, stacked vertically
for i, (title, channel) in enumerate(channels.items(), start=1):
    plt.subplot(len(channels), 1, i)
    plt.plot(time_axis, channel)
    plt.title(title, y=0.75, x=0.01, loc="left")
plt.show()

And these are the results:

(Figure: usage example, plots of the three channels)

You can find more usage examples in the notebooks repository on GitHub: https://github.com/sequgen/notebooks.
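
To make the steps in the usage example more concrete, here is a minimal sketch of the same idea using only the Python standard library. It does not use sequgen and is not sequgen's implementation: the helper names sample_parameters, peak_curve and noise are hypothetical, and both the uniform sampling of each parameter range and the exact bell-curve formula are assumptions made for illustration.

```python
import math
import random


def sample_parameters(ranges, rng):
    """Draw one value per named (low, high) range.

    Assumption: values are drawn uniformly; sequgen may sample differently.
    """
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


def peak_curve(time_axis, location, stddev, height):
    """Bell-shaped curve whose maximum value is `height` at t == location.

    Assumption: a Gaussian shape scaled to the requested peak height.
    """
    return [height * math.exp(-0.5 * ((t - location) / stddev) ** 2)
            for t in time_axis]


def noise(time_axis, rng, mean=0.0, stddev=1.0):
    """Independent Gaussian noise, one value per time step."""
    return [rng.gauss(mean, stddev) for _ in time_axis]


rng = random.Random(42)  # fixed seed so repeated runs give the same data

# 101 points give 100 equal intervals between 0 and 20
time_axis = [20 * k / 100 for k in range(101)]

params = sample_parameters(
    {"location": (8, 12), "stddev": (1, 2), "height": (4, 5)}, rng
)
channel_1 = peak_curve(time_axis, **params)   # deterministic peak
channel_2 = noise(time_axis, rng)             # stochastic noise
channel_3 = [a + b for a, b in zip(channel_1, channel_2)]  # combined signal
```

The sketch keeps the same separation as the example above: the parameter ranges are declared once, a single sample fixes the shape of the deterministic channel, and the noise is generated independently before the two are summed.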

Contributing

For developer documentation, go to the developer's README.

If you want to contribute to the development of sequgen, have a look at the contribution guidelines.

Credits

This package was created with Cookiecutter and the NLeSC/python-template.
