Skip to main content

Nucleotide A.I.

Project description

Settings Variables

In 'settings.py' the following variables are defined:

REPEAT_CHAR

Because repeated masked regions are indicated by lower case nucleotides, to know what channel to encode repeats in the unique character REPEAT_CHAR is used.

REPEAT_CHAR defaults to '.'

FASTA_CHARS

The set of valid fasta characters which can be used for sequence validation. Only lowercase letters are used. So make sure to test via:

char.lower() in FASTA_CHARS

ENCODE_ORDER

The channel order used in the encoding. Defaults to ['a', 'c', 't', 'g', 'u']

INCLUDE_URACIL

Whether or not uracil should be included in the encoding. Defaults to False

INCLUDE_REPEAT

Whether or not repeated masked regions should be included in the encoding. If True it is expected that REPEAT_CHAR appears in the ENCODE_ORDER. Defaults to False

FASTA_ENCODEX

The dictionary mapping a fasta character to its corresponding nucleotide(s).

FASTA_ENCODEX takes a fasta character and returns all the nucleotides it indicates as the FASTA format is defined. Repeated masked regions (lower case) are treated the same as normal nucleotides, as repeats are handled as their own channel (see INCLUDE_REPEAT)

FASTA_DECODEX

The dictionary mapping (a) nucleotide(s) to its/their corresponding fasta character.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ntai-0.0.0.tar.gz (3.5 kB view hashes)

Uploaded Source

Built Distribution

ntai-0.0.0-py3-none-any.whl (4.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page