seqspec

seqspec is a machine-readable YAML file format for describing the sequence and structure of genomic libraries. It was inspired by, and builds on, the Teichmann Lab Single Cell Genomics Library Structure by Xi Chen.

A list of seqspec examples for multiple assays can be found in the assays/ folder. Each spec.yaml describes the 5'->3' "Final library structure" for the assay. Sequence specification files can be formatted with the seqspec command line tool.

pip install git+https://github.com/sbooeshaghi/seqspec.git
seqspec format --help

Specification

Each assay is described by two kinds of object: the Assay object and the Region object. A library is described by one Assay object and multiple (possibly nested) Region objects. Region objects are grouped with a join operation, which also specifies the order of the sub-regions. A simple (but not fully specified) example looks like the following:

modalities:
    - Modality1
    - Modality2
assay_spec:
    - region_id: Modality1
      join:
          how: Union
          order: [Region2, Region1]
          regions:
              - region_id: Region1
                  ...
              - region_id: Region2
                  ...
    - region_id: Modality2
        ...
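
To make the join semantics concrete, here is a minimal Python sketch that walks a spec and lists the innermost regions in the order given by each join. It assumes the YAML has already been loaded into plain dicts and lists (for example with a YAML parser), and the helper name `leaf_regions` is hypothetical; it is not part of the seqspec tool.

```python
from typing import Any


def leaf_regions(region: dict[str, Any]) -> list[str]:
    """Return the ids of the innermost regions, following each join's `order`."""
    join = region.get("join")
    if not join:
        # A region without a join has no sub-regions, so it is a leaf.
        return [region["region_id"]]
    by_id = {sub["region_id"]: sub for sub in join["regions"]}
    leaves: list[str] = []
    for region_id in join["order"]:
        leaves.extend(leaf_regions(by_id[region_id]))
    return leaves


# The first modality of the example above, written as the dict a YAML parser
# would produce. The elided fields ("...") are simply left out.
spec = {
    "modalities": ["Modality1", "Modality2"],
    "assay_spec": [
        {
            "region_id": "Modality1",
            "join": {
                "how": "Union",
                "order": ["Region2", "Region1"],
                "regions": [
                    {"region_id": "Region1"},
                    {"region_id": "Region2"},
                ],
            },
        },
    ],
}

for modality in spec["assay_spec"]:
    print(modality["region_id"], leaf_regions(modality))
# Modality1 ['Region2', 'Region1']
```

Note that the result follows `order`, not the order in which the sub-regions are listed under `regions`.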

In order to catalogue relevant information for each library structure, multiple properties are specified for each Assay and each Region.

Assay object

Assays have the following structure:

---
"$schema": https://json-schema.org/draft/2020-12/schema
"$id": Assay.schema.yaml
title: Assay
description: An Assay of DNA
type: object
properties:
  name:
    description: The name of the assay
    type: string
  doi:
    description: the doi of the paper that describes the assay
    type: string
  description:
    description: A short description of the assay
    type: string
  modalities:
    description: The modalities the assay targets
    type: array
    items:
      type: string
  lib_struct:
    description: The link to Teichmann's libstructs page derived for this sequence
    type: string
  assay_spec:
    description: The spec for the assay
    type: array
    items:
      "$ref": "Region.schema.yaml"
required:
- name
- doi
- description
- modalities
- lib_struct
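
As an illustration of what the `required` list and property types imply, the following sketch checks a loaded Assay dict by hand. It uses only the Python standard library and is not a full JSON Schema validator. The function name `assay_errors` and the example values are hypothetical.

```python
from typing import Any

# Required properties and their types, transcribed from the Assay schema above.
REQUIRED_TYPES = {
    "name": str,
    "doi": str,
    "description": str,
    "modalities": list,
    "lib_struct": str,
}


def assay_errors(assay: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in the top-level Assay properties."""
    errors = []
    for key, expected in REQUIRED_TYPES.items():
        if key not in assay:
            errors.append(f"missing required property: {key}")
        elif not isinstance(assay[key], expected):
            errors.append(f"{key} must be of type {expected.__name__}")
    modalities = assay.get("modalities")
    if isinstance(modalities, list) and not all(isinstance(m, str) for m in modalities):
        errors.append("modalities must contain only strings")
    return errors


# Placeholder values for illustration only; this is not a real assay.
draft = {
    "name": "example-assay",
    "description": "placeholder description",
    "modalities": ["Modality1", "Modality2"],
    "assay_spec": [],
}
print(assay_errors(draft))
# ['missing required property: doi', 'missing required property: lib_struct']
```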

Region object

Regions have the following structure:

---
"$schema": https://json-schema.org/draft/2020-12/schema
"$id": Region.schema.yaml
title: Region
description: A region of DNA
type: object
properties:
  region_id:
    description: identifier for the region
    type: string
  sequence_type:
    description: The type of the sequence
    type: string
  sequence:
    description: The sequence
    type: string
  min_len:
    description: The minimum length of the sequence
    type: integer
    minimum: 0
    maximum: 2048
  max_len:
    description: The maximum length of the sequence
    type: integer
    minimum: 0
    maximum: 2048
  onlist:
    description: The file containing the sequence if sequence_type = onlist
    type:
    - object
    - 'null'
    properties:
      filename:
        description: filename for the onlist
        type: string
      md5:
        description: md5sum for the file pointed to by filename
        type: string
  join:
    description: Join operator on regions
    type:
    - object
    - 'null'
    properties:
      how:
        description: How the regions will be joined
        type: string
      order:
        description: The order of the regions being joined
        type: array
        items:
          type: string
      regions:
        description: The regions being joined
        type: array
        items:
          "$ref": "#"
    required:
    - how
    - order
    - regions
required:
- region_id
- sequence_type
- sequence
- min_len
- max_len

Contributing

Thank you for wanting to improve seqspec. If you find a bug in seqspec, please create an issue. The issue should contain

  • the seqspec command that was run,
  • the error message, and
  • the seqspec and Python versions.
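
One way to gather the version information for an issue is the short sketch below, which uses only the standard library. It assumes the package is installed under the distribution name `seqspec`; the helper name `version_report` is hypothetical.

```python
import platform
from importlib import metadata


def version_report(package: str = "seqspec") -> str:
    """Return a one-line summary of the package and Python versions."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in the current environment.
        package_version = "not installed"
    return f"{package} {package_version}, Python {platform.python_version()}"


print(version_report())
```

The printed line can be pasted into the issue alongside the command and the error message.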

If you'd like to add assay sequence specifications or modify the seqspec tool, please do the following:

  1. Fork the project.
# Press "Fork" at the top right of the GitHub page
  2. Clone the fork and create a branch for your feature
git clone https://github.com/<USERNAME>/seqspec.git
cd seqspec
git checkout -b cool-new-feature
  3. Make changes, add files, and commit
# make changes, add files, and commit them
git add path/to/file1.yaml path/to/file2.yaml
git commit -m "I made these changes"
  4. Push changes to GitHub
git push origin cool-new-feature
  5. Submit a pull request

If you are unfamiliar with pull requests, you can find more information on the GitHub help page.
