sample-sheet

An Illumina Sample Sheet parsing utility.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
- Science/Research
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.6
- Python :: 3.7
Topic
- Scientific/Engineering :: Bio-Informatics

Project description

<h1 align="center">sample-sheet</h1>

<p align="center">A Python 3.6 library for handling Illumina sample sheets</p>

<p align="center">
<a href="#installation"><strong>Installation</strong></a>
·
<a href="#tutorial"><strong>Tutorial</strong></a>
·
<a href="#command-line-utility"><strong>Command Line Utility</strong></a>
·
<a href="#contributing"><strong>Contributing</strong></a>
</p>

<p align="center">
<a href="https://travis-ci.org/clintval/sample-sheet"><img src="https://travis-ci.org/clintval/sample-sheet.svg?branch=master"></img></a>
<a href="https://codecov.io/gh/clintval/sample-sheet"><img src="https://codecov.io/gh/clintval/sample-sheet/branch/master/graph/badge.svg"></img></a>
<a href="https://badge.fury.io/py/sample_sheet"><img src="https://badge.fury.io/py/sample_sheet.svg" alt="PyPI version"></img></a>
<a href="https://codeclimate.com/github/clintval/sample-sheet/maintainability"><img src="https://api.codeclimate.com/v1/badges/80b4ce92cc622e857c79/maintainability"></img></a>
<a href="https://github.com/clintval/sample-sheet/blob/master/LICENSE"><img src="https://img.shields.io/pypi/l/sample-sheet.svg"></img></a>
</p>

<br>

The intent of this library is to obviate the need to use Illumina's proprietary [Experiment Manager](https://support.illumina.com/sequencing/sequencing_software/experiment_manager.html) and to enable interactive reading, _de novo_ creation, and writing of Sample Sheets.
As of `v0.5.0` this library supports the entire Illumina specification for a sample sheet as defined in [this manual](https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf).

<h3 align="center">Installation</h3>

```
❯ pip install sample_sheet
```

<br>

<h3 align="center">Tutorial</h3>

To demonstrate the features of this library we will use a test file available at this remote location:

- [`sample-sheet/tests/resources/paired-end-single-index.csv`](tests/resources/paired-end-single-index.csv)

```python
from sample_sheet import SampleSheet

url = 'https://raw.githubusercontent.com/clintval/sample-sheet/master/tests/resources/paired-end-single-index.csv'

sample_sheet = SampleSheet(url)
```

The metadata of the sample sheet can be accessed with the `Header`, `Reads`, and `Settings` attributes:

```python
>>> sample_sheet.Header.Assay
'SureSelectXT'

>>> sample_sheet.Reads
[151, 151]

>>> sample_sheet.is_paired_end
True

>>> sample_sheet.Settings.BarcodeMismatches
'2'
```

The samples can be accessed directly or _via_ iteration:

```python
>>> sample_sheet.samples
[Sample({"Sample_ID": "1823A", "Sample_Name": "1823A-tissue", "index": "GAATCTGA"}),
Sample({"Sample_ID": "1823B", "Sample_Name": "1823B-tissue", "index": "AGCAGGAA"}),
Sample({"Sample_ID": "1824A", "Sample_Name": "1824A-tissue", "index": "GAGCTGAA"}),
Sample({"Sample_ID": "1825A", "Sample_Name": "1825A-tissue", "index": "AAACATCG"}),
Sample({"Sample_ID": "1826A", "Sample_Name": "1826A-tissue", "index": "GAGTTAGC"}),
Sample({"Sample_ID": "1826B", "Sample_Name": "1823A-tissue", "index": "CGAACTTA"}),
Sample({"Sample_ID": "1829A", "Sample_Name": "1823B-tissue", "index": "GATAGACA"})]

>>> for sample in sample_sheet:
...     print(sample)
...     break
"1823A"
```

If a column labeled `Read_Structure` is provided _per_ sample, then additional functionality is enabled.

```python
>>> first_sample, *_ = sample_sheet.samples
>>> first_sample.Read_Structure
ReadStructure(structure="151T8B151T")

>>> first_sample.Read_Structure.total_cycles
310

>>> first_sample.Read_Structure.tokens
['151T', '8B', '151T']
```
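To make the `tokens` and `total_cycles` values above concrete, here is a minimal standard-library sketch of how a read structure string such as `151T8B151T` can be decomposed. The `tokenize` and `total_cycles` helpers are hypothetical names for illustration; this is not the library's `ReadStructure` implementation, and it assumes each token is a positive integer followed by a single uppercase operator letter.

```python
import re
from typing import List

# Assumption: a token is "<count><operator>", e.g. "151T" or "8B".
TOKEN_PATTERN = re.compile(r"(\d+)([A-Z])")


def tokenize(structure: str) -> List[str]:
    """Split a read structure into '<count><operator>' tokens."""
    tokens = [f"{count}{op}" for count, op in TOKEN_PATTERN.findall(structure)]
    # If the tokens do not reassemble into the input, something was skipped.
    if "".join(tokens) != structure:
        raise ValueError(f"Malformed read structure: {structure!r}")
    return tokens


def total_cycles(structure: str) -> int:
    """Sum the cycle counts across all tokens."""
    return sum(int(token[:-1]) for token in tokenize(structure))


print(tokenize("151T8B151T"))      # ['151T', '8B', '151T']
print(total_cycles("151T8B151T"))  # 310
```

The round-trip check in `tokenize` is a cheap way to reject strings that contain characters the pattern did not consume.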

#### Sample Sheet Creation

Sample sheets can be created _de novo_ and written to a file-like object. The following snippet shows how to add attributes to mandatory sections, add optional user-defined sections, and add samples before writing to the standard output.

```python
import sys

from sample_sheet import Sample, SampleSheet

sample_sheet = SampleSheet()

# [Header] section
# Adding an attribute with spaces must be done with the add_attr() method
sample_sheet.Header.IEM4FileVersion = 4
sample_sheet.Header.add_attr(attr='Investigator_Name', value='jdoe', name='Investigator Name')

# [Settings] section
sample_sheet.Settings.CreateFastqForIndexReads = 1
sample_sheet.Settings.BarcodeMismatches = 2

# Optional sample sheet sections can be added and then accessed
sample_sheet.add_section('Manifests')
sample_sheet.Manifests.PoolDNA = "DNAMatrix.txt"

# Specify a paired-end kit with 151 template bases per read
sample_sheet.Reads = [151, 151]

# Add a single-indexed sample with a name, ID, and index
sample = Sample(dict(Sample_ID='1823A', Sample_Name='1823A-tissue', index='ACGT'))
sample_sheet.add_sample(sample)

# Write to standard output!
sample_sheet.write(sys.stdout)
```

```python
"""
[Header],,
IEM4FileVersion,4,
Investigator Name,jdoe,
,,
[Reads],,
151,,
151,,
,,
[Manifests],,
PoolDNA,DNAMatrix.txt,
,,
[Settings],,
CreateFastqForIndexReads,1,
BarcodeMismatches,2,
,,
[Data],,
Sample_ID,Sample_Name,index
1823A,1823A-tissue,ACGT
"""
```
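The written output is ordinary CSV in which a row starting with a bracketed name opens a section and an all-empty row separates sections. As a rough illustration of that layout only, the following sketch splits such text into sections with the standard library. `split_sections` is a hypothetical helper, not part of `sample_sheet`, and the embedded text is a shortened copy of the output above.

```python
import csv
import io
from typing import Dict, List, Optional


def split_sections(text: str) -> Dict[str, List[List[str]]]:
    """Group CSV rows under the most recent '[Section]' marker row."""
    sections: Dict[str, List[List[str]]] = {}
    current: Optional[str] = None
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # all-empty rows only separate sections
        first = cells[0]
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current] = []
        elif current is not None:
            # Drop trailing padding cells. Assumption: a trailing empty
            # field carries no meaning in this simplified sketch.
            while cells and not cells[-1]:
                cells.pop()
            sections[current].append(cells)
    return sections


SHEET = """\
[Header],,
IEM4FileVersion,4,
Investigator Name,jdoe,
,,
[Reads],,
151,,
151,,
,,
[Data],,
Sample_ID,Sample_Name,index
1823A,1823A-tissue,ACGT
"""

sections = split_sections(SHEET)
print(sections["Header"])  # [['IEM4FileVersion', '4'], ['Investigator Name', 'jdoe']]
print(sections["Reads"])   # [['151'], ['151']]
print(sections["Data"])    # header row followed by one sample row
```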

#### IPython Integration

A quick summary of the samples can be displayed as a plain-text Markdown table, or as HTML-rendered Markdown when run in an IPython environment:

```python
>>> sample_sheet.experimental_design
"""
| Sample_ID | Sample_Name | Library_ID | Description |
|:------------|:--------------|:-------------|:-----------------|
| 1823A | 1823A-tissue | 2017-01-20 | 0.5x treatment |
| 1823B | 1823B-tissue | 2017-01-20 | 0.5x treatment |
| 1824A | 1824A-tissue | 2017-01-20 | 1.0x treatment |
| 1825A | 1825A-tissue | 2017-01-20 | 10.0x treatment |
| 1826A | 1826A-tissue | 2017-01-20 | 100.0x treatment |
| 1826B | 1823A-tissue | 2017-01-17 | 0.5x treatment |
| 1829A | 1823B-tissue | 2017-01-17 | 0.5x treatment |
"""
```
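For readers curious how a left-aligned Markdown table like the one above can be assembled from per-sample records, here is a small standard-library sketch. The `to_markdown` function is a hypothetical illustration and makes no claim about how `experimental_design` is implemented; it assumes every value is already a string.

```python
from typing import Dict, List


def to_markdown(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render rows as a left-aligned Markdown table with padded columns."""
    # Each column is as wide as its longest cell, header included.
    widths = [
        max([len(col)] + [len(row.get(col, "")) for row in rows])
        for col in columns
    ]

    def line(cells: List[str]) -> str:
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    # ':' followed by dashes marks a left-aligned column in Markdown.
    rule = "|" + "|".join(":" + "-" * (width + 1) for width in widths) + "|"
    body = [line([row.get(col, "") for col in columns]) for row in rows]
    return "\n".join([line(columns), rule, *body])


samples = [
    {"Sample_ID": "1823A", "Description": "0.5x treatment"},
    {"Sample_ID": "1825A", "Description": "10.0x treatment"},
]
print(to_markdown(samples, ["Sample_ID", "Description"]))
```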

<br>

<h3 align="center">Command Line Utility</h3>

Along with an option for pretty-printing the sample sheet to the terminal (the `summary` tool), one can convert the sample sheet to JSON with the `to_json` tool:

```bash
❯ sample-sheet to_json paired-end-single-index.csv | jq
{
  "Header": {
    "IEM1FileVersion": "4",
    "Investigator Name": "jdoe",
    "Experiment Name": "exp001",
    "Date": "11/16/2017",
    "Workflow": "SureSelectXT",
    "Application": "NextSeq FASTQ Only",
    "Assay": "SureSelectXT",
    "Description": "A description of this flow cell",
    "Chemistry": "Default"
  },
  "Reads": [
    151,
    151
  ],
  "Settings": {
    "CreateFastqForIndexReads": "1",
    "BarcodeMismatches": "2"
  },
  "Data": [
    {
      "Sample_Project": "exp001",
      "Description": "0.5x treatment",
      "Reference_Name": "mm10",
      "Sample_Name": "1823A-tissue",
      "index": "GAATCTGA",
      "Library_ID": "2017-01-20",
      "Read_Structure": "151T8B151T",
      "Sample_ID": "1823A",
      "Target_Set": "Intervals-001"
    },
    ...
  ]
}
```
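Because the JSON form uses plain objects and arrays, downstream scripts can consume it without this library. The sketch below uses only the standard library; `summarize` is a hypothetical helper, and `PAYLOAD` is a trimmed copy of the output above containing only the first sample. In a real pipeline one would more likely read the document with `json.load(sys.stdin)`.

```python
import json
from collections import Counter

# Trimmed copy of the JSON shown above (first sample only).
PAYLOAD = """
{
  "Header": {"Assay": "SureSelectXT"},
  "Reads": [151, 151],
  "Settings": {"BarcodeMismatches": "2"},
  "Data": [
    {
      "Sample_Project": "exp001",
      "Sample_Name": "1823A-tissue",
      "index": "GAATCTGA",
      "Sample_ID": "1823A"
    }
  ]
}
"""


def summarize(sheet: dict) -> dict:
    """Collect a few facts from a sample sheet in its JSON form."""
    samples = sheet.get("Data", [])
    return {
        # Assumption: two entries under "Reads" means a paired-end run.
        "paired_end": len(sheet.get("Reads", [])) == 2,
        "sample_ids": [sample["Sample_ID"] for sample in samples],
        "samples_per_project": dict(
            Counter(sample.get("Sample_Project", "") for sample in samples)
        ),
    }


print(summarize(json.loads(PAYLOAD)))
```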

<br>

<h3 align="center">Contributing</h3>

Pull requests, feature requests, and issues welcome!

To make a development install:

```bash
❯ git clone git@github.com:clintval/sample-sheet.git
❯ pip install -e 'sample-sheet[ci]'
```

To run the tests:

```
Name                            Stmts   Miss  Cover
---------------------------------------------------
sample_sheet/__init__.py            1      0   100%
sample_sheet/_sample_sheet.py     334      0   100%
---------------------------------------------------
TOTAL                             335      0   100%

OK! 65 tests, 0 failures, 0 errors in 0.1s
```

Release history

- 0.13.0: Aug 12, 2022
- 0.12.0: Apr 21, 2020
- 0.11.0: Dec 10, 2019
- 0.10.0: Dec 9, 2019
- 0.9.4: Aug 20, 2019
- 0.9.3: Aug 19, 2019
- 0.9.2: Jul 31, 2019
- 0.9.1: Jul 4, 2019
- 0.9.0: May 20, 2019
- 0.8.0: Aug 18, 2018
- 0.7.0: May 23, 2018
- 0.6.0: May 15, 2018 (this version)
- 0.5.0: May 12, 2018
- 0.4.0: Feb 19, 2018
- 0.3.0: Feb 4, 2018
- 0.2.0: Jan 19, 2018
- 0.1.1: Dec 31, 2017
- 0.1.0: Dec 31, 2017
- 0.0.2: Dec 27, 2017

Download files

Source Distribution

sample_sheet-0.6.0.tar.gz (16.7 kB)

Uploaded May 15, 2018 Source

Hashes for sample_sheet-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`1a715b57d3198432bd21a6280432607e3ccd4eae51116ab497c6b03e754419a6`
MD5	`807fbb75b80ec28a982fd75046838040`
BLAKE2b-256	`fc7569cab3b91ea745a909bedc53f30789414eedc11ce2ebe6189733560a9583`