An Illumina Sample Sheet parsing utility.
.. raw:: html
<h1 align="center">
sample-sheet
.. raw:: html
</h1>
.. raw:: html
<p align="center">
A Python 3.6 library for handling Illumina sample sheets
.. raw:: html
</p>
.. raw:: html
<p align="center">
Installation · Tutorial · Command Line Utility · Contributing
.. raw:: html
</p>
The intent of this library is to obviate the need to use Illumina’s
proprietary `Experiment
Manager <https://support.illumina.com/sequencing/sequencing_software/experiment_manager.html>`__
and to enable interactive reading, *de novo* creation, and writing of
Sample Sheets for all Illumina platforms. As of ``v0.5.0`` this library
supports the entire Illumina specification for a sample sheet as defined
in `this
manual <https://www.illumina.com/content/dam/illumina-marketing/documents/products/technotes/sequencing-sheet-format-specifications-technical-note-970-2017-004.pdf>`__.
.. raw:: html
<h3 align="center">
Installation
.. raw:: html
</h3>
::
❯ pip install sample_sheet
.. raw:: html
<h3 align="center">
Tutorial
.. raw:: html
</h3>
To demonstrate the features of this library we use a test file available in
this repository at the relative location:
```sample-sheet/tests/resources/paired-end-single-index.csv`` <tests/resources/paired-end-single-index.csv>`__.
.. code:: python
from sample_sheet import SampleSheet
host = 'https://raw.githubusercontent.com/'
url = host + 'clintval/sample-sheet/master/tests/resources/paired-end-single-index.csv'
sample_sheet = SampleSheet(url)
The metadata of the sample sheet can be accessed with the ``Header``,
``Reads``, and ``Settings`` attributes:
.. code:: python
>>> sample_sheet.Header.Assay
'SureSelectXT'
>>> sample_sheet.Reads
[151, 151]
>>> sample_sheet.is_paired_end
True
>>> sample_sheet.Settings.BarcodeMismatches
'2'
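As a rough illustration of how the sections of a sample sheet map onto these attributes, the following standard-library sketch splits sheet text on its ``[Section]`` markers. It is not the library's implementation: ``SHEET_TEXT`` and ``split_sections`` are hypothetical names, and the sketch assumes comma-separated rows grouped under bracketed section names.

```python
import csv
import io

# Hypothetical miniature sheet; the values mirror the example file above.
SHEET_TEXT = """\
[Header],,
Assay,SureSelectXT,
,,
[Reads],,
151,,
151,,
,,
[Settings],,
BarcodeMismatches,2,
"""


def split_sections(text):
    """Group CSV rows under the most recent [Section] marker."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        first = row[0].strip() if row else ""
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current] = []
        elif current is not None and any(cell.strip() for cell in row):
            # Blank separator rows such as ",," are skipped.
            sections[current].append(row)
    return sections


sections = split_sections(SHEET_TEXT)

# Key/value sections use the first two cells of each row.
header = {row[0]: row[1] for row in sections["Header"]}
settings = {row[0]: row[1] for row in sections["Settings"]}
# The [Reads] section holds one cycle count per row.
reads = [int(row[0]) for row in sections["Reads"]]

print(header["Assay"])
print(reads)
print(settings["BarcodeMismatches"])
```

In this sketch the values of key/value sections remain strings, while the read lengths are converted to integers.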
The samples can be accessed directly or *via* iteration:
.. code:: python
>>> sample_sheet.samples
[Sample({"Sample_ID": "1823A", "Sample_Name": "1823A-tissue", "index": "GAATCTGA"}),
Sample({"Sample_ID": "1823B", "Sample_Name": "1823B-tissue", "index": "AGCAGGAA"}),
Sample({"Sample_ID": "1824A", "Sample_Name": "1824A-tissue", "index": "GAGCTGAA"}),
Sample({"Sample_ID": "1825A", "Sample_Name": "1825A-tissue", "index": "AAACATCG"}),
Sample({"Sample_ID": "1826A", "Sample_Name": "1826A-tissue", "index": "GAGTTAGC"}),
Sample({"Sample_ID": "1826B", "Sample_Name": "1823A-tissue", "index": "CGAACTTA"}),
Sample({"Sample_ID": "1829A", "Sample_Name": "1823B-tissue", "index": "GATAGACA"})]
>>> for sample in sample_sheet:
...     print(sample)
...     break
"1823A"
If a column labeled ``Read_Structure`` is provided *per* sample, then
each sample exposes a ``ReadStructure`` object that reports the total
number of cycles and the individual tokens of the structure:
.. code:: python
>>> first_sample, *_ = sample_sheet.samples
>>> first_sample.Read_Structure
ReadStructure(structure="151T8B151T")
>>> first_sample.Read_Structure.total_cycles
310
>>> first_sample.Read_Structure.tokens
['151T', '8B', '151T']
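To make the token and cycle arithmetic concrete, here is a minimal sketch that splits a read structure string with a regular expression. It is not the library's implementation: ``parse_read_structure`` and ``TOKEN_PATTERN`` are hypothetical names, and the sketch assumes every token is a positive integer followed by one uppercase operator letter (in the example, ``T`` presumably denotes template bases and ``B`` sample barcode bases).

```python
import re

# Hypothetical helper, not the library's code: one token is a run of
# digits followed by a single uppercase operator letter, e.g. "151T".
TOKEN_PATTERN = re.compile(r"(\d+)([A-Z])")


def parse_read_structure(structure):
    """Return (tokens, total_cycles) for a read structure string."""
    matches = TOKEN_PATTERN.findall(structure)
    tokens = [count + operator for count, operator in matches]
    # Reject input containing characters the pattern did not consume.
    if "".join(tokens) != structure:
        raise ValueError(f"Invalid read structure: {structure!r}")
    total_cycles = sum(int(count) for count, _ in matches)
    return tokens, total_cycles


tokens, total_cycles = parse_read_structure("151T8B151T")
print(tokens)        # ['151T', '8B', '151T']
print(total_cycles)  # 310
```

The total of 310 cycles is simply 151 + 8 + 151, which agrees with the ``total_cycles`` value shown above.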
Sample Sheet Creation
^^^^^^^^^^^^^^^^^^^^^
Sample sheets can be created *de novo* and written to a file-like
object. The following snippet shows how to add attributes to mandatory
sections, add optional user-defined sections, and add samples before
writing the sheet to ``sys.stdout``.
.. code:: python
import sys
from sample_sheet import Sample, SampleSheet
sample_sheet = SampleSheet()
# Fill out the [Header] section of the sample sheet.
sample_sheet.Header.IEM4FileVersion = 4
# If you want to use a key with whitespace in it, you must use the `add_attr`
# method and specify an alternate name.
sample_sheet.Header.add_attr(attr='Investigator_Name', value='jdoe', name='Investigator Name')
# An optional [Manifests] section can be added.
sample_sheet.add_section('Manifests')
# Fill out the [Settings] section of the sample sheet.
sample_sheet.Settings.CreateFastqForIndexReads = 1
sample_sheet.Settings.BarcodeMismatches = 2
# Create a paired-end flowcell with 151 template bases.
sample_sheet.Reads = [151, 151]
# Create your first single-indexed sample with both a name and ID.
sample = Sample(dict(Sample_ID='1823A', Sample_Name='1823A-tissue', index='ACGT'))
sample_sheet.add_sample(sample)
sample_sheet.write(sys.stdout)
.. code:: python
"""
[Header],,
IEM4FileVersion,4,
Investigator Name,jdoe,
,,
[Reads],,
151,,
151,,
,,
[Manifests],,
,,
[Settings],,
CreateFastqForIndexReads,1,
BarcodeMismatches,2,
,,
[Data],,
Sample_ID,Sample_Name,index
1823A,1823A-tissue,ACGT
"""
IPython Integration
^^^^^^^^^^^^^^^^^^^
A quick summary of the samples can be displayed as an ASCII Markdown
table, or as HTML-rendered Markdown when run in an IPython environment:
.. code:: python
>>> sample_sheet.experimental_design
"""
| Sample_ID   | Sample_Name   | Library_ID   | Description      |
|:------------|:--------------|:-------------|:-----------------|
| 1823A       | 1823A-tissue  | 2017-01-20   | 0.5x treatment   |
| 1823B       | 1823B-tissue  | 2017-01-20   | 0.5x treatment   |
| 1824A       | 1824A-tissue  | 2017-01-20   | 1.0x treatment   |
| 1825A       | 1825A-tissue  | 2017-01-20   | 10.0x treatment  |
| 1826A       | 1826A-tissue  | 2017-01-20   | 100.0x treatment |
| 1826B       | 1823A-tissue  | 2017-01-17   | 0.5x treatment   |
| 1829A       | 1823B-tissue  | 2017-01-17   | 0.5x treatment   |
"""
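For readers curious how such a pipe table can be assembled, the following is a minimal standard-library sketch. It is an assumption-laden illustration rather than the library's own rendering code: ``to_markdown_table`` is a hypothetical helper, and its column padding is tighter than in the output shown above.

```python
def to_markdown_table(rows, columns):
    """Render a list of dicts as a left-aligned Markdown pipe table."""
    cells = [[str(row.get(column, "")) for column in columns] for row in rows]
    # Each column is as wide as its longest header or cell value.
    widths = [
        max([len(column)] + [len(line[i]) for line in cells])
        for i, column in enumerate(columns)
    ]

    def render(values):
        padded = (value.ljust(width) for value, width in zip(values, widths))
        return "| " + " | ".join(padded) + " |"

    # A leading colon marks the column as left-aligned.
    separator = "|" + "|".join(":" + "-" * (width + 1) for width in widths) + "|"
    return "\n".join([render(columns), separator] + [render(line) for line in cells])


# Two rows taken from the summary above.
samples = [
    {"Sample_ID": "1823A", "Description": "0.5x treatment"},
    {"Sample_ID": "1825A", "Description": "10.0x treatment"},
]
table = to_markdown_table(samples, ["Sample_ID", "Description"])
print(table)
```

Every rendered line has the same length, so the table stays readable as plain ASCII and can also be passed to any Markdown renderer.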
.. raw:: html
<h3 align="center">
Command Line Utility
.. raw:: html
</h3>
The ``sample-sheet summary`` command prints a tabular summary of a sample sheet:
.. code:: bash
❯ sample-sheet summary paired-end-single-index.csv
┌Header─────────────┬─────────────────────────────────┐
│ IEM1FileVersion   │ 4                               │
│ Investigator_Name │ jdoe                            │
│ Experiment_Name   │ exp001                          │
│ Date              │ 11/16/2017                      │
│ Workflow          │ SureSelectXT                    │
│ Application       │ NextSeq FASTQ Only              │
│ Assay             │ SureSelectXT                    │
│ Description       │ A description of this flow cell │
│ Chemistry         │ Default                         │
└───────────────────┴─────────────────────────────────┘
┌Settings──────────────────┬──────────┐
│ CreateFastqForIndexReads │ 1        │
│ BarcodeMismatches        │ 2        │
│ Reads                    │ 151, 151 │
└──────────────────────────┴──────────┘
┌Identifiers┬──────────────┬────────────┬──────────┬────────┐
│ Sample_ID │ Sample_Name  │ Library_ID │ index    │ index2 │
├───────────┼──────────────┼────────────┼──────────┼────────┤
│ 1823A     │ 1823A-tissue │ 2017-01-20 │ GAATCTGA │        │
│ 1823B     │ 1823B-tissue │ 2017-01-20 │ AGCAGGAA │        │
│ 1824A     │ 1824A-tissue │ 2017-01-20 │ GAGCTGAA │        │
│ 1825A     │ 1825A-tissue │ 2017-01-20 │ AAACATCG │        │
│ 1826A     │ 1826A-tissue │ 2017-01-20 │ GAGTTAGC │        │
│ 1826B     │ 1823A-tissue │ 2017-01-17 │ CGAACTTA │        │
│ 1829A     │ 1823B-tissue │ 2017-01-17 │ GATAGACA │        │
└───────────┴──────────────┴────────────┴──────────┴────────┘
┌Descriptions──────────────────┐
│ Sample_ID │ Description      │
├───────────┼──────────────────┤
│ 1823A     │ 0.5x treatment   │
│ 1823B     │ 0.5x treatment   │
│ 1824A     │ 1.0x treatment   │
│ 1825A     │ 10.0x treatment  │
│ 1826A     │ 100.0x treatment │
│ 1826B     │ 0.5x treatment   │
│ 1829A     │ 0.5x treatment   │
└───────────┴──────────────────┘
.. raw:: html
<h3 align="center">
Contributing
.. raw:: html
</h3>
Pull requests, feature requests, and issues welcome!
To make a development install:
.. code:: bash
❯ git clone git@github.com:clintval/sample-sheet.git
❯ pip install -e 'sample-sheet[fancytest]'
To run the tests:
::
Name                            Stmts   Miss  Cover
---------------------------------------------------
sample_sheet/__init__.py            1      0   100%
sample_sheet/_sample_sheet.py     334      0   100%
---------------------------------------------------
TOTAL                             335      0   100%
OK! 65 tests, 0 failures, 0 errors in 0.1s