An Illumina Sample Sheet parsing utility.

.. raw:: html

<h1 align="center">

sample-sheet

.. raw:: html

</h1>

.. raw:: html

<p align="center">

A Python 3.6 library for handling Illumina sample sheets

.. raw:: html

</p>

.. raw:: html

<p align="center">

Installation · Tutorial · Command Line Utility · Contributing

.. raw:: html

</p>

.. raw:: html

<h3 align="center">

Installation

.. raw:: html

</h3>

::

❯ pip install sample_sheet

.. raw:: html

<h3 align="center">

Tutorial

.. raw:: html

</h3>

A sample sheet can be read from S3, HDFS, WebHDFS, or HTTP, as well as
from a local file (compressed or not).

.. code:: python

>>> from sample_sheet import SampleSheet
>>> SampleSheet('s3://bucket/prefix/SampleSheet.csv')
SampleSheet("s3://bucket/prefix/SampleSheet.csv")

An example sample sheet can be found at
`tests/resources/paired-end-single-index.csv <tests/resources/paired-end-single-index.csv>`__.

.. code:: python

>>> from sample_sheet import SampleSheet
>>>
>>> url = 'https://raw.githubusercontent.com/clintval/sample-sheet/master/tests/resources/paired-end-single-index.csv'
>>> sample_sheet = SampleSheet(url)

The metadata of the sample sheet can be accessed with the ``Header``,
``Reads``, and ``Settings`` attributes:

.. code:: python

>>> sample_sheet.Header.Assay
'SureSelectXT'

>>> sample_sheet.Reads
[151, 151]

>>> sample_sheet.is_paired_end
True

>>> sample_sheet.Settings.BarcodeMismatches
'2'

The samples can be accessed directly or *via* iteration:

.. code:: python

>>> sample_sheet.samples
[Sample({"Sample_ID": "1823A", "Sample_Name": "1823A-tissue", "index": "GAATCTGA"}),
Sample({"Sample_ID": "1823B", "Sample_Name": "1823B-tissue", "index": "AGCAGGAA"}),
Sample({"Sample_ID": "1824A", "Sample_Name": "1824A-tissue", "index": "GAGCTGAA"}),
Sample({"Sample_ID": "1825A", "Sample_Name": "1825A-tissue", "index": "AAACATCG"}),
Sample({"Sample_ID": "1826A", "Sample_Name": "1826A-tissue", "index": "GAGTTAGC"}),
Sample({"Sample_ID": "1826B", "Sample_Name": "1823A-tissue", "index": "CGAACTTA"}),
Sample({"Sample_ID": "1829A", "Sample_Name": "1823B-tissue", "index": "GATAGACA"})]

>>> for sample in sample_sheet:
...     print(sample)
...     break
"1823A"

If a column labeled ``Read_Structure`` is provided *per* sample, then
additional functionality is enabled.

.. code:: python

>>> first_sample, *_ = sample_sheet.samples
>>> first_sample.Read_Structure
ReadStructure(structure="151T8B151T")

>>> first_sample.Read_Structure.total_cycles
310

>>> first_sample.Read_Structure.tokens
['151T', '8B', '151T']
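
The ``tokens`` and ``total_cycles`` values above can be reproduced with a few lines of standard-library Python. The sketch below is not the library's implementation: the helper names ``tokenize`` and ``total_cycles`` are hypothetical, and it assumes a read structure is a run of ``<count><letter>`` pairs.

```python
import re

# Assumption: a read structure is a run of "<count><letter>" pairs, e.g. "151T8B151T".
TOKEN = re.compile(r"(\d+)([A-Z])")


def tokenize(structure):
    """Split a read structure string into its tokens (hypothetical helper)."""
    tokens = [count + letter for count, letter in TOKEN.findall(structure)]
    # Reject input that contains anything other than well-formed tokens.
    if "".join(tokens) != structure:
        raise ValueError(f"Malformed read structure: {structure!r}")
    return tokens


def total_cycles(structure):
    """Sum the cycle counts over all tokens (hypothetical helper)."""
    return sum(int(token[:-1]) for token in tokenize(structure))


print(tokenize("151T8B151T"))      # ['151T', '8B', '151T']
print(total_cycles("151T8B151T"))  # 310
```

The total is simply 151 + 8 + 151 = 310 sequencing cycles.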

Sample Sheet Creation
^^^^^^^^^^^^^^^^^^^^^

Sample sheets can be created *de novo* and written to a file-like
object:

.. code:: python

>>> from sample_sheet import Sample, SampleSheet
>>>
>>> sample_sheet = SampleSheet()
>>>
>>> sample_sheet.Header.IEM4FileVersion = 4
>>> sample_sheet.Header.add_attr(
...     attr='Investigator_Name',
...     value='jdoe',
...     name='Investigator Name')
>>>
>>> sample_sheet.Settings.CreateFastqForIndexReads = 1
>>> sample_sheet.Settings.BarcodeMismatches = 2
>>>
>>> sample_sheet.Reads = [151, 151]
>>>
>>> sample = Sample(dict(
...     Sample_ID='1823A',
...     Sample_Name='1823A-tissue',
...     index='ACGT'))
>>>
>>> sample_sheet.add_sample(sample)
>>>
>>> import sys
>>> sample_sheet.write(sys.stdout)
"""
[Header],,
IEM4FileVersion,4,
Investigator Name,jdoe,
,,
[Reads],,
151,,
151,,
,,
[Settings],,
CreateFastqForIndexReads,1,
BarcodeMismatches,2,
,,
[Data],,
Sample_ID,Sample_Name,index
1823A,1823A-tissue,ACGT
"""

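The written output is a CSV file divided into bracketed sections. As a rough illustration of that layout, the sketch below splits such text into sections using only the standard ``csv`` module. ``split_sections`` is a hypothetical helper, not part of ``sample_sheet``, and it assumes every section starts with a ``[Name]`` row.

```python
import csv
import io

# The text written by the example above.
TEXT = """\
[Header],,
IEM4FileVersion,4,
Investigator Name,jdoe,
,,
[Reads],,
151,,
151,,
,,
[Settings],,
CreateFastqForIndexReads,1,
BarcodeMismatches,2,
,,
[Data],,
Sample_ID,Sample_Name,index
1823A,1823A-tissue,ACGT
"""


def split_sections(text):
    """Group CSV rows under the most recent "[Section]" row (hypothetical helper)."""
    sections = {}
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # Skip padding rows such as ",,".
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1]
            sections[current] = []
        elif current is not None:
            # Drop the trailing empty cells that pad rows to equal width.
            while row and not row[-1].strip():
                row.pop()
            sections[current].append(row)
    return sections


sections = split_sections(TEXT)
print(sections["Reads"])     # [['151'], ['151']]
print(sections["Settings"])  # [['CreateFastqForIndexReads', '1'], ['BarcodeMismatches', '2']]
```
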
IPython Integration
^^^^^^^^^^^^^^^^^^^

A quick summary of the samples can be displayed as a plain-text Markdown
table or, when run in an IPython environment, as HTML-rendered Markdown:

.. code:: python

>>> sample_sheet.experimental_design
"""
| Sample_ID   | Sample_Name   | Library_ID   | Description      |
|:------------|:--------------|:-------------|:-----------------|
| 1823A       | 1823A-tissue  | 2017-01-20   | 0.5x treatment   |
| 1823B       | 1823B-tissue  | 2017-01-20   | 0.5x treatment   |
| 1824A       | 1824A-tissue  | 2017-01-20   | 1.0x treatment   |
| 1825A       | 1825A-tissue  | 2017-01-20   | 10.0x treatment  |
| 1826A       | 1826A-tissue  | 2017-01-20   | 100.0x treatment |
| 1826B       | 1823A-tissue  | 2017-01-17   | 0.5x treatment   |
| 1829A       | 1823B-tissue  | 2017-01-17   | 0.5x treatment   |
"""

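To show how a table like the one above can be produced from sample records, here is a minimal standard-library sketch. It does not reflect how ``experimental_design`` is implemented, and its column padding is tighter than in the output shown above; ``markdown_table`` is a hypothetical helper.

```python
def markdown_table(rows, columns):
    """Render dictionaries as a left-aligned Markdown table (hypothetical helper)."""
    # Each column is as wide as its longest cell, header included.
    widths = [
        max([len(column)] + [len(str(row.get(column, ""))) for row in rows])
        for column in columns
    ]

    def render(cells):
        padded = (str(cell).ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    lines = [render(columns)]
    # A leading colon in the separator row marks the column as left-aligned.
    lines.append("|" + "|".join(":" + "-" * (width + 1) for width in widths) + "|")
    lines.extend(render([row.get(column, "") for column in columns]) for row in rows)
    return "\n".join(lines)


samples = [
    {"Sample_ID": "1823A", "Sample_Name": "1823A-tissue"},
    {"Sample_ID": "1823B", "Sample_Name": "1823B-tissue"},
]
print(markdown_table(samples, ["Sample_ID", "Sample_Name"]))
```
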
.. raw:: html

<h3 align="center">

Command Line Utility

.. raw:: html

</h3>

The ``sample-sheet summary`` command prints a tabular summary of a sample sheet:

.. code:: bash

❯ sample-sheet summary paired-end-single-index.csv
┌Header─────────────┬─────────────────────────────────┐
│ IEM1FileVersion   │ 4                               │
│ Investigator_Name │ jdoe                            │
│ Experiment_Name   │ exp001                          │
│ Date              │ 11/16/2017                      │
│ Workflow          │ SureSelectXT                    │
│ Application       │ NextSeq FASTQ Only              │
│ Assay             │ SureSelectXT                    │
│ Description       │ A description of this flow cell │
│ Chemistry         │ Default                         │
└───────────────────┴─────────────────────────────────┘
┌Settings──────────────────┬──────────┐
│ CreateFastqForIndexReads │ 1        │
│ BarcodeMismatches        │ 2        │
│ Reads                    │ 151, 151 │
└──────────────────────────┴──────────┘
┌Identifiers┬──────────────┬────────────┬──────────┬────────┐
│ Sample_ID │ Sample_Name  │ Library_ID │ index    │ index2 │
├───────────┼──────────────┼────────────┼──────────┼────────┤
│ 1823A     │ 1823A-tissue │ 2017-01-20 │ GAATCTGA │        │
│ 1823B     │ 1823B-tissue │ 2017-01-20 │ AGCAGGAA │        │
│ 1824A     │ 1824A-tissue │ 2017-01-20 │ GAGCTGAA │        │
│ 1825A     │ 1825A-tissue │ 2017-01-20 │ AAACATCG │        │
│ 1826A     │ 1826A-tissue │ 2017-01-20 │ GAGTTAGC │        │
│ 1826B     │ 1823A-tissue │ 2017-01-17 │ CGAACTTA │        │
│ 1829A     │ 1823B-tissue │ 2017-01-17 │ GATAGACA │        │
└───────────┴──────────────┴────────────┴──────────┴────────┘
┌Descriptions──────────────────┐
│ Sample_ID │ Description      │
├───────────┼──────────────────┤
│ 1823A     │ 0.5x treatment   │
│ 1823B     │ 0.5x treatment   │
│ 1824A     │ 1.0x treatment   │
│ 1825A     │ 10.0x treatment  │
│ 1826A     │ 100.0x treatment │
│ 1826B     │ 0.5x treatment   │
│ 1829A     │ 0.5x treatment   │
└───────────┴──────────────────┘

.. raw:: html

<h3 align="center">

Contributing

.. raw:: html

</h3>

Pull requests and issues are welcome!

To make a development install:

.. code:: bash

❯ git clone git@github.com:clintval/sample-sheet.git
❯ pip install -e 'sample-sheet[fancytest]'

To run the tests:

::

❯ ./sample-sheet/tests/run-tests
Name                             Stmts   Miss  Cover
---------------------------------------------------
sample_sheet/__init__.py             1      0   100%
sample_sheet/_sample_sheet.py      280      0   100%
---------------------------------------------------
TOTAL                              281      0   100%

OK! 58 tests, 0 failures, 0 errors in 0.0s

Release History

0.4.0 (this version), 0.3.0, 0.2.0, 0.1.1, 0.1.0, 0.0.2

Download Files

sample_sheet-0.4.0.tar.gz (15.6 kB), source distribution, uploaded Feb 19, 2018