An Illumina Sample Sheet parsing utility.
.. raw:: html
<h1 align="center">
sample-sheet
.. raw:: html
</h1>
.. raw:: html
<p align="center">
A Python 3.6 library for handling Illumina sample sheets
.. raw:: html
</p>
.. raw:: html
<p align="center">
Installation · Tutorial · Command Line Utility · Contributing
.. raw:: html
</p>
.. raw:: html
<h3 align="center">
Installation
.. raw:: html
</h3>
::
❯ pip install sample_sheet
.. raw:: html
<h3 align="center">
Tutorial
.. raw:: html
</h3>
A sample sheet can be read from S3, HDFS, WebHDFS, or HTTP, as well as
from a local file (compressed or not).
.. code:: python
>>> from sample_sheet import SampleSheet
>>> SampleSheet('s3://bucket/prefix/SampleSheet.csv')
SampleSheet("s3://bucket/prefix/SampleSheet.csv")
An example sample sheet can be found at
```tests/resources/paired-end-single-index.csv`` <tests/resources/paired-end-single-index.csv>`__.
.. code:: python
>>> from sample_sheet import SampleSheet
>>>
>>> url = 'https://raw.githubusercontent.com/clintval/sample-sheet/master/tests/resources/paired-end-single-index.csv'
>>> sample_sheet = SampleSheet(url)
The metadata of the sample sheet can be accessed with the ``Header``,
``Reads``, and ``Settings`` attributes:
.. code:: python
>>> sample_sheet.Header.Assay
'SureSelectXT'
>>> sample_sheet.Reads
[151, 151]
>>> sample_sheet.is_paired_end
True
>>> sample_sheet.Settings.BarcodeMismatches
'2'
The samples can be accessed directly or *via* iteration:
.. code:: python
>>> sample_sheet.samples
[Sample({"Sample_ID": "1823A", "Sample_Name": "1823A-tissue", "index": "GAATCTGA"}),
Sample({"Sample_ID": "1823B", "Sample_Name": "1823B-tissue", "index": "AGCAGGAA"}),
Sample({"Sample_ID": "1824A", "Sample_Name": "1824A-tissue", "index": "GAGCTGAA"}),
Sample({"Sample_ID": "1825A", "Sample_Name": "1825A-tissue", "index": "AAACATCG"}),
Sample({"Sample_ID": "1826A", "Sample_Name": "1826A-tissue", "index": "GAGTTAGC"}),
Sample({"Sample_ID": "1826B", "Sample_Name": "1823A-tissue", "index": "CGAACTTA"}),
Sample({"Sample_ID": "1829A", "Sample_Name": "1823B-tissue", "index": "GATAGACA"})]
>>> for sample in sample_sheet:
...     print(sample)
...     break
"1823A"
If the sample sheet provides a ``Read_Structure`` column for each sample,
the value is exposed as a ``ReadStructure`` object with additional
attributes:
.. code:: python
>>> first_sample, *_ = sample_sheet.samples
>>> first_sample.Read_Structure
ReadStructure(structure="151T8B151T")
>>> first_sample.Read_Structure.total_cycles
310
>>> first_sample.Read_Structure.tokens
['151T', '8B', '151T']
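
The ``tokens`` and ``total_cycles`` values shown above can be reproduced by hand. The following is a minimal sketch, assuming a read structure is simply a run of ``<count><letter>`` tokens; it is not the library's implementation, and the names ``tokenize`` and ``total_cycles`` are hypothetical helpers introduced only for illustration.

.. code:: python

    import re

    # Assumption: every token is a positive integer followed by one letter,
    # e.g. "151T" or "8B". This pattern is illustrative only.
    TOKEN_PATTERN = re.compile(r"(\d+)([A-Za-z])")


    def tokenize(structure):
        """Split a read structure string such as "151T8B151T" into tokens."""
        tokens = [match.group(0) for match in TOKEN_PATTERN.finditer(structure)]
        # Reject input that contains anything other than well-formed tokens.
        if not tokens or "".join(tokens) != structure:
            raise ValueError("Malformed read structure: {!r}".format(structure))
        return tokens


    def total_cycles(structure):
        """Sum the cycle counts of all tokens in the read structure."""
        return sum(int(token[:-1]) for token in tokenize(structure))


    tokens = tokenize("151T8B151T")
    cycles = total_cycles("151T8B151T")
    print(tokens)  # ['151T', '8B', '151T']
    print(cycles)  # 310

The total is 151 + 8 + 151 = 310, which matches the ``total_cycles`` value in the tutorial output.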
Sample Sheet Creation
^^^^^^^^^^^^^^^^^^^^^
Sample sheets can be created *de novo* and written to a file-like
object:
.. code:: python
>>> from sample_sheet import Sample, SampleSheet
>>>
>>> sample_sheet = SampleSheet()
>>>
>>> # Header and Settings keys are set as attributes.
>>> sample_sheet.Header.IEM4FileVersion = 4
>>> sample_sheet.Header.add_attr(
...     attr='Investigator_Name',
...     value='jdoe',
...     name='Investigator Name')
>>>
>>> sample_sheet.Settings.CreateFastqForIndexReads = 1
>>> sample_sheet.Settings.BarcodeMismatches = 2
>>>
>>> sample_sheet.Reads = [151, 151]
>>>
>>> sample = Sample(dict(
...     Sample_ID='1823A',
...     Sample_Name='1823A-tissue',
...     index='ACGT'))
>>>
>>> sample_sheet.add_sample(sample)
>>>
>>> # Any writable file-like object works; here the sheet goes to stdout.
>>> import sys
>>> sample_sheet.write(sys.stdout)
"""
[Header],,
IEM4FileVersion,4,
Investigator Name,jdoe,
,,
[Reads],,
151,,
151,,
,,
[Settings],,
BarcodeMismatches,2,
,,
[Data],,
Sample_ID,Sample_Name,index
1823A,1823A-tissue,ACGT
"""
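
The written output shows the layout of a sample sheet: comma-separated rows grouped under bracketed section markers and separated by blank rows. As a rough illustration of that layout only, the sketch below splits such rows into sections with the standard ``csv`` module. It is not how ``SampleSheet`` parses files, and ``split_sections`` is a hypothetical helper.

.. code:: python

    import csv

    # The rows printed by the example above, one string per line.
    SHEET_LINES = [
        "[Header],,",
        "IEM4FileVersion,4,",
        "Investigator Name,jdoe,",
        ",,",
        "[Reads],,",
        "151,,",
        "151,,",
        ",,",
        "[Settings],,",
        "BarcodeMismatches,2,",
        ",,",
        "[Data],,",
        "Sample_ID,Sample_Name,index",
        "1823A,1823A-tissue,ACGT",
    ]


    def split_sections(lines):
        """Group non-empty CSV rows under their "[Section]" marker."""
        sections = {}
        current = None
        for row in csv.reader(lines):
            cells = [cell.strip() for cell in row]
            if not any(cells):
                continue  # skip blank separator rows such as ",,"
            first = cells[0]
            if first.startswith("[") and first.endswith("]"):
                current = first[1:-1]
                sections[current] = []
            elif current is not None:
                sections[current].append(cells)
        return sections


    sections = split_sections(SHEET_LINES)
    # Each row of [Reads] holds one read length in its first cell.
    reads = [int(row[0]) for row in sections["Reads"]]
    # The first row of [Data] is the column header; the rest are samples.
    columns, *records = sections["Data"]
    samples = [dict(zip(columns, record)) for record in records]
    print(reads)
    print(samples)

For real files, prefer ``SampleSheet`` itself; the sketch ignores edge cases such as quoted fields that span lines.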
IPython Integration
^^^^^^^^^^^^^^^^^^^
A quick summary of the samples can be displayed as a plain-text Markdown
table, or as HTML-rendered Markdown when run in an IPython environment:
.. code:: python
>>> sample_sheet.experimental_design
"""
| Sample_ID   | Sample_Name   | Library_ID   | Description      |
|:------------|:--------------|:-------------|:-----------------|
| 1823A       | 1823A-tissue  | 2017-01-20   | 0.5x treatment   |
| 1823B       | 1823B-tissue  | 2017-01-20   | 0.5x treatment   |
| 1824A       | 1824A-tissue  | 2017-01-20   | 1.0x treatment   |
| 1825A       | 1825A-tissue  | 2017-01-20   | 10.0x treatment  |
| 1826A       | 1826A-tissue  | 2017-01-20   | 100.0x treatment |
| 1826B       | 1823A-tissue  | 2017-01-17   | 0.5x treatment   |
| 1829A       | 1823B-tissue  | 2017-01-17   | 0.5x treatment   |
"""
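
One general way an object can show plain Markdown in a terminal and rendered Markdown in a notebook is IPython's rich display hook ``_repr_markdown_``. The sketch below illustrates that mechanism under the assumption that something similar sits behind ``experimental_design``; the source does not confirm this, and ``MarkdownTable`` is a hypothetical class, not part of ``sample_sheet``.

.. code:: python

    class MarkdownTable:
        """Hypothetical helper that renders rows of dicts as a Markdown table."""

        def __init__(self, rows, columns):
            self.rows = rows
            self.columns = columns

        def __str__(self):
            header = "| " + " | ".join(self.columns) + " |"
            # ":---" marks each column as left-aligned.
            rule = "|" + "|".join(":---" for _ in self.columns) + "|"
            body = [
                "| " + " | ".join(str(row.get(col, "")) for col in self.columns) + " |"
                for row in self.rows
            ]
            return "\n".join([header, rule] + body)

        def _repr_markdown_(self):
            # IPython and Jupyter call this hook, when present, to obtain a
            # Markdown representation that the frontend can render as HTML.
            return str(self)


    table = MarkdownTable(
        rows=[{"Sample_ID": "1823A", "Sample_Name": "1823A-tissue"}],
        columns=["Sample_ID", "Sample_Name"],
    )
    text = str(table)
    print(text)

Outside IPython the object still prints as readable text, because ``__str__`` returns the same Markdown source.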
.. raw:: html
<h3 align="center">
Command Line Utility
.. raw:: html
</h3>
The ``sample-sheet summary`` command prints a tabular summary of a sample sheet.
.. code:: bash
❯ sample-sheet summary paired-end-single-index.csv
┌Header─────────────┬─────────────────────────────────┐
│ IEM1FileVersion   │ 4                               │
│ Investigator_Name │ jdoe                            │
│ Experiment_Name   │ exp001                          │
│ Date              │ 11/16/2017                      │
│ Workflow          │ SureSelectXT                    │
│ Application       │ NextSeq FASTQ Only              │
│ Assay             │ SureSelectXT                    │
│ Description       │ A description of this flow cell │
│ Chemistry         │ Default                         │
└───────────────────┴─────────────────────────────────┘
┌Settings──────────────────┬──────────┐
│ CreateFastqForIndexReads │ 1        │
│ BarcodeMismatches        │ 2        │
│ Reads                    │ 151, 151 │
└──────────────────────────┴──────────┘
┌Identifiers┬──────────────┬────────────┬──────────┬────────┐
│ Sample_ID │ Sample_Name  │ Library_ID │ index    │ index2 │
├───────────┼──────────────┼────────────┼──────────┼────────┤
│ 1823A     │ 1823A-tissue │ 2017-01-20 │ GAATCTGA │        │
│ 1823B     │ 1823B-tissue │ 2017-01-20 │ AGCAGGAA │        │
│ 1824A     │ 1824A-tissue │ 2017-01-20 │ GAGCTGAA │        │
│ 1825A     │ 1825A-tissue │ 2017-01-20 │ AAACATCG │        │
│ 1826A     │ 1826A-tissue │ 2017-01-20 │ GAGTTAGC │        │
│ 1826B     │ 1823A-tissue │ 2017-01-17 │ CGAACTTA │        │
│ 1829A     │ 1823B-tissue │ 2017-01-17 │ GATAGACA │        │
└───────────┴──────────────┴────────────┴──────────┴────────┘
┌Descriptions──────────────────┐
│ Sample_ID │ Description      │
├───────────┼──────────────────┤
│ 1823A     │ 0.5x treatment   │
│ 1823B     │ 0.5x treatment   │
│ 1824A     │ 1.0x treatment   │
│ 1825A     │ 10.0x treatment  │
│ 1826A     │ 100.0x treatment │
│ 1826B     │ 0.5x treatment   │
│ 1829A     │ 0.5x treatment   │
└───────────┴──────────────────┘
.. raw:: html
<h3 align="center">
Contributing
.. raw:: html
</h3>
Pull requests and issues welcome!
To make a development install:
.. code:: bash
❯ git clone git@github.com:clintval/sample-sheet.git
❯ pip install -e 'sample-sheet[fancytest]'
To run the tests:
::
❯ ./sample-sheet/tests/run-tests
Name                            Stmts   Miss  Cover
---------------------------------------------------
sample_sheet/__init__.py            1      0   100%
sample_sheet/_sample_sheet.py     280      0   100%
---------------------------------------------------
TOTAL                             281      0   100%
OK! 58 tests, 0 failures, 0 errors in 0.0s