django-nextflow
django-nextflow is a Django app for running Nextflow pipelines and storing their results in a database within a Django web app.
Installation
django-nextflow is available through PyPI:
pip install django-nextflow
You must install the Nextflow executable itself separately: see the Nextflow Documentation for help with this.
Setup
To use the app within Django, add django_nextflow to your list of INSTALLED_APPS.

You must define four values in your settings.py:
- NEXTFLOW_PIPELINE_ROOT - the location on disk where the Nextflow pipelines are stored. All references to pipeline files will use this as the root.
- NEXTFLOW_DATA_ROOT - the location on disk to store execution records.
- NEXTFLOW_UPLOADS_ROOT - the location on disk to store uploaded data.
- NEXTFLOW_PUBLISH_DIR - the name of the folder published files will be saved to. Within an execution directory, django-nextflow will look in NEXTFLOW_PUBLISH_DIR/process_name for output files for that process. These files must be published as symlinks, not copies, otherwise django-nextflow will not recognise them.
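For example, a minimal settings.py might look like the sketch below. The paths are placeholders rather than defaults, and "results" is just an example publish directory name:

# settings.py - illustrative values only
INSTALLED_APPS = [
    # ...
    "django_nextflow",
]

NEXTFLOW_PIPELINE_ROOT = "/data/pipelines"   # root for all .nf pipeline paths
NEXTFLOW_DATA_ROOT = "/data/executions"      # execution directories created here
NEXTFLOW_UPLOADS_ROOT = "/data/uploads"      # uploaded files copied here
NEXTFLOW_PUBLISH_DIR = "results"             # folder name processes publish to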
Usage
Begin by defining one or more Pipelines. These are .nf files somewhere within the NEXTFLOW_PIPELINE_ROOT you defined:
from django_nextflow.models import Pipeline
pipeline = Pipeline.objects.create(path="workflows/main.nf")
You can also provide paths to a JSON input schema file (structured using the nf-core style) and a config file to use when running it:
pipeline = Pipeline.objects.create(
    path="workflows/main.nf",
    description="Some useful pipeline.",
    schema_path="main.json",
    config_path="nextflow.config"
)
print(pipeline.input_schema) # Returns inputs as dict
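Since the schema is returned as an ordinary dict, you can introspect it like any mapping; the exact keys depend on the schema file you supplied:

# Iterate the parsed schema - the key names come from your schema file
for name, details in pipeline.input_schema.items():
    print(name, details)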
To run the pipeline:
execution = pipeline.run(params={"param1": "xxx"})
This will run the pipeline using Nextflow, and save database entries for three different models:

- The Execution that is returned represents the running of this pipeline on this occasion. It stores the stdout and stderr of the command, and has a get_log_text() method for reading the full log file from disk. A directory will be created in NEXTFLOW_DATA_ROOT for the execution to take place in.
- ProcessExecution records for each process that executed within the running of the pipeline. These also have their own stdout and stderr, as well as status information etc.
- Data records for each file published by the processes in the pipeline. Note that this is not every file produced - but specifically those output by the process via its output channel. For this to work the processes must be configured to publish these files to a particular directory name (the one that NEXTFLOW_PUBLISH_DIR is set to), and to a subdirectory within that directory with the process's name.
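Continuing from the run above, a rough sketch of inspecting these records: the stdout, stderr and get_log_text() attributes are described above, while the processexecution_set reverse relation name is an assumption based on Django's default naming rather than confirmed API:

print(execution.stdout)           # Nextflow's console output for this run
print(execution.stderr)
print(execution.get_log_text())   # full log file, read from disk

# Reverse relation name below is an assumed Django default:
for proc in execution.processexecution_set.all():
    print(proc.stdout)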
If you want to supply a file for which there is a Data object as the input to a pipeline, you can do so as follows:
execution = pipeline.run(
    params={"param1": "xxx"},
    data_params={"param2": 23, "param3": [24, 25]}
)
...where 23, 24 and 25 are the IDs of Data objects.
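Since Data is a regular Django model, the IDs can come from any ordinary queryset lookup, for example:

from django_nextflow.models import Data

data = Data.objects.get(id=23)  # fetch a previously created record
execution = pipeline.run(
    params={"param1": "xxx"},
    data_params={"param2": data.id}
)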
You can also supply entire executions as inputs, in which case they will be provided to the pipeline as a directory of symlinked files:
execution = pipeline.run(
    params={"param1": "xxx"},
    execution_params={"genome1": 23, "genome2": 24}
)
The Data objects above were created by running some pipeline, but you might want to create one from scratch without running a pipeline. You can do so either from a path string, or from a Django UploadedFile object:
data1 = Data.create_from_path("/path/to/file.txt")
data2 = Data.create_from_upload(django_upload_object)
The file will be copied to NEXTFLOW_UPLOADS_ROOT in this case.
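For instance, create_from_upload slots naturally into a view. A minimal sketch, assuming a form field named "file" (the view itself is illustrative, not part of the library):

from django.http import JsonResponse
from django_nextflow.models import Data

def upload_view(request):
    # request.FILES values are Django UploadedFile objects
    data = Data.create_from_upload(request.FILES["file"])
    return JsonResponse({"id": data.id})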
Changelog
0.5
3rd February, 2022
- Pipelines can now take execution inputs.
- Fixed method for detecting downstream data products.
0.4
12th January, 2022
- Better support for multiple data objects.
- Data objects can now be directories, which will be automatically zipped.
- When creating upstream data connections, data objects will be created if needed.
0.3.2
26th December, 2021
- Allow IDs to be big ints.
0.3.1
24th December, 2021
- Data file sizes can now be more than 2^32.
- Data file names can now be 1000 characters long.
0.3
21st December, 2021
- Pipelines can now take multiple data inputs per param.
- Profiles can now be specified when running a pipeline.
- Compression extension .gz now ignored when detecting filetype.
- Process executions start and end times are now recorded.
- Improved system for identifying upstream data inputs.
- Improved publish_dir identification.
- Improved log file reading.
0.2
14th November, 2021
- Pipelines now have description fields.
- Data objects now have creation time fields.
- Added upstream data objects as well as downstream to process executions.
0.1.1
3rd November, 2021
- Fixed duration string parsing.
0.1
29th October, 2021
- Initial models for pipelines, execution, process executions and data.