A Django library for running Nextflow pipelines and storing their results.
django-nextflow
django-nextflow is a Django app for running Nextflow pipelines and storing their results in a database within a Django web app.
Installation
django-nextflow is available through PyPI:
pip install django-nextflow
You must install the Nextflow executable itself separately: see the Nextflow Documentation for help with this.
Setup
To use the app within Django, add django_nextflow (the Python module name) to your list of INSTALLED_APPS.
You must define four values in your settings.py:
- NEXTFLOW_PIPELINE_ROOT: the location on disk where the Nextflow pipelines are stored. All references to pipeline files will use this as the root.
- NEXTFLOW_DATA_ROOT: the location on disk to store execution records.
- NEXTFLOW_UPLOADS_ROOT: the location on disk to store uploaded data.
- NEXTFLOW_PUBLISH_DIR: the name of the folder published files will be saved to. Within an execution directory, django-nextflow will look in NEXTFLOW_PUBLISH_DIR/process_name for output files for that process.
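As a minimal sketch, the relevant part of settings.py might look like the following. The directory paths and the "results" folder name are placeholders chosen for illustration, not values the app requires, and the app label django_nextflow is assumed from the django_nextflow.models import path used in the Usage section.

```python
# settings.py (excerpt): a minimal sketch. All paths and the publish folder
# name are placeholder values chosen for illustration.
INSTALLED_APPS = [
    "django.contrib.contenttypes",
    "django.contrib.auth",
    # App label assumed from the django_nextflow.models import path.
    "django_nextflow",
]

# Directory containing your .nf files; Pipeline paths are relative to this.
NEXTFLOW_PIPELINE_ROOT = "/srv/myproject/pipelines"

# Each execution gets its own working directory under this root.
NEXTFLOW_DATA_ROOT = "/srv/myproject/executions"

# Files created via Data.create_from_path / create_from_upload are copied here.
NEXTFLOW_UPLOADS_ROOT = "/srv/myproject/uploads"

# Name of the folder your processes publish into (via Nextflow's publishDir).
NEXTFLOW_PUBLISH_DIR = "results"
```

Absolute paths keep the settings independent of the directory the Django process is started from.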
Usage
Begin by defining one or more Pipelines. These are .nf files somewhere within the NEXTFLOW_PIPELINE_ROOT you defined:
from django_nextflow.models import Pipeline
pipeline = Pipeline.objects.create(path="workflows/main.nf")
You can also provide a description, plus paths to a JSON input schema file (structured in the nf-core style) and a config file to use when running it:
pipeline = Pipeline.objects.create(
    path="workflows/main.nf",
    description="Some useful pipeline.",
    schema_path="main.json",
    config_path="nextflow.config"
)
print(pipeline.input_schema)  # Returns inputs as dict
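To illustrate what an nf-core style schema contains, the hypothetical helper below (not part of django-nextflow) lists the parameter names such a file declares. It assumes nf-core's layout, where parameters sit in "properties" objects either at the top level or inside groups under "definitions" (or "$defs" in newer schemas). It works on the raw JSON and makes no claim about the exact shape of the dict returned by pipeline.input_schema.

```python
import json
from typing import Any


def schema_param_names(schema: dict[str, Any]) -> list[str]:
    """Collect parameter names from an nf-core style JSON schema.

    Hypothetical helper: assumes parameters live in "properties" objects,
    either at the top level or inside groups under "definitions"/"$defs".
    """
    names: list[str] = list(schema.get("properties", {}))
    for key in ("definitions", "$defs"):
        for group in schema.get(key, {}).values():
            names.extend(group.get("properties", {}))
    return names


def load_schema_param_names(path: str) -> list[str]:
    # Read the schema file from disk, then extract the parameter names.
    with open(path, encoding="utf-8") as handle:
        return schema_param_names(json.load(handle))


# A cut-down schema in the nf-core shape, for illustration only.
example = {
    "definitions": {
        "input_output_options": {
            "type": "object",
            "properties": {"input": {"type": "string"}, "outdir": {"type": "string"}},
        }
    },
    "allOf": [{"$ref": "#/definitions/input_output_options"}],
}
print(schema_param_names(example))  # ['input', 'outdir']
```

A check like this can help confirm that the keys you pass in params match what the schema declares.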
To run the pipeline:
execution = pipeline.run(params={"param1": "xxx"})
This will run the pipeline using Nextflow and save database entries for three different models:
- The Execution that is returned represents this particular run of the pipeline. It stores the stdout and stderr of the command, and has a get_log_text() method for reading the full log file from disk. A directory will be created in NEXTFLOW_DATA_ROOT for the execution to take place in.
- ProcessExecution records for each process that executes within the running of the pipeline. These also have their own stdout and stderr, as well as status information etc.
- Data records for each file published by the processes in the pipeline. Note that this is not every file produced, but specifically those output by the process via its output channel. For this to work, the processes must be configured to publish these files to a particular directory name (the one that NEXTFLOW_PUBLISH_DIR is set to), and to a subdirectory within that directory named after the process.
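To make the expected layout concrete, here is a hypothetical helper (not the library's own lookup code) that lists the files sitting directly in <execution directory>/<NEXTFLOW_PUBLISH_DIR>/<process name>/, which is where the text above says django-nextflow looks. The process name "ALIGN" and file "sample.bam" are made up for the demonstration.

```python
import tempfile
from pathlib import Path


def published_files(execution_dir: str, publish_dir: str, process_name: str) -> list[Path]:
    """List files directly under <execution_dir>/<publish_dir>/<process_name>.

    Hypothetical helper mirroring the layout this README describes.
    """
    folder = Path(execution_dir) / publish_dir / process_name
    if not folder.is_dir():
        # Nothing was published for this process (or publishDir points elsewhere).
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())


# Build a throwaway execution directory with one published file.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "results" / "ALIGN"
    out.mkdir(parents=True)
    (out / "sample.bam").write_text("placeholder")
    print([p.name for p in published_files(tmp, "results", "ALIGN")])  # ['sample.bam']
```

If a process's publishDir does not end in its own name, its outputs land outside this folder and would not be found by a lookup of this shape.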
If you want to supply a file for which there is a Data object as the input to a pipeline, you can do so as follows:
execution = pipeline.run(
    params={"param1": "xxx"},
    data_params={"param2": 23, "param3": [24, 25]}
)
...where 23, 24 and 25 are the IDs of Data objects. Passing a list of IDs supplies multiple files to a single parameter.
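If you are holding Data objects rather than raw IDs, a small hypothetical helper like the one below can build the data_params mapping. It assumes only that each object exposes its primary key as .id (the Django default) and follows the shape shown above: a single ID for one file, a list of IDs for several files on one parameter. SimpleNamespace stand-ins replace real Data instances so the sketch runs without a database.

```python
from types import SimpleNamespace
from typing import Any, Union


def to_data_params(inputs: dict[str, Any]) -> dict[str, Union[int, list[int]]]:
    """Map parameter names to Data IDs for pipeline.run(data_params=...).

    Hypothetical helper: a single object becomes one ID, while a list or
    tuple of objects becomes a list of IDs (multiple files for one param).
    """
    params: dict[str, Union[int, list[int]]] = {}
    for name, value in inputs.items():
        if isinstance(value, (list, tuple)):
            params[name] = [obj.id for obj in value]
        else:
            params[name] = value.id
    return params


# Stand-ins for Data instances with IDs 23, 24 and 25.
d23, d24, d25 = (SimpleNamespace(id=i) for i in (23, 24, 25))
print(to_data_params({"param2": d23, "param3": [d24, d25]}))
# {'param2': 23, 'param3': [24, 25]}
```

The resulting dict can be passed as data_params to pipeline.run() alongside params.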
The Data objects above were created by running some pipeline, but you might want to create one from scratch without running a pipeline. You can do so either from a path string or from a Django UploadedFile object:
from django_nextflow.models import Data
data1 = Data.create_from_path("/path/to/file.txt")
data2 = Data.create_from_upload(django_upload_object)
The file will be copied to NEXTFLOW_UPLOADS_ROOT in this case.
Changelog
0.3.2
26th December, 2021
- Allow IDs to be big ints.
0.3.1
24th December, 2021
- Data file sizes can now be more than 2^32 bytes.
- Data file names can now be 1000 characters long.
0.3
21st December, 2021
- Pipelines can now take multiple data inputs per param.
- Profiles can now be specified when running a pipeline.
- Compression extension .gz now ignored when detecting filetype.
- Process execution start and end times are now recorded.
- Improved system for identifying upstream data inputs.
- Improved publish_dir identification.
- Improved log file reading.
0.2
14th November, 2021
- Pipelines now have description fields.
- Data objects now have creation time fields.
- Added upstream data objects as well as downstream to process executions.
0.1.1
3rd November, 2021
- Fixed duration string parsing.
0.1
29th October, 2021
- Initial models for pipelines, executions, process executions and data.
Hashes for django_nextflow-0.3.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 53da36f66fac15ece517b5bbc9a1e57d9d179a9ab8a35525641603a595a0601b
MD5 | 62e0e71b052d6c261264fb51addcd221
BLAKE2b-256 | 64ac84ddf94c66d17619a16277e5c5a8a2221304a48a36d47b0e49046fde4e84