Immport upload preparation
Project description
Immpload - Immport upload preparation
immpload converts input data files into files formatted from Immport upload templates.
Prerequisites
- Python3 with the pip package installer
Installation
-
Install the
immpload
Python package and executable:pip install immpload
Usage
The simplest case copies input columns whose name matches the corresponding output Immport template column:
$ immpload subjectAnimals /path/to/input/subjects.txt
which will create the Immport upload file subjectAnimals.txt
in the current directory. To place the output in a different
directory, use the -o
or --outDir
option:
$ immpload -o /path/to/output subjectAnimals /path/to/input/subjects.xslx
Note that the input can be either a .xslx
Excel spreadsheet
or a tab-delimited text file.
The command:
$ immpload --help
shows all immpload
arguments and options.
It is often useful to specify the conversion mapping in a YAML configuration file. For example, the following configuration:
columns:
Subject ID: ID
Arm Or Cohort ID: Cohort
converts the ID
and Cohort
input values to Subject ID
and
Arm Or Cohort ID
output values, resp. The command is invoked
with the -c
or --config
option, e.g:
$ immpload -o /path/to/output --config /path/to/conf/subjects.yaml \
subjectAnimals /path/to/input/subjects.xslx
The configuration can include value mappings, e.g.:
values:
Species: Mus musculus
sets the output Species
to Mus musculus
for all rows.
The configuration:
columns:
Gender: Sex
values:
Gender:
n/a: Not Specified
transforms the input Sex
value n/a
to the output Gender
value
Not Specified
. Other input values are copied without change.
immpload
can flatten each input row into several output rows based
on matching input column names against a pattern. The configuration:
columns:
Subject ID: ID
Arm Or Cohort ID: Cohort
Study Day: day
patterns:
Result Value Reported: D(?P<day>\d+)$
converts an input row with columns D1
, D2
and D3
into three
output rows with column Study Day
values 1
, 2
and 3
and Result Value Reported
values given by the D1
, D2
and D3
input values, resp.
immpload
supplies certain required output columns with a reasonable
default, as follows:
- Experiment Samples (
experimentSamples.*.txt
)Experiment ID
- lower-case, underscoredExperiment Name
Biosample ID
-Expsample ID
, if present, otherwise the lower-case, underscoredBiosample Name
, if present, otherwise derived from theSubject ID
,Treatment ID
andExperiment ID
Expsample ID
-Biosample ID
(defaulted, if necessary)
- Treatments (
treatments.txt
)Name
- derived from the values and unitsUser Defined ID
- lower-case, underscoredName
Use Treatment?
- default isYes
- Assessments (
assessments.txt
)Planned Visit ID
-Study ID
followed byd
and theStudy Day
Panel Name Reported
- copied from theAssessment Type
Assessment Panel ID
- derived from thePanel Name Reported
User Defined ID
- derived from theSubject ID
,Planned Visit ID
andComponent Name Reported
The default is set if and only if the mapped column value is missing.
For advanced usage, the immpload
Python module can be used directly
in a Python script with a callback function, e.g.:
from immpload import munger
def add_results(in_row, in_col_ndx_map, out_col_ndx_map, out_row):
"""
Modifies the output row after the configuration-based conversion.
:param: in_row: the input data row
:param: in_col_ndx_map: the input {column: index} dictionary
:param: out_col_ndx_map: the output {column: index} dictionary
:param: out_row :the output row
:return: a list of rows derived from the given output row
"""
###
### Modify out_row or create new output rows here...
###
# Return an array of rows.
return [out_row]
# Convert the input file.
munger.munge('assessments', /path/to/input.xslx, callback=add_results)
The munger.munge
method signature is as follows:
def munge(template, *in_files, config=None, out_dir=None,
sheet=None, input_filter=None, callback=None, **kwargs):
"""
Builds the Immport upload file for the given input file.
The template is a supported Immport template name, e.g.
`assessments`. The output is the Immport upload file,
e.g. `assessments,txt`, placed in the output directory.
The key word arguments (_kwargs_) are static output
_column_`=`_value_ definitions that are applied to every
output row. The column name can be underscored, e.g.
`Study_ID`.
:param template: the required Immport template name
:param in_files: the input file(s) to munge
:param config: the configuration dictionary or file name
:param out_dir: the target location (default current directory)
:param sheet: for an Excel workbook input file, the sheet to open
:param input_filter: optional generator which has parameter
input row and yields valid rows
:param callback: optional callback with parameters
in_row, in_col_ndx_map, out_col_ndx_map and out_row returning
an array of rows to write to the output file
:param kwargs: the optional static _column_`=`_value_ definitions
:return: the output file name
"""
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.