# Housekeeper [![Build Status][travis-image]][travis-url] [![Coverage Status][coveralls-image]][coveralls-url]
Housekeeper keeps track of **successful** analyses, along with their output files and metadata. It also provides archival functions and is pipeline agnostic as far as possible.
1. Store access to analysis output files
2. Offer stable public API to files and listing completed analyses
3. Keep track of status of analyses (active, archived, backed-up)
Storing results and providing detailed access to them is outside the scope of the tool. Housekeeper only makes it easy for other tools to read in the data.
**QUESTION**: should we allow for samples to belong to multiple analyses?
## Usage guide
The first thing to do is to set up a root folder and database.
$ housekeeper init /path/to/root
The folder must not already exist, or the tool will complain. On success, it puts a config file in the root directory. It also stores the location of the root folder in the database, so the only thing you need to supply is the path to the database.
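The "folder must not exist" rule can be illustrated with a minimal Python sketch. This is not Housekeeper's actual code: the function `init_root` and the config file name `housekeeper.yaml` are assumptions made up for the example, and the database setup is left out.

```python
from pathlib import Path

# Hypothetical config file name; the real tool may use a different one.
CONFIG_NAME = "housekeeper.yaml"


def init_root(root: Path) -> Path:
    """Create a new root folder and place a config file inside it."""
    # exist_ok=False makes mkdir raise FileExistsError for an existing
    # folder, which mirrors the "can't exist already" rule.
    root.mkdir(parents=True, exist_ok=False)
    config_path = root / CONFIG_NAME
    config_path.write_text(f"root: {root}\n")
    return config_path
```

Calling `init_root` twice with the same path raises `FileExistsError` on the second call, so an existing root is never silently overwritten.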
### Adding new analyses
Housekeeper can store files from completed analyses. Supported pipelines include:
$ housekeeper add mip /path/to/familyId_config.yaml
This command will do some pre-processing and collect assets to be linked. In the case of MIP it will pre-calculate the mapping rate since it isn't available in the main QC metrics file.
Housekeeper will create an analysis ID in the format `[customerId]-[familyId]`.
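As a small illustration of that format, the ID could be built like this. The helper name `build_analysis_id` is hypothetical, and the empty-value check is an assumption rather than documented behavior.

```python
def build_analysis_id(customer_id: str, family_id: str) -> str:
    """Join customer and family IDs as `[customerId]-[familyId]`."""
    # Assumed validation: both parts must be non-empty.
    if not customer_id or not family_id:
        raise ValueError("both customer_id and family_id are required")
    return f"{customer_id}-{family_id}"


print(build_analysis_id("customer", "family"))  # customer-family
```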
### Deleting an existing analysis
You can of course delete an analysis you've stored in the database. It will remove the reference to the analysis along with all the links to the assets.
$ housekeeper delete customer-family
Are you sure? [Y/n]
### Getting files
This is where the fun starts! Since we have control over all the assets and how they relate to analyses and samples we can hand back information to you.
Say you wanted to know the path to the raw BCF file for a given analysis. Let's ask Housekeeper!
$ housekeeper get --analysis customer-family --category bcf-raw
Note that it prints to the console without a trailing newline, so you can just as well do:
$ ls -l $(housekeeper get --analysis customer-family --category bcf-raw)
-rw-r--r-- 2 robinandeer staff 72K Jul 27 14:33 /path/to/root/analyses/customer-family/all.variants.bcf
And if multiple files match the query it will simply print them on one line separated by a single space.
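Because the output is a single space-separated line, another Python tool could consume it roughly as sketched below. The helper names `split_paths` and `get_paths` are hypothetical; only the `get` command and its `--analysis` and `--category` options come from the examples above. Note that splitting on whitespace assumes no path contains a space.

```python
import subprocess
from typing import List


def split_paths(output: str) -> List[str]:
    """Split the space-separated output of `housekeeper get` into paths."""
    # str.split() with no argument also copes with an empty result.
    return output.split()


def get_paths(analysis: str, category: str) -> List[str]:
    """Run `housekeeper get` and return the matching file paths."""
    result = subprocess.run(
        ["housekeeper", "get", "--analysis", analysis, "--category", category],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return split_paths(result.stdout)
```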
### Archiving an analysis
When you add a new analysis you tell Housekeeper which files are eventually to be archived. We can certainly do a lot more with this functionality, but for now this is what happens when you archive an analysis:
1. the status is updated to "archived"
2. all files and references that are not marked as "to_archive" are removed
$ housekeeper archive customer-family
Are you sure? [Y/n]
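The two archiving steps can be sketched in Python as follows. The `Asset` and `Analysis` classes are simplified stand-ins invented for this example, not Housekeeper's real models; only the "archived" status and the "to_archive" flag are taken from the description above.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class Asset:
    path: Path
    to_archive: bool = False


@dataclass
class Analysis:
    name: str
    status: str = "active"
    assets: List[Asset] = field(default_factory=list)


def archive(analysis: Analysis) -> List[Path]:
    """Archive an analysis and return the paths that were removed."""
    kept: List[Asset] = []
    removed: List[Path] = []
    for asset in analysis.assets:
        if asset.to_archive:
            kept.append(asset)
        else:
            # Delete the file; missing_ok needs Python 3.8 or later.
            asset.path.unlink(missing_ok=True)
            removed.append(asset.path)
    analysis.assets = kept  # drop references to the removed files
    analysis.status = "archived"
    return removed
```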
## API structure and architecture
This section will describe the implementation.
An SQL(ite) database holds references to the analyses and the state each one is in. It should have a straightforward API to query which analyses have been completed and, e.g., which are archived.
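A minimal sketch of such a store, using Python's built-in `sqlite3` module, might look like this. The table layout and the function names `connect`, `add_analysis` and `list_analyses` are assumptions for illustration; the three status values are the ones listed at the top of this README.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: one row per analysis and its current status.
SCHEMA = """
CREATE TABLE IF NOT EXISTS analysis (
    name   TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'active'
           CHECK (status IN ('active', 'archived', 'backed-up'))
);
"""


def connect(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_analysis(conn: sqlite3.Connection, name: str, status: str = "active") -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO analysis (name, status) VALUES (?, ?)", (name, status)
        )


def list_analyses(conn: sqlite3.Connection, status: Optional[str] = None) -> List[str]:
    """Return analysis names, optionally filtered by status."""
    if status is None:
        rows = conn.execute("SELECT name FROM analysis ORDER BY name")
    else:
        rows = conn.execute(
            "SELECT name FROM analysis WHERE status = ? ORDER BY name", (status,)
        )
    return [row[0] for row in rows]
```

With this layout, `list_analyses(conn, status="archived")` answers the "which are archived" question in a single query.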
The command line interface is likely the main entry point for accessing the API. However, it should do as little as possible: anything that isn't directly concerned with parsing command line arguments and the like should be abstracted away. It uses the Click framework.
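The "thin command line layer" idea can be sketched as below. To stay dependency-free, this example swaps in the standard library's `argparse` rather than Click, and `delete_analysis` is a hypothetical placeholder for an API function. It shows the shape of the design, not the tool's actual code.

```python
import argparse
from typing import List, Optional


def delete_analysis(name: str) -> str:
    """Hypothetical API function; the real logic lives outside the CLI."""
    return f"deleted {name}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="housekeeper")
    subparsers = parser.add_subparsers(dest="command", required=True)
    delete = subparsers.add_parser("delete", help="remove an analysis")
    delete.add_argument("analysis", help="analysis ID, e.g. customer-family")
    return parser


def main(argv: Optional[List[str]] = None) -> str:
    args = build_parser().parse_args(argv)
    # The CLI only parses arguments and delegates to the API.
    if args.command == "delete":
        return delete_analysis(args.analysis)
    raise SystemExit(f"unknown command: {args.command}")
```

Keeping `main` this small means the same API functions can be reused by the web interface or by other tools without going through the command line.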
### Web interface
Built using the Flask framework. Barebones. Should provide overviews of analyses in different states. Could additionally provide access to manually archiving/unarchiving analyses.
## File structure
This section describes how analysis output will be stored on the file system level. It's important that this is an implementation detail that won't be exposed to third-party tools.
The goal is to create a structure that is as flat as possible while keeping the original file names wherever possible.
│ ├── alignment.sample_1.bam
│ ├── variants.vcf
│ └── traceback.log
TODO: Brief introduction on what you do with files - including link to relevant help section.