
FAIRsoft package for the aggregation of Life Sciences software metadata and FAIR evaluation.

FAIRsoft

Library for the aggregation of Life Sciences software metadata and FAIR evaluation.

Installation

Install using pip:

pip install FAIRsoft

Requirements

In order to use the Bioconda, Galaxy Toolshed and repositories (GitHub and Bitbucket) metadata importers, the following tools need to be installed:

  • bioconda-utils is required by the bioconda importer.

    bioconda-utils is a bioconda package and thus requires Conda.

    ❗️ The large size of the bioconda-utils package can cause Conda to crash during installation. Using Mamba instead of Conda avoids this problem.

    ❗️ bioconda-utils requires Python 3.7 or lower, so it may be necessary to simulate a compatible platform (for example, an x86_64 macOS platform on Apple Silicon machines). To do so, use the following commands:

    # create the environment
    mamba create -n myenv
    
    # activate the environment
    conda activate myenv
    
    # before installing anything in the environment, configure it to use x86_64 (osx-64) packages
    conda config --env --set subdir osx-64
    
  • opeb-enrichers/repoEnricher is required by the Source Code Repositories importer.

  • AnyStyle is required by the Galaxy Toolshed importer.

Usage

Configuration is done through environment variables. Those referring to the database where extracted and/or processed software metadata is stored are:

| Name | Description | Default | Notes |
|---|---|---|---|
| DBHOST | Host of the database where output will be pushed | localhost | |
| DBPORT | Port of the database where output will be pushed | 27017 | |
| DB | Name of the database where output will be pushed | observatory | |
| ALAMBIQUE | Name of the collection where the importers' output will be stored | alambique | Needed for importation only |
| PRETOOLS | Name of the collection where the output of the transformation step (a harmonized version of the data in the ALAMBIQUE collection) will be pushed. It is also the collection that the following step, integration, uses as its source of input data | pretools | Needed for transformation and integration |
| TOOLS | Name of the collection where the output of integration will be stored. This is the final collection of the process, and thus the one that can be used for the evaluation of FAIRness, the calculation of statistics, etc. | tools | Needed for integration |
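A minimal sketch of how a script might resolve these settings, falling back to the defaults listed above. The helper `load_db_config` is hypothetical and not part of FAIRsoft; building a MongoDB URI is an assumption suggested by the default port 27017 and the use of collections.

```python
import os
from typing import Any, Dict, Mapping

# Defaults copied from the table above.
DB_DEFAULTS = {
    "DBHOST": "localhost",
    "DBPORT": "27017",
    "DB": "observatory",
    "ALAMBIQUE": "alambique",
    "PRETOOLS": "pretools",
    "TOOLS": "tools",
}


def load_db_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Hypothetical helper: read each variable, falling back to its documented default."""
    config: Dict[str, Any] = {
        name: env.get(name, default) for name, default in DB_DEFAULTS.items()
    }
    config["DBPORT"] = int(config["DBPORT"])  # environment values are always strings
    # Assumption: the database is MongoDB, as the default port 27017 suggests.
    config["URI"] = f"mongodb://{config['DBHOST']}:{config['DBPORT']}"
    return config
```

With no variables set, `load_db_config()` yields the URI `mongodb://localhost:27017` and the database name `observatory`; a URI of this form can be passed to a MongoDB client such as pymongo's `MongoClient`.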

Data extraction

Data extraction is done through the execution of importers. Each importer is responsible for extracting metadata from a specific source.

All importers require the environment variables DBHOST, DBPORT, DB, ALAMBIQUE and PRETOOLS (previously explained) to be set.
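Before launching an importer, a small pre-flight check can report which of these variables are missing. This is a hypothetical sketch (`missing_variables` and `require_variables` are not FAIRsoft functions); it treats empty values as missing and ignores the defaults that some variables have.

```python
import os
from typing import Iterable, List, Mapping

# Variables that every importer needs, according to the sentence above.
IMPORTER_VARIABLES = ("DBHOST", "DBPORT", "DB", "ALAMBIQUE", "PRETOOLS")


def missing_variables(required: Iterable[str],
                      env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def require_variables(required: Iterable[str],
                      env: Mapping[str, str] = os.environ) -> None:
    """Fail early, with a readable message, if any required variable is missing."""
    missing = missing_variables(required, env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

The same check applies to the later steps by swapping in their variable lists, for example `("DBHOST", "DBPORT", "DB", "PRETOOLS", "TOOLS")` for integration.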

Bioconda importer

Configuration:

| Name | Description | Default | Notes |
|---|---|---|---|
| RECIPES_PATH | Path to bioconda recipes (from repository) | ./bioconda-recipes/recipes | Only required when running natively AND if the location of bioconda recipes changes |
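In the bioconda-recipes repository, each package recipe is a directory under `recipes/` holding a `meta.yaml` file. As a quick sanity check that RECIPES_PATH points at the right place, a hypothetical helper could list those directories, using the default above when the variable is unset. It only inspects the file tree and says nothing about how the importer itself reads the recipes.

```python
import os
from pathlib import Path
from typing import List, Mapping

DEFAULT_RECIPES_PATH = "./bioconda-recipes/recipes"


def list_recipes(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of recipe directories that directly contain a meta.yaml file."""
    root = Path(env.get("RECIPES_PATH", DEFAULT_RECIPES_PATH))
    if not root.is_dir():
        raise FileNotFoundError(f"Bioconda recipes not found at {root}")
    # Only top-level recipe folders are checked; versioned sub-recipes
    # (e.g. recipes/<name>/<version>/meta.yaml) are ignored here.
    return sorted(p.name for p in root.iterdir() if (p / "meta.yaml").is_file())
```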

To run the importer use:

FAIRsoft_import_bioconda -e=[env-file] -l=[log-level] -d=[log-directory]
  • -e/--env-file is optional. It specifies the path to the file containing the environment variables. Default is .env.
  • -l/--loglevel is optional. It can be DEBUG, INFO, WARNING, ERROR or CRITICAL. Default is INFO.
  • -d/--logdir is optional. It specifies the path to the directory where the logs will be written. Default is ./logs.
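The same three options are shared by most of the importers below. A sketch of launching one from Python with `subprocess`, assuming the commands are on `PATH` (i.e. FAIRsoft is installed) and accept the long `--option=value` form of the options listed above; `importer_argv` and `run_importer` are hypothetical names.

```python
import shlex
import subprocess
from typing import List

LOG_LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")


def importer_argv(command: str, env_file: str = ".env", loglevel: str = "INFO",
                  logdir: str = "./logs") -> List[str]:
    """Build the argument list for an importer using the options documented above."""
    if loglevel not in LOG_LEVELS:
        raise ValueError(f"loglevel must be one of {LOG_LEVELS}, got {loglevel!r}")
    return [command, f"--env-file={env_file}", f"--loglevel={loglevel}", f"--logdir={logdir}"]


def run_importer(command: str, **options: str) -> None:
    """Run the importer; raises CalledProcessError if it exits with a non-zero status."""
    argv = importer_argv(command, **options)
    print("Running:", shlex.join(argv))
    subprocess.run(argv, check=True)
```

For example, `run_importer("FAIRsoft_import_toolshed", loglevel="DEBUG")` would execute `FAIRsoft_import_toolshed --env-file=.env --loglevel=DEBUG --logdir=./logs`. Passing an argument list rather than a shell string avoids quoting problems with paths that contain spaces.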

Galaxy Toolshed importer

Configuration:

| Name | Description | Default | Notes |
|---|---|---|---|
| GALAXY_METADATA | Path to the metadata extracted from Galaxy. This JSON file, generated automatically after the extraction of repository metadata, contains the identifiers needed to download the repositories, which contain the recipes. | ./data/galaxy_metadata.json | |

To run the importer use:

FAIRsoft_import_toolshed -e=[env-file] -l=[log-level] -d=[log-directory]
  • -e/--env-file is optional. It specifies the path to the file containing the environment variables. Default is .env.
  • -l/--loglevel is optional. It can be DEBUG, INFO, WARNING, ERROR or CRITICAL. Default is INFO.
  • -d/--logdir is optional. It specifies the path to the directory where the logs will be written. Default is ./logs.

Source Code Repositories (GitHub and Bitbucket) importer

This importer is actually an "enricher" of the tools in the OpenEBench Tools API: it only extracts metadata from the repositories associated with those tools. It requires the following environment variable to be set:

| Name | Description | Default | Notes |
|---|---|---|---|
| REPOENRICHER_PATH | Path to repoEnricher program. | ./opeb-enrichers/repoEnricher/repoEnricher.pl | |

In addition, it requires a file containing the credentials for the GitHub and Bitbucket APIs: config.ini. This file must be placed in the REPOENRICHER_PATH. Details here.
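Since the default REPOENRICHER_PATH points at the script itself, one reading of "placed in the REPOENRICHER_PATH" is "placed in the directory that contains repoEnricher.pl". Under that interpretation (an assumption, not confirmed here), the expected location of config.ini could be derived and checked as follows; both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_REPOENRICHER_PATH = "./opeb-enrichers/repoEnricher/repoEnricher.pl"


def repoenricher_config_path(env: Mapping[str, str] = os.environ) -> Path:
    """Assumed location of config.ini: the directory holding repoEnricher.pl."""
    script = Path(env.get("REPOENRICHER_PATH", DEFAULT_REPOENRICHER_PATH))
    return script.parent / "config.ini"


def check_repoenricher_config(env: Mapping[str, str] = os.environ) -> Path:
    """Raise FileNotFoundError if the credentials file is not where we expect it."""
    config = repoenricher_config_path(env)
    if not config.is_file():
        raise FileNotFoundError(f"GitHub/Bitbucket credentials file not found: {config}")
    return config
```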

To run the importer use:

FAIRsoft_import_repositories

OpenEBench Tools importer

Configuration:

| Name | Description | Default | Notes |
|---|---|---|---|
| URL_OPEB_TOOLS | URL to OpenEBench Tools API | https://openebench.bsc.es/monitor/tool | |
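For a quick look at what this importer consumes, the document served at URL_OPEB_TOOLS can be fetched with the standard library. This is a sketch assuming the endpoint answers a plain GET with JSON; it does not reproduce the importer's own download or storage logic, and `fetch_opeb_tools` is a hypothetical name. The `opener` parameter exists only so the function can be exercised without network access.

```python
import json
import os
import urllib.request
from typing import Any, Callable, Mapping

DEFAULT_URL_OPEB_TOOLS = "https://openebench.bsc.es/monitor/tool"


def fetch_opeb_tools(env: Mapping[str, str] = os.environ,
                     opener: Callable[..., Any] = urllib.request.urlopen) -> Any:
    """Download and decode the JSON document served at URL_OPEB_TOOLS."""
    url = env.get("URL_OPEB_TOOLS", DEFAULT_URL_OPEB_TOOLS)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    # The response object is a context manager; json.load reads its bytes.
    with opener(request) as response:
        return json.load(response)
```

Swapping URL_OPEB_TOOLS for URL_OPEB_METRICS (default `https://openebench.bsc.es/monitor/metrics/`) would query the Metrics API in the same way, again assuming a JSON response.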

To use the importer, run the following command:

FAIRsoft_import_opeb_tools -e=[env-file] -l=[log-level] -d=[log-directory]
  • -e/--env-file is optional. It specifies the path to the file containing the environment variables. Default is .env.
  • -l/--loglevel is optional. It can be DEBUG, INFO, WARNING, ERROR or CRITICAL. Default is INFO.
  • -d/--logdir is optional. It specifies the path to the directory where the logs will be written. Default is ./logs.

OpenEBench Metrics importer

Configuration:

| Name | Description | Default | Notes |
|---|---|---|---|
| URL_OPEB_METRICS | URL to OpenEBench Metrics API | https://openebench.bsc.es/monitor/metrics/ | |

To use the importer, run:

FAIRsoft_import_opeb_metrics -e=[env-file] -l=[log-level] -d=[log-directory]
  • -e/--env-file is optional. It specifies the path to the file containing the environment variables. Default is .env.
  • -l/--loglevel is optional. It can be DEBUG, INFO, WARNING, ERROR or CRITICAL. Default is INFO.
  • -d/--logdir is optional. It specifies the path to the directory where the logs will be written. Default is ./logs.

Bioconductor importer

Configuration:

| Name | Description | Default | Notes |
|---|---|---|---|
| URL_BIOCONDUCTOR | Path to the file containing the URLs of the Bioconductor packages to be scraped. | ./data/bioconductor_opeb.txt | |
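A sketch of reading that file, assuming it holds one URL per line. Skipping blank lines and lines starting with `#` is a convenience of this sketch and may not match what the importer accepts; `read_package_urls` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import List, Mapping

DEFAULT_URL_BIOCONDUCTOR = "./data/bioconductor_opeb.txt"


def read_package_urls(env: Mapping[str, str] = os.environ) -> List[str]:
    """Read one URL per line, skipping blank lines and '#' comments (assumed format)."""
    path = Path(env.get("URL_BIOCONDUCTOR", DEFAULT_URL_BIOCONDUCTOR))
    urls = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()  # tolerate stray surrounding whitespace
        if line and not line.startswith("#"):
            urls.append(line)
    return urls
```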

To run the importer use:

FAIRsoft_import_bioconductor -e=[env-file] -l=[log-level] -d=[log-directory]
  • -e/--env-file is optional. It specifies the path to the file containing the environment variables. Default is .env.
  • -l/--loglevel is optional. It can be DEBUG, INFO, WARNING, ERROR or CRITICAL. Default is INFO.
  • -d/--logdir is optional. It specifies the path to the directory where the logs will be written. Default is ./logs.

SourceForge importer

Configuration:

| Name | Description | Default | Notes |
|---|---|---|---|
| URL_SOURCEFORGE_PACKAGES | URL of the SourceForge directory listing the packages of interest | https://sourceforge.net/directory/science-engineering/bioinformatics/ | |

To run the importer use:

FAIRsoft_import_sourceforge -e=[env-file] -l=[log-level] -d=[log-directory]
  • -e/--env-file is optional. It specifies the path to the file containing the environment variables. Default is .env.
  • -l/--loglevel is optional. It can be DEBUG, INFO, WARNING, ERROR or CRITICAL. Default is INFO.
  • -d/--logdir is optional. It specifies the path to the directory where the logs will be written. Default is ./logs.

Data transformation

Data transformation requires the environment variables DBHOST, DBPORT, DB, ALAMBIQUE and PRETOOLS (previously explained) to be set.

Execute the following command to transform data:

FAIRsoft_transform --env-file=[env-file] -l=[log-level]
  • -e/--env-file is optional. It specifies the path to the file containing the environment variables. Default is .env.
  • -l/--loglevel is optional. It can be DEBUG, INFO, WARNING, ERROR or CRITICAL. Default is INFO.

Data integration

Data integration requires the environment variables DBHOST, DBPORT, DB, PRETOOLS and TOOLS (previously explained) to be set.

Execute the following command to integrate data:

FAIRsoft_integrate --env-file=[env-file] -l=[log-level]
  • -e/--env-file is optional. It specifies the path to the file containing the environment variables. Default is .env.
  • -l/--loglevel is optional. It can be DEBUG, INFO, WARNING, ERROR or CRITICAL. Default is INFO.

FAIRsoft indicators evaluation

FAIRness indicators evaluation requires the environment variables DBHOST, DBPORT, DB and TOOLS (previously explained) to be set. Additionally, the FAIR variable is required:

| Name | Description | Default | Notes |
|---|---|---|---|
| FAIR | Name of collection where FAIRness indicators will be stored | fair | |

To run the evaluation use:

FAIRsoft_indicators_evaluation --env-file=[env-file] -l=[log-level]
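Taken together, the post-import steps form a chain of collections: transformation reads ALAMBIQUE and writes PRETOOLS, integration reads PRETOOLS and writes TOOLS, and the evaluation reads TOOLS and stores indicators in FAIR. The sketch below encodes that chain with the documented defaults, which can help confirm that each step's output collection is the next step's input before running the commands. The data structure and function names are illustrative, not part of FAIRsoft.

```python
import os
from typing import List, Mapping, Tuple

# (command, input collection variable, output collection variable), in run order.
STAGES = [
    ("FAIRsoft_transform", "ALAMBIQUE", "PRETOOLS"),
    ("FAIRsoft_integrate", "PRETOOLS", "TOOLS"),
    ("FAIRsoft_indicators_evaluation", "TOOLS", "FAIR"),
]

# Collection-name defaults from the tables in this README.
COLLECTION_DEFAULTS = {
    "ALAMBIQUE": "alambique",
    "PRETOOLS": "pretools",
    "TOOLS": "tools",
    "FAIR": "fair",
}


def describe_pipeline(env: Mapping[str, str] = os.environ) -> List[Tuple[str, str, str]]:
    """Resolve the collection each stage reads from and writes to."""

    def collection(variable: str) -> str:
        return env.get(variable, COLLECTION_DEFAULTS[variable])

    return [(command, collection(src), collection(dst)) for command, src, dst in STAGES]


def pipeline_argv(env_file: str = ".env", loglevel: str = "INFO") -> List[List[str]]:
    """Argument lists for the three steps, using the options documented for them."""
    return [[command, f"--env-file={env_file}", f"--loglevel={loglevel}"]
            for command, _, _ in STAGES]
```

Each list returned by `pipeline_argv` could be passed, in order, to `subprocess.run(..., check=True)`, so that a failing step stops the chain before the next one reads an incomplete collection.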

Download files

Source Distribution

FAIRsoft-0.2.1.tar.gz (58.5 kB)

Built Distribution

FAIRsoft-0.2.1-py3-none-any.whl (69.1 kB)
