
A set of helper functions for working with biological metadata from the SRA.


# Biometalib

Biometalib is a set of useful libraries and tools for working with SRA biological metadata.

## Installation

This library is designed for Python 3 and can be installed with pip or conda.

### Pip

Biometalib can be installed from PyPI using pip:

```bash
pip install biometalib
```

Alternatively, the latest development versions of sramongo and biometalib can be installed directly from GitHub with pip:

```bash
pip install git+https://github.com/jfear/sramongo
pip install git+https://github.com/jfear/biometalib
```

### Conda [Suggested]

First, make sure you have a working Anaconda installation; I suggest [Miniconda](https://conda.io/miniconda.html). Then install biometalib from the `jfear` channel:

```bash
conda install -c jfear biometalib
```

## Attribute Selector

Attribute selector is a helper script for choosing which attributes you want to focus on for a project. The biological metadata submitted by users contain many different attributes. Some are misspellings or alternate forms of the same word, and others are unique to a single project. This tool is meant for quickly curating these attribute columns. Attribute selector stores your attribute decisions in a YAML-formatted file.

In the YAML file, selected attributes are the keys. When several attributes are merged into a single selected attribute, they are stored as a list of values under that key. For example:

```yaml
sex:
  - sex
  - Sex
  - gender
```

Here the selected attribute `sex` groups the attributes `sex`, `Sex`, and `gender`. There is also a special selected attribute, `ignore`, that stores a list of attributes you want to ignore.
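Once the selection file is parsed (for example with PyYAML's `yaml.safe_load`), its structure maps naturally onto a lookup table from raw attribute names to selected attributes. The sketch below is illustrative only: `build_lookup` and `normalize_attributes` are hypothetical helpers, not part of biometalib's API, the dict literal stands in for the parsed YAML, and `submitter_note` is a made-up attribute name.

```python
from typing import Dict, List


def build_lookup(selection: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every raw attribute name to its selected attribute.

    The special ``ignore`` key is skipped; its members are dropped instead.
    """
    lookup: Dict[str, str] = {}
    for selected, raw_names in selection.items():
        if selected == "ignore":
            continue
        # An empty YAML key parses as None, so fall back to an empty list.
        for raw in raw_names or []:
            lookup[raw] = selected
    return lookup


def normalize_attributes(sample: Dict[str, str],
                         selection: Dict[str, List[str]]) -> Dict[str, str]:
    """Rename a sample's attributes to their selected names (hypothetical helper)."""
    lookup = build_lookup(selection)
    ignored = set(selection.get("ignore") or [])
    result: Dict[str, str] = {}
    for raw, value in sample.items():
        if raw in ignored:
            continue  # attribute curated as noise
        # Attributes not yet curated keep their original name.
        result[lookup.get(raw, raw)] = value
    return result


# Stand-in for the parsed YAML selection file.
selection = {
    "sex": ["sex", "Sex", "gender"],
    "ignore": ["submitter_note"],  # hypothetical ignored attribute
}
sample = {"gender": "female", "submitter_note": "n/a", "tissue": "ovary"}
print(normalize_attributes(sample, selection))
# {'sex': 'female', 'tissue': 'ovary'}
```

If a single sample carries two variants of the same attribute (say both `Sex` and `gender`), this sketch keeps whichever comes last; a real curation step might prefer to flag the conflict instead.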

Using the BioSample selection sheet, I have created a starting YAML file that can be used when running `attribute_selector`.

To run the attribute selector on my public version of the Biometa database, type:

```bash
# Download the example YAML
wget -O my_attribute_selection.yaml https://raw.githubusercontent.com/jfear/biometalib/master/data/flybase_example.yaml

# Run attribute_selector against the public Biometa database
attribute_selector --host mongo.geneticsunderground.com --port 27022 --db sra \
    --username sra --password oliver --authenticationDatabase user-data \
    --config my_attribute_selection.yaml
```

`attribute_selector` is an interactive command-line tool. It iterates over all attribute column names that are not already selected attributes in the YAML file and displays the current attribute in red. At the prompt you can type:

- `k` to keep the current attribute as a selected attribute [keep]
- `r` to rename the current attribute; this stores the current attribute as a value under the renamed selected attribute [rename]
- `i` to add the current attribute to the ignore list [ignore]
- `e` to show example values for the current attribute [example]
- `s` to show attributes with similar names (fuzzy string match); selected attributes appear in yellow [similar]
- `n` to skip to the next attribute [next]
- `quit` to exit the program while saving progress
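The bookkeeping behind the `k`, `r`, and `i` choices can be sketched as plain dictionary updates on the parsed YAML, with the `s` lookup approximated by the standard library's `difflib.get_close_matches`. This is a hedged illustration, not the tool's own code: the function names are hypothetical, the rule for which attributes are still pending (neither a key nor listed under one) is an assumption, and `attribute_selector` may use a different fuzzy-matching method and cutoff.

```python
import difflib
from typing import Dict, Iterable, List, Tuple

# Parsed YAML: selected attribute -> list of raw attribute names.
Selection = Dict[str, List[str]]


def pending(all_attributes: Iterable[str], selection: Selection) -> List[str]:
    """Attributes not yet recorded as a key or a value (assumed rule)."""
    seen = set(selection)
    for names in selection.values():
        seen.update(names or [])
    return [attr for attr in all_attributes if attr not in seen]


def _add(selection: Selection, key: str, attr: str) -> None:
    """Append attr under key, creating the list if needed."""
    members = selection.get(key) or []
    if attr not in members:
        members.append(attr)
    selection[key] = members


def keep(selection: Selection, attr: str) -> None:
    """k: make the attribute a selected attribute in its own right."""
    _add(selection, attr, attr)


def rename(selection: Selection, attr: str, new_name: str) -> None:
    """r: store the attribute as a value of the selected attribute new_name."""
    _add(selection, new_name, attr)


def ignore(selection: Selection, attr: str) -> None:
    """i: add the attribute to the special ignore list."""
    _add(selection, "ignore", attr)


def similar(selection: Selection, attr: str, candidates: Iterable[str],
            n: int = 5, cutoff: float = 0.6) -> List[Tuple[str, bool]]:
    """s: fuzzy-match attr against candidates; flag those already selected."""
    matches = difflib.get_close_matches(attr, list(candidates), n=n, cutoff=cutoff)
    return [(match, match in selection) for match in matches]


selection: Selection = {"sex": ["sex", "Sex", "gender"], "ignore": []}
attributes = ["Sex", "genotype", "Tissue", "tissue", "gender"]
print(pending(attributes, selection))  # ['genotype', 'Tissue', 'tissue']
keep(selection, "tissue")
rename(selection, "Tissue", "tissue")
ignore(selection, "genotype")  # purely illustrative choice
print(selection)
# {'sex': ['sex', 'Sex', 'gender'], 'ignore': ['genotype'], 'tissue': ['tissue', 'Tissue']}
print(similar(selection, "Tissue", ["tissue", "sex"]))  # [('tissue', True)]
```

Keeping an attribute lists it under its own key, which mirrors the `sex` example where the key also appears among its values.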



Download files


Source Distribution

biometalib-0.0.4.tar.gz (13.3 kB)

Built Distribution

biometalib-0.0.4-py3-none-any.whl (17.9 kB)

File details

Details for the file biometalib-0.0.4.tar.gz.

File metadata

  • Download URL: biometalib-0.0.4.tar.gz
  • Upload date:
  • Size: 13.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for biometalib-0.0.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5a8a023acc35945be782581bb0da2b8a8fa0cc766d3c7e9d5af794932c8f5acf |
| MD5 | 8c25c56f9a797f858ee467f3553b1976 |
| BLAKE2b-256 | b3e7df161442f9af6ea3228fad0c94a08f32100ec645e3bdea350b1025773b93 |


File details

Details for the file biometalib-0.0.4-py3-none-any.whl.

File metadata

File hashes

Hashes for biometalib-0.0.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 89792c0e7f1671d90e636f7fe322697029df1e8c966793a6aa41b224fd97f762 |
| MD5 | e6114a8bba9c59417eefeb97295a64f1 |
| BLAKE2b-256 | f55b47af964464d81973bd504739f8cadeaab6010551f8b6c72924c8f2d34010 |
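To check a downloaded file against the SHA256 digests listed on this page, hash it locally and compare. A minimal sketch using Python's standard `hashlib`; the `sha256_of` and `verify` helper names are illustrative, and the archive is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digests as published for the 0.0.4 release files.
EXPECTED_SHA256 = {
    "biometalib-0.0.4.tar.gz": "5a8a023acc35945be782581bb0da2b8a8fa0cc766d3c7e9d5af794932c8f5acf",
    "biometalib-0.0.4-py3-none-any.whl": "89792c0e7f1671d90e636f7fe322697029df1e8c966793a6aa41b224fd97f762",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    archive = Path("biometalib-0.0.4.tar.gz")  # assumed download location
    if archive.exists():
        print(archive.name, "OK" if verify(archive) else "MISMATCH")
```

pip can enforce the same check itself in hash-checking mode, using `--require-hashes` with `--hash=sha256:<digest>` entries in a requirements file.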

