**circhemy**
======================================================================
**The alchemy of circular RNA ID conversion**

.. image:: https://github.com/jakobilab/circhemy/raw/main/circhemy/web/static/logo_small.png
   :alt: circhemy - The alchemy of circular RNA ID conversion
   :target: https://circhemy.jakobilab.org/

|downloads| |pypi| |ci| |docker|
Introduction
-------------
Circular RNAs (circRNAs) originate through back-splicing events from linear
primary transcripts, are resistant to exonucleases, typically not
polyadenylated, and have been shown to be highly specific for cell type and
developmental stage.
The prediction of circular RNAs is a multi-stage bioinformatics process starting
with raw sequencing data and usually ending with a list of potential circRNA
candidates which, depending on tissue and condition, may contain hundreds to
thousands of potential circRNAs. While there are a number of tools for the
prediction process (e.g. circtools, developed by our group), a unified naming
convention for circRNAs is not available.
Multiple databases have gathered hundreds of thousands of circRNAs; however, most
databases employ their own naming scheme, making it increasingly hard to keep
track of known circRNAs and their identifiers.
Circhemy
-------------
We developed circhemy, a modular, Python3-based framework for circRNA ID
conversion that unifies several functionalities in a single Python package.
Three different routes are implemented within the package to access more than 2
million circRNA IDs:
* User-friendly web application at `circhemy.jakobilab.org <https://circhemy.jakobilab.org>`__
* Streamlined CLI application for direct access to the prepackaged local SQLite3 database
* A public `REST API <https://circhemy.jakobilab.org/rest/>`__ that enables direct access to the most recent ID database from HPC systems using curl or similar tools
Circhemy includes two different modes of action: ``convert`` and ``query``. Convert
allows the user to convert from one type of circRNA ID to a wide variety of
other database identifiers, while query allows users to run direct queries on
the circRNA database to extract circRNAs fulfilling a user-defined set of
constraints.
Moreover, circhemy is the first circRNA resource that supports and integrates
the first version of the **C**\ircRNA **S**\tandard **N**\omenclature (abbreviated
**CSNv1** in circhemy) as outlined in `"A guide to naming eukaryotic circular RNAs", Chen et al. 2023 <https://www.nature.com/articles/s41556-022-01066-9>`__.
Currently, circhemy contains computationally generated CSNv1 names for nearly 1
million circRNAs of human, mouse, and rat.
Installation
-------------
The circhemy CLI package is written in Python3 (>=3.8) and consists of two
core modules, namely ``convert`` and ``query``. The command line version requires
only one external dependency, ``sqlite3``, for access to the internal SQLite3
database with circRNA ID data.
Installation is managed through ``python3 -m pip install circhemy``, or through
``python3 setup.py install`` when installing from the cloned GitHub repository.
No sudo access is required if the installation is executed with ``--user``, which
installs the package in a user-writable folder. On Debian-based systems, the
binaries should be installed to ``/home/$user/.local/bin/``.
circhemy was developed and tested on Debian Buster, but should run on
any other distribution.
The latest release version of circhemy can be installed via pip:

.. code-block:: console

    python3 -m pip install circhemy

Additionally, this repository offers the latest development version:

.. code-block:: console

    python3 -m pip install git+https://github.com/jakobilab/circhemy.git

Command Line Interface
-----------------------
Circhemy currently offers two modules:
Convert module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The convert module converts circRNA IDs from a range of input databases into one or more other database identifiers.
Example: Convert a list of CircAtlas2 IDs read via STDIN from file input.csv into Circpedia2 IDs, but also output CircAtlas2 IDs, while writing the output to /tmp/output.csv:

.. code-block:: console

    cat input.csv | circhemy convert -q STDIN -i CircAtlas2 -o Circpedia2 CircAtlas2 -O /tmp/output.csv

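The same call can be driven from a Python script. The following is a minimal sketch, not part of circhemy itself: the helper names ``build_convert_command`` and ``run_convert`` are hypothetical, only the flags shown in the command above are used, and the one-ID-per-line layout sent to STDIN is an assumption about the input format.

```python
import subprocess


def build_convert_command(input_db, output_dbs, output_file):
    """Assemble the argument list for the ``circhemy convert`` call shown above."""
    return [
        "circhemy", "convert",
        "-q", "STDIN",       # read the query IDs from standard input
        "-i", input_db,      # database the input IDs come from
        "-o", *output_dbs,   # one or more ID types to report
        "-O", output_file,   # file the results are written to
    ]


def run_convert(ids, input_db, output_dbs, output_file):
    """Pipe IDs to circhemy; one ID per line is an assumption, not documented here."""
    command = build_convert_command(input_db, output_dbs, output_file)
    return subprocess.run(command, input="\n".join(ids) + "\n", text=True, check=True)


# Only build and show the command here; run_convert() needs circhemy installed.
command = build_convert_command(
    "CircAtlas2", ["Circpedia2", "CircAtlas2"], "/tmp/output.csv"
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting issues, and ``check=True`` raises an exception if circhemy exits with a non-zero status.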
Query module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The query module is able to retrieve circRNA IDs from the internal database that fulfil a set of user-defined constraints.
Example: Retrieve a list of circBase and CircAtlas2 circRNA IDs that are located on chromosome 3 of the species *Rattus norvegicus*; only print out circRNAs from the rn6 genome build.

.. code-block:: console

    circhemy query -o circbase CircAtlas2 -C chr3 -s rattus_norvegicus -g rn6

Representational State Transfer Interface (REST)
-------------------------------------------------
Representational State Transfer, or REST for short, allows users and software
developers to easily access circhemy from within their own tools or pipelines.
Circhemy's REST API uses JSON for input queries and for returning output, making it
easy to format queries from any programming language or even by hand.
The REST API is publicly available and uses a fixed set of keywords to perform
conversions or queries. Two examples for the two different modes of action are
shown below.
Convert module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The convert module converts circRNA IDs from a range of input databases into
one or more other database identifiers.
Example: Convert a list of CircAtlas2 IDs into circBase and Circpedia2 IDs,
including the genome build.

.. code-block:: console

    curl -X 'POST' 'https://circhemy.jakobilab.org/api/convert' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
            "input": "CircAtlas2",
            "output": ["circBase","Circpedia2","Genome"],
            "query": ["hsa-MYH9_0004","hsa-MYH9_0004"]
          }'

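For pipelines written in Python, the same request can be assembled with the standard library alone. This is a minimal sketch under stated assumptions: the endpoint URL, headers, and the ``input``/``output``/``query`` keys are taken from the curl call above, while the helper names ``build_convert_request`` and ``convert`` are hypothetical and not part of circhemy.

```python
import json
import urllib.request

# Endpoint as used in the curl example above.
CONVERT_URL = "https://circhemy.jakobilab.org/api/convert"


def build_convert_request(input_db, output_fields, ids):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {
        "input": input_db,
        "output": list(output_fields),
        "query": list(ids),
    }
    return urllib.request.Request(
        CONVERT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def convert(input_db, output_fields, ids, timeout=30):
    """Send the request and decode the JSON response (requires network access)."""
    request = build_convert_request(input_db, output_fields, ids)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Inspect the request without contacting the server.
request = build_convert_request(
    "CircAtlas2", ["circBase", "Circpedia2", "Genome"], ["hsa-MYH9_0004"]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect or log before any network traffic happens.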
The output is returned as a JSON-formatted string that can be used directly for
AG Grid tables or for any other postprocessing:

.. code-block:: json

    {
      "columnDefs": [
        {
          "headerName": "circBase",
          "field": "circBase"
        },
        {
          "headerName": "Circpedia2",
          "field": "Circpedia2"
        },
        {
          "headerName": "Genome",
          "field": "Genome"
        }
      ],
      "rowData": [
        {
          "circBase": "hsa_circ_0004470",
          "Circpedia2": "HSA_CIRCpedia_36582",
          "Genome": "hg38"
        },
        {
          "circBase": "hsa_circ_0004470",
          "Circpedia2": "HSA_CIRCpedia_36582",
          "Genome": "hg19"
        }
      ]
    }

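As an illustration of such postprocessing, the ``columnDefs``/``rowData`` structure can be flattened into CSV text. The sketch below assumes only the response layout shown above; the function name ``response_to_csv`` is hypothetical.

```python
import csv
import io


def response_to_csv(response):
    """Turn a columnDefs/rowData response into CSV text, columns in columnDefs order."""
    fields = [column["field"] for column in response["columnDefs"]]
    buffer = io.StringIO()
    # Keys not listed in columnDefs are skipped; missing values become empty cells.
    writer = csv.DictWriter(
        buffer, fieldnames=fields, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(response["rowData"])
    return buffer.getvalue()


# Response from the example above, written as a Python dictionary.
example = {
    "columnDefs": [
        {"headerName": "circBase", "field": "circBase"},
        {"headerName": "Circpedia2", "field": "Circpedia2"},
        {"headerName": "Genome", "field": "Genome"},
    ],
    "rowData": [
        {"circBase": "hsa_circ_0004470", "Circpedia2": "HSA_CIRCpedia_36582", "Genome": "hg38"},
        {"circBase": "hsa_circ_0004470", "Circpedia2": "HSA_CIRCpedia_36582", "Genome": "hg19"},
    ],
}
print(response_to_csv(example))
```

Reading the column order from ``columnDefs`` rather than from the first row keeps the CSV header stable even when a row lacks a value.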
Query module
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The query module is able to retrieve circRNA IDs from the internal database that
fulfil a set of user-defined constraints.
Example: Retrieve all circRNAs with a CircAtlas2 ID containing *nppa* in the
species *Homo sapiens*, and return the IDs in circBase and CircAtlas2 format:

.. code-block:: console

    curl -X 'POST' \
      'https://circhemy.jakobilab.org/api/query' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
            "input": [
              {
                "query": "nppa",
                "field": "CircAtlas2",
                "operator1": "AND",
                "operator2": "LIKE"
              },
              {
                "query": "homo_sapiens",
                "field": "Species",
                "operator1": "AND",
                "operator2": "is"
              }
            ],
            "output": [
              "circBase",
              "CircAtlas2"
            ]
          }'

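Writing the constraint list by hand becomes error-prone as queries grow, so a small builder can help. This sketch only reproduces the JSON layout of the request above; the helper names ``constraint`` and ``build_query_payload`` are hypothetical. Judging from the example, ``operator1`` appears to chain constraints and ``operator2`` to select the comparison, but that reading is an interpretation and not a documented specification.

```python
import json


def constraint(query, field, operator2, operator1="AND"):
    """One entry of the "input" list, using the four keys from the example request."""
    return {
        "query": query,
        "field": field,
        "operator1": operator1,
        "operator2": operator2,
    }


def build_query_payload(constraints, output_fields):
    """Combine constraints and requested output columns into one request body."""
    return {"input": list(constraints), "output": list(output_fields)}


# Rebuild the request body shown in the curl example.
payload = build_query_payload(
    [
        constraint("nppa", "CircAtlas2", "LIKE"),
        constraint("homo_sapiens", "Species", "is"),
    ],
    ["circBase", "CircAtlas2"],
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be serialized with ``json.dumps`` and sent as the body of a POST request to the query endpoint.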
The output is returned as a JSON-formatted string that can be used directly for
AG Grid tables or for any other postprocessing:

.. code-block:: json

    {
      "columnDefs": [
        {
          "headerName": "circBase",
          "field": "circBase"
        },
        {
          "headerName": "CircAtlas2",
          "field": "CircAtlas2"
        }
      ],
      "rowData": [
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA_0001"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA_0002"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA-AS1_0001"
        },
        {
          "circBase": "hsa_circ_0009871",
          "CircAtlas2": "hsa-NPPA-AS1_0004"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA-AS1_0002"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA-AS1_0003"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA_0001"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA_0002"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA-AS1_0001"
        },
        {
          "circBase": "hsa_circ_0009871",
          "CircAtlas2": "hsa-NPPA-AS1_0004"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA-AS1_0002"
        },
        {
          "circBase": "",
          "CircAtlas2": "hsa-NPPA-AS1_0003"
        }
      ]
    }

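The example response lists each ID pair twice. Why the rows repeat is not stated here (they may, for instance, differ in a column that was not requested), so the following is only a sketch of how a client could collapse identical rows while keeping their order. The function name ``unique_rows`` is hypothetical.

```python
def unique_rows(response):
    """Return rowData without exact duplicates, preserving first-seen order."""
    fields = [column["field"] for column in response["columnDefs"]]
    seen = set()
    rows = []
    for row in response["rowData"]:
        # Compare rows on the requested columns only.
        key = tuple(row.get(field, "") for field in fields)
        if key not in seen:
            seen.add(key)
            rows.append(row)
    return rows


# Shortened excerpt of the response above, including repeated rows.
example = {
    "columnDefs": [
        {"headerName": "circBase", "field": "circBase"},
        {"headerName": "CircAtlas2", "field": "CircAtlas2"},
    ],
    "rowData": [
        {"circBase": "", "CircAtlas2": "hsa-NPPA_0001"},
        {"circBase": "", "CircAtlas2": "hsa-NPPA_0002"},
        {"circBase": "hsa_circ_0009871", "CircAtlas2": "hsa-NPPA-AS1_0004"},
        {"circBase": "", "CircAtlas2": "hsa-NPPA_0001"},
        {"circBase": "", "CircAtlas2": "hsa-NPPA_0002"},
    ],
}
for row in unique_rows(example):
    print(row["CircAtlas2"], row["circBase"] or "-")
```

Requesting an additional column such as the genome build may be the better way to tell such rows apart before discarding any of them.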

.. |downloads| image:: https://pepy.tech/badge/circhemy
   :alt: Python Package Index Downloads
   :scale: 100%
   :target: https://pepy.tech/project/circhemy

.. |pypi| image:: https://badge.fury.io/py/circhemy.svg
   :alt: Python package version
   :scale: 100%
   :target: https://badge.fury.io/py/circhemy

.. |ci| image:: https://github.com/jakobilab/circhemy/actions/workflows/run_circhemy_ci.yml/badge.svg
   :alt: CI tests
   :scale: 100%
   :target: https://github.com/jakobilab/circhemy/actions/workflows/run_circhemy_ci.yml

.. |docker| image:: https://github.com/jakobilab/circhemy/actions/workflows/build_docker.yml/badge.svg
   :alt: Docker build process
   :scale: 100%
   :target: https://github.com/jakobilab/circhemy/actions/workflows/build_docker.yml

About
-------------
Circhemy is developed at the `Jakobi Lab <https://jakobilab.org/>`__, part of
the `Translational Cardiovascular Research Center (TCRC) <https://phoenixmed.arizona.edu/tcrc/>`__, in the Department of Internal Medicine at `The University of Arizona College of Medicine – Phoenix <https://phoenixmed.arizona.edu/>`__.
Contact: **circhemy@jakobilab.org**