PyUniProt

Importing and querying UniProt

Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
Topic
- Database
- Scientific/Engineering :: Bio-Informatics

Project description

Apache 2.0 License

PyUniProt is a Python package to access and query UniProt data provided by the European Bioinformatics Institute (EMBL-EBI), the SIB Swiss Institute of Bioinformatics and the Protein Information Resource (PIR).

Data are installed in a (local or remote) RDBMS, which gives bioinformatic algorithms very fast response times to sophisticated queries and high flexibility through the SQLAlchemy database layer. PyUniProt is developed by the Department of Bioinformatics at the Fraunhofer Institute for Algorithms and Scientific Computing SCAI. For more information about PyUniProt, go to the documentation.

This development is supported by the following IMI projects:

AETIONOMY and
PHAGO.

Supported databases

PyUniProt uses SQLAlchemy to cover a wide spectrum of RDBMSs (relational database management systems). For best performance, MySQL or MariaDB is recommended. If you cannot install software on your system, SQLite, which needs no further installation, also works. The following RDBMSs are supported (by SQLAlchemy):

Firebird
Microsoft SQL Server
MySQL / MariaDB
Oracle
PostgreSQL
SQLite
Sybase

Getting Started

This is a quick start tutorial for the impatient.

Installation

PyUniProt can be installed with pip.

pip install pyuniprot

If the installation fails because you lack the rights to install, run the command as superuser (put sudo before the command on Linux) or …

pip install --user pyuniprot

If you want to make sure you are installing this under Python 3, use …

python3 -m pip install pyuniprot
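
After installing, it can be useful to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of PyUniProt.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path.

    Hypothetical helper for illustration; it does not import the module,
    it only asks the import system whether it could be located.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "pyuniprot" is the import name used throughout this README.
    print("pyuniprot importable:", is_installed("pyuniprot"))
```

If this prints False after a successful pip run, pip most likely installed into a different Python installation than the one running the check.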

SQLite

If you don’t know what all that means, skip the section MySQL/MariaDB setup. SQLite needs no further installation.

Don’t worry! You can always change the configuration later. For more information about changing the database system later, see the section Changing database configuration in the documentation on readthedocs.

MySQL/MariaDB setup

CREATE DATABASE pyuniprot CHARACTER SET utf8 COLLATE utf8_general_ci;
GRANT ALL PRIVILEGES ON pyuniprot.* TO 'pyuniprot_user'@'%' IDENTIFIED BY 'pyuniprot_passwd';
FLUSH PRIVILEGES;

There are two options to set the MySQL/MariaDB connection.

1. The simplest is to start the command line tool

pyuniprot mysql

You will be guided by input prompts. Accept the default value in square brackets with RETURN. You will see something like this:

server name/ IP address database is hosted [localhost]:
MySQL/MariaDB user [pyuniprot_user]:
MySQL/MariaDB password [pyuniprot_passwd]:
database name [pyuniprot]:
character set [utf8]:

The connection will be tested. On success you will see Connection was successful. Otherwise you will see the following hints:

Test was NOT successful

Please use one of the following connection schemas
MySQL/MariaDB (strongly recommended):
        mysql+pymysql://user:passwd@localhost/database?charset=utf8

PostgreSQL:
        postgresql://user:passwd@localhost/database

MsSQL (pyodbc needed):
        mssql+pyodbc://user:passwd@database

SQLite (always works):

- Linux:
        sqlite:////absolute/path/to/database.db

- Windows:
        sqlite:///C:\absolute\path\to\database.db

Oracle:
        oracle://user:passwd@localhost:1521/database

2. The second option is to start a Python shell and set the MySQL configuration. If you have not changed anything in the SQL statements above …

import pyuniprot
pyuniprot.set_mysql_connection()

If you have used your own settings, please adapt the following command to your requirements.

import pyuniprot
pyuniprot.set_mysql_connection(host='localhost', user='pyuniprot_user', passwd='pyuniprot_passwd', db='pyuniprot')
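
The connection schemas listed in the hints above are plain SQLAlchemy-style URLs, so they can be assembled from their parts. The sketch below is only an illustration: the helper names mysql_url and sqlite_url are hypothetical and not part of PyUniProt, and the defaults simply mirror the values used in the SQL statements above. User name and password are percent-encoded so that characters such as @ or / do not break the URL.

```python
from urllib.parse import quote_plus


def mysql_url(user, passwd, host="localhost", db="pyuniprot", charset="utf8"):
    """Build a MySQL/MariaDB URL in the mysql+pymysql schema shown above.

    Hypothetical helper; user and password are percent-encoded.
    """
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(passwd)}"
        f"@{host}/{db}?charset={charset}"
    )


def sqlite_url(path):
    """Build an SQLite URL from an absolute path.

    An absolute Linux path starts with '/', which yields the four
    slashes shown in the hints (sqlite:////absolute/path/...).
    """
    return f"sqlite:///{path}"


if __name__ == "__main__":
    print(mysql_url("pyuniprot_user", "pyuniprot_passwd"))
    print(sqlite_url("/absolute/path/to/database.db"))
```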

Updating

The updating process will download the uniprot_sprot.xml.gz file provided by the UniProt team on their FTP server download page.

It is strongly recommended to restrict the entries to the specific organisms you are interested in by passing a list of NCBI Taxonomy IDs to the parameter taxids. To identify the correct NCBI Taxonomy IDs, please go to the NCBI Taxonomy web form. In the following example we use 9606 as the identifier for Homo sapiens, 10090 for Mus musculus and 10116 for Rattus norvegicus.

There are two options to import the data:

Command line import

pyuniprot update --taxids 9606,10090,10116

Python

import pyuniprot
pyuniprot.update(taxids=[9606, 10090, 10116])
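
The command line form takes the taxonomy IDs as one comma-separated string, while the Python call takes a list of integers. If the IDs arrive as text (for example from a config file), a small conversion step bridges the two. This is a minimal sketch; parse_taxids is a hypothetical helper, not a PyUniProt function.

```python
def parse_taxids(text):
    """Convert a comma-separated string of NCBI Taxonomy IDs to a list of ints.

    Hypothetical helper for illustration. Empty fields are skipped and
    surrounding whitespace is ignored; a non-numeric field raises ValueError.
    """
    return [int(part) for part in text.split(",") if part.strip()]


if __name__ == "__main__":
    # Human, mouse and rat, as in the examples above.
    print(parse_taxids("9606,10090,10116"))
```

The resulting list has the same shape as the taxids argument in the Python example above.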

We recommend importing the whole UniProt dataset only if you don’t want to restrict your search. An import with no restrictions will take several hours and a lot of disk space.

If you want to load all UniProt entries in the database:

import pyuniprot
pyuniprot.update() # not recommended, please read the notes above

The update uses the downloaded file if it still exists on your system (~/.pyuniprot/data/uniprot_sprot.xml.gz). If you set the parameter force_download, the current file from UniProt will be downloaded.

import pyuniprot
pyuniprot.update(force_download=True, taxids=[9606, 10090, 10116])
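
The reuse rule described above (use the cached file unless it is missing or force_download is set) can be sketched with the standard library. This is not PyUniProt's implementation; cached_file and needs_download are hypothetical names, and only the cache path is taken from the text above.

```python
from pathlib import Path

# Location of the cached download relative to the home directory,
# as stated in the text above.
CACHE_RELATIVE = Path(".pyuniprot") / "data" / "uniprot_sprot.xml.gz"


def cached_file(home=None):
    """Return the expected cache path; home defaults to the user's home."""
    base = Path(home) if home is not None else Path.home()
    return base / CACHE_RELATIVE


def needs_download(path, force_download=False):
    """A fresh download is needed if forced or if no cached file exists."""
    return force_download or not Path(path).exists()


if __name__ == "__main__":
    path = cached_file()
    print(path, "-> download needed:", needs_download(path))
```

Checking the cache this way before an update shows whether a long download is about to start.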

Quick start with query functions

Initialize the query object

import pyuniprot
query = pyuniprot.query()

Get all entries

all_entries = query.entry()

Use parameters like gene_name to find specific entries

>>> entry = query.entry(gene_name='YWHAE', taxid=9606, recommended_short_name='14-3-3E', name='1433E_HUMAN')[0]
>>> entry
14-3-3 protein epsilon

Entry is the root element in the database. From here you can reach all other data

>>> entry.accessions
[P62258, B3KY71, D3DTH5, P29360, P42655, Q4VJB6, Q53XZ5, Q63631, Q7M4R4]
>>> entry.functions
["Adapter protein implicated in the regulation of a large spectrum of both ..."]

If a parameter name ends with an s, you can search for a single value or pass several values as a tuple

>>> alcohol_dehydrogenases = query.entry(ec_numbers='1.1.1.1')
>>> [x.name for x in query.entry(ec_numbers='1.1.1.1')]
['ADHX_RAT', 'ADH1_RAT', 'ADHX_HUMAN', 'ADHX_MOUSE']
>>> query.entry(ec_numbers=('1.1.1.1', '1.1.1.2'))
['Adh5', 'Adh1', 'ADH5', 'Adh5', 'Adh6', 'ADH7', 'Adh7', 'Adh7', 'Adh1']

As a dataframe with a limit of 3 and accession numbers starting with Q9 (% is used as wildcard)

>>> query.accession(as_df=True, limit=3, accession='Q9%')
   id accession  entry_id
0   1    Q9CQV8         1
1  32    Q9GIK8         6
2  33    Q9TQB4         6
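
The % wildcard behaves like the wildcard of the SQL LIKE operator. The sketch below reproduces the idea with the standard library's sqlite3 module on an in-memory table; it is not PyUniProt's implementation. The table layout copies the columns of the output above, the three Q9 rows are taken from that output, and the ids and entry_ids of the two non-matching rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accession "
    "(id INTEGER PRIMARY KEY, accession TEXT, entry_id INTEGER)"
)
conn.executemany(
    "INSERT INTO accession VALUES (?, ?, ?)",
    [
        (1, "Q9CQV8", 1),
        (2, "P62258", 1),   # id and entry_id made up for illustration
        (3, "B3KY71", 1),   # id and entry_id made up for illustration
        (32, "Q9GIK8", 6),
        (33, "Q9TQB4", 6),
    ],
)


def find_accessions(connection, pattern, limit):
    """Return rows whose accession matches a LIKE pattern (% = any suffix)."""
    cursor = connection.execute(
        "SELECT id, accession, entry_id FROM accession "
        "WHERE accession LIKE ? ORDER BY id LIMIT ?",
        (pattern, limit),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    for row in find_accessions(conn, "Q9%", 3):
        print(row)
```

Passing the pattern as a bound parameter rather than formatting it into the SQL string keeps the query safe when the pattern comes from user input.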

The full documentation of the query functions can be found in the documentation on readthedocs.

More information

See the installation documentation for more advanced instructions. Also, check the change log at CHANGELOG.rst.

UniProt tools and licence (use of data)

UniProt also provides many online query interfaces on their website.

Please be aware of the UniProt licence.

Release history

0.0.10 (Aug 28, 2017)
0.0.9 (Aug 22, 2017)
0.0.8 (Aug 22, 2017)
0.0.7 (Aug 22, 2017)
0.0.6 (Aug 21, 2017)
0.0.5 (Aug 21, 2017)
0.0.4 (Aug 21, 2017)
0.0.3 (Aug 21, 2017)
0.0.2 (Aug 21, 2017)

Download files

Source Distribution

PyUniProt-0.0.10.tar.gz (1.7 MB)

Uploaded Aug 28, 2017 Source

Built Distribution

PyUniProt-0.0.10-py3-none-any.whl (36.6 kB)

Uploaded Aug 28, 2017 Python 3

File details

Details for the file PyUniProt-0.0.10.tar.gz.

File metadata

Download URL: PyUniProt-0.0.10.tar.gz
Upload date: Aug 28, 2017
Size: 1.7 MB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for PyUniProt-0.0.10.tar.gz
Algorithm	Hash digest
SHA256	`433d2ae73ca05b9952ca9e346e5ddd940e3c1a7f07aba9186cd3bc7c730df6c6`
MD5	`fe7be155c049869a71cdd34409026047`
BLAKE2b-256	`db0d88ddb56df96eae453f609cec6dae4176bbc781b04d6c66bbfbb7d738c1df`

File details

Details for the file PyUniProt-0.0.10-py3-none-any.whl.

File metadata

Download URL: PyUniProt-0.0.10-py3-none-any.whl
Upload date: Aug 28, 2017
Size: 36.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No

File hashes

Hashes for PyUniProt-0.0.10-py3-none-any.whl
Algorithm	Hash digest
SHA256	`da6940f6b8ab96e9c7847e72e79694c77c77ced280aaa0b178e5ee1d4b1a59b2`
MD5	`b9e947232664713ee26d5828feae288c`
BLAKE2b-256	`1aa7c196ad1877cf0b69c293df79ccad540bab827de861c2fd4f91ccc96c1c69`
