Python REST API for Entrez E-Utilities: stateless, easy to use, reliable.
easy-entrez
Python REST API for Entrez E-Utilities, aiming to be easy to use and reliable.
Easy-entrez:
- makes common tasks easy thanks to a simple Pythonic API,
- is typed and integrates well with mypy,
- is tested on Windows, Mac and Linux across Python 3.6, 3.7, 3.8 and 3.9,
- is limited in scope, which allows it to focus on the reliability of the core code,
- does not use the stateful API, which is error-prone, as seen in the alternative entrezpy.
Status: beta (pending tutorial write-up and documentation improvements before official release).
from easy_entrez import EntrezAPI
entrez_api = EntrezAPI(
    'your-tool-name',
    'e@mail.com',
    # optional
    return_type='json'
)
# find up to 10 000 results for cancer in human
result = entrez_api.search('cancer AND human[organism]', max_results=10_000)
# data will be populated with JSON or XML (depending on the `return_type` value)
result.data
See more in the Demo notebook and documentation.
For a real-world example (used for this publication), see the notebooks in the multi-omics-state-of-the-field repository.
Example: fetching genes for a variant from dbSNP
Fetch the SNP record for rs6311:
rs6311 = entrez_api.fetch(['rs6311'], max_results=1, database='snp').data[0]
rs6311
Display the result:
from xml.dom import minidom
from xml.etree import ElementTree


def xml_to_string(element):
    # serialise the element, then re-parse it with minidom to pretty-print it
    return (
        minidom.parseString(ElementTree.tostring(element))
        .toprettyxml(indent=' ' * 4)
    )


print(xml_to_string(rs6311))
Find the gene names for rs6311:
namespaces = {'ns0': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'}
genes = [
    name.text
    for name in rs6311.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
]
print(genes)
['HTR2A']
Fetch data for multiple variants at once:
result = entrez_api.fetch(['rs6311', 'rs662138'], max_results=10, database='snp')
gene_names = {
    'rs' + document_summary.get('uid'): [
        element.text
        for element in document_summary.findall('.//ns0:GENE_E/ns0:NAME', namespaces)
    ]
    for document_summary in result.data
}
print(gene_names)
{'rs6311': ['HTR2A'], 'rs662138': ['SLC22A1']}
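The parsing step above can be tried without network access. The sketch below applies the same namespace-aware lookup to a hand-written XML fragment; the fragment is an assumption that only mimics the elements used in this example (`DocumentSummary`, `GENE_E`, `NAME`) and is not a real NCBI response. The helper name `genes_by_rs_id` is hypothetical and not part of easy-entrez.

```python
from xml.etree import ElementTree

NAMESPACES = {'ns0': 'https://www.ncbi.nlm.nih.gov/SNP/docsum'}

# Hand-written stand-in for the document summaries (assumed shape, not real data)
SAMPLE = """
<root xmlns="https://www.ncbi.nlm.nih.gov/SNP/docsum">
  <DocumentSummary uid="6311">
    <GENES><GENE_E><NAME>HTR2A</NAME></GENE_E></GENES>
  </DocumentSummary>
  <DocumentSummary uid="662138">
    <GENES><GENE_E><NAME>SLC22A1</NAME></GENE_E></GENES>
  </DocumentSummary>
</root>
"""


def genes_by_rs_id(document_summaries):
    """Map 'rs<uid>' to the list of gene names found in each summary."""
    return {
        'rs' + summary.get('uid'): [
            name.text
            for name in summary.findall('.//ns0:GENE_E/ns0:NAME', NAMESPACES)
        ]
        for summary in document_summaries
    }


root = ElementTree.fromstring(SAMPLE.strip())
summaries = root.findall('ns0:DocumentSummary', NAMESPACES)
gene_names = genes_by_rs_id(summaries)
print(gene_names)
```

Because the sample declares a default namespace, every element has to be addressed through the `ns0` prefix, which is why the `namespaces` mapping is passed to each `findall` call.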
Example: obtaining the SNP rs ID number from chromosomal position
You can use the query string directly:
results = entrez_api.search(
    '13[CHROMOSOME] AND human[ORGANISM] AND 31873085[POSITION]',
    database='snp',
    max_results=10
)
print(results.data['esearchresult']['idlist'])
['59296319', '17076752', '7336701', '4']
Or pass a dictionary (the arguments are not validated, and the terms are joined with AND):
results = entrez_api.search(
    dict(chromosome=13, organism='human', position=31873085),
    database='snp',
    max_results=10
)
print(results.data['esearchresult']['idlist'])
['59296319', '17076752', '7336701', '4']
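To see how the dictionary form relates to the query string, here is a minimal sketch of one way such a dictionary could be turned into an Entrez query. The `build_query` helper is hypothetical and only illustrates the idea; it is not necessarily how easy-entrez does the conversion internally.

```python
def build_query(terms):
    """Join `value[FIELD]` terms with AND; like the library, performs no validation."""
    return ' AND '.join(
        f'{value}[{field.upper()}]'
        for field, value in terms.items()
    )


query = build_query(dict(chromosome=13, organism='human', position=31873085))
print(query)
# 13[CHROMOSOME] AND human[ORGANISM] AND 31873085[POSITION]
```

The result matches the query string used in the first variant of this example, which is why both calls return the same ID list.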
The base position should use the latest genome assembly (GRCh38 at the time of writing);
to search with coordinates from the previous assembly, replace POSITION with POSITION_GRCH37.
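Switching between assemblies only changes the field name in the query. The sketch below wraps that choice in a small function; `snp_position_query` is a hypothetical helper (not part of easy-entrez), and the position `12345` in the GRCh37 call is an arbitrary placeholder rather than a real coordinate of any variant.

```python
# Field names taken from the text above; the mapping itself is illustrative
POSITION_FIELDS = {
    'GRCh38': 'POSITION',
    'GRCh37': 'POSITION_GRCH37',
}


def snp_position_query(chromosome, position, assembly='GRCh38', organism='human'):
    """Build an SNP search query for a position in the given assembly."""
    if assembly not in POSITION_FIELDS:
        raise ValueError(f'Unsupported assembly: {assembly}')
    field = POSITION_FIELDS[assembly]
    return (
        f'{chromosome}[CHROMOSOME] AND {organism}[ORGANISM] '
        f'AND {position}[{field}]'
    )


grch38_query = snp_position_query(13, 31873085)
grch37_query = snp_position_query(13, 12345, assembly='GRCh37')  # placeholder position
print(grch38_query)
print(grch37_query)
```

Either string can then be passed as the first argument to `entrez_api.search(...)` with `database='snp'`, as in the examples above.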
For more information on the arguments accepted by the SNP database, see the Entrez help page on the NCBI website.
Installation
Requires Python 3.6+. Install with:
pip install easy-entrez
If you wish to enable (optional, tqdm-based) progress bars use:
pip install easy-entrez[with_progress_bars]
Alternatives:
You might want to try:
- biopython.Entrez - biopython is a heavy dependency, but probably a good choice if you already use it
- pubmedpy - provides interesting utilities for parsing the responses
- entrez - appears to have a comparable scope but a quite different API
I have tried and do not recommend:
- entrezpy - in addition to the history problems, watch out for documentation issues and basically no reaction to pull requests.