
A package to facilitate making API requests to the IMPC Solr API

Project description

IMPC_API

impc_api is a Python package.

The functions in this package are intended for use in a Jupyter Notebook.

Installation Instructions

  1. Create a virtual environment (optional but recommended). On Mac:
python3 -m venv .venv
source .venv/bin/activate
  2. Install the package: pip install impc_api
  3. Install Jupyter: pip install jupyter
  4. Start Jupyter Notebook: jupyter notebook

After executing the command, the Jupyter interface should open in your browser. If it does not, follow the instructions provided in the terminal.

  5. Try it out:

Create a Jupyter Notebook and try some of the examples below:
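As a first notebook cell, you may want to confirm that the package is importable in the active environment. The sketch below uses only the standard library; the helper name package_status is hypothetical and not part of impc_api.

```python
import importlib.util
from importlib import metadata


def package_status(name: str) -> str:
    """Report whether a module can be imported and, if known, its installed version.

    Hypothetical helper for a quick environment check; not part of impc_api.
    """
    if importlib.util.find_spec(name) is None:
        return f"{name} is not installed"
    try:
        # Distribution metadata may be missing (e.g. for standard-library modules).
        return f"{name} {metadata.version(name)} is installed"
    except metadata.PackageNotFoundError:
        return f"{name} is importable"


print(package_status("impc_api"))
```

If this reports that impc_api is not installed, check that the notebook kernel is using the virtual environment where you ran pip install.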

Available functions

The available functions can be imported as:

from impc_api import solr_request, batch_solr_request

1. Solr request

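The most basic request to the IMPC Solr API. It returns the total number of matching documents (num_found) and a DataFrame with the requested rows and fields (df):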

num_found, df = solr_request(
    core='genotype-phenotype', 
    params={
        'q': '*:*',
        'rows': 10, 
        'fl': 'marker_symbol,allele_symbol,parameter_stable_id'
    }
)
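The params dictionary uses standard Solr query parameters (q, rows, fl). To see what such a request looks like as a URL, here is a minimal sketch that builds the equivalent /select request. The base URL is an assumption about where the IMPC Solr API is hosted, and build_select_url is a hypothetical helper, not the package's implementation.

```python
from urllib.parse import urlencode

# Assumed base URL of the IMPC Solr API (not stated in this README).
SOLR_BASE = "https://www.ebi.ac.uk/mi/impc/solr"


def build_select_url(core: str, params: dict) -> str:
    """Build a Solr /select URL for `core`; values are percent-encoded by urlencode."""
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


url = build_select_url(
    "genotype-phenotype",
    {"q": "*:*", "rows": 10, "fl": "marker_symbol,allele_symbol,parameter_stable_id"},
)
print(url)
```

Pasting such a URL into a browser is a quick way to inspect the raw response behind a solr_request call, assuming the base URL above is correct.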

a. Facet request

solr_request also supports facet requests. Here rows is set to 0, so Solr returns no documents, only the facet counts for zygosity:

num_found, df = solr_request(
    core="genotype-phenotype",
    params={
        "q": "*:*",
        "rows": 0,
        "facet": "on",
        "facet.field": "zygosity",
        "facet.limit": 15,
        "facet.mincount": 1,
    }
)
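Solr's standard JSON response reports facet_fields as a flat list that alternates values and counts. If you work with a raw response, a sketch like the following turns it into a DataFrame. The helper facet_counts_to_df is hypothetical, and toy_response uses invented numbers purely to show the shape; this README does not say how solr_request itself structures facet results.

```python
import pandas as pd


def facet_counts_to_df(response: dict, field: str) -> pd.DataFrame:
    """Convert Solr's flat [value, count, value, count, ...] facet list into a DataFrame."""
    flat = response["facet_counts"]["facet_fields"][field]
    values, counts = flat[0::2], flat[1::2]  # even positions: values, odd positions: counts
    return pd.DataFrame({field: values, "count": counts})


# Toy response with invented counts, shaped like a Solr facet response.
toy_response = {
    "facet_counts": {
        "facet_fields": {
            "zygosity": ["homozygote", 30, "heterozygote", 12, "hemizygote", 3]
        }
    }
}
facet_df = facet_counts_to_df(toy_response, "zygosity")
print(facet_df)
```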

b. Solr request validation

A common pitfall when writing a query is misspelling the core name or a field name. To catch this, solr_request accepts a validate argument that raises a warning when these values are not as expected. Note that this does not prevent the query from running; it only alerts you to a potential issue.

Core validation

num_found, df = solr_request(
    core='genotype-phenotyp',  # misspelled on purpose
    params={
        'q': '*:*',
        'rows': 10
    },
    validate=True
)

> InvalidCoreWarning: Invalid core: "genotype-phenotyp", select from the available cores:
> dict_keys(['experiment', 'genotype-phenotype', 'impc_images', 'phenodigm', 'statistical-result'])
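The idea behind core validation can be sketched with the standard warnings module: an unknown core triggers a warning rather than an exception, so the query can still run. The helper check_core, the CoreNameWarning class, and the closest-match suggestion via difflib are illustrative assumptions, not impc_api's implementation.

```python
import warnings
from difflib import get_close_matches

# Core names as listed in the InvalidCoreWarning output above.
KNOWN_CORES = ["experiment", "genotype-phenotype", "impc_images", "phenodigm", "statistical-result"]


class CoreNameWarning(UserWarning):
    """Hypothetical warning class for this sketch; not impc_api's InvalidCoreWarning."""


def check_core(core: str) -> bool:
    """Warn (without raising) if `core` is unknown, suggesting the closest known name."""
    if core in KNOWN_CORES:
        return True
    suggestion = get_close_matches(core, KNOWN_CORES, n=1)
    hint = f' Did you mean "{suggestion[0]}"?' if suggestion else ""
    warnings.warn(f'Unknown core "{core}".{hint}', CoreNameWarning, stacklevel=2)
    return False


check_core("genotype-phenotyp")  # warns: Unknown core "genotype-phenotyp". Did you mean "genotype-phenotype"?
```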

Field list validation

num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 10,
        'fl': 'invalid_field,marker_symbol,allele_symbol'
    },
    validate=True
)
> InvalidFieldWarning: Unexpected field name: "invalid_field". Check the spelling of fields.
> To see expected fields check the documentation at: https://www.ebi.ac.uk/mi/impc/solrdoc/
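Field list validation amounts to splitting the comma-separated fl string and comparing each name against the expected fields. In this sketch, KNOWN_FIELDS is only an illustrative subset made of the genotype-phenotype fields used elsewhere in this README (the full list is in the Solr documentation linked above), and unknown_fields is a hypothetical helper.

```python
# Illustrative subset of genotype-phenotype fields used in this README; not the full list.
KNOWN_FIELDS = frozenset(
    {"marker_symbol", "allele_symbol", "parameter_stable_id", "mp_term_name", "p_value", "zygosity"}
)


def unknown_fields(fl: str, known: frozenset = KNOWN_FIELDS) -> list:
    """Return the names in a comma-separated fl string that are not in `known`."""
    requested = [name.strip() for name in fl.split(",") if name.strip()]
    return [name for name in requested if name not in known]


print(unknown_fields("invalid_field,marker_symbol,allele_symbol"))  # ['invalid_field']
```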

2. Batch Solr Request

batch_solr_request is intended for large queries. It retrieves the results in batches, which avoids requests that are too large to fit into memory or that put heavy strain on the API.

Use batch_solr_request for:

  • Large queries (more than 1,000,000 results)
  • Querying multiple values of the same field, given as a list
  • Downloading data in JSON or CSV format
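Solr supports paging through results with its standard start and rows parameters, which is one common way to split a large query into batches. The sketch below shows that idea; fetch_page stands in for a real HTTP request and, like iter_batches, is hypothetical. In practice the total would come from num_found, and impc_api's actual implementation may page differently (for example with Solr cursors).

```python
from typing import Callable, Iterator, List


def iter_batches(
    fetch_page: Callable[[int, int], List[dict]],
    total: int,
    batch_size: int,
) -> Iterator[List[dict]]:
    """Yield successive pages of at most `batch_size` documents, covering `total` documents."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, total, batch_size):
        rows = min(batch_size, total - start)  # the last page may be shorter
        yield fetch_page(start, rows)


# Stand-in for a request such as .../select?q=*:*&start=<start>&rows=<rows>
fake_index = [{"id": i} for i in range(25)]


def fake_fetch_page(start: int, rows: int) -> List[dict]:
    return fake_index[start:start + rows]


pages = list(iter_batches(fake_fetch_page, total=len(fake_index), batch_size=10))
print([len(page) for page in pages])  # [10, 10, 5]
```

Because pages are yielded one at a time, each batch can be written to disk or appended to a result before the next request is made.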

Large queries

For large queries, you can either load the results into a DataFrame or download them in JSON or CSV format.

a. Large query - see in DataFrame

This fetches your data in batches, to avoid overloading the API, and returns a pandas DataFrame.

If your request is larger than recommended and you have not set download=True, a warning is displayed; follow its instructions to proceed.

df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q':'*:*'
    },
    download=False,
    batch_size=30000
)
print(df.head())

b. Large query - Download

When you set download=True, the results are saved to a file named after the filename argument, in the format selected by the wt parameter. A DataFrame can also be returned, provided it fits in the memory available on your machine; if it is too large, an error is raised. In that case, we recommend reading the downloaded file in chunks.

df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q':'*:*',
        'wt':'csv'
    },
    download=True,
    filename='geno_pheno_query',
    batch_size=100000
)
print(df.head())
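Reading the downloaded CSV in chunks keeps memory use bounded. This sketch uses pandas read_csv with chunksize to count rows per value of one column; the file name geno_pheno_query.csv (that is, filename plus a .csv extension) and the presence of a header row are assumptions, and count_rows_by is a hypothetical helper.

```python
from collections import Counter

import pandas as pd


def count_rows_by(path: str, column: str, chunksize: int = 100_000) -> Counter:
    """Count rows per value of `column` in a large CSV, reading it chunk by chunk."""
    counts = Counter()
    # usecols limits parsing to the one column needed; each chunk is a small DataFrame.
    for chunk in pd.read_csv(path, usecols=[column], chunksize=chunksize):
        counts.update(chunk[column].value_counts().to_dict())
    return counts


# Usage (assumed file name):
# top_genes = count_rows_by("geno_pheno_query.csv", "marker_symbol").most_common(10)
```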

c. Query by multiple values

batch_solr_request can also search for multiple values given as a list, provided they all belong to the same field. Pass the list in the field_list parameter and name the field in field_type.

# List of gene symbols
genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]

df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q':'*:*',
        'fl': 'marker_symbol,mp_term_name,p_value',
        'field_list': genes,
        'field_type': 'marker_symbol'
    },
    download=False
)
print(df.head())
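In Solr query syntax, matching any of several values in one field can be written as field:("a" OR "b"); Solr also offers a {!terms f=field} query parser for the same purpose. The sketch below builds the OR form from a list. build_terms_query is hypothetical, and this README does not say which form batch_solr_request uses internally.

```python
def build_terms_query(field: str, values: list) -> str:
    """Build a Solr query matching any of `values` in `field`, e.g. field:("a" OR "b")."""
    # Escape backslashes and double quotes so each value is a valid quoted phrase.
    quoted = ['"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"' for v in values]
    return f"{field}:({' OR '.join(quoted)})"


genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]
print(build_terms_query("marker_symbol", genes))
# marker_symbol:("Zfp580" OR "Firrm" OR "Gpld1" OR "Mbip")
```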

The results can also be downloaded:

# List of gene symbols
genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]

df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q':'*:*',
        'fl': 'marker_symbol,mp_term_name,p_value',
        'field_list': genes,
        'field_type': 'marker_symbol'
    },
    download=True,
    filename='gene_list_query'
)
print(df.head())


Download files

Download the file for your platform.

Source Distribution

impc_api-1.0.5.tar.gz (25.5 kB)

Built Distribution

impc_api-1.0.5-py3-none-any.whl (17.8 kB)

File details

Details for the file impc_api-1.0.5.tar.gz.

File metadata

  • Download URL: impc_api-1.0.5.tar.gz
  • Size: 25.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for impc_api-1.0.5.tar.gz

  • SHA256: 64adb02df7705538d92531e900f037be11ecf1a3f2f53e3890d4e479ad0c567c
  • MD5: b9bb4edb79ec8af0c13c4824fa212bcc
  • BLAKE2b-256: 40e17f4ed6f172c6d3dc78af0c9a1284048685af5839712f4d6b73f410151c01
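To check a downloaded file against the SHA256 digest listed above, you can hash it with Python's hashlib (pip hash computes the same digest from the command line). This is a minimal sketch; sha256_of is a hypothetical helper, and the usage line assumes the source distribution was saved in the current directory.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


EXPECTED_SDIST_SHA256 = "64adb02df7705538d92531e900f037be11ecf1a3f2f53e3890d4e479ad0c567c"
# Usage (assumed download location):
# assert sha256_of("impc_api-1.0.5.tar.gz") == EXPECTED_SDIST_SHA256
```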


File details

Details for the file impc_api-1.0.5-py3-none-any.whl.

File metadata

  • Download URL: impc_api-1.0.5-py3-none-any.whl
  • Size: 17.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for impc_api-1.0.5-py3-none-any.whl

  • SHA256: 3362f46c5cb410270c68045effc7e564b7b6bb4ac1d719991857d23336f797b0
  • MD5: 44fed2355b2d446b1630008652d7ca7c
  • BLAKE2b-256: f026daa64ee5a7ac7015c51f2ec25c1c188ec9925e9079415a7c61785ca723f6

