A package to facilitate making API requests to the IMPC Solr API

IMPC_API

impc_api is a Python package that provides helper functions wrapping the IMPC Solr API. The functions are intended for use in a Jupyter Notebook.

Installation Instructions

1. Ensure that Python 3.10 or later is installed on your system.
2. Create a virtual environment (optional but recommended). On macOS or Linux:

    python3 -m venv .venv
    source .venv/bin/activate

3. Install the package: pip install impc_api
4. Launch Jupyter Notebook: jupyter notebook

After executing the command, the Jupyter interface should open in your browser. If it does not, follow the instructions provided in the terminal.
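Because the minimum supported version is 3.10, you can confirm from a notebook cell that the interpreter qualifies. The helper below is a hypothetical convenience written for this README, not part of impc_api; it only compares sys.version_info against (3, 10).

```python
import sys

# Minimum Python version stated in the installation instructions.
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


print(python_is_supported())
```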

Try it out:

Create a Jupyter Notebook and try some of the examples below:

Available functions

The available functions can be imported as:

from impc_api import solr_request, batch_solr_request

1. Solr request

The most basic request to the IMPC Solr API:

num_found, df = solr_request(
    core='genotype-phenotype', 
    params={
        'q': '*:*',
        'rows': 10, 
        'fl': 'marker_symbol,allele_symbol,parameter_stable_id'
    }
)
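For context, a standard Solr JSON response (wt=json) reports the total number of matches in response.numFound and only the requested rows in response.docs; num_found presumably mirrors numFound. The sketch below parses such a payload with the standard library. It illustrates the Solr response format, not impc_api's internal code, and the payload (gene names included) is made up.

```python
import json

# Made-up payload in the shape of a standard Solr JSON response (wt=json).
# The values are placeholders, not real IMPC data.
raw = """
{
  "responseHeader": {"status": 0},
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {"marker_symbol": "GeneA", "allele_symbol": "GeneA<tm1>"},
      {"marker_symbol": "GeneB", "allele_symbol": "GeneB<tm1>"}
    ]
  }
}
"""


def parse_solr_response(payload: str) -> tuple[int, list[dict]]:
    """Return (total hits, returned documents) from a Solr JSON response."""
    body = json.loads(payload)["response"]
    # numFound counts all matches; docs holds only the rows requested via 'rows'.
    return body["numFound"], body["docs"]


num_found, docs = parse_solr_response(raw)
print(num_found, len(docs))
```

Note that numFound can exceed the number of returned documents, which is why the count and the rows are reported separately.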

a. Facet request

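solr_request also supports facet requests. Setting 'rows' to 0 tells Solr to return no documents, only the facet counts for the requested field (here, zygosity).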

num_found, df = solr_request(
    core="genotype-phenotype",
    params={
        "q": "*:*",
        "rows": 0,
        "facet": "on",
        "facet.field": "zygosity",
        "facet.limit": 15,
        "facet.mincount": 1,
    }
)
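By default, Solr's JSON writer returns each facet.field as one flat list that alternates values and counts (the json.nl=flat layout). How impc_api turns this into a DataFrame is not documented here; the sketch below only shows how such a list can be paired up with the standard library. The counts are made up.

```python
# Hypothetical facet section in Solr's default JSON layout (json.nl=flat):
# values and counts alternate in a single list. The counts are made up.
facet_counts = {
    "facet_fields": {
        "zygosity": ["homozygote", 120, "heterozygote", 80, "hemizygote", 5]
    }
}


def facet_pairs(flat: list) -> dict[str, int]:
    """Turn Solr's flat [value, count, value, count, ...] list into a dict."""
    # Even positions hold the values, odd positions hold their counts.
    return dict(zip(flat[0::2], flat[1::2]))


zygosity = facet_pairs(facet_counts["facet_fields"]["zygosity"])
print(zygosity)
```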

b. Solr request validation

A common pitfall when writing a query is misspelling the core or field names. To catch this, solr_request accepts a validate argument that raises a warning when these values are not recognised. Validation does not stop the query from running; it only alerts you to a potential issue.

Core validation

num_found, df = solr_request(
    core='invalid_core',
    params={
        'q': '*:*',
        'rows': 10
    },
    validate=True
)

> InvalidCoreWarning: Invalid core: "invalid_core", select from the available cores:
> dict_keys(['experiment', 'genotype-phenotype', 'impc_images', 'phenodigm', 'statistical-result'])
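Conceptually, the core check is a membership test that emits a warning instead of raising an exception. The sketch below is hypothetical: it defines its own InvalidCoreWarning class rather than importing impc_api's, and it hard-codes the five cores listed in the warning message above, which may change in future releases.

```python
import warnings

# Cores listed in the InvalidCoreWarning message shown above.
VALID_CORES = {
    "experiment",
    "genotype-phenotype",
    "impc_images",
    "phenodigm",
    "statistical-result",
}


class InvalidCoreWarning(UserWarning):
    """Hypothetical stand-in for impc_api's warning class of the same name."""


def check_core(core: str) -> bool:
    """Warn (but do not raise) if core is not a known IMPC Solr core."""
    if core not in VALID_CORES:
        warnings.warn(
            f'Invalid core: "{core}", select from: {sorted(VALID_CORES)}',
            InvalidCoreWarning,
        )
        return False
    return True
```

Because a warning is used rather than an exception, the caller can still go ahead with the request, matching the behaviour described above.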

Field list validation

num_found, df = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 10,
        'fl': 'invalid_field,marker_symbol,allele_symbol'
    },
    validate=True
)
> InvalidFieldWarning: Unexpected field name: "invalid_field". Check the spelling of fields.
> To see expected fields check the documentation at: https://www.ebi.ac.uk/mi/impc/solrdoc/
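The field check works on the comma-separated fl string. A minimal sketch, assuming a hypothetical set of known fields made up of those used elsewhere in this README (not the full genotype-phenotype schema, which is documented at https://www.ebi.ac.uk/mi/impc/solrdoc/):

```python
import warnings

# Illustrative subset of genotype-phenotype fields used in this README.
# This is NOT the complete schema.
KNOWN_FIELDS = {
    "marker_symbol",
    "allele_symbol",
    "parameter_stable_id",
    "mp_term_name",
    "p_value",
    "zygosity",
}


def unexpected_fields(fl: str) -> list[str]:
    """Return (and warn about) names in a comma-separated fl string
    that are not in KNOWN_FIELDS."""
    names = [name.strip() for name in fl.split(",") if name.strip()]
    unknown = [name for name in names if name not in KNOWN_FIELDS]
    for name in unknown:
        warnings.warn(f'Unexpected field name: "{name}". Check the spelling of fields.')
    return unknown


print(unexpected_fields("invalid_field,marker_symbol,allele_symbol"))
```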

c. URL only

If you only need the URL for a query, without fetching the data into a DataFrame, pass url_only=True. The URL is printed and returned as the first element of the result.

url, _ = solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'rows': 10,
        'fl': 'marker_symbol,allele_symbol'
    },
    url_only=True
)
> "https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=%2A%3A%2A&rows=10&fl=marker_symbol%2Callele_symbol"

print(url)
> "https://www.ebi.ac.uk/mi/impc/solr/genotype-phenotype/select?q=%2A%3A%2A&rows=10&fl=marker_symbol%2Callele_symbol"
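The URL above is an ordinary Solr /select URL with the query parameters percent-encoded. As a sketch, assuming the base URL pattern visible in that output, the standard library's urllib.parse.urlencode reproduces it exactly; impc_api may build it differently internally.

```python
from urllib.parse import urlencode

# Base URL pattern taken from the example output above:
# https://www.ebi.ac.uk/mi/impc/solr/<core>/select
SOLR_BASE = "https://www.ebi.ac.uk/mi/impc/solr"


def build_select_url(core: str, params: dict) -> str:
    """Build a Solr /select URL; urlencode percent-encodes '*', ':' and ','."""
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


url = build_select_url(
    "genotype-phenotype",
    {"q": "*:*", "rows": 10, "fl": "marker_symbol,allele_symbol"},
)
print(url)
```

The resulting URL can be opened in a browser or fetched with any HTTP client.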

2. Batch Solr Request

batch_solr_request is designed for large queries. It fetches results in batches, which avoids requests that are too large to fit in memory or that put excessive strain on the API.

Use batch_solr_request for:

- Large queries (>100,000 rows)
- Querying multiple items in a list
- Downloading data in JSON or CSV format
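Solr pages through large result sets with the start and rows parameters. One plausible way to split a large query into batches, assumed here rather than taken from impc_api's source, is to compute (start, rows) windows from the total hit count and the batch size:

```python
def batch_windows(num_found: int, batch_size: int) -> list[tuple[int, int]]:
    """Split num_found results into (start, rows) pairs for Solr paging."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Each window asks Solr for at most batch_size rows beginning at offset start.
    return [
        (start, min(batch_size, num_found - start))
        for start in range(0, num_found, batch_size)
    ]


print(batch_windows(250_000, 100_000))
```

For very deep paging, Solr also offers cursorMark paging (which requires a sort that includes the uniqueKey field) to avoid the cost of large start offsets.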

Large queries

For large queries, you can either load the results into a DataFrame or download them in JSON or CSV format.

a. Large query - see in DataFrame

This fetches your data in batches of batch_size rows, so the API is not overloaded, and returns a pandas DataFrame.

If your request is larger than recommended and you have not chosen to download the data, a warning is displayed; follow its instructions to proceed.

df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*'
    },
    download=False,
    batch_size=30000
)
print(df.head())
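Fetching "responsibly" usually means requesting one batch at a time and optionally pausing between requests. The sketch below assumes a caller-supplied fetch_page(start, rows) function; both it and fake_fetch are hypothetical, and the demo slices an in-memory list instead of making HTTP calls.

```python
import time
from collections.abc import Callable, Iterator


def fetch_in_batches(
    fetch_page: Callable[[int, int], list[dict]],
    num_found: int,
    batch_size: int,
    pause_s: float = 0.0,
) -> Iterator[list[dict]]:
    """Yield successive pages of documents, pausing between requests."""
    for i, start in enumerate(range(0, num_found, batch_size)):
        if i and pause_s:
            time.sleep(pause_s)  # space out requests to go easy on the server
        yield fetch_page(start, batch_size)


# Hypothetical stand-in for an HTTP call to /select?start=...&rows=...
fake_docs = [{"id": i} for i in range(7)]


def fake_fetch(start: int, rows: int) -> list[dict]:
    return fake_docs[start:start + rows]


pages = list(fetch_in_batches(fake_fetch, num_found=7, batch_size=3))
rows = [doc for page in pages for doc in page]  # combine batches into one list
print([len(p) for p in pages])
```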

b. Large query - Download

With download=True, the results are saved to a file named by the filename argument, in the format set by the wt parameter (JSON or CSV). A DataFrame is also returned, provided it fits in your machine's memory; if it is too large, an error is raised. In that case, we recommend reading the downloaded file in chunks.

df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'wt': 'csv'
    },
    download=True,
    filename='geno_pheno_query',
    batch_size=100000
)
print(df.head())
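Reading a large CSV download in chunks can be done with the standard library's csv module. This sketch assumes the download has a header row; because the exact name and extension of the saved file are not specified here, the demo writes a small temporary file instead. With pandas, pandas.read_csv(path, chunksize=...) gives a similar chunked iterator.

```python
import csv
import itertools
import os
import tempfile
from collections.abc import Iterator


def read_csv_in_chunks(path: str, chunk_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most chunk_size rows (as dicts) from a CSV file."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)  # assumes the first line is a header
        while True:
            chunk = list(itertools.islice(reader, chunk_size))
            if not chunk:
                break
            yield chunk


# Demo on a small temporary file; replace demo_path with your download's path.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["marker_symbol", "p_value"])
        writer.writerows([[f"Gene{i}", "0.01"] for i in range(5)])
    chunk_sizes = [len(c) for c in read_csv_in_chunks(demo_path, chunk_size=2)]

print(chunk_sizes)
```

Only one chunk is held in memory at a time, so the file size is no longer limited by available RAM.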

c. Query by multiple values

batch_solr_request can also search for multiple values of the same field in one call. Pass the values as a list in the field_list parameter and name the field they belong to in field_type.

# List of gene symbols
genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]

df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'fl': 'marker_symbol,mp_term_name,p_value',
        'field_list': genes,
        'field_type': 'marker_symbol'
    },
    download=False
)
print(df.head())
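A common Solr idiom for matching any of several values in one field is a grouped OR query such as marker_symbol:("A" OR "B"). Whether batch_solr_request builds exactly this query is an assumption; the sketch below, with a hypothetical any_of_query helper, only demonstrates the idiom.

```python
def any_of_query(field: str, values: list[str]) -> str:
    """Build a Solr query matching documents whose field equals any value."""
    if not values:
        raise ValueError("values must not be empty")
    quoted = []
    for value in values:
        # Escape backslashes and double quotes so each value stays one phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(quoted)})"


genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]
print(any_of_query("marker_symbol", genes))
```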

The same query can also be downloaded:

# List of gene symbols
genes = ["Zfp580", "Firrm", "Gpld1", "Mbip"]

df = batch_solr_request(
    core='genotype-phenotype',
    params={
        'q': '*:*',
        'fl': 'marker_symbol,mp_term_name,p_value',
        'field_list': genes,
        'field_type': 'marker_symbol'
    },
    download=True,
    filename='gene_list_query'
)
print(df.head())

Project details

Maintainers: dpavam, marina-kan

Release history

- 1.0.7 (this version): Apr 22, 2025
- 1.0.6: Oct 30, 2024
- 1.0.5: Oct 28, 2024
- 1.0.4: Oct 24, 2024
- 1.0.3: Oct 24, 2024
- 1.0.1: Oct 23, 2024

Download files

Source Distribution

impc_api-1.0.7.tar.gz (26.3 kB), uploaded Apr 22, 2025

Built Distribution

impc_api-1.0.7-py3-none-any.whl (18.4 kB), uploaded Apr 22, 2025 (Python 3)

File details

Details for the file impc_api-1.0.7.tar.gz.

File metadata

Download URL: impc_api-1.0.7.tar.gz
Upload date: Apr 22, 2025
Size: 26.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for impc_api-1.0.7.tar.gz
Algorithm	Hash digest
SHA256	`b8eceb8022a3f226a1b92ee5761abb429b72927fc15c2b6d4dc0e7255c5c41af`
MD5	`923725943f3c5833d148c3c6e8ad9310`
BLAKE2b-256	`3233304861c554c3195471b8e617471e82e47711b4aa32bf12cf705f65eda79a`
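To check a downloaded file against the SHA256 digest listed above, hash it locally and compare. The helpers below are a sketch written for this page, not part of impc_api; the demo only verifies that block-wise hashing agrees with hashing the whole content at once.

```python
import hashlib
import os
import tempfile

# SHA256 listed above for impc_api-1.0.7.tar.gz.
EXPECTED_SDIST_SHA256 = "b8eceb8022a3f226a1b92ee5761abb429b72927fc15c2b6d4dc0e7255c5c41af"


def sha256_of(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    """True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Self-check on a temporary file (not the real sdist).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.bin")
    with open(sample, "wb") as out:
        out.write(b"abc" * 50_000)
    block_digest = sha256_of(sample, block_size=4096)

full_digest = hashlib.sha256(b"abc" * 50_000).hexdigest()
print(block_digest == full_digest)

# For a real download (assumed to be in the current directory):
#   matches_expected("impc_api-1.0.7.tar.gz")
```

pip can enforce the same check at install time with hash-checking mode, for example a requirements line such as impc_api==1.0.7 --hash=sha256:<digest> installed with pip install --require-hashes -r requirements.txt.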

Provenance

The following attestation bundles were made for impc_api-1.0.7.tar.gz:

Publisher: publish.yml on mpi2/impc-api

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: impc_api-1.0.7.tar.gz
- Subject digest: b8eceb8022a3f226a1b92ee5761abb429b72927fc15c2b6d4dc0e7255c5c41af
- Sigstore transparency entry: 200600629
- Sigstore integration time: Apr 22, 2025
Source repository:
- Permalink: mpi2/impc-api@f037b41391e11f23e6f931efe757c95adc7e0c52
- Branch / Tag: refs/tags/v1.0.6
- Owner: https://github.com/mpi2
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f037b41391e11f23e6f931efe757c95adc7e0c52
- Trigger Event: release

File details

Details for the file impc_api-1.0.7-py3-none-any.whl.

File metadata

Download URL: impc_api-1.0.7-py3-none-any.whl
Upload date: Apr 22, 2025
Size: 18.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for impc_api-1.0.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`18dc4698e14a456cc78c85155d64cb089bf3c5172b761f9071fa9bb9eee6a288`
MD5	`c4c6f263aa48b479f58c013e1b51d8d4`
BLAKE2b-256	`c8deb1d06075d25deefe39eeb32631e66d6bf78d638dfab7a2e352bc33eca95e`

Provenance

The following attestation bundles were made for impc_api-1.0.7-py3-none-any.whl:

Publisher: publish.yml on mpi2/impc-api

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: impc_api-1.0.7-py3-none-any.whl
- Subject digest: 18dc4698e14a456cc78c85155d64cb089bf3c5172b761f9071fa9bb9eee6a288
- Sigstore transparency entry: 200600632
- Sigstore integration time: Apr 22, 2025
Source repository:
- Permalink: mpi2/impc-api@f037b41391e11f23e6f931efe757c95adc7e0c52
- Branch / Tag: refs/tags/v1.0.6
- Owner: https://github.com/mpi2
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f037b41391e11f23e6f931efe757c95adc7e0c52
- Trigger Event: release
