
GeoVectorSearch is a lightweight Python SDK and command-line tool for semantic discovery of GEO datasets suitable for differential gene expression analysis. Powered by FAISS-based vector search and optional GPT-based filtering, it helps researchers and developers quickly identify relevant RNA-seq or microarray datasets.


🧬 GeoVectorSearch

GeoVectorSearch is a lightweight Python SDK and command-line tool for discovering high-quality GEO gene expression datasets relevant to a disease or biological condition, optimized for differential expression (DE) analysis.

It combines semantic search using sentence embeddings with optional GPT-based filtering to help you rapidly identify suitable datasets for your research or pipeline.


🔍 Features

  • Natural language search for GEO datasets
  • Fast vector search using FAISS and prebuilt sentence embeddings
  • 🧠 Optional GPT filtering to assess dataset quality for DE analysis
  • 🧬 Supports microarray and RNA-seq datasets
  • 🖥️ Interactive CLI for a smooth user experience
  • 🧩 Easy to integrate into larger pipelines or SDKs
  • 💾 Save results locally for downstream analysis
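The vector search step ranks datasets by how close their embeddings are to the query embedding. The index type and embedding model the package ships are not documented here, so the sketch below is an assumption-laden illustration: it uses toy 2-D vectors in place of real sentence embeddings and pure Python in place of FAISS. It shows exhaustive cosine-similarity top-k search, which is what a flat FAISS inner-product index (such as faiss.IndexFlatIP) computes when the vectors are L2-normalized.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so the inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def top_k_cosine(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k corpus vectors most similar to the query.

    This is an exhaustive inner-product search over normalized vectors, the same
    ranking a flat FAISS inner-product index produces, without FAISS's speedups.
    """
    q = normalize(query)
    scores = (
        (i, sum(a * b for a, b in zip(q, normalize(doc))))
        for i, doc in enumerate(corpus)
    )
    # heapq.nlargest picks the k best scores without a full sort when k is small.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])
```

With toy vectors, top_k_cosine([1.0, 0.1], [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]], k=2) ranks index 0 first and index 2 second. In practice FAISS replaces the Python loop with vectorized code, which is what makes the search fast over thousands of dataset descriptions.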

📦 Installation

Install from PyPI with uv:

uv pip install geo-pysearch

or with pip:

pip install geo-pysearch

Or clone the repository and install locally:

git clone https://github.com/Tinfloz/geo-vector-search.git
cd geo-vector-search
uv pip install .

🧪 Example (Python SDK)

from geo_pysearch.sdk import search_datasets

results = search_datasets(
    query="duchenne muscular dystrophy",
    dataset_type="microarray",
    top_k=50,
    use_gpt_filter=True,
    return_all_gpt_results=True
)

print(results.head())
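To run the same search for several related conditions, a thin wrapper around search_datasets is enough. The helper below, search_many, is hypothetical and not part of geo_pysearch; it relies only on the query keyword shown in the example above and forwards any other keyword arguments unchanged. The search function is injectable so the helper can be exercised without the package installed.

```python
from typing import Any, Callable, Dict, Iterable, Optional


def search_many(
    queries: Iterable[str],
    search_fn: Optional[Callable[..., Any]] = None,
    **kwargs: Any,
) -> Dict[str, Any]:
    """Run one search per query and collect the results keyed by query string.

    search_many is a hypothetical helper, not part of geo_pysearch. Extra keyword
    arguments (for example dataset_type, top_k, use_gpt_filter) are passed through
    to every call unchanged.
    """
    if search_fn is None:
        # Imported lazily so a stand-in function can be supplied for testing.
        from geo_pysearch.sdk import search_datasets as search_fn
    return {query: search_fn(query=query, **kwargs) for query in queries}
```

For example, search_many(["duchenne muscular dystrophy", "limb-girdle muscular dystrophy"], dataset_type="microarray", top_k=50) would return one result table per query, assuming search_datasets accepts those arguments as in the example above.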

Convenience methods:

from geo_pysearch.sdk import search_microarray, search_rnaseq

search_microarray("breast cancer")
search_rnaseq("lung fibrosis", use_gpt_filter=True)

💻 Example (CLI)

Launch the interactive CLI:

geo-search
  • Use the arrow keys to select dataset type and filtering options
  • Enter your disease query
  • Results will be saved to a local CSV file in a new directory
  • Review and use the datasets for downstream DE analysis
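The CLI writes its results to a CSV file in a new directory. The directory name and column layout are not documented here, so the sketch below makes no assumptions about either: given a directory, it loads the most recently modified CSV using only the standard library, reading whatever header the file has.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_latest_results(results_dir: str) -> List[Dict[str, str]]:
    """Read the most recently modified CSV in results_dir into a list of row dicts.

    Column names come from the file's own header row, since the CLI's output
    schema is not documented here.
    """
    csv_files = list(Path(results_dir).glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"no CSV files found in {results_dir!r}")
    # Pick the newest file by modification time.
    latest = max(csv_files, key=lambda p: p.stat().st_mtime)
    with latest.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

If pandas is already in your environment, pandas.read_csv on the same file gives you a DataFrame instead of a list of dicts.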

🧠 GPT Filtering (Optional)

If enabled, the SDK uses GPT to evaluate whether each dataset is suitable for differential gene expression analysis. The confidence threshold that a dataset must meet to be kept is adjustable.
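Conceptually, a confidence threshold turns per-dataset GPT judgments into a keep-or-drop decision. The GPT filter's actual output schema is not documented here, so the keys suitable and confidence and the default threshold of 0.7 below are all hypothetical. Raising the threshold keeps fewer, more confidently suitable datasets; lowering it keeps more borderline ones.

```python
from typing import Any, Dict, Iterable, List


def filter_by_confidence(
    assessments: Iterable[Dict[str, Any]],
    threshold: float = 0.7,
) -> List[Dict[str, Any]]:
    """Keep datasets judged suitable for DE analysis with confidence >= threshold.

    The keys 'suitable' and 'confidence' and the 0.7 default are hypothetical;
    they illustrate the idea, not the SDK's real output format.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return [
        a for a in assessments
        if a.get("suitable") and float(a.get("confidence", 0.0)) >= threshold
    ]
```

A dataset judged unsuitable is dropped no matter how confident the judgment, because a confident "no" should never pass the filter.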

📁 Project Structure

geo-vector-search/
├── geo_pysearch/
│   ├── data/                # Prebuilt FAISS index, vectors, metadata
│   ├── vector_search/
│   │   ├── vector_search.py
│   │   └── gpt_filter.py
│   ├── sdk.py               # Main SDK interface
│   └── cli.py               # CLI implementation
├── examples/                # Example usage scripts
└── .env                     # Optional environment variables
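The optional .env file holds environment variables for the project. Which variables the package reads is not documented here (an API key for GPT filtering is a plausible candidate, but that is an assumption), and the package may load the file itself. The widely used python-dotenv library provides load_dotenv for this; the sketch below is a minimal standard-library stand-in that handles simple KEY=VALUE lines only.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file and export them to os.environ.

    Blank lines, '#' comments, and lines without '=' are skipped; matching
    surrounding quotes are stripped. Variables already set in the environment
    are left alone unless override=True. Returns every pair that was parsed.
    """
    parsed: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed  # the .env file is optional
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

Not overriding existing variables by default lets a value exported in your shell take precedence over the file, which is the same default python-dotenv uses.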


🛠️ Requirements

  • Python 3.8+
  • faiss-cpu, pandas, sentence-transformers
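Before running a search, you can confirm the environment meets these requirements. Note that two of the packages install under a different import name: faiss-cpu is imported as faiss and sentence-transformers as sentence_transformers. The sketch below checks the Python version and uses importlib.util.find_spec, which locates a module without importing it; the function names are illustrative, not part of geo_pysearch.

```python
import importlib.util
import sys
from typing import Dict, List

# Distribution name -> top-level import name.
REQUIRED_MODULES: Dict[str, str] = {
    "faiss-cpu": "faiss",
    "pandas": "pandas",
    "sentence-transformers": "sentence_transformers",
}


def missing_requirements(modules: Dict[str, str] = REQUIRED_MODULES) -> List[str]:
    """Return the distribution names whose import module cannot be found."""
    return [dist for dist, mod in modules.items() if importlib.util.find_spec(mod) is None]


def check_environment() -> None:
    """Raise RuntimeError if Python is older than 3.8 or a requirement is missing."""
    if sys.version_info < (3, 8):
        raise RuntimeError("Python 3.8+ is required")
    missing = missing_requirements()
    if missing:
        raise RuntimeError("missing packages: " + ", ".join(missing))
```

Checking up front turns a late ImportError deep inside the search code into a single, readable message listing everything that needs installing.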

📖 License

GNU General Public License v3.0

This project is licensed under the GNU GPLv3, which guarantees end users the freedom to run, study, share, and modify the software.

If you distribute this software or a modified version of it, you must do so under the same license terms.



Download files


Source Distribution

geo_pysearch-0.1.0.tar.gz (27.2 kB)

Built Distribution


geo_pysearch-0.1.0-py3-none-any.whl (28.2 kB)

File details

Details for the file geo_pysearch-0.1.0.tar.gz.

File metadata

  • Download URL: geo_pysearch-0.1.0.tar.gz
  • Size: 27.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.17

File hashes

Hashes for geo_pysearch-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e8d2f12b33d0264546de0108ba5eb1e7c3a94681744af3069fc87a31593a9803
MD5 2f42f8d2266529aa35512a5c98995cd9
BLAKE2b-256 5e0fd55b7d10f67862fb7ab342521a0a22a3ac113eb18218f3b48ab457a43b45
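To check that a downloaded file matches its published SHA256 digest, hash it locally and compare. The sketch below uses only hashlib and hmac from the standard library; the function names are illustrative. It reads the file in chunks so large files do not need to fit in memory, and compares digests with hmac.compare_digest.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute a file's SHA-256 hex digest, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA-256 digest matches the published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For the source distribution, verify_sha256("geo_pysearch-0.1.0.tar.gz", "e8d2f12b33d0264546de0108ba5eb1e7c3a94681744af3069fc87a31593a9803") should return True for an intact download. pip can enforce the same check at install time through its hash-checking mode (--require-hashes with a --hash=sha256:... entry in a requirements file).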


File details

Details for the file geo_pysearch-0.1.0-py3-none-any.whl.


File hashes

Hashes for geo_pysearch-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 41fd7ca0bc655477c01e6819bf429d78222b35e77a0a2659e75d9580e14dfdab
MD5 e610d5ea7d5dda90f0eed5e68bb68d40
BLAKE2b-256 81761f6c91ff4045d7efd43053bf09cf543a66f6d8f2d25285929434efa1a721

