An SDK for the computational chemistry LLM hackathon

Project description

QDX ChemLLMHack SDK

Welcome to the QDX Computational Chemistry and Large Language Model (LLM) Hackathon! This SDK is built specifically for the hackathon and integrates with the RUSH computational cloud platform (https://rush.cloud). It lets participants develop and apply LLM and artificial intelligence (AI) techniques to problems in computational chemistry.

Installation

To install the QDX ChemLLMHack SDK, simply run the following command:

pip install chemllmhack

Features

  • Get help information about the Rex language, and retrieve specific expressions used in RUSH modules from the Rex database.

  • Download the paper dataset and the Chroma vector database. The paper dataset is a collection of scientific papers from open-access repositories, including arXiv, bioRxiv, chemRxiv, and medRxiv. Dataset statistics:

    Tool       Number of papers
    MMseqs2    1000
    PLIP       2000
    Gina       3000
    RDock      4000
    Auto3d     5000

  • Use the MongoDB Atlas cloud vector database to retrieve vectors and run queries online.

  • Submit your Rex expression to the RUSH platform.

  • Retrieve the results of your submitted Rex expression, along with its statistics against the benchmark.
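Whether you use the Chroma database locally or MongoDB Atlas online, a vector query ranks stored embeddings by their similarity to a query embedding. The sketch below illustrates that idea with brute-force cosine similarity over a tiny in-memory store. It is not the SDK's API: the document IDs and vectors are made up, and real vector databases typically use approximate nearest-neighbor indexes instead of scanning every vector.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b (0.0 for a zero vector)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], store: Dict[str, Sequence[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for chunks of the paper dataset.
store = {
    "paper-docking": [0.9, 0.1, 0.0],
    "paper-alignment": [0.1, 0.9, 0.0],
    "paper-conformers": [0.0, 0.2, 0.9],
}
print(top_k([1.0, 0.0, 0.0], store, k=2))
```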

QDX Hackathon Setup Guide

Prerequisites

Before you begin, make sure you have a Google account; you need it to register for the QDX Hackathon. You also need an OPENAI_API_KEY set in your environment. You will be granted a RUSH token; set it in your environment as TENGU_TOKEN:

TENGU_TOKEN=<your-rush-token>
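Both variables must be visible to the Python process that uses the SDK. A pre-flight check like the one below (a sketch, not part of the SDK) reports which variables are missing before you call any SDK function. It only checks that the variables are set and non-empty, not that the keys are valid.

```python
import os
from typing import List, Mapping, Optional

# Variable names taken from the prerequisites above.
REQUIRED_VARS = ("OPENAI_API_KEY", "TENGU_TOKEN")


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```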

Registration

Register for the QDX Hackathon here.

Getting Help Information for the Rex Language

To get help information about the Rex language, use the following command:

chemllmhack --rex-help language

Retrieving Specific Rex Expressions

To retrieve a specific Rex expression associated with a module, use the command below:

chemllmhack --rex-help expression -rex <module_name>

Replace <module_name> with the name of the module you are interested in. Alternatively, you can query the Rex expression from Python:

from chemllmhack import get_rex_expression
get_rex_expression('module_name')
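If you prefer to drive the command-line form from a script, for example to collect help output for several modules, one option is to call it through `subprocess` with an argument list, which avoids shell-quoting problems. This sketch assumes the `chemllmhack` executable is on your `PATH` and uses only the flags shown above; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def rex_help_command(module_name: str) -> List[str]:
    """Build the argument list for `chemllmhack --rex-help expression -rex <module_name>`."""
    return ["chemllmhack", "--rex-help", "expression", "-rex", module_name]


def run_rex_help(module_name: str) -> str:
    """Run the CLI and return its standard output.

    Raises FileNotFoundError if the CLI is not installed and
    subprocess.CalledProcessError if it exits with a non-zero status.
    """
    if shutil.which("chemllmhack") is None:
        raise FileNotFoundError("chemllmhack CLI not found on PATH; run `pip install chemllmhack`")
    result = subprocess.run(
        rex_help_command(module_name),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Usage: `print(run_rex_help("module_name"))`, replacing `module_name` with a real RUSH module. For most workflows, calling `get_rex_expression` directly is simpler.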

Querying with natural language

The SDK also supports LLM-friendly queries in natural language. To run a natural-language query, use the following function:

from chemllmhack import query
query('your-natural-language-query')
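The return format of `query` is not documented here. If you pass retrieved passages to an LLM yourself, a common retrieval-augmented generation (RAG) pattern is to number the passages and place them ahead of the question in the prompt. The helper below is a hypothetical sketch of that prompt-assembly step only; it does not call `query` or any LLM, and its character budget is approximate (it ignores the newlines between passages).

```python
from typing import Sequence


def build_rag_prompt(question: str, passages: Sequence[str], max_chars: int = 4000) -> str:
    """Assemble a prompt from numbered context passages followed by the question.

    Passages are added in order until the approximate character budget is
    reached, so the most relevant passages should come first.
    """
    context_lines = []
    used = 0
    for i, passage in enumerate(passages, start=1):
        line = f"[{i}] {passage.strip()}"
        if used + len(line) > max_chars:
            break
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered context below. "
        "Cite passages by number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


print(build_rag_prompt("Which tool searches protein sequences?", ["MMseqs2 is a sequence search tool."]))
```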

Submitting Rex Expressions to RUSH

To submit a Rex expression to the RUSH platform, call the following function:

from chemllmhack import submit_rex_expression
submit_rex_expression('your-rex-expression')
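Rex expressions can get long, so it may be easier to keep them in a file. The sketch below reads an expression from disk and hands it to `submit_rex_expression`. It assumes the function accepts the expression as a single string, as in the call above, and passes its return value through unchanged because that format is not documented here. The helper names and the `submit` parameter are hypothetical additions that let you swap in a stub for offline testing.

```python
from pathlib import Path
from typing import Any, Callable, Optional


def load_rex_expression(path: str) -> str:
    """Read a Rex expression from a text file and reject empty files."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{path} does not contain a Rex expression")
    return text


def submit_from_file(path: str, submit: Optional[Callable[[str], Any]] = None) -> Any:
    """Load an expression from disk and pass it to a submit function.

    By default this uses chemllmhack.submit_rex_expression; any callable that
    takes the expression string can be injected instead.
    """
    if submit is None:
        # Imported lazily so the helper can be used without the SDK installed.
        from chemllmhack import submit_rex_expression as submit
    return submit(load_rex_expression(path))
```

Usage: `result = submit_from_file("my_expression.rex")`, where `my_expression.rex` is a placeholder file name.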

Downloading Datasets

You can download the following datasets:

  • Paper Dataset
  • Chroma Vector Database

Configuring Google Cloud CLI

To interact with Cloud Storage using the Google Cloud CLI, follow these steps:

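  1. Run one of the following commands to authenticate:
    sudo gcloud auth login
    
    or
    sudo gcloud auth application-default login
    
    Only the second command writes the application default credentials file referred to in step 2.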
    
  2. Provide the path to your credentials file. On macOS, it is typically located at:
    /Users/<your_user_name>/.config/gcloud/application_default_credentials.json
    

Replace <your_user_name> with your username on your system. Make sure your user account has read permission on the JSON file. For more information, refer to the Google Cloud CLI documentation.
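Rather than hard-coding the path, you can locate the credentials file the way Google's client libraries do: the `GOOGLE_APPLICATION_CREDENTIALS` environment variable takes precedence, and otherwise the file written by `gcloud auth application-default login` under `~/.config/gcloud/` is used on macOS and Linux. The sketch below follows that lookup order and checks that the file is readable; the helper names are hypothetical, and it does not cover the Windows location.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_adc_path(env: Optional[Mapping[str, str]] = None, home: Optional[Path] = None) -> Path:
    """Locate the Application Default Credentials file on macOS or Linux.

    GOOGLE_APPLICATION_CREDENTIALS takes precedence; otherwise fall back to the
    default gcloud location under the user's home directory.
    """
    env = os.environ if env is None else env
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit).expanduser()
    home = Path.home() if home is None else home
    return home / ".config" / "gcloud" / "application_default_credentials.json"


def check_readable(path: Path) -> None:
    """Raise a descriptive error if the credentials file is missing or unreadable."""
    if not path.is_file():
        raise FileNotFoundError(f"No credentials file at {path}; run the gcloud login step first")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"{path} exists but is not readable by the current user")
```

Usage: `path = resolve_adc_path(); check_readable(path)`, then pass `str(path)` as the `credential_path` argument of `download_vector_db`.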

Downloading Datasets with the Command Line

Use the following command to initiate the download:

command-to-download-datasets

Downloading Datasets with Python Functions

from chemllmhack import download_vector_db
download_vector_db(credential_path='your-credential-path', destination_file_name='your-destination-file-name')

Your Task

Build an AI experimentation system that uses the provided paper database, the RAG vector database, an LLM, and the RUSH platform to beat the benchmark.
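One possible way to structure such a system is a propose-and-evaluate loop: a proposer (for example, an LLM prompted with retrieved papers and earlier scores) suggests a Rex expression, an evaluator submits it and scores the result, and the loop keeps the best candidate. This is a sketch of one design, not a prescribed one. `propose` and `evaluate` are hypothetical callables you would implement; `evaluate` might wrap `submit_rex_expression` and the benchmark statistics, whose formats are not documented here. The sketch assumes higher scores are better.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trial:
    expression: str
    score: float


def run_experiment_loop(
    propose: Callable[[List[Trial]], str],
    evaluate: Callable[[str], float],
    rounds: int = 5,
) -> Optional[Trial]:
    """Propose-and-evaluate loop that returns the best-scoring trial (None if rounds is 0).

    `propose` receives the history of earlier trials so it can learn from them;
    `evaluate` returns a score where higher is better.
    """
    history: List[Trial] = []
    best: Optional[Trial] = None
    for _ in range(rounds):
        expression = propose(history)
        trial = Trial(expression, evaluate(expression))
        history.append(trial)
        if best is None or trial.score > best.score:
            best = trial
    return best
```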

Contributing

We welcome contributions from the community. If you'd like to contribute, please fork the repository and use a feature branch. Pull requests are warmly welcome.

Contact Information

For any questions or comments, please email bowen.zhang@qdx.co. Alternatively, you can open an issue in this repository's issue tracker.

Acknowledgments

Thanks to everyone participating in the development and use of this SDK. We hope it serves you well in the QDX Hackathon.


Download files

Download the file for your platform.

Source Distribution

chemllmhack-0.1.6.tar.gz (6.5 MB)

Uploaded Source

Built Distribution

chemllmhack-0.1.6-py3-none-any.whl (6.5 MB)

Uploaded Python 3

File details

Details for the file chemllmhack-0.1.6.tar.gz.

File metadata

  • Download URL: chemllmhack-0.1.6.tar.gz
  • Upload date:
  • Size: 6.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.9 Darwin/23.5.0

File hashes

Hashes for chemllmhack-0.1.6.tar.gz
Algorithm Hash digest
SHA256 600e86901029944e11f5a4a1b1d842621b2ea015877405a1d481a2f8e9bda84d
MD5 b05ca53e6df3372572d276e8506e7864
BLAKE2b-256 153b579d9248d4b7f5ecde20711d716bd5e5412899fde2e2563c85dbb38353a8

See more details on using hashes here.

File details

Details for the file chemllmhack-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: chemllmhack-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 6.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.9 Darwin/23.5.0

File hashes

Hashes for chemllmhack-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 0a31211100c9c684bc73bc8bbd0084bdb6619f858f0bbcfb4d251fdbf4a2cb18
MD5 cdf7a5c0653abbdfd5c8c981d84eec9b
BLAKE2b-256 aa9be123b2a447e39448f0cc23e1e4b14028ee44335e1227858afe74e2aae1cf

See more details on using hashes here.
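To check a downloaded file against the SHA256 digests listed above, you can hash it locally with Python's standard `hashlib` module, as sketched below; the helper names are hypothetical. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for chemllmhack 0.1.6.
EXPECTED_SHA256 = {
    "chemllmhack-0.1.6.tar.gz": "600e86901029944e11f5a4a1b1d842621b2ea015877405a1d481a2f8e9bda84d",
    "chemllmhack-0.1.6-py3-none-any.whl": "0a31211100c9c684bc73bc8bbd0084bdb6619f858f0bbcfb4d251fdbf4a2cb18",
}


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"No published digest for {path.name}")
    return sha256_of_file(path) == expected
```

Usage: `verify_download(Path("chemllmhack-0.1.6.tar.gz"))` returns `True` only for an unmodified copy of the published archive.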
