A Lakehouse LLM Explorer. Wrapper for Spark, Databricks and LangChain processes
Occlusion LLM Explorer
Lakehouse Analytics & Advanced ML
Setup
Important: This package requires OpenAI and HuggingFace API keys. Remember to run it from a folder that contains the .streamlit/secrets.toml file.
See here and here for more details.
Quick Install
python -m pip install llm-explorer
llm_explorer
The initial load can take some time because it downloads the model and the tokenizer. Remember to include the secrets.toml file under the .streamlit/ folder.
Build from source
Clone the repository
git clone https://github.com/Occlusion-Solutions/llm_explorer.git
Install the package
cd llm_explorer && make install
Run the package
llm_explorer
Build manually
After cloning, create a virtual environment
conda create -n llm_explorer python=3.10
conda activate llm_explorer
Install the requirements
pip install -r requirements.txt
Run the Python installation
python setup.py install
llm_explorer
Usage
Log in with the user demo@occlusion.solutions and the password DEMO@occlusion.
The deployment requires a secrets.toml file created under .streamlit/:
mkdir -p .streamlit && touch .streamlit/secrets.toml
It should have a schema like this:
[connections.openai]
api_key="sk-..." # OpenAI API Key
[connections.huggingface]
api_key="hf_..." # HuggingFace API Key
[connections.databricks]
server_hostname="your databricks host"
http_path="http path under cluster JDBC/ODBC connectivity"
access_token="your databricks access token"
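Before launching the app, it can help to check that the secrets file has every entry listed in the schema above. The following is a minimal sketch, not part of llm_explorer: the function names `missing_secrets` and `load_secrets` are hypothetical, and the loader assumes Python 3.11+ for the standard-library `tomllib` module (on Python 3.10 the third-party `tomli` package offers the same `load` function).

```python
from typing import Any, Dict, List, Mapping

# Required keys per section, copied from the secrets.toml schema.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "openai": ["api_key"],
    "huggingface": ["api_key"],
    "databricks": ["server_hostname", "http_path", "access_token"],
}


def missing_secrets(config: Mapping[str, Any]) -> List[str]:
    """Return dotted names of required entries that are absent or empty."""
    connections = config.get("connections", {})
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        values = connections.get(section, {})
        for key in keys:
            if not values.get(key):
                missing.append(f"connections.{section}.{key}")
    return missing


def load_secrets(path: str = ".streamlit/secrets.toml") -> Dict[str, Any]:
    """Parse the TOML file into a nested dictionary (hypothetical helper)."""
    import tomllib  # standard library from Python 3.11 onwards

    with open(path, "rb") as handle:
        return tomllib.load(handle)
```

Calling `missing_secrets(load_secrets())` from the project folder returns an empty list when every entry is present, and otherwise the names of the entries still to be filled in.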
Run Modes
Chain
An assistant query engine: you ask questions in natural language, with table references, and it helps generate the query. The queries are executed manually.
Agent
It uses the pandas agent to generate the queries and execute them. It is a more natural way of querying the data, and it operates autonomously until it thinks it has found an answer.
Chat
It uses the HuggingFace Transformers Agent chat to operate in a conversational way.
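The autonomous behaviour described for the Agent mode (pick an action, observe the result, repeat until an answer is reached) can be illustrated with a small loop. This is only a sketch of the general pattern, not the package's implementation: the tool names, the `scripted_policy` function that stands in for the LLM, and `run_agent` are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str, str]  # (action, action input, observation)


# Hypothetical tools: plain functions standing in for the agent's real tools.
def list_tables(_: str) -> str:
    return "logs, wells"


def describe_table(name: str) -> str:
    return f"DDL for {name}"


TOOLS: Dict[str, Callable[[str], str]] = {
    "list_tables": list_tables,
    "describe_table": describe_table,
}


def scripted_policy(history: List[Step]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns the next (action, input) from a fixed script."""
    script = [
        ("list_tables", ""),
        ("describe_table", "wells"),
        ("final", "done"),
    ]
    return script[len(history)]


def run_agent(
    policy: Callable[[List[Step]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Call tools chosen by the policy until it emits a final answer."""
    history: List[Step] = []
    for _ in range(max_steps):
        action, action_input = policy(history)
        if action == "final":
            return action_input
        observation = tools[action](action_input)
        history.append((action, action_input, observation))
    raise RuntimeError("agent did not reach a final answer")
```

The `max_steps` bound is a design choice in this sketch: it keeps a policy that never emits a final answer from looping forever.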
Lakehouse Agent Sample
The agent is asked for the top 10 producing wells. It identifies the tables it has access to and determines that the request can be satisfied by the padalloc table. It then creates a query that selects the top 10 producing assets and returns the results.
> Entering new AgentExecutor chain...
Observation: logs, wells
Thought: I should look at the schema of the microchip_logs and padalloc tables to see what columns I can use.
Action: schema_sql_db
Action Input: "wells"
Observation: DDL
Thought: I should query the padalloc table to get the top 10 producing wells.
Action: query_sql_db
Action Input: "SELECT WELL_CODE, SUM(PROD_GAS_VOLUME_MCF) AS total_gas_volume_mcf FROM padalloc GROUP BY WELL_CODE ORDER BY total_gas_volume_mcf DESC LIMIT 10"
Observation: results_dataframe
Thought: I now know the top 10 producing wells.
Final Answer: The top 10 producing wells are 1222344, 1212560, 1222345, 1212503, 1222335, 1222340, 1222338, 1222367, 1220189, and 1222352.
> Finished chain.
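To see what the generated query computes, the same SQL can be run against a small in-memory table. This is a sketch under stated assumptions: it uses SQLite from the Python standard library rather than Databricks, the column types are guessed, and the rows are made-up values for illustration only.

```python
import sqlite3
from typing import List, Tuple

# The query from the agent trace, reformatted across lines.
QUERY = """
SELECT WELL_CODE, SUM(PROD_GAS_VOLUME_MCF) AS total_gas_volume_mcf
FROM padalloc
GROUP BY WELL_CODE
ORDER BY total_gas_volume_mcf DESC
LIMIT 10
"""

# Made-up rows for illustration only: (WELL_CODE, PROD_GAS_VOLUME_MCF).
SAMPLE_ROWS = [
    ("A", 100.0),
    ("B", 250.0),
    ("A", 75.0),
    ("C", 40.0),
]


def top_wells(rows: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Load rows into an in-memory padalloc table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE padalloc (WELL_CODE TEXT, PROD_GAS_VOLUME_MCF REAL)"
        )
        conn.executemany("INSERT INTO padalloc VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

With the sample rows, `top_wells(SAMPLE_ROWS)` sums the two entries for well "A" and orders the wells by total volume, so "B" comes first, then "A", then "C".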
Attribution
This is an adapted implementation from the GitHub repository. See the contributions list for more details.