
This repository contains a project that implements a Retrieval-Augmented Generation (RAG) system using the LLaMA3 model. The project creates embeddings from the documentation of professional bioinformatics software to help users conduct biology research.

Project description

RAG-LLaMA3 AI Project

We apply Retrieval-Augmented Generation (RAG) to the LLaMA3 model to build an AI agent that answers questions about the bioinformatics software distribution DNALinux, helping users navigate its large range of bioinformatics tools. You can also use this package to build a simple RAG AI agent from your own resources.

You can check out our code at our repo.

Quick Start with Pre-installed AI Agent

  1. Install the package:

    pip install rag-llama3
    

    To verify whether the package is already installed, run:

    pip show rag_llama3
    
  2. Ask a question to our pre-installed DNALinux AI agent:

    rag-llama3 "your question goes here"
    

How to Construct Your Own RAG AI Agent

Make sure you have already installed the package. An Instruction.ipynb notebook is provided for testing the code.

  1. Configuration:

    • Set up the directory paths for storing your data and Chroma database. You do not need to create them manually; just specify where they should be, and they will be automatically created:
      • input_dir: Directory for PDF, HTML files, and URLs.
      • urls_path: Path to a file named urls.txt where you put all the URLs.
      • output_dir: Directory where TextExtractor will store all .txt files (for debugging purposes).
      • chroma_db_dir: Directory where your Chroma database will be stored.
      • chroma_db_name: Collection name for your Chroma database.
    • Note: The embedding model defaults to 'mxbai-embed-large'. Feel free to choose your preferred Ollama embedding model.
    from rag_llama3 import RAG as rag
    from rag_llama3 import TextExtractor as te
    from rag_llama3 import VectorDB as vdb
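As a concrete starting point, the configuration variables might be set up as follows. This is a minimal sketch: every path and the collection name below are illustrative placeholders, not values required by the package.

```python
from pathlib import Path

# Illustrative project layout -- adjust these paths to your own setup.
base = Path("rag_project")

input_dir = str(base / "input")               # PDF and HTML files go here
urls_path = str(base / "input" / "urls.txt")  # file listing URLs, one per line
output_dir = str(base / "output")             # extracted .txt files (for debugging)
chroma_db_dir = str(base / "chroma_db")       # where the Chroma database is stored
chroma_db_name = "dnalinux_docs"              # Chroma collection name (any name works)
```

These variables are then passed to the VectorDB and RAG constructors in the steps below; the directories themselves will be created automatically.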
    
  2. Directory Setup:

    • Open a Jupyter notebook and run the following code to ensure that your directory is created:
      test_vector_db = vdb(input_dir, output_dir, urls_path, chroma_db_dir, chroma_db_name)
      
    • This will create an object that you can use to manipulate your Chroma vector database. It will automatically create all the directories and an empty Chroma database. If everything is already created, it will not overwrite existing files.
  3. Add Files:

    • Place all PDF and HTML files in the input directory. List all URLs in the urls.txt file, each on a new line.
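A urls.txt file is just a plain-text list with one URL per line. As a sketch, you could also create it programmatically; the URLs and path below are made-up examples:

```python
from pathlib import Path

# Hypothetical documentation pages to index -- replace with your own URLs.
urls = [
    "https://example.org/blast-manual.html",
    "https://example.org/clustal-faq.html",
]

urls_path = Path("rag_project/input/urls.txt")
urls_path.parent.mkdir(parents=True, exist_ok=True)  # also creates the input dir
urls_path.write_text("\n".join(urls) + "\n")         # one URL per line
```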
  4. Load Data:

    • Use the test_vector_db object to load files into the vector database:
      test_vector_db.load_data()
      
    • Alternatively, load different types of files individually:
      test_vector_db.load_url()
      test_vector_db.load_pdf()
      test_vector_db.load_html()
      
    • Loading might take some time. After loading, check if the vector database has been populated successfully:
      test_vector_db.peek()
      test_vector_db.show_sources()
      
    • Query data from a specific source:
      test_vector_db.query_sources(source_name)
      
    • To delete data from a source:
      test_vector_db.delete_source(source_name)
      
    • Or to clear the entire database (be cautious as this is destructive):
      test_vector_db.clear_database()
      
  5. Generate Answers:

    • In a Jupyter notebook, use:
      testRAG = rag(input_dir, output_dir, urls_path, chroma_db_dir, chroma_db_name, model)
      print(testRAG.generate_answer("Your question goes here"))
      

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rag_llama3-0.1.4.1.tar.gz (27.0 kB)

Uploaded Source

Built Distribution

rag_llama3-0.1.4.1-py3-none-any.whl (25.0 kB)

Uploaded Python 3

File details

Details for the file rag_llama3-0.1.4.1.tar.gz.

File metadata

  • Download URL: rag_llama3-0.1.4.1.tar.gz
  • Upload date:
  • Size: 27.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for rag_llama3-0.1.4.1.tar.gz:

  • SHA256: 9bdbc6d0c844a45cde0c2fe18d7a01885ecef2141da1babd6085a82e555f0714
  • MD5: 218d9939573e1ea31fe9ce9b6ffef136
  • BLAKE2b-256: 09d0ccddf763485d7c0e7ae9857a92059a7604306af85517624ff4157dfd5fa1


File details

Details for the file rag_llama3-0.1.4.1-py3-none-any.whl.

File metadata

  • Download URL: rag_llama3-0.1.4.1-py3-none-any.whl
  • Upload date:
  • Size: 25.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for rag_llama3-0.1.4.1-py3-none-any.whl:

  • SHA256: 2e2285742e52cbc1b067f105b2874a30202313f7d7df2ce666af968ef7c77f77
  • MD5: daa18007cdd20179f5c936feb5e0ab51
  • BLAKE2b-256: c69085ad90500f449786909cbd1bbd56a4771f79680cb104bddf97b0d9d67c18

