
Project description


RAG-LAB is an open-source, lightweight, fast, and low-cost RAG toolkit backed by TargetPilot, designed to turn the latest RAG concepts into stable, practical engineering tools. The project currently supports GraphRAG and HybridRAG. If you find it useful, please star RAG-LAB!

To install: pip install raglab2

About TargetPilot

TargetPilot is a company focused on empowering the e-commerce sector with artificial intelligence. Its Online Assistant product is built on an industry-leading RAG technology solution.

Goals

The primary goal of RAG-LAB is to explore the latest RAG techniques and turn them into stable engineering tools. We aim to be:

  • Lighter: pure Python, designed specifically for RAG, with no unnecessary third-party dependencies (text chunking, LLM integration, etc. can be handled by Unstructured, LangChain, or LlamaIndex, or by the simplified text-chunking functions we provide).
  • Faster: optional multithreading to accelerate indexing.
  • Cheaper: low-cost by design, aiming for the best results with minimal LLM token consumption.
  • Innovative: continuously integrating the latest RAG research and advancements.
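The "Faster" point comes down to fanning independent per-chunk LLM calls out across threads. A minimal sketch with Python's stdlib `concurrent.futures` (the `process_chunk` stand-in below is hypothetical, not a RAG-LAB API):

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk: str) -> str:
    # Stand-in for a per-chunk LLM call (entity extraction, summarization, ...).
    return chunk.upper()

def process_all(chunks: list[str], num_threads: int = 4) -> list[str]:
    # Fan independent chunk jobs out across a thread pool; results come
    # back in the same order as the input chunks.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(process_chunk, chunks))
```

Because each chunk is processed independently, threads work well here even under the GIL: the time is dominated by I/O-bound LLM requests, not Python bytecode.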

Features

GraphRAG (ready to try!)

Proposed by Microsoft, GraphRAG integrates graph-based approaches into RAG, offering several key advantages:

  • Enhanced Data Relationships: By leveraging graph structures, GraphRAG can better capture and utilize the relationships between different data points, leading to more accurate and insightful results.
  • Scalability: GraphRAG is designed to handle large-scale data efficiently, making it suitable for applications with extensive datasets.
  • Flexibility: The graph-based approach allows for more flexible data modeling, accommodating complex and dynamic data structures.
  • Improved Query Performance: GraphRAG can optimize query performance by efficiently navigating through the graph, reducing the time required to retrieve relevant information.

HybridRAG (In Progress)

Proposed by Intel, HybridRAG combines different RAG methodologies to enhance performance and flexibility. Its advantages include:

  • Versatility: HybridRAG integrates multiple RAG techniques, allowing it to adapt to various types of data and use cases.
  • Performance Optimization: By combining the strengths of different RAG methods, HybridRAG can achieve higher performance and accuracy in data retrieval and analysis.
  • Robustness: The hybrid approach ensures that the system remains robust and reliable, even when dealing with diverse and complex datasets.
  • Customizability: Users can customize HybridRAG to fit specific requirements, making it a versatile tool for a wide range of applications.

Quick Start Guide

This quick start guide walks through chunking text, generating an expert description, detecting the language, building and disambiguating the entity and relationship graph, generating community reports, saving the graph to a file, and visualizing the knowledge graph. Follow these steps to process and visualize your data.

For reference, code examples are available in:

  • Graph indexing: quick_start_index.py.
  • Search: quick_start_search.py. You can also implement the search part yourself by following the example search functions and the instructions for each step; this lets you back the graph with other databases and tune search performance.

Step-by-Step Instructions (Indexing)

  1. Import tools from raglab

    from raglab.graphrag import (
        disambiguate_entity_executor, 
        disambiguate_relationship_executor, 
        generate_community_reports_executor, 
        generate_entire_chunk_graph_executor,
        detect_text_language,
        generate_expert,
        graph_save_json,
    )
    from raglab.graphrag.visual import (
        visualize_knowledge_graph_echart,
        visualize_knowledge_graph_network_x
    )
    
    # A fast, lightweight regex-based text splitter inspired by JinaAI's Segmenter (https://jina.ai/segmenter/).
    # You can also use Unstructured, LangChain, or LlamaIndex instead.
    from raglab.chunk import (
        chuncking_executor, # for English
        character_chunking_executor # for languages other than English
    )
    
    # Import an LLM from `raglab.llms` or `langchain.llms`,
    # or implement the `llm.invoke` method yourself by inheriting from `LLMBase`.
    from raglab.llms import (
        AzureOpenAILLM,
        LLMBase
    )
    
    # Likewise, import an embedding model from `raglab.embeddings` or `langchain.embeddings`,
    # or implement the `embed.embed_query` method yourself by inheriting from `EmbeddingBase`.
    from raglab.embeddings import (
        AzureOpenAIEmbedding, 
        EmbeddingBase
    )
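If you want to plug in your own model rather than Azure OpenAI, the pattern the comments above describe looks roughly like this. The base class here is a self-contained stand-in for illustration; RAG-LAB's actual `LLMBase` may differ in details beyond requiring an `invoke` method:

```python
from abc import ABC, abstractmethod

class LLMBase(ABC):
    # Assumed interface: the toolkit only needs a prompt-in, text-out call.
    @abstractmethod
    def invoke(self, prompt: str) -> str: ...

class EchoLLM(LLMBase):
    # Toy implementation standing in for any hosted or local model;
    # a real subclass would call your model's API inside invoke().
    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"
```

The same pattern applies to embeddings: subclass the base and implement `embed_query`.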
    
  2. Chunking the Text

    The chunker is a fast, lightweight regex-based text splitter inspired by JinaAI's Segmenter (https://jina.ai/segmenter/).

    import uuid

    # For English text, use `chuncking_executor`
    chunks = chuncking_executor(text=entire_document, max_chunk_size=1000, remove_line_breaks=True)
    chunk_ids = [str(uuid.uuid4()) for _ in range(len(chunks))]
    
    # For Chinese (and other non-English languages), use `character_chunking_executor`
    chunks = character_chunking_executor(text=entire_document, max_chunk_size=500, remove_line_breaks=True)
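For intuition, a chunker in this style can be sketched in a few lines of stdlib Python: split on sentence boundaries with a regex, then greedily pack sentences up to `max_chunk_size` characters. This is a simplified stand-in, not RAG-LAB's actual implementation:

```python
import re

def simple_chunk(text: str, max_chunk_size: int = 1000, remove_line_breaks: bool = True) -> list[str]:
    # Normalize whitespace, split after sentence-ending punctuation,
    # then greedily pack sentences into chunks of at most max_chunk_size chars.
    if remove_line_breaks:
        text = " ".join(text.split())
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Keeping chunks sentence-aligned avoids cutting entities in half, which matters later when the graph extractor reads each chunk in isolation.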
    
  3. [Optional] Generating an Expert Description

    expert = generate_expert(aoai_llm, chunks)
    
  4. [Optional] Detecting the Language

    language = detect_text_language(aoai_llm, chunks)
    
  5. Generating Entity and Relationship Graph

    entities, relations = generate_entire_chunk_graph_executor(aoai_llm, chunks, chunk_ids, expert, language, strategy, muti_thread)
    
  6. Disambiguating Entities and Relationships

    entities, relations = disambiguate_entity_executor(aoai_llm, entities, relations, expert, language, strategy)
    relations = disambiguate_relationship_executor(aoai_llm, relations, expert, language, strategy)
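Conceptually, disambiguation merges duplicate mentions of the same entity. RAG-LAB delegates that judgment to the LLM; a purely heuristic sketch of the idea (assumed `name`/`description` dict fields, no LLM) might look like:

```python
def merge_entities_by_name(entities: list[dict]) -> list[dict]:
    # Group mentions whose names match after stripping and casefolding,
    # keeping the first record and concatenating descriptions.
    merged: dict[str, dict] = {}
    for entity in entities:
        key = entity["name"].strip().casefold()
        if key in merged:
            merged[key]["description"] += "; " + entity["description"]
        else:
            merged[key] = dict(entity)
    return list(merged.values())
```

An LLM-based pass goes further than exact-name matching: it can merge aliases like "the Emperor" and "the king of Lilliput" that no string heuristic would catch.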
    
  7. Generating Community Reports

    community_reports = generate_community_reports_executor(aoai_llm, entities, relations, expert, language, strategy, 5, muti_thread)
    
  8. Generating Embeddings for Entities and Communities

    entities = update_graph_embeddings_executor(aoai_embed, entities, num_threads=muti_thread)
    community_reports = update_graph_embeddings_executor(aoai_embed, community_reports, num_threads=muti_thread)
    
  9. Saving the Graph to a Local File

    ## Save the graph locally as a JSON file
    graph_save_json(entities, relations, community_reports, os.path.join(graph_filepath, "Gullivers-travels.json"))
    ## Or convert them to DataFrames and save in any tabular format (CSV, Excel, etc.):
    entities_df, relations_df, community_reports_df = convert_to_dataframe(entities), convert_to_dataframe(relations), convert_to_dataframe(community_reports)
    entities_df.to_csv(os.path.join(graph_filepath, "Gullivers-travels-entities.csv"), index=False)
    relations_df.to_csv(os.path.join(graph_filepath, "Gullivers-travels-relationships.csv"), index=False)
    community_reports_df.to_csv(os.path.join(graph_filepath, "Gullivers-travels-communities.csv"), index=False)
    
  10. [Optional] Visualizing the Knowledge Graph

    visualize_knowledge_graph_echart(entities, relations)
    visualize_knowledge_graph_network_x(entities, relations)
    

Step-by-Step Instructions (Search)

  1. Import search tools from raglab

    from raglab.graphrag import (
        graph_load_json
    )
    from raglab.graphrag.search_functions import (
        generate_final_answer_prompt,
        select_community,
        select_entities,
        select_relations
    )
    
    # import llm from `raglab.llms` or `langchain.llms`.
    # Or You can implement the `llm.invoke` method yourself by inheriting the `LLMBase` class.
    from raglab.llms import AzureOpenAILLM
    
    # Also, you can implement the `embed.embed_query` method yourself by inheriting the `EmbeddingBase` class. Or just import it from `raglab.embeddings` or `langchain.embeddings`
    from raglab.embeddings import AzureOpenAIEmbedding
    
  2. Load graph objects

    graph_filepath = "./examples/graphfiles/Gullivers-travels.json"
    entities, relations, communities = graph_load_json(graph_filepath)
    entity_select_num = 5
    
  3. Embed the query and select the most similar entities

    query = "Who is the king of Lilliput?"
    query_embed = aoai_embed.embed_query(query)
    selected_entity = select_entities(query_embed, entities)
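Entity selection is essentially a top-k nearest-neighbor search over the entity embeddings. A stdlib sketch using cosine similarity (the `embedding` field name is an assumption about the entity schema):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_k_entities(query_embed: list[float], entities: list[dict], k: int = 5) -> list[dict]:
    # Rank entities by similarity to the query embedding and keep the top k.
    return sorted(entities, key=lambda e: cosine(query_embed, e["embedding"]), reverse=True)[:k]
```

For large graphs you would swap this linear scan for a vector index (FAISS, a vector database, etc.), which is exactly the extension point the search example leaves open.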
    
  4. Select all relationships from selected entities

    selected_relations = select_relations(selected_entity, relations)
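Relation selection can be as simple as keeping every edge that touches a selected entity. A sketch under assumed `id`/`source`/`target` field names:

```python
def relations_for(selected_entities: list[dict], relations: list[dict]) -> list[dict]:
    # Keep relations whose source or target is among the selected entity ids
    # (the field names here are assumptions about the graph schema).
    ids = {e["id"] for e in selected_entities}
    return [r for r in relations if r["source"] in ids or r["target"] in ids]
```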
    
  5. Select the relevant community from the selected entities

    selected_community = select_community(query_embed, selected_entity, communities)

  6. Generate the final answer

    prompt = generate_final_answer_prompt(query, selected_entity, selected_relations, selected_community)
    final_answer = aoai_llm.invoke(prompt)
    print(f"Final answer: {final_answer}")
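The final-answer prompt simply stitches the retrieved context into one grounded-answer instruction for the LLM. A hypothetical stand-in for `generate_final_answer_prompt` might look like:

```python
def build_answer_prompt(query: str, entities: list[str], relations: list[str], community_report: str) -> str:
    # Pack the selected graph context into a single instruction that asks
    # the model to answer strictly from the retrieved material.
    context = "\n".join(
        ["Entities:"] + [f"- {e}" for e in entities]
        + ["Relations:"] + [f"- {r}" for r in relations]
        + ["Community report:", community_report]
    )
    return f"Answer the question using only the context below.\n\n{context}\n\nQuestion: {query}"
```

Grounding the instruction in the retrieved entities, relations, and community report is what keeps the model from answering from its parametric memory alone.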
    

Contributing

We welcome contributions from the community. All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

License

This project is licensed under the Apache 2.0 License.

Contact

For more information, please contact us at pxgong@targetpilot.ai, vincentpo@targetpilot.ai.

Project details


Download files

Download the file for your platform.

Source Distribution

raglab2-0.2.4.tar.gz (38.0 kB)

Uploaded Source

Built Distribution

raglab2-0.2.4-py3-none-any.whl (45.7 kB)

Uploaded Python 3

File details

Details for the file raglab2-0.2.4.tar.gz.

File metadata

  • Download URL: raglab2-0.2.4.tar.gz
  • Upload date:
  • Size: 38.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for raglab2-0.2.4.tar.gz:

  • SHA256: af27d52e90c99ccf151eddcf019281dd08cb6dc8f9a94ce121efbbb8f37024f1
  • MD5: a124eae634120aa1bc53a5e2766ef936
  • BLAKE2b-256: 967013820bd61a302b744d37538497be42b78f702c5a878b81f3e75930fccc85

File details

Details for the file raglab2-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: raglab2-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 45.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for raglab2-0.2.4-py3-none-any.whl:

  • SHA256: ec93e0b60872615a462767d9cc1944ff4c2a512f787e890a390c02b80e13ce2f
  • MD5: 28ffc4da39058673d40369337d1ebbb0
  • BLAKE2b-256: aa8f0793a9a21bf570d73b2e7bacb4f1e6719081939b00b17621570cb12aeacd
