Skip to main content

(v0.1.0: 异步化三大常用函数,增加三条数据结构常有函数)去除了HippoRAG2中的torch,vllm,甚至openai;完全由siliconflow api和本地cpu实现功能。

Project description

HippoRAG 精简版

鉴于许多应用需求轻量级模块,同时api+cpu能够取得不错的效果,特此对hipporag项目进行了一些修改。同时对中文社区(siliconflow api)进行了深入的支持。 尽管相当不完善,但依然具有一定的可用性。

  • v0.0.2更新,出于模块化考虑,我们去除了对环境变量的依赖,而是直接作为参数显式传入即可
  • v0.0.3更新,汉化了提示词
  • v0.1.0更新,异步化三大常用函数(index,delete,retrieve),新增三大数据结构常用函数(save,size,clear)(异步),还定义了迭代器(输出已存储的文档与其哈希值的字典)
  • v0.1.1更新,对偶发的“chunk、ner和triple数量不匹配”问题打上了补丁

快速上手

conda create -n hipporag python=3.10

conda activate hipporag

pip install hipporag-lite

示例:

import multiprocessing
import asyncio

# 定义一个异步主函数来处理所有操作
async def main():
    from hipporag_lite import HippoRAG

    # 准备数据集
    docs = [
        "Oliver Badman is a politician.",
        "George Rankin is a politician.",
        "Thomas Marwick is a politician.",
        "Cinderella attended the royal ball.",
        "The prince used the lost glass slipper to search the kingdom.",
        "When the slipper fit perfectly, Cinderella was reunited with the prince.",
        "Erik Hort's birthplace is Montebello.",
        "Marina is bom in Minsk.",
        "Montebello is a part of Rockland County."
    ]

    save_dir = 'outputs'
    llm_model_name = 'Pro/deepseek-ai/DeepSeek-V3'
    embedding_model_name = 'Qwen/Qwen3-Embedding-8B'
    llm_base_url = 'https://api.siliconflow.cn/v1/chat/completions'
    embedding_base_url = 'https://api.siliconflow.cn/v1/embeddings'

    try:
        hipporag = HippoRAG(
            api_key="Bearer sk-...", # 你的siliconflow api_key
            save_dir=save_dir,
            llm_model_name=llm_model_name,
            embedding_model_name=embedding_model_name,
            llm_base_url=llm_base_url,
            embedding_base_url=embedding_base_url
        )
        print("HippoRAG实例创建成功")
        print(f"初始索引大小: {hipporag.size()} 文档")
    except Exception as e:
        print(f"创建HippoRAG实例失败: {e}")
        return

    # 异步处理索引操作
    try:
        await hipporag.index(docs=docs)  # 使用await调用异步方法
        print(f"索引操作完成,当前大小: {hipporag.size()} 文档")
    except Exception as e:
        print(f"索引失败: {e}")

    try:
        await hipporag.save()  # 异步保存
        print("系统状态保存成功")
    except Exception as e:
        print(f"保存失败: {e}")

    # 处理查询
    queries = [
        "What is George Rankin's occupation?",
        "How did Cinderella reach her happy ending?",
        "What county is Erik Hort's birthplace a part of?"
    ]

    try:
        retrieval_results = await hipporag.retrieve(queries=queries, num_to_retrieve=2)
        print(f"检索完成: 共处理 {len(retrieval_results)} 个查询")
    except Exception as e:
        print(f"检索失败: {e}")

    # 删除文档
    docs_to_delete = [
        "Oliver Badman is a politician.",
        "Thomas Marwick is a politician."
    ]
    
    try:
        print(f"删除前索引大小: {hipporag.size()} 文档")
        await hipporag.delete(docs_to_delete=docs_to_delete)  # 异步删除
        print(f"删除完成: 移除了 {len(docs_to_delete)} 个文档,当前大小: {hipporag.size()} 文档")
    except Exception as e:
        print(f"删除失败: {e}")

    # 输出文档(迭代器)
    try: 
        for text, hash_id in hipporag:
            print(text)
        print(f"文档输出完毕")
    except Exception as e:
        print(f"输出失败: {e}")

    # 清空系统
    try:
        print(f"清空前索引大小: {hipporag.size()} 文档")
        await hipporag.clear()  # 异步清空
        print(f"系统已清空,当前大小: {hipporag.size()} 文档")
    except Exception as e:
        print(f"清空失败: {e}")
    
    # 验证状态
    try:
        if hipporag.size() == 0:
            print("验证: 文档存储已清空")
        else:
            print(f"警告: 清空后仍有 {hipporag.size()} 个文档")
            
        if hipporag.graph.vcount() == 0:
            print("验证: 知识图谱已重置")
    except Exception as e:
        print(f"状态验证失败: {e}")

    print("所有操作完成")

if __name__ == '__main__':
    multiprocessing.freeze_support()
    # 在主线程中运行异步主函数
    asyncio.run(main())

原项目主页:https://github.com/OSU-NLP-Group/HippoRAG

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hipporag_lite-0.1.1.tar.gz (65.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hipporag_lite-0.1.1-py3-none-any.whl (80.5 kB view details)

Uploaded Python 3

File details

Details for the file hipporag_lite-0.1.1.tar.gz.

File metadata

  • Download URL: hipporag_lite-0.1.1.tar.gz
  • Upload date:
  • Size: 65.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.10.18 Linux/6.11.0-1018-azure

File hashes

Hashes for hipporag_lite-0.1.1.tar.gz
Algorithm Hash digest
SHA256 6053e0fa2367b67617a6481dc10f1cbe2d184d65efdf4c1fa586dc7a38430674
MD5 99bd6738ee78c43dcac7db9a0fc80f40
BLAKE2b-256 35104aa39522f6b250e2eba9d401500c06bc6451573c06d637587168188a39ca

See more details on using hashes here.

File details

Details for the file hipporag_lite-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: hipporag_lite-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 80.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.10.18 Linux/6.11.0-1018-azure

File hashes

Hashes for hipporag_lite-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 74bd9d1acc5ef2241d49938f6382c6af4bd7297f42e1bf0d4637a8d61df48b4e
MD5 326003daedd14b6db98f4988344e8188
BLAKE2b-256 5583121cd95c99ed9541a9c8a92469464a117827f4ecc94639d35729693fe179

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page