Pebblo Gen-AI Data Analyzer

Pebblo enables developers to safely load data and promote their Gen-AI apps to deployment without worrying about their organization's compliance and security requirements. The project identifies semantic topics and entities in the loaded data and summarizes them in the UI or in a PDF report.

Pebblo has three components:

  1. Pebblo Server - a REST API application with topic-classifier, entity-classifier, and reporting features
  2. Pebblo SafeLoader - a thin wrapper around Gen-AI frameworks' data loaders
  3. Pebblo SafeRetriever - a retrieval QA chain that enforces identity and semantic rules on vector database retrieval before LLM inference

Pebblo Server

Installation

Using pip

    pip install pebblo --extra-index-url https://packages.daxa.ai/simple/

Download Python package

Alternatively, download and install a Pebblo Python wheel (.whl) package directly, for example version 0.1.13 from https://packages.daxa.ai/pebblo/0.1.13/pebblo-0.1.13-py3-none-any.whl

Example:

    curl -LO "https://packages.daxa.ai/pebblo/0.1.13/pebblo-0.1.13-py3-none-any.whl"
    pip install pebblo-0.1.13-py3-none-any.whl

Run Pebblo Server

    pebblo

Pebblo Server now listens on localhost:8000 and accepts Gen-AI application data snippets for inspection and reporting.
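Before pointing an application at the server, it can help to confirm that something is accepting connections on the default port. The sketch below uses only the Python standard library and assumes the default host and port described above; the helper name `is_listening` is hypothetical. It only checks that a TCP handshake succeeds, so it cannot tell whether the listener is actually Pebblo Server.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This only shows that *some* process accepts connections on the port;
    it does not confirm that the process is Pebblo Server.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host name could not be resolved.
        return False


if __name__ == "__main__":
    print("Port 8000 accepting connections:", is_listening())
```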

Pebblo Optional Flags
  • --config <file>: specify a configuration file in YAML format.

See the configuration guide for options that control Pebblo Server behavior, such as enabling snippet anonymization or selecting a specific report renderer.
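If you prefer to start the server from a Python script (for example, in a test harness), the sketch below wraps the documented `pebblo` command and its `--config <file>` flag with the standard-library `subprocess` module. The helper names `pebblo_command` and `start_pebblo` and the file name `config.yaml` are hypothetical; only the command and the flag come from this page.

```python
import shutil
import subprocess
from typing import List, Optional


def pebblo_command(config_path: Optional[str] = None) -> List[str]:
    """Build the argument list for starting Pebblo Server."""
    cmd = ["pebblo"]
    if config_path is not None:
        # --config <file>: YAML configuration file (see the configuration guide).
        cmd += ["--config", config_path]
    return cmd


def start_pebblo(config_path: Optional[str] = None) -> subprocess.Popen:
    """Launch Pebblo Server as a background child process."""
    if shutil.which("pebblo") is None:
        raise FileNotFoundError("the 'pebblo' command was not found on PATH")
    return subprocess.Popen(pebblo_command(config_path))
```

For example, `proc = start_pebblo("config.yaml")` starts the server in the background and `proc.terminate()` stops it.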

Using Docker

    docker run -p 8000:8000 docker.daxa.ai/daxaai/pebblo

The local UI can be accessed by pointing a browser to https://localhost:8000.

See the installation guide for details on passing a custom config.yaml and accessing PDF reports on the host machine.

Troubleshooting

Refer to the troubleshooting guide.

Pebblo SafeLoader

Langchain

Pebblo SafeLoader is natively supported in the Langchain framework, starting with Langchain version 0.1.7.
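To check the version requirement from Python, a minimal sketch is shown below. It assumes the relevant distribution is named `langchain` (SafeLoader itself is imported from `langchain_community`, which you may want to check the same way), and it uses a simplified comparison that ignores pre-release suffixes; the helper names are hypothetical.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '0.1.16rc1' -> (0, 1, 16).

    Simplification: pre/post/dev suffixes are ignored, so '0.1.7rc1' compares
    equal to '0.1.7'. Use packaging.version for full PEP 440 ordering.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def version_at_least(installed: str, minimum: str) -> bool:
    """True if `installed` >= `minimum`, padding shorter releases with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def langchain_meets_minimum(minimum: str = "0.1.7") -> bool:
    """Check the installed `langchain` distribution against a minimum version."""
    try:
        return version_at_least(version("langchain"), minimum)
    except (PackageNotFoundError, ValueError):
        return False
```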

Enable Pebblo in Langchain Application

Add the PebbloSafeLoader wrapper to the existing Langchain document loader(s) used in the RAG application. PebbloSafeLoader is interface compatible with the Langchain BaseLoader, so the application can continue to use the load() and lazy_load() methods as it would on any Langchain document loader.
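To illustrate what "interface compatible" means in practice, here is a deliberately simplified, self-contained sketch of the wrapper pattern: a class that holds an inner loader, delegates load() and lazy_load() to it, and observes each document as it passes through. The names `Document`, `ListLoader`, and `InspectingLoader` are hypothetical stand-ins; this is not PebbloSafeLoader's actual implementation, which additionally reports the loaded data to Pebblo Server for inspection.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List


@dataclass
class Document:
    """Minimal stand-in for a framework document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class ListLoader:
    """Hypothetical inner loader that yields documents from an in-memory list."""

    def __init__(self, texts: List[str]) -> None:
        self._texts = texts

    def lazy_load(self) -> Iterator[Document]:
        for i, text in enumerate(self._texts):
            yield Document(page_content=text, metadata={"row": str(i)})

    def load(self) -> List[Document]:
        return list(self.lazy_load())


class InspectingLoader:
    """Wraps any loader exposing lazy_load(); calls `inspect` on each document."""

    def __init__(self, inner, inspect: Callable[[Document], None]) -> None:
        self._inner = inner
        self._inspect = inspect

    def lazy_load(self) -> Iterator[Document]:
        for doc in self._inner.lazy_load():
            self._inspect(doc)  # observe the document (e.g. record or classify it)
            yield doc  # pass it through unchanged, so callers see the same data

    def load(self) -> List[Document]:
        return list(self.lazy_load())


seen: List[Document] = []
loader = InspectingLoader(ListLoader(["alpha", "beta"]), seen.append)
docs = loader.load()
print(len(docs), len(seen))  # 2 2
```

Because the wrapper exposes the same two methods as the inner loader, calling code does not need to change when the wrapper is added.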

Here is a snippet of a Langchain RAG application that uses CSVLoader, before PebbloSafeLoader is enabled.

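    from langchain_community.document_loaders import CSVLoader
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings

    file_path = "data.csv"  # placeholder: path to your CSV file

    loader = CSVLoader(file_path)
    documents = loader.load()
    vectordb = Chroma.from_documents(documents, OpenAIEmbeddings())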

Pebblo SafeLoader can be enabled by changing a few lines of the above snippet.

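    from langchain_community.document_loaders import CSVLoader
    from langchain_community.document_loaders.pebblo import PebbloSafeLoader
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings

    file_path = "data.csv"  # placeholder: path to your CSV file

    # Wrap the existing loader; the rest of the pipeline is unchanged.
    loader = PebbloSafeLoader(
        CSVLoader(file_path),
        name="acme-corp-rag-1",  # App name (Mandatory)
        owner="Joe Smith",  # Owner (Optional)
        description="Support productivity RAG application",  # Description (Optional)
    )
    documents = loader.load()
    vectordb = Chroma.from_documents(documents, OpenAIEmbeddings())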

See here for sample RAG applications with Pebblo SafeLoader enabled, and this document for more details.

Pebblo SafeRetriever

Langchain

The PebbloRetrievalQA chain uses SafeRetrieval to ensure that the snippets used as in-context data are retrieved only from documents the user is authorized to access and that are semantically allowed for the Gen-AI application.
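Conceptually, identity enforcement at retrieval time amounts to dropping any retrieved snippet whose source document is not shared with at least one of the user's identities. The sketch below shows only that idea in plain Python, assuming each snippet carries a hypothetical `authorized_identities` set; the `Snippet` class, `filter_authorized` helper, and example identities are illustrative assumptions, not PebbloRetrievalQA's implementation, which enforces identity and semantic rules on vector database retrieval before LLM inference.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Sequence


@dataclass(frozen=True)
class Snippet:
    """A retrieved chunk plus the identities allowed to read its source document."""
    text: str
    authorized_identities: FrozenSet[str]


def filter_authorized(snippets: Iterable[Snippet], user_identities: Sequence[str]) -> List[Snippet]:
    """Keep only snippets whose source shares at least one identity with the user."""
    user = set(user_identities)
    # A non-empty intersection means the user (or one of their groups) is authorized.
    return [s for s in snippets if s.authorized_identities & user]


# Hypothetical example data: identities are illustrative e-mail/group names.
retrieved = [
    Snippet("Q3 revenue summary", frozenset({"finance@acme.org"})),
    Snippet("Holiday calendar", frozenset({"all@acme.org"})),
]
context = filter_authorized(retrieved, ["joe@acme.org", "all@acme.org"])
print([s.text for s in context])  # ['Holiday calendar']
```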

Here is sample code for PebbloRetrievalQA in which the authorized_identities of the user accessing the RAG application are passed in auth_context.

    from langchain_community.chains import PebbloRetrievalQA
    from langchain_community.chains.pebblo_retrieval.models import AuthContext, ChainInput

    # `llm` is the Langchain LLM used by the application, and `vectordb` is its
    # vector store (e.g. the one built in the SafeLoader example above).
    safe_rag_chain = PebbloRetrievalQA.from_chain_type(
        llm=llm,
        app_name="pebblo-safe-retriever-demo",
        owner="Joe Smith",
        description="Safe RAG demo using Pebblo",
        chain_type="stuff",
        retriever=vectordb.as_retriever(),
        verbose=True,
    )


    def ask(question: str, auth_context: dict):
        # auth_context carries the identities of the user asking the question.
        auth_context_obj = AuthContext(**auth_context)
        # Bundle the query with its auth context so retrieval is restricted accordingly.
        chain_input_obj = ChainInput(query=question, auth_context=auth_context_obj)
        return safe_rag_chain.invoke(chain_input_obj.dict())

See here for sample RAG applications with Pebblo SafeRetriever enabled, and this document for more details.

Contribution

Pebblo is an open-source community project. If you want to contribute, see the Contributor Guidelines for more details.

License

Pebblo is released under the MIT License.

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

pebblo-0.1.16-py3-none-any.whl (3.5 MB, Python 3)

File details

Details for the file pebblo-0.1.16-py3-none-any.whl.

File metadata

  • Download URL: pebblo-0.1.16-py3-none-any.whl
  • Size: 3.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.12

File hashes

Hashes for pebblo-0.1.16-py3-none-any.whl

Algorithm     Hash digest
SHA256        9d74a7b52222891415a2f74fb03fbb37408d3cfd70da45a10575354669604fd4
MD5           ac1b23e893e1b4f03098289368ff8578
BLAKE2b-256   5f8cfe4b2d6db31c9d6e2ea5c72cb93b7697b73d0d69d1a6bacc7206e32afe5c
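A downloaded wheel can be checked against the published SHA256 digest before installing it. The sketch below uses Python's standard `hashlib` and `hmac` modules; the helper names `file_sha256` and `verify_sha256` and the constant name are hypothetical, and the digest value is copied from the table above, so it applies only to the 0.1.16 wheel.

```python
import hashlib
import hmac

# Published SHA256 digest for pebblo-0.1.16-py3-none-any.whl, copied from the table above.
PEBBLO_0_1_16_SHA256 = "9d74a7b52222891415a2f74fb03fbb37408d3cfd70da45a10575354669604fd4"


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase hex SHA-256 of a file, read in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA-256 to a published hex digest, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(file_sha256(path), expected_hex.strip().lower())
```

After downloading the wheel into the current directory, `verify_sha256("pebblo-0.1.16-py3-none-any.whl", PEBBLO_0_1_16_SHA256)` should return True. The 0.1.13 wheel used in the curl example earlier is a different file, and its digest is not listed here.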
