
Cognee is a library for enriching LLM context with a semantic layer for better understanding and reasoning.


cognee

We build for developers who need a reliable, production-ready data layer for AI applications.


cognee implements scalable, modular data pipelines for building an LLM-enriched data layer on top of graph and vector stores.

cognee aims to be dbt for LLMOps.

Try it in a Google Colab notebook or have a look at our documentation.

If you have questions, join our Discord community.

📦 Installation

With pip

pip install cognee

With poetry

poetry add cognee

💻 Usage

Setup

import os

os.environ["LLM_API_KEY"] = "YOUR_OPENAI_API_KEY"

or

import cognee
cognee.config.llm_api_key = "YOUR_OPENAI_API_KEY"
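If you prefer not to hard-code the key, one option is to export LLM_API_KEY in your shell and check for it before calling cognee, so a missing key fails immediately rather than inside an LLM call. This is a minimal sketch under that assumption; `require_llm_api_key` is a hypothetical helper, not part of cognee:

```python
import os


def require_llm_api_key(env_var: str = "LLM_API_KEY") -> str:
    """Return the LLM API key from the environment or raise a clear error.

    Hypothetical helper, not part of cognee: it only checks that the
    environment variable is set and non-empty before any pipeline runs.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell before running cognee"
        )
    return key
```

Call `require_llm_api_key()` once at startup, before `cognee.add` or `cognee.cognify`.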

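You can use different LLM providers; for more information, check out our documentation.

Next, make sure a Postgres instance is running. Here is the service definition from our docker-compose file (the postgres_data volume and cognee-network network it references must also be declared at the top level of the compose file):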

  postgres:
    image: postgres:latest
    container_name: postgres
    environment:
      POSTGRES_USER: cognee
      POSTGRES_PASSWORD: cognee
      POSTGRES_DB: cognee_db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - 5432:5432
    networks:
      - cognee-network
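Because cognee needs the database before `cognee.add` runs, it can help to wait until the container accepts connections. This is a stdlib-only sketch, assuming Postgres is published on localhost:5432 as in the service definition above; `wait_for_port` is a hypothetical helper, not a cognee function, and it only proves that something is listening on the port, not that the credentials work:

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 5432, timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing accepts connections
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.5)  # back off briefly before retrying
```

For example, `wait_for_port()` right after `docker compose up postgres` returns True once the container is reachable, or False after 30 seconds.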

If you are using NetworkX, create an account on Graphistry to visualize the results:

cognee.config.set_graphistry_username = "YOUR_USERNAME"
cognee.config.set_graphistry_password = "YOUR_PASSWORD"

(Optional) To start the UI, run:

docker-compose up cognee

Then open localhost:3000/wizard in your browser.

Run the default example

Make sure to launch the Postgres instance first. Navigate to the cognee folder and run:

docker compose up postgres

Run the default cognee pipeline:

import asyncio
import cognee

async def main():
    text = """Natural language processing (NLP) is an interdisciplinary
           subfield of computer science and information retrieval"""

    await cognee.add([text], "example_dataset")  # Add a new piece of information

    await cognee.cognify()  # Use LLMs and cognee to create knowledge

    search_results = await cognee.search("SIMILARITY", {'query': 'Tell me about NLP'})  # Query cognee for the knowledge

    print(search_results)

# In a notebook (e.g. Colab) an event loop is already running, so use `await main()` there instead
asyncio.run(main())

Create your pipelines

The cognee framework consists of tasks that can be grouped into pipelines. Each task can be an independent piece of business logic that is chained with other tasks to form a pipeline. Here is how this looks for the default cognify pipeline:

  1. To prepare the data for the pipeline run, we first add it to our metastore and normalize it:

Start with:

docker compose up postgres

And then run:

import asyncio
import cognee

text = """Natural language processing (NLP) is an interdisciplinary
       subfield of computer science and information retrieval"""

asyncio.run(cognee.add([text], "example_dataset"))  # Add a new piece of information (in a notebook, use `await` instead)

  2. Next, we create a task. A task can contain any business logic we need; the important part is that it is encapsulated in a single function.

Here is an example of a naive LLM classifier task: it classifies each chunk with an LLM according to a Pydantic model, then stores the results in both the graph and vector stores. This is only a snippet for reference; the full implementation is in our repo.

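import asyncio
from typing import Type
from uuid import NAMESPACE_OID, uuid5

from pydantic import BaseModel

# DocumentChunk, extract_categories and get_vector_engine are cognee internals;
# see the full implementation in the cognee repo for their imports.

async def chunk_naive_llm_classifier(data_chunks: list[DocumentChunk], classification_model: Type[BaseModel]):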
    if len(data_chunks) == 0:
        return data_chunks

    chunk_classifications = await asyncio.gather(
        *[extract_categories(chunk.text, classification_model) for chunk in data_chunks],
    )

    classification_data_points = []

    for chunk_index, chunk in enumerate(data_chunks):
        chunk_classification = chunk_classifications[chunk_index]
        classification_data_points.append(uuid5(NAMESPACE_OID, chunk_classification.label.type))

        for classification_subclass in chunk_classification.label.subclass:
            classification_data_points.append(uuid5(NAMESPACE_OID, classification_subclass.value))

    vector_engine = get_vector_engine()

    class Keyword(BaseModel):
        uuid: str
        text: str
        chunk_id: str
        document_id: str

    collection_name = "classification"

    if await vector_engine.has_collection(collection_name):
        existing_data_points = await vector_engine.retrieve(
            collection_name,
            list(set(classification_data_points)),
        ) if len(classification_data_points) > 0 else []

        existing_points_map = {point.id: True for point in existing_data_points}
    else:
        existing_points_map = {}
        await vector_engine.create_collection(collection_name, payload_schema=Keyword)

    data_points = []
    nodes = []
    edges = []

    for (chunk_index, data_chunk) in enumerate(data_chunks):
        chunk_classification = chunk_classifications[chunk_index]
        classification_type_label = chunk_classification.label.type
        classification_type_id = uuid5(NAMESPACE_OID, classification_type_label)

...

To see the existing tasks, have a look at the cognee.tasks module.
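The classifier in step 2 derives each classification's vector-store ID with `uuid5(NAMESPACE_OID, label)`, so the same label always maps to the same ID, and it looks up existing IDs before writing. Here is a stdlib-only sketch of that idea, with a plain set standing in for the vector engine; `new_classification_ids` is a hypothetical name, not a cognee function:

```python
from uuid import NAMESPACE_OID, UUID, uuid5


def new_classification_ids(labels: list[str], existing_ids: set[UUID]) -> dict[str, UUID]:
    """Map each label to a deterministic UUID, skipping IDs that are already stored.

    uuid5 hashes the label within a fixed namespace, so re-running the task on
    the same labels yields the same IDs and nothing is inserted twice.
    """
    new_ids: dict[str, UUID] = {}
    for label in labels:
        label_id = uuid5(NAMESPACE_OID, label)
        if label_id in existing_ids or label in new_ids:
            continue  # already stored, or already seen in this batch
        new_ids[label] = label_id
    return new_ids


stored = {uuid5(NAMESPACE_OID, "Science")}
print(new_classification_ids(["Science", "Computer science", "Science"], stored))
```

Deterministic IDs are what make the task safe to re-run: only "Computer science" is new here, because the "Science" ID is already in the stored set.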

  3. Once we have our tasks, it is time to group them into a pipeline. This snippet shows how a group of tasks is added to a pipeline and how each task passes its output on to the next one:

# root_node_id, documents and cognee_config come from the surrounding cognify code
tasks = [
    Task(document_to_ontology, root_node_id = root_node_id),
    Task(source_documents_to_chunks, parent_node_id = root_node_id), # Classify documents, save them as nodes in the graph db, and extract text chunks based on the document type
    Task(chunk_to_graph_decomposition, topology_model = KnowledgeGraph, task_config = { "batch_size": 10 }), # Set the graph topology for the document chunk data
    Task(chunks_into_graph, graph_model = KnowledgeGraph, collection_name = "entities"), # Generate knowledge graphs from the document chunks and attach them to chunk nodes
    Task(chunk_update_check, collection_name = "chunks"), # Find all affected chunks, so we don't process unchanged chunks
    Task(
        save_chunks_to_store,
        collection_name = "chunks",
    ), # Save the document chunks in the vector db and as nodes in the graph db (connected to the document node and to each other)
    run_tasks_parallel([
        Task(
            chunk_extract_summary,
            summarization_model = cognee_config.summarization_model,
            collection_name = "chunk_summaries",
        ), # Summarize the document chunks
        Task(
            chunk_naive_llm_classifier,
            classification_model = cognee_config.classification_model,
        ), # Classify the document chunks
    ]),
    Task(chunk_remove_disconnected), # Remove the obsolete document chunks
]

pipeline = run_tasks(tasks, documents)

To see the working code, check the default pipeline in cognee.api.v1.cognify in our repo.
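To make the data flow concrete, here is a framework-free sketch of the same idea. It is not cognee's `run_tasks`: each step is an async function whose output becomes the next step's input, and a parallel group runs several functions concurrently on the same input. The names `run_pipeline`, `parallel` and the example tasks are hypothetical, and the way this sketch's parallel group returns its input unchanged is an assumption, not a description of `run_tasks_parallel`:

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable

# A step takes the previous step's output and returns its own, asynchronously.
Step = Callable[[Any], Awaitable[Any]]


def parallel(*steps: Step) -> Step:
    """Group steps so they run concurrently on the same input.

    Assumption for this sketch: the group passes its input through unchanged,
    since its members are used for side effects such as storing summaries.
    """

    async def run_group(data: Any) -> Any:
        await asyncio.gather(*(step(data) for step in steps))
        return data

    return run_group


async def run_pipeline(steps: list[Step], data: Any) -> Any:
    """Run steps in order, feeding each step's output into the next one."""
    for step in steps:
        data = await step(data)
    return data


# Hypothetical tasks. Each is a single async function; extra settings are bound
# up front with functools.partial, similar in spirit to Task(fn, **kwargs) above.
async def split_into_chunks(text: str, chunk_size: int) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


summaries: list[str] = []
index: dict[int, str] = {}


async def summarize(chunks: list[str]) -> list[str]:
    summaries.extend(chunk[:20] for chunk in chunks)  # stand-in for an LLM summary
    return chunks


async def index_chunks(chunks: list[str]) -> list[str]:
    index.update(enumerate(chunks))  # stand-in for writing to a vector store
    return chunks


async def count_words(chunks: list[str]) -> int:
    return sum(len(chunk.split()) for chunk in chunks)


tasks = [
    functools.partial(split_into_chunks, chunk_size=4),
    parallel(summarize, index_chunks),
    count_words,
]

text = "Natural language processing is an interdisciplinary subfield of computer science"
print(asyncio.run(run_pipeline(tasks, text)))  # prints 10
```

Keeping every task as one function with configuration bound ahead of time is what lets tasks be reordered or grouped without changing their code.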

Vector retrieval, Graphs and LLMs

Cognee supports a variety of tools and services for different operations:

  • Modular: Cognee is modular by nature, using tasks grouped into pipelines.

  • Local setup: By default, cognee runs locally with LanceDB, NetworkX and OpenAI.

  • Vector stores: In addition to LanceDB, cognee supports Qdrant and Weaviate for vector storage.

  • Language models (LLMs): In addition to OpenAI, you can use Anyscale or Ollama as your LLM provider.

  • Graph stores: In addition to NetworkX, Neo4j is also supported for graph storage.

  • User management: Create individual user graphs and manage permissions.

Demo

Check out our demo notebook here



Download files


Source Distribution

cognee-0.1.15.tar.gz (273.9 kB)

Uploaded Source

Built Distribution

cognee-0.1.15-py3-none-any.whl (352.7 kB)

Uploaded Python 3

File details

Details for the file cognee-0.1.15.tar.gz.

File metadata

  • Download URL: cognee-0.1.15.tar.gz
  • Upload date:
  • Size: 273.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.4 Darwin/23.5.0

File hashes

Hashes for cognee-0.1.15.tar.gz
Algorithm Hash digest
SHA256 4ca7935f099d122f364d19f5040046fdea170df3c39e972ddcd9a6118cd66e50
MD5 be33ba2e9f7b0c389cd1f72ca5d29391
BLAKE2b-256 91a4ec5c7b01849ef4188904f7d521d2a57bbaed1dabc0f8f759b7ba85a81e6c


File details

Details for the file cognee-0.1.15-py3-none-any.whl.

File metadata

  • Download URL: cognee-0.1.15-py3-none-any.whl
  • Upload date:
  • Size: 352.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.4 Darwin/23.5.0

File hashes

Hashes for cognee-0.1.15-py3-none-any.whl
Algorithm Hash digest
SHA256 f49186fa08845ed61b32497c0bcfb1a7e9220f9875e278872e8d6828cba2710b
MD5 17f958d51ff40033f8efa5562575c115
BLAKE2b-256 1affe4a936a3cf51eb26925a6c8077e954466fef8ec77d7ee9e9c9fa717bfe14
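To check a downloaded file against the SHA256 digests listed above, you can hash it locally with Python's standard hashlib module (pip's hash-checking mode can perform the same check at install time). This is a minimal sketch; the helper names are illustrative:

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 digest published above for the wheel
WHEEL_SHA256 = "f49186fa08845ed61b32497c0bcfb1a7e9220f9875e278872e8d6828cba2710b"
```

For example, `matches_published_hash("cognee-0.1.15-py3-none-any.whl", WHEEL_SHA256)` should return True for an untampered download in the current directory.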

