This is a temporary project while I wait for my langchain [pull-request](https://github.com/langchain-ai/langchain/pull/7278) to be validated.

Langchain-RAG

Note: A pull-request with this code was proposed to langchain.

When splitting documents for retrieval, two goals often conflict:

You want chunks small enough that their embeddings accurately represent their meaning. If a chunk is too long, its embedding can lose that meaning.
You also want chunks long enough to retain the context around each passage.
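To make the trade-off concrete, here is a minimal sketch that splits the same text at two granularities. The `split_text` helper, the chunk sizes, and the filler document are assumptions for illustration only; they are not part of this package.

```python
def split_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character windows with optional overlap."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


document = "word " * 200  # 1000 characters of filler text (hypothetical input)

# Large chunks keep context; small chunks give more focused embeddings.
parent_chunks = split_text(document, chunk_size=400)

# Each small chunk remembers which large chunk it came from.
child_chunks = [
    (parent_id, child)
    for parent_id, parent in enumerate(parent_chunks)
    for child in split_text(parent, chunk_size=100, overlap=20)
]
```

Keeping the parent index next to each small chunk is what later allows a retriever to search on the small pieces and answer with the large ones.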

When you have many documents, and therefore many chunks, dozens of chunks are likely to lie at a similar distance from the question. Taking only the top 4 is then not a good idea: the answer may be in the sixth or seventh result.

How can we improve the match between the question and a fragment? By preparing several versions of the fragment, each with its own embedding. One of these versions can then be closer to the question than the original fragment. Such a version is stripped of context, however, and the context is still needed to answer the question correctly. One strategy is therefore to break each fragment down into different versions, and to have the retriever map any matching version back to the original fragment.
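The idea of indexing several versions of a fragment while answering with the original can be sketched as follows. This is a toy illustration under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, and the fragments, hand-written variants, and `retrieve` helper are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical original fragments, keyed by ID.
fragments = {
    "frag-1": "The warranty covers parts and labour for two years after the date of purchase",
    "frag-2": "Returns are accepted within thirty days with a receipt",
}

# Each index entry is (fragment_id, text): the original plus hand-written variants.
versions = [
    ("frag-1", fragments["frag-1"]),
    ("frag-1", "how long does the warranty last"),
    ("frag-2", fragments["frag-2"]),
    ("frag-2", "what is the return policy"),
]
index = [(fragment_id, embed(text)) for fragment_id, text in versions]


def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank all versions, then return the distinct ORIGINAL fragments behind the top k."""
    query = embed(question)
    ranked = sorted(index, key=lambda entry: cosine(query, entry[1]), reverse=True)
    seen, results = set(), []
    for fragment_id, _vector in ranked[:k]:
        if fragment_id not in seen:
            seen.add(fragment_id)
            results.append(fragments[fragment_id])
    return results


question = "how long is the warranty"
print(retrieve(question, k=1))
```

In this toy data the question-style variant scores higher than the original warranty fragment, yet the caller still receives the full original text with its context.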

The RAGVectorStore strikes a balance by splitting the data and storing both small chunks and different variations of them. During retrieval, it first fetches the small chunks, then looks up the parent IDs of those chunks and returns the larger documents.
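The parent lookup step can be sketched in a few lines. This is not the RAGVectorStore API: the `Hit` tuple, the `parents_for_hits` function, and the plain dictionary used as a document store are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A hypothetical search hit on a small chunk."""
    chunk_id: str
    parent_id: str
    score: float  # higher means closer to the question


def parents_for_hits(hits: list[Hit], docstore: dict[str, str], k: int) -> list[str]:
    """Collapse ranked chunk hits into at most k distinct parent documents."""
    parents: list[str] = []
    seen: set[str] = set()
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if len(parents) >= k:
            break
        if hit.parent_id in seen:
            continue  # several chunks of the same parent count only once
        seen.add(hit.parent_id)
        parents.append(docstore[hit.parent_id])
    return parents


docstore = {"A": "full text of document A", "B": "full text of document B"}
hits = [
    Hit("A/0", "A", 0.91),
    Hit("B/3", "B", 0.72),
    Hit("A/1", "A", 0.88),
]
top_parents = parents_for_hits(hits, docstore, k=2)
```

Because several hits often share one parent, a retriever built this way usually has to fetch more than k small chunks to end up with k distinct parents.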

The challenge lies in correctly managing the lifecycle of the three levels of documents:

Original documents
Chunks extracted from the original documents
Transformations of chunks to generate more vectors for improved retrieval

The RAGVectorStore, in combination with other components, is designed to address this challenge.
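To illustrate why the lifecycle is the hard part, here is a minimal sketch of bookkeeping for the three levels, in which deleting or replacing an original document cascades to its chunks and their transformations. The `ThreeLevelIndex` class is hypothetical and is not how this package is implemented; the lower-cased copy merely stands in for a real transformation such as a summary.

```python
from collections import defaultdict


class ThreeLevelIndex:
    """Hypothetical bookkeeping for documents -> chunks -> transformations."""

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}
        self.chunks: dict[str, tuple[str, str]] = {}    # chunk_id -> (doc_id, text)
        self.variants: dict[str, tuple[str, str]] = {}  # variant_id -> (chunk_id, text)
        self._chunks_of: defaultdict[str, set[str]] = defaultdict(set)
        self._variants_of: defaultdict[str, set[str]] = defaultdict(set)

    def add_document(self, doc_id: str, text: str, chunk_size: int = 50) -> None:
        # Re-adding a document first removes its stale chunks and variants.
        self.delete_document(doc_id)
        self.documents[doc_id] = text
        for n, start in enumerate(range(0, len(text), chunk_size)):
            chunk_id = f"{doc_id}/chunk-{n}"
            chunk = text[start:start + chunk_size]
            self.chunks[chunk_id] = (doc_id, chunk)
            self._chunks_of[doc_id].add(chunk_id)
            # Placeholder transformation: a lower-cased copy of the chunk.
            variant_id = f"{chunk_id}/lower"
            self.variants[variant_id] = (chunk_id, chunk.lower())
            self._variants_of[chunk_id].add(variant_id)

    def delete_document(self, doc_id: str) -> None:
        # Cascade: document -> its chunks -> their variants.
        for chunk_id in self._chunks_of.pop(doc_id, set()):
            for variant_id in self._variants_of.pop(chunk_id, set()):
                del self.variants[variant_id]
            del self.chunks[chunk_id]
        self.documents.pop(doc_id, None)


store = ThreeLevelIndex()
store.add_document("report", "Quarterly Results " * 10)
store.delete_document("report")  # leaves no orphaned chunks or variants
```

Without the reverse maps from document to chunks and from chunk to variants, a deleted or updated document would leave orphaned vectors behind that could still be returned by a search.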

Demo

Read this notebook, or run:

poetry run python -m ipykernel install --user --name langchain-rag
jupyter lab

Tips

poetry run python -m ipykernel install --user --name langchain-parent

Release history

Version	Release date
0.1.52	May 16, 2024
0.1.46	Apr 25, 2024 (this version)
0.1.45	Apr 23, 2024
0.1.33	Mar 18, 2024
0.1.27	Feb 27, 2024
0.1.18	Feb 7, 2024
0.1.17	Jan 31, 2024
0.1.10	Jan 15, 2024
0.1.1	Jan 12, 2024
0.1.0	Jan 11, 2024
0.0.355	Jan 8, 2024
0.0.354	Jan 3, 2024
0.0.353	Jan 3, 2024
0.0.352	Dec 26, 2023
0.0.341	Nov 28, 2023
0.0.339	Nov 22, 2023
0.0.330	Nov 6, 2023
0.0.316	Nov 3, 2023
0.0.0	Apr 23, 2024

Download files

Source Distribution

langchain_rag-0.1.46.tar.gz (27.6 kB)

Uploaded Apr 25, 2024 Source

Built Distribution

langchain_rag-0.1.46-py3-none-any.whl (35.4 kB)

Uploaded Apr 25, 2024 Python 3

Hashes for langchain_rag-0.1.46.tar.gz
Algorithm	Hash digest
SHA256	`8b0c8a76721b6e3bfaa5d9fe28d53616deb7f5b3e1babb27163b61e2d5fb942b`
MD5	`f6974b54a10fe829f14c1506b31ab4b7`
BLAKE2b-256	`098f562bfc8bdf30c0c3c807056516ca52e0d45160971dff960b929dca96d070`

Hashes for langchain_rag-0.1.46-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bf7fddfa51aed8ce3f2b52555ae31e721d7207aac1c7120811585cf91e7914d0`
MD5	`0e83412c5ea9b657d7f6e1b570c510a4`
BLAKE2b-256	`41290beda69b166ee4bd5212cbdced157364675f2abd6dbf3a46065201a79c72`
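
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the `sha256_of_file` and `matches` helpers are hypothetical names, and the file is assumed to be in the current directory.

```python
import hashlib
import hmac


def sha256_of_file(path: str, block_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in blocks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, `matches("langchain_rag-0.1.46.tar.gz", "8b0c8a76721b6e3bfaa5d9fe28d53616deb7f5b3e1babb27163b61e2d5fb942b")` should return `True` for an intact source distribution.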