CloneLLM

Create an AI clone of yourself using LLMs.


Introduction

A minimal Python package that enables you to create an AI clone of yourself using LLMs. Built on top of LiteLLM and LangChain, CloneLLM uses Retrieval-Augmented Generation (RAG) to tailor AI responses as if you were answering the questions yourself.

You can input texts and documents about yourself — including personal information, professional experience, educational background, etc. — which are then embedded into a vector space for dynamic retrieval. This AI clone can act as a virtual assistant or digital representation, capable of handling queries and tasks in a manner that reflects your own knowledge, tone, style, and mannerisms.

Installation

Prerequisites

Before installing CloneLLM, make sure you have Python 3.9 or newer installed on your machine.

PyPI

pip install clonellm

Poetry

poetry add clonellm

GitHub

# Clone the repository
git clone https://github.com/msamsami/clonellm.git

# Navigate into the project directory
cd clonellm

# Install the package
pip install .

Usage

Getting started

Step 1. Gather documents that contain relevant information about you. These documents form the base from which your AI clone will learn to mimic your tone, style, and expertise.

from langchain_core.documents import Document

documents = [
    Document(page_content="My name is Mehdi Samsami."),
    open("cv.txt", "r").read(),  # plain strings are accepted alongside Document objects
]

Step 2. Initialize an embedding model using CloneLLM's LiteLLMEmbeddings or LangChain's embeddings. Then, initialize the clone with your documents, embedding model, and preferred LLM.

from clonellm import CloneLLM, LiteLLMEmbeddings

embedding = LiteLLMEmbeddings(model="text-embedding-ada-002")
clone = CloneLLM(model="gpt-4-turbo", documents=documents, embedding=embedding)

Step 3. Set the environment variables that store the API keys for your embedding and LLM providers.

export OPENAI_API_KEY=sk-...
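
Equivalently, you can set the key from within Python before initializing the clone, as the later examples in this README do:

import os

os.environ["OPENAI_API_KEY"] = "openai-api-key"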

Step 4. Fit the clone to the data (documents).

clone.fit()

Step 5. Invoke the clone to ask questions.

clone.invoke("What's your name?")

# Response: My name is Mehdi Samsami. How can I help you?

Models

At its core, CloneLLM uses LiteLLM to interact with various LLMs. As a result, you can choose from the 100+ LLMs offered by the providers LiteLLM supports, including Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, SageMaker, HuggingFace, Replicate, etc.
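
For instance, switching providers is just a matter of passing a different LiteLLM model identifier when initializing the clone. A minimal sketch, assuming the relevant API keys are set and that documents and embedding are defined as above (the second identifier follows LiteLLM's "ollama/<model>" naming convention):

from clonellm import CloneLLM

# An Anthropic-hosted model
clone = CloneLLM(model="claude-3-opus-20240229", documents=documents, embedding=embedding)

# A local model served through Ollama
clone = CloneLLM(model="ollama/llama2", documents=documents, embedding=embedding)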

Document loaders

You can use LangChain's document loaders to seamlessly import data from various sources into Document format. Take, for example, the text and HTML loaders:

# !pip install unstructured
from langchain_community.document_loaders import TextLoader, UnstructuredHTMLLoader

documents = TextLoader("cv.txt").load() + UnstructuredHTMLLoader("linkedin.html").load()

Or JSON loader:

# !pip install jq
from langchain_community.document_loaders import JSONLoader

documents = JSONLoader(
    file_path='chat.json',
    jq_schema='.messages[].content',
    text_content=False
).load()

Embeddings

With LiteLLMEmbeddings, CloneLLM allows you to utilize embedding models from a variety of providers supported by LiteLLM. Additionally, you can select any preferred embedding model from LangChain's extensive range. Take, for example, the Hugging Face embedding:

# !pip install --upgrade --quiet sentence_transformers
from langchain_community.embeddings import HuggingFaceEmbeddings
from clonellm import CloneLLM
import os

os.environ["COHERE_API_KEY"] = "cohere-api-key"

embedding = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
clone = CloneLLM(model="command-xlarge-beta", documents=documents, embedding=embedding)

Or, the Llama-cpp embedding:

# !pip install --upgrade --quiet llama-cpp-python
from langchain_community.embeddings import LlamaCppEmbeddings
from clonellm import CloneLLM
import os

os.environ["OPENAI_API_KEY"] = "openai-api-key"

embedding = LlamaCppEmbeddings(model_path="ggml-model-q4_0.bin")
clone = CloneLLM(model="gpt-3.5-turbo", documents=documents, embedding=embedding)

User profile

Create a personalized profile using CloneLLM's UserProfile, which allows you to feed detailed personal information into your clone for more customized interactions:

from clonellm import UserProfile

profile = UserProfile(
    first_name="Mehdi",
    last_name="Samsami",
    city="Shiraz",
    country="Iran",
    expertise=["Data Science", "AI/ML", "Data Analytics"],
)

Or simply define your profile using Python dictionaries:

profile = {
    "full_name": "Mehdi Samsami",
    "age": 28,
    "location": "Shiraz, Iran",
    "expertise": ["Data Science", "AI/ML", "Data Analytics"],
}

Finally, pass your profile to the clone:

from clonellm import CloneLLM
import os

os.environ["ANTHROPIC_API_KEY"] = "anthropic-api-key"

clone = CloneLLM(
    model="claude-3-opus-20240229",
    documents=documents,
    embedding=embedding,
    user_profile=profile,
)
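
After fitting, the clone can draw on the profile fields as well as your documents. A minimal sketch; the response shown is illustrative:

clone.fit()

clone.invoke("Where are you based?")
# Illustrative response: I'm based in Shiraz, Iran.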

Conversation history (memory)

Enable the memory feature to allow your clone to retain a history of past interactions. This "memory" helps the clone deliver contextually aware responses by referencing previous dialogue. Simply set memory to True when initializing the clone:

from clonellm import CloneLLM
import os

os.environ["HUGGINGFACE_API_KEY"] = "huggingface-api-key"

clone = CloneLLM(
    model="meta-llama/Llama-2-70b-chat",
    documents=documents,
    embedding=embedding,
    memory=True,
)
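
With memory enabled, the clone can resolve references to earlier turns in the conversation. A minimal sketch; the responses shown are illustrative:

clone.fit()

clone.invoke("What's your name?")
# Illustrative response: My name is Mehdi Samsami.

clone.invoke("What was my previous question?")
# Illustrative response: You asked me what my name is.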

Streaming

CloneLLM supports streaming responses from the LLM, allowing for real-time processing of text as it is being generated, rather than receiving the whole output at once.

from clonellm import CloneLLM, LiteLLMEmbeddings
import os

os.environ["VERTEXAI_PROJECT"] = "hardy-device-28813"
os.environ["VERTEXAI_LOCATION"] = "us-central1"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your/credentials.json"

embedding = LiteLLMEmbeddings(model="textembedding-gecko@001")
clone = CloneLLM(model="gemini-1.0-pro", documents=documents, embedding=embedding)

for chunk in clone.stream("Describe yourself in 100 words"):
    print(chunk, end="", flush=True)

Async

CloneLLM provides asynchronous counterparts to its core methods, afit, ainvoke, and astream, so the clone can be used in asynchronous code without blocking.

ainvoke

import asyncio
from clonellm import CloneLLM, LiteLLMEmbeddings
from langchain_core.documents import Document
import os

os.environ["OPENAI_API_KEY"] = "openai-api-key"

async def main():
    documents = [...]
    embedding = LiteLLMEmbeddings(model="text-embedding-ada-002")
    clone = CloneLLM(model="gpt-4o", documents=documents, embedding=embedding)
    await clone.afit()
    response = await clone.ainvoke("Tell me about your skills?")
    return response

response = asyncio.run(main())
print(response)

astream

import asyncio
from clonellm import CloneLLM, LiteLLMEmbeddings
from langchain_core.documents import Document
import os

os.environ["OPENAI_API_KEY"] = "openai-api-key"

async def main():
    documents = [...]
    embedding = LiteLLMEmbeddings(model="text-embedding-3-small")
    clone = CloneLLM(model="gpt-4o", documents=documents, embedding=embedding)
    await clone.afit()
    async for chunk in clone.astream("How comfortable are you with remote work?"):
        print(chunk, end="", flush=True)

asyncio.run(main())

Support Us

If you find CloneLLM useful, please consider showing your support in one of the following ways:

  • ⭐ Star our GitHub repository: This helps increase the visibility of our project.
  • 💡 Contribute: Submit pull requests to help improve the codebase, whether it's adding new features, fixing bugs, or improving documentation.
  • 📰 Share: Post about CloneLLM on LinkedIn or other social platforms.

Thank you for your interest in CloneLLM. We look forward to seeing what you'll create with your AI clone!

TODO

  • Add pre-commit configuration file
  • Add setup.py script
  • Add support for conversation history
  • Add support for RAG with no embedding (ingest the entire context into the prompt)
  • Add support for string documents
  • Fix mypy errors
  • Rename completion methods to invoke
  • Add support for streaming completion
  • Add support for custom system prompts
  • Add an attribute to return supported models
  • Add initial version of README
  • Add documents
  • Add usage examples
  • Add initial unit tests
  • Add GitHub workflow to run tests on PR
  • Add GitHub workflow to publish to PyPI on release
