cortecs-py
Lightweight wrapper for cortecs.ai enabling instant provisioning.
⚡ Instant provisioning
Dynamic provisioning allows you to run LLM workflows on dedicated compute. The LLM and the underlying resources are automatically provisioned for the duration of use, providing maximum cost efficiency. Once the workflow is complete, the infrastructure is automatically shut down.
This library starts and stops your resources. The workflow logic itself can be implemented with popular frameworks such as langchain or crewAI.
- Load (vast amounts of) data
- Start your LLM
- Execute your (batch) jobs
- Shutdown your LLM
from cortecs.client import Cortecs
from cortecs.langchain.dedicated_llm import DedicatedLLM

cortecs = Cortecs()

with DedicatedLLM(client=cortecs, model_name='neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8') as llm:
    essay = llm.invoke('Write an essay about dynamic provisioning')
    print(essay.content)
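DedicatedLLM is a context manager: entering the with block provisions the model on dedicated compute, and leaving it shuts the instance down again, so you only pay for the time the workflow actually runs.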
Getting started
Install
pip install cortecs-py
Summarizing documents
First, set the environment variables using your credentials from cortecs.ai.
export CORTECS_CLIENT_ID="<YOUR_ID>"
export CORTECS_CLIENT_SECRET="<YOUR_SECRET>"
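If you prefer to configure the credentials from Python instead of the shell, here is a minimal sketch using the standard library (assuming, as above, that the client reads these variables from the environment):

import os

# same credentials from cortecs.ai; setting them before the client is created
os.environ["CORTECS_CLIENT_ID"] = "<YOUR_ID>"
os.environ["CORTECS_CLIENT_SECRET"] = "<YOUR_SECRET>"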
This example shows how to use langchain to configure a simple summarization chain. The LLM is dynamically provisioned and the chain is executed in parallel.
from langchain_community.document_loaders import ArxivLoader
from langchain_core.prompts import ChatPromptTemplate

from cortecs.client import Cortecs
from cortecs.langchain.dedicated_llm import DedicatedLLM

cortecs = Cortecs(api_base_url='https://develop.cortecs.ai/api/v1')

loader = ArxivLoader(
    query="reasoning",
    load_max_docs=20,
    get_full_documents=True,
    doc_content_chars_max=25000,  # ~6.25k tokens, make sure the model supports that context length
    load_all_available_meta=False
)
prompt = ChatPromptTemplate.from_template("{text}\n\n Explain me like I'm five:")
docs = loader.load()
with DedicatedLLM(client=cortecs, model_name='neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8') as llm:
    chain = prompt | llm

    print("Processing data batch-wise ...")
    summaries = chain.batch([{"text": doc.page_content} for doc in docs])

    for summary in summaries:
        print(summary.content + '-------\n\n\n')
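Batching keeps the dedicated instance fully utilized. For a quick test on a single document you can invoke the same chain once instead; a minimal sketch (it must still run inside the with DedicatedLLM(...) block so the model is up):

# summarize only the first document; chain and docs are defined as above
summary = chain.invoke({"text": docs[0].page_content})
print(summary.content)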
This simple example showcases the power of dynamic provisioning. We summarized X input tokens into Y output tokens in Z minutes. The LLM can be fully utilized during those Z minutes, enabling better cost efficiency. Comparing the execution with cloud APIs from OpenAI and Meta shows the cost advantage.
TODO insert bar chart
Use Cases
- Batch processing
- Low latency -> How to process reddit in realtime
- Multi-agents -> How to use CrewAI without request limits
- High-security