A lightweight, production-ready service layer for modular, rate-aware LLM integrations

These details have not been verified by PyPI

Project description

A clean, production-ready service layer that centralizes prompts, invocations, and post-processing, ensuring rate-aware, maintainable, and scalable LLM logic in your application.


Installation

Install LLMService via pip:

pip install llmservice

Table of Contents

- Installation
- What makes it unique?
- Main Features
- Architecture
- Usage
- Postprocessing Pipeline
- Async Support
  - Translating a 100-page book (chunked into pieces)

What makes it unique?

Feature	LLMService	LangChain
Result Handling	Returns a single `GenerationResult` dataclass encapsulating success/failure, rich metadata (tokens, cost, latency), and pipeline outcomes	Composes chains of tools and agents; success/failure handling is dispersed via callbacks and exceptions
Rate-Limit & Throughput Control	Built-in sliding-window RPM/TPM counters and an adjustable semaphore for concurrency, automatically pausing when you hit your API quota	Relies on external throttlers or underlying client logic; no native RPM/TPM management
Cost Monitoring	Automatic per-model token-level cost calculation and aggregated usage stats for real-time billing insights	No built-in cost monitoring—you must implement your own wrappers or middleware
Post-Processing Pipelines	Declarative configs for JSON parsing, semantic extraction, validation, and transformation without ad-hoc parsing code	Encourages embedding output parsers inside chains or writing ad-hoc post-chain functions, scattering parsing logic
Dependencies	Minimal footprint: only Tenacity, your LLM client, and optionally YAML for prompts	Broad ecosystem: agents, retrievers, vector stores, callback managers, and other heavy dependencies
Extensibility	Provides a clear `BaseLLMService` subclassing interface so you encapsulate each business operation and never call the engine directly	You wire together chains or agents at call-site, mixing business logic with prompt orchestration

LLMService delivers a well-structured alternative to more monolithic frameworks like LangChain.

"LangChain isn't a library, it's a collection of demos held together by duct tape, fstrings, and prayers."

Main Features

Minimal Footprint & Low Coupling
Designed for dependency injection—your application code never needs to know about LLM logic.
Result Monad Pattern
Returns a GenerationResult dataclass for every invocation, encapsulating success/failure status, raw and processed outputs, error details, retry information, and per-step results—giving you full control over custom workflows.
Declarative Post-Processing Pipelines
Chain semantic extraction, JSON parsing, string validation, and more via simple, declarative configurations.
Rate-Limit-Aware Asynchronous Requests
Dynamically queue and scale workers based on real-time RPM/TPM metrics to maximize throughput without exceeding API quotas.
Transparent Cost & Usage Monitoring
Automatically track input/output tokens and compute per-model cost, exposing detailed metadata with each response.
Automated Retry & Exponential Backoff
Handle transient errors (rate limits, network hiccups) with configurable retries and exponential backoff powered by Tenacity.
Custom Exception Handling
Provide clear, operation-specific fallbacks (e.g., insufficient quota, unsupported region) for graceful degradation.
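
To make the result-object and retry features above more concrete, here is a minimal, standard-library-only sketch. It is not LLMService's implementation: `SketchResult`, `call_with_backoff`, and `make_flaky` are hypothetical names invented for illustration, and the hand-written loop stands in for the Tenacity-based retries the library uses. The point is only that the caller always gets one object back, whether the call succeeded or every retry failed.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SketchResult:
    """Hypothetical result object; not the library's GenerationResult."""
    success: bool
    content: Optional[str] = None
    error: Optional[str] = None
    attempts: int = 0
    delays: List[float] = field(default_factory=list)  # back-off waits in seconds


def call_with_backoff(
    fn: Callable[[], str],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> SketchResult:
    """Call fn, retrying on exceptions with exponentially growing waits."""
    result = SketchResult(success=False)
    for attempt in range(1, max_attempts + 1):
        result.attempts = attempt
        try:
            result.content = fn()
            result.success = True
            result.error = None
            return result
        except Exception as exc:  # a real service would retry only transient errors
            result.error = f"{type(exc).__name__}: {exc}"
            if attempt < max_attempts:
                delay = base_delay * 2 ** (attempt - 1)  # 0.5, 1.0, 2.0, ...
                result.delays.append(delay)
                sleep(delay)
    return result  # all attempts failed; the error is carried in the result


def make_flaky(failures: int) -> Callable[[], str]:
    """Build a fake LLM call that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TimeoutError("simulated network hiccup")
        return "Salve, quid agis?"

    return flaky
```

Calling `call_with_backoff(make_flaky(2), sleep=lambda _: None)` exercises two failures followed by a success without actually waiting; the caller then branches on `result.success` instead of wrapping every call site in `try`/`except`.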

Architecture

LLMService provides an abstract BaseLLMService class to guide users in implementing their own service layers. It includes llmhandler, which manages interactions with different LLM providers, and generation_engine, which handles prompt crafting, LLM invocation, and post-processing.

LLMService Architecture

schemas

Usage

Step 0: Config & Installation

Put your OPENAI_API_KEY inside .env file
Install LLMService via pip:

pip install llmservice
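
If you want to confirm the key is visible before making any request, a small check like the one below may help. This is only a sketch under assumptions: how LLMService itself reads the `.env` file is not described here, and `load_env_file` and `require_api_key` are hypothetical helpers written with the standard library (many projects use a dedicated dotenv package instead).

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Read KEY=VALUE lines from a .env-style file into os.environ.

    Variables that are already set in the environment are not overwritten.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key, or fail with a clear message before any request is sent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```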

Step 1: Subclass `BaseLLMService` and create methods

Create a new Python file (e.g., myllmservice.py) and extend the BaseLLMService class. Define every LLM-backed operation your business logic needs as a method on this class.

# myllmservice.py
# BaseLLMService, GenerationRequest, and GenerationResult are provided by the llmservice package.

class MyLLMService(BaseLLMService):
    def translate_to_latin(self, input_paragraph: str) -> GenerationResult:
        my_prompt = f"translate this text to latin {input_paragraph}"

        generation_request = GenerationRequest(
            formatted_prompt=my_prompt,
            model="gpt-4o",
        )

        # Execute the generation synchronously
        generation_result = self.execute_generation(generation_request)
        return generation_result

Step 2: Import your llm layer and use the methods

# in your app.py
from myllmservice import MyLLMService

if __name__ == '__main__':
    service = MyLLMService()
    result = service.translate_to_latin("Hello, how are you?")
    print(result)
    
    # Here, result is a GenerationResult object that includes all the information you need.

Step 3: Some simple fact

Don't forget to live your life, man. Remember: all code is legacy the moment it is written.

Postprocessing Pipeline

LLMService ships with five built-in postprocessing methods. They cover the most commonly needed cases, so they are supported natively. Three of them are described below.

Method 1: Semantic Isolation

Use the SemanticIsolator step whenever you need to extract a specific semantic element (for example, a code snippet, a name, or any targeted fragment) from an LLM’s output.

For example, imagine you asked an LLM to write a SQL snippet and it returned:

Here is your answer:
SELECT * FROM users;
Do you need anything else?

Suppose you plan to pass the output directly to your database connection. You can't run it as-is, because it contains extra text such as "Here is your answer:".

This postprocessing step is useful in exactly such scenarios, where you need only the pure semantic element.

Here is sample usage for the example above:

# in your myllmservice.py

    def create_sql_code(self, user_question: str, database_desc) -> GenerationResult:
        formatted_prompt = f"""Here is my database description: {database_desc},
                            and here is what the user wants to learn: {user_question}.
                            I want you to generate a SQL query. The answer should contain only SQL code."""

        pipeline_config = [
            {
                'type': 'SemanticIsolation',
                'params': {
                    'semantic_element_for_extraction': 'SQL code'
                }
            }
        ]

        generation_request = GenerationRequest(
            formatted_prompt=formatted_prompt,
            model="gpt-4o",
            pipeline_config=pipeline_config,
        )

        generation_result = self.execute_generation(generation_request)
        return generation_result

The SemanticIsolator postprocessing step fixes this by running a second query that extracts only the semantic element you provided (in this case SQL code).

Method 2: ConvertToDict

When you ask an LLM to output a JSON-like response, you typically convert it into a dictionary (for example, using json.loads()). However, if the output is missing quotes or otherwise isn’t strictly valid JSON, json.loads() will fail. ConvertToDict leverages the string2dict module to handle these edge cases—even with missing quotes or minor formatting issues, it can parse the string into a proper Python dict.

Below are some LLM outputs where json.loads() fails but ConvertToDict succeeds:

sample_1:

  '{\n    "key": "SELECT DATE_FORMAT(bills.bill_date, \'%Y-%m\') AS month, SUM(bills.total) AS total_spending FROM bills WHERE YEAR(bills.bill_date) = 2023 GROUP BY DATE_FORMAT(bills.bill_date, \'%Y-%m\') ORDER BY month;"\n}'

sample_2:

  "{\n    'key': 'SELECT DATE_FORMAT(bill_date, \\'%Y-%m\\') AS month, SUM(total) AS total_spendings FROM bills WHERE YEAR(bill_date) = 2023 GROUP BY month ORDER BY month;'\n}"

sample_3:

  '{   \'key\': "https://dfasdfasfer.vercel.app/"}'

Usage:

pipeline_config = [
    {
        'type': 'ConvertToDict',
        'params': {}
    }
]
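
The idea of tolerant parsing can be approximated with the standard library alone. The sketch below is not how ConvertToDict or string2dict work internally (that is not shown here); `lenient_to_dict` is a hypothetical helper that tries strict JSON first and then falls back to Python-literal syntax, which is enough to accept single-quoted keys and values.

```python
import ast
import json
from typing import Optional


def lenient_to_dict(text: str) -> Optional[dict]:
    """Try strict JSON first, then fall back to Python-literal syntax."""
    candidate = text.strip()
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        try:
            # Accepts single-quoted keys and values, which strict JSON rejects.
            parsed = ast.literal_eval(candidate)
        except (ValueError, SyntaxError):
            return None
    # Only dict results count; lists, numbers, and strings are rejected.
    return parsed if isinstance(parsed, dict) else None
```

Returning `None` rather than raising keeps the caller's failure handling in one place; a production parser would likely need to cover more malformed shapes than this fallback does.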

Method 3: ExtractValue

Use this pipeline step with the ConvertToDict method to extract a single field from a JSON-like response. Simply specify the field name as a parameter.

For example, if your LLM returns:

{"answer": "<LLM-generated answer>"}

add the following to your pipeline config:

{
    'type': 'ExtractValue',
    'params': {'key': 'answer'}
}

This configuration first ensures the output is parsed into a Python dict, then automatically returns the value associated with "answer".

Using Pipeline Methods Together

A common scenario is to chain multiple pipeline steps to extract a specific value from an LLM response:

SemanticIsolation
Extracts the JSON-like snippet from a larger text response.
ConvertToDict
Normalizes that snippet into a Python dict, even if it isn’t strictly valid JSON.
ExtractValue
Retrieves the value associated with a given key from the dictionary.

pipeline_config = [
    {
        'type': 'SemanticIsolation',
        'params': {'semantic_element_for_extraction': 'SQL code'}
    },
    {
        'type': 'ConvertToDict',
        'params': {}
    },
    {
        'type': 'ExtractValue',
        'params': {'key': 'answer'}
    }
]
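
One way to picture how a declarative config like this could be executed is a registry that maps each step's `type` string to a function and applies the steps in order. The sketch below is an assumption-laden illustration, not the library's engine: `STEP_REGISTRY`, `run_pipeline`, and the two step functions are hypothetical, and SemanticIsolation is left out because it needs a second LLM call.

```python
import ast
import json
from typing import Any, Callable, Dict, List


def convert_to_dict(value: Any, params: Dict[str, Any]) -> dict:
    """Parse a string into a dict: strict JSON first, Python literals second."""
    try:
        parsed = json.loads(value)
    except json.JSONDecodeError:
        parsed = ast.literal_eval(value)
    if not isinstance(parsed, dict):
        raise ValueError("parsed value is not a dict")
    return parsed


def extract_value(value: Any, params: Dict[str, Any]) -> Any:
    """Return the entry stored under params['key']."""
    return value[params["key"]]


# Maps the 'type' string of a step to the function that runs it.
STEP_REGISTRY: Dict[str, Callable[[Any, Dict[str, Any]], Any]] = {
    "ConvertToDict": convert_to_dict,
    "ExtractValue": extract_value,
}


def run_pipeline(raw_output: str, pipeline_config: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Apply each configured step in order, stopping at the first failure."""
    current: Any = raw_output
    step_log = []
    for step in pipeline_config:
        step_type = step["type"]
        try:
            current = STEP_REGISTRY[step_type](current, step.get("params", {}))
            step_log.append((step_type, True))
        except Exception as exc:  # unknown step types and step errors both land here
            step_log.append((step_type, False))
            return {"success": False, "output": None, "steps": step_log, "error": repr(exc)}
    return {"success": True, "output": current, "steps": step_log, "error": None}
```

Keeping a per-step log mirrors the idea of per-step results described under Main Features: when a pipeline fails, you can see which step broke rather than only that the final output is missing.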

Async Support

LLMService includes first-class asynchronous methods with built-in rate and concurrency controls. You can configure max_rpm, max_tpm, and max_concurrent_requests (which indirectly governs TPM over the same window). Here’s an example for your myllmservice.py:


class MyLLMService(BaseLLMService):
    def __init__(self):
        super().__init__(default_model_name="gpt-4o-mini")

        self.set_rate_limits(max_rpm=120, max_tpm=10_000)
        self.set_concurrency(100)

    async def translate_to_latin_async(self, input_paragraph: str) -> GenerationResult:
        my_prompt = f"translate this to latin {input_paragraph}"

        generation_request = GenerationRequest(
            formatted_prompt=my_prompt,
            model="gpt-4o-mini",
            operation_name="translate_to_latin",
        )

        generation_result = await self.execute_generation_async(generation_request)
        return generation_result
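
For intuition about what sliding-window RPM/TPM limits mean, the following standalone sketch tracks requests and tokens over the trailing 60 seconds. It is a simplified illustration under assumptions, not the code behind `set_rate_limits`: the `SlidingWindowLimiter` class, its method names, and the injectable `clock` are all hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimiter:
    """Track requests and tokens over the trailing `window` seconds."""

    def __init__(self, max_rpm: int, max_tpm: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_rpm = max_rpm
        self.max_tpm = max_tpm
        self.window = window
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def _evict_expired(self, now: float) -> None:
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window:
            self.events.popleft()

    def seconds_until_allowed(self, tokens: int) -> float:
        """Return 0.0 if a request of `tokens` fits now, else a wait in seconds."""
        now = self.clock()
        self._evict_expired(now)
        used_tokens = sum(used for _, used in self.events)
        fits_rpm = len(self.events) < self.max_rpm
        fits_tpm = used_tokens + tokens <= self.max_tpm
        if fits_rpm and fits_tpm:
            return 0.0
        if not self.events:
            raise ValueError("request can never fit within the configured limits")
        # Wait until the oldest event leaves the window; the caller then re-checks.
        return self.window - (now - self.events[0][0])

    def record(self, tokens: int) -> None:
        """Register a request that was just sent."""
        self.events.append((self.clock(), tokens))
```

A caller would loop on `seconds_until_allowed`, sleep for the returned duration when it is positive, and call `record` once the request is actually sent. Passing a fake `clock` makes the behavior easy to test without real waiting.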

Translating a 100-page book with various configs

For this experiment we use a text that is already chunked into pieces.

Model Name	Method	Max Concurrency	Max RPM	Max TPM
gpt-4o-mini	sync	–	–	–
gpt-4o-mini	async	10	100	10000
gpt-4o-mini	async	50	100	10000
gpt-4o-mini	async	100	150	20000
gpt-4o-mini	async	200	300	30000
gpt-4o	sync	–
gpt-4o	async
gpt-4.1-nano	sync
gpt-4.1-nano	async
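
The async rows in the table differ mainly in how many chunk translations are allowed in flight at once. As a rough sketch of that pattern (not the benchmark code, which is not shown here), the example below bounds concurrency with `asyncio.Semaphore`. `fake_translate` is a hypothetical stand-in for an awaitable call such as `translate_to_latin_async`; it only simulates latency.

```python
import asyncio
from typing import List


async def fake_translate(chunk: str) -> str:
    """Stand-in for an awaitable LLM call; it only simulates latency."""
    await asyncio.sleep(0.01)
    return chunk.upper()


async def translate_chunks(chunks: List[str], max_concurrency: int) -> List[str]:
    """Translate all chunks, keeping at most `max_concurrency` calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(chunk: str) -> str:
        async with semaphore:
            return await fake_translate(chunk)

    # gather preserves input order, so results line up with the chunks.
    results = await asyncio.gather(*(bounded(c) for c in chunks))
    return list(results)
```

You would run it with something like `asyncio.run(translate_chunks(chunks, max_concurrency=10))`. Because results come back in chunk order, the translated book can be reassembled by simple concatenation.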

Project details

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language

Release history

Version	Release date	Notes
0.3.1	Nov 30, 2025
0.3.0	Sep 16, 2025
0.2.7.2	Aug 2, 2025
0.2.7.1	Aug 1, 2025
0.2.7.0	Jul 31, 2025
0.2.6.2	Jun 12, 2025
0.2.6.1	Jun 12, 2025
0.2.6	Jun 12, 2025
0.2.5.5	Jun 4, 2025
0.2.5.1	Jun 4, 2025
0.2.5	Jun 4, 2025
0.2.4	Jun 3, 2025
0.2.2	May 24, 2025	This version
0.2.0	May 23, 2025
0.1.9	May 12, 2025
0.1.8	May 12, 2025
0.1.7	May 9, 2025
0.1.6	May 9, 2025
0.1.5	Dec 31, 2024
0.1.4	Oct 22, 2024
0.1.3	Oct 21, 2024
0.1.2	Oct 17, 2024
0.1.1	Oct 17, 2024	yanked
0.1.0	Oct 16, 2024	yanked
0.0.9	Oct 16, 2024	yanked
0.0.8	Oct 13, 2024	yanked
0.0.7	Oct 12, 2024	yanked
0.0.5	Oct 12, 2024	yanked
0.0.4	Oct 11, 2024	yanked
0.0.3	Oct 11, 2024	yanked

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llmservice-0.2.2.tar.gz (25.3 kB view details)

Uploaded May 24, 2025 Source

Built Distribution

llmservice-0.2.2-py3-none-any.whl (22.7 kB view details)

Uploaded May 24, 2025 Python 3

File details

Details for the file llmservice-0.2.2.tar.gz.

File metadata

Download URL: llmservice-0.2.2.tar.gz
Upload date: May 24, 2025
Size: 25.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for llmservice-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`5cb663494c0cb7fb6ad323f3fe61324431b79f43edaa32ffa2a59ca39686eaed`
MD5	`540be14c85f35da8fdb48e4924623100`
BLAKE2b-256	`71a82989154dba5556c94221134123b6d5bbad5d92f183d8834a6c612b18d876`

File details

Details for the file llmservice-0.2.2-py3-none-any.whl.

File metadata

Download URL: llmservice-0.2.2-py3-none-any.whl
Upload date: May 24, 2025
Size: 22.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for llmservice-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5d8360bb6e66b4a07b2d80edd5c6dd60d0b241a717443eea06d97b357a9f80ac`
MD5	`387579738324b5350a58bd5f02adf439`
BLAKE2b-256	`9bf46fe584e89b75383a01518972f88afa420a0ed6ac156c6c81d5d6c0cde590`
