Skip to main content

A Python library for parsing LLM conversation JSON files into RAG-optimized format

Project description

LLM Conversation Parser

A Python library for parsing LLM conversation JSON files into RAG-optimized format.

Documentation

Supported LLMs

  • Claude (Anthropic)
  • ChatGPT (OpenAI)
  • Grok (xAI)

Installation

pip install llm-conversation-parser

Quick Start

from llm_conversation_parser import LLMConversationParser

# Initialize parser
parser = LLMConversationParser()

# Parse single file (auto-detect LLM type)
data = parser.parse_file("claude_conversations.json")
print(f"Parsed {len(data)} conversations")

# Parse multiple files
all_data = parser.parse_multiple_files([
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
])

# Save parsed data
parser.save_parsed_data_by_llm(all_data, "parsed_data")

Key Features

  • Automatic LLM Detection: Analyzes JSON structure to determine LLM type
  • Unified Output Format: Converts all LLM formats to standardized RAG-optimized structure
  • Batch Processing: Process multiple files at once
  • Error Handling: Robust error handling with detailed error messages
  • Zero Dependencies: Uses only Python standard library
  • CLI Support: Command-line interface included

Output Format

[
  {
    "id": "message_uuid",
    "content": {
      "user_query": "User's question",
      "conversation_flow": "[AI_ANSWER] Previous AI response\n[USER_QUESTION] User's question"
    },
    "metadata": {
      "previous_ai_answer": "Previous AI response or null",
      "conversation_id": "conversation_uuid"
    }
  }
]

Usage Examples

1. Automatic LLM Type Detection

from llm_conversation_parser import LLMConversationParser

parser = LLMConversationParser()

# Automatically detect LLM type based on JSON structure
claude_data = parser.parse_file("my_conversations.json")  # Auto-detected as Claude
gpt_data = parser.parse_file("chat_history.json")       # Auto-detected as ChatGPT
grok_data = parser.parse_file("ai_chat.json")          # Auto-detected as Grok

2. Explicit LLM Type Specification

# Specify LLM type explicitly
claude_data = parser.parse_file("conversations.json", "claude")
gpt_data = parser.parse_file("conversations.json", "gpt")
grok_data = parser.parse_file("conversations.json", "grok")

3. Batch Processing

# Process multiple files at once
files = [
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
]

# Process all files with auto-detection
data_by_llm = parser.parse_multiple_files(files)

# Check results
for llm_type, conversations in data_by_llm.items():
    print(f"{llm_type}: {len(conversations)} conversations")

# Save by LLM type
parser.save_parsed_data_by_llm(data_by_llm, "parsed_data")

4. RAG Data Utilization

# Use parsed data for RAG systems
for conversation in data:
    message_id = conversation["id"]
    user_query = conversation["content"]["user_query"]
    conversation_flow = conversation["content"]["conversation_flow"]

    # Extract text for vectorization
    rag_text = f"{user_query}\n{conversation_flow}"

    # Store in vector database
    # vector_db.add_document(message_id, rag_text)

Command Line Interface

# Parse single file
llm-conversation-parser parse input.json

# Parse multiple files
llm-conversation-parser parse file1.json file2.json --output parsed_data/

# Auto-detect LLM type
llm-conversation-parser parse conversations.json

# Specify LLM type explicitly
llm-conversation-parser parse conversations.json --llm-type claude

License

MIT License

Changelog

See CHANGELOG.md for detailed changelog.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

llm_conversation_parser-1.0.1.tar.gz (21.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

llm_conversation_parser-1.0.1-py3-none-any.whl (10.0 kB view details)

Uploaded Python 3

File details

Details for the file llm_conversation_parser-1.0.1.tar.gz.

File metadata

  • Download URL: llm_conversation_parser-1.0.1.tar.gz
  • Upload date:
  • Size: 21.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for llm_conversation_parser-1.0.1.tar.gz
Algorithm Hash digest
SHA256 f03dad5941413d7a0a1d351697153801c1913b2acc86cfe868e0314bca007e85
MD5 aada46395d468f20c8b2e649ca360bbf
BLAKE2b-256 d3cf85e150612fd68d52106ebc5b1718b5a3bb3f83fd521f362c7a42a11ca506

See more details on using hashes here.

File details

Details for the file llm_conversation_parser-1.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for llm_conversation_parser-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 35540fa1fa0caba7efd9dd3c5f41071412a9b5c42ab558aee7136066ef04dc27
MD5 610491b86ecac1e72672b967fd9cea8f
BLAKE2b-256 827a7660ae616d233557139d9784309e1e50a09a8c507a6f45737ab739b8ca51

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page