LLM Conversation Parser

A Python library for parsing LLM conversation JSON files into a RAG-optimized format.

Supported LLMs

  • Claude (Anthropic)
  • ChatGPT (OpenAI)
  • Grok (xAI)

Installation

pip install llm-conversation-parser

Quick Start

from llm_conversation_parser import LLMConversationParser

# Initialize parser
parser = LLMConversationParser()

# Parse single file (auto-detect LLM type)
data = parser.parse_file("claude_conversations.json")
print(f"Parsed {len(data)} conversations")

# Parse multiple files
all_data = parser.parse_multiple_files([
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
])

# Save parsed data
parser.save_parsed_data_by_llm(all_data, "parsed_data")

Key Features

  • Automatic LLM Detection: Analyzes the JSON structure to determine the LLM type
  • Unified Output Format: Converts all supported LLM formats to a single standardized, RAG-optimized structure
  • Batch Processing: Processes multiple files in one call
  • Error Handling: Reports failures with detailed error messages
  • Zero Dependencies: Uses only the Python standard library
  • CLI Support: Includes a command-line interface
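
As a rough illustration of structure-based detection, the sketch below checks which marker key appears in the first conversation object of an export. It is not the library's actual implementation: the function name `guess_llm_type` and the marker keys (`chat_messages`, `mapping`, `responses`) are assumptions chosen for illustration, and real exports may differ.

```python
import json

# Hypothetical marker keys: illustrative assumptions only, not the
# library's real detection rules.
MARKER_KEYS = {
    "claude": "chat_messages",
    "gpt": "mapping",
    "grok": "responses",
}


def guess_llm_type(raw_json: str) -> str:
    """Guess the LLM type from the keys of the first conversation object."""
    data = json.loads(raw_json)
    # An export is assumed to be either a list of conversations or one object.
    first = data[0] if isinstance(data, list) and data else data
    if not isinstance(first, dict):
        raise ValueError("Unrecognized export: expected a JSON object")
    for llm_type, key in MARKER_KEYS.items():
        if key in first:
            return llm_type
    raise ValueError(f"Could not detect LLM type from keys: {sorted(first)}")


print(guess_llm_type('[{"uuid": "c1", "chat_messages": []}]'))  # prints "claude"
```

When detection of this kind fails, passing the LLM type explicitly (as shown under Usage Examples) sidesteps the guess entirely.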

Output Format

[
  {
    "id": "message_uuid",
    "content": {
      "user_query": "User's question",
      "conversation_flow": "[AI_ANSWER] Previous AI response\n[USER_QUESTION] User's question"
    },
    "metadata": {
      "previous_ai_answer": "Previous AI response or null",
      "conversation_id": "conversation_uuid"
    }
  }
]
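
To make the shape above concrete, here is a minimal sketch that assembles one record by hand. The helper name `build_record` is hypothetical and not part of the library. The behavior when there is no previous AI answer (only the `[USER_QUESTION]` line in `conversation_flow`) is an assumption, since the format above only states that `previous_ai_answer` may be null.

```python
import json
from typing import Optional


def build_record(message_id: str, conversation_id: str, user_query: str,
                 previous_ai_answer: Optional[str] = None) -> dict:
    """Assemble one record in the output shape shown above."""
    flow_parts = []
    if previous_ai_answer is not None:
        flow_parts.append(f"[AI_ANSWER] {previous_ai_answer}")
    # Assumption: with no previous answer, the flow holds only the question.
    flow_parts.append(f"[USER_QUESTION] {user_query}")
    return {
        "id": message_id,
        "content": {
            "user_query": user_query,
            "conversation_flow": "\n".join(flow_parts),
        },
        "metadata": {
            "previous_ai_answer": previous_ai_answer,  # serialized as null when None
            "conversation_id": conversation_id,
        },
    }


record = build_record("msg-2", "conv-1", "How do I install it?",
                      "Hello! How can I help?")
print(json.dumps(record, indent=2))
```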

Usage Examples

1. Automatic LLM Type Detection

from llm_conversation_parser import LLMConversationParser

parser = LLMConversationParser()

# Automatically detect LLM type based on JSON structure
claude_data = parser.parse_file("my_conversations.json")  # Auto-detected as Claude
gpt_data = parser.parse_file("chat_history.json")         # Auto-detected as ChatGPT
grok_data = parser.parse_file("ai_chat.json")             # Auto-detected as Grok

2. Explicit LLM Type Specification

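from llm_conversation_parser import LLMConversationParser

parser = LLMConversationParser()

# Specify the LLM type explicitly as the second argument
claude_data = parser.parse_file("conversations.json", "claude")
gpt_data = parser.parse_file("conversations.json", "gpt")
grok_data = parser.parse_file("conversations.json", "grok")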

3. Batch Processing

from llm_conversation_parser import LLMConversationParser

parser = LLMConversationParser()

# Process multiple files at once
files = [
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
]

# Process all files with auto-detection
data_by_llm = parser.parse_multiple_files(files)

# Check results
for llm_type, conversations in data_by_llm.items():
    print(f"{llm_type}: {len(conversations)} conversations")

# Save by LLM type
parser.save_parsed_data_by_llm(data_by_llm, "parsed_data")
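
When the export files live in one folder, the file list can be gathered rather than typed out. The helper below is a sketch using only the standard library; `collect_json_files` is a hypothetical name, not something the library provides, and the demonstration runs against a temporary directory so it does not depend on any real exports.

```python
import tempfile
from pathlib import Path
from typing import List


def collect_json_files(directory: str) -> List[str]:
    """Return the paths of all .json files directly inside `directory`, sorted."""
    return sorted(str(path) for path in Path(directory).glob("*.json"))


# Demonstrate on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("gpt_conversations.json", "claude_conversations.json", "notes.txt"):
        (Path(tmp) / name).write_text("[]", encoding="utf-8")
    files = collect_json_files(tmp)
    print([Path(f).name for f in files])
```

The resulting list can then be passed to `parser.parse_multiple_files(files)` in place of a hand-written list.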

4. RAG Data Utilization

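from llm_conversation_parser import LLMConversationParser

parser = LLMConversationParser()
data = parser.parse_file("claude_conversations.json")

# Use parsed data for RAG systems
for conversation in data: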
    message_id = conversation["id"]
    user_query = conversation["content"]["user_query"]
    conversation_flow = conversation["content"]["conversation_flow"]

    # Extract text for vectorization
    rag_text = f"{user_query}\n{conversation_flow}"

    # Store in vector database
    # vector_db.add_document(message_id, rag_text)
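
Since `vector_db` above is only a placeholder, the following sketch shows one way to prepare records for whatever store is actually used. The function name `to_rag_documents` and the choice to skip records with a repeated `id` are assumptions made for illustration; neither comes from the library.

```python
from typing import Dict, List


def to_rag_documents(records: List[dict]) -> List[Dict[str, object]]:
    """Convert parsed records into documents with an id, text, and metadata."""
    documents = []
    seen_ids = set()
    for record in records:
        message_id = record["id"]
        if message_id in seen_ids:
            continue  # Assumption: a repeated id is a duplicate and is skipped.
        seen_ids.add(message_id)
        content = record["content"]
        documents.append({
            "id": message_id,
            # Same text layout as the loop above: query, then conversation flow.
            "text": f'{content["user_query"]}\n{content["conversation_flow"]}',
            "metadata": record.get("metadata", {}),
        })
    return documents


sample = [{
    "id": "msg-1",
    "content": {
        "user_query": "What is RAG?",
        "conversation_flow": "[USER_QUESTION] What is RAG?",
    },
    "metadata": {"previous_ai_answer": None, "conversation_id": "conv-1"},
}]
docs = to_rag_documents(sample + sample)
print(len(docs), docs[0]["text"])
```

Because `conversation_flow` already contains the user question, some pipelines may prefer to embed `conversation_flow` alone and keep `user_query` as metadata.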

Command Line Interface

# Parse single file
llm-conversation-parser parse input.json

# Parse multiple files
llm-conversation-parser parse file1.json file2.json --output parsed_data/

# Auto-detect LLM type
llm-conversation-parser parse conversations.json

# Specify LLM type explicitly
llm-conversation-parser parse conversations.json --llm-type claude

License

MIT License

Changelog

See CHANGELOG.md for the detailed changelog.
