LLM Conversation Parser
A Python library for parsing LLM conversation JSON files into RAG-optimized format.
Documentation
- Korean Documentation - Full documentation in Korean
- LLM JSON Format Guide - Analysis of the Claude, ChatGPT, and Grok JSON export structures
Supported LLMs
- Claude (Anthropic)
- ChatGPT (OpenAI)
- Grok (xAI)
Installation
pip install llm-conversation-parser
Quick Start
from llm_conversation_parser import LLMConversationParser
# Initialize parser
parser = LLMConversationParser()
# Parse single file (auto-detect LLM type)
data = parser.parse_file("claude_conversations.json")
print(f"Parsed {len(data)} conversations")
# Parse multiple files (returns a dict keyed by LLM type)
all_data = parser.parse_multiple_files([
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
])
# Save parsed data
parser.save_parsed_data_by_llm(all_data, "parsed_data")
Key Features
- Automatic LLM Detection: Analyzes the JSON structure to determine the LLM type
- Unified Output Format: Converts every supported LLM format into a single, standardized RAG-optimized structure
- Batch Processing: Processes multiple files at once
- Error Handling: Robust error handling with detailed error messages
- Zero Dependencies: Uses only Python standard library
- CLI Support: Command-line interface included
Output Format
[
  {
    "id": "message_uuid",
    "content": {
      "user_query": "User's question",
      "conversation_flow": "[AI_ANSWER] Previous AI response\n[USER_QUESTION] User's question"
    },
    "metadata": {
      "previous_ai_answer": "Previous AI response or null",
      "conversation_id": "conversation_uuid"
    }
  }
]
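To make the structure concrete, here is a minimal sketch that assembles one record from a user question and the AI answer that preceded it. `build_record` is a hypothetical helper, not part of the library's API. How the library fills `conversation_flow` for the first message of a conversation (no previous answer) is not shown above, so emitting only the `[USER_QUESTION]` line in that case is an assumption.

```python
import json
from typing import Optional


def build_record(message_id: str, conversation_id: str,
                 user_query: str, previous_ai_answer: Optional[str]) -> dict:
    """Build one record in the output shape shown above (hypothetical helper)."""
    parts = []
    # Assumption: with no previous AI answer, the flow holds only the question.
    if previous_ai_answer is not None:
        parts.append(f"[AI_ANSWER] {previous_ai_answer}")
    parts.append(f"[USER_QUESTION] {user_query}")
    return {
        "id": message_id,
        "content": {
            "user_query": user_query,
            "conversation_flow": "\n".join(parts),
        },
        "metadata": {
            "previous_ai_answer": previous_ai_answer,  # serialized as null when None
            "conversation_id": conversation_id,
        },
    }


record = build_record("msg-2", "conv-1", "How do I sort a dict?", "Use sorted().")
print(json.dumps(record, indent=2))
```

Keeping the previous answer both inside `conversation_flow` and separately in `metadata` lets a retriever embed the combined text while still filtering or displaying the answer on its own.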
Usage Examples
1. Automatic LLM Type Detection
from llm_conversation_parser import LLMConversationParser
parser = LLMConversationParser()
# Automatically detect LLM type based on JSON structure
claude_data = parser.parse_file("my_conversations.json") # Auto-detected as Claude
gpt_data = parser.parse_file("chat_history.json") # Auto-detected as ChatGPT
grok_data = parser.parse_file("ai_chat.json") # Auto-detected as Grok
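Detection works from the JSON structure rather than the file name. The sketch below illustrates that idea with a simple key-based heuristic; it is not the library's actual detection logic. It assumes that a Claude account export is an array of conversation objects carrying a `chat_messages` list, and that a ChatGPT export is an array of conversations whose messages live in a `mapping` node tree. Grok's export layout is not described in this README, so the sketch returns `None` for it rather than guessing.

```python
import json
from typing import Optional


def guess_llm_type(data) -> Optional[str]:
    """Guess the export type from top-level keys (heuristic sketch only)."""
    # Assumption: exports are a JSON array of conversation objects.
    if not isinstance(data, list) or not data or not isinstance(data[0], dict):
        return None
    first = data[0]
    if "chat_messages" in first:  # assumed Claude layout
        return "claude"
    if "mapping" in first:  # assumed ChatGPT layout (message tree)
        return "gpt"
    return None  # Grok and unrecognized layouts are not handled here


def guess_file_type(path: str) -> Optional[str]:
    """Load a JSON file and apply the heuristic above."""
    with open(path, encoding="utf-8") as f:
        return guess_llm_type(json.load(f))
```

The returned strings match the explicit type names accepted by `parse_file`, so a heuristic like this can fall back to an explicit type whenever it returns `None`.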
2. Explicit LLM Type Specification
# Specify LLM type explicitly
claude_data = parser.parse_file("conversations.json", "claude")
gpt_data = parser.parse_file("conversations.json", "gpt")
grok_data = parser.parse_file("conversations.json", "grok")
3. Batch Processing
# Process multiple files at once
files = [
    "claude_conversations.json",
    "gpt_conversations.json",
    "grok_conversations.json"
]
# Process all files with auto-detection
data_by_llm = parser.parse_multiple_files(files)
# Check results
for llm_type, conversations in data_by_llm.items():
    print(f"{llm_type}: {len(conversations)} conversations")
# Save by LLM type
parser.save_parsed_data_by_llm(data_by_llm, "parsed_data")
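The loop above shows that `parse_multiple_files` returns a dict mapping each LLM type to its list of records. The sketch below shows one way such a dict could be written to disk, one JSON file per type. `save_by_llm` and the `<llm_type>.json` naming scheme are assumptions for illustration and may not match the files that `save_parsed_data_by_llm` actually produces.

```python
import json
import tempfile
from pathlib import Path


def save_by_llm(data_by_llm: dict, output_dir: str) -> list:
    """Write one JSON file per LLM type (hypothetical stand-in)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for llm_type, records in data_by_llm.items():
        path = out / f"{llm_type}.json"  # assumed naming: <output_dir>/<llm_type>.json
        # ensure_ascii=False keeps non-English text (e.g. Korean) readable in the file
        path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")
        written.append(path)
    return written


# Usage: write a small sample into a temporary directory and read it back.
sample = {"claude": [{"id": "m1"}], "gpt": [{"id": "m2"}, {"id": "m3"}]}
with tempfile.TemporaryDirectory() as tmp:
    paths = save_by_llm(sample, tmp)
    counts = {p.stem: len(json.loads(p.read_text(encoding="utf-8"))) for p in paths}
print(counts)
```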
4. RAG Data Utilization
# Use parsed data for RAG systems
# `data` is the list returned by parser.parse_file(...)
for conversation in data:
    message_id = conversation["id"]
    user_query = conversation["content"]["user_query"]
    conversation_flow = conversation["content"]["conversation_flow"]
    # Extract text for vectorization
    rag_text = f"{user_query}\n{conversation_flow}"
    # Store in a vector database (vector_db is a placeholder for your store's client)
    # vector_db.add_document(message_id, rag_text)
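The commented-out `vector_db.add_document` call stands in for whatever store you use. As a self-contained substitute, the sketch below indexes the same `rag_text` with word-count vectors and ranks matches by cosine similarity. `ToyIndex` and its methods are hypothetical, and bag-of-words scoring is a deliberate simplification in place of a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    # Lowercased word counts: a crude stand-in for an embedding vector
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """In-memory index keyed by message id (illustrative only)."""

    def __init__(self):
        self.docs = {}

    def add_document(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = tokenize(text)

    def search(self, query: str, k: int = 3) -> list:
        q = tokenize(query)
        scored = sorted(((cosine(q, vec), doc_id) for doc_id, vec in self.docs.items()), reverse=True)
        # Return the ids of the top-k documents that share at least one word with the query
        return [doc_id for score, doc_id in scored[:k] if score > 0]


records = [
    {"id": "m1", "content": {"user_query": "How do I read a CSV file?",
                             "conversation_flow": "[USER_QUESTION] How do I read a CSV file?"}},
    {"id": "m2", "content": {"user_query": "Explain Python decorators",
                             "conversation_flow": "[AI_ANSWER] Sure, happy to help.\n[USER_QUESTION] Explain Python decorators"}},
]
index = ToyIndex()
for rec in records:
    c = rec["content"]
    index.add_document(rec["id"], f"{c['user_query']}\n{c['conversation_flow']}")
print(index.search("csv file"))
```

Because each record is keyed by its message `id`, a hit can be traced back to its `metadata` (conversation id, previous answer) for display.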
Command Line Interface
# Parse single file
llm-conversation-parser parse input.json
# Parse multiple files
llm-conversation-parser parse file1.json file2.json --output parsed_data/
# Auto-detect LLM type (no --llm-type given)
llm-conversation-parser parse conversations.json
# Specify LLM type explicitly
llm-conversation-parser parse conversations.json --llm-type claude
License
MIT License
Changelog
See CHANGELOG.md for the detailed changelog.
Download files
File details
Details for the file llm_conversation_parser-1.0.1.tar.gz.
File metadata
- Download URL: llm_conversation_parser-1.0.1.tar.gz
- Upload date:
- Size: 21.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f03dad5941413d7a0a1d351697153801c1913b2acc86cfe868e0314bca007e85 |
| MD5 | aada46395d468f20c8b2e649ca360bbf |
| BLAKE2b-256 | d3cf85e150612fd68d52106ebc5b1718b5a3bb3f83fd521f362c7a42a11ca506 |
File details
Details for the file llm_conversation_parser-1.0.1-py3-none-any.whl.
File metadata
- Download URL: llm_conversation_parser-1.0.1-py3-none-any.whl
- Upload date:
- Size: 10.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35540fa1fa0caba7efd9dd3c5f41071412a9b5c42ab558aee7136066ef04dc27 |
| MD5 | 610491b86ecac1e72672b967fd9cea8f |
| BLAKE2b-256 | 827a7660ae616d233557139d9784309e1e50a09a8c507a6f45737ab739b8ca51 |