Skip to main content

FastMCP server for Google's langextract library - extract structured information from unstructured text using LLMs

Project description

LangExtract MCP Server

A FastMCP server for Google's langextract library. This server enables AI assistants like Claude Code to extract structured information from unstructured text using Large Language Models through a MCP interface.

LangExtract Server MCP server

Overview

LangExtract is a Python library that uses LLMs to extract structured information from text documents while maintaining precise source grounding. This MCP server exposes langextract's capabilities through the Model Context Protocol. The server includes intelligent caching, persistent connections, and server-side credential management to provide optimal performance in long-running environments like Claude Code.

Quick Setup for Claude Code

Prerequisites

  • Claude Code installed and configured
  • Google Gemini API key (Get one here)
  • Python 3.10 or higher

Installation

Install directly into Claude Code using the built-in MCP management:

claude mcp add langextract-mcp -e LANGEXTRACT_API_KEY=your-gemini-api-key -- uv run --with fastmcp fastmcp run src/langextract_mcp/server.py

The server will automatically start and integrate with Claude Code. No additional configuration is required.

Verification

After installation, verify the integration entering in Claude Code:

/mcp

You should see output indicating the server is running and can enter the server to see its tool contents.

Available Tools

The server provides the following tools for text extraction workflows:

Core Extraction

  • extract_from_text - Extract structured information from provided text
  • extract_from_url - Extract information from web content
  • save_extraction_results - Save results to JSONL format
  • generate_visualization - Create interactive HTML visualizations

For more information, you can checkout out the resources available to the client under src/langextract_mcp/resources

Usage Examples

I am currently adding the abilty for MCP clients to pass file paths to unstructured text.

Basic Text Extraction

Ask Claude Code to extract information using natural language:

Extract medication information from this text: "Patient prescribed 500mg amoxicillin twice daily for infection"

Use these examples to guide the extraction:
- Text: "Take 250mg ibuprofen every 4 hours"
- Expected: medication=ibuprofen, dosage=250mg, frequency=every 4 hours

Advanced Configuration

For complex extractions, specify configuration parameters:

Extract character emotions from Shakespeare using:
- Model: gemini-2.5-pro for better literary analysis
- Multiple passes: 3 for comprehensive extraction
- Temperature: 0.2 for consistent results

URL Processing

Extract information directly from web content:

Extract key findings from this research paper: https://arxiv.org/abs/example
Focus on methodology, results, and conclusions

Supported Models

This server currently supports Google Gemini models only, optimized for reliable structured extraction with advanced schema constraints:

  • gemini-2.5-flash - Recommended default - Optimal balance of speed, cost, and quality
  • gemini-2.5-pro - Best for complex reasoning and analysis tasks requiring highest accuracy

The server uses persistent connections, schema caching, and connection pooling for optimal performance with Gemini models. Support for additional providers may be added in future versions.

Configuration Reference

Environment Variables

Set during installation or in server environment:

LANGEXTRACT_API_KEY=your-gemini-api-key  # Required

Tool Parameters

Configure extraction behavior through tool parameters:

{
    "model_id": "gemini-2.5-flash",     # Language model selection
    "max_char_buffer": 1000,            # Text chunk size
    "temperature": 0.5,                 # Sampling temperature (0.0-1.0)  
    "extraction_passes": 1,             # Number of extraction attempts
    "max_workers": 10                   # Parallel processing threads
}

Output Format

All extractions return consistent structured data:

{
    "document_id": "doc_123",
    "total_extractions": 5,
    "extractions": [
        {
            "extraction_class": "medication", 
            "extraction_text": "amoxicillin",
            "attributes": {"type": "antibiotic"},
            "start_char": 25,
            "end_char": 35
        }
    ],
    "metadata": {
        "model_id": "gemini-2.5-flash",
        "extraction_passes": 1,
        "temperature": 0.5
    }
}

Use Cases

LangExtract MCP Server supports a wide range of use cases across multiple domains. In healthcare and life sciences, it can extract medications, dosages, and treatment protocols from clinical notes, structure radiology and pathology reports, and process research papers or clinical trial data. For legal and compliance applications, it enables extraction of contract terms, parties, and obligations, as well as analysis of regulatory documents, compliance reports, and case law. In research and academia, the server is useful for extracting methodologies, findings, and citations from papers, analyzing survey responses and interview transcripts, and processing historical or archival materials. For business intelligence, it helps extract insights from customer feedback and reviews, analyze news articles and market reports, and process financial documents and earnings reports.

Support and Documentation

Primary Resources:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

iflow_mcp_larsenweigle_langextract_mcp-0.1.0.tar.gz (18.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file iflow_mcp_larsenweigle_langextract_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: iflow_mcp_larsenweigle_langextract_mcp-0.1.0.tar.gz
  • Upload date:
  • Size: 18.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for iflow_mcp_larsenweigle_langextract_mcp-0.1.0.tar.gz
Algorithm Hash digest
SHA256 be008f22fd5ca42ed63fb978ee1c339a52b53077b0f2a5f0186177080ee5d382
MD5 e9c8fb022ec99b9c8bc632c5b7cbb97c
BLAKE2b-256 03b8784288f6873f8064782860a62ba9454629943379b6de223f71e9d382b638

See more details on using hashes here.

File details

Details for the file iflow_mcp_larsenweigle_langextract_mcp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: iflow_mcp_larsenweigle_langextract_mcp-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for iflow_mcp_larsenweigle_langextract_mcp-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 17da3e9610bff5aff70436dcd0b926797f6a928aa956dd04d0902e2fd291cc10
MD5 9538119def2c8b180569b3e664f63243
BLAKE2b-256 7589b891d4f51dd8620f75672bb6b400ff5c78a17522736b54ce97a83113f949

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page