Extract what matters from any media source. Available as a Python library, a macOS Service, a CLI, and an MCP server.


Content Core

Extract, process, and summarize content from URLs, files, and text through a unified async Python API, CLI, or MCP server.

Supported Formats

Category	Formats
Web	URLs, HTML pages, YouTube videos, Reddit posts
Documents	PDF, DOCX, PPTX, XLSX, EPUB, Markdown, plain text
Media	MP3, WAV, M4A, FLAC, OGG (audio); MP4, AVI, MOV, MKV (video)

Quick Start

pip install content-core

import asyncio
import content_core

result = asyncio.run(content_core.extract_content(url="https://example.com"))
print(result.content)

Or with zero install:

uvx content-core extract "https://example.com"

CLI Usage

Content Core provides a unified content-core command with subcommands for extraction, summarization, configuration, and running the MCP server.

Extract

# From a URL
content-core extract "https://example.com"

# From a file
content-core extract document.pdf

# With JSON output
content-core extract document.pdf --format json

# With a specific engine
content-core extract "https://example.com" --engine firecrawl

# From stdin
echo "some text" | content-core extract

Summarize

# Summarize text
content-core summarize "Long article text here..."

# With context
content-core summarize "Long text" --context "bullet points"

# From stdin
cat article.txt | content-core summarize --context "explain to a child"

MCP Server

content-core mcp

Configuration

# Set persistent config
content-core config set llm_provider anthropic
content-core config set llm_model claude-sonnet-4-20250514

# List current config
content-core config list

# Delete a config value
content-core config delete llm_provider

Persistent settings are stored in ~/.content-core/config.toml. Settings are resolved in priority order: command flags > env vars > config file > defaults.
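Because environment variables outrank the config file, a persisted setting can be overridden for a single run without editing config.toml. A minimal sketch, assuming the CLI honors the `CCORE_LLM_PROVIDER` variable from the Environment Variables table below; the helper `run_content_core` is hypothetical and simply shells out to the command, piping text to stdin as in the `cat article.txt | ...` example:

```python
import os
import subprocess


def run_content_core(args, stdin_text=None, env_overrides=None, executable="content-core"):
    """Run a content-core subcommand and return its stdout (hypothetical helper).

    env_overrides are layered on top of the current environment, so they take
    precedence over values persisted in ~/.content-core/config.toml.
    """
    env = {**os.environ, **(env_overrides or {})}
    completed = subprocess.run(
        [executable, *args],
        input=stdin_text,    # piped to the command's stdin, like `echo ... |`
        env=env,
        capture_output=True,
        text=True,
        check=True,          # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# Example (requires content-core on PATH):
#   run_content_core(["summarize", "--context", "one sentence"],
#                    stdin_text="Long article text...",
#                    env_overrides={"CCORE_LLM_PROVIDER": "anthropic"})
```

The override only affects the child process, so the persisted config and the parent shell stay unchanged.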

Zero-Install with uvx

All commands work without installation using uvx:

uvx content-core extract "https://example.com"
uvx content-core summarize "text" --context "one sentence"
uvx content-core mcp

Python API

Extraction

import asyncio

import content_core
from content_core import ContentCoreConfig


async def main():
    # From a URL
    result = await content_core.extract_content(url="https://example.com")

    # From a file
    result = await content_core.extract_content(file_path="document.pdf")

    # From text
    result = await content_core.extract_content(content="some text")

    # With engine override
    config = ContentCoreConfig(url_engine="firecrawl")
    result = await content_core.extract_content(url="https://example.com", config=config)
    print(result.content)


# Scripts need an event loop; notebooks can `await main()` directly
asyncio.run(main())
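Since `extract_content` is a coroutine, several sources can be extracted concurrently. The sketch below bounds concurrency with an `asyncio.Semaphore`; the helper `extract_many` and its `limit` parameter are hypothetical, and the extract function is passed in so the sketch relies on nothing beyond the `url=` keyword shown above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable, List


async def extract_many(
    urls: Iterable[str],
    extract: Callable[..., Awaitable[Any]],
    limit: int = 3,
) -> List[Any]:
    """Extract several URLs concurrently, at most `limit` at a time (hypothetical helper)."""
    semaphore = asyncio.Semaphore(limit)

    async def extract_one(url: str) -> Any:
        async with semaphore:  # wait for a free slot before starting a request
            return await extract(url=url)

    # gather preserves input order, so results line up with `urls`
    return await asyncio.gather(*(extract_one(url) for url in urls))


# Usage with Content Core:
#   import content_core
#   results = asyncio.run(extract_many(
#       ["https://example.com", "https://example.org"], content_core.extract_content))
```

Capping concurrency keeps a large batch from opening dozens of simultaneous requests against the same extraction engine or API quota.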

Summarization

import asyncio
import content_core

summary = asyncio.run(content_core.summarize("long article text", context="bullet points"))
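Extraction and summarization compose naturally: extract first, then summarize the extracted text. A minimal sketch, assuming the extraction result exposes its text as `.content` (as in the Quick Start) and that `summarize` accepts `context=` as shown above; `extract_and_summarize` is a hypothetical helper, and both functions are injected so it can be exercised without network access.

```python
from typing import Any, Awaitable, Callable


async def extract_and_summarize(
    url: str,
    extract: Callable[..., Awaitable[Any]],
    summarize: Callable[..., Awaitable[str]],
    context: str = "bullet points",
) -> str:
    """Extract a URL, then summarize its text (hypothetical helper)."""
    result = await extract(url=url)
    text = result.content  # assumed attribute holding the extracted text
    if not text.strip():
        return ""  # nothing to summarize, so skip the LLM call
    return await summarize(text, context=context)


# Usage with Content Core:
#   import asyncio, content_core
#   summary = asyncio.run(extract_and_summarize(
#       "https://example.com", content_core.extract_content, content_core.summarize))
```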

Configuration

import asyncio
import content_core
from content_core import ContentCoreConfig

config = ContentCoreConfig(
    url_engine="firecrawl",
    document_engine="docling",  # requires the docling extra
    audio_concurrency=5,
)
result = asyncio.run(content_core.extract_content(url="https://example.com", config=config))

MCP Integration

Content Core includes a Model Context Protocol (MCP) server for use with Claude Desktop and other MCP-compatible applications.

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "content-core": {
      "command": "uvx",
      "args": ["content-core", "mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
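If claude_desktop_config.json already lists other servers, the content-core entry should be merged in rather than pasted over the file. A minimal sketch using only the standard library; the function name is hypothetical, and the config file path is left to the caller because its location varies by operating system.

```python
import json
from pathlib import Path


def add_content_core_server(config_path: Path, openai_api_key: str) -> dict:
    """Add or update the content-core entry while keeping other MCP servers (hypothetical helper)."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["content-core"] = {
        "command": "uvx",
        "args": ["content-core", "mcp"],
        "env": {"OPENAI_API_KEY": openai_api_key},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Restart Claude Desktop after editing the file so it picks up the new server.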

The MCP server exposes two tools: extract_content and summarize_content. Both return plain text.
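Under MCP, a client invokes a tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and its arguments. The sketch below only builds that message; the argument name `url` for `extract_content` is an assumption, so check the server's `tools/list` response for the actual input schema.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical argument name: confirm it against the server's tools/list output.
message = tool_call_request(1, "extract_content", {"url": "https://example.com"})
```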

For detailed setup, see the MCP documentation.

Claude Code Skill

Content Core includes a SKILL.md that teaches AI agents how to use it for extracting content from external sources. To make it available in your Claude Code project, copy it to your skills directory:

# Download the skill
curl -o .claude/skills/content-core/SKILL.md --create-dirs \
  https://raw.githubusercontent.com/lfnovo/content-core/main/SKILL.md

Once installed, Claude Code can use content-core to extract content from URLs, documents, and media files, either through the CLI (uvx content-core) or through MCP if it is configured.

AI Providers

Content Core uses Esperanto to support multiple LLM and STT providers. Switch providers by changing the config — no code changes needed:

# Use Anthropic for summarization
content-core config set llm_provider anthropic
content-core config set llm_model claude-sonnet-4-20250514

# Use Groq for transcription
content-core config set stt_provider groq
content-core config set stt_model whisper-large-v3

Supported providers include OpenAI, Anthropic, Google, Groq, DeepSeek, Ollama, and more. See the Esperanto documentation for the full list.

Configuration

Content Core uses ContentCoreConfig powered by pydantic-settings. Settings are resolved in priority order: constructor args > env vars (CCORE_*) > config file (~/.content-core/config.toml) > defaults.
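The resolution order amounts to a first-match lookup across layers. The sketch below illustrates that documented precedence; it is not Content Core's implementation (which relies on pydantic-settings), and both the helper name `resolve_setting` and the field-to-variable mapping (`url_engine` to `CCORE_URL_ENGINE`) are assumptions inferred from the variable names in the table below.

```python
import os
from typing import Any, Mapping, Optional

ENV_PREFIX = "CCORE_"  # assumed mapping: field `url_engine` -> CCORE_URL_ENGINE


def resolve_setting(
    name: str,
    constructor_args: Mapping[str, Any],
    file_values: Mapping[str, Any],
    defaults: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Return the first value found: constructor args > env vars > config file > defaults."""
    environ = os.environ if environ is None else environ
    if name in constructor_args:
        return constructor_args[name]
    env_key = ENV_PREFIX + name.upper()
    if env_key in environ:
        return environ[env_key]  # raw string; no type conversion in this sketch
    if name in file_values:
        return file_values[name]
    return defaults.get(name)
```

pydantic-settings additionally validates environment strings and converts them to each field's declared type, which this sketch skips.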

Environment Variables

Variable	Description	Default
`CCORE_URL_ENGINE`	URL extraction engine (`auto`, `simple`, `firecrawl`, `jina`, `crawl4ai`)	`auto`
`CCORE_DOCUMENT_ENGINE`	Document extraction engine (`auto`, `simple`, `docling`)	`auto`
`CCORE_AUDIO_CONCURRENCY`	Concurrent audio transcriptions (1-10)	`3`
`CRAWL4AI_API_URL`	Crawl4AI Docker API URL (omit for local browser mode)	-
`FIRECRAWL_API_URL`	Custom Firecrawl API URL for self-hosted instances	-
`CCORE_FIRECRAWL_PROXY`	Firecrawl proxy mode (`auto`, `basic`, `stealth`)	`auto`
`CCORE_FIRECRAWL_WAIT_FOR`	Firecrawl wait time in milliseconds before extraction	`3000`
`CCORE_LLM_PROVIDER`	LLM provider for summarization	-
`CCORE_LLM_MODEL`	LLM model for summarization	-
`CCORE_STT_PROVIDER`	Speech-to-text provider	-
`CCORE_STT_MODEL`	Speech-to-text model	-
`CCORE_STT_TIMEOUT`	Speech-to-text timeout in seconds	-
`CCORE_YOUTUBE_LANGUAGES`	Preferred YouTube transcript languages	-

API keys for external services are set via their standard environment variables (e.g., OPENAI_API_KEY, FIRECRAWL_API_KEY, JINA_API_KEY).

Proxy Configuration

Content Core reads standard HTTP_PROXY / HTTPS_PROXY / NO_PROXY environment variables automatically. No additional configuration is needed.

Optional Dependencies

# Quoting the package spec stops shells such as zsh from treating the brackets as a glob

# Docling for advanced document parsing (PDF, DOCX, PPTX, XLSX)
pip install "content-core[docling]"

# Crawl4AI for local browser-based URL extraction
pip install "content-core[crawl4ai]"
python -m playwright install --with-deps

# LangChain tool wrappers
pip install "content-core[langchain]"

# All optional features
pip install "content-core[docling,crawl4ai,langchain]"

Using with LangChain

When installed with the langchain extra, Content Core provides LangChain-compatible tool wrappers:

from content_core.tools import extract_content_tool, summarize_content_tool

tools = [extract_content_tool, summarize_content_tool]

Documentation

Usage Guide -- Python API details, configuration, and examples
Processors -- How content extraction works for each format
MCP Server -- Claude Desktop and MCP integration

Development

git clone https://github.com/lfnovo/content-core
cd content-core

uv sync --group dev

# Run tests
make test

# Lint
make ruff

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please see our Contributing Guide for details.

Project details

Release history

Version	Release date
2.0.3 (this version)	Apr 13, 2026
2.0.2	Apr 13, 2026
2.0.1	Apr 13, 2026
2.0.0	Apr 12, 2026
1.14.1	Jan 30, 2026
1.14.0	Jan 30, 2026
1.13.0	Jan 26, 2026
1.12.0	Jan 25, 2026
1.11.0	Jan 25, 2026
1.10.0	Jan 16, 2026
1.9.0	Jan 15, 2026
1.8.0	Nov 25, 2025
1.7.0	Nov 1, 2025
1.6.0	Oct 27, 2025
1.5.0	Oct 14, 2025
1.4.2	Sep 27, 2025
1.4.1	Aug 27, 2025
1.4.0	Aug 1, 2025
1.3.1	Jul 22, 2025
1.3.0	Jul 22, 2025
1.2.3	Jul 12, 2025
1.2.2	Jul 5, 2025
1.2.1	Jun 27, 2025
1.2.0	Jun 27, 2025
1.1.2	Jun 19, 2025
1.1.0	Jun 19, 2025
1.0.4	Jun 14, 2025
1.0.3	Jun 10, 2025
1.0.2	Jun 9, 2025
1.0.1	Jun 9, 2025
1.0.0	May 30, 2025
0.8.5	May 29, 2025
0.8.3	May 27, 2025
0.8.1	May 26, 2025
0.8.0	May 26, 2025
0.7.2	May 25, 2025
0.7.0	May 13, 2025
0.6.0	Apr 28, 2025
0.5.1	Apr 20, 2025
0.5.0	Apr 19, 2025
0.4.0	Apr 19, 2025
0.3.1	Apr 17, 2025
0.3.0	Apr 17, 2025
0.2.0	Apr 17, 2025
0.1.2	Apr 16, 2025
0.1.1	Apr 14, 2025
0.1.0	Apr 14, 2025

Download files

Download the file for your platform.

Source Distribution

content_core-2.0.3.tar.gz (20.5 MB)

Uploaded Apr 13, 2026 Source

Built Distribution

content_core-2.0.3-py3-none-any.whl (57.0 kB)

Uploaded Apr 13, 2026 Python 3

File details

Details for the file content_core-2.0.3.tar.gz.

File metadata

Download URL: content_core-2.0.3.tar.gz
Upload date: Apr 13, 2026
Size: 20.5 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.6

File hashes

Hashes for content_core-2.0.3.tar.gz
Algorithm	Hash digest
SHA256	`e4252da6ff407da0554682ea11b50ab9b849d470cd5ec71e6379d6e62502a0ec`
MD5	`a647c0c32ea91a9be2310e54173ca047`
BLAKE2b-256	`e2eaedaa640474314e67fb0bef721af878eb53e8a47334d884b884d15d9cc54e`

File details

Details for the file content_core-2.0.3-py3-none-any.whl.

File metadata

Download URL: content_core-2.0.3-py3-none-any.whl
Upload date: Apr 13, 2026
Size: 57.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.6

File hashes

Hashes for content_core-2.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c740aac5efd54ce712b7278f89ac6d354a232c0f5163193d63c671af2c714da6`
MD5	`5b1ee2c3b5ee2407ec22f61690e6dfa1`
BLAKE2b-256	`08c47f807277718304b93ada06d0fedde8e318addd1db21c45d71fe80c19ae42`
