Async library to ingest and parse Slack channel messages
slack-ingester
An async Python library for ingesting and parsing Slack channel messages and threads.
Built on top of httpx for async HTTP, slack-ingester provides a clean, high-level interface to fetch channel history, thread replies, and associated metadata (reactions, files) from the Slack API, all with automatic pagination, concurrent reply fetching, and immutable data models.
Features
- Fully async: powered by httpx.AsyncClient and asyncio.TaskGroup
- Channel history ingestion: fetch all messages from a channel with automatic cursor-based pagination
- Thread ingestion: fetch all replies from a specific thread
- Date filtering: restrict results by oldest/latest using datetime or date objects
- Concurrent reply fetching: thread replies for multiple parent messages are fetched concurrently
- Message limits: cap the number of messages returned with max_messages
- Immutable data models: frozen dataclasses with __slots__ for memory efficiency and safety
- Rich message data: reactions, file attachments, bot detection, and thread metadata
- Structured error handling: typed exceptions for auth failures, missing channels, and rate limits
Requirements
- Python 3.13+
- A Slack Bot Token (xoxb-...) with the necessary scopes
Slack Bot Token Scopes
Your Slack app must have the following OAuth scopes:
| Scope | Purpose |
|---|---|
| channels:history | Read messages from public channels |
| channels:read | View basic channel info |
| groups:history | Read messages from private channels |
| groups:read | View basic info about private channels |
Installation
Using uv (recommended)
uv add slack-ingester
Using pip
pip install slack-ingester
From source
git clone https://github.com/gvre/slack-ingester.git
cd slack-ingester
uv sync
Quick Start
import asyncio

from slack_ingester import SlackIngester


async def main():
    # Token is read from SLACK_BOT_TOKEN env var, or pass it explicitly
    ingester = SlackIngester(token="xoxb-your-token")

    # Fetch all messages from a channel (including thread replies)
    result = await ingester.ingest("C1234567890")
    print(f"Channel: {result.channel_name}")
    print(f"Total messages: {result.total_messages}")

    for msg in result.messages:
        print(f"[{msg.timestamp}] {msg.user_id}: {msg.text}")
        for reply in msg.replies:
            print(f" ↳ [{reply.timestamp}] {reply.user_id}: {reply.text}")


asyncio.run(main())
Usage
Configuration
The SlackIngester accepts a Slack Bot Token in two ways:
- Explicitly via the token parameter:
  ingester = SlackIngester(token="xoxb-your-token")
- Via the SLACK_BOT_TOKEN environment variable:
  export SLACK_BOT_TOKEN=xoxb-your-token
  ingester = SlackIngester()  # reads from SLACK_BOT_TOKEN
If neither is provided, a SlackAuthError is raised immediately.
Ingesting Channel History
from datetime import date, datetime, UTC
from slack_ingester import SlackIngester
ingester = SlackIngester()
# Fetch all messages from a channel
result = await ingester.ingest("C1234567890")
# Fetch without thread replies (faster, skips conversations.replies calls)
result = await ingester.ingest("C1234567890", include_replies=False)
# Limit the number of messages
result = await ingester.ingest("C1234567890", max_messages=100)
# Filter by date range using datetime objects
result = await ingester.ingest(
    "C1234567890",
    oldest=datetime(2024, 6, 1, tzinfo=UTC),
    latest=datetime(2024, 6, 30, 23, 59, 59, tzinfo=UTC),
)
# Filter by date range using date objects
# - `oldest` as a date uses midnight UTC (start of day)
# - `latest` as a date uses 23:59:59.999999 UTC (end of day)
result = await ingester.ingest(
    "C1234567890",
    oldest=date(2024, 6, 1),
    latest=date(2024, 6, 30),
)
Ingesting a Specific Thread
When a thread_ts is provided, the ingester fetches only the messages within that thread. The oldest, latest, and include_replies parameters are ignored in thread mode.
# Fetch all messages from a specific thread
result = await ingester.ingest(
    "C1234567890",
    thread_ts="1700000000.000100",
)
print(f"Thread has {result.total_messages} messages")
for msg in result.messages:
    print(f" {msg.user_id}: {msg.text}")
Ingesting a Single Message
When a message_ts is provided, the ingester fetches only that specific message. All other parameters except channel_id are ignored, and message_ts cannot be combined with thread_ts.
# Fetch a single message by its timestamp
result = await ingester.ingest(
    "C1234567890",
    message_ts="1700000000.000100",
)
if result.messages:
    msg = result.messages[0]
    print(f"{msg.user_id}: {msg.text}")
Discovering Threads
You can find threads by first ingesting a channel without replies, then drilling into individual threads:
# First pass: get top-level messages only
result = await ingester.ingest("C1234567890", include_replies=False)
for msg in result.messages:
    if msg.is_thread_parent:
        print(f"Thread: {msg.thread_ts} ({msg.reply_count} replies)")
        # Fetch the full thread
        thread = await ingester.ingest("C1234567890", thread_ts=msg.thread_ts)
        for reply in thread.messages:
            print(f" {reply.user_id}: {reply.text}")
Working with Message Data
result = await ingester.ingest("C1234567890")
for msg in result.messages:
    # Basic message info
    print(f"ID: {msg.id}")
    print(f"User: {msg.user_id}")
    print(f"Text: {msg.text}")
    print(f"Time: {msg.timestamp}")  # datetime object (UTC)
    print(f"Bot: {msg.is_bot}")
    # Reactions
    for reaction in msg.reactions:
        print(f" :{reaction.name}: x{reaction.count} by {reaction.users}")
    # File attachments
    for file in msg.files:
        print(f" 📎 {file.name} ({file.mimetype}, {file.size} bytes)")
        print(f" URL: {file.url_private}")
        print(f" Permalink: {file.permalink}")
    # Thread info
    if msg.is_thread_parent:
        print(f" Thread with {msg.reply_count} replies")
        for reply in msg.replies:
            print(f" ↳ {reply.user_id}: {reply.text}")
Error Handling
from slack_ingester import (
    SlackIngester,
    SlackAuthError,
    SlackChannelNotFoundError,
    SlackIngesterError,
    SlackRateLimitError,
)

try:
    ingester = SlackIngester()
    result = await ingester.ingest("C1234567890")
except SlackAuthError:
    print("Invalid or missing Slack bot token")
except SlackChannelNotFoundError:
    print("Channel not found or bot is not a member")
except SlackRateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after} seconds")
except SlackIngesterError as e:
    print(f"Slack API error: {e}")
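Because the rate limit exception exposes retry_after, a caller can wait and try again instead of only reporting the error. The sketch below shows one possible retry loop. So that it runs without the library installed, RateLimited is a local stand-in for SlackRateLimitError; call_with_retry, flaky, and record_wait are hypothetical names, and the limit of three attempts is arbitrary.

```python
import asyncio
from collections.abc import Awaitable, Callable


class RateLimited(Exception):
    """Stand-in for SlackRateLimitError so the sketch runs without the library."""

    def __init__(self, retry_after: int) -> None:
        super().__init__(f"retry after {retry_after}s")
        self.retry_after = retry_after


async def call_with_retry(
    operation: Callable[[], Awaitable[str]],
    *,
    max_attempts: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Retry operation when rate limited, waiting retry_after seconds each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # Give up after the final attempt.
            await sleep(exc.retry_after)
    raise RuntimeError("max_attempts must be at least 1")


state = {"calls": 0}
waits: list[float] = []


async def flaky() -> str:
    # Fails twice with a rate limit, then succeeds.
    state["calls"] += 1
    if state["calls"] < 3:
        raise RateLimited(retry_after=state["calls"])
    return "ok"


async def record_wait(seconds: float) -> None:
    waits.append(seconds)  # Record instead of sleeping so the demo is instant.


outcome = asyncio.run(call_with_retry(flaky, sleep=record_wait))
print(outcome, waits)
```

In real code the default asyncio.sleep would be used, and operation would be something like lambda: ingester.ingest("C1234567890").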
API Reference
SlackIngester
The main entry point for the library.
__init__(token: str | None = None, *, timeout: float = 60.0) -> None
Create a new ingester instance.
| Parameter | Type | Default | Description |
|---|---|---|---|
| token | str \| None | None | Slack Bot Token. Falls back to SLACK_BOT_TOKEN env var. |
| timeout | float | 60.0 | HTTP request timeout in seconds. |
Raises SlackAuthError if no token is available.
async ingest(channel_id, *, message_ts, thread_ts, oldest, latest, include_replies, max_messages) -> IngestionResult
Ingest messages from a Slack channel, a specific thread, or a single message.
| Parameter | Type | Default | Description |
|---|---|---|---|
| channel_id | str | (required) | The Slack channel ID to ingest from. |
| message_ts | str \| None | None | Fetch a single message by its timestamp. All other parameters except channel_id are ignored. Cannot be combined with thread_ts. |
| thread_ts | str \| None | None | Thread timestamp. If set, ingests only that thread (ignores oldest, latest, include_replies). Cannot be combined with message_ts. |
| oldest | datetime \| date \| None | None | Only fetch messages newer than this (inclusive). |
| latest | datetime \| date \| None | None | Only fetch messages older than this (inclusive). |
| include_replies | bool | True | Whether to fetch thread replies for parent messages. |
| max_messages | int \| None | None | Maximum number of messages to fetch. None means all. |
Returns: IngestionResult
Data Models
All models are frozen (immutable) dataclasses with __slots__ for optimal memory usage.
IngestionResult
| Field | Type | Description |
|---|---|---|
| channel_id | str | The channel ID. |
| channel_name | str \| None | The channel name, if available. |
| messages | tuple[SlackMessage, ...] | Top-level messages (newest first for channel mode, chronological for thread mode). |
| total_messages | int | Total count including nested replies. |
| oldest_ts | datetime \| None | UTC datetime of the oldest message, or None if empty. |
| latest_ts | datetime \| None | UTC datetime of the latest message, or None if empty. |
SlackMessage
| Field | Type | Description |
|---|---|---|
| id | str | Message timestamp (used as unique ID in Slack). |
| channel_id | str | Channel this message belongs to. |
| user_id | str \| None | User ID of the author, or None for bot messages without a user. |
| text | str | Message text content. |
| timestamp | datetime | UTC datetime of the message. |
| thread_ts | str \| None | Thread parent timestamp, if this message is part of a thread. |
| is_thread_parent | bool | True if this message started a thread with replies. |
| reply_count | int | Number of replies in the thread. |
| replies | tuple[SlackMessage, ...] | Nested reply messages (populated when include_replies=True). |
| files | tuple[SlackFile, ...] | File attachments. |
| reactions | tuple[SlackReaction, ...] | Emoji reactions. |
| is_bot | bool | True if the message was posted by a bot. |
| subtype | str \| None | Slack message subtype (e.g., "bot_message"). |
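The id and thread_ts fields hold Slack's string timestamps (for example "1700000000.000100"), while timestamp is a UTC datetime. One way such a string could be converted is sketched below; parse_slack_ts is a hypothetical helper, not necessarily how the library does it.

```python
from datetime import datetime, timezone


def parse_slack_ts(ts: str) -> datetime:
    """Convert a Slack timestamp string such as '1700000000.000100' to UTC."""
    seconds, _, fraction = ts.partition(".")
    # Work on the integer parts to avoid float rounding of the microseconds.
    microseconds = int(fraction.ljust(6, "0")[:6])
    moment = datetime.fromtimestamp(int(seconds), tz=timezone.utc)
    return moment.replace(microsecond=microseconds)


parsed = parse_slack_ts("1700000000.000100")
print(parsed.isoformat())
```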
SlackFile
| Field | Type | Description |
|---|---|---|
| id | str | File ID. |
| name | str | File name. |
| mimetype | str | MIME type. |
| size | int | File size in bytes. |
| url_private | str \| None | Private download URL (requires authentication). |
| permalink | str \| None | Permalink to the file in Slack. |
SlackReaction
| Field | Type | Description |
|---|---|---|
| name | str | Emoji name (without colons). |
| count | int | Total reaction count. |
| users | tuple[str, ...] | User IDs who reacted. |
Exceptions
All exceptions inherit from SlackIngesterError.
| Exception | Description |
|---|---|
| SlackIngesterError | Base exception for all library errors. |
| SlackAuthError | Invalid, missing, or revoked bot token. |
| SlackChannelNotFoundError | Channel not found, bot not in channel, or missing scope. |
| SlackRateLimitError | Slack API rate limit hit. Has a retry_after: int attribute (seconds). |
Architecture
src/slack_ingester/
├── __init__.py # Public API exports
├── client.py # Low-level async Slack API client (httpx)
├── exceptions.py # Exception hierarchy
├── ingester.py # High-level ingestion orchestrator
└── models.py # Immutable data models (frozen dataclasses)
Component Overview
- SlackClient (client.py): Thin async wrapper around the Slack Web API. Handles HTTP requests, response validation, and error mapping. Uses httpx.AsyncClient with bearer token authentication. Wraps three Slack endpoints:
  - conversations.info - channel metadata
  - conversations.history - paginated channel messages
  - conversations.replies - paginated thread replies
- SlackIngester (ingester.py): High-level orchestrator that coordinates the client to build a complete IngestionResult. Handles pagination loops, date-to-timestamp conversion, concurrent reply fetching via asyncio.TaskGroup, and message parsing.
- Models (models.py): Immutable value objects representing Slack data. All use @dataclass(slots=True, frozen=True) for safety and performance.
- Exceptions (exceptions.py): Structured exception hierarchy mapping Slack API error codes to typed Python exceptions.
Channel vs. Thread Ingestion
| Aspect | Channel Mode | Thread Mode |
|---|---|---|
| API endpoint | conversations.history | conversations.replies |
| oldest / latest | Respected | Ignored |
| include_replies | Respected | Ignored (all messages included) |
| max_messages | Respected | Respected |
| Message ordering | Newest first | Chronological |
| Reply nesting | Replies nested in parent's replies tuple | Flat list of all messages |
Development
Prerequisites
- Python 3.13+
- uv - fast Python package manager
- just - command runner (optional but recommended)
Setup
git clone https://github.com/gvre/slack-ingester.git
cd slack-ingester
# Install all dependencies (dev + test)
just install-all
# Or manually with uv
uv sync --extra dev --extra test
Available Commands
Run just to see all available commands:
| Command | Description |
|---|---|
| just install | Install runtime dependencies |
| just install-dev | Install development dependencies |
| just install-all | Install all dependencies (dev + test) |
| just test | Run tests |
| just test-coverage | Run tests with terminal coverage report |
| just test-coverage-html | Run tests with HTML coverage report |
| just lint | Run linting with ruff |
| just lint-fix | Auto-fix linting issues |
| just format | Format code with ruff |
| just typecheck | Run type checking with ty |
| just check | Run all checks (lint + typecheck) |
| just build | Build the package |
| just clean | Remove build artifacts and caches |
Running Tests
# Run all tests
just test
# Run tests with coverage
just test-coverage
# Run a specific test file
uv run pytest tests/test_ingester.py
# Run a specific test class
uv run pytest tests/test_ingester.py::TestIngest
# Run a specific test
uv run pytest tests/test_ingester.py::TestIngest::test_basic_ingest -v
Code Quality
# Lint
just lint
# Auto-fix lint issues
just lint-fix
# Format
just format
# Type check
just typecheck
# Run all checks
just check
License
This project is licensed under the MIT License.
Copyright (c) 2026 Giannis Vrentzos