Monitor and filter Fediverse hashtags, curate quality content, and distribute via external tools like Zhongli

FenLiu (分流)

Created by marvin8 with assistance from Claude and DeepSeek AI assistants.

⚠️ DISCLAIMER / PROVISO: This project is a work in progress and major changes are still happening. It is far from finished and only borderline useful in production. Expect breaking changes, incomplete features, and significant architectural changes as development continues.

Divide the Fediverse content flow

FenLiu is a web application that monitors Fediverse hashtags, filters spam, allows human review, learns from feedback, and exports quality content for boosting. Inspired by the ancient Chinese Dujiangyan irrigation system (256 BC) that separated silt from water, FenLiu applies 2,300-year-old engineering wisdom to modern digital content streams.

Current Status — v0.7.0

FenLiu is a fully functional spam filtering and content management system with complete Curated Queue integration, flexible pattern-based user blocking, automated queue lifecycle management, production-ready containerization, and ML training data collection. Monitor hashtags, score posts for spam, manually review content, reliably export quality posts, and manage queue health with automatic cleanup and trimming.

Latest updates (v0.7.0): Review page pagination (20 posts/page), bulk approve/reject buttons, auto-refresh when page empties, ML training data snapshot collection on every review action, stream deletion cascade fix; 402 total tests passing.

Features

Core Functionality

  • Hashtag Monitoring: Monitor multiple Fediverse hashtags with customizable instance sources and scheduling
  • Spam Scoring: Rule-based detection (0-100 scale) with 7 intelligent detection rules
  • Manual Review Interface: Web interface for reviewing and approving/rejecting posts with scoring
  • Bulk Operations: Fetch and process posts in bulk with real-time progress tracking
  • Curated Queue Export: API-driven queue with ack/nack/error reliability pattern

Reblog Controls (Export Filters)

  • Pattern-Based User Blocking: Block users with flexible matching modes:
    • exact: Exact account identifier (e.g., @user@mastodon.social)
    • suffix: Block all users from a domain (e.g., bsky.app for all Bluesky users)
    • prefix: Block by username prefix (e.g., bot_ for bot accounts)
    • contains: Block by substring (e.g., spam for accounts with "spam" in the name)
  • "Don't Reblog" Hashtag Blocklist: Exclude posts with blocked hashtags
  • Attachments-Only Mode: Export only posts with media attachments
  • Auto-Reject on Fetch: Automatically reject blocked content before review
  • Blocklist Refresh: Apply Settings changes to review page instantly without losing progress
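
The four matching modes above can be sketched in a few lines of Python. This is an illustrative sketch only, not FenLiu's actual implementation: the function name `matches_block`, the case-insensitive comparison, and the stripping of a leading `@` are all assumptions.

```python
def matches_block(account: str, pattern: str, pattern_type: str) -> bool:
    """Return True if an account matches a block pattern (hypothetical helper)."""
    # Assumption: comparison is case-insensitive and ignores a leading "@".
    acct = account.lower().lstrip("@")
    pat = pattern.lower().lstrip("@")
    if pattern_type == "exact":
        return acct == pat
    if pattern_type == "suffix":
        return acct.endswith(pat)
    if pattern_type == "prefix":
        return acct.startswith(pat)
    if pattern_type == "contains":
        return pat in acct
    raise ValueError(f"unknown pattern_type: {pattern_type!r}")


print(matches_block("@alice@bsky.app", "bsky.app", "suffix"))
print(matches_block("@bot_news@example.org", "bot_", "prefix"))
```

Note that a plain suffix test also matches domains that merely end with the pattern (for example `notbsky.app`); a stricter variant could require the pattern to follow an `@` or a `.`.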

Web Interface

  • Dashboard: Real-time analytics, top hashtags, review progress
  • Streams Management: Create, edit, manage hashtag streams with CRUD operations
  • Review Workflow: Approve/reject posts with manual score adjustment and spam breakdown
    • Pagination: 20 posts per page with prev/next navigation
    • Bulk Actions: Approve All / Reject All buttons for the current page
    • Auto-refresh: Page reloads automatically when emptied but more posts remain
  • Pattern Blocking Settings: Intuitive UI for adding pattern-based user blocks with examples
  • Queue Preview: Monitor queue health (pending/reserved/delivered/error counts)
  • Statistics: Charts for posts over time and hashtag distribution
  • Responsive Design: Fully responsive across desktop, tablet, mobile

REST API

  • Hashtag Streams: Full CRUD for stream management and bulk fetching
  • Posts: List, filter, update with approval/rejection and scoring
  • Curated Queue: /next, /ack, /nack, /error, /requeue endpoints
  • Reblog Controls: Manage blocked users (with pattern types) and hashtags
  • Statistics: Post counts, hashtag distribution, approval rates
  • Authentication: API key-based authentication for queue endpoints
  • Health: Health check and application info endpoints

Technical Quality

  • Type Safety: Comprehensive type hints throughout
  • Testing: 402 tests with 100% pass rate
  • Resource Management: Proper cleanup of DB sessions and HTTP connections
  • Database Migrations: Alembic with automatic schema migration on startup
  • API Key Security: Secure generation and management of API keys
  • Code Complexity: Function complexity checked with complexipy to keep the code maintainable
  • No JavaScript Bloat: Pure HTML/CSS frontend, no external JS dependencies

Quick Start

Prerequisites

  • Python 3.12 or higher
  • uv package manager (recommended)

Installation

# Install dependencies
uv sync -U --all-groups

# Optional: Set up pre-commit hooks
uv run pre-commit install

Running the Application

# Development mode with auto-reload
fenliu --reload --debug

# Alternative development mode
uv run python -m fenliu --reload --debug

# Production mode
fenliu --host 0.0.0.0 --port 8000

# See all options
fenliu --help

Container Deployment (Docker/Podman)

FenLiu includes production-ready containerization with minimal image size (~207 MB):

podman build -t fenliu -f Containerfile .
cp .env.example .env  # edit with your settings
podman run -d -p 8000:8000 \
  -v fenliu-data:/app/data \
  -v fenliu-logs:/app/logs \
  --env-file .env \
  fenliu

See the Container Deployment guide for full instructions including volumes, compose examples, and security notes.

First Steps

  1. Start the server: fenliu --reload
  2. Open browser: Navigate to http://localhost:8000
  3. Add a hashtag: Go to Streams page and create a hashtag stream (e.g., "python")
  4. Fetch posts: Click "Fetch" on the stream to retrieve posts from Fediverse
  5. Review posts: Use the Review interface to approve quality content or reject spam
  6. Block patterns: Go to Settings to add pattern-based blocks (optional)
  7. Export: Monitor the Queue Preview to see posts flowing to Curated Queue

Pattern-Based Blocking Examples

Settings Page Usage

  1. Go to Settings → Don't Reblog — Users
  2. Enter pattern: bsky.app
  3. Select type: suffix
  4. Click "Block"
  5. Result: All users from Bluesky are now blocked

Common Patterns

  • Block all Bluesky users: Pattern bsky.app, Type suffix
  • Block bot accounts: Pattern bot_, Type prefix
  • Block accounts with spam keyword: Pattern spam, Type contains
  • Block specific user: Pattern @user@mastodon.social, Type exact

Applying to Review Page

  1. While reviewing posts, go to Settings to add new patterns
  2. Return to Review page
  3. Click Refresh Blocklists button (next to Refresh)
  4. Current posts are re-evaluated instantly against the new patterns
  5. Continue reviewing without page reload

Debug Logging

Enable detailed debug logging with the --debug flag:

# Enable debug logging to file
fenliu --debug

# View logs in real-time
tail -f logs/fenliu_debug.log

# Custom log directory
fenliu --debug --log-dir=/var/log/fenliu

In your code, import get_logger from fenliu.logging, create a logger with it, and call logger.debug("message").

API Usage

Authentication

All queue endpoints require API key authentication. Generate a key in Settings, then include it in requests:

curl -H "X-API-Key: your-api-key-here" \
  http://localhost:8000/api/v1/curated/next

Common Examples

# List all hashtag streams
curl http://localhost:8000/api/v1/streams

# Create a new hashtag stream
curl -X POST http://localhost:8000/api/v1/streams \
  -H "Content-Type: application/json" \
  -d '{"hashtag": "python", "instance": "mastodon.social", "active": true}'

# Fetch posts for a stream
curl -X POST "http://localhost:8000/api/v1/streams/1/fetch?limit=20"

# Get next post from Curated Queue
curl -H "X-API-Key: your-api-key-here" \
  http://localhost:8000/api/v1/curated/next

# Acknowledge successful reblog
curl -X POST -H "X-API-Key: your-api-key-here" \
  http://localhost:8000/api/v1/curated/123/ack

# Report permanent failure
curl -X POST -H "X-API-Key: your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{"reason": "Account suspended"}' \
  http://localhost:8000/api/v1/curated/123/error

# Review a post (approve)
curl -X PATCH http://localhost:8000/api/v1/posts/123 \
  -H "Content-Type: application/json" \
  -d '{"approved": true, "reviewer_notes": "Quality content"}'

# Adjust spam score manually
curl -X PATCH http://localhost:8000/api/v1/posts/123 \
  -H "Content-Type: application/json" \
  -d '{"manual_spam_score": 15}'

# Add a pattern-based block (suffix type)
curl -X POST http://localhost:8000/api/v1/reblog-controls/blocked-users \
  -H "Content-Type: application/json" \
  -d '{"account_identifier": "bsky.app", "pattern_type": "suffix", "notes": "Block all Bluesky"}'

# List blocked users with pattern types
curl http://localhost:8000/api/v1/reblog-controls/blocked-users

API Endpoints

Streams & Posts:

  • GET /api/v1/streams - List streams
  • POST /api/v1/streams - Create stream
  • GET/PUT/DELETE /api/v1/streams/{id} - Stream operations
  • POST /api/v1/streams/{id}/fetch - Fetch posts for stream
  • POST /api/v1/streams/fetch-all - Fetch all active streams
  • GET /api/v1/posts - List posts with filtering
  • GET /api/v1/posts/{id} - Get post details
  • PATCH /api/v1/posts/{id} - Update post (review, approve, score)
  • GET /api/v1/stats - Application statistics

Curated Queue:

  • GET /api/v1/curated/next - Get next post (returns 204 if empty)
  • POST /api/v1/curated/{post_id}/ack - Confirm successful reblog
  • POST /api/v1/curated/{post_id}/nack - Return to queue (transient failure)
  • POST /api/v1/curated/{post_id}/error - Mark permanently failed
  • POST /api/v1/curated/{post_id}/requeue - Return errored post to queue
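
A consumer of the Curated Queue endpoints above might look roughly like the following. This is a sketch under stated assumptions: the response from /next is assumed to be JSON with an `id` field, and `process_next`, `urllib_transport`, and `TransientError` are hypothetical names introduced for illustration. The endpoint paths, the X-API-Key header, the 204 response for an empty queue, and the `reason` body come from this README.

```python
import json
import urllib.error
import urllib.request


class TransientError(Exception):
    """Raised by the reblog callback when a later retry might succeed."""


def urllib_transport(method, url, headers, body=None):
    """Minimal stdlib HTTP call; returns (status, parsed JSON or None)."""
    data = json.dumps(body).encode() if body is not None else None
    if data is not None:
        headers = {**headers, "Content-Type": "application/json"}
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            raw = resp.read()
            return resp.status, (json.loads(raw) if raw else None)
    except urllib.error.HTTPError as exc:
        return exc.code, None


def process_next(base_url, api_key, reblog, transport=urllib_transport):
    """Pull one post, try to reblog it, and report the outcome to the queue."""
    headers = {"X-API-Key": api_key}
    status, post = transport("GET", f"{base_url}/api/v1/curated/next", headers, None)
    if status == 204 or post is None:
        return "empty"  # nothing to do right now
    post_url = f"{base_url}/api/v1/curated/{post['id']}"  # 'id' field is assumed
    try:
        reblog(post)
    except TransientError:
        transport("POST", f"{post_url}/nack", headers, None)  # back to the queue
        return "nack"
    except Exception as exc:
        transport("POST", f"{post_url}/error", headers, {"reason": str(exc)})
        return "error"
    transport("POST", f"{post_url}/ack", headers, None)
    return "ack"
```

Passing the transport as a parameter keeps the ack/nack/error decision logic testable without a running server.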

Reblog Controls (Pattern-Based Blocking):

  • GET /api/v1/reblog-controls/settings - Get reblog filter settings
  • PUT /api/v1/reblog-controls/settings - Update settings
  • GET /api/v1/reblog-controls/blocked-users - List blocked users with pattern types
  • POST /api/v1/reblog-controls/blocked-users - Add blocked user (with pattern_type)
  • DELETE /api/v1/reblog-controls/blocked-users/{id} - Remove blocked user
  • GET /api/v1/reblog-controls/blocked-hashtags - List blocked hashtags
  • POST /api/v1/reblog-controls/blocked-hashtags - Add blocked hashtag
  • DELETE /api/v1/reblog-controls/blocked-hashtags/{id} - Remove blocked hashtag
  • POST /api/v1/reblog-controls/reject-blocked - Bulk reject posts matching any pattern

System:

  • GET /health - Health check
  • GET /info - Application info

Configuration

Environment variables (via .env file):

# Database
DATABASE_URL=sqlite:///./fenliu.db

# Fediverse settings
DEFAULT_INSTANCE=mastodon.social
API_TIMEOUT=30
MAX_POSTS_PER_FETCH=20
RATE_LIMIT_DELAY=1.0

# Application
DEBUG=false
SECRET_KEY=your-secret-key-change-in-production
APP_NAME=FenLiu

# Spam scoring thresholds
VERY_HIGH_THRESHOLD=76
LOW_MAX_THRESHOLD=25

# Queue timeout
RESERVE_TIMEOUT_SECONDS=300
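
For readers integrating with these variables from their own tooling, the following sketch shows one way to read and type-convert them in Python. It is not FenLiu's config.py, which may work differently: the `Settings` dataclass and `load_settings` function are hypothetical, and using the values listed above as fallbacks is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    default_instance: str
    api_timeout: int
    max_posts_per_fetch: int
    rate_limit_delay: float
    debug: bool
    reserve_timeout_seconds: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping; fallbacks mirror the example values above."""
    return Settings(
        database_url=env.get("DATABASE_URL", "sqlite:///./fenliu.db"),
        default_instance=env.get("DEFAULT_INSTANCE", "mastodon.social"),
        api_timeout=int(env.get("API_TIMEOUT", "30")),
        max_posts_per_fetch=int(env.get("MAX_POSTS_PER_FETCH", "20")),
        rate_limit_delay=float(env.get("RATE_LIMIT_DELAY", "1.0")),
        # Assumption: "1", "true", and "yes" (any case) all enable debug mode.
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        reserve_timeout_seconds=int(env.get("RESERVE_TIMEOUT_SECONDS", "300")),
    )
```

The sketch reads the process environment only; a .env file still has to be loaded into the environment first, for example through the --env-file option shown in the container example.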

Development

Testing

# Run full test suite
pytest

# Run with coverage
pytest --cov=src/fenliu tests/

# Quick validation
python -m pytest -q

# Run specific test file
pytest tests/test_pattern_blocking.py -v

Code Quality

# Linting
ruff check src/fenliu/

# Formatting
ruff format src/fenliu/

# Complexity check
complexipy src

# Pre-commit checks
prek run --all-files

# Full CI simulation
nox

Database Migrations

# Apply pending migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "description"

# Show current revision
alembic current

# View all revisions
alembic history

Development Workflow

# After dependency changes
uv sync -U --all-groups

# Quick validation before commits
prek run --all-files

# Full validation before commits
nox

Project Structure

fenliu/
├── src/fenliu/
│   ├── __init__.py              # Package definition
│   ├── __main__.py              # CLI entry point
│   ├── main.py                  # PyView application
│   ├── config.py                # Configuration
│   ├── database.py              # Database setup
│   ├── models.py                # SQLAlchemy models
│   ├── schemas.py               # Pydantic validation
│   ├── api/                     # REST API endpoints
│   │   ├── curated.py           # Queue API
│   │   ├── reblog_controls.py   # Filter management (pattern-based)
│   │   └── api_keys.py          # API key management
│   ├── services/                # Business logic
│   │   ├── spam_scoring.py      # Spam detection
│   │   ├── fediverse.py         # Fediverse client
│   │   ├── export_eligibility.py # Export filtering with pattern matching
│   │   ├── scheduler.py         # Task scheduling
│   │   └── api_key.py           # API key service
│   ├── templates/               # HTML templates
│   └── static/                  # CSS and assets
├── alembic/                     # Database migrations
├── tests/                       # Test suite (402 tests)
├── docs/                        # MkDocs documentation
├── pyproject.toml               # Project configuration
├── ROADMAP.md                   # Development roadmap
├── README.md                    # This file
└── PATTERN_BLOCKING_FEATURE.md  # Pattern blocking documentation

Documentation

Complete documentation is available in the docs/ folder and is built with MkDocs:

# Serve locally with hot reload
mkdocs serve

# Build static site
mkdocs build

📚 Live Documentation: https://marvinsmastodontools.codeberg.page/fenliu/

Includes: Installation, Quick Start, API Reference, Pattern Blocking Guide, Curated Queue Integration, Contributing Guide, Roadmap, and FAQ.

Technical Stack

  • Framework: PyView (Starlette-based LiveView) with real-time capabilities
  • Database: SQLAlchemy with SQLite, optimized with eager loading
  • API Client: minimal-activitypub for Fediverse integration
  • Async: async/await throughout, with synchronous access for SQLite only
  • Type Hints: Comprehensive type annotations with Pydantic validation
  • Frontend: Jinja2 templates with Tailwind CSS, responsive design
  • Testing: pytest with 402 tests (100% pass rate)
  • Linting: ruff for formatting and linting
  • Migrations: Alembic for schema management
  • Package Manager: uv for dependency management

Upcoming Features

See Roadmap for detailed plans. Phase 4 focus:

  • CI/CD (containerization itself shipped in v0.6.0)
  • Performance optimization and caching for pattern matching
  • Multi-user support with roles
  • Advanced monitoring dashboard
  • PostgreSQL/MySQL support

What's New in v0.7.0

Review Page Improvements

The review workflow is now faster and more ergonomic for large queues:

  • Pagination: Posts are shown 20 at a time with prev/next navigation — no more infinite scrolling through hundreds of posts
  • Bulk Actions: "Approve All" and "Reject All" buttons at the bottom-right of the table act on all posts currently visible on the page (already individually reviewed posts are excluded)
  • Auto-refresh: When the current page is emptied by reviewing all posts, the page automatically loads the next batch if more unreviewed posts exist
  • Scroll-to-top: Page scrolls to the top automatically when navigating between pages or when the auto-refresh triggers
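
The paging behaviour described above can be illustrated with a small sketch. It is not the LiveView code itself: `Page` and `paginate` are hypothetical names, and clamping an out-of-range page number to the nearest valid page is an assumption. Only the page size of 20 comes from the release notes.

```python
from dataclasses import dataclass
from typing import Sequence

PAGE_SIZE = 20  # posts per page, as stated above


@dataclass(frozen=True)
class Page:
    items: list
    number: int  # 1-based page number
    total_pages: int

    @property
    def has_prev(self) -> bool:
        return self.number > 1

    @property
    def has_next(self) -> bool:
        return self.number < self.total_pages


def paginate(posts: Sequence, number: int, page_size: int = PAGE_SIZE) -> Page:
    """Return the requested page, clamping out-of-range page numbers."""
    total_pages = max(1, -(-len(posts) // page_size))  # ceiling division
    number = min(max(1, number), total_pages)
    start = (number - 1) * page_size
    return Page(list(posts[start:start + page_size]), number, total_pages)
```

With 45 unreviewed posts this yields three pages of 20, 20, and 5 items; once a page is emptied by reviewing, calling `paginate` again on the remaining posts returns the next batch.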

ML Training Data Collection

Review decisions now capture a full feature snapshot at review time, so ML training data survives the queue cleanup job that deletes old posts:

  • Snapshot fields on ReviewFeedback: content snippet, spam score, hashtag count/list, attachment count, video flag, engagement counts (boosts/likes/replies), author bot flag, instance, stream ID
  • Complete coverage: Snapshots are captured for every approve, reject, score-adjust, and bulk action from the LiveView UI (previously only REST API reviews were recorded)
  • Post-deletion safe: Training data is self-contained in ReviewFeedback rows and does not depend on the original post existing
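
To make the idea of a self-contained snapshot concrete, here is a rough sketch of the listed fields as a plain dataclass. It is not the actual ReviewFeedback model: every field name, the dictionary keys read from `post`, and the 200-character snippet length are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReviewSnapshot:
    content_snippet: str
    spam_score: int
    hashtags: tuple
    hashtag_count: int
    attachment_count: int
    has_video: bool
    boosts: int
    likes: int
    replies: int
    author_is_bot: bool
    instance: str
    stream_id: int


def snapshot_from_post(post: dict, stream_id: int, snippet_len: int = 200) -> ReviewSnapshot:
    """Copy the features needed for training out of a post (hypothetical keys)."""
    attachments = post.get("attachments", [])
    hashtags = tuple(post.get("hashtags", []))
    return ReviewSnapshot(
        content_snippet=post.get("content", "")[:snippet_len],
        spam_score=post.get("spam_score", 0),
        hashtags=hashtags,
        hashtag_count=len(hashtags),
        attachment_count=len(attachments),
        has_video=any(a.get("type") == "video" for a in attachments),
        boosts=post.get("boosts", 0),
        likes=post.get("likes", 0),
        replies=post.get("replies", 0),
        author_is_bot=bool(post.get("author_is_bot", False)),
        instance=post.get("instance", ""),
        stream_id=stream_id,
    )
```

Because the snapshot copies values instead of holding a reference to the post, `asdict(snapshot)` still yields a complete training row after the original post has been deleted.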

Bug Fix: Stream Deletion

Deleting a hashtag stream no longer raises an integrity error. Previously, cascade-deleting posts would fail because SQLAlchemy tried to NULL-out review_feedback.post_id (a NOT NULL column) rather than deleting the orphaned rows. The Post → ReviewFeedback relationship now uses cascade="all, delete-orphan".

Code Quality

  • 13 new tests: cascade delete (2), ReviewFeedback creation on review actions (3), pagination and scroll behaviour (8); 402 total
  • Type safety: Zero errors under ty check
  • Linting: All code passes ruff checks

Previous Release — v0.6.0

Queue Lifecycle Management

  • Auto-Delete Delivered Posts: Posts automatically deleted after 7 days (configurable), with historical stats preserved
  • Trim Excess Pending Posts: Weighted random deletion maintains invariant: pending_count ≥ 2 × daily_consumption_rate
  • Cleanup API Endpoints: POST /api/v1/curated/cleanup and POST /api/v1/curated/trim-pending
  • Queue UI Controls: "Purge old delivered" and "Trim excess pending" buttons on Queue Preview page
  • Historical Stats: All-time deletion counts preserved; stats page shows active and historical data
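
The trimming invariant above implies a simple calculation: with 50 pending posts and a consumption rate of 10 posts per day, the queue must keep at least 20, so up to 30 may be trimmed. The sketch below works through that arithmetic and pairs it with one possible weighted random selection. The weighting scheme is an assumption (the release notes do not say what the weights are), the sampling method shown is the Efraimidis-Spirakis technique, and both function names are hypothetical.

```python
import math
import random


def excess_pending(pending_count: int, daily_consumption_rate: float) -> int:
    """Posts that can be trimmed while keeping pending >= 2 x daily rate."""
    floor = math.ceil(2 * daily_consumption_rate)
    return max(0, pending_count - floor)


def pick_for_trim(posts, daily_consumption_rate, rng=None):
    """Choose post IDs to delete; `posts` is a list of (post_id, weight) pairs.

    Higher weights are more likely to be picked. Uses Efraimidis-Spirakis
    weighted sampling without replacement: key = u ** (1 / weight).
    """
    rng = rng or random.Random()
    k = excess_pending(len(posts), daily_consumption_rate)
    keyed = [
        (rng.random() ** (1.0 / max(weight, 1e-9)), post_id)
        for post_id, weight in posts
    ]
    # The k largest keys form the weighted sample.
    return [post_id for _, post_id in sorted(keyed, reverse=True)[:k]]
```

Passing a seeded `random.Random` makes the selection reproducible, which is convenient in tests.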

Production Containerization

  • Multi-stage Build (Containerfile): Minimal final image (~207 MB)
  • Non-root User: Runs as fenliu (UID 1000)
  • Persistent Volumes: Separate data and logs volumes
  • Automatic Migrations: Schema migrated automatically on container startup
  • Docker/Podman Support: Works with both runtimes

Previous Release — v0.5.3

Pattern-Based User Blocking (v0.5.3)

Users can block Fediverse accounts using flexible pattern matching:

  • Four Pattern Types: exact, suffix, prefix, contains
  • Real-World Examples: Block all Bluesky users, all bot accounts, or any account with a keyword
  • Settings UI: Intuitive pattern selector with helpful examples
  • Review Page Integration: Pattern-based blocks show on review page with instant visibility
  • Blocklist Refresh: New button allows applying Settings changes to review page without losing progress

See PATTERN_BLOCKING_FEATURE.md for complete details and examples.

Cultural Context

The name "FenLiu" (分流) means "divide the flow" in Chinese, inspired by the ancient Dujiangyan irrigation system (256 BC). This project applies the same engineering wisdom to digital content streams, separating valuable content from spam and noise while maintaining the natural flow of community conversation.

License

AGPL-3.0 License - See LICENSE file for details.

Contributing

  1. Follow existing code style (ruff formatted with comprehensive type hints)
  2. Write tests for new functionality (maintain 100% test pass rate)
  3. Update documentation as needed
  4. Run nox before submitting changes
  5. Run alembic upgrade head after pulling changes with new migrations

  • Version: 0.7.0
  • Status: Production Ready ✅
  • Released: 2026-03-14
  • Tests: 402 passing ✅
  • Code Quality: All checks passing ✅
  • Container Size: ~207 MB (multi-stage optimized)
  • Framework: PyView (Starlette-based LiveView)
  • Architecture: Async Python with comprehensive type hints
  • Repository: https://codeberg.org/marvinsmastodontools/fenliu
