Skip to main content

Deterministic CLI that ingests article URLs/HTML and emits Jekyll-ready posts

Project description

Artifactor logo

Artifactor

Artifactor is a Python-based CLI tool that generates Jekyll HTML posts from structured article data. I treat my writing like an artifact pipeline.

Live demo: https://k-reel.github.io/artifactor/

CI Pages License: MIT

What is Artifactor?

Artifactor transforms article data into static site posts with deterministic, reproducible output. It can:

  • Ingest from URLs: Fetch and extract content from web articles (Phase 1)
  • Generate from fixtures: Create posts from structured JSON (Phase 0)
  • Output Jekyll posts: HTML files with YAML front matter, ready for GitHub Pages

Think of it as a build pipeline for content: stable inputs produce stable outputs, every time.

Features

Phase 1 (Current): URL ingestion with content extraction

  • Fetch articles from curated URL lists
  • Extract title, date, authors, and canonical HTML
  • Support for Socket blog and generic web pages
  • Deterministic output: same input → same bytes

Phase 0: Scaffolding and foundation

  • Generate posts from JSON fixtures
  • Jekyll site structure with layouts
  • Comprehensive test suite

Installation

pip install artifactor

For development:

git clone https://github.com/K-reel/artifactor.git
cd artifactor
pip install -e ".[dev]"

Versioning

  • v0.2.0: Configuration system, adapter debugging tools, repo-root packaging, trusted publishing
  • v0.1.1: CI workflow (offline-only, runs on every push/PR)
  • v0.1.0: Initial release (scaffold + ingest + offline fixture mode)

Repository Structure

artifactor/
├── tool/                          # Python package
│   ├── src/artifactor/           # Source code
│   │   ├── cli.py                # Typer-based CLI (scaffold, ingest)
│   │   ├── models.py             # Article dataclass
│   │   ├── generator.py          # Post generation logic
│   │   ├── fetch.py              # HTTP fetching
│   │   ├── ingest.py             # URL ingestion orchestration
│   │   ├── sources/              # Content extraction adapters
│   │   │   ├── base.py           # Adapter interface
│   │   │   ├── socket_blog.py    # Socket blog adapter
│   │   │   └── generic.py        # Generic fallback adapter
│   │   └── templates/            # Jinja2 templates
│   ├── tests/                    # Test suite
│   └── pyproject.toml            # Package configuration
├── site/                          # Jekyll site
│   ├── _config.yml               # Jekyll configuration
│   ├── _layouts/                 # Layout templates
│   ├── _posts/                   # Generated posts go here
│   ├── index.html                # Post listing
│   └── Gemfile                   # Ruby dependencies
├── fixtures/                      # Sample input data
│   ├── sample_article.json       # Example article (Phase 0)
│   ├── socket_article_sample.html # Socket blog HTML (testing)
│   └── urls_sample.txt           # Sample URL list
└── README.md                      # This file

Quick Start

For Users

# Install from PyPI
pip install artifactor

# Validate your configuration (creates default config if needed)
artifactor config validate

# Ingest URLs from a file
artifactor ingest --urls urls.txt

# List available adapters
artifactor adapters list

For Developers

Get started with the development environment in four simple commands:

1. Install Dependencies

make install

This installs the Python package and development dependencies in editable mode.

2. Generate a Sample Post

make scaffold

This generates a Jekyll post from the sample fixture at site/_posts/2024-03-15-writing-as-artifact-pipeline.html.

3. Run Tests

make test

Runs the test suite to verify everything works correctly.

4. Serve the Site (Optional)

# First time only: install Ruby dependencies
cd site && bundle install && cd ..

# Start Jekyll server
make serve

Open http://localhost:4000 to view your generated posts.

Prerequisites for Jekyll:

  • Ruby 2.7+ and Bundler installed
  • On macOS: brew install ruby && gem install bundler

Note on Gemfile.lock: We commit site/Gemfile.lock (Bundler dependency lockfile) to improve reproducibility. If your Ruby/Bundler environment differs:

  1. Try bundle install first
  2. If needed, run bundle update --bundler
  3. Use bundle update only as a last resort

Usage

Make Targets

The easiest way to work with Artifactor is through the Makefile:

make help      # Show all available targets
make install   # Install Python package and dependencies
make scaffold  # Generate sample post from fixture
make ingest    # Ingest URLs (offline demo, no network)
make test      # Run test suite
make serve     # Start Jekyll development server
make clean     # Remove generated site files and caches

Note on make ingest: This target runs in offline demo mode using a sample HTML fixture. No network requests are made, making it safe for testing and CI environments. To ingest real URLs from the web, use the artifactor ingest command directly (see Phase 1 usage below).

Phase 1: Ingest from URLs

Create a URL list (urls.txt):

# URLs to ingest (one per line, # for comments)
https://socket.dev/blog/understanding-npm-security
https://socket.dev/blog/another-article

# Blank lines are ignored

Ingest URLs (with dry-run first):

# Dry run to preview
python3 -m artifactor ingest --urls urls.txt --out site/ --limit 2 --dry-run

# Actually generate posts
python3 -m artifactor ingest --urls urls.txt --out site/

Options:

  • --urls PATH: File containing URLs (one per line, required)
  • --out PATH: Output directory (default: site/)
  • --limit N: Process only first N URLs (useful for testing)
  • --dry-run: Show what would be done without writing
  • --timeout SECONDS: Request timeout (default: 20)
  • --user-agent STRING: Custom User-Agent header
  • --html-fixture PATH: Use HTML file for offline testing (no network fetch)

Offline testing: Use --html-fixture to test ingestion without network access:

python3 -m artifactor ingest --urls fixtures/urls_sample.txt --out site/ --html-fixture fixtures/socket_article_sample.html --dry-run

This processes URLs using the provided HTML file instead of fetching from the network. Useful for CI, demos, and development. In fixture mode, the URL is still used for adapter selection and filename slugging; multiple URLs may generate different filenames from the same fixture content.

Status indicators:

  • ✓ CREATED: New post created
  • ↻ UPDATED: Existing post updated
  • = UNCHANGED: Content identical, not rewritten
  • ✗ FAILED: Extraction or fetch failed

Phase 0: Generate from Fixtures

Generate posts from structured JSON:

artifactor scaffold --out site/ --fixture fixtures/sample_article.json

Note: If artifactor is not in your PATH, use:

python3 -m artifactor scaffold --out site/ --fixture fixtures/sample_article.json

Article Schema

The canonical Article model has these fields:

@dataclass
class Article:
    title: str              # Article title
    date: str               # Publication date (YYYY-MM-DD)
    slug: str               # URL-friendly identifier
    canonical_url: str      # Original article URL
    source: str             # Source publication name
    html: str               # Pre-cleaned HTML content
    authors: list[str]      # Optional list of authors
    tags: list[str]         # Optional list of tags

Example fixture JSON:

{
  "title": "Example Article",
  "date": "2024-03-15",
  "slug": "example-article",
  "canonical_url": "https://example.com/article",
  "source": "Example Blog",
  "authors": ["Author Name"],
  "tags": ["tag1", "tag2"],
  "html": "<p>Article content here.</p>"
}

Development

Running Tests

make test

Or run pytest directly with coverage:

cd tool && python3 -m pytest --cov=artifactor --cov-report=term-missing

Deterministic Output

Artifactor ensures byte-for-byte identical output for identical inputs:

  • Stable YAML key ordering in front matter
  • Normalized line endings (\n)
  • No timestamps except the article date
  • Consistent 2-space indentation for YAML lists

Run the determinism test specifically:

cd tool && python3 -m pytest tests/test_generator.py::test_generate_post_deterministic_file -v

Adding New Fixtures

Create a JSON file in fixtures/ following the Article schema, then generate a post:

artifactor scaffold --out site/ --fixture fixtures/your_article.json
# Or use make:
# Edit the Makefile scaffold target to point to your fixture

Jekyll Site Details

Layouts

_layouts/reprint.html: Renders article posts with:

  • Title and date
  • Author attribution (if provided)
  • Reprint notice with canonical URL and source
  • Tags (if provided)
  • Full HTML content

Configuration

_config.yml: Minimal Jekyll configuration with:

  • Site title and description
  • Markdown processor (kramdown)
  • Permalink structure
  • Build exclusions

Local Development

cd site
bundle exec jekyll serve --livereload

The --livereload flag auto-refreshes your browser when files change.

Design Principles

  1. Minimal and Deterministic: Stable file paths, stable formatting, no unnecessary complexity
  2. HTML Posts: Uses HTML with YAML front matter, not Markdown (we control the HTML)
  3. Local-First: Easy to develop and build locally
  4. Testable: Automated tests ensure deterministic output
  5. Extensible: Clear separation between tool (Python) and site (Jekyll)

Roadmap

Phase 0 (Current): Repository scaffolding, sample post generation, local Jekyll build

Future Phases:

  • Content ingestion from web sources
  • Automated HTML cleaning and normalization
  • Batch processing
  • CI/CD integration
  • Archive management

Contributing

Contributions welcome! This is an open-source project focused on treating content like artifacts in a build pipeline.

License

MIT License - see LICENSE file for details

Requirements

  • Python 3.10+
  • Ruby 2.7+ (for Jekyll)
  • Bundler

Troubleshooting

Command not found: artifactor

Make sure you installed the package in editable mode:

cd tool
pip install -e .

Jekyll fails to build

Ensure you installed the Ruby dependencies:

cd site
bundle install

Tests fail

Make sure you're in the tool directory and have installed dev dependencies:

cd tool
pip install -e ".[dev]"
pytest

Philosophy: I treat my writing like an artifact pipeline—structured, versioned, and reproducible.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

artifactor-0.2.1.tar.gz (28.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

artifactor-0.2.1-py3-none-any.whl (30.6 kB view details)

Uploaded Python 3

File details

Details for the file artifactor-0.2.1.tar.gz.

File metadata

  • Download URL: artifactor-0.2.1.tar.gz
  • Upload date:
  • Size: 28.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for artifactor-0.2.1.tar.gz
Algorithm Hash digest
SHA256 687f923738c193cc864917c9f60d67858cee9fe1bf628405af4f024c2803a910
MD5 689f40f2a3d083d0ab97a429b0745335
BLAKE2b-256 48f0f53fe741cc49282f3a323f2e5bae010e2cc64f61ab8b51dbc4b1078f96f1

See more details on using hashes here.

Provenance

The following attestation bundles were made for artifactor-0.2.1.tar.gz:

Publisher: publish.yml on K-reel/artifactor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file artifactor-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: artifactor-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 30.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for artifactor-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6f3155c3c4493288e2f1d5400574db3bda2cd330a3e0ee59172b64d428efb7ef
MD5 7ff9cda60e5679aaec5abcccbc2f1992
BLAKE2b-256 253621790599158669059e791aaac44281a1edf774b4030468468fa68c6979ad

See more details on using hashes here.

Provenance

The following attestation bundles were made for artifactor-0.2.1-py3-none-any.whl:

Publisher: publish.yml on K-reel/artifactor

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page