D361: Robust Offline Documentation Generator
TL;DR
D361 is a robust, enterprise-grade Python package that creates comprehensive offline versions of Document360 knowledge bases and other sitemap-based documentation sites. It's the generic, reusable foundation of the Document360 unified toolkit, designed for reliability, performance, and seamless integration.
Quick Start:
```bash
# Install and generate offline docs in one command
pip install d361 && playwright install chromium
d361-offline all --map-url="https://docs.example.com/sitemap-en.xml" --output-dir="offline_docs"

# Or use the standalone binary (no Python required)
curl -L -o d361-offline https://github.com/twardoch/d361/releases/latest/download/d361-offline-ubuntu-latest
chmod +x d361-offline && ./d361-offline all --map-url="https://docs.example.com/sitemap-en.xml"
```
Key Features:
- Complete Documentation Capture - Intelligently extracts entire documentation structures
- Multi-Strategy Parsing - Robust sitemap parsing with multiple fallback mechanisms
- Browser Automation - Playwright-based extraction with stealth techniques for dynamic content
- Multi-Format Output - HTML, Markdown, and combined documentation files
- Performance Optimized - Concurrent downloads with intelligent retry logic
- Navigation Preservation - Maintains the original site structure for intuitive offline browsing
What is D361?
D361 is the robust offline documentation generator that serves as the foundational component of the Document360 unified toolkit. As a standalone package, it specializes in extracting, processing, and organizing documentation content for offline access, with enterprise-grade reliability and performance.
Core Purpose: D361 automates the complete process of downloading entire Document360 sites (or other sitemap-based documentation) and converting them into comprehensive, browsable offline formats. It's designed to handle the complexities of modern documentation sites, including dynamic content, virtual scrolling, and complex navigation structures.
The D361 Workflow:
- Multi-Strategy Discovery - Advanced sitemap parsing with multiple fallback mechanisms
- Dynamic Structure Extraction - Intelligently maps navigation hierarchies from live sites
- Concurrent Content Fetching - High-performance parallel downloading with retry logic
- Multi-Format Processing - Converts content to HTML, Markdown, and combined formats
- Intelligent Organization - Creates structured offline archives with preserved navigation
Result: A complete, self-contained documentation snapshot that works entirely offline.
Who Uses D361?
Enterprise Documentation Teams:
- Technical Writers - Archive documentation versions, perform offline reviews, and create distribution packages
- DevOps Engineers - Integrate offline documentation into deployment pipelines and container images
- Support Engineers - Access knowledge bases instantly in customer support scenarios
- Compliance Teams - Create immutable documentation snapshots for regulatory requirements
Development & Integration:
- Software Developers - Bundle documentation with applications for offline environments
- System Integrators - Deploy documentation in air-gapped or restricted network environments
- CI/CD Pipelines - Automated documentation processing and archival as part of build processes
- Documentation Toolkit Builders - Use D361 as a foundational component (as in vexy-help)
Specialized Use Cases:
- Industrial/Manufacturing - Offline documentation access on factory floors and production environments
- Healthcare/Government - Secure, compliant documentation in regulated environments
- Field Service - Technical documentation for remote locations with limited connectivity
- Training & Education - Portable documentation packages for distributed learning
Why Choose D361?
Technical Excellence:
- Robust Architecture - Handles complex modern documentation sites with dynamic content
- Enterprise Performance - Concurrent processing with intelligent retry mechanisms and error handling
- Multiple Fallback Strategies - Ensures successful content extraction even with challenging sites
- Format Flexibility - Outputs HTML, Markdown, and combined formats for different use cases
Real-World Reliability:
- Production-Tested - Successfully processes large-scale documentation sites with thousands of pages
- Stealth Browser Automation - Advanced Playwright techniques to handle cookie banners, virtual scrolling, and dynamic loading
- Content Preservation - Maintains original navigation structure, styling, and cross-references
- Error Resilience - Comprehensive error handling ensures partial success even with network issues
Integration-Friendly:
- Standalone Operation - Works independently without external dependencies on other toolkit components
- API-First Design - Clean programmatic interface for integration into larger workflows
- Container-Ready - Docker-friendly with minimal resource requirements
- Cross-Platform - Native support for Linux, macOS, and Windows environments
Core Features & Capabilities
Complete Documentation Extraction
D361 employs sophisticated techniques to capture entire documentation ecosystems:
```python
# Advanced content discovery with multiple fallback strategies
from d361.offline.parser import parse_sitemap

# Strategy 1: Direct sitemap parsing
urls = await parse_sitemap("https://docs.example.com/sitemap-en.xml")

# Strategy 2: robots.txt discovery + parsing
urls = await parse_sitemap("https://docs.example.com/robots.txt", strategy="robots")

# Strategy 3: Stealth browser automation for protected sites
urls = await parse_sitemap("https://docs.example.com", strategy="stealth")
```
What gets captured:
- All article content (HTML + converted Markdown)
- Complete navigation hierarchy with nested categories
- Referenced images and media files
- Cross-references and internal links
- Original styling and CSS (optional)
Multi-Strategy Sitemap Parsing
Robust discovery mechanisms ensure content extraction even from challenging sites:
```python
from d361.offline.d361_offline import D361Offline
from d361.offline.config import Config

config = Config(
    map_url="https://docs.example.com/sitemap-en.xml",
    # Fallback strategies are attempted automatically if the primary fails
    effort=True,        # Enable additional discovery strategies
    max_concurrent=8,   # Concurrent parsing attempts
    retries=3,          # Per-strategy retry attempts
)

offline_gen = D361Offline(config)
await offline_gen.prep()  # Intelligent sitemap discovery and parsing
```
Parsing Strategies:
- Direct Navigation - Standard HTTP GET to sitemap URL
- Stealth Browser - Playwright with human-like behavior patterns
- HTTP Direct - aiohttp-based lightweight parsing
- Robots.txt Discovery - Automatic sitemap URL discovery
- Google Cache - Last resort via cached versions
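The fallback behavior behind these strategies boils down to trying parsers in order until one yields URLs. A minimal sketch of that pattern (illustrative only; `parse_with_fallbacks`, `direct`, and `stealth` here are hypothetical stand-ins, not the actual d361 internals):

```python
import asyncio

async def parse_with_fallbacks(url, strategies):
    """Try each parsing strategy in order until one yields URLs."""
    for name, strategy in strategies:
        try:
            urls = await strategy(url)
            if urls:  # A non-empty result counts as success
                return name, urls
        except Exception:
            continue  # Fall through to the next strategy
    return None, []

# Hypothetical stand-ins for the real strategy functions
async def direct(url):
    raise ConnectionError("blocked")

async def stealth(url):
    return ["https://docs.example.com/getting-started"]

name, urls = asyncio.run(
    parse_with_fallbacks(
        "https://docs.example.com/sitemap-en.xml",
        [("direct", direct), ("stealth", stealth)],
    )
)
```

Because failures are swallowed per strategy, a site that blocks plain HTTP still succeeds once a later strategy (here, the stealth stand-in) returns results.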
Advanced Browser Automation
Playwright-powered content extraction handles modern web complexity:
```python
from d361.offline.browser import setup_browser, expand_all_items
# extract_tree_structure lives in the navigation module
from d361.offline.navigation import extract_tree_structure

# Configure a stealth browser with realistic parameters
browser_config = {
    'headless': True,
    'user_agent': 'Mozilla/5.0 (compatible; D361 Documentation Archiver)',
    'viewport': {'width': 1920, 'height': 1080},
    'extra_http_headers': {
        'Accept-Language': 'en-US,en;q=0.9',
        'Accept-Encoding': 'gzip, deflate, br',
    },
}

async with setup_browser(**browser_config) as browser:
    page = await browser.new_page()
    await page.goto("https://docs.example.com")

    # Handle dynamic content loading
    navigation_tree = page.locator('#left-panel d360-data-list-tree-view').first
    await expand_all_items(navigation_tree, page)  # Recursively expand all navigation

    # Extract the complete navigation structure
    nav_data = await extract_tree_structure(navigation_tree)
```
Browser Automation Capabilities:
- Cookie Banner Dismissal - Automatically handles consent dialogs
- Virtual Scrolling - Loads all content from virtually rendered lists
- Dynamic Tree Expansion - Recursively expands navigation hierarchies
- Network Idle Detection - Waits for complete content loading
- Retry Logic - Handles intermittent failures gracefully
Multi-Format Output Generation
Flexible output formats for different consumption needs:
```python
# Configure output formats and customization
config = Config(
    map_url="https://docs.example.com/sitemap-en.xml",
    output_dir=Path("./offline_docs"),
    css_file=Path("./custom-styling.css"),  # Custom CSS for HTML output
    # File naming patterns
    all_docs_html_filename="complete_documentation.html",
    all_docs_md_filename="complete_documentation.md",
    # Processing options
    test=False,    # Process all content (not just a test subset)
    verbose=True,  # Detailed logging
)

offline_gen = D361Offline(config)
await offline_gen.all()  # Generate all formats
```
Generated Output Structure:
```
offline_docs/docs.example.com/
├── prep.json        # Sitemap discovery metadata
├── fetch.json       # Content extraction results
├── nav.json         # Navigation structure data
├── nav.html         # Standalone navigation menu
├── nav.md           # Markdown navigation index
├── all_docs.html    # Complete HTML with embedded navigation
├── all_docs.md      # Complete Markdown with TOC
├── html/            # Individual HTML pages
│   ├── getting-started.html
│   ├── api-reference.html
│   └── ...
└── md/              # Individual Markdown pages
    ├── getting-started.md
    ├── api-reference.md
    └── ...
```
Performance-Optimized Processing
Enterprise-grade performance with intelligent resource management:
```python
import time

from d361.offline.config import Config

# Performance-tuned configuration
config = Config(
    map_url="https://docs.example.com/sitemap-en.xml",
    max_concurrent=12,  # Concurrent page fetching
    timeout=60,         # Per-page timeout (seconds)
    retries=5,          # Retry attempts for failed pages
    pause=0,            # No artificial delays (maximum speed)
)

# Monitor performance during processing
offline_gen = D361Offline(config)
start_time = time.time()
result = await offline_gen.all()
processing_time = time.time() - start_time

print(f"Processed {len(result['content'])} pages in {processing_time:.2f}s")
print(f"Average: {processing_time / len(result['content']):.3f}s per page")
```
Performance Features:
- Concurrent Downloads - Configurable parallel processing (default: 5 concurrent)
- Exponential Backoff - Intelligent retry delays via the tenacity library
- Memory Efficient - Streaming content processing to minimize memory usage
- Progress Tracking - Real-time processing status and performance metrics
- Network Optimization - Connection pooling and keep-alive for HTTP efficiency
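The combination of bounded concurrency and exponential backoff described above is a standard asyncio pattern. A self-contained sketch (`fetch_with_retries` and `fetch_all` are illustrative helpers, not d361's actual functions):

```python
import asyncio

async def fetch_with_retries(url, fetch, retries=3, base_delay=0.01):
    """Retry a fetch with exponential backoff between attempts."""
    for attempt in range(retries):
        try:
            return await fetch(url)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2 ** attempt)  # 0.01s, 0.02s, ...

async def fetch_all(urls, fetch, max_concurrent=5):
    """Bound parallelism with a semaphore, mirroring max_concurrent."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        async with sem:
            return await fetch_with_retries(url, fetch)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(bounded(u) for u in urls))

# Demo with a fake fetcher
async def fake_fetch(url):
    return f"<html>{url}</html>"

pages = asyncio.run(fetch_all(["a", "b", "c"], fake_fetch, max_concurrent=2))
```

The semaphore caps how many fetches run at once without serializing the rest, which is why raising max_concurrent trades memory and politeness for throughput.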
Installation
D361 can be installed in multiple ways depending on your needs:
Quick Installation (Recommended)
```bash
# One-line installation script
curl -sSL https://raw.githubusercontent.com/twardoch/d361/main/scripts/install.sh | bash
```
Manual Installation
Via pip:

```bash
pip install d361
playwright install chromium
```

Via uv (faster):

```bash
uv pip install d361
playwright install chromium
```

Binary Download (no Python required):

```bash
# Linux
curl -L -o d361-offline https://github.com/twardoch/d361/releases/latest/download/d361-offline-ubuntu-latest
chmod +x d361-offline

# macOS
curl -L -o d361-offline https://github.com/twardoch/d361/releases/latest/download/d361-offline-macos-latest
chmod +x d361-offline

# Windows
curl -L -o d361-offline.exe https://github.com/twardoch/d361/releases/latest/download/d361-offline-windows-latest.exe
```
Installation Options
The installation script supports various options:
```bash
# Install a specific version
./scripts/install.sh --version 1.0.0

# Install via a specific method
./scripts/install.sh --method binary

# Install with Playwright browsers
./scripts/install.sh --install-browsers

# Install to a custom directory
./scripts/install.sh --install-dir ~/.local/bin

# See all options
./scripts/install.sh --help
```
Command Line Usage
The package provides a command-line interface d361-offline with several operations. The main commands are prep, fetch, build, and all.
1. all (Recommended for most users):
Runs the entire process: preparation, fetching, and building.
```bash
d361-offline all --map-url="https://docs.example.com/sitemap-en.xml" --output-dir="my_offline_docs"
```
- --map-url (required): The URL of your Document360 sitemap (usually ends with sitemap-en.xml or similar).
- --output-dir (optional): The directory where the offline documentation will be saved. Defaults to a folder named after the domain in the current directory (e.g., ./docs.example.com/).
- --style (optional): Path to a custom CSS file to style the HTML output.
- --nav-url (optional): URL of a specific page to extract navigation from. If not provided, the first URL from the sitemap is used.
2. Individual Steps (for advanced control):
- prep: Parses the sitemap and extracts the navigation structure.
  d361-offline prep --map-url="https://docs.example.com/sitemap-en.xml" --output-dir="my_docs"
  This creates a prep.json file in the output directory.
- fetch: Downloads the content for all URLs found in the prep phase.
  d361-offline fetch --prep-file="my_docs/prep.json" --output-dir="my_docs"
  This creates a fetch.json file and saves the individual HTML/Markdown pages.
- build: Generates the final combined documentation files from the fetched content.
  d361-offline build --fetch-file="my_docs/fetch.json" --output-dir="my_docs" --style="path/to/custom.css"
Getting Help:
For a full list of options for each command, use d361-offline <command> --help.
For example: d361-offline all --help.
Programmatic Usage
You can also use D361 from your Python scripts:
```python
import asyncio
from pathlib import Path

from d361.offline.config import Config
from d361.offline.d361_offline import D361Offline

async def generate_my_docs():
    # Configure the generator; map_url must be provided
    sitemap_url = "https://docs.example.com/sitemap-en.xml"  # Replace with your actual sitemap URL
    if not sitemap_url:
        raise ValueError("map_url must be set for Config")

    config = Config(
        map_url=sitemap_url,
        output_dir=Path("custom_offline_docs"),  # Output lands in ./custom_offline_docs/docs.example.com/
        css_file=Path("styles/my_custom_style.css") if Path("styles/my_custom_style.css").exists() else None,
        max_concurrent=5,  # Number of parallel downloads
        retries=3,         # Number of retries for failed requests
        timeout=60,        # Timeout for page loads in seconds
        verbose=False,     # Set to True for detailed logging
        test=False,        # Set to True to process only a few items for testing
    )

    # Create an instance of the offline generator
    offline_generator = D361Offline(config)
    try:
        print(f"Starting offline generation for {config.map_url}...")
        print(f"Output will be saved to: {config.output_dir.resolve()}")

        # Run the entire process: prep, fetch, and build
        await offline_generator.all()

        # Alternatively, run the individual phases:
        # prep_data = await offline_generator.prep()
        # fetch_data = await offline_generator.fetch(prep_file=config.prep_file)
        # await offline_generator.build(fetch_file=config.fetch_file)

        print("Offline documentation generated successfully!")
        print(f"Combined HTML: {config.output_dir / config.all_docs_html_filename}")
        print(f"Combined Markdown: {config.output_dir / config.all_docs_md_filename}")
    except Exception as e:
        print(f"An error occurred: {e}")
        import traceback
        traceback.print_exc()

if __name__ == "__main__":
    # Ensure Playwright browsers are installed:
    # run `playwright install` in your terminal if you haven't already.
    asyncio.run(generate_my_docs())
```
Configuration Options
The behavior of d361-offline is controlled by the Config model (see src/d361/offline/config.py). Key options include:
| Option (Config field) | CLI Argument | Description | Default (from Config) |
|---|---|---|---|
| map_url | --map-url | URL of the sitemap (e.g., sitemap.xml) | Required (None by default; must be set) |
| nav_url | --nav-url | URL of a page to extract navigation from (optional) | None |
| output_dir | --output-dir | Base directory for output; a subfolder named after the domain is created here | Current working directory |
| css_file | --style (build) | Path to a custom CSS file for styling HTML output | None |
| effort | --effort (prep) | Try harder to map all sitemap links in the navigation | False |
| max_concurrent | --parallel | Maximum number of concurrent download requests | 5 |
| retries | --retries | Number of retry attempts for failed requests | 3 |
| timeout | --timeout | Request timeout in seconds for page loads | 60 |
| verbose | --verbose | Enable verbose (DEBUG-level) logging | False |
| test | --test (prep, fetch) | Test mode: process only a few items (typically 5) | False |
| pause | --wait (prep) | Pause during navigation extraction (for browser debugging); the CLI takes a numeric value in seconds | False |
Note: Default output_dir behavior: If map_url is https://docs.example.com/... and output_dir is my_docs (or not set, defaulting to current dir), the actual output path will be my_docs/docs.example.com/.
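That domain-appending behavior can be expressed in a few lines (a hypothetical helper for illustration; the real logic lives in the Config model and may normalize paths differently):

```python
from pathlib import Path
from urllib.parse import urlparse

def resolve_output_dir(map_url: str, output_dir: str = ".") -> Path:
    """Append the sitemap's domain to the base output directory."""
    domain = urlparse(map_url).netloc
    return Path(output_dir) / domain

path = resolve_output_dir("https://docs.example.com/sitemap-en.xml", "my_docs")
print(path)  # e.g. my_docs/docs.example.com on POSIX systems
```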
Output Structure
The generated offline documentation will be organized as follows in your specified output directory (e.g., output_dir/your_domain_com/):
```
output_dir/your_domain_com/
├── prep.json        # Intermediate data from the preparation phase (URLs, navigation)
├── fetch.json       # Intermediate data from the fetch phase (content map)
├── nav.json         # Navigation structure in JSON format
├── nav.html         # Standalone HTML version of the navigation menu
├── nav.md           # Markdown version of the navigation menu
├── all_docs.html    # Combined HTML documentation with navigation and styling
├── all_docs.md      # Combined Markdown documentation with a table of contents
├── html/            # Individual HTML page files
│   ├── page-slug-1.html
│   ├── page-slug-2.html
│   └── ...
└── md/              # Individual Markdown page files
    ├── page-slug-1.md
    ├── page-slug-2.md
    └── ...
```
If a custom CSS file is used, it will be copied into html/assets/ and linked in all_docs.html.
Part 2: Technical Deep-Dive
This section describes how D361 works internally, its architecture, and guidelines for contributors.
How the Code Works
D361 operates in a three-phase workflow: Prep, Fetch, and Build. All operations are asynchronous using Python's asyncio library for efficient I/O and concurrency.
Core Workflow:
1. Prep Phase:
   - Parses the sitemap (map_url) to get a list of all unique page URLs.
   - Extracts the navigation structure (table of contents) from a specified page (nav_url, or the first sitemap URL).
   - Saves this information (urls, navigation, config) into prep.json.
   - Generates nav.json, nav.html, and nav.md.
2. Fetch Phase:
   - Reads prep.json.
   - For each URL, fetches the page content (title, HTML body, Markdown version).
   - Saves individual page content as html/<slug>.html and md/<slug>.md.
   - Saves all fetched content mapped by URL, along with the navigation structure and config, into fetch.json.
3. Build Phase:
   - Reads fetch.json.
   - Generates all_docs.html: a single HTML file containing all articles, prepended with the navigation menu and linked to the specified CSS.
   - Generates all_docs.md: a single Markdown file containing all articles, with a generated table of contents at the top.
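Based on the phase descriptions above, a prep.json file might look roughly like this (an illustrative sketch of the keys the document names — urls, navigation, config — not the exact schema):

```json
{
  "config": { "map_url": "https://docs.example.com/sitemap-en.xml" },
  "urls": [
    "https://docs.example.com/docs/getting-started",
    "https://docs.example.com/docs/api-reference"
  ],
  "navigation": {
    "items": [
      {
        "title": "Getting Started",
        "link": "https://docs.example.com/docs/getting-started",
        "children": []
      }
    ]
  }
}
```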
Key Components and Processes:
- Configuration (src/d361/offline/config.py):
  The Config class (a Pydantic model) manages all settings. It validates inputs, computes default values (such as output_dir based on the domain), and provides paths for the various output files.
- Main Orchestrator (src/d361/offline/d361_offline.py):
  The D361Offline class is the heart of the generator. It takes a Config object and has methods for prep(), fetch(), build(), and all(). It coordinates interactions between the other modules.
- Command Line Interface (src/d361/offline/__main__.py):
  Uses the fire library to expose D361Offline methods and configuration options on the command line as d361-offline prep, fetch, build, and all.
- Sitemap Parsing (src/d361/offline/parser.py):
  The parse_sitemap function fetches and extracts URLs from the sitemap.xml. It employs multiple strategies for robustness:
  - Direct Playwright navigation (_parse_with_playwright_direct).
  - Playwright with enhanced stealth techniques (_parse_with_playwright_stealth) to mimic human browsing.
  - Direct HTTP GET request using aiohttp (_parse_with_aiohttp_direct).
  - Checking robots.txt for sitemap directives and then parsing the discovered URLs (_parse_with_playwright_via_robots).
  - As a last resort, Google's web cache of the sitemap.
  It uses BeautifulSoup (with the lxml parser) to parse the XML content and extract <loc> tags.
- Navigation Extraction (src/d361/offline/navigation.py):
  The extract_navigation function uses Playwright to load the nav_url. This is one of the most complex interactions due to Document360's dynamic UI:
  - Cookie/Consent Handling: Attempts to detect and dismiss various cookie consent banners.
  - Tree Expansion: Locates the main navigation tree element (e.g., #left-panel ... d360-data-list-tree-view), then calls expand_navigation_tree, which uses browser.expand_all_items. expand_all_items (in browser.py) repeatedly scrolls the navigation pane (to load virtually rendered items via scroll_to_bottom) and clicks collapsed item indicators (e.g., triangle icons) until all navigation nodes are visible.
  - Structure Parsing: extract_tree_structure then iterates over the DOM elements of the expanded tree to rebuild the hierarchical navigation data (titles, links, children).
  Fallback mechanisms are included for when the standard Document360 selectors are not found.
- Content Fetching and Processing (src/d361/offline/content.py, D361Offline.process_url):
  For each URL, D361Offline.process_url launches a Playwright page and calls extract_page_content (in content.py), which:
  - Navigates to the URL.
  - Attempts to dismiss cookie banners.
  - Waits for network idle and for content to render.
  - Extracts the page title (trying common selectors such as h1.article-title).
  - Extracts the main article HTML content (trying selectors such as #articleContent and .article-content).
  - Converts the extracted HTML to Markdown using the markdownify library.
  The D361Offline class then saves this content to html/<slug>.html and md/<slug>.md. Slugs are generated from URLs.
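Slug generation from a URL can be sketched like this (a hypothetical helper for illustration; the actual implementation in D361Offline may differ):

```python
import re
from urllib.parse import urlparse

def slug_for(url: str) -> str:
    """Derive a filesystem-safe slug from the last segment of a page URL."""
    path = urlparse(url).path.strip("/")
    last = path.split("/")[-1] or "index"   # Fall back for the site root
    # Replace anything outside [a-zA-Z0-9_-] and normalize case
    return re.sub(r"[^a-zA-Z0-9_-]+", "-", last).lower()

print(slug_for("https://docs.example.com/docs/Getting-Started"))  # getting-started
```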
- Browser Automation (src/d361/offline/browser.py):
  - setup_browser: Configures and launches Playwright (Chromium by default) with specific arguments to appear more like a regular browser and to handle various environments.
  - scroll_to_bottom: Handles scrolling within elements that use virtual scrolling (common in Document360 navigation) to ensure all items are loaded into the DOM.
  - expand_all_items: A sophisticated function that recursively finds and clicks "expand" icons in a tree structure, dealing with items that may only appear after scrolling or previous expansions. It uses multiple selector strategies.
- Output Generation (D361Offline._generate_combined_files, src/d361/offline/generator.py):
  D361Offline._generate_combined_files creates all_docs.html and all_docs.md.
  - For all_docs.html, it includes a navigation section generated from nav.json, appends the HTML content of each article ordered by the navigation structure, and embeds the custom CSS (if provided) or a default style.
  - For all_docs.md, it generates a table of contents based on the navigation and article titles, then appends the Markdown content of each article.
  The generator.py module contains helper functions for creating directories; it was initially intended for more granular file generation, though much of that logic now lives in D361Offline.
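The combined-HTML assembly amounts to concatenating articles in navigation order under one shell, with a generated table of contents up front. A simplified sketch (the function name, data shapes, and HTML structure are hypothetical, not the real template):

```python
def build_all_docs_html(nav_order, articles, css=""):
    """Concatenate article bodies in navigation order into one HTML page."""
    # Table of contents linking to in-page anchors
    toc = "".join(
        f'<li><a href="#{slug}">{articles[slug]["title"]}</a></li>'
        for slug in nav_order if slug in articles
    )
    # One <section> per article, ordered by the navigation structure
    body = "".join(
        f'<section id="{slug}"><h1>{articles[slug]["title"]}</h1>'
        f'{articles[slug]["html"]}</section>'
        for slug in nav_order if slug in articles
    )
    return (f"<html><head><style>{css}</style></head>"
            f"<body><nav><ul>{toc}</ul></nav>{body}</body></html>")

doc = build_all_docs_html(
    ["intro", "api"],
    {"intro": {"title": "Intro", "html": "<p>Hello</p>"},
     "api": {"title": "API", "html": "<p>Ref</p>"}},
)
```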
- Error Handling and Retries:
  - The tenacity library is used in content.extract_page_content to automatically retry page content extraction on failure, with exponential backoff.
  - Individual URL-processing errors are logged but generally don't stop the entire batch, allowing the tool to fetch as much content as possible.
Development Environment
This project uses Hatch for managing dependencies, virtual environments, and running development tasks. Hatch leverages uv if available, which significantly speeds up environment setup and package installation.
Setup:
1. Install Hatch and uv. It's recommended to install uv first, then use it to install hatch:

   ```bash
   # Install uv (see the official uv documentation for your OS)
   # Example for Linux/macOS:
   curl -LsSf https://astral.sh/uv/install.sh | sh
   # Then install Hatch using uv
   uv pip install hatch
   ```

2. Create/activate the Hatch environment. Navigate to the project root directory and run:

   ```bash
   hatch shell
   ```

   This command:
   - Creates an isolated virtual environment (e.g., in .hatch/) if one doesn't exist.
   - Installs all project dependencies, including development tools (pytest, ruff, mypy, etc.), using uv if available.
   - Activates the environment in your current shell.

3. Install Playwright browsers. After activating the environment, install the browser binaries Playwright needs:

   ```bash
   playwright install
   ```

   This typically installs Chromium, Firefox, and WebKit; D361 primarily uses Chromium.
Running Tasks with Hatch:
Hatch scripts are defined in pyproject.toml under [tool.hatch.envs.*.scripts].
- Run tests: The project uses pytest.

  ```bash
  # Run tests with a coverage report
  hatch run test:test-cov
  # Run tests without coverage
  hatch run test:test
  ```

- Linting and formatting: The project uses Ruff for fast linting and formatting, and MyPy for static type checking.

  ```bash
  # Format code and fix lint issues (where possible)
  hatch run lint:fix      # alias: hatch run fix
  # Check for linting and formatting issues
  hatch run lint:style    # alias: hatch run lint
  # Run static type checking
  hatch run lint:typing   # alias: hatch run type-check
  # Run all checks (style, format, types)
  hatch run lint:all
  ```

- Pre-commit hooks: The project is configured with pre-commit hooks (see .pre-commit-config.yaml). Install them to run checks automatically before each commit:

  ```bash
  pre-commit install
  ```
Coding and Contribution Guidelines
Contributions are highly welcome! Please adhere to the following guidelines:
1. Branching Strategy:
   - Create new branches from main for features or bug fixes (e.g., feat/add-new-exporter, fix/navigation-parsing-bug).
2. Code Style & Quality:
   - Formatting: Code is formatted with Ruff. Run hatch run lint:fix before committing.
   - Linting: Code is linted with Ruff. Ensure hatch run lint:style passes.
   - Type Checking: All code should pass MyPy checks. Run hatch run lint:typing.
   - Pythonic Code: Write clear, readable, and idiomatic Python.
   - Docstrings and Comments: Add docstrings to all public modules, classes, and functions. Use comments for complex logic.
3. Commit Messages:
   - Follow the Conventional Commits specification. Examples:
     - feat: add support for Confluence sitemap parsing
     - fix: improve resilience of cookie banner dismissal
     - docs: update README with advanced usage examples
     - refactor: simplify content extraction logic
     - test: add unit tests for slug generation
4. Testing:
   - Write tests for all new features and bug fixes using pytest.
   - Place tests in the tests/ directory, mirroring the structure of src/d361/.
   - Aim for high test coverage. Check coverage with hatch run test:test-cov.
   - Ensure all tests pass locally before submitting a pull request.
5. Pull Requests (PRs):
   - Submit PRs against the main branch.
   - Provide a clear, descriptive title and summary; explain the "what" and "why" of your changes, and link to any relevant issues.
   - Ensure all GitHub Actions CI checks (tests, linting, type checking) pass on your PR.
   - Be responsive to feedback and code reviews.
6. Dependencies:
   - Minimize new dependencies. If adding one, justify its need.
   - Add new dependencies to pyproject.toml under [project.dependencies] or [project.optional-dependencies.dev].
Releases
D361 follows Semantic Versioning and provides multiple distribution formats:
- PyPI Package: Available on PyPI for pip and uv installation
- Binary Releases: Pre-built executables for Linux, macOS, and Windows
- Source Code: Available on GitHub
Each release includes:
- Source distribution (.tar.gz)
- Wheel distribution (.whl)
- Standalone binaries for all platforms
- Automated testing across Python 3.10-3.12 and multiple operating systems
Release Process
New releases are automatically created when version tags are pushed:
```bash
# Create and push a new release tag
git tag v1.0.0
git push origin v1.0.0
```
This triggers the CI/CD pipeline which:
- Runs comprehensive tests on all platforms
- Builds Python packages and binaries
- Publishes to PyPI
- Creates GitHub release with binary artifacts
For development and contribution guidelines, see DEVELOPMENT.md.
License
D361 is licensed under the MIT License. See the LICENSE file for details.