A modern, high-performance Python library for turning text into clean, URL-safe slugs.
sluggi
sluggi: The modern, blazing-fast Python library and CLI for turning any text into clean, URL-safe slugs.
Inspired by slugify, reimagined for speed, Unicode, and robust parallel batch processing.
Table of Contents
- Features
- Installation
- Usage
- API Reference
- Advanced Usage & Performance Tips
- Command-Line Interface (CLI)
- Development & Contributing
- Performance & Benchmarks
- License
- See Also
Features
- Fast: Optimized for speed with minimal dependencies.
- Unicode & Emoji: Handles dozens of scripts, emoji, and edge cases out of the box.
- Customizable: Define your own character mappings and rules.
- Parallel Batch: True multi-core batch slugification (thread/process/serial modes).
- Async Support: Full asyncio-compatible API for modern Python apps.
- CLI Tool: Powerful, colorized CLI for quick slug generation and batch jobs.
- Safe Output: Always generates URL-safe, predictable slugs.
- Extensible API: Easy to use and extend.
- CI & Pre-commit: Linting, formatting, and tests run automatically.
Modular Slugification Pipeline
sluggi processes text through a modular pipeline of single-responsibility functions, making the codebase more readable, maintainable, and extensible. Each step in the pipeline performs a distinct transformation, allowing for easy customization and extension.
Pipeline Steps:
- normalize_unicode(text) Normalize Unicode characters to a canonical form (NFKC).
- decode_html_entities_and_refs(text) Decode HTML entities and character references to their Unicode equivalents.
- convert_emojis(text) Replace emojis with their textual representations.
- transliterate_text(text) Transliterate non-ASCII characters to ASCII (where possible).
- apply_custom_replacements(text, custom_map) Apply user-defined or staged character/string replacements.
- extract_words(text, word_regex) Extract words using a customizable regex pattern.
- filter_stopwords(words, stopwords) Remove unwanted words (e.g., stopwords) from the list.
- join_words(words, separator) Join words using the specified separator.
- to_lowercase(text, lowercase) Convert the result to lowercase if requested.
- strip_separators(text, separator) Remove leading/trailing separators.
- smart_truncate(text, max_length, separator) Optionally truncate the slug at a word boundary.
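The final step, truncating at a word boundary, can be illustrated with a small standalone sketch. The function name truncate_at_word_boundary is hypothetical and this is not sluggi's implementation; it only shows the idea of cutting a slug without splitting a word.

```python
def truncate_at_word_boundary(slug, max_length, separator="-"):
    """Cut `slug` to at most `max_length` characters without splitting a word."""
    if len(slug) <= max_length:
        return slug
    # If the cut lands exactly on a separator, the prefix already ends on a word.
    if slug[max_length:max_length + len(separator)] == separator:
        return slug[:max_length]
    head = slug[:max_length]
    cut = head.rfind(separator)
    # No separator in range: the first word alone is too long, so hard-cut it.
    return head[:cut] if cut > 0 else head


print(truncate_at_word_boundary("quick-brown-fox-jumps", 12))  # quick-brown
print(truncate_at_word_boundary("quick-brown-fox-jumps", 15))  # quick-brown-fox
print(truncate_at_word_boundary("quick-brown-fox-jumps", 3))   # qui
```

Falling back to a hard cut when the first word is longer than the limit is one possible design choice; returning an empty string would be another.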
Processing Flow:
Input Text
↓
normalize_unicode
↓
decode_html_entities_and_refs
↓
convert_emojis
↓
transliterate_text
↓
apply_custom_replacements
↓
extract_words
↓
filter_stopwords
↓
join_words
↓
to_lowercase
↓
strip_separators
↓
smart_truncate
↓
Final Slug
This modular approach makes it easy to add, remove, or modify steps in the pipeline. Each function is pure and well-documented. See the API docs and source for details on customizing or extending the pipeline.
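To make the flow concrete, here is a minimal standard-library sketch of a pipeline with the same shape. It is an approximation written for illustration, not sluggi's source: sketch_slugify is a hypothetical name, emoji handling, custom replacements, and truncation are omitted, and the transliteration step is a crude stand-in that only strips accents.

```python
import html
import re
import unicodedata


def sketch_slugify(text, separator="-", stopwords=None, lowercase=True, word_regex=r"\w+"):
    # Unicode normalization (NFKC), in the spirit of normalize_unicode.
    text = unicodedata.normalize("NFKC", text)
    # HTML entities to characters, in the spirit of decode_html_entities_and_refs.
    text = html.unescape(text)
    # Crude transliteration stand-in: decompose accents, then drop non-ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Word extraction with a configurable regex.
    words = re.findall(word_regex, text)
    # Stopword filtering (case-insensitive in this sketch).
    if stopwords:
        blocked = {w.lower() for w in stopwords}
        words = [w for w in words if w.lower() not in blocked]
    # Join, optionally lowercase, then strip stray separators.
    slug = separator.join(words)
    if lowercase:
        slug = slug.lower()
    return slug.strip(separator)


print(sketch_slugify("Hello, world!"))                                      # hello-world
print(sketch_slugify("Caf&eacute; &amp; Bar"))                              # cafe-bar
print(sketch_slugify("The quick brown fox jumps", stopwords=["the", "fox"]))  # quick-brown-jumps
```

Because each stage takes a value and returns a new one, a stage can be swapped or dropped without touching the others, which is the property the section describes.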
Installation
Install from PyPI:
pip install sluggi
For CLI and development:
pip install .[cli,dev]
Usage
from sluggi import slugify, batch_slugify
slug = slugify("Hello, world!")
print(slug) # hello-world
# Batch processing (pass parallel=True to run in parallel)
slugs = batch_slugify(["Hello, world!", "Привет мир"])
print(slugs) # ['hello-world', 'privet-mir']
# Advanced: Parallel processing
slugs = batch_slugify(["foo", "bar"], parallel=True, mode="process", workers=2)
# Stopwords (exclude common words from slugs)
slug = slugify("The quick brown fox jumps", stopwords=["the", "fox"])
print(slug) # quick-brown-jumps
slugs = batch_slugify([
    "The quick brown fox jumps",
    "Jump over the lazy dog"
], stopwords=["the", "over", "dog"])
print(slugs) # ['quick-brown-fox-jumps', 'jump-lazy']
# Custom regex pattern for word extraction (e.g., only extract capitalized words)
# lowercase=False keeps the original capitalization in the output
slug = slugify("The Quick Brown Fox", word_regex=r"[A-Z][a-z]+", lowercase=False)
print(slug) # The-Quick-Brown-Fox
# Use in batch_slugify
slugs = batch_slugify([
    "The Quick Brown Fox",
    "Jump Over The Lazy Dog"
], word_regex=r"[A-Z][a-z]+", lowercase=False)
print(slugs) # ['The-Quick-Brown-Fox', 'Jump-Over-The-Lazy-Dog']
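What a word_regex selects can be checked in isolation with Python's re module. The snippet below is a standalone sketch that assumes extraction behaves like re.findall; it does not call sluggi.

```python
import re

title = "The Quick Brown Fox jumps"

# Default-style pattern: every run of word characters counts as a word.
all_words = re.findall(r"\w+", title)
print(all_words)      # ['The', 'Quick', 'Brown', 'Fox', 'jumps']

# Capitalized-only pattern: "jumps" has no leading capital, so it is skipped.
capitalized = re.findall(r"[A-Z][a-z]+", title)
print(capitalized)    # ['The', 'Quick', 'Brown', 'Fox']

# Joining happens after extraction; lowercasing is a separate, later step.
print("-".join(capitalized))           # The-Quick-Brown-Fox
print("-".join(capitalized).lower())   # the-quick-brown-fox
```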
Async Usage
Requires Python 3.7+
import asyncio
from sluggi import async_slugify, async_batch_slugify
async def main():
    slug = await async_slugify("Hello, world!")
    slugs = await async_batch_slugify(["Hello, world!", "Привет мир"], parallel=True)
    print(slug) # hello-world
    print(slugs) # ['hello-world', 'privet-mir']

asyncio.run(main())
Custom Separator
slug = slugify("Hello, world!", separator="_")
print(slug) # hello_world
Stopwords
slug = slugify("The quick brown fox", stopwords=["the", "fox"])
print(slug) # quick-brown
Custom Mapping
slug = slugify("ä ö ü", custom_map={"ä": "ae", "ö": "oe", "ü": "ue"})
print(slug) # ae-oe-ue
API Reference
slugify
| Argument | Type | Default | Description |
|---|---|---|---|
| text | str | (required) | The input string to slugify. |
| separator | str | "-" | Word separator in the slug. |
| custom_map | dict | None | Custom character mappings. |
| stopwords | Iterable[str] | None | Words to exclude from the slug (case-insensitive if lowercase=True). |
| lowercase | bool | True | Convert result to lowercase. |
| word_regex | str | None | Custom regex pattern for word extraction (default: r'\w+'). |
| process_emoji | bool | True | If False, disables emoji-to-name conversion for max performance. |
Returns: str (slugified string)
Example:
slug = slugify("The quick brown fox", stopwords=["the", "fox"])
print(slug) # quick-brown
batch_slugify
| Argument | Type | Default | Description |
|---|---|---|---|
| texts | Iterable[str] | (required) | List of strings to slugify. |
| separator | str | "-" | Word separator in the slug. |
| custom_map | dict | None | Custom character mappings. |
| stopwords | Iterable[str] | None | Words to exclude from slugs. |
| lowercase | bool | True | Convert result to lowercase. |
| word_regex | str | None | Custom regex pattern for word extraction (default: r'\w+'). |
| parallel | bool | False | Enable parallel processing. |
| workers | int | None | Number of parallel workers. |
| mode | str | "thread" | "thread", "process", or "serial". |
| chunk_size | int | 1000 | Number of items per worker chunk. |
| cache_size | int | 2048 | Size of the internal cache. |
Returns: List[str] (list of slugified strings)
Example:
slugs = batch_slugify(["The quick brown fox", "Jumped over the lazy dog"])
print(slugs) # ['the-quick-brown-fox', 'jumped-over-the-lazy-dog']
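The cache_size argument in the table above bounds an internal result cache. How sluggi implements that cache is not shown in this document; as a rough illustration of what a size-bounded cache does for repeated inputs, here is a sketch using functools.lru_cache. The names make_cached and toy_slugify are hypothetical.

```python
from functools import lru_cache


def make_cached(func, cache_size=2048):
    """Wrap `func` in an LRU cache holding at most `cache_size` results."""
    return lru_cache(maxsize=cache_size)(func)


calls = []


def toy_slugify(text):
    calls.append(text)  # record real work, for illustration only
    return "-".join(text.lower().split())


cached = make_cached(toy_slugify, cache_size=2)
results = [cached(t) for t in ["Foo Bar", "Foo Bar", "Baz Qux", "Foo Bar"]]
print(results)     # ['foo-bar', 'foo-bar', 'baz-qux', 'foo-bar']
print(len(calls))  # 2, because repeated inputs were served from the cache
```

A cache like this lives in one process's memory, so threads can share it while separate worker processes each start with an empty copy.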
async_slugify(text, separator="-", custom_map=None)
- Same as slugify, but async.
async_batch_slugify(texts, ...)
- Same as batch_slugify, but async.
Advanced Usage & Performance Tips
Skipping Emoji Handling for Maximum Speed
- By default, sluggi converts emoji to their textual names (for example, a smiley emoji becomes smiley-face) for maximum compatibility and searchability.
- For maximum performance, you can disable emoji handling entirely if you do not need emoji-to-name conversion. This avoids all emoji detection and replacement logic, providing a measurable speedup for emoji-heavy or large datasets.
- To disable emoji handling:
- Python API: Pass process_emoji=False to slugify, batch_slugify, or any pipeline config.
- CLI: Add the --no-process-emoji flag to your command.
Example:
slug = slugify("emoji: 😀🤖🎉", process_emoji=False)
print(slug) # emoji
sluggi slug "emoji: 😀🤖🎉" --no-process-emoji
# Output: emoji
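The effect of the switch can be approximated with the standard library. This sketch assumes emoji can be detected through the Unicode category "So" and named with unicodedata.name; sluggi's actual detection and naming may differ, and convert_emoji_sketch is a hypothetical name.

```python
import re
import unicodedata


def convert_emoji_sketch(text, process_emoji=True):
    if not process_emoji:
        return text  # skip the per-character scan entirely
    pieces = []
    for ch in text:
        # Category "So" (Symbol, other) is a rough stand-in for "is an emoji".
        if unicodedata.category(ch) == "So":
            pieces.append(" " + unicodedata.name(ch, "").lower() + " ")
        else:
            pieces.append(ch)
    return "".join(pieces)


def words(text):
    # Emoji are not word characters, so unconverted emoji fall out here.
    return re.findall(r"\w+", text)


party = "party \U0001F389"  # U+1F389 PARTY POPPER
print(words(convert_emoji_sketch(party)))                       # ['party', 'party', 'popper']
print(words(convert_emoji_sketch(party, process_emoji=False)))  # ['party']
```

With conversion off, the emoji never becomes text, so the later word-extraction step simply drops it; that is consistent with the "emoji" output shown above.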
Batch and Async Performance
- Parallel Processing:
- For large batches, use parallel=True and tune workers and chunk_size.
- mode="process" enables true CPU parallelism for CPU-bound workloads.
- mode="thread" is best for I/O-bound or repeated/cached inputs.
- Caching:
- Threaded mode enables slugification result caching for repeated or overlapping inputs.
- Process mode disables cache (each process is isolated).
- Asyncio:
- Use async_batch_slugify for async web servers or event-driven apps.
- The parallel option with async batch uses a semaphore to limit concurrency, avoiding event loop starvation.
- For best throughput, set workers to your CPU count or the number of concurrent requests you expect.
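The semaphore idea mentioned in the Asyncio notes can be sketched with asyncio alone. This is an illustration of the general pattern, not sluggi's code: bounded_batch and toy_slugify are hypothetical names, and offloading to the default thread pool is an assumption made for the example.

```python
import asyncio


def toy_slugify(text):
    return "-".join(text.lower().split())


async def bounded_batch(texts, workers=4):
    semaphore = asyncio.Semaphore(workers)  # at most `workers` jobs in flight
    loop = asyncio.get_running_loop()

    async def one(text):
        async with semaphore:
            # Run the blocking call in the default thread pool so the
            # event loop stays free to serve other coroutines.
            return await loop.run_in_executor(None, toy_slugify, text)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(one(t) for t in texts))


print(asyncio.run(bounded_batch(["Hello World", "Foo Bar"], workers=2)))
# ['hello-world', 'foo-bar']
```

Capping in-flight work this way keeps a large batch from monopolizing the loop while other requests are waiting.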
Example: Tuning Batch Processing
# Large batch, CPU-bound: use process pool
slugs = batch_slugify(my_list, parallel=True, mode="process", workers=8, chunk_size=500)
# Async batch in a web API (FastAPI, Starlette, etc.)
from sluggi import async_batch_slugify
@app.post("/bulk-slugify")
async def bulk_slugify(payload: list[str]):
    return await async_batch_slugify(payload, parallel=True, workers=8)
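How mode, workers, and chunk_size might interact can be seen in a small sketch built on concurrent.futures. It is a hedged approximation rather than sluggi's implementation; batch_sketch, slugify_chunk, and toy_slugify are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def toy_slugify(text):
    return "-".join(text.lower().split())


def slugify_chunk(chunk):
    return [toy_slugify(t) for t in chunk]


def batch_sketch(texts, mode="serial", workers=None, chunk_size=1000):
    texts = list(texts)
    if mode == "serial":
        return slugify_chunk(texts)
    # Split the input so each task carries chunk_size items, amortizing
    # per-task scheduling (and, for processes, pickling) overhead.
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]
    pools = {"thread": ThreadPoolExecutor, "process": ProcessPoolExecutor}
    with pools[mode](max_workers=workers) as pool:
        # Executor.map yields results in submission order.
        return [slug for part in pool.map(slugify_chunk, chunks) for slug in part]


if __name__ == "__main__":  # guard needed for process pools on spawn platforms
    data = ["Hello World", "Foo Bar", "Baz Qux"]
    print(batch_sketch(data, mode="thread", workers=2, chunk_size=2))
    # ['hello-world', 'foo-bar', 'baz-qux']
```

Larger chunks mean fewer tasks and less overhead per item; smaller chunks spread work more evenly across workers.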
When to Use Serial vs Parallel vs Async
- Serial: Small batches, low latency, or single-threaded environments.
- Parallel (thread/process): Large batches, heavy CPU work, or when maximizing throughput is critical.
- Async: Integrate with modern async web frameworks, handle many concurrent requests, or avoid blocking the event loop.
See the docstrings and API reference for more details on each option.
Command-Line Interface (CLI)
Install CLI dependencies:
pip install .[cli]
Quick Start
sluggi slug "Γειά σου Κόσμε"
# Output: geia-sou-kosme
sluggi slug "The quick brown fox jumps" --stopwords "the,fox"
# Output: quick-brown-jumps
sluggi slug "The Quick Brown Fox" --word-regex "[A-Z][a-z]+" --no-lowercase
# Output: The-Quick-Brown-Fox
sluggi slug "The Quick Brown Fox" --no-lowercase
# Output: The-Quick-Brown-Fox
sluggi batch --input names.txt --output slugs.txt
sluggi batch --input names.txt --word-regex "[A-Z][a-z]+" --no-lowercase
# Custom output formatting in batch mode:
sluggi batch --input names.txt --output-format "{line_num}: {original} -> {slug}"
# Output example:
# 1: Foo Bar -> foo-bar
# 2: Baz Qux -> baz-qux
sluggi batch --input names.txt --output-format "{slug}"
# Output: just the slug, as before
# Display results as a rich table in the console:
sluggi batch --input names.txt --display-output
# Output example (with rich):
# ┏━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━┓
# ┃ row_number ┃ original ┃ slug    ┃
# ┡━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━┩
# │ 1          │ Foo Bar  │ foo-bar │
# │ 2          │ Baz Qux  │ baz-qux │
# └────────────┴──────────┴─────────┘
Supported placeholders for --output-format:
- {slug}: The generated slug
- {original}: The original input line
- {line_num}: The 1-based line number
Note: The --display-output table uses the rich Python library. If not installed, a plain text table will be shown instead.
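The placeholders map naturally onto Python's str.format. The sketch below assumes the template is expanded that way, which this document does not confirm; render_lines is a hypothetical helper.

```python
def render_lines(originals, slugs, output_format="{slug}"):
    # enumerate(..., start=1) supplies the 1-based line number.
    return [
        output_format.format(line_num=n, original=orig, slug=slug)
        for n, (orig, slug) in enumerate(zip(originals, slugs), start=1)
    ]


rows = render_lines(
    ["Foo Bar", "Baz Qux"],
    ["foo-bar", "baz-qux"],
    "{line_num}: {original} -> {slug}",
)
print("\n".join(rows))
# 1: Foo Bar -> foo-bar
# 2: Baz Qux -> baz-qux
```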
CLI Options
| Option | Description |
|---|---|
| --separator | Separator for words in the slug (default: -). |
| --stopwords | Comma-separated words to exclude from slug. |
| --custom-map | Custom mapping as JSON, e.g. '{"ä": "ae"}'. |
| --word-regex | Custom regex pattern for word extraction (default: \w+). |
| --no-lowercase | Preserve capitalization in the slug (default: False). |
| --output-format | Custom output format for batch mode. Supports {slug}, {original}, {line_num}. Default: just the slug. |
| --display-output | Display results as a rich table in the console after batch processing. |
CLI Help
sluggi --help
Error Handling Example
sluggi batch --input missing.txt
# Output (printed in bold red):
# Input file not found: missing.txt
Development & Contributing
- Clone the repo:
git clone https://github.com/blip-box/sluggi.git
cd sluggi
- Create a virtual environment and install dependencies using uv:
uv venv
uv pip install .[dev,cli]
- Run tests and lints:
pytest
ruff check src/sluggi tests
black --check src/sluggi tests
- Pre-commit hooks:
pre-commit install
pre-commit run --all-files
- PRs and issues welcome!
Encoding Notes
- Input and output files must be UTF-8 encoded.
- On Windows, use a UTF-8 capable terminal or set the environment variable PYTHONUTF8=1 if you encounter encoding issues.
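When preparing input files from Python, naming the encoding explicitly avoids depending on the platform default. A minimal sketch follows; the file name names.txt mirrors the CLI examples and the temporary directory is used only for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "names.txt"
    # Always name the encoding; the platform default may not be UTF-8.
    src.write_text("Привет мир\nΓειά σου Κόσμε\n", encoding="utf-8")
    lines = src.read_text(encoding="utf-8").splitlines()

print(lines)  # ['Привет мир', 'Γειά σου Κόσμε']
```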
Help and Examples
- Run sluggi --help or any subcommand with --help to see detailed usage and examples directly in your terminal.
Performance & Benchmarks
Batch slugification performance was measured using the included benchmark script:
python scripts/benchmark_batch.py
Results on 20,000 random strings:
| Mode | Time (s) | Avg ms/item |
|---|---|---|
| Serial | 0.74 | 0.037 |
| Thread | 0.62–0.72 | 0.031–0.036 |
| Process | 1.55–1.73 | 0.078–0.086 |
- Serial is fast and reliable for most workloads.
- Thread mode may be slightly faster for I/O-bound or lightweight CPU tasks (default for --parallel).
- Process mode (multiprocessing) enables true CPU parallelism, but has higher overhead and is best for very CPU-bound or expensive slugification tasks.
- Use --mode process for multiprocessing, --mode thread for threads, or --mode serial for no parallelism. Combine with --workers to tune performance.
Script location: scripts/benchmark_batch.py
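The "Avg ms/item" column is total time divided by item count. The sketch below shows one way such a measurement could be taken with time.perf_counter; it is not the contents of scripts/benchmark_batch.py, and time_batch and toy_batch are hypothetical stand-ins.

```python
import time


def time_batch(func, items):
    start = time.perf_counter()
    func(items)
    elapsed = time.perf_counter() - start
    return elapsed, elapsed * 1000 / len(items)  # seconds, ms per item


def toy_batch(items):
    return ["-".join(s.lower().split()) for s in items]


elapsed, per_item_ms = time_batch(toy_batch, ["Hello World"] * 20_000)
print(f"{elapsed:.3f} s total, {per_item_ms:.4f} ms/item")

# The table's serial row follows the same arithmetic: 0.74 s over 20,000 items.
print(round(0.74 * 1000 / 20_000, 3))  # 0.037
```

Timings from the toy function will differ from the table, since it does far less work than a full slugification pipeline.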
Shell Completion
Enable tab-completion for your shell (bash, zsh, fish):
sluggi completion bash # or zsh, fish
# Follow the printed instructions to enable completion in your shell
License
MIT
Changelog: see GitHub Releases.
Note: This project is a complete rewrite, inspired by existing slugify libraries, but aims to set a new standard for speed, correctness, and extensibility in Python.
See Also
This project was inspired by the Java library slugify by akullpp. If you need Java or Gradle support, see their documentation for advanced transliteration and custom replacements.
Example (Java):
final Slugify slg = Slugify.builder()
    .customReplacements(Map.of("Foo", "Hello", "bar", "world"))
    .customReplacement("Foo", "Hello")
    .customReplacement("bar", "world")
    .build();
final String result = slg.slugify("Foo, bar!");
// result: hello-world
For advanced transliteration in Java:
capabilities {
requireCapability('com.github.slugify:slugify-transliterator')
}
Or add the optional dependency com.ibm.icu:icu4j to your project.
✨ New Automation & Collaboration Features
- Adaptive triage workflows: Issues and PRs are now auto-labeled, parsed for agent/human status, and incomplete PRs are auto-closed for you, saving time for everyone.
- Agent-ready templates: All issue and PR templates are designed for both humans and autonomous agents, with structured metadata and feedback built in.
- Playground workflow: Safely experiment, test, or self-heal code with the new playground automation, perfect for bots and contributors alike.
See .github/workflows/README.md for more details on these next-generation automations!