A library for cleaning and stripping Markdown and converting it to plain text
mdclense
A lightweight, efficient Python library for converting Markdown to plain text.
mdclense strips away all Markdown formatting while preserving the original content, which makes it useful for text analysis, content extraction, and data processing pipelines.
Features
- Comprehensive Markdown support
- Preserves content while removing formatting
- Handles complex nested structures
- Clean whitespace management
- Support for JSON-like key/value input
- No dependencies beyond the Python standard library
Supported Markdown Elements
- Headers (ATX and Setext style)
- Emphasis (bold, italic, bold-italic)
- Links and images
- Lists (ordered, unordered, and task lists)
- Code blocks and inline code
- Blockquotes
- Tables
- Horizontal rules
- HTML tags
- Strikethrough
- Footnotes
- Escaped characters
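The list above says which elements mdclense handles, but the README does not say how it handles them. One common technique is a series of regular-expression substitutions applied in a fixed order. The sketch below shows that technique for a subset of the elements. It is not mdclense's implementation. The names `strip_basic_markdown` and `_RULES` are hypothetical, and the sketch ignores underscore emphasis, tables, HTML, footnotes, and escaped characters.

```python
import re

# Hypothetical sketch, not mdclense's implementation: ordered regex passes
# that strip a subset of Markdown syntax. Order matters: images must be
# handled before links, and list markers before single-asterisk emphasis.
_RULES = [
    # Thematic breaks (---, ***, ___) and Setext underlines (===).
    (re.compile(r"^ {0,3}([-*_=])(?:[ \t]*\1){2,}[ \t]*$", re.MULTILINE), ""),
    # ATX heading markers: "# Title" -> "Title".
    (re.compile(r"^ {0,3}#{1,6}[ \t]+", re.MULTILINE), ""),
    # Blockquote markers: "> text" -> "text".
    (re.compile(r"^ {0,3}>[ \t]?", re.MULTILINE), ""),
    # Unordered, ordered, and task-list markers: "- [x] item" -> "item".
    (re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+(?:\[[ xX]\][ \t]+)?", re.MULTILINE), ""),
    # Images keep their alt text; links keep their link text.
    (re.compile(r"!\[([^\]]*)\]\([^)]*\)"), r"\1"),
    (re.compile(r"\[([^\]]+)\]\([^)]*\)"), r"\1"),
    # Bold-italic, bold, italic (asterisks only), and strikethrough.
    (re.compile(r"\*\*\*(.+?)\*\*\*"), r"\1"),
    (re.compile(r"\*\*(.+?)\*\*"), r"\1"),
    (re.compile(r"\*(.+?)\*"), r"\1"),
    (re.compile(r"~~(.+?)~~"), r"\1"),
    # Inline code: keep the code text, drop the backticks.
    (re.compile(r"`([^`]+)`"), r"\1"),
]


def strip_basic_markdown(text: str) -> str:
    """Apply each rule in order, then tidy leftover blank lines."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    # Collapse runs of 3+ newlines left behind by removed elements.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Underscores are deliberately left alone so that identifiers like snake_case_names survive. A production converter needs more care, for example to keep emphasis markers inside inline code untouched.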
Installation
```bash
pip install mdclense
```
Quick Start
```python
from mdclense.parser import MarkdownParser

# Create a parser instance
parser = MarkdownParser()

# Convert markdown to plain text
markdown_text = """
# Hello World
This is a **bold** and *italic* text with a [link](http://example.com).
- List item 1
- List item 2
"""
plain_text = parser.parse(markdown_text)
print(plain_text)
```
Output:
```
Hello World
This is a bold and italic text with a link.
List item 1
List item 2
```
Advanced Usage
Handling JSON-like Structures
```python
from mdclense.parser import MarkdownParser

parser = MarkdownParser()

# Parse markdown content from a JSON-like structure
json_text = '"answer": "This is **bold** text with a [link](url)"'
plain_text = parser.parse(json_text)
print(plain_text)  # Output: This is bold text with a link
```
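The README does not say how `parse` recognizes a fragment like `json_text`. You may prefer to extract the value yourself before parsing, for example so that JSON escape sequences such as newlines are decoded first. The standard `json` module can do this once the fragment is wrapped in braces. This is a minimal sketch, and `extract_json_like_value` is a hypothetical helper, not part of mdclense.

```python
import json


def extract_json_like_value(fragment: str) -> str:
    """Return the string value of a single '"key": "value"' fragment.

    Hypothetical helper, not part of mdclense. Wrapping the fragment in
    braces turns it into a JSON object that json.loads can decode,
    including JSON escape sequences inside the value.
    """
    obj = json.loads("{" + fragment + "}")  # raises json.JSONDecodeError if malformed
    if len(obj) != 1:
        raise ValueError("expected exactly one key/value pair")
    (value,) = obj.values()
    if not isinstance(value, str):
        raise TypeError("expected a string value")
    return value
```

The result can then be handed to the parser, e.g. `parser.parse(extract_json_like_value(json_text))`.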
Working with Code Blocks
````python
markdown_text = '''
Here's some code:
```python
def hello():
    print("world")
```
'''
plain_text = parser.parse(markdown_text)
print(plain_text)  # Code blocks are replaced with a [CODE BLOCK] placeholder
````
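The README says that fenced code blocks become a `[CODE BLOCK]` placeholder but does not show how. A minimal regex sketch of that replacement follows. `replace_code_blocks` is a hypothetical name, not mdclense's code. Unlike a full CommonMark parser, this sketch ignores indented fences, indented code blocks, and unclosed fences.

```python
import re

# Hypothetical sketch, not mdclense's implementation.
# Matches an opening fence of 3+ backticks or tildes (plus an optional info
# string such as "python"), the block body, and a closing fence made of the
# same character, at least as long as the opening one, on its own line.
_FENCED_BLOCK = re.compile(
    r"^(?P<fence>(?P<ch>[`~])(?P=ch){2,})[^\n]*\n"  # opening fence line
    r".*?"                                          # body (non-greedy)
    r"^(?P=fence)(?P=ch)*[ \t]*$",                  # closing fence line
    re.MULTILINE | re.DOTALL,
)


def replace_code_blocks(text: str, placeholder: str = "[CODE BLOCK]") -> str:
    """Replace each closed fenced code block with `placeholder`."""
    return _FENCED_BLOCK.sub(placeholder, text)
```

The non-greedy body keeps two separate code blocks from being merged into one placeholder. An unclosed fence is left untouched rather than swallowing the rest of the document.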
API Reference
MarkdownParser Class
```python
class MarkdownParser:
    def parse(self, markdown_text: str) -> str:
        """
        Convert markdown text to plain text by removing all markdown formatting.

        Args:
            markdown_text (str): The markdown text to be converted

        Returns:
            str: Plain text without markdown formatting
        """
```
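Because `parse` is documented as taking a `str` and returning a `str`, the bound method `MarkdownParser().parse` can be passed wherever a `Callable[[str], str]` is expected. The sketch below uses that in a small record-cleaning step of the kind found in data pipelines. The helper `clean_records` and the default field name `"text"` are hypothetical.

```python
from typing import Any, Callable, Dict, List


def clean_records(
    records: List[Dict[str, Any]],
    parse: Callable[[str], str],
    field: str = "text",
) -> List[Dict[str, Any]]:
    """Convert `field` in each record with `parse` (hypothetical helper).

    Records that have `field` are copied with that field converted;
    records without it are passed through as-is. Input dicts are
    never mutated.
    """
    cleaned = []
    for record in records:
        if field in record:
            record = {**record, field: parse(record[field])}
        cleaned.append(record)
    return cleaned
```

With mdclense installed, pass the bound method directly, e.g. `clean_records(rows, MarkdownParser().parse)`. Accepting any callable also lets tests substitute a simple stub for the parser.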
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Testing
```bash
# Install development dependencies
pip install pytest

# Run tests
pytest tests/
```
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by the need for a simple, dependency-free Markdown to plain text converter
- Thanks to all contributors who have helped shape this project
Download files
Source Distribution
Built Distribution
File details
Details for the file mdclense-0.1.1.tar.gz.
File metadata
- Download URL: mdclense-0.1.1.tar.gz
- Upload date:
- Size: 7.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d47206a023aa017b5b0f4e8255a7c43c3efff04d7a479722aa8f25e2b22ea05 |
| MD5 | a0c59be524228ba79a7dce15eed368ac |
| BLAKE2b-256 | 406ebc7ed25f61ef20195c2610618c334851773f314df01c1929e92607a12196 |
File details
Details for the file mdclense-0.1.1-py3-none-any.whl.
File metadata
- Download URL: mdclense-0.1.1-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7496853a12298f71fb8d086e910fdd9c0841585df56f4142765da1366aef127a |
| MD5 | 9ec742b568dfa17eb2972021934e2e6c |
| BLAKE2b-256 | b9502820482587e07ce0370de05ebd7e591b03cade0cc4ca5a3d583321dfe611 |