mrkdwn_analysis

mrkdwn_analysis is a Python library for analyzing Markdown files. It parses a document and extracts and categorizes its elements, including headers, sections, links, images, blockquotes, code blocks, lists, tables, tasks (todos), footnotes, and embedded HTML. This makes it useful for data analysis, content generation, or building other tools that work with Markdown.

Features

File Loading: Load any given Markdown file by providing its file path.
Header Detection: Identify all headers (ATX # to ######, and Setext === and ---) in the document, giving you a quick overview of its structure.
Section Identification (Setext): Recognize sections defined by a block of text followed by = or - lines, helping you understand the document’s conceptual divisions.
Paragraph Extraction: Distinguish regular text (paragraphs) from structured elements like headers, lists, or code blocks, making it easy to isolate the body content.
Blockquote Identification: Extract all blockquotes defined by lines starting with >.
Code Block Extraction: Detect fenced code blocks delimited by triple backticks (```), optionally retrieve their language, and separate programming code from regular text.
List Recognition: Identify both ordered and unordered lists, including task lists (- [ ], - [x]), and understand their structure and hierarchy.
Tables (GFM): Detect GitHub-Flavored Markdown tables, parse their headers and rows, and separate structured tabular data for further analysis.
Links and Images: Identify text links ([text](url)) and images (![alt](url)), as well as reference-style links. This is useful for link validation or content analysis.
Footnotes: Extract and handle Markdown footnotes ([^note1]), providing a way to process reference notes in the document.
HTML Blocks and Inline HTML: Handle HTML blocks (<div>...</div>) as a single element, and detect inline HTML elements (<span style="...">... </span>) as a unified component.
Front Matter: If present, extract YAML front matter at the start of the file.
Counting Elements: Count the occurrences of a given element type (e.g., headers or code blocks).
Textual Statistics: Count the number of words and characters (excluding whitespace). Get a global summary (analyse()) of the document’s composition.
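
To make the header detection feature concrete, here is a minimal sketch of how ATX and Setext headers could be recognized with regular expressions. It is not the library's implementation: find_headers, ATX_RE, and SETEXT_RE are hypothetical names, and the sketch assumes any YAML front matter has already been stripped (otherwise the closing --- line would be mistaken for a Setext underline).

```python
import re

FENCE = "`" * 3  # code fence marker, built this way so the sketch stays fence-safe
ATX_RE = re.compile(r"^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")
SETEXT_RE = re.compile(r"^(=+|-+)[ \t]*$")


def find_headers(text):
    """Return {"line", "level", "text"} dicts with 1-based line numbers (hypothetical helper)."""
    lines = text.splitlines()
    headers = []
    in_fence = False
    for index, line in enumerate(lines):
        # Lines inside fenced code blocks are never headers.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        atx = ATX_RE.match(line)
        if atx:
            headers.append({"line": index + 1, "level": len(atx.group(1)), "text": atx.group(2)})
            continue
        # Setext: an underline of = or - directly below a non-blank text line.
        previous = lines[index - 1] if index > 0 else ""
        if SETEXT_RE.match(line) and previous.strip() and not ATX_RE.match(previous):
            level = 1 if line[0] == "=" else 2
            headers.append({"line": index, "level": level, "text": previous.strip()})
    return headers


sample = "Python 3.11\n===========\n\nSome text.\n\n### Performance Details\n"
for header in find_headers(sample):
    print(header)
```

A real parser has to handle more edge cases (indented code, thematic breaks, list items before an underline), so treat this only as an illustration of the idea.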

Installation

Install mrkdwn_analysis from PyPI:

pip install markdown-analysis

Usage

Using mrkdwn_analysis is straightforward. Import MarkdownAnalyzer, create an instance with your Markdown file path, and then call the various methods to extract the elements you need.

from mrkdwn_analysis import MarkdownAnalyzer

analyzer = MarkdownAnalyzer("path/to/document.md")

headers = analyzer.identify_headers()
paragraphs = analyzer.identify_paragraphs()
links = analyzer.identify_links()
...

Example

Consider example.md:

````markdown
---
title: "Python 3.11 Report"
author: "John Doe"
date: "2024-01-15"
---

Python 3.11
===========

A major **Python** release with significant improvements...

### Performance Details

```python
import math
print(math.factorial(10))
```

> Quote: "Python 3.11 brings the speed we needed"

<div class="note">
  <p>HTML block example</p>
</div>

This paragraph contains inline HTML: <span style="color:red;">Red text</span>.

Unordered list:
- A basic point
- [ ] A task to do
- [x] A completed task

1. Ordered list item 1
2. Ordered list item 2
````

After analysis:

```python
from mrkdwn_analysis import MarkdownAnalyzer

analyzer = MarkdownAnalyzer("example.md")

# X, Y, Z, and W below stand for line numbers in example.md.
print(analyzer.identify_headers())
# {"Header": [{"line": X, "level": 1, "text": "Python 3.11"}, {"line": Y, "level": 3, "text": "Performance Details"}]}

print(analyzer.identify_paragraphs())
# {"Paragraph": ["A major **Python** release ...", "This paragraph contains inline HTML: ..."]}

print(analyzer.identify_html_blocks())
# [{"line": Z, "content": "<div class=\"note\">\n  <p>HTML block example</p>\n</div>"}]

print(analyzer.identify_html_inline())
# [{"line": W, "html": "<span style=\"color:red;\">Red text</span>"}]

print(analyzer.identify_lists())
# {
#   "Ordered list": [["Ordered list item 1", "Ordered list item 2"]],
#   "Unordered list": [["A basic point", "A task to do [Task]", "A completed task [Task done]"]]
# }

print(analyzer.identify_code_blocks())
# {"Code block": [{"start_line": X, "content": "import math\nprint(math.factorial(10))", "language": "python"}]}

print(analyzer.analyse())
# {
#   'headers': 2,
#   'paragraphs': 2,
#   'blockquotes': 1,
#   'code_blocks': 1,
#   'ordered_lists': 2,
#   'unordered_lists': 3,
#   'tables': 0,
#   'html_blocks': 1,
#   'html_inline_count': 1,
#   'words': 42,
#   'characters': 250
# }
```

Key Methods

__init__(self, input_file): Load the Markdown from path or file object.
identify_headers(): Returns all headers.
identify_sections(): Returns setext sections.
identify_paragraphs(): Returns paragraphs.
identify_blockquotes(): Returns blockquotes.
identify_code_blocks(): Returns code blocks with content and language.
identify_lists(): Returns both ordered and unordered lists (including tasks).
identify_tables(): Returns any GFM tables.
identify_links(): Returns text and image links.
identify_footnotes(): Returns footnotes used in the document.
identify_html_blocks(): Returns HTML blocks as single tokens.
identify_html_inline(): Returns inline HTML elements.
identify_todos(): Returns task items.
count_elements(element_type): Counts occurrences of a specific element type.
count_words(): Counts words in the entire document.
count_characters(): Counts non-whitespace characters.
analyse(): Provides a global summary (headers count, paragraphs count, etc.).
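
As a rough illustration of what count_words() and count_characters() measure, the sketch below counts whitespace-separated words and non-whitespace characters in a plain string. The function names mirror the methods listed above, but the bodies are assumptions: the library may tokenize differently, for example by removing Markdown syntax before counting.

```python
def count_words(text):
    """Count whitespace-separated tokens (assumed definition of a word)."""
    return len(text.split())


def count_characters(text):
    """Count every character that is not whitespace."""
    return sum(1 for ch in text if not ch.isspace())


sample = "A major **Python** release"
print(count_words(sample), count_characters(sample))
```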

Checking and Validating Links

check_links(): Validates text links to see if they are broken (e.g., non-200 status) and returns a list of broken links.

Global Analysis Example

analysis = analyzer.analyse()
print(analysis)
# {
#   'headers': X,
#   'paragraphs': Y,
#   'blockquotes': Z,
#   'code_blocks': A,
#   'ordered_lists': B,
#   'unordered_lists': C,
#   'tables': D,
#   'html_blocks': E,
#   'html_inline_count': F,
#   'words': G,
#   'characters': H
# }

New in Version 0.2.0 🚀

Version 0.2.0 adds the features described below and remains fully backward compatible.

Search and Filtering

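# Assumption: doc is a MarkdownAnalyzer instance; the original text does not show how it is created
doc = MarkdownAnalyzer("path/to/document.md")

# Search for content
results = doc.search("Python", case_sensitive=False)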

# Find headers by level
h2_headers = doc.find_headers_by_level(2)

# Generate table of contents
toc = doc.get_table_of_contents(max_level=3)
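
To show what filtering headers by level and building a table of contents might involve, here is a small sketch that works on header dictionaries shaped like the identify_headers() output shown earlier ("level" and "text" keys). The helper names headers_at_level and build_toc are hypothetical, and the indented-outline format is an assumption, not the format get_table_of_contents() actually returns.

```python
def headers_at_level(headers, level):
    """Keep only the headers with exactly the given level."""
    return [h for h in headers if h["level"] == level]


def build_toc(headers, max_level=3):
    """Render headers up to max_level as an indented outline string."""
    kept = [h for h in headers if h["level"] <= max_level]
    if not kept:
        return ""
    top = min(h["level"] for h in kept)  # the shallowest level gets no indent
    return "\n".join("  " * (h["level"] - top) + "- " + h["text"] for h in kept)


headers = [
    {"level": 1, "text": "Python 3.11"},
    {"level": 3, "text": "Performance Details"},
]
print(build_toc(headers, max_level=3))
```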

Export to Multiple Formats

# Export to JSON
json_output = doc.to_json(include_metadata=True)

# Export to HTML with styling
html_output = doc.to_html(include_style=True)

# Export to plain text
plain_text = doc.to_plain_text(strip_formatting=True)

Advanced Statistics

# Get reading time
reading_time = doc.get_reading_time()
print(reading_time['formatted'])  # "5 min read"

# Document complexity metrics
complexity = doc.get_complexity_metrics()
print(f"Complexity score: {complexity['complexity_score']}")

# Link statistics
link_stats = doc.get_link_statistics()
print(f"External links: {link_stats['external_links']}")

# Word frequency analysis
top_words = doc.get_word_frequency(top_n=20)
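
The reading time and word frequency statistics can be approximated in a few lines. The sketch below is not the library's code: it assumes a reading speed of 200 words per minute and a simple lowercase tokenizer, and only the 'formatted' key is taken from the example above. The other dictionary keys and both function names are illustrative.

```python
import math
import re
from collections import Counter


def reading_time(text, words_per_minute=200):
    """Estimate reading time; 200 wpm is an assumed default, not the library's."""
    words = len(text.split())
    minutes = max(1, math.ceil(words / words_per_minute))
    return {"words": words, "minutes": minutes, "formatted": f"{minutes} min read"}


def word_frequency(text, top_n=20):
    """Return the top_n most common lowercase words as (word, count) pairs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words).most_common(top_n)


print(reading_time("word " * 1000)["formatted"])
print(word_frequency("Python is fast. Python is fun. Python!", top_n=2))
```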

Improved Link Checking

# Parallel link checking (much faster!)
broken_links = doc.check_links(max_workers=10)
for link in broken_links:
    print(f"Broken: {link['url']} - {link.get('status_code', 'error')}")
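
For readers curious how parallel link checking with ThreadPoolExecutor can work, here is a minimal standard-library sketch. It does not reproduce the internals of check_links(): fetch_status and find_broken_links are hypothetical helpers, the use of HEAD requests is an assumption, and a failed request is reported here with a status_code of None.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, malformed URL, ...


def find_broken_links(urls, max_workers=10, fetch=fetch_status):
    """Check urls concurrently and return those whose status is not 200."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(fetch, urls))  # map preserves input order
    return [
        {"url": url, "status_code": status}
        for url, status in zip(urls, statuses)
        if status != 200
    ]
```

Passing the fetch function as a parameter keeps the sketch testable without network access. Because each request spends most of its time waiting on the network, threads give a speedup even under the GIL.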

Document Validation

# Validate document structure
validation = doc.validate_structure()
print(f"Valid: {validation['valid']}, Score: {validation['score']}/100")
for issue in validation['issues']:
    print(f"[{issue['type']}] {issue['message']}")
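
One structural check a validator could plausibly perform is detecting skipped heading levels, such as the jump from level 1 to level 3 in example.md. The sketch below illustrates that single rule. The function name, the issue wording, and the choice of rule are assumptions, since the text above does not say which checks validate_structure() runs. Only the 'type' and 'message' keys come from the example.

```python
def find_skipped_levels(levels):
    """Return an issue for each heading that is more than one level deeper than the previous one."""
    issues = []
    previous = 0
    for position, level in enumerate(levels, start=1):
        if previous and level > previous + 1:
            issues.append({
                "type": "warning",
                "message": f"heading {position} jumps from h{previous} to h{level}",
            })
        previous = level
    return issues


for issue in find_skipped_levels([1, 3]):
    print(f"[{issue['type']}] {issue['message']}")
```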

Code Extraction by Language

# Extract Python code blocks
python_code = doc.extract_code_by_language('python')
for block in python_code:
    print(block['content'])
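
Extracting code by language amounts to collecting fenced blocks and filtering on the fence's language tag. The following sketch shows one way to do that on raw text. It is an illustration rather than the library's implementation: extract_code_blocks is a hypothetical name, and the dictionary keys are borrowed from the identify_code_blocks() output shown earlier.

```python
FENCE = "`" * 3  # code fence marker, built this way so the sketch stays fence-safe


def extract_code_blocks(text, language=None):
    """Collect fenced code blocks, optionally keeping only one language."""
    blocks = []
    current = None  # the block being read, or None when outside a fence
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if current is None:
            if stripped.startswith(FENCE):
                tag = stripped[len(FENCE):].strip() or None
                current = {"start_line": number, "language": tag, "lines": []}
        elif stripped == FENCE:
            blocks.append({
                "start_line": current["start_line"],
                "language": current["language"],
                "content": "\n".join(current["lines"]),
            })
            current = None
        else:
            current["lines"].append(line)
    # A block left unclosed at the end of the text is dropped.
    if language is not None:
        blocks = [b for b in blocks if b["language"] == language]
    return blocks
```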

Performance Improvements

Caching: Results are cached for faster repeated access
Parallel Processing: Link checking uses ThreadPoolExecutor (up to 10x faster)
Memory Optimization: Better memory management for large documents
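
The caching point can be illustrated with functools.cached_property, which computes a value on first access and reuses it afterwards. This is only a sketch of the general technique: CachedDocument and its parse_count counter are hypothetical, and the library may cache its results in a different way.

```python
from functools import cached_property


class CachedDocument:
    """Toy document whose header list is parsed once and then reused."""

    def __init__(self, text):
        self.text = text
        self.parse_count = 0  # counts how often the parsing code actually runs

    @cached_property
    def headers(self):
        self.parse_count += 1
        return [
            line.lstrip("#").strip()
            for line in self.text.splitlines()
            if line.startswith("#")
        ]


doc = CachedDocument("# A\ntext\n## B")
doc.headers
doc.headers
print(doc.parse_count)  # the parse ran only once
```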

See CHANGELOG.md for complete details.

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request for bug reports, feature requests, or code improvements. Your input helps make mrkdwn_analysis more robust and versatile.

Release history

0.2.3 (this version): Nov 30, 2025
0.2.2: Nov 25, 2025
0.2.1: Oct 31, 2025
0.2.0: Oct 31, 2025
0.1.6: Apr 9, 2025

