Tools for markdown parsing and generation

Project description

dn

Markdown parsing and generation

To install: pip install dn

Optional Dependencies

This package supports converting various file formats to Markdown, with each format requiring specific dependencies:

Format      Required Package(s)
----------- -----------------
PDF         pypdf
Word        mammoth
Excel       pandas, openpyxl, tabulate
PowerPoint  python-pptx
HTML        html2text
Notebooks   nbconvert, nbformat

Installation Options

You can install these dependencies later, when the package reports that a specific one is missing.
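To see ahead of time which formats are usable in the current environment, a small standard-library check can help. This is only a sketch: the `REQUIRED_MODULES` mapping and the `missing_modules` helper are hypothetical names, not part of dn. The mapping mirrors the table above using import names (the python-pptx distribution is imported as `pptx`).

```python
from importlib.util import find_spec

# Import names for the packages listed in the table above.
# Assumption for this sketch: keys follow the extras names used by dn.
# Note that the python-pptx distribution is imported as "pptx".
REQUIRED_MODULES = {
    "pdf": ["pypdf"],
    "word": ["mammoth"],
    "excel": ["pandas", "openpyxl", "tabulate"],
    "powerpoint": ["pptx"],
    "html": ["html2text"],
    "notebook": ["nbconvert", "nbformat"],
}


def missing_modules(modules):
    """Return the module names from `modules` that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


# Report, per format, whether its dependencies are importable here.
for fmt, modules in REQUIRED_MODULES.items():
    missing = missing_modules(modules)
    status = "ready" if not missing else f"missing: {', '.join(missing)}"
    print(f"{fmt:<11} {status}")
```

The printed result depends on what is installed locally, so no expected output is shown.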

You can also install these when installing dn, like so:

    # Install with minimal dependencies
    pip install dn

    # Install with support for specific formats
    pip install dn[pdf]               # PDF conversion support
    pip install dn[word]              # Word document support
    pip install dn[excel]             # Excel support
    pip install dn[powerpoint]        # PowerPoint support
    pip install dn[html]              # HTML conversion
    pip install dn[notebook]          # Jupyter Notebook support

    # Install multiple format support
    pip install dn[pdf,word,excel]    # Multiple formats

    # Install all optional dependencies
    pip install dn[all]

Examples

To and from Jupyter notebooks

from dn import markdown_to_notebook

sample_markdown = """# Sample Notebook

This is a markdown cell with some explanation.

```python
# This is a code cell
print("Hello, World!")
x = 42
print(f"The answer is {x}")
```

## Another Section

More markdown content here.

```python
# Another code cell
def greet(name):
    return f"Hello, {name}!"

print(greet("Jupyter"))
```

Final markdown cell."""

# Test basic functionality
notebook = markdown_to_notebook(sample_markdown)
print(f"Created notebook with {len(notebook['cells'])} cells")

# Test with file output
output_path = markdown_to_notebook(
    sample_markdown,
    egress="./sample_notebook.ipynb"
)
print(f"Saved notebook to: {output_path}")

Created notebook with 5 cells
Saved notebook to: /Users/thorwhalen/Dropbox/py/proj/t/dn/misc/sample_notebook.ipynb

from dn import notebook_to_markdown

md_string = notebook_to_markdown(notebook)
print(md_string)

# Sample Notebook

This is a markdown cell with some explanation.



```python
# This is a code cell
print("Hello, World!")
x = 42
print(f"The answer 
...
nt(greet("Jupyter"))

```

Final markdown cell.
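
The five cells reported earlier follow from how the sample alternates prose and fenced Python blocks: three markdown cells and two code cells. The sketch below illustrates that splitting idea with the standard library only. It is not dn's implementation; `CODE_BLOCK` and `split_markdown_into_cells` are hypothetical names, and the cell dicts only loosely resemble the notebook format.

```python
import re

# The fence is built programmatically so this example nests cleanly in Markdown.
FENCE = "`" * 3

# Matches a fenced python block and captures its body.
# Hypothetical pattern for illustration, not taken from dn's source.
CODE_BLOCK = re.compile(
    rf"^{FENCE}python[ \t]*\n(.*?)\n{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def split_markdown_into_cells(markdown):
    """Split markdown text into alternating markdown/code cell dicts."""
    cells = []
    position = 0
    for match in CODE_BLOCK.finditer(markdown):
        # Prose between the previous code block and this one.
        text = markdown[position:match.start()].strip()
        if text:
            cells.append({"cell_type": "markdown", "source": text})
        cells.append({"cell_type": "code", "source": match.group(1)})
        position = match.end()
    # Any prose left after the last code block.
    tail = markdown[position:].strip()
    if tail:
        cells.append({"cell_type": "markdown", "source": tail})
    return cells


sample = "\n".join([
    "# Title",
    "",
    "Intro text.",
    "",
    FENCE + "python",
    "x = 42",
    FENCE,
    "",
    "## Section",
    "",
    FENCE + "python",
    "print(x)",
    FENCE,
    "",
    "Closing text.",
])

cells = split_markdown_into_cells(sample)
print([cell["cell_type"] for cell in cells])
# → ['markdown', 'code', 'markdown', 'code', 'markdown']
```

A real converter would also need to handle other fence languages, empty code blocks, and cell metadata, which this sketch leaves out.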

... and other formats

from dn import pdf_to_markdown  # requires pypdf
from dn import docx_to_markdown  # requires mammoth
from dn import excel_to_markdown  # requires pandas, openpyxl, tabulate
from dn import pptx_to_markdown  # requires python-pptx
from dn import html_to_markdown  # requires html2text
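
When handling mixed inputs, one option is to pick the converter from the file extension. The sketch below assumes an extension-to-converter mapping that is not stated in the source; only the converter names come from the imports above. `CONVERTER_BY_EXTENSION`, `converter_name_for`, and `load_converter` are hypothetical helpers, and since the converters' call signatures are not shown here, the sketch stops at looking the function up.

```python
import importlib
from pathlib import Path

# Converter names are the ones imported above; the extension mapping
# itself is an assumption made for this illustration.
CONVERTER_BY_EXTENSION = {
    ".pdf": "pdf_to_markdown",
    ".docx": "docx_to_markdown",
    ".xlsx": "excel_to_markdown",
    ".pptx": "pptx_to_markdown",
    ".html": "html_to_markdown",
    ".ipynb": "notebook_to_markdown",
}


def converter_name_for(path):
    """Return the dn converter name for a file path, or None if unmapped."""
    return CONVERTER_BY_EXTENSION.get(Path(path).suffix.lower())


def load_converter(path):
    """Import the matching converter from dn lazily (requires dn installed)."""
    name = converter_name_for(path)
    if name is None:
        raise ValueError(f"No converter mapped for {path!r}")
    return getattr(importlib.import_module("dn"), name)


print(converter_name_for("report.PDF"))  # → pdf_to_markdown
print(converter_name_for("notes.txt"))   # → None
```

Importing lazily inside `load_converter` keeps the lookup usable even when dn or one of its optional dependencies is not installed.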

Markdown stores

User story: I have a directory with multiple files in different formats.

I want to batch convert all supported files to markdown and store them in memory.

from dn import Files, bytes_store_to_markdown_store

from dn.tests.utils_for_testing_dn import test_data_dir

# Setup source files from test directory
src_files = Files(test_data_dir)

# Setup target store as an in-memory dictionary
target_store = {}

# Convert all files in directory to markdown
result = bytes_store_to_markdown_store(src_files, target_store, verbose=False)

# Check that the result is the target_store
assert result is target_store

# Verify that the supported file types were converted correctly
supported_files = [
    "test.docx",
    "test.pptx",
    "test.pdf",
    "test.html",
    "test.xlsx",
    "test.txt",
    "test.md",
    "test.ipynb",
]

print(f"\nSupported files (given what packages are installed here): {supported_files}\n")

for filename in supported_files:
    assert f"{filename}.md" in target_store, f"{filename} not found in target_store"
    assert len(target_store[f"{filename}.md"]) > 0, f"{filename} conversion failed"

invalid pdf header: b'PK\x03\x04\n'
EOF marker not found
EOF marker not found
invalid pdf header: b'PK\x03\x04\x14'
EOF marker not found
invalid pdf h
...
df header: b'PK\x03\x04\x14'
EOF marker not found



Supported files (given what packages are installed here): ['test.docx', 'test.pptx', 'test.pdf', 'test.html', 'test.xlsx', 'test.txt', 'test.md', 'test.ipynb']
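
The pattern in this example is a mapping of file names to bytes going in and a mapping of `<name>.md` keys to Markdown text coming out. A minimal sketch of that pattern with plain dictionaries follows. It is not dn's implementation: `convert_store`, `decode_text`, and the `converters` argument are hypothetical, and skipping unsupported keys is an assumption.

```python
import os


def decode_text(data):
    """Treat the bytes as UTF-8 text that is already valid Markdown."""
    return data.decode("utf-8")


def convert_store(src_store, target_store, converters):
    """Convert each supported item of src_store and write it to target_store.

    `converters` maps a lowercase file extension to a bytes -> str function.
    Keys with no registered converter are skipped (an assumption of this sketch).
    """
    for key, data in src_store.items():
        extension = os.path.splitext(key)[1].lower()
        convert = converters.get(extension)
        if convert is None:
            continue
        # Mirror the key convention used above: "<source name>.md".
        target_store[f"{key}.md"] = convert(data)
    return target_store


src = {"test.txt": b"plain text", "test.md": b"# Title", "image.png": b"\x89PNG"}
target = {}
result = convert_store(src, target, {".txt": decode_text, ".md": decode_text})

print(result is target)  # → True
print(sorted(target))    # → ['test.md.md', 'test.txt.md']
```

Because both sides are plain mappings, the same function would work with any dict-like store, such as a directory-backed one.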

Convert this notebook to Markdown for the README.md

from dn import notebook_to_markdown

notebook_to_markdown('~/Dropbox/py/proj/t/dn/misc/dn_readme.ipynb', target_file='../README.md')

Download files

Source distribution: dn-0.0.11.tar.gz (153.9 kB)

Built distribution: dn-0.0.11-py3-none-any.whl (151.7 kB, Python 3)
