dn

Markdown parsing and generation

To install: pip install dn

Optional Dependencies

This package supports converting various file formats to Markdown, with each format requiring specific dependencies:

Format      Required Package(s)
----------- -----------------
PDF         pypdf
Word        mammoth
Excel       pandas, openpyxl, tabulate
PowerPoint  python-pptx
HTML        html2text
Notebooks   nbconvert, nbformat
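
To see which of the packages in the table are already importable in the current environment, a small standard-library check can help. This is only a sketch and is not part of dn. The `FORMAT_MODULES` mapping and the `missing_modules` helper are illustrative names, and the mapping lists import names rather than distribution names (python-pptx is imported as `pptx`).

```python
from importlib.util import find_spec

# Import names for the packages in the table above (illustrative mapping).
# Note: the python-pptx distribution is imported as `pptx`.
FORMAT_MODULES = {
    "PDF": ["pypdf"],
    "Word": ["mammoth"],
    "Excel": ["pandas", "openpyxl", "tabulate"],
    "PowerPoint": ["pptx"],
    "HTML": ["html2text"],
    "Notebooks": ["nbconvert", "nbformat"],
}


def missing_modules(format_modules):
    """Return {format: [modules that cannot be found]} for incomplete formats."""
    missing = {}
    for fmt, modules in format_modules.items():
        absent = [name for name in modules if find_spec(name) is None]
        if absent:
            missing[fmt] = absent
    return missing


# Report only the formats that still need something installed.
for fmt, absent in missing_modules(FORMAT_MODULES).items():
    print(f"{fmt}: missing {', '.join(absent)}")
```

Formats that do not appear in the printed report have all of their modules available.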

Installation Options

You can install these dependencies later, if and when the package complains that it needs a specific one.

You can also install these when installing dn, like so:

    # Install with minimal dependencies
    pip install dn

    # Install with support for specific formats
    # (the quotes keep shells such as zsh from interpreting the brackets)
    pip install "dn[pdf]"               # PDF conversion support
    pip install "dn[word]"              # Word document support
    pip install "dn[excel]"             # Excel support
    pip install "dn[powerpoint]"        # PowerPoint support
    pip install "dn[html]"              # HTML conversion
    pip install "dn[notebook]"          # Jupyter Notebook support

    # Install multiple format support
    pip install "dn[pdf,word,excel]"    # Multiple formats

    # Install all optional dependencies
    pip install "dn[all]"

Examples

To and from Jupyter notebooks

from dn import markdown_to_notebook

sample_markdown = """# Sample Notebook

This is a markdown cell with some explanation.

```python
# This is a code cell
print("Hello, World!")
x = 42
print(f"The answer is {x}")
```

## Another Section

More markdown content here.

```python
# Another code cell
def greet(name):
    return f"Hello, {name}!"

print(greet("Jupyter"))
```

Final markdown cell."""

# Test basic functionality

notebook = markdown_to_notebook(sample_markdown)
print(f"Created notebook with {len(notebook['cells'])} cells")

# Test with file output

output_path = markdown_to_notebook(
    sample_markdown,
    egress="./sample_notebook.ipynb"
)
print(f"Saved notebook to: {output_path}")

Created notebook with 5 cells
Saved notebook to: /Users/thorwhalen/Dropbox/py/proj/t/dn/misc/sample_notebook.ipynb

from dn import notebook_to_markdown

md_string = notebook_to_markdown(notebook)
print(md_string)

# Sample Notebook

This is a markdown cell with some explanation.



```python
# This is a code cell
print("Hello, World!")
x = 42
print(f"The answer 
...
nt(greet("Jupyter"))

```

Final markdown cell.
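
To make the five-cell result above concrete, here is a simplified sketch of how fenced Python blocks can divide Markdown into alternating markdown and code cells. It is not dn's implementation: `split_markdown_into_cells` is a hypothetical helper, it only recognizes fences tagged `python`, and it returns plain dicts rather than real notebook objects.

```python
import re

# Built programmatically so this example nests cleanly inside Markdown.
FENCE = "`" * 3


def split_markdown_into_cells(md: str) -> list[dict]:
    """Split Markdown into alternating markdown/code cells (simplified)."""
    pattern = re.compile(
        rf"^{FENCE}python[ \t]*\n(.*?)^{FENCE}[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    cells = []
    pos = 0
    for match in pattern.finditer(md):
        # Text between the previous fence and this one becomes a markdown cell.
        text = md[pos:match.start()].strip()
        if text:
            cells.append({"cell_type": "markdown", "source": text})
        cells.append({"cell_type": "code", "source": match.group(1).rstrip("\n")})
        pos = match.end()
    # Whatever follows the last fence is a final markdown cell.
    tail = md[pos:].strip()
    if tail:
        cells.append({"cell_type": "markdown", "source": tail})
    return cells


sample = "\n".join([
    "# Sample Notebook",
    "",
    "Some explanation.",
    "",
    FENCE + "python",
    'print("Hello, World!")',
    FENCE,
    "",
    "## Another Section",
    "",
    FENCE + "python",
    "x = 42",
    FENCE,
    "",
    "Final markdown cell.",
])

cells = split_markdown_into_cells(sample)
print([cell["cell_type"] for cell in cells])
# ['markdown', 'code', 'markdown', 'code', 'markdown']
```

Three markdown runs separated by two code fences give five cells, matching the count reported for the sample above.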

... and other formats

from dn import pdf_to_markdown  # requires pypdf
from dn import docx_to_markdown  # requires mammoth
from dn import excel_to_markdown  # requires pandas, openpyxl, tabulate
from dn import pptx_to_markdown  # requires python-pptx
from dn import html_to_markdown  # requires html2text
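
When the format is only known at run time, one option is to pick the converter from the file extension. The sketch below is an assumption-laden illustration: the function names come from the imports above, but the extension mapping and the helpers `converter_name_for` and `load_converter` are hypothetical, and how each converter should be called is not shown here.

```python
from pathlib import Path

# Function names are taken from the imports above; the extension mapping is
# an assumption made for this sketch.
CONVERTER_NAMES = {
    ".pdf": "pdf_to_markdown",
    ".docx": "docx_to_markdown",
    ".xlsx": "excel_to_markdown",
    ".pptx": "pptx_to_markdown",
    ".html": "html_to_markdown",
}


def converter_name_for(path):
    """Return the name of the dn converter for a path, or None if unsupported."""
    return CONVERTER_NAMES.get(Path(path).suffix.lower())


def load_converter(path):
    """Import dn lazily and return the converter function for this path."""
    name = converter_name_for(path)
    if name is None:
        raise ValueError(f"No converter known for {path!r}")
    import dn  # imported here so the lookup above works without dn installed

    return getattr(dn, name)


print(converter_name_for("report.PDF"))  # pdf_to_markdown
```

Importing dn only inside `load_converter` keeps the name lookup usable even where the optional dependencies are not installed.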

Markdown stores

User story: I have a directory with multiple files in different formats.

I want to batch convert all supported files to markdown and store them in memory.

from dn import Files, bytes_store_to_markdown_store

from dn.tests.utils_for_testing_dn import test_data_dir

# Setup source files from test directory
src_files = Files(test_data_dir)

# Setup target store as an in-memory dictionary
target_store = {}

# Convert all files in directory to markdown
result = bytes_store_to_markdown_store(src_files, target_store, verbose=False)

# Check that the result is the target_store
assert result is target_store

# Verify that the supported file types were converted correctly
supported_files = [
    "test.docx",
    "test.pptx",
    "test.pdf",
    "test.html",
    "test.xlsx",
    "test.txt",
    "test.md",
    "test.ipynb",
]

print(f"\nSupported files (given what packages are installed here): {supported_files}\n")

for filename in supported_files:
    assert f"{filename}.md" in target_store, f"{filename} not found in target_store"
    assert len(target_store[f"{filename}.md"]) > 0, f"{filename} conversion failed"

invalid pdf header: b'PK\x03\x04\n'
EOF marker not found
EOF marker not found
invalid pdf header: b'PK\x03\x04\x14'
EOF marker not found
invalid pdf h
...
df header: b'PK\x03\x04\x14'
EOF marker not found



Supported files (given what packages are installed here): ['test.docx', 'test.pptx', 'test.pdf', 'test.html', 'test.xlsx', 'test.txt', 'test.md', 'test.ipynb']
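
The store-to-store pattern used above (a source mapping of file names to bytes, a target mapping of Markdown strings, keys suffixed with `.md`) can be sketched with plain dictionaries. This is a minimal illustration rather than dn's code: `convert_store` and the toy converters are hypothetical, and skipping unsupported files is an assumed behavior.

```python
import os


def convert_store(src, target, converters):
    """Convert every supported item of src to Markdown text stored in target.

    src maps file names to bytes; converters maps a lowercase extension to a
    function that takes bytes and returns a Markdown string.
    """
    for name, data in src.items():
        extension = os.path.splitext(name)[1].lower()
        convert = converters.get(extension)
        if convert is None:
            continue  # unsupported format: skip it rather than fail the batch
        target[f"{name}.md"] = convert(data)
    return target


# Toy converters standing in for real format handlers.
converters = {
    ".txt": lambda data: data.decode("utf-8"),
    ".md": lambda data: data.decode("utf-8"),
}
src = {"test.txt": b"plain text", "test.md": b"# Title", "test.bin": b"\x00\x01"}
target = {}
result = convert_store(src, target, converters)

print(sorted(result))  # ['test.md.md', 'test.txt.md']
```

Because both sides are ordinary mappings, the same loop works whether the target is a dict in memory or any other object with item assignment.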

Convert this notebook into Markdown for README.md

from dn import notebook_to_markdown

notebook_to_markdown('~/Dropbox/py/proj/t/dn/misc/dn_readme.ipynb', target_file='../README.md')
