Tools for markdown parsing and generation

dn

Markdown parsing and generation

To install: pip install dn

Optional Dependencies

This package supports converting various file formats to Markdown, with each format requiring specific dependencies:

Format      Required Package(s)
----------- -----------------
PDF         pypdf
Word        mammoth
Excel       pandas, openpyxl, tabulate
PowerPoint  python-pptx
HTML        html2text
Notebooks   nbconvert, nbformat

Installation Options

You can install these dependencies after the fact, if and when the package complains that it needs a specific one.

You can also install these when installing dn, like so:

    # Install with minimal dependencies
    pip install dn

    # Install with support for specific formats
    pip install dn[pdf]               # PDF conversion support
    pip install dn[word]              # Word document support
    pip install dn[excel]             # Excel support
    pip install dn[powerpoint]        # PowerPoint support
    pip install dn[html]              # HTML conversion
    pip install dn[notebook]          # Jupyter Notebook support

    # Install multiple format support
    pip install dn[pdf,word,excel]    # Multiple formats

    # Install all optional dependencies
    pip install dn[all]
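
To see up front which formats are ready to use, rather than waiting for a complaint at conversion time, a small standard-library check along these lines can help. This is a sketch, not part of dn: the `FORMAT_MODULES` mapping and the `missing_modules` helper are hypothetical names introduced here, and the module names are the import names of the packages in the table above (note that python-pptx is imported as `pptx`).

```python
from importlib.util import find_spec

# Hypothetical mapping: dn extra name -> import names of the packages it needs.
FORMAT_MODULES = {
    "pdf": ["pypdf"],
    "word": ["mammoth"],
    "excel": ["pandas", "openpyxl", "tabulate"],
    "powerpoint": ["pptx"],  # installed as python-pptx
    "html": ["html2text"],
    "notebook": ["nbconvert", "nbformat"],
}


def missing_modules(fmt, is_available=lambda name: find_spec(name) is not None):
    """Return the modules required for `fmt` that cannot be imported."""
    return [name for name in FORMAT_MODULES[fmt] if not is_available(name)]


for fmt in FORMAT_MODULES:
    missing = missing_modules(fmt)
    status = "ready" if not missing else f"missing: {', '.join(missing)}"
    print(f"{fmt:<11} {status}")
```

Any format reported as missing can then be installed with the matching extra, for example `pip install dn[excel]`.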

Examples

To and from Jupyter notebooks

from dn import markdown_to_notebook

sample_markdown = """# Sample Notebook

This is a markdown cell with some explanation.

```python
# This is a code cell
print("Hello, World!")
x = 42
print(f"The answer is {x}")
```

## Another Section

More markdown content here.

```python
# Another code cell
def greet(name):
    return f"Hello, {name}!"

print(greet("Jupyter"))
```

Final markdown cell."""

# Test basic functionality

notebook = markdown_to_notebook(sample_markdown)
print(f"Created notebook with {len(notebook['cells'])} cells")

# Test with file output

output_path = markdown_to_notebook(
    sample_markdown,
    egress="./sample_notebook.ipynb"
)
print(f"Saved notebook to: {output_path}")
Created notebook with 5 cells
Saved notebook to: /Users/thorwhalen/Dropbox/py/proj/t/dn/misc/sample_notebook.ipynb
from dn import notebook_to_markdown

md_string = notebook_to_markdown(notebook)
print(md_string)
# Sample Notebook

This is a markdown cell with some explanation.



```python
# This is a code cell
print("Hello, World!")
x = 42
print(f"The answer 
...
nt(greet("Jupyter"))

```

Final markdown cell.
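
To make the cell count above less mysterious, here is a minimal standard-library sketch of how markdown might be split into alternating markdown and code cells at fenced code blocks. It is an illustration only, not dn's actual implementation: `split_markdown_into_cells` is a hypothetical helper, and it assumes that every fence line toggles between markdown and code.

```python
FENCE = "`" * 3  # built programmatically so the sample can contain fences


def split_markdown_into_cells(markdown_text):
    """Split markdown into (cell_type, source) pairs at fenced code blocks."""
    cells, buffer, in_code = [], [], False

    def flush(cell_type):
        # Emit the buffered lines as one cell, skipping empty cells.
        source = "\n".join(buffer).strip("\n")
        if source.strip():
            cells.append((cell_type, source))
        buffer.clear()

    for line in markdown_text.splitlines():
        if line.startswith(FENCE):
            # A fence line closes the current cell and toggles the mode.
            flush("code" if in_code else "markdown")
            in_code = not in_code
        else:
            buffer.append(line)
    flush("code" if in_code else "markdown")
    return cells


sample = "\n".join([
    "# Sample Notebook",
    "",
    "This is a markdown cell.",
    "",
    FENCE + "python",
    'print("Hello, World!")',
    FENCE,
    "",
    "Final markdown cell.",
])

cells = split_markdown_into_cells(sample)
for cell_type, source in cells:
    print(cell_type, repr(source))
```

Applied to the larger sample above, the same rule would give three markdown cells and two code cells, which matches the five cells reported.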

... and other formats

from dn import pdf_to_markdown  # requires pypdf
from dn import docx_to_markdown  # requires mammoth
from dn import excel_to_markdown  # requires pandas, openpyxl, tabulate
from dn import pptx_to_markdown  # requires python-pptx
from dn import html_to_markdown  # requires html2text
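
When the format is only known at run time, one option is to pick the converter from the file extension. The sketch below is an assumption-laden illustration: `CONVERTER_BY_SUFFIX`, `converter_name_for`, and `to_markdown` are hypothetical names, and `to_markdown` assumes each dn converter accepts a file path as its first argument, which the examples here only show for `notebook_to_markdown`.

```python
from importlib import import_module
from pathlib import Path

# Converter names are the ones imported above, plus notebook_to_markdown.
CONVERTER_BY_SUFFIX = {
    ".pdf": "pdf_to_markdown",
    ".docx": "docx_to_markdown",
    ".xlsx": "excel_to_markdown",
    ".pptx": "pptx_to_markdown",
    ".html": "html_to_markdown",
    ".ipynb": "notebook_to_markdown",
}


def converter_name_for(path):
    """Return the name of the dn converter matching the file suffix, or None."""
    return CONVERTER_BY_SUFFIX.get(Path(path).suffix.lower())


def to_markdown(path):
    """Convert one file, importing dn only when a conversion is requested.

    Assumption: the converter accepts a file path as its first argument.
    """
    name = converter_name_for(path)
    if name is None:
        raise ValueError(f"No converter known for {path!r}")
    converter = getattr(import_module("dn"), name)
    return converter(path)


print(converter_name_for("report.PDF"))
print(converter_name_for("slides.pptx"))
print(converter_name_for("archive.zip"))
```

Looking the converter up by name keeps dn and its optional dependencies out of the import path until a conversion is actually requested.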

Markdown stores

User story: I have a directory with multiple files in different formats.

I want to batch convert all supported files to markdown and store them in memory.

from dn import Files, bytes_store_to_markdown_store

from dn.tests.utils_for_testing_dn import test_data_dir

# Setup source files from test directory
src_files = Files(test_data_dir)

# Setup target store as an in-memory dictionary
target_store = {}

# Convert all files in directory to markdown
result = bytes_store_to_markdown_store(src_files, target_store, verbose=False)

# Check that the result is the target_store
assert result is target_store

# Verify that the supported file types were converted correctly
supported_files = [
    "test.docx",
    "test.pptx",
    "test.pdf",
    "test.html",
    "test.xlsx",
    "test.txt",
    "test.md",
    "test.ipynb",
]

print(f"\nSupported files (given what packages are installed here): {supported_files}\n")

for filename in supported_files:
    assert f"{filename}.md" in target_store, f"{filename} not found in target_store"
    assert len(target_store[f"{filename}.md"]) > 0, f"{filename} conversion failed"
invalid pdf header: b'PK\x03\x04\n'
EOF marker not found
EOF marker not found
invalid pdf header: b'PK\x03\x04\x14'
EOF marker not found
invalid pdf h
...
df header: b'PK\x03\x04\x14'
EOF marker not found



Supported files (given what packages are installed here): ['test.docx', 'test.pptx', 'test.pdf', 'test.html', 'test.xlsx', 'test.txt', 'test.md', 'test.ipynb']
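
The store pattern used above can be sketched with plain dictionaries: read bytes from a source mapping, convert what is supported, and write the text to a target mapping under the original key plus `.md`. This is a simplified illustration and not dn's implementation; `convert_store`, `CONVERTERS`, and `text_bytes_to_markdown` are hypothetical names, and silently skipping unsupported formats is an assumption of this sketch.

```python
from pathlib import PurePath


def text_bytes_to_markdown(data):
    """Trivial converter: treat the bytes as UTF-8 text."""
    return data.decode("utf-8")


# Hypothetical registry: file suffix -> function from bytes to markdown text.
CONVERTERS = {".txt": text_bytes_to_markdown, ".md": text_bytes_to_markdown}


def convert_store(src_store, target_store, converters=CONVERTERS):
    """Convert every supported item of src_store into target_store."""
    for key, data in src_store.items():
        converter = converters.get(PurePath(key).suffix.lower())
        if converter is None:
            continue  # unsupported format: skipped in this sketch
        target_store[f"{key}.md"] = converter(data)
    return target_store


src = {"notes.txt": b"plain text", "readme.md": b"# Title", "image.png": b"\x89PNG"}
target = {}
result = convert_store(src, target)
print(sorted(result))
```

Because both stores are ordinary mappings, the same function works whether the target is a dictionary in memory or any other object that supports item assignment.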

Convert this notebook into markdown for README.md

from dn import notebook_to_markdown

notebook_to_markdown('~/Dropbox/py/proj/t/dn/misc/dn_readme.ipynb', target_file='../README.md')
