Aspose.Note-compatible Python API for reading OneNote (.one) files
🗒️ Aspose.Note FOSS for Python
Quick links: 📚 Examples • 📦 PyPI
✅ Official Aspose project — 100% free & open-source. Provides an Aspose.Note-compatible Python API for working with OneNote .one files.
This repository provides a Python library for reading Microsoft OneNote files (.one). Its public API mirrors a subset of the Aspose.Note for .NET API.
The goal is to offer a familiar surface (aspose.note.*) inspired by Aspose.Note for .NET, backed by this repository’s built-in MS-ONE/OneStore parser.
✨ Features
- ✅ Read .one from a file path or a binary stream
- ✅ Aspose-like DOM (Document/Page/Outline/…): traversal + type-based search
- ✅ Content extraction
  - ✅ Rich text with formatting runs (TextRun/TextStyle) and hyperlinks
  - ✅ Images (bytes, file name, dimensions)
  - ✅ Attached files (bytes, file name)
  - ✅ Tables (rows/cells + cell content)
  - ✅ OneNote tags (NoteTag) on text/images/tables and tagged list content
  - ✅ Numbered lists (NumberList) and nested outline elements
- ✅ PDF export via Document.Save(..., SaveFormat.Pdf) (uses ReportLab)
🚀 Quick start
from aspose.note import Document
doc = Document("testfiles/SimpleTable.one")
print(doc.DisplayName)
pages = list(doc)
print(len(pages))
# pages are direct children of Document
for page in pages:
    print(page.Title.TitleText.Text)
📄 Export to PDF
from aspose.note import Document, SaveFormat
doc = Document("testfiles/FormattedRichText.one")
doc.Save("out.pdf", SaveFormat.Pdf)
📦 Installation
From PyPI:
python -m pip install aspose-note
With PDF export support:
python -m pip install "aspose-note[pdf]"
From a local checkout:
python -m pip install -e .
PDF export requires ReportLab:
python -m pip install -e ".[pdf]"
Semantic PDF golden tests require pypdf in addition to ReportLab:
python -m pip install -e ".[pdf,test-pdf]"
PDF golden workflow
Golden PDFs are stored under tests/goldens/pdf/ together with JSON manifests extracted from the generated PDF.
The test suite compares manifests, not raw PDF bytes, so it stays stable across platforms and ReportLab internals.
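The idea of comparing manifests instead of bytes can be sketched as a recursive diff over JSON-like data. This is only an illustration: the manifest layout used here (a `pages` list with `text` and `fonts` keys) and the `diff_manifests` helper are hypothetical, and the repository's actual schema and comparison logic may differ.

```python
import json


def diff_manifests(expected, actual, path="$"):
    """Return human-readable differences between two JSON-like values."""
    if type(expected) is not type(actual):
        return [f"{path}: type {type(expected).__name__} != {type(actual).__name__}"]
    if isinstance(expected, dict):
        diffs = []
        for key in sorted(set(expected) | set(actual)):
            if key not in expected:
                diffs.append(f"{path}.{key}: unexpected key")
            elif key not in actual:
                diffs.append(f"{path}.{key}: missing key")
            else:
                diffs.extend(diff_manifests(expected[key], actual[key], f"{path}.{key}"))
        return diffs
    if isinstance(expected, list):
        if len(expected) != len(actual):
            return [f"{path}: length {len(expected)} != {len(actual)}"]
        diffs = []
        for i, (e, a) in enumerate(zip(expected, actual)):
            diffs.extend(diff_manifests(e, a, f"{path}[{i}]"))
        return diffs
    return [] if expected == actual else [f"{path}: {expected!r} != {actual!r}"]


# Hypothetical manifests: same text, different font.
baseline = json.loads('{"pages": [{"text": ["Hello"], "fonts": ["Helvetica"]}]}')
generated = json.loads('{"pages": [{"text": ["Hello"], "fonts": ["Courier"]}]}')
differences = diff_manifests(baseline, generated)
print(differences)
```

A diff of this kind reports which semantic field changed, which is easier to act on than a byte-level mismatch between two PDF files.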
The PDF writer now uses deterministic Base-14 fonts by default. If you explicitly want to try Windows system fonts for local inspection, set ASPOSE_NOTE_PDF_USE_SYSTEM_FONTS=1 before export.
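If you prefer to toggle the variable from Python rather than from the shell, a small context manager can set it for one export and restore the previous value afterwards. This is a sketch: the `system_fonts` helper is hypothetical, and it assumes the library reads the variable at export time. If it is read at import time instead, set the variable before importing aspose.note.

```python
import os
from contextlib import contextmanager

ENV_VAR = "ASPOSE_NOTE_PDF_USE_SYSTEM_FONTS"


@contextmanager
def system_fonts(enabled=True):
    """Temporarily set (or clear) the system-fonts switch, then restore it."""
    previous = os.environ.get(ENV_VAR)
    if enabled:
        os.environ[ENV_VAR] = "1"
    else:
        os.environ.pop(ENV_VAR, None)
    try:
        yield
    finally:
        # Put the environment back exactly as it was.
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous


before = os.environ.get(ENV_VAR)
with system_fonts():
    # Call doc.Save("out.pdf", SaveFormat.Pdf) here.
    inside = os.environ.get(ENV_VAR)
after = os.environ.get(ENV_VAR)
```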
Regenerate the baselines with:
python tools/regenerate_pdf_goldens.py
To rebuild only selected cases:
python tools/regenerate_pdf_goldens.py --case formatted_richtext --case simple_table
Run the verification suite with:
python -m unittest tests.test_aspose_note_pdf_goldens -v
On mismatch, generated PDFs and manifests are written to tests/out/pdf_golden_failures/ for inspection.
If PyMuPDF is installed, the failing test also renders baseline/generated pages to PNG and writes visual diff artifacts into the same output tree.
If PyMuPDF is unavailable but pdftoppm is available on PATH, the tests use pdftoppm as a fallback renderer.
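The fallback order described above can be sketched as follows. The `pick_pdf_renderer` function is hypothetical and is not taken from the test suite; it only shows one way to probe for PyMuPDF first and for pdftoppm second.

```python
import importlib.util
import shutil


def pick_pdf_renderer():
    """Return "pymupdf", "pdftoppm", or None, in that order of preference."""
    # PyMuPDF is importable as "pymupdf" in newer releases and "fitz" in older ones.
    for module_name in ("pymupdf", "fitz"):
        if importlib.util.find_spec(module_name) is not None:
            return "pymupdf"
    # Fall back to the pdftoppm command-line tool if it is on PATH.
    if shutil.which("pdftoppm") is not None:
        return "pdftoppm"
    return None


renderer = pick_pdf_renderer()
print("renderer:", renderer or "none available (no visual diff artifacts)")
```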
PyPI release page (maintainers): https://pypi.org/manage/project/aspose-note/releases/
🧩 Public API (what is considered supported)
The supported public entry points are aspose.note and aspose.note.saving.
Everything under aspose.note._internal is internal implementation detail and may change.
Below is the supported public surface across those entry points.
🧭 Document and traversal
- Document(source=None, load_options=None)
  - DisplayName: str | None
  - CreationTime: datetime | None
  - iteration: for page in doc: ...
  - FileFormat -> FileFormat (best-effort)
  - GetPageHistory(page) -> PageHistory
  - DetectLayoutChanges() (compatibility stub)
  - Save(target, format_or_options=None)
    - supported: SaveFormat.Pdf only
- PageHistory
  - Current: Page
  - Count: int, IsReadOnly: bool
  - iteration/indexing over historical revisions only
- DocumentVisitor (base visitor for traversal)
  - VisitDocumentStart/End, VisitPageStart/End, VisitTitleStart/End, VisitOutlineStart/End, VisitOutlineElementStart/End, VisitRichTextStart/End, VisitImageStart/End
- Node
  - ParentNode
  - Document (property): walk up to the root Document
  - Accept(visitor)
- Container nodes (Document, Page, Title, Outline, OutlineElement, Image, Table, TableRow, TableCell)
  - FirstChild, LastChild
  - AppendChildLast(node), AppendChildFirst(node), InsertChild(index, node), RemoveChild(node)
  - GetEnumerator() / iteration: for child in node: ...
  - GetChildNodes(Type) -> list[Type]: recursive search by type
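To make "recursive search by type" concrete, here is a minimal sketch over stand-in classes. The `Node`, `Page`, `Outline`, and `RichText` classes below are simplified placeholders defined only for this example, and `get_child_nodes` is not the library's implementation. The depth-first, document-order result is an assumption about how such a search typically behaves.

```python
class Node:
    """Stand-in container node that iterates over its direct children."""

    def __init__(self, *children):
        self.children = list(children)

    def __iter__(self):
        return iter(self.children)


class Page(Node):
    pass


class Outline(Node):
    pass


class RichText(Node):
    def __init__(self, text):
        super().__init__()
        self.Text = text


def get_child_nodes(node, node_type):
    """Collect all descendants of node_type, depth-first, in document order."""
    found = []
    for child in node:
        if isinstance(child, node_type):
            found.append(child)
        found.extend(get_child_nodes(child, node_type))
    return found


tree = Node(
    Page(Outline(RichText("a"), Outline(RichText("b")))),
    Page(RichText("c")),
)
texts = [rt.Text for rt in get_child_nodes(tree, RichText)]
print(texts)
```

The search descends through every level, so a RichText nested two outlines deep is returned alongside one that sits directly under a page.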
🏗️ Document structure
- Page
  - Title: Title | None
  - Author: str | None
  - CreationTime: datetime | None, LastModifiedTime: datetime | None
  - Level: int | None
  - Clone(deep=False) -> Page (minimal clone)
- Title
  - TitleText: RichText | None
  - TitleDate: RichText | None
  - TitleTime: RichText | None
- Outline
  - HorizontalOffset, VerticalOffset, MaxWidth
  - MaxHeight, MinWidth, ReservedWidth, IndentPosition
  - DescendantsCannotBeMoved, LastModifiedTime
- OutlineElement
  - NumberList: NumberList | None
📝 Content
- RichText(Node)
  - Text: str
  - TextRuns: list[TextRun] (formatted segments)
  - ParagraphStyle: ParagraphStyle
  - Length: int
  - Alignment: HorizontalAlignment | None
  - Tags: list[NoteTag]
  - Append(text, style=None) -> RichText
  - Replace(old_value, new_value) -> RichText
  - IndexOf(...) -> int
- TextRun
  - Text: str
  - Style: TextStyle
- ParagraphStyle
  - default paragraph-level text formatting used by RichText.ParagraphStyle
- TextStyle
  - IsBold / IsItalic / IsUnderline / IsStrikethrough / IsSuperscript / IsSubscript: bool
  - IsHidden: bool, IsMathFormatting: bool
  - FontName: str | None, FontSize: float | None
  - FontColor: int | None, Highlight: int | None
  - Language: int | None
  - FontStyle: int
  - IsHyperlink: bool, HyperlinkAddress: str | None
- Image
  - FileName: str | None, Bytes: bytes
  - Width: float | None, Height: float | None
  - AlternativeTextTitle: str | None, AlternativeTextDescription: str | None
  - HyperlinkUrl: str | None
  - Tags: list[NoteTag]
  - Replace(image) -> None (replace image contents)
- AttachedFile(Node)
  - FileName: str | None, Bytes: bytes
  - Tags: list[NoteTag]
- Table
  - Columns: list[TableColumn]
  - IsBordersVisible: bool
  - Tags: list[NoteTag]
- TableColumn
  - Width: float | None
  - LockedWidth: bool
- TableRow, TableCell
- NoteTag
  - Label, Icon, Status, Highlight, CreationTime, CompletedTime, FontColor
  - CreateYellowStar(), CreateQuestionMark() (convenience factories)
- NumberList
  - Format: str | None, NumberFormat: str | None
  - Font: str | None, FontSize: float | None, FontColor: int | None
  - IsBold: bool, IsItalic: bool, Restart: int | None
  - GetNumberedListHeader(number) -> str
⚙️ Load/save options
- LoadOptions
  - DocumentPassword: str | None (password/encryption is not supported)
  - LoadHistory: bool
- aspose.note.saving.SaveOptions (base)
  - abstract compatibility base type
  - SaveFormat: SaveFormat
  - PageIndex: int, PageCount: int | None, FontsSubsystem
- aspose.note.saving.PdfSaveOptions(SaveOptions) (subset)
  - PageIndex: int, PageCount: int | None
  - ImageCompression, JpegQuality, PageSettings, PageSplittingAlgorithm
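The listing does not spell out how PageIndex and PageCount interact. Assuming they follow the Aspose.Note for .NET convention (PageIndex is the zero-based index of the first page to save, PageCount is the number of pages to save, and None means "all remaining pages"), the selection can be pictured with this sketch. The `select_pages` helper is hypothetical and is not part of the library.

```python
def select_pages(pages, page_index=0, page_count=None):
    """Return the pages an export would cover under the assumed semantics."""
    if page_index < 0:
        raise ValueError("page_index must be non-negative")
    if page_count is not None and page_count < 0:
        raise ValueError("page_count must be non-negative or None")
    # None means "everything from page_index to the end".
    end = None if page_count is None else page_index + page_count
    return pages[page_index:end]


pages = ["p0", "p1", "p2", "p3", "p4"]
print(select_pages(pages, page_index=1, page_count=2))
print(select_pages(pages, page_index=3))
```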
🔢 Enums
- SaveFormat: Pdf
- FileFormat: OneNote2010, OneNoteOnline, OneNote2007
- HorizontalAlignment: Left, Center, Right
- NodeType: Document, Page, Outline, OutlineElement, RichText, Image, Table, AttachedFile
🚨 Exceptions
- FileCorruptedException
- IncorrectDocumentStructureException
- IncorrectPasswordException
- UnsupportedFileFormatException (has a FileFormat field)
- UnsupportedSaveFormatException
📚 MS OneNote Examples
More runnable scripts are available in examples/ (MS OneNote .one samples).
📝 Extract all text from an MS OneNote document
from aspose.note import Document, RichText
doc = Document("testfiles/FormattedRichText.one")
texts = [rt.Text for rt in doc.GetChildNodes(RichText)]
print("\n".join(texts))
🖼️ Save all images from an MS OneNote document to disk
from aspose.note import Document, Image
doc = Document("testfiles/3ImagesWithDifferentAlignment.one")
for i, img in enumerate(doc.GetChildNodes(Image), start=1):
    name = img.FileName or f"image_{i}.bin"
    with open(name, "wb") as f:
        f.write(img.Bytes)
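If two images share the same FileName, the loop above would write the second over the first, and a FileName containing directory parts would be written outside the current folder. A minimal sketch of a guard follows. The `safe_unique_name` helper is hypothetical, and whether such names actually occur in your .one files is an assumption to verify.

```python
from pathlib import PureWindowsPath


def safe_unique_name(file_name, index, used, prefix="image"):
    """Return a bare, not-yet-used file name for the given (possibly None) name."""
    # PureWindowsPath splits on both "/" and "\", on any platform.
    base = PureWindowsPath(file_name).name if file_name else ""
    if not base:
        base = f"{prefix}_{index}.bin"
    path = PureWindowsPath(base)
    candidate, counter = base, 1
    # Compare case-insensitively so the result is safe on Windows and macOS too.
    while candidate.lower() in used:
        candidate = f"{path.stem}_{counter}{path.suffix}"
        counter += 1
    used.add(candidate.lower())
    return candidate


used = set()
raw_names = ["a.png", "a.png", None, "dir/b.jpg"]
names = [safe_unique_name(n, i, used) for i, n in enumerate(raw_names, start=1)]
print(names)
```

In the image loop, you would call `safe_unique_name(img.FileName, i, used)` in place of the `img.FileName or ...` expression.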
🏷️📄 Export an MS OneNote document to PDF
from aspose.note import Document, SaveFormat
from aspose.note.saving import PdfSaveOptions
doc = Document("testfiles/TagSizes.one")
opts = PdfSaveOptions(
    JpegQuality=90,
)
doc.Save("out.pdf", opts)
📦 Load an MS OneNote document from a binary stream
from pathlib import Path
from aspose.note import Document
one_path = Path("testfiles/SimpleTable.one")
with one_path.open("rb") as f:
    doc = Document(f)
print(doc.DisplayName)
print(len(list(doc)))
🧭 Traverse MS OneNote document structure (DOM) and print a simple outline
from aspose.note import Document, Page, Outline, OutlineElement, RichText
doc = Document("testfiles/SimpleTable.one")
for page in doc.GetChildNodes(Page):
    title = page.Title.TitleText.Text if page.Title and page.Title.TitleText else "(no title)"
    print(f"# {title}")
    for outline in page.GetChildNodes(Outline):
        for oe in outline.GetChildNodes(OutlineElement):
            # OutlineElement may contain RichText, Table, Image, etc.
            texts = [rt.Text for rt in oe.GetChildNodes(RichText)]
            if texts:
                print("-", " ".join(t.strip() for t in texts if t.strip()))
🔎 Count MS OneNote DOM nodes with DocumentVisitor
from aspose.note import Document, DocumentVisitor, Page, Image, RichText
class Counter(DocumentVisitor):
    def __init__(self) -> None:
        self.pages = 0
        self.rich_texts = 0
        self.images = 0
    def VisitPageStart(self, page: Page) -> None:  # noqa: N802
        self.pages += 1
    def VisitRichTextStart(self, rich_text: RichText) -> None:  # noqa: N802
        self.rich_texts += 1
    def VisitImageStart(self, image: Image) -> None:  # noqa: N802
        self.images += 1
doc = Document("testfiles/3ImagesWithDifferentAlignment.one")
counter = Counter()
doc.Accept(counter)
print(counter.pages, counter.rich_texts, counter.images)
🔗 Extract hyperlinks from formatted text in an MS OneNote document
from aspose.note import Document, RichText
doc = Document("testfiles/FormattedRichText.one")
for rt in doc.GetChildNodes(RichText):
    for run in rt.TextRuns:
        if run.Style.IsHyperlink and run.Style.HyperlinkAddress:
            print(run.Text, "->", run.Style.HyperlinkAddress)
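Because the loop above prints one line per run, a link whose formatting changes partway through (for example, half bold) may show up as several entries. The sketch below merges adjacent runs that share an address. It relies only on the Text, Style.IsHyperlink, and Style.HyperlinkAddress attributes listed in the API section. The `merged_links` helper and the `SimpleNamespace` stand-ins are hypothetical, and whether the parser actually splits links this way is an assumption.

```python
from itertools import groupby
from types import SimpleNamespace


def merged_links(runs):
    """Return (text, address) pairs, joining adjacent runs with the same address."""

    def address(run):
        style = run.Style
        return style.HyperlinkAddress if style.IsHyperlink else None

    links = []
    for addr, group in groupby(runs, key=address):
        if addr:
            links.append(("".join(r.Text for r in group), addr))
    return links


def make_run(text, addr=None):
    # Stand-in for a TextRun, used only so the example runs without a .one file.
    style = SimpleNamespace(IsHyperlink=addr is not None, HyperlinkAddress=addr)
    return SimpleNamespace(Text=text, Style=style)


runs = [
    make_run("See "),
    make_run("Asp", "https://example.com/a"),
    make_run("ose", "https://example.com/a"),
    make_run(" and "),
    make_run("docs", "https://example.com/b"),
]
links = merged_links(runs)
print(links)
```

With a real document, pass `rt.TextRuns` for each RichText instead of the stand-in list.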
🏷️ Inspect MS OneNote tags (NoteTag) across the document
from aspose.note import Document, RichText, Image, Table
doc = Document("testfiles/TagSizes.one")
def dump_tags(kind: str, tags) -> None:
    for t in tags:
        print(kind, "tag:", t.Label, t.Icon)
for rt in doc.GetChildNodes(RichText):
    dump_tags("RichText", rt.Tags)
for img in doc.GetChildNodes(Image):
    dump_tags("Image", img.Tags)
for tbl in doc.GetChildNodes(Table):
    dump_tags("Table", tbl.Tags)
🧱 Work with tables in an MS OneNote document (rows/cells)
from aspose.note import Document, Table, TableRow, TableCell, RichText
doc = Document("testfiles/SimpleTable.one")
for table in doc.GetChildNodes(Table):
    print("Table columns:", [column.Width for column in table.Columns])
    for row_index, row in enumerate(table.GetChildNodes(TableRow), start=1):
        cells = row.GetChildNodes(TableCell)
        values: list[str] = []
        for cell in cells:
            cell_text = " ".join(rt.Text for rt in cell.GetChildNodes(RichText)).strip()
            values.append(cell_text)
        print(f"Row {row_index}:", values)
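Once each row is a list of strings, as in the `values` list above, the standard csv module can serialize the table. This sketch uses hard-coded rows so that it runs without a .one file; the `rows_to_csv` helper is hypothetical.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of string rows to CSV text, quoting where needed."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# In practice, collect one `values` list per TableRow instead of these literals.
rows = [["Name", "Note"], ["A", "has, comma"]]
csv_text = rows_to_csv(rows)
print(csv_text)
```

The csv writer quotes any cell that contains a comma, quote, or newline, which matters for free-form OneNote cell text.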
📎 Extract attached files from an MS OneNote document
from aspose.note import Document, AttachedFile
doc = Document("testfiles/OnePageWithFile.one")
for i, af in enumerate(doc.GetChildNodes(AttachedFile), start=1):
    name = af.FileName or f"attachment_{i}.bin"
    with open(name, "wb") as f:
        f.write(af.Bytes)
    print("saved:", name)
🔢 Inspect numbered lists in an MS OneNote document
from aspose.note import Document, OutlineElement
doc = Document("testfiles/NumberedListWithTags.one")
for oe in doc.GetChildNodes(OutlineElement):
    nl = oe.NumberList
    if nl is None:
        continue
    print(
        "format=", nl.Format,
        "number_format=", nl.NumberFormat,
        "restart=", nl.Restart,
    )
⚠️ Current limitations
- The implementation focuses on reading .one and building a DOM; writing back to .one is not implemented.
- DocumentPassword / encrypted documents are not supported (raises IncorrectPasswordException).
- Saving formats other than PDF (HTML/images/ONE) are declared for compatibility but not implemented.
🌐 Other platforms (official Aspose.Note)
If you need the full-featured Aspose product (writing/conversion, broader compatibility, etc.), see the official libraries:
- Aspose.Note for .NET
  - Product: https://products.aspose.com/note/net/
  - Documentation: https://docs.aspose.com/note/net/
- Aspose.Note for Java
  - Product: https://products.aspose.com/note/java/
  - Documentation: https://docs.aspose.com/note/java/
🛠️ Development
Run tests:
python -m pip install -e ".[pdf]"
python -m pytest -q
Third-party license notices (e.g., ReportLab used for PDF export) are in THIRD_PARTY_NOTICES.md.