GroupDocs.Parser for Python via .NET is a powerful API designed for advanced document parsing, offering extensive features like text extraction, metadata retrieval, and image extraction across various document formats, including PDFs, Word, Excel, and PowerPoint.

Advanced Document Parsing API for Python via .NET

GroupDocs.Parser for Python via .NET is a powerful on-premise document parsing library that lets you extract text, metadata, images, attachments, barcodes and structured content from dozens of popular formats – including PDF, Word, Excel, PowerPoint, emails, archives, images and more.

You can embed GroupDocs.Parser into your own Python applications without installing any third-party office suites. GroupDocs also provides free online apps, built on top of the same APIs, that let users parse PDF, Office and other documents right in the browser.

Document Parser API Features

GroupDocs.Parser for Python via .NET provides a single, unified API for advanced document parsing and data extraction:

Text extraction
- Extract text from PDF, Word, Excel, PowerPoint, e-books, emails and many other formats.
- Work in accurate or raw text modes depending on your scenario.
- Keep track of pages and logical blocks of text.
Preserve structure & formatting
- Retrieve formatted text with font styles, sizes and basic layout information.
- Analyze document structure – paragraphs, lists, headings, table cells, etc.
Text search
- Search for specific words or phrases in documents.
- Use advanced search options such as case sensitivity, whole-word matching or regular expressions.
OCR text extraction
- Extract text from scanned PDFs and raster images using OCR options.
- Combine OCR with spell-checking in supported environments for better recognition quality.
Metadata extraction
- Read common metadata properties like author, title, subject and keywords.
- Extract creation / modification dates and other technical properties.
- Retrieve custom fields such as invoice numbers or business IDs.
Image & attachment extraction
- Extract embedded images from Office documents, PDFs, e-books and more.
- Pull file attachments from PDFs and email messages.
- Extract barcodes from supported document and image formats.
Document structure analysis
- Parse tables, including rows, columns and individual cells.
- Detect text areas and content blocks for fine-grained extraction.
- Extract hyperlinks, bookmarks and table of contents (TOC) where supported.
PDF-specific parsing
- Extract text, images, metadata and attachments from PDFs.
- Get PDF page count and PDF-specific document information.
- Work with bookmarks, forms and PDF portfolios.
Email parsing
- Extract sender, recipients, subject and body from emails.
- Get email metadata and embedded attachments.
- Work with formats like MSG, EML, EMLX, PST and OST.
Spreadsheet parsing
- Extract text and data from Excel and other spreadsheet formats.
- Work with specific sheets, ranges or individual cells.
- Extract spreadsheet metadata and images.
Presentation parsing
- Extract text, notes, images and metadata from PowerPoint files.
- Work with slide-by-slide content, including shapes and notes.
Template-based data extraction
- Define parsing templates to extract structured fields (e.g. invoices, receipts).
- Use templates to describe positions of fields, tables and patterns.
- Apply your own parsing rules for domain-specific scenarios.
Advanced & batch features
- High-performance processing for large documents and document batches.
- Cross-platform support (Windows, Linux, macOS) via .NET runtime.
- Build scalable, secure parsing workflows in your Python applications.
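The search options listed above (case sensitivity, whole-word matching, regular expressions) can be illustrated in plain Python. This is a minimal sketch and not the GroupDocs.Parser search API: it assumes the text has already been extracted into a string and uses only the standard `re` module. The helper name `find_matches` and its keyword arguments are hypothetical.

```python
import re
from typing import List, Tuple


def find_matches(
    text: str,
    query: str,
    *,
    match_case: bool = False,
    whole_word: bool = False,
    use_regex: bool = False,
) -> List[Tuple[int, str]]:
    """Return (offset, matched text) pairs for every hit of `query` in `text`.

    Hypothetical helper that mimics common search options on plain text;
    it is not part of GroupDocs.Parser.
    """
    # Treat the query literally unless regular-expression mode is requested
    pattern = query if use_regex else re.escape(query)
    if whole_word:
        # \b anchors work around word characters (letters, digits, underscore)
        pattern = rf"\b(?:{pattern})\b"
    flags = 0 if match_case else re.IGNORECASE
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text, flags)]


if __name__ == "__main__":
    sample = "Invoice INV-001; invoice total. Invoices attached."
    print(find_matches(sample, "invoice"))                   # substring, any case
    print(find_matches(sample, "invoice", whole_word=True))  # skips "Invoices"
    print(find_matches(sample, r"INV-\d+", use_regex=True, match_case=True))
```

Whole-word mode wraps the (escaped or raw) pattern in a non-capturing group so that `\b` applies to the whole alternative, not just its first and last characters.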

Supported Document Formats

GroupDocs.Parser for Python via .NET supports a wide range of document families. Below is an overview of the most important ones.

Word Processing

DOC, DOT – Microsoft Word binary documents & templates
DOCX, DOCM, DOTX, DOTM – Office Open XML documents & templates
RTF – Rich Text Format
TXT – Plain text
ODT, OTT – OpenDocument text documents & templates

Typical operations: text extraction (accurate & raw), structured text parsing, text areas, metadata, images, attachments, TOC, barcode scanning.

PDF

PDF – Portable Document Format

Operations: template-based parsing, accurate & raw text extraction, text areas, metadata, images, attachments/containers, forms, TOC, barcode scanning.

Markup

XHTML – Extensible Hypertext Markup Language
MHTML – MIME HTML
MD – Markdown
XML – XML files

Operations: text extraction (including formatted text for supported types) and metadata extraction.

eBook

CHM – Compiled HTML Help
EPUB – Digital e-book format
FB2 – FictionBook 2.0
MOBI, AZW3 – Mobile/Kindle formats

Operations: text extraction, structured text, metadata, containers, TOC support for selected formats, barcode scanning for supported types.

Spreadsheets

XLS, XLT, XLSX, XLSM, XLSB
XLTX, XLTM
ODS, OTS – OpenDocument spreadsheets
CSV – Comma-Separated Values
XLA, XLAM – add-ins
NUMBERS – Apple iWork Numbers

Operations: text & data extraction, structured content, text areas, metadata, images, containers/attachments.

Presentations

PPT, PPS, POT – binary PowerPoint
PPTX, PPTM, PPSX, PPSM, POTX, POTM – Office Open XML
ODP, OTP – OpenDocument presentations

Operations: slide text and notes, structured text, text areas, metadata, images, attachments, TOC, barcode scanning.

Email

PST, OST – Outlook data files
EML, EMLX, MSG – email messages

Operations: email body text, metadata (from/to/subject), attachments, images and containers.

Notes

ONE – Microsoft OneNote documents

Operations: text extraction and basic metadata support.

Images

BMP, GIF, JP2, JPG/JPEG, PNG, TIF/TIFF
DICOM, DJVU, EMF, J2K, PS, PSD, SVG, SVGZ, WEBP, WMF

Operations: text extraction (for some formats via OCR), metadata, barcode scanning (where supported).

Databases

ADO.NET-based data sources and supported database formats

Operations: text and structured data extraction using database-specific options.
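An application that accepts uploads may want to route files by extension before parsing them. A minimal sketch that maps the extensions listed above to their family names; the `FORMAT_FAMILIES` table and the `format_family` helper are hypothetical, cover only the extension-based families on this page, and are not part of GroupDocs.Parser.

```python
from pathlib import Path
from typing import Dict, Optional, Set

# Extensions grouped by the format families listed on this page (hypothetical table)
FORMAT_FAMILIES: Dict[str, Set[str]] = {
    "Word Processing": {"doc", "dot", "docx", "docm", "dotx", "dotm", "rtf", "txt", "odt", "ott"},
    "PDF": {"pdf"},
    "Markup": {"xhtml", "mhtml", "md", "xml"},
    "eBook": {"chm", "epub", "fb2", "mobi", "azw3"},
    "Spreadsheets": {"xls", "xlt", "xlsx", "xlsm", "xlsb", "xltx", "xltm",
                     "ods", "ots", "csv", "xla", "xlam", "numbers"},
    "Presentations": {"ppt", "pps", "pot", "pptx", "pptm", "ppsx", "ppsm",
                      "potx", "potm", "odp", "otp"},
    "Email": {"pst", "ost", "eml", "emlx", "msg"},
    "Notes": {"one"},
    # DICOM files normally carry the .dcm extension
    "Images": {"bmp", "gif", "jp2", "jpg", "jpeg", "png", "tif", "tiff", "dcm",
               "djvu", "emf", "j2k", "ps", "psd", "svg", "svgz", "webp", "wmf"},
}

# Reverse index (extension -> family), built once at import time
_EXTENSION_TO_FAMILY: Dict[str, str] = {
    ext: family for family, exts in FORMAT_FAMILIES.items() for ext in exts
}


def format_family(path: str) -> Optional[str]:
    """Return the family name for a file path, or None if its extension is not listed."""
    suffix = Path(path).suffix.lower().lstrip(".")
    return _EXTENSION_TO_FAMILY.get(suffix)


if __name__ == "__main__":
    for name in ("report.PDF", "deck.pptx", "scan.tiff", "notes.xyz"):
        print(name, "->", format_family(name))
```

The lookup is case-insensitive and returns None for unknown or missing extensions, so the caller decides how to handle unsupported files.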

Platform Independence

GroupDocs.Parser for Python via .NET can be used to build 64-bit applications for Windows, Linux and macOS, and 32-bit applications for Windows, wherever a supported Python 3.x version is installed.

The parsing engine is powered by the same core technology as the GroupDocs.Parser .NET library, giving you production-ready performance and compatibility in Python environments.

Get Started

Ready to try GroupDocs.Parser for Python via .NET?

You can install the Python package from PyPI, where it is published as groupdocs-parser-net, and reference it in your project:

Install GroupDocs.Parser for Python via .NET from PyPI

pip install groupdocs.parser-net

Upgrade to the latest version

pip install --upgrade groupdocs.parser-net
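To confirm which version pip installed, the standard library's `importlib.metadata` module can be queried. A minimal sketch: the helper name `installed_version` is hypothetical, and the distribution name `groupdocs-parser-net` is taken from the wheel file names on this page.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "groupdocs-parser-net") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "groupdocs-parser-net is not installed")
```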

Download Package from Official Website

To download the GroupDocs.Parser package for your operating system, visit the official GroupDocs Releases website. The following OS-specific packages are available:

Windows 64-bit: Package name ends with win_amd64.whl
Windows 32-bit: Package name ends with win32.whl
Linux 64-bit: Package name ends with manylinux1_x86_64.whl
macOS Intel (x86-64): Package name ends with macosx_10_14_x86_64.whl
macOS Apple Silicon (ARM64): Package name ends with macosx_11_0_arm64.whl

Choose the package that matches your operating system and the architecture (32-bit or 64-bit) of your Python interpreter.
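The choice above can also be made programmatically. A minimal sketch, assuming the wheel file-name suffixes published for release 25.12 on this page; the helper names `wheel_suffix` and `current_wheel_suffix` are hypothetical. Because a wheel must match the interpreter rather than the operating system, the sketch reads the interpreter's bitness from its pointer size.

```python
import platform
import struct
from typing import Optional


def wheel_suffix(system: str, machine: str, bits: int) -> Optional[str]:
    """Map an OS name, CPU architecture and interpreter bitness to a wheel suffix.

    Suffixes come from the release 25.12 wheel file names; returns None when
    no published wheel matches. Minimum macOS versions are not checked.
    """
    machine = machine.lower()
    if system == "Windows":
        if bits == 32:
            return "win32.whl"
        if machine in ("amd64", "x86_64"):
            return "win_amd64.whl"
    elif system == "Linux":
        if bits == 64 and machine in ("x86_64", "amd64"):
            return "manylinux1_x86_64.whl"
    elif system == "Darwin":
        if machine == "arm64":
            return "macosx_11_0_arm64.whl"
        if machine == "x86_64":
            return "macosx_10_14_x86_64.whl"
    return None


def current_wheel_suffix() -> Optional[str]:
    """Detect the suffix for the running interpreter (pointer size gives its bitness)."""
    bits = struct.calcsize("P") * 8
    return wheel_suffix(platform.system(), platform.machine(), bits)


if __name__ == "__main__":
    print(current_wheel_suffix() or "No published wheel matches this interpreter")
```

On a 64-bit Windows machine running a 32-bit Python, `platform.machine()` may still report `AMD64`; the pointer-size check is what selects the win32 wheel in that case.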

Quick Text Extraction Example

The snippet below shows a typical way to extract text from a PDF document in Python.

import groupdocs.parser as gp

def run():
    # Load the PDF document (released when the with block exits)
    with gp.Parser("sample.pdf") as parser:
        # Extract text from the document
        text = parser.GetText()

        # Output the extracted text
        print(text)

if __name__ == "__main__":
    run()
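The same pattern can be applied to a whole folder of documents. A minimal sketch, assuming an `extract` callable that you supply, for example your own wrapper that opens `gp.Parser(str(path))` and returns the result of `parser.GetText()` as in the example above. Keeping the extractor injectable lets the loop be tested without the package installed, and per-file error handling keeps one damaged document from stopping the batch. The helper name `extract_folder` is hypothetical.

```python
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple


def extract_folder(
    folder: Path,
    extract: Callable[[Path], Optional[str]],
    pattern: str = "*.pdf",
) -> Tuple[Dict[str, Optional[str]], Dict[str, str]]:
    """Run `extract` on every file in `folder` whose name matches `pattern`.

    Returns (texts, errors): texts maps file name -> extracted text,
    errors maps file name -> a short description of the failure.
    """
    texts: Dict[str, Optional[str]] = {}
    errors: Dict[str, str] = {}
    # sorted() gives a deterministic processing order
    for path in sorted(Path(folder).glob(pattern)):
        try:
            texts[path.name] = extract(path)
        except Exception as exc:  # record the failure and continue with the next file
            errors[path.name] = f"{type(exc).__name__}: {exc}"
    return texts, errors
```

For example, `texts, errors = extract_folder(Path("invoices"), my_extract)`, where `my_extract` is your own wrapper around the parser.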

Extract Images from a Word Document

This example shows how to iterate over images embedded in a Word document and save them to disk.

import groupdocs.parser as gp

def run():
    # Load the Word document
    with gp.Parser("sample.docx") as parser:
        # Get images from the document
        images = parser.GetImages()

        # Save each image to a numbered PNG file (image1.png, image2.png, ...)
        for index, image in enumerate(images, start=1):
            image.Save(f"image{index}.png")

if __name__ == "__main__":
    run()

GroupDocs.Parser for Python via .NET is intended for Python applications. If you develop in Node.js, Java or .NET, we recommend GroupDocs.Parser for Node.js, GroupDocs.Parser for Java or GroupDocs.Parser for .NET, respectively.

Release history

25.12 (this version): Feb 6, 2026
0.0.0: Dec 10, 2025

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distributions

groupdocs_parser_net-25.12-py3-none-win_amd64.whl (218.6 MB): uploaded Feb 9, 2026; Python 3, Windows x86-64
groupdocs_parser_net-25.12-py3-none-win32.whl (213.5 MB): uploaded Feb 9, 2026; Python 3, Windows x86
groupdocs_parser_net-25.12-py3-none-manylinux1_x86_64.whl (231.5 MB): uploaded Feb 6, 2026; Python 3
groupdocs_parser_net-25.12-py3-none-macosx_11_0_arm64.whl (228.8 MB): uploaded Feb 6, 2026; Python 3, macOS 11.0+ ARM64
groupdocs_parser_net-25.12-py3-none-macosx_10_14_x86_64.whl (234.5 MB): uploaded Feb 6, 2026; Python 3, macOS 10.14+ x86-64

File details

Details for the file groupdocs_parser_net-25.12-py3-none-win_amd64.whl.

File metadata

Download URL: groupdocs_parser_net-25.12-py3-none-win_amd64.whl
Upload date: Feb 9, 2026
Size: 218.6 MB
Tags: Python 3, Windows x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for groupdocs_parser_net-25.12-py3-none-win_amd64.whl
Algorithm	Hash digest
SHA256	`d74a780079b4cbc27df1b694903f4801e9acf305ed3e4f3b79a8d002ec1e2f38`
MD5	`815fa7baa92d978a9222ab7ad02f8833`
BLAKE2b-256	`a9fe9f06dabeab3bf7bbb50ae86001b229afd44fbb9185d6326c7625049eccb1`

File details

Details for the file groupdocs_parser_net-25.12-py3-none-win32.whl.

File metadata

Download URL: groupdocs_parser_net-25.12-py3-none-win32.whl
Upload date: Feb 9, 2026
Size: 213.5 MB
Tags: Python 3, Windows x86
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.6

File hashes

Hashes for groupdocs_parser_net-25.12-py3-none-win32.whl
Algorithm	Hash digest
SHA256	`c13aa5d48266684e238da69e81e5f92fc13b3178176c824b8750f8c84702a6d8`
MD5	`74891db30797f4f456be2d01075f3993`
BLAKE2b-256	`5adb8db77dd1c7c18a0cf755f6f7a02b6e9e13131aa6aaffe14bdfa0a6314db5`

File details

Details for the file groupdocs_parser_net-25.12-py3-none-manylinux1_x86_64.whl.

File metadata

Download URL: groupdocs_parser_net-25.12-py3-none-manylinux1_x86_64.whl
Upload date: Feb 6, 2026
Size: 231.5 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for groupdocs_parser_net-25.12-py3-none-manylinux1_x86_64.whl
Algorithm	Hash digest
SHA256	`d4d5bcef5ab6ffdb651c3fbe1ba390be544af302de36b49ebc623bf987af5072`
MD5	`c82e5b9030e88045b2c0ba2063f9503d`
BLAKE2b-256	`15beba6dd4207021ff67363d4e7e07d02c28a34ad5dad024d51ab9996271df2a`

File details

Details for the file groupdocs_parser_net-25.12-py3-none-macosx_11_0_arm64.whl.

File metadata

Download URL: groupdocs_parser_net-25.12-py3-none-macosx_11_0_arm64.whl
Upload date: Feb 6, 2026
Size: 228.8 MB
Tags: Python 3, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for groupdocs_parser_net-25.12-py3-none-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`6eb970b01500df654ba98d88157f236241bb4c6fc5cc4392d59f2a7111e45283`
MD5	`822f6b55a420a39cd2e30e39e3aa0fea`
BLAKE2b-256	`e59a940c05f539d9ed4bc07c6c32818e6584cb675f5acb184217b444c251886c`

File details

Details for the file groupdocs_parser_net-25.12-py3-none-macosx_10_14_x86_64.whl.

File metadata

Download URL: groupdocs_parser_net-25.12-py3-none-macosx_10_14_x86_64.whl
Upload date: Feb 6, 2026
Size: 234.5 MB
Tags: Python 3, macOS 10.14+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for groupdocs_parser_net-25.12-py3-none-macosx_10_14_x86_64.whl
Algorithm	Hash digest
SHA256	`388c40cbb61aebbdb0657ec6db6dc0bc1a69f178467a0288ddef731f757e5e46`
MD5	`17934f649c9c1ba530feef93ad23d5af`
BLAKE2b-256	`31a31a7683c41ee42bda4ae7439ff83d4f7435645eeeaffd4e7327e4646231d5`
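The SHA256 digests listed above can be used to check a manually downloaded wheel before installing it. A minimal sketch using the standard `hashlib` module; the helper names are hypothetical, and the usage block assumes the Windows 64-bit wheel was saved to the current directory.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # 1 MiB chunks keep memory flat even for 200+ MB wheels
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Union[str, Path], expected: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    wheel = Path("groupdocs_parser_net-25.12-py3-none-win_amd64.whl")
    expected = "d74a780079b4cbc27df1b694903f4801e9acf305ed3e4f3b79a8d002ec1e2f38"
    if wheel.exists():
        print("OK" if matches_published_hash(wheel, expected) else "MISMATCH")
    else:
        print(f"{wheel} not found")
```

pip can enforce the same check at install time in hash-checking mode: put `groupdocs-parser-net==25.12` in a requirements file with one `--hash=sha256:<digest>` option per platform wheel and run `pip install --require-hashes -r requirements.txt`.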

groupdocs-parser-net 25.12
