Project description

Process & Manipulate HTML via Python API

Product Page | Docs | Demos | API Reference | Examples | Blog | Search | Free Support

Aspose.HTML for Python via .NET is a Python API that provides headless browser functionality for working with HTML documents. With it, you can create new HTML documents or open existing ones from different sources. Once a document is loaded, you can manipulate it (for example, remove or replace HTML nodes), render it, and convert it to other popular formats.

HTML API Features

The following are some popular features of Aspose.HTML for Python via .NET:

General Features

  • Create, Load, and Read Documents. Create, load, and modify HTML, XHTML, Markdown, or SVG documents with full control over elements, attributes, and structure using a powerful DOM-based API.
  • Load EPUB and MHTML File Formats. Open, read, and convert EPUB and MHTML documents with full support for their internal structure and linked resources.
  • Edit Documents. Insert, remove, clone, or replace HTML elements at any level of the DOM tree for granular control over content.
  • Save HTML Documents. Save documents along with all linked resources like CSS, fonts, and images using customizable saving options.
  • Navigate HTML. Navigate through documents using either NodeIterator or TreeWalker.
  • Sandboxing. Configure a Sandbox environment that is independent of the execution machine, ensuring a secure and isolated environment for running and testing.

Data Extraction

  • DOM Traversal. Navigate and manipulate the DOM tree using W3C-compliant traversal interfaces to inspect and retrieve content from HTML documents.
  • XPath Queries. Perform high-performance XPath queries to find and extract target content from large HTML documents.
  • CSS Selector and JavaScript. Use CSS selector queries and JavaScript execution to dynamically locate and extract specific elements.
  • Extract CSS Styling Information. Retrieve and analyze inline styles, embedded <style> blocks, and external stylesheets within HTML documents.
  • Extract any Data from HTML Documents. Text, attributes, form values, metadata, tables, links, or media elements: Aspose.HTML for Python via .NET enables the accurate and efficient extraction of any content for processing, analysis, or editing.

Conversion and Rendering

  • Convert Documents. Convert HTML, XHTML, SVG, MHTML, MD, and EPUB files to a wide range of formats, including PDF, XPS, DOCX, and different image formats (PNG, JPEG, BMP, TIFF, and GIF).
  • Custom Conversion Settings. Adjust page size, resolution, stylesheets, resource management, script execution, and other settings during conversion to fine-tune the output.
  • Markdown Support. Convert HTML to Markdown or vice versa for content migration and Markdown-based workflows.
  • Timeout Control. Set and control the timeout for the rendering process.

Advanced HTML Features

  • Monitor DOM Changes. Use MutationObserver to monitor DOM modifications.
  • HTML Templates. Populate HTML documents with external data sources such as XML and JSON.
  • Output Streams. Support for both single (PDF, XPS) and multiple (image formats) output file streams.
  • Check Web Accessibility. Check web documents against WCAG standards using built-in validators and accessibility rule sets.

Supported File Formats

Format | Description | Load | Save
HTML | HyperText Markup Language format | ✔️ | ✔️
XHTML | eXtensible HyperText Markup Language format | ✔️ | ✔️
MHTML | MIME HTML format | ✔️ | ✔️
EPUB | E-book file format | ✔️ |
SVG | Scalable Vector Graphics format | ✔️ | ✔️
MD | Markdown markup language format | ✔️ | ✔️
PDF | Portable Document Format | | ✔️
XPS | XML Paper Specification format | | ✔️
DOCX | Microsoft Word Open XML document format | | ✔️
TIFF | Tagged Image File Format | | ✔️
JPEG | Joint Photographic Experts Group format | | ✔️
PNG | Portable Network Graphics format | | ✔️
BMP | Bitmap Picture format | | ✔️
GIF | Graphics Interchange Format | | ✔️
WEBP | Modern image format providing both lossy and lossless compression | | ✔️
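
The table above can be captured as a small lookup for checking whether a source/target pair is plausible before calling the library. This is a minimal sketch in plain Python: LOADABLE, SAVABLE, and can_convert are hypothetical helper names, not part of Aspose.HTML, and the check only mirrors the table (it does not confirm that a dedicated converter method exists for a given pair).

```python
# Load/save capabilities transcribed from the Supported File Formats table.
# These names are illustrative helpers, not Aspose.HTML API.
LOADABLE = {"HTML", "XHTML", "MHTML", "EPUB", "SVG", "MD"}
SAVABLE = {
    "HTML", "XHTML", "MHTML", "SVG", "MD",
    "PDF", "XPS", "DOCX",
    "TIFF", "JPEG", "PNG", "BMP", "GIF", "WEBP",
}


def can_convert(source, target):
    """Return True if the table lists `source` as loadable and `target` as savable."""
    return source.upper() in LOADABLE and target.upper() in SAVABLE


print(can_convert("epub", "pdf"))   # EPUB can be loaded, PDF can be saved
print(can_convert("pdf", "html"))   # PDF is save-only in the table
```

A check like this is only a guard for user input; the library itself remains the authority on which conversions are available.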

Platform Independence

Aspose.HTML for Python via .NET can be used to develop applications on Windows, Linux, and macOS, wherever Python 3.5 or later is installed. Wheels are published for 32-bit and 64-bit Windows, 64-bit Linux, and macOS (x86-64 and ARM64).
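
To check whether an interpreter meets these requirements, a short standard-library probe is enough. The sketch below is an illustration only; describe_interpreter is a hypothetical helper name, and it reports the Python version check, the interpreter's bitness, and the operating system name.

```python
import platform
import struct
import sys


def describe_interpreter():
    """Return (meets_minimum, bits, system) for the running interpreter."""
    meets_minimum = sys.version_info >= (3, 5)  # minimum version stated above
    bits = struct.calcsize("P") * 8             # pointer size in bits: 32 or 64
    return meets_minimum, bits, platform.system()


ok, bits, system = describe_interpreter()
print("Python >= 3.5:", ok)
print("Interpreter:", bits, "bit on", system)
```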

Get Started

Are you ready to give Aspose.HTML for Python via .NET a try?

Run pip install aspose-html-net from the console to fetch the package. If you already have Aspose.HTML for Python via .NET installed and want to upgrade, run pip install --upgrade aspose-html-net to get the latest version.
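
After installing or upgrading, you may want to confirm which version pip actually put in the environment. The following is a minimal sketch using only the standard library (importlib.metadata, available in Python 3.8 and later); installed_version is a hypothetical helper name, not part of the package.

```python
from importlib import metadata


def installed_version(distribution="aspose-html-net"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "aspose-html-net is not installed")
```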

You can run the following snippets in your environment to see how Aspose.HTML works, or check out the GitHub Repository or Aspose.HTML for Python via .NET Documentation for other common use cases.

Create a New HTML Document

If you want to create an HTML document programmatically from scratch, use the parameterless constructor:

from aspose.html import *

# Initialize an empty HTML document
with HTMLDocument() as document:
    # Create a text node and add it to the document
    text = document.create_text_node("Hello, World!")
    document.body.append_child(text)

    # Save the document to a file
    document.save("create-new-document.html")

Source - Create a Document in Python

Extract Images from Website

Here is an example of how to use Aspose.HTML for Python via .NET to find images specified by the <img> element:

import os
from aspose.html import *
from aspose.html.net import *

# Open a document you want to extract images from
with HTMLDocument("https://docs.aspose.com/svg/net/drawing-basics/svg-shapes/") as document:

    # Collect all <img> elements
    images = document.get_elements_by_tag_name("img")

    # Create a distinct collection of image URLs, skipping <img> elements without a src
    sources = (element.get_attribute("src") for element in images)
    urls = set(src for src in sources if src)

    # Create absolute image URLs
    abs_urls = [Url(url, document.base_uri) for url in urls]

    for url in abs_urls:
        # Create an image request message
        request = RequestMessage(url)

        # Extract image
        response = document.context.network.send(request)

        # Check whether a response is successful
        if response.is_success:
            # Parse the URL to get the file name
            file_name = os.path.basename(url.pathname)

            # Save the image to the current working directory
            with open(file_name, 'wb') as file:
                file.write(response.content.read_as_byte_array())

Source - Extract Images From Website in Python
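
The two URL steps in the snippet above (resolving each src against the document's base URI, then taking the file name from the URL path) can be tried in isolation with the standard library. This sketch uses urllib.parse as a stand-in and does not call Aspose.HTML; the helper names and the sample image paths are made up for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def absolute_image_urls(base_uri, sources):
    """Resolve src values against base_uri, dropping empty values and duplicates."""
    return sorted({urljoin(base_uri, src) for src in sources if src})


def file_name_from_url(url):
    """Return the last path segment of a URL, ignoring query string and fragment."""
    return posixpath.basename(urlparse(url).path)


base = "https://docs.aspose.com/svg/net/drawing-basics/svg-shapes/"
# Hypothetical src values: one relative, one root-relative, one missing
for url in absolute_image_urls(base, ["images/circle.png", "/img/logo.svg?v=2", None]):
    print(url, "->", file_name_from_url(url))
```

Taking the base name from the parsed path rather than the full URL keeps query strings such as ?v=2 out of the saved file name.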

HTML to PDF in One Line of Code

Aspose.HTML for Python via .NET allows you to convert HTML to PDF, XPS, Markdown, MHTML, PNG, JPEG, and other file formats. The following snippet demonstrates the conversion from HTML to PDF literally with a single line of code!

from aspose.html.converters import *
from aspose.html.saving import *

# Convert HTML to PDF
Converter.convert_html("document.html", PdfSaveOptions(), "document.pdf")

Source - Convert HTML to PDF in Python
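
The same one-line call extends naturally to a folder of files. The sketch below is an assumption-laden illustration: pdf_target and convert_all are hypothetical helper names, and the only Aspose.HTML call used is Converter.convert_html with the argument order shown above. The library is imported inside convert_all so the path helper can be used without the package installed.

```python
from pathlib import Path


def pdf_target(html_path):
    """Return the output path for an HTML file: same location, .pdf suffix."""
    return Path(html_path).with_suffix(".pdf")


def convert_all(folder):
    """Convert every .html file directly inside `folder` to a PDF next to it."""
    # Imported lazily so pdf_target() works even without aspose-html-net installed
    from aspose.html.converters import Converter
    from aspose.html.saving import PdfSaveOptions

    for html_file in sorted(Path(folder).glob("*.html")):
        Converter.convert_html(str(html_file), PdfSaveOptions(), str(pdf_target(html_file)))


print(pdf_target("reports/document.html").name)
```

Calling convert_all("reports") (a hypothetical folder name) would then write one PDF per HTML file in that folder.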

Convert HTML to Markdown (MD)

The following snippet demonstrates the conversion from HTML to Git-based Markdown (MD) format:

from aspose.html.converters import *
from aspose.html.saving import *

# Prepare HTML code and save it to a file
code = "<h1>Header 1</h1>" \
       "<h2>Header 2</h2>" \
       "<p>Hello World!!</p>"
with open('document.html', 'w', encoding="utf-8") as f:
    f.write(code)

# The with block has closed the file, so its content is fully written to disk.
# Call convert_html to convert HTML to Git-based Markdown.
Converter.convert_html('document.html', MarkdownSaveOptions.git, 'output.md')

Source - Creating an HTML Document

Convert EPUB to PDF using SaveOptions

The PdfSaveOptions class provides numerous properties that give you full control over a wide range of parameters and improve the process of converting EPUB to PDF format. In the example, we use the page_setup, jpeg_quality, and css.media_type properties:

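from aspose.html.converters import *
from aspose.html.saving import *
from aspose.html.drawing import *
# Assumption: MediaType lives in aspose.html.rendering, mirroring the .NET namespace
from aspose.html.rendering import MediaType

# Open an existing EPUB file for reading
with open("input.epub", 'rb') as stream:
    # Create an instance of PdfSaveOptions
    options = PdfSaveOptions()
    # Use a 500x500 page with custom margins for every page
    options.page_setup.any_page = Page(Size(500, 500), Margin(20, 20, 10, 10))
    # Apply the CSS rules defined for print media
    options.css.media_type = MediaType.PRINT
    # Lower JPEG quality to reduce the size of the output file
    options.jpeg_quality = 10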

    # Convert EPUB to PDF
    Converter.convert_epub(stream, options, "output.pdf")

Source - Convert EPUB to PDF in Python


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

aspose_html_net-25.10.0-py3-none-win_amd64.whl (58.9 MB)

Uploaded: Python 3, Windows x86-64

aspose_html_net-25.10.0-py3-none-win32.whl (51.6 MB)

Uploaded: Python 3, Windows x86

aspose_html_net-25.10.0-py3-none-manylinux1_x86_64.whl (83.0 MB)

Uploaded: Python 3

aspose_html_net-25.10.0-py3-none-macosx_11_0_arm64.whl (56.2 MB)

Uploaded: Python 3, macOS 11.0+ ARM64

aspose_html_net-25.10.0-py3-none-macosx_10_14_x86_64.whl (70.1 MB)

Uploaded: Python 3, macOS 10.14+ x86-64
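
Each wheel file name listed above encodes the distribution, version, Python tag, ABI tag, and platform tag, separated by dashes, with an optional build tag after the version. The following minimal sketch splits a name into those parts; parse_wheel_name is a hypothetical helper, and it does no validation beyond counting the fields.

```python
def parse_wheel_name(file_name):
    """Split a wheel file name into its dash-separated components."""
    suffix = ".whl"
    if not file_name.endswith(suffix):
        raise ValueError("not a wheel file name: " + file_name)
    parts = file_name[: -len(suffix)].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields in: " + file_name)
    return {
        "distribution": parts[0],
        "version": parts[1],
        # The optional build tag is present only when there are six fields
        "build": parts[2] if len(parts) == 6 else None,
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


info = parse_wheel_name("aspose_html_net-25.10.0-py3-none-win_amd64.whl")
print(info["version"], info["python"], info["abi"], info["platform"])
```

The platform tag is the part that differs between the five files; pip normally picks the matching wheel automatically, so parsing like this is mainly useful when mirroring or auditing downloads.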

File details

Details for the file aspose_html_net-25.10.0-py3-none-win_amd64.whl.

File metadata

File hashes

Hashes for aspose_html_net-25.10.0-py3-none-win_amd64.whl
Algorithm Hash digest
SHA256 5ecab599dbf5395cce6bae018e6953adb49d77ad7489ae157049201ae084b562
MD5 0673c87419f26332a5eaab36cdb1cfd3
BLAKE2b-256 b210baa72298e109e0f9a2e0bc9d88ff83fcb206dd53cfa1403978029e38e0b4

See more details on using hashes here.
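
A manually downloaded wheel can be checked against the SHA256 digest published above with the standard hashlib module. This is a minimal sketch; sha256_of and matches_published_hash are hypothetical helper names, and the file is read in chunks so that large wheels do not have to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the wheel was downloaded to the current directory:
# matches_published_hash(
#     "aspose_html_net-25.10.0-py3-none-win_amd64.whl",
#     "5ecab599dbf5395cce6bae018e6953adb49d77ad7489ae157049201ae084b562",
# )
```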

File details

Details for the file aspose_html_net-25.10.0-py3-none-win32.whl.

File metadata

File hashes

Hashes for aspose_html_net-25.10.0-py3-none-win32.whl
Algorithm Hash digest
SHA256 fb4f6b2ac79fdd73cef724c719026b25fb9692cb744a05016db17d2957c0163d
MD5 c20729bb8e282046790ae64f0cd38b63
BLAKE2b-256 74fe8693ba094f88470f3a26c0d6d56537d746f5d08dd87da0e06a63d6803082

See more details on using hashes here.

File details

Details for the file aspose_html_net-25.10.0-py3-none-manylinux1_x86_64.whl.

File metadata

File hashes

Hashes for aspose_html_net-25.10.0-py3-none-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 1a25d12884ecedccee665803555a66d171ba5e93ae6ba6e7f3a1c82c6af48254
MD5 29fc022cbe9850c00a42fea9a580a168
BLAKE2b-256 1703e2184970438a39561a9f8e8a224bbf15133c9cff35af10236dc6a7d4e391

See more details on using hashes here.

File details

Details for the file aspose_html_net-25.10.0-py3-none-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for aspose_html_net-25.10.0-py3-none-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 124154fb8724db8c5415bd641e0d8ae6b69f753a740585ce40b4933693423708
MD5 4a5123720d718bc494903c2ef3ffdb2a
BLAKE2b-256 93cc50fd0f64c0d817b3a07e9f97233774233f76f5af4ef510e92d7b1b148aff

See more details on using hashes here.

File details

Details for the file aspose_html_net-25.10.0-py3-none-macosx_10_14_x86_64.whl.

File metadata

File hashes

Hashes for aspose_html_net-25.10.0-py3-none-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 20a44a1eca931bc42966162c5d6d781bc05e172f62526c7731548f486e1548b1
MD5 65052c19f057fa8ef72441a5c0766b9a
BLAKE2b-256 5247a37ea296aafe72319021aa96757ae14b8d7eb26db2afe5face11741ab90f

See more details on using hashes here.
