
Extract images and videos from PowerPoint presentations

Project description

Extract Media PowerPoint


A lightweight Python library for extracting images and videos from PowerPoint (.pptx) presentations, with optional filtering by media type and file extension and, for filtered extraction, automatic organization of the output by slide number.

Features

  • Extract all embedded media (images and videos) from a .pptx file
  • Filter extracted media by type (image or video) and by custom file extensions
  • Filtered output organized in subdirectories by slide number
  • Supports custom output directories
  • Built on top of python-pptx

Requirements

  • Python >= 3.10
  • lxml >= 6.0.2
  • Pillow >= 12.1.1
  • python-pptx >= 1.0.2

Installation

pip install pptmedia

Quick Start

Extract all media (no filtering)

from extract import PowerPointMediaExtractor

extractor = PowerPointMediaExtractor(
    filepath="presentation.pptx",
    output_dir="output"
)
count = extractor.extract_all_media()
print(f"Extracted {count} media files")

Files are saved under output/ppt/media/.
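A .pptx file is a ZIP archive in the Office Open XML format, and embedded media normally sit under ppt/media/ inside it, which would explain why the output mirrors that path. The following is a minimal standard-library sketch of the idea, not the library's implementation; the function name extract_all_media_sketch is hypothetical.

```python
import zipfile
from pathlib import Path


def extract_all_media_sketch(filepath, output_dir="temp"):
    """Copy every ppt/media/* member of a .pptx into output_dir, keeping its path.

    Hypothetical helper for illustration; not part of pptmedia.
    """
    output_dir = Path(output_dir)
    count = 0
    with zipfile.ZipFile(filepath) as archive:
        for name in archive.namelist():
            # Media parts live under ppt/media/ in the package; skip directory entries.
            if name.startswith("ppt/media/") and not name.endswith("/"):
                # extract() recreates the member's folders (ppt/media/) under output_dir.
                archive.extract(name, path=output_dir)
                count += 1
    return count
```

For example, extract_all_media_sketch("presentation.pptx", "output") would write files such as output/ppt/media/image1.png and return how many it wrote.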

Extract images filtered by extension

from extract import PowerPointMediaExtractor

extractor = PowerPointMediaExtractor(
    filepath="presentation.pptx",
    media_type="image",
    output_dir="output",
    extensions=["png", "jpg"]
)
count = extractor.extract_filtered_media()

Files are saved under output/<slide_number>/ppt/media/.
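Organizing by slide requires knowing which slide references which media part. In the Office Open XML package, each slide has a relationships file (ppt/slides/_rels/slideN.xml.rels) whose Relationship entries point at targets such as ../media/image1.png. The sketch below builds that mapping with the standard library only. It is an illustration, not pptmedia's code (the library is built on python-pptx), and media_by_slide is a hypothetical name. It also assumes slide numbers can be read from the slideN.xml part names; that is common but not guaranteed, because the authoritative slide order is defined in ppt/presentation.xml.

```python
import posixpath
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship (.rels) files.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
SLIDE_RELS = re.compile(r"^ppt/slides/_rels/slide(\d+)\.xml\.rels$")


def media_by_slide(filepath):
    """Return {slide_number: sorted media part names} for a .pptx (hypothetical helper).

    Assumes slide numbers follow the slideN.xml part names.
    """
    mapping = {}
    with zipfile.ZipFile(filepath) as archive:
        for name in archive.namelist():
            match = SLIDE_RELS.match(name)
            if not match:
                continue
            slide_number = int(match.group(1))
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(f"{RELS_NS}Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # linked (not embedded) media has no part in the package
                target = rel.get("Target", "")
                if target.startswith("/"):
                    part = target.lstrip("/")  # package-absolute target
                else:
                    # Relative targets resolve against ppt/slides/, e.g. "../media/image1.png".
                    part = posixpath.normpath(posixpath.join("ppt/slides", target))
                if part.startswith("ppt/media/"):
                    # A set removes duplicates (a video can be referenced by two relationships).
                    mapping.setdefault(slide_number, set()).add(part)
    return {slide: sorted(parts) for slide, parts in sorted(mapping.items())}
```

Each entry of the result corresponds to one output/<slide_number>/ directory in the layout described above.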

Extract videos

from extract import PowerPointMediaExtractor

extractor = PowerPointMediaExtractor(
    filepath="presentation.pptx",
    media_type="video",
    output_dir="output",
    extensions=["mp4", "avi"]
)
count = extractor.extract_filtered_media()

API Reference

PowerPointMediaExtractor

PowerPointMediaExtractor(
    filepath: str | Path,
    media_type: str = "image",      # "image" or "video"
    output_dir: str | Path = "temp",
    extensions: list[str] | None = None,
)
Parameters:

  • filepath (str | Path, required): path to the .pptx file
  • media_type (str, default "image"): media type to extract, "image" or "video"
  • output_dir (str | Path, default "temp"): directory where media will be saved
  • extensions (list[str] | None, default None): allowed file extensions; None means the default extensions for media_type are used

Default extensions:

  • image: png, jpeg, jpg, bmp, svg
  • video: mp4, avi, mpg, mpeg, wmv
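The selection rule implied by media_type and extensions can be sketched as a small predicate. Only the fallback to the defaults when extensions is None comes from the documentation; case-insensitive matching and accepting a leading dot (".png") are assumptions, and is_selected is a hypothetical name rather than part of the library.

```python
from pathlib import PurePosixPath

# Defaults as documented above.
DEFAULT_EXTENSIONS = {
    "image": ["png", "jpeg", "jpg", "bmp", "svg"],
    "video": ["mp4", "avi", "mpg", "mpeg", "wmv"],
}


def is_selected(filename, media_type="image", extensions=None):
    """Hypothetical filter: does `filename` match media_type/extensions?

    Assumes case-insensitive matching and tolerates "png" or ".png".
    """
    if media_type not in DEFAULT_EXTENSIONS:
        raise ValueError(f"media_type must be 'image' or 'video', got {media_type!r}")
    # None falls back to the defaults for the media type.
    allowed = extensions if extensions is not None else DEFAULT_EXTENSIONS[media_type]
    allowed = {ext.lower().lstrip(".") for ext in allowed}
    suffix = PurePosixPath(filename).suffix.lower().lstrip(".")
    return suffix in allowed
```

With these rules, is_selected("image3.JPG") is True, is_selected("media1.mp4") is False because the image defaults apply, and is_selected("media1.mp4", "video", ["mp4"]) is True. In this sketch an explicit empty list selects nothing.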

Methods

  • extract_all_media() -> int: extracts all embedded media and returns the number of files extracted
  • extract_filtered_media() -> int: extracts media matching media_type and extensions, organized by slide number, and returns the number of files extracted

MediaInfo

Dataclass representing a media item found in the presentation.

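from dataclasses import dataclass

@dataclass
class MediaInfo:
    shape_id: int      # ID of the shape that holds the media
    filename: str      # media file name
    slide_number: int  # slide on which the media appears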
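The documentation does not say how MediaInfo instances are returned, so the sketch below builds a few by hand with made-up values and groups them by slide_number, mirroring the per-slide layout of filtered output. It redefines the dataclass locally so it runs on its own; group_by_slide is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MediaInfo:  # local copy of the dataclass documented above
    shape_id: int
    filename: str
    slide_number: int


def group_by_slide(items):
    """Map slide_number -> list of filenames, preserving input order."""
    groups = defaultdict(list)
    for item in items:
        groups[item.slide_number].append(item.filename)
    return dict(groups)


# Illustrative values only.
found = [
    MediaInfo(shape_id=4, filename="image1.png", slide_number=1),
    MediaInfo(shape_id=7, filename="media1.mp4", slide_number=3),
    MediaInfo(shape_id=5, filename="image2.jpg", slide_number=1),
]
print(group_by_slide(found))  # {1: ['image1.png', 'image2.jpg'], 3: ['media1.mp4']}
```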

Logging

The library logs through Python's standard logging module using the logger name extract.service. To see its messages, configure logging, for example:

import logging
logging.basicConfig(level=logging.INFO)
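basicConfig configures the root logger and therefore affects every library in the process. To see only this library's messages, you can configure the named logger directly. A minimal sketch using the standard logging module; the logger name comes from the text above, while the DEBUG level and the format string are arbitrary choices (whether the library emits DEBUG records is not documented).

```python
import logging

# Attach a handler to the library's logger only; the root logger is left untouched.
logger = logging.getLogger("extract.service")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)
# Keep records from also reaching root handlers (avoids duplicates if basicConfig is also used).
logger.propagate = False
```

Because logger names are hierarchical, configuring logging.getLogger("extract") instead would also cover any other loggers under the extract package.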

License

MIT — see LICENSE.txt.



Download files

Download the file for your platform.

Source Distribution

pptmedia-1.0.2.tar.gz (6.3 kB)

Built Distribution


pptmedia-1.0.2-py3-none-any.whl (5.9 kB)

File details

Details for the file pptmedia-1.0.2.tar.gz.

File metadata

  • Download URL: pptmedia-1.0.2.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pptmedia-1.0.2.tar.gz
  • SHA256: 68ed5b547e292cf6321746c1f991e3f486bbf004733ad83942d59c910d0da9b5
  • MD5: fddef9a989379594f3c43cd533ca942f
  • BLAKE2b-256: 133bb88ea811fcacd437ec4d459079d9e305664b67f5ebe29a86fbfa0f126fcd
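To check a downloaded file against the SHA256 digest listed above, hash it locally and compare. A minimal sketch using hashlib; sha256_of is a hypothetical helper name, and the file name in the usage note assumes the archive sits in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pptmedia-1.0.2.tar.gz.
EXPECTED_SDIST_SHA256 = "68ed5b547e292cf6321746c1f991e3f486bbf004733ad83942d59c910d0da9b5"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage: sha256_of("pptmedia-1.0.2.tar.gz") == EXPECTED_SDIST_SHA256 should be True for an intact download. pip's hash-checking mode (a requirements line such as pptmedia==1.0.2 followed by --hash=sha256:... options for each distribution file) performs the same check at install time.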


Provenance

The following attestation bundles were made for pptmedia-1.0.2.tar.gz:

Publisher: ci.yml on madyel/pptmedia

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pptmedia-1.0.2-py3-none-any.whl.

File metadata

  • Download URL: pptmedia-1.0.2-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pptmedia-1.0.2-py3-none-any.whl
  • SHA256: 7947ffd933fc6cfc62e7dd90a56522262e1b87e20df4c150089fbca2eb560ee1
  • MD5: 3a198a80d07cbfa4a0a3b34b9893d58d
  • BLAKE2b-256: 192bf3b573c52c4ad3a289b5ebd01c967b432967ed09a00cb4f69344067bd717


Provenance

The following attestation bundles were made for pptmedia-1.0.2-py3-none-any.whl:

Publisher: ci.yml on madyel/pptmedia

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
