Extract images and videos from PowerPoint presentations

Project description

Extract Media PowerPoint

A lightweight Python library for extracting images and videos from PowerPoint (.pptx) presentations, with optional filtering by media type and file extension. Filtered output is organized automatically by slide number.

Features

  • Extract all embedded media (images and videos) from a .pptx file
  • Filter media by type (image or video) and by custom file extensions
  • Organize filtered output into subdirectories by slide number
  • Save to a custom output directory
  • Built on top of python-pptx

Requirements

  • Python >= 3.10
  • lxml >= 6.0.2
  • Pillow >= 12.1.1
  • python-pptx >= 1.0.2

Installation

pip install Extract-Media-PowerPoint

Quick Start

Extract all media (no filtering)

from extract import PowerPointMediaExtractor

extractor = PowerPointMediaExtractor(
    filepath="presentation.pptx",
    output_dir="output"
)
count = extractor.extract_all_media()
print(f"Extracted {count} media files")

Files are saved under output/ppt/media/.
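The ppt/media/ segment of that path appears to mirror the file's internal layout. A .pptx file is a ZIP package (Office Open XML), and embedded images and videos are normally stored under ppt/media/ inside it. As a rough, stdlib-only sketch that does not use this library (list_embedded_media is a hypothetical helper name), you can list those entries before extracting anything:

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def list_embedded_media(pptx_path: str | Path) -> list[str]:
    """Return the sorted archive names of files stored under ppt/media/.

    Hypothetical helper for illustration: it reads the .pptx as a plain ZIP
    archive and does not call PowerPointMediaExtractor.
    """
    with zipfile.ZipFile(pptx_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            # Keep files in ppt/media/, skip any explicit directory entries.
            if name.startswith("ppt/media/") and not name.endswith("/")
        )


# Example: list_embedded_media("presentation.pptx")
```

Comparing this listing with the count returned by extract_all_media() can be a quick sanity check.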

Extract images filtered by extension

from extract import PowerPointMediaExtractor

extractor = PowerPointMediaExtractor(
    filepath="presentation.pptx",
    media_type="image",
    output_dir="output",
    extensions=["png", "jpg"]
)
count = extractor.extract_filtered_media()

Files are saved under output/<slide_number>/ppt/media/.
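After extract_filtered_media() returns, the per-slide layout can be read back with pathlib. This sketch assumes, based only on the path shown above, that each slide directory is named with its integer slide number and contains ppt/media/; files_by_slide is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

from pathlib import Path


def files_by_slide(output_dir: str | Path) -> dict[int, list[str]]:
    """Map slide number -> sorted media filenames under
    <output_dir>/<slide_number>/ppt/media/.

    Assumption: slide directories are named with plain integers; any other
    entries in output_dir are skipped.
    """
    result: dict[int, list[str]] = {}
    for slide_dir in Path(output_dir).iterdir():
        if not (slide_dir.is_dir() and slide_dir.name.isdigit()):
            continue
        media_dir = slide_dir / "ppt" / "media"
        if media_dir.is_dir():
            result[int(slide_dir.name)] = sorted(
                f.name for f in media_dir.iterdir() if f.is_file()
            )
    # iterdir() order is arbitrary, so return slides in ascending order.
    return dict(sorted(result.items()))


# Example: files_by_slide("output")
```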

Extract videos

from extract import PowerPointMediaExtractor

extractor = PowerPointMediaExtractor(
    filepath="presentation.pptx",
    media_type="video",
    output_dir="output",
    extensions=["mp4", "avi"]
)
count = extractor.extract_filtered_media()

API Reference

PowerPointMediaExtractor

PowerPointMediaExtractor(
    filepath: str | Path,
    media_type: str = "image",      # "image" or "video"
    output_dir: str | Path = "temp",
    extensions: list[str] | None = None,
)

Parameters:

  • filepath (str | Path, required): Path to the .pptx file
  • media_type (str, default "image"): Media type to extract, "image" or "video"
  • output_dir (str | Path, default "temp"): Directory where media will be saved
  • extensions (list[str] | None, default None): Allowed file extensions; None means all default extensions for the selected media type

Default extensions:

  • image: png, jpeg, jpg, bmp, svg
  • video: mp4, avi, mpg, mpeg, wmv
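The exact rules for matching the extensions list are not documented here. As an illustration only, this sketch encodes the documented defaults and checks a filename against them. Case-insensitive matching, tolerance of a leading dot (".PNG" or "png"), and raising ValueError for an unknown media type are assumptions of this sketch, not documented library behavior; is_allowed is a hypothetical name.

```python
from __future__ import annotations

from pathlib import PurePath

# Defaults copied from the documented default extensions above.
DEFAULT_EXTENSIONS: dict[str, list[str]] = {
    "image": ["png", "jpeg", "jpg", "bmp", "svg"],
    "video": ["mp4", "avi", "mpg", "mpeg", "wmv"],
}


def is_allowed(filename: str, media_type: str = "image",
               extensions: list[str] | None = None) -> bool:
    """Return True if filename's extension is accepted for media_type.

    Falls back to DEFAULT_EXTENSIONS when extensions is None. Assumptions:
    matching is case-insensitive and a leading dot is ignored.
    """
    if media_type not in DEFAULT_EXTENSIONS:
        raise ValueError(f"media_type must be 'image' or 'video', got {media_type!r}")
    allowed = extensions if extensions is not None else DEFAULT_EXTENSIONS[media_type]
    normalized = {ext.lower().lstrip(".") for ext in allowed}
    suffix = PurePath(filename).suffix.lower().lstrip(".")
    return suffix in normalized
```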

Methods

  • extract_all_media() -> int: Extracts all embedded media without filtering and returns the number of files extracted
  • extract_filtered_media() -> int: Extracts media filtered by media_type and extensions, organized by slide number, and returns the number of files extracted

MediaInfo

Dataclass representing a media item found in the presentation.

from dataclasses import dataclass

@dataclass
class MediaInfo:
    shape_id: int
    filename: str
    slide_number: int
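Judging by the filtered-extraction layout (output/<slide_number>/ppt/media/), a MediaInfo item carries enough information to locate its output file. The sketch below redefines MediaInfo locally so it runs on its own. target_path is a hypothetical helper, it assumes filename is a bare file name, the sample values are made up, and the path the library actually builds may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class MediaInfo:
    # Local copy of the documented dataclass, so this sketch is self-contained.
    shape_id: int
    filename: str
    slide_number: int


def target_path(info: MediaInfo, output_dir: str | Path = "temp") -> Path:
    """Return <output_dir>/<slide_number>/ppt/media/<filename> for one item.

    Assumes the layout described for extract_filtered_media(); "temp" matches
    the documented output_dir default.
    """
    return Path(output_dir) / str(info.slide_number) / "ppt" / "media" / info.filename


# Hypothetical items, as might be found on slides 2 and 5.
items = [
    MediaInfo(shape_id=4, filename="image1.png", slide_number=2),
    MediaInfo(shape_id=7, filename="media1.mp4", slide_number=5),
]
for item in items:
    print(target_path(item, "output"))
```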

Logging

The library uses Python's standard logging module with the logger name extract.service. To see its messages, configure logging in your application, for example:

import logging
logging.basicConfig(level=logging.INFO)
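logging.basicConfig(level=logging.INFO) enables INFO output from every logger that does not set its own level, not only this library's. To raise verbosity for the extractor alone, set the level on its named logger. This sketch uses only the standard logging module; the format string is an arbitrary choice.

```python
import logging

# Root logger: show WARNING and above from all libraries, with the logger name.
logging.basicConfig(level=logging.WARNING, format="%(name)s %(levelname)s: %(message)s")

# The extractor's logger: also emit its DEBUG and INFO records. They still reach
# the root handler through propagation, because the root logger's own level is
# not re-checked for records propagated from a child logger.
logging.getLogger("extract.service").setLevel(logging.DEBUG)
```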

License

MIT License; see LICENSE.txt.

Download files

Download the file for your platform.

Source Distribution

pptmedia-1.0.1.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

pptmedia-1.0.1-py3-none-any.whl (5.9 kB)

Uploaded Python 3

File details

Details for the file pptmedia-1.0.1.tar.gz.

File metadata

  • Download URL: pptmedia-1.0.1.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pptmedia-1.0.1.tar.gz
Algorithm Hash digest
SHA256 95cc5952bc9f5bbbd86d1d4605d8f51ca36398cc61af87325f4eb7222af8b62f
MD5 91379217e33c1117cde4cc12f3190a44
BLAKE2b-256 444e5958aaadfc1035853aa5a33050c7b0091c5752cbdec87c145d33522afea9
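To check a downloaded file against the digests above, hash it locally and compare. A minimal sketch using the standard hashlib module: sha256_of and matches_digest are hypothetical helper names, and the expected value is the SHA256 listed for pptmedia-1.0.1.tar.gz.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 published above for pptmedia-1.0.1.tar.gz.
EXPECTED_SDIST_SHA256 = "95cc5952bc9f5bbbd86d1d4605d8f51ca36398cc61af87325f4eb7222af8b62f"


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a b"" sentinel keeps calling fh.read until end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str | Path, expected: str) -> bool:
    """Compare a file's SHA-256 with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


# Example, after downloading the sdist into the current directory:
# matches_digest("pptmedia-1.0.1.tar.gz", EXPECTED_SDIST_SHA256)
```

pip's hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file) performs the same comparison at install time.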

Provenance

The following attestation bundles were made for pptmedia-1.0.1.tar.gz:

Publisher: ci.yml on madyel/pptmedia

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pptmedia-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: pptmedia-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 5.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pptmedia-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 89abb310dcb14ee2aef2ea3907cefac1ee6c469f667583bd87ce626eb8cff384
MD5 5266df085af66384762c0297abe49094
BLAKE2b-256 5d90c107fd1c88ee4858059edbe444c65c3fbdefff6d6d8c46c583433d6c832b

Provenance

The following attestation bundles were made for pptmedia-1.0.1-py3-none-any.whl:

Publisher: ci.yml on madyel/pptmedia

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.