officeextractor extracts media files (images, videos, music) from Microsoft Office and LibreOffice files.

officeextractor

About

officeextractor is a Python library to extract media files like images, audio and video from office documents (Microsoft Office & LibreOffice).


Supported File Types

Application           Supported File Types                Supported Media Formats
Microsoft Word        docx, docm, dotm, dotx              images
Microsoft Excel       xlsx, xlsb, xlsm, xltm, xltx        images
Microsoft PowerPoint  potx, ppsm, ppsx, pptm, pptx, potm  images, video & audio
LibreOffice Writer    odt, ott                            images
LibreOffice Calc      ods, ots                            images
LibreOffice Impress   odp, otp, odg                       images

NOTE: Microsoft Office 2003 files (doc, dot, xls, xlt, ppt, pot) are not supported.
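
Since unsupported files are easy to pass in by accident, it can help to check extensions before calling the library. The following is a minimal sketch, not part of officeextractor: the helper name split_by_support and the SUPPORTED_EXTENSIONS set are hypothetical, and the extensions are simply copied from the table above.

```python
from pathlib import Path

# Extensions copied from the "Supported File Types" table (assumption:
# the library matches on file extension, case-insensitively).
SUPPORTED_EXTENSIONS = {
    "docx", "docm", "dotm", "dotx",                     # Microsoft Word
    "xlsx", "xlsb", "xlsm", "xltm", "xltx",             # Microsoft Excel
    "potx", "ppsm", "ppsx", "pptm", "pptx", "potm",     # Microsoft PowerPoint
    "odt", "ott", "ods", "ots", "odp", "otp", "odg",    # LibreOffice
}


def split_by_support(paths):
    """Hypothetical helper: return (supported, unsupported) lists of paths."""
    supported, unsupported = [], []
    for path in paths:
        ext = Path(path).suffix.lower().lstrip(".")
        if ext in SUPPORTED_EXTENSIONS:
            supported.append(path)
        else:
            unsupported.append(path)
    return supported, unsupported


ok, skipped = split_by_support(["File1.docx", "Folder/File2.xlsx", "old_report.doc"])
print(ok)       # ['File1.docx', 'Folder/File2.xlsx']
print(skipped)  # ['old_report.doc']
```

Only the first list would then be handed to officeextractor.extract; the second contains files such as legacy doc documents that the note above rules out.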

Installation

pip install officeextractor

Usage

>>> import officeextractor

>>> officeextractor.extract(src=("File1.docx", "Folder/File2.xlsx"), dest="Path/To/Output/Folder")

4 media files extracted from File1.docx:
- 2 jpeg
- 1 gif
- 1 png

1 media file extracted from Folder/File2.xlsx:
- 1 png

Parameters

officeextractor.extract(src, dest, log=True)

src : str, list of str or tuple of str

Either a single file (string) or several files (list or tuple of strings), each given as a relative or full path.

dest : str

Output directory as relative or full path. If the directory doesn't exist, it will be created.

log : bool, optional

Whether logging should be activated. If True, a summary of the extraction is printed. Default is True.
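
To make the three parameters concrete, here is a rough sketch of what a function with this signature could do. It is not officeextractor's actual implementation. It assumes that the supported formats are zip containers, that OOXML files keep media under word/media, xl/media or ppt/media, and that ODF files keep images under Pictures/. The function name extract_sketch, the output file naming, and the returned counts are all hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import Path

# Assumed media locations inside the zip containers (OOXML and ODF).
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/", "Pictures/")


def extract_sketch(src, dest, log=True):
    """Illustrative stand-in for extract(src, dest, log); not the library code."""
    # src: accept a single path or a list/tuple of paths.
    sources = [src] if isinstance(src, str) else list(src)

    # dest: create the output directory if it does not exist yet.
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)

    results = {}
    for source in sources:
        counts = Counter()
        with zipfile.ZipFile(source) as archive:
            for info in archive.infolist():
                if info.is_dir() or not info.filename.startswith(MEDIA_PREFIXES):
                    continue
                name = Path(info.filename).name
                # Prefix with the document name to avoid collisions between files.
                target = dest_dir / f"{Path(source).stem}_{name}"
                target.write_bytes(archive.read(info))
                counts[Path(name).suffix.lstrip(".").lower()] += 1
        results[source] = counts

        # log: print a per-file summary of what was extracted.
        if log:
            total = sum(counts.values())
            print(f"{total} media file(s) extracted from {source}:")
            for ext, number in counts.most_common():
                print(f"- {number} {ext}")
    return results
```

Calling extract_sketch("File1.docx", "Path/To/Output/Folder", log=False) would copy any embedded media without printing a summary. The sketch performs no error handling, so a file that is not a zip archive (for example a legacy doc file) raises zipfile.BadZipFile.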


Release Notes

Release notes are available on GitHub.


Licence

GNU General Public License v3.0

