
officeextractor extracts media files (images, videos, music) from Microsoft Office and LibreOffice files.


officeextractor


About

officeextractor is a Python library to extract media files like images, audio and video from office documents (Microsoft Office & LibreOffice).


Supported File Types

Application            Supported File Types                   Supported Media Formats
Microsoft Word         docx, docm, dotm, dotx                 images
Microsoft Excel        xlsx, xlsb, xlsm, xltm, xltx           images
Microsoft PowerPoint   potx, ppsm, ppsx, pptm, pptx, potm     images, video & audio
LibreOffice Writer     odt, ott                               images
LibreOffice Calc       ods, ots                               images
LibreOffice Impress    odp, otp, odg                          images

NOTE: Microsoft Office 2003 files (doc, dot, xls, xlt, ppt, pot) are not supported.
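The formats listed above are ZIP-based containers, which is why embedded media can be read out of them with standard tools. The following is a minimal sketch of that idea using only the standard library; it is not officeextractor's own implementation. The helper name `list_media_members` and the `MEDIA_PREFIXES` folder list are assumptions made for illustration.

```python
import zipfile

# Assumption: folders in which these containers commonly store embedded media.
MEDIA_PREFIXES = ("word/media/", "xl/media/", "ppt/media/", "Pictures/")


def list_media_members(path):
    """Return the archive member names that sit in a known media folder."""
    with zipfile.ZipFile(path) as archive:
        return [
            name
            for name in archive.namelist()
            # Skip directory entries, keep only files below a media folder.
            if name.startswith(MEDIA_PREFIXES) and not name.endswith("/")
        ]
```

Calling `list_media_members("File1.docx")` would return the paths of the embedded files inside the archive. A file that is not a ZIP container makes `zipfile.ZipFile` raise `zipfile.BadZipFile`.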

Installation

pip install officeextractor

Usage

>>> import officeextractor

>>> officeextractor.extract(src=("File1.docx", "Folder/File2.xlsx"), dest="Path/To/Output/Folder")

4 media files extracted from File1.docx:
- 2 jpeg
- 1 gif
- 1 png

1 media file extracted from Folder/File2.xlsx:
- 1 png

Parameters

officeextractor.extract(src, dest, log=True)

src : str, list of str or tuple of str

Either a single file (string) or several files (list/tuple of strings), given as relative or absolute paths.

dest : str

Output directory, given as a relative or absolute path. If the directory doesn't exist, it will be created.

log : bool, optional

Whether logging should be activated. If True, a summary of the extraction is printed. Default is True.
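The parameter descriptions above imply two small conventions: `src` may be a single string or a list/tuple of strings, and `dest` is created when it is missing. The sketch below shows one way such handling could look with the standard library. The helper names `normalize_sources` and `prepare_destination` are hypothetical and are not part of officeextractor.

```python
import os


def normalize_sources(src):
    """Return src as a list of paths, accepting a str, list or tuple."""
    if isinstance(src, str):
        return [src]
    if isinstance(src, (list, tuple)):
        return list(src)
    raise TypeError("src must be a str, list of str or tuple of str")


def prepare_destination(dest):
    """Create dest if it does not exist yet and return its absolute path."""
    os.makedirs(dest, exist_ok=True)
    return os.path.abspath(dest)
```

With helpers like these, `normalize_sources("File1.docx")` and `normalize_sources(("File1.docx", "Folder/File2.xlsx"))` both yield a list that can be processed in a single loop.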


Release Notes

Release notes are available on GitHub.


Licence

GNU General Public License v3.0

