ppt2txt

A pure-Python utility for extracting text from PPT files.

The code is based on the official documentation for MS-PPT files available at https://msopenspecs.azureedge.net/files/MS-PPT/%5bMS-PPT%5d.pdf.

How to install?

pip install ppt2txt

How to run?

  • From the command line:
ppt2txt file.ppt -o output_dir
  • From Python:
import ppt2txt

# Extract the text content of file.ppt into a dictionary
parsed_ppt_dict = ppt2txt.process("file.ppt")
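
As a follow-up to the Python usage above, here is a minimal sketch of saving the returned dictionary to a JSON file. The helper names save_parsed and extract_to_json are hypothetical and not part of ppt2txt. The sketch assumes the result is JSON-serializable and contains the "filename" key described under Output. It does not try to reproduce whatever the command line writes with -o.

```python
import json
from pathlib import Path


def save_parsed(parsed, output_dir):
    """Write a parsed result to <output_dir>/<input stem>.json and return the path.

    Hypothetical helper: assumes `parsed` has a "filename" key and is
    JSON-serializable, as the documented structure suggests.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Name the JSON file after the input file recorded in the result.
    out_path = out_dir / (Path(parsed["filename"]).stem + ".json")
    out_path.write_text(
        json.dumps(parsed, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_path


def extract_to_json(ppt_path, output_dir):
    """Run ppt2txt on one file and save the result (requires ppt2txt installed)."""
    import ppt2txt  # imported here so save_parsed works without the package

    return save_parsed(ppt2txt.process(str(ppt_path)), output_dir)
```

With ppt2txt installed, extract_to_json("file.ppt", "output_dir") would be expected to write output_dir/file.json.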

Output

parsed_ppt_dict is a dictionary with the following structure:

{
    "filename": "file.ppt",
    "slides": 4,
    "content": {
        "0": "Text from the first record",
        "1": "Text from the second record"
    }
}

where:

  • filename is the name of the input file
  • slides is the number of slides in the presentation
  • content is a dictionary with one entry for each text record found in the document, keyed by the record's index as a string (as in the example above)
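
To illustrate the structure described above, here is a small sketch that joins all text records into a single string. The function content_to_text is a hypothetical helper, not part of ppt2txt, and the sample dictionary is hand-written to match the documented shape. The sketch assumes the keys of content are decimal strings, as in the example.

```python
def content_to_text(parsed, separator="\n"):
    """Join the text records of a parsed result in numeric key order.

    Hypothetical helper: assumes the keys of parsed["content"] are decimal
    strings such as "0", "1", "10".
    """
    content = parsed.get("content", {})
    # Sort numerically so that "10" comes after "9", not after "1".
    ordered_keys = sorted(content, key=int)
    return separator.join(content[key] for key in ordered_keys)


# Sample data in the documented shape (not real extractor output).
sample = {
    "filename": "file.ppt",
    "slides": 4,
    "content": {
        "0": "Text from the first record",
        "1": "Text from the second record",
    },
}

print(content_to_text(sample))
```

Sorting with key=int rather than plain string order keeps records in sequence once a file has more than ten of them.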

Download files

Source Distribution

ppt2txt-0.1.0.tar.gz (4.8 kB view details)

Uploaded Source

Built Distribution

ppt2txt-0.1.0-py3-none-any.whl (5.8 kB view details)

Uploaded Python 3

File details

Details for the file ppt2txt-0.1.0.tar.gz.

File metadata

  • Download URL: ppt2txt-0.1.0.tar.gz
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for ppt2txt-0.1.0.tar.gz
Algorithm Hash digest
SHA256 bc9401f1859475a09657a453577063f396d0157b1d379d755ac6e49fdafd1e8a
MD5 ed21311babf234a0a42573c77e506791
BLAKE2b-256 6dc25a4b032934eb4c5518f269f9b18ef6453e0fe877c8642edda541013089ee
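
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard hashlib module. The helper names sha256_of_file and matches_published_hash are hypothetical, and the local file path in the usage note is an assumption about where the archive was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest published above for ppt2txt-0.1.0.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "bc9401f1859475a09657a453577063f396d0157b1d379d755ac6e49fdafd1e8a"
)
```

For example, matches_published_hash("ppt2txt-0.1.0.tar.gz", EXPECTED_SDIST_SHA256) should return True for an unmodified download saved under that name in the current directory.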

File details

Details for the file ppt2txt-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ppt2txt-0.1.0-py3-none-any.whl
  • Size: 5.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for ppt2txt-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3c5535d9a3af1c048814bffbefef62677dfc5c9e051a1e690db2b011448c9198
MD5 2d8d6a11cfbfb78357857237e82e71d3
BLAKE2b-256 713c9d8e82b6ea753f35c543a615d0dd27521688669d7ea5a7ce6850a25070e0
