ppt2txt
A pure-Python utility for extracting text from PPT files.
The code is based on the official documentation for MS-PPT files available at https://msopenspecs.azureedge.net/files/MS-PPT/%5bMS-PPT%5d.pdf.
How to install?
pip install ppt2txt
How to run?
- From command line:
ppt2txt file.ppt -o output_dir
- From python:
import ppt2txt
# extract content
parsed_ppt_dict = ppt2txt.process("file.ppt")
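To process several files at once, the single call above can be wrapped in a small loop. The following is only a sketch: extract_all is a hypothetical helper, not part of ppt2txt, and because the exceptions raised by ppt2txt.process are not documented here, the helper catches any exception and records its message rather than stopping.

```python
from pathlib import Path


def extract_all(directory, process):
    """Run `process` on every .ppt file in `directory`.

    `process` is expected to behave like ppt2txt.process: it takes a
    file path and returns a dictionary. Returns {file name: result}.
    """
    results = {}
    for path in sorted(Path(directory).glob("*.ppt")):
        try:
            results[path.name] = process(str(path))
        except Exception as exc:  # assumption: failure modes are undocumented
            results[path.name] = {"error": str(exc)}
    return results


# Usage, assuming ppt2txt is installed and "slides/" is a hypothetical folder:
# import ppt2txt
# parsed = extract_all("slides/", ppt2txt.process)
```

Passing the processing function as an argument keeps the helper independent of the package, so it can be tried with a stand-in function before running it on real files.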
Output
parsed_ppt_dict is a dictionary with the following structure:
{
    "filename": "file.ppt",
    "slides": 4,
    "content": {
        "0": "Text from the first record",
        "1": "Text from the second record"
    }
}
where:
- filename is the name of the input file
- slides is the number of slides
- content is a dictionary containing one element for each text record found in the document
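As an illustration of working with this structure, the sketch below joins all text records into a single string. It assumes that the keys of content are always decimal strings, as in the example above, and sorts them numerically so that "10" comes after "2". join_records is a hypothetical helper, and the sample dictionary only mirrors the documented layout; it is not real output.

```python
def join_records(parsed):
    """Concatenate the text records of a parsed dictionary in numeric key order."""
    content = parsed.get("content", {})
    # Keys are strings such as "0", "1", ...; sort by integer value, not lexically
    ordered_keys = sorted(content, key=int)
    return "\n".join(content[key] for key in ordered_keys)


# Sample dictionary mirroring the documented structure (not real output)
sample = {
    "filename": "file.ppt",
    "slides": 4,
    "content": {
        "0": "Text from the first record",
        "1": "Text from the second record",
    },
}

print(join_records(sample))
```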