Screenplay Parser
Parse PDF screenplays into a rich JSON format.
Install
pipenv install
# or
pip3 install -r requirements.txt
Usage
python index.py -s path_of_screenplay.pdf --start page_number_to_start_analyzing
Notes
- It's advisable to set --start to the page where the screenplay itself begins; the title page, cast list, etc. should be skipped. Automatic detection of these pages is on the roadmap, so stay tuned.
- Works well for "clean" PDF screenplays, not OCR'd (scanned) PDFs.
- Production screenplays work pretty well.
JSON structure
[{
// page number
"page": 1,
// scene info
"scene_info": {
"region": "EXT.", //region of scene [EXT., INT., EXT./INT, INT./EXT]
"location": "VILLA",
"time": ["DAY"] // time of scene [DAY, NIGHT, DAWN, DUSK, ...]
},
"scene": [{
"type": "ACTION", // type of snippet [ACTION, CHARACTER, TRANSITION, DUAL_DIALOGUE]
"content": {...} // structure of content depends on "type"
}, {...}]
}, {...}]
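- Strictly speaking, the top level is an array of dictionaries (JSON objects) rather than a single JSON object. The // comments above are annotations only; JSON itself has no comment syntax.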
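As a rough illustration of walking this array, the sketch below rebuilds a scene heading (slugline) from each entry's scene_info and counts its snippets. It is a hypothetical helper, not part of this project: the sample data is hand-written to mirror the structure above (not real parser output), and the "REGION LOCATION - TIME" slugline format is an assumption.

```python
import json
from typing import List, Tuple

# Hypothetical sample mirroring the documented structure (not real parser
# output). The README's // comments are omitted because JSON has none.
SAMPLE = json.loads("""
[{
  "page": 1,
  "scene_info": {"region": "EXT.", "location": "VILLA", "time": ["DAY"]},
  "scene": [
    {"type": "ACTION",
     "content": [{"text": "An action paragraph.", "x": 108, "y": 120}]},
    {"type": "TRANSITION",
     "content": {"text": "SMASH TO:", "metadata": {"x": 448, "y": 720}}}
  ]
}]
""")


def scene_heading(scene_info: dict) -> str:
    """Rebuild a slugline like 'EXT. VILLA - DAY' (assumed formatting)."""
    parts = [scene_info.get("region"), scene_info.get("location")]
    heading = " ".join(p for p in parts if p)
    times = scene_info.get("time") or []
    if times:
        # "time" is a list, so several times of day are joined with " - "
        heading += " - " + " - ".join(times)
    return heading


def summarize(scenes: List[dict]) -> List[Tuple[int, str, int]]:
    """Return (page, slugline, number of snippets) for each array entry."""
    return [
        (entry["page"], scene_heading(entry["scene_info"]), len(entry["scene"]))
        for entry in scenes
    ]


for page, heading, count in summarize(SAMPLE):
    print(f"p.{page}: {heading} ({count} snippets)")
```

With real output you would load the parser's JSON (however you obtain it) in place of SAMPLE.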
Type Content Structure
- ACTION
"content": [{
  "text": "an action paragraph",
  "x": 108,
  "y": 120 // Y-axis of last line in paragraph
}, {...}]
- CHARACTER
"content": {
  "character": "MILES",
  "modifier": null, // V.O., O.S., and more; null if no modifier
  "dialogue": [
    "Hey good morning. How you doing?... Weekend was short, huh? ",
    "(he turns to another kid)", // parentheticals are separated
    " Oh my gosh this is embarrassing, we wore the same jacket--"
  ]
}
- DUAL_DIALOGUE
"content": {
  "character1": {
    "character": { "character": "PETER", "modifier": null },
    "dialogue": [
      "(groggy)",
      " Why are you trying to kill me?--"
    ]
  },
  "character2": {
    "character": { "character": "MILES", "modifier": "CONT'D" },
    "dialogue": [
      "--I’m not! I’m trying to save you!"
    ]
  }
}
- TRANSITION
"content": { "text": "SMASH TO:", "metadata": { "x": 448, "y": 720 } }
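Because the shape of "content" varies by type, consumers typically dispatch on "type". The sketch below flattens any one snippet into plain-text lines, based only on the four structures listed above; the function names are hypothetical and not part of this project. Note that DUAL_DIALOGUE nests the speaker one level deeper than CHARACTER does.

```python
from typing import List, Optional


def speaker_label(name: str, modifier: Optional[str]) -> str:
    """Format a speaker name, e.g. 'MILES (CONT'D)'; modifier may be null."""
    return f"{name} ({modifier})" if modifier else name


def render_snippet(snippet: dict) -> List[str]:
    """Flatten one snippet from a scene's "scene" list into plain-text lines."""
    kind, content = snippet["type"], snippet["content"]
    if kind == "ACTION":
        # content is a list of paragraphs, each with text and x/y coordinates
        return [para["text"] for para in content]
    if kind == "CHARACTER":
        # dialogue is a list; parentheticals appear as separate entries
        label = speaker_label(content["character"], content.get("modifier"))
        return [label] + [line.strip() for line in content["dialogue"]]
    if kind == "DUAL_DIALOGUE":
        # each side nests {"character", "modifier"} under "character"
        lines: List[str] = []
        for side in ("character1", "character2"):
            speaker = content[side]["character"]
            lines.append(speaker_label(speaker["character"], speaker.get("modifier")))
            lines.extend(line.strip() for line in content[side]["dialogue"])
        return lines
    if kind == "TRANSITION":
        return [content["text"]]
    raise ValueError(f"unknown snippet type: {kind!r}")


example = {
    "type": "DUAL_DIALOGUE",
    "content": {
        "character1": {
            "character": {"character": "PETER", "modifier": None},
            "dialogue": ["(groggy)", " Why are you trying to kill me?--"],
        },
        "character2": {
            "character": {"character": "MILES", "modifier": "CONT'D"},
            "dialogue": ["--I’m not! I’m trying to save you!"],
        },
    },
}
print("\n".join(render_snippet(example)))
```

Stripping each dialogue entry removes the leading and trailing spaces the parser keeps around parentheticals; drop the .strip() calls if you need the raw strings.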
Run tests
python -m pytest tests/
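For a quick sanity check of parser output against the documented structure, a pytest-style test might look like the sketch below. The names shape_problems and test_shape_problems are hypothetical and do not come from the project's tests/ directory; if saved in a file such as tests/test_shape.py (also a hypothetical name), pytest would collect it with the command above.

```python
REQUIRED_KEYS = {"page", "scene_info", "scene"}
SNIPPET_TYPES = {"ACTION", "CHARACTER", "TRANSITION", "DUAL_DIALOGUE"}


def shape_problems(scenes) -> list:
    """List ways parser output deviates from the documented shape (empty = OK)."""
    if not isinstance(scenes, list):
        return ["top level should be a list"]
    problems = []
    for i, entry in enumerate(scenes):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not a dictionary")
            continue
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
            continue
        for j, snippet in enumerate(entry["scene"]):
            kind = snippet.get("type")
            if kind not in SNIPPET_TYPES:
                problems.append(f"entry {i}, snippet {j}: unexpected type {kind!r}")
    return problems


def test_shape_problems():
    good = [{"page": 1,
             "scene_info": {"region": "EXT.", "location": "VILLA", "time": ["DAY"]},
             "scene": [{"type": "TRANSITION", "content": {"text": "SMASH TO:"}}]}]
    assert shape_problems(good) == []
    bad = [{"page": 1, "scene": []},
           {"page": 2, "scene_info": {}, "scene": [{"type": "SONG"}]}]
    assert shape_problems(bad) == [
        "entry 0: missing ['scene_info']",
        "entry 1, snippet 0: unexpected type 'SONG'",
    ]
```

Returning a list of problems instead of asserting inside the helper lets one run report every deviation at once.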
Todos
- Add unit tests
- Skip to start of screenplay
- Add -o flag to set output path
- More documentation
- Add option to use as a library
- Detect end of screenplay
Author
👤 Egan Bisma
- Website: egan.dev
- Github: @VVNoodle
Show your support
Give a ⭐️ if this project helped you!
Hashes for screenplay_pdf_to_json-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c468490fc3b7461ea4dee62915ceef34e3267c2c43bf08f06dd2fa8c3eba305 |
| MD5 | 7e4761674a4cfb022b6096426697c662 |
| BLAKE2b-256 | 2c366461b8fa35e2e7600cb505727b3aed92d238c467838ab10ce5617aedf6d5 |
Hashes for screenplay_pdf_to_json-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | d01583086112a940fe86bfb5f1cf15e4c3345acdcc8aca3fd3dc88d45ccad880 |
| MD5 | 300af0cc5969455dda54b4ac0dc8ef24 |
| BLAKE2b-256 | 98a89736cb258e67cd0ab180eab753f1150fdead1d37b10ab0979a21594fe61c |