Screenplay Parser

Parse PDF screenplay into rich JSON format

Install

pipenv install

# or
pip3 install -r requirements.txt

Usage

python index.py -s path_of_screenplay.pdf --start page_number_to_start_analyzing
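The same invocation can be assembled from Python, for example to process several PDFs in a loop. This is only a sketch: the helper name `build_command` is hypothetical, it assumes `index.py` is in the current working directory, and it uses only the `-s` and `--start` flags shown above.

```python
import sys

def build_command(pdf_path, start_page, script="index.py"):
    """Build the argument list for the command-line call shown above.

    `script` defaults to index.py in the current directory (an assumption).
    """
    return [sys.executable, script, "-s", str(pdf_path), "--start", str(start_page)]

cmd = build_command("path_of_screenplay.pdf", 3)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository directory.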

Notes

  • Set --start to the page where the screenplay itself begins, so that the title page, cast list, etc. are skipped. Automatic detection of these pages is on the roadmap, so stay tuned.
  • Works well for "clean" PDF screenplays, not OCR PDFs.
  • Production screenplays work pretty well.

JSON structure

[{
    // page number
    "page": 1,

    // scene info
    "scene_info": {
        "region": "EXT.",  //region of scene [EXT., INT., EXT./INT, INT./EXT]
        "location": "VILLA",
        "time": ["DAY"] // time of scene [DAY, NIGHT, DAWN, DUSK, ...]
    },
    "scene": [{
        "type": "ACTION",  // type of snippet [ACTION, CHARACTER, TRANSITION, DUAL_DIALOGUE]
        "content": {...} // content structure depends on "type" (see Type Content Structure)
    }, {...}]

}, {...}]
  • The top level is an array of page dictionaries rather than a single JSON object.
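As a rough illustration of walking this structure, the sketch below loads a small hand-written sample and summarizes each page. The sample data and the `summarize` helper are assumptions made for the example, not part of the project; the `//` comments are left out because standard JSON does not allow them.

```python
import json
from collections import Counter

# Hypothetical sample mirroring the structure documented above.
SAMPLE = """
[{
    "page": 1,
    "scene_info": {"region": "EXT.", "location": "VILLA", "time": ["DAY"]},
    "scene": [
        {"type": "ACTION",
         "content": [{"text": "an action paragraph", "x": 108, "y": 120}]},
        {"type": "TRANSITION",
         "content": {"text": "SMASH TO:", "metadata": {"x": 448, "y": 720}}}
    ]
}]
"""

def summarize(pages):
    """Return (page number, slugline, snippet-type counts) per page entry."""
    summary = []
    for page in pages:
        info = page.get("scene_info") or {}
        parts = (info.get("region"), info.get("location"),
                 " ".join(info.get("time") or []))
        slug = " ".join(part for part in parts if part)
        counts = Counter(snippet["type"] for snippet in page.get("scene", []))
        summary.append((page["page"], slug, dict(counts)))
    return summary

pages = json.loads(SAMPLE)
result = summarize(pages)
print(result)
```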

Type Content Structure

  • ACTION
    "content": [{
        "text": "an action paragraph",
        "x": 108,
        "y": 120 // Y-axis of last line in paragraph
    }, {...}]
    
  • CHARACTER
     "content": {
         "character": "MILES",
         "modifier": null,  // V.O., O.S., and more; null if no modifier
         "dialogue": [
          "Hey good morning. How you doing?... Weekend was short, huh? ",
          "(he turns to another kid)", // parentheticals are separated
          " Oh my gosh this is embarrassing, we wore the same jacket--"
         ]
     }
    
  • DUAL_DIALOGUE
     "content": {
         "character1": {
              "character": {
                  "character": "PETER",
                  "modifier": null
              },
              "dialogue": [
                  "(groggy)",
                  " Why are you trying to kill me?--"
              ]
          },
          "character2": {
              "character": {
                  "character": "MILES",
                  "modifier": "CONT'D"
              },
              "dialogue": [
                  "--I’m not! I’m trying to save you!"
              ]
          }
     }
    
  • TRANSITION
     "content": {
         "text": "SMASH TO:",
         "metadata": {
             "x": 448,
             "y": 720
         }
     }
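
Because CHARACTER and DUAL_DIALOGUE nest the speaker differently, code that consumes the output needs a branch per type. A minimal sketch, assuming the structures shown above; `iter_dialogue` and the sample snippets are hypothetical, and treating a line wrapped in "(" and ")" as a parenthetical is a heuristic of this example rather than something the parser documents.

```python
def iter_dialogue(snippet):
    """Yield (character, modifier, text, is_parenthetical) for one snippet."""
    kind = snippet["type"]
    content = snippet["content"]
    if kind == "CHARACTER":
        speakers = [(content["character"], content["modifier"], content["dialogue"])]
    elif kind == "DUAL_DIALOGUE":
        speakers = [
            (side["character"]["character"], side["character"]["modifier"],
             side["dialogue"])
            for side in (content["character1"], content["character2"])
        ]
    else:
        return  # ACTION and TRANSITION carry no dialogue
    for name, modifier, lines in speakers:
        for line in lines:
            text = line.strip()
            # Heuristic: parentheticals are stored as separate "(...)" entries.
            is_parenthetical = text.startswith("(") and text.endswith(")")
            yield name, modifier, text, is_parenthetical

# Hypothetical snippets shaped like the examples above.
SNIPPETS = [
    {"type": "CHARACTER", "content": {
        "character": "MILES", "modifier": None,
        "dialogue": ["Hey good morning. ", "(he turns to another kid)",
                     " Oh my gosh this is embarrassing--"]}},
    {"type": "DUAL_DIALOGUE", "content": {
        "character1": {"character": {"character": "PETER", "modifier": None},
                       "dialogue": ["(groggy)", " Why are you trying to kill me?--"]},
        "character2": {"character": {"character": "MILES", "modifier": "CONT'D"},
                       "dialogue": ["--I'm not! I'm trying to save you!"]}}},
    {"type": "TRANSITION", "content": {
        "text": "SMASH TO:", "metadata": {"x": 448, "y": 720}}},
]

rows = [row for snippet in SNIPPETS for row in iter_dialogue(snippet)]
spoken = [(name, text) for name, _, text, par in rows if not par]
parentheticals = [text for _, _, text, par in rows if par]
print(spoken)
```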
    

Run tests

python -m pytest tests/

Todos

  • Add unit tests
  • Skip to start of screenplay
  • Add -o flag to set output path
  • More documentation
  • Add option to use as a library
  • Detect end of screenplay

Author

👤 Egan Bisma

Show your support

Give a ⭐️ if this project helped you!

