Python package for generating training data from documents.

SpiceJack

SpiceJack is a Python tool for generating JSON questions and answers from documents.

Usage

from spicejack.pdf import PDFprocessor

def filter1(items):
    """
    Example filter: censors a word in every string.
    """
    return [i.replace("badword", "b*dword") for i in items]


processor = PDFprocessor(
    "/path/to/Tax_Evasion_Tutorial.pdf",
    use_legitimate = True, # Runs the processor with the openai api (See "legitimate use")
    filters = (filter1,) # Extra custom filters
)

processor.run(
    thread = True, # Runs the processor in a child thread. (threading.Thread)
    process = True, # Runs the processor in a child process. (multiprocessing.Process)
    logging = True # Prints the responses from the LLM
)
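
The example filter above takes a list of strings and returns a new list. Below is a minimal sketch of two more filters in the same shape. The names strip_whitespace, drop_empty and apply_filters are hypothetical and not part of SpiceJack. How the library calls its filters internally is not documented here, so the helper is only an assumption for trying filters locally before passing them to PDFprocessor.

```python
def strip_whitespace(items):
    """Trim leading and trailing whitespace from every string."""
    return [i.strip() for i in items]


def drop_empty(items):
    """Remove strings that are empty or whitespace-only."""
    return [i for i in items if i.strip()]


def apply_filters(items, filters):
    """Hypothetical helper (not SpiceJack API): run each filter in order."""
    for f in filters:
        items = f(items)
    return items


sample = ["  first sentence ", "", "   ", "second sentence"]
cleaned = apply_filters(sample, (strip_whitespace, drop_empty))
print(cleaned)  # ['first sentence', 'second sentence']
```

Filters written this way can be passed together, for example filters = (strip_whitespace, drop_empty, filter1).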

Legitimate use

Create a file named .env with the following content:

OPENAI_API_KEY = "<YOUR-OPENAI-API-KEY>"
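
Before running with use_legitimate = True, it can help to confirm the key is actually available. The following is a minimal sketch using only the standard library. It assumes the .env file sits in the current working directory and uses the KEY = "value" format shown above. The functions read_env_file and has_openai_key are hypothetical helpers, and SpiceJack may load the file differently.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse KEY = "value" lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def has_openai_key(path=".env"):
    """Return True if OPENAI_API_KEY is set in the environment or the file."""
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")
    # The placeholder text from the README does not count as a real key.
    return bool(key) and key != "<YOUR-OPENAI-API-KEY>"
```

Calling has_openai_key() before constructing the processor gives an early, readable failure instead of an API error partway through a run.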

Installation

pip install spicejack

Support me

SpiceJack is completely free to use, but donations are much appreciated, as I am developing it on a 10+ year old laptop.

Bitcoin

bc1q7xaxer2xpxttm3vpzc8s9dutvck8u9ercxxc95

Ethereum

0xB7351e098c80E2dCDE48BB769ac14c599E32c47E

Monero

44Y47Sf2huJV4hx7K1JrTeKbgkPsWdRWSbEiAHRWKroaGYAnxkPjdxhUsDeiFeQ3wc6Tw8v3uYTZMbBUfcdUUgqt5HCqbtY

Litecoin

LQzd9phuN7iPRn8p5rT1zyVssJ8nY5WjM5

Roadmap

  • Python library

  • Mass generation

  • GUI

Download files

Source Distribution

spicejack-0.44b0.tar.gz (42.1 kB)

Uploaded Source

Built Distribution

spicejack-0.44b0-py3-none-any.whl (31.6 kB)

Uploaded Python 3

File details

Details for the file spicejack-0.44b0.tar.gz.

File metadata

  • Download URL: spicejack-0.44b0.tar.gz
  • Upload date:
  • Size: 42.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for spicejack-0.44b0.tar.gz
Algorithm Hash digest
SHA256 6c726d07e35cad37c6cd489dcb1e49a55b9f6f5e8167138f0b5953a6279f8cef
MD5 9fdbc240d5c37c76b504fdba966b9fd6
BLAKE2b-256 3d874ab673a2d4139e544cf5473439c9b545ef95f5c0ddca7067d25142c3351c
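
A downloaded file can be checked against the SHA256 digest listed above. This is a minimal sketch using hashlib. The helper names sha256_of and verify are hypothetical, and the file path is assumed to point at a local copy of the source distribution.

```python
import hashlib

# SHA256 digest published above for spicejack-0.44b0.tar.gz.
EXPECTED = "6c726d07e35cad37c6cd489dcb1e49a55b9f6f5e8167138f0b5953a6279f8cef"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED):
    """Return True if the file's digest matches the expected value."""
    return sha256_of(path) == expected.lower()
```

For example, verify("spicejack-0.44b0.tar.gz") should return True for an unmodified download in the current directory. The same functions work for the wheel if its own SHA256 value is passed as expected.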

File details

Details for the file spicejack-0.44b0-py3-none-any.whl.

File metadata

  • Download URL: spicejack-0.44b0-py3-none-any.whl
  • Upload date:
  • Size: 31.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for spicejack-0.44b0-py3-none-any.whl
Algorithm Hash digest
SHA256 9477181d842f690107a820979b1fbd3321481b833f225afddba23841e577fc05
MD5 7d716a08b3271d5ff6716f88fb61a090
BLAKE2b-256 9fde593e677e6d4b45a92871641eba85512d7e48ded27aba9993cc05d4a8935a
