Python package for generating training data from documents.

Project description

SpiceJack

SpiceJack is a Python tool for generating JSON question-and-answer pairs from documents.

Usage

from spicejack.pdf import PDFprocessor

def filter1(items):
    """
    Example filter: masks a word in every string of the list.
    """
    return [i.replace("badword", "b*dword") for i in items]


processor = PDFprocessor(
    "/path/to/Tax_Evasion_Tutorial.pdf",
    use_legitimate=True,  # Runs the processor with the OpenAI API (see "Legitimate use")
    filters=(filter1,),  # Extra custom filters
)

processor.run(
    thread=True,  # Runs the processor in a child thread (threading.Thread)
    process=True,  # Runs the processor in a child process (multiprocessing.Process)
    logging=True,  # Prints the responses from the LLM
)
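
The filters argument takes a tuple of functions, so several small filters can be combined. The sketch below assumes, as filter1 suggests, that each filter receives a list of strings and returns a list of strings. How SpiceJack itself chains the filters is not documented here, so apply_filters is a hypothetical helper used only to try filters out in isolation before passing them to PDFprocessor.

```python
def strip_whitespace(items):
    """Trim each string and drop entries that end up empty."""
    return [s.strip() for s in items if s.strip()]


def mask_word(items):
    """Same idea as filter1: mask an unwanted word in every string."""
    return [s.replace("badword", "b*dword") for s in items]


def apply_filters(items, filters):
    """Hypothetical helper (not part of SpiceJack): run the filters in order,
    feeding the output of one into the next."""
    for f in filters:
        items = f(items)
    return items


sample = ["  a badword here ", "   ", "clean text"]
result = apply_filters(sample, (strip_whitespace, mask_word))
print(result)
```

If the filters behave as expected on sample data, the same tuple can be passed as filters=(strip_whitespace, mask_word).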

Legitimate use

Create a file named .env containing your OpenAI API key:

OPENAI_API_KEY = "<YOUR-OPENAI-API-KEY>"
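
Before running with use_legitimate = True, it can help to confirm that the .env file is readable and that the placeholder has been replaced. The following is a minimal standard-library sketch for that check only. It is not how SpiceJack loads the key, and read_env_file and has_api_key are hypothetical names introduced for illustration.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY = "value" lines from a .env-style file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one layer of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def has_api_key(path=".env"):
    """Return True if the file defines a non-placeholder OPENAI_API_KEY."""
    if not Path(path).is_file():
        return False
    key = read_env_file(path).get("OPENAI_API_KEY", "")
    # The README placeholder starts with "<", so treat that as "not set".
    return bool(key) and not key.startswith("<")
```

Calling has_api_key() from the directory that holds the .env file returns False while the file is missing or still contains the placeholder.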

Installation

pip install spicejack

Support me

SpiceJack is completely free to use, but donations are much appreciated, as I am developing it on a 10+ year old laptop.

Bitcoin

bc1q7xaxer2xpxttm3vpzc8s9dutvck8u9ercxxc95

Ethereum

0xB7351e098c80E2dCDE48BB769ac14c599E32c47E

Monero

44Y47Sf2huJV4hx7K1JrTeKbgkPsWdRWSbEiAHRWKroaGYAnxkPjdxhUsDeiFeQ3wc6Tw8v3uYTZMbBUfcdUUgqt5HCqbtY

Litecoin

LQzd9phuN7iPRn8p5rT1zyVssJ8nY5WjM5

Roadmap

  • Python library

  • Mass generation

  • GUI

Download files

Source Distribution

spicejack-0.43b0.tar.gz (42.8 kB)

Uploaded Source

Built Distribution

spicejack-0.43b0-py3-none-any.whl (32.3 kB)

Uploaded Python 3

File details

Details for the file spicejack-0.43b0.tar.gz.

File metadata

  • Download URL: spicejack-0.43b0.tar.gz
  • Upload date:
  • Size: 42.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for spicejack-0.43b0.tar.gz
Algorithm Hash digest
SHA256 f8ba2bf3f651ad60046381171f6ff079efc5dd97ca7085c12b00e399d69508f1
MD5 c033141de6b71f2166365923588c6e4d
BLAKE2b-256 36fe8719f1942f14714b6ef0b13b093bbb382795e1a9fd0881bce63e02b84985
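
A downloaded file can be checked against the digests listed here. The sketch below uses Python's hashlib to compare a local file with the SHA256 value above. The function names are illustrative, and the file path is an assumption about where the archive was saved.

```python
import hashlib

# SHA256 digest listed above for spicejack-0.43b0.tar.gz
EXPECTED_SHA256 = "f8ba2bf3f651ad60046381171f6ff079efc5dd97ca7085c12b00e399d69508f1"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a file's SHA256 digest with the expected hex string."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_hash("spicejack-0.43b0.tar.gz") should return True for an unmodified copy of the source distribution saved in the current directory.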

File details

Details for the file spicejack-0.43b0-py3-none-any.whl.

File metadata

  • Download URL: spicejack-0.43b0-py3-none-any.whl
  • Upload date:
  • Size: 32.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for spicejack-0.43b0-py3-none-any.whl
Algorithm Hash digest
SHA256 be3fd8c1927925adb78db978464eacf82fb8cb1e729d88524db08abc1ab5ee39
MD5 9e4c9a43bcbd117663e8c0586e209e94
BLAKE2b-256 ed6ff1d1163a1f0129878837724944abf9c4f1d24c0195620185d0eb64aad5b7
