A small example package

Project description

SuperBIG

SuperBIG is a virtual prompt/context management system. It takes long prompts that would not otherwise fit within your model's context size limit and searches them for the information and snippets relevant to a search string. The search results are then injected back into the prompt, which cuts down the token count while giving the model just enough information to produce a solid generation.

ELI5:

SuperBIG wraps your prompt in a searchable environment to simulate a virtual context of unlimited size. Think of it as a swapfile or pagefile with a search engine on top.
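
To make the swapfile analogy concrete, here is a toy sketch of the chunk, search, and inject loop. It is not SuperBIG's implementation: the function names (chunk_text, score, build_prompt) are hypothetical, and plain word overlap stands in for whatever search backend the project actually uses.

```python
import re


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def score(chunk: str, query: str) -> int:
    """Toy relevance score: number of distinct query words found in the chunk."""
    chunk_words = set(re.findall(r"\w+", chunk.lower()))
    query_words = set(re.findall(r"\w+", query.lower()))
    return len(chunk_words & query_words)


def build_prompt(long_context: str, query: str, top_k: int = 2, max_words: int = 50) -> str:
    """Keep only the top_k most relevant chunks and place them before the query."""
    chunks = chunk_text(long_context, max_words)
    # sorted() is stable, so chunks with equal scores keep their original order
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    return "\n".join(ranked[:top_k]) + "\n\n" + query
```

Only the chunks that match the query reach the model, so the final prompt stays small no matter how large the original context was.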

More info: https://github.com/oobabooga/text-generation-webui/pull/1548

A simplified version of this project (superbooga) is available in Text-Generation-WebUI, but this repo contains the full work-in-progress project.

Note that SuperBIG is an experimental project. Its goal is to let local models give accurate answers using massive data sources.

Installation

pip install superbig

Usage

Import the PseudocontextProvider, and use it in your projects like so:

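from transformers import AutoModelForCausalLM, AutoTokenizer

from superbig.provider import PseudocontextProvider

provider = PseudocontextProvider()
tokenizer = AutoTokenizer.from_pretrained(...)
model = AutoModelForCausalLM.from_pretrained(...)

...

# Replace the long prompt with one that holds only the relevant search results
new_prompt = provider.with_pseudocontext(prompt)

# Encode to a tensor of token IDs (tokenizer.tokenize() would return token strings instead)
input_ids = tokenizer(new_prompt, return_tensors="pt").input_ids
model.generate(input_ids, **kwargs)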

Adding Sources

Sources can be inferred automatically from the prompt by passing auto_infer_sources, a dictionary that maps source types to booleans, to with_pseudocontext:

new_prompt = provider.with_pseudocontext(prompt, auto_infer_sources={UrlSource: True})
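
As a rough illustration of what inferring URL sources from a prompt could involve, the sketch below pulls URLs out of the prompt text with a regular expression. The helper find_urls and the pattern are assumptions made for this example; SuperBIG's actual inference logic may work differently.

```python
import re

# Hypothetical pattern: http(s) URLs up to the next whitespace, bracket, or quote
URL_PATTERN = re.compile(r"https?://[^\s\]\)>\"']+")


def find_urls(prompt: str) -> list[str]:
    """Return the URLs found in the prompt, in order of first appearance, without duplicates."""
    seen: dict[str, None] = {}
    for match in URL_PATTERN.finditer(prompt):
        # Drop sentence punctuation that trails the URL
        url = match.group(0).rstrip(".,;:!?")
        seen.setdefault(url, None)
    return list(seen)
```

Each URL found this way could then be wrapped in a source object, which is roughly what enabling UrlSource in auto_infer_sources asks the provider to do for you.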

You can also manually add sources using the add_source function:

provider.add_source('mysource', UrlSource('https://github.com/kaiokendev/superbig'))

Manually added sources need to be explicitly referenced in the prompt, surrounded by triple square brackets:

Hello World, this is my prompt, and this is my source: [[[mysource]]]
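
The triple-bracket syntax is easy to recognize with a regular expression. The sketch below shows one way such references might be located and replaced with retrieved text; find_source_names and inject_sources are hypothetical helpers written for this example, not part of the SuperBIG API.

```python
import re

# Matches [[[name]]] and captures the source name between the brackets
REF_PATTERN = re.compile(r"\[\[\[(\w+)\]\]\]")


def find_source_names(prompt: str) -> list[str]:
    """List the source names referenced in the prompt, in order of appearance."""
    return REF_PATTERN.findall(prompt)


def inject_sources(prompt: str, contents: dict[str, str]) -> str:
    """Replace each [[[name]]] with its content; unknown names are left untouched."""
    def replace(match: re.Match) -> str:
        return contents.get(match.group(1), match.group(0))

    return REF_PATTERN.sub(replace, prompt)
```

Leaving unknown references in place, rather than deleting them, keeps a misspelled source name visible in the final prompt.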

Milestones

  • PyPI package
  • Bugs fixed for general usage
  • Manually-added sources
  • More sources
    • PDF
    • Filepath
  • More chunkers
    • Multitext
    • Paragraph
    • Forum
  • Allow each source to use separate chunkers
  • Search result metadata
  • Dynamically shrinking search result pages
  • Custom search logic that incorporates model output (Focus system)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

superbig-0.1.0.tar.gz (12.4 kB)

Uploaded Source

Built Distribution

superbig-0.1.0-py3-none-any.whl (19.4 kB)

Uploaded Python 3

File details

Details for the file superbig-0.1.0.tar.gz.

File metadata

  • Download URL: superbig-0.1.0.tar.gz
  • Upload date:
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for superbig-0.1.0.tar.gz
Algorithm Hash digest
SHA256 04694c496ac68bff543b2e2de38c32430086efac7e655cd43897499a450aa805
MD5 37e7334582dedd27f9d73a61945c0edd
BLAKE2b-256 cc5690e3a911947b0f6584fc72ac1acf00c29d1d41025a65d3bf1b3a3db8d164

File details

Details for the file superbig-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: superbig-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 19.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for superbig-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 350e5bc95b91e2fbbd3e3c1f9e42cc2fd999b86f5f98cf78ed7b369c93b9ba86
MD5 74de33301a36b1a3c65d0376de2093ff
BLAKE2b-256 d1115b207fca0aa457f7971041e447f28b868ba7c819eb7e77d52b5e73f3c705
