Compute, store and operate on data sketches

sketch

A co-pilot for pandas users: an AI that understands the content of your data, greatly enhancing the relevance of its suggestions. Sketch adds data context to AI code-writing assistants and is usable in notebooks in seconds.

Enhance your workflow by asking questions of your data and getting code suggestions that answer them. Reduce the time spent googling, asking ChatGPT, and even rewriting Copilot suggestions. Get more accurate suggestions for code and pandas, all without adding any plugin to your IDE.

pip install sketch

Demo

In the following demo we follow a "standard" (hypothetical) data-analysis workflow, showing a natural language interface that successfully navigates many tasks in the data stack landscape.

  • Cataloging:
    • General tagging (e.g. PII identification)
    • Metadata generation (names and descriptions)
  • Data Engineering:
    • Data cleaning and masking (compliance)
    • Derived feature creation and extraction
  • Data Analysis:
    • Data questions
    • Data visualization

https://user-images.githubusercontent.com/916073/212602281-4ebd090f-09c4-495d-b48d-0b4c37b9f665.mp4

How to use

It's as simple as importing sketch, and then using the .sketch extension on any pandas dataframe.

import sketch

Now every pandas dataframe you have will have an extension registered to it. Access this extension by appending .sketch to your dataframe's name.

.sketch.ask

Ask is a basic question-answering system in sketch. It returns a text answer based on the summary statistics and description of the data.

Use ask to get an understanding of the data, get better column names, ask hypotheticals (how would I go about doing X with this data), and more.

df.sketch.ask("Which columns are integer type?")

.sketch.howto

Howto is the basic "code-writing" prompt in sketch. It returns a code block that you should be able to copy and paste as a starting point (or possibly an ending point!) for any question you have about the data. Ask it how to clean the data, normalize it, create new features, plot, and even build models!

df.sketch.howto("Plot the sales versus time")

.sketch.apply

apply is a more advanced prompt that is more useful for data generation. Use it to parse fields, generate new features, and more. It is built directly on lambdaprompt. To use it, you will need to set up a free account with OpenAI and set an environment variable with your API key:

OPENAI_API_KEY=YOUR_API_KEY

df['review_keywords'] = df.sketch.apply("Keywords for the review [{{ review_text }}] of product [{{ product_name }}] (comma separated):")
df['capital'] = pd.DataFrame({'State': ['Colorado', 'Kansas', 'California', 'New York']}).sketch.apply("What is the capital of [{{ State }}]?")
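
To make the prompt template concrete: each {{ column }} placeholder appears to be filled from the column of the same name, once per row, and the rendered text is what the language model receives. The sketch below imitates only that substitution step using the standard library. The helper render_prompt and the sample rows are hypothetical, and this is not how sketch or lambdaprompt implement templating internally.

```python
import re

# Matches placeholders such as "{{ State }}" and captures the column name.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, row: dict) -> str:
    """Fill every {{ column }} placeholder with the value from one row."""
    return PLACEHOLDER.sub(lambda match: str(row[match.group(1)]), template)


# Hypothetical rows standing in for a dataframe with a "State" column.
rows = [{"State": "Colorado"}, {"State": "Kansas"}]
template = "What is the capital of [{{ State }}]?"

# One rendered prompt per row; each one would be sent to the language model.
prompts = [render_prompt(template, row) for row in rows]
for prompt in prompts:
    print(prompt)
```

Because one prompt is produced per row, a call on a large dataframe presumably results in many model requests, which is worth keeping in mind when using your own API key.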

Sketch currently uses prompts.approx.dev to help it run with minimal setup.

In the future, we plan to update the prompts at this endpoint with our own custom foundation model, built to answer questions more accurately than GPT-3 can with its minimal data context.

You can also call OpenAI directly (and not use our endpoint) by using your own API key. To do this, set two environment variables.

(1) SKETCH_USE_REMOTE_LAMBDAPROMPT=False
(2) OPENAI_API_KEY=YOUR_API_KEY
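
In a notebook, where exporting shell variables is inconvenient, the same two variables can be set from Python. This is a minimal sketch: the variable names come from the text above, while setting them before importing sketch is an assumption (it is the safe order if the values are read at import time).

```python
import os

# Route prompts to OpenAI directly instead of the hosted endpoint.
os.environ["SKETCH_USE_REMOTE_LAMBDAPROMPT"] = "False"

# Placeholder value: replace it with your own key. setdefault keeps a key
# that is already present in the environment.
os.environ.setdefault("OPENAI_API_KEY", "YOUR_API_KEY")

# Import sketch only after the variables are set (assumed ordering).
# import sketch
```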

How it works

Sketch uses efficient approximation algorithms (data sketches) to quickly summarize your data and feed that information into language models. Right now it does this by summarizing the columns and writing these summary statistics as additional context for the code-writing prompt. In the future we hope to feed these sketches directly into custom-made "data + language" foundation models to get more accurate results.
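
The idea can be illustrated with a toy example. The code below uses a k-minimum-values sketch (one well-known data-sketch technique) to estimate distinct counts in bounded memory, then writes per-column summaries into a prompt. Everything here is an assumption made for illustration: the function names, the chosen statistics, and the prompt wording are hypothetical and are not sketch's actual algorithms or prompt format.

```python
import hashlib


def approx_distinct(values, k=64):
    """Estimate the number of distinct values with a k-minimum-values sketch."""
    kept = set()  # at most k hash points: the smallest ones seen so far
    for value in values:
        digest = hashlib.sha256(str(value).encode()).digest()
        # Map the value to a pseudo-random point in [0, 1).
        point = int.from_bytes(digest[:8], "big") / 2**64
        if point in kept:
            continue
        if len(kept) < k:
            kept.add(point)
        else:
            largest = max(kept)
            if point < largest:
                kept.remove(largest)
                kept.add(point)
    if len(kept) < k:
        # Fewer than k distinct hash points were seen, so the count is exact.
        return len(kept)
    # The k-th smallest point indicates how densely distinct values are packed.
    return int((k - 1) / max(kept))


def summarize(columns):
    """Describe each column in one line of text."""
    lines = []
    for name, values in columns.items():
        lines.append(
            f"{name}: count={len(values)}, "
            f"approx_distinct={approx_distinct(values)}, "
            f"sample={values[:3]}"
        )
    return "\n".join(lines)


def build_prompt(columns, question):
    """Prepend the column summaries to the user's question as context."""
    return f"Column summaries:\n{summarize(columns)}\n\nQuestion: {question}"


# Hypothetical data standing in for a dataframe's columns.
columns = {
    "product_name": ["kettle", "toaster", "kettle"],
    "price": [25, 40, 25],
}
prompt = build_prompt(columns, "Plot price by product")
print(prompt)
```

The summary stays small no matter how many rows the data has, which is what makes it practical to pass as context to a language model.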
