Easy-to-use semantic (soft) querying on pandas DataFrames.

SoftPandas - Semantic Querying for Pandas

Example Usage:

  1. Suppose we want all red and black swim shorts that cost less than $600. Run the example script:

python test.py

Or play with this code:

import pandas as pd

from core.data_types import InputDataType
from core.soft_dataframe import SoftDataFrame
from embedders.clip_embedder import OpenClipEmbedder
from embedders.sentence_transformer_embedder import SentenceTransformerEmbedder
from sklearn.metrics.pairwise import cosine_similarity

lang_model = SentenceTransformerEmbedder('thenlper/gte-small',
                                metric=cosine_similarity, threshold=0.82, device="cpu")


vision_model = OpenClipEmbedder('ViT-B-32-256', metric=cosine_similarity,
                                threshold=0.25, pretrained="datacomp_s34b_b86k")

df = pd.read_csv("sample_data/men-swimwear.csv")
df = SoftDataFrame(df, soft_columns={'NAME': InputDataType.text,
                                     'DESCRIPTION & COLOR': InputDataType.text,
                                     'FABRIC': InputDataType.text},
                   models={InputDataType.text: lang_model, InputDataType.image: vision_model}
                   )

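# Hard filter first: an ordinary pandas query on the numeric PRICE column.
relevant_price_items = df.query("PRICE < 600")
# Soft filter: keep rows whose description is semantically close to the phrase.
df_filtered_desc = relevant_price_items.soft_query("'DESCRIPTION & COLOR' ~= 'red and black swim shorts'")
# Register IMAGE as a soft column so it is handled by the vision model.
df = df_filtered_desc.add_soft_columns({'IMAGE': InputDataType.image}, inplace=False)

# Soft filter again, this time matching the phrase against the product images.
df_filtered_image = df.soft_query("'IMAGE' ~= 'red and black swim shorts'")
print(df_filtered_image)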
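
The `metric` and `threshold` arguments above suggest that a soft condition compares embeddings and keeps the rows that score high enough. The following is a minimal sketch of that idea only, not SoftPandas internals: the helper names `cosine` and `soft_match`, the toy 3-dimensional vectors, and the use of `>=` for the comparison are all assumptions made for illustration. It is written in pure Python so it runs without downloading a model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def soft_match(row_embeddings, query_embedding, threshold):
    """Return one boolean per row: True when similarity reaches the threshold."""
    return [cosine(e, query_embedding) >= threshold for e in row_embeddings]


# Toy vectors standing in for real text embeddings (hypothetical values).
rows = [
    [0.9, 0.1, 0.0],  # points almost the same way as the query
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.6, 0.6, 0.1],  # only partially aligned with the query
]
query = [1.0, 0.0, 0.0]

mask = soft_match(rows, query, threshold=0.82)
print(mask)
```

Under these assumptions only the first row passes. Raising the threshold makes a soft query stricter and lowering it admits looser matches, which may be why the text and vision models in the example use different values (0.82 and 0.25).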

  2. Saving and loading:

# Pickle round trip: write the filtered frame to disk, then read it back.
relevant_price_items.to_pickle("relevant_items.p")
loaded_items = pd.read_pickle("relevant_items.p")

TODO:

  1. Add saving methods for SoftDataFrame
  2. Add a method for adding new columns
  3. Handle NaN values
    • If a column is NaN, ignore it
    • If a value is missing, it should not pass the condition, as in normal querying
  4. Batch the initial encoding
    • Don't encode rows one by one
    • Use the available device (cuda, mps, tpu, etc.)
  5. Turn the project into a package
    • Requirements file
    • Add image
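
Items 3 and 4 above could be combined along the following lines. This is a rough sketch under stated assumptions, not the planned implementation: `is_missing`, `batches`, `encode_column`, and `fake_encode_batch` are hypothetical names, and `fake_encode_batch` merely stands in for a real embedding model that accepts a list of texts.

```python
import math


def is_missing(value):
    """True for None or a float NaN, the usual markers for missing text."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def encode_column(values, encode_batch, batch_size=32):
    """Encode non-missing values in batches; missing values stay None."""
    present = [(i, v) for i, v in enumerate(values) if not is_missing(v)]
    embeddings = [None] * len(values)
    for chunk in batches(present, batch_size):
        # One model call per batch instead of one call per row.
        vectors = encode_batch([v for _, v in chunk])
        for (i, _), vec in zip(chunk, vectors):
            embeddings[i] = vec
    return embeddings


def fake_encode_batch(texts):
    """Hypothetical stand-in for a model: a 1-d 'embedding' from text length."""
    return [[float(len(t))] for t in texts]


column = ["red shorts", float("nan"), "black trunks", None, "towel"]
embedded = encode_column(column, fake_encode_batch, batch_size=2)
print(embedded)
```

Rows whose embedding is `None` can then be treated as failing every soft condition, which mirrors how a missing value fails an ordinary comparison.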

Long Term Goals:

  1. Add automatic feature extraction from images into new columns
    • allows hard querying using visual data!
  2. Add the ability to soft query based on an image
  3. Expand to more modalities
