
Process and profile text datasets interactively


Texture: Structured Text Analytics


Texture is a system for exploring text datasets and creating structured insights from them.

  1. Interactive attribute profiles: Texture visualizes structured attributes alongside your text data in interactive, cross-filterable charts.
  2. Flexible attribute definitions: Attribute charts can come from different tables and describe any level of a document, such as words, sentences, or whole documents.
  3. Derived attributes: Texture helps you derive new attributes during analysis with code and LLM transformations.

screenshot of Texture interface

Install and run

Install texture with pip:

pip install texture-viz

Then run Texture from a Python script or notebook by passing it a dataframe that contains your text data and attributes.

import texture
texture.run(df)
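
As a rough sketch of what such a dataframe might contain, the snippet below builds a few records with one text column and two attribute columns. The column names (`review`, `rating`, `source`) and the helpers `build_demo_rows` and `launch` are hypothetical and chosen only for illustration. The pandas and Texture imports are deferred so that the row-building part runs on its own.

```python
def build_demo_rows():
    """Return a few example records: one text field plus structured attributes."""
    return [
        {"review": "Battery life is excellent.", "rating": 5, "source": "web"},
        {"review": "Screen cracked within a week.", "rating": 1, "source": "email"},
        {"review": "Does the job, nothing special.", "rating": 3, "source": "web"},
    ]


def launch(rows):
    """Wrap the rows in a dataframe and open the Texture interface."""
    # Imported lazily so that build_demo_rows() works without these packages.
    import pandas as pd
    import texture

    df = pd.DataFrame(rows)  # one row per document, one column per attribute
    texture.run(df)  # the same call as shown above


rows = build_demo_rows()
# launch(rows)  # uncomment in an environment where texture-viz is installed
```

Calling `launch(rows)` requires both pandas and texture-viz to be installed.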

Texture Configuration

You can optionally pass arguments to the run command to configure the interface. Notable configuration options are:

  • embeddings: np.ndarray: Embeddings of your text data, which enable similarity search and a projection overview. If you already have a 2D projection of these embeddings, provide it as the columns umap_x and umap_y in the dataframe.
  • column_info: List[ColumnInputInfo]: Overrides the default column types and supplies derived tables. Texture automatically infers the type of each column (text, categorical, number, or date), but you can override the inferred types here. You can also describe columns that come from another table, such as a word table.
  • api_key: Your OpenAI API key, which enables LLM attribute derivation.
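
To illustrate the shape of `column_info`, here is a small sketch that builds entries as plain dictionaries using the keys that appear in the full example further down (`name`, `type`, `derived_from`, `table_data`). The helper `column_entry` and the constant `ALLOWED_TYPES` are hypothetical and not part of Texture. The validation only mirrors the four types listed above and may not match the checks Texture itself performs.

```python
# The four column types named in the option description above.
ALLOWED_TYPES = {"text", "categorical", "number", "date"}


def column_entry(name, col_type, derived_from=None, table_data=None):
    """Build one column_info dict (hypothetical helper, not a Texture API)."""
    if col_type not in ALLOWED_TYPES:
        raise ValueError(
            f"unknown column type {col_type!r}; expected one of {sorted(ALLOWED_TYPES)}"
        )
    entry = {"name": name, "type": col_type}
    # Columns that live in another table also name their source column and data.
    if derived_from is not None:
        entry["derived_from"] = derived_from
    if table_data is not None:
        entry["table_data"] = table_data
    return entry


column_info = [
    column_entry("Abstract", "text"),
    column_entry("Year", "number"),
]
```

Building entries through one function catches a misspelled type before the interface is launched.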

We provide various preprocessing functions to calculate embeddings, projections, and word tables. You can use these functions to preprocess your data before launching the Texture app.

import pandas as pd
import texture

df_vis_papers = pd.read_parquet("https://raw.githubusercontent.com/cmudig/Texture/main/examples/vis_papers/vis_paper_data.parquet")

# get embeddings and projection
embeddings, projection = texture.preprocess.get_embeddings_and_projection(
    df_vis_papers["Abstract"], ".", "all-mpnet-base-v2"
)

df_vis_papers["umap_x"] = projection[:, 0]
df_vis_papers["umap_y"] = projection[:, 1]

# get word table
df_words = texture.preprocess.get_df_words_w_span(df_vis_papers["Abstract"], df_vis_papers["id"])

# launch texture
texture.run(
    df_vis_papers,
    embeddings=embeddings,
    column_info=[
        {"name": "Abstract", "type": "text"},
        {"name": "Title", "type": "categorical"},
        {"name": "Year", "type": "number"},
        {
            "name": "word",
            "derived_from": "Abstract",
            "table_data": df_words,
            "type": "categorical",
        },
    ],
)
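
The word table in this example links each word back to the document it came from. The exact schema returned by `get_df_words_w_span` is not shown here, so the following is only an illustration of the idea using the standard library. The function name `words_with_spans` and the column names `span_start` and `span_end` are assumptions, not Texture's actual output.

```python
import re


def words_with_spans(texts, ids):
    """Split each text into lowercase words, recording where each word occurs."""
    rows = []
    for doc_id, text in zip(ids, texts):
        # \w+ matches runs of letters, digits, and underscores.
        for match in re.finditer(r"\w+", text):
            rows.append(
                {
                    "id": doc_id,  # document the word belongs to
                    "word": match.group().lower(),
                    "span_start": match.start(),  # character offset, inclusive
                    "span_end": match.end(),  # character offset, exclusive
                }
            )
    return rows


rows = words_with_spans(["Text data is messy.", "Charts help."], [0, 1])
```

Each row carries the document id, so a chart over the `word` column can be cross-filtered against document-level attributes.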

Dev install

See DEV.md for dev workflows and setup.
