Process and profile text datasets interactively
Texture: Structured Text Analytics
Texture is a system for exploring and creating structured insights with your text datasets.
- Interactive Attribute Profiles: Texture visualizes structured attributes alongside your text data in interactive, cross-filterable charts.
- Flexible attribute definitions: Attribute charts can come from different tables and any level of a document such as words, sentences, or documents.
- Derive new attributes: Texture helps you derive new attributes during analysis with code and LLM transformations.
Install and run
Install texture with pip:
pip install texture-viz
Then launch Texture from a Python script or notebook by passing it a dataframe that contains your text data and attributes.
import texture
texture.run(df)
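As a minimal sketch of what such a dataframe could look like, the snippet below builds a toy table with one text column and a few structured attributes. The column names and values are made up for illustration, and including an `id` column is an assumption rather than a documented requirement. The `texture` import is kept inside a helper so the data preparation runs even where the package is not installed.

```python
import pandas as pd

# Toy data: one free-text column plus structured attributes (all values are invented).
df = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "review": [
            "The battery lasts all day.",
            "Screen cracked after a week.",
            "Great value for the price.",
        ],
        "rating": [5, 1, 4],
        "category": ["battery", "screen", "price"],
    }
)


def launch(frame: pd.DataFrame) -> None:
    """Open the Texture interface for the given dataframe."""
    import texture  # lazy import; requires `pip install texture-viz`

    texture.run(frame)


# Uncomment to open the interface:
# launch(df)
```

Each row is one document, and every non-text column is a candidate attribute chart.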
Texture Configuration
You can optionally pass arguments to the run command to configure the interface. Notable configuration options are:
- embeddings (np.ndarray): Embeddings of your text data. Provide them to enable similarity search and a projection overview. If you already have a 2D projection of these embeddings, you must provide it as columns umap_x and umap_y in the dataframe.
- column_info (List[ColumnInputInfo]): Overrides default column types and provides derived tables. Texture automatically infers the type of each column (text, categorical, number, or date), but you can override the inferred types here. You can also provide column information for columns that come from another table, such as a word table.
- api_key: Your OpenAI API key, which enables LLM attribute derivation.
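The options above can be prepared roughly as follows. This is a sketch under stated assumptions: the embeddings and projection are random stand-ins rather than real model output, the column names `text` and `year` are hypothetical, and the final call is left commented out because it needs Texture installed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "text": ["first document", "second document", "third document"],
        "year": [2019, 2020, 2021],
    }
)

# Stand-in values; real embeddings would come from a sentence-embedding model.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(df), 8))  # shape: (n_rows, embedding_dim)
projection = rng.normal(size=(len(df), 2))  # shape: (n_rows, 2), a precomputed 2D layout

# A precomputed projection goes into the dataframe as umap_x / umap_y.
df["umap_x"] = projection[:, 0]
df["umap_y"] = projection[:, 1]

# Override inferred types for selected columns.
column_info = [
    {"name": "text", "type": "text"},
    {"name": "year", "type": "number"},
]

# The embeddings need one row per dataframe row.
assert embeddings.shape[0] == len(df)

# import texture
# texture.run(df, embeddings=embeddings, column_info=column_info)
```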
We provide various preprocessing functions to calculate embeddings, projections, and word tables. You can use these functions to preprocess your data before launching the Texture app.
import pandas as pd
import texture
df_vis_papers = pd.read_parquet("https://raw.githubusercontent.com/cmudig/Texture/main/examples/vis_papers/vis_paper_data.parquet")
# get embeddings and projection
embeddings, projection = texture.preprocess.get_embeddings_and_projection(
    df_vis_papers["Abstract"], ".", "all-mpnet-base-v2"
)
df_vis_papers["umap_x"] = projection[:, 0]
df_vis_papers["umap_y"] = projection[:, 1]
# get word table
df_words = texture.preprocess.get_df_words_w_span(df_vis_papers["Abstract"], df_vis_papers["id"])
# launch texture
texture.run(
    df_vis_papers,
    embeddings=embeddings,
    column_info=[
        {"name": "Abstract", "type": "text"},
        {"name": "Title", "type": "categorical"},
        {"name": "Year", "type": "number"},
        {
            "name": "word",
            "derived_from": "Abstract",
            "table_data": df_words,
            "type": "categorical",
        },
    ],
)
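To give a feel for what a word-level table is, here is a rough pandas-only illustration that splits each document into words and records where each word sits in the text. It is not the implementation of get_df_words_w_span. The helper name `words_with_spans` and the columns `span_start` and `span_end` are hypothetical, and the columns Texture actually produces may differ.

```python
import re

import pandas as pd


def words_with_spans(texts: pd.Series, ids: pd.Series) -> pd.DataFrame:
    """Return one row per word, with the document id and character offsets."""
    rows = []
    for doc_id, text in zip(ids, texts):
        for match in re.finditer(r"\w+", text):
            rows.append(
                {
                    "id": doc_id,  # links the word back to its document
                    "word": match.group().lower(),
                    "span_start": match.start(),  # inclusive character offset
                    "span_end": match.end(),  # exclusive character offset
                }
            )
    return pd.DataFrame(rows, columns=["id", "word", "span_start", "span_end"])


docs = pd.DataFrame(
    {"id": [10, 11], "Abstract": ["Text data is messy.", "Charts help."]}
)
df_words = words_with_spans(docs["Abstract"], docs["id"])
```

Because every word row carries its document id, a chart over the word column can be cross-filtered against document-level attributes.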
Dev install
See DEV.md for dev workflows and setup.