TF Tabular simplifies the experimentation and preprocessing of tabular datsets for TensorFlow models.
Project description
TF Tabular
TF Tabular is a project aimed at simplifying the process of handling tabular data in TensorFlow. It provides utilities for building models on top of numeric, categorical, multihot, and sequential data types.
Features
- Create input layers based on lists of columns
- Support custom embeddings: Useful to include external embeddings for example obtained from an LLM.
- Support sequence layers: Useful for time series or when building recommenders on top of user interaction data.
- Support multi-hot categorical columns
- No model building or training: Build whatever you want on top
Installation
To get started with TF Tabular, you will need to install it using pip:
pip install tabular-tf
Usage
Here is a basic example of how to use TF Tabular:
from tf_tabular.builder import InputBuilder
# Define columns to use and specify additional parameters:
categoricals = ['Pclass', 'Embarked']
numericals = ['Age', 'Fare']
# ....
# Build model:
input_builder = InputBuilder()
input_builder.add_inputs_list(categoricals=categoricals,
numericals=numericals,
normalization_params=norm_params,
vocabs=vocabs,
embedding_dims=embedding_dims)
inputs, output = input_builder.build_input_layers()
output = Dense(1, activation='sigmoid')(output)
model = Model(inputs=inputs, outputs=output)
Which will produce a model like this:
Examples
The examples folder includes more complete examples including:
- Titanic: A simple binary classification example using the Titanic dataset.
- MovieLens: A two tower retrieval model using the MovieLens dataset.
- MovieLens Sequential: Another two tower retrieval model build on the MovieLens dataset preprocessed so that the input of the model is the list of movies the user has interacted with.
Contributing
Contributions to TF Tabular are welcome. Check out the contributing guidelines for more details.
Setting up local development environment
To set up a local development environment, you will need to first clone the repo and then install the required dependencies:
- Install Poetry follow the instructions on the official Poetry website.
- Run
poetry install
- Run
poetry run pre-commit install
to install git pre-commit
Roadmap:
This is a list of possible features to be added in the future depending on need and interest expressed by the community.
- Parse dataset to separate numeric vs categoricals, multihots and sequencials
- Implement other types of normalization
- Support computing vocab and normalization params instead of receiving them as parameters
- Improve documentation and provide more usage examples
License
TF Tabular is licensed under the MIT License. See the LICENSE file for more details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for tabular_tf-0.1.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | c57a824a937298b7cc99f6e507680b3711ff7b4ccb4df1c74899a6d9e36d8bf9 |
|
MD5 | e995e1bc6fa251bcff917ada19fa2655 |
|
BLAKE2b-256 | 70086995e8fd9aeb90f886c6b58f8f21b394937b57604579ecc9f60dfc7e6e50 |