Probabilistic Unification

td-ml-probabilistic-unification

Introduction

td-ml-probabilistic-unification is a Python package for scalable probabilistic unification within the Treasure Data environment. It unifies and clusters records probabilistically based on their attributes, which makes it useful for data integration and analysis tasks.

To run probabilistic unification, you need an input table containing the records you want to unify. The package applies the configuration parameters described below and writes an output table of clustered records.
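This description does not say how the package scores or groups records. Purely as an illustration of what "clustering records probabilistically" can mean, the sketch below groups records whose pairwise match score reaches a threshold, using a union-find structure. The function name `cluster_by_threshold`, the `(id_a, id_b, score)` input format, and the use of a single threshold are assumptions for this example, not the package's API or algorithm.

```python
from typing import Dict, Hashable, Iterable, Tuple


def cluster_by_threshold(
    pair_scores: Iterable[Tuple[Hashable, Hashable, float]],
    threshold: float,
) -> Dict[Hashable, Hashable]:
    """Map each record id to a cluster representative.

    Hypothetical helper: records are linked when their match score is
    at least `threshold`, and linked records end up in the same cluster.
    """
    parent: Dict[Hashable, Hashable] = {}

    def find(x: Hashable) -> Hashable:
        # Register unseen ids as their own cluster, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps chains short
            x = parent[x]
        return x

    for a, b, score in pair_scores:
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    return {record: find(record) for record in list(parent)}


# Made-up scores for four records; only pairs scoring >= 0.9 are merged.
pairs = [("r1", "r2", 0.95), ("r2", "r3", 0.40), ("r3", "r4", 0.91)]
clusters = cluster_by_threshold(pairs, threshold=0.9)
```

Here `r1` and `r2` share one cluster and `r3` and `r4` share another, because the `r2`/`r3` score falls below the threshold.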

Configuration

Before using this package, set the following environment variables. The listing shows how each one is read and converted:

import json
import os

# Treasure Data connection settings
TD_SINK_DATABASE = os.environ.get('TD_SINK_DATABASE')
TD_API_KEY = os.environ.get('TD_API_KEY')
TD_API_SERVER = os.environ.get('TD_API_SERVER')

# Unification settings
id_col = os.environ.get('id_col')
cluster_col_name = os.environ.get('cluster_col_name')
convergence_threshold = float(os.environ.get('convergence_threshold'))  # parsed as a float
cluster_threshold = float(os.environ.get('cluster_threshold'))  # parsed as a float
string_type = os.environ.get('string_type')
fill_missing = os.environ.get('fill_missing')
feature_dict = json.loads(os.environ.get('feature_dict'))  # must be a JSON string
blocking_table = os.environ.get('blocking_table')
output_table = os.environ.get('output_table')

# Record limits and range settings (limits are parsed as integers)
record_limit = int(os.environ.get('record_limit'))
lower_limit = int(os.environ.get('lower_limit'))
upper_limit = int(os.environ.get('upper_limit'))
range_index = os.environ.get('range_index')
paralelism = os.environ.get('paralelism')  # variable name is spelled 'paralelism' (sic)

# The blocking table is used as the input table
input_table = blocking_table
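
In the listing above, an unset numeric or JSON variable is passed to `float()`, `int()`, or `json.loads()` as `None`, which raises a `TypeError` that does not name the missing variable. A minimal sketch of a pre-flight check, assuming you want to validate the environment yourself before running the package, might look like the following. The helper `load_config` and the grouping constants are hypothetical and not part of the package; only the variable names come from the listing.

```python
import json
import os
from typing import Any, Dict, Mapping

# Variable names copied from the configuration listing, grouped by how they are parsed.
STRING_VARS = (
    "TD_SINK_DATABASE", "TD_API_KEY", "TD_API_SERVER",
    "id_col", "cluster_col_name", "string_type", "fill_missing",
    "blocking_table", "output_table", "range_index", "paralelism",
)
FLOAT_VARS = ("convergence_threshold", "cluster_threshold")
INT_VARS = ("record_limit", "lower_limit", "upper_limit")
JSON_VARS = ("feature_dict",)


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Hypothetical helper: read and convert every setting, naming all missing ones."""
    names = STRING_VARS + FLOAT_VARS + INT_VARS + JSON_VARS
    missing = [name for name in names if env.get(name) is None]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))

    config: Dict[str, Any] = {name: env[name] for name in STRING_VARS}
    config.update({name: float(env[name]) for name in FLOAT_VARS})
    config.update({name: int(env[name]) for name in INT_VARS})
    config.update({name: json.loads(env[name]) for name in JSON_VARS})
    # Mirrors the listing: the blocking table is used as the input table.
    config["input_table"] = config["blocking_table"]
    return config
```

Calling `load_config()` with no arguments checks the current process environment. Passing a plain dictionary instead makes the check easy to exercise in tests without touching real credentials.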



Thank you for choosing td-ml-probabilistic-unification for your probabilistic unification needs! 📊🚀

`Copyright © 2022 Treasure Data, Inc. (or its affiliates). All rights reserved`
