Tagging system


The tagging system is composed of the following major components:

  • Preprocessors: preprocess the input data objects before tagging.
  • Tag ID strategies: independent strategies that identify tags from the input data objects.
  • Aggregators: post-process and aggregate the tagging results from the tag ID pipelines configured for each asset.
  • Handlers: assemble the preprocessors, tag ID pipelines, and result aggregation logic for each type of data source.

Input of the tagging system

Currently, the system receives batches of data objects from various data sources, e.g., Twitter (implemented), Discord (TBI), Medium (TBI), Reddit (TBI), etc.


Preprocessors are used to preprocess the input data objects before tagging.
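
As an illustration only, a preprocessor can be thought of as a function that takes a batch of data objects and returns a cleaned batch. The sketch below assumes each data object is a dict with a "content" field (as in the usage sample further down); the function name normalize_content and the cleaning rules are hypothetical and are not taken from the package.

```python
import re
from typing import Dict, List


def normalize_content(data_objects: List[Dict]) -> List[Dict]:
    """Hypothetical preprocessor: strip URLs and collapse whitespace in 'content'."""
    cleaned = []
    for obj in data_objects:
        text = obj.get("content", "")
        text = re.sub(r"https?://\S+", " ", text)  # drop URLs
        text = re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace
        cleaned.append({**obj, "content": text})   # copy, do not mutate the input
    return cleaned


batch = [{"content": "test   BTC https://example.com/x"}]
print(normalize_content(batch))
```

Returning new dicts instead of editing the input in place keeps the original data objects available to later stages.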


Strategies identify tags from the (preprocessed) input data objects. Each strategy is expected to work independently of the others and runs per asset.
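
To make "independent, per-asset" concrete, here is a minimal sketch of one possible strategy: a whole-word keyword match that scores each data object for a single asset. The factory name make_keyword_strategy, the keyword lists, and the 0.0/1.0 scoring are assumptions chosen for illustration; the package's actual strategies may look quite different.

```python
import re
from typing import Callable, Dict, List


def make_keyword_strategy(keywords: List[str]) -> Callable[[List[Dict]], List[float]]:
    """Hypothetical strategy factory: 1.0 if any keyword occurs as a whole word."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    alternatives = "|".join(re.escape(k) for k in keywords)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

    def strategy(data_objects: List[Dict]) -> List[float]:
        return [1.0 if pattern.search(obj.get("content", "")) else 0.0
                for obj in data_objects]

    return strategy


# One strategy per asset; each runs without knowledge of the others.
strategies = {
    "BTC": make_keyword_strategy(["BTC", "bitcoin"]),
    "ETH": make_keyword_strategy(["ETH", "ethereum"]),
}
batch = [{"content": "test BTC"}, {"content": "ethereum news"}]
scores = {ticker: run(batch) for ticker, run in strategies.items()}
print(scores)
```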


Aggregators post-process and aggregate the tagging results from the tag ID pipelines configured for each asset. Cross-asset tagging strategies should also be implemented here. Aggregators are expected to have access to all outputs from the tagging strategies (asset-wise) and to the input data objects.
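
The following sketch shows one way an aggregator could combine asset-wise outputs into one result per data object. It assumes the strategies produce one score per data object for each asset; the function name threshold_aggregator and the threshold parameter are hypothetical. The output shape (a list of ticker-to-score dicts) mirrors the sample output in the usage section.

```python
from typing import Dict, List


def threshold_aggregator(
    per_asset_scores: Dict[str, List[float]],
    data_objects: List[Dict],
    threshold: float = 0.5,
) -> List[Dict[str, float]]:
    """Hypothetical aggregator: keep, per data object, the assets scoring >= threshold."""
    results = []
    for i in range(len(data_objects)):
        tags = {
            ticker: scores[i]
            for ticker, scores in per_asset_scores.items()
            if scores[i] >= threshold
        }
        results.append(tags)
    return results


batch = [{"content": "test BTC"}, {"content": "ethereum news"}]
per_asset = {"BTC": [1.0, 0.0], "ETH": [0.2, 0.9]}
print(threshold_aggregator(per_asset, batch))
```

Because the aggregator sees the scores of every asset at once, this is also the natural place for cross-asset rules, for example dropping one ticker when a conflicting ticker scores higher on the same data object.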


Handlers control the flow of the actual tagging process for each data source. The handler reads the preprocessing pipeline and the aggregation pipeline from the data_source_configs section of the global config, and reads the tag identification pipelines from the ticker-specific configurations.
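
The wiring described above can be sketched as a small config-driven class. This is not the package's implementation: only the data_source_configs key comes from the description, while the class name SketchHandler, the registries, and the other config keys ("preprocessing_pipeline", "aggregation_pipeline", "ticker", "tag_id_pipeline", "name", "params") are assumptions made for the example.

```python
from typing import Dict, List, Union


def lowercase(objs: List[Dict]) -> List[Dict]:
    return [{**o, "content": o.get("content", "").lower()} for o in objs]


def contains_token(objs: List[Dict], token: str) -> List[float]:
    return [1.0 if token in o.get("content", "").split() else 0.0 for o in objs]


def drop_zero(results: List[Dict[str, float]], objs: List[Dict]) -> List[Dict[str, float]]:
    return [{t: s for t, s in r.items() if s > 0} for r in results]


# Hypothetical registries mapping names used in the configs to callables.
PREPROCESSORS = {"lowercase": lowercase}
STRATEGIES = {"contains_token": contains_token}
AGGREGATORS = {"drop_zero": drop_zero}


class SketchHandler:
    """Illustrative handler: global config drives pre/post steps, ticker configs drive tagging."""

    def __init__(self, config: Dict, ticker_config_list: List[Dict], source: str):
        source_cfg = config["data_source_configs"][source]
        self.preprocess = [PREPROCESSORS[n] for n in source_cfg["preprocessing_pipeline"]]
        self.aggregate = [AGGREGATORS[n] for n in source_cfg["aggregation_pipeline"]]
        self.ticker_configs = ticker_config_list

    def __call__(self, data: Union[Dict, List[Dict]]) -> List[Dict[str, float]]:
        objs = [data] if isinstance(data, dict) else list(data)
        for step in self.preprocess:
            objs = step(objs)
        scores = {}
        for tc in self.ticker_configs:  # each asset is scored independently
            runs = [STRATEGIES[s["name"]](objs, **s.get("params", {}))
                    for s in tc["tag_id_pipeline"]]
            scores[tc["ticker"]] = [max(vals) for vals in zip(*runs)]
        results = [{t: s[i] for t, s in scores.items()} for i in range(len(objs))]
        for step in self.aggregate:
            results = step(results, objs)
        return results


config = {"data_source_configs": {"discord": {
    "preprocessing_pipeline": ["lowercase"],
    "aggregation_pipeline": ["drop_zero"],
}}}
ticker_config_list = [
    {"ticker": "BTC", "tag_id_pipeline": [{"name": "contains_token", "params": {"token": "btc"}}]},
    {"ticker": "ETH", "tag_id_pipeline": [{"name": "contains_token", "params": {"token": "eth"}}]},
]
handler = SketchHandler(config, ticker_config_list, source="discord")
print(handler({"content": "test BTC"}))
```

Keeping the pre- and post-processing steps in the global config and the tag ID steps in per-ticker configs means a new ticker can be added without touching the data-source settings.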

Applying TaggingSystem in downstream logic

  1. Prepare the tagger config in a JSON file, e.g., config.json. You may find a sample in: . This config file contains the global settings for all tickers, including the preprocessing pipeline and the ticker identification results aggregation pipeline. Read this config as:
import json

with open('config.json') as f:
    config = json.load(f)
  2. Prepare a list of ticker-specific configs, e.g., ticker_configs.json. You may find a lot of prepared configs in . Read this config as:
# In this example we only read one ticker config
with open('ticker_configs/curated_tickers/Chains/ETH.json') as f:
    ticker_config_list = json.load(f)
  3. Initialize a Handler object, and tag your data object(s) by calling it on the data object(s):
from TaggingSystem.handler.DiscordHandler import DiscordHandler

# Init handler
crypto_ticker_tagger = DiscordHandler(config=config, ticker_config_list=ticker_config_list)

# Sample data, could also be a list of dicts
processed_data = {"content": "test BTC"}

# Apply Tagging Logic
crypto_tickers = crypto_ticker_tagger(processed_data)

# crypto_tickers = [{'BTC': 1.0}]
