Professional SDK for DSF Label Adaptive Formula API

DSF Label SDK

Accelerate AI development with programmatic data classification. Reduce labeling costs and time with configurable heuristics.


Why DSF Label?

Manual data labeling is slow and expensive. This SDK transforms domain expertise into configurable heuristics that classify datasets at scale with confidence scoring.

It is built on the DSF evaluation formula (weighted similarity with parameter scaling), which ensures consistency across all DSF products.


Core Concepts

Define weighted heuristics based on domain knowledge. The system evaluates each data point against these rules and produces a classification score; uncertain cases are flagged for human review.


Installation

pip install dsf-label-sdk

Quick Start

Community Edition

from dsf_label import LabelSDK

sdk = LabelSDK()

# Build configuration
config = (sdk.create_config()
    .add_field('feature_a', reference=True, params={'importance': 5.0})
    .add_field('feature_b', reference=True, params={'importance': 4.0})
    .add_field('feature_c', reference=50, params={'importance': 2.0})
)

# Classify data point
data = {
    'feature_a': True,
    'feature_b': True,
    'feature_c': 45
}

result = sdk.evaluate(data, config)
print(f"Score: {result.score:.3f}")
print(f"Above threshold: {result.is_above_threshold}")

Note: For boolean fields, reference=True means the expected value is True; similarity is computed as an exact match.
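
To see the exact-match behavior, flip one boolean and compare scores. This is a usage sketch: the exact values depend on the DSF formula, so the comment is indicative only.

mismatch = {'feature_a': True, 'feature_b': False, 'feature_c': 45}
worse = sdk.evaluate(mismatch, config)
# feature_b no longer matches its reference, so this score should come
# out lower than result.score; the size of the drop depends on the
# configured importances.
print(f"Score with mismatch: {worse.score:.3f}")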

Professional Edition

from dsf_label import LabelSDK
import pandas as pd

sdk = LabelSDK(
    license_key='PRO-2026-12-31-XXXX-XXXX',
    tier='professional'
)

# Batch processing (config built as in the Quick Start example)
df = pd.read_csv('unlabeled_data.csv')
results = sdk.batch_evaluate(
    data_points=df.to_dict('records'),
    config=config
)

# Access metrics
metrics = sdk.get_metrics()
print(f"Confidence level: {metrics['confidence_level']:.3f}")
print(f"Average score: {metrics['avg_score']:.3f}")

Enterprise Edition

sdk = LabelSDK(
    license_key='ENT-2026-12-31-XXXX-XXXX',
    tier='enterprise',
    mode='adaptive'
)

# Process large datasets with auto-calibration; dataset_batches is any
# iterable of record lists (one way to build it is sketched below)
for batch in dataset_batches:
    results = sdk.batch_evaluate(batch, config)

# View adaptation metrics
metrics = sdk.get_metrics()
print(f"Evaluations: {metrics['evaluations']}")
print(f"Auto-calibrated: {metrics.get('auto_calibration', False)}")

Config Builder

sdk = LabelSDK()

# Chained calls
config = (sdk.create_config()
    .add_field('metric_a', reference=20, params={'importance': 3.0})
    .add_field('metric_b', reference=1.0, params={'importance': 2.5})
)

# Sequential calls (field_definitions shown here for a runnable example)
field_definitions = {
    'metric_a': {'reference': 20, 'params': {'importance': 3.0}},
    'metric_b': {'reference': 1.0, 'params': {'importance': 2.5}},
}
config = sdk.create_config()
for field_name, params in field_definitions.items():
    config.add_field(field_name, **params)

Context Manager

with LabelSDK(license_key='...', tier='enterprise') as sdk:
    result = sdk.evaluate(data, config)
    metrics = sdk.get_metrics()

Error Handling

from dsf_label import LabelSDK, LicenseError, ValidationError

try:
    sdk = LabelSDK(license_key='invalid', tier='professional')
    result = sdk.evaluate(data, config)

except LicenseError as e:
    # Invalid or expired key: fall back to the community tier
    # (batch processing and metrics are unavailable there)
    sdk = LabelSDK()

except ValidationError as e:
    print(f"Invalid configuration: {e}")

Tier Comparison

Feature                  Community    Professional   Enterprise
Classifications/month    Unlimited†   Unlimited      Unlimited
Single Evaluation        ✅           ✅             ✅
Batch Processing         –            ✅             ✅
DataFrame Support        –            ✅             ✅
Adaptive Thresholds      –            –              ✅
Performance Metrics      –            ✅             ✅ Enhanced
Auto-Calibration         –            –              ✅
Adaptive Modes           –            –              ✅
Support                  Community    Email          Priority SLA

†Subject to fair-use policies. The Community tier is free for evaluation; production use requires registration.


Enterprise Features

Auto-Calibration (Enterprise)

The system optimizes heuristic parameters based on observed data patterns.

Adaptation Modes (Enterprise)

# Standard: Full history
sdk = LabelSDK(tier='enterprise', mode='standard')

# Adaptive: Recent patterns priority
sdk = LabelSDK(tier='enterprise', mode='adaptive')

Cache Management (Enterprise)

sdk.invalidate_cache()  # Reset when patterns change
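
One hedged way to decide when to reset: watch the documented avg_score metric for drift against a baseline. The 0.15 threshold here is an assumption for illustration, not an SDK default.

baseline = sdk.get_metrics()['avg_score']

# ... later, after processing new data ...
if abs(sdk.get_metrics()['avg_score'] - baseline) > 0.15:
    sdk.invalidate_cache()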

Configuration Guidelines

config = {
    'field_name': {
        'reference': expected_value,
        'params': {'importance': <float>}  # 0.0-5.0
    }
}

Internally, all field contributions are computed using the DSF evaluation formula (weighted similarity with parameter scaling), ensuring consistency across all DSF SDKs.
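
The exact formula is not published here, but a minimal sketch of a weighted-similarity score of this shape may help build intuition. Everything below (the boolean exact-match rule, the relative-distance decay for numbers, the normalization) is an illustrative assumption, not the SDK's implementation.

def weighted_similarity(data, config):
    # Weighted average of per-field similarities, scaled by importance
    total, weight_sum = 0.0, 0.0
    for name, spec in config.items():
        ref = spec['reference']
        weight = spec['params'].get('importance', 1.0)
        value = data.get(name)
        if isinstance(ref, bool):
            sim = 1.0 if value == ref else 0.0  # exact match for booleans
        else:
            # Numeric fields: similarity decays with relative distance
            sim = max(0.0, 1.0 - abs(value - ref) / max(abs(ref), 1e-9))
        total += weight * sim
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

score = weighted_similarity(
    {'feature_a': True, 'feature_b': True, 'feature_c': 45},
    {
        'feature_a': {'reference': True, 'params': {'importance': 5.0}},
        'feature_b': {'reference': True, 'params': {'importance': 4.0}},
        'feature_c': {'reference': 50, 'params': {'importance': 2.0}},
    },
)
print(f"Sketch score: {score:.3f}")  # high: two exact matches, feature_c near 50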


Hybrid Integration

Integrate ML models as additional heuristics:

# Example: Load pre-trained models
bert_classifier = your_model_loader('sentiment')  # Conceptual
xgboost_model = your_model_loader('risk')

# Hybrid configuration
config = {
    'metric_standard': {'reference': 100, 'params': {'importance': 3.0}},
    'bert_score': {'reference': 0.8, 'params': {'importance': 4.0}},
    'xgb_score': {'reference': 0.3, 'params': {'importance': 4.5}}
}

# Process with ensemble
def process_item(item_data):
    bert_pred = bert_classifier(item_data['text'])
    xgb_pred = xgboost_model.predict(item_data['features'])
    
    hybrid_data = {
        'metric_standard': item_data['value'],
        'bert_score': bert_pred,
        'xgb_score': xgb_pred
    }
    
    return sdk.evaluate(hybrid_data, config)
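
Called per record, assuming each item carries raw text for the BERT model, engineered features for XGBoost, and the standard metric (the field names and values are illustrative):

item = {'text': 'sample document', 'features': [[0.2, 0.7]], 'value': 95}
result = process_item(item)
print(result.score, result.is_above_threshold)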

API Reference

LabelSDK

Methods:

  • __init__(tier='community', license_key=None, mode='standard')
  • evaluate(data, config) - Single evaluation
  • batch_evaluate(data_points, config) - Batch processing (Pro/Enterprise)
  • create_config() - Config builder
  • get_metrics() - Performance statistics (Pro/Enterprise)
  • set_confidence_level(level) - Threshold adjustment
  • invalidate_cache() - Reset cache (Enterprise)

EvaluationResult

Attributes:

  • score (float): Confidence score [0.0-1.0]
  • tier (str): License tier
  • confidence_level (float): Current threshold
  • is_above_threshold (bool): Threshold comparison
  • metrics (dict): Metrics (Pro/Enterprise)

ConfigBuilder

Methods:

  • add_field(name, reference, params={'importance': 1.0})
  • remove_field(name)
  • build()
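
The examples above pass the builder straight to evaluate(); based on the method list, build() is assumed to finalize the builder into a plain configuration when one is needed explicitly:

builder = sdk.create_config().add_field('metric_a', reference=20)
config = builder.build()  # assumed to return the finalized config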

Performance Metrics (Pro/Enterprise)

metrics = sdk.get_metrics()

print(f"Evaluations: {metrics['evaluations']}")
print(f"Average score: {metrics['avg_score']:.3f}")
print(f"Confidence level: {metrics['confidence_level']:.3f}")

# Enterprise metrics
if metrics.get('auto_calibration'):
    print(f"Adapted fields: {metrics['adapted_fields']}")

Migration Example

Before:

for data_point in dataset:
    label = human_annotator.label(data_point)
    labeled_data.append((data_point, label))

After:

labeled_data, review_queue = [], []
for data_point in dataset:
    result = sdk.evaluate(data_point, config)
    if result.score > 0.75:
        labeled_data.append((data_point, 'POSITIVE'))
    elif result.score < 0.35:
        labeled_data.append((data_point, 'NEGATIVE'))
    else:
        review_queue.append(data_point)  # uncertain: route to human review

Use Cases

Classification Tasks

config = {
    'indicator_a': {'reference': True, 'params': {'importance': 5.0}},
    'metric_b': {'reference': 0.8, 'params': {'importance': 3.0}},
    'count_c': {'reference': 2, 'params': {'importance': 2.5}}
}

Quality Assessment

config = {
    'quality_metric': {'reference': 0.8, 'params': {'importance': 4.0}},
    'completeness': {'reference': 1.0, 'params': {'importance': 3.0}},
    'consistency': {'reference': 0.9, 'params': {'importance': 3.5}}
}
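
A record is then scored against the quality config exactly as in the Quick Start (the values here are made up):

record = {'quality_metric': 0.76, 'completeness': 1.0, 'consistency': 0.88}
result = sdk.evaluate(record, config)
if not result.is_above_threshold:
    print('Flag for manual quality review')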

FAQ

How accurate are classifications?
Accuracy depends on heuristic design. Well-designed configs achieve results comparable to human annotators at scale.

Can I use with active learning?
Yes. Use confidence scores to identify uncertain examples for human review.

Difference between modes?

  • Standard: Uses full history for adaptation
  • Adaptive: Prioritizes recent patterns (Enterprise only)

When to invalidate cache?
When data distribution changes significantly (new categories, seasonal shifts, different sources).


⚠️ Important Notes

Client Responsibility:
Clients must validate classifications and compliance. This SDK is a classification support tool and does not make autonomous decisions.

Data Processing:
All logic executes server-side; the SDK exposes only the configuration interface.

Model Outputs:
Outputs from client ML models used as SDK inputs are client-provided and client-controlled.


Support

Documentation: https://dsfuptech.cloud
Issues: contacto@dsfuptech.cloud
Licensing: contacto@dsfuptech.cloud


Licensing

License Format:

  • Professional: PRO-YYYY-MM-DD-XXXX-XXXX
  • Enterprise: ENT-YYYY-MM-DD-XXXX-XXXX

Contact: contacto@dsfuptech.cloud


📋 Credits

Technology Architect: Jaime Alexander Jimenez


© 2025 DSF UpTech. All rights reserved.
