A tool for creating heuristic and ML-based importance scores.
ImportanceScore
ImportanceScore is a configurable, ML-based tool suite designed to create meaningful importance scores by applying either supervised machine learning or explicit, rule-based logic. ImportanceScore externalizes all configuration so that your scoring process is automated, repeatable, and scalable.
Note: For a detailed guide to the GUI, see the Usage Guide.
Key Benefits
- Reproducible Pipeline: A config-driven system and a tune -> train -> predict workflow ensure every run is repeatable. The system is designed for version control (e.g., Git), allowing you to archive configurations and model artifacts together for long-term reproducibility.
- Prescriptive Directories and File Names: To ensure clarity and reproducibility, ImportanceScore uses a standardized directory structure and file naming convention. This design allows you to define the logic for a category once and apply it to any number of different data segments.
- Transparent & Tunable: The system is built for the iterative loop of "score -> explain -> tune." Detailed logging, feature contribution reports (--explain), and a fully configuration-driven design allow you to build trust in your model and refine its logic with precision. A GUI is provided to make this process quick and easy.
- Drop-in Models: Because of its structured nature, you can easily switch between models. This includes the ability to start with the rule-based Weighted Linear Model, use it to create training data, and then switch to the more powerful Random Forest Regressor.
- Scoring-Specific Features: The tool includes a powerful preprocessing pipeline with features specifically tailored for creating importance scores:
  - text_weight_scoring: Assign bonus points based on keywords.
  - feature_interactions: Combine related features (e.g., historic, heritage) to prevent double-counting.
  - clip_outliers: Cap feature values at absolute thresholds based on domain knowledge.
- Regional Context Scoring: The system allows you to score distinct geographic regions independently while using a shared logic configuration. This is critical for highlighting locally significant features that would otherwise be overshadowed in a global ranking.
  - Example: Mt. Mitchell (2,037 m) is the towering giant of the Appalachians and a major landmark. However, if scored directly against the 4,000 m peaks of the Rockies, it would appear insignificant. By scoring regions separately (e.g., East_Peaks, West_Peaks), the system correctly identifies Mt. Mitchell as a "Tier 1" feature within its context, ensuring it appears prominently on the map.
  - Requirement: To use this feature, simply provide separate input files for each region (e.g., peaks_east.csv, peaks_west.csv) and run the scoring pipeline for each file individually.
Directory Structure
This system uses two key organizing concepts: category and segment.
- category: A reusable blueprint for a type of data (e.g., peaks, poi).
- segment: A specific subset of data being processed (e.g., uswest, yellowstone).
The project layout separates reusable configurations from segment-specific data:
- config/: (Category-centric) Contains all reusable YAML configuration files. These are named by category (e.g., peaks_model.yml).
- models/: (Category-centric) Stores the final trained .joblib model artifacts, which are also named by category.
- data/: (Segment-centric) Holds all data files, which are almost always specific to a segment.
- data/raw/: Input feature and target files.
- data/interim/: Intermediate outputs, such as scored files.
- logs/: (Segment-centric) Contains detailed output and explanation files from specific runs.
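Put together, a project with a peaks category and a uswest segment might look like the tree below. The specific file names beyond those mentioned above are illustrative.

```
config/peaks_model.yml
models/peaks_model.joblib
data/raw/uswest_peaks_features.csv
data/interim/uswest_peaks_score.csv
logs/uswest_peaks_explain.csv
```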
File Naming Convention
File names are designed to be self-describing:
- Configuration Files: Are always named for the category they configure.
  - config/peaks_model.yml
  - config/poi_classification.yml
- Data and Log Files: Must be prefixed with their segment and category.
  - data/raw/uswest_peaks_features.csv
  - data/interim/yellowstone_poi_score.csv
  - logs/yellowstone_poi_explain.csv
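The convention above is mechanical enough to express as a tiny helper. This helper is hypothetical (not part of the package); it simply makes the segment-then-category prefix rule concrete.

```python
from pathlib import Path

def config_path(category: str, kind: str = "model") -> Path:
    # Configuration files are named for the category alone.
    return Path("config") / f"{category}_{kind}.yml"

def data_path(segment: str, category: str, suffix: str,
              root: str = "data/raw") -> Path:
    # Data and log files are prefixed with segment, then category.
    return Path(root) / f"{segment}_{category}_{suffix}.csv"

print(config_path("peaks"))                       # config/peaks_model.yml
print(data_path("uswest", "peaks", "features"))   # data/raw/uswest_peaks_features.csv
```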
Weighted Linear Model (WLM)
This suite provides a WeightedLinearModel, a scikit-learn-compatible, rule-based model. The final score is calculated as: score = intercept + Σ(contribution_of_each_feature).
The contribution from each feature is determined by its configured mode:
- presence: If the feature is present, add the coefficient value.
- value: Multiply the feature's value by the coefficient.
- base_multiplier: If the feature is present, multiply the base_score_column's value by the coefficient.
For a detailed guide, see the Weighted Linear Model Readme.
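The three contribution modes can be sketched in a few lines of Python. The config shape below is an assumption for illustration, not the suite's actual schema; only the mode names and the score formula come from the text above.

```python
def wlm_score(row: dict, config: dict) -> float:
    # score = intercept + sum of per-feature contributions
    score = config.get("intercept", 0.0)
    for feature, spec in config["features"].items():
        value = row.get(feature, 0)
        coef = spec["coefficient"]
        if spec["mode"] == "presence":
            # If the feature is present, add the coefficient value.
            if value:
                score += coef
        elif spec["mode"] == "value":
            # Multiply the feature's value by the coefficient.
            score += value * coef
        elif spec["mode"] == "base_multiplier":
            # If present, multiply the base score column by the coefficient.
            if value:
                score += row[spec["base_score_column"]] * coef
    return score

# Hypothetical config and data row for a peaks category.
config = {
    "intercept": 10.0,
    "features": {
        "historic": {"mode": "presence", "coefficient": 5.0},
        "elevation_m": {"mode": "value", "coefficient": 0.01},
        "is_summit": {"mode": "base_multiplier", "coefficient": 0.5,
                      "base_score_column": "prominence_m"},
    },
}
row = {"historic": 1, "elevation_m": 2037, "is_summit": 1, "prominence_m": 1856}
print(wlm_score(row, config))  # 10 + 5 + 20.37 + 928.0 ≈ 963.37
```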
Advanced Workflow: Bootstrapping a Model
The suite is uniquely designed to solve the "cold start" problem where no labeled data exists. You can bootstrap a powerful supervised model from your own expertise.
- Encode Expertise: Manually define your heuristic rules in the configuration for the Weighted Linear Model (WLM).
- Generate Weak Labels: Run the WLM to produce an initial ranked list.
- Curate a Training Set: Hand-pick a small, diverse subset of these scored items and adjust their scores to create a high-quality "gold" training set.
- Switch to Supervised Learning: Change a single line in the model configuration (model: WLM -> model: RFR) and run the train and tune steps to create a RandomForestRegressor that learns the nuanced patterns from your curated labels. All data extraction and cleanup used for the WLM continues to be used for the RFR.
This process combines the best of both worlds: it starts with your domain knowledge and uses machine learning to scale and refine it.
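Step 4 essentially reduces to swapping the estimator. The scikit-learn sketch below assumes the same feature matrix the WLM consumed and synthetic "gold" labels standing in for your curated scores; it is an illustration of the handoff, not the suite's internals.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Features produced by the shared extraction/cleanup pipeline (illustrative:
# e.g., elevation, prominence, keyword bonus), scaled to [0, 1].
X = rng.random((200, 3))
# Curated "gold" scores: hand-adjusted WLM output with some labeling noise.
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.1, 200)

# The WLM's role ends here; the forest takes over on the curated labels.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X, y)

# The forest can pick up nonlinear patterns the linear rules cannot express.
pred = model.predict(X[:5])
```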
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file importancescore-1.1.2.tar.gz.
File metadata
- Download URL: importancescore-1.1.2.tar.gz
- Upload date:
- Size: 42.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | de57241db4ccffb161ea7032002fef53f12a3bb5e940ae719d2d6510faf42ac8 |
| MD5 | 7f8992cce06f98170c2c35738f2828bf |
| BLAKE2b-256 | 327450438edfb7ee4334af225d9ab9f47882cf6f71e1442f47ba47a7df3b6cb6 |
File details
Details for the file importancescore-1.1.2-py3-none-any.whl.
File metadata
- Download URL: importancescore-1.1.2-py3-none-any.whl
- Upload date:
- Size: 50.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.13.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e7614dbc68efe2e73168f65028dcc236d01ec13a63673fa0390d53dfb0bf97d |
| MD5 | 0804b08d02a498effd9388cf5ba60267 |
| BLAKE2b-256 | 9f65df1c02f63595d98bf92ddc97a60a40fcd335ae593b30c8b01c18d07383ba |