A Python package that simplifies the process of building predictive and non-predictive lead scoring models.
Generic Lead Scoring Model
About
glsm is a user-friendly Python package that simplifies the process of building lead scoring models. It supports both predictive and non-predictive models, providing flexibility and ease of use.
The goal of the lead score is to provide a qualification metric for comparison between leads. It is based on product interest and company interaction data.
Predictive Model
Soon!
Non-predictive Model
Name | Description | Variable |
---|---|---|
Weight | Feature weight that represents the relative importance of each feature | $$w$$ |
Points | Assigned points of each feature | $$p$$ |
Normalized weight | Unit-vector normalization of the feature weights | $${\hat{w}_n} = \frac{w_n}{\sqrt{\sum\limits^{n}_{i=1}w_i^2}}$$ |
Lead score | A weighted sum of assigned points for each feature, where the feature weights are normalized to form a unit vector. | $$\lambda = \sum_{i=1}^n {\hat{w}_i^2}{p_i}$$ |
Disclaimer
This library is in active development. Suggestions and contributions are welcome.
Installation
Requirements
- pydantic
Can be installed using pip:
```bash
pip install glsm
```
Theory
There are two ways for you to understand the proposed models:
- Through the Google Sheets simulator (Non-predictive only)
- Reading the following sections
Predictive Model
Soon!
Non-predictive Model
This model and the following set of rules are a suggestion to get you started. You can use them as a starting point and adapt them to your business needs. The library is flexible enough to allow you to use your own assumptions and rules.
The non-predictive model has the following characteristics:
- It avoids the use of predictive algorithms, which can be data-intensive and require significant computational power and technical expertise to operate effectively.
- It uses relative feature weights, meaning that the inclusion of a new feature won't change the weights of the existing ones. This can simplify the implementation and interpretation of the model.
- It provides a score that ranges from 0 to 100 points, with 50 being the minimum threshold for lead qualification. The score reflects the current objectives and scenarios of the company, allowing for comparisons of lead performance over time.
Weight (${w}$):
Feature weight is a value that multiplies the points assigned to each feature. It is used to differentiate the importance of each feature.
Think of it as a multiplier: the higher the weight, the more important the feature. Thanks to the unit-vector normalization you can use any range of values, but weights between 0 and 1 are easiest to interpret.
Suppose you choose to use values from 0 to 1. Your most important feature will have a weight of 1. Other features should have a weight less than 1.
Normalized Weight (${\hat{w}}$):
The model needs to be flexible and adaptable to accommodate changes in the business environment. Sometime after the model is built, the company may change its focus or process. In this case, features may need to be added or removed.
The normalized weight is a unit vector that is used to scale data in a way that preserves the relative relationships between the features when new features are added.
The basic idea is to scale the weights so that the magnitude of the vector they form is equal to 1. Each weight is scaled proportionally to the others, so the relative relationships between them are preserved when new features are added.
You may be asking yourself: why not just recalculate the weights after adding or removing a feature? That may work if you still have the original data and only want to build a report, but once you calculate the lead score and send conversion events to platforms such as Google Analytics or Facebook Ads, the scores registered in those platforms can't be changed. If you later want to create audiences based on the lead score, you won't be able to do so consistently if the scoring model has changed. The normalized weight vector solves this problem.
Unit vector normalization:
$$ \hat{w}_n = \frac{w_n}{|w|} $$
Feature weights vector magnitude:
$$ |w| = \sqrt{\sum\limits^{n}_{i=1}w_i^2} $$
Normalized weight vector:
$$ \hat{w}_n = \frac{w_n}{\sqrt{\sum\limits^{n}_{i=1}w_i^2}} $$
In this way the sum of the squares of the normalized weights is equal to 1:
$$ \sum\limits^{n}_{i=1}{\hat{w}_i^2} = 1 $$
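For intuition, here is a minimal sketch in plain Python (not part of the glsm API; the example weights are made up) showing that the normalized weights form a unit vector and that the ratios between the original weights are preserved:
```python
import math

def normalize(weights):
    # Divide each weight by the magnitude of the weight vector
    magnitude = math.sqrt(sum(w ** 2 for w in weights))
    return [w / magnitude for w in weights]

weights = [1.0, 0.5, 0.25]       # hypothetical feature weights
normalized = normalize(weights)

print(sum(w ** 2 for w in normalized))  # ~1.0 -- the squares of a unit vector sum to 1
print(normalized[0] / normalized[1])    # 2.0  -- same ratio as the original 1.0 / 0.5
```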
Points ($p$):
The score assigned to each option of a feature. Assign 50 points to the option that represents your ICP (ideal customer profile).
These numbers are only a suggestion. You can use any range of values, but it is easier to interpret if the points satisfy $0 \leq p \leq 100$ and 50 is the ICP.
Lead Score ($\lambda$):
The lead score is the sum, over all features, of the squared normalized weight multiplied by the points assigned to that feature.
$$ \lambda = \sum_{i=1}^n {\hat{w}_i^2}{p_i} = ({\hat{w}_1^2}{p_1})+({\hat{w}_2^2}{p_2})+({\hat{w}_3^2}{p_3})+\dots+({\hat{w}_n^2}{p_n}) $$
Features ($f_n$)
Features are a set of characteristics assigned to each lead. If you have trouble deciding which features to use, start by adding relevant lead form or CRM fields as features.
Each feature has points associated with it, which are assigned to each option of the feature. The points assigned to each option are relative to the minimum viable option for the lead to be considered qualified (50 points).
You should first define the features and their options, then assign 50 points to the minimum viable option for the lead to be considered qualified. The remaining points should be distributed among the other options in a way that reflects the relative importance of each option.
In this way if $\lambda \geq 50$ the lead is considered qualified.
Remember that this is a suggestion: you can assign the points as you see fit and as your business requires. For example, you may want to use negative points to penalize leads that do not meet certain criteria. It is generally easier to work with positive points, but it is up to you.
Example:
Monthly Website Users | Points |
---|---|
Up to 50k | 30 |
50k - 100k | 50 (ICP) |
100k - 200k | 80 |
More than 200k | 100 |
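To make the arithmetic concrete, here is a short plain-Python sketch (the weights are hypothetical and the code is not part of the glsm API) that computes $\lambda$ for a single lead using the formula above:
```python
# Hypothetical example: "Monthly Website Users" is weighted twice as heavily as "Industry"
weights = {"Monthly Website Users": 1.0, "Industry": 0.5}
points = {"Monthly Website Users": 80, "Industry": 50}  # one lead's points, read from tables like the one above

magnitude_sq = sum(w ** 2 for w in weights.values())     # |w|^2 = 1.25
lead_score = sum((w ** 2 / magnitude_sq) * points[name]  # sum of normalized weight squared times points
                 for name, w in weights.items())

print(round(lead_score, 2))  # 74.0 -> above 50, so this lead would be considered qualified
```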
Usage
Predictive Model
Soon!
Non-predictive Model
In the examples folder you can find a Jupyter Notebook with a step-by-step guide on how to use the library. You may also want to check the Google Sheets simulator (Non-predictive only).
Importing the library
```python
from glsm.non_predictive import NonPredictive
from glsm.features import Feature
```
Instantiating the model and adding features
```python
model = NonPredictive()

feature_a = Feature(
    name="Monthly Users",
    weight=0.5,
    points_map=[
        ("Up to 50K", 0),
        ("50K - 100K", 50),
        ("100K - 200K", 70),
        ("More than 200K", 100),
    ]
)

feature_b = Feature(
    name="Industry",
    weight=0.25,
    points_map=[
        ("Technology", 70),
        ("Real Estate", 20),
        ("Retail", 50),
        ("Education", 50),
        ("Health", 100),
    ]
)

model.add_features([feature_a, feature_b])
```
Importing lead data
From a dictionary
```python
lead = {  # lambda = 81.43
    "Monthly Users": "50K - 100K",
    "Industry": "Technology",
    "Mkt Investment": "$300K - $400K",
}
```
From a CSV file
```python
import csv

# utf-8-sig transparently strips a UTF-8 BOM if the file has one
with open('leads.csv', 'r', newline='', encoding='utf-8-sig') as file:
    csv_reader = csv.reader(file)
    headers = next(csv_reader)

    # Write a new csv file with the lambda values appended
    with open('leads_with_lambda.csv', 'w', newline='') as new_file:
        csv_writer = csv.writer(new_file)
        csv_writer.writerow(headers + ['lambda'])

        for row in csv_reader:
            lead = dict(zip(headers, row))
            lambda_value = model.compute_lambda(lead)
            csv_writer.writerow(row + [lambda_value])
```
Calculating the lead score
```python
lambda_value = model.compute_lambda(lead)
```
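Since 50 is the suggested qualification threshold, you can then branch on the returned score (assumed here to be a plain float):
```python
QUALIFICATION_THRESHOLD = 50  # minimum lambda for a qualified lead, as suggested above

if lambda_value >= QUALIFICATION_THRESHOLD:
    print("Qualified lead")      # e.g. forward the conversion event to your ad platforms
else:
    print("Lead not qualified")
```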