Machine learning utilities by DEJAN.
Dejan: SEO Machine Learning Utilities
Dejan is a growing collection of machine learning utilities for search engine optimization (SEO) tasks. The repository is updated regularly with new tools and features that help SEO professionals streamline their workflows with ML techniques.
Installation
You can install the package using pip:
```bash
pip install dejan
```
Current Utilities
roo
Purpose: Fetches and processes data from the Algoroo API, providing insights into search engine fluctuations.
Search Engine Options:
- 2: Google.com (Desktop)
- 3: Google.com.au (Desktop)
- 4: Google.com (Mobile)
- 5: Google.com.au (Mobile)
Output: The data can be returned either as a raw JSON object or as a pandas DataFrame for further analysis.
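Because `roo.get_roo` takes a numeric identifier, it can help to resolve a readable engine name to that identifier and reject unsupported values before any request is made. The sketch below is a hypothetical helper, not part of the `dejan` API: the `resolve_search_engine` function and the name strings are illustrative assumptions, and only the identifiers 2 to 5 come from the list above.

```python
# Hypothetical helper (not part of dejan): map readable names to the
# Algoroo search engine identifiers listed above.
SEARCH_ENGINES = {
    "google.com/desktop": 2,
    "google.com.au/desktop": 3,
    "google.com/mobile": 4,
    "google.com.au/mobile": 5,
}


def resolve_search_engine(name: str) -> int:
    """Return the Algoroo identifier for a readable engine name.

    Raises ValueError for names not in SEARCH_ENGINES, so an unsupported
    engine is caught before any API request is made.
    """
    key = name.strip().lower()
    try:
        return SEARCH_ENGINES[key]
    except KeyError:
        valid = ", ".join(sorted(SEARCH_ENGINES))
        raise ValueError(f"Unknown search engine {name!r}; expected one of: {valid}") from None


if __name__ == "__main__":
    print(resolve_search_engine("google.com.au/mobile"))  # 5
```

The returned integer can then be passed as the first argument to `roo.get_roo`.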
Example Usage:
```python
from dejan import roo


def main():
    # Mapping of Algoroo search engine identifiers to readable names
    search_engines = {
        2: "google.com/desktop",
        3: "google.com.au/desktop",
        4: "google.com/mobile",
        5: "google.com.au/mobile",
    }

    # Choose the search engine by setting one of the identifiers above
    search_engine = 2

    # Fetch data as a pandas DataFrame
    roo_data = roo.get_roo(search_engine, as_dataframe=True)

    # Display the first few rows of the DataFrame
    print(f"Data for search engine {search_engine} ({search_engines[search_engine]}):")
    print(roo_data.head())


if __name__ == "__main__":
    main()
```
linkbert
Purpose: Uses the LinkBERT model to predict link tokens in the provided text, useful for analyzing link placement within content.
Grouping Modes:
- subtoken: Returns individual subword tokens classified as links.
- token: Merges subtokens into whole tokens (words).
- phrase: Groups predictions into phrases, treating the entire phrase as a link if any part of it is classified as a link.
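To make the three grouping modes concrete, the sketch below applies them to a hand-written list of per-subtoken predictions. It illustrates the idea and is not the library's implementation: it assumes BERT-style WordPiece tokens where a `##` prefix marks a continuation subtoken, treats a word as a link if any of its subtokens is predicted as a link, and treats a phrase as a run of consecutive link words. The function names and sample predictions are hypothetical, and the library's notion of a phrase may differ.

```python
from typing import List, Tuple

# Each prediction is (subtoken, is_link). A "##" prefix marks a WordPiece
# continuation subtoken (an assumption about the tokenizer).
Prediction = Tuple[str, bool]


def group_subtokens(preds: List[Prediction]) -> List[str]:
    """subtoken mode: return the individual subtokens predicted as links."""
    return [tok for tok, is_link in preds if is_link]


def merge_words(preds: List[Prediction]) -> List[Tuple[str, bool]]:
    """Merge subtokens into whole words; a word is a link if any piece is."""
    words: List[Tuple[str, bool]] = []
    for tok, is_link in preds:
        if tok.startswith("##") and words:
            text, flag = words[-1]
            words[-1] = (text + tok[2:], flag or is_link)
        else:
            words.append((tok, is_link))
    return words


def group_tokens(preds: List[Prediction]) -> List[str]:
    """token mode: return whole words containing at least one link subtoken."""
    return [word for word, is_link in merge_words(preds) if is_link]


def group_phrases(preds: List[Prediction]) -> List[str]:
    """phrase mode: join consecutive link words into a single phrase."""
    phrases: List[str] = []
    current: List[str] = []
    for word, is_link in merge_words(preds):
        if is_link:
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


sample = [("natural", True), ("link", True), ("place", False), ("##ment", True),
          ("within", False), ("web", False), ("content", False)]
print(group_subtokens(sample))  # ['natural', 'link', '##ment']
print(group_tokens(sample))     # ['natural', 'link', 'placement']
print(group_phrases(sample))    # ['natural link placement']
```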
Example Usage:
```python
from dejan import linkbert


def main():
    # Initialize the LinkBERTInference model
    model = linkbert.LinkBERTInference()

    # Sample text for prediction
    text = "LinkBERT is a model developed by Dejan Marketing designed to predict natural link placement within web content."
    print("Input Text:")
    print(text)
    print("-" * 50)

    # Group by subtoken
    links_subtoken = model.predict_link_tokens(text, group="subtoken")
    print(f"Predicted link tokens (subtoken): {links_subtoken}")

    # Group by token
    links_token = model.predict_link_tokens(text, group="token")
    print(f"Predicted link tokens (token): {links_token}")

    # Group by phrase
    links_phrase = model.predict_link_tokens(text, group="phrase")
    print(f"Predicted link tokens (phrase): {links_phrase}")


if __name__ == "__main__":
    main()
```
turboquant
Purpose: Compresses dense vector embeddings by a factor of 3.5x to 15.5x using hybrid residual quantization, while retaining a retrieval recall of roughly 0.82 to 0.998 depending on the preset. Based on Google's TurboQuant (arXiv:2504.19874).
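Residual quantization works in stages: a first quantizer approximates each vector, and a later stage quantizes only what the first one missed (the residual), so the sum of the two approximations is closer to the original. The NumPy sketch below illustrates that idea with a 4-bit uniform scalar stage followed by a 1-bit sign stage on the residual, applied after a random orthogonal rotation. It is a simplified illustration under those assumptions, not the package's TurboQuant implementation, and all function names are hypothetical.

```python
import numpy as np


def random_rotation(d: int, rng: np.random.Generator) -> np.ndarray:
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # sign fix makes the rotation uniformly distributed


def uniform_quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Per-vector uniform scalar quantization; returns the dequantized values."""
    levels = 2 ** bits - 1
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-12) / levels
    codes = np.round((x - lo) / scale)  # integer codes in [0, levels]
    return lo + codes * scale


def sign_quantize(x: np.ndarray) -> np.ndarray:
    """1-bit quantization: keep each sign, with one per-vector magnitude."""
    magnitude = np.abs(x).mean(axis=1, keepdims=True)
    return np.sign(x) * magnitude


rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 64)).astype(np.float32)
# Rotation preserves norms, so errors measured here equal errors in the original space
rotated = embeddings @ random_rotation(64, rng)

stage1 = uniform_quantize(rotated, bits=4)   # first pass: 4 bits per dimension
stage2 = sign_quantize(rotated - stage1)     # second pass: 1 bit per dimension on the residual
approx = stage1 + stage2

err1 = np.linalg.norm(rotated - stage1) / np.linalg.norm(rotated)
err2 = np.linalg.norm(rotated - approx) / np.linalg.norm(rotated)
print(f"relative error, stage 1 only: {err1:.4f}; with residual stage: {err2:.4f}")
```

Choosing the mean absolute residual as the 1-bit magnitude is the least-squares optimal scale for a sign code, so the second stage can only reduce the reconstruction error.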
Presets:
| Preset | Method | Recall | Compression |
|---|---|---|---|
| quality | N4+N4+TQ1 | ~0.998 | 3.5x |
| balanced | N4+TQ1 | ~0.963 | 6.4x |
| compact | TQ1+TQ1 | ~0.820 | 15.5x |
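The compression column can be sanity-checked with simple arithmetic if, as the naming suggests, `N4` is a 4-bit stage and `TQ1` a 1-bit stage per dimension (an assumption; the table does not define them). A float32 embedding uses 32 bits per dimension, so the ideal ratio is 32 divided by the total bits per dimension:

```python
# Assumed bits per dimension for each stage (not defined in the table above)
STAGE_BITS = {"N4": 4, "TQ1": 1}
PRESETS = {"quality": "N4+N4+TQ1", "balanced": "N4+TQ1", "compact": "TQ1+TQ1"}
FLOAT32_BITS = 32


def ideal_ratio(method: str) -> float:
    """Compression ratio vs float32, ignoring per-vector metadata such as norms or scales."""
    bits = sum(STAGE_BITS[stage] for stage in method.split("+"))
    return FLOAT32_BITS / bits


for name, method in PRESETS.items():
    print(f"{name:9s} {method:10s} {ideal_ratio(method):5.2f}x")
```

Under that assumption the ideal ratios are about 3.56x, 6.4x, and 16x, close to the table's 3.5x, 6.4x, and 15.5x. The small gaps, most visible for `compact`, presumably come from per-vector metadata that this simple count ignores.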
Example Usage:
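```python
import numpy as np
from dejan import turboquant

# Placeholder data: replace with your own float32 embeddings of shape (n, d)
# and queries (assumed shape (m, d))
embeddings = np.random.rand(10_000, 768).astype(np.float32)
queries = np.random.rand(5, 768).astype(np.float32)

# Compress
tq = turboquant.TurboQuant(preset="balanced")
compressed = tq.compress(embeddings)  # (n, d) float32 numpy array
tq.save(compressed, "corpus.tq")

# Load and search
tq, compressed = turboquant.TurboQuant.load("corpus.tq")
indices, scores = tq.search(queries, compressed, k=10)

# Perfect recall with rescore (requires original fp32 embeddings)
indices, scores = tq.search(queries, compressed, k=10,
                            rescore_from=embeddings, rescore_k=20)
```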
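The `rescore_from` and `rescore_k` arguments suggest a common two-step pattern: rank the corpus with cheap approximate scores, keep the top `rescore_k` candidates per query, recompute their scores exactly against the original float32 embeddings, and return the best `k`. The NumPy sketch below shows that pattern with inner-product similarity. The `search_with_rescore` and `top_k` functions are hypothetical, and this is not necessarily how `dejan` implements `tq.search`.

```python
import numpy as np


def top_k(scores: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k highest scores in each row, sorted best first."""
    part = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    order = np.argsort(-np.take_along_axis(scores, part, axis=1), axis=1)
    return np.take_along_axis(part, order, axis=1)


def search_with_rescore(queries, approx_corpus, exact_corpus, k=10, rescore_k=20):
    """Shortlist with approximate scores, then re-rank the shortlist exactly."""
    approx_scores = queries @ approx_corpus.T       # (m, n) cheap scores
    candidates = top_k(approx_scores, rescore_k)    # (m, rescore_k) shortlist
    # Exact inner products for the shortlisted vectors only
    exact = np.einsum("md,mcd->mc", queries, exact_corpus[candidates])
    order = np.argsort(-exact, axis=1)[:, :k]
    indices = np.take_along_axis(candidates, order, axis=1)
    scores = np.take_along_axis(exact, order, axis=1)
    return indices, scores


rng = np.random.default_rng(0)
corpus = rng.standard_normal((500, 32)).astype(np.float32)
# Stand-in for decompressed vectors: the originals plus a little noise
noisy = corpus + 0.1 * rng.standard_normal(corpus.shape).astype(np.float32)
queries = rng.standard_normal((3, 32)).astype(np.float32)
indices, scores = search_with_rescore(queries, noisy, corpus, k=10, rescore_k=20)
print(indices.shape, scores.shape)  # (3, 10) (3, 10)
```

A larger `rescore_k` makes it more likely that the true top `k` survive the approximate shortlist, at the cost of more exact dot products per query.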
CLI Usage:
```bash
dejan turboquant embeddings.npy
dejan turboquant embeddings.csv --preset quality
dejan turboquant embeddings.npy -o corpus.tq --preset compact
```
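The CLI reads embeddings from a `.npy` or `.csv` file. A minimal sketch for writing such files from NumPy follows. It assumes the CLI expects a single 2-D array of shape (n, d) and, for CSV, plain comma-separated numbers with one embedding per row and no header; those format details are assumptions, not documented behavior.

```python
import numpy as np

# Placeholder embeddings; substitute the output of your embedding model
embeddings = np.random.default_rng(0).standard_normal((1000, 384)).astype(np.float32)

# .npy keeps the exact float32 values and the (n, d) shape
np.save("embeddings.npy", embeddings)

# CSV: one embedding per row, comma-separated, no header (assumed layout);
# 9 significant digits are enough to round-trip float32 values
np.savetxt("embeddings.csv", embeddings, delimiter=",", fmt="%.9g")

# Round-trip check on the .npy file
loaded = np.load("embeddings.npy")
assert loaded.shape == embeddings.shape and loaded.dtype == np.float32
```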
File details
Details for the file dejan-1.4-py3-none-any.whl.
File metadata
- Download URL: dejan-1.4-py3-none-any.whl
- Upload date:
- Size: 21.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06423389a5e6b7af842c25a75f180665ce468d0abaaf2901d7dd3bc2c4cd66bf |
| MD5 | f145f9213cd759c63fdec24ded51e00b |
| BLAKE2b-256 | a9bd9d0f8986e50e955872d23b5bdb5f446b28fb72132b86d54621ba762aac1d |