Machine learning utilities by DEJAN.

Dejan: SEO Machine Learning Utilities

Dejan is a growing collection of machine learning utilities for search engine optimization (SEO). The repository will be continuously updated with new tools that help SEO professionals streamline their workflows.

Installation

You can install the package using pip:

pip install dejan
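
To confirm that the install succeeded, the installed version can be read back from package metadata. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the dejan package.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version reported by pip metadata, or None if dejan is missing.
    print(installed_version("dejan"))
```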

Current Utilities

roo

Purpose: Fetches and processes data from the Algoroo API, providing insights into search engine fluctuations.

Search Engine Options:

  • 2: Google.com (Desktop)
  • 3: Google.com.au (Desktop)
  • 4: Google.com (Mobile)
  • 5: Google.com.au (Mobile)

Output: The data can be returned either as a raw JSON object or as a pandas DataFrame for further analysis.

Example Usage:

from dejan import roo

def main():
    # Mapping of search engines to their corresponding identifiers
    search_engines = {
        2: "google.com/desktop",
        3: "google.com.au/desktop",
        4: "google.com/mobile",
        5: "google.com.au/mobile"
    }
    
    # Choose the search engine by setting the appropriate identifier
    search_engine = 2  # Change this number to select a different search engine:
                       # 2: google.com/desktop
                       # 3: google.com.au/desktop
                       # 4: google.com/mobile
                       # 5: google.com.au/mobile
    
    # Fetch data as a pandas DataFrame
    roo_data = roo.get_roo(search_engine, as_dataframe=True)
    
    # Display the first few rows of the DataFrame
    print(f"Data for search engine {search_engine} ({search_engines[search_engine]}):")
    print(roo_data.head())

if __name__ == "__main__":
    main()
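
To compare several search engines in one pass, the call above can be wrapped in a small loop. The sketch below is an illustration, not part of the package: `fetch_engines` and `SEARCH_ENGINES` are hypothetical names, and the fetch function is passed in so the helper can be tried without network access.

```python
# Identifier-to-name mapping copied from the Search Engine Options list above.
SEARCH_ENGINES = {
    2: "google.com/desktop",
    3: "google.com.au/desktop",
    4: "google.com/mobile",
    5: "google.com.au/mobile",
}


def fetch_engines(engine_ids, fetch):
    """Call fetch(engine_id) for each identifier and key the results by engine name.

    Raises ValueError for identifiers that are not in the documented list.
    """
    results = {}
    for engine_id in engine_ids:
        if engine_id not in SEARCH_ENGINES:
            raise ValueError(f"Unknown search engine identifier: {engine_id}")
        results[SEARCH_ENGINES[engine_id]] = fetch(engine_id)
    return results


# Intended usage with the package (requires dejan and network access):
#   from dejan import roo
#   data = fetch_engines([2, 4], lambda i: roo.get_roo(i, as_dataframe=True))
```

Validating the identifier before the request keeps a typo such as 6 from reaching the API.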

linkbert

Purpose: Uses the LinkBERT model to predict link tokens in the provided text, useful for analyzing link placement within content.

Grouping Modes:

  • subtoken: Returns individual subword tokens classified as links.
  • token: Merges any subtokens into whole tokens (words).
  • phrase: Groups predictions into phrases, treating the entire phrase as a link if any part of it is classified as a link.
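
The difference between the three modes can be sketched in plain Python. This is an illustration under stated assumptions, not the package's implementation: it assumes WordPiece-style subtokens where continuation pieces start with "##", treats a word as a link if any of its pieces is flagged, and treats a phrase as a run of consecutive link words. The function names are hypothetical.

```python
def merge_wordpieces(pieces):
    """Merge (piece, is_link) pairs into (word, is_link) pairs.

    A word is flagged as a link if any of its pieces is flagged (assumption).
    """
    words = []
    for piece, is_link in pieces:
        if piece.startswith("##") and words:
            word, flag = words[-1]
            words[-1] = (word + piece[2:], flag or is_link)
        else:
            words.append((piece, is_link))
    return words


def group_links(pieces, group="token"):
    """Return link predictions grouped by subtoken, token, or phrase."""
    if group == "subtoken":
        return [piece for piece, is_link in pieces if is_link]
    words = merge_wordpieces(pieces)
    if group == "token":
        return [word for word, is_link in words if is_link]
    if group == "phrase":
        phrases, current = [], []
        for word, is_link in words:
            if is_link:
                current.append(word)
            elif current:
                phrases.append(" ".join(current))
                current = []
        if current:
            phrases.append(" ".join(current))
        return phrases
    raise ValueError(f"Unknown grouping mode: {group}")


# Hand-written predictions for illustration only (not model output).
pieces = [("natural", True), ("link", True), ("place", False),
          ("##ment", True), ("within", False)]
print(group_links(pieces, "subtoken"))  # ['natural', 'link', '##ment']
print(group_links(pieces, "token"))     # ['natural', 'link', 'placement']
print(group_links(pieces, "phrase"))    # ['natural link placement']
```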

Example Usage:

from dejan import linkbert

def main():
    # Initialize the LinkBERTInference model
    model = linkbert.LinkBERTInference()

    # Sample text for prediction
    text = "LinkBERT is a model developed by Dejan Marketing designed to predict natural link placement within web content."

    print("Input Text:")
    print(text)
    print("-" * 50)

    # Group by subtoken
    links_subtoken = model.predict_link_tokens(text, group="subtoken")
    print(f"Predicted link tokens (subtoken): {links_subtoken}")

    # Group by token
    links_token = model.predict_link_tokens(text, group="token")
    print(f"Predicted link tokens (token): {links_token}")

    # Group by phrase
    links_phrase = model.predict_link_tokens(text, group="phrase")
    print(f"Predicted link tokens (phrase): {links_phrase}")

if __name__ == "__main__":
    main()

turboquant

Purpose: Compresses dense vector embeddings by a factor of 3.5x to 15.5x using hybrid residual quantization, while preserving 82% to 99.8% retrieval recall depending on the preset. Based on Google's TurboQuant (arXiv:2504.19874).
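
For readers unfamiliar with residual quantization, the general idea is to quantize a vector coarsely, then quantize the leftover error (the residual) in a second stage. The sketch below shows that idea with a plain uniform scalar quantizer. It is a generic illustration, not the TurboQuant algorithm and not the package's code; every name in it is hypothetical.

```python
def quantize(values, bits, scale):
    """Map values in [-scale, scale] to integer codes on a uniform grid of 2**bits levels."""
    levels = 2 ** bits - 1
    codes = []
    for v in values:
        clipped = max(-scale, min(scale, v))
        codes.append(round((clipped + scale) / (2 * scale) * levels))
    return codes


def dequantize(codes, bits, scale):
    """Map integer codes back to approximate values in [-scale, scale]."""
    levels = 2 ** bits - 1
    return [c / levels * 2 * scale - scale for c in codes]


def residual_quantize(values, bits=4, scale=1.0):
    """Return the one-stage and two-stage reconstructions of values."""
    step = 2 * scale / (2 ** bits - 1)
    # Stage 1: coarse quantization of the input.
    stage1 = dequantize(quantize(values, bits, scale), bits, scale)
    # Stage 2: quantize the residual, which lies within half a grid step.
    residual = [v - a for v, a in zip(values, stage1)]
    stage2 = dequantize(quantize(residual, bits, step / 2), bits, step / 2)
    return stage1, [a + b for a, b in zip(stage1, stage2)]


def max_error(values, approx):
    """Largest absolute reconstruction error across all components."""
    return max(abs(v - a) for v, a in zip(values, approx))


vector = [0.83, -0.41, 0.07, -0.96]
one_stage, two_stage = residual_quantize(vector)
print(max_error(vector, one_stage), max_error(vector, two_stage))
```

Each extra stage costs more bits per component, which is the trade-off between recall and compression that the presets expose.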

Presets:

Preset     Method      Recall   Compression
quality    N4+N4+TQ1   ~0.998   3.5x
balanced   N4+TQ1      ~0.963   6.4x
compact    TQ1+TQ1     ~0.820   15.5x
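
The compression ratios translate directly into storage estimates. The sketch below assumes float32 input (4 bytes per dimension) and applies the ratios from the table; it ignores any file header or metadata overhead, and the helper name `estimate_sizes` is hypothetical.

```python
# Compression ratios copied from the presets table.
PRESET_COMPRESSION = {"quality": 3.5, "balanced": 6.4, "compact": 15.5}


def estimate_sizes(n_vectors, dim, preset):
    """Return (raw_bytes, estimated_compressed_bytes) for float32 embeddings."""
    raw_bytes = n_vectors * dim * 4  # 4 bytes per float32 value
    return raw_bytes, raw_bytes / PRESET_COMPRESSION[preset]


# Example: one million 768-dimensional vectors.
for preset in PRESET_COMPRESSION:
    raw, compressed = estimate_sizes(1_000_000, 768, preset)
    print(f"{preset}: {raw / 1e9:.2f} GB -> {compressed / 1e9:.2f} GB")
```

For one million 768-dimensional vectors, the raw size is 3,072,000,000 bytes, and the balanced preset brings that to roughly 480,000,000 bytes.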

Example Usage:

import numpy as np
from dejan import turboquant

# Load embeddings as an (n, d) float32 numpy array
embeddings = np.load("embeddings.npy").astype(np.float32)
queries = embeddings[:5]  # placeholder queries; substitute your own (m, d) array

# Compress
tq = turboquant.TurboQuant(preset="balanced")
compressed = tq.compress(embeddings)
tq.save(compressed, "corpus.tq")

# Load and search
tq, compressed = turboquant.TurboQuant.load("corpus.tq")
indices, scores = tq.search(queries, compressed, k=10)

# Perfect recall with rescore (requires original fp32 embeddings)
indices, scores = tq.search(queries, compressed, k=10,
                            rescore_from=embeddings, rescore_k=20)
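
Rescoring in general means taking a slightly larger candidate list from the approximate search and re-ranking it with exact scores from the original vectors. The sketch below shows that idea, together with a recall@k measure, in plain Python. It is an illustration with hypothetical names, not the package's internals, and it assumes dot-product similarity. Rescoring can only recover a true neighbour that already appears among the candidates.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rescore(query, candidate_ids, embeddings, k):
    """Re-rank candidate ids by exact dot product and keep the top k."""
    scored = [(dot(query, embeddings[i]), i) for i in candidate_ids]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that also appear in the approximate top-k."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Toy data for illustration only.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
query = [1.0, 0.0]
exact_top2 = [0, 1]

# Hypothetical candidate order from an approximate search (3 candidates for k=2).
candidates = [1, 3, 0]
print(recall_at_k(candidates[:2], exact_top2))                         # 0.5
print(recall_at_k(rescore(query, candidates, embeddings, 2), exact_top2))  # 1.0
```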

CLI Usage:

dejan turboquant embeddings.npy
dejan turboquant embeddings.csv --preset quality
dejan turboquant embeddings.npy -o corpus.tq --preset compact

Download files

Built distribution: dejan-1.4-py3-none-any.whl (21.8 kB, Python 3). No source distribution files are available for this release.

File metadata

  • Size: 21.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for dejan-1.4-py3-none-any.whl

Algorithm     Hash digest
SHA256        06423389a5e6b7af842c25a75f180665ce468d0abaaf2901d7dd3bc2c4cd66bf
MD5           f145f9213cd759c63fdec24ded51e00b
BLAKE2b-256   a9bd9d0f8986e50e955872d23b5bdb5f446b28fb72132b86d54621ba762aac1d
