
Machine learning utilities by DEJAN.


Dejan: SEO Machine Learning Utilities

Dejan is a growing collection of machine learning utilities for search engine optimization (SEO) tasks. The repository will be updated with new tools and features over time to help SEO professionals streamline their workflows with ML.

Installation

You can install the package using pip:

pip install dejan

Current Utilities

roo

Purpose: Fetches and processes data from the Algoroo API, providing insights into search engine fluctuations.

Search Engine Options:

  • 2: Google.com (Desktop)
  • 3: Google.com.au (Desktop)
  • 4: Google.com (Mobile)
  • 5: Google.com.au (Mobile)

Output: Data is returned either as raw JSON or as a pandas DataFrame for further analysis. The as_dataframe argument of get_roo selects the format.

Example Usage:

from dejan import roo

def main():
    # Mapping of search engines to their corresponding identifiers
    search_engines = {
        2: "google.com/desktop",
        3: "google.com.au/desktop",
        4: "google.com/mobile",
        5: "google.com.au/mobile"
    }
    
    # Choose a search engine by its identifier (any key of search_engines above)
    search_engine = 2
    
    # Fetch data as a pandas DataFrame
    roo_data = roo.get_roo(search_engine, as_dataframe=True)
    
    # Display the first few rows of the DataFrame
    print(f"Data for search engine {search_engine} ({search_engines[search_engine]}):")
    print(roo_data.head())

if __name__ == "__main__":
    main()

linkbert

Purpose: Uses DEJAN's LinkBERT model to predict which tokens in a text are likely to be links, which is useful for analyzing link placement within content.

Grouping Modes:

  • subtoken: Returns individual subword tokens classified as links.
  • token: Merges subword tokens back into whole tokens (words).
  • phrase: Groups predictions into phrases, treating the entire phrase as a link if any part of it is classified as a link.
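The three grouping modes can be illustrated with a small, self-contained sketch. It is not the dejan implementation: it assumes WordPiece-style subtokens (continuation pieces start with "##") and one link/no-link flag per subtoken, and all function names are hypothetical. A word counts as a link if any of its pieces is flagged, and, as a further assumption, phrase mode joins runs of consecutive link words into one phrase.

```python
from typing import List, Tuple

# Hypothetical classifier output: WordPiece subtokens ("##" marks a
# continuation piece), each paired with a link / no-link flag.
Labeled = List[Tuple[str, bool]]


def group_subtokens(pieces: Labeled) -> List[str]:
    """subtoken mode: return the flagged pieces unchanged."""
    return [tok for tok, is_link in pieces if is_link]


def merge_words(pieces: Labeled) -> Labeled:
    """Merge continuation pieces into whole words; a word is a link
    if any of its pieces is flagged."""
    words: Labeled = []
    for tok, is_link in pieces:
        if tok.startswith("##") and words:
            text, flag = words[-1]
            words[-1] = (text + tok[2:], flag or is_link)
        else:
            words.append((tok, is_link))
    return words


def group_tokens(pieces: Labeled) -> List[str]:
    """token mode: return whole words classified as links."""
    return [word for word, is_link in merge_words(pieces) if is_link]


def group_phrases(pieces: Labeled) -> List[str]:
    """phrase mode (assumed behavior): join runs of consecutive link words."""
    phrases: List[str] = []
    run: List[str] = []
    for word, is_link in merge_words(pieces):
        if is_link:
            run.append(word)
        elif run:
            phrases.append(" ".join(run))
            run = []
    if run:
        phrases.append(" ".join(run))
    return phrases


pieces = [("predict", False), ("natural", True), ("link", True),
          ("place", False), ("##ment", True), ("within", False),
          ("web", True), ("content", True)]
print(group_subtokens(pieces))
print(group_tokens(pieces))
print(group_phrases(pieces))
```

Note how "place" + "##ment" becomes the link word "placement" in token mode even though only its second piece was flagged.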

Example Usage:

from dejan import linkbert

def main():
    # Initialize the LinkBERTInference model
    model = linkbert.LinkBERTInference()

    # Sample text for prediction
    text = "LinkBERT is a model developed by Dejan Marketing designed to predict natural link placement within web content."

    print("Input Text:")
    print(text)
    print("-" * 50)

    # Group by subtoken
    links_subtoken = model.predict_link_tokens(text, group="subtoken")
    print(f"Predicted link tokens (subtoken): {links_subtoken}")

    # Group by token
    links_token = model.predict_link_tokens(text, group="token")
    print(f"Predicted link tokens (token): {links_token}")

    # Group by phrase
    links_phrase = model.predict_link_tokens(text, group="phrase")
    print(f"Predicted link tokens (phrase): {links_phrase}")

if __name__ == "__main__":
    main()

turboquant

Purpose: Compresses dense vector embeddings by 3.5x to 15.5x using hybrid residual quantization while keeping retrieval recall between about 0.82 and 0.998, depending on the preset. Based on Google's TurboQuant (arXiv:2504.19874).
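To show what "residual" means here, the sketch below runs a generic two-stage residual quantizer on one random vector: a 4-bit uniform scalar quantizer, then a 1-bit sign quantizer applied to whatever the first stage missed. These simple quantizers are stand-ins chosen for illustration, not TurboQuant's actual quantizers, and all function names are hypothetical.

```python
import random
from typing import List


def quantize_uniform(x: List[float], bits: int) -> List[float]:
    """Uniform scalar quantizer with 2**bits levels spanning [min(x), max(x)];
    returns the dequantized values."""
    lo, hi = min(x), max(x)
    step = (hi - lo) / (2 ** bits - 1) if hi > lo else 1.0
    return [lo + round((v - lo) / step) * step for v in x]


def quantize_sign(x: List[float]) -> List[float]:
    """1-bit quantizer: keep each sign, scaled by the mean absolute value."""
    scale = sum(abs(v) for v in x) / len(x)
    return [scale if v >= 0 else -scale for v in x]


def mse(a: List[float], b: List[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


random.seed(0)
vec = [random.gauss(0.0, 1.0) for _ in range(256)]

stage1 = quantize_uniform(vec, bits=4)            # coarse 4-bit pass
residual = [v - q for v, q in zip(vec, stage1)]   # error left by stage 1
stage2 = quantize_sign(residual)                  # 1-bit pass on the residual
recon = [a + b for a, b in zip(stage1, stage2)]   # sum of both stages

print(f"4-bit only:    MSE = {mse(vec, stage1):.6f}")
print(f"4-bit + 1-bit: MSE = {mse(vec, recon):.6f}")
```

Because the 1-bit stage scales each sign by the mean absolute residual (the least-squares optimal scale for a fixed sign pattern), adding it always lowers the reconstruction error; each extra stage trades more bits per dimension for accuracy.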

Presets:

Preset     Method       Recall    Compression
quality    N4+N4+TQ1    ~0.998    3.5x
balanced   N4+TQ1       ~0.963    6.4x
compact    TQ1+TQ1      ~0.820    15.5x
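The compression ratios roughly match simple bit arithmetic if one assumes that an N4 stage stores 4 bits per dimension and a TQ1 stage stores 1 bit per dimension (an assumption; the labels are not defined here), compared with 32 bits per dimension for float32:

```python
# Assumption: "N4" = 4 bits per dimension, "TQ1" = 1 bit per dimension.
FP32_BITS = 32
STAGE_BITS = {"N4": 4, "TQ1": 1}


def nominal_ratio(method: str) -> float:
    """Compression ratio vs float32, ignoring any per-vector overhead."""
    bits_per_dim = sum(STAGE_BITS[stage] for stage in method.split("+"))
    return FP32_BITS / bits_per_dim


for preset, method in [("quality", "N4+N4+TQ1"),
                       ("balanced", "N4+TQ1"),
                       ("compact", "TQ1+TQ1")]:
    print(f"{preset:<9} {method:<10} {nominal_ratio(method):.2f}x")
```

This gives about 3.56x, 6.4x and 16x. The listed figures are equal to or slightly below these nominal values, which would be consistent with some per-vector overhead such as stored norms or scales, though the source does not say so.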

Example Usage:

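import numpy as np
from dejan import turboquant

# Placeholder data for illustration; substitute your own embeddings
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((10_000, 384), dtype=np.float32)  # (n, d) corpus
queries = rng.standard_normal((5, 384), dtype=np.float32)          # (m, d) queries, same d

# Compress
tq = turboquant.TurboQuant(preset="balanced")
compressed = tq.compress(embeddings)  # (n, d) float32 numpy array
tq.save(compressed, "corpus.tq")

# Load and search
tq, compressed = turboquant.TurboQuant.load("corpus.tq")
indices, scores = tq.search(queries, compressed, k=10)

# Near-perfect recall: rescore the top rescore_k candidates with the
# original fp32 embeddings (they must still be available at search time)
indices, scores = tq.search(queries, compressed, k=10,
                             rescore_from=embeddings, rescore_k=20)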
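The rescore_from and rescore_k options suggest the common two-stage retrieval pattern: shortlist rescore_k candidates using the compressed vectors, then re-rank that shortlist with exact scores from the original embeddings and keep the top k. The sketch below illustrates that pattern in plain Python, assuming dot-product similarity; search_with_rescore is a hypothetical name and this is not the library's code.

```python
import heapq
import random
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def search_with_rescore(query: Vector, approx: List[Vector], exact: List[Vector],
                        k: int, rescore_k: int) -> List[Tuple[int, float]]:
    """Hypothetical two-stage search returning (index, exact score) pairs."""
    # Stage 1: score every item with its cheap approximation, keep rescore_k.
    shortlist = heapq.nlargest(rescore_k, range(len(approx)),
                               key=lambda i: dot(query, approx[i]))
    # Stage 2: re-rank the shortlist with full-precision scores, keep k.
    rescored = [(i, dot(query, exact[i])) for i in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:k]


random.seed(1)
exact = [[random.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]
approx = [[float(round(v)) for v in vec] for vec in exact]  # crude stand-in for compressed vectors
query = [random.gauss(0.0, 1.0) for _ in range(16)]

top = search_with_rescore(query, approx, exact, k=10, rescore_k=20)
print(top[:3])
```

Only true neighbors that make it into the shortlist can be recovered, so a larger rescore_k raises recall at the cost of more exact dot products.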

CLI Usage:

dejan turboquant embeddings.npy
dejan turboquant embeddings.csv --preset quality
dejan turboquant embeddings.npy -o corpus.tq --preset compact



Download files

Download the file for your platform.

Source Distribution

dejan-1.5.tar.gz (21.0 kB)

Uploaded Source

Built Distribution


dejan-1.5-py3-none-any.whl (22.7 kB)

Uploaded Python 3

File details

Details for the file dejan-1.5.tar.gz.

File metadata

  • Download URL: dejan-1.5.tar.gz
  • Upload date:
  • Size: 21.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for dejan-1.5.tar.gz
Algorithm Hash digest
SHA256 2a40e10a673cb83615e4ba5be83da544e30d6edea37a4689905dc1b2fa7fb46a
MD5 f89ea37e46a6a329e78e8754614e50fd
BLAKE2b-256 a722b125b79b43bff62a3e4e101f39dcf80403b090d309210a073581625903a5


File details

Details for the file dejan-1.5-py3-none-any.whl.

File metadata

  • Download URL: dejan-1.5-py3-none-any.whl
  • Upload date:
  • Size: 22.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for dejan-1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 b057c1be05ba16b04d43246b760ac58fe3695a41778b45b779a4bf6422de6c57
MD5 1f5f57c3d33e642c20a12bb220a7570b
BLAKE2b-256 139ecd21e0b886093587b6fd8533800f465029bb5412afba512adf548d73e11b

