
clanker_score rates search results by keyword density.

Project description

clanker_score

This project contains two Python 3 scripts:

  • ddg_results_by_kwd_density
  • keyword_density

ddg_results_by_kwd_density

This script ranks DuckDuckGo search results by keyword density.

Search results today are dominated by Web pages with high keyword density, which suggests they are search-engine-optimized or even AI-generated. This script reads your search terms, retrieves search results from DuckDuckGo (ddg) via frogfind, fetches each page, and computes that page's keyword density. Please note that frogfind.com is rate limited.

This script reads your search terms from standard input.

It writes a markdown report on standard output.
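The final reporting step might look roughly like the sketch below. It is not the script's actual code: the function name `render_report`, the table layout, and the choice to list the lowest density first are all assumptions made for illustration, and the URLs are placeholders.

```python
def render_report(query, scored_results):
    """Build a markdown report from (title, url, density) tuples.

    Assumption: pages with the lowest keyword density are listed first.
    The real script's report layout may differ.
    """
    ranked = sorted(scored_results, key=lambda item: item[2])
    lines = [
        f"# Results for: {query}",
        "",
        "| Density | Page |",
        "| ---: | :--- |",
    ]
    for title, url, density in ranked:
        lines.append(f"| {density:.3f} | [{title}]({url}) |")
    return "\n".join(lines)


# Hypothetical, hand-made scores; the real script computes them per page.
example = [
    ("Glossy page", "https://example.com/a", 0.05),
    ("Plain page", "https://example.org/b", 0.01),
]
print(render_report("keyword density", example))
```

Sorting in ascending order puts the least glossy pages at the top of the report, which fits the advice further down about which results are worth reading.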

keyword_density

This script reads a document from standard input.

It writes a report on standard output, showing keywords and their average density. Documents optimized for search-engine placement show keyword density that is abnormally high (>= 0.02). AI-generated text also exhibits this characteristic. Although AI text is indistinguishable from human text in most regards, it is unlikely that AI will ever be made less wordy.

Installation

Windows

> python -m venv ClankerScore

> ClankerScore\Scripts\activate

> pip install clanker_score --upgrade

Linux

> python -m venv ~/ClankerScore

> source ~/ClankerScore/bin/activate

> pip install clanker_score --upgrade 

How to ....

Windows

> ClankerScore\Scripts\activate

> ddg_results_by_kwd_density

Linux

> source ~/ClankerScore/bin/activate

> ddg_results_by_kwd_density

Why ....

Type in your search terms. ddg_results_by_kwd_density does not display a prompt; it simply waits for input.

Press Enter.

ddg_results_by_kwd_density will do a DuckDuckGo search. It will visit each page in the search results so you don't have to. It will present a report of your search results showing the keyword density of each page. This is a clue to how piquant the content of each page is likely to be.

Keywords are not taken from your search terms. Instead they are the seven words most commonly occurring on the page. If these seven words are repeated on the page to an unusual degree, then it is a good assumption that the author designed the page to appear high in search results.
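The keyword selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the package's implementation: density is taken here to be a word's occurrences divided by the total word count, the "average density" is the mean over the seven keywords, and the small stop-word list is hypothetical. The function names are invented for this sketch.

```python
import re
from collections import Counter

KEYWORD_COUNT = 7          # from the README: the seven most common words
DENSITY_THRESHOLD = 0.02   # from the README: the "abnormally high" cutoff

# Hypothetical stop-word list; the real script may filter differently or not at all.
STOP_WORDS = frozenset({"a", "an", "and", "in", "is", "it", "of", "the", "to"})


def keyword_densities(text, keyword_count=KEYWORD_COUNT, stop_words=STOP_WORDS):
    """Return [(word, density)] for the most frequent non-stop words.

    Assumption: density = occurrences of the word / total words in the text.
    """
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not tokens:
        return []
    counts = Counter(token for token in tokens if token not in stop_words)
    total = len(tokens)
    return [(word, count / total) for word, count in counts.most_common(keyword_count)]


def average_density(text, keyword_count=KEYWORD_COUNT, stop_words=STOP_WORDS):
    """Mean density over the selected keywords (0.0 for empty input)."""
    densities = keyword_densities(text, keyword_count, stop_words)
    if not densities:
        return 0.0
    return sum(density for _, density in densities) / len(densities)


def looks_glossy(text, threshold=DENSITY_THRESHOLD):
    """True when the average keyword density reaches the README's cutoff."""
    return average_density(text) >= threshold
```

A short, repetitive string such as "spam spam spam eggs" scores far above the cutoff under these assumptions. Real pages are much longer, so their averages sit closer to the 0.02 boundary.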

Keyword density is a measure of "gloss." Most people will read pages with high keyword density as unusually glossy. Keyword density is not necessarily related to how genuine the page content appears to be otherwise, but most people will look askance at a page that is too glossy.

It should come as no big surprise that the pages that appear high on search results have been designed that way. They are deliberately glossy with high keyword density. You may consider whether to skip reading them or even loading them in your browser. Chances are good that the glossy pages are mostly advertising.

Generally you will find interspersed in your results a handful of sites with low keyword density. These are likely from universities, government sites, and research institutions that have sources of revenue beyond advertising. You may consider whether to load these up and skim through them. Probably they will show a publication date, author, and list of references, which will move your research forward.

AI-generated sites often exhibit high keyword density. This is probably deliberate, so that they garner advertising revenue. However, it may also be due to "bot 'splaning," which is polly-paraphrasing a series of several (perhaps contradictory) articles.

Keyword density is not the only measure of gloss. There are others that have been developed to measure ratios between parts of speech. Unfortunately none of these distinguish sharply between pages that naturally convey genuine information and pages that have been designed to convey fluff for ulterior purposes. It is unlikely that combining measures of gloss will result in a tool that discriminates much better than keyword density by itself.

  • Piskorski, Jakub, Marcin Sydow, and Dawid Weiss. "Exploring Linguistic Features for Web Spam Detection: A Preliminary Study." AIRWeb '08: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web. Ed. Carlos Castillo, Kumar Chellapilla, and Dennis Fetterly. New York: ACM, Apr. 2008. 25-28. ISBN:9781605581590. DOI:10.1145/1451983. 09 Nov. 2025 <https://users.pja.edu.pl/~msyd/lingFeat08draft.pdf>.

ddg_results_by_kwd_density is cumbersome by design, too cumbersome to be a daily driver. We don't want to make it easy for just anyone to censor all of their search results. Rather, it is meant as a learning tool. It demonstrates, on one particular and not very compelling dimension, how rotten search results can be. It should not be necessary to download and scan each and every page. You should be able to train yourself to ignore, a priori, results that include handfuls of pages from unauthoritative sites.

This README file has a keyword density of approximately 0.026.
