clanker_score rates search results by keyword density.

Project description

clanker_score

This project contains two Python 3 scripts:

ddg_results_by_kwd_density
keyword_density

ddg_results_by_kwd_density

This script ranks DuckDuckGo search results by keyword density.

Search results today are dominated by web pages with high keyword density, which suggests they are search-engine-optimized or even AI-generated. This script reads your search terms, retrieves search results from DuckDuckGo (ddg) via frogfind, fetches each page, and computes the keyword density of that page. Please note that frogfind.com is rate limited.

This script reads your search terms from standard input.

It writes a markdown report on standard output.
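As a rough illustration of the ranking step only, here is a minimal sketch that sorts pages by density and formats a markdown table. It is not the script's actual code. The function name `markdown_report`, the table layout, the lowest-density-first ordering, and the sample titles, URLs, and densities are all assumptions made for this example.

```python
def markdown_report(query, scored_pages):
    """Build a markdown report from (title, url, density) tuples.

    Assumption: pages with the lowest keyword density are listed first.
    The real script may order or format its report differently.
    """
    ranked = sorted(scored_pages, key=lambda page: page[2])
    lines = [f"# Results for: {query}", ""]
    lines.append("| Density | Title | URL |")
    lines.append("| ---: | --- | --- |")
    for title, url, density in ranked:
        lines.append(f"| {density:.3f} | {title} | {url} |")
    return "\n".join(lines)


# Hypothetical sample data; these are not real measurements.
sample = [
    ("Shop page", "https://example.com/shop", 0.041),
    ("Lecture notes", "https://example.org/notes", 0.012),
]
print(markdown_report("keyword density", sample))
```

Sorting lowest first puts the least "glossy" pages at the top of the table, which is one plausible reading of "ranks by keyword density."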

keyword_density

This script reads a document from standard input.

It writes a report on standard output, showing keywords and their average density. Documents optimized for search-engine placement show keyword density that is abnormally high (>= 0.02). AI-generated text also exhibits this characteristic. Although AI text is indistinguishable from human text in most regards, it is unlikely that AI will ever be made less wordy.
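The calculation might look something like the sketch below. It is a guess at the method, not the script's source. It assumes that the keywords are the seven most common words in the document (as this README describes), that a keyword's density is its count divided by the total word count, and that the "average density" is the mean over those keywords. The stop-word list, the tokenizing regular expression, and the names `keyword_density`, `STOP_WORDS`, and `THRESHOLD` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the real script may filter words
# differently, or not filter them at all.
STOP_WORDS = {
    "a", "an", "and", "are", "as", "be", "by", "for", "in", "is",
    "it", "of", "on", "or", "that", "the", "this", "to", "with",
}

THRESHOLD = 0.02  # the "abnormally high" value quoted in this README


def keyword_density(text, top_n=7):
    """Return ([(keyword, density), ...], average_density).

    Assumption: density = occurrences of the keyword / total words,
    and the average is the mean over the top_n keywords.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(words)
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)
    if total == 0 or not top:
        return [], 0.0
    densities = [(word, count / total) for word, count in top]
    average = sum(d for _, d in densities) / len(densities)
    return densities, average


densities, average = keyword_density(
    "cheap shoes cheap shoes buy cheap shoes now"
)
for word, density in densities:
    print(f"{word}: {density:.3f}")
print("average:", round(average, 3), "high" if average >= THRESHOLD else "ok")
```

To mimic the script's interface, the text could come from `sys.stdin.read()` instead of a literal string.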

Installation

Windows

> python -m venv ClankerScore

> ClankerScore\Scripts\activate

> pip install clanker_score --upgrade

Linux

> python -m venv ~/ClankerScore

> source ~/ClankerScore/bin/activate

> pip install clanker_score --upgrade

How to ....

Windows

> ClankerScore\Scripts\activate

> ddg_results_by_kwd_density

Linux

> source ~/ClankerScore/bin/activate

> ddg_results_by_kwd_density

Why ....

Type in your search terms. ddg_results_by_kwd_density does not prompt for them.

Press Enter.

ddg_results_by_kwd_density will do a DuckDuckGo search. It will visit each page in the search results so you don't have to. It will present a report of your search results showing the keyword density of each page. This is a clue to how piquant the content of each page is likely to be.
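Before words on a fetched page can be counted, the HTML markup has to be separated from the visible text. How the script does this is not documented here. The following is only a sketch of one way to do it with Python's standard `html.parser` module, and the names `VisibleText` and `visible_text` are hypothetical.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that lies outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the page text with tags, scripts, and styles removed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><style>p{color:red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Title</h1><p>Hello <b>world</b></p></body></html>"
)
print(visible_text(page))
```

Skipping script and style content matters because code and CSS would otherwise be counted as words and distort the density.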

Keywords are not taken from your search terms. Instead, they are the seven most frequently occurring words on the page. If these seven words are repeated on the page to an unusual degree, then it is a good assumption that the author designed the page to appear high in search results.

Keyword density is a measure of "gloss." Most people will read pages with high keyword density as unusually glossy. Keyword density is not necessarily related to how genuine the page content appears to be otherwise, but most people will look askance at a page that is too glossy.

It should come as no big surprise that the pages that appear high on search results have been designed that way. They are deliberately glossy with high keyword density. You may consider whether to skip reading them or even loading them in your browser. Chances are good that the glossy pages are mostly advertising.

Generally you will find interspersed in your results a handful of sites with low keyword density. These are likely from universities, government sites, and research institutions that have sources of revenue beyond advertising. You may consider whether to load these up and skim through them. Probably they will show a publication date, author, and list of references, which will move your research forward.

AI-generated sites often exhibit high keyword density. This is probably deliberate, so that they garner advertising revenue. However, it may also be due to "bot 'splaning," which is polly-paraphrasing a series of several (perhaps contradictory) articles.

Keyword density is not the only measure of gloss. There are others that have been developed to measure ratios between parts of speech. Unfortunately none of these distinguish sharply between pages that naturally convey genuine information and pages that have been designed to convey fluff for ulterior purposes. It is unlikely that combining measures of gloss will result in a tool that discriminates much better than keyword density by itself.

Piskorski, Jakub, Marcin Sydow, and Dawid Weiss. "Exploring Linguistic Features for Web Spam Detection: A Preliminary Study." AIRWeb '08: Proceedings of the 4th International Workshop on Adversarial Information Retrieval on the Web. Ed. Carlos Castillo, Kumar Chellapilla, and Dennis Fetterly. New York: ACM, Apr. 2008. 25-28. ISBN:9781605581590. DOI:10.1145/1451983. 09 Nov. 2025 <https://users.pja.edu.pl/~msyd/lingFeat08draft.pdf>.

ddg_results_by_kwd_density is cumbersome by design, too cumbersome to be a daily driver. We don't want to make it easy for just anyone to censor all of their search results. Rather, it is meant as a learning tool. It demonstrates generally how rotten search results can be on one particular and not very compelling dimension. It should not be necessary to download and scan each and every page. You should be able to train yourself to ignore a priori results that include handfuls of pages from unauthoritative sites.

This README file has a keyword density of approximately 0.026.

Release history

0.1.5 (Nov 10, 2025)
0.1.4 (Nov 10, 2025)
0.1.3 (Nov 10, 2025)
0.1.1 (Nov 10, 2025)
0.1 (Nov 9, 2025), this version

Download files

Source Distribution

clanker_score-0.1.tar.gz (169.4 kB)

Uploaded Nov 9, 2025 Source

Built Distribution

clanker_score-0.1-py3-none-any.whl (173.7 kB)

Uploaded Nov 9, 2025 Python 3

File details

Details for the file clanker_score-0.1.tar.gz.

File metadata

Download URL: clanker_score-0.1.tar.gz
Upload date: Nov 9, 2025
Size: 169.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: python-requests/2.32.5

File hashes

Hashes for clanker_score-0.1.tar.gz
Algorithm	Hash digest
SHA256	`22779e08dc4dbeb1e52636796fafa431685dd11c825f0e754959a877193d2df3`
MD5	`7105e1376cd740fb6e0c94550f7788f0`
BLAKE2b-256	`fa38a694a9a19611b201a3f020d5736611d950fec7caecde61dc5633b55810c3`

File details

Details for the file clanker_score-0.1-py3-none-any.whl.

File metadata

Download URL: clanker_score-0.1-py3-none-any.whl
Upload date: Nov 9, 2025
Size: 173.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: python-requests/2.32.5

File hashes

Hashes for clanker_score-0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1a634399151967ed3ea3994fcee9558f0de429113552bc92a7e8b8171e44ddbe`
MD5	`4300bab78d9a4689ab2e4fd8ec5c28a3`
BLAKE2b-256	`9930ea00d4399b95d49af45a8ec60fba9fa5b1dd872c9244ae91c7de0bd50272`
