Calculate the CF3 hash of an HTML page

CF3

Fingerprinting censors, one blockpage at a time.

What

This tool extracts distinguishing features from blockpages and condenses them into a compact fingerprint.

❯ for f in corpus/*; do cf3 "$f" hash; done > hashes
❯ wc -l hashes
136 hashes
❯ sort -u hashes | wc -l
135
# almost! but there are two blockpages that are essentially the same :)
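
To find out which blockpages share a hash, the file names need to be kept next to their hashes. The following is a minimal Python sketch of that grouping step. The function name `group_duplicates` and the sample data are hypothetical and not part of cf3; in practice the pairs would come from running `cf3 <file> hash` on every file in the corpus.

```python
from collections import defaultdict


def group_duplicates(pairs):
    """Group file names by hash and keep only hashes seen more than once.

    `pairs` is an iterable of (file_name, hash) tuples.
    """
    groups = defaultdict(list)
    for name, digest in pairs:
        groups[digest].append(name)
    return {digest: names for digest, names in groups.items() if len(names) > 1}


# Hypothetical sample data, not real cf3 output.
sample = [("a.html", "h1"), ("b.html", "h2"), ("c.html", "h1")]
print(group_duplicates(sample))
```

Unlike `uniq`, this does not depend on the order of the input, so identical hashes are detected even when they are not on adjacent lines.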

Install

pip3 install cf3

Hash

curl -L --silent https://example.com | cf3

Verbose

❯ cf3 corpus/prod_comodo_securedns_warning.html
title size: 17
meta: 2
script: 2
head size: 2048
body size: 1024
total size: 4096
tag vector summary: 88
tag vector: html,head,title,link,style,meta,meta,body,div,img,div,img,div,button,div,div,h1,h2,p,br,ul,li,a,img,br,br,p,a,div,div,p,script,script

CF3: 17-2-2-33-88-2048-1024-4096
md5: 12c27a55433b1813c02a8a92dd4b3bff
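
Judging from the example above, the CF3 string appears to be the verbose fields joined with dashes, with the fourth field equal to the number of tags in the tag vector (33 here). The Python sketch below parses such verbose output and rebuilds the string under that assumption. The helper names `parse_verbose` and `compose_cf3` are hypothetical, and the field order is inferred from this single example rather than taken from cf3's source.

```python
VERBOSE = """\
title size: 17
meta: 2
script: 2
head size: 2048
body size: 1024
total size: 4096
tag vector summary: 88
tag vector: html,head,title,link,style,meta,meta,body,div,img,div,img,div,button,div,div,h1,h2,p,br,ul,li,a,img,br,br,p,a,div,div,p,script,script

CF3: 17-2-2-33-88-2048-1024-4096
"""


def parse_verbose(text):
    """Turn 'key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def compose_cf3(fields):
    """Rebuild the CF3 string. The field order is an assumption inferred from the example."""
    tag_count = len(fields["tag vector"].split(","))
    parts = [
        fields["title size"],
        fields["meta"],
        fields["script"],
        str(tag_count),
        fields["tag vector summary"],
        fields["head size"],
        fields["body size"],
        fields["total size"],
    ]
    return "-".join(parts)


fields = parse_verbose(VERBOSE)
print(compose_cf3(fields) == fields["CF3"])
```

How the tag vector summary (88) is computed from the tag vector is not shown in the output, so the sketch treats it as an opaque value.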

Dynamic content

The algorithm aims to produce the same fingerprint for pages that share a well-defined structure, even when dynamic content, JS nonces, and other quirks make the raw HTML vary from one request to the next. YMMV.

❯ for i in {1..10}; do curl -L --silent https://youtube.com | cf3; done
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
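
One way such stability can be achieved is by bucketing sizes instead of recording them exactly. The head, body, and total sizes in the verbose example (2048, 1024, 4096) are all powers of two, which suggests some form of bucketing, but the exact scheme is an assumption here. The sketch below rounds a size up to the next power of two; the function name `bucket` and the sample sizes are hypothetical and this is not cf3's actual code.

```python
def bucket(n):
    """Round n up to the next power of two (assumed scheme, not taken from cf3)."""
    if n <= 0:
        return 0
    return 1 << (n - 1).bit_length()


# Two hypothetical responses whose body sizes differ because of dynamic content
# still land in the same bucket.
print(bucket(1500), bucket(1900))
```

With this kind of rounding, small variations in page size leave the fingerprint unchanged, while a page that grows past a bucket boundary produces a different one.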

Notes

Under development! The fingerprinting algorithm might change.

License

This code is deposited in the public domain.
