Calculate the CF3 hash for an HTML page
CF3
Fingerprinting censors, one blockpage at a time.
What
This tool attempts to extract the distinguishing features of a blockpage and condense them into a compact fingerprint.
❯ for f in corpus/*; do cf3 "$f" hash; done > hashes
❯ wc -l hashes
136 hashes
❯ sort hashes | uniq | wc -l
135
# almost! but there are two blockpages that are essentially the same :)
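To see which blockpages collide, it helps to keep the file name next to each hash. The following is a minimal sketch, assuming you have already collected a mapping from file path to hash; the function names and the sample data are hypothetical and not part of cf3.

```python
from collections import defaultdict


def group_by_hash(fingerprints):
    """Map each hash to the sorted list of files that produced it."""
    groups = defaultdict(list)
    for path, digest in fingerprints.items():
        groups[digest].append(path)
    return {digest: sorted(paths) for digest, paths in groups.items()}


def collisions(fingerprints):
    """Keep only the hashes shared by more than one file."""
    return {
        digest: paths
        for digest, paths in group_by_hash(fingerprints).items()
        if len(paths) > 1
    }


# Hypothetical sample data: file names and hashes are made up for illustration.
sample = {
    "corpus/a.html": "aaaa",
    "corpus/b.html": "bbbb",
    "corpus/c.html": "aaaa",
}

print(collisions(sample))
# {'aaaa': ['corpus/a.html', 'corpus/c.html']}
```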
Install
pip3 install cf3
Hash
curl -L --silent https://example.com | cf3
Verbose
❯ cf3 corpus/prod_comodo_securedns_warning.html
title size: 17
meta: 2
script: 2
head size: 2048
body size: 1024
total size: 4096
tag vector summary: 88
tag vector: html,head,title,link,style,meta,meta,body,div,img,div,img,div,button,div,div,h1,h2,p,br,ul,li,a,img,br,br,p,a,div,div,p,script,script
CF3: 17-2-2-33-88-2048-1024-4096
md5: 12c27a55433b1813c02a8a92dd4b3bff
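Reading the sample output above, the CF3 string appears to be the reported fields joined with dashes, with the number of tags in the tag vector (33 here) as the fourth component. The sketch below parses the verbose report and rebuilds that string. The field order is inferred from this one example only, and the helper name parse_verbose is hypothetical; how the md5 line is derived is not stated here, so it is left out.

```python
# Verbose report copied from the example above (md5 line omitted).
VERBOSE = """\
title size: 17
meta: 2
script: 2
head size: 2048
body size: 1024
total size: 4096
tag vector summary: 88
tag vector: html,head,title,link,style,meta,meta,body,div,img,div,img,div,button,div,div,h1,h2,p,br,ul,li,a,img,br,br,p,a,div,div,p,script,script
"""


def parse_verbose(text):
    """Turn 'key: value' lines into a dict of strings."""
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            report[key] = value
    return report


report = parse_verbose(VERBOSE)
tags = report["tag vector"].split(",")

# Assumed field order, inferred from the sample CF3 line.
cf3 = "-".join([
    report["title size"],
    report["meta"],
    report["script"],
    str(len(tags)),
    report["tag vector summary"],
    report["head size"],
    report["body size"],
    report["total size"],
])

print(cf3)
# 17-2-2-33-88-2048-1024-4096
```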
Dynamic content
The algorithm tries to produce the same fingerprint for pages that share a well-defined structure, even when dynamic content, JS nonces and other quirks make the raw HTML vary widely between requests. YMMV.
❯ for i in {1..10}; do curl -L --silent https://youtube.com | cf3; done
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
9a0edf442a37fbb0fb6e28a122d33e56
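The idea behind this invariance can be illustrated with a much simpler stand-in: hash only the sequence of opening tags and ignore text and attribute values. This is not the cf3 algorithm, just a sketch of why structure-based fingerprints survive nonces and per-request text; the class and function names and the sample pages are hypothetical.

```python
import hashlib
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every opening tag, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # Attribute values (nonces, ids, URLs) are deliberately ignored.
        self.tags.append(tag)


def structural_hash(html):
    """md5 over the comma-joined tag sequence (simplified stand-in, not CF3)."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return hashlib.md5(",".join(collector.tags).encode("utf-8")).hexdigest()


# Same structure, different nonce, script body and text.
page_a = ('<html><head><title>Blocked</title><script nonce="abc123">var t=1;</script>'
          '</head><body><p>Request 1001 denied</p></body></html>')
page_b = ('<html><head><title>Blocked</title><script nonce="zzz999">var t=2;</script>'
          '</head><body><p>Request 2002 denied</p></body></html>')
# Different structure.
page_c = ('<html><head><title>Blocked</title></head>'
          '<body><div><p>Denied</p></div></body></html>')

print(structural_hash(page_a) == structural_hash(page_b))  # True
print(structural_hash(page_a) == structural_hash(page_c))  # False
```

A fingerprint this coarse would confuse any two pages with identical tag sequences, which is presumably why cf3 also reports title, head, body and total sizes.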
Notes
Under development! The fingerprinting algorithm might change.
License
This code is deposited in the public domain.