
Crawl telegra.ph for nude pictures and videos

Project description

Nude Crawler

Nude Crawler crawls all pages on telegra.ph for today and N past days for specific words, counts nude images, non-nude images and videos (videos are counted but not analysed), and reports pages which look interesting (e.g. have more than 10 nude images, or have at least one video).
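For example, to scan today plus the past 3 days and report pages with at least 10 nude images or at least one video (the -d, --nude and --video options are documented below; the thresholds here are just illustrative):

nudecrawler -d 3 --nude 10 --video 1 sasha-grey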

Install

pip3 install nudecrawler

Alternatively, install right from the git repo:

pip3 install git+https://github.com/yaroslaff/nudecrawler

Start adult-image-detector

If you want nudity detection, start the optional adult-image-detector:

docker run -d -p 9191:9191 opendating/adult-image-detector
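Before a long crawl, you can make sure the detector container is up (a quick sanity check; it assumes nothing beyond the container answering HTTP on the mapped port 9191):

curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9191/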

Or just add the -a option if you do not want to filter by the number of nude images.

Launch Nude Crawler!

(Links are intentionally altered; I do not want to violate GitHub policy.)

$ nudecrawler sasha-grey
INTERESTING https://telegra.ph/sasha-grey-XXXXXXXX
  Nude: 0 non-nude: 0
  Total video: 1

INTERESTING https://telegra.ph/sasha-grey-XXXXX
  Nude: 9 non-nude: 6

INTERESTING https://telegra.ph/sasha-grey-XXXXX
  Nude: 9 non-nude: 6

INTERESTING https://telegra.ph/sasha-grey-XXXXX
  Nude: 6 non-nude: 3

Working with wordlists

In the simplest case (a not-so-big wordlist), just use -w, like:

# verbose, no-filtering (report all pages), use wordlist
nudecrawler -v -a -w wordlist.txt

If you have a very large wordlist, it is better to pre-check it with a faster tool like bulk-http-check: it does only a simple check (we just need to filter 200 pages from 404 ones), and handles millions of pages per hour even on the smallest VPS.

Convert wordlist to urllist

# only generate URLs 
nudecrawler -v -w wordlist.txt --urls > urls.txt

Verify it with bulk-http-check to get an output file in this format:

https://telegra.ph/abazhurah-02-26 OK 404
https://telegra.ph/ab-03-01 OK 200
https://telegra.ph/aaronov-02-22 OK 404
https://telegra.ph/abazhurami-02-25 OK 404

Filter it to leave only existing pages, and strip the date from each URL:

grep "OK 200" .local/urls-status.log | cut -f 1 -d" "| sed 's/-[0-9]\+-[0-9]\+$//g' | sort | uniq > .local/urs.txt

The list (urls.txt) will look like:

https://telegra.ph/
https://telegra.ph/a
https://telegra.ph/ab
https://telegra.ph/aba
https://telegra.ph/Abakan
....

This list (~300 KB, 11k URLs) was created from a 1.5M-word Russian wordlist. It contains only the words which had at least one page with that title over the last 10 days. So it has words like 'Анжелика' or 'Анфиса' (beautiful female names), but no words like 'Абажурами' ('lampshades') or 'Абажуродержателем' ('lampshade holder').

Now you can use this file as a wordlist (nudecrawler will detect that each entry is already a base URL, and will only append the date to it).
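The per-day URL scheme is visible in the bulk-http-check output above (base URL plus an -MM-DD suffix). A minimal sketch of how candidate URLs for one word could be generated under that assumption (GNU date; Abakan is taken from the sample list above):

word=Abakan
for i in 0 1 2; do
  echo "https://telegra.ph/$word-$(date -d "-$i day" +%m-%d)"
done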

Options

usage: nudecrawler [-h] [-d DAYS] [--nude NUDE] [--video VIDEO] [-u URL] [-v] [words ...]

Telegra.ph Spider

positional arguments:
  words

options:
  -h, --help            show this help message and exit
  -d DAYS, --days DAYS
  --nude NUDE           Interesting if N nude images
  --video VIDEO         Interesting if N video
  -u URL, --url URL     process one url
  -v, --verbose         verbose
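For example, to verbosely check a single already-known page with -u (the URL here is a placeholder):

nudecrawler -v -u https://telegra.ph/some-page-01-01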

Download files

Download the file for your platform.

Source Distributions

No source distribution files are available for this release.

Built Distribution

nudecrawler-0.0.9-py3-none-any.whl (7.7 kB)


File details

Details for the file nudecrawler-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: nudecrawler-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 7.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.2

File hashes

Hashes for nudecrawler-0.0.9-py3-none-any.whl:

  • SHA256: 8d5775ecc14209815bc542c10aaf25a824912c0543ab13e77cd1174da48c6d48
  • MD5: 48175491ac92f64f6fd786cb19d42736
  • BLAKE2b-256: c037b0b4e497980d1e77b859cc26a6d24307e27a80d40c03ecc3c5ce5d73fdf9
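If you download the wheel manually, you can verify it against the published SHA256 before installing (assumes the file is in the current directory):

echo "8d5775ecc14209815bc542c10aaf25a824912c0543ab13e77cd1174da48c6d48  nudecrawler-0.0.9-py3-none-any.whl" | sha256sum -c -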

