A crawler framework for downloading web novels.

Project description

Crawls novel websites and generates TXT files.

Currently supported

Installation

Usage

CLI

from netnovelcrawler import Crawler
from netnovelcrawler.utils.starter_stopper import AfterChapterStarter, CountStopper

# Arguments unpacked into Crawler: a local directory path and an options dict.
config = (
    r'D:\net_novels\crawler_ocr\lord',
    {
        'start_page': 'https://ccc.xxxx.com/Novel/xxxxxx/',
        'login_info': ('test_login', 'test_pwd'),
        'image_folder': 'vip_images',
        'image_process': 'ocr',
        'text_file': 'xxx.txt',
    }
)

mycrawler = Crawler(*config)
# "10. 某章节" is a placeholder chapter title ("10. Some chapter"); use a real title from the novel.
mycrawler.crawl(starter=AfterChapterStarter("10. 某章节"), stopper=CountStopper(50))
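
Judging by their names, the `starter` and `stopper` arguments bound which chapters get crawled. The sketch below is not the library's implementation. It only illustrates the idea with hypothetical stand-ins (`after_chapter`, `count_limit`, `select_chapters`) that work on a plain list of chapter titles.

```python
from typing import Callable, Iterable, List


def after_chapter(title: str) -> Callable[[str], bool]:
    """Hypothetical starter: returns True for chapters that come after `title`."""
    seen = False

    def should_start(current: str) -> bool:
        nonlocal seen
        if seen:
            return True
        if current == title:
            seen = True  # begin with the next chapter, not this one
        return False

    return should_start


def count_limit(limit: int) -> Callable[[str], bool]:
    """Hypothetical stopper: returns True once `limit` chapters were accepted."""
    count = 0

    def should_stop(current: str) -> bool:
        nonlocal count
        if count >= limit:
            return True
        count += 1
        return False

    return should_stop


def select_chapters(
    titles: Iterable[str],
    starter: Callable[[str], bool],
    stopper: Callable[[str], bool],
) -> List[str]:
    """Keep the chapters between the start condition and the stop condition."""
    selected = []
    for current in titles:
        if not starter(current):
            continue
        if stopper(current):
            break
        selected.append(current)
    return selected


titles = [f"{i}. chapter" for i in range(1, 21)]
picked = select_chapters(titles, after_chapter("10. chapter"), count_limit(3))
print(picked)
```

Keeping the start and stop conditions as separate objects lets them be combined freely, which may be why the library exposes them as two arguments.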

GUI

python -m netnovelcrawlertaskmgr

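Bypassing the slider-CAPTCHA anti-crawler check

Modify chromedriver.exe

  • Open chromedriver.exe in a text editor
  • Find the cdc_ string
  • Replace $cdc_lasutopfhvcZLmcfl with a string of the same length
  • Save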
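
The steps above amount to an equal-length byte replacement in a binary file. Because a general-purpose text editor can alter binary content when it saves, a small script is one alternative. The following is a minimal sketch, not part of netnovelcrawler: `replace_equal_length`, `patch_file`, and the replacement value `$abc_lasutopfhvcZLmcfl` are hypothetical, and the demo runs against a temporary file rather than a real chromedriver.exe.

```python
import os
import tempfile

ORIGINAL = b"$cdc_lasutopfhvcZLmcfl"
REPLACEMENT = b"$abc_lasutopfhvcZLmcfl"  # hypothetical value with the same length


def replace_equal_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new`, refusing unequal lengths."""
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original")
    return data.replace(old, new)


def patch_file(path: str, old: bytes, new: bytes) -> int:
    """Patch the file in place and return how many occurrences were replaced."""
    with open(path, "rb") as f:
        data = f.read()
    patched = replace_equal_length(data, old, new)
    with open(path, "wb") as f:
        f.write(patched)
    return data.count(old)


# Demo on a throwaway file that stands in for the real binary.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bin")
    with open(demo_path, "wb") as f:
        f.write(b"\x00head" + ORIGINAL + b"tail\x00")
    hits = patch_file(demo_path, ORIGINAL, REPLACEMENT)
    with open(demo_path, "rb") as f:
        result = f.read()

print(hits, len(result))
```

The length check matters because a replacement of a different length would shift every byte after it and could corrupt the executable. Back up the original file before patching it.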

Download files

Source Distribution

netnovelcrawler-0.0.2.tar.gz (25.4 kB)

Uploaded Source

Built Distribution

netnovelcrawler-0.0.2-py3-none-any.whl (26.2 kB)

Uploaded Python 3

File details

Details for the file netnovelcrawler-0.0.2.tar.gz.

File metadata

  • Download URL: netnovelcrawler-0.0.2.tar.gz
  • Upload date:
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.16

File hashes

Hashes for netnovelcrawler-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       4264c6c68a8ff46aaed21f97881f988dde6bd392e7fd05ccd77fa51e7d90dd18
MD5          35794f4b0455ffe7166c72984fc2dcbc
BLAKE2b-256  8e59e22840195f7f24d51fb0c8a501d94385218e84eadd3efeda008e5f7859cb

File details

Details for the file netnovelcrawler-0.0.2-py3-none-any.whl.

File hashes

Hashes for netnovelcrawler-0.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       13e8f51512a33c739366b502caa3c17d752e6e48bd12e8f20defee2a600eb811
MD5          32184814f367d73c564675c2f70387d3
BLAKE2b-256  dce722e1501b93433511075322f2c34e02cecb5aeaa31028f99a7cee677024be
