A crawler framework for downloading online novels from the web.

Project description

Crawls novel websites and generates TXT files.

Currently supported

Installation

Usage

CLI

from netnovelcrawler import Crawler
from netnovelcrawler.utils.starter_stopper import AfterChapterStarter, CountStopper

# Positional arguments for Crawler: a local path, then a dict of settings.
config = (
    r'D:\net_novels\crawler_ocr\lord',
    {
        'start_page': 'https://ccc.xxxx.com/Novel/xxxxxx/',
        'login_info': ('test_login', 'test_pwd'),
        'image_folder': 'vip_images',
        'image_process': 'ocr',
        'text_file': 'xxx.txt',
    }
)

mycrawler = Crawler(*config)
# "某章节" is a placeholder meaning "some chapter"; use a real chapter title here.
mycrawler.crawl(starter=AfterChapterStarter("10. 某章节"), stopper=CountStopper(50))

GUI

python -m netnovelcrawlertaskmgr

Bypassing slider-verification anti-crawler mechanisms

Modify chromedriver.exe

  • Open chromedriver.exe in a text editor
  • Find the cdc_ string
  • Replace $cdc_lasutopfhvcZLmcfl with a string of the same length
  • Save the file
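
The manual steps above can also be scripted. The following is a minimal sketch, not part of netnovelcrawler: the helper names `replace_equal_length` and `patch_file` are hypothetical, and the replacement value in the usage comment is an arbitrary example. It assumes the string to replace appears verbatim in the binary, and it refuses to run if the replacement has a different length, since that would shift every byte after it.

```python
from pathlib import Path


def replace_equal_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Return data with every occurrence of old replaced by new.

    Raises ValueError if the lengths differ or old is not present,
    so the total size of the binary never changes.
    """
    if len(old) != len(new):
        raise ValueError("replacement must be exactly as long as the original")
    if old not in data:
        raise ValueError("original string not found")
    return data.replace(old, new)


def patch_file(path: str, old: bytes, new: bytes) -> None:
    """Patch a binary file in place, keeping a .bak copy of the original."""
    target = Path(path)
    original = target.read_bytes()
    patched = replace_equal_length(original, old, new)
    # Write the backup first so the original can always be restored.
    target.with_name(target.name + ".bak").write_bytes(original)
    target.write_bytes(patched)


# Example call (hypothetical replacement value of the same length):
# patch_file("chromedriver.exe", b"$cdc_lasutopfhvcZLmcfl", b"$abc_lasutopfhvcZLmcfl")
```

Working on bytes rather than text avoids the encoding problems a text editor can introduce when saving an executable.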

Download files

Source Distribution

netnovelcrawler-0.0.2.tar.gz (25.4 kB)

Built Distribution

netnovelcrawler-0.0.2-py3-none-any.whl (26.2 kB)
