Scrapy utils

scrapyu

UserAgentMiddleware

# settings.py
USERAGENT_TYPE = 'firefox'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.UserAgentMiddleware': 543,
}
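
The README does not show how the middleware uses `USERAGENT_TYPE`, so the following is only a minimal sketch of how a downloader middleware of this kind could work. The class name `SketchUserAgentMiddleware`, the `USER_AGENTS` pool, and the default value are assumptions made for illustration, not scrapyu's implementation; only the `from_crawler` and `process_request` hooks follow Scrapy's documented middleware interface.

```python
import random

# Hypothetical user-agent pool; scrapyu's real list is not shown in this README.
USER_AGENTS = {
    "firefox": [
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ],
    "chrome": [
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    ],
}


class SketchUserAgentMiddleware:
    """Illustrative stand-in, not scrapyu's UserAgentMiddleware."""

    def __init__(self, ua_type="firefox"):
        self.ua_type = ua_type

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy builds middlewares through from_crawler(); the setting name
        # matches the README, the "firefox" default is an assumption.
        return cls(ua_type=crawler.settings.get("USERAGENT_TYPE", "firefox"))

    def process_request(self, request, spider):
        # Set a user agent of the configured type on each outgoing request.
        request.headers["User-Agent"] = random.choice(USER_AGENTS[self.ua_type])
        return None  # None lets Scrapy continue handling the request
```

The number `543` in `DOWNLOADER_MIDDLEWARES` is the middleware's order value, which Scrapy uses to position it relative to the other downloader middlewares.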

MarkdownPipeline

# settings.py
MARKDOWNS_STORE = 'news'
ITEM_PIPELINES = {
    'scrapyu.MarkdownPipeline': 300,
}

# items.py
import scrapy

class MarkdownItem(scrapy.Item):
    html = scrapy.Field()
    filename = scrapy.Field()
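
As a rough illustration of how the two fields might be filled, the sketch below builds an item from a page URL and its HTML. The helpers `filename_from_url` and `build_markdown_item` are hypothetical, and the README does not say whether `filename` should carry an extension or whether the pipeline accepts plain dicts, so treat the naming scheme as an assumption.

```python
import re
from urllib.parse import urlparse


def filename_from_url(url):
    """Build a filesystem-safe name from a URL path (hypothetical helper)."""
    path = urlparse(url).path.strip("/") or "index"
    # Replace every run of characters outside a conservative safe set.
    return re.sub(r"[^A-Za-z0-9._-]+", "-", path)


def build_markdown_item(url, html):
    # Keys mirror the MarkdownItem fields above; a plain dict is used here
    # so the sketch runs without Scrapy installed.
    return {"html": html, "filename": filename_from_url(url)}
```

In a spider callback, the same two values would typically come from `response.url` and `response.text`.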

FirefoxCookiesMiddleware

# settings.py
GECKODRIVER_PATH = 'geckodriver'
DOWNLOADER_MIDDLEWARES = {
    'scrapyu.FirefoxCookiesMiddleware': 543,
}

MongoDBPipeline

# settings.py
MONGODB_URI = 'mongodb://localhost:27017'
# or
# MONGODB_HOST = 'localhost'
# MONGODB_PORT = 27017
MONGODB_DATABASE = 'scrapyu'
MONGODB_COLLECTION = 'items'
MONGODB_BUFFER_LENGTH = 100
MONGODB_UNIQUE_KEY = 'title name'       # use only if no buffer
# or
# MONGODB_UNIQUE_KEY = ['title', 'name']
# MONGODB_UNIQUE_KEY = ('title', 'name')
ITEM_PIPELINES = {
    'scrapyu.MongoDBPipeline': 300,
}
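
The settings above show `MONGODB_UNIQUE_KEY` written as a space-separated string, a list, or a tuple. One way such values could be normalized and turned into a lookup filter is sketched below; `normalize_unique_key` and `unique_filter` are hypothetical names, and how scrapyu actually handles the key is not documented here.

```python
def normalize_unique_key(value):
    """Return the unique key as a list of field names."""
    if value is None:
        return []
    if isinstance(value, str):
        # 'title name' -> ['title', 'name']
        return value.split()
    return list(value)


def unique_filter(item, value):
    """Build a filter dict from the item's unique-key fields."""
    return {key: item[key] for key in normalize_unique_key(value)}
```

A filter of this shape is what an upsert-style write would match on, which may be why the README notes that the unique key is meant for use without a buffer.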

RedisDupeFilter

# settings.py
DUPEFILTER_CLASS = 'scrapyu.RedisDupeFilter'
REDIS_DUPE_HOST = 'localhost'
REDIS_DUPE_PORT = 6379
REDIS_DUPE_DATABASE = 0
REDIS_DUPE_PASSWORD = 'password'
REDIS_DUPE_KEY = 'requests'
REDIS_DUPE_IGNORE_URL = r'http://scrapytest.org/\d+'
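
The README does not describe what `REDIS_DUPE_IGNORE_URL` does. Assuming it names URLs that should bypass duplicate filtering, the behavior could look roughly like the sketch below. `SketchDupeFilter` is hypothetical: it keeps fingerprints in an in-memory set standing in for the Redis key, takes a URL string rather than a Scrapy request, and its use of `re.match` is a guess.

```python
import hashlib
import re


class SketchDupeFilter:
    """Illustrative stand-in, not scrapyu's RedisDupeFilter."""

    def __init__(self, ignore_pattern=None):
        self.ignore = re.compile(ignore_pattern) if ignore_pattern else None
        self.seen = set()  # stands in for the Redis set under REDIS_DUPE_KEY

    def request_seen(self, url):
        # Assumed semantics: URLs matching the ignore pattern are never
        # treated as duplicates and are not recorded.
        if self.ignore and self.ignore.match(url):
            return False
        fingerprint = hashlib.sha1(url.encode("utf-8")).hexdigest()
        if fingerprint in self.seen:
            return True
        self.seen.add(fingerprint)
        return False
```

With the pattern from the settings above, `http://scrapytest.org/123` would be fetched every time it is requested, while other URLs would be fetched only once.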

genspider

List the available templates:

scrapyu genspider -l

This prints:

Available templates:
  single
  single_splash

Generate a single-file spider:

scrapyu genspider python www.python.org -t single

Generate a single-file spider with Splash integration:

scrapyu genspider python www.python.org -t single_splash
