
Scrapy Database Loader Wrapper with SQLAlchemy


scrapy_loaders

Scrapy pipeline loaders: a SQLAlchemy-backed database loader wrapper for Scrapy item pipelines.

  • Free software: MIT license

Install

pip install scrapy_loaders

In your Scrapy project (example: SpiderProject)

models.py

from sqlalchemy import (
    Column,
    String,
    Text,
)
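from sqlalchemy.ext.declarative import declarative_base
# On SQLAlchemy 1.4+, declarative_base is also importable from sqlalchemy.orm.

# Shared base class; settings.py points DECLARATIVE_BASE at this object.
DeclarativeBase = declarative_base()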

class ItemModel(DeclarativeBase):
    __tablename__ = 'table_name'
    id = Column('id', String(10), primary_key=True)
    name = Column('name', String(60))
    description = Column('description', Text())
    url = Column('url', Text())
    md5sum = Column('md5sum', String(45))
    ...
...

settings.py

...
# PostgreSQL settings; see the SQLAlchemy documentation for other database drivers
DATABASE = {
    'drivername': 'postgresql+psycopg2',
    'host': 'localhost',
    'port': '5432',
    'username': 'username',
    'password': 'password',
    'database': 'attack_mitre',
}
DECLARATIVE_BASE = 'SpiderProject.models.DeclarativeBase'
...
ITEM_PIPELINES = {
    'SpiderProject.pipelines.SpiderProjectDbPipeline': 300,
}
...
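
The keys of the DATABASE dict above mirror the components of a SQLAlchemy database URL (drivername://username:password@host:port/database). The README does not show how scrapy_loaders consumes the dict, so the following is only a sketch of the URL that these values correspond to. The helper name database_url is hypothetical and is not part of scrapy_loaders or SQLAlchemy.

```python
from urllib.parse import quote

# Same values as the settings.py example above.
DATABASE = {
    'drivername': 'postgresql+psycopg2',
    'host': 'localhost',
    'port': '5432',
    'username': 'username',
    'password': 'password',
    'database': 'attack_mitre',
}


def database_url(settings):
    """Render a settings dict as a SQLAlchemy-style URL string (hypothetical helper)."""
    # Percent-encode credentials so characters such as '@' or '/' do not
    # break the URL structure.
    user = quote(settings['username'], safe='')
    password = quote(settings['password'], safe='')
    return (
        f"{settings['drivername']}://{user}:{password}"
        f"@{settings['host']}:{settings['port']}/{settings['database']}"
    )


print(database_url(DATABASE))
# postgresql+psycopg2://username:password@localhost:5432/attack_mitre
```

Building the string by hand is mainly useful for checking the settings; the resulting URL can be passed to sqlalchemy.create_engine when connecting manually.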

pipelines.py

from SpiderProject.models import ItemModel
from scrapy_loaders.db_loaders import DBLoader
from scrapy_loaders.pipelines import DbPipeline

class ItemLoader(DBLoader):
    model = ItemModel
    hash_fields = ['name', 'description']
    update_fields = hash_fields + ['md5sum']
...

class SpiderProjectDbPipeline(DbPipeline):
    db_loaders = {
        'Item': ItemLoader,
    }
...
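
The README does not document how DBLoader uses hash_fields, update_fields, or the md5sum column. One plausible reading is that a digest over the hash_fields values is stored in md5sum and compared on later crawls to decide whether a row needs updating. The sketch below illustrates that idea only; compute_md5sum and the sample items are hypothetical and are not taken from the scrapy_loaders source.

```python
import hashlib

# Field list copied from the ItemLoader example above.
hash_fields = ['name', 'description']


def compute_md5sum(item, fields):
    """Return an MD5 hex digest over the selected fields of a dict-like item.

    Hypothetical helper: an assumption about how a change-detection hash
    could be derived, not the package's actual implementation.
    """
    digest = hashlib.md5()
    for field in fields:
        value = item.get(field) or ''
        digest.update(str(value).encode('utf-8'))
        # Separator byte so ('ab', 'c') and ('a', 'bc') hash differently.
        digest.update(b'\x00')
    return digest.hexdigest()


stored = {'name': 'Example', 'description': 'First version', 'url': 'http://a'}
same_content = {'name': 'Example', 'description': 'First version', 'url': 'http://b'}
changed = {'name': 'Example', 'description': 'Second version', 'url': 'http://a'}

# Only fields listed in hash_fields influence the digest.
print(compute_md5sum(stored, hash_fields) == compute_md5sum(same_content, hash_fields))
print(compute_md5sum(stored, hash_fields) == compute_md5sum(changed, hash_fields))
```

Under this reading, a change to url alone would not trigger an update, while a change to name or description would. MD5 is used here only as a cheap change fingerprint, not for security.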

Features

Tests

TODO: tests

nosetests --with-coverage --cover-inclusive --cover-package=scrapy_loaders --cover-html

