Scrapy Database Loader Wrapper with SQLAlchemy

scrapy_loaders

Scrapy Pipelines Loaders

  • Free software: MIT license

Install

pip install scrapy_loaders

In your Scrapy project (example: SpiderProject)

models.py

from sqlalchemy import (
    Column,
    String,
    Text,
)
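# On SQLAlchemy 1.4+, declarative_base is also available from sqlalchemy.orm.
from sqlalchemy.ext.declarative import declarative_base

# Shared base class; DECLARATIVE_BASE in settings.py points at this object.
DeclarativeBase = declarative_base()
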

class ItemModel(DeclarativeBase):
    __tablename__ = 'table_name'
    id = Column('id', String(10), primary_key=True)
    name = Column('name', String(60))
    description = Column('description', Text())
    url = Column('url', Text())
    md5sum = Column('md5sum', String(45))
    ...
...

settings.py

...
# PostgreSQL settings; see the SQLAlchemy docs for other drivers and options
DATABASE = {
    'drivername': 'postgresql+psycopg2',
    'host': 'localhost',
    'port': '5432',
    'username': 'username',
    'password': 'password',
    'database': 'attack_mitre',
}
DECLARATIVE_BASE = 'SpiderProject.models.DeclarativeBase'
...
ITEM_PIPELINES = {
    'SpiderProject.pipelines.SpiderProjectDbPipeline': 300,
}
...
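
The README does not say how scrapy_loaders consumes these settings. As a rough illustration only, the sketch below shows how a DATABASE dict of this shape maps onto an SQLAlchemy-style connection URL, and how a dotted path such as DECLARATIVE_BASE can be resolved to an object. The helper names build_database_url and resolve_dotted_path are hypothetical and are not part of scrapy_loaders; it uses only the standard library.

```python
from importlib import import_module
from urllib.parse import quote_plus


def build_database_url(settings):
    """Assemble a connection URL string from a DATABASE-style dict.

    Hypothetical helper, not part of scrapy_loaders.
    """
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    credentials = quote_plus(settings["username"])
    if settings.get("password"):
        credentials += ":" + quote_plus(settings["password"])
    return "{driver}://{creds}@{host}:{port}/{db}".format(
        driver=settings["drivername"],
        creds=credentials,
        host=settings["host"],
        port=settings["port"],
        db=settings["database"],
    )


def resolve_dotted_path(dotted_path):
    """Import 'package.module.Name' and return the object called Name.

    Hypothetical helper, not part of scrapy_loaders.
    """
    module_path, _, attribute = dotted_path.rpartition(".")
    return getattr(import_module(module_path), attribute)


# Same values as the settings.py example above.
DATABASE = {
    "drivername": "postgresql+psycopg2",
    "host": "localhost",
    "port": "5432",
    "username": "username",
    "password": "password",
    "database": "attack_mitre",
}

print(build_database_url(DATABASE))
# A standard-library path stands in for 'SpiderProject.models.DeclarativeBase'.
print(resolve_dotted_path("collections.OrderedDict"))
```

With SQLAlchemy installed, a string of this form can be passed to sqlalchemy.create_engine; whether scrapy_loaders builds its engine this way is an assumption.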

pipelines.py

from SpiderProject.models import ItemModel
from scrapy_loaders.db_loaders import DBLoader
from scrapy_loaders.pipelines import DbPipeline

class ItemLoader(DBLoader):
    model = ItemModel
    hash_fields = ['name', 'description']
    update_fields = hash_fields + ['md5sum']
...

class SpiderProjectDbPipeline(DbPipeline):
    db_loaders = {
        'Item': ItemLoader,
    }
...
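
The README does not document what hash_fields and update_fields do. One plausible reading, given the md5sum column on the model, is that the loader hashes the hash_fields values to detect whether a scraped item has changed. The sketch below illustrates that idea with the standard library only; compute_md5sum and needs_update are hypothetical names, and the actual behavior of DBLoader may differ.

```python
import hashlib


def compute_md5sum(item, hash_fields):
    """Return an MD5 hex digest over the selected fields of an item.

    Hypothetical helper; assumes the item behaves like a dict.
    """
    digest = hashlib.md5()
    for field in hash_fields:
        value = item.get(field) or ""
        digest.update(str(value).encode("utf-8"))
        # Separator byte so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(b"\x1f")
    return digest.hexdigest()


def needs_update(stored_md5sum, item, hash_fields):
    """True when the freshly computed digest differs from the stored one."""
    return stored_md5sum != compute_md5sum(item, hash_fields)


hash_fields = ["name", "description"]
scraped = {"name": "Example", "description": "First version", "url": "http://example.com"}

stored = compute_md5sum(scraped, hash_fields)

# Changing a hashed field flags the row for update; changing url does not.
edited = dict(scraped, description="Second version")
moved = dict(scraped, url="http://example.org")
print(needs_update(stored, edited, hash_fields))
print(needs_update(stored, moved, hash_fields))
```

Under this reading, update_fields would list the columns rewritten when the digest changes, which would explain why md5sum is appended to it in the example above.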

Features

Tests

TODO: tests

nosetests --with-coverage --cover-inclusive --cover-package=scrapy_loaders --cover-html

Source Distribution

scrapy_loaders-0.0.2.tar.gz (4.3 kB)
