Scrapy Database Loader Wrapper with SQLAlchemy

scrapy_loaders

Database loaders for Scrapy item pipelines, built on SQLAlchemy.

  • Free software: MIT license

Install

pip install scrapy_loaders

In your Scrapy project (example: SpiderProject)

models.py

from sqlalchemy import (
    Column,
    String,
    Text,
)
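# On SQLAlchemy 1.4+ the preferred import is sqlalchemy.orm.declarative_base;
# the path below still works there but emits a deprecation warning.
from sqlalchemy.ext.declarative import declarative_base

# Shared base class; settings.py points DECLARATIVE_BASE at this object.
DeclarativeBase = declarative_base()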

class ItemModel(DeclarativeBase):
    __tablename__ = 'table_name'
    id = Column('id', String(10), primary_key=True)
    name = Column('name', String(60))
    description = Column('description', Text())
    url = Column('url', Text())
    md5sum = Column('md5sum', String(45))
    ...
...

settings.py

...
# PostgreSQL settings; see the SQLAlchemy documentation for other database drivers
DATABASE = {
    'drivername': 'postgresql+psycopg2',
    'host': 'localhost',
    'port': '5432',
    'username': 'username',
    'password': 'password',
    'database': 'attack_mitre',
}
DECLARATIVE_BASE = 'SpiderProject.models.DeclarativeBase'
...
ITEM_PIPELINES = {
    'SpiderProject.pipelines.SpiderProjectDbPipeline': 300,
}
...
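
The keys of the DATABASE dict mirror the parts of a SQLAlchemy connection URL. As a rough illustration of that mapping (not a description of how scrapy_loaders builds its engine), the sketch below assembles the URL string using only the standard library. The helper name build_url and the sample password are hypothetical.

```python
from urllib.parse import quote_plus

# Hypothetical settings, shaped like the DATABASE dict in settings.py.
DATABASE = {
    'drivername': 'postgresql+psycopg2',
    'host': 'localhost',
    'port': '5432',
    'username': 'username',
    'password': 'p@ss/word',  # sample value chosen to show escaping
    'database': 'attack_mitre',
}


def build_url(settings):
    """Assemble a SQLAlchemy-style connection URL from a settings dict."""
    # Percent-encode the credentials so characters such as '@' or '/'
    # cannot be mistaken for URL delimiters.
    user = quote_plus(settings['username'])
    password = quote_plus(settings['password'])
    return (
        f"{settings['drivername']}://{user}:{password}"
        f"@{settings['host']}:{settings['port']}/{settings['database']}"
    )


url = build_url(DATABASE)
print(url)
```

In a real project, SQLAlchemy's own URL helpers would normally handle this; the manual version only makes the role of each key visible.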

pipelines.py

from SpiderProject.models import ItemModel
from scrapy_loaders.db_loaders import DBLoader
from scrapy_loaders.pipelines import DbPipeline

class ItemLoader(DBLoader):
    model = ItemModel
    hash_fields = ['name', 'description']
    update_fields = hash_fields + ['md5sum']
...

class SpiderProjectDbPipeline(DbPipeline):
    db_loaders = {
        'Item': ItemLoader,
    }
...
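
The names hash_fields and md5sum suggest that a checksum over selected fields is used to detect changed items. How scrapy_loaders actually computes it is not documented here, so the following is only a sketch of the general idea, assuming an MD5 hex digest over the joined field values. The function compute_md5sum and the sample item are hypothetical.

```python
import hashlib


def compute_md5sum(item, hash_fields):
    """Return an MD5 hex digest over the selected fields of a dict-like item.

    Assumption: values are joined in the order given by hash_fields,
    separated by '|'. The real library may do this differently.
    """
    joined = '|'.join(str(item.get(field, '')) for field in hash_fields)
    return hashlib.md5(joined.encode('utf-8')).hexdigest()


hash_fields = ['name', 'description']

# Hypothetical scraped item.
item = {'id': 'T1001', 'name': 'Example', 'description': 'First version',
        'url': 'https://example.com/a'}

original = compute_md5sum(item, hash_fields)

# Changing a field outside hash_fields leaves the checksum untouched.
moved = compute_md5sum(dict(item, url='https://example.com/b'), hash_fields)

# Changing a hashed field produces a different checksum.
edited = compute_md5sum(dict(item, description='Second version'), hash_fields)

print(original == moved, original == edited)
```

An MD5 hex digest is 32 characters long, so it fits in the String(45) md5sum column defined in models.py.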

Features

Tests

TODO: tests

nosetests --with-coverage --cover-inclusive --cover-package=scrapy_loaders --cover-html

Files for scrapy_loaders, version 0.0.1

  • scrapy_loaders-0.0.1.tar.gz (4.3 kB), source distribution