Scrapy Database Loader Wrapper with SQLAlchemy
scrapy_loaders
Scrapy Pipelines Loaders
Free software: MIT license
Install
pip install scrapy_loaders
In your Scrapy project (example: SpiderProject)
models.py
from sqlalchemy import (
    Column,
    String,
    Text,
)
from sqlalchemy.ext.declarative import declarative_base

DeclarativeBase = declarative_base()


class ItemModel(DeclarativeBase):
    __tablename__ = 'table_name'

    id = Column('id', String(10), primary_key=True)
    name = Column('name', String(60))
    description = Column('description', Text())
    url = Column('url', Text())
    md5sum = Column('md5sum', String(45))
    ...
...
settings.py
...
# PostgreSQL settings; see the SQLAlchemy documentation for other drivers
DATABASE = {
    'drivername': 'postgresql+psycopg2',
    'host': 'localhost',
    'port': '5432',
    'username': 'username',
    'password': 'password',
    'database': 'attack_mitre',
}
DECLARATIVE_BASE = 'SpiderProject.models.DeclarativeBase'
...
ITEM_PIPELINES = {
    'SpiderProject.pipelines.SpiderProjectDbPipeline': 300,
}
...
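The keys of DATABASE match the components of a SQLAlchemy connection URL (dialect+driver://username:password@host:port/database). How scrapy_loaders consumes this dict is not shown in this README, so the following is only a minimal sketch of the URL those settings correspond to. `build_url` is a hypothetical helper that uses the standard library alone; with SQLAlchemy 1.4 or later, `sqlalchemy.engine.URL.create(**DATABASE)` should produce an equivalent URL object.

```python
from urllib.parse import quote_plus

# Same values as the DATABASE setting in settings.py.
DATABASE = {
    'drivername': 'postgresql+psycopg2',
    'host': 'localhost',
    'port': '5432',
    'username': 'username',
    'password': 'password',
    'database': 'attack_mitre',
}


def build_url(cfg):
    """Hypothetical helper: render a SQLAlchemy-style connection URL string."""
    # Percent-encode the credentials so characters such as '@' or '/'
    # cannot be confused with the URL's own separators.
    user = quote_plus(cfg['username'])
    password = quote_plus(cfg['password'])
    return (
        f"{cfg['drivername']}://{user}:{password}"
        f"@{cfg['host']}:{cfg['port']}/{cfg['database']}"
    )


url = build_url(DATABASE)
print(url)  # postgresql+psycopg2://username:password@localhost:5432/attack_mitre
```

Seeing the rendered URL can help when checking the settings by hand, for example by passing the same string to `sqlalchemy.create_engine` in a Python shell.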
pipelines.py
from SpiderProject.models import ItemModel
from scrapy_loaders.db_loaders import DBLoader
from scrapy_loaders.pipelines import DbPipeline


class ItemLoader(DBLoader):
    model = ItemModel
    hash_fields = ['name', 'description']
    update_fields = hash_fields + ['md5sum']
    ...


class SpiderProjectDbPipeline(DbPipeline):
    db_loaders = {
        'Item': ItemLoader,
    }
    ...
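The loader above declares hash_fields and update_fields, but this README does not explain how they are used. One plausible reading, which is an assumption and not confirmed by the package, is that the values of hash_fields are combined into an MD5 digest stored in md5sum, and that a changed digest means the row's update_fields should be rewritten. A minimal sketch of that idea follows; `compute_md5sum` and `needs_update` are hypothetical helpers, not part of scrapy_loaders.

```python
import hashlib

# Field names taken from the ItemLoader example in pipelines.py.
HASH_FIELDS = ['name', 'description']


def compute_md5sum(item, hash_fields=HASH_FIELDS):
    """Hypothetical helper: digest the selected fields of a dict-like item.

    This is NOT scrapy_loaders' code; it only illustrates the assumed idea.
    """
    digest = hashlib.md5()
    for field in hash_fields:
        # Missing fields hash as an empty string. The separator byte keeps
        # ('ab', 'c') distinct from ('a', 'bc').
        value = str(item.get(field, ''))
        digest.update(value.encode('utf-8'))
        digest.update(b'\x1f')
    return digest.hexdigest()


def needs_update(stored_md5sum, item):
    """Return True when the scraped item's digest differs from the stored one."""
    return stored_md5sum != compute_md5sum(item)


scraped = {
    'name': 'Example',
    'description': 'First version',
    'url': 'http://example.com',
}
stored = compute_md5sum(scraped)
changed = dict(scraped, description='Second version')

print(needs_update(stored, scraped))  # False: hashed fields are unchanged
print(needs_update(stored, changed))  # True: 'description' differs
```

Under this reading, fields outside hash_fields (such as url) would not trigger an update on their own, and the 32-character hex digest fits the String(45) md5sum column from models.py.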
Features
Tests
TODO: tests
nosetests --with-coverage --cover-inclusive --cover-package=scrapy_loaders --cover-html