Scrapy extension to write scraped items using Django models

=================
scrapy-djangoitem
=================

.. image:: https://img.shields.io/pypi/v/scrapy-djangoitem.svg
   :target: https://pypi.python.org/pypi/scrapy-djangoitem
   :alt: PyPI Version

.. image:: https://img.shields.io/travis/scrapy-plugins/scrapy-djangoitem/master.svg
   :target: http://travis-ci.org/scrapy-plugins/scrapy-djangoitem
   :alt: Build Status

.. image:: https://img.shields.io/github/license/scrapy-plugins/scrapy-djangoitem.svg
   :target: https://github.com/scrapy-plugins/scrapy-djangoitem/blob/master/LICENSE
   :alt: License

``scrapy-djangoitem`` is an extension that allows you to define `Scrapy items
<http://doc.scrapy.org/en/latest/topics/items.html>`_ using existing `Django
models <https://docs.djangoproject.com/en/latest/topics/db/models/>`_.

This utility provides a new class, named ``DjangoItem``, that you can use as a
regular Scrapy item and link it to a Django model with its ``django_model``
attribute. Start using it right away by importing it from this package::

    from scrapy_djangoitem import DjangoItem

Installation
============

Starting with ``v.1.1``, both ``Python 2.7`` and ``Python 3.4/3.5`` are
supported. For ``Python 3`` you need ``Scrapy v.1.1`` or above.
The latest tested Django version is ``Django 1.9``.

Install from ``PyPI`` using::

    pip install scrapy-djangoitem

Introduction
============

``DjangoItem`` is a class of item that gets its field definitions from a
Django model: you simply create a ``DjangoItem`` and specify which Django
model it relates to.

Besides getting the model fields defined on your item, ``DjangoItem``
provides a method to create and populate a Django model instance with the item
data.

Usage
=====

``DjangoItem`` works much like ModelForms in Django: you create a subclass
and set its ``django_model`` attribute to a valid Django model. This gives
you an item with a field for each Django model field.

In addition, you can define fields that aren't present in the model, and even
override fields that are present in the model by defining them in the item.

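The rule described above (model fields plus item-declared fields, with the item's own declarations taking precedence) can be modelled in a few lines of plain Python. This is a toy sketch, not the package's implementation: ``build_item_fields`` is a hypothetical helper, a model is represented only by its list of field names, and each item field by a metadata dict standing in for ``scrapy.Field``.

```python
def build_item_fields(model_field_names, declared_fields):
    """Toy model of how item fields relate to model fields (hypothetical).

    model_field_names: names of the fields of a (stand-in) Django model.
    declared_fields: fields declared directly on the item class, mapped
    to their metadata dicts (standing in for scrapy.Field objects).
    """
    # Start from the fields declared on the item: they take precedence,
    # which is how a model field gets overridden.
    fields = dict(declared_fields)
    # Every model field not declared on the item gets an empty definition.
    for name in model_field_names:
        fields.setdefault(name, {})
    return fields


fields = build_item_fields(
    ["name", "age"],
    {"name": {"default": "No Name"}, "sex": {}},
)
print(sorted(fields))  # ['age', 'name', 'sex']
print(fields["name"])  # {'default': 'No Name'}
```

Declaring ``name`` on the item replaces the model-derived definition, while ``age`` still comes from the model and ``sex`` exists only on the item.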
Let's see some examples:

Creating a Django model for the examples::

    from django.db import models

    class Person(models.Model):
        name = models.CharField(max_length=255)
        age = models.IntegerField()

Defining a basic ``DjangoItem``::

    from scrapy_djangoitem import DjangoItem

    class PersonItem(DjangoItem):
        django_model = Person

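``DjangoItem`` works just like Scrapy items::

    >>> p = PersonItem()
    >>> p['name'] = 'John'
    >>> p['age'] = '22'

To obtain the Django model instance from the item, call the extra method
``DjangoItem.save()``::

    >>> person = p.save()
    >>> person.name
    'John'
    >>> person.age
    '22'
    >>> person.id
    1

Note that ``person.age`` is still the string ``'22'``: Django does not convert
field values on assignment or on ``save()``, so the instance keeps the value
exactly as it was set on the item.
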
By default, ``DjangoItem.save()`` also saves the model instance to the
database. To obtain an unsaved instance instead, call it with
``commit=False``::

    >>> person = p.save(commit=False)
    >>> person.name
    'John'
    >>> person.age
    '22'
    >>> print(person.id)
    None

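The effect of ``commit`` can be illustrated without a database. In this toy sketch, ``ToyPerson`` and ``save_item`` are hypothetical stand-ins for a Django model and for ``DjangoItem.save()``: the instance is always built, but it is only saved (here, simply given an ``id``) when ``commit`` is true. One reason to pass ``commit=False`` is to adjust the instance before saving it yourself, as the last lines show.

```python
import itertools


class ToyPerson(object):
    """Hypothetical stand-in for a Django model: save() assigns a primary key."""

    _ids = itertools.count(1)

    def __init__(self, **kwargs):
        self.id = None  # unsaved instances have no primary key yet
        for key, value in kwargs.items():
            setattr(self, key, value)

    def save(self):
        # Simulate an INSERT: only the first save assigns an id.
        if self.id is None:
            self.id = next(ToyPerson._ids)


def save_item(values, model_class, commit=True):
    """Build a model instance from item values; save it only if commit."""
    instance = model_class(**values)
    if commit:
        instance.save()
    return instance


person = save_item({"name": "John", "age": "22"}, ToyPerson, commit=False)
print(person.id)  # None: nothing was saved yet
person.age = int(person.age)  # e.g. normalise a value before saving
person.save()
print(person.id)  # 1
```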
As mentioned before, you can add fields that are not present in the model::

    import scrapy
    from scrapy_djangoitem import DjangoItem

    class PersonItem(DjangoItem):
        django_model = Person
        sex = scrapy.Field()

::

    >>> p = PersonItem()
    >>> p['name'] = 'John'
    >>> p['age'] = '22'
    >>> p['sex'] = 'M'

You can also override the fields of the model with your own::

    class PersonItem(DjangoItem):
        django_model = Person
        name = scrapy.Field(default='No Name')

This is useful to attach metadata to a field, such as a default value or any
other property that your project uses; Scrapy itself does not act on this
metadata. Fields that are not present in the model (such as ``sex`` above)
are not taken into account by ``DjangoItem.save()``.

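Which item values reach the model can be sketched as a simple filter over the model's field names. This is again a hypothetical, plain-Python illustration (``model_kwargs`` is not part of the package): values for item-only fields such as ``sex`` are dropped before the model instance would be built.

```python
def model_kwargs(item_values, model_field_names):
    """Keep only the item values that correspond to model fields."""
    # Item-only fields (such as 'sex') are dropped here, so they would
    # never reach the model constructor.
    return {
        key: value
        for key, value in item_values.items()
        if key in model_field_names
    }


item_values = {"name": "John", "age": "22", "sex": "M"}
print(sorted(model_kwargs(item_values, {"name", "age"}).items()))
# [('age', '22'), ('name', 'John')]
```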
Caveats
=======

``DjangoItem`` is a rather convenient way to integrate Scrapy projects with Django
models, but bear in mind that the Django ORM **may not scale well** if you scrape a lot
of items (e.g. millions) with Scrapy. This is because a relational backend is
**often not a good choice for write-intensive applications** (such as a web
crawler), especially if the database is highly normalized and has many indices.

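One possible way to reduce per-item writes, sketched here under stated assumptions, is to buffer unsaved instances (obtained with ``item.save(commit=False)``) and write them in batches, for example with Django's ``QuerySet.bulk_create()``. The ``BatchWriter`` class below is hypothetical and database-agnostic: it only groups objects and hands each batch to a ``flush_func`` callable, so it can be run without Django.

```python
class BatchWriter(object):
    """Buffer objects and pass them to ``flush_func`` in fixed-size batches.

    Hypothetical helper: in a Django project, ``flush_func`` could wrap
    ``Person.objects.bulk_create``, fed with instances obtained from
    ``item.save(commit=False)``.
    """

    def __init__(self, flush_func, batch_size=500):
        self.flush_func = flush_func
        self.batch_size = batch_size
        self.buffer = []

    def add(self, obj):
        self.buffer.append(obj)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        # One write per batch instead of one write per item.
        if self.buffer:
            self.flush_func(self.buffer)
            self.buffer = []  # new list: the flushed batch is left untouched


batches = []
writer = BatchWriter(batches.append, batch_size=2)
for n in range(5):
    writer.add(n)
writer.flush()  # write whatever is left, e.g. when the spider closes
print(batches)  # [[0, 1], [2, 3], [4]]
```

If ``flush_func`` is wired to ``bulk_create()``, keep in mind that it does not call the model's ``save()`` method and does not send the ``pre_save``/``post_save`` signals.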
Setup
=====

To use the Django models outside the Django application, you need to set the
``DJANGO_SETTINGS_MODULE`` environment variable and, in most cases, modify
the ``PYTHONPATH`` environment variable so that the settings module can be
imported.

There are many ways to do this depending on your use case and preferences.
One of the simplest is described below.

Suppose your Django project is named ``mysite``, is located in the path
``/home/projects/mysite`` and you have created an app ``myapp`` with the model
``Person``. That means your directory structure is something like this::

    /home/projects/mysite
    ├── manage.py
    ├── myapp
    │   ├── __init__.py
    │   ├── models.py
    │   ├── tests.py
    │   └── views.py
    └── mysite
        ├── __init__.py
        ├── settings.py
        ├── urls.py
        └── wsgi.py

Then you need to add ``/home/projects/mysite`` to the ``PYTHONPATH``
environment variable and set the ``DJANGO_SETTINGS_MODULE`` environment
variable to ``mysite.settings``. That can be done in your Scrapy project's
settings file by adding the lines below::

    import os
    import sys

    # Make the Django project importable (same effect as extending PYTHONPATH)
    sys.path.append('/home/projects/mysite')

    # Tell Django which settings module to use
    os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

Notice that we modify the ``sys.path`` variable instead of the ``PYTHONPATH``
environment variable, as we are already within the Python runtime. If everything
is right, you should be able to start the ``scrapy shell`` command and import
the model ``Person`` (i.e. ``from myapp.models import Person``).

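The two steps above can also be wrapped in a small helper whose result can be checked from ``scrapy shell``. In this sketch, ``configure_django_env`` and ``settings_importable`` are hypothetical names; unlike the plain assignment shown earlier, ``os.environ.setdefault`` keeps a ``DJANGO_SETTINGS_MODULE`` value that was already exported in the shell. It relies on ``importlib.util.find_spec``, so it assumes Python 3.4 or later.

```python
import importlib.util
import os
import sys


def configure_django_env(project_dir, settings_module):
    """Make a Django project's settings importable from a Scrapy process."""
    # Same effect as adding project_dir to PYTHONPATH, but at runtime.
    if project_dir not in sys.path:
        sys.path.append(project_dir)
    # setdefault keeps a value that was already exported in the shell.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)


def settings_importable(settings_module):
    """Return True if the settings module can be found on sys.path."""
    try:
        return importlib.util.find_spec(settings_module) is not None
    except ImportError:  # raised when the parent package cannot be found
        return False


if __name__ == "__main__":
    # Paths from the example above; adjust them to your own project.
    configure_django_env("/home/projects/mysite", "mysite.settings")
    print(settings_importable(os.environ["DJANGO_SETTINGS_MODULE"]))
```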
Starting with ``Django 1.8``, you also have to explicitly set up ``Django`` if
you use it outside a ``manage.py`` context, after ``DJANGO_SETTINGS_MODULE``
has been set
(see `Django Docs <https://docs.djangoproject.com/en/1.8/intro/tutorial01/#playing-with-the-api>`_)::

    import django
    django.setup()

Development
===========

The test suite in the ``tests`` directory can be run with ``tox``::

    tox

This uses the configuration in ``tox.ini``. The ``Python`` interpreters it
uses have to be installed locally on the system.

Changelog
=========

v.1.1 (2016-05-04)
------------------

* ``Python 3.4/3.5`` support
* Making tests work with ``Django 1.9`` again

v.1.0 (2015-04-29)
------------------

* Initial version