
Data scraper infrastructure for OpenBlock (hyperlocal news for Django)

Project description

ebdata

Code to help write scripts that import, crawl, and parse data from the web into ebpub, plus code to extract US street addresses from English text.
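As a rough illustration of what extracting street addresses from text involves, here is a toy regex-based sketch. It is not ebdata's API or algorithm. The function name find_street_addresses and the short suffix list are hypothetical, and the pattern only recognizes addresses shaped like "<number> <Capitalized Words> <suffix>".

```python
import re

# Hypothetical, deliberately small list of street suffixes. Real US address
# parsing has to handle far more variation than this.
SUFFIXES = r"(?:St|Street|Ave|Avenue|Blvd|Boulevard|Rd|Road|Dr|Drive|Ln|Lane)"

# A house number, one or more capitalized words, then a suffix with an
# optional trailing period.
ADDRESS_RE = re.compile(r"\b\d{1,5}(?:\s+[A-Z][a-z]+)+\s+" + SUFFIXES + r"\b\.?")


def find_street_addresses(text):
    """Return simple street addresses found in English text (toy sketch)."""
    return [m.group(0) for m in ADDRESS_RE.finditer(text)]


print(find_street_addresses("Police responded to 123 Main St. and 45 Elm Avenue on Tuesday."))
# → ['123 Main St.', '45 Elm Avenue']
```

Real-world text needs much more than a single pattern: directionals such as "N.", intersections, unit numbers, and false positives are not handled here.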

This package is part of OpenBlock. Originally developed for EveryBlock.com.

For more information, see the documentation or the project website.

Problems can be reported to the issue tracker.

Discussion takes place on the ebcode Google group or in the #openblock IRC channel on freenode.

Installation

Do not just try to easy_install or pip install ebdata. It has many specific dependencies that can’t (or shouldn’t) be captured in setup.py.

Instead, see the full documentation at http://openblockproject.org/docs/install/index.html

OpenBlock

OpenBlock is a web application that lets users browse and search their local area for “hyper-local news”, to see what has been happening recently in their immediate geographic area.

For installation instructions and other documentation, see http://openblockproject.org/docs/ (or the .rst files in the docs/ directory).

For help, you can try the ebcode group: http://groups.google.com/group/ebcode or look for us in the #openblock IRC channel on irc.freenode.net.

About the Project

OpenBlock began life as the open-source code released by EveryBlock.com in June 2009. Originally created by Adrian Holovaty and the EveryBlock team, it is now developed as an open-source (GPL) project at http://openblockproject.org.

Funding for the initial creation of EveryBlock and the ongoing development of OpenBlock has been provided by the Knight Foundation (http://www.knightfoundation.org/).

OpenBlock 1.0 beta 1 release notes

Upgrade Notes

  • If you have an existing database that was built with 1.0a1 or earlier, you’ll need to run this command to deal with the removal of the “django-apikey” dependency:

    django-admin.py migrate apikey 0001 --fake

  • Many data-loading scripts that were scattered all over the source tree are now installed into your environment’s bin directory, so they should be on your $PATH. The documentation has been updated accordingly.

  • As usual, after upgrading you should always run:

    django-admin.py syncdb --migrate

    If you last migrated from a git checkout that included migrations which were later renamed or removed, you may get errors when migrating. In that case, try adding the --delete-ghost-migrations option.

  • Production webserver configurations need an extra line so that the django-olwidget JavaScript and CSS are served. For example, for Apache you’d add a line like this (adjust the path as needed):

    Alias /olwidget/ /home/openblock/openblock/src/django-olwidget/

  • We now require Django 1.3 (ticket #155). This probably doesn’t affect you.

  • Settings changes:

    • MAP_BASELAYER_TYPE can now be any base layer supported by olwidget, e.g. “google.streets”. (Some layer types require additional settings, such as API keys; see ebpub/settings_default.py for comments and examples.)

    • You can add custom base layers to your maps by creating the dictionary settings.MAP_CUSTOM_BASE_LAYERS. See ebpub/settings_default.py for an example.

      This replaces the WMS_URL setting from OpenBlock 1.0a1, which is no longer supported.
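As an illustration of the MAP_BASELAYER_TYPE setting, a settings fragment selecting Google Streets might look like this. It is a minimal sketch that sets only the value named in these notes. Take any API key settings the layer needs, and the exact format of MAP_CUSTOM_BASE_LAYERS, from ebpub/settings_default.py rather than from this example.

```python
# settings.py (fragment): a minimal sketch, not a complete configuration.

# Any base layer type supported by olwidget, e.g. Google Streets.
# Some layer types need extra settings such as API keys; see the comments
# and examples in ebpub/settings_default.py.
MAP_BASELAYER_TYPE = "google.streets"

# WMS_URL (from 1.0a1) is no longer supported. Custom base layers go in the
# MAP_CUSTOM_BASE_LAYERS dictionary instead; follow the example in
# ebpub/settings_default.py, since its format is not reproduced here.
```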

New Features in 1.0 beta 1

  • ticket #33: Different map icons for different news item types. To use this, configure a “map icon url” or “map color” for a Schema in the admin UI.

  • ticket #85: Added streets.PlaceType model for categorizing Places. These also can have individual colors or icon URLs on the /maps/ view. (Original ticket title was “‘Landmark’ location type”)

  • ticket #142: JSON push API for news items. See docs/main/api.rst

  • ticket #187: REST API standard features: API key provisioning; require keys (or auth) for POST / DELETE; throttling

  • Import US Zip Codes as Locations, via the admin UI.

  • Work-in-progress: user-submitted content. See code in the ebpub/neighbornews app.

  • Work-in-progress: Maps you can share just by copy/pasting a URL. For a sneak preview, browse to /maps/.

  • Much better admin UI maps. (ticket #140: Bad admin UI for GeometryFields)

  • ticket #72: unify NewsItem.attributes and NewsItem.attribute_values

  • ticket #52: Proper validation for Street Misspellings in admin

  • ticket #157: fill in normalized name automatically

  • ticket #123: Configurable base layer should apply to admin UI maps too
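The JSON push API (ticket #142) combined with API keys required for POST (ticket #187) means a client sends an authenticated JSON POST. The sketch below only builds such a request with Python 3’s standard library and does not send it. The endpoint URL, the header name X-OpenBlock-Key, and the payload fields are hypothetical placeholders; docs/main/api.rst describes the real ones.

```python
import json
import urllib.request

# Hypothetical values: consult docs/main/api.rst for the real endpoint,
# the header that carries the API key, and the expected news item fields.
API_URL = "http://example.com/api/items/"
API_KEY_HEADER = "X-OpenBlock-Key"


def build_news_item_request(item, api_key):
    """Build (but do not send) an authenticated JSON POST for one news item."""
    body = json.dumps(item).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


req = build_news_item_request({"title": "Road closure", "location_name": "Main St"}, "my-key")
# To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to inspect what would be posted before pointing the script at a live server.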

Bug fixes

  • Importers should no longer fail when run more than once.

  • ticket #22: Scraper scripts in everyblock/cities/boston mostly don’t work OOTB

  • ticket #79: Geotagging oddity

  • ticket #188: items.json doesn’t include location_name

  • ticket #200: “obdemo bin scripts are documented, but don’t get installed when installing obdemo non-editable”
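Making an importer safe to rerun usually means keying each record on a stable identifier and updating existing records instead of inserting duplicates. The following in-memory sketch shows that pattern generically. It is not OpenBlock’s importer code, and the upsert_items helper and its url key are hypothetical. In Django, the same idea is commonly expressed with a model manager’s get_or_create().

```python
def upsert_items(store, items, key="url"):
    """Insert or update items in `store`, a dict keyed by item[key].

    Running this twice with the same input leaves `store` unchanged, which
    is the property that lets an importer be rerun safely.
    Returns a (created, updated) tuple of counts.
    """
    created = updated = 0
    for item in items:
        k = item[key]
        if k in store:
            # Only count (and write) records whose contents actually changed.
            if store[k] != item:
                store[k] = dict(item)
                updated += 1
        else:
            store[k] = dict(item)
            created += 1
    return created, updated


store = {}
items = [{"url": "a", "title": "One"}, {"url": "b", "title": "Two"}]
first_run = upsert_items(store, items)   # (2, 0)
second_run = upsert_items(store, items)  # (0, 0): nothing duplicated
```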

Documentation

  • ticket #80: Documentation for Street Misspellings

  • ticket #162: Document pip / easy_install workarounds

  • ticket #139: Document adding database user / granting database access

  • ticket #198: version number in documentation

  • ticket #197: documentation for deploying static media

Other

  • ticket #181: Prepare packages for distribution on pypi.

  • ticket #83: Split out non-core packages into a separate download (ebblog, ebwiki, ebgeo, ebinternal, and everyblock are now at https://github.com/openplans/openblock-extras )



Download files


Source Distribution

ebdata-1.0-beta1.tar.gz (110.7 kB)

File details

Details for the file ebdata-1.0-beta1.tar.gz.

File metadata

  • Download URL: ebdata-1.0-beta1.tar.gz
  • Size: 110.7 kB
  • Tags: Source

File hashes

Hashes for ebdata-1.0-beta1.tar.gz
  • SHA256: f95d99f31f97b4c8d4ad379891cad693b69943a0d56d8362756982dfce21501f
  • MD5: ff957704b7c1d426039cfdb60853de9f
  • BLAKE2b-256: a5de0d6c6a2d0e8b9c098ea6dda65d5959b332db290c3bdfe0210a4f8086f364
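To check a downloaded tarball against the SHA256 digest listed for ebdata-1.0-beta1.tar.gz, a short script using Python’s hashlib is enough. This is a generic sketch; the helper names are illustrative, and it assumes the file is on local disk.

```python
import hashlib

# SHA256 digest published for ebdata-1.0-beta1.tar.gz.
EXPECTED_SHA256 = "f95d99f31f97b4c8d4ad379891cad693b69943a0d56d8362756982dfce21501f"


def sha256_hexdigest(chunks):
    """Return the hex SHA256 digest of an iterable of byte strings."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def read_chunks(path, chunk_size=65536):
    """Yield a file's bytes in fixed-size chunks so it is never fully loaded."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            yield chunk


def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_hexdigest(read_chunks(path)) == expected.lower()


# Usage, assuming the tarball is in the current directory:
# verify_download("ebdata-1.0-beta1.tar.gz")
```

Hashing in chunks produces the same digest as hashing the whole file at once, while keeping memory use constant for large downloads.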

