Gathers unstructured News data into a SQLite3 db

Project description

Gathers unstructured news data and commits it to a SQLite3 database. The goal of GatherNews is to capture news data quickly and simply.

GatherNews lets you specify which news sites to capture by listing their RSS links in “feeds_list.txt”, one per line, like this:

http://feeds.reuters.com/Reuters/worldNews
http://rss.cnn.com/rss/money_latest.rss
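To illustrate the file format only, here is a minimal sketch of how a one-URL-per-line feed list could be parsed. The function name parse_feed_list is hypothetical and is not part of the GatherNews API; skipping blank lines and repeated URLs is an assumption about sensible behavior, not a description of what GatherNews does internally.

```python
def parse_feed_list(text):
    """Return the feed URLs in `text`, one per line, skipping blanks and repeats.

    Hypothetical helper for illustration; not part of GatherNews.
    """
    urls = []
    for line in text.splitlines():
        url = line.strip()
        if url and url not in urls:
            urls.append(url)
    return urls


# Sample content in the same shape as feeds_list.txt
sample = """
http://feeds.reuters.com/Reuters/worldNews
http://rss.cnn.com/rss/money_latest.rss
"""
feeds = parse_feed_list(sample)
print(feeds)
```

To run the same check on a real file, read it first, for example with open("feeds_list.txt").read(), and pass the resulting string to the function.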

You can then gather the news using the load_db() method:

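>>> from gathernews.gRSS import CaptureFeeds
>>> capture_feeds = CaptureFeeds("path/to_your/sqlite3_db/")
>>> # load_db() does three things:
>>> #   1. Creates new tables if any new RSS feed addresses have been added
>>> #   2. Populates all tables with RSS news feeds
>>> #   3. Removes duplicate entries
>>> capture_feeds.load_db()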

The examples folder contains working code for each module.

Features

  • Creates tables with a predefined schema

  • Populates each table in the SQLite3 database with articles

  • Removes duplicate articles from each table

  • Includes bug fixes for issues raised about version 0.1.0

  • Faster than version 0.1.0; see the benchmarks!
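The first three features can be pictured with the standard-library sqlite3 module. This is only a sketch under stated assumptions: the column names in SCHEMA, the table name "world_news", and the three helper functions are hypothetical, and treating two rows with the same link as duplicates is an assumed rule. GatherNews' actual schema and de-duplication logic may differ.

```python
import sqlite3

# Hypothetical schema; GatherNews' real column names may differ.
SCHEMA = "(title TEXT, link TEXT, published TEXT, summary TEXT)"


def create_table(conn, table):
    # Table names cannot be bound as SQL parameters, so the identifier is quoted.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" {SCHEMA}')


def insert_articles(conn, table, articles):
    # Each article is a (title, link, published, summary) tuple.
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?, ?, ?)', articles)


def remove_duplicates(conn, table):
    # Assumed rule: rows sharing a link are duplicates; keep the earliest row.
    conn.execute(
        f'DELETE FROM "{table}" WHERE rowid NOT IN '
        f'(SELECT MIN(rowid) FROM "{table}" GROUP BY link)'
    )


conn = sqlite3.connect(":memory:")
create_table(conn, "world_news")
insert_articles(conn, "world_news", [
    ("Story A", "http://example.com/a", "2014-01-01", "First story"),
    ("Story A", "http://example.com/a", "2014-01-01", "First story"),
    ("Story B", "http://example.com/b", "2014-01-02", "Second story"),
])
remove_duplicates(conn, "world_news")
remaining = conn.execute('SELECT COUNT(*) FROM "world_news"').fetchone()[0]
print(remaining)
```

The sketch uses an in-memory database so it leaves no file behind; pointing sqlite3.connect() at a file path would persist the tables instead.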

Installation

To install GatherNews use pip:

$ pip install gathernews

Testing

After installation, you can launch the test suite from outside the source directory (you will need to have nosetests installed):

$ nosetests --exe GatherNews

Bug Resolution

GatherNews 0.1.0 introduced a bug that prevents you from adding new RSS feeds to ‘feeds_list.txt’ after your initial call to the create_tables() method. Before using version 0.2.0 on a preexisting database, call the following method:

>>> path = "path/to_your/existing_sqlite3_db/"
>>> from gathernews.gRSS import CaptureFeeds
>>> capture_feeds = CaptureFeeds(path)
>>> capture_feeds.fix_create_table_bug()

If this gives you any problems, email me directly (tylers.pile@gmail.com). Thanks.

Documentation

Documentation is available at http://gathernews.readthedocs.org/en/latest/

Contribute

  1. Issue tracker is here: https://github.com/Bonza-Times/GatherNews/issues

  2. Fork it!


Download files

Source Distribution

GatherNews-0.2.1.tar.gz (10.1 kB view details)

Uploaded Source

File details

Details for the file GatherNews-0.2.1.tar.gz.

File metadata

  • Download URL: GatherNews-0.2.1.tar.gz
  • Upload date:
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for GatherNews-0.2.1.tar.gz
Algorithm     Hash digest
SHA256        aa9fefda00ab2b4a028320e7cadf9c3c83be5961e831a3b9a93383206ae3b3a2
MD5           089efce1657d3d99f54f0f2c1ff86e3b
BLAKE2b-256   11f0ddb4af6be78e9295c723251bf58cd102a9590f03a256ea5af099680d4567
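
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. The helper name sha256_of_stream is hypothetical, and the demonstration hashes an in-memory byte string rather than the real archive, so it runs without any download.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Return the hex SHA256 digest of a binary file-like object.

    Hypothetical helper; reads in chunks so large files are not loaded at once.
    """
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Demonstration on in-memory bytes instead of the real archive.
demo_digest = sha256_of_stream(io.BytesIO(b"abc"))
print(demo_digest)
```

For the real file, open it in binary mode, for example open("GatherNews-0.2.1.tar.gz", "rb"), pass the handle to the function, and compare the result with the SHA256 value listed above.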
