Archive your RSS into SQLite:

rssarchive

rssarchive is a library for fetching multiple RSS sources into an SQLite database. It can also scrape the full text of each article via the newspaper3k library.

Quick Start

To install rssarchive, use pip:

pip install rssarchive

You can use rssarchive from the console or call it as a library.

From the console, simply run:

rssarchive

As a library:

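#!/usr/bin/env python
import rssarchive as ra

# Test mode fetches only two sample sources; full-text scraping is off
newra = ra.RssArchive(CONFIG_TEST_MODE=True, CONFIG_FULL_TEXT_MODE=False)
# Fetch the feeds and store the items in the SQLite database
newra.batch_save_rss()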

When you run batch_save_rss(), the library creates two files in the current directory:

  • rsslist.csv: the default list file, which includes some RSS sources
  • rssarchive.sqlite: the SQLite file that holds the fetched news

After the code finishes, you can view or edit the SQLite file with the SQLiteBrowser app.
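If you prefer to check the result from Python instead of a GUI, the following is a minimal sketch that uses only the standard sqlite3 module. The helper name summarize_sqlite is hypothetical and not part of rssarchive; it makes no assumption about the table layout and simply lists every table with its row count.

```python
import sqlite3
from pathlib import Path


def summarize_sqlite(db_path):
    """Return {table_name: row_count} for every table in an SQLite file.

    Hypothetical helper, not part of rssarchive.
    """
    # Open read-only through a file: URI so that a wrong path raises an
    # error instead of silently creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        # Table names come from sqlite_master itself, so quoting them is enough.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

Calling summarize_sqlite("rssarchive.sqlite") after a run should show which tables were created and how many rows each one holds.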

You can modify the rsslist.csv file for your own sources and re-run.
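Before editing the list, it may help to look at how the default file is laid out. The sketch below reads the first few rows with the standard csv module; the helper name peek_csv is hypothetical, and the column layout is deliberately not assumed because this page does not document it.

```python
import csv
from itertools import islice


def peek_csv(path, limit=5):
    """Return the first `limit` rows of a CSV file as lists of strings.

    Hypothetical helper, not part of rssarchive.
    """
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(islice(csv.reader(handle), limit))
```

Printing the rows returned for the generated list file shows the columns your own entries need to follow.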

Parameters of the constructor

In the code above you may have noticed the

newra = ra.RssArchive(CONFIG_TEST_MODE=True, CONFIG_FULL_TEXT_MODE=False)

constructor call. Here are all the parameters it defines:

CONFIG_DEFAULT_TABLE_NAME = 'tab_headline'

CONFIG_SQLITEDB_URL = "rssarchive.sqlite",

CONFIG_RSS_LIST = "rss_list.csv",

CONFIG_SINGLE_RSS_SOURCE_URL = "https://www.sabah.com.tr/rss/anasayfa.xml",

CONFIG_EASY_DEBUG = True,

CONFIG_TEST_VAR = "suatatan",

CONFIG_TEST_MODE = False,

CONFIG_FULL_TEXT_MODE = True,

Among these parameters, three are critical:

CONFIG_EASY_DEBUG: If True, the code shows all of its messages; if False, it shows none.

CONFIG_FULL_TEXT_MODE: If True, the library fetches the full text of each URL (this takes time); if False, the library fetches the RSS only.

CONFIG_TEST_MODE: If True, the library fetches just two sample sources; if False, the code processes all RSS sources in the list (please keep it False for your real projects).
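The three switches above can be gathered in one place before constructing the archive. This is only a sketch: the helper names rssarchive_kwargs and run_archive are hypothetical, while the CONFIG_* keywords, RssArchive and batch_save_rss() are the ones shown on this page.

```python
def rssarchive_kwargs(sample_only=False, full_text=False, verbose=False):
    """Map three plain switches onto the CONFIG_* keyword arguments.

    Hypothetical helper, not part of rssarchive.
    """
    return {
        "CONFIG_TEST_MODE": sample_only,     # True: only two sample sources
        "CONFIG_FULL_TEXT_MODE": full_text,  # True: also scrape article text (slow)
        "CONFIG_EASY_DEBUG": verbose,        # True: show the library's messages
    }


def run_archive(**switches):
    """Build an RssArchive from the switches and run one batch."""
    # Imported lazily so rssarchive_kwargs works without the package installed.
    import rssarchive as ra

    archive = ra.RssArchive(**rssarchive_kwargs(**switches))
    archive.batch_save_rss()
    return archive
```

For example, run_archive(sample_only=True) would be a quick trial run, and run_archive(full_text=True) a slower run that also scrapes article text.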

Motivation

This is an open-source library developed within the turnusol.org project, a social entrepreneurship initiative for detecting hate speech and fake news in Turkish. If you want to contribute to this library or to our project, please contact us via turnusol.org.

Packaging commands

python setup.py sdist bdist_wheel

python -m twine upload --skip-existing --repository testpypi dist/* -u suatatan -p password
