Archive your RSS into SQLite:
Project description
rssarchive
rssarchive is a library for fetching multiple RSS sources into a SQLite database. It can also scrape the full text of each article via the newspaper3k library.
Quick Start
To install rssarchive, use pip:
pip install rssarchive
You can use rssarchive from the console or call it as a library.
From the console, simply run:
rssarchive
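As a library:
#!/usr/bin/env python
import rssarchive as ra
# Test mode on, full-text scraping off (RSS metadata only)
newra = ra.RssArchive(CONFIG_TEST_MODE=True, CONFIG_FULL_TEXT_MODE=False)
# Fetch the configured RSS sources and store the items in SQLite
newra.batch_save_rss()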
When you run batch_save_rss(), the library creates two files in the current directory:
- rsslist.csv: the default file, which includes some RSS sources
- rssarchive.sqlite: the SQLite file that stores the fetched news
After the code finishes, you can view or edit the SQLite file with the SQLiteBrowser app.
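If you prefer to check the result from Python rather than a GUI, the following is a minimal sketch using only the standard-library sqlite3 module. The helper name summarize_archive is hypothetical and not part of rssarchive. Because the table layout is not documented here, the sketch makes no assumption about column names: it discovers the tables from sqlite_master and counts the rows in each.

```python
import sqlite3


def summarize_archive(db_path="rssarchive.sqlite"):
    """Return {table_name: row_count} for every table in the SQLite file.

    Hypothetical helper, not part of rssarchive.
    """
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in tables:
            # Table names cannot be bound as SQL parameters, so quote the identifier.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    for table, rows in summarize_archive().items():
        print(f"{table}: {rows} rows")
```

Note that sqlite3.connect creates an empty database file if the path does not exist, so run this only after the archive has been written.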
You can modify the rsslist.csv file for your own sources and re-run.
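The column layout of the source list is not described here, so it may help to look at the generated file before editing it. The following is a small sketch using the standard-library csv module. The helper name preview_csv is hypothetical, and the default file name is an assumption: check which file actually appears in your working directory and pass that path.

```python
import csv
from itertools import islice


def preview_csv(path="rsslist.csv", limit=5):
    """Return up to `limit` rows from the start of the CSV file.

    Hypothetical helper; the default path is an assumption.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(islice(csv.reader(handle), limit))


if __name__ == "__main__":
    for row in preview_csv():
        print(row)
```

Once you know the layout, add your own sources in the same shape as the existing rows.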
Constructor parameters
In the code above you may have noticed the
newra = ra.RssArchive(CONFIG_TEST_MODE=True, CONFIG_FULL_TEXT_MODE=False)
constructor call. All available parameters are listed here:
CONFIG_DEFAULT_TABLE_NAME = 'tab_headline',
CONFIG_SQLITEDB_URL = "rssarchive.sqlite",
CONFIG_RSS_LIST = "rss_list.csv",
CONFIG_SINGLE_RSS_SOURCE_URL = "https://www.sabah.com.tr/rss/anasayfa.xml",
CONFIG_EASY_DEBUG = True,
CONFIG_TEST_VAR = "suatatan",
CONFIG_TEST_MODE = False,
CONFIG_FULL_TEXT_MODE = True,
Among these parameters, only three are critical:
CONFIG_EASY_DEBUG: If True, the code shows all of its messages; if False, it shows none.
CONFIG_FULL_TEXT_MODE: If True, the library fetches the full text of each URL (this takes time); if False, the library fetches the RSS data only.
CONFIG_TEST_MODE: If True, the library fetches just two sample sources; if False, it processes all RSS sources in the list (please keep it True for your real projects).
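The three flags above can be collected in one place and passed to the constructor as keyword arguments. This is only a sketch: build_config and run_archive are hypothetical helper names, and it assumes RssArchive accepts CONFIG_EASY_DEBUG as a keyword in the same way as the two flags shown in the Quick Start.

```python
def build_config(full_text=False, test_mode=True, debug=True):
    """Collect the three critical flags into constructor keyword arguments.

    Hypothetical helper; the keys are the parameter names listed above.
    """
    return {
        "CONFIG_FULL_TEXT_MODE": full_text,
        "CONFIG_TEST_MODE": test_mode,
        "CONFIG_EASY_DEBUG": debug,
    }


def run_archive(config):
    """Create an RssArchive with the given flags and run one batch."""
    # Imported here so build_config can be used without rssarchive installed.
    import rssarchive as ra

    archive = ra.RssArchive(**config)  # assumes these keywords are accepted
    archive.batch_save_rss()
    return archive


if __name__ == "__main__":
    # RSS data only, test mode on, messages shown.
    run_archive(build_config(full_text=False, test_mode=True, debug=True))
```

Keeping the flags in a dictionary makes it easy to switch between a quick trial run and a slower full-text run without editing the constructor call.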
Motivation
This open-source library was developed within the turnusol.org project, a social entrepreneurship initiative for detecting hate speech and fake news in Turkish. If you would like to contribute to this library or to the project, please contact us via turnusol.org.
Packaging commands
python setup.py sdist bdist_wheel
python -m twine upload --skip-existing --repository testpypi dist/* -u suatatan -p password
Download files
File details
Details for the file rssarchive-0.5.2.tar.gz.
File metadata
- Download URL: rssarchive-0.5.2.tar.gz
- Upload date:
- Size: 6.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.22.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a2b2c47d1706c56f6214380b6276e72841176359a7c7c8b7587201d0bf9a280 |
| MD5 | 1a679f1f783535a379cd5a4d27b2cd28 |
| BLAKE2b-256 | a4b97e7c19845bc0c0a41ac731c1a7893453fdaf662ac7544a277cf472c4986e |
File details
Details for the file rssarchive-0.5.2-py3-none-any.whl.
File metadata
- Download URL: rssarchive-0.5.2-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.22.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77ca93693d1dffe57f8c7b965dc293ed2df332f6301dc926e417e0418c5ff167 |
| MD5 | 8435b31a522f227c95882d136852f0a2 |
| BLAKE2b-256 | 44e6d1744cee77d846000009198f72ee5b2f436fc9fadffc39d3ee42be93f0f7 |