Cube for news storage and analysis
Summary
This cube provides an implementation of Semnews:
store news articles and tweets.
extract and synthesize information.
provide semantically useful and original visualisations.
provide analytics tools and data-mining/machine-learning processing.
Installation
Creation of the instance:
Create an instance using: cubicweb-ctl create semnews <name-of-instance>
Create the instance’s database using: cubicweb-ctl db-create <name-of-instance>
Add article sources
Article sources can be created using:
Blogs/RSS feeds:
session.create_entity('CWSource', name=<name of the source>, type=u'datafeed', parser=u'rss-parser', lang=<lang of the source>, url=<url of the blog/rss feed>, config=u'synchronization-interval=120min')
Tweets:
session.create_entity('CWSource', name=<name of the source>, type=u'datafeed', parser=u'tweet-parser', lang=<lang of the source>, url=<url of the blog/rss feed>, config=u'synchronization-interval=120min')
The synchronization interval can be set to a more specific value, or to “no” for manual synchronization only.
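The attributes passed to session.create_entity above can be assembled with a small helper. This is only a sketch: the helper names sync_config and source_attributes, the example source name and URL are hypothetical, and the config value u'synchronization-interval=no' is an assumed reading of the “no” setting described above.

```python
def sync_config(interval_minutes=None):
    """Return a config string for a CWSource.

    None is taken to mean manual synchronization only (assumed to be
    written as "no", following the text above).
    """
    if interval_minutes is None:
        return u'synchronization-interval=no'
    if interval_minutes <= 0:
        raise ValueError('interval must be a positive number of minutes')
    return u'synchronization-interval=%dmin' % interval_minutes


def source_attributes(name, url, lang, parser=u'rss-parser',
                      interval_minutes=120):
    """Build the keyword arguments for create_entity('CWSource', ...)."""
    return {
        'name': name,
        'type': u'datafeed',
        'parser': parser,  # u'rss-parser' or u'tweet-parser'
        'lang': lang,
        'url': url,
        'config': sync_config(interval_minutes),
    }


# Hypothetical example values, for illustration only.
attrs = source_attributes(u'example-blog', u'http://example.org/feed.rss', u'en')
print(attrs['config'])

# Wherever the `session` object used above is available:
# session.create_entity('CWSource', **attrs)
```

Keeping the interval as a number and formatting it in one place avoids typos in the config string when many sources are declared.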
Semnews comes with some predefined blogs/tweets/RSS feeds:
Some French political blogs. You can add them using:
cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_blogs_fr.py
Some international English newspapers. You can add them using:
cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_newspapers.py
Some French newspapers. You can add them using:
cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_newspapers_fr.py
Some French politician tweets. You can add them using:
cubicweb-ctl shell <name-of-instance> <path-to-cube-code-source>/migration/examples_twitters_fr.py
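To load several of these example sets in one go, the command lines can be generated rather than typed by hand. The sketch below only builds the argument lists; the function name example_commands, the dictionary keys, the instance name and the cube directory are hypothetical placeholders, while the script file names are the ones listed above.

```python
import os

# Script names taken from the list above; the short keys are illustrative.
EXAMPLE_SCRIPTS = {
    'blogs_fr': 'examples_blogs_fr.py',
    'newspapers': 'examples_newspapers.py',
    'newspapers_fr': 'examples_newspapers_fr.py',
    'twitters_fr': 'examples_twitters_fr.py',
}


def example_commands(instance, cube_dir, keys=None):
    """Return one `cubicweb-ctl shell` argument list per selected script."""
    selected = keys if keys is not None else sorted(EXAMPLE_SCRIPTS)
    commands = []
    for key in selected:
        script = os.path.join(cube_dir, 'migration', EXAMPLE_SCRIPTS[key])
        commands.append(['cubicweb-ctl', 'shell', instance, script])
    return commands


# Placeholder instance name and cube location.
for cmd in example_commands('myinstance', '/path/to/semnews'):
    print(' '.join(cmd))
```

Each argument list can then be passed to subprocess.run(cmd, check=True) on a machine where cubicweb-ctl is installed.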
Add Named Entities sources
Semnews relies on a named-entity recognition (NER) process, which you have to define:
session.create_entity('NerProcess', name=<name of process>, host=<appid or sparql endpoint url>, type=<rql or sparql>, lang=<optional lang of the ner source>, request=<request to be performed>)
See the documentation of the NER cube for more details. Example of a source:
session.create_entity('NerProcess', name=u'dbpedia38-en', host=u'ner', type=u'rql', lang=u'en', request=u'Any U WHERE X label %(token)s, X cwuri U, ' 'X ner_source NS, NS name "dbpedia38-en"')
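In the example above, the request is written as two adjacent string literals, which Python joins into a single string, and %(token)s is left in place as a placeholder. The following sketch rebuilds that request for an arbitrary NER source name and assembles the NerProcess attributes; the helper names ner_request and ner_attributes are hypothetical, and the check on the process type simply mirrors the <rql or sparql> choice shown earlier.

```python
def ner_request(source_name):
    """Return the RQL request of the example for a given NER source name.

    %% yields a literal %, so the %(token)s placeholder is preserved
    while the source name is substituted.
    """
    return (u'Any U WHERE X label %%(token)s, X cwuri U, '
            u'X ner_source NS, NS name "%s"' % source_name)


def ner_attributes(name, host, kind, request, lang=None):
    """Build the keyword arguments for create_entity('NerProcess', ...)."""
    if kind not in (u'rql', u'sparql'):
        raise ValueError('type must be "rql" or "sparql"')
    attrs = {'name': name, 'host': host, 'type': kind, 'request': request}
    if lang is not None:  # lang is optional
        attrs['lang'] = lang
    return attrs


attrs = ner_attributes(u'dbpedia38-en', u'ner', u'rql',
                       ner_request(u'dbpedia38-en'), lang=u'en')
print(attrs['request'])

# Wherever the `session` object used above is available:
# session.create_entity('NerProcess', **attrs)
```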
Commands
Semnews provides two commands:
A command to extract named entities from articles:
cubicweb-ctl process-ner <name-of-instance>
A command to clean up recognized entities according to some DBpedia categories (see entities/external_resources.py):
cubicweb-ctl cleanup-ner <name-of-instance>