
An automated ingestion service for blogs to construct a corpus for NLP research.

Project description

Baleen is a tool for ingesting formal natural language data from the discourse of professional and amateur writers, such as bloggers and news outlets. Rather than scraping web pages, Baleen ingests data through RSS feeds. It collects as much raw data as it can and saves it to a MongoDB document store.
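
To make the feed-based approach concrete, here is a minimal sketch of turning RSS items into documents shaped for a document store. It is not Baleen's own code or API: the function name `rss_to_documents`, the field names, and the sample feed are all assumptions made for illustration, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real run would fetch this text over HTTP.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
      <description>Hello, world.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
      <description>More text.</description>
    </item>
  </channel>
</rss>"""


def rss_to_documents(xml_text):
    """Convert RSS 2.0 XML into a list of dicts, one per <item>.

    The field names are illustrative, not Baleen's actual schema.
    """
    root = ET.fromstring(xml_text)
    channel = root.find("channel")
    feed_title = channel.findtext("title", default="")
    docs = []
    for item in channel.findall("item"):
        docs.append({
            "feed": feed_title,
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return docs


docs = rss_to_documents(SAMPLE_RSS)
for doc in docs:
    print(doc["url"], "-", doc["title"])
```

Each dict could then be written to MongoDB, for example with pymongo's `collection.insert_many(docs)`, assuming a running server and a collection of your choosing.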

For more, please see the full documentation at: http://baleen-ingest.readthedocs.org/en/latest/

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

baleen-0.3.3.tar.gz (646.2 kB)

Uploaded Source

Built Distribution

baleen-0.3.3-py2-none-any.whl (41.5 kB)

Uploaded Python 2

File details

Details for the file baleen-0.3.3.tar.gz.

File metadata

  • Download URL: baleen-0.3.3.tar.gz
  • Upload date:
  • Size: 646.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for baleen-0.3.3.tar.gz
Algorithm    Hash digest
SHA256       62af5a325dc5e378c3e31a444d825ae67150f1dca5ef77c9646a9ffac8e5b500
MD5          59fd70bb28f580af5dbaaa13d526b8dc
BLAKE2b-256  b1b55e11b384815555b6775d49e86438d5b702dfa6e934a1a5341d43f7194d3d
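
A downloaded file can be checked against a published SHA256 digest before installing. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_published_hash("baleen-0.3.3.tar.gz", "62af5a325dc5e378c3e31a444d825ae67150f1dca5ef77c9646a9ffac8e5b500")`. pip can perform the same check itself through its hash-checking mode (`--require-hashes` with a requirements file).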

File details

Details for the file baleen-0.3.3-py2-none-any.whl.

File metadata

File hashes

Hashes for baleen-0.3.3-py2-none-any.whl
Algorithm    Hash digest
SHA256       cd610220d7b00569c09e2f7c6d99eaa820b0a02b33adae6336c01bcd09a494b5
MD5          9f5fa10f5ade08e0eb10e9d41fc79035
BLAKE2b-256  bceb0aad41ccfe5c26ddfacdb05632912c7624638c445ca05de4be6fbf4c8969
