readable-content

Collect the actual content of any article, blog post, news story, etc.

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- OS Independent

Project description

Collects the actual (main, readable) content of any article, blog post, news story, etc.

Installation

pip install readable-content

Usage

After installing, create a ContentParser with the URL of the page and call get_content():

from readable_content.parser import ContentParser
parser = ContentParser("https://ideas.ted.com/how-do-animals-learn-how-to-be-well-animals-through-a-shared-culture/")
content = parser.get_content()
print(content)

If the website blocks the request and responds with a 4XX, 3XX, or other error status code, fetch the HTML yourself first by other means (for example with requests, a custom User-Agent header, or your own proxies), then pass that HTML to ContentParser as the second argument:

parser = ContentParser("https://ideas.ted.com/how-do-animals-learn-how-to-be-well-animals-through-a-shared-culture/", html_content)

Here, html_content is the page's HTML as a string.
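A minimal sketch of the "get the HTML yourself" step, assuming a browser-like User-Agent header is enough to get past the block. It uses the standard library's urllib.request instead of requests so it needs no extra dependencies; build_request, fetch_html and the User-Agent string are hypothetical helpers, not part of readable-content.

```python
import urllib.request

# Hypothetical browser-like User-Agent; some sites reject Python's default one.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=DEFAULT_USER_AGENT):
    """Build a GET request that sends a custom User-Agent header (hypothetical helper)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=DEFAULT_USER_AGENT, timeout=10):
    """Download the page and return its HTML as a str (hypothetical helper)."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Decode using the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can then be passed as shown above, e.g. `parser = ContentParser(url, fetch_html(url))`. Proxies could be added the same way with urllib.request.ProxyHandler if a plain header is not enough.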

Thank you!

Release history

0.1.2 (this version): May 24, 2020

0.1.1: May 24, 2020

Download files

Source Distribution

readable-content-0.1.2.tar.gz (4.6 kB), uploaded May 24, 2020

File details

Details for the file readable-content-0.1.2.tar.gz.

File metadata

Download URL: readable-content-0.1.2.tar.gz
Upload date: May 24, 2020
Size: 4.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/45.3.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.7

File hashes

Hashes for readable-content-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`94e6c41c95db814473a190170c861d7a2a20ce04b245c2ecbd20da9eef1f8f93`
MD5	`4ecf9f80dc097e2b67242644fc6c6c58`
BLAKE2b-256	`9464f47d78e2ff5c7174795b492843e73fe033d825264fb658199a7afbea6295`
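To check a downloaded copy of the source distribution against the SHA256 digest listed above, a small standard-library sketch could look like this (sha256_of_file and matches_published_hash are hypothetical helper names):

```python
import hashlib

# SHA256 digest published above for readable-content-0.1.2.tar.gz
EXPECTED_SHA256 = "94e6c41c95db814473a190170c861d7a2a20ce04b245c2ecbd20da9eef1f8f93"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 hex digest equals the expected one (case-insensitive)."""
    return sha256_of_file(path) == expected.lower()
```

pip can also enforce this check at install time: pin `readable-content==0.1.2 --hash=sha256:<digest>` in a requirements file and install it with `--require-hashes`.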
