

Project description

urltotext

A lightweight library that takes in a URL and extracts any readable text from it.
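urltotext itself loads pages in Chrome through Selenium. To illustrate the general idea of "extracting readable text", here is a minimal, dependency-free sketch that works on HTML you have already fetched, using Python's standard `html.parser`. It is not urltotext's implementation: `SKIP_TAGS`, `TextExtractor` and `extract_readable_text` are hypothetical names, and the set of skipped tags is an assumption.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-readable (an assumption; adjust as needed).
SKIP_TAGS = {"script", "style", "noscript", "head", "template"}


class TextExtractor(HTMLParser):
    """Collects non-empty text nodes that are not nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True, so "&amp;" arrives as "&"
        self._skip_depth = 0
        self.chunks: list = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_readable_text(html: str) -> str:
    """Return the readable text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><head><title>T</title><style>p{}</style></head>"
        "<body><h1>Hello</h1><script>var x=1;</script>"
        "<p>World &amp; more</p></body></html>"
    )
    print(extract_readable_text(page))  # prints "Hello" then "World & more"
```

A static parser like this cannot see content that a page builds with JavaScript after loading, which is one reason a browser-driven approach such as Selenium may be preferred.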

Accepting any and all PRs!

Installation

pip install urltotext

Prerequisites

  1. urltotext uses Selenium, and its driver support is currently limited to Chrome. Make sure chromedriver is installed and configured correctly; see the ChromeDriver installation instructions.
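Before scraping, it can help to confirm that chromedriver is actually reachable. The sketch below is not part of urltotext; `find_chromedriver` is a hypothetical helper, and it assumes the driver is expected on `PATH`. Recent Selenium 4 releases can also obtain a driver automatically through Selenium Manager, so a missing binary here is better treated as a warning than a hard failure.

```python
import shutil
from typing import Optional


def find_chromedriver(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of chromedriver if found, else None.

    search_path defaults to the PATH environment variable; shutil.which
    also honours PATHEXT (e.g. the .exe suffix) on Windows.
    """
    return shutil.which("chromedriver", path=search_path)


if __name__ == "__main__":
    found = find_chromedriver()
    if found is None:
        print("chromedriver not found on PATH; install it before using urltotext.")
    else:
        print(f"chromedriver found at {found}")
```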

Usage

  1. Import and initialize ContentFinder
from urltotext import ContentFinder
cf = ContentFinder()
  2. Scrape a URL
# scrape a URL
cf.scrape_url(url="your_url_here")

# print the article
cf.print_article(url="your_url_here")

# every URL passed is stored on the ContentFinder instance;
# call flush_data() to free that memory
cf.flush_data()

Enjoy!



Download files


Source Distribution

urltotext-0.3.0.tar.gz (15.6 kB)

Uploaded Source

Built Distribution


urltotext-0.3.0-py3-none-any.whl (15.8 kB)

Uploaded Python 3

File details

Details for the file urltotext-0.3.0.tar.gz.

File metadata

  • Download URL: urltotext-0.3.0.tar.gz
  • Upload date:
  • Size: 15.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for urltotext-0.3.0.tar.gz
Algorithm Hash digest
SHA256 86a9204af6c38c734a4eb0ee34882477d13f9635eceae6dfc716aff68a638b7c
MD5 85f3b6eb42a498230abba92fc0fa6ada
BLAKE2b-256 a304923ca3bbd26f493555bafe7ee6227d86c962ef4f8ef04c60de02f5cc029f
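To check a downloaded file against the SHA256 digest listed above for urltotext-0.3.0.tar.gz, a minimal sketch with Python's standard `hashlib` follows. The helper names `sha256_of_file` and `verify_download` are hypothetical, and the local file path is an assumption; use the wheel's digest instead when checking the .whl file.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for urltotext-0.3.0.tar.gz.
EXPECTED_SHA256 = "86a9204af6c38c734a4eb0ee34882477d13f9635eceae6dfc716aff68a638b7c"


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of_file(path) == expected.strip().lower()


if __name__ == "__main__":
    archive = Path("urltotext-0.3.0.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify_download(archive) else "HASH MISMATCH")
```

pip can also enforce published digests itself through its hash-checking mode (`--require-hashes`).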


File details

Details for the file urltotext-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: urltotext-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for urltotext-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 598b47b8e71a4ac07618aec5af09f6916340b174dfb57c5d26247c45fbe9765c
MD5 470dcc38a66a23b3514f49c13b9b9952
BLAKE2b-256 9eb5a9e7a8540124e27c1a8206a2eeecc63933c355c40b34dea536c4a6ae5a1d

