Fetch the full text of web pages and persist each URL and its full text to a SQLite3 database; fetching is resumable and shows a tqdm progress bar.

Project description

web2db

Fetches the full text of the input URLs and persists it to a sqlite3 DB file.
Fetching is resumable and comes with a progress bar.

Install:

pip install web2db

Quickstart:

import web2db

# Fetch each URL and store its full text in data.db.
web2db.dump('data.db', urls=[
    'https://www.google.com',
    'https://www.yahoo.com',
    'https://www.msn.com',
])

Query the DB file:

import web2db

# Load the stored pages from the DB file written in the Quickstart.
df = web2db.to_df('data.db')
print(df.shape)
print(df)

SQL Schema:

  • Table: WebPages

      Column        Type
      url           text
      fulltext      text
      status_code   int
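
Because the output is an ordinary SQLite file, it can also be read with Python's standard sqlite3 module. The sketch below is illustrative only: the CREATE TABLE statement is an assumption reconstructed from the schema listed above (the package's real DDL may differ), and the inserted row is made-up sample data so the example runs without network access.

```python
import sqlite3

# Assumed DDL, reconstructed from the documented schema; the package's
# actual CREATE TABLE statement may differ (constraints, indexes, etc.).
conn = sqlite3.connect(":memory:")  # pass "data.db" to open a real file
conn.execute(
    "CREATE TABLE IF NOT EXISTS WebPages "
    "(url TEXT, fulltext TEXT, status_code INT)"
)

# Illustrative row, not real fetched data.
conn.execute(
    "INSERT INTO WebPages (url, fulltext, status_code) VALUES (?, ?, ?)",
    ("https://example.com", "Example Domain", 200),
)
conn.commit()

# Select only the pages whose fetch returned HTTP 200.
ok_rows = conn.execute(
    "SELECT url, status_code FROM WebPages WHERE status_code = 200"
).fetchall()
print(ok_rows)
```

Against a file produced by web2db.dump, only the connect call and the SELECT would be needed.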

Features:

  • Resumable webpage fetching
  • Saves to local SQLITE3 DB
  • tqdm progress bar
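
One way resumable fetching could work is to skip every URL that already has a row in the database and commit after each page. The following is a minimal sketch under that assumption, not web2db's actual implementation: pending_urls, dump_resumable and the fetch callable are hypothetical names introduced for illustration, and the table layout is taken from the schema above.

```python
import sqlite3


def pending_urls(conn, urls):
    """Return the input URLs that have no row in WebPages yet, in order."""
    done = {row[0] for row in conn.execute("SELECT url FROM WebPages")}
    return [u for u in urls if u not in done]


def dump_resumable(conn, urls, fetch):
    """Fetch and store each not-yet-stored URL; return how many were fetched.

    `fetch` is a hypothetical callable: url -> (fulltext, status_code).
    """
    # Assumed table layout, based on the documented schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS WebPages "
        "(url TEXT, fulltext TEXT, status_code INT)"
    )
    todo = pending_urls(conn, urls)
    for url in todo:  # wrapping `todo` in tqdm(...) would add a progress bar
        text, status = fetch(url)
        conn.execute(
            "INSERT INTO WebPages (url, fulltext, status_code) VALUES (?, ?, ?)",
            (url, text, status),
        )
        conn.commit()  # commit per page so an interrupted run keeps finished pages
    return len(todo)
```

Calling dump_resumable a second time with the same list fetches nothing, and calling it with an extended list fetches only the new URLs. Passing fetch as a parameter keeps the sketch testable without network access.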
