
Fetch the full text of web pages and persist each link and its full text to a SQLite3 database. Fetching is resumable and shows a tqdm progress bar.


web2db

Fetches the full text of the input URLs and persists it to a SQLite3 database file.
Fetching is resumable and comes with a progress bar.

Install:

pip install web2db

Quickstart:

import web2db

# Fetch each URL and persist its full text to data.db
web2db.dump('data.db', urls=[
    'https://www.google.com',
    'https://www.yahoo.com',
    'https://www.msn.com'
])

Query the DB file:

import web2db

# Load the database file written in the Quickstart
df = web2db.to_df('data.db')
print(df.shape)
print(df)

SQL Schema:

  • Table: WebPages

      Column       Type
      url          text
      fulltext     text
      status_code  int
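
Because the output is an ordinary SQLite file, it can also be inspected with Python's standard sqlite3 module. The following is a minimal sketch, assuming the table and column names match the schema above exactly. The helper `count_by_status` and the sample rows are hypothetical and not part of web2db; the sketch builds a stand-in table in memory so it runs without the package or network access.

```python
import sqlite3


def count_by_status(conn):
    """Return {status_code: number_of_pages} for the WebPages table."""
    rows = conn.execute(
        "SELECT status_code, COUNT(*) FROM WebPages GROUP BY status_code"
    ).fetchall()
    return dict(rows)


# Stand-in database mirroring the documented schema (assumption: a real
# file written by web2db uses these exact table and column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WebPages (url text, fulltext text, status_code int)")
conn.executemany(
    "INSERT INTO WebPages VALUES (?, ?, ?)",
    [
        ("https://example.com/a", "page A", 200),
        ("https://example.com/b", "page B", 200),
        ("https://example.com/missing", "", 404),
    ],
)

status_counts = count_by_status(conn)
print(status_counts)
```

To try this against a real file, open it with `sqlite3.connect('data.db')` instead and skip the CREATE and INSERT statements.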

Features:

  • Resumable webpage fetching
  • Saves to a local SQLite3 database
  • tqdm progress bar
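
One common way to make fetching resumable is to skip any URL that is already stored and to commit after every page. The sketch below illustrates that idea only; it is not taken from the web2db source, and `dump_resumable`, `fetch`, and `fake_fetch` are hypothetical names. The fetcher is passed in as a callable so the example runs without network access.

```python
import sqlite3


def dump_resumable(conn, urls, fetch):
    """Fetch only URLs not yet stored; return the URLs actually fetched.

    `fetch` is a hypothetical callable: url -> (fulltext, status_code).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS WebPages "
        "(url text, fulltext text, status_code int)"
    )
    # URLs persisted by earlier runs are treated as done.
    done = {row[0] for row in conn.execute("SELECT url FROM WebPages")}
    fetched = []
    for url in urls:
        if url in done:
            continue  # already persisted, so skip it on resume
        text, status = fetch(url)
        conn.execute(
            "INSERT INTO WebPages VALUES (?, ?, ?)", (url, text, status)
        )
        conn.commit()  # commit per page so an interruption loses little work
        done.add(url)
        fetched.append(url)
    return fetched


def fake_fetch(url):
    # Stand-in for a real HTTP request (assumption for illustration only).
    return ("text of " + url, 200)


conn = sqlite3.connect(":memory:")
first_run = dump_resumable(
    conn, ["https://a.example", "https://b.example"], fake_fetch
)
second_run = dump_resumable(
    conn,
    ["https://a.example", "https://b.example", "https://c.example"],
    fake_fetch,
)
print(first_run)
print(second_run)
```

The second call fetches only the URL that the first call had not stored. Wrapping `urls` in `tqdm(urls)` inside the loop would add a progress bar in the same spirit as the feature list.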
