
A tool for scraping bible verses from the web

Biblescrapeway

A scraping tool for pulling bible verses from the web. Check it out here!

Basic Usage

Install with pip

 $ pip3 install biblescrapeway

CLI

biblescrapeway comes with a simple CLI (bsw) for pulling specific bible passages:

 $ bsw John3.16

You can also specify a translation (default is ESV):

 $ bsw --translation KJV John3.16

Or get multiple verses by separating the references with commas:

 $ bsw John3.16,1Peter3:8

Or get a range of verses using a hyphen:

 $ bsw John3.16-17
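The reference strings above (single verse, comma-separated list, hyphenated range) can be illustrated with a small parser. This is only a sketch and is not biblescrapeway's own parser: the names ParsedRef and parse_references are hypothetical, and it assumes the chapter and verse are separated by either '.' or ':' as in the examples. Ranges that cross chapters are deliberately rejected.

```python
import re
from typing import List, NamedTuple, Optional


class ParsedRef(NamedTuple):
    """Hypothetical container for one parsed reference (not a biblescrapeway type)."""
    book: str
    chapter: int
    verse_start: int
    verse_end: Optional[int]  # None when a single verse is requested


# Book name (optionally prefixed by 1-3), chapter, '.' or ':', verse, optional '-end'.
_REF_PATTERN = re.compile(
    r"^\s*(?P<book>[1-3]?\s*[A-Za-z]+)\s*(?P<chapter>\d+)"
    r"[.:](?P<start>\d+)(?:-(?P<end>\d+))?\s*$"
)


def parse_references(query: str) -> List[ParsedRef]:
    """Split a comma-delimited query and parse each reference."""
    refs = []
    for part in query.split(","):
        match = _REF_PATTERN.match(part)
        if match is None:
            raise ValueError(f"unrecognised reference: {part!r}")
        end = match.group("end")
        refs.append(
            ParsedRef(
                book=match.group("book").replace(" ", ""),
                chapter=int(match.group("chapter")),
                verse_start=int(match.group("start")),
                verse_end=int(end) if end else None,
            )
        )
    return refs


if __name__ == "__main__":
    for example in ("John3.16", "John3.16,1Peter3:8", "John3.16-17"):
        print(example, "->", parse_references(example))
```

Rejecting anything the pattern does not match keeps the ambiguous cases (such as a range spanning chapters) out of scope rather than guessing at their meaning.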

You can choose an output format with the --format/-f option, for example json to get the raw JSON:

 $ bsw -f json John3.16

You can also set the --cache/--no-cache flag to control whether query results are cached locally, so that repeated queries are looked up instead of being scraped again. By default, bsw uses --cache.

 $ bsw --no-cache John3.16 # scrapes the verse from the web
 $ bsw --no-cache John3.16 # scrapes the verse from the web again
 $ bsw --cache John3.16    # scrapes the verse, then saves it locally at '~/.bsw_cache.json'
 $ bsw --cache John3.16    # looks up the verse locally, does not re-scrape it
 $ bsw --no-cache John3.16 # scrapes the verse from the web again
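The behaviour in the sequence above can be sketched as a read-through cache backed by a JSON file. This is an illustration only, not how bsw is implemented: cached_lookup and fake_fetch are hypothetical names, the layout of the real '~/.bsw_cache.json' file is not documented here, and the sketch writes to a temporary file instead.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict


def cached_lookup(
    reference: str,
    fetch: Callable[[str], Dict],
    cache_path: Path,
    use_cache: bool = True,
) -> Dict:
    """Return the result for `reference`, consulting a JSON file cache first."""
    if not use_cache:
        # Mirrors --no-cache: always fetch, never read or write the cache.
        return fetch(reference)

    cache = json.loads(cache_path.read_text()) if cache_path.exists() else {}
    if reference not in cache:
        # Cache miss: fetch once and persist the result for later calls.
        cache[reference] = fetch(reference)
        cache_path.write_text(json.dumps(cache, indent=2))
    return cache[reference]


if __name__ == "__main__":
    fetch_log = []

    def fake_fetch(reference: str) -> Dict:
        """Stand-in for the web scrape; records every call."""
        fetch_log.append(reference)
        return {"reference": reference, "text": "placeholder text"}

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "cache.json"
        # Same order as the CLI sequence: no-cache, no-cache, cache, cache, no-cache.
        for use_cache in (False, False, True, True, False):
            cached_lookup("John3.16", fake_fetch, path, use_cache=use_cache)

    print(len(fetch_log))  # 4: only the second cached call skips the fetch
```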

Programmatic

It is also possible to get full verse objects from Python, using the query function:

from biblescrapeway import query

# query returns a list of verses; take the first (and only) result
verse = query("John 3:16", version="NIV")[0]
verse.to_dict()

The function returns a list of scraper.Verse objects, each of which can be converted into a dict using the .to_dict() method. The resulting dict has the following format:

{
    "book"    : "str | name of the bible book",
    "chapter" : "int | chapter number",
    "verse"   : "int | verse number",
    "version" : "str | bible version abbreviation",
    "text"    : "str | text content of the verse",
    "footnotes" : [
        {
            "str_index" : "int | index in text string of footnote location",
            "html"      : "str | html of footnote content"
        }
    ],
    "crossrefs" : [
        {
            "str_index" : "int  | index in text string of cross reference location",
            "ref_list"  : "list | list of strings of cross referenced verses"
        }
    ]
}
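As an example of working with this structure, the sketch below inserts numbered markers into the verse text at each footnote position. It assumes that str_index is a 0-based character offset into text, which the format above suggests but does not state outright. The function name annotate_footnotes and the sample dict are made up for illustration and are not real scraped data.

```python
from typing import Dict


def annotate_footnotes(verse: Dict) -> str:
    """Return the verse text with [n] markers inserted at each footnote index."""
    text = verse["text"]
    notes = sorted(verse.get("footnotes", []), key=lambda note: note["str_index"])
    # Number the notes in reading order, then insert from the end of the
    # string so that earlier indices are not shifted by later insertions.
    numbered = list(enumerate(notes, start=1))
    for number, note in reversed(numbered):
        index = note["str_index"]
        text = text[:index] + f"[{number}]" + text[index:]
    return text


if __name__ == "__main__":
    # Dummy data in the shape of verse.to_dict(); not real verse content.
    sample = {
        "book": "Example",
        "chapter": 1,
        "verse": 1,
        "version": "XYZ",
        "text": "alpha beta gamma",
        "footnotes": [
            {"str_index": 10, "html": "<p>second note</p>"},
            {"str_index": 5, "html": "<p>first note</p>"},
        ],
        "crossrefs": [],
    }
    print(annotate_footnotes(sample))  # alpha[1] beta[2] gamma
```

The same approach should work for crossrefs, since each entry there also carries a str_index.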

The caching functionality is also accessible from the query function as:

verse_list = query("John3.16", cache=True) # scrapes from the web and caches the result
verse_list = query("John3.16", cache=True) # just looks the result up locally

Development

# Create the venv
python3 -m venv venv
./venv/bin/pip install -r requirements.txt

# Install for development
./venv/bin/pip install --editable .

# Test
./scripts/run_tests.sh

# Build
./scripts/build.sh

# Deploy
twine upload dist/*

Known Bugs

TODO

  • Add more than just bgw as the scraping backend
  • More carefully handle formatting (unicode, text transforms, woj, etc).
  • Add WAY more documentation, such as docstrings for the modules
  • Add more unit tests
  • Expand the CLI?
  • Finish string_cleaner to convert special unicode characters into simpler characters
  • Standardize some of the naming: reference is used inconsistently and sometimes means Range, and scrape is pretty overloaded.
  • Decide how to handle 'Genesis 1:3-4:5,6': does that last one mean verse 6 or chapter 6?
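For the string_cleaner item, one possible direction is sketched below using only the standard library. This is not the project's string_cleaner: the function name simplify_text and the replacement table are assumptions, and the table covers only a handful of common typographic characters.

```python
import unicodedata

# Hypothetical replacement table: typographic characters -> plain ASCII.
_REPLACEMENTS = str.maketrans(
    {
        "\u2018": "'",    # left single quote
        "\u2019": "'",    # right single quote
        "\u201c": '"',    # left double quote
        "\u201d": '"',    # right double quote
        "\u2013": "-",    # en dash
        "\u2014": "-",    # em dash
        "\u00a0": " ",    # non-breaking space
        "\u2026": "...",  # ellipsis
    }
)


def simplify_text(text: str) -> str:
    """Replace common typographic characters and strip combining accents."""
    text = text.translate(_REPLACEMENTS)
    # NFKD splits accented letters into a base letter plus combining marks,
    # which are then dropped.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(simplify_text("\u201cCaf\u00e9\u201d \u2013 it\u2019s"))  # "Cafe" - it's
```

Stripping accents is lossy, so a real implementation would probably make that step optional.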
