Bible XML Parser

A Python package for parsing Bible texts in various XML formats (USFX, OSIS, ZEFANIA). This package provides both direct parsing and database-backed approaches for handling Bible data in your Python applications.

Features

  • 📖 Parse Bible texts in multiple formats (USFX, OSIS, ZEFANIA)
  • 🔍 Automatic format detection
  • 🚀 Memory-efficient streaming XML parsing using defusedxml
  • 🗄️ SQLite database caching for improved performance
  • 🔎 Full-text search functionality (FTS5)
  • 📝 Bible reference parsing - Parse references like "John 3:16-18" or "Genesis 1:1-2:3"
  • 🔒 Secure XML parsing (protected against XXE attacks)
  • 📝 Type hints throughout for better IDE support
  • 🐍 Python 3.8+ support

Installation

pip install bible-xml-parser

Development Installation

git clone https://github.com/Omarzintan/bible_parser_python.git
cd bible_parser_python
pip install -e ".[dev]"

Quick Start

Direct Parsing Approach

Parse a Bible file directly without database caching:

from bible_parser import BibleParser

# Parse from file (format auto-detected)
parser = BibleParser('path/to/bible.xml')

# Or parse from string with explicit format
# (the context manager closes the file handle after reading)
with open('bible.xml', encoding='utf-8') as f:
    xml_content = f.read()
parser = BibleParser.from_string(xml_content, format='USFX')

# Iterate over books
for book in parser.books:
    print(f"{book.title} ({book.id})")
    print(f"  Chapters: {len(book.chapters)}")
    print(f"  Verses: {len(book.verses)}")

# Or iterate over verses directly
for verse in parser.verses:
    print(f"{verse.book_id} {verse.chapter_num}:{verse.num} - {verse.text}")

Database Approach (Recommended for Production)

For faster repeated access, load the parsed data into a SQLite database once and query it through BibleRepository:

from bible_parser import BibleRepository

# Create repository
repo = BibleRepository(xml_path='path/to/bible.xml', format='USFX')

# Initialize database (only needed once)
repo.initialize('my_bible.db')

# Get all books
books = repo.get_books()
for book in books:
    print(f"{book.title} ({book.id})")

# Get verses from a specific chapter
verses = repo.get_verses('gen', 1)  # Genesis chapter 1
for verse in verses:
    print(f"{verse.num}. {verse.text}")

# Get a specific verse
verse = repo.get_verse('jhn', 3, 16)  # John 3:16
if verse:
    print(verse.text)

# Search for verses containing specific text
results = repo.search_verses('love')
print(f"Found {len(results)} verses containing 'love'")

# Don't forget to close
repo.close()

Using Context Manager

from bible_parser import BibleRepository

with BibleRepository(xml_path='bible.xml') as repo:
    repo.initialize('my_bible.db')
    
    # Use the repository
    verses = repo.get_verses('mat', 5)  # Matthew chapter 5
    for verse in verses:
        print(f"{verse.num}. {verse.text}")
    
    # Search
    results = repo.search_verses('faith hope love')
    for verse in results:
        print(f"{verse.book_id} {verse.chapter_num}:{verse.num}")

# Database automatically closed

Bible Reference Parsing

Parse Bible references in various formats and retrieve verses:

from bible_parser import BibleReferenceFormatter, BibleRepository

with BibleRepository(xml_path='bible.xml') as repo:
    repo.initialize('bible.db')
    
    # Parse a simple verse reference
    ref = BibleReferenceFormatter.parse("John 3:16", repo)
    print(f"Book: {ref.book_id}, Chapter: {ref.chapter_num}, Verse: {ref.verse_num}")
    
    # Get verses directly (convenience method)
    verses = BibleReferenceFormatter.get_verses_from_reference("John 3:16-18", repo)
    for verse in verses:
        print(f"{verse.chapter_num}:{verse.num} - {verse.text}")
    
    # Extract first verse from complex references
    first = BibleReferenceFormatter.get_first_verse_in_reference("Genesis 1:1-2:3")
    print(first)  # "Genesis 1:1"
    
    # Validate book names
    is_valid = BibleReferenceFormatter.is_valid_book("John")  # True

Supported Reference Formats:

  • Single verse: "John 3:16"
  • Verse range: "John 3:16-18"
  • Multi-chapter: "Genesis 1:1-2:3"
  • Chapter only: "Psalm 23"
  • Multi-chapter range: "Ruth 1-4"
  • Complex patterns: "John 3:16,18,20-22"
  • Semicolon-separated: "Genesis 1:1-3;2:3-4"
  • With descriptions: "1 Samuel 17:1-58 (David and Goliath)"
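
To illustrate how the simpler reference shapes above break down into parts, here is a minimal regex-based sketch. It is not the package's parser: the names `parse_simple_reference` and `ParsedRef` are hypothetical, and comma- or semicolon-separated patterns are deliberately left unhandled (the function returns None for them).

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: book, chapter, optional verse, optional range end,
# optional trailing description in parentheses.
_REF = re.compile(
    r"""^\s*
    (?P<book>(?:[1-3]\s)?[A-Za-z]+(?:\s[A-Za-z]+)*)\s+
    (?P<chapter>\d+)
    (?::(?P<verse>\d+))?
    (?:-(?:(?P<end_chapter>\d+):)?(?P<end>\d+))?
    \s*(?:\(.*\))?\s*$""",
    re.VERBOSE,
)


class ParsedRef(NamedTuple):
    book: str
    chapter: int
    verse: Optional[int]
    end_chapter: Optional[int]
    end_verse: Optional[int]


def _to_int(value: Optional[str]) -> Optional[int]:
    return int(value) if value is not None else None


def parse_simple_reference(text: str) -> Optional[ParsedRef]:
    """Parse single-range references; return None for unsupported shapes."""
    m = _REF.match(text)
    if m is None:
        return None
    verse = _to_int(m["verse"])
    end_chapter = _to_int(m["end_chapter"])
    end_verse = _to_int(m["end"])
    if verse is None and end_chapter is None and end_verse is not None:
        # "Ruth 1-4": with no verse given, the range end is a chapter.
        end_chapter, end_verse = end_verse, None
    return ParsedRef(m["book"], int(m["chapter"]), verse, end_chapter, end_verse)


print(parse_simple_reference("Genesis 1:1-2:3"))
```

A real parser would also map the book name to a book identifier such as 'gen' and expand the additional verse ranges; this sketch stops at splitting the string.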

Supported Formats

USFX (Unified Standard Format XML)

<usfx>
  <book id="gen">
    <c id="1"/>
    <v id="1">In the beginning...</v>
  </book>
</usfx>

OSIS (Open Scripture Information Standard)

<osis>
  <osisText>
    <div type="book" osisID="Gen">
      <verse osisID="Gen.1.1">In the beginning...</verse>
    </div>
  </osisText>
</osis>

Zefania XML

<XMLBIBLE>
  <BIBLEBOOK bnumber="1" bname="Genesis">
    <CHAPTER cnumber="1">
      <VERS vnumber="1">In the beginning...</VERS>
    </CHAPTER>
  </BIBLEBOOK>
</XMLBIBLE>
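
As a rough illustration of streaming over the Zefania structure shown above, the sketch below walks the document with the standard library's `iterparse` and yields one tuple per verse. It is an assumption-laden example, not the package's code: the function name `iter_zefania_verses` is hypothetical, and the standard library parser is used only to keep the example self-contained (the package itself states that it parses with defusedxml).

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def iter_zefania_verses(source) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (book name, chapter, verse, text) tuples from a Zefania stream."""
    book, chapter = "", 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are already available when an element opens.
            if elem.tag == "BIBLEBOOK":
                book = elem.get("bname", "")
            elif elem.tag == "CHAPTER":
                chapter = int(elem.get("cnumber", "0"))
        elif elem.tag == "VERS":
            # The verse text is complete only at the "end" event.
            text = "".join(elem.itertext()).strip()
            yield book, chapter, int(elem.get("vnumber", "0")), text
            elem.clear()  # drop the verse's text once it has been emitted


sample = (
    b'<XMLBIBLE><BIBLEBOOK bnumber="1" bname="Genesis">'
    b'<CHAPTER cnumber="1"><VERS vnumber="1">In the beginning...</VERS>'
    b"</CHAPTER></BIBLEBOOK></XMLBIBLE>"
)
for row in iter_zefania_verses(io.BytesIO(sample)):
    print(row)
```

Because verses are yielded one at a time, the caller can write them to a database or filter them without holding the whole Bible in memory.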

API Reference

BibleParser

Main parser class with automatic format detection.

Methods and properties:

  • __init__(source, format=None) - Initialize parser
  • from_string(xml_content, format=None) - Create from XML string
  • books - Property that yields Book objects
  • verses - Property that yields Verse objects
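
One plausible way to detect the format automatically is to look at the root element of the document. The sketch below does that with the standard library; it is only an illustration under stated assumptions, not the package's implementation. The function name `detect_format` and the mapping are hypothetical, with root names taken from the format samples in this README.

```python
import io
import xml.etree.ElementTree as ET
from typing import Optional

# Root element names as they appear in the README's format samples.
_ROOT_TO_FORMAT = {"usfx": "USFX", "osis": "OSIS", "xmlbible": "ZEFANIA"}


def detect_format(source) -> Optional[str]:
    """Return a format name based on the root element, or None if unknown."""
    for _event, elem in ET.iterparse(source, events=("start",)):
        # The first "start" event is the root; drop any "{namespace}" prefix.
        local_name = elem.tag.rsplit("}", 1)[-1]
        return _ROOT_TO_FORMAT.get(local_name.lower())
    return None


print(detect_format(io.BytesIO(b'<usfx><book id="gen"/></usfx>')))
```

Only the start of the document needs to be read for this check, so it stays cheap even for large files.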

BibleRepository

Database-backed repository for efficient Bible data access.

Methods:

  • __init__(xml_path=None, xml_string=None, format=None) - Initialize repository
  • initialize(database_name) - Create/open database
  • get_books() - Get all books
  • get_verses(book_id, chapter_num) - Get verses from a chapter
  • get_verse(book_id, chapter_num, verse_num) - Get a specific verse
  • get_chapter_count(book_id) - Get number of chapters in a book
  • search_verses(query, limit=100) - Full-text search
  • close() - Close database connection

BibleReferenceFormatter

Utility class for parsing Bible references.

Methods:

  • parse(reference, bible_repository) - Parse a reference string into a BibleReference object
  • get_verses_from_reference(reference, bible_repository) - Parse and retrieve verses in one call
  • get_first_verse_in_reference(reference) - Extract the first verse from a complex reference
  • is_valid_book(book_name) - Check if a book name is valid

Data Models

Verse:

  • num (int) - Verse number
  • chapter_num (int) - Chapter number
  • text (str) - Verse text
  • book_id (str) - Book identifier

Chapter:

  • num (int) - Chapter number
  • verses (List[Verse]) - List of verses

Book:

  • id (str) - Book identifier (e.g., 'gen', 'mat')
  • num (int) - Book number
  • title (str) - Book title (e.g., 'Genesis', 'Matthew')
  • chapters (List[Chapter]) - List of chapters
  • verses (List[Verse]) - Flat list of all verses

BibleReference:

  • book_id (str) - Book identifier
  • chapter_num (int) - Starting chapter number
  • verse_num (Optional[int]) - Starting verse number (None for chapter-only)
  • end_chapter_num (int) - Ending chapter for multi-chapter ranges
  • end_verse_num (int) - Ending verse for verse ranges
  • is_chapter_only (bool) - True if reference is chapter-only
  • additional_verses (List[VerseRange]) - Additional verses for complex patterns

VerseRange:

  • chapter_num (Optional[int]) - Chapter number (optional)
  • start_verse (int) - Starting verse number
  • end_verse (Optional[int]) - Ending verse number (None for single verse)
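
To make the relationship between the Verse, Chapter, and Book models concrete, here is a minimal dataclass sketch using the field names listed above. How the package actually defines these classes is not shown in this README; in particular, modelling Book.verses as a computed property is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Verse:
    num: int
    chapter_num: int
    text: str
    book_id: str


@dataclass
class Chapter:
    num: int
    verses: List[Verse] = field(default_factory=list)


@dataclass
class Book:
    id: str
    num: int
    title: str
    chapters: List[Chapter] = field(default_factory=list)

    @property
    def verses(self) -> List[Verse]:
        # Assumption: the flat list is derived from the chapters on demand.
        return [verse for chapter in self.chapters for verse in chapter.verses]


genesis = Book(
    id="gen",
    num=1,
    title="Genesis",
    chapters=[
        Chapter(1, [Verse(1, 1, "In the beginning...", "gen")]),
        Chapter(2, [Verse(1, 2, "Thus the heavens...", "gen")]),
    ],
)
print(len(genesis.chapters), len(genesis.verses))
```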

Performance Considerations

Direct Parsing

Pros:

  • Simple implementation
  • No database setup required
  • Always uses the latest source files

Cons:

  • CPU and memory intensive
  • Slower for repeated access
  • Repeated parsing on each run

Database Approach

Pros:

  • Much faster access once data is loaded
  • Lower memory usage during queries
  • Efficient full-text search with FTS5
  • Works offline without re-parsing

Cons:

  • Initial setup time
  • Requires disk space
  • Additional complexity
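
The full-text search advantage can be sketched with SQLite's FTS5 directly. The example below is an illustration only: the table name `verses_fts`, its columns, and the `search` helper are hypothetical and do not describe the schema the package creates. It also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the verse text is indexed; the other columns are stored for retrieval.
conn.execute(
    "CREATE VIRTUAL TABLE verses_fts USING fts5("
    "book_id UNINDEXED, chapter_num UNINDEXED, verse_num UNINDEXED, text)"
)
conn.executemany(
    "INSERT INTO verses_fts VALUES (?, ?, ?, ?)",
    [
        ("gen", 1, 1, "In the beginning God created the heaven and the earth"),
        ("jhn", 3, 16, "For God so loved the world"),
        ("1co", 13, 13, "And now abideth faith, hope, love"),
    ],
)


def search(connection, query, limit=100):
    """Return matching rows, best matches first."""
    return connection.execute(
        "SELECT book_id, chapter_num, verse_num, text FROM verses_fts "
        "WHERE verses_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


for row in search(conn, "faith hope love"):
    print(row)
```

With FTS5's default tokenizer, a query of several bare words requires all of them to appear, and matching is by whole token, so 'love' does not match 'loved' in this sketch.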

Security

This package uses defusedxml for secure XML parsing, protecting against:

  • XXE (XML External Entity) attacks - Prevents reading local files or making network requests
  • Billion Laughs attack - Prevents exponential entity expansion
  • Quadratic blowup - Prevents memory exhaustion

All database queries use parameterized statements to prevent SQL injection.
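
The point about parameterized statements can be shown with the standard sqlite3 module. This is a generic sketch, not the package's code: the `verses` table layout and the `get_verse_text` helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE verses (book_id TEXT, chapter_num INTEGER, verse_num INTEGER, text TEXT)"
)
db.execute(
    "INSERT INTO verses VALUES (?, ?, ?, ?)",
    ("jhn", 3, 16, "For God so loved the world..."),
)


def get_verse_text(connection, book_id, chapter_num, verse_num):
    """Look up one verse; values are bound as parameters, never interpolated."""
    row = connection.execute(
        "SELECT text FROM verses "
        "WHERE book_id = ? AND chapter_num = ? AND verse_num = ?",
        (book_id, chapter_num, verse_num),
    ).fetchone()
    return row[0] if row else None


print(get_verse_text(db, "jhn", 3, 16))
# An injection attempt is treated as a literal book id and matches nothing.
print(get_verse_text(db, "jhn' OR '1'='1", 3, 16))
```

Because the driver binds the values separately from the SQL text, user input cannot change the structure of the query.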

Examples

See the examples/ directory for complete working examples:

  • direct_parsing.py - Direct parsing example
  • database_approach.py - Database caching example
  • search_example.py - Full-text search example

Testing

Run tests with pytest:

pytest

With coverage:

pytest --cov=bible_parser --cov-report=term-missing

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for version history.
