Bible XML Parser

A Python package for parsing Bible texts in various XML formats (USFX, OSIS, ZEFANIA). This package provides both direct parsing and database-backed approaches for handling Bible data in your Python applications.

Features

  • 📖 Parse Bible texts in multiple formats (USFX, OSIS, ZEFANIA)
  • 🔍 Automatic format detection
  • 🚀 Memory-efficient streaming XML parsing using defusedxml
  • 🗄️ SQLite database caching for improved performance
  • 🔎 Full-text search functionality (FTS5)
  • 🔒 Secure XML parsing (protected against XXE attacks)
  • 📝 Type hints throughout for better IDE support
  • 🐍 Python 3.8+ support

Installation

pip install bible-xml-parser

Development Installation

git clone https://github.com/Omarzintan/bible_parser_python.git
cd bible_parser_python
pip install -e ".[dev]"

Quick Start

Direct Parsing Approach

Parse a Bible file directly without database caching:

from bible_parser import BibleParser

# Parse from file (format auto-detected)
parser = BibleParser('path/to/bible.xml')

# Or parse from string with explicit format
with open('bible.xml', encoding='utf-8') as f:
    xml_content = f.read()
parser = BibleParser.from_string(xml_content, format='USFX')

# Iterate over books
for book in parser.books:
    print(f"{book.title} ({book.id})")
    print(f"  Chapters: {len(book.chapters)}")
    print(f"  Verses: {len(book.verses)}")

# Or iterate over verses directly
for verse in parser.verses:
    print(f"{verse.book_id} {verse.chapter_num}:{verse.num} - {verse.text}")
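Because each verse carries its own book_id and chapter_num, the flat verse stream can be summarized without walking the book/chapter hierarchy. The following is a minimal sketch of that idea. The helper count_verses_by_chapter is hypothetical (not part of the package), and SimpleNamespace objects stand in for real Verse objects so the snippet runs on its own; with the package installed you would pass parser.verses instead.

```python
from collections import Counter
from types import SimpleNamespace


def count_verses_by_chapter(verses):
    """Count verses per (book_id, chapter_num) for any iterable of verse-like objects."""
    counts = Counter()
    for verse in verses:
        counts[(verse.book_id, verse.chapter_num)] += 1
    return counts


# Stand-in verses with placeholder text (assumption: real Verse objects
# expose the same book_id, chapter_num, num and text attributes).
sample_verses = [
    SimpleNamespace(book_id="gen", chapter_num=1, num=1, text="In the beginning..."),
    SimpleNamespace(book_id="gen", chapter_num=1, num=2, text="placeholder text"),
    SimpleNamespace(book_id="gen", chapter_num=2, num=1, text="placeholder text"),
]

counts = count_verses_by_chapter(sample_verses)
for (book_id, chapter_num), total in sorted(counts.items()):
    print(f"{book_id} {chapter_num}: {total} verses")
```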

Database Approach (Recommended for Production)

For repeated access, cache the parsed data in a SQLite database so the XML only has to be parsed once:

from bible_parser import BibleRepository

# Create repository
repo = BibleRepository(xml_path='path/to/bible.xml', format='USFX')

# Initialize database (only needed once)
repo.initialize('my_bible.db')

# Get all books
books = repo.get_books()
for book in books:
    print(f"{book.title} ({book.id})")

# Get verses from a specific chapter
verses = repo.get_verses('gen', 1)  # Genesis chapter 1
for verse in verses:
    print(f"{verse.num}. {verse.text}")

# Get a specific verse
verse = repo.get_verse('jhn', 3, 16)  # John 3:16
if verse:
    print(verse.text)

# Search for verses containing specific text
results = repo.search_verses('love')
print(f"Found {len(results)} verses containing 'love'")

# Close the database connection when finished
repo.close()

Using Context Manager

from bible_parser import BibleRepository

with BibleRepository(xml_path='bible.xml') as repo:
    repo.initialize('my_bible.db')
    
    # Use the repository
    verses = repo.get_verses('mat', 5)  # Matthew chapter 5
    for verse in verses:
        print(f"{verse.num}. {verse.text}")
    
    # Search
    results = repo.search_verses('faith hope love')
    for verse in results:
        print(f"{verse.book_id} {verse.chapter_num}:{verse.num}")

# Database automatically closed

Supported Formats

USFX (Unified Standard Format XML)

<usfx>
  <book id="gen">
    <c id="1"/>
    <v id="1">In the beginning...</v>
  </book>
</usfx>
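In the USFX sample above, the chapter marker is an empty element that precedes its verses rather than enclosing them, so a streaming reader has to remember the current chapter as it goes. The sketch below illustrates that with the standard library's ElementTree.iterparse. It is not the package's implementation: iter_usfx_verses is a hypothetical helper, and it assumes verse text sits inside the v element exactly as in the sample. The package itself states that it parses with defusedxml, which is the safer choice for untrusted files.

```python
import io
import xml.etree.ElementTree as ET

USFX_SAMPLE = """<usfx>
  <book id="gen">
    <c id="1"/>
    <v id="1">In the beginning...</v>
  </book>
</usfx>"""


def iter_usfx_verses(xml_text):
    """Yield (book_id, chapter, verse, text) tuples from USFX-like XML."""
    book_id = None
    chapter = None
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and elem.tag == "book":
            # Attributes are already available on the start event.
            book_id = elem.get("id")
        elif event == "end" and elem.tag == "c":
            # Chapter marker: applies to every verse until the next marker.
            chapter = int(elem.get("id"))
        elif event == "end" and elem.tag == "v":
            yield book_id, chapter, int(elem.get("id")), (elem.text or "").strip()
            elem.clear()  # release the element's content once it is consumed


for book_id, chapter, verse, text in iter_usfx_verses(USFX_SAMPLE):
    print(f"{book_id} {chapter}:{verse} - {text}")
```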

OSIS (Open Scripture Information Standard)

<osis>
  <osisText>
    <div type="book" osisID="Gen">
      <verse osisID="Gen.1.1">In the beginning...</verse>
    </div>
  </osisText>
</osis>
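An osisID such as Gen.1.1 packs book, chapter, and verse into one dotted string. A minimal sketch of splitting it apart is shown here. OsisRef and parse_osis_id are hypothetical names used for illustration, and whether the package normalizes the book part to lowercase IDs such as 'gen' is not something this sketch assumes.

```python
from typing import NamedTuple


class OsisRef(NamedTuple):
    book: str
    chapter: int
    verse: int


def parse_osis_id(osis_id: str) -> OsisRef:
    """Split an osisID of the form Book.Chapter.Verse into its three parts."""
    parts = osis_id.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected Book.Chapter.Verse, got {osis_id!r}")
    book, chapter, verse = parts
    return OsisRef(book, int(chapter), int(verse))


ref = parse_osis_id("Gen.1.1")
print(f"{ref.book} {ref.chapter}:{ref.verse}")
```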

Zefania XML

<XMLBIBLE>
  <BIBLEBOOK bnumber="1" bname="Genesis">
    <CHAPTER cnumber="1">
      <VERS vnumber="1">In the beginning...</VERS>
    </CHAPTER>
  </BIBLEBOOK>
</XMLBIBLE>
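The three samples above have different root elements, which is one plausible basis for automatic format detection. The sketch below shows that approach with the standard library; it is an illustration under that assumption, not a description of how the package actually detects formats. detect_format and ROOT_TAG_TO_FORMAT are hypothetical names.

```python
import io
import xml.etree.ElementTree as ET

# Root element name (lowercased) mapped to the format names used in this README.
ROOT_TAG_TO_FORMAT = {"usfx": "USFX", "osis": "OSIS", "xmlbible": "ZEFANIA"}


def detect_format(xml_text: str) -> str:
    """Guess the Bible XML format from the name of the root element."""
    stream = io.BytesIO(xml_text.encode("utf-8"))
    # The first "start" event is the root element, so the full tree
    # never has to be built.
    for _event, root in ET.iterparse(stream, events=("start",)):
        # Drop a namespace prefix of the form '{uri}tag' if present.
        local_name = root.tag.rsplit("}", 1)[-1].lower()
        if local_name in ROOT_TAG_TO_FORMAT:
            return ROOT_TAG_TO_FORMAT[local_name]
        raise ValueError(f"unrecognized root element: {root.tag!r}")
    raise ValueError("document has no root element")


print(detect_format('<XMLBIBLE><BIBLEBOOK bnumber="1" bname="Genesis"/></XMLBIBLE>'))
```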

API Reference

BibleParser

Main parser class with automatic format detection.

Methods and properties:

  • __init__(source, format=None) - Initialize parser
  • from_string(xml_content, format=None) - Create from XML string
  • books - Property that yields Book objects
  • verses - Property that yields Verse objects

BibleRepository

Database-backed repository for efficient Bible data access.

Methods:

  • __init__(xml_path=None, xml_string=None, format=None) - Initialize repository
  • initialize(database_name) - Create/open database
  • get_books() - Get all books
  • get_verses(book_id, chapter_num) - Get verses from a chapter
  • get_verse(book_id, chapter_num, verse_num) - Get a specific verse
  • get_chapter_count(book_id) - Get number of chapters in a book
  • search_verses(query, limit=100) - Full-text search
  • close() - Close database connection
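To give a feel for what an FTS5-backed search_verses(query, limit=100) might do, here is a self-contained sketch using Python's built-in sqlite3 module. The table name verses_fts, its columns, and the sample rows are assumptions made for illustration, not the package's actual schema, and the snippet requires an SQLite build with FTS5 enabled. With FTS5's default query syntax, a multi-word query such as 'faith hope love' matches rows containing all of the terms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: only the text column is indexed for search.
conn.execute(
    "CREATE VIRTUAL TABLE verses_fts USING fts5("
    "book_id UNINDEXED, chapter_num UNINDEXED, verse_num UNINDEXED, text)"
)
# Placeholder rows, not real verse text.
conn.executemany(
    "INSERT INTO verses_fts VALUES (?, ?, ?, ?)",
    [
        ("jhn", 3, 16, "sample text mentioning love"),
        ("1co", 13, 13, "sample text mentioning faith, hope, and love"),
        ("gen", 1, 1, "In the beginning..."),
    ],
)


def search_verses(connection, query, limit=100):
    """Return (book_id, chapter_num, verse_num) rows matching an FTS5 query."""
    cursor = connection.execute(
        "SELECT book_id, chapter_num, verse_num FROM verses_fts "
        "WHERE verses_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return cursor.fetchall()


for row in search_verses(conn, "faith hope love"):
    print(row)
```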

Data Models

Verse:

  • num (int) - Verse number
  • chapter_num (int) - Chapter number
  • text (str) - Verse text
  • book_id (str) - Book identifier

Chapter:

  • num (int) - Chapter number
  • verses (List[Verse]) - List of verses

Book:

  • id (str) - Book identifier (e.g., 'gen', 'mat')
  • num (int) - Book number
  • title (str) - Book title (e.g., 'Genesis', 'Matthew')
  • chapters (List[Chapter]) - List of chapters
  • verses (List[Verse]) - Flat list of all verses
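The three models nest naturally: a Book holds Chapters, and each Chapter holds Verses. The dataclasses below are a sketch that mirrors the fields listed above so the relationships are easy to see; they are not the package's own class definitions. In particular, implementing Book.verses as a computed property is an assumption made here for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Verse:
    num: int
    chapter_num: int
    text: str
    book_id: str


@dataclass
class Chapter:
    num: int
    verses: List[Verse] = field(default_factory=list)


@dataclass
class Book:
    id: str
    num: int
    title: str
    chapters: List[Chapter] = field(default_factory=list)

    @property
    def verses(self) -> List[Verse]:
        """Flat list of all verses, in chapter order (assumed to be derived)."""
        return [verse for chapter in self.chapters for verse in chapter.verses]


first_verse = Verse(num=1, chapter_num=1, text="In the beginning...", book_id="gen")
genesis = Book(id="gen", num=1, title="Genesis",
               chapters=[Chapter(num=1, verses=[first_verse])])
print(f"{genesis.title}: {len(genesis.chapters)} chapter(s), {len(genesis.verses)} verse(s)")
```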

Performance Considerations

Direct Parsing

Pros:

  • Simple implementation
  • No database setup required
  • Always uses the latest source files

Cons:

  • CPU and memory intensive for large files
  • Re-parses the XML on every run, so repeated access is slower

Database Approach

Pros:

  • Much faster access once data is loaded
  • Lower memory usage during queries
  • Efficient full-text search with FTS5
  • Reuses the cached database across runs without re-parsing the XML

Cons:

  • Initial setup time
  • Requires disk space
  • Additional complexity

Security

This package uses defusedxml for secure XML parsing, protecting against:

  • XXE (XML External Entity) attacks - Prevents reading local files or making network requests
  • Billion Laughs attack - Prevents exponential entity expansion
  • Quadratic blowup - Prevents memory exhaustion from many references to a single very large entity

All database queries use parameterized statements to prevent SQL injection.
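As an illustration of what parameterized statements buy you, the sketch below uses Python's built-in sqlite3 module with ? placeholders. The verses table and the get_verse_text helper are hypothetical and only stand in for whatever schema the package uses internally. Because the values are bound rather than spliced into the SQL string, an injection attempt is treated as an ordinary (non-matching) book ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout, for illustration only.
conn.execute(
    "CREATE TABLE verses (book_id TEXT, chapter_num INTEGER, verse_num INTEGER, text TEXT)"
)
conn.execute(
    "INSERT INTO verses VALUES (?, ?, ?, ?)",
    ("gen", 1, 1, "In the beginning..."),
)


def get_verse_text(connection, book_id, chapter_num, verse_num):
    """Look up one verse; all values are bound as parameters, never formatted in."""
    row = connection.execute(
        "SELECT text FROM verses "
        "WHERE book_id = ? AND chapter_num = ? AND verse_num = ?",
        (book_id, chapter_num, verse_num),
    ).fetchone()
    return row[0] if row else None


print(get_verse_text(conn, "gen", 1, 1))
# An injection attempt is just a string that matches no book_id.
print(get_verse_text(conn, "gen' OR '1'='1", 1, 1))
```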

Examples

See the examples/ directory for complete working examples:

  • direct_parsing.py - Direct parsing example
  • database_approach.py - Database caching example
  • search_example.py - Full-text search example

Testing

Run tests with pytest:

pytest

With coverage:

pytest --cov=bible_parser --cov-report=term-missing

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for version history.
