A Python package for parsing Bible texts in various XML formats (USFX, OSIS, ZEFANIA) with Bible reference parsing
Bible XML Parser
A Python package for parsing Bible texts in various XML formats (USFX, OSIS, ZEFANIA). This package provides both direct parsing and database-backed approaches for handling Bible data in your Python applications.
Features
- 📖 Parse Bible texts in multiple formats (USFX, OSIS, ZEFANIA)
- 🔍 Automatic format detection
- 🚀 Memory-efficient streaming XML parsing using defusedxml
- 🗄️ SQLite database caching for improved performance
- 🔎 Full-text search functionality (FTS5)
- 📝 Bible reference parsing - Parse references like "John 3:16-18" or "Genesis 1:1-2:3"
- 🔒 Secure XML parsing (protected against XXE attacks)
- 📝 Type hints throughout for better IDE support
- 🐍 Python 3.8+ support
Installation
```bash
pip install bible-xml-parser
```
Development Installation
```bash
git clone https://github.com/Omarzintan/bible_parser_python.git
cd bible_parser_python
pip install -e ".[dev]"
```
Quick Start
Direct Parsing Approach
Parse a Bible file directly without database caching:
```python
from bible_parser import BibleParser

# Parse from file (format auto-detected)
parser = BibleParser('path/to/bible.xml')

# Or parse from string with explicit format
with open('bible.xml', encoding='utf-8') as f:
    xml_content = f.read()
parser = BibleParser.from_string(xml_content, format='USFX')

# Iterate over books
for book in parser.books:
    print(f"{book.title} ({book.id})")
    print(f"  Chapters: {len(book.chapters)}")
    print(f"  Verses: {len(book.verses)}")

# Or iterate over verses directly
for verse in parser.verses:
    print(f"{verse.book_id} {verse.chapter_num}:{verse.num} - {verse.text}")
```
Database Approach (Recommended for Production)
For better performance with repeated access, use the database approach, which parses the XML once and caches it in an SQLite database:
```python
from bible_parser import BibleRepository

# Create repository
repo = BibleRepository(xml_path='path/to/bible.xml', format='USFX')

# Initialize database (only needed once)
repo.initialize('my_bible.db')

# Get all books
books = repo.get_books()
for book in books:
    print(f"{book.title} ({book.id})")

# Get verses from a specific chapter
verses = repo.get_verses('gen', 1)  # Genesis chapter 1
for verse in verses:
    print(f"{verse.num}. {verse.text}")

# Get a specific verse
verse = repo.get_verse('jhn', 3, 16)  # John 3:16
if verse:
    print(verse.text)

# Search for verses containing specific text
results = repo.search_verses('love')
print(f"Found {len(results)} verses containing 'love'")

# Don't forget to close
repo.close()
```
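Conceptually, the database approach parses the XML once, stores each verse as a row, and answers lookups with indexed, parameterized queries. Below is a minimal sketch of that idea using the standard-library `sqlite3` module. The `verses` table layout and the helper functions are hypothetical, chosen to mirror the repository methods above; they are not the package's actual schema or code.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

# Hypothetical schema: one row per verse, keyed (and therefore indexed)
# by (book_id, chapter_num, num).
SCHEMA = """
CREATE TABLE IF NOT EXISTS verses (
    book_id     TEXT    NOT NULL,
    chapter_num INTEGER NOT NULL,
    num         INTEGER NOT NULL,
    text        TEXT    NOT NULL,
    PRIMARY KEY (book_id, chapter_num, num)
)
"""


def load_verses(conn: sqlite3.Connection,
                rows: Iterable[Tuple[str, int, int, str]]) -> None:
    """Store parsed verses once so later runs can skip the XML entirely."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT OR REPLACE INTO verses VALUES (?, ?, ?, ?)", rows)


def get_verses(conn: sqlite3.Connection, book_id: str,
               chapter_num: int) -> List[Tuple[int, str]]:
    return conn.execute(
        "SELECT num, text FROM verses "
        "WHERE book_id = ? AND chapter_num = ? ORDER BY num",
        (book_id, chapter_num),
    ).fetchall()


def get_verse(conn: sqlite3.Connection, book_id: str,
              chapter_num: int, verse_num: int) -> Optional[str]:
    row = conn.execute(
        "SELECT text FROM verses WHERE book_id = ? AND chapter_num = ? AND num = ?",
        (book_id, chapter_num, verse_num),
    ).fetchone()
    return row[0] if row else None


def get_chapter_count(conn: sqlite3.Connection, book_id: str) -> int:
    (count,) = conn.execute(
        "SELECT COUNT(DISTINCT chapter_num) FROM verses WHERE book_id = ?",
        (book_id,),
    ).fetchone()
    return count


conn = sqlite3.connect(":memory:")  # a file path such as "my_bible.db" would persist
load_verses(conn, [
    ("gen", 1, 1, "In the beginning..."),
    ("gen", 1, 2, "And the earth was without form..."),
    ("jhn", 3, 16, "For God so loved the world..."),
])
print(get_verse(conn, "jhn", 3, 16))   # For God so loved the world...
print(get_chapter_count(conn, "gen"))  # 1
conn.close()
```

Pointing `sqlite3.connect` at a file instead of `:memory:` is what lets the parsed data survive between runs.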
Using Context Manager
```python
from bible_parser import BibleRepository

with BibleRepository(xml_path='bible.xml') as repo:
    repo.initialize('my_bible.db')

    # Use the repository
    verses = repo.get_verses('mat', 5)  # Matthew chapter 5
    for verse in verses:
        print(f"{verse.num}. {verse.text}")

    # Search
    results = repo.search_verses('faith hope love')
    for verse in results:
        print(f"{verse.book_id} {verse.chapter_num}:{verse.num}")

# Database automatically closed
```
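The `with` block works because the repository implements Python's context-manager protocol (`__enter__` / `__exit__`). A minimal sketch of that pattern, using a hypothetical `CachedRepository` class around `sqlite3` rather than the package's real class, looks like this:

```python
import sqlite3
from typing import Optional


class CachedRepository:
    """Hypothetical repository that only demonstrates the context-manager protocol."""

    def __init__(self) -> None:
        self.conn: Optional[sqlite3.Connection] = None

    def initialize(self, database_name: str) -> None:
        # Create or open the database file.
        self.conn = sqlite3.connect(database_name)

    def close(self) -> None:
        if self.conn is not None:
            self.conn.close()
            self.conn = None

    def __enter__(self) -> "CachedRepository":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Runs on normal exit and when an exception escapes the block.
        self.close()
        return False  # do not suppress exceptions


with CachedRepository() as repo:
    repo.initialize(":memory:")
    print(repo.conn is not None)  # True inside the block
print(repo.conn is None)          # True: closed on exit
```

Note that `sqlite3.Connection` is itself a context manager, but its `with` block only commits or rolls back; it does not close the connection. That is why a wrapper like this calls `close()` explicitly in `__exit__`.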
Bible Reference Parsing
Parse Bible references in various formats and retrieve verses:
```python
from bible_parser import BibleReferenceFormatter, BibleRepository

with BibleRepository(xml_path='bible.xml') as repo:
    repo.initialize('bible.db')

    # Parse a simple verse reference
    ref = BibleReferenceFormatter.parse("John 3:16", repo)
    print(f"Book: {ref.book_id}, Chapter: {ref.chapter_num}, Verse: {ref.verse_num}")

    # Get verses directly (convenience method)
    verses = BibleReferenceFormatter.get_verses_from_reference("John 3:16-18", repo)
    for verse in verses:
        print(f"{verse.chapter_num}:{verse.num} - {verse.text}")

    # Extract first verse from complex references
    first = BibleReferenceFormatter.get_first_verse_in_reference("Genesis 1:1-2:3")
    print(first)  # "Genesis 1:1"

    # Validate book names
    is_valid = BibleReferenceFormatter.is_valid_book("John")  # True
```
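To illustrate what parsing a reference involves, here is a simplified regular-expression sketch that handles the single-verse, verse-range, multi-chapter, and chapter-only forms, and derives a first verse in the spirit of `get_first_verse_in_reference`. The `SimpleReference` type and the `parse_simple_reference` and `first_verse` functions are hypothetical; the real `BibleReferenceFormatter` also resolves book names to IDs through the repository and supports more patterns.

```python
import re
from dataclasses import dataclass
from typing import Optional

# "<book> <chapter>[:<verse>][-[<end_chapter>:]<end>]"
# The book part may start with a number, e.g. "1 Samuel".
_REF = re.compile(
    r"^\s*(?P<book>(?:\d\s*)?[A-Za-z][A-Za-z ]*?)\s+"
    r"(?P<ch>\d+)(?::(?P<v>\d+))?"
    r"(?:-(?:(?P<ech>\d+):)?(?P<end>\d+))?\s*$"
)


@dataclass
class SimpleReference:  # hypothetical, not the package's BibleReference
    book: str
    chapter_num: int
    verse_num: Optional[int] = None
    end_chapter_num: Optional[int] = None
    end_verse_num: Optional[int] = None

    @property
    def is_chapter_only(self) -> bool:
        return self.verse_num is None


def parse_simple_reference(text: str) -> SimpleReference:
    m = _REF.match(text)
    if not m:
        raise ValueError(f"Unrecognized reference: {text!r}")
    verse = int(m["v"]) if m["v"] else None
    end = int(m["end"]) if m["end"] else None
    ref = SimpleReference(m["book"].strip(), int(m["ch"]), verse)
    if end is None:
        return ref
    if verse is None:        # "Ruth 1-4": range of whole chapters
        ref.end_chapter_num = end
    elif m["ech"]:           # "Genesis 1:1-2:3": range crosses chapters
        ref.end_chapter_num = int(m["ech"])
        ref.end_verse_num = end
    else:                    # "John 3:16-18": range within one chapter
        ref.end_verse_num = end
    return ref


def first_verse(text: str) -> str:
    ref = parse_simple_reference(text)
    # Assumption: a chapter-only reference starts at verse 1.
    return f"{ref.book} {ref.chapter_num}:{ref.verse_num or 1}"


print(first_verse("Genesis 1:1-2:3"))  # Genesis 1:1
```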
Supported Reference Formats:
- Single verse: "John 3:16"
- Verse range: "John 3:16-18"
- Multi-chapter: "Genesis 1:1-2:3"
- Chapter only: "Psalm 23"
- Multi-chapter range: "Ruth 1-4"
- Complex patterns: "John 3:16,18,20-22"
- Semicolon-separated: "Genesis 1:1-3;2:3-4"
- With descriptions: "1 Samuel 17:1-58 (David and Goliath)"
Supported Formats
USFX (Unified Standard Format XML)
```xml
<usfx>
  <book id="gen">
    <c id="1"/>
    <v id="1">In the beginning...</v>
  </book>
</usfx>
```
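"Streaming" parsing means handling elements as they are read instead of building the whole document tree first. The sketch below shows the technique on the simplified USFX layout above (each `<v>` wraps its text) using the standard library's `xml.etree.ElementTree.iterparse`. The `iter_usfx_verses` function and the second sample verse are illustrative assumptions. The package itself parses with defusedxml; for untrusted files, `defusedxml.ElementTree.iterparse` is the hardened counterpart of the call used here.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple, Union

USFX_SAMPLE = b"""<usfx>
  <book id="gen">
    <c id="1"/>
    <v id="1">In the beginning...</v>
    <v id="2">And the earth was without form...</v>
  </book>
</usfx>"""


def iter_usfx_verses(source: Union[str, IO[bytes]]) -> Iterator[Tuple[str, int, int, str]]:
    """Yield (book_id, chapter_num, verse_num, text) while the input is being read."""
    book_id, chapter = "", 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start" and elem.tag == "book":
            book_id = elem.get("id", "")
        elif event == "start" and elem.tag == "c":
            chapter = int(elem.get("id", "0"))
        elif event == "end" and elem.tag == "v":
            yield book_id, chapter, int(elem.get("id", "0")), (elem.text or "").strip()
            elem.clear()  # drop the verse's text and attributes once yielded


for verse in iter_usfx_verses(io.BytesIO(USFX_SAMPLE)):
    print(verse)
```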
OSIS (Open Scripture Information Standard)
```xml
<osis>
  <osisText>
    <div type="book" osisID="Gen">
      <verse osisID="Gen.1.1">In the beginning...</verse>
    </div>
  </osisText>
</osis>
```
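In OSIS, the book, chapter, and verse are packed into the dot-separated `osisID` attribute. Here is a small sketch that reads verses from the sample above and splits that identifier. The helper names are hypothetical, and the sketch assumes the three-part `Book.Chapter.Verse` form shown. Real OSIS files usually declare an XML namespace, in which case tag names must be namespace-qualified before `iter("verse")` will match them.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

OSIS_SAMPLE = """<osis><osisText>
  <div type="book" osisID="Gen">
    <verse osisID="Gen.1.1">In the beginning...</verse>
  </div>
</osisText></osis>"""


def split_osis_id(osis_id: str) -> Tuple[str, int, int]:
    """Split 'Gen.1.1' into ('Gen', 1, 1)."""
    book, chapter, verse = osis_id.split(".")
    return book, int(chapter), int(verse)


def osis_verses(xml_text: str) -> List[Tuple[str, int, int, str]]:
    root = ET.fromstring(xml_text)
    out = []
    for verse in root.iter("verse"):  # un-namespaced tags only, as in the sample
        book, chapter, num = split_osis_id(verse.get("osisID", ""))
        out.append((book, chapter, num, (verse.text or "").strip()))
    return out


print(osis_verses(OSIS_SAMPLE))  # [('Gen', 1, 1, 'In the beginning...')]
```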
Zefania XML
```xml
<XMLBIBLE>
  <BIBLEBOOK bnumber="1" bname="Genesis">
    <CHAPTER cnumber="1">
      <VERS vnumber="1">In the beginning...</VERS>
    </CHAPTER>
  </BIBLEBOOK>
</XMLBIBLE>
```
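Each of the three formats has a distinctive root element (`<usfx>`, `<osis>`, `<XMLBIBLE>`), so one plausible way to implement automatic format detection is to stop at the first start tag. This is a sketch under that assumption: the `detect_format` helper is hypothetical, the format names simply mirror the spelling used in this README, and the package's real detection may inspect more than the root element.

```python
import io
import xml.etree.ElementTree as ET

# Root tag (lower-cased, namespace stripped) -> format name (assumed spelling).
ROOT_TO_FORMAT = {"usfx": "USFX", "osis": "OSIS", "xmlbible": "ZEFANIA"}


def detect_format(xml_bytes: bytes) -> str:
    """Guess the format from the root element, stopping at the first start tag."""
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("start",)):
        tag = elem.tag.rsplit("}", 1)[-1].lower()  # drop "{namespace}" if present
        try:
            return ROOT_TO_FORMAT[tag]
        except KeyError:
            raise ValueError(f"Unknown Bible XML root element: <{elem.tag}>") from None
    raise ValueError("No root element found")


print(detect_format(b"<XMLBIBLE><BIBLEBOOK bnumber='1'/></XMLBIBLE>"))  # ZEFANIA
```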
API Reference
BibleParser
Main parser class with automatic format detection.
Methods:
- `__init__(source, format=None)` - Initialize parser
- `from_string(xml_content, format=None)` - Create from XML string
- `books` - Property that yields Book objects
- `verses` - Property that yields Verse objects
BibleRepository
Database-backed repository for efficient Bible data access.
Methods:
- `__init__(xml_path=None, xml_string=None, format=None)` - Initialize repository
- `initialize(database_name)` - Create/open database
- `get_books()` - Get all books
- `get_verses(book_id, chapter_num)` - Get verses from a chapter
- `get_verse(book_id, chapter_num, verse_num)` - Get a specific verse
- `get_chapter_count(book_id)` - Get number of chapters in a book
- `search_verses(query, limit=100)` - Full-text search
- `close()` - Close database connection
BibleReferenceFormatter
Utility class for parsing Bible references.
Methods:
- `parse(reference, bible_repository)` - Parse a reference string into a BibleReference object
- `get_verses_from_reference(reference, bible_repository)` - Parse and retrieve verses in one call
- `get_first_verse_in_reference(reference)` - Extract the first verse from a complex reference
- `is_valid_book(book_name)` - Check if a book name is valid
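The comma- and semicolon-separated reference forms (for example "John 3:16,18,20-22" or "Genesis 1:1-3;2:3-4") can be thought of as a list of verse ranges. Here is a sketch of splitting the verse part of such a reference into `(chapter, start, end)` tuples. The `split_verse_list` function and its tuple layout are assumptions for illustration, and stripping a trailing parenthetical description is one plausible way to handle the "With descriptions" form. Cross-chapter ranges such as "Genesis 1:1-2:3" are out of scope for this sketch.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical span type: (chapter_num, start_verse, end_verse); end is None for one verse.
VerseSpan = Tuple[int, int, Optional[int]]


def split_verse_list(chapter_num: int, spec: str) -> List[VerseSpan]:
    """Split '16,18,20-22' or '1-3;2:3-4' (the part after 'Book N:') into spans.

    A segment after ';' may carry its own 'chapter:' prefix; commas stay
    in the current chapter.
    """
    spec = re.sub(r"\s*\(.*\)\s*$", "", spec)  # drop a "(David and Goliath)"-style note
    spans: List[VerseSpan] = []
    current = chapter_num
    for group in spec.split(";"):
        group = group.strip()
        if ":" in group:                        # e.g. "2:3-4" switches to chapter 2
            ch, group = group.split(":", 1)
            current = int(ch)
        for part in group.split(","):
            part = part.strip()
            if "-" in part:
                start, end = part.split("-", 1)
                spans.append((current, int(start), int(end)))
            else:
                spans.append((current, int(part), None))
    return spans


print(split_verse_list(1, "1-3;2:3-4"))  # [(1, 1, 3), (2, 3, 4)]
```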
Data Models
Verse:
- `num` (int) - Verse number
- `chapter_num` (int) - Chapter number
- `text` (str) - Verse text
- `book_id` (str) - Book identifier

Chapter:
- `num` (int) - Chapter number
- `verses` (List[Verse]) - List of verses

Book:
- `id` (str) - Book identifier (e.g., 'gen', 'mat')
- `num` (int) - Book number
- `title` (str) - Book title (e.g., 'Genesis', 'Matthew')
- `chapters` (List[Chapter]) - List of chapters
- `verses` (List[Verse]) - Flat list of all verses

BibleReference:
- `book_id` (str) - Book identifier
- `chapter_num` (int) - Starting chapter number
- `verse_num` (int) - Starting verse number (None for chapter-only)
- `end_chapter_num` (int) - Ending chapter for multi-chapter ranges
- `end_verse_num` (int) - Ending verse for verse ranges
- `is_chapter_only` (bool) - True if reference is chapter-only
- `additional_verses` (List[VerseRange]) - Additional verses for complex patterns

VerseRange:
- `chapter_num` (int) - Chapter number (optional)
- `start_verse` (int) - Starting verse number
- `end_verse` (int) - Ending verse number (None for single verse)
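For readers who prefer to see the models as code, here is one way the fields listed above could be expressed with dataclasses. This is a sketch that mirrors the documented field names and types, not the package's own classes. Two choices are assumptions made for brevity: `Book.verses` and `BibleReference.is_chapter_only` are derived as properties rather than stored, and `VerseRange` lists `start_verse` first so that the required field precedes the optional ones.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Verse:
    num: int
    chapter_num: int
    text: str
    book_id: str


@dataclass
class Chapter:
    num: int
    verses: List[Verse] = field(default_factory=list)


@dataclass
class Book:
    id: str
    num: int
    title: str
    chapters: List[Chapter] = field(default_factory=list)

    @property
    def verses(self) -> List[Verse]:
        # Assumption: the flat verse list is derived from the chapters.
        return [v for ch in self.chapters for v in ch.verses]


@dataclass
class VerseRange:
    start_verse: int
    end_verse: Optional[int] = None    # None for a single verse
    chapter_num: Optional[int] = None  # optional, per the field list


@dataclass
class BibleReference:
    book_id: str
    chapter_num: int
    verse_num: Optional[int] = None    # None for chapter-only references
    end_chapter_num: Optional[int] = None
    end_verse_num: Optional[int] = None
    additional_verses: List[VerseRange] = field(default_factory=list)

    @property
    def is_chapter_only(self) -> bool:
        # Assumption: derived from verse_num instead of stored separately.
        return self.verse_num is None


gen = Book("gen", 1, "Genesis", [Chapter(1, [Verse(1, 1, "In the beginning...", "gen")])])
print(len(gen.verses))  # 1
```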
Performance Considerations
Direct Parsing
Pros:
- Simple implementation
- No database setup required
- Always uses the latest source files
Cons:
- CPU and memory intensive
- Slower for repeated access
- Repeated parsing on each run
Database Approach
Pros:
- Much faster access once data is loaded
- Lower memory usage during queries
- Efficient full-text search with FTS5
- Parsed data persists between runs, so the XML is not re-parsed each time
Cons:
- Initial setup time
- Requires disk space
- Additional complexity
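The "Efficient full-text search with FTS5" point above refers to SQLite's FTS5 extension. A sketch of that idea with the standard `sqlite3` module follows. The `verses_fts` virtual table, its columns, and both helper functions are assumptions for illustration, and the code requires an SQLite library compiled with FTS5 support.

```python
import sqlite3
from typing import List, Tuple


def build_index(conn: sqlite3.Connection, rows: List[Tuple[str, int, int, str]]) -> None:
    # Hypothetical FTS5 table: only `text` is tokenized; the other columns are stored as-is.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS verses_fts USING fts5("
        "book_id UNINDEXED, chapter_num UNINDEXED, num UNINDEXED, text)"
    )
    with conn:
        conn.executemany("INSERT INTO verses_fts VALUES (?, ?, ?, ?)", rows)


def search_verses(conn: sqlite3.Connection, query: str, limit: int = 100) -> list:
    # MATCH uses FTS5 query syntax: space-separated terms must all appear (implicit AND).
    return conn.execute(
        "SELECT book_id, chapter_num, num FROM verses_fts "
        "WHERE verses_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("1co", 13, 13, "And now abideth faith, hope, charity"),
    ("jhn", 3, 16, "For God so loved the world"),
    ("1jn", 4, 8, "God is love"),
])
print(search_verses(conn, "love"))
```

The default FTS5 tokenizer does not stem words, so in this sketch "love" matches "God is love" but not "loved".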
Security
This package uses defusedxml for secure XML parsing, protecting against:
- XXE (XML External Entity) attacks - Prevents reading local files or making network requests
- Billion Laughs attack - Prevents exponential entity expansion
- Quadratic blowup - Prevents memory exhaustion
All database queries use parameterized statements to prevent SQL injection.
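As a brief illustration of why parameterized statements matter, the sketch below passes a hostile-looking book ID as a bound parameter. SQLite treats it as a plain string value, so the query finds nothing and the table is untouched. The table and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE verses (book_id TEXT, chapter_num INTEGER, num INTEGER, text TEXT)")
conn.execute("INSERT INTO verses VALUES ('gen', 1, 1, 'In the beginning...')")

hostile = "gen'; DROP TABLE verses; --"

# The ? placeholder binds `hostile` as data; it is never spliced into the SQL text.
rows = conn.execute(
    "SELECT text FROM verses WHERE book_id = ?", (hostile,)
).fetchall()
print(rows)  # []

# The table still exists and still holds its row.
print(conn.execute("SELECT COUNT(*) FROM verses").fetchone()[0])  # 1
```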
Examples
See the examples/ directory for complete working examples:
- `direct_parsing.py` - Direct parsing example
- `database_approach.py` - Database caching example
- `search_example.py` - Full-text search example
Testing
Run tests with pytest:
```bash
pytest
```
With coverage:
```bash
pytest --cov=bible_parser --cov-report=term-missing
```
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by the Ruby bible_parser library
- Flutter bible_parser_flutter implementation
- Bible XML files from the open-bibles repository
Changelog
See CHANGELOG.md for version history.
Support
- 📫 Issues: GitHub Issues
- 📖 Documentation: GitHub Wiki
File details
Details for the file bible_xml_parser-0.2.1.tar.gz.
File metadata
- Download URL: bible_xml_parser-0.2.1.tar.gz
- Upload date:
- Size: 40.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2315d42de8573d9e730640b1665063d68401081687a8f9eee1a2398fa63cd1d3 |
| MD5 | 519a88a0cb9673034b2989c45b8f5ee9 |
| BLAKE2b-256 | 4584fd9a1404a88a344d67c8df628b5ee9beefefa83838ea0e4afe45f60494a3 |
File details
Details for the file bible_xml_parser-0.2.1-py3-none-any.whl.
File metadata
- Download URL: bible_xml_parser-0.2.1-py3-none-any.whl
- Upload date:
- Size: 28.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e817498a0ab2c9579e772dd70019bcbe61fea55bc47362d532b5a56bb6e9bf8 |
| MD5 | 2d425a846cfb3eaf9d2922bb2981546e |
| BLAKE2b-256 | fc53b5b112c31e5f7a28bdf8afa0d93ce84789582b67d13dd04e4c3d2bc554cb |