A lightweight parser for DBLP website XML pages (record, person, venue pages), not the database dump.
# dblp-webxml-parser
A lightweight Python library for parsing DBLP website XML pages.
⚠️ Note: This is NOT a parser for the DBLP XML database dump (`dblp.xml.gz`). This library parses the live XML pages served by the DBLP website (e.g., `https://dblp.org/rec/xxx.xml`).
## Features

- 🔍 RecordPageParser - Parse publication record pages (`https://dblp.org/rec/xxx.xml`)
- 👤 PersonPageParser - Parse author/person pages (`https://dblp.org/pid/xxx.xml`)
- 📚 VenuePageParser - Parse venue pages (`https://dblp.org/db/xxx/index.xml`)
- 🪶 Zero dependencies - Uses only the Python standard library
- 🐍 Python 3.10+ - Modern Python with type hints
## Installation

```bash
pip install dblp-webxml-parser
```
## Quick Start

### Parse a Publication Record

```python
import urllib.request

from dblp_webxml_parser import RecordPageParser

# Fetch and parse a record page
url = "https://dblp.org/rec/conf/cvpr/HeZRS16.xml"
with urllib.request.urlopen(url) as response:
    xml_text = response.read().decode("utf-8")

record = RecordPageParser(xml_text)
print(f"Title: {record.title}")
print(f"Year: {record.year}")
print(f"Type: {record.type}")
print(f"Key: {record.key}")
print(f"Authors: {[a.name for a in record.authors]}")
```
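The three page kinds follow fixed URL patterns (`/rec/…`, `/pid/…`, `/db/…/index.xml`), so building and fetching them can be wrapped in small helpers. This is a minimal sketch, not part of `dblp-webxml-parser`: `record_xml_url`, `person_xml_url`, `venue_xml_url` and `fetch_xml` are hypothetical names, and the `User-Agent` string and 30-second timeout are arbitrary assumptions.

```python
import urllib.request

DBLP_BASE = "https://dblp.org"  # host serving the XML pages


def record_xml_url(key: str) -> str:
    """Build a record page URL from a DBLP key such as 'conf/cvpr/HeZRS16'."""
    return f"{DBLP_BASE}/rec/{key}.xml"


def person_xml_url(pid: str) -> str:
    """Build a person page URL from a DBLP person ID."""
    return f"{DBLP_BASE}/pid/{pid}.xml"


def venue_xml_url(venue_path: str) -> str:
    """Build a venue index URL from a path like 'conf/cvpr' (hypothetical input format)."""
    return f"{DBLP_BASE}/db/{venue_path.strip('/')}/index.xml"


def fetch_xml(url: str, timeout: float = 30.0) -> str:
    """Download a page and decode it as UTF-8 text, ready for a *PageParser."""
    # The User-Agent value is a placeholder; identify your own client here.
    request = urllib.request.Request(url, headers={"User-Agent": "my-dblp-client/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With these helpers the record example becomes `RecordPageParser(fetch_xml(record_xml_url("conf/cvpr/HeZRS16")))`.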
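### Parse a Person Page

```python
from dblp_webxml_parser import PersonPageParser

# xml_text holds a person page fetched from https://dblp.org/pid/xxx.xml,
# e.g. with urllib.request as in the record example above
person = PersonPageParser(xml_text)
print(f"Name: {person.name}")
print(f"PID: {person.pid}")

# publications is an iterator: materialize it once so it can be counted and looped over
publications = list(person.publications)
print(f"Publications: {len(publications)}")
for pub in publications:
    print(f" - {pub.title} ({pub.year})")
```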
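A common next step after listing a person's publications is a per-year count. This sketch only assumes that each yielded item has a `year` attribute (documented as `int | None` for records); `publications_per_year` is a hypothetical helper, and `FakePub` is a stand-in so the example runs without network access.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


class HasYear(Protocol):
    year: Optional[int]


def publications_per_year(pubs: Iterable[HasYear]) -> dict[int, int]:
    """Count publications per year, skipping entries whose year is None."""
    counts = Counter(p.year for p in pubs if p.year is not None)
    return dict(sorted(counts.items()))  # oldest year first


@dataclass
class FakePub:  # stand-in for the objects yielded by person.publications
    title: str
    year: Optional[int]


sample = [FakePub("A", 2016), FakePub("B", 2016), FakePub("C", 2019), FakePub("D", None)]
print(publications_per_year(sample))  # {2016: 2, 2019: 1}
```

Against a real page this would be called as `publications_per_year(person.publications)`; because it consumes its argument in a single pass, an iterator works as well as a list.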
### Parse a Venue Page

```python
from dblp_webxml_parser import VenuePageParser

# Parse a venue page (conference or journal); xml_text holds the XML
# fetched from https://dblp.org/db/xxx/index.xml
venue = VenuePageParser(xml_text)
print(f"Venue: {venue.title}")

# Iterate over all publications listed on this venue page
for pub in venue.publications:
    print(f" - {pub.title}")
```
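Scripts that walk many venue or person pages in a row may be throttled by the server. The sketch below is not part of the library, and whether dblp.org answers with HTTP 429 (Too Many Requests) is an assumption: `fetch_with_retry` is a hypothetical helper that retries only on 429, honours a numeric `Retry-After` header when present, and otherwise backs off exponentially. The `opener` and `sleep` parameters exist so the logic can be exercised offline.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def _urlopen_bytes(url: str) -> bytes:
    """Default opener: plain urllib GET with a 30-second timeout (arbitrary choice)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def fetch_with_retry(
    url: str,
    opener: Callable[[str], bytes] = _urlopen_bytes,
    max_attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch a page as UTF-8 text, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return opener(url).decode("utf-8")
        except urllib.error.HTTPError as err:
            # Only throttling is retried; other errors, or the last attempt, propagate.
            if err.code != 429 or attempt == max_attempts:
                raise
            retry_after: Optional[str] = err.headers.get("Retry-After") if err.headers else None
            if retry_after is not None and retry_after.strip().isdigit():
                delay = float(retry_after)  # server-suggested wait in seconds
            else:
                # HTTP-date values are not parsed here; fall back to 2 s, 4 s, ...
                delay = base_delay * 2 ** (attempt - 1)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The returned text can be passed straight to `VenuePageParser` or `PersonPageParser`.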
## API Reference

### RecordPageParser

Parses record pages from `https://dblp.org/rec/xxx.xml`.

| Property | Type | Description |
|---|---|---|
| `key` | `str \| None` | DBLP paper key |
| `type` | `str` | Publication type (article, inproceedings, etc.) |
| `title` | `str \| None` | Publication title |
| `year` | `int \| None` | Publication year |
| `month` | `int \| None` | Publication month |
| `authors` | `Iterator[RecordAuthor]` | Authors iterator |
| `venue` | `str \| None` | Venue name (journal/booktitle/series) |
| `venue_type` | `str \| None` | Venue type (journal/proceedings/book) |
| `journal` | `str \| None` | Journal name |
| `booktitle` | `str \| None` | Book/proceedings title |
| `volume` | `str \| None` | Volume number |
| `number` | `str \| None` | Issue number |
| `pages` | `str \| None` | Page range |
| `ees` | `Iterator[str]` | Electronic edition URLs |
| `url` | `str \| None` | DBLP URL |
| `crossref` | `str \| None` | Crossref key |
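To illustrate where values like these can live in a record page, here is a stdlib-only sketch using `xml.etree.ElementTree`. It is not the library's implementation: the sample is a simplified, hand-written assumption of the record layout (a `<dblp>` root wrapping one element named after the publication type, carrying a `key` attribute), and `parse_record` is a hypothetical function.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<dblp>
  <inproceedings key="conf/example/DoeR24">
    <author pid="00/0000">Jane Doe</author>
    <author>Richard Roe</author>
    <title>An Example Title</title>
    <booktitle>EXAMPLE</booktitle>
    <year>2024</year>
    <pages>1-10</pages>
    <ee>https://example.org/paper-a</ee>
    <ee>https://example.org/paper-b</ee>
  </inproceedings>
</dblp>"""


def parse_record(xml_text: str) -> dict:
    """Extract a few RecordPageParser-like fields from a record page (sketch)."""
    root = ET.fromstring(xml_text)
    rec = root[0]  # assumption: the first child is the publication element
    year = rec.findtext("year")  # findtext returns None when the child is missing
    return {
        "key": rec.get("key"),
        "type": rec.tag,  # e.g. 'inproceedings' or 'article'
        "title": rec.findtext("title"),
        "year": int(year) if year else None,
        "venue": rec.findtext("journal") or rec.findtext("booktitle") or rec.findtext("series"),
        "pages": rec.findtext("pages"),
        "authors": [a.text for a in rec.findall("author")],
        "ees": [e.text for e in rec.findall("ee")],
    }


print(parse_record(SAMPLE)["type"])  # inproceedings
```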
### RecordAuthor

Author information from a record.

| Property | Type | Description |
|---|---|---|
| `name` | `str \| None` | Author name |
| `pid` | `str \| None` | DBLP person ID |
| `orcid` | `str \| None` | ORCID identifier |
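The three `RecordAuthor` fields map naturally onto an author element whose text is the name and whose attributes carry the identifiers. The sketch below assumes that layout (`<author pid="…" orcid="…">Name</author>`); `AuthorInfo` and `iter_authors` are hypothetical stand-ins, not the library's classes, and the IDs in the sample are placeholders.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class AuthorInfo:  # hypothetical stand-in mirroring RecordAuthor's fields
    name: Optional[str]
    pid: Optional[str]
    orcid: Optional[str]


def iter_authors(record: ET.Element) -> Iterator[AuthorInfo]:
    """Yield authors lazily; missing attributes come back as None via Element.get."""
    for el in record.iter("author"):
        yield AuthorInfo(name=el.text, pid=el.get("pid"), orcid=el.get("orcid"))


rec = ET.fromstring(
    '<article key="journals/example/Doe24">'
    '<author pid="00/0000" orcid="0000-0000-0000-0000">Jane Doe</author>'
    "<author>Richard Roe</author>"
    "</article>"
)
for author in iter_authors(rec):
    print(author)
```

Yielding lazily matches the `Iterator[...]` return type documented for `authors`.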
### PersonPageParser

Parses person pages from `https://dblp.org/pid/xxx.xml`.

| Property | Type | Description |
|---|---|---|
| `pid` | `str \| None` | DBLP person ID |
| `name` | `str \| None` | Primary name |
| `names` | `Iterator[str]` | All name variants |
| `publications` | `Iterator[RecordParser]` | Publications iterator |
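A person page bundles identity data and a publication list. As a sketch only, assume a layout with a `<dblpperson>` root carrying `name` and `pid` attributes, a `<person>` block listing name variants as `<author>` elements, and one `<r>` wrapper per publication. That layout is an assumption, and `person_names` and `person_records` are hypothetical helpers, not the library's API.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

SAMPLE = """<dblpperson name="Jane Doe" pid="00/0000">
  <person key="homepages/00/0000">
    <author pid="00/0000">Jane Doe</author>
    <author pid="00/0000">J. Doe</author>
  </person>
  <r><article key="journals/example/Doe23"><title>First</title><year>2023</year></article></r>
  <r><inproceedings key="conf/example/Doe24"><title>Second</title><year>2024</year></inproceedings></r>
</dblpperson>"""


def person_names(root: ET.Element) -> Iterator[str]:
    """Yield the name variants listed in the <person> block (assumed layout)."""
    for el in root.iterfind("person/author"):
        if el.text:
            yield el.text


def person_records(root: ET.Element) -> Iterator[ET.Element]:
    """Yield the publication element inside each <r> wrapper (assumed layout)."""
    for wrapper in root.iterfind("r"):
        if len(wrapper):  # skip empty wrappers
            yield wrapper[0]


root = ET.fromstring(SAMPLE)
print(root.get("pid"), root.get("name"))
print(list(person_names(root)))
print([rec.findtext("title") for rec in person_records(root)])
```

Because these helpers are generators, a given generator object can be consumed only once; wrap it in `list()` if the result must be traversed twice.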
### VenuePageParser

Parses venue pages from `https://dblp.org/db/xxx/index.xml`.

| Property | Type | Description |
|---|---|---|
| `title` | `str \| None` | Venue title |
| `publications` | `Iterator[RecordParser]` | Publications iterator |
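A venue page can group its publications under several layers of markup. One robust way to collect them, sketched here under stated assumptions, is to walk the whole tree and keep every element whose tag is a publication type. The `RECORD_TAGS` set lists the classic dblp publication element names and may be incomplete; the wrapper element names in `SAMPLE` are invented; `iter_publications` is a hypothetical helper, not the library's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

# Classic dblp publication element names; treat this set as an assumption.
RECORD_TAGS = frozenset({
    "article", "inproceedings", "proceedings", "book",
    "incollection", "phdthesis", "mastersthesis",
})


def iter_publications(root: ET.Element) -> Iterator[ET.Element]:
    """Yield every publication element in document order, whatever wraps it."""
    for el in root.iter():
        if el.tag in RECORD_TAGS:
            yield el


# Invented wrapper markup: records sit at different depths on purpose.
SAMPLE = """<page>
  <group>
    <inproceedings key="conf/example/A24"><title>Paper A</title></inproceedings>
    <wrapper><article key="journals/example/B24"><title>Paper B</title></article></wrapper>
  </group>
</page>"""

for pub in iter_publications(ET.fromstring(SAMPLE)):
    print(pub.get("key"), pub.findtext("title"))
```

The design trades a full-tree scan for independence from the exact wrapper structure, which keeps the helper working if the page layout shifts.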
## Comparison with Other Tools
| Tool | Purpose |
|---|---|
| dblp-webxml-parser (this) | Parse live XML pages from DBLP website |
| dblp-parser | Parse the DBLP XML database dump file |
## License
MIT License - see LICENSE for details.
## Download files
### File details

Details for the file `dblp_webxml_parser-0.1.0.tar.gz`.

#### File metadata

- Download URL: dblp_webxml_parser-0.1.0.tar.gz
- Upload date:
- Size: 12.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14

#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ba864148ba6381b277d2b7594ca6704af297b36a03f936076fb1961c02040058` |
| MD5 | `438431f8d5798bebfcb32edcf6e7c9d2` |
| BLAKE2b-256 | `4d2586860c87cc279925fa6943ad86a9b26359e8928faf5bb6fc45eb8182f0f8` |
### File details

Details for the file `dblp_webxml_parser-0.1.0-py3-none-any.whl`.

#### File metadata

- Download URL: dblp_webxml_parser-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14

#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4f3afa69e3c32bc21fc523660169d23343f4c191efe1d8c4fb957e8307c694cd` |
| MD5 | `01a9dc1a8430569cea91c1d01656cb9a` |
| BLAKE2b-256 | `388501f0f8b130af2537bba0456f776280fcbd60e91b5f7ee1a1dfc413f4d8c5` |