A lightweight parser for DBLP website XML pages (record, person, venue pages), not the database dump.

dblp-webxml-parser

A lightweight Python library for parsing DBLP website XML pages.

⚠️ Note: This is NOT a parser for the DBLP XML database dump (dblp.xml.gz). This library parses the live XML pages served by the DBLP website (e.g., https://dblp.org/rec/xxx.xml).

Features

  • 🔍 RecordPageParser - Parse publication record pages (https://dblp.org/rec/xxx.xml)
  • 👤 PersonPageParser - Parse author/person pages (https://dblp.org/pid/xxx.xml)
  • 📚 VenuePageParser - Parse venue pages (https://dblp.org/db/xxx/index.xml)
  • 🪶 Zero dependencies - Uses only Python standard library
  • 🐍 Python 3.10+ - Modern Python with type hints
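
The three URL patterns listed above can be captured in a few small helpers. This is only a sketch: the function names `record_url`, `person_url`, and `venue_url` are hypothetical and not part of dblp-webxml-parser, and the venue path `conf/cvpr` in the docstring is an illustrative assumption.

```python
# Hypothetical helpers (not part of dblp-webxml-parser) that build page URLs
# from the patterns listed in the Features section.
DBLP_BASE = "https://dblp.org"


def record_url(key: str) -> str:
    """URL of the XML record page for a DBLP key such as 'conf/cvpr/HeZRS16'."""
    return f"{DBLP_BASE}/rec/{key.strip('/')}.xml"


def person_url(pid: str) -> str:
    """URL of the XML person page for a DBLP person ID."""
    return f"{DBLP_BASE}/pid/{pid.strip('/')}.xml"


def venue_url(path: str) -> str:
    """URL of the XML venue index page for a venue path (e.g. 'conf/cvpr')."""
    return f"{DBLP_BASE}/db/{path.strip('/')}/index.xml"
```

For example, `record_url("conf/cvpr/HeZRS16")` yields the URL used in the Quick Start below.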

Installation

pip install dblp-webxml-parser

Quick Start

Parse a Publication Record

import urllib.request
from dblp_webxml_parser import RecordPageParser

# Fetch and parse a record page
url = "https://dblp.org/rec/conf/cvpr/HeZRS16.xml"
with urllib.request.urlopen(url) as response:
    xml_text = response.read().decode("utf-8")

record = RecordPageParser(xml_text)
print(f"Title: {record.title}")
print(f"Year: {record.year}")
print(f"Type: {record.type}")
print(f"Key: {record.key}")
print(f"Authors: {[a.name for a in record.authors]}")
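
Since the same fetch step is needed for every page type, it may be convenient to wrap it in a helper. The sketch below uses only the standard library; `fetch_xml`, the 10-second default timeout, and the User-Agent string are assumptions made for illustration, not something the library provides or DBLP requires.

```python
import urllib.request


def fetch_xml(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its body decoded as UTF-8 text."""
    # A descriptive User-Agent is a courtesy to the server; the value is an
    # arbitrary example, not something DBLP or the library requires.
    request = urllib.request.Request(
        url, headers={"User-Agent": "dblp-webxml-example/0.1"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

The returned string can then be passed to any of the parsers, for example `RecordPageParser(fetch_xml(url))`.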

Parse a Person Page

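import urllib.request
from dblp_webxml_parser import PersonPageParser

# Fetch a person page; replace "xxx" with a real DBLP person ID
url = "https://dblp.org/pid/xxx.xml"
with urllib.request.urlopen(url) as response:
    xml_text = response.read().decode("utf-8")

person = PersonPageParser(xml_text)
print(f"Name: {person.name}")
print(f"PID: {person.pid}")

# publications is an iterator: materialize it once so it can be counted and reused
publications = list(person.publications)
print(f"Publications: {len(publications)}")
for pub in publications:
    print(f"  - {pub.title} ({pub.year})")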

Parse a Venue Page

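import urllib.request
from dblp_webxml_parser import VenuePageParser

# Fetch a venue page (conference/journal); replace "xxx" with a real venue path
url = "https://dblp.org/db/xxx/index.xml"
with urllib.request.urlopen(url) as response:
    xml_text = response.read().decode("utf-8")

venue = VenuePageParser(xml_text)
print(f"Venue: {venue.title}")

# Get all publications in this venue
for pub in venue.publications:
    print(f"  - {pub.title}")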

API Reference

RecordPageParser

Parses record pages from https://dblp.org/rec/xxx.xml.

Property    Type                    Description
key         str | None              DBLP paper key
type        str                     Publication type (article, inproceedings, etc.)
title       str | None              Publication title
year        int | None              Publication year
month       int | None              Publication month
authors     Iterator[RecordAuthor]  Authors iterator
venue       str | None              Venue name (journal/booktitle/series)
venue_type  str | None              Venue type (journal/proceedings/book)
journal     str | None              Journal name
booktitle   str | None              Book/proceedings title
volume      str | None              Volume number
number      str | None              Issue number
pages       str | None              Page range
ees         Iterator[str]           Electronic edition URLs
url         str | None              DBLP URL
crossref    str | None              Crossref key
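
Because most of these properties may be None and `authors` is an iterator, code that consumes a record needs to guard against missing values. The following is a rough sketch of one way to do that; `format_citation` is a hypothetical helper, and it assumes only the property names and types listed in the table above.

```python
def format_citation(record) -> str:
    """Build a one-line citation, skipping any field that is None."""
    # `authors` is documented as an iterator, so consume it exactly once.
    names = [a.name for a in record.authors if a.name]
    parts = [
        ", ".join(names) or None,
        record.title,
        record.venue,
        str(record.year) if record.year is not None else None,
    ]
    # Titles may already end with a period, so strip it before joining.
    return ". ".join(p.rstrip(".") for p in parts if p)
```

The function relies on duck typing, so it should work with a `RecordPageParser` instance or with any stand-in object that exposes the same attributes.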

RecordAuthor

Author information from a record.

Property  Type        Description
name      str | None  Author name
pid       str | None  DBLP person ID
orcid     str | None  ORCID identifier
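
The identifiers on an author can be turned into links. This is a sketch under two assumptions: that `pid` fits the `https://dblp.org/pid/xxx.xml` pattern shown earlier, and that `orcid` holds a bare ORCID iD rather than a full URL. The helper name `author_links` is hypothetical.

```python
def author_links(author) -> dict[str, str]:
    """Map link labels to URLs for whichever identifiers are present."""
    links = {}
    if author.pid:
        # Person page pattern taken from the Features list.
        links["dblp"] = f"https://dblp.org/pid/{author.pid}.xml"
    if author.orcid:
        # Assumes a bare iD such as 0000-0002-1825-0097, not a full URL.
        links["orcid"] = f"https://orcid.org/{author.orcid}"
    return links
```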

PersonPageParser

Parses person pages from https://dblp.org/pid/xxx.xml.

Property      Type                    Description
pid           str | None              DBLP person ID
name          str | None              Primary name
names         Iterator[str]           All name variants
publications  Iterator[RecordParser]  Publications iterator
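
A common use of the `publications` iterator is a per-year summary. The sketch below assumes only that each publication exposes a `year` attribute that may be None; `publications_per_year` is a hypothetical helper, not a library function.

```python
from collections import Counter


def publications_per_year(publications) -> dict[int, int]:
    """Count publications by year, ignoring entries whose year is None."""
    counts = Counter(pub.year for pub in publications if pub.year is not None)
    # Sort by year so the summary reads chronologically.
    return dict(sorted(counts.items()))
```

It can be called as `publications_per_year(person.publications)`; the iterator is consumed in a single pass.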

VenuePageParser

Parses venue pages from https://dblp.org/db/xxx/index.xml.

Property      Type                    Description
title         str | None              Venue title
publications  Iterator[RecordParser]  Publications iterator
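
Venue pages can list many publications, so filtering is often the first step. As a minimal sketch, assuming each publication has a `title` that may be None, a hypothetical `find_by_title` helper could look like this:

```python
def find_by_title(publications, keyword: str) -> list:
    """Return publications whose title contains `keyword`, case-insensitively."""
    needle = keyword.casefold()
    return [
        pub for pub in publications
        if pub.title and needle in pub.title.casefold()
    ]
```

For instance, `find_by_title(venue.publications, "residual")` would keep only entries whose title mentions that word.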

Comparison with Other Tools

Tool                       Purpose
dblp-webxml-parser (this)  Parse live XML pages from DBLP website
dblp-parser                Parse the DBLP XML database dump file

License

MIT License - see LICENSE for details.

Source Distribution

dblp_webxml_parser-0.1.0.tar.gz (12.1 kB)

Uploaded Source

Built Distribution

dblp_webxml_parser-0.1.0-py3-none-any.whl (8.7 kB)

Uploaded Python 3

File details

Details for the file dblp_webxml_parser-0.1.0.tar.gz.

File metadata

  • Download URL: dblp_webxml_parser-0.1.0.tar.gz
  • Upload date:
  • Size: 12.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for dblp_webxml_parser-0.1.0.tar.gz
Algorithm Hash digest
SHA256 ba864148ba6381b277d2b7594ca6704af297b36a03f936076fb1961c02040058
MD5 438431f8d5798bebfcb32edcf6e7c9d2
BLAKE2b-256 4d2586860c87cc279925fa6943ad86a9b26359e8928faf5bb6fc45eb8182f0f8

File details

Details for the file dblp_webxml_parser-0.1.0-py3-none-any.whl.

File hashes

Hashes for dblp_webxml_parser-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4f3afa69e3c32bc21fc523660169d23343f4c191efe1d8c4fb957e8307c694cd
MD5 01a9dc1a8430569cea91c1d01656cb9a
BLAKE2b-256 388501f0f8b130af2537bba0456f776280fcbd60e91b5f7ee1a1dfc413f4d8c5
