PhD Scraper for Academic Positions

A Python module to scrape PhD offers from academicpositions.com.

Features

🔍 Scrape PhD positions with filtering by country and field
📋 Extract detailed information: title, university, requirements, deadlines, etc.
💾 Export to JSON, CSV, or Markdown formats
🔄 Iterator support for memory-efficient processing
⚡ Concurrent fetching with rate limiting
🖥️ Command-line interface included

Installation

# Clone or navigate to the project directory
cd PhDFinder

# Install the package in editable mode (also installs its dependencies)
pip install -e .

# Or install only the dependencies
pip install requests beautifulsoup4

Quick Start

Python API

from phd_scraper import AcademicPositionsScraper, PhDPosition

# Create scraper instance
scraper = AcademicPositionsScraper()

# Get PhD positions (basic usage)
positions = scraper.get_phd_positions(max_pages=2)

# Print results
for pos in positions:
    print(f"{pos.title} at {pos.university}")
    print(f"  Location: {pos.location}")
    print(f"  Deadline: {pos.deadline}")
    print(f"  URL: {pos.url}")
    print()

Filter by Country and Field

# Get Computer Science PhDs in Germany
positions = scraper.get_phd_positions(
    max_pages=3,
    country="germany",
    field="computer-science"
)

# Get Physics PhDs in Switzerland
positions = scraper.get_phd_positions(
    country="switzerland",
    field="physics"
)

Search with Keywords

# Search for specific keywords
positions = scraper.search_positions(
    keywords=["machine learning", "deep learning", "AI"],
    country="germany",
    max_pages=5
)
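The README does not say how keywords are matched. As a rough sketch, assuming a case-insensitive substring match where any single keyword is enough, the idea could look like the following. `matches_keywords` is a hypothetical helper for illustration and is not part of `phd_scraper`.

```python
def matches_keywords(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in text, ignoring case."""
    haystack = text.casefold()
    return any(kw.casefold() in haystack for kw in keywords)


titles = [
    "PhD Position in Machine Learning for Climate Models",
    "Doctoral student in Medieval History",
]

# Keep only the titles that mention at least one keyword.
hits = [t for t in titles if matches_keywords(t, ["machine learning", "AI"])]
print(hits)
```

Under this assumption, short keywords such as "AI" also match inside longer words (for example "training"), so a word-boundary match may be preferable if that produces too many false positives.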

Export Results

from phd_scraper.utils import export_to_json, export_to_csv, export_to_markdown

# Get positions
positions = scraper.get_phd_positions(max_pages=2)

# Export to different formats
export_to_json(positions, "phd_positions.json")
export_to_csv(positions, "phd_positions.csv")
export_to_markdown(positions, "phd_positions.md")
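For readers curious what a CSV export of this kind involves, here is a minimal sketch using only the standard library. It assumes list-valued attributes such as `fields` are flattened into one cell; the `Position` class and `to_csv` function are hypothetical stand-ins, not the package's own `export_to_csv`.

```python
import csv
import io
from dataclasses import asdict, dataclass, field


@dataclass
class Position:
    """Hypothetical stand-in for PhDPosition with only three attributes."""
    title: str
    university: str
    fields: list[str] = field(default_factory=list)


def to_csv(positions: list[Position]) -> str:
    """Serialize positions to CSV text, joining list values with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["title", "university", "fields"],
        lineterminator="\n",
    )
    writer.writeheader()
    for pos in positions:
        row = asdict(pos)
        row["fields"] = "; ".join(row["fields"])  # one cell per position
        writer.writerow(row)
    return buffer.getvalue()


csv_text = to_csv([
    Position(
        "PhD Position in AI and Strategy",
        "ETH Zürich",
        ["Management", "Artificial Intelligence"],
    )
])
print(csv_text)
```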

Memory-Efficient Iterator

# Process positions one at a time (good for large datasets)
for position in scraper.iter_positions(country="sweden"):
    print(position.summary())
    # Process each position without loading all into memory
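The memory saving comes from fetching pages lazily. The sketch below illustrates the general generator pattern, assuming the listing is paginated and an empty page marks the end; `iter_pages` and `fake_fetch` are hypothetical names, and the real `iter_positions` may work differently.

```python
from collections.abc import Callable, Iterator


def iter_pages(fetch_page: Callable[[int], list[str]], max_pages: int) -> Iterator[str]:
    """Yield items page by page, fetching each page only when it is needed."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:  # an empty page is treated as the end of the listing
            return
        yield from items


# Fake page source standing in for real HTTP requests.
fetched: list[int] = []


def fake_fetch(page: int) -> list[str]:
    fetched.append(page)  # record which pages were actually requested
    return [f"position-{page}-{i}" for i in range(2)] if page <= 3 else []


it = iter_pages(fake_fetch, max_pages=10)
first = next(it)
print(first, fetched)
```

Only the first page has been requested after the first `next()` call; later pages are fetched as the loop consumes them.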

Command-Line Interface

# Basic usage - get 2 pages of positions
python -m phd_scraper --pages 2

# Filter by country and field
python -m phd_scraper --country germany --field computer-science

# Export to JSON
python -m phd_scraper --output positions.json --format json --pages 3

# Export to CSV
python -m phd_scraper --output positions.csv --format csv

# Search with keywords
python -m phd_scraper --keywords "machine learning" "neural networks" --pages 5

# List available filters
python -m phd_scraper --list-filters

# Fast mode (skip detailed info)
python -m phd_scraper --no-details --pages 10

# Verbose output
python -m phd_scraper --verbose --pages 1
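The flags above map naturally onto `argparse`. The following is only a sketch of how such a parser might be declared; the defaults and the `markdown` choice for `--format` are assumptions, and the package's actual parser may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the CLI examples."""
    parser = argparse.ArgumentParser(prog="phd_scraper")
    parser.add_argument("--pages", type=int, default=1)  # default is assumed
    parser.add_argument("--country")
    parser.add_argument("--field")
    parser.add_argument("--keywords", nargs="+", default=[])
    parser.add_argument("--output")
    # "markdown" is assumed from the export formats listed under Features.
    parser.add_argument("--format", choices=["json", "csv", "markdown"], default="json")
    parser.add_argument("--list-filters", action="store_true")
    parser.add_argument("--no-details", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--keywords", "machine learning", "neural networks", "--pages", "5"]
)
print(args.keywords, args.pages, args.no_details)
```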

Available Filters

Countries

germany, sweden, belgium, switzerland, netherlands, finland
norway, austria, france, united-kingdom, united-states
italy, spain, denmark, luxembourg

Fields

computer-science, physics, chemistry, biology, mathematics
engineering, medicine, economics, social-science, geosciences
artificial-intelligence, machine-learning, psychology, law

Data Model

Each PhDPosition object contains:

Field	Description
`title`	Position title
`university`	University/employer name
`location`	Full location (city, country)
`country`	Country name
`city`	City name
`deadline`	Application deadline
`published_date`	When the position was published
`job_type`	Type of position (PhD)
`fields`	Research fields/disciplines
`description`	Full job description
`requirements`	Qualifications needed
`benefits`	What the position offers
`url`	Link to the job posting
`apply_url`	Direct application link
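To make the field list concrete, the sketch below models it as a dataclass. The attribute names come from the table above, but the types, defaults, and the class name `PhDPositionSketch` are assumptions; the real `PhDPosition` may be defined differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PhDPositionSketch:
    """Hypothetical stand-in mirroring the documented fields; types are assumed."""
    title: str
    university: str
    url: str
    location: Optional[str] = None
    country: Optional[str] = None
    city: Optional[str] = None
    deadline: Optional[str] = None
    published_date: Optional[str] = None
    job_type: str = "PhD"
    fields: list[str] = field(default_factory=list)
    description: Optional[str] = None
    requirements: Optional[str] = None
    benefits: Optional[str] = None
    apply_url: Optional[str] = None


pos = PhDPositionSketch(
    title="PhD Position in AI and Strategy",
    university="ETH Zürich",
    url="https://academicpositions.com/ad/eth-zurich/2026/...",
    location="Zurich, Switzerland",
    fields=["Business Administration", "Management", "Artificial Intelligence"],
)
print(pos.title, "|", pos.deadline or "Unspecified")
```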

Configuration

scraper = AcademicPositionsScraper(
    request_delay=1.5,      # Delay between requests (seconds)
    timeout=30,             # Request timeout (seconds)
    max_retries=3,          # Number of retry attempts
    user_agent="Custom UA"  # Custom user agent string
)
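As an illustration of how `request_delay` and `max_retries` could interact, here is a minimal retry loop. It is a sketch under the assumption that the delay is applied between failed attempts; `fetch_with_retries` is a hypothetical helper, and the scraper's internal behaviour is not documented here.

```python
import time
from collections.abc import Callable
from typing import Optional


def fetch_with_retries(
    fetch: Callable[[], str],
    max_retries: int = 3,
    request_delay: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch up to max_retries times, pausing request_delay seconds between attempts."""
    last_error: Optional[Exception] = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except OSError as exc:  # covers socket and connection failures
            last_error = exc
            if attempt < max_retries:
                sleep(request_delay)
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error


# Simulated flaky request: fails twice, then succeeds.
attempts = {"count": 0}
delays: list[float] = []


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


# Passing delays.append as the sleep function records pauses instead of waiting.
result = fetch_with_retries(flaky, max_retries=3, request_delay=1.5, sleep=delays.append)
print(result, attempts["count"], delays)
```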

Utility Functions

from phd_scraper.utils import (
    filter_positions,
    deduplicate_positions,
    sort_positions
)

# Filter positions
filtered = filter_positions(
    positions,
    keywords=["AI", "robotics"],
    countries=["germany", "switzerland"],
    has_deadline=True
)

# Remove duplicates
unique = deduplicate_positions(positions)

# Sort by a chosen attribute (here, the deadline)
sorted_pos = sort_positions(positions, by="deadline")
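The README does not state which key is used for deduplication or how missing deadlines are ordered. A plausible sketch, assuming duplicates share a URL and positions without a deadline sort last, is shown below with plain dictionaries. `dedupe_by_url` and `sort_by_deadline` are hypothetical names, not the package's functions.

```python
def dedupe_by_url(positions: list[dict]) -> list[dict]:
    """Keep the first occurrence of each URL, preserving input order."""
    seen: set[str] = set()
    unique = []
    for pos in positions:
        if pos["url"] not in seen:
            seen.add(pos["url"])
            unique.append(pos)
    return unique


def sort_by_deadline(positions: list[dict]) -> list[dict]:
    """Sort by ISO date string; positions without a deadline go last."""
    return sorted(positions, key=lambda p: (p["deadline"] is None, p["deadline"] or ""))


rows = [
    {"url": "https://example.org/a", "deadline": None},
    {"url": "https://example.org/b", "deadline": "2026-03-01"},
    {"url": "https://example.org/a", "deadline": None},  # duplicate of the first row
    {"url": "https://example.org/c", "deadline": "2026-01-31"},
]

ordered = sort_by_deadline(dedupe_by_url(rows))
print([row["url"] for row in ordered])
```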

Example Output

[1] PhD Position in AI and Strategy
    University: ETH Zürich
    Location: Zurich, Switzerland
    Deadline: Unspecified
    Fields: Business Administration, Management, Artificial Intelligence
    URL: https://academicpositions.com/ad/eth-zurich/2026/...

[2] Doctoral student in Radiofrequency ranging for Lunar orbits
    University: KTH Royal Institute of Technology
    Location: Stockholm, Sweden
    Deadline: 2026-01-31 (Europe/Stockholm)
    Fields: Physics, Space Science
    URL: https://academicpositions.com/ad/kth-royal-institute-of-technology/2025/...
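The sample output shows two deadline shapes: `Unspecified` and an ISO date followed by a time zone in parentheses. If a real date object is needed downstream, a small parser along these lines could help. It is a sketch that assumes those two shapes; `parse_deadline` is a hypothetical helper.

```python
import re
from datetime import date
from typing import Optional

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_deadline(text: str) -> Optional[date]:
    """Extract the first YYYY-MM-DD date from a deadline string, if any."""
    match = _ISO_DATE.search(text)
    if match is None:
        return None
    try:
        return date.fromisoformat(match.group())
    except ValueError:  # digits matched the pattern but are not a real date
        return None


print(parse_deadline("2026-01-31 (Europe/Stockholm)"))
print(parse_deadline("Unspecified"))
```

The time zone suffix is ignored in this sketch, so the result is a calendar date only.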

Important Notes

Rate Limiting: The scraper includes built-in delays to be respectful of the server
Terms of Service: Please review academicpositions.com's terms before scraping
Data Accuracy: Always verify position details on the original website
Updates: Website structure may change; report issues if scraping fails
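Alongside reading the terms of service, a scraper can consult a site's `robots.txt`. The sketch below uses the standard library's `urllib.robotparser` on made-up rules to show the mechanics; the sample rules and the `my-scraper` agent name are illustrative assumptions, and passing this check does not replace reviewing the site's terms.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration only; it is not the real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# Ask whether a given user agent may fetch a given URL under these rules.
allowed = parser.can_fetch("my-scraper", "https://example.org/jobs/phd")
blocked = parser.can_fetch("my-scraper", "https://example.org/private/page")
print(allowed, blocked)
```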

License

MIT License

Version 1.0.0, released Jan 14, 2026

Download files

Source Distribution

academic_phd_scraper-1.0.0.tar.gz (12.7 kB)

Uploaded Jan 14, 2026 Source

Built Distribution

academic_phd_scraper-1.0.0-py3-none-any.whl (14.5 kB)

Uploaded Jan 14, 2026 Python 3

File details

Details for the file academic_phd_scraper-1.0.0.tar.gz.

File metadata

Download URL: academic_phd_scraper-1.0.0.tar.gz
Upload date: Jan 14, 2026
Size: 12.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for academic_phd_scraper-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`7e50cf093b90b0daa32ad8eaec73f46246265d6decded638d2cda0e7518d0c68`
MD5	`6ea39a8c9306e7732e1e2b3d256ebc96`
BLAKE2b-256	`29879561a53df5d413e80a07850ece971d533512588c21adf413b6925d3795c3`

File details

Details for the file academic_phd_scraper-1.0.0-py3-none-any.whl.

File metadata

Download URL: academic_phd_scraper-1.0.0-py3-none-any.whl
Upload date: Jan 14, 2026
Size: 14.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for academic_phd_scraper-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`984678dca5b615e01f33e46739b42bdb046c18e40859f597de6d73469183eb20`
MD5	`661a61570e2b7a03f29ef3b61b74e5da`
BLAKE2b-256	`268dc2697f3218378b042bcc6f234eef0aa44565c0479fdc8b819cace1e7f887`
