PhD Scraper for Academic Positions
A Python module to scrape PhD offers from academicpositions.com.
Features
- 🔍 Scrape PhD positions with filtering by country and field
- 📋 Extract detailed information: title, university, requirements, deadlines, etc.
- 💾 Export to JSON, CSV, or Markdown formats
- 🔄 Iterator support for memory-efficient processing
- ⚡ Concurrent fetching with rate limiting
- 🖥️ Command-line interface included
Installation
# Navigate to the project directory (clone it first if needed)
cd PhDFinder
# Install the package and its dependencies in editable mode
pip install -e .
# Or install only the dependencies
pip install requests beautifulsoup4
Quick Start
Python API
from phd_scraper import AcademicPositionsScraper, PhDPosition
# Create scraper instance
scraper = AcademicPositionsScraper()
# Get PhD positions (basic usage)
positions = scraper.get_phd_positions(max_pages=2)
# Print results
for pos in positions:
    print(f"{pos.title} at {pos.university}")
    print(f" Location: {pos.location}")
    print(f" Deadline: {pos.deadline}")
    print(f" URL: {pos.url}")
    print()
Filter by Country and Field
# Get Computer Science PhDs in Germany
positions = scraper.get_phd_positions(
    max_pages=3,
    country="germany",
    field="computer-science"
)
# Get Physics PhDs in Switzerland
positions = scraper.get_phd_positions(
    country="switzerland",
    field="physics"
)
Search with Keywords
# Search for specific keywords
positions = scraper.search_positions(
    keywords=["machine learning", "deep learning", "AI"],
    country="germany",
    max_pages=5
)
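How the package matches keywords internally is not documented here. As a minimal sketch, assuming a position matches when any keyword appears as a whole word or phrase (case-insensitive), the check could look like the following. The helper name `matches_any` is hypothetical and is not part of `phd_scraper`.

```python
import re
from typing import Iterable


def matches_any(text: str, keywords: Iterable[str]) -> bool:
    """Return True if any keyword occurs in text as a whole word or phrase.

    Hypothetical helper for illustration; phd_scraper may match differently.
    """
    return any(
        re.search(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE)
        for kw in keywords
    )


hit_phrase = matches_any("PhD in Machine Learning for Robotics", ["machine learning"])
hit_short = matches_any("PhD Position in AI and Strategy", ["AI"])
miss_substring = matches_any("Doctoral training programme", ["AI"])
```

Word boundaries matter for short keywords: a plain substring test would find "ai" inside "training".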
Export Results
from phd_scraper.utils import export_to_json, export_to_csv, export_to_markdown
# Get positions
positions = scraper.get_phd_positions(max_pages=2)
# Export to different formats
export_to_json(positions, "phd_positions.json")
export_to_csv(positions, "phd_positions.csv")
export_to_markdown(positions, "phd_positions.md")
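For readers curious what such exports involve, here is a rough standard-library sketch. It uses plain dictionaries in place of `PhDPosition` objects, and the helper names `to_json_text` and `to_csv_text` are illustrative assumptions, not the package's functions.

```python
import csv
import io
import json

# Plain dicts stand in for PhDPosition objects (an assumption for this sketch).
rows = [
    {"title": "PhD in Physics", "university": "KTH", "fields": ["Physics", "Space Science"]},
    {"title": "PhD in AI", "university": "ETH Zürich", "fields": ["Artificial Intelligence"]},
]


def to_json_text(records):
    # ensure_ascii=False keeps names such as "Zürich" readable in the output.
    return json.dumps(records, ensure_ascii=False, indent=2)


def to_csv_text(records):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "university", "fields"])
    writer.writeheader()
    for rec in records:
        # CSV cells are flat, so the list of fields is joined into one string.
        writer.writerow(dict(rec, fields="; ".join(rec["fields"])))
    return buffer.getvalue()


json_text = to_json_text(rows)
csv_text = to_csv_text(rows)
```

JSON preserves the list structure of multi-valued fields, while CSV needs them flattened; the "; " separator here is an arbitrary choice.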
Memory-Efficient Iterator
# Process positions one at a time (good for large datasets)
for position in scraper.iter_positions(country="sweden"):
    print(position.summary())
    # Process each position without loading all into memory
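The memory saving comes from lazy evaluation: a generator fetches a page only when the consumer asks for the next item. The sketch below illustrates the idea with a fake page source; `iter_items` and `fake_fetch` are hypothetical names, and the real `iter_positions` may be implemented differently.

```python
from typing import Callable, Iterator, List


def iter_items(fetch_page: Callable[[int], List[str]], max_pages: int) -> Iterator[str]:
    """Yield items page by page, stopping early when a page comes back empty."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            return
        yield from items


# Fake page source standing in for network requests.
requested = []


def fake_fetch(page: int) -> List[str]:
    requested.append(page)
    return {1: ["a", "b"], 2: ["c"]}.get(page, [])


gen = iter_items(fake_fetch, max_pages=5)
first = next(gen)                    # only page 1 has been fetched so far
pages_after_first = list(requested)  # snapshot of the pages requested up to now
rest = list(gen)                     # draining the generator fetches the rest
```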
Command-Line Interface
# Basic usage - get 2 pages of positions
python -m phd_scraper --pages 2
# Filter by country and field
python -m phd_scraper --country germany --field computer-science
# Export to JSON
python -m phd_scraper --output positions.json --format json --pages 3
# Export to CSV
python -m phd_scraper --output positions.csv --format csv
# Search with keywords
python -m phd_scraper --keywords "machine learning" "neural networks" --pages 5
# List available filters
python -m phd_scraper --list-filters
# Fast mode (skip detailed info)
python -m phd_scraper --no-details --pages 10
# Verbose output
python -m phd_scraper --verbose --pages 1
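The flags above could be declared with `argparse` roughly as follows. This is a sketch of one plausible wiring, not the package's actual parser: the defaults and the `markdown` choice for `--format` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the examples."""
    parser = argparse.ArgumentParser(prog="phd_scraper")
    parser.add_argument("--pages", type=int, default=1)  # default is an assumption
    parser.add_argument("--country")
    parser.add_argument("--field")
    parser.add_argument("--keywords", nargs="+", default=[])
    parser.add_argument("--output")
    # "markdown" as a choice name is an assumption; only json and csv appear above.
    parser.add_argument("--format", choices=["json", "csv", "markdown"], default="json")
    parser.add_argument("--list-filters", action="store_true")
    parser.add_argument("--no-details", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--keywords", "machine learning", "neural networks", "--pages", "5", "--no-details"]
)
```

With `nargs="+"`, each quoted phrase on the command line becomes one keyword, which is why multi-word keywords need quotes.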
Available Filters
Countries
- germany, sweden, belgium, switzerland, netherlands, finland
- norway, austria, france, united-kingdom, united-states
- italy, spain, denmark, luxembourg
Fields
- computer-science, physics, chemistry, biology, mathematics
- engineering, medicine, economics, social-science, geosciences
- artificial-intelligence, machine-learning, psychology, law
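Filter values are lowercase, hyphen-separated slugs. If user input arrives as display names, a small normalizer can convert and validate it before calling the scraper. The helpers `to_slug` and `validate_country` below are hypothetical and not part of the package; the country set is copied from the list above.

```python
import re

# Country slugs as listed in this README.
KNOWN_COUNTRIES = {
    "germany", "sweden", "belgium", "switzerland", "netherlands", "finland",
    "norway", "austria", "france", "united-kingdom", "united-states",
    "italy", "spain", "denmark", "luxembourg",
}


def to_slug(name: str) -> str:
    """Lowercase the name and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.strip().lower()).strip("-")


def validate_country(name: str) -> str:
    """Return the slug for a country name, or raise ValueError if it is not listed."""
    slug = to_slug(name)
    if slug not in KNOWN_COUNTRIES:
        raise ValueError(f"Unknown country filter: {name!r}")
    return slug


uk = validate_country("United Kingdom")
cs = to_slug("  Computer  Science ")
```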
Data Model
Each PhDPosition object contains:
| Field | Description |
|---|---|
| title | Position title |
| university | University/employer name |
| location | Full location (city, country) |
| country | Country name |
| city | City name |
| deadline | Application deadline |
| published_date | When the position was published |
| job_type | Type of position (PhD) |
| fields | Research fields/disciplines |
| description | Full job description |
| requirements | Qualifications needed |
| benefits | What the position offers |
| url | Link to the job posting |
| apply_url | Direct application link |
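To make the shape of a record concrete, the fields above can be pictured as a dataclass. The sketch below is a stand-in, not the package's class: the name `PhDPositionSketch`, the types, and the defaults are all assumptions.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PhDPositionSketch:
    """Illustrative stand-in for PhDPosition; types and defaults are assumed."""
    title: str
    university: str
    url: str
    location: Optional[str] = None
    country: Optional[str] = None
    city: Optional[str] = None
    deadline: Optional[str] = None
    published_date: Optional[str] = None
    job_type: str = "PhD"
    fields: List[str] = field(default_factory=list)
    description: Optional[str] = None
    requirements: Optional[str] = None
    benefits: Optional[str] = None
    apply_url: Optional[str] = None


pos = PhDPositionSketch(
    title="PhD Position in AI and Strategy",
    university="ETH Zürich",
    url="https://example.org/ad/1",  # placeholder URL
    location="Zurich, Switzerland",
    fields=["Artificial Intelligence"],
)
record = asdict(pos)  # plain dict, convenient for JSON or CSV export
```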
Configuration
scraper = AcademicPositionsScraper(
    request_delay=1.5,      # Delay between requests (seconds)
    timeout=30,             # Request timeout (seconds)
    max_retries=3,          # Number of retry attempts
    user_agent="Custom UA"  # Custom user agent string
)
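As an illustration of how `request_delay` and `max_retries` typically interact, the sketch below retries a failing call and pauses between attempts. It is a generic pattern under stated assumptions, not the scraper's actual code; `fetch_with_retries` and `flaky` are hypothetical, and the `sleep` parameter exists only so the example runs without waiting.

```python
import time
from typing import Callable, List


def fetch_with_retries(
    fetch: Callable[[], str],
    max_retries: int = 3,
    request_delay: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch() up to max_retries times, pausing request_delay seconds between tries."""
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    for attempt in range(1, max_retries + 1):
        try:
            return fetch()
        except OSError:  # stand-in for network errors
            if attempt == max_retries:
                raise  # out of attempts: propagate the last error
            sleep(request_delay)
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky() -> str:
    """Fail twice with a network-style error, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


pauses: List[float] = []
result = fetch_with_retries(flaky, max_retries=3, request_delay=1.5, sleep=pauses.append)
```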
Utility Functions
from phd_scraper.utils import (
    filter_positions,
    deduplicate_positions,
    sort_positions
)
# Filter positions
filtered = filter_positions(
    positions,
    keywords=["AI", "robotics"],
    countries=["germany", "switzerland"],
    has_deadline=True
)
# Remove duplicates
unique = deduplicate_positions(positions)
# Sort by an attribute (here, the deadline)
sorted_pos = sort_positions(positions, by="deadline")
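The criteria these utilities use are not spelled out here. A minimal sketch, assuming duplicates are identified by URL and that positions without a deadline should sort last, might look like this. The functions `dedupe_by_url` and `sort_by_deadline` are illustrative, and plain dicts stand in for `PhDPosition` objects.

```python
from datetime import date
from typing import Dict, List


def dedupe_by_url(records: List[Dict]) -> List[Dict]:
    """Keep the first record seen for each URL, preserving input order."""
    seen = set()
    unique = []
    for rec in records:
        if rec["url"] not in seen:
            seen.add(rec["url"])
            unique.append(rec)
    return unique


def sort_by_deadline(records: List[Dict]) -> List[Dict]:
    """Earliest deadline first; records with no deadline go to the end."""
    return sorted(
        records,
        key=lambda r: (r["deadline"] is None, r["deadline"] or date.max),
    )


records = [
    {"url": "u1", "deadline": date(2026, 3, 1)},
    {"url": "u2", "deadline": None},
    {"url": "u1", "deadline": date(2026, 3, 1)},  # duplicate of the first record
    {"url": "u3", "deadline": date(2026, 1, 31)},
]
unique = dedupe_by_url(records)
ordered = sort_by_deadline(unique)
```

The tuple key avoids comparing `None` with dates, which would raise a `TypeError`.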
Example Output
[1] PhD Position in AI and Strategy
    University: ETH Zürich
    Location: Zurich, Switzerland
    Deadline: Unspecified
    Fields: Business Administration, Management, Artificial Intelligence
    URL: https://academicpositions.com/ad/eth-zurich/2026/...
[2] Doctoral student in Radiofrequency ranging for Lunar orbits
    University: KTH Royal Institute of Technology
    Location: Stockholm, Sweden
    Deadline: 2026-01-31 (Europe/Stockholm)
    Fields: Physics, Space Science
    URL: https://academicpositions.com/ad/kth-royal-institute-of-technology/2025/...
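Deadlines in the output above are strings: either an ISO date followed by a time zone in parentheses, or "Unspecified". To compare or sort them, they can be converted to `date` objects. The helper `parse_deadline` below is a hypothetical sketch that assumes only these two shapes occur.

```python
import re
from datetime import date
from typing import Optional


def parse_deadline(text: str) -> Optional[date]:
    """Extract a leading YYYY-MM-DD date, or return None when there is none."""
    match = re.match(r"\d{4}-\d{2}-\d{2}", text.strip())
    if match is None:
        return None
    return date.fromisoformat(match.group(0))


kth_deadline = parse_deadline("2026-01-31 (Europe/Stockholm)")
eth_deadline = parse_deadline("Unspecified")
```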
Important Notes
- Rate Limiting: The scraper includes built-in delays to be respectful of the server
- Terms of Service: Please review academicpositions.com's terms before scraping
- Data Accuracy: Always verify position details on the original website
- Updates: Website structure may change; report issues if scraping fails
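Alongside reading the terms of service, a scraper can consult a site's robots.txt. The sketch below uses Python's standard `urllib.robotparser` on made-up rules; they are not the actual rules of academicpositions.com, and nothing here implies the package performs this check.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; these are NOT the real robots.txt of any site.
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 2",
]

parser = RobotFileParser()
parser.parse(sample_rules)  # parse in-memory lines, so no network request is made

allowed = parser.can_fetch("my-scraper", "https://example.org/jobs/phd")
blocked = parser.can_fetch("my-scraper", "https://example.org/private/data")
delay = parser.crawl_delay("my-scraper")  # seconds to wait between requests, if declared
```

In real use, `set_url()` followed by `read()` would load the live file instead of the sample lines.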
License
MIT License