A project for fetching, processing, and classifying IEEE papers.
IEEE Papers Mapper
Overview
IEEE Papers Mapper is a comprehensive tool for retrieving, processing, classifying, and visualizing research papers from the IEEE Xplore API. It automates data ingestion, applies machine learning for classification, and offers interactive dashboards for insights.
Key Features
- Automated Data Retrieval: Scheduled fetching of research papers using APScheduler.
- Data Processing: Cleans, formats, and prepares data for analysis.
- Machine Learning Classification: Zero-shot classification using transformer models.
- Interactive Dashboard: Visualize categorized papers and insights using Plotly Dash.
Installation
Prerequisites
- Python 3.12+
- Virtual Environment (optional but recommended)
- Required tools: pip, git
Steps (for Usage)
- Create a project directory:
    mkdir ~/workspace/my_project
    cd ~/workspace/my_project
- Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate   # For Linux/Mac
    venv\Scripts\activate      # For Windows
- Install the pip package and start using it:
    pip install ieee-papers-mapper
Steps (for Development)
- Clone the repository:
    git clone https://github.com/alex-anast/ieee-papers-mapper.git
    cd ieee-papers-mapper
- Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate   # For Linux/Mac
    venv\Scripts\activate      # For Windows
- Install the required packages:
    pip install -r requirements.txt
- Install the package locally:
    pip install .
Usage
Running the Application
Dashboard
To launch the dashboard, run:
python ieee_papers_mapper/app/dash_webapp.py
Visit http://localhost:8050 to view the dashboard.
Data Pipeline
To run the pipeline of retrieving, processing and classifying the papers automatically, execute:
python ieee_papers_mapper/main.py --days 1
NOTE: The scheduler is currently commented out, so pipeline runs must be executed manually.
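The `--days` flag presumably controls how far back the pipeline looks for papers. The following is a minimal sketch of how such a flag could be parsed and turned into a date window, using only the standard library. The helpers `parse_args` and `date_window` are hypothetical and are not taken from `main.py`, whose actual behavior may differ.

```python
import argparse
from datetime import date, timedelta


def parse_args(argv=None):
    """Parse a hypothetical --days option (the real main.py may differ)."""
    parser = argparse.ArgumentParser(description="Run the paper pipeline once.")
    parser.add_argument(
        "--days",
        type=int,
        default=1,
        help="how many days back to look for papers",
    )
    args = parser.parse_args(argv)
    if args.days < 1:
        parser.error("--days must be a positive integer")
    return args


def date_window(days, today=None):
    """Return (start, end) dates covering the last `days` days up to today."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return start, end


# Example: equivalent to passing `--days 1` on the command line.
args = parse_args(["--days", "1"])
start, end = date_window(args.days)
print(f"Fetching papers published between {start} and {end}")
```

Passing an explicit argument list to `parse_args` keeps the example independent of the real command line, which also makes it easy to exercise in tests.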
Functionality
- Data Retrieval: Automatically fetches new papers based on categories from IEEE Xplore.
- Data Processing: Handles missing columns and formats data for classification.
- Classification: Uses a DeBERTa-v3 model for zero-shot classification into predefined categories.
- Data Storage: Stores the data in a SQLite3 database, chosen over CSV files for scalability and modularity.
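To illustrate the classification step, here is a minimal sketch of zero-shot classification with the Hugging Face `transformers` pipeline. It is not the project's own code. The checkpoint name is an assumption (this README only says "DeBERTa-v3"), and `build_classifier`, `top_category`, and the threshold value are hypothetical names and choices introduced for the example.

```python
def build_classifier(model_name="MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"):
    """Create a zero-shot pipeline (assumed checkpoint; needs transformers + torch)."""
    # Imported lazily because it is a heavy, optional dependency.
    from transformers import pipeline

    return pipeline("zero-shot-classification", model=model_name)


def top_category(result, threshold=0.5):
    """Return the highest-scoring label, or None if its score is below threshold.

    `result` is the dict the pipeline returns for a single text:
    {"sequence": str, "labels": [...], "scores": [...]}.
    """
    best_label, best_score = max(
        zip(result["labels"], result["scores"]), key=lambda pair: pair[1]
    )
    return best_label if best_score >= threshold else None


# Usage (downloads the model on first run); the categories are made up:
# classifier = build_classifier()
# result = classifier(
#     "A low-power transceiver for sensor networks",
#     candidate_labels=["wireless communication", "machine learning", "robotics"],
# )
# print(top_category(result))
```

Keeping the label-selection logic in a small pure function means it can be unit-tested without loading the model.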
Documentation
Link to Docs
Complete documentation is available at: https://alex-anast.com/ieee-papers-mapper/
Code structure
./ieee-papers-mapper
├── conftest.py
├── docs                            # MkDocs
│   ├── about.md
│   ├── developer_guide
│   │   ├── api_reference.md
│   │   └── code_structure.md
│   ├── index.md
│   └── user_guide
│       ├── installation.md
│       ├── overview.md
│       └── usage.md
├── LICENSE
├── mkdocs.yml                      # MkDocs config
├── pyproject.toml
├── README.md
├── requirements.txt
├── setup.py
├── src
│   └── ieee_papers_mapper
│       ├── app                     # Web App (plotly dash)
│       │   ├── assets
│       │   │   └── styles.css
│       │   ├── callbacks.py
│       │   ├── dash_webapp.py
│       │   └── __init__.py
│       ├── config                  # Config and util files
│       │   ├── config.py
│       │   ├── progress.json
│       │   └── scheduler.py        # Custom scheduler wrapper class
│       ├── data
│       │   ├── classify_papers.py  # Classification
│       │   ├── database.py         # Custom Database wrapper class
│       │   ├── get_papers.py       # Paper retrieval
│       │   ├── __init__.py
│       │   ├── pipeline.py         # Pipeline actions
│       │   └── process_papers.py   # Paper (pre)processing
│       ├── ieee_papers.db
│       ├── __init__.py
│       └── main.py
└── tests
    ├── __init__.py
    ├── test_classify_papers.py
    ├── test_database.py
    ├── test_get_papers.py
    └── test_process_papers.py
Testing
Run the tests with:
python -m pytest
Testing Coverage
- get_papers.py: Validates API integration and error handling.
- process_papers.py: Ensures data cleaning and formatting.
- classify_papers.py: Verifies ML classification accuracy and runtime performance.
- database.py: Checks database initialization and CRUD operations.
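As an illustration of the kind of check described for `database.py`, the sketch below initializes a table and exercises create, read, update, and delete against an in-memory SQLite database. The `papers` schema and the `init_db` helper are hypothetical. The project's real tables and its Database wrapper class are not shown in this README and may look different.

```python
import sqlite3


def init_db(conn):
    """Create a hypothetical `papers` table (the real schema may differ)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS papers (
            paper_id INTEGER PRIMARY KEY,
            title    TEXT NOT NULL,
            category TEXT
        )
        """
    )


def test_crud_roundtrip():
    # An in-memory database keeps the test isolated from ieee_papers.db.
    conn = sqlite3.connect(":memory:")
    init_db(conn)

    # Create
    conn.execute("INSERT INTO papers (paper_id, title) VALUES (?, ?)", (1, "A paper"))
    # Update
    conn.execute("UPDATE papers SET category = ? WHERE paper_id = ?", ("robotics", 1))
    # Read
    row = conn.execute(
        "SELECT title, category FROM papers WHERE paper_id = ?", (1,)
    ).fetchone()
    assert row == ("A paper", "robotics")
    # Delete
    conn.execute("DELETE FROM papers WHERE paper_id = ?", (1,))
    assert conn.execute("SELECT COUNT(*) FROM papers").fetchone() == (0,)

    conn.close()
```

Because the function name starts with `test_`, pytest would collect it automatically if it were placed in a file under `tests`.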
Contributing
Guidelines
- Fork the repository and submit a pull request.
- Adhere to PEP 8 code style.
- Include unit tests for new core functionality.
- Format code with the black formatter.
Roadmap
Future Features
- Fix the handling of author index terms, which is currently inconsistent and therefore commented out.
- Enable the scheduler, which is currently not enabled.
- Add more advanced ML models for classification.
- Enhance the dashboard with dynamic filtering.
Known Issues
Limited to 20 API calls per day and a maximum of 200 papers per call, due to IEEE Xplore API restrictions.
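These two limits together cap retrieval at 20 × 200 = 4000 papers per day. A small sketch of the arithmetic, useful when estimating how long a large backfill would take; the helper names `calls_needed` and `days_needed` are hypothetical and not part of the package:

```python
import math

MAX_CALLS_PER_DAY = 20     # limit quoted in this README
MAX_PAPERS_PER_CALL = 200  # limit quoted in this README


def calls_needed(total_papers, page_size=MAX_PAPERS_PER_CALL):
    """Number of API calls required to page through `total_papers` results."""
    if not 1 <= page_size <= MAX_PAPERS_PER_CALL:
        raise ValueError("page_size must be between 1 and 200")
    return math.ceil(total_papers / page_size)


def days_needed(total_papers):
    """Number of days required under the daily call budget."""
    return math.ceil(calls_needed(total_papers) / MAX_CALLS_PER_DAY)


print(MAX_CALLS_PER_DAY * MAX_PAPERS_PER_CALL)  # 4000 papers per day at most
print(calls_needed(450))                        # 3 calls
print(days_needed(10_000))                      # 3 days (50 calls)
```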
License
This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
Inspiration
This project is a minimal recreation of the work I did during my internship at Toyota Motor Europe, Belgium.
Special Thanks
To my mentors at TME.
Contact
- Owner: Alexandros Anastasiou
- Email: anastasioyaa@gmail.com
Download files
File details
Details for the file ieee_papers_mapper-1.0.0.tar.gz.
File metadata
- Download URL: ieee_papers_mapper-1.0.0.tar.gz
- Upload date:
- Size: 20.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e29f9e95f670d158ecd429e0af7a1b92d87efbe70ff2da31ea5f86bbfac45013 |
| MD5 | 0cac3291defccbbeee9ad8a416ce0018 |
| BLAKE2b-256 | 82ca6032ea21fa143802220a897efc2d165e8928252ce57599f26447453900e1 |
File details
Details for the file ieee_papers_mapper-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ieee_papers_mapper-1.0.0-py3-none-any.whl
- Upload date:
- Size: 20.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a15cf4dbabccf2911218b3193cf10f1390e7f60bf0ba1cc57b9947be36db7b1f |
| MD5 | e809396df785fd606db356140e729734 |
| BLAKE2b-256 | ed53f0fb99a0c5efe64ac6dae193f4d9c0997d1829b676f6b48a96e0987698f4 |