
A project for fetching, processing, and classifying IEEE papers.


IEEE Papers Mapper

Overview

IEEE Papers Mapper is a comprehensive tool for retrieving, processing, classifying, and visualizing research papers from the IEEE Xplore API. It automates data ingestion, applies machine learning for classification, and offers interactive dashboards for insights.

Demo

Watch the video

Key Features

  • Automated Data Retrieval: Scheduled fetching of research papers using APScheduler (currently disabled; see the note under Data Pipeline).
  • Data Processing: Cleans, formats, and prepares data for analysis.
  • Machine Learning Classification: Zero-shot classification using transformer models.
  • Interactive Dashboard: Visualize categorized papers and insights using Plotly Dash.
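As a rough illustration of the processing step, the sketch below fills in missing columns and tidies the text before it is handed to a classifier. The field names in `REQUIRED_COLUMNS` and both helper functions are hypothetical; the package's actual schema and processing code may differ.

```python
from typing import Any

# Hypothetical column set; the package's real schema may differ.
REQUIRED_COLUMNS = ("article_number", "title", "abstract", "publication_year")


def normalize_paper(raw: dict[str, Any]) -> dict[str, Any]:
    """Return a record with every required column present and text fields cleaned."""
    record = {col: raw.get(col) for col in REQUIRED_COLUMNS}  # missing keys become None
    for col in ("title", "abstract"):
        value = record[col]
        # Collapse runs of whitespace; treat missing text as an empty string.
        record[col] = " ".join(value.split()) if isinstance(value, str) else ""
    return record


def build_classifier_input(record: dict[str, Any]) -> str:
    """Join title and abstract into the single string a text classifier expects."""
    return f"{record['title']}. {record['abstract']}".strip()


raw = {"article_number": "123", "title": "  Deep   Learning\nfor Radar ", "publication_year": 2024}
paper = normalize_paper(raw)
print(paper["abstract"] == "")        # True: the missing column was filled
print(build_classifier_input(paper))  # Deep Learning for Radar.
```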

Installation

Prerequisites

  • Python 3.12+
  • Virtual Environment (optional but recommended)
  • Required tools: pip, git

Steps (for Usage)

  1. Create a project directory:

    mkdir ~/workspace/my_project
    cd ~/workspace/my_project
    
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # For Linux/Mac
    venv\Scripts\activate     # For Windows
    
  3. Install the package from PyPI:

    pip install ieee-papers-mapper
    

Steps (for Development)

  1. Clone the repository

    git clone https://github.com/alex-anast/ieee-papers-mapper.git
    cd ieee-papers-mapper
    
  2. Create and activate a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # For Linux/Mac
    venv\Scripts\activate     # For Windows
    
  3. Install the required packages:

    pip install -r requirements.txt
    
  4. Install the package locally:

    pip install .
    

Usage

Running the Application

Dashboard

To launch the dashboard, run:

python src/ieee_papers_mapper/app/dash_webapp.py

Visit http://localhost:8050 to view the dashboard.

Data Pipeline

To run the pipeline that retrieves, processes, and classifies papers, execute:

python src/ieee_papers_mapper/main.py --days 1

NOTE: The scheduler is currently commented out, so each pipeline run must be started manually.

Functionality

  • Data Retrieval: Automatically fetches new papers based on categories from IEEE Xplore.
  • Data Processing: Handles missing columns and formats data for classification.
  • Classification: Uses a DeBERTa-v3 model for zero-shot classification into predefined categories.
  • Data Storage: Stores the data in an SQLite3 database, chosen over CSV files for scalability and modularity.
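A minimal sketch of such a storage layer using Python's built-in `sqlite3` module. The `papers` and `classifications` tables, their columns, and the category labels are hypothetical; the package's `database.py` wrapper defines its own schema.

```python
import sqlite3

# Hypothetical schema: one row per paper, one row per (paper, category) score.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    article_number TEXT PRIMARY KEY,
    title          TEXT NOT NULL,
    abstract       TEXT
);
CREATE TABLE IF NOT EXISTS classifications (
    article_number TEXT REFERENCES papers(article_number),
    category       TEXT NOT NULL,
    score          REAL NOT NULL,
    PRIMARY KEY (article_number, category)
);
"""


def store_paper(conn: sqlite3.Connection, article_number: str, title: str,
                abstract: str, scores: dict[str, float]) -> None:
    """Insert or update a paper and its per-category scores in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO papers VALUES (?, ?, ?)",
            (article_number, title, abstract),
        )
        conn.executemany(
            "INSERT OR REPLACE INTO classifications VALUES (?, ?, ?)",
            [(article_number, cat, score) for cat, score in scores.items()],
        )


conn = sqlite3.connect(":memory:")  # use a file path such as "ieee_papers.db" to persist
conn.executescript(SCHEMA)
store_paper(conn, "123", "Deep Learning for Radar", "",
            {"Machine Learning": 0.91, "Signal Processing": 0.07})
top = conn.execute(
    "SELECT category, score FROM classifications "
    "WHERE article_number = ? ORDER BY score DESC LIMIT 1",
    ("123",),
).fetchone()
print(top)  # ('Machine Learning', 0.91)
```

Keeping scores in a separate table lets each paper carry one row per candidate category, which matches the per-label scores a zero-shot classifier returns.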

Documentation


Complete documentation is available at: https://alex-anast.com/ieee-papers-mapper/

Code structure

./ieee-papers-mapper
├── conftest.py
├── docs                                # MkDocs
│   ├── about.md
│   ├── developer_guide
│   │   ├── api_reference.md
│   │   └── code_structure.md
│   ├── index.md
│   └── user_guide
│       ├── installation.md
│       ├── overview.md
│       └── usage.md
├── LICENSE
├── mkdocs.yml                          # MkDocs config
├── pyproject.toml
├── README.md
├── requirements.txt
├── setup.py
├── src
│   └── ieee_papers_mapper
│       ├── app                         # Web App (plotly dash)
│       │   ├── assets
│       │   │   └── styles.css
│       │   ├── callbacks.py
│       │   ├── dash_webapp.py
│       │   └── __init__.py
│       ├── config                      # Config and util files
│       │   ├── config.py
│       │   ├── progress.json
│       │   └── scheduler.py            # Custom scheduler wrapper class
│       ├── data
│       │   ├── classify_papers.py      # Classification
│       │   ├── database.py             # Custom Database wrapper class
│       │   ├── get_papers.py           # Paper retrieval
│       │   ├── __init__.py
│       │   ├── pipeline.py             # Pipeline actions
│       │   └── process_papers.py       # Paper (pre)processing
│       ├── ieee_papers.db
│       ├── __init__.py
│       └── main.py
└── tests
    ├── __init__.py
    ├── test_classify_papers.py
    ├── test_database.py
    ├── test_get_papers.py
    └── test_process_papers.py

Testing

Run the tests with:

python -m pytest

Testing Coverage

  • get_papers.py: Validates API integration and error handling.
  • process_papers.py: Ensures data cleaning and formatting.
  • classify_papers.py: Verifies ML classification accuracy and runtime performance.
  • database.py: Checks database initialization and CRUD operations.
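Error-handling tests of the kind listed for `get_papers.py` can avoid real network calls by injecting a mock opener. In the sketch below, `fetch_articles` is a hypothetical stand-in for the package's retrieval code, and the `articles` key is an assumption about the API's JSON response. Saved in a `test_*.py` file under `tests/`, the `test_*` functions would be collected by `python -m pytest`.

```python
import io
import json
import urllib.error
import urllib.request
from unittest import mock


def fetch_articles(url: str, opener=urllib.request.urlopen) -> list[dict]:
    """Hypothetical fetcher: return the 'articles' list, or [] on a network/HTTP error."""
    try:
        with opener(url, timeout=30) as resp:
            payload = json.load(resp)
    except urllib.error.URLError:  # HTTPError is a subclass of URLError
        return []
    return payload.get("articles", [])


def test_returns_articles():
    fake_response = io.BytesIO(b'{"articles": [{"title": "A"}]}')
    opener = mock.Mock(return_value=fake_response)
    assert fetch_articles("https://example.invalid", opener) == [{"title": "A"}]
    opener.assert_called_once_with("https://example.invalid", timeout=30)


def test_network_error_returns_empty_list():
    opener = mock.Mock(side_effect=urllib.error.URLError("connection refused"))
    assert fetch_articles("https://example.invalid", opener) == []


def test_missing_articles_key_returns_empty_list():
    opener = mock.Mock(return_value=io.BytesIO(b'{"total_records": 0}'))
    assert fetch_articles("https://example.invalid", opener) == []
```

Injecting the opener as a parameter keeps the tests free of monkeypatching and of any dependency on network access.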

Contributing

Guidelines

  • Fork the repository and submit a pull request.
  • Adhere to PEP 8 code style.
  • Include unit tests for new core functionality.
  • Format code with the black formatter.

Roadmap

Future Features

  1. Fix author index terms: they are currently inconsistent and therefore commented out.
  2. Enable the scheduler (currently disabled).
  3. Add more advanced ML models for classification.
  4. Enhance the dashboard with dynamic filtering.

Known Issues

Retrieval is limited to 20 API calls per day and at most 200 papers per call, due to IEEE Xplore API restrictions.
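Under these limits, one day of retrieval tops out at 20 × 200 = 4,000 papers. The sketch below splits a request into pages that fit the remaining daily budget. `plan_requests` is a hypothetical helper rather than part of the package, and the 1-based `(start_record, max_records)` pairs are an assumption about how pages are requested from the API.

```python
MAX_CALLS_PER_DAY = 20      # IEEE Xplore limits quoted above
MAX_RECORDS_PER_CALL = 200


def plan_requests(total_wanted: int, calls_used_today: int = 0) -> list[tuple[int, int]]:
    """Split a fetch into (start_record, max_records) pages that fit today's budget."""
    calls_left = max(MAX_CALLS_PER_DAY - calls_used_today, 0)
    # Never plan more records than the remaining calls can return.
    capped = min(total_wanted, calls_left * MAX_RECORDS_PER_CALL)
    pages = []
    for start in range(0, capped, MAX_RECORDS_PER_CALL):
        pages.append((start + 1, min(MAX_RECORDS_PER_CALL, capped - start)))
    return pages


print(plan_requests(450))          # [(1, 200), (201, 200), (401, 50)]
print(len(plan_requests(10_000)))  # 20: the daily ceiling of 4,000 papers
```

Anything beyond the cap would need to be picked up on a later day, which is where a re-enabled scheduler and a saved progress marker would help.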

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

Inspiration

This project is a minimal re-creation of work from my internship at Toyota Motor Europe, Belgium.

Special Thanks

To my mentors at TME.

Download files

Source Distribution

ieee_papers_mapper-1.0.0.tar.gz (20.9 kB)

Built Distribution

ieee_papers_mapper-1.0.0-py3-none-any.whl (20.1 kB)

File details

Details for the file ieee_papers_mapper-1.0.0.tar.gz.

File metadata

  • Download URL: ieee_papers_mapper-1.0.0.tar.gz
  • Size: 20.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for ieee_papers_mapper-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e29f9e95f670d158ecd429e0af7a1b92d87efbe70ff2da31ea5f86bbfac45013
MD5 0cac3291defccbbeee9ad8a416ce0018
BLAKE2b-256 82ca6032ea21fa143802220a897efc2d165e8928252ce57599f26447453900e1

File details

Details for the file ieee_papers_mapper-1.0.0-py3-none-any.whl.

File hashes

Hashes for ieee_papers_mapper-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a15cf4dbabccf2911218b3193cf10f1390e7f60bf0ba1cc57b9947be36db7b1f
MD5 e809396df785fd606db356140e729734
BLAKE2b-256 ed53f0fb99a0c5efe64ac6dae193f4d9c0997d1829b676f6b48a96e0987698f4
