Paper2CMap

A package that automatically generates a concept map for a PDF document using a large language model (LLM).

TODO

  • Optimize the concept map generation prompt
  • Add few-shot examples for concept map generation
  • Support LNKG/CXL/SVG format output
  • Support QA based on generated concept map

Overview

What is Concept Map?

(Answered by ChatGPT)

A concept map is a visual tool that is used to organize and represent knowledge, ideas, and information in a hierarchical or non-linear way. It is a graphical representation of a network of interconnected concepts, ideas, or themes, and the relationships between them. Concept maps are commonly used in education, particularly in fields such as science, social studies, and language arts, to help students organize and understand complex information.

Concept maps typically consist of nodes or boxes, which represent concepts or ideas, and lines or arrows, which show the relationships between the concepts. The nodes can be labeled with keywords or short phrases, and the lines or arrows can be labeled with connecting words, such as "leads to," "is a type of," or "causes."

Concept maps can be created by individuals or groups, and can be used to facilitate learning, problem-solving, decision-making, and communication. They are a flexible and powerful tool that can be adapted to many different contexts and purposes.

What is Paper2CMap?

Paper2CMap is a package that automatically generates a concept map for a PDF document using an LLM. It first extracts the text from the PDF document, then splits the text into sections, and finally generates a concept map from those sections. Currently, the generated concept map is in JSON format:

[{"source": "source concept", "target": "target concept", "relationship": "relationship between source and target"}]
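
Since the text is processed section by section, edges coming from different sections may need to be combined into one map. The sketch below shows one way to do that with the JSON format above. It is not the package's actual implementation: `merge_edges` and the sample data are hypothetical, and the case-insensitive de-duplication rule is an assumption.

```python
import json

REQUIRED_KEYS = ("source", "target", "relationship")


def merge_edges(*edge_lists):
    """Merge several edge lists, dropping malformed and duplicate edges."""
    seen = set()
    merged = []
    for edges in edge_lists:
        for edge in edges:
            # Skip entries that are not objects or lack one of the three keys.
            if not isinstance(edge, dict) or not all(k in edge for k in REQUIRED_KEYS):
                continue
            # Assumption: concepts that differ only in case or padding are the same.
            fingerprint = tuple(str(edge[k]).strip().lower() for k in REQUIRED_KEYS)
            if fingerprint in seen:
                continue
            seen.add(fingerprint)
            merged.append({k: str(edge[k]).strip() for k in REQUIRED_KEYS})
    return merged


# Hypothetical per-section results in the JSON format shown above.
section_1 = json.loads(
    '[{"source": "concept map", "target": "knowledge", "relationship": "organizes"}]'
)
section_2 = json.loads(
    '[{"source": "Concept Map", "target": "knowledge", "relationship": "organizes"},'
    ' {"source": "concept map", "target": "nodes", "relationship": "consists of"},'
    ' {"source": "orphan"}]'
)
result = merge_edges(section_1, section_2)
print(json.dumps(result, indent=2))
```

Here the first edge of the second section is treated as a duplicate, and the entry without a target or relationship is discarded, so two edges remain.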

In the future, we plan to support exporting to more formats, such as LNKG/CXL/SVG.
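
In the meantime, the JSON edge list is simple enough to convert by hand. As a rough illustration, the sketch below turns it into Graphviz DOT text, which Graphviz can render to SVG. The `cmap_to_dot` and `quote` helpers are hypothetical and not part of Paper2CMap; only the three JSON keys come from the format above.

```python
def quote(text):
    """Wrap text in double quotes, escaping embedded double quotes for DOT."""
    return '"' + str(text).replace('"', '\\"') + '"'


def cmap_to_dot(edges, graph_name="cmap"):
    """Render a list of source/target/relationship dicts as a DOT digraph."""
    lines = [f"digraph {quote(graph_name)} {{"]
    for edge in edges:
        # One directed, labeled edge per concept map entry.
        lines.append(
            f"  {quote(edge['source'])} -> {quote(edge['target'])}"
            f" [label={quote(edge['relationship'])}];"
        )
    lines.append("}")
    return "\n".join(lines)


# Hypothetical sample data in the JSON format described above.
edges = [
    {"source": "Paper2CMap", "target": "concept map", "relationship": "generates"},
    {"source": "concept map", "target": "nodes", "relationship": "consists of"},
]
dot = cmap_to_dot(edges)
print(dot)
```

If the output is saved as cmap.dot and Graphviz is installed, `dot -Tsvg cmap.dot -o cmap.svg` produces an SVG rendering.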

Quick Start

Prerequisites

  • Python 3.8+
  • An OpenAI API key or Azure OpenAI Service deployment
  • Set environment variables:
    # If you are using OpenAI Official Service
    export OPENAI_API_TYPE="openai"
    export OPENAI_API_KEY="<OpenAI API Key>"
    
    # If you are using Azure OpenAI Service
    export OPENAI_API_TYPE="azure"
    export OPENAI_API_BASE="<Azure OpenAI Service Endpoint>"
    export OPENAI_API_KEY="<Azure OpenAI Service Key>"
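
Before running anything, it can help to confirm that these variables are actually visible to Python. The following is a minimal sketch: `missing_openai_settings` is a hypothetical helper, not part of Paper2CMap, and it assumes that `OPENAI_API_BASE` is only needed when `OPENAI_API_TYPE` is "azure", as the two setups above suggest.

```python
import os


def missing_openai_settings(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    required = ["OPENAI_API_TYPE", "OPENAI_API_KEY"]
    # Assumption: the endpoint is only required for Azure OpenAI Service.
    if env.get("OPENAI_API_TYPE", "") == "azure":
        required.append("OPENAI_API_BASE")
    return [name for name in required if not env.get(name)]


missing = missing_openai_settings()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("OpenAI settings look complete.")
```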
    

Installation

You can now install Paper2CMap with pip:

pip install paper2cmap

Usage

You can then generate a concept map from a PDF document in just a few lines of code:

from paper2cmap import Paper2CMap

paper2cmap = Paper2CMap(model_name="gpt-3.5-turbo")
paper2cmap.load("path/to/paper.pdf")
paper2cmap.generate_cmap()

For more details of the API, please refer to API Reference.
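
To keep a generated map for later use, it can be written to a file. This README does not say what `generate_cmap()` returns, so the sketch below assumes the result is either a JSON string or an already-parsed list in the format shown in the overview. `save_cmap` is a hypothetical helper, and the sample string stands in for a real result.

```python
import json
import tempfile
from pathlib import Path


def save_cmap(cmap, path):
    """Write a concept map to `path` as pretty-printed JSON and return the edges."""
    # Assumption: the result is a JSON string or a list of edge dicts.
    edges = json.loads(cmap) if isinstance(cmap, str) else list(cmap)
    Path(path).write_text(
        json.dumps(edges, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return edges


# Stand-in for the value produced by generate_cmap(); replace with the real result.
sample = '[{"source": "LLM", "target": "concept map", "relationship": "generates"}]'
out_path = Path(tempfile.gettempdir()) / "cmap.json"
edges = save_cmap(sample, out_path)
print(f"Saved {len(edges)} edge(s) to {out_path}")
```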

Gradio App

We also host a Gradio App at HuggingFace Space for you to try out Paper2CMap without installing it locally. You can also deploy it to your own server:

pip install gradio
gradio app.py

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

As an open source project, we welcome contributions and suggestions. Please follow the fork and pull request workflow to contribute to this project. Please do not push directly to this repo unless you are a maintainer.

Contact

If you have any questions, please feel free to contact us at weitian.bnu@gmail.com.
