The MRT framework generates evolution roadmaps for publications.

mrtframework

Demo Web Page | UI Library

Introduction

This is the Python code for generating an MRT (Master Reading Tree). The output JSON can be loaded with the React component react-mrt. You can also go to the demo page and click the Load Json button to upload the output JSON.

The AMiner system has already integrated this library and can generate MRTs for papers. So if you just want to see MRTs for papers, you can go to AMiner directly.

If you want to generate MRTs with customized settings, or replace some of the modules, read the following descriptions.

Run scripts to generate your MRT

Clone this branch first.

git clone git@github.com:THUDM/MRT.git -b mrtframework

Currently, this library supports SemanticScholar as a data source. To generate the MRT for a paper you are interested in, go to SemanticScholar and find the paper id of that paper. For example, the well-known GPT-3 paper has the S2 paper id 6b85b63579a916f705a8e10a49bd8d849d91b1fc.

Then run the following script to generate the MRT for GPT-3.

python examples/generate_mrt_json.py \
--pub_id 6b85b63579a916f705a8e10a49bd8d849d91b1fc \
--output_path outputs/gpt-3.json

The output MRT will be saved as a JSON file at outputs/gpt-3.json.

Several parameters alter the generation process. For example, you can set --use_sbert=0 to disable Sentence-BERT and use only TF-IDF during generation. To list all configurable parameters, run

python examples/generate_mrt_json.py -h
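If you need MRTs for several papers, the command above can be scripted. The following is a minimal sketch, not part of the library: the helper names `build_command` and `generate_all` are hypothetical, and it uses only the `--pub_id`, `--output_path`, and `--use_sbert` options mentioned in this README. It assumes you run it from the root of the cloned repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_command(pub_id: str, output_dir: str = "outputs",
                  use_sbert: bool = True) -> List[str]:
    """Build the argument list for one run of the example script (hypothetical helper)."""
    output_path = Path(output_dir) / f"{pub_id}.json"
    command = [
        sys.executable, "examples/generate_mrt_json.py",
        "--pub_id", pub_id,
        "--output_path", output_path.as_posix(),
    ]
    if not use_sbert:
        # Same switch as described above: fall back to TF-IDF only.
        command.append("--use_sbert=0")
    return command


def generate_all(pub_ids: List[str], **kwargs) -> None:
    """Run the example script once per paper id, stopping at the first failure."""
    for pub_id in pub_ids:
        subprocess.run(build_command(pub_id, **kwargs), check=True)
```

For example, `generate_all(['6b85b63579a916f705a8e10a49bd8d849d91b1fc'], use_sbert=False)` would run the script once for GPT-3 with Sentence-BERT disabled. Each run triggers its own set of API calls.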

Note that SemanticScholar rate-limits its API. Generating an MRT triggers many API calls, so you may hit the rate limit when using the SemanticScholar data source. Use of the Web API must comply with SemanticScholar's agreements.
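One common way to cope with rate limits is to retry with exponential backoff. The sketch below wraps any downloader callable of the form `pid -> dict or None` (the shape used in "Use customized data sources" below). It is an illustration only: `with_retry` is a hypothetical helper, not part of mrtframework, and it assumes the wrapped downloader raises an exception when a request is rejected, which this README does not specify.

```python
import time
from typing import Callable, Optional

Downloader = Callable[[str], Optional[dict]]


def with_retry(downloader: Downloader, max_attempts: int = 5,
               base_delay: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> Downloader:
    """Wrap a downloader so failed calls are retried with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def wrapped(pid: str) -> Optional[dict]:
        for attempt in range(max_attempts):
            try:
                return downloader(pid)
            except Exception:
                # Out of attempts: surface the last error to the caller.
                if attempt == max_attempts - 1:
                    raise
                # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
                sleep(base_delay * 2 ** attempt)
        return None  # not reached; keeps the return type explicit

    return wrapped
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests. In practice you would likely narrow `except Exception` to whatever error your downloader raises on a rate-limit response.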

Use the Python library instead of cloning the code

The mrtframework package has been published to PyPI, so you can install the library and call it directly.

# Install the library (run in a shell)
pip install mrtframework

# Calculate the MRT for the GPT-3 paper with SemanticScholar as the data source
from mrtframework import MasterReadingTree
from mrtframework.data_provider import DataProvider
provider = DataProvider(downloader='s2')
query_pub = provider.get('6b85b63579a916f705a8e10a49bd8d849d91b1fc')
mrt = MasterReadingTree(provider=provider, query_pub=query_pub)
print(mrt.to_json())

Use customized data sources

If you want to use another data source, you can write your own downloader for MRT to use, as follows:

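from typing import Optional

from mrtframework.data_provider import DataProvider


def customized_downloader(pid: str) -> Optional[dict]:
    # Retrieve the paper record for pid from your own data source here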
    return {
        '_id': pid,
        'id': pid,
        'title': 'MRT: Tracing the Evolution of Scientific Publications',
        'abstract': 'The fast development of science and technology is accompanied by the booming of cutting edge research. Researchers need to digest more and more recently published publications in order to keep themselves up to date. This becomes tough in particular with the prevalence of preprint publishing such as arXiv, where inspiring works could come out without being peer-reviewed. Is that possible to design an automatic system to help researchers quickly gain a glimpse of a piece of work or gain useful background knowledge for deeply understanding it? To this end, we proposed a practical framework called Master Reading Tree (MRT) to trace the evolution of scientific publications. In this framework, we can build annotated evolution roadmaps for publications and identify important previous works or evolution tracks by generating expressive embeddings and clustering them into various groups. With comprehensive evaluations, our proposed framework demonstrates its superior capability in capturing underlying relations behind publications over several baseline algorithms. Finally, we integrated the proposed MRT framework on AMiner, an online academic platform, where users can generate roadmaps using MRT for free and their interactions are further used to refine the model.',
        'citations': [101, 102, 103], # the pids of citation papers
        'references': [104, 105, 106], # the pids of reference papers
        'year': 2021,
        'venue': 'TKDE',
        'authors': [{
            'name': 'Da Yin'
        }, {
            'name': 'Weng Lam Tam'
        }, {
            'name': 'Ming Ding'
        }, {
            'name': 'Jie Tang'
        }]
    }
# replace the downloader in provider
provider = DataProvider(downloader=customized_downloader)
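As one concrete illustration of a custom data source, the sketch below builds a downloader that reads paper records from local JSON files named `<pid>.json`. The helper `make_file_downloader` and the directory layout are hypothetical. It also assumes that returning `None` for an unknown paper is acceptable, which the `Optional[dict]` return type above suggests but this README does not state.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def make_file_downloader(root: str) -> Callable[[str], Optional[dict]]:
    """Return a downloader that loads `<root>/<pid>.json` (hypothetical layout)."""
    root_path = Path(root)

    def downloader(pid: str) -> Optional[dict]:
        record_path = root_path / f"{pid}.json"
        if not record_path.is_file():
            # Unknown paper: signal "no data" instead of raising.
            return None
        record = json.loads(record_path.read_text(encoding="utf-8"))
        # Fill in the id fields from the example record if the file omits them.
        record.setdefault("_id", pid)
        record.setdefault("id", pid)
        return record

    return downloader
```

The result can be passed to the provider in the same way as above, for example `DataProvider(downloader=make_file_downloader('papers'))`. Each file would need to hold a record with the fields shown in the example dictionary.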
