
Modern Scientific Document Processing Framework


A Modern Toolkit for Scientific Document Processing from WING-NUS


SciWING is a modern framework from WING-NUS that facilitates Scientific Document Processing. It is built on PyTorch and is designed from the ground up for modularity and an easy-to-use interface. SciWING includes many pre-trained models for fundamental tasks in Scientific Document Processing, aimed at practitioners. It has the following advantages:

  • Modularity - The framework embraces modularity from the ground up. SciWING helps you create new models by combining multiple reusable modules, so you can experiment with new approaches easily.

  • Pre-trained Models - SciWING has many pre-trained models for fundamental tasks such as Logical Section Classification for scientific documents and Citation String Parsing (take a look at some of the other projects related to citation parsing: Parscit, Neural Parscit). Easy access to the pre-trained models is available through web APIs.

  • Run from Config File - SciWING enables you to declare datasets, models and experiment hyper-parameters in a TOML file. The models declared in a TOML file have a one-to-one correspondence with their respective class declarations in a Python file. SciWING parses the model into a Directed Acyclic Graph (DAG) and instantiates the model using the DAG's topological ordering, so that each module is created only after the modules it depends on.

  • Extensible - SciWING makes it easy to add new datasets and provides command line tools for doing so. It also supports custom modules, which are ordinary PyTorch modules.
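
To illustrate the Run from Config File idea above, here is a minimal sketch of instantiating components in a DAG's topological order. It is not SciWING's parser: the config dictionary, its "class", "args" and "needs" keys, the three toy classes and the REGISTRY mapping are all hypothetical stand-ins, and a plain dict is used in place of a parsed TOML file.

```python
from collections import deque


# Toy components standing in for real modules (hypothetical names).
class Embedder:
    def __init__(self, dim):
        self.dim = dim


class Encoder:
    def __init__(self, embedder, hidden):
        self.embedder = embedder
        self.hidden = hidden


class Classifier:
    def __init__(self, encoder, num_classes):
        self.encoder = encoder
        self.num_classes = num_classes


# Maps class names used in the config to Python classes (assumption).
REGISTRY = {"Embedder": Embedder, "Encoder": Encoder, "Classifier": Classifier}

# Hypothetical config, deliberately declared out of dependency order.
# "needs" maps a constructor parameter to the config entry that fills it.
config = {
    "model": {"class": "Classifier", "args": {"num_classes": 3},
              "needs": {"encoder": "encoder"}},
    "encoder": {"class": "Encoder", "args": {"hidden": 100},
                "needs": {"embedder": "embedder"}},
    "embedder": {"class": "Embedder", "args": {"dim": 50}, "needs": {}},
}


def topological_order(config):
    """Return entry names so that every entry follows its dependencies."""
    # Kahn's algorithm: track how many dependencies are still unbuilt.
    indegree = {name: len(spec["needs"]) for name, spec in config.items()}
    dependents = {name: [] for name in config}
    for name, spec in config.items():
        for dep in spec["needs"].values():
            dependents[dep].append(name)

    ready = deque(sorted(n for n, d in indegree.items() if d == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in dependents[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(config):
        raise ValueError("config contains a dependency cycle")
    return order


def instantiate(config, registry):
    """Build every entry, passing already-built objects to dependents."""
    built = {}
    for name in topological_order(config):
        spec = config[name]
        kwargs = dict(spec["args"])
        kwargs.update({param: built[dep] for param, dep in spec["needs"].items()})
        built[name] = registry[spec["class"]](**kwargs)
    return built


order = topological_order(config)
built = instantiate(config, REGISTRY)
```

Because the embedder has no dependencies it is built first, then the encoder that wraps it, and finally the classifier, regardless of the order in which the entries were declared.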

Installation

You can install SciWING from pip. We recommend using a virtual environment to install the package.

pip install sciwing

Tasks

These are some of the tasks included in SciWING and their performance metrics

| Task | Dataset | SciWING model | SciWING | Previous Best |
|---|---|---|---|---|
| Logical Structure Recovery | SectLabel | BiLSTM + Elmo Embeddings | 73.2 (Macro F-score) | - |
| Header Normalisation | SectLabel | Bag of Words Elmo | 93.52 (Macro F-Score) | - |
| Citation String Parsing | Neural Parscit | Bi-LSTM-CRF + GloVe + Elmo + Char-LSTM | 88.44 (Macro F-Score) | 90.45 Prasad et al. (not comparable) |
| Citation Intent Classification | SciCite | Bi-LSTM + Elmo | 82.16 (Fscore) | 82.6 Cohan et al. (without multi-task learning) |
| Biomedical NER - BC5CDR (Upcoming) | - | - | - | - |
| I2b2 NER (Upcoming) | - | - | - | - |
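
Most of the scores above are macro F-scores. As a reminder of what that metric measures, here is a minimal sketch of macro-averaged F1 over a set of labels. It is not SciWING's evaluation code: the function name, the label names and the choice to average over every label seen in either the gold or the predicted sequence are assumptions made for illustration.

```python
def macro_f1(gold, predicted):
    """Average the per-label F1 scores, giving every label equal weight."""
    labels = sorted(set(gold) | set(predicted))
    pairs = list(zip(gold, predicted))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, only to exercise the function.
gold = ["title", "author", "author", "date"]
predicted = ["title", "author", "date", "date"]
score = macro_f1(gold, predicted)  # (1 + 2/3 + 2/3) / 3 = 7/9
```

Unlike a micro-averaged score, the macro average weights rare and frequent labels equally, which matters when label frequencies are skewed.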

Simple Example

Using Citation String Parsing

from sciwing.models.neural_parscit import NeuralParscit 

# instantiate an object 
neural_parscit = NeuralParscit()

# predict on a citation 
neural_parscit.predict_for_text("Calzolari, N. (1982) Towards the organization of lexical definitions on a database structure. In E. Hajicova (Ed.), COLING '82 Abstracts, Charles University, Prague, pp.61-64.")

# if you have a file of citations with one citation per line 
neural_parscit.predict_for_file("/path/to/filename")
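
predict_for_file expects one citation per line, so citations that were copied with line breaks need to be flattened first. The sketch below prepares such a file using only the standard library. The helper write_citation_file and the file name citations.txt are hypothetical and not part of SciWING; the call to predict_for_file is left as a comment so the snippet runs without the package installed.

```python
import tempfile
from pathlib import Path


def write_citation_file(citations, path):
    """Write one citation per line and return how many were written."""
    # Collapse internal whitespace (including line breaks) and skip blanks.
    lines = [" ".join(c.split()) for c in citations if c.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# The citation from the example above, wrapped across lines as it might be
# after copying from a PDF, followed by an empty entry that is dropped.
raw_citations = [
    "Calzolari, N. (1982) Towards the organization of lexical definitions\n"
    "on a database structure. In E. Hajicova (Ed.), COLING '82 Abstracts,\n"
    "Charles University, Prague, pp.61-64.",
    "",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "citations.txt"
    n_written = write_citation_file(raw_citations, path)
    file_lines = path.read_text(encoding="utf-8").splitlines()
    # With SciWING installed, the file could then be passed on:
    # neural_parscit.predict_for_file(str(path))
```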

Using Citation Intent Classification

from sciwing.models.citation_intent_clf import CitationIntentClassification 

# instantiate an object 
citation_intent_clf = CitationIntentClassification()

# predict the intention of the citation 
citation_intent_clf.predict_for_text("")

Running API services

The APIs are built using FastAPI. We have APIs for citation string parsing and citation intent classification, and more are on the way. To run the APIs, navigate into the api folder of this repository and run

uvicorn api:app --reload

Running the Demos

The demos are built using Streamlit and make use of the APIs, so please make sure that the APIs are running before you start the demos. Navigate to the app folder and run a demo using streamlit (installed along with the package). For example:

streamlit run ner_demo.py

Contributing

Thank you for your interest in contributing. You can directly email the author at (email omitted for submission purposes). We will be happy to help.

If you want to get involved in the development, we recommend that you install SciWING on a local machine using the instructions below. All our classes and methods are documented, and we hope you can find your way around them.

Instructions to install SciWING locally

SciWING requires Python 3.7. We recommend that you install pyenv to manage Python versions.

Instructions for installing pyenv are available in the pyenv documentation. If you have problems installing Python 3.7 on your machine, check pyenv's common build problems page and install all the dependencies it lists.

  1. Clone from git

    git clone https://github.com/abhinavkashyap/sciwing.git

  2. cd sciwing

  3. Install all the requirements

    pip install -r requirements.txt

  4. Download spacy models

    python -m spacy download en

  5. Install the package locally

    pip install -e .

  6. Create directories where sciwing stores embeddings and experiment results

    sciwing develop makedirs

    sciwing develop download

    This will take some time to download all the data and embeddings required for development

    Sip some coffee and come back later.

  7. Run Tests

    SciWING uses pytest for testing. You can use the following command to run the tests (the -n and --dist options come from the pytest-xdist plugin):

    pytest tests -n auto --dist=loadfile

    The test suite is large, so it will take some time to run. We will work on reducing the test time in future iterations.
