Python library to help with scientific literature research
Project description
NaimAI is a Python package that (1) searches effeciently in papers and (2) generates an automatic review. It does do by structures each scientific paper using their abstract into 3 categories : objectives methods and results.
Hence, when searching, the results will be showed by category. The results can then be reviewed and a review text will be
automatically generated along with the references.
All the features are deployed on the NaimAI's website, where millions of paper are processed.
A Medium article goes more in depth with naimai's features of the web app.
Search in your own papers
You can either give a directory of the folder with articles in PDF format or a csv file with abstracts and other meta data as showed
here.
The processing, the results and searching for relevent papers are explained in
this colab.
Search in millions of papers
To search in the millions of papers already processed, you can use the naimai website. I might open source this part too if needed.Structure your abstract
If you already have an abstract and want to test the segmentor (naimai's algorithm that structures abstract into Background, Objectives, Methods and Results), this colab walks you through the necessary steps.Example of structured abstract :
Features to improve
Review Generation
The review generation needs more enhancement. The actual method consists of only rephrasing the objective phrase of each paper. I've some idea to go further and improve the review generation part. Let me know if you're interested and we'll do it together!
Besides the generated text, the references generation still can be brushed up to meet with many references style, and also to export it to other formats (BibTeX..).
Semantic search
The search is mainly based on a v0 semantic algorithm (using TfIdf model mainly). In a previous version, I've finetuned bert model for each field and the results were pretty interesting. The problem is that, with 10 fields on the web app, I ended up having 10 fine-tuned model. So the usage was pretty slow and the models were heavy. If you have any idea and/or want to contribute in this part, I'll be happy to talk to you!Data papers
I've used about 10 millions open access abstracts I found here and there on the internet. If you've any source that could be useful, or even better, if we can process much more papers together to get more informations for the users, that'd be cool!References
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.