Skip to main content

Topic modeling for scientific articles

Project description

BERTeley

PyPI License Build Status Documentation Status Test Coverage

Topic modeling for scientific articles

Installation

pip install berteley

Quick Start

# code snippet showing very basic usage

Topic Modeling Colab Notebook: TBD

Features

  • A text pre-processing suite tailored for scientific articles. Including a curated stopword list.
  • 3 readily available language models that are trained on scientific articles.
  • Real time metric calculation for topic model comparison.

Copyright Notice

BERTeley Copyright (c) 2023, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and University of California, Berkeley. All rights reserved.

If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at IPO@lbl.gov.

NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.

License Agreement

BERTeley Copyright (c) 2023, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy) and University of California, Berkeley. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

(1) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

(2) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

(3) Neither the name of the University of California, Lawrence Berkeley National Laboratory, U.S. Dept. of Energy, University of California, Berkeley nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

You are under no obligation whatsoever to provide any bug fixes, patches, or upgrades to the features, functionality or performance of the source code ("Enhancements") to anyone; however, if you choose to make your Enhancements available either publicly, or directly to Lawrence Berkeley National Laboratory, without imposing a separate written license agreement for such Enhancements, then you hereby grant the following license: a non-exclusive, royalty-free perpetual license to install, use, modify, prepare derivative works, incorporate into other computer software, distribute, and sublicense such enhancements or derivative works thereof, in binary and source code form.

Credits

Please reference this work:

      @article{Berteley,
      title = {Benchmarking topic models on scientific articles using BERTeley},
      journal = {Natural Language Processing Journal},
      volume = {6},
      pages = {100044},
      year = {2024},
      issn = {2949-7191},
      doi = {https://doi.org/10.1016/j.nlp.2023.100044},
      url = {https://www.sciencedirect.com/science/article/pii/S2949719123000419},
      author = {Eric Chagnon and Ronald Pandolfi and Jeffrey Donatelli and Daniela Ushizima},
      keywords = {NLP, Topic modeling, Scientific articles, Transformers},
      abstract = {The introduction of BERTopic marked a crucial advancement in topic modeling and presented a topic model that outperformed both traditional and modern topic models in terms of topic modeling metrics on a variety of corpora. However, unique issues arise when topic modeling is performed on scientific articles. This paper introduces BERTeley, an innovative tool built upon BERTopic, designed to alleviate these shortcomings and improve the usability of BERTopic when conducting topic modeling on a corpus consisting of scientific articles. This is accomplished through BERTeley’s three main features: scientific article preprocessing, topic modeling using pre-trained scientific language models, and topic model metric calculation. Furthermore, an experiment was conducted comparing topic models using four different language models in three corpora consisting of scientific articles.}
      }
      

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

berteley-0.1.3.tar.gz (36.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

berteley-0.1.3-py2.py3-none-any.whl (13.7 kB view details)

Uploaded Python 2Python 3

File details

Details for the file berteley-0.1.3.tar.gz.

File metadata

  • Download URL: berteley-0.1.3.tar.gz
  • Upload date:
  • Size: 36.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for berteley-0.1.3.tar.gz
Algorithm Hash digest
SHA256 82cfea1a2c41e292f5a317afdcafc75316d63d1684806c05438b24f3e08bb863
MD5 c17c03211e36db8f0e45db4897996a3c
BLAKE2b-256 0cc55596af850e225ae514a324092430bc54997e2b583061e656e90903a82096

See more details on using hashes here.

File details

Details for the file berteley-0.1.3-py2.py3-none-any.whl.

File metadata

  • Download URL: berteley-0.1.3-py2.py3-none-any.whl
  • Upload date:
  • Size: 13.7 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for berteley-0.1.3-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 e944d58354b5a5c48988838a49d74ba5a6122df822021d20fb807c050d0d3331
MD5 a04f5a9751cb25efa2be082c67061c49
BLAKE2b-256 41914a5471baf082b47126eb98058f941a991dbeaee18c2f8878c0c432b5cc8f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page