EXSCLAIM! is a library for the automatic EXtraction, Separation, and Caption-based natural Language Annotation of IMages from scientific figures.

Project description

EXSCLAIM2.0: LLM-powered Automatic EXtraction, Separation, and Caption-based natural Language Annotation of IMages from scientific figures

🤔 Consider Collaboration

If you find this tool or any of its derived capabilities useful, please consider registering as a user of the Center for Nanoscale Materials. We will keep you posted on the latest developments, as well as opportunities for computational resources, relevant data, and collaboration. Please contact Maria Chan (mchan@anl.gov) for details.

Introduction to EXSCLAIM2.0

EXSCLAIM2.0 is a Python package that combines the EXSCLAIM! code with large language models (LLMs) to automatically generate datasets of labeled images from published papers. The main steps are:

  1. JournalScraper: scrape journal websites, acquiring figures, captions, and metadata
  2. CaptionDistributor: separate figure captions into the component chunks that refer to the figure's subfigures, using LLMs and prompt engineering
  3. FigureSeparator: separate figures into subfigures and detect scale information, labels, and image type
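
To make the CaptionDistributor step concrete, here is a minimal sketch of what splitting a caption into subfigure chunks means. It uses a plain regular expression on "(a)", "(b)" style labels as a stand-in. EXSCLAIM2.0 itself uses LLMs and prompt engineering for this step. The function name split_caption and the sample caption are illustrative assumptions, not part of the package.

```python
import re

# Hypothetical helper, not part of the exsclaim API: a regex stand-in for
# the LLM-based caption distribution that EXSCLAIM2.0 performs.
LABEL = re.compile(r"\(([a-z])\)")


def split_caption(caption: str) -> dict[str, str]:
    """Map each subfigure label, e.g. 'a', to the caption text that follows it."""
    matches = list(LABEL.finditer(caption))
    chunks = {}
    for i, match in enumerate(matches):
        start = match.end()
        # A chunk runs until the next label, or to the end of the caption.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        chunks[match.group(1)] = caption[start:end].strip()
    return chunks


# Made-up caption, for illustration only.
caption = "(a) TEM image of gold nanoparticles. (b) SEM image of the same sample."
print(split_caption(caption))
```

A regex like this fails on captions that group labels ("(a, b)") or use ranges ("(a)-(c)"). Handling that kind of free-form phrasing is what a language model is better suited to.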

Examples and tutorials

We provide several tutorials demonstrating how to use EXSCLAIM2.0:

  1. Nature_exsclaim_search: automatically scraping data from literature and performing Named Entity Recognition (NER) on the extracted captions.
  2. HTMLScraper: automatically scraping data from user-provided HTML files.
  3. Microscopy_CLIP_retrieval: Using Microscopy_CLIP to perform image-to-image and text-to-image retrieval on our multimodal microscopy dataset.

Installation

The guides to install EXSCLAIM through Pip, Git and Docker can be found within the wiki. The guides include installing pre-compiled versions as well as building from the source code and then installing.

Using EXSCLAIM2.0

from exsclaim import Pipeline
search_query = {
		...
}
results = Pipeline(search_query)

where search_query is either a dictionary representing a valid JSON object or a path-like string pointing to a valid JSON file. Alternatively, run the query from the command line:

python -m exsclaim query {path to json file holding search query}

More extensive guides can be found within the wiki.
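
As a rough illustration of the two accepted input forms, the sketch below normalizes either a dictionary or a path to a JSON file into a dictionary. The helper load_query and the placeholder key are hypothetical and are not taken from the exsclaim source. The actual query schema is documented in the wiki.

```python
import json
import tempfile
from os import PathLike
from pathlib import Path
from typing import Union


def load_query(query: Union[dict, str, PathLike]) -> dict:
    """Return the query as a dict, reading it from disk if a path was given.

    Hypothetical helper for illustration; not how exsclaim implements it.
    """
    if isinstance(query, dict):
        return query
    with Path(query).open(encoding="utf-8") as handle:
        loaded = json.load(handle)
    if not isinstance(loaded, dict):
        raise ValueError("search query file must contain a JSON object")
    return loaded


# Placeholder content only: "example_key" is NOT a real exsclaim query field.
placeholder = {"example_key": "example_value"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "query.json"
    path.write_text(json.dumps(placeholder), encoding="utf-8")
    # Both input forms resolve to the same dictionary.
    same = load_query(path) == load_query(placeholder)

print(same)
```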

Using Docker Compose

To use Docker Compose to host the service, run the following commands in the base directory:

docker compose build base
docker compose build {service(s) here}
docker compose up {service(s) here}

Acknowledgements

This material is based upon work supported by Laboratory Directed Research and Development (LDRD) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357.

This work was performed at the Center for Nanoscale Materials, a U.S. Department of Energy Office of Science User Facility, and supported by the U.S. Department of Energy, Office of Science, under Contract No. DE-AC02-06CH11357.

We gratefully acknowledge the computing resources provided on Bebop, a high-performance computing cluster operated by the Laboratory Computing Resource Center at Argonne National Laboratory.

Citation

If you find EXSCLAIM! useful, please encourage its development by citing the following paper in your research:

Schwenker, E., Jiang, W., Spreadbury, T., Ferrier, N., Cossairt, O., Chan, M.K.Y., EXSCLAIM! - An automated pipeline for the construction and labeling of materials imaging datasets from scientific literature. arXiv e-prints (2021): arXiv-2103

Bibtex

@article{schwenker2021exsclaim,
  title={EXSCLAIM! - An automated pipeline for the construction of labeled materials imaging datasets from literature},
  author={Schwenker, Eric and Jiang, Weixin and Spreadbury, Trevor and Ferrier, Nicola and Cossairt, Oliver and Chan, Maria KY},
  journal={arXiv e-prints},
  pages={arXiv--2103},
  year={2021}
}

