JATS: Convert JATS XML articles to Markdown with peer review extraction

jats - JATS XML Parser

A Python CLI tool for converting JATS (Journal Article Tag Suite) XML files to Markdown format, with support for extracting peer review comments and author responses.

Overview

jats parses JATS XML files from scientific publishers (bioRxiv, eLife, etc.) and converts them to clean, readable Markdown. It's particularly useful for working with preprint manuscripts and their associated peer review materials.

Key Features

  • Convert JATS XML articles to Markdown
  • Extract peer review comments and author responses from multi-article XML files
  • Support for bioRxiv manifest files (optional metadata)
  • Organize reviews and responses by revision round
  • Simple command-line interface that writes to stdout or to a file

Installation

Prerequisites

  • Python >=3.10

Install from a source checkout with uv (recommended)

cd jats
uv pip install -e .

Install from a source checkout with pip

cd jats
pip install -e .

Usage

Basic Conversion

Convert a JATS XML file to Markdown:

# Output to stdout
jats convert article.xml

# Output to file
jats convert article.xml -o article.md

# With bioRxiv manifest file (optional)
jats convert article.xml -m manifest.xml -o article.md
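
To convert many files at once, the CLI can be driven from a script. The sketch below uses only the `convert` subcommand and the `-o` and `-m` options shown above; the helper names `build_convert_command` and `convert_directory` are hypothetical, and it assumes the `jats` executable is on your `PATH`.

```python
import subprocess
from pathlib import Path


def build_convert_command(xml_path, output=None, manifest=None):
    """Assemble the argument list for `jats convert` (hypothetical helper)."""
    cmd = ["jats", "convert", str(xml_path)]
    if manifest is not None:
        cmd += ["-m", str(manifest)]  # optional bioRxiv manifest
    if output is not None:
        cmd += ["-o", str(output)]  # without -o, Markdown is written to stdout
    return cmd


def convert_directory(src_dir, out_dir):
    """Convert each *.xml in src_dir to out_dir/<stem>.md (assumes no manifest files there)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for xml_file in sorted(Path(src_dir).glob("*.xml")):
        cmd = build_convert_command(xml_file, output=out / f"{xml_file.stem}.md")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


# Usage: convert_directory("xml/", "markdown/")
```

Building the argument list separately from running it keeps the command easy to inspect or log before anything is executed.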

Extract Peer Reviews

Extract peer review comments and author responses from JATS XML files that include sub-articles (common in eLife and some bioRxiv articles):

# Extract reviews and responses to separate files
jats convert article.xml -o article.md -r output_base

# Creates:
# - output_base_reviews.md    (all review comments, organized by round)
# - output_base_responses.md  (all author responses, organized by round)

The -r flag extracts sub-articles whose article-type attribute is one of the following:

  • Review comments: decision-letter, referee-report, editor-report, reviewer-report
  • Author responses: author-comment, reply
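
As an illustration of this split, the sketch below sorts `<sub-article>` elements into reviews and responses using the article-type values listed above. It uses the standard library's `xml.etree.ElementTree`; the function name `classify_sub_articles` is hypothetical and this is not the tool's own parser code.

```python
import xml.etree.ElementTree as ET

# article-type values listed above
REVIEW_TYPES = {"decision-letter", "referee-report", "editor-report", "reviewer-report"}
RESPONSE_TYPES = {"author-comment", "reply"}


def classify_sub_articles(root):
    """Split <sub-article> elements (at any depth) into reviews and responses."""
    reviews, responses = [], []
    for sub in root.iter("sub-article"):
        kind = sub.get("article-type")
        if kind in REVIEW_TYPES:
            reviews.append(sub)
        elif kind in RESPONSE_TYPES:
            responses.append(sub)
        # any other article-type is ignored in this sketch
    return reviews, responses


doc = ET.fromstring(
    '<article><sub-article article-type="referee-report"/>'
    '<sub-article article-type="author-comment"/></article>'
)
reviews, responses = classify_sub_articles(doc)
print(len(reviews), len(responses))  # 1 1
```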

Reviews and responses are automatically organized by revision round using JATS4R peer-review-revision-round metadata (defaults to round 1 if not specified).
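
A minimal sketch of round grouping, assuming the JATS4R layout in which the round is stored as a `<custom-meta>` pair (`<meta-name>peer-review-revision-round</meta-name>` plus a numeric `<meta-value>`) inside the sub-article's `<front-stub>`. The function names are hypothetical; a missing or non-numeric value falls back to round 1, as described above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ROUND_NAME = "peer-review-revision-round"


def revision_round(sub_article, default=1):
    """Read the revision round from <front-stub>/<custom-meta-group>; fall back to `default`."""
    for meta in sub_article.iterfind("front-stub/custom-meta-group/custom-meta"):
        if (meta.findtext("meta-name") or "").strip() == ROUND_NAME:
            value = (meta.findtext("meta-value") or "").strip()
            if value.isdigit():
                return int(value)
    # no numeric round recorded: treat the sub-article as round 1
    return default


def group_by_round(sub_articles):
    """Map revision round -> sub-articles, with rounds in ascending order."""
    rounds = defaultdict(list)
    for sub in sub_articles:
        rounds[revision_round(sub)].append(sub)
    return dict(sorted(rounds.items(), key=lambda item: item[0]))


doc = ET.fromstring(
    "<article><sub-article article-type='referee-report'><front-stub>"
    "<custom-meta-group><custom-meta><meta-name>peer-review-revision-round</meta-name>"
    "<meta-value>2</meta-value></custom-meta></custom-meta-group></front-stub></sub-article>"
    "<sub-article article-type='referee-report'/></article>"
)
print(list(group_by_round(doc.iter("sub-article"))))  # [1, 2]
```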

Examples

Convert bioRxiv Preprint

jats convert 2023.01.01.12345.xml -o paper.md

Convert eLife Article with Peer Reviews

# Convert main article and extract reviews
jats convert elife-12345-v1.xml -o paper.md -r elife-12345-v1

# Output files:
# - paper.md                        (main article)
# - elife-12345-v1_reviews.md      (peer review comments)
# - elife-12345-v1_responses.md    (author responses)

bioRxiv with Manifest

# manifest.xml provides additional metadata
jats convert article.xml -m manifest.xml -o article.md

Input File Format

jats expects XML files that follow the JATS (Journal Article Tag Suite) standard, NISO Z39.96. This format is used by:

  • bioRxiv and medRxiv preprint servers
  • eLife journal
  • PubMed Central (PMC)
  • Many other scientific publishers

JATS XML Structure

A typical JATS XML file contains:

  • <front>: Article metadata (title, authors, abstract)
  • <body>: Main article content organized in sections
  • <back>: References, acknowledgments, etc.
  • <sub-article>: Optional peer review materials (eLife, some bioRxiv)
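
To see how these parts map onto element paths, the sketch below pulls a few fields out of each part with `xml.etree.ElementTree`. The paths (`front/article-meta/title-group/article-title`, `abstract`, top-level `body/sec`, `back//ref`) follow common JATS usage; the function `summarize_jats` is hypothetical and only illustrates the structure, not how jats itself parses files. It accepts the XML as `str` or `bytes`.

```python
import xml.etree.ElementTree as ET


def _text(el):
    """All text inside an element (including inline markup), whitespace-normalized."""
    return " ".join("".join(el.itertext()).split()) if el is not None else None


def summarize_jats(xml_text):
    """Return a few fields from <front>, <body>, <back>, and <sub-article>."""
    root = ET.fromstring(xml_text)
    meta = root.find("front/article-meta")
    body = root.find("body")
    back = root.find("back")
    return {
        "title": _text(meta.find("title-group/article-title")) if meta is not None else None,
        "abstract": _text(meta.find("abstract")) if meta is not None else None,
        # only top-level sections; nested <sec> elements are skipped here
        "sections": [_text(s.find("title")) for s in body.findall("sec")] if body is not None else [],
        "references": len(back.findall(".//ref")) if back is not None else 0,
        # direct children only; nested sub-articles are not counted
        "sub_articles": len(root.findall("sub-article")),
    }


# Usage: summarize_jats(open("article.xml", "rb").read())
```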

Manifest Files (bioRxiv)

bioRxiv articles may include an optional manifest.xml file that provides:

  • Collection/category information
  • Version history
  • Links to published versions
  • Peer review URLs

Output Format

jats converts JATS XML to clean, readable Markdown with:

  • Article title as H1 heading
  • Authors with affiliations
  • Abstract
  • Body sections with appropriate heading levels
  • Inline figures with captions
  • References (when available)
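
One plausible reading of "appropriate heading levels" is to derive them from `<sec>` nesting depth: top-level sections become H2 (H1 being the article title) and each nested `<sec>` goes one level deeper. The sketch below assumes that rule; it is not the tool's converter code, the function names are hypothetical, and it handles only `<title>`, `<p>`, and `<sec>`.

```python
import xml.etree.ElementTree as ET


def _text(el):
    """Element text including inline markup, whitespace-normalized."""
    return " ".join("".join(el.itertext()).split())


def render_sec(sec, level=2):
    """Render one <sec>, its paragraphs, and nested <sec> elements in document order."""
    lines = []
    title = sec.find("title")
    if title is not None:
        # Markdown only has six heading levels
        lines += ["#" * min(level, 6) + " " + _text(title), ""]
    for child in sec:
        if child.tag == "p":
            lines += [_text(child), ""]
        elif child.tag == "sec":
            lines += render_sec(child, level + 1)  # one level deeper
        # figures, tables, lists, etc. are ignored in this sketch
    return lines


def body_to_markdown(body):
    """Top-level sections start at H2 because H1 is reserved for the article title."""
    lines = []
    for sec in body.findall("sec"):
        lines += render_sec(sec)
    return "\n".join(lines).rstrip() + "\n"
```

Iterating over a section's children in order, rather than collecting all paragraphs first, keeps text that follows a subsection in its original position.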

Peer Review Output

When using -r, peer review materials are extracted to separate Markdown files:

Reviews file (*_reviews.md):

# Revision Round 1

## Reviewer 1

[Review content...]

---

## Reviewer 2

[Review content...]

Responses file (*_responses.md):

# Revision Round 1

## Author Response

[Response content...]
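
To make the reviews layout concrete, here is a sketch that renders a mapping of revision rounds to review texts in the format shown above. The function name `render_reviews` and the sequential "Reviewer N" numbering within each round are assumptions; the tool may label reviewers differently.

```python
def render_reviews(rounds):
    """Render {round: [review text, ...]} as Markdown: '# Revision Round N', '## Reviewer k', '---'."""
    parts = []
    for rnd in sorted(rounds):
        blocks = [
            f"## Reviewer {i}\n\n{text.strip()}"
            for i, text in enumerate(rounds[rnd], start=1)
        ]
        # reviews within a round are separated by a horizontal rule
        parts.append(f"# Revision Round {rnd}\n\n" + "\n\n---\n\n".join(blocks))
    return "\n\n".join(parts) + "\n"


print(render_reviews({1: ["Looks good.", "Needs controls."]}))
```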

Development

Running Tests

# Install development dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

Project Structure

jats/
├── jats/
│   ├── __init__.py
│   ├── main.py         # CLI entry point
│   ├── parser.py       # JATS XML parsing
│   ├── converter.py    # Markdown conversion
│   └── models.py       # Data models
├── tests/
│   ├── test_*.py       # Test files
│   └── *.xml           # Test fixtures
├── pyproject.toml      # Package configuration
└── README.md

See DEVELOPMENT.md for detailed development documentation and a code style guide.

License

MIT

Support

For issues or questions, please open an issue on GitHub.

Download files

Source Distribution

jats-0.2.0.tar.gz (18.7 kB)

Uploaded Source

Built Distribution

jats-0.2.0-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file jats-0.2.0.tar.gz.

File metadata

  • Download URL: jats-0.2.0.tar.gz
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.7

File hashes

Hashes for jats-0.2.0.tar.gz

  • SHA256: 0fcd4c019625993bdf00d626a7670d6f16076e5a146ee78f3d5769764bc9a309
  • MD5: d32323139c0a15d0fd779dd67299f226
  • BLAKE2b-256: 30a1c76d7480b24ef27d31b7f0e3f659cb4c44dc244f1bca360d0c25cc7615b3

File details

Details for the file jats-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: jats-0.2.0-py3-none-any.whl
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.7

File hashes

Hashes for jats-0.2.0-py3-none-any.whl

  • SHA256: 22a08d8e21736bf3c68dd33d706ac5249e4ec8e52714f5534317c5a932b490e1
  • MD5: e12fb30a21f0471a517b7542532f0d6d
  • BLAKE2b-256: 17b9ed15c261d86a5f5d0662641677ef563e1716f9d713a35ed2c50c979afde8
