ojs
Tools for working with the Open Journal Systems (OJS) API.
Pulls submissions, publications, reviews, users, and publication view statistics
from an OJS journal's /api/v1/* REST API, downloads the attached file artifacts
(manuscripts, revisions, reviewer attachments, and production galleys), and
normalizes the JSON into typed relational tables backed by polars. Incremental
sync re-pulls only what changed since the last run, so routine top-ups stay
cheap. A typed schema layer (Column/Table classes) is the single source of
truth for normalization and doubles as exportable column documentation. Also
normalizes the OJS dashboard's Articles and Reviews CSV report exports. Ships a
Typer CLI for the common fetch, download, and normalize workflows. Built against
the OJS 3.3 REST API; other versions
are untested and may differ, as the REST API saw breaking changes between 3.2
and 3.3.
Project Structure
ojs/
├── cli.py              # Typer CLI: init, articles, reviews, api (+ schema docs)
├── schema.py           # Typed schema framework: Column/Table, apply(), doc export
├── utils.py            # HTML stripping + localized-field extraction
├── website/            # Manual website CSV-export pipelines
│   ├── articles/       # Wide CSV → submissions, authors, editors, decisions
│   └── reviews/        # Long CSV → reviews
└── api/                # REST pipeline
    ├── client.py       # OJS REST client (httpx, pagination, retry, early-stop)
    ├── files.py        # Submission file artifact downloads (disk layout, manifest)
    ├── normalize.py    # JSON → relational tables (schema-driven)
    ├── schemas.py      # API table schema classes
    ├── sync.py         # Incremental sync: high-water-mark state, raw-JSON upsert
    └── swagger.json    # OJS API reference (snapshot)
Installation
uv tool install ojs
As a project dependency:
uv add ojs
From GitHub instead of PyPI:
uv tool install git+https://github.com/gitronald/ojs.git
# or, as a dependency: uv add git+https://github.com/gitronald/ojs.git
From source (for development):
git clone https://github.com/gitronald/ojs.git
cd ojs
uv sync
Configuration
The CLI reads from a .env file in the current directory. Run ojs init to
scaffold one — it prompts for the journal URL and API token, and writes .env
with 0600 permissions:
ojs init
Getting an API key. In OJS, open your user profile
(https://example.org/index.php/myjournal/user/profile), select the API Key
tab, check Enable external applications with the API key to access this
account, and copy the key — use the (re)generate button if one isn't set
yet.
Values can also come from the environment. A user-level config file is loaded as
a fallback for anything not set in the current directory's .env (which takes
precedence): ~/.config/ojs/.env by default, or the file named by
OJS_CONFIG_PATH.
| Variable | Default | Purpose |
|---|---|---|
| OJS_BASE_URL | (required for api) | OJS journal URL (e.g. https://example.org/index.php/myjournal) |
| OJS_API_KEY | (required for api) | OJS API token |
| OJS_DATA_DIR | data/ojs-api | Root for inputs and outputs |
| OJS_DOWNLOADS_DIR | data/ojs-website | Where CSV exports land |
| OJS_ARTICLES_DIR | $OJS_DATA_DIR/articles | Articles output dir |
| OJS_REVIEWS_DIR | $OJS_DATA_DIR/reviews | Reviews output dir |
| OJS_API_DIR | $OJS_DATA_DIR | API JSON dump dir |
| OJS_FILES_DIR | $OJS_API_DIR/files | Where downloaded submission files land |
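The way the directory defaults chain off one another can be sketched in a few lines of Python. This is only an illustration of the table above: `resolve_dirs` is a hypothetical helper, not a function the package is known to expose, and it takes a plain mapping rather than reading `.env` files.

```python
from pathlib import Path


def resolve_dirs(env):
    """Resolve output directories from a mapping of environment values.

    Hypothetical helper that mirrors the defaults in the table above.
    """
    data_dir = Path(env.get("OJS_DATA_DIR", "data/ojs-api"))
    # OJS_API_DIR falls back to the data root when unset.
    api_dir = Path(env.get("OJS_API_DIR", data_dir))
    return {
        "data": data_dir,
        "downloads": Path(env.get("OJS_DOWNLOADS_DIR", "data/ojs-website")),
        "articles": Path(env.get("OJS_ARTICLES_DIR", data_dir / "articles")),
        "reviews": Path(env.get("OJS_REVIEWS_DIR", data_dir / "reviews")),
        "api": api_dir,
        # OJS_FILES_DIR in turn falls back to a subdirectory of the API dir.
        "files": Path(env.get("OJS_FILES_DIR", api_dir / "files")),
    }


dirs = resolve_dirs({"OJS_DATA_DIR": "/srv/ojs"})
print(dirs["files"])
```

Overriding only OJS_DATA_DIR moves the articles, reviews, API, and files directories together, while OJS_DOWNLOADS_DIR keeps its own default.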
CLI Commands
norm reads the typed schema classes directly — no separate step is required.
schema exports a table_schemas.csv documenting each table's columns, dtypes,
source mapping, and whether each column appears in the normalized output
(in_output).
API
Fetch raw JSON from the REST API, download file artifacts, and normalize into relational tables.
ojs api fetch # fetch raw JSON from the OJS REST API
ojs api download # download submission file artifacts (PDFs, etc.)
ojs api norm # normalize API JSON into relational tables
ojs api schema # export table_schemas.csv docs
Articles
Normalize the OJS dashboard's Articles Report CSV export.
ojs articles norm # normalize the most recent articles export
ojs articles schema # export table_schemas.csv docs
Reviews
Normalize the OJS dashboard's Review Report CSV export.
ojs reviews norm # normalize the most recent reviews export
ojs reviews schema # export table_schemas.csv docs
Article view stats
ojs api fetch also pulls publication view stats from the OJS /stats/publications/*
endpoints (skip with --no-stats). The API only exposes aggregated counts — the
finest granularity is daily (there are no per-event timestamps).
| Flag | Default | Purpose |
|---|---|---|
| --stats / --no-stats | on | Toggle stats collection (e.g. when the API key lacks stats access) |
| --stats-interval | day | Timeline granularity: day or month |
| --stats-since | (none) | dateStart filter (YYYY-MM-DD) |
| --stats-until | (none) | dateEnd filter (YYYY-MM-DD) |
ojs api norm then writes three extra tables:
- publication_stats: one row per published submission with abstract, all-galley, PDF, HTML, and other view totals.
- views_timeline: long format (submission_id, date, interval, views, kind) with a per-submission abstract and galley series. interval records the granularity (day or month) a point was fetched at, so a file mixing both stays separable; filter on it rather than summing across intervals.
- views_timeline_totals: long format (date, interval, views, kind) with the journal-wide abstract and galley series, from the aggregate /stats/publications/{abstract,galley} endpoints (the data behind the OJS statistics-page graph). Use this for journal-wide totals rather than summing views_timeline.
If the API key lacks stats access, fetch prints a warning and skips the stats files, and norm simply omits these tables.
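The advice to filter on interval before aggregating views_timeline can be illustrated with plain Python. The rows below are made up and only follow the column names listed above; `total_views` is a hypothetical helper, and the real tables are polars-backed rather than lists of dicts.

```python
from collections import defaultdict

# Illustrative rows shaped like views_timeline; all values are invented.
rows = [
    {"submission_id": 1, "date": "2024-01-01", "interval": "day", "views": 3, "kind": "abstract"},
    {"submission_id": 1, "date": "2024-01-02", "interval": "day", "views": 5, "kind": "abstract"},
    {"submission_id": 1, "date": "2024-01", "interval": "month", "views": 8, "kind": "abstract"},
    {"submission_id": 1, "date": "2024-01-01", "interval": "day", "views": 2, "kind": "galley"},
]


def total_views(rows, interval):
    """Sum views per (submission_id, kind) for a single interval only."""
    totals = defaultdict(int)
    for row in rows:
        if row["interval"] == interval:
            totals[(row["submission_id"], row["kind"])] += row["views"]
    return dict(totals)


daily = total_views(rows, "day")
monthly = total_views(rows, "month")
```

Summing across both intervals would count the same abstract views twice here, which is why the filter comes first.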
Submission files
OJS attaches the actual file artifacts (manuscripts, revisions, reviewer
attachments, production galleys) to each submission. ojs api download fetches
their metadata and then downloads the binaries.
ojs api fetch --files # also dump file metadata -> submission_files.json
ojs api download # download all files for all submissions
ojs api download -s 123 -s 456 # only these submissions (repeatable)
ojs api download --type galleys # only published galley files
ojs api download --type review # only review files / revisions / attachments
ojs api download --file-stage 4 --file-stage 15 # raw fileStage ids
ojs api download --no-revisions # current files only, skip prior revisions
| Flag | Default | Purpose |
|---|---|---|
| --submission-id / -s | all | Limit to these submission ids (repeatable) |
| --type | all | all, galleys (published), or review |
| --file-stage | (none) | Raw fileStage id(s); overrides --type |
| --revisions / --no-revisions | on | Also download prior revisions of each file |
| --fetch / --no-fetch | on | Refresh file metadata first (off: use stored JSON) |
Files are laid out under OJS_FILES_DIR as
<submission_id>/<stage>/<fileId>_<name>. A manifest (manifest.json) records
every artifact by its immutable physical fileId, so reruns skip files already
on disk — new uploads and revisions are downloaded incrementally.
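A minimal sketch of the layout and skip logic described above, under stated assumptions: the manifest is treated as a dict keyed by fileId strings, and both function names are hypothetical. The actual manifest.json structure and the form of the stage path segment are not documented here.

```python
from pathlib import Path


def artifact_path(files_dir, submission_id, stage, file_id, name):
    """Build <files_dir>/<submission_id>/<stage>/<fileId>_<name>."""
    return Path(files_dir) / str(submission_id) / stage / f"{file_id}_{name}"


def pending(files, manifest):
    """Return the files whose fileId is not yet recorded in the manifest.

    Assumes the manifest maps fileId (as a string) to an entry; the real
    manifest.json may be organized differently.
    """
    return [f for f in files if str(f["fileId"]) not in manifest]


manifest = {"10": {"path": "123/review/10_manuscript.pdf"}}
files = [{"fileId": 10}, {"fileId": 11}]
todo = pending(files, manifest)
```

Keying on the physical fileId means a new revision, which gets its own fileId, shows up as pending even when the logical file was downloaded before.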
Rounds and revisions. OJS tracks two distinct axes. A file's stage
(fileStage) says where in the workflow it lives; review files additionally
carry an assocId naming the review round they belong to. Separately, each
file's revisions[] holds prior uploads of that same logical file. ojs api norm writes a submission_files table with one row per current file, including
file_stage_label, review_round_id (joins to review_assignments.round_id),
and revision_count. Downloads cover the current file plus every revision, each
keyed by its own fileId.
Downloading files requires an API token with permission to view them; the API
returns 403 for files the key cannot access.
Incremental fetch
By default ojs api fetch does a full cold pull. For routine top-ups, --incremental fetches only what changed since the last successful sync and merges it into the existing JSON dumps, so ojs api norm stays a stateless re-derivation from the complete files.
| Flag | Purpose |
|---|---|
| --incremental / -i | Fetch only records changed since the last sync, merging into the JSON dumps |
| --since YYYY-MM-DD | Override the stored watermark (implies --incremental) |
| --full | Force a complete pull and reset the sync state |
How it works:
- A high-water mark lives in data/ojs-api/sync_state.json (the last sync time, plus each submission's dateLastActivity). It advances only after a run fully succeeds, so a failed fetch never skips records on the next run.
- Submissions and extended submissions are pulled newest-first by dateLastActivity and stop early at the watermark. Publication details are skipped for submissions whose dateLastActivity is unchanged, which is the biggest saving, since that endpoint costs one request per submission.
- A one-day overlap buffer re-pulls the boundary on each run; merges are idempotent (upsert by id), so the overlap is harmless.
- View stats: publication_stats (cumulative totals) is always pulled in full, while the daily views_timeline is re-pulled over a rolling window and merged by (submission_id, interval, date, kind), refreshing recent buckets without dropping history.
- Users are always pulled in full; the API exposes no recency sort for users.
The OJS API has no server-side "modified since" filter, so incremental cannot detect upstream deletions; run ojs api fetch --full periodically to reconcile.
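The early stop, overlap buffer, and idempotent merge described above can be sketched as follows. This is an illustration under assumptions, not the package's sync code: `fetch_changed` and `upsert` are hypothetical names, pages are passed in as plain lists instead of being fetched over HTTP, and dateLastActivity is assumed to start with a YYYY-MM-DD date.

```python
from datetime import date, timedelta


def fetch_changed(pages, watermark, overlap_days=1):
    """Collect records at or after the cutoff from newest-first pages.

    `pages` is any iterable of record lists sorted by dateLastActivity
    descending; iteration stops at the first record older than the cutoff.
    """
    cutoff = watermark - timedelta(days=overlap_days)
    changed = []
    for page in pages:
        for record in page:
            if date.fromisoformat(record["dateLastActivity"][:10]) < cutoff:
                return changed  # everything after this is older still
            changed.append(record)
    return changed


def upsert(existing, incoming, key="id"):
    """Merge incoming records into existing ones, replacing on matching key."""
    merged = {record[key]: record for record in existing}
    for record in incoming:
        merged[record[key]] = record
    return list(merged.values())


pages = [
    [
        {"id": 3, "dateLastActivity": "2024-03-10 09:00:00"},
        {"id": 2, "dateLastActivity": "2024-03-04 09:00:00"},
    ],
    [{"id": 1, "dateLastActivity": "2024-02-01 09:00:00"}],
]
changed = fetch_changed(pages, watermark=date(2024, 3, 5))
stored = [
    {"id": 1, "dateLastActivity": "2024-02-01 09:00:00"},
    {"id": 2, "dateLastActivity": "2024-03-01 09:00:00"},
]
merged = upsert(stored, changed)
```

Because the merge replaces records by id, applying the same batch twice leaves the result unchanged, which is what makes the one-day overlap safe. Nothing in this merge removes a record, so upstream deletions still need a full pull.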
Security & privacy
- The API token lives in .env (the init prompt hides input). .env is gitignored; keep it out of version control and out of shared locations.
- The API JSON dumps contain personal data pulled from OJS: users.json holds user records including email addresses, and the author/submission tables carry author names, emails, and ORCIDs. These files are written with the process umask (typically 0644, i.e. world-readable). On a shared or multi-user host, run with a restrictive umask (e.g. umask 077) or point OJS_DATA_DIR at a private directory so other local users can't read them.
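For scripts that post-process these dumps and write their own copies, one option is to set owner-only permissions when the file is created. This is a different technique from the umask approach above and is shown only as a sketch; `write_private_json` is a hypothetical helper, not part of the package, and it assumes a POSIX filesystem.

```python
import json
import os


def write_private_json(path, payload):
    """Write JSON so that only the owning user can read or write the file."""
    # O_CREAT with mode 0o600 requests owner-only permissions at creation
    # time; the mode is ignored if the file already exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(payload, handle)
```

Since the mode applies only when the file is first created, an existing world-readable file keeps its permissions and needs an explicit chmod.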