iris-oc-mapper

A tool for mapping CINECA IRIS bibliographic records to OpenCitations Meta and Index datasets, with built-in utilities for interacting with IRIS data dumps.

Description

iris-oc-mapper provides a command-line tool to search bibliographic entities from an IRIS (Institutional Research Information System) dump within OpenCitations Meta and Index data dumps. It also provides a high-level interface for interacting with IRIS data dumps.

It allows you to:

Convert IRIS dumps into structured and manageable CSV archives.
Map IRIS record types to the types defined by MIUR.
Analyze IRIS dumps to extract relevant bibliographic information.
Map the coverage of IRIS dumps within the OpenCitations Meta and Index datasets.
Create sub-datasets of IRIS dumps based on their mapping status (found in OC Meta, not found, found in OC Index, records without persistent identifiers).
Generate reports summarizing the analysis and mapping results.

Installation

From PyPI

pip install iris-oc-mapper

From Source

Clone this repository:

git clone https://github.com/leonardozilli/iris-oc-mapper.git
cd iris-oc-mapper

Install the package:
```
pip install .
```

Usage

iris-oc-mapper provides two main commands: map and convert. Before mapping IRIS records, it is advisable to process the original IRIS dump with the convert command.

1. Process original IRIS dump

This step converts the original IRIS dump files into structured CSV files that can be used for mapping. It can also merge subcategories from an optional ITEM_TYPE IRIS file into the main IRIS tables, and map the IRIS internal record types to MIUR types.

iris-oc-mapper convert [OPTIONS]

Options

--path PATH, -p PATH: Path of the folder containing original IRIS dump files.
--destination PATH, -d PATH: Destination folder for converted CSV files.
--types, -t: Set this flag if an ITEM_TYPE file is present in the IRIS dump, so that subtypes are concatenated to the main type.
--separator STRING, -s STRING: Column separator in original files. Defaults to ,.
--encoding STRING, -e STRING: File encoding. Defaults to utf-8.
--format STRING, -f STRING: Original dump file format (extension). Defaults to csv.
--miur-map PATH, -m PATH: Path to the MIUR type mapping CSV file to map IRIS types to MIUR types. If not provided, no mapping is performed.

Example

iris-oc-mapper convert \
  --path data/original_iris \
  --destination data/iris_csv \
  --types \
  --separator "," \
  --encoding "utf-8" \
  --miur-map resources/miur_type_mapping.csv
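
For scripted pipelines, the same conversion can be launched from Python. The following is a minimal sketch, not part of the package: build_convert_command is a hypothetical helper that only assembles the flags documented above, and the paths are the ones used in the example.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_convert_command(
    source: Path,
    destination: Path,
    include_types: bool = False,
    separator: str = ",",
    encoding: str = "utf-8",
    miur_map: Optional[Path] = None,
) -> List[str]:
    """Assemble an `iris-oc-mapper convert` call as an argument list.

    Hypothetical helper: it relies only on the documented CLI flags.
    """
    command = [
        "iris-oc-mapper", "convert",
        "--path", str(source),
        "--destination", str(destination),
        "--separator", separator,
        "--encoding", encoding,
    ]
    if include_types:
        command.append("--types")
    if miur_map is not None:
        command += ["--miur-map", str(miur_map)]
    return command


command = build_convert_command(
    Path("data/original_iris"),
    Path("data/iris_csv"),
    include_types=True,
    miur_map=Path("resources/miur_type_mapping.csv"),
)
print(" ".join(command))

# Run the conversion only if the CLI is installed and the input folder exists.
if shutil.which("iris-oc-mapper") and Path("data/original_iris").is_dir():
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with separators such as "," or ";".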

2. Map IRIS records to OpenCitations

Searches for IRIS bibliographic entries within the OpenCitations Meta and Index data dumps.

iris-oc-mapper map [OPTIONS]

Options

--iris PATH, -i PATH: Path to the IRIS data dump folder or compressed archive.
--meta PATH, -m PATH: Path to the OpenCitations Meta dump folder or compressed archive.
--index PATH, -x PATH: Path to the OpenCitations Index dump folder or compressed archive.
--skip-index, -si: Skip OC Index mapping.
--output PATH, -o PATH: Output directory for results. Defaults to results/.
--output-format [csv|parquet], -f FORMAT: Format for output datasets. Defaults to csv.
--cutoff INTEGER, -c INTEGER: Include only records published up to this year.
--generate-report, -r: Generate an HTML mapping report. Defaults to True.
--save-datasets STRING, -s STRING: Save final output datasets to disk. Use "all" to save all, or a comma-separated list: "in_meta,no_id,not_in_meta,in_index".
--batch-size INTEGER, -b INTEGER: Number of files per OC Meta batch. Defaults to 200.
--max-workers INTEGER, -w INTEGER: Max parallel workers for OC Index processing. Defaults to 2.
--config PATH, -cf PATH: YAML configuration file to override defaults.
--debug, -d: Enable debug logging.

Example

iris-oc-mapper map \
  --iris data/iris.zip \
  --meta data/oc_meta.zip \
  --index data/oc_index.zip \
  --cutoff 2024 \
  -s "in_meta,in_index,not_in_meta,no_id" \
  --output results/
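
The dataset names accepted by --save-datasets correspond to mapping statuses. The sketch below illustrates how records could be partitioned into those groups. It is not the tool's implementation: the Record class, the PID string format, the inclusive treatment of the cutoff year, and the handling of Meta and Index as independent lookups are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Record:
    """Hypothetical, simplified view of an IRIS record."""
    item_id: str
    pids: Set[str] = field(default_factory=set)  # e.g. {"doi:10.1000/abc"}
    year: Optional[int] = None


def partition_records(
    records: Iterable[Record],
    meta_pids: Set[str],
    index_pids: Set[str],
    cutoff: Optional[int] = None,
) -> Dict[str, List[str]]:
    """Group record ids by mapping status."""
    groups: Dict[str, List[str]] = {
        "in_meta": [], "not_in_meta": [], "in_index": [], "no_id": [],
    }
    for record in records:
        # Assumption: the cutoff is inclusive and undated records are kept.
        if cutoff is not None and record.year is not None and record.year > cutoff:
            continue
        if not record.pids:
            groups["no_id"].append(record.item_id)
            continue
        if record.pids & meta_pids:
            groups["in_meta"].append(record.item_id)
        else:
            groups["not_in_meta"].append(record.item_id)
        if record.pids & index_pids:
            groups["in_index"].append(record.item_id)
    return groups


records = [
    Record("r1", {"doi:10.1000/a"}, 2020),
    Record("r2", {"doi:10.1000/b"}, 2023),
    Record("r3", set(), 2019),
    Record("r4", {"doi:10.1000/c"}, 2025),
]
groups = partition_records(
    records,
    meta_pids={"doi:10.1000/a"},
    index_pids={"doi:10.1000/a"},
    cutoff=2024,
)
print(groups)
```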

Configuration

Download OC Data Dumps

Download the most recent OpenCitations data dumps at:

ISBN validation and MIUR Type Mapping

To prevent false positive matches during the mapping process, the tool validates persistent identifiers (PIDs) against the record types of their corresponding IRIS entries. This is especially important for ISBNs, which are often incorrectly assigned to items that should not have them (e.g., journal articles). By declaring the set of types that may legitimately carry an ISBN, the tool can ignore invalid ISBN assignments, which improves mapping accuracy.

The set of record types specified in the default configuration of the tool consists of MIUR types, hence the need to map IRIS internal record types to MIUR categories in the preliminary conversion step. The MIUR mapping has the advantage of providing a standardized set of categories that can be consistently applied across different IRIS instances, facilitating comparisons and analyses.
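
The idea behind type-based PID validation can be sketched as follows. This is an illustration only, not the tool's code: the function name, the "scheme:value" PID format, and the type labels are hypothetical, and the real set of allowed types comes from the YAML configuration.

```python
from typing import Dict, List, Set

# Hypothetical labels: the actual allowed types are defined in the configuration.
ALLOWED_TYPES_BY_SCHEME: Dict[str, Set[str]] = {
    "isbn": {"Book", "Book chapter"},
}


def validate_pids(
    record_type: str,
    pids: List[str],
    allowed_types_by_scheme: Dict[str, Set[str]],
) -> List[str]:
    """Drop PIDs whose scheme is restricted to types the record does not have."""
    valid = []
    for pid in pids:
        # Assumption: PIDs are written as "scheme:value".
        scheme = pid.partition(":")[0].lower()
        allowed = allowed_types_by_scheme.get(scheme)
        # Unrestricted schemes always pass; restricted ones need a matching type.
        if allowed is None or record_type in allowed:
            valid.append(pid)
    return valid


article_pids = validate_pids(
    "Journal article",
    ["doi:10.1000/xyz", "isbn:9780000000002"],
    ALLOWED_TYPES_BY_SCHEME,
)
book_pids = validate_pids(
    "Book",
    ["isbn:9780000000002"],
    ALLOWED_TYPES_BY_SCHEME,
)
print(article_pids)
print(book_pids)
```

In this sketch the ISBN attached to the journal article is discarded before any lookup, so it cannot produce a false match against the book it actually identifies.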

To create your own MIUR type mapping file, you can inspect the IRIS type labels and their descriptors directly from the IRIS dataset:

from iris_oc_mapper.datasets.iris import load_iris_dataset

# Load the IRIS dump from the given path
iris = load_iris_dataset('path_to_iris_dump')
# Retrieve the IRIS type labels and their descriptors
type_dict = iris.get_type_dict()
print(type_dict)

The list of MIUR types considered valid for ISBN validation is specified in the YAML configuration file under the miur_types section.

When building your MIUR mapping CSV, ensure that all IRIS and MIUR type labels are written exactly as defined in their sources, preserving both case and spacing.

Use the resulting labels to construct the MIUR mapping CSV file, following the example provided in the resources/ directory.
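
Because labels must match exactly, a quick check for near-misses can save a failed run. The helper below is a hypothetical sketch: it assumes you have already collected the labels used in your mapping CSV and the labels from the IRIS dataset into plain lists of strings, and it makes no assumption about the CSV layout or about the structure returned by get_type_dict. The labels in the example are invented.

```python
from typing import Dict, Iterable


def normalize(label: str) -> str:
    """Collapse whitespace and ignore case, for comparison only."""
    return " ".join(label.split()).casefold()


def find_label_mismatches(
    mapping_labels: Iterable[str],
    source_labels: Iterable[str],
) -> Dict[str, str]:
    """Map each label that differs from a source label only by case or spacing
    to the source label it most likely refers to."""
    source = list(source_labels)
    exact = set(source)
    by_normalized = {normalize(label): label for label in source}
    mismatches: Dict[str, str] = {}
    for label in mapping_labels:
        if label in exact:
            continue  # Exact match, nothing to report.
        candidate = by_normalized.get(normalize(label))
        if candidate is not None:
            mismatches[label] = candidate
    return mismatches


# Invented labels, for illustration only.
source_labels = ["Journal article", "Book", "Conference paper"]
mapping_labels = ["journal  article", "Book", "Book ", "Thesis"]
mismatches = find_label_mismatches(mapping_labels, source_labels)
print(mismatches)
```

Labels with no counterpart at all, such as "Thesis" here, are not reported by this sketch and would need a separate check.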

If you prefer not to use MIUR types for validation, you can disable MIUR-based checks by adjusting the YAML configuration. In particular:

set type_validation_column to OWNING_COLLECTION, and
define in pid_type_validation the IRIS type codes that are valid for each PID type you wish to validate.

Then pass your configuration file using the --config option when running the map command.

YAML Configuration File

A YAML configuration file can be provided to override default settings for the mapping process. This file can specify parameters such as valid PID types and batch sizes for processing. An example configuration file is available in the resources/ directory.

Performance Considerations

Mapping large IRIS dumps against OpenCitations datasets can be resource-intensive. For a full mapping, at least 5 GB of available RAM is recommended. The full mapping process takes approximately 15 minutes to complete.

You can optimize resource usage by:

Adjusting the --batch-size option to control the number of files processed in each batch during the OC Meta mapping.
Using the --max-workers option to limit resource usage during the OC Index mapping process.
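
To see why these two options affect resource usage, consider the generic pattern below. It is a sketch under stated assumptions, not the tool's implementation: the document does not say whether the tool uses threads or processes, and in the tool the two options apply to different stages (OC Meta and OC Index), whereas the sketch combines them for brevity. The function names and file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def make_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def process_in_batches(
    items: Sequence[T],
    batch_size: int,
    max_workers: int,
    worker: Callable[[List[T]], R],
) -> List[R]:
    """Apply `worker` to each batch, running at most `max_workers` at once."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the batches.
        return list(pool.map(worker, make_batches(items, batch_size)))


files = [f"part_{i}.csv" for i in range(5)]
batch_lengths = process_in_batches(files, batch_size=2, max_workers=2, worker=len)
print(batch_lengths)  # [2, 2, 1]
```

Smaller batches keep fewer files in memory at a time, and fewer workers reduce the number of batches being processed concurrently, at the cost of a longer run.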

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contacts and Acknowledgements

Project repository: https://github.com/opencitations/iris-oc-mapper

For issues, discussions, or contributions, please open a GitHub issue, or contact:

Prof. Silvio Peroni (supervision) – @essepuntato – silvio.peroni@unibo.it
Dr. Ivan Heibi (supervision) – @ivanhb – ivan.heibi2@unibo.it
Leonardo Zilli (software development) – @leonardozilli – leonardo.zilli@studio.unibo.it
Erica Andreose (core contributor) – @EricaAndreose – erica.andreose@studio.unibo.it

The authors would also like to express their gratitude to the collaborators and colleagues from the various universities and institutions who provided valuable feedback and support throughout the development of the project.

Citation

tba

