A Python module to perform bulk import of data into a FOLIO environment. Currently supports MARC and user data import.
folio_data_import
Description
This project is designed to import data into the FOLIO LSP. It provides a simple and efficient way to import data from various sources using FOLIO's REST APIs.
Features
- Import MARC records using FOLIO's Data Import system
- Import User records using FOLIO's User APIs
- Batch post Instances, Holdings, and Items to FOLIO inventory storage
Installation
Using pip
pip install folio_data_import
or uv pip
uv pip install folio_data_import
To install the project from the git repo using uv, follow these steps:
- Clone the repository.
- Navigate to the project directory: $ cd /path/to/folio_data_import
- Install uv if you haven't already.
- Install the project and its dependencies: $ uv sync
- Run the application using uv: $ uv run folio-data-import --help
If you want to call folio-data-import directly rather than through uv run, activate the virtual environment created by uv first.
Usage
This package provides CLI commands for importing data into FOLIO:
# Main command with subcommands
folio-data-import <subcommand> [options]
# Or use standalone commands
folio-user-import [options]
folio-marc-import [options]
folio-batch-poster [options]
Tab Completion: Install shell completions for a better CLI experience:
folio-data-import --install-completion
Environment Variables
All commands support environment variables for FOLIO connection credentials, allowing you to avoid repeating these parameters:
export FOLIO_GATEWAY_URL="https://folio-snapshot-okapi.dev.folio.org"
export FOLIO_TENANT_ID="diku"
export FOLIO_USERNAME="diku_admin"
export FOLIO_PASSWORD="admin"
Once set, you can omit these parameters from your commands:
# Instead of:
folio-data-import users --gateway-url "..." --tenant-id "..." --username "..." --password "..." --user-file users.jsonl
# You can simply use:
folio-data-import users --user-file users.jsonl
This works for all subcommands: users, marc, and batch-poster.
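As a rough illustration of the fallback described above, the sketch below resolves connection settings from explicit arguments first and environment variables second. The variable names are the documented ones; `resolve_connection` is a hypothetical helper, and the precedence order (explicit value wins) is an assumption, not a statement about how the CLI is implemented.

```python
import os
from typing import Mapping, Optional

# Environment variable names as documented above.
ENV_VARS = {
    "gateway_url": "FOLIO_GATEWAY_URL",
    "tenant_id": "FOLIO_TENANT_ID",
    "username": "FOLIO_USERNAME",
    "password": "FOLIO_PASSWORD",
}


def resolve_connection(
    cli_args: Mapping[str, Optional[str]],
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical helper: prefer an explicit value, else the environment."""
    resolved, missing = {}, []
    for name, var in ENV_VARS.items():
        value = cli_args.get(name) or env.get(var)
        if value:
            resolved[name] = value
        else:
            missing.append(var)
    if missing:
        # Report every missing setting at once rather than one per run.
        raise ValueError(f"Missing FOLIO settings: {', '.join(missing)}")
    return resolved
```

Passing `env` as a parameter keeps the function easy to exercise without touching the real process environment.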
CLI Commands
folio-data-import users
Alias: folio-user-import
Import users to FOLIO with extended functionality beyond mod-user-import.
Quick Start
folio-data-import users \
--gateway-url "https://folio-snapshot-okapi.dev.folio.org" \
--tenant-id diku \
--username diku_admin \
--password admin \
--user-file users.jsonl
Features
Service Point Management: Specify service points using codes instead of UUIDs:
{
"username": "checkin-all",
"barcode": "1728439497039848103",
"active": true,
"type": "patron",
"patronGroup": "staff",
"departments": [],
"personal": {
"lastName": "Admin",
"firstName": "checkin-all",
"addresses": [
{
"countryId": "HU",
"addressLine1": "Andrássy Street 1.",
"addressLine2": "",
"city": "Budapest",
"region": "Pest",
"postalCode": "1061",
"addressTypeId": "Home",
"primaryAddress": true
}
],
"preferredContactTypeId": "email"
},
"requestPreference": {
"holdShelf": true,
"delivery": false,
"fulfillment": "Hold Shelf"
},
"servicePointsUser": {
"defaultServicePointId": "cd1",
"servicePointsIds": [
"cd1",
"Online",
"000",
"cd2"
]
}
}
Flexible Matching: Match users by id, externalSystemId, username, or barcode:
folio-data-import users --user-file users.jsonl --user-match-key username
Preferred Contact Type: Accepts FOLIO IDs or human-friendly strings (mail, email, text, phone, mobile). Set a default for users without a valid value:
folio-data-import users --user-file users.jsonl --default-preferred-contact-type email
Field Protection: Protect specific fields from being updated:
- Job-level protection (applies to all records):
folio-data-import users --user-file users.jsonl \
--fields-to-protect "personal.preferredFirstName,barcode"
- Per-record protection (using custom field protectedFields):
{
"username": "jdoe",
"customFields": {
"protectedFields": "barcode,personal.telephone,personal.addresses"
}
}
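To make the idea of field protection concrete, here is a minimal sketch of how protected dotted paths could be carried over from an existing record when an update arrives. This is an illustration only, not the package's implementation: `apply_with_protection`, `get_path`, and `set_path` are hypothetical names, and reading the per-record list from the existing record's `customFields.protectedFields` is an assumption.

```python
import copy


def get_path(record, path):
    """Return (value, found) for a dotted path such as 'personal.telephone'."""
    cur = record
    for key in path.split("."):
        if not isinstance(cur, dict) or key not in cur:
            return None, False
        cur = cur[key]
    return cur, True


def set_path(record, path, value):
    """Set a dotted path, creating intermediate objects as needed."""
    keys = path.split(".")
    cur = record
    for key in keys[:-1]:
        cur = cur.setdefault(key, {})
    cur[keys[-1]] = value


def apply_with_protection(existing, incoming, job_protected=""):
    """Hypothetical merge: incoming wins except on protected paths."""
    # Assumption: the per-record list lives on the existing record.
    per_record = existing.get("customFields", {}).get("protectedFields", "")
    combined = f"{job_protected},{per_record}"
    protected = {p.strip() for p in combined.split(",") if p.strip()}
    merged = copy.deepcopy(incoming)
    for path in protected:
        value, found = get_path(existing, path)
        if found:
            # Keep the stored value; a path absent from the existing
            # record leaves the incoming value untouched.
            set_path(merged, path, copy.deepcopy(value))
    return merged
```

The job-level and per-record lists are simply unioned here, which matches the description of both mechanisms applying at once.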
Input Format
JSON Lines format: one user object per line, in the style of mod-user-import (with the extended support described above). Dereferenced user objects, which use UUIDs instead of reference strings (e.g. extracted directly from /users), are also supported.
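For readers preparing input from Python, the following sketch serializes a list of user dictionaries to JSON Lines text. The helper name `to_jsonl` and the two sample users are illustrative assumptions; only the one-object-per-line layout comes from the description above.

```python
import json


def to_jsonl(records):
    """Serialize records as JSON Lines: one compact object per line."""
    # ensure_ascii=False keeps characters such as "á" readable in the file.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


# Sample data for illustration only.
users = [
    {"username": "jdoe", "barcode": "1001", "active": True, "patronGroup": "staff"},
    {"username": "asmith", "barcode": "1002", "active": True, "patronGroup": "staff"},
]
jsonl_text = to_jsonl(users)
```

Writing `jsonl_text` to a UTF-8 file such as users.jsonl gives a file in the shape that --user-file expects.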
folio-data-import marc
Alias: folio-marc-import
Import binary MARC21 records via FOLIO's Data Import system using the change-manager APIs.
Quick Start
folio-data-import marc \
--gateway-url "https://folio-snapshot-okapi.dev.folio.org" \
--tenant-id diku \
--username diku_admin \
--password admin \
--marc-source-path records.mrc
The command will prompt you to select a Data Import Job Profile configured in your FOLIO tenant.
Features
- Process single files or entire directories of MARC files
- Interactive job profile selection
- Real-time progress tracking
- Automatic retry on transient errors
Note: FOLIO's import logs can be unreliable. If you don't see a job summary when your job completes, check Data Import in FOLIO (Data Import > Actions > View all logs...).
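The "automatic retry on transient errors" feature listed above follows a common pattern, sketched below with exponential backoff. This is a generic illustration, not the package's code: `TransientError`, `with_retries`, and the delay schedule are all assumptions.

```python
import time


class TransientError(Exception):
    """Stand-in for a retryable failure such as a timeout or HTTP 5xx."""


def with_retries(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on TransientError with doubling delays."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.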
folio-data-import batch-poster
Alias: folio-batch-poster
Efficiently batch post Instances, Holdings, and Items to FOLIO's inventory storage endpoints with support for creating new records and updating existing ones.
Quick Start
folio-data-import batch-poster \
--gateway-url "https://folio-snapshot-okapi.dev.folio.org" \
--tenant-id diku \
--username diku_admin \
--password admin \
--object-type Items \
--file-paths items.jsonl \
--batch-size 100 \
--upsert
Key Features
- Multiple File Support: Process multiple files with glob patterns
folio-data-import batch-poster --object-type Items --file-paths "items_*.jsonl"
- Upsert Mode: Create new records or update existing ones
folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert
- Field Preservation: Control which fields are preserved during updates
folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert \
--preserve-statistical-codes \
--preserve-administrative-notes \
--preserve-temporary-locations \
--overwrite-item-status
- Selective Patching: Update only specific fields
folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert \
--patch-existing-records \
--patch-paths "barcode,status,itemLevelCallNumber"
- Failed Records: Automatically save failed records to a file
folio-data-import batch-poster --object-type Items --file-paths items.jsonl \
--failed-records-file failed_items.jsonl
- Progress Tracking: Real-time progress bar with statistics
  - Disable with --no-progress for CI/CD environments
- Config File Support: Use a JSON config file for complex configurations
folio-data-import batch-poster config.json
Example config.json:
{
"object_type": "Items",
"file_paths": ["items1.jsonl", "items2.jsonl"],
"batch_size": 100,
"upsert": true,
"preserve_statistical_codes": true,
"preserve_item_status": true,
"failed_records_file": "failed_items.jsonl"
}
Input Format
Input files should be JSONL (JSON Lines) format - one complete JSON object per line:
{"id": "item-001", "barcode": "12345", "status": {"name": "Available"}}
{"id": "item-002", "barcode": "12346", "status": {"name": "Available"}}
{"id": "item-003", "barcode": "12347", "status": {"name": "Checked out"}}
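Before posting a large file, it can help to confirm that every line really is a standalone JSON object. The sketch below is one way to do that; `check_jsonl` is a hypothetical helper and is not part of the package.

```python
import json


def check_jsonl(lines):
    """Return (valid_count, errors); errors holds (line_number, message)."""
    valid, errors = 0, []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines instead of reporting them
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((number, "not a JSON object"))
            continue
        valid += 1
    return valid, errors
```

Because an open text file iterates line by line, `check_jsonl(open("items.jsonl", encoding="utf-8"))` would work on a file directly.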
Common Use Cases
Create new items:
folio-data-import batch-poster --object-type Items --file-paths new_items.jsonl
Update existing items (by ID):
folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert
Update only barcodes and call numbers:
folio-data-import batch-poster --object-type Items --file-paths items.jsonl --upsert \
--patch-existing-records \
--patch-paths "barcode,itemLevelCallNumber"
Process multiple files:
folio-data-import batch-poster --object-type Holdings \
--file-paths "holdings_*.jsonl" \
--batch-size 500 \
--upsert
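The "update only barcodes and call numbers" case above can be pictured as copying just the listed fields onto the stored record. The sketch below shows that idea for top-level fields only; `patch_record` is a hypothetical function, and the real --patch-paths handling may differ (for example in how nested paths are treated).

```python
import copy


def patch_record(existing, incoming, patch_paths):
    """Copy only the listed top-level fields from incoming onto existing."""
    patched = copy.deepcopy(existing)
    for field in (p.strip() for p in patch_paths.split(",")):
        if field and field in incoming:
            patched[field] = copy.deepcopy(incoming[field])
    return patched
```

Fields that are not named in `patch_paths`, such as `status` in the use case above, keep whatever value the existing record already had.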
Available Options
Run folio-data-import batch-poster --help to see all available options:
- --object-type: Type of inventory object (Items, Holdings, or Instances) - Required
- --file-paths: Path(s) to JSONL file(s) - supports glob patterns - Required
- --batch-size: Number of records per batch (1-1000, default: 100)
- --upsert: Enable create-or-update mode
- --preserve-statistical-codes: Keep existing statistical codes during updates
- --preserve-administrative-notes: Keep existing administrative notes
- --preserve-temporary-locations: Keep temporary location (Items only)
- --preserve-temporary-loan-types: Keep temporary loan type (Items only)
- --preserve-item-status: Keep item status (Items only, default: true)
- --patch-existing-records: Enable selective field patching
- --patch-paths: Comma-separated list of fields to patch
- --failed-records-file: Path to save failed records
- --no-progress: Disable progress bar (useful for CI/CD)
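To show what --batch-size means in practice, the sketch below splits a stream of records into batches within the documented 1-1000 range. `iter_batches` is a hypothetical helper written for illustration; it does not reflect how the package batches internally.

```python
from itertools import islice


def iter_batches(records, batch_size=100):
    """Yield lists of at most batch_size records, in input order."""
    if not 1 <= batch_size <= 1000:
        raise ValueError("batch_size must be between 1 and 1000")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With 250 records and the default size of 100, this produces batches of 100, 100, and 50 records.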
Programmatic Usage
All CLI commands can also be used programmatically in your Python applications.
BatchPoster
import asyncio

from folioclient import FolioClient
from folio_data_import.BatchPoster import BatchPoster


async def post_items():
    # Create FOLIO client
    folio = FolioClient(
        okapi_url="https://folio-snapshot-okapi.dev.folio.org",
        tenant_id="diku",
        username="diku_admin",
        password="admin"
    )
    # Configure batch poster
    config = BatchPoster.Config(
        object_type="Items",
        batch_size=100,
        upsert=True,
        preserve_statistical_codes=True
    )
    # Post records
    async with BatchPoster(
        folio,
        config,
        failed_records_file="failed_items.jsonl"
    ) as poster:
        await poster.post_records("items.jsonl")
    print(f"Posted: {poster.stats.records_posted}, Failed: {poster.stats.records_failed}")


asyncio.run(post_items())
UserImporter
import asyncio

from folioclient import FolioClient
from folio_data_import.UserImport import UserImporter


async def import_users():
    folio = FolioClient(
        okapi_url="https://folio-snapshot-okapi.dev.folio.org",
        tenant_id="diku",
        username="diku_admin",
        password="admin"
    )
    config = UserImporter.Config(
        user_file="users.jsonl",
        user_match_key="username",
        default_preferred_contact_type="email"
    )
    importer = UserImporter(folio, config)
    result = await importer.do_work()
    print(f"Imported {result.records_created} users")


asyncio.run(import_users())
MARCImportJob
import asyncio

from folioclient import FolioClient
from folio_data_import.MARCDataImport import MARCImportJob


async def import_marc():
    folio = FolioClient(
        okapi_url="https://folio-snapshot-okapi.dev.folio.org",
        tenant_id="diku",
        username="diku_admin",
        password="admin"
    )
    config = MARCImportJob.Config(
        marc_source_path="records.mrc",
        job_profile_id="profile-uuid",
        job_profile_name="Bibliographic records"
    )
    job = MARCImportJob(folio, config)
    result = await job.start_import()
    print(f"Imported {result.created_records} MARC records")


asyncio.run(import_marc())
Additional Documentation
For complete API documentation and advanced usage:
- BatchPoster.md - Comprehensive BatchPoster guide
- BatchPoster_Quick_Reference.md - Quick reference
Contributing
Contributions are welcome! If you have any ideas, suggestions, or bug reports, please open an issue or submit a pull request.
License
This project is licensed under the MIT License.