Async Jira data pipeline supporting Cloud and Data Center, with pluggable multi-protocol output
jira-ingest
Async Jira data pipeline for Data Center and Cloud. Fetches projects, releases, boards, issues, and transitions; writes Parquet, CSV, or JSON Lines to local disk, S3, Azure Blob, or GCS; and optionally loads records into any SQLAlchemy-compatible database.
Features
- Dual-mode: Jira Data Center (Bearer PAT + optional mTLS) and Jira Cloud (Basic Auth)
- Async fetching with concurrent project processing, in-memory caching, and exponential-backoff retry
- Configurable custom-field extraction: map any `customfield_XXXXX` to a logical name
- PII hashing for assignee and author fields
- Output formats: Parquet (Snappy), CSV, JSON Lines
- Output destinations: local filesystem, S3, Azure Blob, GCS via fsspec
- Pluggable database loader: PostgreSQL, Redshift (with S3 COPY fast path), Snowflake, DuckDB, SQLite
- Click CLI with `run` and `validate` commands
- Pydantic v2 settings and data schemas
- ruff + mypy strict + pre-commit + GitHub Actions CI
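The README does not document the package's retry parameters, so the following is only a generic asyncio sketch of the "exponential-backoff retry" and "concurrent project processing" features: capped exponential backoff with full jitter, plus a helper that fetches several projects at once. The names `retry_with_backoff` and `fetch_projects_concurrently`, the default delays, and treating `ConnectionError`/`TimeoutError` as transient are all hypothetical, not the package's API.

```python
import asyncio
import random
from typing import Awaitable, Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")

# Hypothetical: which errors count as transient is an assumption of this sketch.
TRANSIENT_ERRORS: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError)


async def retry_with_backoff(
    fn: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,  # hypothetical default
    base_delay: float = 0.5,  # seconds, hypothetical default
    max_delay: float = 30.0,  # seconds, hypothetical default
    retry_on: Tuple[Type[BaseException], ...] = TRANSIENT_ERRORS,
) -> T:
    """Await fn(), retrying transient errors with capped exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return await fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The ceiling doubles each attempt (base, 2*base, 4*base, ...) up to
            # max_delay; "full jitter" sleeps a random fraction of it so that
            # concurrent tasks do not retry in lockstep.
            ceiling = min(max_delay, base_delay * 2**attempt)
            await asyncio.sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")


async def fetch_projects_concurrently(
    keys: List[str], fetch_project: Callable[[str], Awaitable[T]]
) -> List[T]:
    """Run one retried fetch per project key concurrently; results keep key order."""
    return await asyncio.gather(
        # k=k binds the current key; a bare closure would see only the last key.
        *(retry_with_backoff(lambda k=k: fetch_project(k)) for k in keys)
    )
```

In a real Jira client the retryable set would usually also cover HTTP 429 and 5xx responses, surfaced as whatever exception the HTTP library raises.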
Quick start
```bash
pip install pipewell-jira-ingest
cp .env.example .env     # edit with your Jira URL and credentials
jira-ingest validate     # confirm connectivity
jira-ingest run          # fetch everything and write to ./output
```
For database loading, install one of the optional extras:
```bash
pip install "pipewell-jira-ingest[database]"   # PostgreSQL, SQLite, etc.
pip install "pipewell-jira-ingest[redshift]"   # Redshift with S3 COPY fast path
```
Documentation
| Guide | Description |
|---|---|
| Authentication | Jira Cloud vs Data Center, PAT vs Basic Auth, mTLS certificates, scoping by project |
| Output sinks | Local filesystem, S3, Azure Blob, GCS -- URIs, auth options, output layout |
| Database loading | PostgreSQL, Redshift S3 COPY, Snowflake, DuckDB, SQLite; programmatic API |
| Custom fields | Mapping customfield_XXXXX IDs to logical names, finding field IDs |
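Conceptually, custom-field mapping is a dictionary lookup on each issue's `fields` object. The sketch below is not the package's code: the function name `extract_custom_fields` is hypothetical, the field IDs are placeholders, and unwrapping single-select values from `{"value": ...}` objects is an assumption about how such fields should be flattened.

```python
import json
from typing import Any, Dict


def extract_custom_fields(raw_fields: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Return {logical_name: value} for every configured custom field.

    raw_fields is the "fields" object of one issue from the Jira REST API;
    mapping is a logical-name -> field-ID dict such as JIRA_CUSTOM_FIELDS.
    """
    extracted: Dict[str, Any] = {}
    for logical_name, field_id in mapping.items():
        value = raw_fields.get(field_id)  # None when the issue lacks the field
        # Assumption: single-select fields arrive as {"value": ..., "id": ...};
        # keep only the human-readable option value.
        if isinstance(value, dict) and "value" in value:
            value = value["value"]
        extracted[logical_name] = value
    return extracted


if __name__ == "__main__":
    # Hypothetical field IDs; look up the real ones in your own Jira instance.
    mapping = json.loads('{"story_points": "customfield_10016", "team": "customfield_10001"}')
    fields = {"customfield_10016": 5.0, "customfield_10001": {"value": "Platform", "id": "10200"}}
    print(extract_custom_fields(fields, mapping))  # {'story_points': 5.0, 'team': 'Platform'}
```

Mapping missing fields to `None` keeps every output row on the same set of columns, which matters for columnar formats such as Parquet.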
Configuration reference
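Settings are read from environment variables (or a `.env` file). Jira and output settings use the prefix `JIRA_`; the database settings (`DATABASE_URL`, `DATABASE_SCHEMA`, `REDSHIFT_IAM_ROLE`) are unprefixed.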
| Variable | Default | Description |
|---|---|---|
| `JIRA_MODE` | `cloud` | `cloud` or `dc` |
| `JIRA_URL` | required | Jira base URL |
| `JIRA_API_TOKEN` | required | API token (Cloud) or PAT (DC) |
| `JIRA_EMAIL` | required for Cloud | Account email |
| `JIRA_CERT_PEM` | | Base64-encoded PEM for mTLS (DC only) |
| `JIRA_PROJECT_KEYS` | all projects | Comma-separated project keys to scope the run |
| `JIRA_OUTPUT_FORMAT` | `parquet` | `parquet`, `csv`, or `jsonl` |
| `JIRA_SINK_URI` | `./output` | fsspec URI for the output destination |
| `JIRA_SINK_OPTIONS` | `{}` | JSON dict of auth options forwarded to fsspec |
| `JIRA_CUSTOM_FIELDS` | `{}` | JSON dict mapping logical name to Jira field ID |
| `JIRA_LOG_LEVEL` | `INFO` | Log verbosity |
| `DATABASE_URL` | | SQLAlchemy URL of a database to load into after writing |
| `DATABASE_SCHEMA` | | Target schema (used with `DATABASE_URL`) |
| `REDSHIFT_IAM_ROLE` | | IAM role ARN for Redshift S3 COPY |
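
The package itself uses Pydantic v2 settings; the stdlib-only sketch below merely illustrates how the variables in the table are interpreted: prefix lookup, documented defaults, comma-separated project keys, and JSON-encoded dicts. `IngestSettings` and `load_settings` are hypothetical names, the validation rules are inferred from the table, and this sketch does not read a `.env` file.

```python
import json
import os
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping, Optional


@dataclass
class IngestSettings:
    """Hypothetical container mirroring the configuration table."""

    url: str
    api_token: str
    mode: str = "cloud"
    email: Optional[str] = None
    project_keys: List[str] = field(default_factory=list)  # empty list = all projects
    output_format: str = "parquet"
    sink_uri: str = "./output"
    sink_options: Dict[str, Any] = field(default_factory=dict)
    custom_fields: Dict[str, str] = field(default_factory=dict)
    log_level: str = "INFO"
    database_url: Optional[str] = None  # read from the unprefixed DATABASE_URL


def load_settings(env: Mapping[str, str] = os.environ) -> IngestSettings:
    """Build settings from JIRA_* variables, applying the documented defaults."""

    def get(name: str, default: str = "") -> str:
        # Treat unset and empty variables alike.
        return env.get(f"JIRA_{name}") or default

    mode = get("MODE", "cloud")
    if mode not in ("cloud", "dc"):
        raise ValueError(f"JIRA_MODE must be 'cloud' or 'dc', got {mode!r}")
    url, token = get("URL"), get("API_TOKEN")
    if not url or not token:
        raise ValueError("JIRA_URL and JIRA_API_TOKEN are required")
    email = get("EMAIL") or None
    if mode == "cloud" and email is None:
        raise ValueError("JIRA_EMAIL is required when JIRA_MODE=cloud")
    output_format = get("OUTPUT_FORMAT", "parquet")
    if output_format not in ("parquet", "csv", "jsonl"):
        raise ValueError(f"unsupported JIRA_OUTPUT_FORMAT {output_format!r}")
    return IngestSettings(
        url=url,
        api_token=token,
        mode=mode,
        email=email,
        # "ABC, DEF" -> ["ABC", "DEF"]; blank entries are dropped
        project_keys=[k.strip() for k in get("PROJECT_KEYS").split(",") if k.strip()],
        output_format=output_format,
        sink_uri=get("SINK_URI", "./output"),
        sink_options=json.loads(get("SINK_OPTIONS", "{}")),
        custom_fields=json.loads(get("CUSTOM_FIELDS", "{}")),
        log_level=get("LOG_LEVEL", "INFO"),
        database_url=env.get("DATABASE_URL") or None,
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise without touching the real environment.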
CLI
```
jira-ingest run [OPTIONS]
  --env-file TEXT           Path to .env file [default: .env]
  --start-date TEXT         Filter issues from date (YYYY-MM-DD)
  --end-date TEXT           Filter issues until date (YYYY-MM-DD)
  --date-suffix TEXT        Output file date suffix [default: today]
  --database-url TEXT       SQLAlchemy URL to load into a database
  --db-schema TEXT          Target database schema
  --redshift-iam-role TEXT  IAM role ARN for Redshift S3 COPY

jira-ingest validate [OPTIONS]
  --env-file TEXT           Path to .env file [default: .env]
```
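The README does not say how `--start-date`/`--end-date` and `JIRA_PROJECT_KEYS` reach Jira. One plausible translation is a JQL query for the issue search; the sketch below assumes the filter applies to the `updated` field, and `build_issue_jql` is a hypothetical helper rather than part of the package.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


def build_issue_jql(
    project_keys: Sequence[str],
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    date_field: str = "updated",  # assumption: created vs updated is not documented
) -> str:
    """Translate a project scope and a YYYY-MM-DD date range into a JQL string."""
    clauses: List[str] = []
    if project_keys:
        keys = ", ".join(f'"{key}"' for key in project_keys)
        clauses.append(f"project in ({keys})")
    if start_date:
        start = datetime.strptime(start_date, "%Y-%m-%d").date()  # raises on bad input
        clauses.append(f'{date_field} >= "{start.isoformat()}"')
    if end_date:
        end = datetime.strptime(end_date, "%Y-%m-%d").date()
        # JQL reads a bare date as midnight, so "< next day" keeps the whole end day.
        clauses.append(f'{date_field} < "{(end + timedelta(days=1)).isoformat()}"')
    where = " AND ".join(clauses)
    return f"{where} ORDER BY key ASC" if where else "ORDER BY key ASC"
```

For example, `build_issue_jql(["ABC", "DEF"], "2024-01-01", "2024-01-31")` yields `project in ("ABC", "DEF") AND updated >= "2024-01-01" AND updated < "2024-02-01" ORDER BY key ASC`. Making the end date exclusive of the following day avoids silently dropping issues updated during the last day of the range.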
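Output layout
With the default Parquet output format, files are laid out as follows, where `{date}` is the `--date-suffix` value:
```
{JIRA_SINK_URI}/
  issues/issues_{date}.parquet
  projects/projects_{date}.parquet
  releases/releases_{date}.parquet
  boards/boards_{date}.parquet
  transitions/transitions_{date}.parquet
```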
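The layout is a simple path template, sketched below. `output_path` is a hypothetical helper; the `.csv` and `.jsonl` extensions and the ISO `YYYY-MM-DD` default suffix are assumptions, since only the Parquet layout and a default of "today" are documented.

```python
from datetime import date
from typing import Optional

ENTITIES = ("issues", "projects", "releases", "boards", "transitions")
# Assumption: only .parquet appears in the documented layout; csv/jsonl are guesses.
EXTENSIONS = {"parquet": "parquet", "csv": "csv", "jsonl": "jsonl"}


def output_path(
    sink_uri: str,
    entity: str,
    output_format: str = "parquet",
    date_suffix: Optional[str] = None,
) -> str:
    """Build {sink_uri}/{entity}/{entity}_{date}.{ext} for one dataset."""
    if entity not in ENTITIES:
        raise ValueError(f"unknown entity {entity!r}")
    if output_format not in EXTENSIONS:
        raise ValueError(f"unsupported format {output_format!r}")
    # Assumption: the default suffix is today's date in ISO (YYYY-MM-DD) form.
    suffix = date_suffix or date.today().isoformat()
    # Strip a trailing slash so "s3://bucket/jira/" and "s3://bucket/jira" agree.
    return f"{sink_uri.rstrip('/')}/{entity}/{entity}_{suffix}.{EXTENSIONS[output_format]}"
```

Because the sink is fsspec-based, a path built this way could be opened with `fsspec.open(path, "wb", **sink_options)`, where `sink_options` is the parsed `JIRA_SINK_OPTIONS` dict.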
Development
```bash
pip install -e ".[dev]"
pre-commit install
pytest
```