
Async Jira data pipeline supporting Cloud and Data Center, with pluggable multi-protocol output


jira-ingest

Async Jira data pipeline for Data Center and Cloud. Fetches projects, releases, boards, issues, and transitions; writes Parquet, CSV, or JSON Lines to local disk, S3, Azure Blob, or GCS; and optionally loads records into any SQLAlchemy-compatible database.

Features

  • Dual-mode: Jira Data Center (Bearer PAT + optional mTLS) and Jira Cloud (Basic Auth)
  • Async fetching with concurrent project processing, in-memory caching, and exponential-backoff retry
  • Configurable custom-field extraction -- map any customfield_XXXXX ID to a logical name
  • PII hashing for assignee and author fields
  • Output formats: Parquet (Snappy), CSV, JSON Lines
  • Output destinations: local filesystem, S3, Azure Blob, GCS via fsspec
  • Pluggable database loader: PostgreSQL, Redshift (with S3 COPY fast path), Snowflake, DuckDB, SQLite
  • Click CLI with run and validate commands
  • Pydantic v2 settings and data schemas
  • ruff + mypy strict + pre-commit + GitHub Actions CI
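The concurrent fetching and exponential-backoff retry listed above can be pictured with a minimal sketch. It is not the package's implementation: `retry_with_backoff`, `fetch_all`, and every parameter name here are hypothetical, and `fetch_project` stands in for whatever coroutine actually calls Jira.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Await call(), retrying failures with exponentially growing delays."""
    for attempt in range(attempts):
        try:
            return await call()
        except Exception:  # a real client would catch only transient errors
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles on each attempt, is capped at max_delay,
            # and is jittered so concurrent workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2**attempt)
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("attempts must be at least 1")


async def fetch_all(
    project_keys: list[str],
    fetch_project: Callable[[str], Awaitable[T]],
    *,
    max_concurrency: int = 4,
    base_delay: float = 0.5,
) -> dict[str, T]:
    """Fetch every project concurrently, bounded by a semaphore."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(key: str) -> tuple[str, T]:
        async with semaphore:
            result = await retry_with_backoff(
                lambda: fetch_project(key), base_delay=base_delay
            )
            return key, result

    pairs = await asyncio.gather(*(worker(key) for key in project_keys))
    return dict(pairs)
```

The semaphore caps how many projects are in flight at once, so a large JIRA_PROJECT_KEYS list does not translate into an unbounded burst of requests.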

Quick start

pip install -e ".[dev]"
cp .env.example .env   # edit with your Jira URL and credentials
jira-ingest validate   # confirm connectivity
jira-ingest run        # fetch everything and write to ./output

Documentation

Guide              Description
Authentication     Jira Cloud vs Data Center, PAT vs Basic Auth, mTLS certificates, scoping by project
Output sinks       Local filesystem, S3, Azure Blob, GCS -- URIs, auth options, output layout
Database loading   PostgreSQL, Redshift S3 COPY, Snowflake, DuckDB, SQLite; programmatic API
Custom fields      Mapping customfield_XXXXX IDs to logical names, finding field IDs

Configuration reference

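Settings are read from environment variables (or a .env file). Jira, output, and logging settings use the prefix JIRA_; the database loader settings (DATABASE_URL, DATABASE_SCHEMA, REDSHIFT_IAM_ROLE) are unprefixed.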

Variable             Default              Description
JIRA_MODE            cloud                cloud or dc
JIRA_URL             required             Jira base URL
JIRA_API_TOKEN       required             API token (Cloud) or PAT (DC)
JIRA_EMAIL           required for Cloud   Account email
JIRA_CERT_PEM        (none)               Base64-encoded PEM for mTLS (DC only)
JIRA_PROJECT_KEYS    all projects         Comma-separated project keys to scope the run
JIRA_OUTPUT_FORMAT   parquet              parquet, csv, or jsonl
JIRA_SINK_URI        ./output             fsspec URI for output destination
JIRA_SINK_OPTIONS    {}                   JSON dict of auth options forwarded to fsspec
JIRA_CUSTOM_FIELDS   {}                   JSON dict mapping logical name to Jira field ID
JIRA_LOG_LEVEL       INFO                 Log verbosity
DATABASE_URL         (none)               SQLAlchemy URL to load into a database after writing
DATABASE_SCHEMA      (none)               Target schema (used with DATABASE_URL)
REDSHIFT_IAM_ROLE    (none)               IAM role ARN for Redshift S3 COPY
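To make the shape of JIRA_PROJECT_KEYS and JIRA_CUSTOM_FIELDS concrete, here is a rough sketch of how such values could be parsed and applied to an issue payload. The helper names are hypothetical, the field ID `customfield_10016` is only an illustration, and the package itself uses Pydantic settings rather than these functions.

```python
import json
from typing import Any


def parse_project_keys(raw: str) -> list[str]:
    """Split a comma-separated value; an empty list means 'all projects'."""
    return [key.strip() for key in raw.split(",") if key.strip()]


def parse_custom_fields(raw: str) -> dict[str, str]:
    """Parse a JSON dict mapping logical name -> Jira field ID."""
    mapping = json.loads(raw or "{}")
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object of logical name -> field ID")
    return {str(name): str(field_id) for name, field_id in mapping.items()}


def extract_custom_fields(
    issue: dict[str, Any], mapping: dict[str, str]
) -> dict[str, Any]:
    """Pull mapped custom fields out of an issue's 'fields' object.

    Fields missing from the issue come back as None.
    """
    fields = issue.get("fields", {})
    return {name: fields.get(field_id) for name, field_id in mapping.items()}


# Example values as they might appear in a .env file (illustrative only).
keys = parse_project_keys("ABC, XYZ")
mapping = parse_custom_fields('{"story_points": "customfield_10016"}')
issue = {"key": "ABC-1", "fields": {"summary": "Demo", "customfield_10016": 5}}
row = extract_custom_fields(issue, mapping)
```

Note the direction of the mapping: the logical name is the key and the Jira field ID is the value, matching the table's description of JIRA_CUSTOM_FIELDS.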

CLI

jira-ingest run [OPTIONS]

  --env-file TEXT            Path to .env file  [default: .env]
  --start-date TEXT          Filter issues from date (YYYY-MM-DD)
  --end-date TEXT            Filter issues until date (YYYY-MM-DD)
  --date-suffix TEXT         Output file date suffix  [default: today]
  --database-url TEXT        SQLAlchemy URL to load into a database
  --db-schema TEXT           Target database schema
  --redshift-iam-role TEXT   IAM role ARN for Redshift S3 COPY

jira-ingest validate [OPTIONS]

  --env-file TEXT            Path to .env file  [default: .env]
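As an illustration of what the --start-date and --end-date filters might translate to, the sketch below builds a JQL query from a project key and an optional date range. This is an assumption throughout: the source does not say which date field the tool filters on or whether the bounds are inclusive, so `build_jql` and its `date_field` parameter are hypothetical.

```python
from datetime import datetime
from typing import Optional


def _normalize(value: str) -> str:
    """Validate a YYYY-MM-DD string and return it zero-padded."""
    return datetime.strptime(value, "%Y-%m-%d").date().isoformat()


def build_jql(
    project_key: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    date_field: str = "updated",  # assumption: could equally be "created"
) -> str:
    """Build a JQL string scoped to one project and an optional date range."""
    clauses = [f'project = "{project_key}"']
    if start_date:
        clauses.append(f'{date_field} >= "{_normalize(start_date)}"')
    if end_date:
        clauses.append(f'{date_field} <= "{_normalize(end_date)}"')
    return " AND ".join(clauses) + f" ORDER BY {date_field} ASC"


query = build_jql("ABC", start_date="2024-01-01", end_date="2024-01-31")
```

Validating the dates before interpolating them keeps malformed input from reaching Jira as a confusing query error.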

Output layout

{JIRA_SINK_URI}/
  issues/issues_{date}.parquet
  projects/projects_{date}.parquet
  releases/releases_{date}.parquet
  boards/boards_{date}.parquet
  transitions/transitions_{date}.parquet
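The layout above can be expressed as a small path-building helper. This is a sketch under assumptions: `output_path` is a hypothetical name, the date suffix is assumed to be an ISO date, and the file extension is assumed to follow JIRA_OUTPUT_FORMAT, which the layout shows only for Parquet.

```python
import posixpath
from datetime import date
from typing import Optional

DATASETS = ("issues", "projects", "releases", "boards", "transitions")


def output_path(
    sink_uri: str,
    dataset: str,
    date_suffix: Optional[str] = None,
    fmt: str = "parquet",
) -> str:
    """Return {sink_uri}/{dataset}/{dataset}_{date}.{fmt} for a known dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    # Assumption: the default suffix is today's date in ISO format.
    suffix = date_suffix or date.today().isoformat()
    # posixpath keeps forward slashes, which both local paths and
    # fsspec-style URIs such as s3://bucket/prefix expect.
    return posixpath.join(sink_uri.rstrip("/"), dataset, f"{dataset}_{suffix}.{fmt}")


local = output_path("./output", "issues", "2024-01-01")
remote = output_path("s3://bucket/jira/", "boards", "2024-01-01", fmt="csv")
```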

Development

pip install -e ".[dev]"
pre-commit install
pytest
