
Async Jira data pipeline supporting Cloud and Data Center, with pluggable multi-protocol output


jira-ingest

Licence: MIT

Async Jira data pipeline for Data Center and Cloud. Fetches projects, releases, boards, issues, and transitions; writes Parquet, CSV, or JSON Lines to local disk, S3, Azure Blob, or GCS; and optionally loads records into any SQLAlchemy-compatible database.

Features

  • Dual-mode: Jira Data Center (Bearer PAT + optional mTLS) and Jira Cloud (Basic Auth)
  • Async fetching with concurrent project processing, in-memory caching, and exponential-backoff retry
  • Configurable custom fields extraction -- map any customfield_XXXXX to a logical name
  • PII hashing for assignee and author fields
  • Output formats: Parquet (Snappy), CSV, JSON Lines
  • Output destinations: local filesystem, S3, Azure Blob, GCS via fsspec
  • Pluggable database loader: PostgreSQL, Redshift (with S3 COPY fast path), Snowflake, DuckDB, SQLite
  • Click CLI with run and validate commands
  • Pydantic v2 settings and data schemas
  • ruff + mypy strict + pre-commit + GitHub Actions CI
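To illustrate the retry and concurrency features listed above, here is a minimal sketch of exponential-backoff retry around concurrent per-project fetches. It is not the package's own implementation: the names `retry_with_backoff` and `fetch_all`, the delay values, the jitter, and the concurrency limit are all assumptions chosen for the example.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
) -> T:
    """Await call(), retrying failures with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return await call()
        except Exception:
            attempt += 1
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error
            # The delay doubles on each retry, is capped at max_delay,
            # and is scaled by random jitter to avoid synchronized retries.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))


async def fetch_all(
    project_keys: list[str],
    fetch: Callable[[str], Awaitable[Any]],
    concurrency: int = 4,
) -> dict[str, Any]:
    """Fetch several projects concurrently, each guarded by the retry helper."""
    semaphore = asyncio.Semaphore(concurrency)

    async def one(key: str) -> tuple[str, Any]:
        async with semaphore:  # at most `concurrency` fetches in flight
            return key, await retry_with_backoff(lambda: fetch(key))

    return dict(await asyncio.gather(*(one(k) for k in project_keys)))
```

Here `fetch` stands in for whatever coroutine retrieves one project's data. A production version would normally retry only on errors known to be transient (timeouts, HTTP 429 or 5xx) rather than on every exception.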

Quick start

pip install pipewell-jira-ingest
cp .env.example .env   # edit with your Jira URL and credentials
jira-ingest validate   # confirm connectivity
jira-ingest run        # fetch everything and write to ./output

For database loading, install the optional extra that matches your target:

pip install "pipewell-jira-ingest[database]"   # PostgreSQL, SQLite, etc.
pip install "pipewell-jira-ingest[redshift]"   # Redshift with S3 COPY fast path

Documentation

Guide             Description
Authentication    Jira Cloud vs Data Center, PAT vs Basic Auth, mTLS certificates, scoping by project
Output sinks      Local filesystem, S3, Azure Blob, GCS -- URIs, auth options, output layout
Database loading  PostgreSQL, Redshift S3 COPY, Snowflake, DuckDB, SQLite; programmatic API
Custom fields     Mapping customfield_XXXXX IDs to logical names, finding field IDs

Configuration reference

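All settings are read from environment variables (or a .env file). Jira settings use the prefix JIRA_; the database settings (DATABASE_URL, DATABASE_SCHEMA, REDSHIFT_IAM_ROLE) are unprefixed.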

Variable            Default             Description
JIRA_MODE           cloud               cloud or dc
JIRA_URL            required            Jira base URL
JIRA_API_TOKEN      required            API token (Cloud) or PAT (DC)
JIRA_EMAIL          required for Cloud  Account email
JIRA_CERT_PEM       -                   Base64-encoded PEM for mTLS (DC only)
JIRA_PROJECT_KEYS   all projects        Comma-separated project keys to scope the run
JIRA_OUTPUT_FORMAT  parquet             parquet, csv, or jsonl
JIRA_SINK_URI       ./output            fsspec URI for output destination
JIRA_SINK_OPTIONS   {}                  JSON dict of auth options forwarded to fsspec
JIRA_CUSTOM_FIELDS  {}                  JSON dict mapping logical name to Jira field ID
JIRA_LOG_LEVEL      INFO                Log verbosity
DATABASE_URL        -                   SQLAlchemy URL to load into a database after writing
DATABASE_SCHEMA     -                   Target schema (used with DATABASE_URL)
REDSHIFT_IAM_ROLE   -                   IAM role ARN for Redshift S3 COPY
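As a rough illustration of how these variables could be interpreted, the sketch below parses a few of them with the standard library only and applies a JIRA_CUSTOM_FIELDS mapping to an issue's fields. The package itself uses Pydantic settings, so this is not its code: `load_jira_settings`, `extract_custom_fields`, the example URL, and the field ID `customfield_10016` are hypothetical.

```python
import json
import os
from typing import Any, Mapping


def load_jira_settings(env: Mapping[str, str] = os.environ) -> dict[str, Any]:
    """Read some documented variables, applying the documented defaults."""
    keys = env.get("JIRA_PROJECT_KEYS", "")
    return {
        "mode": env.get("JIRA_MODE", "cloud"),
        "url": env["JIRA_URL"],  # required: raises KeyError if missing
        # An empty list here stands for "all projects".
        "project_keys": [k.strip() for k in keys.split(",") if k.strip()],
        "output_format": env.get("JIRA_OUTPUT_FORMAT", "parquet"),
        "sink_uri": env.get("JIRA_SINK_URI", "./output"),
        # Both of these are JSON dicts encoded as strings.
        "sink_options": json.loads(env.get("JIRA_SINK_OPTIONS", "{}")),
        "custom_fields": json.loads(env.get("JIRA_CUSTOM_FIELDS", "{}")),
    }


def extract_custom_fields(
    issue_fields: Mapping[str, Any], custom_fields: Mapping[str, str]
) -> dict[str, Any]:
    """Return {logical name: value} for each mapped Jira field ID."""
    return {
        name: issue_fields.get(field_id)
        for name, field_id in custom_fields.items()
    }


# Hypothetical values, for illustration only.
example_env = {
    "JIRA_URL": "https://example.atlassian.net",
    "JIRA_PROJECT_KEYS": "ABC, XYZ",
    "JIRA_CUSTOM_FIELDS": '{"story_points": "customfield_10016"}',
}
settings = load_jira_settings(example_env)
story = extract_custom_fields(
    {"summary": "Example", "customfield_10016": 5},
    settings["custom_fields"],
)
```

Note the direction of the mapping: JIRA_CUSTOM_FIELDS is keyed by logical name, with the Jira field ID as the value.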

CLI

jira-ingest run [OPTIONS]

  --env-file TEXT            Path to .env file  [default: .env]
  --start-date TEXT          Filter issues from date (YYYY-MM-DD)
  --end-date TEXT            Filter issues until date (YYYY-MM-DD)
  --date-suffix TEXT         Output file date suffix  [default: today]
  --database-url TEXT        SQLAlchemy URL to load into a database
  --db-schema TEXT           Target database schema
  --redshift-iam-role TEXT   IAM role ARN for Redshift S3 COPY

jira-ingest validate [OPTIONS]

  --env-file TEXT            Path to .env file  [default: .env]
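When the CLI is driven from a scheduler or wrapper script, it can help to build the argument list in code. The sketch below uses only the flags documented above; `build_run_command` is a hypothetical helper, and the date window is an arbitrary example.

```python
from datetime import date


def build_run_command(
    start: date, end: date, env_file: str = ".env"
) -> list[str]:
    """Build the argv list for `jira-ingest run` over a date window."""
    if start > end:
        raise ValueError("start date must not be after end date")
    return [
        "jira-ingest", "run",
        "--env-file", env_file,
        # date.isoformat() yields the YYYY-MM-DD form the options expect.
        "--start-date", start.isoformat(),
        "--end-date", end.isoformat(),
    ]


command = build_run_command(date(2024, 1, 1), date(2024, 1, 31))
# To execute it (requires jira-ingest on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with paths or URLs.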

Output layout

{JIRA_SINK_URI}/
  issues/issues_{date}.parquet
  projects/projects_{date}.parquet
  releases/releases_{date}.parquet
  boards/boards_{date}.parquet
  transitions/transitions_{date}.parquet
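Downstream jobs that read these files need to reconstruct the paths. The following is a small sketch of the layout shown above; `output_paths` is a hypothetical helper, the format of the date suffix is an assumption, and only the `.parquet` extension is confirmed by the layout (other extensions are passed in at your own risk).

```python
ENTITIES = ("issues", "projects", "releases", "boards", "transitions")


def output_paths(
    sink_uri: str, date_suffix: str, extension: str = "parquet"
) -> dict[str, str]:
    """Map each entity to {sink_uri}/{entity}/{entity}_{date}.{extension}."""
    root = sink_uri.rstrip("/")  # tolerate a trailing slash on the sink URI
    return {
        entity: f"{root}/{entity}/{entity}_{date_suffix}.{extension}"
        for entity in ENTITIES
    }


# Hypothetical bucket name and date suffix.
paths = output_paths("s3://example-bucket/jira/", "2024-01-31")
```

The same function works for local paths such as `./output`, since it only joins strings and leaves opening the files to fsspec or another reader.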

Development

pip install -e ".[dev]"
pre-commit install
pytest
