Query orchestrator for SECOP, built on the open data portal's API.

pysecop 🇨🇴

Python 3.9+ | License: MIT

pysecop is a high-performance Python package designed to interact seamlessly with Colombia's Public Procurement Data (SECOP I & II).

It abstracts the complexity of the Socrata (SODA) API, handles messy government data cleaning, and provides a fluent interface for building complex queries that are ready for Machine Learning and Big Data pipelines.


🚀 Why pysecop?

Public procurement data is the foundation of transparency and market intelligence. However, raw government APIs often return inconsistent formats, "polluted" URL strings, and fragmented schemas. pysecop solves this by providing:

  • 🏗️ Fluent SoQL Builder: Build complex Socrata queries without writing a single line of raw SoQL.
  • 🧹 Automated Data Hygiene: Pre-configured processors for dates, URLs, and categorical encoding.
  • 🔗 Unified Schema: High-level methods to join data across SECOP I and SECOP II seamlessly.
  • 🐳 Production Ready: Fully Dockerized and tested for mission-critical ETL environments.
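
To make the fluent-builder idea concrete, here is a minimal sketch of how chained calls can be turned into SODA query parameters. The class `SoqlQuery` and its methods are hypothetical and are not pysecop's actual builder; only the `$select`, `$where`, `$limit`, and `$offset` parameter names come from the Socrata API.

```python
from typing import Dict, List, Optional


class SoqlQuery:
    """Hypothetical fluent builder; NOT pysecop's real QueryBuilder."""

    def __init__(self) -> None:
        self._select: List[str] = []
        self._where: List[str] = []
        self._limit: Optional[int] = None
        self._offset: Optional[int] = None

    def select(self, *columns: str) -> "SoqlQuery":
        self._select.extend(columns)
        return self  # returning self is what makes the interface chainable

    def where_eq(self, column: str, value: str) -> "SoqlQuery":
        # String literals are single-quoted; an embedded quote is doubled.
        escaped = value.replace("'", "''")
        self._where.append(f"{column} = '{escaped}'")
        return self

    def limit(self, n: int) -> "SoqlQuery":
        self._limit = n
        return self

    def offset(self, n: int) -> "SoqlQuery":
        self._offset = n
        return self

    def params(self) -> Dict[str, str]:
        """Render the accumulated clauses as SODA query-string parameters."""
        out: Dict[str, str] = {}
        if self._select:
            out["$select"] = ", ".join(self._select)
        if self._where:
            out["$where"] = " AND ".join(self._where)
        if self._limit is not None:
            out["$limit"] = str(self._limit)
        if self._offset is not None:
            out["$offset"] = str(self._offset)
        return out
```

A call such as `SoqlQuery().select("nombre_entidad").where_eq("nit_entidad", "900000000").limit(10).params()` yields a plain dictionary that any HTTP client can send as the query string.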

🛠️ Quick Start

Installation

pip install pysecop

Unified Search (SECOP I & II)

The most powerful feature of pysecop is the ability to search SECOP I and SECOP II with a single command and get back one consolidated DataFrame. The engine also includes Intelligent Input Resilience: formatted IDs (such as NITs with dashes) are cleaned automatically before they are sent to the backend.

from pysecop import SecopClient

client = SecopClient()

# Search by NIT across both datasets simultaneously (automatic ID cleaning)
df = client.search(nit_entidad="900000000-1")

# The result is a single, consolidated "Matrix-in-Blocks" DataFrame
print(df[["source", "nombre_entidad", "valor_del_contrato", "estado_contrato"]].head())
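
The README does not say exactly how formatted IDs are cleaned. As an illustration only, the sketch below shows one plausible normalization: drop separators and the verification digit that follows the dash. The helper `normalize_nit` is hypothetical, and pysecop may keep the verification digit or apply different rules.

```python
import re


def normalize_nit(raw: str) -> str:
    """Hypothetical helper: reduce a formatted NIT to its base digits.

    Assumption: the part after the first dash is the verification digit
    and is discarded. pysecop's internal cleaning may differ.
    """
    base = raw.strip().split("-", 1)[0]
    # Remove dots, spaces, and any other non-digit separators.
    return re.sub(r"\D", "", base)
```

With this rule, `normalize_nit("900.000.000-1")` and `normalize_nit("900000000-1")` both reduce to the same base string, so differently formatted inputs map to one lookup key.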

Parallel Ingestion & Staggered Offsets (v1.2.1+)

For high-throughput pipelines (e.g., using Dagster or Airflow), pysecop 1.2.1+ supports staggered offsets and automatic rate-limit resilience. You can split the 20M+ historical records across multiple threads by giving each worker its own offset:

# Thread 1: Process first 50k
df1 = client.search(limit=50000, offset=0)

# Thread 2: Process next 50k (in parallel)
df2 = client.search(limit=50000, offset=50000)
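
The two calls above run one after the other as written. A minimal sketch of dispatching them concurrently with the standard library follows; `fetch_pages` and `staggered_offsets` are hypothetical helpers, and the sketch assumes the search callable accepts `limit` and `offset` keyword arguments (as `client.search` does above) and is safe to call from several threads, which the README does not confirm.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar("T")


def staggered_offsets(total: int, page_size: int) -> List[int]:
    """Start offsets that cover `total` records in pages of `page_size`."""
    return list(range(0, total, page_size))


def fetch_pages(
    search: Callable[..., T],
    total: int,
    page_size: int,
    max_workers: int = 4,
) -> List[T]:
    """Call `search(limit=..., offset=...)` once per page, in parallel."""

    def fetch(offset: int) -> T:
        # The last page may be shorter than page_size.
        limit = min(page_size, total - offset)
        return search(limit=limit, offset=offset)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the same order as the offsets.
        return list(pool.map(fetch, staggered_offsets(total, page_size)))
```

For example, `pages = fetch_pages(client.search, total=100_000, page_size=50_000)` would return two results in offset order, which can then be combined with `pandas.concat(pages, ignore_index=True)`.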

Tip: Automatic resilience. Versions 1.2.1+ include internal exponential backoff for 429 Too Many Requests responses, so ingestion workers throttle themselves instead of failing the pipeline.
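
For readers unfamiliar with the technique, the following is a generic sketch of exponential backoff, not pysecop's internal implementation. The `RateLimited` exception, the function name, and the default delays are all assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def call_with_backoff(
    func: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `func` on RateLimited, doubling the wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            # 1s, 2s, 4s, ... capped at max_delay
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.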


🏛️ Project Architecture

The system follows a modular design to ensure scalability and ease of maintenance:

graph LR
    A[SecopClient] -->|Builds| B[QueryBuilder]
    A -->|Authenticates| C[Socrata API]
    C -->|Returns Raw| D[DataFrame]
    D -->|Refines| E[DataProcessor]
    E -->|Output| F[Analysis Ready Data]

For a deeper dive into the system design, check out the Architecture Deep Dive.
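
As a rough illustration of the refinement step in the diagram, the sketch below cleans a single raw row. It is not pysecop's DataProcessor: the function `clean_record`, the column names `urlproceso` and `fecha_de_firma`, and the raw shapes (URL wrapped in an object, numbers and timestamps delivered as strings) are assumptions for this example.

```python
from datetime import datetime
from typing import Any, Dict


def clean_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical row cleaner; column names and raw shapes are assumed."""
    cleaned = dict(raw)  # work on a copy, leave the input untouched

    # Assumed shape: URL columns arrive wrapped as {"url": "..."}.
    url = cleaned.get("urlproceso")
    if isinstance(url, dict):
        cleaned["urlproceso"] = url.get("url")

    # Assumed shape: numeric amounts arrive as strings.
    value = cleaned.get("valor_del_contrato")
    if isinstance(value, str):
        cleaned["valor_del_contrato"] = float(value)

    # Assumed shape: timestamps arrive as ISO 8601 strings without a zone.
    signed = cleaned.get("fecha_de_firma")
    if isinstance(signed, str):
        cleaned["fecha_de_firma"] = datetime.fromisoformat(signed)

    return cleaned
```

Applying such a function to every row before building the DataFrame keeps type conversion in one place, which is the role the diagram assigns to the processing stage.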


📂 Documentation Layers

  • ARCHITECTURE.md: Technical design, data flow, and architectural trade-offs.
  • GUIDE.md: Full API reference, installation, and extension guide.
  • USE_CASES.md: Business value, anti-corruption use cases, and market intelligence examples.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

pysecop-1.2.2.tar.gz (21.8 kB view details)

Uploaded Source

Built Distribution

pysecop-1.2.2-py3-none-any.whl (17.6 kB view details)

Uploaded Python 3

File details

Details for the file pysecop-1.2.2.tar.gz.

File metadata

  • Download URL: pysecop-1.2.2.tar.gz
  • Upload date:
  • Size: 21.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pysecop-1.2.2.tar.gz
Algorithm Hash digest
SHA256 ccfd1df55dc73d85265f4e42f085f8f6250673ffc3d9914ff2c423cbab2e8b78
MD5 31011abebe459cc0022b13a1b4c5ef5d
BLAKE2b-256 1689aa0663a60d10e4471ccdcf299f736c37ac4f70cc2e88382f327a9863dcd0

Provenance

The following attestation bundles were made for pysecop-1.2.2.tar.gz:

Publisher: python-publish.yml on 26-jorge-01/pysecop

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pysecop-1.2.2-py3-none-any.whl.

File metadata

  • Download URL: pysecop-1.2.2-py3-none-any.whl
  • Upload date:
  • Size: 17.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pysecop-1.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6e70607860a3d7c8a6c63bf560879bc89042d8ecdd4a2161913ea95df1804503
MD5 e9bdc1528a410331ddc081798ec7e409
BLAKE2b-256 66159442794d2f2279d3aa8914b7e62d1202bdcd785e6692ea0f988c4caec79d

Provenance

The following attestation bundles were made for pysecop-1.2.2-py3-none-any.whl:

Publisher: python-publish.yml on 26-jorge-01/pysecop

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
