
Bee ETL is a Python package for extracting data from one source, transforming it, and loading it into another.


Python Synchronization Engine based on Polars DataFrames

General Configuration

---
# Configuration for synchronizations in YAML
configVersion: 1
sync:
  # The data source to retrieve data from
  - source:
      type: relational
      config:
        connectionString: "mysql://user:pass@server/Database"
      query: "SELECT * FROM table"
      model:
        id: int
        name: string
        age: int
    format:
      # No transformation done
      id: {}
      name: {}
      
      # Split and choose first in list
      firstName:
        type: string
        source: name
        transform:
          - action: split
            args:
              char: " "
          - action: index
            args:
              index: 0
      # Split and choose last in list
      lastName:
        type: string
        source: name
        transform:
          - action: split
            args:
              - " "
          - action: index
            args:
              - "-1"
      
      # Generate a UUID
      uid:
        type: string
        source: false
        generate:
          type: uuid
      
      # Generate a random street address
      street:
        type: string
        source: false
        generate:
          type: faker
          args:
            - street_address

      # Use a custom function for formatting
      corporateId:
        type: string
        source: name,uid
        custom:
          - class: CustomClass
            function: generateCorporateId
    
    # Where to put the data
    destination:
        type: relational
        config:
          connectionString: "mysql://user:pass@server/Database"
        table: "table"
        # Which operations to perform; enable any combination of insert, update, and delete
        modes:
          insert: true
          update: true
          delete: true
        # Soft delete marks rows as deleted instead of removing them; set enabled to false to delete the rows
        softDelete:
          enabled: true
          field: deleted
          value: true
        fieldMapping:
          id: id
          name: name
          firstName: firstName
          lastName: lastName
          uid: uid
          street: street
          corporateId: corporateId
        uniqueKeys:
          - id
        preventUpdate:
          - corporateId
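
The firstName and lastName entries above chain a split action and an index action. The sketch below illustrates what such a chain could do to a single value. It is not Bee ETL's implementation: the function names (`split_action`, `index_action`, `apply_transforms`) are hypothetical, and it assumes that `args` may be given either as a mapping or as a list, since both forms appear in the configuration.

```python
from typing import Any, Callable

# Hypothetical stand-ins for the "split" and "index" actions in the config.
def split_action(value: str, char: str = " ") -> list:
    return value.split(char)

def index_action(value: list, index: Any = 0) -> Any:
    # The config passes the index both as 0 and as "-1", so coerce to int.
    return value[int(index)]

ACTIONS: dict = {"split": split_action, "index": index_action}

def apply_transforms(value: Any, steps: list) -> Any:
    """Run each step in order, feeding the result of one into the next."""
    for step in steps:
        func: Callable[..., Any] = ACTIONS[step["action"]]
        args = step.get("args", {})
        if isinstance(args, dict):
            value = func(value, **args)   # mapping form: args: {char: " "}
        else:
            value = func(value, *args)    # list form: args: [" "]
    return value

first_name_steps = [
    {"action": "split", "args": {"char": " "}},
    {"action": "index", "args": {"index": 0}},
]
last_name_steps = [
    {"action": "split", "args": [" "]},
    {"action": "index", "args": ["-1"]},
]

print(apply_transforms("Ada King Lovelace", first_name_steps))  # Ada
print(apply_transforms("Ada King Lovelace", last_name_steps))   # Lovelace
```

Because each step receives the previous step's output, the order of the entries under transform matters: index only makes sense after split has produced a list.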

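The destination options above (modes, softDelete, uniqueKeys, preventUpdate) describe a row-level comparison between source and destination. As a rough illustration of how they might interact, the sketch below compares two lists of dictionaries. The functions `plan_sync` and `apply_soft_delete` are hypothetical and the exact semantics are an assumption; Bee ETL itself works on Polars DataFrames and may behave differently.

```python
def plan_sync(source_rows, destination_rows, unique_keys,
              prevent_update=(), modes=None):
    """Return the rows to insert, update, and delete (assumed semantics)."""
    modes = modes or {"insert": True, "update": True, "delete": True}

    def key(row):
        return tuple(row[k] for k in unique_keys)

    src = {key(r): r for r in source_rows}
    dst = {key(r): r for r in destination_rows}
    plan = {"insert": [], "update": [], "delete": []}

    if modes.get("insert"):
        plan["insert"] = [r for k, r in src.items() if k not in dst]
    if modes.get("update"):
        for k, new in src.items():
            if k not in dst:
                continue
            old = dst[k]
            # Fields listed in prevent_update are never overwritten.
            changes = {f: v for f, v in new.items()
                       if f not in prevent_update and old.get(f) != v}
            if changes:
                plan["update"].append({**{u: new[u] for u in unique_keys}, **changes})
    if modes.get("delete"):
        plan["delete"] = [r for k, r in dst.items() if k not in src]
    return plan


def apply_soft_delete(plan, unique_keys, field, value):
    """Turn deletes into updates that set the soft-delete marker."""
    marked = [{**{k: row[k] for k in unique_keys}, field: value}
              for row in plan["delete"]]
    return {"insert": plan["insert"],
            "update": plan["update"] + marked,
            "delete": []}


source = [
    {"id": 1, "name": "Ada", "corporateId": "A-1"},
    {"id": 2, "name": "Grace", "corporateId": "G-NEW"},
]
destination = [
    {"id": 2, "name": "Grace H", "corporateId": "G-OLD"},
    {"id": 3, "name": "Linus", "corporateId": "L-3"},
]

plan = plan_sync(source, destination, ["id"], prevent_update=["corporateId"])
soft_plan = apply_soft_delete(plan, ["id"], "deleted", True)
print(plan)
print(soft_plan)
```

In this example row 1 is inserted, row 2 gets a new name but keeps its existing corporateId, and row 3 is either deleted or, with soft delete applied, updated to carry the marker field.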
Data Sources

SQL/Relational Databases

...
source:
  type: relational
  config:
    connectionString: "mysql://user:pass@server/Database"
  query: "SELECT * FROM table"
...
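
To make the parts of the connectionString explicit, the following sketch splits it with Python's standard library. The helper `parse_connection_string` is hypothetical and only illustrates the URL layout (driver, credentials, host, database); it is not how Bee ETL parses or opens connections.

```python
from urllib.parse import unquote, urlsplit

def parse_connection_string(connection_string):
    """Split a URL-style connection string into its named parts."""
    parts = urlsplit(connection_string)
    return {
        "driver": parts.scheme,
        # Credentials may be percent-encoded if they contain special characters.
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "host": parts.hostname,  # urlsplit lowercases the host name
        "port": parts.port,      # None when no port is given
        "database": parts.path.lstrip("/") or None,
    }

info = parse_connection_string("mysql://user:pass@server/Database")
print(info)
```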

Files

...
source:
  type: file
  config:
    path: "path/to/file"
    charset: utf-8
    format: json
...
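
The path, charset, and format options map naturally onto reading and decoding a file. The sketch below shows one way this could look for the json format, using only the standard library. The function `load_records` is hypothetical, and treating a single JSON object as a one-row result is an assumption rather than documented behaviour.

```python
import json
import os
import tempfile
from pathlib import Path

def load_records(path, charset="utf-8", fmt="json"):
    """Read a file with the given charset and return a list of row dicts."""
    text = Path(path).read_text(encoding=charset)
    if fmt == "json":
        data = json.loads(text)
        # Assumption: a top-level object is treated as a single row.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format in this sketch: {fmt}")

# Demonstration with a temporary file standing in for "path/to/file".
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "people.json")
    Path(sample).write_text('[{"id": 1, "name": "Ada Lovelace"}]', encoding="utf-8")
    records = load_records(sample, charset="utf-8", fmt="json")

print(records)
```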

URLs

...
source:
  type: url
  config:
    url: "https://example.com/file.json"
    type: get  # or post, patch, delete, ...
    basicAuth: false
    headers: {}
    authConfig:
      username: "username"
      password: "password"
...
authType: basic
authConfig:
    username: "username"
    password: "password"
...
authType: header
authConfig:
    header-1: "content"
    header-2: "content"
...
authType: certificate
authConfig:
    path: "path/to/certificate"
    key: "keyphrase"
...
authType: none
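
As an illustration of what the basic, header, and none auth types could translate to at the HTTP level, the sketch below builds a request with Python's standard library. The function `build_request` and its parameter names are hypothetical and not part of Bee ETL. The certificate type is left out because it would require an SSL context rather than headers.

```python
import base64
import urllib.request

def build_request(url, method="get", headers=None,
                  auth_type="none", auth_config=None):
    """Build (but do not send) a request for the given auth settings."""
    headers = dict(headers or {})
    auth_config = auth_config or {}

    if auth_type == "basic":
        # HTTP Basic auth: base64("username:password") in the Authorization header.
        raw = f"{auth_config['username']}:{auth_config['password']}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    elif auth_type == "header":
        # Every key/value pair in authConfig becomes a request header.
        headers.update(auth_config)
    elif auth_type != "none":
        raise ValueError(f"auth type not covered by this sketch: {auth_type}")

    return urllib.request.Request(url, headers=headers, method=method.upper())

basic_request = build_request(
    "https://example.com/file.json",
    auth_type="basic",
    auth_config={"username": "username", "password": "password"},
)
header_request = build_request(
    "https://example.com/file.json",
    method="post",
    auth_type="header",
    auth_config={"header-1": "content", "header-2": "content"},
)
print(basic_request.get_method(), basic_request.header_items())
print(header_request.get_method(), header_request.header_items())
```

Nothing is sent over the network here; passing the resulting object to `urllib.request.urlopen` would perform the actual call.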
