Helps Python and Django projects import data exposed by Data Flow into an S3 bucket

Data Flow S3 Importer

This package helps Python and Django projects import data exposed by Data Flow into an S3 bucket.

Data Flow is a data pipeline service that can be configured to write data into S3 buckets for ingestion by client applications.

This package uses boto3 to connect to the given bucket, locate the configured path within it, and list the files there.
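As a rough illustration of that lookup step (not the package's actual code), the sketch below lists the objects under a prefix using a boto3 S3 resource and picks one of them. The function name find_latest_key and the choice of the most recently modified file are assumptions made for this example; the package may select its file differently.

```python
def find_latest_key(s3_resource, bucket_name, prefix):
    """Return the key of the most recently modified object under `prefix`, or None.

    `s3_resource` is expected to behave like a boto3 S3 resource.
    Picking the newest file is an assumption made for this sketch.
    """
    bucket = s3_resource.Bucket(bucket_name)
    # objects.filter(Prefix=...) yields object summaries exposing .key and .last_modified
    summaries = list(bucket.objects.filter(Prefix=prefix))
    if not summaries:
        return None
    newest = max(summaries, key=lambda summary: summary.last_modified)
    return newest.key
```

Passing the resource in as an argument keeps the function easy to exercise with a stand-in object in tests.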

It then takes a single file and processes it line by line, expecting each line to be a JSON object whose 'object' key contains a single entity:

...
{"object": {...}}
...
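A minimal sketch of reading that line format, using only the standard library. The helper name iter_objects and the sample field names (id, name) are hypothetical and only serve to illustrate the shape of the data; this is not the package's own parser.

```python
import json


def iter_objects(lines):
    """Yield the value of the 'object' key from each non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield record["object"]


# Hypothetical sample data in the one-JSON-object-per-line layout
sample = [
    '{"object": {"id": 1, "name": "first"}}',
    '{"object": {"id": 2, "name": "second"}}',
]
entities = list(iter_objects(sample))
```

Because the function is a generator, a large file can be handled one line at a time without loading it all into memory.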

You can subclass the base class for plain Python projects, or the Django-specific subclass for Django projects.

The Django subclass processes each object into an instance of the given model and saves it to the database.

If the model inherits from the provided IngestedModel, any instances not included in the most recent fetch are flagged as deleted upstream. They are not removed from the database, but by default they do not appear in the queryset.
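The flag-rather-than-delete behaviour can be pictured with plain dictionaries. This is only a conceptual sketch: the field name deleted_upstream, the helper names, and the choice to clear the flag when a record reappears are all assumptions, not the IngestedModel implementation.

```python
def sync_records(stored, fetched_ids):
    """Flag stored records that are missing from the latest fetch; never remove them."""
    for record_id, record in stored.items():
        # Assumption: a record that reappears in a later fetch is un-flagged again
        record["deleted_upstream"] = record_id not in fetched_ids
    return stored


def visible(stored):
    """Mimic a default queryset that hides records flagged as deleted upstream."""
    return {rid: rec for rid, rec in stored.items() if not rec["deleted_upstream"]}
```

After a sync, flagged records are still present in the store; they are just filtered out of the default view.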

Usage

Make a subclass for each of the record sets you want to import.

Plain Python

If you're not using Django, or you want full control over how your models are synced (for example, you don't want to use queryset methods to update them), subclass the DataFlowS3Ingest ingester class.

This class provides various hooks for configuration and processing. Set the configuration you need as subclass attributes, as follows:

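import boto3

from data_flow_s3_import.ingest import DataFlowS3Ingest

class MyIngest(DataFlowS3Ingest):
    export_bucket = "my_bucket_name"
    export_path = "bucket_import_type_prefix"
    export_directory = "ingested_data_prefix/"

    def get_s3_resource(self):
        # This should return a configured and instantiated boto3 S3 resource.
        # Example only: boto3.resource("s3") relies on the default credential
        # chain; replace it with whatever configuration your project needs.
        return boto3.resource("s3")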

Then override process_object and/or the other hooks the class provides, as your requirements dictate.

Instantiating the class will run the ingestion automatically.

Standard Django models

If you're using Django and want to have your import process automated, start by making a custom model in your app that extends IngestedModel:

from data_flow_s3_import.models import IngestedModel

class MyIngestedModel(IngestedModel):
    ...

Also subclass the DataFlowS3IngestToModel importer and set its mapping dictionary, where each key is a model field name and each value is the corresponding imported data column name:

from data_flow_s3_import.ingest import DataFlowS3IngestToModel

class MyModelIngest(DataFlowS3IngestToModel):
    model = MyIngestedModel
    mapping = {
        "id": "importedColumns:id",
        "name": "importedColumns:NameField",
    }

Then instantiate your class; the ingestion runs automatically, syncing your models with the ingested records:

MyModelIngest(s3_resource=boto_s3_instance, bucket_name="my_bucket")

You will also need to configure the S3 resource, bucket and path information, as in the plain Python implementation.
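To make the direction of the mapping dictionary concrete, here is a small standalone sketch of how such a mapping could be applied to one imported object. The helper map_fields and the sample values are hypothetical, and the column names are treated as flat dictionary keys, which is an assumption; the package may resolve them differently.

```python
def map_fields(mapping, imported):
    """Build model field values by looking up each mapped column in the imported object.

    Keys of `mapping` are model field names; values are imported column names.
    Columns missing from `imported` resolve to None.
    """
    return {field: imported.get(column) for field, column in mapping.items()}


mapping = {
    "id": "importedColumns:id",
    "name": "importedColumns:NameField",
}
# Hypothetical imported object, keyed by imported column name
imported = {"importedColumns:id": 7, "importedColumns:NameField": "Example"}
fields = map_fields(mapping, imported)
```

The resulting dictionary is keyed by model field name, which is the shape a model constructor or an update call would typically accept.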

Download files

Source Distribution

data_flow_s3_import-0.0.3.tar.gz (9.6 kB view details)

Uploaded Source

Built Distribution

data_flow_s3_import-0.0.3-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file data_flow_s3_import-0.0.3.tar.gz.

File metadata

  • Download URL: data_flow_s3_import-0.0.3.tar.gz
  • Size: 9.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.1 CPython/3.12.2 Darwin/24.4.0

File hashes

Hashes for data_flow_s3_import-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       e791dd569f022f8d6f433060b6261f19a3b0d1f3ab6c38479051d5b481f49026
MD5          ff1360b402e23cd1857043199822cdfb
BLAKE2b-256  b3cd9eee02fe635da6ede444c741e8d7606f6c9f4f5f8ea83e64cd8e5a02f5e0

File details

Details for the file data_flow_s3_import-0.0.3-py3-none-any.whl.

File hashes

Hashes for data_flow_s3_import-0.0.3-py3-none-any.whl
Algorithm    Hash digest
SHA256       2055c2575368343b4f00180388bda7d21187c7e1db5bde81e5edfd67bcbf6e60
MD5          edac069284ca9cf1a20517d7c98a3d61
BLAKE2b-256  caf9e30ab61e1ffdf3d071779dbc0fea54b9715e78e11c1fdef8d88ae4a3a0c5
