Skip to main content

GeoLibs data ingestor

Project description

Glutemulo

A HA geo socio demo data ingestor

Usage

Read de examples files.

We use environ vars. See Environ vars file example for complete list, and examples.

Using producer to upload data to kafka

See python examples bellow. You must produce a dict with column_mame:value

Using the ingestor consumer

Use gluto docker and fill enviroment vars.

Select the backend using GLUTEMULO_BACKEND and specific vars for it (database, host, etc). You can select 2 backends: postgres or carto See Environ vars file example for complete list.

Then set:

  1. GLUTEMULO_INGESTOR_DATASET
    Table to upload data
  2. GLUTEMULO_INGESTOR_DATASET_COLUMNS
    Comma separted list of column names

Now, create the table on backend or set GLUTEMULO_INGESTOR_DATASET_DDL and GLUTEMULO_INGESTOR_DATASET_AUTOCREATE=False

Then configure ingestor for kafka. First read the python-kafka doc and then use the following vars:

  1. GLUTEMULO_INGESTOR_TOPIC
    Topic to use
  2. GLUTEMULO_INGESTOR_BOOTSTRAP_SERVERS
    List of servers to connect
  3. GLUTEMULO_INGESTOR_GROUP_ID
    Group id.
  4. GLUTEMULO_INGESTOR_AUTO_OFFSET_RESET
    latest or earliest.
  5. GLUTEMULO_INGESTOR_MAX_POLL_RECORDS
    The maximum number of records returned in a batch of messages
  6. GLUTEMULO_INGESTOR_FETCH_MIN_BYTES
    Minimum amount of data the server should return for a fetch request, otherwise wait up to fetch_max_wait_ms for more data to accumulate. Default: 1

For the docker, we include a example docker-compose file. Remember you can scale with same group_id

docker-compose scale gluto=3

Run flask demo

$ FLASK_ENV=development flask run
 * Environment: development
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 194-409-049

Test

$ http -j POST localhost:5000/v1/ uno=1 dos=2`
HTTP/1.0 201 CREATED
Content-Length: 13
Content-Type: text/html; charset=utf-8
Date: Thu, 02 May 2019 14:56:07 GMT
Server: Werkzeug/0.15.2 Python/3.7.2

DATA Received

Producer / Consumer

Kafka + json

Async producer:

from glutemulo.kafka.producer import JsonKafka
productor = JsonKafka(bootstrap_servers="localhost:9092")
future = productor.produce('simple-topic', dict(dos='BB'))

Consumer in batches:

from glutemulo.kafka.consumer import JsonKafka
consumer = JsonKafka('simple-topic', bootstrap_servers="localhost:9092")
for msg in consumer.consume():
    for msg in messages:
        print(msg)

Kafka + Avro

sync producer:

SCHEMA = {
    "type": "record",
    "name": "simpledata",
    "doc": "This is a sample Avro schema to get you started.",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "number1", "type": "int"},
    ],
}
SCHEMA_ID = 1
from glutemulo.kafka.producer import AvroKafka as Producer
productor = Producer(SCHEMA, SCHEMA_ID,bootstrap_servers="localhost:9092")
future = productor.produce('simple-topic-avro', dict(name='Un nombre', number1=10))

Consumer:

from glutemulo.kafka.consumer import AvroKafka as Consumer
consumer = Consumer('simple-topic-avro', SCHEMA, SCHEMA_ID, bootstrap_servers="localhost:9092")
for messages in consumer.consume():
    for msg in messages:
        print(msg)

For testing

You can setup a Kafka Consumer using the kafka-console-consumer script that comes with Kafka.

$ bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.240:9092 --topic pylog --from-beginning

this is an awsome log

Testing With KafkaCat

You ca use an application called KafkaCat.

After the application is installed we will run it in consumer mode (which is the default).

kafkacat -b 192.168.240.41:9092 -t one-test

This should not show anything yet because we haven't sent anything to our topic yet...

To send stuff we can copy any text file into our current directory and send it to our Kafka Topic. In another window, run the following command.

$ cat README | kafkacat -b 192.168.240.41 -t one-test

You should see the output in the first window which has KafkaCat still running in consumer mode.

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

geolibs-glutemulo-0.1.3.tar.gz (13.2 kB view hashes)

Uploaded Source

Built Distribution

geolibs_glutemulo-0.1.3-py3-none-any.whl (15.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page