GeoLibs data ingestor

Project description

Glutemulo

A high-availability (HA) geographic and socio-demographic data ingestor

Usage

Read the example files.

We use environment variables for configuration. See the Environ vars file example for the complete list, with examples.

Using the producer to upload data to Kafka

See the Python examples below. You must produce a dict mapping column_name to value.
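
For instance, a record might look like this (the column names here are hypothetical; use your dataset's actual columns):

# A hypothetical record: keys must match the dataset's column names
row = {"name": "Alice", "number1": 10}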

Using the ingestor consumer

Use the gluto Docker image and fill in the environment variables.

Select the backend using GLUTEMULO_BACKEND and the backend-specific vars (database, host, etc.). Two backends are available: postgres and carto. See the Environ vars file example for the complete list, and the sample env snippet below.

Then set:

  1. GLUTEMULO_INGESTOR_DATASET
    Table to upload data to
  2. GLUTEMULO_INGESTOR_DATASET_COLUMNS
    Comma-separated list of column names

Now create the table on the backend, or set GLUTEMULO_INGESTOR_DATASET_DDL and GLUTEMULO_INGESTOR_DATASET_AUTOCREATE=False.
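
Putting the backend and dataset settings together, a minimal env-file sketch might look like this (all values are placeholders, not defaults):

# Hypothetical values: adjust backend, table and columns to your setup
GLUTEMULO_BACKEND=postgres
GLUTEMULO_INGESTOR_DATASET=my_table
GLUTEMULO_INGESTOR_DATASET_COLUMNS=name,number1
# Per the note above: provide DDL instead of creating the table yourself
GLUTEMULO_INGESTOR_DATASET_DDL=CREATE TABLE my_table (name text, number1 int)
GLUTEMULO_INGESTOR_DATASET_AUTOCREATE=False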

Then configure the ingestor for Kafka. First read the kafka-python docs, then set the following vars (a sample env snippet follows the list):

  1. GLUTEMULO_INGESTOR_TOPIC
    Topic to use
  2. GLUTEMULO_INGESTOR_BOOTSTRAP_SERVERS
    List of servers to connect to
  3. GLUTEMULO_INGESTOR_GROUP_ID
    Group ID
  4. GLUTEMULO_INGESTOR_AUTO_OFFSET_RESET
    latest or earliest
  5. GLUTEMULO_INGESTOR_MAX_POLL_RECORDS
    Maximum number of records returned in a batch of messages
  6. GLUTEMULO_INGESTOR_FETCH_MIN_BYTES
    Minimum amount of data the server should return for a fetch request; otherwise, wait up to fetch_max_wait_ms for more data to accumulate. Default: 1
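
A sketch of the Kafka side of the env file, with illustrative values (topic, servers and group ID are assumptions):

# Hypothetical consumer settings; point them at your own cluster
GLUTEMULO_INGESTOR_TOPIC=simple-topic
GLUTEMULO_INGESTOR_BOOTSTRAP_SERVERS=localhost:9092
GLUTEMULO_INGESTOR_GROUP_ID=gluto-group
GLUTEMULO_INGESTOR_AUTO_OFFSET_RESET=earliest
GLUTEMULO_INGESTOR_MAX_POLL_RECORDS=500
GLUTEMULO_INGESTOR_FETCH_MIN_BYTES=1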

For Docker, we include an example docker-compose file (a minimal sketch of the service follows). Remember you can scale consumers that share the same group_id:

docker-compose scale gluto=3
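
If you write your own compose file instead, a minimal sketch of the gluto service might look like this (the image name and env file name are assumptions; the bundled example file is the reference):

# Hypothetical docker-compose fragment; see the bundled example file
services:
  gluto:
    image: geolibs/gluto
    env_file: glutemulo.env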

Run flask demo

$ FLASK_ENV=development flask run
 * Environment: development
 * Debug mode: on
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 194-409-049

Test

$ http -j POST localhost:5000/v1/ uno=1 dos=2
HTTP/1.0 201 CREATED
Content-Length: 13
Content-Type: text/html; charset=utf-8
Date: Thu, 02 May 2019 14:56:07 GMT
Server: Werkzeug/0.15.2 Python/3.7.2

DATA Received

Producer / Consumer

Kafka + JSON

Async producer:

from glutemulo.kafka.producer import JsonKafka
productor = JsonKafka(bootstrap_servers="localhost:9092")
# produce() sends asynchronously and returns a future
future = productor.produce('simple-topic', dict(dos='BB'))

Consumer in batches:

from glutemulo.kafka.consumer import JsonKafka
consumer = JsonKafka('simple-topic', bootstrap_servers="localhost:9092")
# consume() yields batches of messages
for messages in consumer.consume():
    for msg in messages:
        print(msg)

Kafka + Avro

Sync producer:

from glutemulo.kafka.producer import AvroKafka as Producer

SCHEMA = {
    "type": "record",
    "name": "simpledata",
    "doc": "This is a sample Avro schema to get you started.",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "number1", "type": "int"},
    ],
}
SCHEMA_ID = 1
productor = Producer(SCHEMA, SCHEMA_ID, bootstrap_servers="localhost:9092")
future = productor.produce('simple-topic-avro', dict(name='A name', number1=10))

Consumer:

from glutemulo.kafka.consumer import AvroKafka as Consumer
consumer = Consumer('simple-topic-avro', SCHEMA, SCHEMA_ID, bootstrap_servers="localhost:9092")
for messages in consumer.consume():
    for msg in messages:
        print(msg)

For testing

You can set up a Kafka consumer using the kafka-console-consumer script that comes with Kafka.

$ bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.240:9092 --topic pylog --from-beginning

this is an awesome log

Testing with KafkaCat

You can use an application called KafkaCat.

Once the application is installed, run it in consumer mode (the default):

kafkacat -b 192.168.240.41:9092 -t one-test

This will not show anything yet, because nothing has been sent to the topic.

To send something, copy any text file into the current directory and pipe it to the Kafka topic. In another window, run:

$ cat README | kafkacat -b 192.168.240.41 -t one-test

You should see the output in the first window, where KafkaCat is still running in consumer mode.

Download files

Download the file for your platform.

Source Distribution

geolibs-glutemulo-0.1.3.tar.gz (13.2 kB)


Built Distribution

geolibs_glutemulo-0.1.3-py3-none-any.whl (15.5 kB)


File details

Details for the file geolibs-glutemulo-0.1.3.tar.gz.

File metadata

  • Download URL: geolibs-glutemulo-0.1.3.tar.gz
  • Upload date:
  • Size: 13.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.16 CPython/3.7.5 Linux/4.14.137+

File hashes

Hashes for geolibs-glutemulo-0.1.3.tar.gz

  • SHA256: 53af8d9eb0bef1756d63bca6ae51f02b4c4dd54e6b38e6bcdd9dd2e5d6faf5c6
  • MD5: 6d0ab80a5090e27d6711493b0e04bf6b
  • BLAKE2b-256: a39ef94a2897eadc34f8345c3e37ac7289259e1e0578422064e062fba64b78ef


File details

Details for the file geolibs_glutemulo-0.1.3-py3-none-any.whl.

File hashes

Hashes for geolibs_glutemulo-0.1.3-py3-none-any.whl

  • SHA256: f84cf1230539ef4e9cf400a291da4c234bc6c7b42f42f16a65c196a551995716
  • MD5: ef0878751a5e458c9f7769c04b7544f8
  • BLAKE2b-256: 406651e1e2f375bbd2e867329d69fa4f04fabea59bf043dde7b2da36589dab79

