
Export CSV To Influx

Export CSV To Influx: process CSV data and write it to InfluxDB.

Support:

  • InfluxDB 0.x, 1.x
  • InfluxDB 2.x: supported since version 0.2.0

Important note: InfluxDB 2.x has a built-in CSV write feature, which is more powerful: https://docs.influxdata.com/influxdb/v2.1/write-data/developer-tools/csv/

Install

Use pip to install the library. The export_csv_to_influx binary is then ready to use.

pip install ExportCsvToInflux

Features

  1. [Highlight :star2::tada::heart_eyes:] Run the exporter with the export_csv_to_influx binary
  2. [Highlight :star2::tada::heart_eyes:] Process dozens of csv files in a folder
  3. [Highlight :star2::tada::heart_eyes::confetti_ball::four_leaf_clover::balloon:] Auto-convert csv data to int/float/string in InfluxDB
  4. [Highlight :star2::tada::heart_eyes:] Match or filter the data by string or regex
  5. [Highlight :star2::tada::heart_eyes:] Count matches and generate a count measurement
  6. Limit string length in InfluxDB
  7. Detect whether the csv has new data
  8. Use the latest file-modify time as the time column
  9. Auto-create the database if it does not exist
  10. Drop the database before inserting data
  11. Drop measurements before inserting data

Command Arguments

Run export_csv_to_influx -h to see the help guide.

Note:

  1. You can pass * to --field_columns to match all fields: --field_columns=*, --field_columns '*'
  2. CSV data won't be inserted into InfluxDB again if there is no update. To force insertion anyway (default: True): --force_insert_even_csv_no_update=True, --force_insert_even_csv_no_update True
  3. If a csv cell has no value, it is auto-filled based on the column data type: int: -999, float: -999.0, string: -
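The auto-conversion and empty-cell defaults in note 3 can be sketched as follows. This is an illustrative re-implementation of the documented behavior, not the library's actual code:

```python
# Illustrative sketch of the documented behavior: convert each csv cell to
# int/float/string, and fill empty cells with a type-based default
# (int: -999, float: -999.0, string: '-'). Not the library's actual code.

def convert_cell(value, column_type):
    """Convert a raw csv cell according to the inferred column type."""
    if value == '':
        return {'int': -999, 'float': -999.0, 'string': '-'}[column_type]
    if column_type == 'int':
        return int(value)
    if column_type == 'float':
        return float(value)
    return value

print(convert_cell('1.434', 'float'))  # 1.434
print(convert_cell('', 'int'))         # -999
print(convert_cell('', 'string'))      # -
```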
| # | Option | Mandatory | Default | Description |
| --- | --- | --- | --- | --- |
| 1 | -c, --csv | Yes | | CSV file path, or the folder path |
| 2 | -db, --dbname | For 0.x, 1.x only: Yes | | InfluxDB database name |
| 3 | -u, --user | For 0.x, 1.x only: No | admin | InfluxDB user name |
| 4 | -p, --password | For 0.x, 1.x only: No | admin | InfluxDB password |
| 5 | -org, --org | For 2.x only: No | my-org | For 2.x only, the org name |
| 6 | -bucket, --bucket | For 2.x only: No | my-bucket | For 2.x only, the bucket name |
| 7 | -http_schema, --http_schema | For 2.x only: No | http | For 2.x only, the InfluxDB HTTP schema: http or https |
| 8 | -token, --token | For 2.x only: Yes | | For 2.x only, the API token |
| 9 | -m, --measurement | Yes | | Measurement name |
| 10 | -fc, --field_columns | Yes | | List of csv columns to use as fields, separated by commas |
| 11 | -tc, --tag_columns | No | None | List of csv columns to use as tags, separated by commas |
| 12 | -d, --delimiter | No | , | CSV delimiter |
| 13 | -lt, --lineterminator | No | \n | CSV line terminator |
| 14 | -s, --server | No | localhost:8086 | InfluxDB server address |
| 15 | -t, --time_column | No | timestamp | Timestamp column name. If there is no timestamp column, the file's last modify time is used for all csv rows. Note: pure timestamps like 1517587275 are also supported and auto-detected |
| 16 | -tf, --time_format | No | %Y-%m-%d %H:%M:%S | Timestamp format; see https://strftime.org/ |
| 17 | -tz, --time_zone | No | UTC | Timezone of the supplied data |
| 18 | -b, --batch_size | No | 500 | Batch size when inserting data into InfluxDB |
| 19 | -lslc, --limit_string_length_columns | No | None | Columns whose string length should be limited, separated by commas |
| 20 | -ls, --limit_length | No | 20 | Length limit |
| 21 | -dd, --drop_database | Compatible with 2.x: No | False | Drop the database or bucket before inserting data |
| 22 | -dm, --drop_measurement | No | False | Drop the measurement before inserting data |
| 23 | -mc, --match_columns | No | None | Columns to match on, separated by commas. Match rule: a row is kept only if all match conditions succeed |
| 24 | -mbs, --match_by_string | No | None | Match by string, separated by commas |
| 25 | -mbr, --match_by_regex | No | None | Match by regex, separated by commas |
| 26 | -fic, --filter_columns | No | None | Columns to filter on, separated by commas. Filter rule: a row is dropped if any filter condition succeeds |
| 27 | -fibs, --filter_by_string | No | None | Filter by string, separated by commas |
| 28 | -fibr, --filter_by_regex | No | None | Filter by regex, separated by commas |
| 29 | -ecm, --enable_count_measurement | No | False | Enable the count measurement |
| 30 | -fi, --force_insert_even_csv_no_update | No | True | Force inserting data into InfluxDB even if the csv has no update |
| 31 | -fsc, --force_string_columns | No | None | Force columns to string type, separated by commas |
| 32 | -fintc, --force_int_columns | No | None | Force columns to int type, separated by commas |
| 33 | -ffc, --force_float_columns | No | None | Force columns to float type, separated by commas |
| 34 | -uniq, --unique | No | False | Write duplicated points |
| 35 | --csv_charset | No | None | The csv charset; None auto-detects |

Programmatically

The exporter can also be run programmatically.

from ExportCsvToInflux import ExporterObject

exporter = ExporterObject()
exporter.export_csv_to_influx(...)

# You could get the export_csv_to_influx parameter details by:
print(exporter.export_csv_to_influx.__doc__)
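As a fuller sketch, a call might look like the following. The keyword names here are assumptions modeled on the CLI flags and are not verified against the library; check exporter.export_csv_to_influx.__doc__ for the authoritative parameter names before copying this.

```python
from ExportCsvToInflux import ExporterObject

# NOTE: the keyword argument names below are assumptions modeled on the CLI
# flags; confirm them against exporter.export_csv_to_influx.__doc__.
# Running this requires a reachable InfluxDB server.
exporter = ExporterObject()
exporter.export_csv_to_influx(csv_file_name='demo.csv',
                              db_server_name='127.0.0.1:8086',
                              db_user='admin',
                              db_password='admin',
                              db_name='demo',
                              db_measurement='demo',
                              tag_columns='url',
                              field_columns='response_time')
```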

Sample

  1. Here is the demo.csv:
timestamp,url,response_time
2022-03-08 02:04:05,https://jmeter.apache.org/,1.434
2022-03-08 02:04:06,https://jmeter.apache.org/,2.434
2022-03-08 02:04:07,https://jmeter.apache.org/,1.200
2022-03-08 02:04:08,https://jmeter.apache.org/,1.675
2022-03-08 02:04:09,https://jmeter.apache.org/,2.265
2022-03-08 02:04:10,https://sample-demo.org/,1.430
2022-03-08 03:54:13,https://sample-show.org/,1.300
2022-03-07 04:06:00,https://sample-7.org/,1.289
2022-03-07 05:45:34,https://sample-8.org/,2.876
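The timestamp column above is parsed with the default --time_format (%Y-%m-%d %H:%M:%S) in the default --time_zone (UTC). The following sketch shows what that works out to for the first row, down to the epoch-nanosecond value InfluxDB ultimately stores:

```python
from datetime import datetime, timezone

# Parse the first demo.csv timestamp with the exporter's default time format,
# treat it as UTC (the default --time_zone), and convert it to the
# nanoseconds-since-epoch value InfluxDB stores.
ts = datetime.strptime('2022-03-08 02:04:05', '%Y-%m-%d %H:%M:%S')
ts = ts.replace(tzinfo=timezone.utc)
epoch_ns = int(ts.timestamp()) * 1_000_000_000
print(epoch_ns)  # 1646705045000000000
```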
  2. Command samples

Sample 1: Write the whole data into influx

Influx 0.x, 1.x:

export_csv_to_influx \
--csv demo.csv \
--dbname demo \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--user admin \
--password admin \
--server 127.0.0.1:8086

Influx 2.x:

export_csv_to_influx \
--csv demo.csv \
--org my-org \
--bucket my-bucket \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--token YourToken \
--server 127.0.0.1:8086

Sample 2: Write the whole data into influx, dropping the database or bucket first

Influx 0.x, 1.x:

export_csv_to_influx \
--csv demo.csv \
--dbname demo \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--user admin \
--password admin \
--server 127.0.0.1:8086 \
--drop_database=True

Influx 2.x:

# The Read/Write API token cannot create a bucket. Before using
# --drop_database, make sure your token has the required access.
# See the bug here: https://github.com/influxdata/influxdb/issues/23170
export_csv_to_influx \
--csv demo.csv \
--org my-org \
--bucket my-bucket \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--token YourToken \
--server 127.0.0.1:8086 \
--drop_database=True

Sample 3: Write part of the data: rows whose timestamp matches 2022-03-07 and whose url matches sample-\d+

Influx 0.x, 1.x:

export_csv_to_influx \
--csv demo.csv \
--dbname demo \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--user admin \
--password admin \
--server 127.0.0.1:8086 \
--drop_database=True \
--match_columns=timestamp,url \
--match_by_regex='2022-03-07,sample-\d+'

Influx 2.x:

export_csv_to_influx \
--csv demo.csv \
--org my-org \
--bucket my-bucket \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--token YourToken \
--server 127.0.0.1:8086 \
--drop_measurement=True \
--match_columns=timestamp,url \
--match_by_regex='2022-03-07,sample-\d+'

Sample 4: Filter out the rows whose url matches sample, and write the rest into influx

Influx 0.x, 1.x:

export_csv_to_influx \
--csv demo.csv \
--dbname demo \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--user admin \
--password admin \
--server 127.0.0.1:8086 \
--drop_database True \
--filter_columns url \
--filter_by_regex 'sample'

Influx 2.x:

export_csv_to_influx \
--csv demo.csv \
--org my-org \
--bucket my-bucket \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--token YourToken \
--server 127.0.0.1:8086 \
--drop_measurement=True \
--filter_columns url \
--filter_by_regex 'sample'

Sample 5: Enable the count measurement. A new measurement named demo.count is generated, counting the matches: timestamp matches 2022-03-07 and url matches sample-\d+

Influx 0.x, 1.x:

export_csv_to_influx \
--csv demo.csv \
--dbname demo \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--user admin \
--password admin \
--server 127.0.0.1:8086 \
--drop_database True \
--match_columns timestamp,url \
--match_by_regex '2022-03-07,sample-\d+' \
--enable_count_measurement True

Influx 2.x:

export_csv_to_influx \
--csv demo.csv \
--org my-org \
--bucket my-bucket \
--measurement demo \
--tag_columns url \
--field_columns response_time \
--token YourToken \
--server 127.0.0.1:8086 \
--drop_measurement=True \
--match_columns=timestamp,url \
--match_by_regex='2022-03-07,sample-\d+' \
--enable_count_measurement True
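The match rule used in samples 3 and 5 (every --match_columns entry must match its regex for a row to be kept) can be sketched against demo.csv. This is an illustrative re-implementation, not the library's code:

```python
import re

# Sketch of the documented match rule: a row is kept only when every
# (column, regex) pair matches. Regexes are taken from command sample 3;
# rows are a subset of demo.csv.
rows = [
    {'timestamp': '2022-03-08 02:04:05', 'url': 'https://jmeter.apache.org/'},
    {'timestamp': '2022-03-08 02:04:10', 'url': 'https://sample-demo.org/'},
    {'timestamp': '2022-03-07 04:06:00', 'url': 'https://sample-7.org/'},
    {'timestamp': '2022-03-07 05:45:34', 'url': 'https://sample-8.org/'},
]
patterns = {'timestamp': r'2022-03-07', 'url': r'sample-\d+'}

matched = [row for row in rows
           if all(re.search(p, row[col]) for col, p in patterns.items())]
print(len(matched))  # 2 rows: sample-7 and sample-8
```

Note that the second row is dropped even though its url contains "sample", because its timestamp does not match 2022-03-07: all match conditions must succeed.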
  3. If the count measurement is enabled, the generated measurement looks like this:

    // Influx 0.x, 1.x
    select * from "demo.count"
    
    name: demo.count
    time                match_timestamp match_url total
    ----                --------------- --------- -----
    1562957134000000000 3               2         9
    
    // Influx 2.x: For more info about Flux, see https://docs.influxdata.com/influxdb/v2.1/query-data/flux/
    influx query 'from(bucket:"my-bucket") |> range(start:-100h) |> filter(fn: (r) => r._measurement == "demo.count")' --raw
    
    #group,false,false,true,true,false,false,true,true
    #datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,long,string,string
    #default,_result,,,,,,,
    ,result,table,_start,_stop,_time,_value,_field,_measurement
    ,,2,2022-03-04T09:51:49.7425566Z,2022-03-08T13:51:49.7425566Z,2022-03-07T05:45:34Z,2,match_timestamp,demo.count
    ,,3,2022-03-04T09:51:49.7425566Z,2022-03-08T13:51:49.7425566Z,2022-03-07T05:45:34Z,2,match_url,demo.count
    ,,4,2022-03-04T09:51:49.7425566Z,2022-03-08T13:51:49.7425566Z,2022-03-07T05:45:34Z,9,total,demo.count
    

Special Thanks

The lib is inspired by: https://github.com/fabio-miranda/csv-to-influxdb

