Create features with sql

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English
Programming Language

Project description

featsql

(EN) There are three kinds of functions that can be used to create features from numerical, categorical and aggregated data. They are:

For the SQLite database:
- sqlite_create_query_num(): Numerical variables such as mean, sum, etc., within the user-provided window of n months.
- sqlite_create_query_cat(): Categorical variables such as mean, sum, etc., within the user-provided window of n months.
- sqlite_create_query_agregada(): Numerical variables grouped by a specific categorical variable value with mean, sum, etc., within the user-provided window of n months.
For the MySQL database:
- mysql_create_query_num(): Numerical variables such as mean, sum, etc., within the user-provided window of n months.
- mysql_create_query_cat(): Categorical variables such as mean, sum, etc., within the user-provided window of n months.
- mysql_create_query_agregada(): Numerical variables grouped by a specific categorical variable value with mean, sum, etc., within the user-provided window of n months.
For the Snowflake database:
- snow_create_query_num(): Numerical variables such as mean, sum, etc., within the user-provided window of n months.
- snow_create_query_cat(): Categorical variables such as mean, sum, etc., within the user-provided window of n months.
- snow_create_query_agregada(): Numerical variables grouped by a specific categorical variable value with mean, sum, etc., within the user-provided window of n months.

Next, we have a detailed example for creating variables in SQLite and MySQL databases, along with an example output for the Snowflake database.

(PT) Existem três tipos de funções que podem ser usadas para criar variáveis a partir de dados numéricos, categóricos e agregados. São elas: - Para o banco SQLite:

+ ``sqlite_create_query_num()``: variáveis numéricas como média, soma e etc na janela de n meses fornecida pelo usuário.
+ ``sqlite_create_query_cat()``: variáveis categóricas como média, soma e etc na janela de n meses fornecida pelo usuário.
+ ``sqlite_create_query_agregada()``: variáveis numéricas agrupadas por uma valor específico de variável categórica com a média, soma e etc na janela de n meses fornecida pelo usuário.

Para o banco MySQL:
- mysql_create_query_num(): variáveis numéricas como média, soma e etc na janela de n meses fornecida pelo usuário.
- mysql_create_query_cat(): variáveis categóricas como média, soma e etc na janela de n meses fornecida pelo usuário.
- mysql_create_query_agregada(): variáveis numéricas agrupadas por uma valor específico de variável categórica com a média, soma e etc na janela de n meses fornecida pelo usuário.
Para o banco snowflake:
- snow_create_query_num(): variáveis numéricas como média, soma e etc na janela de n meses fornecida pelo usuário.
- snow_create_query_cat(): variáveis categóricas como média, soma e etc na janela de n meses fornecida pelo usuário.
- snow_create_query_agregada(): variáveis numéricas agrupadas por uma valor específico de variável categórica com a média, soma e etc na janela de n meses fornecida pelo usuário.

A seguir, temos um exemplo detalhado para a criação de variáveis no banco sqlite e mysql. Além de um exemplo de output para o banco do tipo snowflake.

Install

(EN) To install, simply use the command line:

pip install featsql

(PT) Para instalar, simplesmente use o comando:

pip install featsql

Imports

from featsql.featsqlite import *
from featsql.featmysql import *
from featsql.featsnow import *

import pandas as pd

import mysql.connector
import sqlite3
pd.set_option('display.max_columns', None)

Exemples

1. SQLITE

Setting up the engine (Configurando a engine)

url_db = "../../data/mydatabase.db" 
conn = sqlite3.connect(url_db)
cursor = conn.cursor()

Initial view of the public (Visão inicial do público)

(EN) First, let’s observe the format of the spine table.

(PT) Primeiro vamos observar o formato da tabela spine

df_spine = pd.read_sql("SELECT * FROM tb_spine", conn)
df_spine.head()

	ID	SAFRA_REF	Target
0	4	2023-02-01	0
1	5	2023-02-01	0
2	6	2023-02-01	0
3	7	2023-02-01	0
4	10	2023-02-01	1

Initial view of the variables table (Visão inicial da tabela de variáveis)

(EN) The table tb_feat contains 4 variables, two numeric and two categorical. Note that there are more unique IDs and dates available in this than in the spine table, as is often the case.

(PT) A tabela tb_feat contém 4 variáveis, duas sendo numéricas e duas categórica. Perceba que existem mais ID’s únicos e datas disponíveis nessa tabela do que na tabela spine, caso que ocorre no dia a dia.

df_data = pd.read_sql("SELECT * FROM tb_feat", conn)
df_data.head()

	ID	SAFRA	FEAT_NUM1	FEAT_NUM2	FEAT_CAT1	FEAT_CAT2
0	1	2023-01-01	-19	-52	A	B
1	2	2023-01-01	-7	-33	A	B
2	3	2023-01-01	6	91	C	C
3	4	2023-01-01	74	52	B	A
4	5	2023-01-01	79	77	B	B

Creation of numerical variables (Criação de variáveis numéricas)

(EN) The function sqlite_create_query_num() creates a query text for generating variables with the sum, minimum, maximum, and average operations on the variables listed in feat_num_lista and within the time window listed in list_window.

(PT) A função sqlite_create_query_num() cria um texto com a query para a criação de variáveis com as operações soma, mínimo, máximo e média das variáveis listadas em feat_num_lista e com a janela de tempo listada em list_window.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)

Complete query creation with no saved file.

df_num_sqlite = pd.read_sql(query_final_num_sqlite, conn)
df_num_sqlite.head()

	ID	SAFRA_REF	FEAT_NUM1_SUM_1M	FEAT_NUM1_MIN_1M	FEAT_NUM1_MAX_1M	FEAT_NUM1_AVG_1M	FEAT_NUM2_SUM_1M	FEAT_NUM2_MIN_1M	FEAT_NUM2_MAX_1M	FEAT_NUM2_AVG_1M	FEAT_NUM1_SUM_2M	FEAT_NUM1_MIN_2M	FEAT_NUM1_MAX_2M	FEAT_NUM1_AVG_2M	FEAT_NUM2_SUM_2M	FEAT_NUM2_MIN_2M	FEAT_NUM2_MAX_2M	FEAT_NUM2_AVG_2M	FEAT_NUM1_SUM_3M	FEAT_NUM1_MIN_3M	FEAT_NUM1_MAX_3M	FEAT_NUM1_AVG_3M	FEAT_NUM2_SUM_3M	FEAT_NUM2_MIN_3M	FEAT_NUM2_MAX_3M	FEAT_NUM2_AVG_3M
0	4	2023-02-01	74.0	74.0	74.0	74.0	52.0	52.0	52.0	52.0	74	74	74	74.0	52	52	52	52.0	74	74	74	74.0	52	52	52	52.0
1	5	2023-02-01	79.0	79.0	79.0	79.0	77.0	77.0	77.0	77.0	79	79	79	79.0	77	77	77	77.0	79	79	79	79.0	77	77	77	77.0
2	6	2023-02-01	-13.0	-13.0	-13.0	-13.0	-45.0	-45.0	-45.0	-45.0	-13	-13	-13	-13.0	-45	-45	-45	-45.0	-13	-13	-13	-13.0	-45	-45	-45	-45.0
3	7	2023-02-01	-44.0	-44.0	-44.0	-44.0	0.0	0.0	0.0	0.0	-44	-44	-44	-44.0	0	0	0	0.0	-44	-44	-44	-44.0	0	0	0	0.0
4	10	2023-02-01	11.0	11.0	11.0	11.0	-39.0	-39.0	-39.0	-39.0	11	11	11	11.0	-39	-39	-39	-39.0	11	11	11	11.0	-39	-39	-39	-39.0

(EN) We can use the nome_arquivo parameter with a name in the format ‘table.sql’ to save the creation query in a file in the same folder where it is executed.

(PT) Podemos utilizar o parâmetro nome_arquivo com um nome no formato “table.sql” para salvar query de criação em um arquivo na mesma pasta que é executado.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
file_name = 'table.sql'

query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name)

Complete query creation with table.sql saved file

(EN) Besides saving the query to a file, we can create a table directly in the database using the [sqlite_create_query_num](https://ravennaro.github.io/featsql/creation_sqlite.html#sqlite_create_query_num) function with the creation parameters. Here is an example:

(PT) Além de salvar a query em um arquivo, podemos criar uma tabela diretamente no banco de dados a partir da função [sqlite_create_query_num](https://ravennaro.github.io/featsql/creation_sqlite.html#sqlite_create_query_num) com os parâmetros de criação. A seguir um exemplo:

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query_final_num_sqlite = sqlite_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=conn)

Complete query creation with table.sql saved file

(EN) To verify the creation, let’s check the database table using the name of the created table:

(PT) Para conferir a criação, vamos consultar o banco de dados com o nome da tabela que foi criada:

query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()

	ID	SAFRA_REF	FEAT_NUM1_SUM_1M	FEAT_NUM1_MIN_1M	FEAT_NUM1_MAX_1M	FEAT_NUM1_AVG_1M	FEAT_NUM2_SUM_1M	FEAT_NUM2_MIN_1M	FEAT_NUM2_MAX_1M	FEAT_NUM2_AVG_1M	FEAT_NUM1_SUM_2M	FEAT_NUM1_MIN_2M	FEAT_NUM1_MAX_2M	FEAT_NUM1_AVG_2M	FEAT_NUM2_SUM_2M	FEAT_NUM2_MIN_2M	FEAT_NUM2_MAX_2M	FEAT_NUM2_AVG_2M	FEAT_NUM1_SUM_3M	FEAT_NUM1_MIN_3M	FEAT_NUM1_MAX_3M	FEAT_NUM1_AVG_3M	FEAT_NUM2_SUM_3M	FEAT_NUM2_MIN_3M	FEAT_NUM2_MAX_3M	FEAT_NUM2_AVG_3M
0	4	2023-02-01	74.0	74.0	74.0	74.0	52.0	52.0	52.0	52.0	74.0	74.0	74.0	74.0	52.0	52.0	52.0	52.0	74.0	74.0	74.0	74.0	52.0	52.0	52.0	52.0
1	5	2023-02-01	79.0	79.0	79.0	79.0	77.0	77.0	77.0	77.0	79.0	79.0	79.0	79.0	77.0	77.0	77.0	77.0	79.0	79.0	79.0	79.0	77.0	77.0	77.0	77.0
2	6	2023-02-01	-13.0	-13.0	-13.0	-13.0	-45.0	-45.0	-45.0	-45.0	-13.0	-13.0	-13.0	-13.0	-45.0	-45.0	-45.0	-45.0	-13.0	-13.0	-13.0	-13.0	-45.0	-45.0	-45.0	-45.0
3	7	2023-02-01	-44.0	-44.0	-44.0	-44.0	0.0	0.0	0.0	0.0	-44.0	-44.0	-44.0	-44.0	0.0	0.0	0.0	0.0	-44.0	-44.0	-44.0	-44.0	0.0	0.0	0.0	0.0
4	10	2023-02-01	11.0	11.0	11.0	11.0	-39.0	-39.0	-39.0	-39.0	11.0	11.0	11.0	11.0	-39.0	-39.0	-39.0	-39.0	11.0	11.0	11.0	11.0	-39.0	-39.0	-39.0	-39.0

Creation of categorical variables (Criação de variáveis categóricas)

(EN) The function sqlite_create_query_cat() generates a query text for creating variables with the mode of each variable listed in feat_num_lista within the time window provided in list_window.

(PT) A função sqlite_create_query_cat() cria um texto com a query para a criação de variáveis com a moda de cada uma das variáveis listadas em feat_num_lista na janela de tempo fornecida em list_window.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]

query_final_cat_sqlite = sqlite_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)

Complete query creation with no saved file.

df_cat_sqlite_sqlite = pd.read_sql(query_final_cat_sqlite, conn)
df_cat_sqlite_sqlite.head()

	ID	SAFRA_REF	FEAT_CAT1_MODA_1M	FEAT_CAT2_MODA_1M	FEAT_CAT1_MODA_3M	FEAT_CAT2_MODA_3M	FEAT_CAT1_MODA_6M	FEAT_CAT2_MODA_6M
0	4	2023-02-01	B	A	B	A	B	A
1	5	2023-02-01	B	B	B	B	B	B
2	6	2023-02-01	C	B	C	B	C	B
3	7	2023-02-01	B	C	B	C	B	C
4	10	2023-02-01	A	A	A	A	A	A

(EN) The function sqlite_create_query_cat() also allows saving a file with the created query and creating the table directly in the database. Here’s an example:

(PT) A função sqlite_create_query_cat() também permite salvar um arquivo com a query criada e criar a tabela diretamente no banco de dados. Segue um exemplo:

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query_final_cat_sqlite = sqlite_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=conn)

Complete query creation with table.sql saved file

(EN) To verify the creation, let’s check the database table using the name of the created table:

(PT) Para conferir a criação, vamos consultar o banco de dados com o nome da tabela que foi criada:

query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()

	ID	SAFRA_REF	FEAT_CAT1_MODA_1M	FEAT_CAT2_MODA_1M	FEAT_CAT1_MODA_3M	FEAT_CAT2_MODA_3M	FEAT_CAT1_MODA_6M	FEAT_CAT2_MODA_6M
0	4	2023-02-01	B	A	B	A	B	A
1	5	2023-02-01	B	B	B	B	B	B
2	6	2023-02-01	C	B	C	B	C	B
3	7	2023-02-01	B	C	B	C	B	C
4	10	2023-02-01	A	A	A	A	A	A

Creation of aggregated variables (Criação de variáveis agregadas)

(EN) We can create features by aggregating on a specific value of a categorical variable and calculating sum, avg, min, and max operations for user-defined numerical variables. For example, for the variable FEAT_CAT1 having a value of A or B, we calculate sum, avg, min, and max operations for the variables FEAT_NUM1 and FEAT_NUM2. Here’s a use case:

(PT) Podemos criar variáveis agregando por valor específico de variável categórica e calculando as operações de sum, avg, min e max para as variáveis númericas definidas pelo usuário. Por exemplo, para a variável FEAT_CAT1 tendo valor igual a A ou B, calculamos as operações de sum, avg, min e max para as variáveis FEAT_NUM1 e FEAT_NUM2. Segue um caso de uso:

tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
list_aggregator_value = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'

query = sqlite_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, list_aggregator_value)

Complete query creation with no saved file.

df_sqlite_agregada = pd.read_sql(query, conn)
df_sqlite_agregada.head()

	ID	SAFRA_REF	SUM_FEAT_NUM1_FEAT_CAT1_A_3M	MAX_FEAT_NUM1_FEAT_CAT1_A_3M	MIN_FEAT_NUM1_FEAT_CAT1_A_3M	AVG_FEAT_NUM1_FEAT_CAT1_A_3M	SUM_FEAT_NUM2_FEAT_CAT1_A_3M	MAX_FEAT_NUM2_FEAT_CAT1_A_3M	MIN_FEAT_NUM2_FEAT_CAT1_A_3M	AVG_FEAT_NUM2_FEAT_CAT1_A_3M	SUM_FEAT_NUM1_FEAT_CAT1_A_6M	MAX_FEAT_NUM1_FEAT_CAT1_A_6M	MIN_FEAT_NUM1_FEAT_CAT1_A_6M	AVG_FEAT_NUM1_FEAT_CAT1_A_6M	SUM_FEAT_NUM2_FEAT_CAT1_A_6M	MAX_FEAT_NUM2_FEAT_CAT1_A_6M	MIN_FEAT_NUM2_FEAT_CAT1_A_6M	AVG_FEAT_NUM2_FEAT_CAT1_A_6M	SUM_FEAT_NUM1_FEAT_CAT1_B_3M	MAX_FEAT_NUM1_FEAT_CAT1_B_3M	MIN_FEAT_NUM1_FEAT_CAT1_B_3M	AVG_FEAT_NUM1_FEAT_CAT1_B_3M	SUM_FEAT_NUM2_FEAT_CAT1_B_3M	MAX_FEAT_NUM2_FEAT_CAT1_B_3M	MIN_FEAT_NUM2_FEAT_CAT1_B_3M	AVG_FEAT_NUM2_FEAT_CAT1_B_3M	SUM_FEAT_NUM1_FEAT_CAT1_B_6M	MAX_FEAT_NUM1_FEAT_CAT1_B_6M	MIN_FEAT_NUM1_FEAT_CAT1_B_6M	AVG_FEAT_NUM1_FEAT_CAT1_B_6M	SUM_FEAT_NUM2_FEAT_CAT1_B_6M	MAX_FEAT_NUM2_FEAT_CAT1_B_6M	MIN_FEAT_NUM2_FEAT_CAT1_B_6M	AVG_FEAT_NUM2_FEAT_CAT1_B_6M
0	4	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	74.0	74.0	74.0	74.0	52.0	52.0	52.0	52.0	74.0	74.0	74.0	74.0	52.0	52.0	52.0	52.0
1	5	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	79.0	79.0	79.0	79.0	77.0	77.0	77.0	77.0	79.0	79.0	79.0	79.0	77.0	77.0	77.0	77.0
2	6	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	7	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	-44.0	-44.0	-44.0	-44.0	0.0	0.0	0.0	0.0	-44.0	-44.0	-44.0	-44.0	0.0	0.0	0.0	0.0
4	10	2023-02-01	11.0	11.0	11.0	11.0	-39.0	-39.0	-39.0	-39.0	11.0	11.0	11.0	11.0	-39.0	-39.0	-39.0	-39.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

(EN) Here’s an example of saving a file with the created query and directly creating the table in the database:

(PT) Segue um exemplo para salvar um arquivo com a query criada e criar a tabela diretamente no banco de dados:

tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
list_aggregator_value = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
status = True
file_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query = sqlite_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, list_aggregator_value, file_name=file_name, status=status, table_name=table_name, conn=conn)

Complete query creation with table.sql saved file

(EN) To verify the creation, let’s check the database table using the name of the created table:

(PT) Para conferir a criação, vamos consultar o banco de dados com o nome da tabela que foi criada:

query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, conn)
df.head()

	ID	SAFRA_REF	SUM_FEAT_NUM1_FEAT_CAT1_A_3M	MAX_FEAT_NUM1_FEAT_CAT1_A_3M	MIN_FEAT_NUM1_FEAT_CAT1_A_3M	AVG_FEAT_NUM1_FEAT_CAT1_A_3M	SUM_FEAT_NUM2_FEAT_CAT1_A_3M	MAX_FEAT_NUM2_FEAT_CAT1_A_3M	MIN_FEAT_NUM2_FEAT_CAT1_A_3M	AVG_FEAT_NUM2_FEAT_CAT1_A_3M	SUM_FEAT_NUM1_FEAT_CAT1_A_6M	MAX_FEAT_NUM1_FEAT_CAT1_A_6M	MIN_FEAT_NUM1_FEAT_CAT1_A_6M	AVG_FEAT_NUM1_FEAT_CAT1_A_6M	SUM_FEAT_NUM2_FEAT_CAT1_A_6M	MAX_FEAT_NUM2_FEAT_CAT1_A_6M	MIN_FEAT_NUM2_FEAT_CAT1_A_6M	AVG_FEAT_NUM2_FEAT_CAT1_A_6M	SUM_FEAT_NUM1_FEAT_CAT1_B_3M	MAX_FEAT_NUM1_FEAT_CAT1_B_3M	MIN_FEAT_NUM1_FEAT_CAT1_B_3M	AVG_FEAT_NUM1_FEAT_CAT1_B_3M	SUM_FEAT_NUM2_FEAT_CAT1_B_3M	MAX_FEAT_NUM2_FEAT_CAT1_B_3M	MIN_FEAT_NUM2_FEAT_CAT1_B_3M	AVG_FEAT_NUM2_FEAT_CAT1_B_3M	SUM_FEAT_NUM1_FEAT_CAT1_B_6M	MAX_FEAT_NUM1_FEAT_CAT1_B_6M	MIN_FEAT_NUM1_FEAT_CAT1_B_6M	AVG_FEAT_NUM1_FEAT_CAT1_B_6M	SUM_FEAT_NUM2_FEAT_CAT1_B_6M	MAX_FEAT_NUM2_FEAT_CAT1_B_6M	MIN_FEAT_NUM2_FEAT_CAT1_B_6M	AVG_FEAT_NUM2_FEAT_CAT1_B_6M
0	4	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	74.0	74.0	74.0	74.0	52.0	52.0	52.0	52.0	74.0	74.0	74.0	74.0	52.0	52.0	52.0	52.0
1	5	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	79.0	79.0	79.0	79.0	77.0	77.0	77.0	77.0	79.0	79.0	79.0	79.0	77.0	77.0	77.0	77.0
2	6	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	7	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	-44.0	-44.0	-44.0	-44.0	0.0	0.0	0.0	0.0	-44.0	-44.0	-44.0	-44.0	0.0	0.0	0.0	0.0
4	10	2023-02-01	11.0	11.0	11.0	11.0	-39.0	-39.0	-39.0	-39.0	11.0	11.0	11.0	11.0	-39.0	-39.0	-39.0	-39.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

2. MySQL

Setting up the connection (Configurando a conexão)

host = "localhost"
user = "sqluser"
password = "password"
database = "mydatabase"

# Conectar ao MySQL
connection = mysql.connector.connect(
    host=host,
    user=user,
    password=password,
    database=database
)

Initial view of the public (Visão inicial do público)

(EN) First, let’s observe the format of the spine table.

(PT) Primeiro vamos observar o formato da tabela spine

df_spine = pd.read_sql("SELECT * FROM tb_spine", connection)
df_spine.head()

/tmp/ipykernel_31659/1385309256.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df_spine = pd.read_sql("SELECT * FROM tb_spine", connection)

	ID	SAFRA_REF	Target
0	4	2023-02-01	1
1	5	2023-02-01	0
2	6	2023-02-01	0
3	7	2023-02-01	0
4	10	2023-02-01	0

Initial view of the variables table (Visão inicial da tabela de variáveis)

(EN) The table tb_feat contains 4 variables, two numeric and two categorical. Note that there are more unique IDs and dates available in this than in the spine table, as is often the case.

df_data = pd.read_sql("SELECT * FROM tb_feat", connection)
df_data.head()

/tmp/ipykernel_31659/548342132.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df_data = pd.read_sql("SELECT * FROM tb_feat", connection)

	ID	SAFRA	FEAT_NUM1	FEAT_NUM2	FEAT_CAT1	FEAT_CAT2
0	1	2023-01-01	73	23	B	B
1	3	2023-01-01	15	1	B	B
2	5	2023-01-01	75	71	A	A
3	7	2023-01-01	73	82	B	C
4	9	2023-01-01	61	8	C	B

Creation of numerical variables (Criação de variáveis numéricas)

(EN) The function mysql_create_query_num() creates a query text for generating variables with the sum, minimum, maximum, and average operations on the variables listed in feat_num_lista and within the time window listed in list_window.

(PT) A função mysql_create_query_num() cria um texto com a query para a criação de variáveis com as operações soma, mínimo, máximo e média das variáveis listadas em feat_num_lista e com a janela de tempo listada em list_window.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
query_final_num_mysql = mysql_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra)

Complete query creation with no saved file.

df_num_mysql = pd.read_sql(query_final_num_mysql, connection)
df_num_mysql.head()

/tmp/ipykernel_31659/3114881597.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df_num_mysql = pd.read_sql(query_final_num_mysql, connection)

	ID	SAFRA_REF	FEAT_NUM1_SUM_1M	FEAT_NUM1_MIN_1M	FEAT_NUM1_MAX_1M	FEAT_NUM1_AVG_1M	FEAT_NUM2_SUM_1M	FEAT_NUM2_MIN_1M	FEAT_NUM2_MAX_1M	FEAT_NUM2_AVG_1M	FEAT_NUM1_SUM_2M	FEAT_NUM1_MIN_2M	FEAT_NUM1_MAX_2M	FEAT_NUM1_AVG_2M	FEAT_NUM2_SUM_2M	FEAT_NUM2_MIN_2M	FEAT_NUM2_MAX_2M	FEAT_NUM2_AVG_2M	FEAT_NUM1_SUM_3M	FEAT_NUM1_MIN_3M	FEAT_NUM1_MAX_3M	FEAT_NUM1_AVG_3M	FEAT_NUM2_SUM_3M	FEAT_NUM2_MIN_3M	FEAT_NUM2_MAX_3M	FEAT_NUM2_AVG_3M
0	4	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	5	2023-02-01	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0
2	6	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	7	2023-02-01	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0
4	10	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

(EN) We can use the file_name parameter with a name in the format ‘table.sql’ to save the creation query in a file in the same folder where it is executed.

(PT) Podemos utilizar o parâmetro file_name com um nome no formato “table.sql” para salvar query de criação em um arquivo na mesma pasta que é executado.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query_final_num_mysql = mysql_create_query_num(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=connection)

ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ']
-- Create table with the name nome_da_tabela_criada
-- Criar a tabela nome_da_' at line 1

(EN) To verify the creation, let’s check the database table using the name of the created table:

(PT) Para conferir a criação, vamos consultar o banco de dados com o nome da tabela que foi criada:

query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()

/tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df = pd.read_sql(query, connection)

	ID	SAFRA_REF	FEAT_NUM1_SUM_1M	FEAT_NUM1_MIN_1M	FEAT_NUM1_MAX_1M	FEAT_NUM1_AVG_1M	FEAT_NUM2_SUM_1M	FEAT_NUM2_MIN_1M	FEAT_NUM2_MAX_1M	FEAT_NUM2_AVG_1M	FEAT_NUM1_SUM_2M	FEAT_NUM1_MIN_2M	FEAT_NUM1_MAX_2M	FEAT_NUM1_AVG_2M	FEAT_NUM2_SUM_2M	FEAT_NUM2_MIN_2M	FEAT_NUM2_MAX_2M	FEAT_NUM2_AVG_2M	FEAT_NUM1_SUM_3M	FEAT_NUM1_MIN_3M	FEAT_NUM1_MAX_3M	FEAT_NUM1_AVG_3M	FEAT_NUM2_SUM_3M	FEAT_NUM2_MIN_3M	FEAT_NUM2_MAX_3M	FEAT_NUM2_AVG_3M
0	4	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	5	2023-02-01	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0
2	6	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	7	2023-02-01	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0
4	10	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Creation of categorical variables (Criação de variáveis categóricas)

(EN) The function mysql_create_query_cat() generates a query text for creating variables with the mode of each variable listed in feat_num_lista within the time window provided in list_window.

(PT) A função mysql_create_query_cat() cria um texto com a query para a criação de variáveis com a moda de cada uma das variáveis listadas em feat_num_lista na janela de tempo fornecida em list_window.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]
query_final_cat_mysql = mysql_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)

Complete query creation with no saved file.

df_cat_sqlite_mysql = pd.read_sql(query_final_cat_mysql, connection)
df_cat_sqlite_mysql.head()

/tmp/ipykernel_1661/827211967.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df_cat_sqlite_mysql = pd.read_sql(query_final_cat_mysql, connection)

	ID	SAFRA_REF	FEAT_CAT1_MODA_1M	FEAT_CAT2_MODA_1M	FEAT_CAT1_MODA_2M	FEAT_CAT2_MODA_2M	FEAT_CAT1_MODA_3M	FEAT_CAT2_MODA_3M
0	4	2023-02-01	None	None	None	None	None	None
1	5	2023-02-01	A	A	A	A	A	A
2	6	2023-02-01	None	None	None	None	None	None
3	7	2023-02-01	B	C	B	C	B	C
4	10	2023-02-01	None	None	None	None	None	None

(EN) The function sqlite_create_query_cat() also allows saving a file with the created query and creating the table directly in the database. Here’s an example:

(PT) A função sqlite_create_query_cat() também permite salvar um arquivo com a query criada e criar a tabela diretamente no banco de dados. Segue um exemplo:

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1', 'FEAT_CAT2']
list_window = [1, 3, 6]
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query_final_cat_mysql = mysql_create_query_cat(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra, file_name=file_name, status=status, table_name=table_name, conn=connection)

Complete query creation with table.sql saved file

(EN) To verify the creation, let’s check the database table using the name of the created table:

(PT) Para conferir a criação, vamos consultar o banco de dados com o nome da tabela que foi criada:

query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()

/tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df = pd.read_sql(query, connection)

	ID	SAFRA_REF	FEAT_CAT1_MODA_1M	FEAT_CAT2_MODA_1M	FEAT_CAT1_MODA_3M	FEAT_CAT2_MODA_3M	FEAT_CAT1_MODA_6M	FEAT_CAT2_MODA_6M
0	4	2023-02-01	None	None	None	None	None	None
1	5	2023-02-01	A	A	A	A	A	A
2	6	2023-02-01	None	None	None	None	None	None
3	7	2023-02-01	B	C	B	C	B	C
4	10	2023-02-01	None	None	None	None	None	None

Creation of aggregated variables (Criação de variáveis agregadas)

tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'

query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)

Complete query creation with no saved file.

df_mysql_agregada = pd.read_sql(query, connection)
df_mysql_agregada.head()

/tmp/ipykernel_1661/789764728.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df_mysql_agregada = pd.read_sql(query, connection)

	ID	SAFRA_REF	SUM_FEAT_NUM1_FEAT_CAT1_A_3M	MAX_FEAT_NUM1_FEAT_CAT1_A_3M	MIN_FEAT_NUM1_FEAT_CAT1_A_3M	AVG_FEAT_NUM1_FEAT_CAT1_A_3M	SUM_FEAT_NUM2_FEAT_CAT1_A_3M	MAX_FEAT_NUM2_FEAT_CAT1_A_3M	MIN_FEAT_NUM2_FEAT_CAT1_A_3M	AVG_FEAT_NUM2_FEAT_CAT1_A_3M	SUM_FEAT_NUM1_FEAT_CAT1_A_6M	MAX_FEAT_NUM1_FEAT_CAT1_A_6M	MIN_FEAT_NUM1_FEAT_CAT1_A_6M	AVG_FEAT_NUM1_FEAT_CAT1_A_6M	SUM_FEAT_NUM2_FEAT_CAT1_A_6M	MAX_FEAT_NUM2_FEAT_CAT1_A_6M	MIN_FEAT_NUM2_FEAT_CAT1_A_6M	AVG_FEAT_NUM2_FEAT_CAT1_A_6M	SUM_FEAT_NUM1_FEAT_CAT1_B_3M	MAX_FEAT_NUM1_FEAT_CAT1_B_3M	MIN_FEAT_NUM1_FEAT_CAT1_B_3M	AVG_FEAT_NUM1_FEAT_CAT1_B_3M	SUM_FEAT_NUM2_FEAT_CAT1_B_3M	MAX_FEAT_NUM2_FEAT_CAT1_B_3M	MIN_FEAT_NUM2_FEAT_CAT1_B_3M	AVG_FEAT_NUM2_FEAT_CAT1_B_3M	SUM_FEAT_NUM1_FEAT_CAT1_B_6M	MAX_FEAT_NUM1_FEAT_CAT1_B_6M	MIN_FEAT_NUM1_FEAT_CAT1_B_6M	AVG_FEAT_NUM1_FEAT_CAT1_B_6M	SUM_FEAT_NUM2_FEAT_CAT1_B_6M	MAX_FEAT_NUM2_FEAT_CAT1_B_6M	MIN_FEAT_NUM2_FEAT_CAT1_B_6M	AVG_FEAT_NUM2_FEAT_CAT1_B_6M
0	4	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	5	2023-02-01	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	6	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	7	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0
4	10	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT2'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'

query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)

Complete query creation with no saved file.

df_mysql_agregada = pd.read_sql(query, connection)
df_mysql_agregada.head()

/tmp/ipykernel_1661/789764728.py:3: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df_mysql_agregada = pd.read_sql(query, connection)

	ID	SAFRA_REF	SUM_FEAT_NUM1_FEAT_CAT2_B_3M	MAX_FEAT_NUM1_FEAT_CAT2_B_3M	MIN_FEAT_NUM1_FEAT_CAT2_B_3M	AVG_FEAT_NUM1_FEAT_CAT2_B_3M	SUM_FEAT_NUM2_FEAT_CAT2_B_3M	MAX_FEAT_NUM2_FEAT_CAT2_B_3M	MIN_FEAT_NUM2_FEAT_CAT2_B_3M	AVG_FEAT_NUM2_FEAT_CAT2_B_3M	SUM_FEAT_NUM1_FEAT_CAT2_B_6M	MAX_FEAT_NUM1_FEAT_CAT2_B_6M	MIN_FEAT_NUM1_FEAT_CAT2_B_6M	AVG_FEAT_NUM1_FEAT_CAT2_B_6M	SUM_FEAT_NUM2_FEAT_CAT2_B_6M	MAX_FEAT_NUM2_FEAT_CAT2_B_6M	MIN_FEAT_NUM2_FEAT_CAT2_B_6M	AVG_FEAT_NUM2_FEAT_CAT2_B_6M	SUM_FEAT_NUM1_FEAT_CAT2_C_3M	MAX_FEAT_NUM1_FEAT_CAT2_C_3M	MIN_FEAT_NUM1_FEAT_CAT2_C_3M	AVG_FEAT_NUM1_FEAT_CAT2_C_3M	SUM_FEAT_NUM2_FEAT_CAT2_C_3M	MAX_FEAT_NUM2_FEAT_CAT2_C_3M	MIN_FEAT_NUM2_FEAT_CAT2_C_3M	AVG_FEAT_NUM2_FEAT_CAT2_C_3M	SUM_FEAT_NUM1_FEAT_CAT2_C_6M	MAX_FEAT_NUM1_FEAT_CAT2_C_6M	MIN_FEAT_NUM1_FEAT_CAT2_C_6M	AVG_FEAT_NUM1_FEAT_CAT2_C_6M	SUM_FEAT_NUM2_FEAT_CAT2_C_6M	MAX_FEAT_NUM2_FEAT_CAT2_C_6M	MIN_FEAT_NUM2_FEAT_CAT2_C_6M	AVG_FEAT_NUM2_FEAT_CAT2_C_6M
0	4	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	5	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	6	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	7	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0
4	10	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

(EN) Here’s an example of saving a file with the created query and directly creating the table in the database:

(PT) Segue um exemplo para salvar um arquivo com a query criada e criar a tabela diretamente no banco de dados:

tb_publico = 'tb_spine'
list_window = [3, 6]
lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['A', 'B']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
status = True
table_name = 'nome_da_tabela_criada'
file_name = 'table.sql'

query = mysql_create_query_agregada(tb_publico, tb_feat, list_window, lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador, file_name=file_name, status=status, table_name=table_name, conn=connection)

Complete query creation with table.sql saved file

(EN) To verify the creation, let’s check the database table using the name of the created table:

(PT) Para conferir a criação, vamos consultar o banco de dados com o nome da tabela que foi criada:

query=f"""SELECT  * from {table_name}"""
df = pd.read_sql(query, connection)
df.head()

/tmp/ipykernel_1661/3805114983.py:4: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
  df = pd.read_sql(query, connection)

	ID	SAFRA_REF	SUM_FEAT_NUM1_FEAT_CAT1_A_3M	MAX_FEAT_NUM1_FEAT_CAT1_A_3M	MIN_FEAT_NUM1_FEAT_CAT1_A_3M	AVG_FEAT_NUM1_FEAT_CAT1_A_3M	SUM_FEAT_NUM2_FEAT_CAT1_A_3M	MAX_FEAT_NUM2_FEAT_CAT1_A_3M	MIN_FEAT_NUM2_FEAT_CAT1_A_3M	AVG_FEAT_NUM2_FEAT_CAT1_A_3M	SUM_FEAT_NUM1_FEAT_CAT1_A_6M	MAX_FEAT_NUM1_FEAT_CAT1_A_6M	MIN_FEAT_NUM1_FEAT_CAT1_A_6M	AVG_FEAT_NUM1_FEAT_CAT1_A_6M	SUM_FEAT_NUM2_FEAT_CAT1_A_6M	MAX_FEAT_NUM2_FEAT_CAT1_A_6M	MIN_FEAT_NUM2_FEAT_CAT1_A_6M	AVG_FEAT_NUM2_FEAT_CAT1_A_6M	SUM_FEAT_NUM1_FEAT_CAT1_B_3M	MAX_FEAT_NUM1_FEAT_CAT1_B_3M	MIN_FEAT_NUM1_FEAT_CAT1_B_3M	AVG_FEAT_NUM1_FEAT_CAT1_B_3M	SUM_FEAT_NUM2_FEAT_CAT1_B_3M	MAX_FEAT_NUM2_FEAT_CAT1_B_3M	MIN_FEAT_NUM2_FEAT_CAT1_B_3M	AVG_FEAT_NUM2_FEAT_CAT1_B_3M	SUM_FEAT_NUM1_FEAT_CAT1_B_6M	MAX_FEAT_NUM1_FEAT_CAT1_B_6M	MIN_FEAT_NUM1_FEAT_CAT1_B_6M	AVG_FEAT_NUM1_FEAT_CAT1_B_6M	SUM_FEAT_NUM2_FEAT_CAT1_B_6M	MAX_FEAT_NUM2_FEAT_CAT1_B_6M	MIN_FEAT_NUM2_FEAT_CAT1_B_6M	AVG_FEAT_NUM2_FEAT_CAT1_B_6M
0	4	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	5	2023-02-01	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0	75.0	75.0	75.0	75.0	71.0	71.0	71.0	71.0	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	6	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	7	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0	73.0	73.0	73.0	73.0	82.0	82.0	82.0	82.0
4	10	2023-02-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

3. Snowflake

Criação de variáveis numéricas

(EN) The function snow_create_query_num() creates a query text for generating variables with the sum, minimum, maximum, and average operations on the variables listed in feat_num_lista and within the time window listed in list_window.

(PT) A função snow_create_query_num() cria um texto com a query para a criação de variáveis com as operações soma, mínimo, máximo e média das variáveis listadas em feat_num_lista e com a janela de tempo listada em list_window.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]

query_final_num_snow = snow_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra)

Complete query creation with no saved file.

Podemos salvar a consulta da query em um arquivo também:

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_NUM1','FEAT_NUM2']
list_window = [1,2,3]
file_name = 'table.sql'

query_final_num_snow = snow_create_query_num(tb_publico, tb_feat, list_window,feat_num_lista, id, safra_ref, safra, file_name)

Complete query creation with table.sql saved file

Criação de variáveis categóricas

(EN) The function mysql_create_query_cat() generates a query text for creating variables with the mode of each variable listed in feat_num_lista within the time window provided in list_window.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]

query_final_cat_snow = snow_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra)

Complete query creation with no saved file.

tb_publico = 'tb_spine'
tb_feat = 'tb_feat'
id = 'ID'
safra_ref = 'SAFRA_REF'
safra = 'SAFRA'
feat_num_lista = ['FEAT_CAT1','FEAT_CAT2']
list_window = [1,2,3]
file_name = 'table.sql'

query_final_cat_snow = snow_create_query_cat(tb_publico, tb_feat, list_window, feat_num_lista, id, safra_ref, safra, file_name)

Complete query creation with table.sql saved file

Criação de variáveis agregadas

lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
janelas = [1, 2, 3]
tb_publico = 'tb_spine'


query_final_cat_agre_snow = snow_create_query_agregada(tb_publico, tb_feat, janelas,  lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador)

Complete query creation with no saved file.

(EN) We can also save the query to a file:

(PT) Podemos também salvar a consulta em um arquivo:

lista_feat_num = ['FEAT_NUM1', 'FEAT_NUM2']
feat_cat = 'FEAT_CAT1'
lista_valor_agragador = ['B', 'C']
id = 'ID'
safra_ref = 'SAFRA_REF'
tb_feat = 'tb_feat'
safra = 'SAFRA'
janelas = [1, 2, 3]
tb_publico = 'tb_spine'
file_name = 'table.sql'

query_final_cat_agre_snow = snow_create_query_agregada(tb_publico, tb_feat, janelas,  lista_feat_num, id, safra_ref, safra, feat_cat, lista_valor_agragador, file_name)

Complete query creation with table.sql saved file

Project details

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: Apache Software License
Natural Language
- English
Programming Language

Release history Release notifications | RSS feed

This version

0.1.1

Jun 16, 2024

0.0.1

Feb 15, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

featsql-0.1.1.tar.gz (32.6 kB view hashes)

Uploaded Jun 16, 2024 Source

Built Distribution

featsql-0.1.1-py3-none-any.whl (22.1 kB view hashes)

Uploaded Jun 16, 2024 Python 3

Hashes for featsql-0.1.1.tar.gz

Hashes for featsql-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`d83ece53fdc9004db9ffb42fd473368afb4a286ff9bcf811b7c104c410c05756`
MD5	`421f41cf93e980833cc29037175e5bc0`
BLAKE2b-256	`940d8e81f47909c4a062d0b097229e215cc3c26be010d55d423aaee1f9cce2da`

Hashes for featsql-0.1.1-py3-none-any.whl

Hashes for featsql-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9df93b161012dd1ec738e5e412484df593a48ce6e0f5a1555a41b24c2c5c7dcf`
MD5	`9d83728da6c3e10ed5793ac57abae49b`
BLAKE2b-256	`4fd0117c037676dd6626ff2ff843860fdb677a91a098335c09df3f3387b0704c`