
A data analysis focused language built with Python

Project description

Extensive Language for Data Analysis (ELDA)

About ELDA

Vision / Purpose

Extensive Language for Data Analysis was born from the vision of a data analysis tool that balances two goals in equal parts: teaching a programming language focused on data analysis, and handling data efficiently enough to perform useful calculations for anyone with a background in data science. The language is meant to help users without a technological background adopt analysis tools, acting as a bridge that leads people with mathematical knowledge but no programming knowledge toward more robust data analysis and manipulation tools.

As a result, two main conclusions can be drawn. First, users are not expected to have any prior programming knowledge, since that knowledge is meant to be acquired by using the language. Second, once users have learned the basics of data-analysis-oriented programming, they can perform calculations with enough depth for the language to be useful when prototyping more robust solutions.

Main objective

The objective of ELDA is to help people with no programming experience who want to perform basic statistical calculations such as the mean, variance, standard deviation, clustering, data classification, and graph plotting. It aims to narrow the gap between people with and without technological experience, such as graduates and students. ELDA belongs to the same category of languages as R, MATLAB, and Octave, which allow matrix manipulation and data mapping. Unlike these more established languages, ELDA does not intend to do everything they do; time series analysis, for example, is beyond the scope of this compiler. The main purpose of the compiler is to be the first step in the transition to those languages.

Quick Reference Manual

Installation

Since ELDA is developed entirely in Python, it is distributed as a Python package and is available through the package manager pip.

To install ELDA, simply run:

pip install elda

This installs all required dependencies and makes the ELDA compiler and virtual machine available through the elda command-line interface. Running a program takes two steps: compilation and execution. To compile a program, run:

elda -c <file>

This will generate an object file with the .eo extension and the same name as your program. To execute it, simply run:

elda -e <compiled_file>

NOTE: Depending on your existing Python setup, the first time elda is called it may take some time to run. This is because matplotlib (which ELDA uses under the hood) caches some files it needs; this happens only once.

Language Examples

An ELDA program is strictly structured: declarations, statements, and returns all follow a set order. For example, in a recursive Fibonacci function, the return value should be stored in a variable and return should be called only once, at the end of the function:

int fibonacci(int x) {
    int return_value;
    if (x == 0) {
        return_value = 0;
    }
    if (x == 1) {
        return_value = 1;
    }
    if (x != 0 and x != 1) {
        return_value = fibonacci(x-1) + fibonacci(x-2);
    }
    return return_value;
}

void main() {
    out(fibonacci(10));
}
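For readers who already know some Python, the same single-exit structure can be sketched as follows. This is only an illustrative translation, not how the ELDA compiler implements functions: like the ELDA version, it stores the result in one variable and returns exactly once, at the end.

```python
def fibonacci(x: int) -> int:
    # Mirror the ELDA structure: compute into one variable, return once at the end.
    return_value = 0
    if x == 0:
        return_value = 0
    if x == 1:
        return_value = 1
    if x != 0 and x != 1:
        return_value = fibonacci(x - 1) + fibonacci(x - 2)
    return return_value


if __name__ == "__main__":
    print(fibonacci(10))  # prints 55
```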

Array handling

To declare an array in ELDA, write its size in square brackets after the type and before the variable name, as in int[5] or int[5][5]. For example, the following program performs basic operations on vectors and matrices:

int[5] vectorA = [1,2,3,4,5];
int[5] vectorB = [5,4,3,2,1];

int[5][5] matrixA = [[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]];
int[5][5] matrixB = [[1,2,3,4,5],[6,7,8,9,10],[11,12,13,14,15],[16,17,18,19,20],[21,22,23,24,25]];

int[5][5] mult;
int[5][5] trans;
int[5] sum;

void print_matrix(int w) {
    string row = "";
    for i with range(0, 5) {
        for j with range(0, 5) {
            if (w == 1) {
                row = row + mult[i][j] + " ";
            } else {
                row = row + trans[i][j] + " ";
            }
        }
        out(row);
        row = "";
    }
}

void print_vector() {
    string row = "";
    for i with range(0, 5) {
        row = row + sum[i] + " ";
    }
    out(row);
    row = "";
}

void multiply_matrixes() {
    for i with range(0, 5) {
        for j with range(0, 5) {
            for k with range(0, 5) {
                mult[i][j] = mult[i][j] + matrixA[i][k] * matrixB[k][j];
            }
        }
    }
    print_matrix(1);
}

void sum_vectors() {
    for i with range(0, 5) {
        sum[i] = vectorA[i] + vectorB[i];
    }
    print_vector();
}

void transpose_matrix() {
    for i with range(0, 5) {
        for j with range(0, 5) {
            trans[j][i] = matrixA[i][j];
        }
    }
    print_matrix(2);
}

void main() {
    out("Matrix multiplication of 5x5");
    multiply_matrixes();

    out("Vector sum");
    sum_vectors();

    out("Transposed matrix of matrixA");
    transpose_matrix();
}
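To check the ELDA program's output by hand, the same three operations can be sketched in plain Python. The function names below are illustrative, not part of ELDA. Note that the ELDA program accumulates into mult[i][j] without initializing it, so it appears to rely on global arrays starting at zero; the sketch makes that zero initialization explicit.

```python
SIZE = 5

vector_a = [1, 2, 3, 4, 5]
vector_b = [5, 4, 3, 2, 1]
# matrixA and matrixB both hold the values 1..25 in row-major order.
matrix_a = [[row * SIZE + col + 1 for col in range(SIZE)] for row in range(SIZE)]
matrix_b = [r[:] for r in matrix_a]


def multiply_matrices(a, b):
    """Triple-loop product, accumulating into a zero-initialized result."""
    n = len(a)
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                result[i][j] += a[i][k] * b[k][j]
    return result


def sum_vectors(u, v):
    """Element-wise sum of two equally sized vectors."""
    return [x + y for x, y in zip(u, v)]


def transpose(m):
    """Swap rows and columns of a square matrix."""
    n = len(m)
    result = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            result[j][i] = m[i][j]
    return result


if __name__ == "__main__":
    for row in multiply_matrices(matrix_a, matrix_b):
        print(" ".join(str(v) for v in row))
    print(" ".join(str(v) for v in sum_vectors(vector_a, vector_b)))
    for row in transpose(matrix_a):
        print(" ".join(str(v) for v in row))
```

The first row of the product, for instance, starts with 1·1 + 2·6 + 3·11 + 4·16 + 5·21 = 215.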

Lastly, ELDA comes with several analysis functions built in. These functions can be called from anywhere in a program, and since they are part of the language, no module or import is required. The following example uses them to summarize two data sets, fit regressions, plot the data, and find cluster centers:

int[50] x = [4, 5, 7, 8, 11, 14, 16, 18, 19, 20, 25, 27, 28, 33, 34, 35, 37, 38, 41, 43, 44, 45, 48, 49, 50, 52, 53, 55, 56, 58, 63, 64, 66, 67, 71, 73, 74, 76, 79, 81, 83, 84, 85, 86, 87, 90, 92, 94, 96, 100];
int[50] y = [2, 4, 5, 6, 10, 11, 12, 13, 15, 17, 18, 19, 20, 21, 22, 23, 24, 26, 27, 34, 35, 37, 40, 41, 45, 46, 50, 51, 52, 54, 57, 59, 60, 61, 64, 65, 66, 69, 70, 78, 82, 83, 85, 86, 91, 92, 95, 97, 98, 99];

int[50] reg_x;
int[50] reg_y;

int[50][2] xy;

void display_data(int v) {
	if (v == 1) {
		out("Data for x");
		out("Min of x: " + min(x));
		out("Max of x: " + max(x));
		out("Mean of x: " + mean(x));
		out("Median of x: " + median(x));
		out("Std deviation of x: " + std(x));
		out("Variance of x: " + var(x));
	} else {
		out("Data for y");
		out("Min of y: " + min(y));
		out("Max of y: " + max(y));
		out("Mean of y: " + mean(y));
		out("Median of y: " + median(y));
		out("Std deviation of y: " + std(y));
		out("Variance of y: " + var(y));
	}
}

void compute_linear_regression() {
	float[2] reg_params = linear_regression(x, y);
	int val_x = 2;
	string res = "Linear function: " + reg_params[0];
	res = res + "x + " + reg_params[1];

	out(res);
	for i with range(0, 50) {
		reg_x[i] = val_x;
		reg_y[i] = reg_params[0] * val_x + reg_params[1];
		val_x = val_x + 2;
	}
	graph(reg_x, reg_y, "plot");
}

void compute_logistic_regression() {
	float[2] reg_params = logistic_regression(x, y);

	out("Logistic Regression parameters: ");
	out(reg_params[0]);
	out(reg_params[1]);
}

void populate_xy() {
	for i with range(0, size(x)) {
		xy[i][0] = x[i];
		xy[i][1] = y[i];
	}
}

void compute_kmeans() {
	float[2][2] centers = k_means(2, xy);
	float[2] val_x;
	float[2] val_y;

	val_x[0] = centers[0][0];
	val_y[0] = centers[0][1];
	val_x[1] = centers[1][0];
	val_y[1] = centers[1][1];

	out("First center at: ");
	out("X: " + val_x[0]);
	out("Y: " + val_y[0]);

	out("Second center at: ");
	out("X: " + val_x[1]);
	out("Y: " + val_y[1]);

	graph(val_x, val_y, "scatter");
}

void main() {
	display_data(1);
	out("----------------------");
	display_data(2);

	graph(x, y, "scatter");

	compute_linear_regression();
	compute_logistic_regression();

	populate_xy();
	compute_kmeans();
}
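The description does not say which algorithms back linear_regression and k_means. Assuming linear_regression is an ordinary least-squares fit and k_means is Lloyd's algorithm, the Python sketch below approximates what the example computes. The [slope, intercept] result order follows how the example uses reg_params (reg_params[0] * val_x + reg_params[1]); seeding k-means with the first k points and capping the iterations are illustrative choices, not documented ELDA behavior.

```python
def linear_regression(xs, ys):
    """Ordinary least squares; returns [slope, intercept] (assumed order)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return [slope, mean_y - slope * mean_x]


def k_means(k, points, max_iterations=100):
    """Lloyd's algorithm on [x, y] pairs, seeded with the first k points."""
    centers = [list(map(float, p)) for p in points[:k]]
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for px, py in points:
            # Assign each point to its nearest center (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda c: (px - centers[c][0]) ** 2 + (py - centers[c][1]) ** 2,
            )
            clusters[nearest].append((px, py))
        new_centers = []
        for c, members in enumerate(clusters):
            if members:
                # Move the center to the mean of its assigned points.
                new_centers.append([
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers
```

As in the ELDA example, each returned center is an [x, y] pair, so centers[i][0] is the x coordinate and centers[i][1] the y coordinate.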

Available Special Functions

  • mean(arr) - Compute the mean of an array of data

    • arr: a one dimensional array.
    • returns: a float with the mean of the array.
  • min(arr) - Get the minimum value of an array

    • arr: a one dimensional array.
    • returns: a float with the minimum value of the array.
  • max(arr) - Get the maximum value of an array

    • arr: a one dimensional array.
    • returns: a float with the maximum value of the array.
  • median(arr) - Get the median value of an array

    • arr: a one dimensional array.
    • returns: a float with the median value of the array.
  • var(arr) - Get the variance of an array

    • arr: a one dimensional array.
    • returns: a float with the variance value of the array.
  • std(arr) - Get the standard deviation of an array

    • arr: a one dimensional array.
    • returns: a float with the standard deviation value of the array.
  • linear_regression(arr_x, arr_y) - Get the linear regression parameters given two arrays representing x values and y values.

    • arr_x: a one dimensional array.
    • arr_y: a one dimensional array.
    • returns: a two-element array with the slope (index 0) and the intercept (index 1) of the fitted line.
  • logistic_regression(arr_x, arr_y) - Get the logistic regression parameters given two arrays representing x values and y values.

    • arr_x: a one dimensional array.
    • arr_y: a one dimensional array.
    • returns: an array with the regression parameters.
  • k_means(k, arr_xy) - Get the k cluster centers given one array representing x and y value pairs.

    • k: an int representing the number of clusters.
    • arr_xy: a two dimensional array representing x and y value pairs. Ex: [[1, 2], [2, 3], [4, 5]].
    • returns: a two dimensional array with one [x, y] coordinate pair per cluster center.
  • size(arr) - Get the size of an array

    • arr: a one dimensional or two dimensional array.
    • returns: an int with the size of the array.
  • type(var) - Get the type of the variable as a string

    • var: a variable; it cannot be an array.
    • returns: a string with the type name.
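To sanity-check the summary functions above, rough Python equivalents can be built from the standard statistics module. The description does not say whether var and std compute the population or the sample statistic; the sketch assumes the population versions (pvariance, pstdev). Swap in statistics.variance and statistics.stdev if ELDA turns out to use the sample (n - 1) formulas. The describe helper is hypothetical and not part of ELDA.

```python
import statistics


def describe(arr):
    """Hypothetical helper: rough Python equivalents of ELDA's summary functions."""
    return {
        "min": float(min(arr)),
        "max": float(max(arr)),
        "mean": float(statistics.mean(arr)),
        "median": float(statistics.median(arr)),
        # Population variance and standard deviation (an assumption, see above).
        "var": float(statistics.pvariance(arr)),
        "std": float(statistics.pstdev(arr)),
        "size": len(arr),
    }


if __name__ == "__main__":
    print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```

For the sample data above, the mean is 5, the median is 4.5, the population variance is 4, and the population standard deviation is 2.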



Download files


Source Distribution

elda-0.1.0.tar.gz (34.3 kB)

Uploaded Source

Built Distribution


elda-0.1.0-py3-none-any.whl (34.6 kB)

Uploaded Python 3

File details

Details for the file elda-0.1.0.tar.gz.

File metadata

  • Download URL: elda-0.1.0.tar.gz
  • Size: 34.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.5

File hashes

Hashes for elda-0.1.0.tar.gz
Algorithm Hash digest
SHA256 d26ac7a66aa476900b1e438981af679eb681a0b6e73fb3baa3a1bfe5295a6c2d
MD5 4e1bcd4b4326b8ac1479db1c296e107e
BLAKE2b-256 262dd88a26474ac0f11b5aa7a7d92b1fb7aaa5d1b3c0ff4c93a24061f504cfdd


File details

Details for the file elda-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: elda-0.1.0-py3-none-any.whl
  • Size: 34.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.5

File hashes

Hashes for elda-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 abbd49ccbbf3ac54c41afc23409b21e85e8b44ebf0245af3486e75b25ee4f244
MD5 21dbad2fd30d41ba943b51ca85fcec84
BLAKE2b-256 dee0e598cddc88ddb0710b5bcdbb7ba6ce639a2aac0fd4fe33499eaeb380b502
