
Full workflow for ETL, statistics, and Machine learning modelling of (usually) time-stamped industrial facilities data.

Project description

Industrial Data Science Workflow

Industrial Data Science Workflow: full workflow for ETL, statistics, and machine learning modelling of (usually) time-stamped industrial facilities data.

Beyond monitoring quality and industrial facility systems, the package can be applied to the manipulation, characterization and modelling of other numeric and categorical datasets, as a replacement for traditional tools such as SAS, Minitab and Statistica.

Authors:

  1. Open the terminal and run:

git clone "https://github.com/marcosoares-92/IndustrialDataScienceWorkflow"

to clone all the files (you could also fork the repository).

  2. Navigate to the directory called idsw. You can use the command cd "...\idsw", providing the full idsw path.
  3. Now, in the terminal of your Python environment, run the following from inside the idsw folder:

pip install .

  • Alternatively, run pip install ".\*.tar.gz" in the folder terminal.
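Once installation finishes, a quick check can confirm that Python is able to locate the package. The following is a minimal sketch and not part of idsw: the helper name is_installed is hypothetical, and it relies only on importlib.util.find_spec from the standard library.

```python
from importlib.util import find_spec


def is_installed(package_name):
    """Return True if Python can locate a top-level package with this name."""
    # find_spec returns None when a top-level module cannot be found.
    return find_spec(package_name) is not None


if __name__ == "__main__":
    if is_installed("idsw"):
        print("idsw is available for import.")
    else:
        print("idsw was not found; check that 'pip install .' ran inside the idsw folder.")
```

Note that the check also succeeds when an idsw folder simply sits in the current working directory, since Python searches that location as well.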

After cloning the repository, you can also run the package without installing it:

  1. Copy the whole idsw folder to the working directory where your Python file or Jupyter notebook is saved.
  • There must be an idsw folder in the same directory as the Python file.
  2. In your Python file, run the following command (or run it in a Jupyter notebook cell):

from idsw import *

to import all idsw functions without the alias idsw; or:

import idsw

to import the package with the alias idsw.
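The steps above assume the idsw folder sits next to the Python file. If copying the folder is inconvenient, one possible workaround is to put its parent directory on sys.path before importing. This is only a sketch under that assumption: the helper make_importable is hypothetical and is not provided by idsw.

```python
import sys
from pathlib import Path


def make_importable(package_dir):
    """Put the parent of package_dir on sys.path so the package can be imported."""
    package_dir = Path(package_dir).resolve()
    if not package_dir.is_dir():
        raise FileNotFoundError(f"No such folder: {package_dir}")
    parent = str(package_dir.parent)
    # Avoid adding the same entry twice when the helper is called repeatedly.
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent
```

For instance, calling make_importable("/path/to/idsw") (a placeholder path) and then running import idsw would load the package from that location.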

Alternatively, if you do not want to clone the repository, you may download the file load.py and copy it to the working directory.

  1. After downloading load.py and copying it to the working directory, in your Python environment, run:

    import load

  2. Once this step finishes, you may import the package as:

    from idsw import *

or as:

import idsw

The load.py file runs the following code, which may be copied to your Python environment and run:

class LoadIDSW:

  def __init__(self, timeout = 60):  
    self.cmd_line1 = """git clone https://github.com/marcosoares-92/IndustrialDataScienceWorkflow IndustrialDataScienceWorkflow"""
    self.msg1 = "Cloning IndustrialDataScienceWorkflow to working directory."
    self.cmd_line2 = """mv IndustrialDataScienceWorkflow/idsw ."""
    self.msg2 = "Subdirectory 'idsw' moved to root directory. Now it can be directly imported."
    self.timeout = timeout

  def set_process (self, cmd_line):
    # Start the command as a subprocess, capturing stdout and stderr.
    from subprocess import Popen, PIPE
    proc = Popen(cmd_line.split(" "), stdout = PIPE, stderr = PIPE)
    return proc

  def run_process (self, proc, msg = ''):
    from subprocess import TimeoutExpired
    try:
        output, error = proc.communicate(timeout = self.timeout)
        if len(msg) > 0:
          print (msg)
    except TimeoutExpired:
        # The timeout elapsed: keep waiting until the process finishes.
        output, error = proc.communicate()
    return output, error

  def clone_repo(self):
    self.proc1 = self.set_process (self.cmd_line1)
    self.output1, self.error1 = self.run_process(self.proc1, self.msg1)
    return self

  def move_pkg(self):
    self.proc2 = self.set_process (self.cmd_line2)
    self.output2, self.error2 = self.run_process(self.proc2, self.msg2)
    return self

  def move_pkg_alternative(self):
    import shutil
    source = 'IndustrialDataScienceWorkflow/idsw'  
    destination = '.'
    dest = shutil.move(source, destination)    
    return self

loader = LoadIDSW(timeout = 60)
loader = loader.clone_repo()
loader = loader.move_pkg()

try:
  from idsw import *
except ModuleNotFoundError:
  loader = loader.move_pkg_alternative()

msg = """Package copied to the working directory.
	To import its whole content, run:
	
	    from idsw import *
	"""
print(msg)
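The loader builds its argument lists by splitting each command line on single spaces. That works for the two commands above because neither contains quoted arguments or paths with spaces. As a hedged illustration (not part of load.py, and using a made-up path), the sketch below compares that approach with shlex.split from the standard library, which follows POSIX shell quoting rules.

```python
import shlex

simple = ("git clone https://github.com/marcosoares-92/IndustrialDataScienceWorkflow "
          "IndustrialDataScienceWorkflow")
# Hypothetical command whose source path contains a space.
quoted = 'mv "My Projects/IndustrialDataScienceWorkflow/idsw" .'

# Without quotes or repeated spaces, both approaches give the same arguments.
same_for_simple = simple.split(" ") == shlex.split(simple)
print(same_for_simple)

# str.split cuts the quoted path in two and keeps the quote characters.
naive_args = quoted.split(" ")
print(naive_args)

# shlex.split keeps the quoted path together as a single argument.
safe_args = shlex.split(quoted)
print(safe_args)
```

Either list can be passed to subprocess.Popen; only the second one would hand mv the intended source path.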

History

1.2.0

Fixed

  • Deprecated structures

Added

  • New functionalities added.

Reshape of project design.

  • New division into modules and new names for functions and classes.

Removed

  • Removed support for Python < 3.7

1.2.1

Fixed

  • Setup issues.

1.2.2

Fixed

  • Setup issues: need for rigid and specific versions of the libraries.

1.2.3

Fixed

  • Setup issues.

1.2.4

Fixed

  • Import bugs.
  • Introduced function for Excel writing.

1.2.5

Fixed

  • Bugs when exporting Matplotlib figures.

  • 'quality' argument is no longer supported by plt.savefig function (Matplotlib), so it was removed.

  • This modification was needed for allowing the correct functioning of the steelindustrysimulator, which is based on idsw.

  • Check simulator project on: https://github.com/marcosoares-92/steelindustrysimulator

    • The Ideal Tool for Process Improvement, and Data Collection, Analyzing and Modelling Training.

1.2.6

Fixed

  • Export of figures generated a message with a duplicated extension, like '{new_file_path}.png.png'. Fixed to '{new_file_path}.png'.

1.3.0

Added

  • New functionalities added.

Reshape of project design.

  • New division of functions and classes into corresponding modules.
  • Refactoring of functions and classes to improve code efficiency.
  • Added new pipelines for fetching data and modified the storage of connectors.
  • These include pipelines for fetching table regions in Excel files, even if they are stored in the same tab, and a pipeline for downloading files stored in MS SharePoint.
  • Added ControlVars dataclass to store if the user wants to hide results and plots.

1.3.1

Improved

  • Benford algorithm for fraud and outlier detection.
  • Pipeline for fetching SharePoint and downloading files.

1.4

  • Module datafetch.texts added: functions based on LangChain for extracting texts from PDFs, DOCX, CSV, HTML and TXT and creating a text database.
  • Several bugs fixed, especially in functions based on deprecated Pandas and NumPy structures.

1.4.1

  • Bug fixed in linear regression function.
  • Bug fixed: use of the ~ operator on Python booleans is deprecated, so it was replaced by "not".
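The last fix reflects a difference that is easy to miss on plain Python booleans. As an illustration only (this is not the idsw code): because bool is a subclass of int, ~ performs bitwise inversion on the integer value and so never yields False, while not performs logical negation. The sketch converts to int explicitly, which gives the same value as applying ~ to the bool directly.

```python
flag = True

# Bitwise inversion acts on the integer value 1, giving -2.
bitwise = ~int(flag)

# Logical negation gives the boolean opposite.
logical = not flag

print(bitwise, bool(bitwise))  # the inverted value is a non-zero int, hence truthy
print(logical)
```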
