
Manager for Intelligent Knowledge Access (MIKA)

Project description

MIKA (Manager for Intelligent Knowledge Access) is a toolkit intended to assist design-time risk analysis and safety assurance via advanced natural language processing capabilities.

The full documentation is available at: https://nasa.github.io/mika/

State-of-the-art natural language processing (NLP) techniques enable new ways to access safety-relevant knowledge in text-based documents. MIKA packages advanced NLP techniques and uses models trained specifically for engineering applications, so that engineers can better tap into the knowledge in safety reports, accident reports, incident reports, lessons learned documents, and other engineering documents.

To this end, the MIKA open-source toolkit has been developed for the following uses:

  1. Enabling rapid exploration of a set of text-based engineering documents

  2. Analyzing large, unstructured datasets, or exploiting structure in data when it is available (flexibility)

  3. Increasing the value of engineering documents through adding metadata, analyses, and summaries

Key Features

MIKA includes two key capabilities, Knowledge Discovery and Information Retrieval, for exploring text-based repositories. Both use BERT models as a backbone for multiple functions.

Knowledge Discovery (KD) enables the user to extract useful, meaningful information from narrative-based engineering documents. This includes both supervised and unsupervised methods, such as:

  1. A variety of topic modeling methods

  2. Extraction of a Failure Modes and Effects Analysis (FMEA)-style table using custom named-entity recognition

  3. The ability to analyze trends in hazards or failures

Information Retrieval (IR) enables the user to search a set of documents and obtain relevant documents or passages according to their query. This includes:

  1. An information retrieval pipeline using a bi-encoder and cross-encoder with options for users to choose from pretrained or custom models
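The retrieve-then-rerank pattern behind a bi-encoder and cross-encoder pipeline can be sketched in plain Python. This is a toy illustration, not MIKA's API: the names `bi_encode`, `cross_score`, and `toy_search` are hypothetical, and simple bag-of-words functions stand in for the BERT-based models. The bi-encoder stage embeds the query and each document independently and ranks by cosine similarity; the cross-encoder stage then rescores the top candidates by looking at the query and document together.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (stand-in for a real model tokenizer)."""
    return text.lower().split()


def bi_encode(text):
    """Toy 'bi-encoder': embeds one text on its own as a bag-of-words vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cross_score(query, doc):
    """Toy 'cross-encoder': scores the (query, document) pair jointly.

    Here the score is the fraction of query tokens found in the document;
    a real cross-encoder would pass both texts through a single model.
    """
    q_tokens = tokenize(query)
    d_tokens = set(tokenize(doc))
    if not q_tokens:
        return 0.0
    return sum(1 for t in q_tokens if t in d_tokens) / len(q_tokens)


def toy_search(query, docs, top_k=3, top_n=2):
    """Stage 1: keep the top_k documents by bi-encoder similarity.
    Stage 2: rerank those candidates with the cross scorer, return top_n."""
    q_vec = bi_encode(query)
    candidates = sorted(
        docs, key=lambda d: cosine(q_vec, bi_encode(d)), reverse=True
    )[:top_k]
    reranked = sorted(
        candidates, key=lambda d: cross_score(query, d), reverse=True
    )
    return reranked[:top_n]


# Made-up example passages, for illustration only.
docs = [
    "engine fire during takeoff",
    "hydraulic leak found during inspection",
    "battery overheating caused smoke in cabin",
    "engine failure after bird strike",
]
results = toy_search("engine failure", docs)
print(results)
```

The two-stage split is the usual design trade-off: the first stage is cheap because document vectors can be computed once and reused, while the second stage is more expensive per pair and is therefore applied only to a short candidate list.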

Installation

MIKA is available on PyPI and can be installed with:

pip install nasa-mika

Note that some users have had issues with certain MIKA dependencies, such as HDBSCAN. If you encounter an issue installing a dependency via pip, we recommend first installing the dependency using conda prior to installing MIKA, for example:

conda install -c conda-forge hdbscan
pip install nasa-mika

After installing MIKA, initialize NLTK by running the following in Python:

import nltk
nltk.download('words')

Also, download the spaCy transformer model by running the following command:

python -m spacy download en_core_web_trf
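One way to confirm that the installation steps succeeded is to check that each package can be located without importing it. This is a minimal sketch using only the standard library. The helper name `missing_modules` is hypothetical, and the module names assume the packages install as `mika`, `nltk`, `spacy`, and `en_core_web_trf`. It does not check that the NLTK `words` corpus was downloaded.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names assumed from the installation steps in this README.
required = ["mika", "nltk", "spacy", "en_core_web_trf"]
missing = missing_modules(required)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required modules were found.")
```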

Now you can import anything in MIKA:

from mika.kd import FMEA
from mika.kd import Topic_Model_plus
from mika.kd.trend_analysis import *
from mika.kd.NER import *
from mika.ir import search

from mika.utils import Data
from mika.utils.SAFECOM import *
from mika.utils.SAFENET import *
from mika.utils.LLIS import *
from mika.utils.ICS import *

The latest version of MIKA is also available from the NASA GitHub page using:

git clone https://github.com/nasa/mika.git

MIKA includes three custom large language models, which can be found on the NASA Hugging Face page at: https://huggingface.co/NASA-AIML

Examples in MIKA use specific datasets which are NOT included in the software distribution. However, they can be created by following the instructions in the documentation at: https://nasa.github.io/mika/data.html

Prerequisites

MIKA uses Python 3 and has been tested on Python>=3.8. We recommend installing PyTorch via Anaconda first and configuring it for GPU use if desired. If installing via pip, all prerequisites are included.

Alternatively, you can manually clone MIKA and install the requirements. MIKA requires the following packages and their dependencies outlined in requirements.txt:

BERTopic
datasets
gensim
matplotlib
nltk
numpy
octis
pandas
pathlib
pingouin
pkg_resources
pyLDAvis
regex
scikit-learn
scipy
seaborn
sentence-transformers
spacy
symspellpy
tomotopy
torch
transformers
wordcloud

These can be installed with pip.

Additional packages that should be downloaded for optional functions include:

graphviz #(to plot hierarchical topic models)
pickle   #(to save results)
jupyter notebook #(to view examples in the repository)

Support

MIKA is considered research code and is under development to refine features, add new capabilities, and improve workflows. Certain functions may change over time. Please contact the contributors if any bugs or issues are present.

Contributors

Hannah Walsh : Semantic Search capability, Custom Information Retrieval capability, Topic Model Plus, Data utility, Documentation

Sequoia Andrade : FMEA capability, custom NER, Trend Analysis, Topic Model Plus, Data utility, Dataset-specific utilities, Code Review, Documentation

Notices

Copyright © 2023 United States Government as represented by the Administrator of the National Aeronautics and Space Administration. All Rights Reserved.

Disclaimers

No Warranty: THE SUBJECT SOFTWARE IS PROVIDED “AS IS” WITHOUT ANY WARRANTY OF ANY KIND, EITHER EXPRESSED, IMPLIED, OR STATUTORY, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY THAT THE SUBJECT SOFTWARE WILL CONFORM TO SPECIFICATIONS, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR FREEDOM FROM INFRINGEMENT, ANY WARRANTY THAT THE SUBJECT SOFTWARE WILL BE ERROR FREE, OR ANY WARRANTY THAT DOCUMENTATION, IF PROVIDED, WILL CONFORM TO THE SUBJECT SOFTWARE. THIS AGREEMENT DOES NOT, IN ANY MANNER, CONSTITUTE AN ENDORSEMENT BY GOVERNMENT AGENCY OR ANY PRIOR RECIPIENT OF ANY RESULTS, RESULTING DESIGNS, HARDWARE, SOFTWARE PRODUCTS OR ANY OTHER APPLICATIONS RESULTING FROM USE OF THE SUBJECT SOFTWARE. FURTHER, GOVERNMENT AGENCY DISCLAIMS ALL WARRANTIES AND LIABILITIES REGARDING THIRD-PARTY SOFTWARE, IF PRESENT IN THE ORIGINAL SOFTWARE, AND DISTRIBUTES IT “AS IS.”

Waiver and Indemnity: RECIPIENT AGREES TO WAIVE ANY AND ALL CLAIMS AGAINST THE UNITED STATES GOVERNMENT, ITS CONTRACTORS AND SUBCONTRACTORS, AS WELL AS ANY PRIOR RECIPIENT. IF RECIPIENT’S USE OF THE SUBJECT SOFTWARE RESULTS IN ANY LIABILITIES, DEMANDS, DAMAGES, EXPENSES OR LOSSES ARISING FROM SUCH USE, INCLUDING ANY DAMAGES FROM PRODUCTS BASED ON, OR RESULTING FROM, RECIPIENT’S USE OF THE SUBJECT SOFTWARE, RECIPIENT SHALL INDEMNIFY AND HOLD HARMLESS THE UNITED STATES GOVERNMENT, ITS CONTRACTORS AND SUBCONTRACTORS, AS WELL AS ANY PRIOR RECIPIENT, TO THE EXTENT PERMITTED BY LAW. RECIPIENT’S SOLE REMEDY FOR ANY SUCH MATTER SHALL BE THE IMMEDIATE, UNILATERAL TERMINATION OF THIS AGREEMENT.

Download files


Source Distribution

nasa-mika-1.0.3.tar.gz (81.2 kB view details)

Uploaded Source

File details

Details for the file nasa-mika-1.0.3.tar.gz.

File metadata

  • Download URL: nasa-mika-1.0.3.tar.gz
  • Upload date:
  • Size: 81.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.16

File hashes

Hashes for nasa-mika-1.0.3.tar.gz
Algorithm Hash digest
SHA256 e388a909f2799092efb1e248cdb984eb0bb007c05728fb76a24d541738570169
MD5 fce5afb88070a2a9695b3164e99be52c
BLAKE2b-256 ffcc9dc8bcb4765585a5b9e1edc2107458601e3def2f82bdcd1bfa88e0398e04
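The SHA256 digest listed here can be compared against a downloaded archive before installing from it. This is a minimal sketch using only the standard library, assuming the archive has been saved as `nasa-mika-1.0.3.tar.gz` in the current directory; the helper name `sha256_of` is hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for nasa-mika-1.0.3.tar.gz.
EXPECTED_SHA256 = "e388a909f2799092efb1e248cdb984eb0bb007c05728fb76a24d541738570169"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed download location; adjust to wherever the archive was saved.
archive = Path("nasa-mika-1.0.3.tar.gz")
if not archive.exists():
    print("Archive not found:", archive)
elif sha256_of(archive) == EXPECTED_SHA256:
    print("Hash matches.")
else:
    print("Hash mismatch: do not install from this file.")
```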

