Data Analytics

About

At MKOPSC, extensive research has been conducted on using ‘Machine Learning’ and ‘Artificial Intelligence’ to handle ‘Big Data’ for process safety applications. The ‘Big Data’ collected in databases contains far more data and information than a person can process manually. This research track has focused on two major areas: identifying weak signals in operating facilities and learning from past incident records.

Identification of weak signals through Big Data analytics

At MKOPSC, the vast amounts of data collected in chemical facilities are analyzed. These data come both from equipment and from other information sources, such as records of operator training, competencies, hazard communication, and contractors. Deficiencies in certain domains or entities can culminate in a major issue that leads to a large-scale disaster, yet the sheer volume of data means these deficiencies cannot always be identified proactively. A systems approach is therefore being taken to analyze such Big Data for hazard identification and risk analysis.

System-based approaches such as the Functional Resonance Analysis Method (FRAM) can model the complex interactions among system variables and the performance variability that has the potential to develop into a hazardous scenario. However, these approaches rely heavily on qualitative analysis and expert elicitation. To overcome these limitations, a FRAM-based framework has been developed that integrates a human performance model, an equipment performance model, and a first-principles chemical process model into a hybrid simulator to aid hazard analysis in the process industries. By aggregating these mathematical models, the simulator reproduces the performance variability of each function within the complex system, so it can be used to simulate potential hazardous situations and identify the interactions that produce them. The ultimate goal is to enable operators to identify, through analysis of their own data, the weak signals indicating that a process safety incident may be imminent.
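The idea of aggregating coupled performance models can be illustrated with a small Monte Carlo sketch. This is not the MKOPSC hybrid simulator: the three functions below (an operator response, a cooling valve, and a linear temperature-rise process) and every number in them (`T_START`, `T_LIMIT`, `BACKUP_DELAY`, the delay and heating-rate distributions, the 5% valve failure rate, the 7-minute "slow response" threshold) are hypothetical assumptions chosen only to show how variability in one function propagates to another and how the runs that exceed a safety limit can be traced back to the interactions that caused them.

```python
import random

# All constants are hypothetical, for illustration only.
T_START = 80.0       # °C, assumed reactor temperature when the disturbance starts
T_LIMIT = 110.0      # °C, assumed safe operating limit
BACKUP_DELAY = 10.0  # min, assumed extra time for backup cooling if the valve fails


def operator_function(rng):
    """Hypothetical human performance model: response delay (min) after an alarm."""
    return max(0.0, rng.gauss(5.0, 2.0))


def valve_function(rng):
    """Hypothetical equipment performance model: True if the cooling valve opens on demand."""
    return rng.random() > 0.05


def process_function(rng, time_to_cooling):
    """Toy process model: linear temperature rise until cooling takes effect."""
    heating_rate = max(0.0, rng.gauss(3.0, 0.5))  # °C/min, assumed variability
    return T_START + heating_rate * time_to_cooling


def simulate(n_runs=10_000, seed=42):
    """Return the fraction of runs exceeding T_LIMIT and a tally of contributing couplings."""
    rng = random.Random(seed)
    exceedances = 0
    causes = {"slow_operator": 0, "valve_failure": 0, "combined": 0, "other": 0}
    for _ in range(n_runs):
        delay = operator_function(rng)
        valve_ok = valve_function(rng)
        # Couple the functions: a failed valve adds the backup delay to the operator's delay.
        time_to_cooling = delay if valve_ok else delay + BACKUP_DELAY
        peak = process_function(rng, time_to_cooling)
        if peak > T_LIMIT:
            exceedances += 1
            slow = delay > 7.0  # assumed threshold for a "slow" response (mean + 1 sd)
            if slow and not valve_ok:
                causes["combined"] += 1
            elif slow:
                causes["slow_operator"] += 1
            elif not valve_ok:
                causes["valve_failure"] += 1
            else:
                causes["other"] += 1
    return exceedances / n_runs, causes


if __name__ == "__main__":
    fraction, causes = simulate()
    print(f"runs exceeding limit: {fraction:.1%}", causes)
```

Tallying which couplings were present in each exceedance is one simple way to point at the interactions behind a hazardous outcome; a real framework would replace these toy functions with validated human, equipment, and process models.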

Artificial Intelligence for learning from incidents

The key to learning from past incidents is identifying their underlying causes. Regulatory bodies have developed large databases that gather extensive incident information through mandatory reporting requirements; examples include those maintained by PHMSA and the Bureau of Safety and Environmental Enforcement (BSEE), among others. If properly utilized, such databases can be a key learning resource. When an incident occurs, operators report it by following a set of guidelines outlined by the regulatory body. Many of these guidelines list predefined causes and sub-causes that may or may not reflect the actual underlying causes of the incident. One way to extract the underlying causes is to use the narrative sections of the incident reports. However, the sheer volume and unstructured nature of these narratives make it difficult to generate insights from them.

At MKOPSC, research is underway to apply natural language processing (NLP) and text mining techniques to this resource in order to understand the clustering patterns and causes behind the incidents. Two text analytics methods, K-means clustering and co-occurrence network analysis, have been employed to infer the latent causes of incidents. The co-occurrence network approach can identify the hierarchy of causes for a given type of incident, while K-means clustering can indicate general correlations among the various factors that lead to an incident. Applying these artificial intelligence techniques will allow detailed information to be extracted from large databases and used to retain knowledge from past incidents.
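The two text analytics methods can be sketched on a handful of narratives using only the Python standard library. This is a minimal sketch under stated assumptions, not the MKOPSC pipeline: the four narratives are invented for illustration (they are not PHMSA or BSEE records), the stopword list is assumed, documents are represented as binary bag-of-words vectors rather than any weighting the research may use, and k-means uses Euclidean distance with a deterministic initialization from the first `k` documents.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Assumed stopword list; a real pipeline would use a fuller list plus domain terms.
STOPWORDS = {"the", "a", "an", "of", "on", "in", "to", "and", "at", "by",
             "from", "due", "was", "were", "caused"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords; return sorted unique terms."""
    return sorted({w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS})


def cooccurrence_edges(docs, min_count=2):
    """Co-occurrence network: edge weight = number of narratives containing both terms."""
    counts = Counter()
    for terms in docs:
        counts.update(combinations(terms, 2))  # terms are sorted, so each pair is (a, b) with a < b
    return {pair: n for pair, n in counts.items() if n >= min_count}


def to_vectors(docs):
    """Binary bag-of-words vectors over the shared vocabulary."""
    vocab = sorted({t for terms in docs for t in terms})
    vectors = []
    for terms in docs:
        present = set(terms)
        vectors.append([1.0 if t in present else 0.0 for t in vocab])
    return vectors


def kmeans(vectors, k, iters=20):
    """Plain k-means (Euclidean distance), initialized from the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for each narrative.
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented example narratives for illustration only.
narratives = [
    "External corrosion on buried pipe due to coating failure",
    "Internal corrosion from water accumulation in low point",
    "Coating failure caused external corrosion at girth weld",
    "Water accumulation caused internal corrosion of the line",
]
docs = [tokenize(n) for n in narratives]
edges = cooccurrence_edges(docs)
labels = kmeans(to_vectors(docs), k=2)
```

On this toy input, the heaviest edges link `coating`, `failure`, `external`, and `corrosion` in one group and `water`, `accumulation`, `internal`, and `corrosion` in another, and k-means places the two external-corrosion narratives in one cluster and the two internal-corrosion narratives in the other.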

Artificial Neural Network for incident prediction

To predict the causes and consequences of corrosion in pipelines carrying hazardous liquids, an artificial neural network (ANN) has been built using incident data collected by the Pipeline and Hazardous Materials Safety Administration (PHMSA) of the US Department of Transportation. From the incidents recorded between 2010 and 2019, 70 attributes were selected for their ability to predict corrosion. These attributes serve as the inputs to the ANN, whose hyperparameters were then optimized. The ANN predicts the type of corrosion, total cost of property damage, net material loss, and type of incident (rupture or release) with 60-90% accuracy. To establish the credibility of the ANN, its accuracy is compared against that of another machine learning model.
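The workflow of training an ANN and tuning its hyperparameters on held-out data can be sketched in pure Python. This is a hypothetical illustration, not the MKOPSC model: it uses no PHMSA data, four synthetic features stand in for the 70 selected attributes, the binary label merely mirrors the rupture/release target, and the "hyperparameter optimization" is reduced to a small grid search over the hidden-layer size. The names `make_synthetic`, `TinyMLP`, and `tune_hidden_size` are invented for this sketch.

```python
import math
import random


def make_synthetic(n, seed=0):
    """Synthetic stand-in for incident attributes: label 1 ('rupture') if a weighted sum > 0."""
    rng = random.Random(seed)
    weights = [1.0, -0.5, 0.8, 0.3]  # hypothetical ground-truth weights
    data = []
    while len(data) < n:
        x = [rng.uniform(-1.0, 1.0) for _ in weights]
        score = sum(w * xi for w, xi in zip(weights, x))
        if abs(score) >= 0.2:  # keep a margin so the toy problem is cleanly separable
            data.append((x, 1 if score > 0 else 0))
    return data


class TinyMLP:
    """One tanh hidden layer, sigmoid output, trained by per-sample gradient descent on log-loss."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
        return h, 1.0 / (1.0 + math.exp(-z))

    def fit(self, data, epochs=100, lr=0.1):
        for _ in range(epochs):
            for x, y in data:
                h, p = self._forward(x)
                dz = p - y  # gradient of log-loss w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    dh = dz * self.w2[j] * (1.0 - hj * hj)  # backprop through tanh (uses old w2)
                    self.w2[j] -= lr * dz * hj
                    self.b1[j] -= lr * dh
                    self.w1[j] = [w - lr * dh * xi for w, xi in zip(self.w1[j], x)]
                self.b2 -= lr * dz
        return self

    def predict(self, x):
        return 1 if self._forward(x)[1] >= 0.5 else 0


def accuracy(model, data):
    return sum(model.predict(x) == y for x, y in data) / len(data)


def tune_hidden_size(train, val, candidates=(2, 4, 8)):
    """Grid search: keep the hidden-layer size with the best validation accuracy."""
    best = None
    for n_hidden in candidates:
        model = TinyMLP(n_in=len(train[0][0]), n_hidden=n_hidden).fit(train)
        acc = accuracy(model, val)
        if best is None or acc > best[1]:
            best = (n_hidden, acc, model)
    return best


data = make_synthetic(300)
train, val = data[:200], data[200:]
best_hidden, best_acc, best_model = tune_hidden_size(train, val)
```

To compare against another machine learning model, as the paragraph describes, the competing model would be trained on the same `train` split and scored with the same `accuracy` function on `val`, so that the two accuracy figures are directly comparable.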