Proactive Fault Tolerance Through Cloud Failure Prediction Using Machine Learning

Authors

Bandari, V.

Keywords:

Cloud Computing, Fault Prediction, KNN, Machine Learning, SVM

Abstract

Fault tolerance is a crucial aspect of cloud infrastructure; its primary responsibility is to handle the situations that arise when architectural components fail. A large cloud data center must deliver high service dependability and availability while minimizing the incidence of failures. However, modern large cloud data centers continue to experience significant failure rates owing to a variety of factors, including hardware and software faults, which often lead to task and job failures. To reduce unexpected loss, it is critical to forecast task or job failures with high accuracy before they occur. This research examines the performance of four machine learning (ML) algorithms for forecasting failures in a real-time cloud environment, with the aim of increasing system availability, using real-time data gathered from the Google Cluster Workload Traces 2019. We applied four supervised machine learning classifiers: logistic regression, KNN, SVM, and decision tree. Confusion matrices and ROC curves were used to assess the reliability and robustness of each algorithm. This study will assist cloud service providers in developing a robust fault tolerance design by optimizing device selection, consequently boosting system availability and eliminating unexpected system downtime.
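
The evaluation approach named in the abstract (a classifier scored with a confusion matrix and an ROC-based measure) can be illustrated with a minimal sketch. This is not the study's pipeline: it uses only a toy KNN classifier, synthetic data in place of the Google Cluster Workload Traces 2019, and two hypothetical features (`cpu_usage`, `memory_usage`) that are assumptions made purely for illustration.

```python
import math
import random


def knn_score(train_X, train_y, x, k=5):
    """Fraction of the k nearest training points labelled as failures (1)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return sum(label for _, label in nearest) / k


def confusion_matrix(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 means 'failed'."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    failed task receives a higher score than a random successful one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_synthetic_tasks(n, rng):
    """Hypothetical data: (cpu_usage, memory_usage) pairs in which failed
    tasks skew toward higher usage. Not derived from any real trace."""
    X, y = [], []
    for _ in range(n):
        failed = 1 if rng.random() < 0.3 else 0
        centre = 0.7 if failed else 0.4
        X.append((rng.gauss(centre, 0.1), rng.gauss(centre, 0.1)))
        y.append(failed)
    return X, y


rng = random.Random(0)  # fixed seed so the run is repeatable
train_X, train_y = make_synthetic_tasks(300, rng)
test_X, test_y = make_synthetic_tasks(100, rng)

# Score each test task, then threshold the score to get a hard prediction.
scores = [knn_score(train_X, train_y, x) for x in test_X]
preds = [1 if s >= 0.5 else 0 for s in scores]

tn, fp, fn, tp = confusion_matrix(test_y, preds)
auc = roc_auc(test_y, scores)
print(f"TN={tn} FP={fp} FN={fn} TP={tp} AUC={auc:.3f}")
```

In a failure-prediction setting the false negatives (failures that were not predicted) are usually the costly cell of the confusion matrix, which is one reason to report the full matrix and a threshold-independent measure such as ROC AUC rather than accuracy alone.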

Published

2020-11-20

How to Cite

Bandari, V. (2020). Proactive Fault Tolerance Through Cloud Failure Prediction Using Machine Learning. ResearchBerg Review of Science and Technology, 3(1), 51–65. Retrieved from https://researchberg.com/index.php/rrst/article/view/54