Using machine learning (ML) models to predict controller downtime and improve the efficiency of IT operations.
In the retail sector, keeping POS controllers healthy has become essential, especially for organizations operating at a large scale, because these controllers carry data from the store registers to mainframe computers. Controller downtime can be particularly costly: it disrupts staff productivity and causes the loss of data that is crucial to business growth and revenue.
To mitigate this risk, one of Confiz’s biggest Fortune 10 customers needed a solution that could predict POS controller downtime, so that data would keep moving smoothly and securely from source to destination.
Under Confiz’s leadership, a team of highly skilled data scientists, software engineers and certified architects was brought together and proposed applying ‘Artificial Intelligence for IT Operations (AIOps)’ techniques to POS controller downtime prediction. The goal was to predict downtime at least one hour in advance, giving the IT department enough time to proactively resolve any network issues.
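The one-hour lead-time goal can be turned into training labels for a classifier. The following is only a minimal sketch of one way to do that, not the team's actual method: the function name `label_events`, the one-hour `HORIZON` window width and the sample timestamps are all assumptions made for illustration. An event is labeled positive when a downtime begins at least one hour, and less than two hours, after it.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

LEAD_TIME = timedelta(hours=1)  # minimum warning time stated in the text
HORIZON = timedelta(hours=1)    # assumed width of the prediction window


def label_events(event_times, downtime_starts, lead=LEAD_TIME, horizon=HORIZON):
    """Label each event 1 if a downtime starts in [t + lead, t + lead + horizon), else 0."""
    starts = sorted(downtime_starts)
    labels = []
    for t in event_times:
        window_start = t + lead
        window_end = window_start + horizon
        # Index of the first downtime that starts at or after window_start.
        i = bisect_left(starts, window_start)
        labels.append(1 if i < len(starts) and starts[i] < window_end else 0)
    return labels


# Hypothetical sample data: three log events and one downtime at 09:45.
base = datetime(2024, 1, 1, 8, 0)
events = [base, base + timedelta(minutes=30), base + timedelta(hours=3)]
downtimes = [base + timedelta(hours=1, minutes=45)]
print(label_events(events, downtimes))  # [1, 1, 0]
```

Labeling against a window that starts one hour ahead means the model is only rewarded for warnings that arrive early enough for the IT department to act on them.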
A large volume of event logs (3-4 GB per day) was exported to Hive on Hadoop for better performance during real-time ML model training. Several algorithms, including Support Vector Machine, Logistic Regression and Random Forest, were evaluated to improve prediction accuracy. To build the modeling pipeline, Kafka was used as the messaging layer for scalability and fault tolerance, Spark MLlib was used for high-performance model serving, and Databricks Slack was used to trigger email alerts. The resulting solution successfully predicted POS controller downtime, empowering IT operations to respond proactively.
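The flow from incoming messages to scoring and alerting can be sketched in a few lines. This is an illustrative stand-in under stated assumptions, not the production pipeline: a plain Python list replaces the Kafka topic, the placeholder `error_rate_model` replaces the trained Spark MLlib model, and a callback replaces the alerting service. The message fields `controller_id` and `is_error`, the threshold and the window size are all hypothetical.

```python
from collections import defaultdict, deque

ALERT_THRESHOLD = 0.8  # hypothetical score cutoff for raising an alert
WINDOW = 5             # hypothetical number of recent events kept per controller


def error_rate_model(recent_events):
    """Placeholder scorer: fraction of recent events flagged as errors."""
    return sum(e["is_error"] for e in recent_events) / len(recent_events)


def serve(messages, model=error_rate_model, send_alert=print):
    """Score each controller on its recent events; alert once per controller."""
    history = defaultdict(lambda: deque(maxlen=WINDOW))
    alerted = set()
    for msg in messages:
        cid = msg["controller_id"]
        history[cid].append(msg)
        if len(history[cid]) < WINDOW:
            continue  # wait until a full window of events is available
        score = model(history[cid])
        if score >= ALERT_THRESHOLD and cid not in alerted:
            alerted.add(cid)
            send_alert(f"Controller {cid}: predicted downtime risk {score:.2f}")
    return alerted


# Hypothetical stream: controller A logs only errors, controller B logs none.
stream = [{"controller_id": "A", "is_error": True} for _ in range(5)]
stream += [{"controller_id": "B", "is_error": False} for _ in range(5)]
serve(stream)  # prints one alert, for controller A
```

Keeping the scorer and the alert sender as injectable callables mirrors the separation between the messaging, model serving and alerting layers described above, so each stand-in could be swapped for a real component independently.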