IT firms across the globe are partnering with clients on their digital transformation journeys, building complex systems that employ disruptive technologies like Artificial Intelligence and Big Data. But how much are these technologies being used in the day-to-day operations run by these same IT firms? In most cases, the firms continue to take a conservative stand towards technology adoption and follow the traditional reactive, manual approach. Quite a paradox!
IT operations are on the cusp of a major paradigm shift. The status quo is slowly but surely being challenged by IT operations leaders. These organisations are looking at how they can leverage their knowledge of AI technologies such as Machine Learning, Natural Language Processing, Pattern Recognition and Knowledge Management to enhance the way IT operations are currently run. Gartner took note of these trends and coined the term Artificial Intelligence for IT Operations, or AIOps, to describe this shift in approach.
AIOps – What does it really mean?
So, what is AIOps or Artificial Intelligence for IT Operations? Although you might find a multitude of definitions for the term across the internet, let’s try to understand the concept through the definition given by those who coined the term. According to Gartner, Inc., AIOps platforms utilize big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight. AIOps platforms enable the concurrent use of multiple data sources, data collection methods, analytical (real-time and deep) technologies, and presentation technologies.
Fig 1: Gartner’s representation of the AIOps Platform
The basic idea behind AIOps is to use the power of Artificial Intelligence for proactive management of the IT environment, by harnessing the true potential of the data that the IT landscape continuously generates. Traditional approaches to IT operations have always been reactive, relying on deterministic, rule-based automation. AI, combined with Big Data, can automatically assess, learn from and adapt to the hidden insights in the data gathered from the IT environment to deliver intelligent automation. This would significantly enhance service levels and improve the reliability of the managed IT environment, elevating the level of IT operations currently being delivered.
Why now? – Key Drivers for adoption
- Complexity and scale of the IT landscape: With growing digitalization, the complexity of the enterprise IT landscape is also growing. Ever-evolving networks, virtualization, a wide array of devices, unpredictable security threats and a multitude of tools used to manage these systems all make the traditional approach to IT operations ineffective.
- Business-aligned IT delivery: Customer experience is the top priority for today’s businesses, and no enterprise can afford service or application downtime. A slight delay in fixing a problem could mean losing a customer, which translates to revenue loss. So, enterprises are becoming more and more demanding about not only the effectiveness, but also the swiftness of IT operations.
- Exponential growth in data: Today’s enterprise IT landscape is cluttered with a wide variety of tools, generating a deluge of data. Hidden within this data are invaluable insights about the behaviour of the network and systems connected to it. Removing the chaff and identifying patterns would lead to valuable insights for managing the IT environment.
- Siloed IT operations: Operations teams tend to work in silos and, as a result, don’t have the necessary visibility into how systems are correlated and how problems propagate across them. When an issue is reported, each of these teams performs an independent root cause analysis (RCA), wasting valuable time. A holistic view of the landscape, with visibility into how systems are correlated and how data flows across the network, would help in faster and more effective RCA.
- Technology evolution: As much as it adds complexity, technology evolution is also making it possible to handle the resulting challenges in innovative ways. Machine Learning and NLP techniques have evolved and are becoming commonplace, with open-source tools and libraries readily available. Teams can experiment and quickly build innovative solutions that address specific operational areas, or build full-fledged platforms to manage enterprise IT.
AIOps Platform – How does it work?
A typical AIOps platform would have the following key functional layers:
Fig 2: AIOps Platform Architecture
The very foundation of the AIOps platform is data, and so the platform should seamlessly connect to different data sources in the IT landscape and ingest the data that they generate. In a typical enterprise IT environment, the data sources would include the monitoring tools, the application logs, the ITSM tools, the collaboration and social media platforms, etc. These data sources are continuously generating voluminous data, in both structured and unstructured form, and hidden within this data are insights, which would help in the proactive management of the IT environment. So, one of the key elements of AIOps is a Big Data platform for managing the data that is being ingested.
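As a rough illustration of what ingesting from different sources can involve, the sketch below maps two differently shaped inputs (a monitoring alert and an application log line) onto one common event record. Everything in it is an assumption made for illustration: the `Event` fields, the alert keys (`epoch`, `node`, `level`, `summary`) and the log line format are hypothetical and do not describe any particular tool.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical common schema shared by all ingested records."""
    source: str
    timestamp: datetime
    host: str
    severity: str
    message: str


# Assumed log format: "<ISO timestamp> <host> <LEVEL> <free text>"
LOG_PATTERN = re.compile(r"^(\S+)\s+(\S+)\s+(\w+)\s+(.*)$")


def from_monitoring_alert(alert: dict) -> Event:
    """Normalize a structured alert (assumed keys) into an Event."""
    return Event(
        source="monitoring",
        timestamp=datetime.fromtimestamp(alert["epoch"], tz=timezone.utc),
        host=alert["node"],
        severity=alert["level"].upper(),
        message=alert["summary"],
    )


def from_log_line(line: str) -> Event:
    """Normalize a semi-structured log line (assumed format) into an Event."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    ts, host, severity, message = match.groups()
    return Event(
        source="app_log",
        # fromisoformat does not accept a trailing "Z" on older Pythons
        timestamp=datetime.fromisoformat(ts.replace("Z", "+00:00")),
        host=host,
        severity=severity.upper(),
        message=message,
    )


if __name__ == "__main__":
    print(from_monitoring_alert(
        {"epoch": 0, "node": "db-01", "level": "warning", "summary": "Disk 90% full"}
    ))
    print(from_log_line("2021-03-04T10:15:00Z web-01 ERROR Connection pool exhausted"))
```

Once records from every source share one shape, the downstream layers can treat them uniformly, whichever tool produced them.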
The data layer acts as the eyes of the AIOps platform, continuously observing its surroundings. The AI layer, the brain of the platform, takes the ingested data and tries to extract actionable insights from it. The Machine Learning algorithms access both historical data and real-time streaming data. The real differentiation that the AIOps platform brings is its ability to continuously learn and optimize its functioning based on real-time data. This enables proactive management of the IT environment, an evolution from the earlier IT Operations Analytics (ITOA) platforms.
So, what are some of the key capabilities that machine learning offers?
Fig 3: Key Capabilities that Machine Learning offers in AIOps Platform
- Correlation-based RCA: With unified access to all the data sources, Machine Learning can draw correlations from the infrastructure layer through the application layer, right up to the level of business transactions. It aims to go beyond identifying correlation and help establish causation (the cause-effect relationship). This supports a more comprehensive RCA of issues and can substantially bring down the mean time to detect (MTTD) and mean time to resolve (MTTR).
- Prediction based on Pattern Analysis: Machine Learning can identify patterns within the historical data and make predictions whenever it observes similar patterns in the real-time data that it continuously monitors. This helps the platform predict possible events that could affect the IT environment. Apart from preventive maintenance, other applications of the prediction capability include capacity planning and risk assessment of changes to the landscape.
- Anomaly Detection: Based on the data captured by the platform, Machine Learning can establish the normal-state behaviour of the IT environment. It can then flag any deviation from this baselined behaviour as an anomaly and call it to the attention of the operations team. Machine Learning can also adapt to changes in the IT environment, identify the new normal and recalibrate the baseline.
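To make the anomaly detection idea concrete, here is a minimal sketch of a baseline that is learned from recent data and recalibrates as the environment changes. It uses a simple rolling z-score, which is only one of many possible techniques and is not necessarily what any AIOps product uses. The class name `RollingBaseline`, the window size, the warm-up length and the threshold are all illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev


class RollingBaseline:
    """Flags values that deviate strongly from a rolling window of recent values."""

    MIN_SAMPLES = 5  # assumed warm-up: do not judge until a few values are seen

    def __init__(self, window: int = 30, threshold: float = 3.0):
        self.values = deque(maxlen=window)  # oldest values drop out automatically
        self.threshold = threshold          # allowed distance in standard deviations

    def observe(self, value: float) -> bool:
        """Return True if `value` is anomalous, then fold it into the baseline."""
        is_anomaly = False
        if len(self.values) >= self.MIN_SAMPLES:
            mu = mean(self.values)
            sigma = pstdev(self.values)
            if sigma > 0:
                is_anomaly = abs(value - mu) / sigma > self.threshold
            else:
                # A perfectly flat baseline: any different value is a deviation.
                is_anomaly = value != mu
        # Every value, anomalous or not, enters the window, so a sustained
        # shift eventually becomes the "new normal".
        self.values.append(value)
        return is_anomaly


if __name__ == "__main__":
    detector = RollingBaseline(window=10)
    cpu_percent = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 95]
    for reading in cpu_percent:
        if detector.observe(reading):
            print(f"anomaly: {reading}")
```

Because flagged values are still added to the window, a one-off spike is reported once, while a lasting change in load stops being reported after the window has filled with the new level. A production system would likely need to handle seasonality and multiple metrics, which this sketch ignores.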
In addition, the platform uses Natural Language Processing, or NLP, to make sense of unstructured data captured in IT systems that interface with human users, primarily the ITSM tools and collaboration platforms. Machine Learning, combined with NLP capabilities, helps to deliver an enhanced level of end-to-end automation.
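As a deliberately simple stand-in for the NLP step, the sketch below matches a newly raised ticket against past tickets using bag-of-words cosine similarity. Real platforms would typically use richer language models; this only illustrates the idea of turning free text into something comparable. The function names, the ticket IDs and the ticket texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two token-count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_ticket(new_ticket: str, past_tickets: Dict[str, str]) -> Tuple[str, float]:
    """Return (ticket_id, score) of the closest past ticket.

    Assumes `past_tickets` is non-empty.
    """
    query = Counter(tokenize(new_ticket))
    scored = (
        (ticket_id, cosine_similarity(query, Counter(tokenize(text))))
        for ticket_id, text in past_tickets.items()
    )
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    history = {  # hypothetical past incidents
        "INC-1": "Database connection timeout on payment service",
        "INC-2": "User cannot reset email password",
        "INC-3": "Disk space full on build server",
    }
    print(most_similar_ticket("payment service database timeout errors", history))
```

A match like this could be used to suggest the resolution of a similar past incident to the engineer handling the new ticket.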
Finally, there is the visualization layer. It offers the interface through which the IT operations team can interact with the platform and employ all the capabilities on offer in day-to-day operations. It presents data in the most discernible way, so that minimal time and effort are wasted on data analysis.
Buzzword or the Next Real Thing?
Adoption of AI for IT operations is the natural and expected progression from the existing approach. Although it has arrived surprisingly late, it is here to stay. Large enterprises have started to embrace the fact that digital transformation will not be complete if IT operations continue to use the traditional manual approach. So, it’s no surprise that Gartner predicts that by 2022 the adoption rate of an AI-based approach to IT operations among large enterprises will go up to 40%, as opposed to the current 5% level. IT operations leaders are taking heed of the growing demand for AIOps platforms and are responding enthusiastically by bringing tools with a broad array of AIOps capabilities to market. Some of the major players in the space include IBM, HPE, Moogsoft, BMC and Elastic. Analysts and research firms are also taking note of these trends. In fact, Gartner has released the Market Guide for AIOps Platforms, a comprehensive guide to the current state of AIOps.
Enterprises and IT teams should, however, tread with caution when it comes to AIOps adoption. A big bang approach through blind adoption of a popular AIOps platform available in the market is not the answer. Instead, they need to carefully assess the IT landscape and develop a strategy around AIOps. A full-fledged AIOps platform might be the answer for a large enterprise, but for a smaller organization a more customized implementation of specific AIOps capabilities would be ideal. Nevertheless, for organizations to remain competitive and relevant in today’s world, AIOps adoption is inevitable. The innovators and early adopters are already reaping the benefits. It’s not going to take much longer for AIOps to become the new norm in IT operations management (ITOM).