Troubleshooting & monitoring cloud applications

The challenge

Modern cloud applications are typically highly scalable and rely on modern cloud architectures to create and efficiently deliver value to their users. However, the inherent complexity of these applications creates challenges when Dev and Ops try to troubleshoot or monitor them. Cloud applications tend to have:

  1. A high number of internal services, components and dependencies.
  2. Asynchronous processes that are crunching data to be consumed later upon user request (typically > 80% of the code).

As a result, over time no single person knows the system end to end, and every incident requires many hours of cross-team collaboration. Most existing monitoring solutions are unable to present the user with a big picture of how the system is working. Instead, existing tools take a bottom-up approach and provide a massive amount of raw metrics but no guidance on how to use them. It takes an expert to decide which metrics are related and how to use them.

DevOps teams find that troubleshooting or deploying new features (e.g. a new internal service) becomes slower and more expensive. They increasingly spend more time firefighting instead of building new functionality.

 

You should be able to answer questions such as:

  • What are the relationships and dependencies between my microservices / services?
  • Can I visualize those dependencies?
  • Is my map of dependencies continually updating automatically?
  • Can I map relationships that are asynchronous in nature (e.g. through messaging queues)?
  • Can I quickly find metrics that are correlated?
  • Do I have a way to find or compare correlated metrics between two services (e.g. request latency in service “A” with CPU of service “C”)?
  • Can I find or compare related metrics between related components that are N degrees apart?
  • Can I quickly understand the values of the KPIs for my important services without having to dive to the level of individual instances?
  • In a cloud environment, do I have a way of tracking the KPIs even as the components of my services scale or are replaced over time?
  • Do I have architecture documentation that is truly up to date (up to 15 minutes ago)?
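To make the question about correlated metrics concrete, here is a minimal sketch of comparing two metric series with the Pearson correlation coefficient. It is an illustration only and does not describe how ITculate computes correlation; the function name `pearson` and the sample values for service "A" latency and service "C" CPU are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally sampled metric series.

    Returns a value in [-1, 1], or 0.0 when either series is constant
    (correlation is undefined in that case, so we treat it as "no signal").
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length and at least 2 samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical samples taken at the same timestamps.
latency_a_ms = [120, 135, 150, 180, 240, 300]  # request latency in service "A"
cpu_c_pct = [35, 40, 44, 55, 72, 90]           # CPU utilization of service "C"

if __name__ == "__main__":
    print(f"correlation: {pearson(latency_a_ms, cpu_c_pct):.3f}")
```

A value close to 1 suggests the two metrics rise and fall together, which is a hint worth investigating rather than proof that one causes the other.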

 

Solution

Using ITculate, all this information is available to you through simple, clean visualizations of the architecture and its dependencies. ITculate identifies all the components and how they interconnect. Start at the level of your applications and traverse the relationships between and within services. Stop at the layer that is relevant to your issue and use the metrics rolled up from the underlying dynamic layers. Use the KPIs provided automatically for each component, or add and remove metrics as needed. ITculate’s meta-information engine knows each metric’s units and places comparable metrics on the same chart. With ITculate, you do not need to gather the experts around a Polycom. All team members have expert-level information that is up to date to the minute. Empower both Dev and Ops to do more and stay on the same page.

ITculate automatically discovers the relationships between components. Specialized algorithms project these relationships and identify clusters, groups and services. These algorithms also derive higher-level relationships (e.g. between services) from the lower-level ones. In addition, the system can leverage predefined environment tags or the ITculate REST SDK. Groups and services are displayed as collapsible elements so the user can manage the context.
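As a rough illustration of deriving higher-level relationships from lower-level ones, the sketch below rolls component-to-component edges up into service-to-service edges. It is not ITculate's algorithm or SDK; the function `service_dependencies`, the component names and the service assignments are all hypothetical.

```python
def service_dependencies(component_edges, service_of):
    """Roll component-level edges up to service-level edges.

    component_edges: iterable of (source_component, target_component) pairs.
    service_of: dict mapping each component to the service it belongs to.
    Edges inside a single service, or touching an unmapped component,
    are skipped.
    """
    deps = set()
    for src, dst in component_edges:
        src_service = service_of.get(src)
        dst_service = service_of.get(dst)
        if src_service is None or dst_service is None:
            continue
        if src_service == dst_service:
            continue
        deps.add((src_service, dst_service))
    return deps


# Hypothetical topology, including an asynchronous hop through a queue.
component_edges = [
    ("web-1", "api-1"),
    ("web-2", "api-1"),
    ("api-1", "api-cache"),
    ("api-1", "orders-queue"),
    ("orders-queue", "worker-1"),
]
service_of = {
    "web-1": "frontend",
    "web-2": "frontend",
    "api-1": "api",
    "api-cache": "api",
    "orders-queue": "messaging",
    "worker-1": "billing",
}

if __name__ == "__main__":
    for src, dst in sorted(service_dependencies(component_edges, service_of)):
        print(f"{src} -> {dst}")
```

Because the service-level edges are recomputed from whatever component edges are currently observed, the higher-level map stays valid when individual instances are added or replaced.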

ITculate also helps DevOps teams to be more proactive. Whenever a component or service metric becomes abnormal, ITculate provides a message with a detailed description of the issue, related charts and suggested remediation actions. This enables DevOps to address impending issues before they materialize.
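One simple way to think about a metric "becoming abnormal" is to compare the latest sample with its recent history. The sketch below uses a z-score test as a stand-in technique; the source does not say how ITculate detects abnormal metrics, and the function `is_abnormal`, the threshold of 3 standard deviations and the sample values are assumptions made for illustration.

```python
from statistics import mean, stdev


def is_abnormal(history, latest, threshold=3.0):
    """Flag `latest` if it is more than `threshold` sample standard
    deviations away from the mean of `history`.

    With fewer than two historical samples there is no baseline, so the
    value is never flagged. If the history is perfectly flat, any
    deviation from it is flagged.
    """
    if len(history) < 2:
        return False
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


if __name__ == "__main__":
    recent_latency_ms = [100, 102, 98, 101, 99]  # hypothetical recent samples
    print(is_abnormal(recent_latency_ms, 150))
    print(is_abnormal(recent_latency_ms, 101))
```

A fixed threshold like this is easy to reason about but ignores seasonality and trends, so it is best read as a starting point for the idea rather than a production detector.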

 


Start a 14-day free trial today