Case Study

World's largest cruise ship operator uses Applications Manager to ensure business uptime

Introduction

The traditional application delivery chain focused on hosting applications and databases on servers in data centers, which in itself presented more than its fair share of application performance monitoring challenges. If monitoring the old-school data center wasn't challenging enough, imagine a floating data center. Now imagine 24 of them. And for good measure, imagine having all of them communicate with each other and with a central server. Monitoring such an intricately networked IT setup was the predicament in which Carnival Cruise Lines found itself. Enter Applications Manager. Pedro Esteban, senior analyst (IT), has been with the company for over a decade and had been on the lookout for a solid application monitoring tool since 2005. Here he tells us about life before and after Applications Manager.

Carnival's IT Environment

Carnival Cruise Lines (Carnival Corp.) is headquartered in Miami, Florida and currently operates 24 ships, with each ship behaving like a data center. The Miami HQ houses the main data center and is also the main BCP site. Carnival also hosts and operates one admin server, several managed servers, and its main website, Carnival.com, out of its Miami headquarters. Applications Manager is installed on all of its servers and aboard each of its ships.

Carnival's IT Challenge

Each ship has a ManageEngine server, which sends alerts to the alerts manager so that the on-board IT person can respond to issues immediately. Because the admin server is at the Miami HQ, the ships communicate with HQ, and with each other, via satellite link. The problem is that the link is not always up. When ships are changing direction or docking at port, for instance, they sometimes get disconnected. “But at HQ...we need to be aware of everything that's going on...so [the] moment [the] connection is re-established, we can check historical data on downtime duration of any particular server,” Pedro tells us.

Why Choose Applications Manager?

“An environment monitoring tool needs to be 24/7; if something is down, we need to know in under five minutes, in real time,” says Pedro, because downtime of any app for even a few minutes could mean the loss of crucial business.

Prior to Applications Manager, Pedro had been trying to land the right APM solution for over six years. “Some apps are tailor-made for the cruise industry, others are generic,” Pedro says about Carnival's mix of business-critical apps. That mix was another factor in selecting the right monitoring solution, as not all APM solutions can monitor all apps.

  • Back in 2005, Carnival evaluated Microsoft Operations Manager (MOM). “Every few days,” Pedro recounts, “[MOM's] SQL database would go down and we'd have to open a case.” Carnival deemed MOM unreliable.
  • Carnival scaled up to Microsoft's System Center Operations Manager (SCOM), but that changed the operations team's view from environment monitoring to point technology monitoring, so the team lost a great deal of visibility. According to Pedro, “The problem with SCOM was that an alert added today could not be tested till the next day,” a huge bottleneck for a team running a 24/7 operation.
  • Carnival next deployed EMC's SMARTS, but that tool prioritized network monitoring and had limited environmental monitoring capabilities. Carnival, however, needed a balance of environmental and network monitoring, so SMARTS was a poor fit.
  • Next up was HP's SiteScope. According to Pedro, “[With SiteScope], each probe is a license. [And hence] we ran out of licenses very fast.”

At long last, Pedro “stumbled upon” Applications Manager. “Someone fed up with SCOM was running a trial [of Applications Manager],” he said. “I saw it and then I realized I could make it work.” Applications Manager gave Pedro's team the comfort of being able to monitor the entire spectrum of Carnival's IT setup, from a server to a service, a process, or something as basic as disk space, while remaining quick to implement and easy to work with. Let's see how Applications Manager works for Carnival Cruise Lines:

Complete visibility with historical data

Carnival HQ needs to have eyes on the horizon at all times. When a satellite link goes down, Pedro's team at HQ is in the dark about everything happening aboard that ship. Until the link to the mainland is restored, Applications Manager fills the gap by continuing to record on-board performance stats. Once the ship reconnects, Pedro's team can pull up the historical data and see what happened during the outage, so there are no blind spots.
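
The record-locally, review-later idea described above can be illustrated with a minimal store-and-forward sketch. This is not how Applications Manager is implemented; the class, function, and field names, the one-sample-per-minute interval, and the in-memory lists standing in for the ship-side buffer and the HQ-side history are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sample:
    """One availability reading (hypothetical shape, one reading per minute)."""
    minute: int   # minutes since an arbitrary reference time
    server: str   # name of the monitored server
    up: bool      # True if the server responded to the check


class StoreAndForward:
    """Keeps recording on board while the link is down, then flushes to HQ."""

    def __init__(self):
        self.pending = deque()  # samples recorded but not yet sent to HQ
        self.delivered = []     # stands in for the history visible at HQ

    def record(self, sample, link_up):
        # Always record locally first, so an outage never loses data.
        self.pending.append(sample)
        if link_up:
            self.flush()

    def flush(self):
        # Deliver buffered samples in the order they were recorded.
        while self.pending:
            self.delivered.append(self.pending.popleft())


def downtime_minutes(samples, server):
    """Counts the one-minute samples in which the given server was down."""
    return sum(1 for s in samples if s.server == server and not s.up)
```

With a buffer like this, a link that is down for several minutes simply delays delivery: once `record` is called with `link_up=True`, HQ receives every buffered sample and can compute the downtime duration for any server over the outage window.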

Root cause analysis

Carnival's IT used to be set up like a row of dominoes: one rogue element would bring down several functions simultaneously, whether on board a ship out at sea or in the office IT at mainland HQ. For example, an application with a runaway log or a runaway process writing indiscriminately would bring down services, causing application crashes and business downtime. Protocol would then kick into action: someone calls the help desk and creates a P1 ticket, and only then do Pedro and his team enter the fray. Tickets on the ships are handled by the on-board IT admin, whereas Pedro's immediate team responds to tickets for the HQ office. At least nine times out of ten, the investigation finds only that the application service is down.

The ability to restart a service, Pedro says, has helped them a lot. And how quickly Applications Manager accomplishes this task is what really stands out. “The good thing is that it is so fast, the business doesn't know,” Pedro notes. “We don't really have noticeable business downtime [costing] us money!”

Robust reporting establishes cruise control

Applications Manager equips Pedro and his team to analyze enormous amounts of historical data. Carnival stores up to a year of historical data and relies on trend reports, graphs, and easily understandable vital statistics reports. The actionable intelligence helps make sense of all that data and helps report to management.

These comprehensive reports help Pedro analyze and identify patterns in IT performance degradation, empowering him to plan for capacity and schedule maintenance in advance. This preempts problems and avoids unwanted downtime for the business. If he can furnish solid proof of why a certain resource, such as disk space on a server or a certain file cluster, went down, or goes down on a specific day and time, then engineering can go back and check whether it's just a capacity issue or a genuine performance problem.

Capacity planning

Management typically raises requests for more capacity or more disk space, and engineering requires Pedro to justify each request. Before Applications Manager, he usually wouldn't have an answer. A typical scenario where capacity comes under the scanner is the “Monday morning rush-hour syndrome.” Every Monday morning, practically everyone at Carnival wants to pull reports from all the different applications, e.g., Siebel. This is when load peaks and CPU utilization is close to 99 percent.
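
To make the Monday-morning pattern concrete, here is a minimal sketch of how historical CPU samples could be grouped by weekday and hour to find the busiest slot. It is an illustration only, not an Applications Manager report; the function name, the `(timestamp, cpu_percent)` input shape, and the sample values in the usage note are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# datetime.weekday() returns 0 for Monday through 6 for Sunday.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]


def busiest_slot(samples):
    """Returns ((weekday, hour), average CPU %) for the most loaded slot.

    `samples` is an iterable of (datetime, cpu_percent) pairs and must
    contain at least one entry.
    """
    buckets = defaultdict(list)
    for timestamp, cpu_percent in samples:
        key = (DAYS[timestamp.weekday()], timestamp.hour)
        buckets[key].append(cpu_percent)
    # Pick the weekday/hour bucket with the highest average utilization.
    slot = max(buckets, key=lambda k: mean(buckets[k]))
    return slot, mean(buckets[slot])
```

For example, given made-up readings of 99 and 97 percent at 9 a.m. on a Monday and lower readings at other times, `busiest_slot` would return the Monday 9 a.m. slot. That is the kind of evidence that can back a capacity request with data rather than anecdote.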

Delivering as promised

The other main business benefit of using Applications Manager concerns SLAs: being able to meet them, and preventing ticket incidents (referred to as “P1 events” in their system) to reduce help desk calls. According to Pedro, before using Applications Manager, his team would “get calls every other night.” Now, with the ManageEngine solution in place, he's seen a “huge reduction of calls.”

Troubleshooting

At Carnival, successful, timely troubleshooting comes down to historical data available on hand and to narrowing down the parameters. In turn, the operations team can, say, catch every network interruption, identify the exact minute when a server or application is disconnecting, monitor processes by process ID, check availability, and see if any resource actually went down overnight.

To sum it up, Applications Manager has proved to be invaluable in its contributions to Carnival's IT management. Where day-to-day operations, capacity planning, historical performance analysis, and troubleshooting are concerned, Pedro says, "We're very happy. The level of detail is so critical. Because most of the times, the simplest of things can have the greatest impact." And for Carnival, smooth sailing for Pedro, his IT team, and the rest of the cruise line's business units is the greatest impact made by Applications Manager.

Try out Applications Manager today for free at: https://www.manageengine.com/products/applications_manager/download.html

For more information, please visit https://www.manageengine.com

Follow our blog at http://blogs.manageengine.com