What Are Good RTO and RPO?

The recovery procedures should always be driven by the business needs

Experiencing downtime is not something that companies wish for, but as we have seen lately it is something we hear about quite often. Interestingly enough, very few enterprises, especially in the Small and Medium Business segment, spend enough time to work out good procedures for recovering their IT systems and applications. The recovery procedures should always be driven by the business needs, and this is where a lot of IT departments fail. As a result, recovery turns out to be a reactive procedure that is triggered by the issue, results in chaotic recovery activities and ends with a post-mortem but no improvements after that. Putting more initial thought into the Business Impact Analysis (BIA) is a prerequisite for good recovery procedures, and defining the two main characteristics - RTO and RPO - is a crucial part of this process.

Let's start with the first one - Recovery Time Objective (RTO). RTO is defined as the duration of time within which the system or the service must be restored after a disruption in order to avoid unacceptable consequences related to a break in business continuity. The first thing that you need to keep in mind about RTO is that it is an objective - it is a target that you may not be able to achieve all the time. There are certain activities that you need to do during this time that may have variable duration (a simple sketch of how they add up follows the list). At a high level those are grouped into:

  1. Recognizing that there is a disruption - this depends on your level of monitoring (or lack of it) and may involve manual checking of each system or service that participates in the business process
  2. Troubleshooting and identifying the failing system and/or service - this depends on the level of diagnostics you have implemented and may also involve different people or teams
  3. Fixing the issue - depending on the root cause this can be as simple as rebooting the system or as complex as requiring code changes or even ordering new hardware
  4. Testing the fix - last but not least, you need to make sure that the fix actually resolves the issue
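
To make the objective a bit more concrete, here is a minimal sketch in Python. The RecoveryEstimate class, the durations and the one-hour target are all made-up illustrations, not output from any real tool; the point is simply that the four activities add up and the total is what you compare against the RTO.

```python
from dataclasses import dataclass

# Hypothetical durations (in minutes) for each recovery activity.
# Real values should come from your own incident history and runbooks.
@dataclass
class RecoveryEstimate:
    detect: float    # time to recognize the disruption
    diagnose: float  # time to isolate the failing system or service
    fix: float       # time to apply the fix (reboot, code change, new hardware)
    test: float      # time to verify that the fix actually works

    def total(self) -> float:
        return self.detect + self.diagnose + self.fix + self.test

RTO_MINUTES = 60  # assumed target: restore the service within one hour

estimate = RecoveryEstimate(detect=10, diagnose=20, fix=25, test=10)

if estimate.total() > RTO_MINUTES:
    print(f"Estimated recovery of {estimate.total():.0f} min exceeds the RTO of {RTO_MINUTES} min")
else:
    print(f"Estimated recovery of {estimate.total():.0f} min fits within the RTO of {RTO_MINUTES} min")
```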

In all four activities the human factor is the most variable part. People need to be notified and kept updated, and they need time to understand the issue, troubleshoot, write code and so on. The more automation you provide, the less the human factor impacts the recovery time.

Once the system or service is brought back into operation, though, you need to determine the state of the data. This is where the next characteristic becomes important - Recovery Point Objective (RPO). RPO is defined as the period during which data might be lost from the system due to a disruption without major impact to the business continuity. Although this is also an objective, you need to be more careful with this one. There are a few things to think about here (a small sketch after the list shows how the data-loss window relates to the objective):

  1. Is data loss acceptable at all? In a lot of cases the answer is no, but there are situations in which you can tolerate loss of data.
  2. How do you recover the data? Does it require copying, shipping backup tapes or manual re-entry of the data?
  3. How long will it take to recover the data? The two extremes range from the few seconds required to repoint the system to a replica of the data on another server, to requesting an off-site backup copy of the data
  4. How do you test that the data is recovered? This can vary from automated tests to manual tests
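
As a rough illustration, the Python sketch below uses hypothetical timestamps and an assumed 15-minute objective. It computes the potential data-loss window as the gap between the last usable recovery point (backup or replica) and the moment of the disruption, and checks it against the RPO.

```python
from datetime import datetime, timedelta

RPO = timedelta(minutes=15)  # assumed objective: lose at most 15 minutes of data

# Hypothetical timestamps; in practice these come from your backup/replication
# tooling and from the incident timeline.
last_recovery_point = datetime(2012, 7, 2, 14, 45)  # last consistent backup or replica
disruption_time = datetime(2012, 7, 2, 15, 5)       # moment the system failed

data_loss_window = disruption_time - last_recovery_point

if data_loss_window > RPO:
    print(f"Potential data loss of {data_loss_window} exceeds the RPO of {RPO}")
else:
    print(f"Potential data loss of {data_loss_window} is within the RPO of {RPO}")
```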

Depending on your RPO, your time to recover the business operations of your system may vary.

When thinking about Business Continuity (BC) you need to think about both components - recovering the operation of the system or service (RTO) and recovering the data to a point at which it is usable for the business (RPO). Both those actions combined need to take less time than the Maximum Tolerable Downtime (MTD) as we defined it in Determining the Cost of Downtime. In general, though, you should set your RTO and RPO in a way that leaves you a buffer of time for unexpected issues that may occur during recovery.
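
A simple way to sanity-check that relationship, using illustrative figures rather than real BIA numbers, is to verify that the two objectives plus a buffer still fit within the MTD:

```python
from datetime import timedelta

# Hypothetical figures; the MTD should come out of your Business Impact Analysis.
MTD = timedelta(hours=4)               # maximum tolerable downtime for the business process
rto = timedelta(hours=1)               # target to restore the system or service
data_recovery = timedelta(hours=1.5)   # target to bring the data to a usable point (RPO side)
buffer = timedelta(minutes=45)         # slack for unexpected issues during recovery

planned = rto + data_recovery + buffer

if planned <= MTD:
    print(f"Planned recovery of {planned} leaves {MTD - planned} of headroom under the MTD")
else:
    print(f"Planned recovery of {planned} exceeds the MTD by {planned - MTD}; revisit the objectives")
```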

This post was first published on our company's blog.

More Stories By Toddy Mladenov

Toddy Mladenov has more than 15 years of experience in software development and technology consulting at companies like Microsoft, SAP and 3Com. Currently he is CTO of Agitare Technologies, Inc. - a boutique consulting company that specializes in Cloud Computing and Big Data solutions. Before Agitare Tech, Toddy spent a few years with the PaaS startup Apprenda and more than six years working on Microsoft's cloud computing platform Windows Azure, Windows Client and MSN/Windows Live. During his career at Microsoft he managed different aspects of the software development process for Windows Azure and Windows Services. He also evangelized Microsoft cloud services among open source communities like PHP and Java. In the past he developed enterprise software for the German software giant SAP and several startups in Europe, and managed technical sales for 3Com in the Balkan region.

With his broad industry experience, international background and end-user point of view, Toddy has a unique approach toward technology. He believes that technology should be developed to improve people's lives and is eager to share his knowledge on topics like cloud computing, mobile and web development.
