Backups: Ahh! To Zzz 😴
Updated: Dec 23, 2020
Sleep Soundly with Good Backups!
Disaster Recovery (“DR”) plans are a part of the enterprise IT environment that is often overlooked in favor of new technologies or projects, while the maintenance and support of the existing infrastructure is neglected. Most internal IT teams or managed service providers will state that there is a DR plan for the IT environment they manage, but how truthful is this? Historically, DR plans have focused primarily on business continuity, but with ransomware continuing to grow as a threat, the traditional model needs to be modified.
In this blog post, MOXFIVE provides a guide to developing an organization’s DR plan that minimizes downtime during a ransomware incident, hardware failure, or natural disaster.
Failing to plan is planning to fail
Step one of a plan is always to build the plan! A DR plan is, at its base, a business decision which will govern policy, budget, and project management. The organization should assign an owner and stakeholder for the project as a whole, e.g. CIO, CTO, CISO. This owner should be in a position to take ownership of understanding and clearly documenting all IT systems, applications, and data within the organization.
A good exercise for demonstrating value to executive leadership is to estimate the total cost of an outage. The formula is unique to each organization, but the basic metrics to remember are Cost of Recovery (cyber insurance deductible, third-party IT services, ransom payment, etc.), Cost of Lost Productivity, Cost of Reputation Loss, and Lost Revenue Costs. This estimated outage cost is a base metric for setting a budget for the overall project; it provides a clear rationale for spending money on something everyone hopes will never be needed! Once a budget is established, the stakeholder should draft a high-level framework for the policy and processes to be developed as the DR plan is built out and decisions around technologies and methodologies are confirmed.
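As a rough illustration of the outage-cost exercise above, the four metrics can simply be summed. This is a minimal sketch: the class name, field names, and dollar figures are hypothetical, and each organization will need its own way of estimating every input.

```python
from dataclasses import dataclass


@dataclass
class OutageCostEstimate:
    """Rough worst-case outage cost; every field uses the same currency."""
    recovery: float           # insurance deductible, third-party IT services, ransom payment, etc.
    lost_productivity: float  # e.g. idle staff hours multiplied by hourly cost
    reputation_loss: float    # estimated value of lost customers or goodwill
    lost_revenue: float       # sales that cannot be made while systems are down

    def total(self) -> float:
        return (self.recovery + self.lost_productivity
                + self.reputation_loss + self.lost_revenue)


# Hypothetical figures, for illustration only.
estimate = OutageCostEstimate(
    recovery=150_000,
    lost_productivity=80_000,
    reputation_loss=50_000,
    lost_revenue=220_000,
)
print(f"Estimated worst-case outage cost: ${estimate.total():,.0f}")
```

Even a coarse total like this gives the stakeholder a number to set against the proposed DR budget.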
☑ Assign an executive project stakeholder
☑ Determine worst case scenario cost to business
☑ Establish loose budget
☑ Construct high-level framework
But What ARE the Crown Jewels?
Backups revolve around data, so the organization’s data needs to be identified, reviewed, and prioritized. The DR plan’s stakeholder should have a strong understanding of the business goals and risks of the organization from a high-level perspective. This needs to be paired with each application and sub-organization owner’s focused understanding of how the data is used. This pairing leads to a data priority hierarchy based on the data’s value to the whole organization, even if it is as simple as “critical”, “high”, “medium”, and “low”.
A mapping of each application’s impact on and storage of the data should be documented across the environment so that the backup plan can be designed around it. Based on the data’s priority, different aspects of the backup plan can change, including backup frequency, number of unique backups (i.e. versioned or immutable backups), and backup locations, both logical and physical. Increased backup frequency means a smaller window of data loss at the time of an incident, but because of the increased computational and storage costs it is not cost-effective for all data. Similarly, increasing the number and locations of backups raises the cost of storage but lowers the risk of those backups being lost due to a threat actor’s involvement, data corruption, or storage failure. For critical servers and high priority data, MOXFIVE recommends utilizing the 3–2–1 rule, detailed in our recent blog post “Ransomware Recovery Tales: Protect the Kingdom” that highlights why you should protect your domain controllers. This means three copies of the data, on two distinct forms of media, with one offline or offsite backup. The last portion of this rule is very important, as it could mean a fully offline external drive, storage in the cloud, or a service provider dedicated to DR and data backups.
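The 3–2–1 rule described above lends itself to a simple check against a documented list of copies. The sketch below is illustrative only: `BackupCopy` and `satisfies_3_2_1` are hypothetical names, and it assumes the list contains every copy the organization counts toward the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    media: str                # e.g. "disk", "tape", "cloud"
    offline_or_offsite: bool  # True if isolated from the production environment


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """Three copies of the data, two distinct media types, one offline or offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.media for copy in copies}) >= 2
    has_isolated_copy = any(copy.offline_or_offsite for copy in copies)
    return enough_copies and enough_media and has_isolated_copy


# Hypothetical inventory for one critical dataset.
critical_dataset = [
    BackupCopy(media="disk", offline_or_offsite=False),
    BackupCopy(media="disk", offline_or_offsite=False),
    BackupCopy(media="cloud", offline_or_offsite=True),
]
print(satisfies_3_2_1(critical_dataset))      # True
print(satisfies_3_2_1(critical_dataset[:2]))  # False: two copies, one media type, nothing offsite
```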
Prioritizing and Mapping Business Data
☑ Assign criticality of each dataset based on business impact
☑ Document storage location of each dataset
☑ Define a logical backup plan for each criticality
Measure Twice, Cut Once
Building any large IT project requires strong vetting of the technologies and solutions that will be used throughout. The DR application stack and infrastructure needs should be evaluated in the same fashion as all other production IT stacks. As with any IT architecture, be sure to plan for growth. Take into consideration the organization’s five-year outlook and ensure the vendors, products, and infrastructure being evaluated can scale to that outlook’s needs.
When comparing backup vendors and products, ask about ransomware or tamper-proof protection. When a threat actor deploys ransomware in an environment, they will often attempt to delete backups to improve their chance of receiving a ransom payment. One good example of protection a backup solution can provide is two-factor authentication, either when logging into the system or specifically when any user deletes backups in a manual or ad-hoc fashion. Backups are your lifeline when you need them most, so be sure to protect them as carefully as the production dataset! As previously mentioned, immutable backups or backup versioning prevent backups from being overwritten in the event of a ransomware attack or security incident. This type of feature only allows data to be written once: if a new backup is created, it is stored separately from any existing backup file(s), helping to avoid a situation where backups are overwritten with encrypted files.
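The write-once idea can be illustrated at the file level. This is only a sketch of the concept, not how any particular backup product implements immutability: real solutions enforce it in the storage layer, where a compromised account cannot simply bypass it. The function name and file naming scheme are hypothetical.

```python
import tempfile
from pathlib import Path


def write_backup_once(backup_dir: Path, name: str, version: int, data: bytes) -> Path:
    """Store each backup version in its own file and refuse to overwrite an existing one."""
    target = backup_dir / f"{name}.v{version:04d}.bak"
    # Mode "xb" raises FileExistsError instead of truncating an existing file.
    with open(target, "xb") as handle:
        handle.write(data)
    return target


with tempfile.TemporaryDirectory() as tmp:
    backup_dir = Path(tmp)
    write_backup_once(backup_dir, "sql01", 1, b"clean data")
    write_backup_once(backup_dir, "sql01", 2, b"newer data")
    try:
        # A later job (or an attacker) trying to replace version 1 is rejected.
        write_backup_once(backup_dir, "sql01", 1, b"encrypted garbage")
    except FileExistsError:
        print("Refused to overwrite version 1")
```

Because every version lands in a separate file, a backup taken after files were encrypted cannot replace the last clean copy.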
A key part of architecting a strong backup solution is ensuring segmentation of backups. When segmenting backups, be sure to write them to a non-domain-joined system that uses separate credentials, so that if the domain is compromised, the system containing the backups is not. Further segmentation from a DR perspective would include writing to media outside of the organization’s environment. This can mean writing to tape or external drive backups that are kept completely offline, writing backups to a cloud provider using non-domain credentials, or using a vendor with a specific offsite backup offering (such as Iron Mountain or HP Enterprise). Cloud providers allow for tighter access control per user, so the backup service account can be given the least privileges possible; this helps ensure that even if the credential is compromised, the backups and the larger cloud account are not.
Segmentation and disparate storage locations also affect the Recovery Time Objective (“RTO”) and Recovery Point Objective (“RPO”), both of which are fundamentally business decisions. While both are measures of time, RPO is the window of data lost between the most recent viable backup and the incident, whereas RTO is the time taken to restore the environment to business functionality. Storing a tape backup in Fort Knox is secure but will increase the length of time on both ends. Additionally, the cost to the business and the criticality of the data must be included when weighing acceptable loss against the cost of a backup solution.
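To make the two metrics concrete, the sketch below measures what was actually achieved in an incident and compares it with the business targets. The timeline, targets, and function names are hypothetical.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_good_backup: datetime, incident: datetime) -> timedelta:
    """Data-loss window: time between the last viable backup and the incident."""
    return incident - last_good_backup


def achieved_rto(incident: datetime, restored: datetime) -> timedelta:
    """Downtime: time from the incident until business functionality is restored."""
    return restored - incident


# Hypothetical incident timeline and business targets.
last_good_backup = datetime(2020, 12, 1, 2, 0)
incident = datetime(2020, 12, 1, 14, 30)
restored = datetime(2020, 12, 2, 8, 30)
rpo_target = timedelta(hours=24)
rto_target = timedelta(hours=12)

rpo = achieved_rpo(last_good_backup, incident)  # 12 hours 30 minutes of data lost
rto = achieved_rto(incident, restored)          # 18 hours of downtime
print("RPO met:", rpo <= rpo_target)  # True
print("RTO met:", rto <= rto_target)  # False
```

In this made-up timeline the nightly backup keeps data loss within target, but the restore takes longer than the business said it could tolerate, which is exactly the kind of gap that should feed back into budget and architecture decisions.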
Architecting and Implementing
☑ Plan for growth
☑ Evaluate anti-ransomware features
☑ Ensure products and their configurations incorporate secure best practices
☑ Segment backups from production for a layered security approach
☑ Incorporate desired business continuity metrics
A properly designed DR implementation is a full production environment that adds value to an organization and, just like a web server that provides e-commerce, the DR stack and policy must be monitored, reviewed, and improved upon. It is unfortunately commonplace for MOXFIVE to hear “we did have backups running but it appears to have failed <INSERT TIMEFRAME> ago and no one noticed”. Many backup solutions have built-in reporting mechanisms, but remember they only work if you review them! Critical and priority backups should have daily reports so that admins or engineers can fix problems before they grow. All backups should be reported on weekly to spot trends in failed backups, and monthly reports should be used for executive summaries.
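A daily report can be as simple as flagging any job whose last successful run is older than expected. The sketch below assumes the last-success timestamps have already been exported from the backup solution’s own reporting; the job names, the 26-hour threshold, and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def stale_jobs(last_success: dict[str, datetime], now: datetime,
               max_age: timedelta) -> list[str]:
    """Return the names of backup jobs whose last successful run is older than max_age."""
    return sorted(name for name, finished in last_success.items()
                  if now - finished > max_age)


# Hypothetical export of last successful run times.
now = datetime(2020, 12, 23, 9, 0)
last_success = {
    "sql01-daily": datetime(2020, 12, 23, 2, 0),      # ran 7 hours ago
    "fileshare-daily": datetime(2020, 12, 19, 2, 0),  # silently failing for days
}
print(stale_jobs(last_success, now, max_age=timedelta(hours=26)))  # ['fileshare-daily']
```

A threshold slightly longer than the backup interval leaves room for a job that runs late without hiding one that has stopped running altogether.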
In MOXFIVE’s experience, another downfall of many DR plans is the lack of testing. After all, “No plan of operations extends with any certainty beyond the first contact with the main hostile force.” Periodically testing a portion of the DR plan, or the plan as a whole, will help test and train the organization’s IT and response teams. These periodic tests provide measurable metrics for tracking the overall performance of the DR plan. In addition to providing proactive training, executing the DR plan will help identify training, technology, and process gaps. Thinking from a business perspective, if a SQL server is hit by ransomware, how long until the system is back in production (i.e. the RTO)? Is that amount of time acceptable for business continuity? Should more funds be allotted for the overall project?
All of the questions brought up during testing should help evolve the DR plan, ultimately producing a living document. As the testing provides insight into how the plan needs to change, keep in mind the goals and growth of the overall organization. Have the business priorities of the organization changed? If yes, does the data priority hierarchy need to be re-evaluated? A good DR plan should be re-evaluated from top to bottom each year, including touchpoints with other groups outside of IT. New laws and industry regulations that affect data retention or storage rules may have been introduced or changed, which could impact the organization’s DR plan and backups.
Continual Improvement Lifecycle
☑ Implement monitoring and reporting of system and backup jobs
☑ Schedule periodic “fire drills” to test team and system effectiveness
☑ Conduct an annual review of the DR plan
Bad Backups lead to Nightmares
The key takeaway is that Disaster Recovery plans and the backups directed by those plans are critical portions of an organization’s business as well as its IT infrastructure and should be treated as such. In the age of ransomware, a solid backup and recovery strategy can be the difference of millions of dollars, and that’s just the easily measurable costs. Organizations must ensure that their major stakeholders understand the importance of the plan, take the time to properly form the strategic plan, implement it, test its resilience, and evolve the plan using the lessons learned from those tests. Every business leader deserves to sleep soundly, and knowing that your backup strategy is where it needs to be, and has been tested to prove it, means one less nightmare coming your way.
 Field Marshal Helmuth Karl Bernhard Graf von Moltke