Backup and Recovery

Today, we are firmly rooted in the information age: a world where technology plays an essential role in our society and in the global economy. As with all major changes, this has provided opportunities for those societies that have actively embraced it. In 2015, the Internet contributed about 10% of UK GDP, the largest share in the G-20, with projected growth to 12.6% in 2016. In 2017 the UK’s digital economy was noted as one of the best in the world.

It is clear that Information Technology is going to continue to grow in importance for our society. Successive Governments have recognised this, implementing a range of initiatives to drive ICT teaching and broader IT adoption within our educational establishments.

Along with the greater use of IT comes increased dependence on devices, systems and connectivity. So what happens when something goes wrong with your school’s IT systems?

This is when the Backup & Recovery planning by your IT team and SLT becomes important to your school.

What matters the most?

One of the most important considerations for a Network Manager is ensuring the security and availability of the school’s infrastructure services for pupils and staff, be that end-user devices, wireless, servers and data, or internet connectivity. In planning for a disruption to, or complete unavailability of, these services, the following questions need to be considered:

  • How long can the school be without critical infrastructure?
  • How much lesson time / productivity / data can we afford to lose with any outage?
  • What are the key services we need to restore first (network services, MIS systems, internet access etc.)?
  • What are our options, what are the costs and how much do we spend to mitigate the risk?

As with many things in the world, Backup & Recovery is a risk mitigation strategy that balances performance against cost. Each school must find an acceptable level of service that can be delivered within an affordable budget.

Backup & Recovery Considerations

When a failure occurs within your virtual server environment there are three main metrics to consider:

  • When was my last backup? Consequently, how much data has been lost since that point in time?
  • How long will it take to restore service? How much productivity is lost until we can get service back up and running?
  • What key services do I need to recover first? Not all services are of the same level of criticality.

The first two of these considerations are known as the Recovery Point Objective (RPO) and the Recovery Time Objective (RTO) respectively, and the general goal is to keep both as short as possible. The current target in an enterprise environment is 15 minutes for both RPO and RTO, though the systems required to deliver this level of availability can be expensive and impractical, particularly in an educational setting.

Recovery Service Level (RSL) is a percentage measure (0-100%) of how much of the total computing resource is required to re-establish key systems immediately after an outage has occurred.

As previously outlined, the decision about what level of RPO, RTO and RSL is needed has to be addressed by quantifying the impact of the unavailability of the school’s IT systems. If one day of unavailability is too long, then the RTO and RPO need to be lower than 8 hours.

Quantifying these metrics will begin to inform the characteristics of a suitable Backup & Recovery solution for your school.
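
As a rough illustration of how these targets can be checked against a proposed backup schedule, the Python sketch below compares the worst-case data loss and an estimated restore time with agreed RPO and RTO values. The class and function names, the backup intervals and the 3-hour restore estimate are all assumptions made for the example; the 8-hour targets echo the school-day figure mentioned above and are treated here as upper limits.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryTargets:
    """Targets agreed by the school (illustrative values only)."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # If a failure happens just before the next backup runs, everything
    # written since the previous backup is lost.
    return backup_interval


def meets_targets(targets: RecoveryTargets,
                  backup_interval: timedelta,
                  estimated_restore_time: timedelta) -> dict:
    """Report whether a schedule stays within the agreed RPO and RTO."""
    return {
        "rpo_met": worst_case_data_loss(backup_interval) <= targets.rpo,
        "rto_met": estimated_restore_time <= targets.rto,
    }


# Hypothetical school: no more than 8 hours of lost data or downtime.
targets = RecoveryTargets(rpo=timedelta(hours=8), rto=timedelta(hours=8))

nightly = meets_targets(targets, timedelta(hours=24), timedelta(hours=3))
hourly = meets_targets(targets, timedelta(hours=1), timedelta(hours=3))

print(nightly)  # {'rpo_met': False, 'rto_met': True}
print(hourly)   # {'rpo_met': True, 'rto_met': True}
```

In this example a nightly backup cannot satisfy an 8-hour RPO, however quickly the restore completes, whereas an hourly backup can.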

Backup Best Practice

As well as deriving business-acceptable RPO, RTO and RSL values, you also need to ensure that you protect your backup data. The accepted best-practice rule for backing up data is the “three-two-one” rule. In summary, when backing data up, this rule suggests that you should have:

  • at least 3 copies,
  • kept in 2 different formats
  • with 1 copy held off-site.

All of these are designed around one concept: data redundancy. By making sure that the data is stored in multiple formats and in multiple locations, the probability that at least one backup will survive is increased. Keeping data copies physically separated is important to protect against floods, fires and theft as well as possible corruption and hardware failures.
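
The rule is simple enough to check mechanically. The following Python sketch is one possible way to test a list of backup copies against the three conditions; the `BackupCopy` structure, its field names and the example copies are hypothetical and not taken from any particular backup product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    medium: str    # storage format, e.g. "disk", "tape" or "cloud"
    offsite: bool  # True if the copy is held away from the school site


def check_3_2_1(copies):
    """Return which parts of the 3-2-1 rule a set of copies satisfies."""
    media = {copy.medium for copy in copies}
    result = {
        "three_copies": len(copies) >= 3,
        "two_formats": len(media) >= 2,
        "one_offsite": any(copy.offsite for copy in copies),
    }
    # Compliant only if every individual condition holds.
    result["compliant"] = all(result.values())
    return result


# Hypothetical set-up: two on-site disk copies plus one cloud copy.
with_cloud = [
    BackupCopy("on-site copy A", "disk", offsite=False),
    BackupCopy("on-site copy B", "disk", offsite=False),
    BackupCopy("cloud copy", "cloud", offsite=True),
]

# Hypothetical set-up: two disk copies, both kept in the school.
onsite_only = with_cloud[:2]

print(check_3_2_1(with_cloud)["compliant"])   # True
print(check_3_2_1(onsite_only)["compliant"])  # False
```

The second set-up fails all three conditions: there are only two copies, both on the same format, and neither is off-site.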

Until the advent of cloud services, one of the most popular approaches was Disk to Disk to Tape (D2D2T). Whilst this is still valid, it has largely been superseded by Disk to Disk to Cloud (D2D2C). Leveraging the growing ubiquity of cloud storage services, the newer D2D2C approach fulfills the 3-2-1 rule as follows:

  • at least 3 copies: two onsite and one in the cloud
  • kept in 2 different formats: disk and cloud storage*
  • with 1 copy held off-site: in the cloud

*Cloud storage is actually disk-based, but it is a service delivered from high-availability data centers, and backup data is replicated within that facility, between regional data centers or globally for increased redundancy.

Whilst the 3-2-1 rule is nothing more than a best practice guideline, it has been derived to provide the backbone of robust and reliable Backup & Recovery.

Recovery Considerations

Backing up your data and virtual machines deals with the Recovery Point Objective, but you still need to consider:

  • Recovery Service Level – which key services need to run first and how much compute they need
  • Recovery Time Objective – how long will it take us to restore virtual machines from backups.

Recovery Service Level

When planning a Backup & Recovery strategy, it will be the responsibility of the Network Manager and / or IT staff to decide which key services to get back up and running first and what resources are needed to run them. This will depend on the mix of services deployed within the school and the compute resources required to run these workloads adequately. Initially this tends to focus on key network services such as DNS, DHCP and Active Directory, which provide the school with the basic mechanisms for communicating on the network and logging onto a machine. Secondary systems such as MIS platforms and file and print services would follow, with tertiary IT support services coming up next.

The key here is to understand what to recover, where and when, and to ensure you have the compute, storage and network resources to deliver it.
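
One way to make this planning concrete is to record each service with a recovery tier and its resource needs, then derive both the restore order and an approximate Recovery Service Level. The Python sketch below does this for a hypothetical inventory. The service names follow the examples above, but the tier numbers, the memory figures and the use of RAM as a stand-in for "total computing resources" are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    tier: int    # 1 = restore first, 2 = secondary, 3 = tertiary
    ram_gb: int  # memory the virtual machine needs (assumed figure)


def recovery_order(services):
    # Lowest tier first; sorted() is stable, so services within the same
    # tier keep the order in which they appear in the inventory.
    return sorted(services, key=lambda s: s.tier)


def recovery_service_level(services, max_tier):
    """Percentage of total RAM needed to run all tiers up to max_tier."""
    total = sum(s.ram_gb for s in services)
    needed = sum(s.ram_gb for s in services if s.tier <= max_tier)
    return 100 * needed / total


# Hypothetical inventory for a school's virtual server environment.
inventory = [
    Service("MIS platform", tier=2, ram_gb=16),
    Service("DNS and DHCP", tier=1, ram_gb=4),
    Service("IT support tools", tier=3, ram_gb=8),
    Service("Active Directory", tier=1, ram_gb=8),
    Service("File and print", tier=2, ram_gb=12),
]

order = [s.name for s in recovery_order(inventory)]
print(order)
# ['DNS and DHCP', 'Active Directory', 'MIS platform',
#  'File and print', 'IT support tools']

print(f"{recovery_service_level(inventory, max_tier=1):.1f}%")  # 25.0%
print(f"{recovery_service_level(inventory, max_tier=2):.1f}%")  # 83.3%
```

With these assumed figures, a quarter of the normal memory footprint is enough to bring back the core network services, while restoring the secondary systems as well needs a little over 80%.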

Recovery Time Objective

If you follow the 3-2-1 rule then you will have backups on-site, at your school. If you’ve had a hardware failure, then once you have sufficient compute resources you should be able to restore virtual machines directly over your Local Area Network. The limiting factors will be:

  • the size of the backups to restore
  • the speed and reliability of the Local Area Network (1Gbps / 10Gbps etc.)
  • the speed at which the servers are connected to the network
  • the disk types and configurations within the servers (read/write speeds)
  • the number of concurrent restorations (contention)
  • the efficacy of compression and deduplication algorithms within the Backup & Recovery service

If you don’t follow the 3-2-1 rule and instead back up data directly off-site, be aware that service restoration will be limited by the speed of your Internet connection. This is typically a fraction of the speed of your Local Area Network, so restoration may take significantly longer.
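
To give a feel for the difference, the Python sketch below estimates transfer time from the backup size and the speed of the slowest link in the restore path. It is a back-of-the-envelope calculation only. The function name, the 2 TB backup size, the 1Gbps and 100Mbps link speeds and the 70% efficiency factor (an allowance for protocol overhead and contention) are all assumptions, and real restores will also be affected by disk speeds, compression and deduplication.

```python
def restore_hours(backup_size_gb, bottleneck_mbps, efficiency=0.7):
    """Estimate the hours needed to transfer a backup over a link.

    backup_size_gb  -- amount of data to restore, in decimal gigabytes
    bottleneck_mbps -- nominal speed of the slowest link, in megabits/s
    efficiency      -- assumed fraction of nominal speed actually achieved
    """
    if backup_size_gb < 0 or bottleneck_mbps <= 0:
        raise ValueError("size must be >= 0 and speed must be > 0")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")
    size_megabits = backup_size_gb * 8 * 1000  # 1 GB = 8,000 megabits
    seconds = size_megabits / (bottleneck_mbps * efficiency)
    return seconds / 3600


# Hypothetical 2 TB restore over a 1Gbps LAN and a 100Mbps Internet link.
lan_hours = restore_hours(2000, 1000)
wan_hours = restore_hours(2000, 100)

print(f"LAN restore:      {lan_hours:.1f} hours")  # 6.3 hours
print(f"Internet restore: {wan_hours:.1f} hours")  # 63.5 hours
```

Under these assumptions the same restore that fits within a school day over the Local Area Network would take more than two and a half days over the Internet connection.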

 

Putting it into practice?

View Wave9’s Backup and Restore products and service here.

Alternatively, you can contact us for more information on any matter here.