Site Recovery Manager 6.5


This demo contains several individual chapters which can be viewed directly from the demo shortcut menu on the left side.
This chapter shows the process of using SRM to fail over workloads during a disaster, then re-protect them and migrate them back to the original site.
This scenario starts with us in the middle of a disaster. We need to run a recovery plan immediately that will fail over resources currently running in New York to our recovery site in New Jersey. We start in the SRM home page within the vSphere Web UI. Here we see links to Sites, Array Based Replication (SRAs), Protection Groups and Recovery Plans. Since we are in the midst of a disaster we select 'Recovery Plans' to run a Recovery Plan and get our VMs running at our New Jersey site ASAP.
[Click on Recovery Plans]

55.

Here we see a list of our recovery plans. Since this disaster impacts our whole site, we select 'Entire Site'.
[Click Entire Site]

90.

This is where we can start our recovery plan, test it, and monitor the steps it takes as it runs.
[Click on the Start Recovery Plan button]

120.

This process will impact our VMs running in New York (if they aren't already impacted by the disaster): they will be powered off at our protected site and powered on at our recovery site.
[Click the check box]

160.

Since we are experiencing a disaster, we want to return to normal operations as quickly as possible. A short RTO (Recovery Time Objective) is important, and running the recovery plan in 'Disaster Recovery' mode helps us meet it. In this mode SRM attempts to synchronize data and otherwise communicate with the protected site, but it carries on running even if it encounters errors communicating with the protected site. We do not want to slow down the recovery if our protected site is unavailable or only partially available.
[Select Disaster Recovery]

180.

[Click Next]

210.

It is important to make sure we have chosen correctly, so we verify the details of the plan we are about to execute: we are failing over from New York to New Jersey, we are running in Disaster Recovery mode (not planned migration), and most importantly we are running the correct recovery plan, 'Entire Site'.
Note the steps as they execute:
- SRM attempts to synchronize storage
- SRM shuts down VMs at the protected site in reverse order of startup
- SRM synchronizes storage again (to minimize the time that VMs are turned off)
- SRM prepares recovery site storage (change storage to writeable, mounting to hosts, etc.)
- SRM powers on VMs in the defined order (by priority group, and by dependencies within priority groups if defined)
- Every step is timestamped at start and finish
[Click Finish]
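
The ordering in the steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how SRM implements it: the VM names and the `VM` class are hypothetical, and dependencies within a priority group are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    name: str            # hypothetical VM name, for illustration only
    priority_group: int  # lower numbers start first in this sketch


def power_on_order(vms):
    """Return VM names in startup order: lowest priority group first."""
    ordered = sorted(vms, key=lambda vm: (vm.priority_group, vm.name))
    return [vm.name for vm in ordered]


def shutdown_order(vms):
    """Return VM names in shutdown order: the reverse of startup order."""
    return list(reversed(power_on_order(vms)))


vms = [
    VM("Finance-App01", 3),
    VM("Finance-DB01", 1),
    VM("Portal-DB01", 1),
    VM("Portal-App01", 3),
]

print("Power on:", power_on_order(vms))
print("Shut down:", shutdown_order(vms))
```

In this sketch the databases in priority group 1 start before the application servers in group 3, and the shutdown list is simply the same list reversed.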

690.

Our VMs are now running in New Jersey. The issue at the New York datacenter has been resolved so we will reprotect our VMs, configuring New York as the recovery site. This will allow us to migrate our VMs back when we want to, or fail VMs over to New York if we need to. Configuring the environment to ensure the VMs are once again protected is as simple as a single click.
[Click the re-protect button]

710.

Re-protect will communicate with the storage arrays and/or vSphere Replication and reverse the direction of replication for the datastores associated with the recovery plan. This ensures the VMs are protected and available to be migrated or failed over again.
[Select the check box]
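
As a rough mental model, re-protect swaps the roles of the two sites. The sketch below assumes a hypothetical `ReplicationPair` record; it is not an SRM or vSphere Replication API, only a picture of the direction reversal.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ReplicationPair:
    # Hypothetical record for this sketch; not an SRM object.
    protected_site: str
    recovery_site: str


def reprotect(pair):
    """Reverse the replication direction: the old recovery site becomes protected."""
    return replace(
        pair,
        protected_site=pair.recovery_site,
        recovery_site=pair.protected_site,
    )


before = ReplicationPair(protected_site="New York", recovery_site="New Jersey")
after = reprotect(before)
print(before)
print(after)
```

Running re-protect a second time, as we do after failing back, returns the pair to its original direction.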

740.

[Click Next]

820.

Ensure that the re-protect will execute in the desired manner on the desired objects. Note that when this completes, what was our recovery site (New Jersey) will be protected to what was our protected site (New York). Once it has started, note the reversal of replication.
[Click Finish]

1100.

Now that our VMs are reprotected we'll look at our VMs' locations in the web interface.
[Click the Home Button]

1110.

[Click VMs and Templates]

1230.

Here we see our placeholder VMs (note the special icon on the ERP, Finance and Portal VMs - and that they are not powered on) in New York. The HR application was not part of our recovery plan and will be added to it later.
[Click New Jersey]

1340.

Looking at our New Jersey inventory we see the VMs for our 3 applications running normally. These last two screens show the benefit of Enhanced Linked Mode, introduced in vSphere 6.0, and how it makes working with an SRM environment easier. Note that Enhanced Linked Mode is not required to use SRM. Let's return to SRM and work on moving our VMs back to New York.
[Click on Recovery Plans]

1370.

We will now execute a planned migration from New Jersey to New York. Since there is no longer a disaster in progress, we want to run the failback as a planned migration. A planned migration differs from a disaster recovery workflow in that planned migrations will stop if errors are encountered. Administrators may fix any problems that occur along the way with a planned migration and then re-run it.
The goal of a planned migration is different from that of disaster recovery: rather than fast recovery being the highest priority, consistent data and a good state prior to recovery take priority.
[Click on the Run Recovery Plan button]
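
The difference between the two modes can be sketched as a small Python loop. This is a simplified model under our own assumptions: the step names, the `Mode` enum and `run_plan` are hypothetical, and every failure is treated as a communication error with the protected site.

```python
from enum import Enum


class Mode(Enum):
    DISASTER_RECOVERY = "disaster recovery"
    PLANNED_MIGRATION = "planned migration"


def run_plan(steps, mode):
    """Run (name, action) steps and return (completed step names, error messages).

    Assumption for this sketch: a failing step raises ConnectionError.
    """
    completed, errors = [], []
    for name, action in steps:
        try:
            action()
            completed.append(name)
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")
            if mode is Mode.PLANNED_MIGRATION:
                # Stop so an administrator can fix the problem and re-run the plan.
                break
            # Disaster recovery: record the error and carry on.
    return completed, errors


def unreachable():
    raise ConnectionError("protected site unreachable")


steps = [
    ("synchronize storage", unreachable),
    ("prepare recovery site storage", lambda: None),
    ("power on VMs", lambda: None),
]

dr_completed, dr_errors = run_plan(steps, Mode.DISASTER_RECOVERY)
pm_completed, pm_errors = run_plan(steps, Mode.PLANNED_MIGRATION)
print("Disaster recovery:", dr_completed, dr_errors)
print("Planned migration:", pm_completed, pm_errors)
```

With the same failing first step, the disaster recovery run still reaches the power-on step, while the planned migration run stops at the first error.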

1380.

[Click on the check box]

1420.

[Click Next]

1460.

Ensure the recovery plan information is correct. We are failing over from Site B (New Jersey) to Site A (New York) and the recovery type is Planned Migration.
[Click Finish]

1770.

While the planned migration is running, let's look at how priority groups/power on order for VMs is handled.
[Click on Step 10]

1820.

Here we see that the databases for the Finance and Portal applications are powered on first.
[Click on Step 11]

1850.

[Click on the scroll bar on the right]

2090.

And the application servers for ERP, Finance and Portal are powered on after the databases.
Now that the VMs have all started successfully, we want to ensure the environment is once again safe and protected should another disaster occur, and also to allow us to do recovery plan testing or planned migrations. Once again we want to run 'Re-protect' and ensure that the VMs running in our New York datacenter are completely protected.
[Click the re-protect button]

2110.

Again we need to confirm this will change our environment and verify details.
[Click the check box]

2140.

[Click Next]

2170.

[Click Finish]

2380.

The recent disaster highlighted that we have an application (the HR app) that isn't replicated or protected by SRM.
Due to a lack of replicated storage space on our array, the VMs that make up the HR application don't reside on replicated storage, so they can't be protected by array-based replication.
We will use vSphere Replication to protect the VMs, create a protection group and add it to our entire site recovery plan.
[Click the Home button]

2390.

[Click VMs and Templates]

2410.

[Click HR]

2420.

vSphere Replication can be configured for more than one VM at a time, so we will select both HR VMs and configure them together.
[Click HR-Web01]

2460.

[Click Actions]

2470.

[Click All vSphere Replication Actions]

2490.

[Click Configure Replication]

2520.

[Click Yes]

2600.

[Click Next]

2640.

We will be replicating to another vCenter so this option will stay selected. If we were going to use vCloud Air DR we would use the other option. Only highlight this if it is relevant to the customer.
[Click Next]

2690.

[Click Next]

2780.

We may have multiple VR servers that handle VM replication at the recovery site. The VR servers act as recipients of the replicated data and distribute it to the appropriate hosts. We will leave this on automatic. Note that it is easy to scale up vSphere Replication simply by adding additional VR servers at our recovery site.
[Click Next]

2810.

[Click Edit for all]

2840.

[Click iSCSI-LUN1]

2860.

[Click OK]

2990.

[Click Next]

3080.

vSphere Replication supports both guest OS quiescing for applications that require it and network compression for lower bandwidth links. Our environment doesn't require either so we'll leave them disabled. vSphere Replication supports RPOs from 5 mins (as of SRM 6.5 - this demo is 6.1) to 24 hours as well as multiple point in time instances. We'll use a 1 hour RPO and point in time instances will be left disabled.
[Click Next]
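
To make the RPO setting concrete, here is a minimal Python sketch that checks a value against the 5 minute to 24 hour range mentioned above and flags a replica that has fallen behind. The function names are hypothetical, and the violation check is a simplification rather than vSphere Replication's actual scheduling logic.

```python
from datetime import datetime, timedelta

MIN_RPO = timedelta(minutes=5)
MAX_RPO = timedelta(hours=24)


def validate_rpo(rpo):
    """Return rpo unchanged if it lies within the supported range, else raise."""
    if not MIN_RPO <= rpo <= MAX_RPO:
        raise ValueError(f"RPO must be between {MIN_RPO} and {MAX_RPO}, got {rpo}")
    return rpo


def rpo_violated(last_sync, now, rpo):
    """True when the newest replica is older than the RPO allows (simplified)."""
    return now - last_sync > rpo


rpo = validate_rpo(timedelta(hours=1))
now = datetime(2017, 1, 1, 12, 0)
print(rpo_violated(datetime(2017, 1, 1, 11, 30), now, rpo))
print(rpo_violated(datetime(2017, 1, 1, 10, 30), now, rpo))
```

With our 1 hour RPO, a replica that is 30 minutes old is within target and one that is 90 minutes old is not.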

3140.

[Click Next]

3180.

Confirm settings are as desired. Note the 1 hour RPO, that replication is enabled and that there is no "initial copy found".
Had we chosen to, we could have seeded a copy of the VM at the recovery site through any mechanism, and vSphere Replication would have synchronized only the changes between the two copies.
Since no copy was found, VR will need to replicate the entire VM on its initial sync job.
[Click Finish]
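
The benefit of a seed copy can be illustrated with a toy Python comparison. This is an assumption-laden sketch: the block size, the use of SHA-256 and the `changed_blocks` helper are ours, and it does not describe how vSphere Replication actually compares disks.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny so the example is easy to follow


def split_blocks(data, size=BLOCK_SIZE):
    """Split a byte string into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_blocks(source, seed):
    """Return indexes of source blocks that are missing from or differ in the seed."""
    seed_blocks = split_blocks(seed)
    changed = []
    for index, block in enumerate(split_blocks(source)):
        if index >= len(seed_blocks):
            changed.append(index)  # the seed has no such block at all
        elif hashlib.sha256(block).digest() != hashlib.sha256(seed_blocks[index]).digest():
            changed.append(index)  # the block exists but its content differs
    return changed


source = b"AAAABBBBCCCCDDDD"
seeded_copy = b"AAAAXXXXCCCCDDDD"

print("With a seed:", changed_blocks(source, seeded_copy))
print("No initial copy found:", changed_blocks(source, b""))
```

With a seed only the one differing block needs to be sent; with no initial copy every block does, which matches the full initial sync we expect here.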

3210.

[Click the Home button]

3220.

Now that our VMs are being replicated with vSphere Replication we need to add them to a Protection Group.
[Click Site Recovery]

3240.

[Click Protection Groups]

3380.

Since this application is separate from our other applications and there is a desire to be able to fail it over and test its failover separately we will create a new Protection Group just for it.
[Click the New Protection Group button]

3400.

[Click in the Name text box]

3410.

[Hit any key to type]

3440.

[Click Next]

3530.

Since these VMs are currently located in New York, we need to set the direction of protection to New York -> New Jersey. And since they are replicated with vSphere Replication, we change the Protection Group type to Individual VMs (vSphere Replication).
[Click New York -> New Jersey]

3610.

[Click on Individual VMs (vSphere Replication)]

3650.

[Click Next]

3730.

We see that one of our VMs has already replicated and the other is still completing its initial full sync. We'll select both of them as they are both part of our HR application.
[Select the top check box]

3750.

[Click Next]

3800.

[Click Finish]

3930.

Here we can see that our new HR Protection Group shows a "Protection Status" of OK. We're ready to create a Recovery Plan for it and add the PG to our Entire Site RP.
[Click Site Recovery]

3970.

[Click Recovery Plans]

4070.

[Click the New Recovery Plan button]

4090.

[Click in the Name text box]

4100.

[Hit any key to type]

4140.

[Click Next]

4210.

Again we need to ensure we select the correct recovery site for the recovery plan.
[Click New Jersey]

4240.

[Click Next]

4400.

[Click HR]

4490.

[Click Next]

4560.

Here we have the option of choosing the test networks used by VMs in this recovery plan.
SRM supports auto created test networks as well as actual "test" networks on separate VLANs. This provides significant flexibility to SRM and allows for the recovery plan test process to be completely non-disruptive to production systems.
[Click Next]

4600.

[Click Finish]

4720.

Now that we've created the HR recovery plan we will add the HR Protection Group to the Entire Site Recovery Plan.
A PG can belong to more than one RP, so this gives us flexibility in how we test and recover (e.g., the ability to test only the HR app and, in the case of a disaster, still recover the entire site).
[Click Entire Site]

4830.

[Click Actions]

4870.

[Click Edit Plan]

4930.

[Click Next]

5030.

[Click Next]

5110.

[Click HR]

5210.

[Click Next]

5290.

[Click Next]

5340.

[Click Finish]

5460.

Now let's set the priority groups for the VMs in our HR application so that it starts up correctly when it is tested and recovered.
[Click HR]

5610.

[Click Related Objects]

5740.

[Click Virtual Machines]

5940.

We need the database server to start before the web server so we'll change its priority group to 1.
[Click Actions]

5950.

[Click All Priority Actions]

5970.

[Click Priority 1]

6000.

[Click Yes]

6260.

Let's check the status of our replications.
[Click the Home button]

6270.

[Click VMs and Templates]

6500.

[Click vcnyc01.corp.com]

6520.

[Click Monitor]

6650.

[Click HR-Web01]

6670.

Here we can see the last sync time, duration, size and more.
[Click the Home button]

6680.

[Click Site Recovery]

6700.

[Click Sites]

6920.

[Click Manage]

6940.

One problem we found with our previous failover is that although our VMs failed over, we had to manually change their IP addresses after they were recovered. Let's set up SRM to handle this for us. With IP subnet mapping we can do it at the subnet level instead of for each individual IP address. This makes it much simpler initially as well as on an ongoing basis.
[Click VM-Static1-1060]
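
The idea behind subnet-level mapping can be sketched with Python's standard `ipaddress` module: keep the host part of each address and swap the network part. The subnets and the `map_ip` helper below are made-up examples, not values from this demo or an SRM API.

```python
import ipaddress


def map_ip(address, source_subnet, target_subnet):
    """Translate an address from source_subnet to target_subnet, keeping the host part."""
    ip = ipaddress.ip_address(address)
    src = ipaddress.ip_network(source_subnet)
    dst = ipaddress.ip_network(target_subnet)
    if ip not in src:
        raise ValueError(f"{ip} is not in {src}")
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and target subnets must have the same prefix length")
    host_offset = int(ip) - int(src.network_address)
    return str(dst.network_address + host_offset)


# Hypothetical subnets for the protected and recovery sites.
NEW_YORK = "10.10.60.0/24"
NEW_JERSEY = "10.20.60.0/24"

for vm_ip in ["10.10.60.25", "10.10.60.26"]:
    print(vm_ip, "->", map_ip(vm_ip, NEW_YORK, NEW_JERSEY))
```

One rule covers every address in the subnet, which is why this approach needs less upkeep than a per-VM mapping.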

6960.

[Click Add IP Customization Rule]

6990.

[Hit any key to type]

7040.

[Click the Subnet text box for New York]

7060.

[Hit any key to type]

7140.

[Click the next text box]

7160.

[Hit any key to type]

7190.

[Click the Subnet text box for New Jersey]

7210.

[Hit any key to type]

7320.

[Click the Gateway text box]

7340.

[Hit any key to type]

7460.

[Click the DNS addresses text box]

7480.

[Hit any key to type]

7610.

[Click OK]

7650.

We can see that our IP customization rule is in place for the VM-Static1-1060 portgroup mapping between our sites. Now let's run a test of our newly created HR PG & RP.
[Click Site Recovery]

7670.

[Click Recovery Plans]

7730.

[Click HR]

7790.

[Click Monitor]

7890.

We want to test our new HR RP and PG non-disruptively. We can run this RP in "Test" mode to see how it will work when we need to use it without disrupting our production systems.
The protected VMs will not be changed or modified in any way and replication will continue uninterrupted while the test is run. The environment that will be created for the test at the recovery site will be network and storage isolated to ensure there is no possibility of conflict with the production environment.
[Click the Test Recovery Plan button]

7910.

Make sure the information is correct. We are testing a failover from New York to New Jersey. Also, note that we can choose whether we want to replicate recent changes. This is important if we want the recovery site test environment to look as close to production as possible.
[Click Next]

7930.

When we run this recovery plan test we will observe the following:
- A test environment is created at the recovery site
- Storage snapshots are created to test the storage non-disruptively
- IP addresses are changed as part of a test (if they are part of the recovery plan)
- Priority group 1 VMs start prior to Priority group 3 VMs, just like when running the actual recovery plan
[Click Finish]

8150.

Let's look at what the test environment looks like at our recovery/New Jersey site.
[Click the Home Button]

8160.

[Click VMs and Templates]

8180.

We see our HR VMs at the New Jersey site with the SRM icon running. These VMs can be connected to via console or through a jumpbox with connection to the test network and the production network to test the functionality of the application.
Once testing is complete it is time to clean up the test so that the recovery plan is ready to be run or tested again. The cleanup process powers off the test VMs, removes them from inventory, removes the snapshot(s) and resets things so that another test can be run or the RP can be run in DR mode or planned migration mode.
[Click HR]

8230.

[Click Recovery Plans]

8280.

[Click the Cleanup button]

8290.

[Click Next]

8340.

[Click Finish]

8490.

Now let's look at the history reports available with SRM. We'll start with the Entire Site recovery we ran earlier.
[Click Entire Site]

8540.

[Click History]

8590.

[Click Entire Site Recovery (second line)]

8670.

[Click the Export History button]

8680.

[Click Generate Report]

8790.

[Click Download Report]

8850.

[Click Save]

8880.

[Click Close]

9010.

[Click the Explore button on the Windows task bar]

9120.

[Click Entire Site]

9210.