NetApp vFiler DR with the Data ONTAP Simulator, Part 9: Planned Failover

This article is part of a series.

As configured in the previous parts, netapp01 acts as the master and netapp02 as the slave. Clients use the vFiler's shares on the master (netapp01), and the data is replicated to the slave (netapp02). If the master fails, the vFiler can therefore also be started on the slave without data loss. The figure illustrates the configuration once more.

Figure: Master-slave configuration

If the slave (netapp02) is shut down, e.g. for maintenance, and thus becomes unavailable, this initially has no effect on the shares. The CIFS and NFS shares remain available. The data can, of course, no longer be replicated from the master (netapp01) to the slave (netapp02), so the two data sets diverge. As soon as netapp02 is reachable again, however, replication resumes automatically and the data is brought back in sync.
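The Lag column of `snapmirror status` gives a rough indication of how far the slave has fallen behind during such an outage. The following is a minimal sketch for converting that value into seconds. It assumes the lag is printed as hh:mm:ss, as in the console output of this series; the function name `lag_seconds` is hypothetical and not part of any NetApp tool.

```python
def lag_seconds(lag: str) -> int:
    """Convert a Lag value such as '00:04:22' (hh:mm:ss) into seconds.

    Assumption: the value always has exactly three colon-separated
    numeric fields, as in the transcripts of this article.
    """
    hours, minutes, seconds = (int(part) for part in lag.split(":"))
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    # Lag value copied from a 'snapmirror status' row.
    print(lag_seconds("00:04:22"))
```

A monitoring script could compare the result against a threshold of its own choosing to decide whether the replication needs attention.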

But what happens when the master (netapp01) has to be shut down or fails? To keep the shares accessible, a failover to the slave (netapp02) is required. There are two kinds of failover: a planned failover (a controlled switch to the slave) and a disaster failover. This part walks through the planned failover.

Planned failover from master to slave:

  1. Stop the vFiler on the master (netapp01) => “stopped”
netapp01> vfiler stop vfiler01
vfiler01                         stopped
netapp01> Thu Apr 28 20:20:52 CEST [netapp01:vf.stopped:warning]: vfiler: 'vfiler01'; stopped
netapp01> vfiler status
vfiler0                          running
vfiler01                         stopped
netapp02> vfiler status
vfiler0                          running
vfiler01                         stopped, DR backup
  2. Activate the DR vFiler on the slave => “running”
netapp02> vfiler dr activate vfiler01@netapp01
Waiting for "vol_vfiler01" to become stable.
Thu Apr 28 20:33:58 CEST [netapp02:snapmirror.sync.fail:notice]: Synchronous SnapMirror from netapp01_vfiler01_con:vol_vfiler01 to netapp02:vol_vfiler01 failed.
CIFS local server is running.
Thu Apr 28 20:34:04 CEST [vfiler01@netapp02:cifs.startup.local.succeeded:info]: CIFS: CIFS local server is running.
Thu Apr 28 20:34:04 CEST [netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Thu Apr 28 20:34:04 CEST [vfiler01@netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes file is missing.
Thu Apr 28 20:34:04 CEST [vfiler01@netapp02:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Thu Apr 28 20:34:05 CEST [netapp02:wafl.scan.ownblocks.done:info]: Completed block ownership calculation on volume vol_vfiler01. The scanner took 0 ms.

Vfiler vfiler01 activated.
e0a: flags=0xe48867 mtu 1500
        inet 192.168.2.67 netmask 0xffffff00 broadcast 192.168.2.255
        ether 00:0c:29:61:01:2b (auto-1000t-fd-up) flowcontrol full
netapp02> Thu Apr 28 20:34:05 CEST [netapp02:cmds.vfiler.dr.activated:info]: Disaster recovery backup vFiler unit: 'vfiler01' of the vFiler unit at remote storage system: 'netapp01' was activated.
Thu Apr 28 20:34:29 CEST [vfiler01@netapp02:nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the local server.
netapp02> vfiler status
vfiler0                          running
vfiler01                         running
  3. Check the SnapMirror state => “Source” on the master and “Broken-off” on the slave
netapp01> snapmirror status
Snapmirror is on.
Source                     Destination                State          Lag        Status
netapp01:vol_vfiler01      netapp02:vol_vfiler01      Source         00:04:22   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:05:12   Idle
  4. Resync from slave to master (-s for synchronous replication)
netapp01> vfiler dr resync -s vfiler01@netapp02
One can optionally provide an alternate ip path for sync snapmirroring
Alternate IP address/Hostname for remote filer netapp02 []:
Alternate IP address/Hostname for local filer netapp01 []:
netapp02's Administrative login: root
netapp02's Administrative password:

CIFS local server on vFiler vfiler is shutting down...

CIFS local server on vfiler vfiler has shut down...
Thu Apr 28 20:40:57 CEST [vfiler01@netapp01:telnet_0:notice]: IP address 192.168.2.68 is  removed from interface "e0a"
Configuring SnapMirror to mirror vfiler vfiler01's storage units from remote filer netapp02.
Starting snapmirror initialize commands. It
could take a very long time when the source or
destination filers are involved in many
simultaneous transfers. The console will not be
available until all initialize commands are
started successfully. Please use the
"snapmirror status" command on the source
filer to monitor the progress.

Thu Apr 28 20:41:00 CEST [netapp01:snapmirror.dst.resync.info:notice]: SnapMirror resync of vol_vfiler01 to netapp02:vol_vfiler01 is using netapp02(4082368507)_vol_vfiler01.4 as the base snapshot.
Thu Apr 28 20:41:00 CEST [netapp01:vFiler.storageUnit.off:warning]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now offline.
Thu Apr 28 20:41:01 CEST [netapp01:wafl.snaprestore.revert:info]: Reverting volume vol_vfiler01 to a previous snapshot.
Thu Apr 28 20:41:02 CEST [netapp01:vFiler.storageUnit.On:notice]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now online.
Revert to resync base snapshot was successful.
Thu Apr 28 20:41:02 CEST [netapp01:replication.dst.resync.success:notice]: SnapMirror resync of vol_vfiler01 to netapp02:vol_vfiler01 was successful.
SnapMirror transfer initiated for vfiler storage units.
  5. Check SnapMirror from netapp02 (source) to netapp01 (destination) => additional entries with state “Snapmirrored” on the master and “Source” on the slave
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:00   In-sync
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:10:45   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:11:31   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:00:00   In-sync

After these steps, master and slave have swapped roles. The vFiler runs on netapp02 and the data is replicated from netapp02 to netapp01. netapp01 can now be shut down for maintenance.
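Whether the roles have really been swapped can be read from the `snapmirror status` table on netapp01. The following is a rough sketch of such a check in Python. It assumes the five-column layout with a single-word Status value seen in the transcripts above; the names `Relationship`, `parse_snapmirror_status` and `reverse_mirror_in_sync` are hypothetical helpers, not part of Data ONTAP.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    source: str
    destination: str
    state: str
    lag: str
    status: str


def parse_snapmirror_status(output: str) -> list[Relationship]:
    """Parse the table printed by 'snapmirror status' into rows."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # Data rows have exactly five columns. This skips the echoed
        # prompt, the 'Snapmirror is on.' banner and the column header.
        if len(fields) != 5 or fields[0] == "Source":
            continue
        rows.append(Relationship(*fields))
    return rows


def reverse_mirror_in_sync(rows, new_source: str, new_destination: str) -> bool:
    """True if a mirror from new_source to new_destination is in sync."""
    return any(
        r.source.startswith(new_source)
        and r.destination.startswith(new_destination + ":")
        and r.state == "Snapmirrored"
        and r.status == "In-sync"
        for r in rows
    )


SAMPLE = """\
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:00   In-sync
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:10:45   Idle
"""

if __name__ == "__main__":
    rows = parse_snapmirror_status(SAMPLE)
    print(reverse_mirror_in_sync(rows, "netapp02", "netapp01"))
```

The check matches on the host prefix only, because the source is listed under the connection name (netapp02_vfiler01_con) rather than the plain host name.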

Figure: Master-slave configuration

Once the “old” master (netapp01) is operational again, the vFiler can be moved back.

Rollback from slave to master:

  1. Wait until SnapMirror from netapp02 (source) to netapp01 (destination) is “In-sync”
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Snapmirrored   00:00:00   In-sync
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:14:45   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:15:31   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:00:00   In-sync
  2. Stop the vFiler on the slave => “stopped”
netapp02> vfiler stop vfiler01
vfiler01                         stopped
Thu Apr 28 20:47:10 CEST [netapp02:vf.stopped:warning]: vfiler: 'vfiler01'; stopped
netapp02> vfiler status
vfiler0                          running
vfiler01                         stopped
  3. Activate the vFiler on the master => “running”
netapp01> vfiler dr activate vfiler01@netapp02
Waiting for "vol_vfiler01" to become stable.
Thu Apr 28 20:48:23 CEST [netapp01:snapmirror.sync.fail:notice]: Synchronous SnapMirror from netapp02_vfiler01_con:vol_vfiler01 to netapp01:vol_vfiler01 failed.
Thu Apr 28 20:48:30 CEST [netapp01:wafl.scan.ownblocks.done:info]: Completed block ownership calculation on volume vol_vfiler01. The scanner took 0 ms.
CIFS local server is running.
Thu Apr 28 20:48:31 CEST [vfiler01@netapp01:cifs.startup.local.succeeded:info]: CIFS: CIFS local server is running.
Thu Apr 28 20:48:31 CEST [netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.
Thu Apr 28 20:48:31 CEST [vfiler01@netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes file is missing.
Thu Apr 28 20:48:31 CEST [vfiler01@netapp01:httpd.config.mime.missing:warning]: /etc/httpd.mimetypes.sample file is missing.

Vfiler vfiler01 activated.
e0a: flags=0xe48867 mtu 1500
        inet 192.168.2.66 netmask 0xffffff00 broadcast 192.168.2.255
        inet 192.168.2.69 netmask 0xffffff00 broadcast 192.168.2.255
        ether 00:0c:29:ee:ee:f2 (auto-1000t-fd-up) flowcontrol full
netapp01> Thu Apr 28 20:48:32 CEST [netapp01:cmds.vfiler.dr.activated:info]: Disaster recovery backup vFiler unit: 'vfiler01' of the vFiler unit at remote storage system: 'netapp02' was activated.
Thu Apr 28 20:48:55 CEST [vfiler01@netapp01:nbt.nbns.registrationComplete:info]: NBT: All CIFS name registrations have completed for the local server.

netapp01> vfiler status
vfiler0                          running
vfiler01                         running
  4. Check the state of the SnapMirror from netapp02 (source) to netapp01 (destination) => “Source” on netapp02 and “Broken-off” on netapp01
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Broken-off     00:03:38   Idle
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:17:04   Idle
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Broken-off     00:17:58   Idle
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:04:32   Idle
  5. Resync from master to slave => the SnapMirror from netapp01 (source) to netapp02 (destination) reaches status “In-sync” (this takes a while)
netapp02> vfiler dr resync -s vfiler01@netapp01
One can optionally provide an alternate ip path for sync snapmirroring
Alternate IP address/Hostname for remote filer netapp01 []:
Alternate IP address/Hostname for local filer netapp02 []:
netapp01's Administrative login: root
netapp01's Administrative password:

CIFS local server on vFiler vfiler01 is shutting down...

waiting for CIFS shut down (^C aborts)...

CIFS local server on vfiler vfiler01 has shut down...
Thu Apr 28 20:53:02 CEST [vfiler01@netapp02:telnet_0:notice]: IP address 192.168.2.68 is  removed from interface "e0a"
Configuring SnapMirror to mirror vfiler vfiler01's storage units from remote filer netapp01.
Starting snapmirror initialize commands. It
could take a very long time when the source or
destination filers are involved in many
simultaneous transfers. The console will not be
available until all initialize commands are
started successfully. Please use the
"snapmirror status" command on the source
filer to monitor the progress.

Thu Apr 28 20:53:06 CEST [netapp02:snapmirror.dst.resync.info:notice]: SnapMirror resync of vol_vfiler01 to netapp01:vol_vfiler01 is using netapp01(4082368508)_vol_vfiler01.5 as the base snapshot.
Thu Apr 28 20:53:06 CEST [netapp02:vFiler.storageUnit.off:warning]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now offline.
Thu Apr 28 20:53:08 CEST [netapp02:wafl.snaprestore.revert:info]: Reverting volume vol_vfiler01 to a previous snapshot.
Thu Apr 28 20:53:09 CEST [netapp02:vFiler.storageUnit.On:notice]: vFiler vfiler01: storage unit /vol/vol_vfiler01 now online.
Revert to resync base snapshot was successful.
Thu Apr 28 20:53:10 CEST [netapp02:replication.dst.resync.success:notice]: SnapMirror resync of vol_vfiler01 to netapp01:vol_vfiler01 was successful.
SnapMirror transfer initiated for vfiler storage units.

netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Snapmirrored   00:00:00   In-sync
netapp02:vol_vfiler01               netapp01:vol_vfiler01      Source         00:08:02   Idle
netapp01> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp02_vfiler01_con:vol_vfiler01  netapp01:vol_vfiler01      Broken-off     00:08:37   Idle
netapp01:vol_vfiler01               netapp02:vol_vfiler01      Source         00:00:00   In-sync
  6. Delete the SnapMirror relationships from slave to master
netapp02> snapmirror release vol_vfiler01 netapp01:vol_vfiler01
snapmirror release: vol_vfiler01 netapp01:vol_vfiler01: No release-able destination found that matches those parameters.  Use 'snapmirror destinations' to see a list of release-able destinations.
netapp01> snapmirror release vol_vfiler01 netapp01:vol_vfiler01
snapmirror release: vol_vfiler01 netapp01:vol_vfiler01: No release-able destination found that matches those parameters.  Use 'snapmirror destinations' to see a list of release-able destinations.
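The failover and the rollback above share the same core sequence with the roles reversed: stop the vFiler on the active system, activate it on the passive one, check SnapMirror on both, then resync back. The sketch below only builds that command list for a given pair of systems and runs nothing. The function name `planned_switch_commands` is hypothetical; the commands themselves are the ones used in the steps above. The wait for “In-sync” before a rollback and the final `snapmirror release` are deliberately left out.

```python
def planned_switch_commands(vfiler: str, active: str, passive: str) -> list[tuple[str, str]]:
    """Return (host, command) pairs for moving `vfiler` from `active` to `passive`.

    The list mirrors the manual steps of this article. It is meant as a
    checklist, not as an automation script: 'vfiler dr resync' prompts
    interactively for the administrative login of the remote system.
    """
    return [
        (active, f"vfiler stop {vfiler}"),
        (passive, f"vfiler dr activate {vfiler}@{active}"),
        (active, "snapmirror status"),
        (passive, "snapmirror status"),
        (active, f"vfiler dr resync -s {vfiler}@{passive}"),
    ]


if __name__ == "__main__":
    print("Failover:")
    for host, command in planned_switch_commands("vfiler01", "netapp01", "netapp02"):
        print(f"  {host}> {command}")
    print("Rollback:")
    for host, command in planned_switch_commands("vfiler01", "netapp02", "netapp01"):
        print(f"  {host}> {command}")
```

Keeping the host next to each command makes it harder to run a step on the wrong system, which is easy to do once both consoles look alike.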

As before the failover, the vFiler is now running on the master (netapp01) again and the data is replicated to the slave (netapp02) (cf. the first figure).

netapp01> vfiler status
vfiler0                          running
vfiler01                         running
netapp01> snapmirror status
Snapmirror is on.
Source                     Destination                State          Lag        Status
netapp01:vol_vfiler01      netapp02:vol_vfiler01      Source         00:00:00   In-sync
netapp02> vfiler status
vfiler0                          running
vfiler01                         stopped, DR backup
netapp02> snapmirror status
Snapmirror is on.
Source                              Destination                State          Lag        Status
netapp01_vfiler01_con:vol_vfiler01  netapp02:vol_vfiler01      Snapmirrored   00:00:00   In-sync
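The final state can also be verified from captured `vfiler status` output. The following is a small sketch under the assumption that each data line consists of the vFiler name followed by its state, as shown above; `parse_vfiler_status` and `is_back_to_normal` are hypothetical helper names.

```python
def parse_vfiler_status(output: str) -> dict[str, str]:
    """Map each vFiler name to its state as printed by 'vfiler status'."""
    states = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        # Skip blank lines and echoed prompts like 'netapp01> vfiler status'.
        if len(parts) != 2 or parts[0].endswith(">"):
            continue
        name, state = parts
        states[name] = state.strip()
    return states


def is_back_to_normal(master_output: str, slave_output: str, vfiler: str) -> bool:
    """True if the vFiler runs on the master and is a DR backup on the slave."""
    master = parse_vfiler_status(master_output)
    slave = parse_vfiler_status(slave_output)
    return (
        master.get(vfiler) == "running"
        and slave.get(vfiler) == "stopped, DR backup"
    )


MASTER = """\
netapp01> vfiler status
vfiler0                          running
vfiler01                         running
"""

SLAVE = """\
netapp02> vfiler status
vfiler0                          running
vfiler01                         stopped, DR backup
"""

if __name__ == "__main__":
    print(is_back_to_normal(MASTER, SLAVE, "vfiler01"))
```

Swapping the two arguments describes the state right after the failover from the point of view of the new master, except that the stopped vFiler on netapp01 is then reported as plain “stopped” until the resync has run.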

All articles in this series:
Part 1: Downloading the required components
Part 2: Setting up the first simulator
Part 3: Setting up the second simulator
Part 4: Creating an aggregate and volumes
Part 5: DNS configuration
Part 6: Creating a vFiler and configuring vFiler DR
Part 7: Synchronous vFiler DR
Part 8: Creating shares on the vFiler
Part 9: Planned failover
Part 10: Disaster failover