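Explanation of the reasons for HA failover shown in 'get sys ha status'
When a failure occurs in an HA cluster and a new Primary device is selected, the reason can be checked with the 'get sys ha status' command.
This article explains the various failover reasons that the command can report.
In the examples below, 'FGVMxxxxxxxJZPF5' is the Primary device and 'FGVMxxxxxxxLRLBE' is the Secondary device in the HA cluster.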
1. The monitored interface of the Primary device (FGVMxxxxxxxJZPF5) goes down
Primary selected using:
<2023/07/14 04:59:34> FGVMxxxxxxxLRLBE is selected as the primary because the value 0 of link-failure + pingsvr-failure is less than peer member FGVMxxxxxxxJZPF5. ← At 04:59:34, the monitored interface of FGVMxxxxxxxJZPF5 went down, a failover occurred, and FGVMxxxxxxxLRLBE became the Primary.
2. All other HA members are down
Primary selected using:
<2023/07/14 05:17:55> FGVMxxxxxxxJZPF5 is selected as the primary because it's the only member in the cluster. ← At 05:17:55, FGVMxxxxxxxJZPF5 became the Primary because it was the only remaining member of the HA cluster (all other members were down).
3. An HA member (FGVMxxxxxxxJZPF5) rejoins the cluster and becomes the Primary because of its higher priority (HA override is enabled)
<2023/07/14 05:24:48> FGVMxxxxxxxJZPF5 is selected as the primary because its override priority is larger than peer member FGVMxxxxxxxLRLBE.
<2023/07/14 05:37:30> FGVMxxxxxxxLRLBE is selected as the primary because its uptime is larger than peer member FGVMxxxxxxxJZPF5.
<2023/07/14 05:36:56> FGVMxxxxxxxLRLBE is selected as the primary because it's the only member in the cluster.
<2023/07/14 05:48:55> FGVMxxxxxxxJZPF5 is selected as the primary because its uptime is larger than peer member FGVMxxxxxxxLRLBE. ← At 05:48:55, the HA uptime of FGVMxxxxxxxLRLBE was reset, so FGVMxxxxxxxJZPF5 became the Primary.
<2023/07/14 05:44:17> FGVMxxxxxxxLRLBE is selected as the primary because its uptime is larger than peer member FGVMxxxxxxxJZPF5. ← At 05:44:17, FGVMxxxxxxxLRLBE became the Primary.
<2023/07/14 05:44:16> FGVMxxxxxxxJZPF5 is selected as the primary because there is high memory usage on peer member FGVMxxxxxxxLRLBE. ← A failover occurred because of high memory usage on FGVMxxxxxxxLRLBE, and FGVMxxxxxxxJZPF5 became the Primary.
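When several failovers happen close together, the entries can be hard to read as a timeline. The following Python sketch is one way to extract the timestamp, the selected device, and the reason from saved 'get sys ha status' output and sort the events chronologically. It relies only on the message format visible in the examples above. The function name parse_ha_history, the REASON_LABELS table, and the label wording are hypothetical shorthand for illustration, not Fortinet terminology, and other FortiOS versions may phrase the messages differently.

```python
import re
from datetime import datetime

# Matches one entry such as:
# <2023/07/14 05:17:55> SERIAL is selected as the primary because <reason>.
# Assumption: the format is inferred from the sample lines in this article.
ENTRY_RE = re.compile(
    r"<(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})>\s+"
    r"(?P<serial>\S+) is selected as the primary because (?P<reason>.+?)\.?\s*$"
)

# Hypothetical short labels, keyed by a substring of the reason text.
REASON_LABELS = [
    ("link-failure + pingsvr-failure", "monitored interface or ping server failure"),
    ("only member in the cluster", "no other member in the cluster"),
    ("override priority", "override priority"),
    ("uptime is larger", "HA uptime"),
    ("high memory usage", "high memory usage on peer"),
]


def parse_ha_history(text):
    """Return the primary-selection events found in text, oldest first."""
    events = []
    for line in text.splitlines():
        match = ENTRY_RE.search(line.strip())
        if not match:
            continue  # skip headers and unrelated output
        reason = match.group("reason")
        label = next(
            (name for key, name in REASON_LABELS if key in reason), "other"
        )
        events.append(
            {
                "time": datetime.strptime(match.group("ts"), "%Y/%m/%d %H:%M:%S"),
                "primary": match.group("serial"),
                "reason": reason,
                "label": label,
            }
        )
    # Sort explicitly instead of relying on the order of the CLI output.
    return sorted(events, key=lambda event: event["time"])


SAMPLE = """Primary selected using:
    <2023/07/14 05:17:55> FGVMxxxxxxxJZPF5 is selected as the primary because it's the only member in the cluster.
    <2023/07/14 04:59:34> FGVMxxxxxxxLRLBE is selected as the primary because the value 0 of link-failure + pingsvr-failure is less than peer member FGVMxxxxxxxJZPF5.
"""

if __name__ == "__main__":
    for event in parse_ha_history(SAMPLE):
        print(event["time"], event["primary"], event["label"])
```

Any line that does not match the pattern is skipped, so the full command output can be passed in unchanged. Reasons that are not in REASON_LABELS are still returned with the label "other" and their original text.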