Sunday, February 20, 2011

Resolving 3PAR/Vmware SCSI reservation conflicts

Resolving 3PAR/Vmware SCSI reservation conflicts:

Environment:
Vmware 4.1
3PAR 2.3.1 MU2 with VAAI API

Symptoms:
-All the esx host lost the datastores
-When you login to esxhost and do "df -h" it hangs
-You cannot shutdown the guest vm's properly, it will get stuck at say 95%
-Your VM's won't be accessible to power on


Login to 3PAR
3PAR_F200_PG1 cli% showrsv
VVname Host Owner Port Reservation Type
vv-datastore1 esx01 21000024 FF2FCA50 unknown SCSI-2
vv-datastore2 esx01 21000024 FF2FCA50 unknown SCSI-2
VV-datastore5-2MB esx01 21000024 FF2FCA50 unknown SCSI-2

The above output confirms that there are scsi reservation on the luns


VMWARE KB Article to resolve the issue
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002293



Solution from 3PAR side:

3PAR_F200_PG1 cli% setvv -clrrsv vv-datastore1

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv vv-datastore1
select q=quit y=yes n=no: y

3PAR_F200_PG1 cli% showrsv
VVname Host Owner Port ReservationType
vv-datastore2 esx01 21000024FF2FCA50 unknown SCSI-2
VV-datastore5-2MB esx01 21000024FF2FCA50 unknown SCSI-2

(the above output shows it reset lock on vv-datastore1)


3PAR_F200_PG1 cli% setvv -clrrsv vv-datastore2

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv vv-datastore2
select q=quit y=yes n=no: y

3PAR_F200_PG1 cli% showrsv
VVname Host Owner Port ReservationType
VV-datastore5-2MB esx01 21000024FF2FCA50 unknown SCSI-2
3PAR_F200_PG1 cli% setvv -clrrsv VV-datastore5-2MB

(the above output shows it reset lock on vv-datastore2)

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv VV-datastore5-2MB
select q=quit y=yes n=no: y


3PAR_F200_PG1 cli% showrsv
no reservations found

(the above output shows no reservations)




Solution from VMware ESx host side:

List all the pending reservation
#tail -1 /proc/scsi/qla2xxx/[0-9]*

Perform a LUN reset to clear the lock
# vmkfstools --lock lunreset /vmfs/devices/disks/naa.50002ac000080a7e



---------------
ESXi /var/log/messages Error Logs:
---------------
Feb 18 19:11:32 Vpxa: [2011-02-18 19:11:32.385 1D695B90 warning 'App'] [VpxaHalStats] Unexpected return result. Expect 1 sample, receive 2
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D613B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D613B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 info 'App' opID=task-internal-22417-dbfb04f9] [VpxLRO] -- BEGIN task-internal-22417 -- -- vpxapi.VpxaService.fetchQuickStats -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 verbose 'SoapAdapter.HTTPService' opID=task-internal-22417-dbfb04f9] HTTP Response: Complete (processed 396 bytes)
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 info 'App' opID=task-internal-22417-dbfb04f9] [VpxLRO] -- FINISH task-internal-22417 -- -- vpxapi.VpxaService.fetchQuickStats -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4
############################################################### Error ###############################################################
Feb 18 19:11:37 vmkernel: 10:02:15:41.486 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000050a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb 18 19:11:37 vmkernel: 10:02:15:41.486 cpu4:4134)WARNING: FS3: 7030: Reservation error: Timeout
Feb 18 19:11:37 vmkernel: 10:02:15:41.534 cpu4:4134)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 49 stamp 868815074527 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
############################################################### Error ###############################################################
Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000080a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu6:4135)WARNING: FS3: 7030: Reservation error: Timeout
Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu18:697291)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 3517 stamp 871821852348 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu6:4135)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 3517 stamp 871821852348 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.322 1D85CB90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.323 1D85CB90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.323 1D613B90 info 'App' opID=CB8B255C-0000010A-52] [VpxLRO] -- BEGIN task-internal-22418 -- -- vpxapi.VpxaService.queryPerformanceStatistics -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.324 1D613B90 verbose 'App' opID=CB8B255C-0000010A-52] [VpxaMoService::QueryStats] Query last timestamp 1298056298
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.326 1D613B90 verbose 'SoapAdapter.HTTPService' opID=CB8B255C-0000010A-52] HTTP Response: Complete (processed 418 bytes)
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.326 1D613B90 info 'App' opID=CB8B255C-0000010A-52] [VpxLRO] -- FINISH task-internal-22418 -- -- vpxapi.VpxaService.queryPerformanceStatistics -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4
Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.244 1D5D2B90 verbose 'App'] [VpxaVMAP] Checking Node Resources
Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.249 1D5D2B90 verbose 'App'] [VpxaVMAP] CheckThreshold values stored:(35400,58178,35400,58178) - retrieved:(35400,57085,35400,58129)
Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.249 1D5D2B90 verbose 'App'] [VpxaVMAP] CheckThreshold percent changes (0,1,0,0) - threshold 5
Feb 18 19:11:40 Hostd: [2011-02-18 19:11:40.006 4FD49B90 error 'Statssvc'] HostCtl Exception during stats collection: Unable to complete Sysinfo operation. Please see the VMkernel log file for more details.
############################################################### Error ###############################################################
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000030a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu14:4133)WARNING: FS3: 7030: Reservation error: Timeout
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu18:1048314)Fil3: 11853: Max timeout retries exceeded for caller Fil3_FileIO (status 'Timeout')
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu12:2194046)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu14:4133)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu18:1048314)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:42 Hostd: [2011-02-18 19:11:42.035 4FBA0B90 verbose 'vm:/vmfs/volumes/4caa11fb-4eb23e3b-1ada-a4badb47bc9f/mdc3vr1904/mdc3vr1904.vmx'] Running status of tools changed to: running
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.037 1D591B90 verbose 'VpxaHalCnxHostagent'] Received callback in WaitForUpdatesDone
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'VpxaHalCnxHostagent'] [VpxaHalCnxHostagent::ProcessUpdate] Applying updates from 102653 to 102654 (at 102653)
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsStatus'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsRunningStatus'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsVersion'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestId'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestFamily'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestFullName'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.hostName'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.ipAddress'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.net'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.ipStack'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: GuestInfo changed 'guest.disk'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.screen'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestState'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: appHeartbeatStatus changed to appStatusGray
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalServices] appHeartbeatChange Event for vm(11) 160
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] App HeartbeatStatus Changed
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaHalServices] RuntimeChange Event for vm(11) 160
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] AAM won't be notifyed. vmid 11: notifyVMAP 0,isRestarting 0
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtHost] Increment master gen. no to (4151): VmRuntime:VpxaInvtVmChangeListener::RuntimeChanged
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaHalServices] VmGuestDiskChange Event for vm(11) 160
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] Guest DiskInfo Changed
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtHost] Increment master gen. no to (4152): VmRuntime:GuestDiskChanged
Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength 129)
Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Auto-completing at 129/129 bytes
Feb 18 19:11:45 vmkernel: 10:02:15:49.496 cpu14:1048552)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:45 vmkernel: 10:02:15:49.498 cpu8:2045185)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.083 1D717B90 verbose 'App'] [VpxaVMAP] Monitoring AAM health: vpxdDasStateOnLastInvocation(running) currentVpxdDasState(running) forceRunOfListNodes(0) isDasEnabled(1) skipOperation(0)
Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.083 1D717B90 verbose 'App'] [VpxaVMAP::Invoke]Command to invoke is /opt/vmware/aam/bin/aamPerl /opt/vmware/aam/ha/aam_config_util.pl -z -shortname=md000ysesx01 -uname=VMkernel -cmd=monitornodes -domain=vmwar
Feb 18 19:11:45 e
Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.084 1D717B90 info 'SysCommandPosix'] ForkExec(/opt/vmware/aam/bin/aamPerl) 2196440
Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.316 1D717B90 verbose 'App'] [VpxaVMAP::Invoke] task percent done is 100
Feb 18 19:11:46 Vpxa: [2011-02-18 19:11:46.119 1D717B90 verbose 'App'] [VpxaVMAP::Invoke] Command output: 02/18/11 19:11:45 [print_args ] Invoked command: 02/18/11 19:11:45 [print_args ] /opt/vmware/aam/bin/ftPerl /opt/vmware/a
Feb 18 19:11:46 am/ha/aam_config_util.pl -z -shortname=md000ysesx01 -uname=VMkernel -cmd=monitornodes -domain=vmware 02/18/11 19:11:45 [print_args ] Environment: 02/18/11 19:11:45 [print_args ] FT_DIR=/opt/vmware/aam 02/18/11 19:11:45 [print_ar
Feb 18 19:11:46 gs ] FT_ISOLATION_TIME=1 02/18/11 19:11:45 [print_args ] GREP=/bin/grep 02/18/11 19:11:45 [print_args ] FT_CONFIG_DIR=/var/lib/vmware/aam 02/18/11 19:11:45 [print_args ] RPCINFO=/bin/rpcinfo 02/1

3 comments:

jcpowermac said...

Hello,
We had the exact same issue with our 3Par a couple days ago do you know what caused your scsi reservation conflict?
Thanks

Pankaj Gautam said...

We were on 2.3.1 MU2 on 3PAR
As per 3PAR as long as we have 3PAR VAAI plugin installed on ESXi hosts
we shouldn't see this conflict.

Unfortunately, we lost ESXi logs at that time to able to get any valid resolution from Vmware.

jcpowermac said...

Thanks, exactly what happened to us with the VMware ESXi logs, just implemented the vMA to help fix at least that problem. We haven't been told about using the VAAI plugin but I sent along an email regarding this post. Thanks for your quick response.