my tech scribbling: February 2011

Solaris ssh is offline

I'm sure you must have seen a situation like this, where for some reason ssh died and you cannot login to the server remotely. If you have console access to box, you see the ssh is offline.

root@app1 # svcs -a | grep ssh

offline 1:40:22 svc:/network/ssh:default

svcs -d will tell us what other services depends on ssh

root@app1 # svcs -d ssh

STATE STIME FMRI

online 1:40:19 svc:/network/loopback:default

online 1:40:24 svc:/network/physical:default

disable 1:41:04 svc:/system/cryptosvc:default

online 1:41:16 svc:/system/filesystem/local:default

online 1:42:44 svc:/system/filesystem/autofs:default

online 1:42:43 svc:/system/utmp:default

Offline means that the service is enabled, but something it depends on is missing, disable or in maintenance mode

Here in our case crypto is disable. You might have a service with lots of dependencies that are disabled, or you might have dependencies disabled many levels deep.

Do you want to walk through all those services, find out why they're not on, and enable every dependency by hand? Of course you don't. So svcadm has a "recursive enable" option that goes through and enables everything that your service depends on.

# svcadm enable -r network/ssh

#svcs network/ssh

STATE STIME FMRI

online 1:02:23 svc:/network/ssh:default

#svcs -d network/ssh:default

STATE STIME FMRI

online 1:40:19 svc:/network/loopback:default

online 1:40:24 svc:/network/physical:default

disabled 1:41:04 svc:/system/cryptosvc:default

online 1:41:16 svc:/system/filesystem/local:default

online 1:42:44 svc:/system/filesystem/autofs:default

online 1:42:43 svc:/system/utmp:default

As you can see, we recursively enabled not only ssh, but everything it depended on, allowing it to come online.

One last option of note for enable/disable is the "temporary" option. Say that you want to enable/disable a service just for this session, but have it revert to its previous state on reboot, in case there are problems. If ssh is disabled and you issue:

#svcadm enable -t network/ssh

The enable will only be temporary. If you reboot the machine, the service will once again be disabled.

refresh

Refresh serves two purposes. One is if you've changed any of the properties of your service, say that you've added a dependency or changed the timeout for starting, you refresh the service, and the properties become active. The other purpose is that there's an optional method, in addition to "start" and "stop", called "refresh" that you can define. If your daemon can be sent a HUP signal to re-read its configuration file, you put this in the refresh method, and when you refresh the service, this method is called.

restart

Restart is pretty self evident. Restarting a service means that you stop it and start it again. Where in the past you might have issued a

/etc/init.d/sendmail stop followed by /etc/init.d/sendmail start, now you would use:

#svcadm restart network/smtp:sendmail

... which will restart sendmail.

mark (degraded | maintenance)

Mark is used to force a service into a certain state. (The states are here if you've forgotten them) An administrator might want to force a service into the maintenance state to let other administrators know that there's something wrong with it that needs to be addressed before it's started again. You can force a service into either maintenance (which will shut the service down) or degraded (which will leave it running, but let others know that it's running in a degraded state).

Keeping with our earlier example of ssh:

#svcadm mark maintenance network/ssh

#svcs network/ssh

STATE STIME FMRI

maintenance 1:12:47 svc:/network/ssh:default

clear

Clear is used to "reset" the state of a service, and have it be re-evaluated. For example, say that syslog is in maintenance:

#svcs system/system-log

STATE STIME FMRI

maintenance 1:15:33 svc:/system/system-log:default

You debug the problem, and realize that syslog failed to start because someone had accidentally deleted syslog.conf, which syslog needs to start. It attempted to start, saw that the conf file was missing, and fell into maintenance. You repair the file, and issue a clear:

# svcadm clear system/system-log

# svcs system/system-log

STATE STIME FMRI

online 1:25:07 svc:/system/system-log:default

Summary

These are SMF basic maintenance on a Solaris 10 machine. SMF administration is quite easy, and incredibly powerful. No longer do you have to hunt around for daemons and init scripts, every service is given a unique FMRI, administered through a unified framework. This, combined with explicit states and dependencies, gives administrators flexibility and power that is unavailable in other Unix distributions.

Resolving 3PAR/Vmware SCSI reservation conflicts:

Environment:

Vmware 4.1
3PAR 2.3.1 MU2 with VAAI API

Symptoms:

-All the esx host lost the datastores

-When you login to esxhost and do "df -h" it hangs

-You cannot shutdown the guest vm's properly, it will get stuck at say 95%

-Your VM's won't be accessible to power on

3PAR_F200_PG1 cli% showrsv

VVname Host Owner Port Reservation Type

vv-datastore1 esx01 21000024 FF2FCA50 unknown SCSI-2

vv-datastore2 esx01 21000024 FF2FCA50 unknown SCSI-2

VV-datastore5-2MB esx01 21000024 FF2FCA50 unknown SCSI-2

The above output confirms that there are scsi reservation on the luns

VMWARE KB Article to resolve the issue

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002293

Solution from 3PAR side:

3PAR_F200_PG1 cli% setvv -clrrsv vv-datastore1

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv vv-datastore1

select q=quit y=yes n=no: y

3PAR_F200_PG1 cli% showrsv

VVname Host Owner Port ReservationType

vv-datastore2 esx01 21000024FF2FCA50 unknown SCSI-2

VV-datastore5-2MB esx01 21000024FF2FCA50 unknown SCSI-2

(the above output shows it reset lock on vv-datastore1)

3PAR_F200_PG1 cli% setvv -clrrsv vv-datastore2

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv vv-datastore2

select q=quit y=yes n=no: y

3PAR_F200_PG1 cli% showrsv

VVname Host Owner Port ReservationType

VV-datastore5-2MB esx01 21000024FF2FCA50 unknown SCSI-2

3PAR_F200_PG1 cli% setvv -clrrsv VV-datastore5-2MB

(the above output shows it reset lock on vv-datastore2)

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv VV-datastore5-2MB

select q=quit y=yes n=no: y

3PAR_F200_PG1 cli% showrsv

no reservations found

(the above output shows no reservations)

Solution from VMware ESx host side:

List all the pending reservation

#tail -1 /proc/scsi/qla2xxx/[0-9]*

Perform a LUN reset to clear the lock

# vmkfstools --lock lunreset /vmfs/devices/disks/naa.50002ac000080a7e

---------------

ESXi /var/log/messages Error Logs:

---------------

Feb 18 19:11:32 Vpxa: [2011-02-18 19:11:32.385 1D695B90 warning 'App'] [VpxaHalStats] Unexpected return result. Expect 1 sample, receive 2

Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D613B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'

Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D613B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)

Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 info 'App' opID=task-internal-22417-dbfb04f9] [VpxLRO] -- BEGIN task-internal-22417 -- -- vpxapi.VpxaService.fetchQuickStats -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4

Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 verbose 'SoapAdapter.HTTPService' opID=task-internal-22417-dbfb04f9] HTTP Response: Complete (processed 396 bytes)

Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 info 'App' opID=task-internal-22417-dbfb04f9] [VpxLRO] -- FINISH task-internal-22417 -- -- vpxapi.VpxaService.fetchQuickStats -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4

############################################################### Error ###############################################################

Feb 18 19:11:37 vmkernel: 10:02:15:41.486 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000050a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Feb 18 19:11:37 vmkernel: 10:02:15:41.486 cpu4:4134)WARNING: FS3: 7030: Reservation error: Timeout

Feb 18 19:11:37 vmkernel: 10:02:15:41.534 cpu4:4134)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 49 stamp 868815074527 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]

############################################################### Error ###############################################################

Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000080a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu6:4135)WARNING: FS3: 7030: Reservation error: Timeout

Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu18:697291)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 3517 stamp 871821852348 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]

Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu6:4135)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 3517 stamp 871821852348 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]

Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.322 1D85CB90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'

Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.323 1D85CB90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)

Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.323 1D613B90 info 'App' opID=CB8B255C-0000010A-52] [VpxLRO] -- BEGIN task-internal-22418 -- -- vpxapi.VpxaService.queryPerformanceStatistics -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4

Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.324 1D613B90 verbose 'App' opID=CB8B255C-0000010A-52] [VpxaMoService::QueryStats] Query last timestamp 1298056298

Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.326 1D613B90 verbose 'SoapAdapter.HTTPService' opID=CB8B255C-0000010A-52] HTTP Response: Complete (processed 418 bytes)

Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.326 1D613B90 info 'App' opID=CB8B255C-0000010A-52] [VpxLRO] -- FINISH task-internal-22418 -- -- vpxapi.VpxaService.queryPerformanceStatistics -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4

Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.244 1D5D2B90 verbose 'App'] [VpxaVMAP] Checking Node Resources

Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.249 1D5D2B90 verbose 'App'] [VpxaVMAP] CheckThreshold values stored:(35400,58178,35400,58178) - retrieved:(35400,57085,35400,58129)

Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.249 1D5D2B90 verbose 'App'] [VpxaVMAP] CheckThreshold percent changes (0,1,0,0) - threshold 5

Feb 18 19:11:40 Hostd: [2011-02-18 19:11:40.006 4FD49B90 error 'Statssvc'] HostCtl Exception during stats collection: Unable to complete Sysinfo operation. Please see the VMkernel log file for more details.

############################################################### Error ###############################################################

Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000030a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.

Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu14:4133)WARNING: FS3: 7030: Reservation error: Timeout

Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu18:1048314)Fil3: 11853: Max timeout retries exceeded for caller Fil3_FileIO (status 'Timeout')

Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu12:2194046)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]

Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu14:4133)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]

Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu18:1048314)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]

Feb 18 19:11:42 Hostd: [2011-02-18 19:11:42.035 4FBA0B90 verbose 'vm:/vmfs/volumes/4caa11fb-4eb23e3b-1ada-a4badb47bc9f/mdc3vr1904/mdc3vr1904.vmx'] Running status of tools changed to: running

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.037 1D591B90 verbose 'VpxaHalCnxHostagent'] Received callback in WaitForUpdatesDone

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'VpxaHalCnxHostagent'] [VpxaHalCnxHostagent::ProcessUpdate] Applying updates from 102653 to 102654 (at 102653)

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsStatus'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsRunningStatus'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsVersion'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestId'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestFamily'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestFullName'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.hostName'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.ipAddress'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.net'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.ipStack'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: GuestInfo changed 'guest.disk'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.screen'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestState'

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: appHeartbeatStatus changed to appStatusGray

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalServices] appHeartbeatChange Event for vm(11) 160

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] App HeartbeatStatus Changed

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaHalServices] RuntimeChange Event for vm(11) 160

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] AAM won't be notifyed. vmid 11: notifyVMAP 0,isRestarting 0

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtHost] Increment master gen. no to (4151): VmRuntime:VpxaInvtVmChangeListener::RuntimeChanged

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaHalServices] VmGuestDiskChange Event for vm(11) 160

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] Guest DiskInfo Changed

Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtHost] Increment master gen. no to (4152): VmRuntime:GuestDiskChanged

Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'

Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength 129)

Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Auto-completing at 129/129 bytes

Feb 18 19:11:45 vmkernel: 10:02:15:49.496 cpu14:1048552)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]

Feb 18 19:11:45 vmkernel: 10:02:15:49.498 cpu8:2045185)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]

Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.083 1D717B90 verbose 'App'] [VpxaVMAP] Monitoring AAM health: vpxdDasStateOnLastInvocation(running) currentVpxdDasState(running) forceRunOfListNodes(0) isDasEnabled(1) skipOperation(0)

Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.083 1D717B90 verbose 'App'] [VpxaVMAP::Invoke]Command to invoke is /opt/vmware/aam/bin/aamPerl /opt/vmware/aam/ha/aam_config_util.pl -z -shortname=md000ysesx01 -uname=VMkernel -cmd=monitornodes -domain=vmwar

Feb 18 19:11:45 e

Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.084 1D717B90 info 'SysCommandPosix'] ForkExec(/opt/vmware/aam/bin/aamPerl) 2196440

Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.316 1D717B90 verbose 'App'] [VpxaVMAP::Invoke] task percent done is 100

Feb 18 19:11:46 Vpxa: [2011-02-18 19:11:46.119 1D717B90 verbose 'App'] [VpxaVMAP::Invoke] Command output: 02/18/11 19:11:45 [print_args ] Invoked command: 02/18/11 19:11:45 [print_args ] /opt/vmware/aam/bin/ftPerl /opt/vmware/a

Feb 18 19:11:46 am/ha/aam_config_util.pl -z -shortname=md000ysesx01 -uname=VMkernel -cmd=monitornodes -domain=vmware 02/18/11 19:11:45 [print_args ] Environment: 02/18/11 19:11:45 [print_args ] FT_DIR=/opt/vmware/aam 02/18/11 19:11:45 [print_ar

Feb 18 19:11:46 gs ] FT_ISOLATION_TIME=1 02/18/11 19:11:45 [print_args ] GREP=/bin/grep 02/18/11 19:11:45 [print_args ] FT_CONFIG_DIR=/var/lib/vmware/aam 02/18/11 19:11:45 [print_args ] RPCINFO=/bin/rpcinfo 02/1

my tech scribbling

Monday, February 21, 2011

Solaris SMF Commands

Solaris ssh is offline?

Sunday, February 20, 2011

Resolving 3PAR/Vmware SCSI reservation conflicts

AIX 5.3 - The parameter or environment lists are too long

tech scribbling archive

about me