Monday, February 21, 2011

Solaris SMF Commands

SMF has a small but powerful set of commands.

Each command has several options, which together cover the tasks required to manage services on Solaris systems.
The following table lists the SMF commands.

Command    Description
svcs       Reports service status
svcadm     Used for service management, e.g. starting, stopping, and restoring services
svccfg     Used to import, export, and modify service configurations
svcprop    Used to list the properties of a service
inetadm    Used to manage inetd services

Solaris ssh is offline?

You have probably seen a situation like this: for some reason ssh has died and you cannot log in to the server remotely. If you have console access to the box, you can see that ssh is offline.

root@app1 # svcs -a | grep ssh
offline 1:40:22 svc:/network/ssh:default

svcs -d lists the services that ssh depends on (svcs -D would list the services that depend on ssh):

root@app1 # svcs -d ssh
STATE STIME FMRI
online 1:40:19 svc:/network/loopback:default
online 1:40:24 svc:/network/physical:default
disabled 1:41:04 svc:/system/cryptosvc:default
online 1:41:16 svc:/system/filesystem/local:default
online 1:42:44 svc:/system/filesystem/autofs:default
online 1:42:43 svc:/system/utmp:default

Offline means that the service is enabled, but something it depends on is missing, disabled, or in maintenance mode.

In our case cryptosvc is disabled. You might have a service with lots of disabled dependencies, or you might have dependencies disabled many levels deep.
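To spot the offending dependency quickly, the `svcs -d` output can be filtered for anything that is not online. The following is a minimal sketch, not part of SMF: `blocking_deps` is a hypothetical helper name, and it assumes the three-column STATE/STIME/FMRI layout shown above.

```shell
# Hypothetical helper: print every dependency whose state is not "online".
# Reads `svcs -d <service>` output on stdin; assumes STATE STIME FMRI columns.
blocking_deps() {
    awk '$1 != "STATE" && NF >= 3 && $1 != "online" { print $3 " (" $1 ")" }'
}

# Typical use on a Solaris host (not executed here):
#   svcs -d network/ssh | blocking_deps
```

For the listing above this would report only the cryptosvc line.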

Do you want to walk through all those services, find out why they're not on, and enable every dependency by hand? Of course you don't. So svcadm has a "recursive enable" option that goes through and enables everything that your service depends on.

# svcadm enable -r network/ssh


# svcs network/ssh
STATE STIME FMRI
online 1:02:23 svc:/network/ssh:default


# svcs -d network/ssh:default
STATE STIME FMRI
online 1:40:19 svc:/network/loopback:default
online 1:40:24 svc:/network/physical:default
online 1:41:04 svc:/system/cryptosvc:default
online 1:41:16 svc:/system/filesystem/local:default
online 1:42:44 svc:/system/filesystem/autofs:default
online 1:42:43 svc:/system/utmp:default

As you can see, we recursively enabled not only ssh, but everything it depended on, allowing it to come online.

One last option of note for enable/disable is the "temporary" option. Say that you want to enable/disable a service just for this session, but have it revert to its previous state on reboot, in case there are problems. If ssh is disabled and you issue:

# svcadm enable -t network/ssh
The enable will only be temporary. If you reboot the machine, the service will once again be disabled.

refresh
Refresh serves two purposes. First, if you have changed any properties of your service (say, added a dependency or changed the start timeout), refreshing the service makes the new properties active. Second, in addition to "start" and "stop", you can define an optional method called "refresh". If your daemon re-reads its configuration file when it is sent a HUP signal, put that in the refresh method; it is called whenever you refresh the service.

restart
Restart is pretty self-evident: restarting a service means that you stop it and start it again. Where in the past you might have issued /etc/init.d/sendmail stop followed by /etc/init.d/sendmail start, now you would use:

# svcadm restart network/smtp:sendmail
... which will restart sendmail.

mark (degraded | maintenance)
Mark is used to force a service into a certain state. An administrator might want to force a service into the maintenance state to let other administrators know that something is wrong with it and needs to be addressed before it is started again. You can force a service into either maintenance (which will shut the service down) or degraded (which will leave it running, but let others know that it is running in a degraded state).

Keeping with our earlier example of ssh:

# svcadm mark maintenance network/ssh

# svcs network/ssh
STATE STIME FMRI
maintenance 1:12:47 svc:/network/ssh:default

clear
Clear is used to "reset" the state of a service, and have it be re-evaluated. For example, say that syslog is in maintenance:

# svcs system/system-log
STATE STIME FMRI
maintenance 1:15:33 svc:/system/system-log:default
You debug the problem, and realize that syslog failed to start because someone had accidentally deleted syslog.conf, which syslog needs to start. It attempted to start, saw that the conf file was missing, and fell into maintenance. You repair the file, and issue a clear:

# svcadm clear system/system-log

# svcs system/system-log
STATE STIME FMRI
online 1:25:07 svc:/system/system-log:default

Summary
These are the basic SMF maintenance tasks on a Solaris 10 machine. SMF administration is quite easy and incredibly powerful. You no longer have to hunt around for daemons and init scripts: every service is given a unique FMRI and is administered through a unified framework. This, combined with explicit states and dependencies, gives administrators flexibility and power that is unavailable in other Unix distributions.
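The state handling walked through in this post can be condensed into a small lookup. This is only a sketch under stated assumptions: `suggest_action` is a hypothetical helper, and the "offline" branch assumes the cause is a disabled dependency, which should be confirmed with `svcs -d` first. It only prints a suggestion and never runs svcadm.

```shell
# Hypothetical helper: map an SMF state to the svcadm action discussed above.
# Usage: suggest_action <state> <fmri>
suggest_action() {
    state=$1
    fmri=$2
    case "$state" in
        online)      echo "nothing to do" ;;
        # Assumption: offline because a dependency is disabled.
        offline)     echo "svcadm enable -r $fmri" ;;
        disabled)    echo "svcadm enable $fmri" ;;
        maintenance) echo "fix the cause, then: svcadm clear $fmri" ;;
        degraded)    echo "investigate, then: svcadm clear $fmri" ;;
        *)           echo "unknown state: $state" ;;
    esac
}

# Example: suggest_action maintenance system/system-log
```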

Sunday, February 20, 2011

Resolving 3PAR/VMware SCSI reservation conflicts

Environment:
VMware 4.1
3PAR 2.3.1 MU2 with VAAI API

Symptoms:
- All the ESX hosts lose their datastores
- When you log in to an ESX host and run "df -h", it hangs
- You cannot shut down the guest VMs properly; the shutdown gets stuck at, say, 95%
- Your VMs cannot be powered on


Log in to the 3PAR and list the reservations:
3PAR_F200_PG1 cli% showrsv
VVname Host Owner Port ReservationType
vv-datastore1 esx01 21000024FF2FCA50 unknown SCSI-2
vv-datastore2 esx01 21000024FF2FCA50 unknown SCSI-2
VV-datastore5-2MB esx01 21000024FF2FCA50 unknown SCSI-2

The above output confirms that there are SCSI reservations on the LUNs.


VMware KB article for resolving the issue:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1002293



Solution from 3PAR side:

3PAR_F200_PG1 cli% setvv -clrrsv vv-datastore1

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv vv-datastore1
select q=quit y=yes n=no: y

3PAR_F200_PG1 cli% showrsv
VVname Host Owner Port ReservationType
vv-datastore2 esx01 21000024FF2FCA50 unknown SCSI-2
VV-datastore5-2MB esx01 21000024FF2FCA50 unknown SCSI-2

(the above output shows that the lock on vv-datastore1 was reset)


3PAR_F200_PG1 cli% setvv -clrrsv vv-datastore2

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv vv-datastore2
select q=quit y=yes n=no: y

3PAR_F200_PG1 cli% showrsv
VVname Host Owner Port ReservationType
VV-datastore5-2MB esx01 21000024FF2FCA50 unknown SCSI-2

(the above output shows that the lock on vv-datastore2 was reset)

3PAR_F200_PG1 cli% setvv -clrrsv VV-datastore5-2MB

WARNING: Active hosts may be disrupted.

Are you sure you want to run setvv -clrrsv VV-datastore5-2MB
select q=quit y=yes n=no: y


3PAR_F200_PG1 cli% showrsv
no reservations found

(the above output shows no reservations)
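When many volumes are affected, the list of `setvv -clrrsv` commands can be generated from saved `showrsv` output rather than typed by hand. The sketch below is an assumption-laden helper, not a 3PAR tool: `clrrsv_commands` is a hypothetical name, and it assumes the volume name is the first column and the reservation type (e.g. SCSI-2) is the last, as in the output above. It only prints the commands; since clearing reservations can disrupt active hosts, each one should still be reviewed before it is run.

```shell
# Hypothetical helper: turn saved `showrsv` output (stdin) into one
# `setvv -clrrsv` command per reserved virtual volume. Nothing is executed.
clrrsv_commands() {
    awk '$1 != "VVname" && $NF ~ /^SCSI-/ && !seen[$1]++ {
        print "setvv -clrrsv " $1
    }'
}

# Example: clrrsv_commands < showrsv_output.txt
```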




Solution from VMware ESX host side:

List all the pending reservations
# tail -1 /proc/scsi/qla2xxx/[0-9]*

Perform a LUN reset to clear the lock
# vmkfstools --lock lunreset /vmfs/devices/disks/naa.50002ac000080a7e
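To find which devices are candidates for a LUN reset, the naa identifiers can be pulled from the vmkernel "Command 0x16 ... failed" lines in the log excerpt below (0x16 is the SCSI RESERVE(6) opcode). This is a minimal sketch: `reserve_failed_devices` is a hypothetical helper name, and it assumes the log line format shown in this post.

```shell
# Hypothetical helper: list the distinct devices for which a SCSI RESERVE
# (command 0x16) failed, reading vmkernel log lines on stdin.
reserve_failed_devices() {
    # With '"' as the field separator, field 2 is the quoted device name.
    awk -F'"' '/ScsiDeviceIO/ && /Command 0x16/ && /failed/ { print $2 }' | sort -u
}

# Example: reserve_failed_devices < /var/log/messages
```

Each identifier it prints corresponds to a path of the form /vmfs/devices/disks/<naa id>, as used in the vmkfstools command above.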



---------------
ESXi /var/log/messages Error Logs:
---------------
Feb 18 19:11:32 Vpxa: [2011-02-18 19:11:32.385 1D695B90 warning 'App'] [VpxaHalStats] Unexpected return result. Expect 1 sample, receive 2
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D613B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D613B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 info 'App' opID=task-internal-22417-dbfb04f9] [VpxLRO] -- BEGIN task-internal-22417 -- -- vpxapi.VpxaService.fetchQuickStats -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 verbose 'SoapAdapter.HTTPService' opID=task-internal-22417-dbfb04f9] HTTP Response: Complete (processed 396 bytes)
Feb 18 19:11:35 Vpxa: [2011-02-18 19:11:35.216 1D5D2B90 info 'App' opID=task-internal-22417-dbfb04f9] [VpxLRO] -- FINISH task-internal-22417 -- -- vpxapi.VpxaService.fetchQuickStats -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4
############################################################### Error ###############################################################
Feb 18 19:11:37 vmkernel: 10:02:15:41.486 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000050a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb 18 19:11:37 vmkernel: 10:02:15:41.486 cpu4:4134)WARNING: FS3: 7030: Reservation error: Timeout
Feb 18 19:11:37 vmkernel: 10:02:15:41.534 cpu4:4134)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 49 stamp 868815074527 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
############################################################### Error ###############################################################
Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000080a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu6:4135)WARNING: FS3: 7030: Reservation error: Timeout
Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu18:697291)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 3517 stamp 871821852348 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:37 vmkernel: 10:02:15:42.274 cpu6:4135)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 3517 stamp 871821852348 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.322 1D85CB90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.323 1D85CB90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength -1)
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.323 1D613B90 info 'App' opID=CB8B255C-0000010A-52] [VpxLRO] -- BEGIN task-internal-22418 -- -- vpxapi.VpxaService.queryPerformanceStatistics -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.324 1D613B90 verbose 'App' opID=CB8B255C-0000010A-52] [VpxaMoService::QueryStats] Query last timestamp 1298056298
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.326 1D613B90 verbose 'SoapAdapter.HTTPService' opID=CB8B255C-0000010A-52] HTTP Response: Complete (processed 418 bytes)
Feb 18 19:11:38 Vpxa: [2011-02-18 19:11:38.326 1D613B90 info 'App' opID=CB8B255C-0000010A-52] [VpxLRO] -- FINISH task-internal-22418 -- -- vpxapi.VpxaService.queryPerformanceStatistics -- 52c7f745-06cb-1a35-9dd0-388015a6d4f4
Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.244 1D5D2B90 verbose 'App'] [VpxaVMAP] Checking Node Resources
Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.249 1D5D2B90 verbose 'App'] [VpxaVMAP] CheckThreshold values stored:(35400,58178,35400,58178) - retrieved:(35400,57085,35400,58129)
Feb 18 19:11:39 Vpxa: [2011-02-18 19:11:39.249 1D5D2B90 verbose 'App'] [VpxaVMAP] CheckThreshold percent changes (0,1,0,0) - threshold 5
Feb 18 19:11:40 Hostd: [2011-02-18 19:11:40.006 4FD49B90 error 'Statssvc'] HostCtl Exception during stats collection: Unable to complete Sysinfo operation. Please see the VMkernel log file for more details.
############################################################### Error ###############################################################
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu17:4113)ScsiDeviceIO: 1672: Command 0x16 to device "naa.50002ac000030a7e" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu14:4133)WARNING: FS3: 7030: Reservation error: Timeout
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu18:1048314)Fil3: 11853: Max timeout retries exceeded for caller Fil3_FileIO (status 'Timeout')
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu12:2194046)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu14:4133)FS3: 6978: Starting HB reclaim for [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:42 vmkernel: 10:02:15:46.496 cpu18:1048314)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:42 Hostd: [2011-02-18 19:11:42.035 4FBA0B90 verbose 'vm:/vmfs/volumes/4caa11fb-4eb23e3b-1ada-a4badb47bc9f/mdc3vr1904/mdc3vr1904.vmx'] Running status of tools changed to: running
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.037 1D591B90 verbose 'VpxaHalCnxHostagent'] Received callback in WaitForUpdatesDone
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'VpxaHalCnxHostagent'] [VpxaHalCnxHostagent::ProcessUpdate] Applying updates from 102653 to 102654 (at 102653)
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsStatus'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsRunningStatus'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.toolsVersion'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestId'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestFamily'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestFullName'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.hostName'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.ipAddress'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.net'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.ipStack'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: GuestInfo changed 'guest.disk'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.screen'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: Runtime changed 'guest.guestState'
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalVmHostagent] 160: appHeartbeatStatus changed to appStatusGray
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaHalServices] appHeartbeatChange Event for vm(11) 160
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.038 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] App HeartbeatStatus Changed
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaHalServices] RuntimeChange Event for vm(11) 160
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] AAM won't be notifyed. vmid 11: notifyVMAP 0,isRestarting 0
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtHost] Increment master gen. no to (4151): VmRuntime:VpxaInvtVmChangeListener::RuntimeChanged
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaHalServices] VmGuestDiskChange Event for vm(11) 160
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtVmChangeListener] Guest DiskInfo Changed
Feb 18 19:11:42 Vpxa: [2011-02-18 19:11:42.039 1D591B90 verbose 'App'] [VpxaInvtHost] Increment master gen. no to (4152): VmRuntime:GuestDiskChanged
Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] User agent is 'VMware-client/4.1.0'
Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Client: NeedsContentLength: false UnderstandsChunking: true CanKeepAlive: true (PresetContentLength 129)
Feb 18 19:11:43 Vpxa: [2011-02-18 19:11:43.242 1D799B90 verbose 'SoapAdapter.HTTPService'] HTTP Response: Auto-completing at 129/129 bytes
Feb 18 19:11:45 vmkernel: 10:02:15:49.496 cpu14:1048552)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:45 vmkernel: 10:02:15:49.498 cpu8:2045185)FS3: 7346: Waiting for timed-out heartbeat [HB state abcdef02 offset 3809280 gen 161 stamp 870183428483 uuid 4d5175ef-d3ceec8d-8b50-a4badb47bc9f jrnl drv 8.46]
Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.083 1D717B90 verbose 'App'] [VpxaVMAP] Monitoring AAM health: vpxdDasStateOnLastInvocation(running) currentVpxdDasState(running) forceRunOfListNodes(0) isDasEnabled(1) skipOperation(0)
Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.083 1D717B90 verbose 'App'] [VpxaVMAP::Invoke]Command to invoke is /opt/vmware/aam/bin/aamPerl /opt/vmware/aam/ha/aam_config_util.pl -z -shortname=md000ysesx01 -uname=VMkernel -cmd=monitornodes -domain=vmwar
Feb 18 19:11:45 e
Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.084 1D717B90 info 'SysCommandPosix'] ForkExec(/opt/vmware/aam/bin/aamPerl) 2196440
Feb 18 19:11:45 Vpxa: [2011-02-18 19:11:45.316 1D717B90 verbose 'App'] [VpxaVMAP::Invoke] task percent done is 100
Feb 18 19:11:46 Vpxa: [2011-02-18 19:11:46.119 1D717B90 verbose 'App'] [VpxaVMAP::Invoke] Command output: 02/18/11 19:11:45 [print_args ] Invoked command: 02/18/11 19:11:45 [print_args ] /opt/vmware/aam/bin/ftPerl /opt/vmware/a
Feb 18 19:11:46 am/ha/aam_config_util.pl -z -shortname=md000ysesx01 -uname=VMkernel -cmd=monitornodes -domain=vmware 02/18/11 19:11:45 [print_args ] Environment: 02/18/11 19:11:45 [print_args ] FT_DIR=/opt/vmware/aam 02/18/11 19:11:45 [print_ar
Feb 18 19:11:46 gs ] FT_ISOLATION_TIME=1 02/18/11 19:11:45 [print_args ] GREP=/bin/grep 02/18/11 19:11:45 [print_args ] FT_CONFIG_DIR=/var/lib/vmware/aam 02/18/11 19:11:45 [print_args ] RPCINFO=/bin/rpcinfo 02/1

AIX 5.3 - The parameter or environment lists are too long

Example Error:
pg@ibmhost: files/$ ls -ltr /data/params_backup/files/*.env
bash: /usr/bin/ls: The parameter or environment lists are too long.

Explanation: NCARGS is one of the scheduler and memory load control parameters used to tune system memory. Increasing its value overcomes this problem. The value can be tuned anywhere within the range of 24576 to 524288 bytes, in 4 KB increments.

To display and update the ncargs value, use the commands listed below.

Purpose: Specifies the maximum allowable size of the ARG/ENV list (in 4KB blocks) when running exec() subroutines.

Values: Default: 6; Range: 6 to 128 (6=24576, 128=524288)

Display: lsattr -E -l sys0 -a ncargs
Change: chdev -l sys0 -a ncargs=NewValue

The change takes effect immediately and is preserved across reboots.
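Because ncargs is expressed in 4 KB blocks, it helps to convert between blocks and bytes before picking a new value. The following is a rough sketch only: the function names are hypothetical, the 6 to 128 range is the one quoted above, and the real ARG/ENV size of a command also includes the environment, so some headroom should be added to whatever this suggests.

```shell
# Hypothetical helpers for the ncargs arithmetic described above.

# Convert an ncargs value (4 KB blocks) to bytes.
ncargs_bytes() {
    echo $(( $1 * 4096 ))
}

# Smallest ncargs value whose size covers the given number of bytes,
# clamped to the 6..128 range quoted in this post.
ncargs_for_bytes() {
    blocks=$(( ($1 + 4095) / 4096 ))
    if [ "$blocks" -lt 6 ]; then blocks=6; fi
    if [ "$blocks" -gt 128 ]; then blocks=128; fi
    echo "$blocks"
}

# Example: chdev -l sys0 -a ncargs=$(ncargs_for_bytes 100000)
```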