Showing posts with label SCOM. Show all posts

Tuesday, March 25, 2014

SCOM 2007 R2 (Editing Company Knowledge)

Company Knowledge is used to capture the steps required to resolve an alert in your OpsMgr installation. Paired with Product Knowledge (the application developer's guidance on the causes of an alert and the suggested resolution steps), it gives any operator both the recommended steps and any historical knowledge, which reduces the time required to resolve an alert.

Product Knowledge is embedded in a rule or monitor when it is authored. Company Knowledge can be added at any time, provided you have the required applications installed and you are logged in to the console with an account that is assigned to the correct role.


Software requirements:


1) Office Professional 2007 SP3 or higher (only Word is required).

2) Visual Studio 2005 Tools for the Microsoft Office System, which can be downloaded from the link below:
http://www.microsoft.com/downloads/details.aspx?FamilyID=f5539a90-dc41-4792-8ef8-f4de62ff1e81&displaylang=en

Role requirements:

In OpsMgr 2007, user roles define the actions that can be taken (the profile) on what objects (the scope). There are several user roles predefined in the product and the OpsMgr Administrator will have assigned your user account to a role. Your account must be assigned to one of the roles before you can access the Operations Console.



All roles can view the Company Knowledge content, either by accessing the rule or monitor directly (this access is limited by the role that you are in) or through an alert that was raised by a monitor or rule. To edit the Company Knowledge, however, you must be in either the Author or the Administrator OpsMgr 2007 user role.

When you click Edit, you are presented with a window similar to the one in the screenshot on the left.


Wednesday, October 3, 2012

SCOM Agent for Workgroup not showing up on SCOM Console


In order to monitor a server that is in another domain or in a workgroup, a certificate has to be imported on the server on which the SCOM agent has been deployed.

At times, you may wonder why the agent still does not show up in the pending management list on the SCOM console even though the certificate has been imported on the server with the MOMCertImport tool.
This usually has to do with the certificate.
To verify that the correct certificate is being used, check the following:
  1. Log on to the computer with an account that is a member of the Administrators group.
  2. On the Windows desktop, click Start, click Run, type regedit, and then click OK.
  3. On the Registry Editor page, expand HKEY_LOCAL_MACHINE, expand SOFTWARE, expand Microsoft, expand Microsoft Operations Manager, expand 3.0, and then click Machine Settings.
  4. In the results pane, right-click ChannelCertificateSerialNumber, and then click Modify.
    (The value in this key should match the serial number of the certificate, with the bytes in reverse order.)

    If the value is wrong, you can either enter it manually or run the MOMCertImport tool again.
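
The comparison in step 4 is easy to get wrong by eye, so here is a minimal Python sketch of it. It assumes you have copied the serial number from the certificate's Details tab and the registry value as hexadecimal text. The function names and the sample serial number are made up for illustration.

```python
def reversed_serial(serial):
    """Return the certificate serial number with its bytes in reverse order."""
    # Certificate viewers often separate bytes with spaces or colons; drop them.
    hex_digits = serial.replace(" ", "").replace(":", "").lower()
    if len(hex_digits) % 2 != 0:
        raise ValueError("serial number must contain whole bytes")
    # Split into two-character bytes, then reverse the byte order.
    byte_pairs = [hex_digits[i:i + 2] for i in range(0, len(hex_digits), 2)]
    return " ".join(reversed(byte_pairs))


def matches_registry(serial, registry_value):
    """Compare a serial number with the text of ChannelCertificateSerialNumber."""
    normalized = registry_value.replace(" ", "").lower()
    return reversed_serial(serial).replace(" ", "") == normalized


# Hypothetical serial number, for illustration only.
print(reversed_serial("61 0a 2b 3c"))
```

If `matches_registry` returns False for the certificate you expect the agent to use, the agent is most likely pointing at a different certificate.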

Friday, July 27, 2012

SCOM Subscriptions automatically disabled repeatedly

An issue was flagged to me that certain IT teams were not getting the alerts they had subscribed to.

Upon logging on to the SCOM Console, I found that these notification subscriptions were getting disabled every 30 minutes. The weird thing was that not all subscriptions were being disabled, and it was the same subscriptions every time. I tried re-enabling them and had the same result: the subscriptions kept being disabled. After some digging through the Operations Manager logs I found this warning:


Log Name: Operations Manager
Source: Health Service Modules
Date: 7/27/2012 5:53:22 PM
Event ID: 11452
Task Category: None
Level: Warning
Keywords: Classic
User: N/A

Computer: RMSserver

Description:
Validate alert subscription data source module encountered an alert subscription data source with configuration that has gone out of scope. Disabling the alert subscription data source module.

Alert subscription name: Subscription45c18cec_e95d_4af6_877e_072844d147d0

One or more workflows were affected by this.
Workflow name: Microsoft.SystemCenter.ValidateAlertSubscription
Instance name: RMSServer
Instance ID: {AF86A1AC-F1F5-9BF7-1E89-F60F73982EB6}
Management group: ManagementGRP



The problem turned out to be that someone in the team had recently cleaned up the SCOM Admins user group, and one of the users removed from the group had created these subscriptions. Putting the user back in the SCOM Admins group and re-enabling the subscriptions solved the problem, but we really didn’t want this user (who has left the company) in the SCOM Admins group.

What is the root cause of this? When a subscription is created, the SID of the user who created it is associated with that subscription. A workflow checks every half hour for SIDs that are no longer valid. A SID can be invalid because the account's access has been removed, or because the account has been disabled or deleted.

The Solution

To fix the issue permanently, I exported the management pack “Microsoft.SystemCenter.Notifications.Internal” in XML format.
This management pack is unsealed and contains all subscriptions.
Inside the management pack I searched for one of the subscriptions that was being disabled and one that wasn’t. I then replaced the SID of the subscription that was being disabled with the SID of the subscription that stayed enabled.
After replacing the SIDs I re-imported the management pack and re-enabled all subscriptions, and the problem was solved for good.
Here is an example of one of the SIDs I had to replace.

<ExpirationStartTime>12/01/2010 10:00:21</ExpirationStartTime>
<IdleMinutes>5</IdleMinutes>
<PollingIntervalMinutes>1</PollingIntervalMinutes>
<UserSid>S-1-5-21-1202660629-706699826-839522115-63827</UserSid>
<LanguageCode>ENE</LanguageCode>
<ExcludeNonNullConnectorIds>false</ExcludeNonNullConnectorIds>
<RuleId>$MPElement$</RuleId>
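
If many subscriptions carry the stale SID, the replacement can be scripted instead of done by hand. The following is a minimal Python sketch that works on the text of the exported XML file. The function name is my own, and the replacement SID in the usage note is a made-up placeholder for the SID of an account that is still valid.

```python
import re


def replace_user_sid(mp_xml, old_sid, new_sid):
    """Swap old_sid for new_sid inside <UserSid> elements.

    Returns the updated XML text and the number of replacements made.
    """
    # Match the SID only inside <UserSid> tags so that no other part
    # of the management pack is touched.
    pattern = re.compile(r"(<UserSid>)" + re.escape(old_sid) + r"(</UserSid>)")
    return pattern.subn(lambda m: m.group(1) + new_sid + m.group(2), mp_xml)
```

Usage would look something like `new_xml, count = replace_user_sid(text, "S-1-5-21-1202660629-706699826-839522115-63827", "S-1-5-21-1202660629-706699826-839522115-1001")`. Checking that `count` matches the number of affected subscriptions before re-importing is a cheap sanity test, and keeping a copy of the original export is advisable.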

Tuesday, July 10, 2012

Removing Deleted Computers from SCOM View


There will be times when the SCOM agent has been removed from a decommissioned server, but after some time the object is still displayed in the SCOM Computers view.

If removal is required, the SQL statement below will enable you to do so:

UPDATE [OperationsManager].[dbo].[BaseManagedEntity] SET [IsDeleted] = 1   WHERE [DisplayName] LIKE 'servername'
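
Since the statement matches on DisplayName, it can be worth previewing which rows it will touch before changing anything. The Python sketch below shows that preview-then-update pattern. Note that it runs against an in-memory SQLite table that only stands in for BaseManagedEntity with three columns, not against a real OperationsManager database, and the server names and IDs are made up.

```python
import sqlite3


def mark_deleted(conn, display_name, apply=False):
    """Return the rows the UPDATE would touch; change them only when apply=True."""
    matches = conn.execute(
        "SELECT BaseManagedEntityId, DisplayName FROM BaseManagedEntity "
        "WHERE DisplayName LIKE ? AND IsDeleted = 0",
        (display_name,),
    ).fetchall()
    if apply and matches:
        conn.execute(
            "UPDATE BaseManagedEntity SET IsDeleted = 1 WHERE DisplayName LIKE ?",
            (display_name,),
        )
        conn.commit()
    return matches


# Stand-in table with hypothetical rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BaseManagedEntity "
    "(BaseManagedEntityId TEXT, DisplayName TEXT, IsDeleted INTEGER)"
)
conn.executemany(
    "INSERT INTO BaseManagedEntity VALUES (?, ?, 0)",
    [("id-1", "oldserver.example.local"), ("id-2", "liveserver.example.local")],
)

# Preview first; nothing is changed until apply=True is passed.
preview = mark_deleted(conn, "oldserver.example.local")
print(preview)
```

The same idea carries over to SQL Server: run a SELECT with the identical WHERE clause first, and only run the UPDATE once the result contains nothing but the decommissioned server.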

Wednesday, June 20, 2012

SCOM SNMP Threshold Type


SCOM SNMP monitors and rules are easy to create, but configuring their thresholds can be a pain.
The SCOM Console assumes that whatever is entered as a threshold for a rule or monitor is a string.
Hence, when a threshold requires a numeric comparison such as greater than or less than, the string value will not work.
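
To see why a string threshold misbehaves, here is a small Python illustration. It is only an analogy: Python compares strings character by character, and I am assuming the SCOM string comparison behaves in a similar way. The values are made up.

```python
def breaches_as_string(value, threshold):
    """Compare the two values as text, character by character."""
    return value > threshold


def breaches_as_integer(value, threshold):
    """Convert both values to integers before comparing."""
    return int(value) > int(threshold)


# "9" sorts after "80" as text because "9" > "8", so the string
# comparison reports a breach that the numeric comparison does not.
print(breaches_as_string("9", "80"), breaches_as_integer("9", "80"))

# "100" sorts before "80" as text because "1" < "8", so the string
# comparison misses a real breach.
print(breaches_as_string("100", "80"), breaches_as_integer("100", "80"))
```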


To work around this, you can either export the management pack and edit the XML by hand, or follow the easier procedure below, which I prefer.
1)      Export the management pack to XML.
2)      Open the MP exported in step 1 with the System Center Operations Manager 2007 R2 Authoring Console (http://www.microsoft.com/en-us/download/details.aspx?id=18222), then navigate to the monitors in the left pane.


3)      Select the correct monitor, right-click it, and choose Properties and then Configuration.

4)      Under each XPathQuery and Value, you will see @Type. You will need to change the 4 similar attributes like this to Integer. (Refer to the screenshot above for a sample.)
5)      Once this is completed, save the modified management pack.
6)      Re-import the management pack into SCOM.

Tuesday, March 13, 2012

Netman.dll issue on SCOM 2007 R2 OS Management Pack version 6.0.6958.0

There is a known issue for which Microsoft has released a Fast Publish article. It pertains to the SCOM 2007 OS management pack that was released on 18/20/2011.
Below is an extract of what you may encounter on a Windows 2003 server that is affected by this issue.
The Server service stops unexpectedly on the Windows 2003 server. You can find the following event in the application event log:
Event Type:        Error
Event Source:    Application Error
Event Category:                (100)
Event ID:              1000
Date:                     29/10/2011
Time:                     1:02:12 AM
User:                     N/A
Computer:          <computer name>
Description:
Faulting application svchost.exe, version 5.2.3790.3959, faulting module netman.dll, version 5.2.3790.3959, fault address 0x0000000000008d4f. 


The svchost instance that hosts the Server service crashes, causing any other services in that instance to fail. The service cannot be restarted and generates an "Access Denied" event in the event log (Event 7023). This is a known issue in Netman.dll that is exposed when rules and monitors from OS Management Pack version 6.0.6958.0 are run from the SCOM server.

More information can be found in the KB below

Wednesday, March 7, 2012

Troubleshooting Grayed Agents in SCOM

Grayed agents in the SCOM Console can have several causes. The simplest of them all is that the System Center Management service on the server is not started.
Other reasons include:
-          The database that is used by the health state is corrupted
-          Heartbeat failure
-          Invalid configuration
-          System workflows failure
-          OpsMgr Database or data warehouse performance issues
-          RMS or primary MS or gateway performance issues
-          Network or authentication issues
-          Health service issues (service is not running)

The article linked below is a very useful reference for any administrator who needs help resolving agents that repeatedly go into a grayed state despite repeated attempts to fix them.
The article itself covers a number of possible scenarios for this issue as well as the various resolutions.

Microsoft has released a hotfix for Windows 2003 agents that encounter this issue, which can be downloaded from the link below.
The hotfix primarily updates the ESENT.dll file on the servers, and yes, this update requires a restart.

Friday, March 2, 2012

Some Helpful SCOM Queries

Below are two SQL queries that I use to work around some limitations in SCOM 2007 R2.
Both make changes to the SCOM OpsMgr database, so sysadmin rights are required on the database.

Have you ever found that an agent is still displayed as grayed out in the Computers view of the SCOM Console after the agent has been uninstalled and deleted from the list of agent-managed servers?
To remove those entries, you can use the statement below.

Remove stale servers from the Computers view after removal of agents

UPDATE [OperationsManager].[dbo].[BaseManagedEntity] SET [IsDeleted] = 1   WHERE [DisplayName] LIKE 'Server FQDN Name'

Though manually installing SCOM agents on servers is not good practice, there are times when it has to be done this way for operational reasons.
Manually installed agents come with shortcomings such as
-       Not being manageable from the SCOM console
o    Not able to change the Management Server
o    Not automatically updated to the latest CU whenever you perform an upgrade on the Management Servers.
The query below converts the agent so that it is remotely manageable despite having been installed manually.

Convert manually installed agent to remotely manageable

UPDATE MT_HealthService
SET IsManuallyInstalled=0
WHERE IsManuallyInstalled=1
AND BaseManagedEntityId IN
(select BaseManagedEntityID from BaseManagedEntity
where BaseManagedTypeId = 'AB4C891F-3359-3FB6-0704-075FBFE36710'
AND DisplayName ='Server FQDN Name')

Wednesday, February 22, 2012

Retrieving SCOM monitor thresholds by Powershell

Part of managing a SCOM infrastructure is knowing which thresholds are configured for the countless monitors in the many Management Packs that come from Microsoft or other vendors.
Compiling all these thresholds and tying them to their respective counters and monitors proves to be a challenge.
There are some PowerShell cmdlets on the Microsoft site that enable us to do that, but thresholds such as “Logical Disk Free Space” do not appear in the exported CSV.
The result will be something like the format in the screenshot below.


I have made some amendments to the cmdlet to include this threshold, as shown below.
Other missing thresholds can be added by looking up the relevant tag in the management pack XML or by using the SCOM Management Pack Authoring Console, which can be downloaded from http://www.microsoft.com/download/en/details.aspx?id=18222. (The tag for the Logical Disk Free Space threshold that I added is shown in the screenshot below.)


PowerShell Cmdlet (to use it, copy the entire contents and save it as <your filename>.ps1)

    function GetThreshold ([String] $configuration)
    {
        # Wrap the configuration fragment in a root element so it parses as XML
        $config = [xml] ("<config>" + $configuration + "</config>")

        # Try the common threshold element names in turn
        $threshold = $config.Config.Threshold
        if($threshold -eq $null)
        {
            $threshold = $config.Config.MemoryThreshold
        }
        if($threshold -eq $null)
        {
            $threshold = $config.Config.CPUPercentageThreshold
        }
        if($threshold -eq $null)
        {
            if($config.Config.Threshold1 -ne $null -and $config.Config.Threshold2 -ne $null)
            {
                $threshold = "Error threshold is: " + $config.Config.Threshold1 + " Warning threshold is: " + $config.Config.Threshold2
            }
        }
        if($threshold -eq $null)
        {
            if($config.Config.ThresholdWarnSec -ne $null -and $config.Config.ThresholdErrorSec -ne $null)
            {
                $threshold = "warning threshold is: " + $config.Config.ThresholdWarnSec + " error threshold is: " + $config.Config.ThresholdErrorSec
            }
        }
        # Logical Disk Free Space: separate percent and MB thresholds for system and non-system drives
        if($threshold -eq $null)
        {
            if($config.Config.SystemDriveErrorPercentThreshold -ne $null -and $config.Config.SystemDriveErrorMBytesThreshold -ne $null -and $config.Config.NonSystemDriveErrorMBytesThreshold -ne $null -and $config.Config.SystemDriveWarningPercentThreshold -ne $null)
            {
                $threshold = "System Drive Error/Warning (%) is: " + $config.Config.SystemDriveErrorPercentThreshold + "% / " + $config.Config.SystemDriveWarningPercentThreshold + "%. " +
                    "System Drive Error/Warning (MB) threshold is: " + $config.Config.SystemDriveErrorMBytesThreshold + " MB / " + $config.Config.SystemDriveWarningMBytesThreshold + " MB. " +
                    "Non-System Drive Error/Warning (%) is: " + $config.Config.NonSystemDriveErrorPercentThreshold + "% / " + $config.Config.NonSystemDriveWarningPercentThreshold + "%. " +
                    "Non-System Drive Error/Warning (MB) threshold is: " + $config.Config.NonSystemDriveErrorMBytesThreshold + " MB / " + $config.Config.NonSystemDriveWarningMBytesThreshold + " MB"
            }
        }
        if($threshold -eq $null)
        {
            if($config.Config.LearningAndBaseliningSettings -ne $null)
            {
                $threshold = "no threshold (baseline monitor)"
            }
        }
        return $threshold
    }
    Function GetFrequency ([String] $configuration)
    {
        $config = [xml] ("<config>" + $configuration + "</config>")
        $Frequency = $config.Config.Frequency
        # This fallback reads the same element as the line above, so it has no effect
        if($Frequency -eq $null)
        {
            $frequency = $config.Config.Frequency;
        }
        return ($frequency)
    }
    Function GetNumsamples ([String] $configuration)
    {
        $config = [xml] ("<config>" + $configuration + "</config>")
        $Samples = $config.Config.Samples
        # Fall back to the alternative element name
        if($Samples -eq $null)
        {
            $Samples = $config.Config.NumSamples;
        }
        return ($Samples)
    }
    Function GetCounterName ([String] $configuration)
    {
        $config = [xml] ("<config>" + $configuration + "</config>")
        $Counter = $config.Config.Counter
        # Fall back to the alternative element name
        if($Counter -eq $null)
        {
            $Counter = $config.Config.CounterName;
        }
        return ($Counter)
    }
    Function GetObject ([String] $configuration)
    {
        $config = [xml] ("<config>" + $configuration + "</config>")
        $Object = $config.Config.Object
        # Fall back to the alternative element name
        if($Object -eq $null)
        {
            $Object = $config.Config.ObjectName;
        }
        return ($Object)
    }

    $perfMonitors = get-monitor -Criteria:"IsUnitMonitor=1"

    $perfMonitors | select-object @{Name="MP";Expression={ foreach-object {$_.GetManagementPack().DisplayName }}},
        @{name="Target";expression={foreach-object {(Get-MonitoringClass -Id:$_.Target.Id).DisplayName}}},
        DisplayName,
        enabled,
        @{name="Threshold";expression={foreach-object {GetThreshold $_.Configuration}}},
        @{name="Frequency";expression={foreach-object {GetFrequency $_.Configuration}}},
        @{name="Samples";expression={foreach-object {GetNumSamples $_.Configuration}}},
        @{name="Counter";expression={foreach-object {GetCounterName $_.Configuration}}},
        @{name="Object";expression={foreach-object {GetObject $_.Configuration}}} |
        sort Target, DisplayName |
        export-csv "c:\temp\monitor_thresholds.csv"
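
The core idea in GetThreshold is a fallback lookup: wrap the monitor's configuration fragment in a root element, then try a list of known element names until one is found. To add support for another monitor type, you extend that list. Here is a minimal Python sketch of the same idea for anyone who wants to experiment with an exported configuration outside the Command Shell. The function name and the sample fragments in the usage note are my own, and only the three simple element names from the script are covered.

```python
import xml.etree.ElementTree as ET

# Element names tried in order, taken from the PowerShell script above.
THRESHOLD_ELEMENTS = ("Threshold", "MemoryThreshold", "CPUPercentageThreshold")


def get_threshold(configuration):
    """Return the first threshold value found in a configuration fragment, or None."""
    # The fragment has no single root element, so wrap it before parsing.
    root = ET.fromstring("<config>" + configuration + "</config>")
    for name in THRESHOLD_ELEMENTS:
        element = root.find(name)
        if element is not None and element.text:
            return element.text.strip()
    return None
```

For example, `get_threshold("<Frequency>300</Frequency><Threshold>90</Threshold>")` picks up the value of the Threshold element, while a fragment with none of the listed elements yields None, which is the case where a new element name has to be added.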

Wednesday, February 15, 2012

Customized SSRS reporting for SCOM Alerts Breakdown

I was tasked with producing a breakdown of SCOM alerts by severity (Critical, Warning, Information) as well as by server.
The generic report that comes with SCOM reporting doesn’t seem to be able to provide what is required.
The report takes as parameters the year and month in which the alerts were generated, as well as the group to query against (a SCOM group, which we usually use to group servers of a certain role, site, etc.).
I am using the SQL query below together with SQL Server Reporting Services (SSRS).
For ease of use, I have uploaded the RDL to http://www.mediafire.com/?ve5zoueabbbhd
The output will be something similar to the screenshot below.

SQL Query

Select
L1.server
,L1.ForestDNSName
,L1.FullName
,L1.AlertName
,L1.priority
,L1.severity
,sum(L1.severitynone) as 'Severity None Count'
,sum(L1.severitywarning) as 'Severity Warning Count'
,sum(L1.severityCritical) as 'Severity Critical Count'
,L1.triggermonth
,L1.triggeryear
from
(
select      AlertName as 'Alertname'
,ars.DWCreatedDateTime as 'createdDateTime'
,apy.Priority as 'Priority'
,asy.severity as 'Severity'
,case when (asy.Severity = 'Warning' and RepeatCount =0) then 1
when (Asy.Severity = 'Warning' and RepeatCount >0) then RepeatCount
else 0
end as 'SeverityWarning'
,case when (asy.Severity = 'Critical' and RepeatCount =0) then 1
when (Asy.Severity = 'Critical' and RepeatCount >0) then RepeatCount
else 0
end as 'SeverityCritical'
-- Assumption: any severity other than Warning or Critical is counted as 'SeverityNone', which the outer query sums
,case when (asy.Severity in ('Warning','Critical')) then 0
when (RepeatCount =0) then 1
when (RepeatCount >0) then RepeatCount
else 0
end as 'SeverityNone'
-- This classifies the servers into Development, DR or Production based on OU info and NetbiosName
,case when (lower(MTC.OrganizationalUnit) like '%dev%' or lower(MTC.NetbiosComputerName) like '%dev%') then 'Development'
when (lower(MTC.OrganizationalUnit) like '%dr%' or lower(MTC.NetbiosComputerName) like '%dr%') then 'DR'
else 'Production'
end as ServerRole

            ,day(ars.DWCreatedDateTime) triggerdate
            ,month(ars.DWCreatedDateTime) triggermonth
            ,year(ars.DWCreatedDateTime) triggeryear
            , MTC.ForestDnsName
            ,MTC.OrganizationalUnit
            ,MTC.NetbiosComputerName as 'Server'
            ,ME.FullName

from Alert.vAlertResolutionState ars
                  inner join alert.vAlertDetail adt on ars.alertguid = adt.alertguid
                  inner join Alert.vAlert alt on ars.alertguid = alt.alertguid
                  left join dbo.vManagedEntity ME on ME.ManagedEntityRowId = alt.ManagedEntityRowId
                  left join [OperationsManager].[dbo].[MT_Computer] MTC on lower(ME.path) like  '%' + lower(MTC.NetbiosComputerName)+ '%'
                  left join alertpriority Apy on Apy.alert = alt.Priority
                  left join alertseverity Asy on Asy.alert = alt.severity

where  month(ars.DWCreatedDateTime) in (@Month) and year(ars.DWCreatedDateTime) in (@YearDate) and ME.fullname NOT Like '%Jala%' and MTC.DNSName in (select TargetMonitoringObjectDisplayName as 'Group Members' from [OperationsManager].dbo.RelationshipGenericView where isDeleted=0 AND SourceMonitoringObjectDisplayName in (@Scomgrp))

)L1
Group by
L1.server
,L1.AlertName
,L1.priority
,L1.Severity
,L1.triggermonth
,L1.triggerdate
,L1.triggeryear
,L1.ForestDNSName
,L1.FullName
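
Two pieces of logic in the query are easier to check outside SQL: how an alert is weighted by its repeat count, and how a server is classified by substring matches on its OU and NetBIOS name. The Python sketch below mirrors the two CASE expressions. The function names and sample values are made up for illustration.

```python
def alert_weight(repeat_count):
    """Mirror the severity CASE: a first occurrence counts once, repeats count RepeatCount times."""
    if repeat_count == 0:
        return 1
    if repeat_count > 0:
        return repeat_count
    return 0


def server_role(organizational_unit, netbios_name):
    """Mirror the ServerRole CASE: substring match on the OU and the NetBIOS name."""
    texts = [(organizational_unit or "").lower(), (netbios_name or "").lower()]
    if any("dev" in text for text in texts):
        return "Development"
    if any("dr" in text for text in texts):
        return "DR"
    return "Production"


# Hypothetical servers, for illustration only.
print(server_role("OU=DevServers,DC=corp", "APP01"))
print(server_role("OU=Servers,DC=corp", "SQLDR01"))
print(server_role("OU=Servers,DC=corp", "ANDROMEDA"))
```

The last call is worth noting: the name contains "dr", so the server is classified as DR even though that is probably not intended. If your naming convention allows such collisions, a stricter pattern than '%dr%' may be needed in the query.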

Wednesday, February 8, 2012

SCOM Service Level Objectives created not updated/available

There may be instances where new Service Level Tracking objectives (SLOs) are created in SCOM but do not appear in the Service Level Dashboard and are not available in the service availability reports.


While troubleshooting, I found this error on my SCOM RMS, and I believe the two are related.



Failed to store data in the Data Warehouse. Exception 'SqlException': Sql execution failed. Error 2627, Level 14, State 1, Procedure ManagementPackInstall, Line 2855, Message: Violation of UNIQUE KEY constraint 'UN_ManagementGroupManagementPackVersion_ManagementGroupRowIdManagementPackVersionRowId'. Cannot insert duplicate key in object 'dbo.ManagementGroupManagementPackVersion'. One or more workflows were affected by this. Workflow name: Microsoft.SystemCenter.DataWarehouse.Synchronization.Configuration Instance name: RMSServer Instance ID: {AF86A1AC-F1F5-9BF7-1E89-F60F73982EB6} Management group: GroupName

It turned out that MPs were queued awaiting synchronization, so the changes and additions made to the MPs were not being updated.


The SQL query below will enable you to find the MPs that are pending.

SELECT ManagementPackId, MPFriendlyName, MPName, mp.MPVersionDependentId, MPLastModified, MPKeyToken, ContentReadable
FROM ManagementPack mp
WHERE MPVersionDependentId NOT IN
    (SELECT mpv.ManagementPackVersionDependentGuid
     FROM OperationsManagerDW.dbo.ManagementPackVersion mpv
     JOIN OperationsManagerDW.dbo.ManagementGroupManagementPackVersion mgmpv
       ON (mpv.ManagementPackVersionRowId = mgmpv.ManagementPackVersionRowId)
     WHERE (mgmpv.LatestVersionInd > 0))
AND NOT EXISTS
    (SELECT * FROM ManagementPackReferences mpr
     JOIN ManagementPack mpv
       ON (mpr.ManagementPackIdSource = mpv.ManagementPackId)
     WHERE (mpr.ManagementPackIdReffedBy = mp.ManagementPackId)
     AND (mpv.MPVersionDependentId NOT IN
        (SELECT mpv.ManagementPackVersionDependentGuid
         FROM OperationsManagerDW.dbo.ManagementPackVersion mpv
         JOIN OperationsManagerDW.dbo.ManagementGroupManagementPackVersion mgmpv
           ON (mpv.ManagementPackVersionRowId = mgmpv.ManagementPackVersionRowId)
         WHERE (mgmpv.LatestVersionInd > 0))))
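
The query is easier to follow when read as set logic. An MP is listed when its version GUID is missing from the data warehouse and no MP linked to it through ManagementPackReferences is also missing. The Python sketch below restates that filter over small in-memory collections. The function name and the sample data are hypothetical, and the tuple order simply mirrors the ManagementPackIdSource and ManagementPackIdReffedBy columns.

```python
def pending_mps(mps, references, synced_guids):
    """Restate the query's filter.

    mps:          dict of ManagementPackId -> MPVersionDependentId
    references:   list of (ManagementPackIdSource, ManagementPackIdReffedBy)
    synced_guids: set of version GUIDs already present in the data warehouse
    """
    # First condition: the MP's version GUID is NOT IN the warehouse set.
    unsynced = {mp_id for mp_id, guid in mps.items() if guid not in synced_guids}
    result = []
    for mp_id in sorted(unsynced):
        # Second condition: NOT EXISTS a reference row pointing at this MP
        # whose source MP is itself unsynced.
        has_unsynced_link = any(
            reffed_by == mp_id and source in unsynced
            for source, reffed_by in references
        )
        if not has_unsynced_link:
            result.append(mp_id)
    return result


# Hypothetical data: A and B are unsynced, C is synced, and one
# reference row links source A to B.
mps = {"A": "guid-1", "B": "guid-2", "C": "guid-3"}
print(pending_mps(mps, [("A", "B")], {"guid-3"}))
```

Here B is left out of the result because it is linked to another unsynced MP, which matches the idea that updating the listed MPs first lets the remaining ones synchronize afterwards.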


Once the problematic MPs have been found, follow the procedure below to resolve the issue.

We need to trigger the synchronization manually.
Starting at the top of the list, export each MP, update its version attribute, and re-import it.

This forces the MP to resynchronize. Once all the MPs that are blocking synchronization have been updated, all the other ones will update automatically.

 Export MPs

For unsealed MPs, right-click the MP in the SCOM console and export it as an XML file.

For sealed MPs, you can use the link and command below to export them as XML files: http://blogs.technet.com/b/jonathanalmquist/archive/2009/03/30/export-a-management-pack.aspx

 Open the XML, update the version attribute, then save the file:

 For example:

<Identity>
<ID>Mpname</ID>
<Version>1.0.0.3</Version>
</Identity>



We increase the version to 1.0.0.4.
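
When several MPs are on the list, the version bump can be scripted. The following is a minimal Python sketch that increments the last part of a four-part version and applies it to the first Version element in the XML text. It assumes that the first Version element in the file is the one inside Identity, so check the result before re-importing. The function names are my own.

```python
import re

VERSION_TAG = re.compile(r"<Version>([^<]+)</Version>")


def bump_version(version):
    """Increment the last part of a four-part version, e.g. 1.0.0.3 -> 1.0.0.4."""
    parts = version.strip().split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError("expected a version such as 1.0.0.3, got %r" % version)
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


def bump_identity_version(mp_xml):
    """Bump only the first <Version> element (assumed to be the one in <Identity>)."""
    return VERSION_TAG.sub(
        lambda m: "<Version>" + bump_version(m.group(1)) + "</Version>",
        mp_xml,
        count=1,
    )


print(bump_version("1.0.0.3"))
```

Limiting the substitution to one match matters because the Version elements of referenced management packs further down the file must stay unchanged.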



  1. Right-click “Management Packs” and re-import the XML files.
  2. After re-importing all MPs from the query list, check whether the service level objectives now appear in the reports.



Note: Most of the MPs listed in the query result are customized MPs, and updating them will not have any impact.

          For Microsoft sealed MPs, the solution has the following effects:

          a. The sealed MP becomes unsealed.
          b. It cannot be updated automatically when a new version of the MP is released.


Hope this helps whoever is facing the same issue as I did.