SCOM 2012 agent or gateway certificate issue

NOTE: while I’m still keeping the current posts live as they still seem to help, currently my focus has changed and new activity moved to the new site iternia.be

After we were stuck for several weeks, the resolution to this problem was actually found by my colleague Jens Van Hove, so all credit goes to him 😉

Special thanks to Kurt Van Hoecke for providing a wall to bounce some ideas off

To start from the beginning: we had a problem adding a Windows Server 2012 machine to our SCOM 2012 SP1 monitoring environment when using a certificate-based trust. The problem was identical whether the machine was added as an agent-monitored server or as a SCOM gateway, as long as the managed server was located in a different domain than the management server. Deploying the agent and installing the SCOM agent certificate goes well, but when you try to add the server to the environment to actually start monitoring, you get an error stating that the certificate is not trusted. Using a browser to verify the certificate trust reveals no issues: the chain is trusted and all root and intermediate certificates are in place. We tried reinstalling, renewing the certificate templates and even temporarily bypassing the Cisco firewall between both machines, but came no closer to a solution.

But by accident, while searching on the different event IDs in the event logs, we came across a …

SCOM alert – Max concurrent API reached

EDIT (11/03/2014): 2nd possible cause found for the SCOM alert and added to the article (at the bottom).

If you have a recently patched Operations Manager environment, the current version of the basic OS management pack includes new intelligence to check for problems caused by the maximum number of NTLM or Kerberos PAC password validations a particular server can handle at a time.

Symptoms

Performance issues: these can be very hard to troubleshoot due to the large number of variables in your environment (from storage to networking to server hardware or virtualization performance, etc.). If you have had your storage engineers, your network specialists and your Hyper-V or VMware gurus run all the tests they can think of, try looking at the following as well (or better: SCOM could already have done it preventively 😉).

Besides performance issues, which are not only difficult to pin down but often subjective as well, you can see some strange application behaviour.

Kerberos authentication and delegation: ServicePrincipalNames

SPNs

One of the errors that often recurs when deploying a service is Kerberos authentication failing when another system depends on your service. Dependent users or services try to log on to your service but are not allowed to access it. This is not a problem with the end user but with the rights of the service account under which the service itself is running: the service account doesn’t have the right to delegate access or impersonate the end user. About 9 times out of 10 this is caused by improper Kerberos rights due to a faulty SPN (or ServicePrincipalName) configuration, and sometimes by the delegation settings on the service account.

First, let’s take a look at how SPNs work in theory. An SPN consists of 2 parts:

  • Service type
  • Service name [:service port]
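
To make the two parts concrete, here is a minimal Python sketch that splits an SPN string into service type, service name and optional port. It assumes the simple "type/name[:port]" form described above with a numeric port; SPNs that carry an extra trailing component or an instance name instead of a port are deliberately rejected. The helper name parse_spn and the host web01.contoso.com are made up for illustration.

```python
from typing import NamedTuple, Optional


class Spn(NamedTuple):
    service_type: str
    service_name: str
    port: Optional[int]


def parse_spn(spn: str) -> Spn:
    """Split an SPN of the form 'type/name[:port]' into its parts.

    Assumption: only the simple two-part form with a numeric port
    is handled; anything else raises ValueError.
    """
    service_type, slash, rest = spn.partition("/")
    if not slash or not service_type or not rest:
        raise ValueError(f"not in 'type/name[:port]' form: {spn!r}")

    service_name, colon, port_text = rest.partition(":")
    if not service_name:
        raise ValueError(f"missing service name: {spn!r}")

    port = None
    if colon:
        # A port was supplied, so it has to be a plain number.
        if not port_text.isdigit():
            raise ValueError(f"port is not numeric: {spn!r}")
        port = int(port_text)

    return Spn(service_type, service_name, port)


if __name__ == "__main__":
    print(parse_spn("HTTP/web01.contoso.com"))
    print(parse_spn("HTTP/web01.contoso.com:8080"))
```

A check like this can help spot typos in an SPN (a missing slash, a stray colon) before it is registered on a service account.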
