SCOM 2012 agent or gateway certificate issue

After we were stuck for several weeks, the resolution to this problem was actually found by my colleague Jens Van Hove, so all credit goes to him 😉

Special thanks to Kurt Van Hoecke for providing a wall to bounce some ideas off

To start from the beginning: we had a problem adding a Windows Server 2012 machine to our SCOM 2012 SP1 monitoring environment when using a certificate-based trust. The problem was identical whether the machine was added as an agent-monitored server or as a SCOM gateway, as long as it was located in a different domain than the management server. Deploying the agent and installing the SCOM agent certificate goes well, but when you try to add the server to the environment to actually start monitoring, you get an error stating that the certificate is not trusted. Using a browser to verify the certificate trust reveals no issues: the chain is trusted and all root and intermediate certificates are in place. Re-installing, renewing certificate templates and even temporarily bypassing the Cisco firewall between both machines brought us no closer to a solution.

But by accident, while searching on the different event IDs in the event logs, we came across a very interesting article about a similar problem with MS Dynamics Navision. It turns out that version 10 of Internet Explorer in Windows Server 2012 is blocking this in some way. To correct the problem, first make sure your basic configuration is in order (steps 1 through 3) and then apply the resolution that worked in our case (steps 4 and 5).

Possible network layout

To make it easier to explain, I made up a possible scenario, shown in the following Visio diagram:

  • There are 3 different types of zones:
    • Your primary domain or infrastructure domain containing the SCOM environment (green rectangle).
    • A second domain, e.g. a large client environment running on a separate Active Directory (blue rectangle).
    • Smaller private bubbles containing servers for clients (orange rectangles) or your own private DMZ. Whether any of those machines are joined to another private AD or not doesn’t matter: the environments are too small to justify a SCOM proxy server, so we’re going to add each agent separately to the SCOM management servers. Typically a hoster deploys a lot of those private bubbles to isolate different clients from each other.
  • In your primary domain (green rectangle) you deploy the SCOM servers (e.g. 2 management servers and one web console).
  • In the secondary AD (client) you deploy a SCOM proxy (gateway) and try to create a certificate trust between the proxy and the management servers in the primary domain.

All servers related to the SCOM environment are in green.

All servers we want to monitor are in blue. They get a SCOM agent and are pointed either directly at the management servers or to the gateway server.

SCOM certificate error network design

The problem and its symptoms

The issue occurs when adding either a gateway server (SCOM proxy 1) or one of the clients (Server 4 or 5). Servers 1 and 2 don’t have issues as they are trusted through domain membership and don’t need certificate trusts.

3 events turn up in the event log of the server containing the SCOM agent or gateway:

  • event 20070, OpsMgr Connector
  • “The OpsMgr Connector connected to <management server> but the connection was closed immediately after authentication occurred. The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration. Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.”
SCOM eventid 20070

  • event 20071, OpsMgr Connector
  • “The OpsMgr Connector connected to <management server> but the connection was closed immediately without authentication taking place. The most likely cause of this error is a failure to authenticate either this agent or the server. Check the event log on the server and on the agent for events which indicate a failure to authenticate.”
SCOM eventid 20071

  • event 21016, OpsMgr Connector
  • “OpsMgr was unable to set up a communications channel to <management server> and there are no failover hosts. Communication will resume when <management server> is available and communication from this computer is allowed.”

And an event shows up every minute in the system log of the SCOM management server:

  • event 36888, Schannel
  • “A fatal alert was generated and sent to the remote endpoint. This may result in termination of the connection. The TLS protocol defined fatal error code is 70. The Windows SChannel error state is 105.”
SCOM eventid 36888
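
A side note on event 36888: the “fatal error code” it reports is a TLS alert number. In the TLS 1.2 specification (RFC 5246), alert 70 is named protocol_version, which points at a protocol mismatch rather than a genuinely untrusted certificate. A small lookup sketch in Python; the table is only a subset of the RFC's alert list and the function name is made up for illustration:

```python
# Subset of the TLS alert descriptions defined in RFC 5246 (TLS 1.2).
TLS_ALERTS = {
    0: "close_notify",
    10: "unexpected_message",
    20: "bad_record_mac",
    40: "handshake_failure",
    42: "bad_certificate",
    43: "unsupported_certificate",
    44: "certificate_revoked",
    45: "certificate_expired",
    46: "certificate_unknown",
    47: "illegal_parameter",
    48: "unknown_ca",
    49: "access_denied",
    70: "protocol_version",
    71: "insufficient_security",
    80: "internal_error",
}


def describe_tls_alert(code):
    """Translate the numeric code from a Schannel 36888 event into its RFC name."""
    return TLS_ALERTS.get(code, "unknown alert code")
```

For the event above, `describe_tls_alert(70)` gives `protocol_version`, which fits the TLS settings covered in steps 4 and 5.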

Step 1 (healthcheck) – make sure your management server is registered correctly

On the agent/gateway the following registry keys have to contain the SCOM management server:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Agent Management Groups\<SCOM management group>\Parent Health Services

SCOM management server registry keys
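
If you prefer a script over clicking through regedit, the check could be sketched in Python roughly as follows. This is only an illustration: the helper names are made up, `winreg` is part of the standard library but exists on Windows only, and the script simply lists whatever it finds under the key so you can look for your management server yourself.

```python
BASE = r"SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Agent Management Groups"


def parent_health_services_path(management_group):
    """Build the key path from step 1 (relative to HKEY_LOCAL_MACHINE)."""
    return "\\".join([BASE, management_group, "Parent Health Services"])


def dump_values(path):
    """Recursively collect (key path, value name, data) below HKLM\\<path>."""
    import winreg  # imported here because it is only available on Windows

    found = []
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        n_subkeys, n_values, _ = winreg.QueryInfoKey(key)
        for i in range(n_values):
            name, data, _type = winreg.EnumValue(key, i)
            found.append((path, name, data))
        subkeys = [winreg.EnumKey(key, i) for i in range(n_subkeys)]
    for sub in subkeys:
        found.extend(dump_values(path + "\\" + sub))
    return found


# Example (run on the agent/gateway; "MyManagementGroup" is a placeholder):
# for key_path, name, data in dump_values(parent_health_services_path("MyManagementGroup")):
#     print(key_path, name, data)
```

Run it elevated on the agent or gateway and verify that the name of your management server shows up in the output.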

Step 2 (healthcheck) – open firewall port 5723

If there are any firewalls between the management server and the server you want to join to the SCOM environment, check that port 5723 is open between them.
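
A quick way to test this from the agent or gateway side is a plain TCP connect. The following Python sketch assumes nothing about SCOM itself; it only checks whether something accepts connections on the port, and the host name in the example is a placeholder:

```python
import socket


def port_is_open(host, port=5723, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example (placeholder host name):
# print(port_is_open("scom-ms01.example.local"))
```

A `False` result does not tell you which firewall is blocking, only that the connection could not be established from where the script runs.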

Step 3 (healthcheck) – run momcertimport and check certificate thumbprint

SCOM comes with the tool “momcertimport”. Run it elevated (as administrator) and register the certificate. Event ID 20053 should then show up in the event viewer.

SCOM certificate error - momcertimport

Then check the following registry key: the thumbprint stored there has to match the thumbprint of the certificate in your certificate store.

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Machine Settings

SCOM certificate thumbprint
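
Comparing two thumbprints by eye is error-prone, because the certificate dialog shows them with spaces and copy/paste sometimes drags along invisible characters. A minimal Python sketch for comparing two pasted strings; the function names and the sample values are made up for illustration:

```python
import re


def normalize_thumbprint(raw):
    """Keep only the hex digits and uppercase them."""
    return re.sub(r"[^0-9A-Fa-f]", "", raw).upper()


def thumbprints_match(registry_value, store_value):
    """True if both strings contain the same non-empty hex thumbprint."""
    a = normalize_thumbprint(registry_value)
    b = normalize_thumbprint(store_value)
    return bool(a) and a == b


# Example with made-up values:
# thumbprints_match("AB12CD34", "ab 12 cd 34")
```

Paste the value from the registry as the first argument and the thumbprint from the certificate details as the second.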

Step 4 (resolution) – TLS registry keys

For some strange reason, TLS 1.2, which Windows Server 2012 uses for secure communication, seems to be disabled by default. You need to make sure the following keys are present on the SCOM management server(s):

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Server

SCOM - Windows TLS registry keys

By the way: similar issues with RDS are fixed with the same keys.
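
For repeatable deployments, the key from step 4 could be generated as a .reg file. The Python sketch below makes an assumption the text above does not spell out: that the two DWORD entries are the standard Schannel values `Enabled` and `DisabledByDefault`, set to 1 and 0. Compare against your own working server before importing anything; the function name is made up for illustration.

```python
TLS12_SERVER_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\SecurityProviders\SCHANNEL\Protocols\TLS 1.2\Server"
)


def tls12_server_reg(enabled=1, disabled_by_default=0):
    """Return .reg file text for the TLS 1.2 server key.

    Assumption: the value names follow the usual Schannel convention
    ("Enabled" and "DisabledByDefault"); they are not taken from the post.
    """
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + TLS12_SERVER_KEY + "]",
        '"Enabled"=dword:{:08x}'.format(enabled),
        '"DisabledByDefault"=dword:{:08x}'.format(disabled_by_default),
        "",
    ]
    return "\r\n".join(lines)
```

Review the generated text first, then save it to a .reg file and import it on the management server(s).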

Step 5 (resolution) – IE 10 security settings

Open the Internet settings on the SCOM management server(s) and make sure all TLS-related options are unchecked.

SCOM - internet settings - TLS

Hopefully you now have a working SCOM environment. Enjoy!

13 Responses to SCOM 2012 agent or gateway certificate issue

  1. swissmike says:

    Thank you very much for this!!! I spend a couple of days before.

  2. Jon says:

    How curious that IE options needed to be changed. Browser settings affect monitoring in SCOM?

  3. geertbaeten says:

    The problem is that “internet settings” is actually a bunch of settings that are not necessarily related to browsers or internet alone anymore. Call it legacy? Those screens have not been redesigned for 20 years (since Windows 95).

    HTTPS, for example, is used by other applications as well, certainly with the boom in “cloud” services. Web services are used more and more as a replacement for the traditional RPC calls that were only possible inside your private network for security reasons.

  4. Rorymon says:

    Hi There,

    My Machine Settings Registry Hive is actually completely blank. I created a Self Signed Certificate and set that for HTTP Binding. I had thought that was only used for the Web Console? It seemed like the Enable SSL appeared during that phase of the install.

    I have a one server setup for a POC. All components and SQL on the one server (I know it’s not best practice)

    I manually installed the agent on a server in my environment and sure enough, it popped up in Pending Management, I then Approved. I got the 21016 error right after the install of the agent, now every 15 minutes I get the 20070 error. It suggests the machine needs to be approved, which it is. I followed the steps in your post above…only thing I did different was that I didn’t set the certificate in the Machine Settings hive….I did import the cert to the server but thought it was a redundant step since it’s being used for Web Console only (or is it!?)

    Thanks,
    Rory

  5. RicD says:

    Thanks! Step 4 worked for me. The registry key for TLS 1.2 wasnt there at all. Created it and the 2 DWORD entries and all agents connected after a little bit. Thanks

  6. LvilleSystemsJockey says:

    Another Fix!! I ran into the issue with ConfigMgr and then again with the Gateway Server in SCOM. Because I’m using SHA512, TLS 1.2 is actually an invalid configuration. So let’s disable it! The reg export text is in this post: http://social.technet.microsoft.com/Forums/en-US/4eba74a1-ee67-46c7-9a42-508df5de63fc/osx-1091-client-enrollment-fails?forum=configmanagerdeployment

  7. Thank you. Step 4 fixed it for me

  8. Bob says:

    From step 1 to step 3(Specially for step3), we need to do the configuration on our SCOM server, right?
    ——Recently I received lots of Health Service Heartbeat Failure alert, didn’t find out the root cause, appreciated for your solution!
