This Was Weird…
You may (or may not) have read my previous blog post in which I wrote that I was going to apply an updated Hot Fix (#NTOR-8N5C2Y) on my Sametime R8.5.2 Classic servers to, hopefully, resolve the “Timed Out” errors we’ve been seeing over the past 2 months.
I applied the Hot Fix on all 3 regional servers and each server appeared to work correctly afterwards… it turns out that there *was* a problem… just nobody was awake at that time to “see” and report it.
Something Had Gone Wrong…
This morning, I woke up to a slew of emails (no phone calls, however) from employees in Asia-Pacific saying they couldn’t see any European employees online, and emails from European employees saying they couldn’t see *anybody* online. That was a rough way to start my day (and I hadn’t slept much because the upstairs neighbors had kept me awake with the noise they made re-enacting their honeymoon night from 1 AM to 3 AM)…
So once I dragged my tired body into the office this morning, I began by asking for a call back from our Lotus Support rep (who had asked me to put the Hot Fix on our server). While waiting for his call back, I poked around the Sametime_20111123.log file on my server to see if there were any “obvious” errors.
And there were some very obvious errors! Here’s what I kept seeing all over the log file:
I StCommunity 23/Nov/11, 04:40:57 Logged in to server 10.10.10.10
E StCommunity 23/Nov/11, 06:02:22 Connection broken to server 10.10.10.10, reason 0x80000224
W StCommunity 23/Nov/11, 06:02:22 Logged out from server 10.10.10.10, reason 0x80000224.
Now, I can’t read Sametime error codes and remember what they all mean by heart (not yet anyway). So, I did a quick Google search and found Technote #1098479 titled “Explanation of error codes associated with Sametime Community Services” (here’s a link).
In that Technote, you can see that the error code 0x80000224 simply means “the connection has been reset”.
That is not useful at all.
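If you want to see how often a given reason code shows up before you go hunting for its meaning, a quick tally helps. Here is a minimal Python sketch; it assumes the log lines look like the three shown above (a trailing “reason 0x…” code), and the function name is just something I made up for illustration:

```python
import re
from collections import Counter

# Matches the "reason 0x..." suffix seen in the log lines above.
# Also accepts "×" in place of "x", in case the lines were copied from a web page.
REASON_RE = re.compile(r"reason\s+0[x×]([0-9A-Fa-f]+)")

def count_reason_codes(lines):
    """Return a Counter of reason codes found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match:
            counts["0x" + match.group(1).upper()] += 1
    return counts

sample = [
    "I StCommunity 23/Nov/11, 04:40:57 Logged in to server 10.10.10.10",
    "E StCommunity 23/Nov/11, 06:02:22 Connection broken to server 10.10.10.10, reason 0x80000224",
    "W StCommunity 23/Nov/11, 06:02:22 Logged out from server 10.10.10.10, reason 0x80000224.",
]

for code, count in count_reason_codes(sample).items():
    print(code, count)
```

To run it against a real log, you would pass an open file object instead of the sample list, since iterating over a file yields its lines.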
The Weird Solution…
At that point, I had received a call back from our Lotus Support Rep and had sent him the CommunityConfig.txt and Sametime.ini from all 3 Sametime R8.5.2 classic servers in our company.
While the rep was busy looking for signs of trouble in the CommunityConfig file, I tried to contact our Notes Admin in Europe to ask him what he had tried so far. I couldn’t see him online, so I tried to find him by name using the search tool in the Sametime embedded client… here’s the weird thing I saw: he showed up in the search results with his full canonical name instead of just his common name:

That was really weird. Normally, we only see 1 name (the 1st one)… we don’t see 2! The 2nd one even had the /UU (for Unseen University) listed twice.
Right away, I knew something was wrong with the configuration of the Sametime Directory Service.
So, I opened the Sametime.ini of the server and found this little gem… there were 2 “[Directory]” sections in the Sametime.ini of the server… and the 1st one was empty! Doh!

In the end, to resolve that weird “0x80000224” error code, I simply had to remove the duplicate “[Directory]” section and reboot the server.
After that server came back online, I restarted the ST Directory service on the 2 other Sametime servers and presto… all our users could see each other again.
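A duplicate section like this is easy to miss by eye in a long file, so it may be worth checking for it after any Hot Fix or manual edit. Here is a rough Python sketch that lists section names appearing more than once. The function name and the sample keys are hypothetical (I am not reproducing my real Sametime.ini here), and it only assumes that sections are written as “[Name]” on a line of their own:

```python
import re
from collections import Counter

# A section header is a line containing only "[Name]" (surrounding spaces allowed).
SECTION_RE = re.compile(r"^\s*\[([^\]]+)\]\s*$")

def find_duplicate_sections(text):
    """Return the section names that appear more than once, in first-seen order."""
    names = []
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            names.append(match.group(1))
    counts = Counter(names)
    return [name for name, count in counts.items() if count > 1]

# Hypothetical file contents: the keys below are placeholders, not real Sametime settings.
sample_ini = """[Config]
EXAMPLE_KEY=1
[Directory]
[Directory]
EXAMPLE_DIR_KEY=value
"""

print(find_duplicate_sections(sample_ini))
```

On a real server you would read the text from the Sametime.ini file instead of the sample string.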
Basically speaking, when a Sametime service loads, it reads the Sametime.ini from top-to-bottom. If it finds a section, it reads what’s in that section and loads according to the config of that section. It then stops and does not bother to check if there is another section below with more or different configuration settings. It’s completely logical and this is not a bug by my standards.
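To make that concrete, here is a small Python model of the “first match wins” behavior as I understand it. This is only my sketch of what I observed, not Sametime’s actual parsing code; the function name and the sample keys are made up for illustration:

```python
def read_first_section(text, wanted):
    """Return the key/value pairs of the FIRST section named `wanted`.

    Reading stops at the next section header, so a later section
    with the same name is never looked at.
    """
    settings = {}
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            if in_section:
                break  # the first matching section has ended: stop reading
            in_section = (line[1:-1] == wanted)
            continue
        if in_section and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings

# Hypothetical file contents: an empty [Directory] followed by a populated one.
example_ini = """[Config]
EXAMPLE_KEY=1
[Directory]
[Directory]
EXAMPLE_DIR_KEY=value
"""

print(read_first_section(example_ini, "Directory"))
```

With the empty section first, the lookup comes back with no settings at all, which would match what I saw: the Directory service loaded with nothing configured.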
Now, why the server worked before without any problems with that duplicate section is a mystery to me… I’m just glad it’s back online and that all users can see each other now.
Conclusion?
Well, I don’t have much to say except that the error code is really cryptic and doesn’t help at all to explain what’s going on in the “back” of the Sametime server.
I’m sure that if we had cranked up the debugging levels to crazy levels we would eventually have found something… but how long would that have taken? Probably a few days… my track record with Lotus Support lately hasn’t been really good. I’m sure others have had much better experiences than me… I’m just unlucky.
Anyhow, I hope this helps someone, somewhere…
Thanks for reading and happy Thanksgiving to my colleagues in the USA!
Marc