Archive for the ‘Lotus Domino Server’ Category

Filed Under (Lotus Domino Server) by Marc Champoux on August-23-2010

The Devil Is In The Details …
 

And this is going to be a quick post about one of those “details” … let’s say you’re running two servers in a cluster and the 2nd server catches on fire (not that mine did).
 

If half of your users are on Server A and the other half are on Server B … what will happen to the mail being delivered to the users on Server B when it goes out in a ball of flames?
 

Logic dictates that, since you are running in a cluster, the mail router will “see” the 2nd server as being down and deliver the emails to the Server A mail file of those users … right?
 

Well, you’re correct *only* if you have configured the router to do that.
 

How To Configure The Router To Deliver Emails to Cluster Mates …
 

If you ever run into this same situation (as I did), you need to do the following:
 

(a) Open your Domino Directory

(b) Go to the “Configurations → Server → Configurations” view.

(c) Open your Server A configuration document.

(d) Put the Server A configuration document in Edit mode.

(e) Go to the “Router → Advanced → Control” tab.

(f) Make sure that “Cluster failover” is set to “Enabled for all transfers in this domain”.

(g) Save the server configuration document.

(h) Repeat for all your other server configuration documents.
 
Once you’ve made the change, make sure to replicate your Domino Directory to all your servers and issue a RESTART TASK ROUTER on each of them.
 

Conclusion …
 

Domino Clusters are awesome and they just work out of the box … but that’s one of the details that you need to be aware of if you want mail to flow even when one of your clustered servers is down.
 

Thanks for reading!
 

Marc



Filed Under (Lotus Domino Server) by Marc Champoux on May-27-2010

Friday Started Out As A Quiet Day …
  

Then the phone rang … it was Patrick asking me to remove the cluster between ServerA and ServerB. The reason? We purchased a lot of shiny new video conferencing gear for all our offices and it does all the video in hi def. Unfortunately for us, it doesn’t down-scale nicely and the CEO, CIO, and many other people high-up complained (quite loudly) that they didn’t like the pixelization that happened during their video-conferences with remote offices.
 

After checking with the vendor, their only recommendation was to reduce the amount of traffic between the head office and the data center. The subsequent investigation of the traffic showed that the constant chatter of the cluster between ServerA and ServerB was taking the majority of the juice on the data line. ServerA is located in the head office and ServerB is located in our data center. The VPN entry points for a lot of our telco equipment and data lines are right in the data center and then go out to the remote offices.
 

Are you starting to see where this is going? So this is why I got the order to remove the cluster (not just disable it temporarily).
      

The Untold Consequences of Un-Clustering a Cluster During the Day
  

Well, right out of the gate, if you look in the admin guide … the procedure is extremely simple. Too simple. 13 years of doing Domino should have made an alarm ring somewhere in my brain but alas … I plowed ahead. Bad idea.
 

Basically the admin guide says that you just need to open your Admin client, go to the Config tab, then go to the Clusters view, select the servers to remove from the cluster and then hit the “Remove from Cluster” button. Simple right?
  

Too bad the Admin guide doesn’t say anything about what happens next and what you should do to fix it.
  

Now, keep in mind that this probably will not happen to you IF you schedule it properly and do the change at night and then reboot the servers before the next business day. But in the event that you are under duress and need to do this during the day, learn from my mistakes and see what’s going to go wrong … and how to fix it.
 

Lesson 1: Freetime Lookups Gone Bad … You Ain’t So Free Anymore
  

Yep, Freetime lookups which were driven by the “clubusy.nsf” within the cluster go “bye-bye” a few minutes after the cluster stops. Is the Domino server smart enough to rebuild a “busytime.nsf” right away? Nope, not unless you reboot the server. So, how do you get around it during the daytime?
  

Simple: issue the following commands on *both* servers.
   

  • RESTART TASK SCHED
      
  • RESTART TASK CALCONN
      
  • RESTART TASK RNRMGR
      

While certain tasks restart, you will see that it’s rebuilding the busytime.nsf … which is good … but if you have a gazillion users on your server … the Freetime lookups might be offline for a few minutes … or hours.
  

Also, for good measure, issue these two commands after the dust has settled:
  

  • TELL SCHED CHECK
      
  • TELL SCHED VALIDATE
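If you would rather not type the restart commands on each console, here is a minimal LotusScript sketch that sends them from an agent. It assumes the signer has administrator access to both servers, and the server names ("ServerA/YourOrg", "ServerB/YourOrg") are hypothetical placeholders. This is not what I did on the day, just one way it could be scripted. The two TELL SCHED commands could be sent the same way later on.

```lotusscript
Sub Initialize
	' Hypothetical server names: replace with your own.
	Dim session As New NotesSession
	Dim servers(1) As String
	Dim commands(2) As String
	Dim reply As String
	Dim i As Integer, j As Integer

	servers(0) = "ServerA/YourOrg"
	servers(1) = "ServerB/YourOrg"

	' The three restart commands from this lesson, in the same order.
	commands(0) = "restart task sched"
	commands(1) = "restart task calconn"
	commands(2) = "restart task rnrmgr"

	For i = 0 To Ubound(servers)
		For j = 0 To Ubound(commands)
			' SendConsoleCommand returns the console response as a string.
			reply = session.SendConsoleCommand(servers(i), commands(j))
			Print servers(i) & " > " & commands(j) & ": " & reply
		Next
	Next
End Sub
```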

 

Btw, if you don’t do this, your users will receive some strange error messages when trying to check the availability of others like, for example, “Can’t find schedule record for requested user”. 
 

Phew … ok … problem 1 fixed (if you do this during the day). Moving along …
  

Lesson 2: Replication Failures Galore When Using the Cluster Name …
  

Shortly after “un-clustering” your cluster, you’ll see a bunch of error messages on your servers about replication timing out … you might scratch your head a bit because everything is fine with the network.
  

What’s the problem in this case? Well, I think it’s written somewhere in the Admin guide (or it’s a tip that’s floating around the Yellowsphere) but you can set the “destination server name” field in any Replication Connection documents to the name of the cluster. So your “spoke” servers could replicate with either server of the cluster when replication ran. That was awesome when you *had* a cluster … but now that it doesn’t exist anymore you’ve got replication errors … see where this is going?
  

Yep, the solution to this is to open your Domino Directory and re-check your Connection Documents to make sure that the cluster name is not mentioned anywhere in the “Destination Server Name” field. Once you’ve done that … replicate the Domino Directory to all the spoke servers and issue a good old RESTART TASK REPLICA on each server …
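If you have a lot of Connection Documents, a small agent can list the suspects for you. The sketch below is only that, a sketch: the cluster name and server name are hypothetical, and I am assuming that Connection Documents carry Type = "Connection" and keep the destination and source servers in items named "Destination" and "Source". Check the item names in your own Domino Directory before trusting the output.

```lotusscript
Sub Initialize
	Dim session As New NotesSession
	Dim nab As NotesDatabase
	Dim docs As NotesDocumentCollection
	Dim doc As NotesDocument
	Dim clusterName As String
	Dim dest As String

	clusterName = "YourClusterName"   ' hypothetical: the name of the removed cluster
	Set nab = session.GetDatabase("ServerA/YourOrg", "names.nsf")   ' hypothetical server
	If Not nab.IsOpen Then
		Print "Could not open the Domino Directory"
		Exit Sub
	End If

	' Assumption: Connection Documents are identified by Type = "Connection".
	Set docs = nab.Search({Type = "Connection"}, Nothing, 0)
	Set doc = docs.GetFirstDocument
	Do Until doc Is Nothing
		' Assumption: the destination server is stored in the "Destination" item.
		dest = doc.GetItemValue("Destination")(0)
		If Instr(Lcase(dest), Lcase(clusterName)) > 0 Then
			Print "Check connection: " & doc.GetItemValue("Source")(0) & " -> " & dest
		End If
		Set doc = docs.GetNextDocument(doc)
	Loop
End Sub
```

The agent only reports; fixing each document by hand keeps you in control of which server each spoke should now replicate with.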
  

And now … the final lesson … the most painful one …
  

Lesson 3: The Support Calls … a.k.a. Your Phone Will Ring Off The Hook and Melt (… if you are Lucky)
  

If you had your cluster around for a while (it had been enabled 3 years ago if my memory serves me right) you’ve become accustomed to the fact that you could check your emails on ServerA or ServerB and it would be exactly the same all the time. 
 

On top of that, the BlackBerry Server would, of course, check your “home” server all the time, but you could use the other server and no matter what you did, your BlackBerry and your Inbox always looked the same.
  

A few hours after you remove the cluster, the phone will start ringing. Trust me: it will.
  

No matter who calls you, the following lines (or a variation) will be said at one point during the conversation:
  

  • “I’ve got emails on my BlackBerry that aren’t in my Inbox”.
     
  • “Mister X says they’ve emailed me something but I haven’t received it yet … it’s been 30 minutes already”.
     
  • “No matter how many times I replicate, there is always a 2 hour delay before I get my emails”.
     
  • “When I go into iNotes, my emails aren’t the same as when I go into my Lotus Notes Inbox”.
        

What causes it? Well, most of those calls are because the employee’s mail file doesn’t point to the right “home” server (ServerB instead of ServerA for example). Or it’s because it’s not replicating with the right server (ServerA instead of ServerB).
 

In *all* those cases, it’s because the workspace icon, the bookmark (or whatever) is not pointing to the home server of the user and points to the “old” cluster mate.
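When a call comes in, the first thing to confirm is the user's real home server. Here is a minimal sketch of that lookup in LotusScript. It assumes the Person document can be found in the ($Users) view by the user's name and that the home server and mail file are in the "MailServer" and "MailFile" items. The user name and server name are hypothetical.

```lotusscript
Sub Initialize
	Dim session As New NotesSession
	Dim nab As NotesDatabase
	Dim view As NotesView
	Dim person As NotesDocument
	Dim userName As String

	userName = "John Smith"   ' hypothetical user
	Set nab = session.GetDatabase("ServerA/YourOrg", "names.nsf")   ' hypothetical server
	Set view = nab.GetView("($Users)")
	If view Is Nothing Then
		Print "View ($Users) not found"
		Exit Sub
	End If

	Set person = view.GetDocumentByKey(userName, True)
	If person Is Nothing Then
		Print "No Person document found for " & userName
	Else
		' Print the home server and mail file path as server!!path.
		Print userName & " -> " & person.GetItemValue("MailServer")(0) & "!!" & person.GetItemValue("MailFile")(0)
	End If
End Sub
```

Compare the result with the server shown on the user's workspace icon or bookmark; if they differ, that is the icon to fix.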
  

What about that last line about iNotes? Well, in the case of that user, our iNotes server runs on ServerA and his home server is on ServerB … now that the cluster is removed, the good ol’ replication documents have a schedule of replicating every 30 minutes so there is a delay before his mail file is updated on the iNotes server.
  

Anyhow … what’s the solution to this issue? Well, if you have some sort of Workspace management tool like Desktop Manager from Cooperteam or MarvelClient from panagenda, I’m pretty sure you can fix it “remotely” for your users.
 

In my case unfortunately, it’s a manual process for each call … it’s sad but it’s life.
  

Conclusion
  

Pilots have this saying that goes “Learn from the mistakes of others … you won’t live long enough to try them all” so I hope someone, somewhere will learn from this blog post.
  

And I really hope that by sharing this, I won’t end up on the Worst Practices slides at Lotusphere 2011 …
  

Thanks for reading!
 

Marc



Filed Under (Lotus Domino Server) by Marc Champoux on May-4-2010

There are Some Rules in the Universe that you Cannot Break
    

Like the speed of light for example (for now anyway) and the airspeed velocity of an unladen swallow.
  

For entirely different reasons however, the creators of the Yellowverse have imposed upon us, the little creatures inhabiting this world, different kinds of limits. Like, for example, the 32k limit on text fields.
   

Sometimes, creative developers have found workarounds or coded around that issue … but there are other times where it just plain hurts and you can’t do nothin’ about it.
   

Case in Point: Hitting The Limit Of 32k When Adding Mail Rules To A Server Configuration Document
   

When adding 1 new mail rule to your server configuration document, you get a nasty error.
 
So, what can you do when this happens? Not much really. You can try a few things however to make some room:
   

  • Review all the mail rules in your server configuration document. Remove all those that aren’t needed anymore.
  • Review and check if you could combine some rules together.
  • Review and check if you can make any rules “smaller”. For example, if you had a rule that said “journal when sender is jsmith@whatever.com” you can change it to just “journal when sender is jsmith”.
  • If you can, journal everything that is sent and received on your server (it might make the size of your mail journal file increase incredibly fast however).
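To see how close you are to the limit before you start trimming rules, you can look at how much storage each item of the configuration document uses. The sketch below is an assumption-heavy helper: I do not know for certain which item holds the rules, so it simply lists every item above a threshold (16000 bytes here, an arbitrary number) on the documents you have selected in the view.

```lotusscript
Sub Initialize
	' Run as a "selected documents" agent on the server configuration document.
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim docs As NotesDocumentCollection
	Dim doc As NotesDocument

	Set db = session.CurrentDatabase
	Set docs = db.UnprocessedDocuments
	Set doc = docs.GetFirstDocument
	Do Until doc Is Nothing
		Forall item In doc.Items
			' ValueLength is the number of bytes used to store the item, overhead included.
			If item.ValueLength > 16000 Then   ' arbitrary threshold, adjust as needed
				Print item.Name & ": " & item.ValueLength & " bytes"
			End If
		End Forall
		Set doc = docs.GetNextDocument(doc)
	Loop
End Sub
```

Running it before and after shortening a few rules gives a rough idea of how much room each change buys you.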
       

But There Might Be a Glimmer of Hope
  

I opened a ticket with Lotus Support and the support rep created SPR #KGEW84SR3T to add “when Sender or Any Recipient Contains” as an option when creating mail rules.
   

So, if you are like me and you’ve hit the limit, I recommend that you also open a PMR with Lotus Support to ask them to change the way Mail Rules are saved along with Server Configuration Documents.
  

Thanks for reading!
   

Marc



Filed Under (Lotus Domino Server) by Marc Champoux on April-22-2010

To Make A Server Crash, You Must Find The Right Tool For The Job
   

This is going to be a short post. I have been dealing with a few server crashes recently. Some of them having to do with Tivoli and others … well, the PMRs are still open. However, I got sick and tired of always manually collecting the IBM_TECHNICAL_SUPPORT folder, the log.nsf, the notes.ini, the Event Viewer Files and the WinMSD NFO file each time I was opening a PMR for a crashed server.
   

So, I wrote myself a nice little batch file that does all that for me. Once I had the batch file done, I decided to try it out in the “Run this Script After Server Fault” field of one of the servers in the test environment. But, to test it out, I needed to “create” a server fault.
  

And here’s the gist of this post: for those that don’t know it, there is a small utility on the Lotus Developer Domain Sandbox on a page titled “Utilities to crash client and server”. You can download it from here. The page might say that the platform is “AIX 64 bits” but the zip file contains every possible flavor of OS you can imagine. The version for Windows servers is at the root of the unzipped directory structure.
  

And what happens when you run it? Well, you get a nice “PANIC” error … your server crashes and NSD fires off. Simple is beautiful (in most cases).
   

So, for a guy like me testing out and debugging his “post-server-crash” script, this is very useful … and by posting it on this blog, maybe someone else will discover it too (hopefully not to wreak havoc in his own environment).
   

Side Notes
  

Well, truth be told, my script works fine when I run it manually from the command prompt. However, when the server fires it off … nothing appears on the screen but files are being copied to the right place by the batch file and then zipped. It’s all done silently.
   

The catch 22 however is that I wanted my script to delete the log.nsf and run a fixup -q on the usual databases that go bonkers after a server fault (admin4.nsf, events4.nsf, mail.boxes, names.nsf just to name a few) so that the server would come up a bit more clean than in cases where it starts up right away and is catching up doing the fixup on the databases while it’s trying to get back up.
  

So, because of that, I opened another PMR with Lotus Support to ask if there is some sort of switch or notes.ini that I should try to make my script run after NSD has done what it’s supposed to do but *before* the server is restarted … we’ll see what support has to say. However, if you are a guru in regards to “scripts to run after a server fault”, please feel free to post in the comments section your 2 cents on why this is happening to me (and your idea for solutions if you have any … thanks!).
  

Conclusion
  

Please use the utility responsibly! Don’t use it for your next April Fools’ prank … seriously, don’t. Friends don’t let Friends crash their Friends’ servers as an April Fools’ joke …
 

Thanks for reading!
  

Marc



Filed Under (Lotus Domino Server) by Marc Champoux on November-13-2009

Summary …

 

Last week I upgraded a partitioned server in one of our remote offices in the Asia-Pacific region, using Windows Remote Desktop, from Lotus Domino R7.0.2 to R8.5.1. Things went very smoothly during the upgrade but it took longer than I had planned. So, once the partitioned servers were upgraded, I started them and let them run because I didn’t want to bust my maintenance schedule window.

 

Because the upgrade took longer, I knew that I had to schedule some more downtime this week to be able to wrap things up and run a compact on the databases with the server down to upgrade them to the newest On Disk Structure (ODS). After some discussions with the IT folks over in the Asia-Pacific region, the next window of opportunity to schedule some downtime happened to be today, Friday, November 13th 2009.

 

Right away, something *should* have clicked in my head but I guess I’m so amazingly tired that none of my usual paranoia alarms went off. C’mon we’ve got movies such as “Friday the 13th” that clearly illustrate that it’s a bad day to do anything important (let alone server maintenance) … so something should have clicked in my head but alas … nothing … so read on for the horror story … or skip to the “How did I fix it?” part to read about what went wrong and how to fix it if you run into it.

 

But What Did I Really Need To Do On That Partitioned Server?

 

So, if you are still reading this, you may ask, “Sir, what did you really need to do on that particular server?”. And the answer is quite simple … I just needed to un-install one of the un-used partitions on the server and run a compact on some of the databases of the 2 other partitions to bring them up to the latest-and-greatest-omg-it-slices-and-dices-but-wait-theres-more ODS level (51). Simple enough right? Nothing that would scare the pants off your usual run-of-the-mill Lotus Domino Administrator as far as I know.

 

Ok, So … What Happened?

 

Wow, you’re still reading this? Thanks! Well, I “remote desktoped” into the server and un-installed the unused partition and that went well.

 

I also ran a compact on the databases to upgrade them to the latest ODS … and that also went well.

 

I then rebooted the server to complete the maintenance (I always like to do reboots just to clear up the Windows Server memory) … and I waited. And … I waited some more. And … some more. After 10 minutes of waiting without being able to remote desktop back into the server, it was clear that something was wrong. I tried to “ping” the server but it would not even respond … I thought “oh my, Windows must have Blue Screened” …

 

And It Got Worse … Right?

 

Yep, it did but not in the way that you’d expect. Long story short, I had (politely) asked for the login information for the ILO (HP’s Integrated Lights Out) to be added to the list of ILO information in a database that we have where I work. That usually covers our lower-back-part in case Windows crashes because ILO allows you to remote control the machine via another interface. I assumed, and trusted, that people would have done that already because I usually bend-over-backwards for them when they ask me something … and sadly, my assumption was horribly wrong because the ILO information wasn’t anywhere to be found!

 

Knowing full well that it was lunch time for me and midnight for the folks over in the Asia-Pacific region, I said “dammit (jim) this is an emergency, so I have the right to wake one of the guys over there up” … and I tried to call the cell phone of the LAN Admin who I knew had the ILO information. But … no answer! I then tried his home phone number: no answer either. I re-tried his cell. Still no answer. Plan B: I decided to call another LAN Admin in the Asia-Pacific region … also no answer!

 

At that point, I knew I was in deep doo doo! Finally, I asked the Senior LAN Admin for the Americas region of the company that I work for to try to find this info. Lucky for me, after about 30 minutes, he managed to find an old reference to it somewhere in his emails!

 

Phew … You Had Access To The Machine … What Was Wrong?

 

Once we ILO’ed back into the machine, we saw that it was stuck at “Applying Preferences”. After some more waiting, we ran out of patience and rebooted it. Too bad for us: it got stuck at the same place! After 2 more reboots for good measure and one final reboot in Safe Mode with Networking, the Senior LAN Admin for my region figured out what was wrong: the server was freezing when it was trying to start the Lotus Domino Server partitions!

 

So, he set them to run manually, rebooted and handed me back the control while he went back to fighting fires in the Americas region.

 

How did I fix it?

 

Once I was back into the server (again, via Remote Desktop), I went to the Windows Services panel and started one of the Lotus Domino Partitions. An error that I had previously run into instantly reared its ugly head … and what was the error?

 

It was the good ol’ “An error occurred during license use management initialization. Ensure that you are running Domino with a valid license file” error. You can read about that nightmarish upgrade where I ran into this error for the first time in one of my first blog posts here.

 

And the solution when you run into that nice “An error occurred during license use management initialization …” error hasn’t changed since I last ran into it: simply re-run the Lotus Domino R8.5.1 installer and it will fix it automagically (see the IBM technote here).

 

So, now that everything is back up again … I promise never to do Server Maintenance on a Friday the 13th ever again …



Filed Under (Lotus Domino Server, Tips and Tricks) by Marc Champoux on November-2-2009

Summary

 

A long time ago, someone needed to get something done in your company and the solution was to purchase some sort of product that you installed on top of Lotus Domino. Since then, you’ve been locked in the eternal 3-step dance of “(step 1) a new version of Lotus Domino gets released but (step 2) wait for the vendor to release a compatible version of the add-in and (step 3) finally upgrade both products”. Case in point where I work: we needed a Fax solution that integrated with our Lotus Notes and Domino environment so someone installed FastFax from Quadrant Software on top of one of our Lotus Domino servers.

 

After a few years and some issues here and there, it’s been pretty much humming along. But the last issue I ran into left me wondering if there was a (free) way to monitor the 2 tasks that I see for FastFax when I issue a SHOW TASK command on the server console. So, I turned to DDM and Event Handlers for that task but, out-of-the-box, I realized that they can only be used to monitor Lotus Domino server tasks.

 

So, what did I do after I realized this? Well, I tried to hack Event Handlers and, oddly enough, it worked!  Here’s what I did …

 

Hacking Event Handlers to Monitor Add-Ins from other Vendors

 

Two Things Right away … First, I’m sorry if this has been written about somewhere else. I’m sure that in the Yellowverse, someone, somewhere did the exact same thing and blogged about it but I didn’t look hard enough to find it. And secondly, I’ll be blunt, this worked for me and the issue that I was faced with. There’s a good chance that this might not work for you … but you won’t hurt anything by trying. Truth be told, I’m not even sure why it works but it does…

 

Step 1 … issue a SHOW TASK and make a note of the Task(s) that you want Event Handlers to monitor. In my case, FFXGWOT and FFXGWIN:

 

Database Server      Platform Stats is gathering statistics
Database Server      Shutdown Monitor
Database Server      Process Monitor
FFXGWOT              Idle
FFXGWIN              FFXGWIN
Router               Dispatch: Idle
Router               Sweep: Idle
Router               Utility: Idle

 

So, now that you know which “Add-In” tasks we want to monitor … follow the steps (for the screenshots, I’m only using the FFXGWIN task):

  

Step 2: Open the Monitoring Configuration (events4.nsf) on the server where the Add-In task is running.

   

Step 3 - Open the "Task Status" view under the "Event Generators" category.

  

Step 4 - Click on the "New Task Status Monitor" view Action button.

  

Step 5 - Select one of the tasks (*any* of them), the Server where the Add-In task is running and "What" to monitor (status down).

  

Step 6 - Let's create an agent to perform the "Hack".

  

Step 7 - Give the agent a name and change the agent type.

  

Step 8 - Make sure that the agent is private and set to run on Selected Documents.

 

Step 9 - Add the code to change the field "Task" to the process we want to monitor.

 

Step 10 - Save the agent.

 

Step 11 - Close Designer.

 

Step 12 - Select the new task monitor you created and then run the agent on it.

 

Step 13 - Now, make a mental note of the Event Generator number because you'll need it later.

 

Step 14 - Now switch to the view "Event Handlers - By Server".

 

Step 15 - Now click on the button to create a new Event Handler.

 

Step 16 - On the Basics tab, select the Trigger to be a Custom event generator.

 

Step 17 - On the Event tab, click on the button to select the Event.

 

Step 18 - Scroll down to the "Task Status" event generators and select the one you just created (this is where you need the event number to make your life easier).

 

Step 19 - On the "Action" tab, select the "Mail" method and type your name in the Address field (yes, you could set it to run an agent or whatever you want ... I wanted to get an email).

 

Step 20 - Save your new Event Handler.

 

Step 21 - You can see your new Event Generator under the "All Servers" category.
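For Step 9, the agent code itself only lived in a screenshot, so here is a minimal LotusScript sketch of what such an agent could look like. It assumes the agent type chosen in Step 7 is LotusScript and that the item to overwrite on the Task Status Monitor document is really named "Task", as the step describes. The task name FFXGWIN comes from my SHOW TASK output; use your own add-in's name.

```lotusscript
Sub Initialize
	' Private agent, target: selected documents (see Step 8).
	Dim session As New NotesSession
	Dim db As NotesDatabase
	Dim docs As NotesDocumentCollection
	Dim doc As NotesDocument

	Set db = session.CurrentDatabase
	Set docs = db.UnprocessedDocuments
	Set doc = docs.GetFirstDocument
	Do Until doc Is Nothing
		' Overwrite the task picked in Step 5 with the add-in task name.
		Call doc.ReplaceItemValue("Task", "FFXGWIN")
		Call doc.Save(True, False)
		Set doc = docs.GetNextDocument(doc)
	Loop
End Sub
```

Select only the new Task Status Monitor before running it (Step 12), since the agent changes every selected document.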

 

Addendum

 

So, with this Event Generator and Event Handler in place, when the task FFXGWIN goes down in a ball of flames (for one reason or another), I get a nice email. Of course, I had to repeat these steps for the FFXGWOT task and it also works like a charm for that task.

 

Like I said previously, this worked for me in my environment. Maybe it won’t work for you but it won’t cost you a dime to try … Enjoy!



Filed Under (Lotus Domino Server) by Marc Champoux on September-28-2009

Summary

 

Another late night programming session … your vision is getting blurry and you ran out of Red Bull a few hours before that. Somewhere in your LotusScript code, there’s 1 line with a call to the “EndSection()” method of the NotesRichTextItem class … but for one reason or another you didn’t call the “BeginSection()” beforehand (you thought you did but it’s late) … it doesn’t matter right?

 

So you test your agent on the server and it Panics and Faults right away! The code doesn’t even go into your ErrorHandler routine (you have one right?) … sooooo what gives?

 

Steps to Reproduce the Error

 

If you want to reproduce the error, simply create a new scheduled agent in a database on one of your Lotus Domino R8.5 Fix Pack 1 test servers (or on a production server if you enjoy the occasional lynch mob running after you around the office with torches and pitchforks – hey they say running is good for you) and paste this code into the Initialize section of the agent:

 

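 On Error Goto ErrorHandler
 
 Dim Session As New NotesSession
 Dim NewEmail As NotesDocument
 Dim NewBody As NotesRichTextItem
 
 ' Create a new document with an empty rich text Body item
 Set NewEmail = Session.CurrentDatabase.CreateDocument
 Set NewBody = New NotesRichTextItem ( NewEmail , "Body" )
 
 ' Closing a section that was never opened: this is the line that makes the server Fault
 Call NewBody.EndSection()
 
 Exit Sub
 
ErrorHandler:
 
 ' Never reached when the server Faults
 Print "An error occurred in the agent MCXTestAgent"
 Exit Sub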

 

Notice that there isn’t any call to the “BeginSection” method? Now, either let the agent run on its schedule and watch the server Fault OR issue a TELL AMGR RUN "YourDatabaseName.nsf" 'YourAgentName' command … and watch it Fault.

 

The Solution

 

While this is technically a problem with LotusScript and it should have gone into the ErrorHandler routine … it’s also, technically speaking, a problem with your code … i.e. you should have called a “BeginSection” a couple of lines above somewhere in there. So just add the “BeginSection” call where it needs to be and enjoy.
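For reference, here is a sketch of the same test code with the missing call added. The section title and the text inside the section are made up for the example; the only point is that every EndSection() has a BeginSection() in front of it.

```lotusscript
On Error Goto ErrorHandler

Dim Session As New NotesSession
Dim NewEmail As NotesDocument
Dim NewBody As NotesRichTextItem

Set NewEmail = Session.CurrentDatabase.CreateDocument
Set NewBody = New NotesRichTextItem(NewEmail, "Body")

' Open the section first ("Details" is a made-up title) ...
Call NewBody.BeginSection("Details")
Call NewBody.AppendText("Text inside the section")
' ... and only then close it.
Call NewBody.EndSection()

Exit Sub

ErrorHandler:

Print "An error occurred in the agent MCXTestAgent"
Exit Sub
```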

 

To be safe, I opened a ticket with Lotus Support to report this “behavior”. The support rep who called me back said he was able to reproduce the error quite easily and that he opened SPR #JSHN7WBRPM in regards to this issue.