Thursday, March 29, 2012

Network Diagnostic Tool (NDT)

I’ve been doing some work on our new ProCurve switches and ran into a new tool for testing network speeds called NDT, or Network Diagnostic Tool. This open source tool, published by the Internet2 folks, is a client/server program you can use to test the throughput between machines. It requires Java, but once it’s up it copies a payload across the wire and shows you statistics on the transfer. Here’s a sample of the output for your viewing pleasure:

[Screenshot: NDT test output]

As you can see, it even guesses the slowest link speed the traffic crosses and determines its duplex setting.  The easiest way to set this up is to download the pS-Performance Toolkit, a Knoppix-based network toolkit, and run it from an old computer with a fast NIC. :)   There’s a ton of other useful stuff on this bootable ISO, including:

[Screenshot: tools included on the pS-Performance Toolkit ISO]
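If you’re curious what a tool like NDT is doing at its core, here’s a minimal Python sketch of the same client/server idea (push a payload across the wire and time it). This is just an illustration I threw together, not NDT’s actual code:

    import socket
    import time

    PAYLOAD = b"x" * (10 * 1024 * 1024)  # 10 MB test payload

    def serve(port=5001):
        """Server side: accept one connection and swallow the payload."""
        with socket.socket() as s:
            s.bind(("", port))
            s.listen(1)
            conn, _ = s.accept()
            with conn:
                while conn.recv(65536):
                    pass  # discard the data; we only care about timing

    def measure(host, port=5001):
        """Client side: send the payload and report throughput."""
        start = time.time()
        with socket.socket() as s:
            s.connect((host, port))
            s.sendall(PAYLOAD)
            s.shutdown(socket.SHUT_WR)  # tell the server we're done sending
            s.recv(1)                   # wait for the server to finish and close
        elapsed = time.time() - start
        print(f"{len(PAYLOAD) * 8 / elapsed / 1e6:.1f} Mbps")

    # run serve() on one machine, then measure("the-other-machine") on the other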

 

Definitely worth a download to check out.  Enjoy.

Friday, March 02, 2012

Lessons Learned in Migrating Email Archiving

Well, this post is a long time coming. I’ve been migrating between two email archiving products for the last two years and it’s still not complete. I’ll leave the actual names of the products out so no one comes hunting me down, even though I really want to narc on them and let people know their issues. I will, however, share some of the bumps in the road, since they apply to multiple products.  The first, legacy product we ran I’ll call ArchiveA, and the second product we chose I’ll call ArchiveB.

ArchiveA is an in-house solution: we purchased the product to run at our site. It stubs email and dumps it into a large database outside of Exchange, onto any disk solution you want. It uses an Outlook plug-in for searching and retrieving messages and has Outlook Web Access integration for email retrieval by users who are out of the office.  There is no integration with mobile devices.  We have it configured to remove the attachment but leave the message untouched.

Likes
We like that it originally had a small footprint and that it was easy to train users on.  Local storage brought fast retrieval of messages.  The Outlook plug-in is simple and works well.

Dislikes
Indexing of our archives became a huge issue. The vendor’s solution was always to first upgrade all components and rebuild ALL our indexes, then see if the issue still existed. That would be fine, except it takes over 7 days just to index our main site.  That meant 3 or 4 times a year, for 7 days at a shot, users couldn’t search their archives.

Stubs are troublesome. They only delay the fact that you haven’t made a real email management policy. (Feel free to quote me on that.)  Microsoft Exchange (in our case 2003) effectively limits the Inbox, Sent Items, and Deleted Items folders to about 5,000 items before performance falls apart.  Yes, you can create folders to get around this, but Exchange still doesn’t like it.  Had I known about the stub issues when we rolled this out, I’d have found another way.

The straw that broke the camel’s back for us on this solution was twofold.  First, we were storing so much old email that our ArchiveA stores were becoming hard to deal with. The solution worked, but we kept needing to add drive space with no end in sight.  The second issue was the killer: ArchiveA was sold to another vendor who happened to be in the search engine business. As part of the upgrades, they basically forced us onto a new indexing/search engine tool, and it sucked.  It was a limited version of their flagship tool, restricted in software so that it was only partially functional, and it brought a ton of issues with it.  Oh yeah, and they wanted us to quadruple our server footprint to handle the upgrade to this new version. Where we initially had 1 server per site, they now wanted 4 servers per site.  At 4 sites, that took us to 16 servers just for email archiving, with a crappy search engine.  That was it. We needed to find a better way.

Due Diligence
So we spent months looking for a way out of this situation. There are a ton of archiving products, but it seems they all have their warts.  We looked at 4 of the top recommended solutions and picked the one that met as many of our needs as possible.  As a matter of fact, from a technical standpoint it met all of our needs; only a few “like to haves”, such as Outlook Web Access integration and mobile phone integration, were missing. (Note: they eventually did ship a mobile phone piece that has helped.)

ArchiveB
So we move forward with ArchiveB, knowing it will be a bit painful for our users to adjust to the new technology, but they should have all the same features they had before, plus a few more.

Part of what we signed off on was the conversion of old ArchiveA messages into their new system.  See, ArchiveB is a cloud-based solution, so we can’t ingest our own messages; they have to do that for us.  This is the root of all our problems, but we’ll get to that later.

The first issue we hit was that ArchiveB’s engineers didn’t do a very good job of evaluating our current environment. Although we provided all this information up front, they seemed surprised when we started the project that we had multiple sites and multiple servers, and that the current archives were compressed.  Yeah…ArchiveB sized the conversion on compressed size, not uncompressed size, so all their time estimates were off.  Now they want more money and time. (Ouch #1)  We settle on a number below what they want (since I believe this is their fault) and we press on.
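(To make the sizing miss concrete with made-up numbers: if stores compress at roughly 3:1, a conversion scoped against 2 TB of compressed archives is really a 6 TB job, and every time and cost estimate based on that size triples with it.)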

The second issue we hit was that their chosen export tool didn’t seem to be able to determine where a message “lived”, so they would have no way to put it into the correct folder in their web-based archive. After much discussion and angst, they determined it would work, and we move forward. (Ouch #2) Unfortunately, this doesn’t completely clean up our export issues out of ArchiveA, because they subcontracted the export to another company (the owner of the export tool), and that company is not responding to our inquiries. The export process takes way too long and extends the timeline even further. (Ouch #3)  Oh yeah, and it turns out the tool they require for the export needs pretty hefty servers to perform the migration, and we are responsible for providing the hardware.  Luckily we had a few we could repurpose from another project, but it’s still Ouch #4.

Next we swing all inbound and outbound mail through ArchiveB’s servers.  This was probably the easiest part.  They now scan all our mail for viruses, malware, phishing, etc.  All in all, this was very successful.

So we ship the files off to ArchiveB and they start reviewing them. They realize no folder structures are coming over, just the plain messages. (Ouch #5)  We go back and forth, do some re-exporting, and finally get it working.  I’ve summarized here, but believe me, it took a lot of work from my staff to get this sorted out.

Now a new “feature” comes out: they will no longer be required to use MAPI to pull our messages on a daily basis (aka journaling); they can now use POP3 to get a copy.  Great, MAPI is slow and painful, so this seems like good news.  Except the POP3 journaling doesn’t work as promised, and we need to go back and pull those messages from ArchiveA again (we never turned it off, because I was wary after all the ouches so far).  So we revert to the MAPI connection, perform a new export/ingestion, and we are on our way.
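For the curious, here’s roughly what a POP3-based journal collector looks like in principle. This is my own minimal Python sketch with a made-up host, account, and password, not the vendor’s actual code:

    import poplib
    from email import message_from_bytes

    # connect to the journaling mailbox that receives a copy of all mail
    pop = poplib.POP3_SSL("mail.example.com")
    pop.user("journal")
    pop.pass_("********")

    count, _size = pop.stat()
    for i in range(1, count + 1):
        _resp, lines, _octets = pop.retr(i)
        msg = message_from_bytes(b"\r\n".join(lines))
        print(msg["Message-ID"], msg["Subject"])
        # a real collector would hand the message off to the archive here,
        # then remove it from the journal mailbox:
        # pop.dele(i)

    pop.quit()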

From the beginning of the project we agreed on a payment schedule. I have been getting notices of nonpayment and forwarding them to my salesman for quite a while, but he says not to worry; it’s an accounting glitch on their end. This would be fine, except that the archive extraction (which hasn’t completed yet) has been stalled by the 3rd-party vendor over lack of payment. It seems ArchiveB was waiting on our cash to pay the subcontractor doing the export. (Ouch #6)  We of course find this out after 2 weeks of no progress, extending the project another 2 weeks.  Finally it gets figured out and we are on our way again.

More exported data shows up at ArchiveB’s site for ingestion, and there’s still no folder data. Ugh. (Ouch #7)  During this time we also have problems with their “folder synchronization” tool for current messages, and their Outlook tool is slower than molasses. (Ouch #8 and #9)  From here we work on all these issues for the next 3 months.  Some are solved completely, but things like the slow Outlook tool never really get “fixed”.

We do finally complete the ingestion process and get all our messages over to ArchiveB. A process originally scoped to take 9 weeks took almost 8 months.

We still have issues with stubs not syncing with ArchiveB.  We are told they don’t support folder synchronization for anything but the IPM.Note message class.  This is the first we have EVER heard of this; never since the beginning of the project did they mention it.  Their response was, “Oh, you should have stopped stubbing when we started the extraction process.”  EXCUSE ME?!?!  My servers won’t last a month without stubbing, and you’re telling me I shouldn’t have stubbed for the 8 months it took you to export/ingest?!  I’m furious at this point.  We raise hell and find out they have an “unsupported” way of syncing the stubs.  It’s a path forward, so we start it, and it seems to work.

So a few more months pass by as we work through a number of smaller issues, train our staff on the new tools and the vendor’s website, and upgrade everyone to Windows 7 and Office 2010.  It’s pretty quiet, because I still haven’t stopped stubbing.  I don’t trust them any further than I can throw them.

We do hit a stretch where we get a ton of Exchange errors (623s).  After weeks of troubleshooting, Microsoft says it’s because we have broken the 5,000-message limit on a number of critical folders. We clean up the mess through mass deletion of junk and by having users break big folders down to size. The whole time, the folder sync tool is going crazy with all the changes.

We finally decide that the test cases we are looking at look OK, and we set the rule on the stores to delete mail over 365 days old.  This works, but it now forces users to use ArchiveB’s tools, since the stubs are gone.  Now we find missing messages.  Actually, they have the messages, but the messages aren’t in the right folders.  Sigh. (Ouch #10)  After a few more days of chasing our tails, we believe it’s because a few users changed the names of folders, and it seems that doesn’t get picked up properly by their folder sync tool.
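My hunch on the root cause: their sync tool appears to track folders by display name or path instead of by a stable folder ID, so a rename looks like a brand-new folder. A made-up Python illustration of the difference (not the vendor’s code):

    # snapshot from the last sync run: stable folder ID -> path
    last_sync = {
        "F001": "Inbox/Projects",
        "F002": "Inbox/Vendors",
    }

    # current state: a user renamed "Vendors" to "Suppliers"
    current = {
        "F001": "Inbox/Projects",
        "F002": "Inbox/Suppliers",
    }

    for fid, path in current.items():
        old = last_sync.get(fid)
        if old is None:
            print(f"new folder: {path}")
        elif old != path:
            # keyed by path, this looks like a delete plus a create and the
            # archived copies get stranded under the old name; keyed by ID,
            # it's just a rename
            print(f"renamed: {old} -> {path}")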

That pretty much brings us up to date. We are close, but this project, originally scheduled for 4 months, has taken 20, and we still aren’t quite there.

So, to recap, these are the lessons learned:

  • Identify all your message classes (IPM.Note, IPM.Note.Voice.Unity, IPM.Voicemail, IPM.Fax, etc.) and be positive the vendor can work with all of them when it comes to ingestion, searching, and folder synchronization.
  • If at all possible, do a test run with a few people. We tried and tried to get this done, but they swore it couldn’t easily be done.  It’s worth it, even if you have to set it up in a duplicate environment.
  • Be wary of the ingestion process. ArchiveB’s process included a 5-step ingestion: identify the message, scan it for errors and corruption, antivirus-scan it, import it, and dedupe it.
  • Be wary of how messages to groups are handled. In our case we tripped and fell a bit because messages to a group were deduped and not everyone had access to the deduped message. They had to “fix” this as an afterthought once the ingestion was complete.
  • On the topic of groups, get a good understanding of how archived messages to groups will have the group members expanded. In our case they expanded to the “new” members of those groups, not the original members, and that can create issues. This was really another ouch, but I forgot to mention it until now.
  • Find out how to verify that their sync process is working.  We have logs, but they are woefully inadequate for the purpose.  There should be an easy way to run a verbose mode and see how the tool handles each message in a mailbox. (See the sketch after this list for the kind of check I mean.)
  • What tools do they have to let you know when things are or aren’t working?
  • Have them give you test results showing how long it took on a previous ingestion of similar size.
  • Identify all the people working on your project and where they fit into the vendor’s org chart. If you need to escalate, you don’t wanna waste time finding the right person.
  • It’s really hard to do, but get a client running to verify that the speeds they profess to run at are actually true.
  • Identify how e-discovery and legal holds will work.  Does it cost extra money to do an e-discovery or legal hold?
  • Identify your exit plan should the shit hit the fan. Will there be an extra cost to get your stuff back? Will it be in a form your users can use?
  • Put a rider in your contract that states that if the project takes longer than estimated, they will cover the annual costs of your existing server software and archiving solution tools until things are working. You may not get that in writing, but it’s a great heads-up for them that you are serious about this.
  • Get them to answer, in written form, a few questions like:
    • When will the project be complete?
    • Do you guarantee that all message content and the location of each message will match our current solution when the project is complete?
    • What problems do you foresee with our environment?
  • Walk through the project with them and get them to present a project plan. At each point in the plan identify the state of both ArchiveA and ArchiveB so everyone knows what is running and when.
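As promised in the sync-verification bullet above, here’s the kind of check I wish we’d had: dump a folder-to-message-count map from both the source mailbox and the vendor’s archive, then diff them. The tab-separated dump format here is invented for the sketch; you’d use whatever export both sides can actually produce:

    import csv

    def load_counts(path):
        """Read 'folder<TAB>count' lines into a dict."""
        with open(path, newline="") as f:
            return {folder: int(count)
                    for folder, count in csv.reader(f, delimiter="\t")}

    source = load_counts("source_counts.tsv")
    archive = load_counts("archive_counts.tsv")

    for folder in sorted(set(source) | set(archive)):
        s, a = source.get(folder, 0), archive.get(folder, 0)
        if s != a:
            print(f"MISMATCH {folder}: source={s} archive={a}")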

I know this was a long post; sorry for that. I hope it was at least a good read and helps someone else avoid bumping into the same things we did.