Thursday, March 29, 2012

Network Diagnostic Tool (NDT)

I’ve been doing some work on our new ProCurve switches and ran into a new tool for testing network speeds called NDT, the Network Diagnostic Tool. This open source tool, published by the Internet2 folks, is a client/server program you can use to test the throughput between machines. It requires Java, but once it’s up it copies a payload across the wire and shows you statistics on the transfer. Here’s the resulting output for your viewing pleasure:

[Screenshot: NDT test results]

As you can see, it even guesses the slowest link speed it crosses and determines its duplex. The easiest way to set this up is to download the pS-Performance Toolkit, a Knoppix-based network toolkit, and run it from an old computer with a fast NIC. :)  There’s a ton of other useful stuff on this bootable ISO, including:

[Screenshot: list of tools included on the pS-Performance Toolkit ISO]


Definitely worth a download to check out.
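
One more note: if you’d rather skip the Java applet, NDT also ships a command-line client (called web100clt, if memory serves; check your install). Here’s a minimal sketch, with the hostname as a placeholder for your own pS-Performance Toolkit box:

web100clt -n ndt.example.edu

The -n switch names the NDT server to test against, and you get the same throughput stats and link/duplex guesses dumped to the console. Enjoy.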

Friday, March 02, 2012

Lessons Learned in Migrating Email Archiving

Well, this post is a long time coming. I’ve been migrating between two email archiving products for the last 2 years and it’s still not complete. I’ll leave the actual names of the products out so no one comes hunting me down, even though I really want to narc on them and let people know their issues. I will, however, let you know some of the bumps in the road, since they apply to multiple products. The first legacy product we ran I’ll call ArchiveA, and the second product we chose I’ll call ArchiveB.

ArchiveA is an in-house solution: we purchased the product to run at our site. It stubs and dumps email into a large database outside of Exchange, onto any disk solution you want. It uses an Outlook plug-in for searching and retrieving messages and has Outlook Web Access integration for email retrieval for users who are out of the office. There is no integration with mobile devices. We have it configured to remove the attachment but leave the message untouched.

Likes
We like that it originally had a small footprint and that it was easy to train users on. Local storage brought fast retrieval of messages. The Outlook plug-in is simple and works well.

Dislikes
Indexing of our archives became a huge issue. The vendor’s solution was always to first upgrade all components and rebuild ALL our indexes, then see if the issue still existed. That’d be fine, but it takes over 7 days just to index our main site. That meant 3 or 4 times a year, for 7 days at a shot, users couldn’t search their archives.

Stubs are troublesome. They only delay the fact that you haven’t made a real email management policy. (Feel free to quote me on that.) Microsoft Exchange (in our case 2003) limits the number of objects in the Inbox, Sent Items and Deleted Items folders to 5000. Yes, you can create folders to get around this, but Exchange still doesn’t like it. Had I known about the stub issues when we rolled this out, I’d have found another way.

The straw that broke the camel’s back for us on this solution was twofold. First, we were storing so much old email that our ArchiveA stores were becoming hard to deal with. The solution worked, but we kept needing to add drive space with no end in sight. The second issue was the killer. ArchiveA was sold to another vendor who happened to be in the search engine business. As part of the upgrades, they basically forced us to use a new indexing/search engine tool, and it sucked. It was a limited version of their flagship tool, restricted in software so that it was only partially functional. This brought with it a ton of issues. Oh yeah, and they wanted us to quadruple our server footprint to handle the upgrade to this new version. Where we initially had only 1 server per site, they now wanted 4 servers per site. At 4 sites that took us to 16 servers just for email archiving, with a crappy search engine. That was it. We needed to find a better way.

Due Diligence
So we spent months looking for a way out of this situation. There are a ton of archiving products, but it seems they all have their warts. We looked at 4 of the top recommended solutions and picked one based on meeting as many needs as we could. As a matter of fact, from a technical standpoint it met all our needs. There were only a few lacking “like to haves”, such as Outlook Web Access integration and mobile phone integration. (Note: they eventually did ship a mobile phone piece that has helped.)

ArchiveB
So we move forward with ArchiveB, knowing it will be a bit painful to get our users to adjust to the new technology, but they should have all the same features they had before plus a few more.

Part of what we signed off on was the conversion of old ArchiveA messages into their new system. See, ArchiveB is a cloud-based solution, so we can’t ingest our own messages; they need to do that for us. This is the root of all our problems, but we will get to that later.

The first issue we hit was that ArchiveB’s engineers didn’t do a very good job of evaluating our current environment. Although we provided all this information up front, they seemed surprised when we started the project that we had multiple sites, multiple servers and compressed archives. Yeah… ArchiveB sized the conversion based on compressed size, not uncompressed size, so all their time estimates were off. Now they want more money and time. (Ouch #1) We settle on a number below what they want (since I believe this is their fault) and press on.

The second issue was that their chosen export tool didn’t seem to be able to determine where a message “lived”, so they would have no way to put it into the correct folder on their web-based archive. After much discussion and angst they determined it would work, and we move forward. (Ouch #2) Unfortunately this doesn’t completely clean up our export issues out of ArchiveA, because they subcontracted the export to another company (the owner of the export tool) who is not responding to our inquiries. This export process takes way too long and extends the timeline even further. (Ouch #3) Oh yeah, and it turns out the tool they require to do the export needs pretty hefty servers to perform the migration, and we are responsible for providing the hardware. Luckily we had a few we could use from another project, but it’s still Ouch #4.

Next we swing all inbound and outbound mail through ArchiveB’s servers. This was probably the easiest part. They now scan all our mail for viruses, malware, phishing, etc. All in all this was very successful.

So we ship the files off to ArchiveB and they start reviewing them. They realize no folder structures are coming over, just the plain messages. (Ouch #5) We go back and forth, do some re-exporting and finally get it working. I’ve summarized here, but believe me, there was a lot of work from my staff to get this sorted out.

Now a new “feature” comes out. No longer will they be required to use MAPI to pull our messages on a daily basis (aka journaling); they can now use POP3 to get a copy. Great, MAPI is slow and painful, so this seems like good news. Except the POP3 journaling doesn’t work as promised, and we need to go back and pull these messages from ArchiveA’s solution again (we never turned it off, because I was wary of all the ouches so far). So we revert to the MAPI connection, perform a new export/ingestion and we are on our way.

From the beginning of the project we agreed on a payment schedule. I have been getting notices of nonpayment and forwarding them to my salesman for quite a while, but he says not to worry; it’s an accounting glitch on their end. This is fine, except that the archive extraction (which hasn’t completed yet) has been stalled by the 3rd-party vendor because of lack of payment. Seems ArchiveB was waiting on our cash to pay the subcontractor doing the export. (Ouch #6) We of course find this out after 2 weeks of no progress, extending the project another 2 weeks. Finally it gets figured out and we are on our way again.

More exported data shows up at ArchiveB’s site for ingestion and there’s still no folder data. Ugh. (Ouch #7) During this time we also have problems with their “folder synchronization” tool for “current” messages, and their Outlook tool is slower than molasses. (Ouch #8 and #9) From here we work on all these issues for the next 3 months. Some are solved completely, but things like the slow Outlook tool never really get “fixed”.

We do finally complete the ingestion process and get all our messages over to ArchiveB. The process, originally scoped to take 9 weeks, took almost 8 months.

We still have issues with stubs not syncing with ArchiveB. We are told that they don’t support folder synchronization for anything but the ipm.note message class. This is the first we have EVER heard of this. Never since the beginning of the project did they mention it. Their response was “oh, you should have stopped stubbing when we started the extraction process.” EXCUSE ME?!?! My servers won’t last a month without stubbing, and you’re telling me I shouldn’t have stubbed for the 8 months it took you to export/ingest?! I’m furious at this point. We raise hell and find out they have an “unsupported” way of syncing the stubs. It’s a path forward, so we start it, and it seems to work.

A few more months pass by as we work through a number of smaller issues, train our staff on the new tools and the vendor’s website, and upgrade everyone to Windows 7 and Office 2010. It’s pretty quiet because I still haven’t stopped stubbing. I don’t trust them any further than I can throw them.

We do hit a time period where we get a ton of Exchange errors (623s). After weeks of troubleshooting, Microsoft says it’s because we have broken the 5000-message limit on a number of critical folders. We clean up the mess by mass deletion of junk and by having users break big folders down to size. The whole time, the folder sync tool is going crazy with all the changes.

We finally decide that the test cases we are looking at look OK, and we set the rule on the stores to delete mail over 365 days old. This works, but now it forces users to use ArchiveB’s tools since the stubs are gone. Now we find missing messages. Actually, they have the messages, but they aren’t in the right folders. Sigh. (Ouch #10) After a few more days of chasing our tails, we believe it is because a few users renamed folders, which doesn’t get picked up properly by their folder sync tool.

That pretty much brings us up to date. We are close, but this project, originally scheduled for 4 months, has taken 20 and we still aren’t quite there.

So to recap these are the lessons learned:

  • Identify all your message classes (ipm.note, ipm.Note.Voice.Unity, ipm.Voicemail, ipm.Fax, etc.) and be positive the vendor can work with all of them when it comes to ingestion, searching and folder synchronization.
  • If at all possible, do a test run with a few people. We tried and tried to get this done, but they swore it couldn’t easily be done. It’s worth it even if you have to set it up in a duplicate environment.
  • Be wary of the ingestion process. ArchiveB’s process included a 5-step ingestion: identify the message, scan it for errors and corruption, antivirus scan it, import it and dedupe it.
  • Be wary of how messages to groups are handled. In our case we tripped and fell a bit because messages to a group were deduped and not everyone had access to those messages. They had to “fix” this as an afterthought after the ingestion was complete.
  • On the topic of groups, get a good understanding of how archived messages to groups will have the group members expanded. In our case it expanded to the “new” members of those groups, not the original members. This can create issues. This was really another ouch, but I forgot to mention it until now.
  • Find out how to verify that their sync process is working. We have logs, but they are woefully inadequate for the purpose. There should be an easy way to run a verbose mode to see how it handles each message in a mailbox. (See the sketch after this list for one way to sanity-check it yourself.)
  • What tools do they have to let you know that things are or aren’t working?
  • Have them give you test results showing how long it took on a previous ingestion of similar size.
  • Identify all the people working on your project and where they fit into the vendor’s org chart. If you need to escalate, you don’t wanna waste time finding the right person.
  • It’s really hard to do, but get a client running to verify that the speeds they profess to run at are actually true.
  • Identify how e-discovery and legal holds will work.  Does it cost extra money to do an e-discovery or legal hold?
  • Identify your exit plan should the shit hit the fan. Will there be an extra cost to get your stuff back? Will it be in a form your users can use?
  • Put a rider in your contract that states that if the project takes longer than estimated, they will cover the annual costs of your existing server, software and archiving solution until things are working. You may not get that in writing, but it’s a great heads up for them that you are serious about this.
  • Get them to answer, in written form, a few questions like:
    • When will the project be complete?
    • Do you guarantee that all message content and the location of each message will match our current solution when the project is complete?
    • What problems do you foresee with our environment?
  • Walk through the project with them and get them to present a project plan. At each point in the plan identify the state of both ArchiveA and ArchiveB so everyone knows what is running and when.
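
On the verification point above: even without vendor tooling, you can sanity-check a sync yourself if you can export per-folder item counts from both sides. This sketch is purely illustrative; the file names, and the assumption that both systems can dump “folder <tab> count” text files, are mine:

# Hypothetical per-folder item count dumps from each system,
# one "folder<TAB>count" per line
sort exchange_counts.txt > exchange_sorted.txt
sort archive_counts.txt > archive_sorted.txt
# Lines that appear in only one file are folders that don't match
diff exchange_sorted.txt archive_sorted.txt

Anything diff flags is a folder where the counts (or the folder names themselves) have drifted between the two systems and deserves a closer look.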

I know this was a long post; sorry for that. I hope it was at least a good read and helps someone else avoid bumping into the same things we did.

Thursday, December 08, 2011

Managing Exchange Folders and Item Counts

So in my last blog entry I talked about the PAL tool and how it can help you get a handle on your server’s performance. Today I’d like to discuss PFDAVAdmin. This is a free tool that you run on your Exchange server (2000 or 2003) to look inside all your mailboxes and folders and get things like the total number of items in each folder in a mailbox. Here are the details to do just that.

  1. Download the tool
  2. Install the tool on the Exchange server
  3. Connect to the Exchange server and a global catalog server, then select All Mailboxes
  4. On the Tools menu select Export Properties
  5. In this window, enter a destination path for the output file (it will be in tab-delimited format)
  6. Select these three check boxes:
    1. PR_CONTENT_COUNT
    2. PR_DISPLAY_NAME
    3. PR_FOLDER_PATHNAME
  7. Hit OK
  8. Grab some much-needed caffeine while you wait.
  9. The server will process the mailboxes, and you will get a tab-delimited output file that you can load into Excel and pretty up, showing the user ID, folder name and number of items in each folder for every mailbox on the server.
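
If the export file is too big to comfortably work in Excel, a quick awk one-liner will pull out the offenders for you. A minimal sketch, assuming the tab-delimited file is named export.txt and PR_CONTENT_COUNT lands in the last column (check your header row and adjust the field number to match):

# Print every row whose last column (item count) exceeds 5000
awk -F'\t' '$NF+0 > 5000 {print}' export.txt

That prints every folder with more than 5000 items, which is exactly the cleanup hit list you want.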

This helped us identify those “offenders” that never delete or move anything. After we identified and cleaned up some of our users, our Exchange 623 and 1022 errors became less and less frequent. The latest Microsoft support person (who speaks English as a first language… note it takes 4 weeks to get one, btw) says that we still have a large number of items in our “search folders” that we need to deal with. I can see the search folders in the tab-delimited file, but there are multiple kinds to deal with: some are Blackberry search folders, some are Cisco Unity search folders, some are Finder search folders, and others are something called MS-OLK-BGPooledSearchFolder. We were told the Blackberry search folders could be cleaned with the MFCMapi tool. There’s also a registry fix, which I believe is for the MS-OLK-BGPooledSearchFolder folders. I haven’t run these fixes yet, but I’ll let you know how it turns out.

These issues have reinforced something I’ve been saying for years. There are two evils in IT….printers and email.  You’ll never completely rid yourselves of issues with either one.

Friday, December 02, 2011

PAL – Performance Analysis of Logs

Wow… been a long time since an update, eh?! Well, most of this year I’ve been working on our Windows 7/Office 2010 rollout. I’ve also been working on a massive shift from one archive solution to another… but that’s a whole other story…

After we upgraded one of our sites from Outlook 2003 to Outlook 2010, we coincidentally had huge issues with Exchange 2003. We started getting 623 errors in the event logs, followed by 1022 errors, which ended up pillaging and plundering our Exchange information stores. After weeks of digging, the Microsoft Advanced Diagnostics team (which seems to be based out of somewhere in India) said our issues were caused by too many mailboxes with message counts greater than 5000 in folders. No shit… really?! Still working on this issue, so I’ll let you know how it turns out.

So back to PAL. During my research into Exchange performance I found the PAL tool at codeplex.com. PAL is a tool that will look at your perfmon counter logs and tell you what’s up with your server. Unlike the Baseline Analyzer tools, it reacts to the running conditions on your servers and lets you know what seems to be running “out of bounds”. Turns out a bunch of subject matter experts at Microsoft got together and set rules in the PAL tool to alarm/alert just like a traditional expert system would. Very cool stuff.

So here’s the quick and dirty on getting started with PAL:
  1. Download PAL from here
  2. Install PAL
  3. Run PAL and go to the Threshold File Tab
  4. Pick the type of analysis you want to run…Start with System Overview…it will get you started (Threshold File Title)
  5. Now click on “Export to Perfmon Template File”
  6. Now, if it’s a 2003/XP system, save the file as .htm; if it’s 2008/Win7, save it as .xml.
  7. Copy the template to the system to be monitored
  8. Run perfmon
  9. (Note: The rest of these steps are for 2008/Win7… if you are on 2003/XP, figure it out for yourself. ;P)
  10. Go to Data Collector Sets
  11. Right click on “User Defined”
  12. New Collector Set
  13. Pick a name and select “from template”
  14. Browse for the template
  15. Hit Finish
  16. Right click on the collector and start it
  17. Run it for a while and stop it
  18. Take the results file back to your PAL workstation and start from the first tab.
  19. On the Counter Log tab, select the resultant file
  20. On the Threshold File tab, reselect the one you started with
  21. On the Questions tab, answer the 4 questions about the system you were monitoring
  22. On the Output Options tab, leave it at Auto for now
  23. On the File Output tab, leave the defaults
  24. On the Queue tab, leave the defaults
  25. On the Execute tab, select Execute and hit Finish
  26. Wait… it takes a while to process
  27. Enjoy your html output file and analysis of what was up with your server
One last thing to note… scroll through the Threshold File Titles… there’s a lot to choose from, and you can run some very specific tests. They all map to specific counters to “watch” in perfmon, so it’s a great learning tool just to see what’s important to watch.
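
By the way, if you’d rather skip the GUI clicking in steps 10 through 16, logman can import the template and drive the collector from the command line on 2008/Win7. A sketch (the collector name and template path are just placeholders):

C:\>logman import "PAL-Collector" -xml C:\temp\SystemOverview.xml
C:\>logman start "PAL-Collector"
C:\>logman stop "PAL-Collector"

Import it, start it, let it collect for a representative window, then stop it. The resulting .blg file (typically under C:\PerfLogs) is what you feed to PAL’s Counter Log tab.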
Enjoy.

Wednesday, May 25, 2011

Chromebook CR-48

Guess who got a Chromebook today?! So far I really like it. I’m trying to get my stuff all “in the cloud”, but it’s taking a while to figure out the best way to move everything.

The great things about the notebook are:

  • its size – Nice and thin… could be lighter, but not bad at all. Perfect size for a “workable” machine. They didn’t try to be an iPad clone, and I appreciate that.
  • physical appearance – It’s one sexy beast.  Love the simplified keyboard and the fact that all the buttons look the same.
  • simplicity – This is more a comment about the OS, but it’s dead simple to use and figure out.
  • guest mode – Nice touch. I like that I can hand it to anyone and they can’t mess with my stuff.
  • ease of use – Goes hand in hand with simplicity, but it’s worth commenting that they really went the extra mile to make it simple to use.
  • potential – This device is the future.  That’s clear to me now.
  • battery life – ’nuff said. 8+ hours and still cranking along.

The things I’ve found that need improvement are:

  • remote connections – Gotta give me VNC, SSH and RDP. You just have to. It’s not workable as a tool for me unless I can remotely control a “real” machine.
  • Java, JavaScript or something – Come on, this thing would be a powerhouse with a little client-side intelligence.
  • Wireless signup with Verizon – I still haven’t been able to figure out how to sign up. Guys… there’s money lying on the table… go get it.
  • More apps – The app store for Chrome looks anemic.
  • Mouse pad – Works OK, but the two-finger right-click is difficult to pull off.
  • Google cloud printing – Good, but needs improvement. I need a way to set up a “shared” printer in a reception area so that guests can print to it easily over the web. That’s not easy with Google cloud printing today.
  • Flash support – Just gotta have it.


So after my first 48 hours with it…that’s my view.

Tuesday, May 03, 2011

iperf

iperf is a tool for testing the throughput of a network pipe. It came in handy today as I was testing the throughput of our WAN lines. We have T3s at each site, and one site in particular always “seemed” slow when transferring files, but I didn’t know why. I installed iperf on Linux (yum install iperf) and set one side up as the server:

iperf -s

Then on another box on the other side of the wire I installed iperf and ran it as a client:

iperf -c x.x.x.x -d   (where x.x.x.x is the IP address of the box running as the server)

Then I got the following result:

C:\>iperf -c x.x.x.x -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to x.x.x.x, TCP port 5001
TCP window size: 8.00 KByte (default)
------------------------------------------------------------
[1840] local y.y.y.y port 1552 connected with x.x.x.x port 5001
[1816] local y.y.y.y port 5001 connected with x.x.x.x port 46524
[ ID] Interval       Transfer     Bandwidth
[1816]  0.0-10.0 sec   439 MBytes   8.5 Mbits/sec
[1840]  0.0-10.0 sec   321 MBytes   6.7 Mbits/sec

This is a full-duplex T3 line, so I expected something higher than 8.5 and 6.7 Mbits/sec. After some inspection I noticed the switch port on the backbone switch was set to Auto-10 instead of just Auto, which restricted it to 10 Mbit Ethernet speeds. I changed it to Auto and throughput picked right back up (~30 Mbits/sec). I did this while other traffic was on the wire, so iperf couldn’t entirely fill the pipe by itself.
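
One thing worth noticing in that output is the “TCP window size: 8.00 KByte (default)” line. A single TCP stream can never push more than the window size divided by the round-trip time, no matter how fat the pipe is; 8 KB over a 20 ms RTT works out to only about 3.3 Mbits/sec. So before blaming the circuit, try a bigger window and parallel streams. A sketch using standard iperf options:

iperf -c x.x.x.x -w 256k -t 30 -P 4

Here -w bumps the TCP window to 256 KB, -t runs the test for 30 seconds instead of the default 10, and -P runs 4 parallel streams. If those numbers climb toward line rate, the bottleneck was the window, not the WAN.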

Cool tool!

Thursday, March 17, 2011

CentOS NIC Settings (Speed & Duplex)

We ran into an issue with the speed and duplex settings on one of our CentOS boxes during a recent upgrade of our infrastructure to HP ProCurve 8112zl switches. Because I futz with this about once a year, I figured I’d write it down so I can look it up next time…

Step 1: Determine what NIC is in your CentOS box so you can see its capabilities.

#lspci | grep Ethernet

This should give you something like (excuse the wrap):

0a:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
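
While you’re at it, ethtool can also tell you which kernel driver is bound to the NIC, which is handy if you end up chasing a driver issue instead of a settings issue:

#ethtool -i eth0

That prints the driver name, version and firmware revision for the interface.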

Step 2: Check how the NIC is currently set:

#ethtool eth0

This should give you something like:

Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Half 1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Current message level: 0x000000ff (255)
        Link detected: yes


Step 3: Force the speed to, say, 100Full (just for grins):

Edit /etc/sysconfig/network-scripts/ifcfg-eth0
Add the line:

ETHTOOL_OPTS="speed 100 duplex full autoneg off"

Then bounce the network service:

#service network restart
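
If you just want to try a setting before committing it to the config file, ethtool can change it on the fly (the change reverts at reboot or when the interface bounces):

#ethtool -s eth0 speed 100 duplex full autoneg off

Re-run ethtool eth0 afterwards to confirm it took.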

That’s it.  Also be sure to check the config on the switch so it matches.  And remember… always be consistent on both sides of the setup. If the server is auto, set the switch to auto. If the server is 100Full, force the switch port to 100Full.