Friday, October 30, 2009

Six SMG Project Status

Since June I've been building a set of six servers to run SMG. A major goal of this project is to deliver six SMG servers, fully configured and tested, so they are ready to run out of the box. Two of the boxes have shipped and presumably are doing well. The four remaining boxes have been a challenge. Two Intel motherboards had to be replaced. One of the two was replaced twice, and the second replacement was an ASUS board, which is now in my warranty replacement inventory.

I specified Intel CPUs, fans, and motherboards, thinking that a single-brand system must have been tested as a system. My doubts started upon assembling the first system and seeing the Intel fan attachment mechanism cause the motherboard to bow slightly. Then I started getting random kernel oopses, panics, and freezes on the 5th box. I had to go through the Intel RMA process to get a replacement, but not without a fight. At first, Intel rejected my claim, said that I had damaged the board, and offered to sell me a new board at a reduced cost. After some discussion Intel kindly reversed their position and replaced the board at no cost to me.

As I was waiting for the Intel RMA, I purchased a 7th board to move the project ahead. This board worked for a few days and then began failing with the same random kernel oopses, panics, and freezes. Since this purchase was recent, I was able to have the board exchanged at Tiger Direct without any problem. I built up box 5 using board 7. It wasn't long before it failed similarly. When I returned to Tiger Direct for yet another replacement, I found that they had sold out of this board completely. I exchanged it for an ASUS board and had to buy DDR2 memory because the Intel board used DDR3 memory.

The Intel RMA board arrived and has been working well. The ASUS board is on the shelf because I didn't want to spend time working with an odd board when the project is very late already. I purchased a Thermaltake fan when I bought the ASUS board. This fan causes no board deformation. I will replace the Intel fans in boxes 3-5 with Thermaltake fans, on the hunch that the fan-induced board deformation is the root cause of the board problems, and in any case to eliminate the undesirable deformation.

Box 6 has a failed power LED. This is an annoyance. Had I quality checked all components as soon as I received them, I would have been able to get a complete box exchange at the store. My options now are to RMA the box with the manufacturer, fix the LED, or purchase a new box. I will attempt to fix the LED first.

Another unexpected problem became evident after building the first box. The analog cards were long and top heavy and had a PCI-e x1 connector. I estimated that these boards might fail during shipping because of stresses on the connector. I also observed the boards sagging under their own weight. To alleviate the stress and sagging during shipping and operation, I devised and constructed a support system made of Lexan and aluminum, shown here.

Early tests showed noticeable heat build-up in the boxes. Investigation led us to understand that the three analog cards in each box dissipate a lot of heat continuously. All six boxes have had case fans added to their specifications.

The original project plan called for placing two patch panels in a desktop rack. When the user received the first two boxes and one of the patch panel racks, he saw that having one patch panel per rack offered more flexibility.  The project will now include three additional racks for a total of six desktop racks: three 4U and three 2U.

The project plan has been enhanced to include a four-box daisy-chain configuration and a 'Y' configuration. The plan was also improved to include bulk call testing using SIPp.

The project documentation will use textual, graphical, and video media.

The current status of the project is:
  1. boxes 1 and 2 are shipped and working on site
  2. boxes 3-6
    1. are built and showing signs of long term stability
    2. analog lines are working
    3. require daisy-chain and Y configuration building and testing
    4. require bulk call testing
    5. require shipping to site
  3. box 6 requires power LED repair
  4. boxes 3-5 require CPU fan change out
  5. 2U patch panel acquisition and preparation
  6. video capture and edit of setup, operation, installation, configuration
  7. .pdf documentation
