Friday, October 30, 2009

Six SMG Project Status

Since June I've been building a set of six servers to run SMG. A major goal of this project is to deliver six SMG servers fully configured and tested so they are ready to run out of the box. Two of the boxes have shipped and presumably are doing well. The four remaining boxes have been a challenge. Two Intel motherboards had to be replaced. One of the two was replaced twice, and the second replacement was an ASUS board, which is now in my warranty replacement inventory.

I specified Intel CPU, fans, and motherboards thinking that a single brand system must have been system tested. My doubts started upon assembling the first system and seeing the Intel fan attachment mechanism causing the motherboard to bow slightly. Then I started getting random kernel oops, panics, and freezes on the 5th box. I had to go through the Intel RMA process to get a replacement but not without a fight. At first, Intel rejected my claim and said that I damaged the board, and offered to sell a new board to me at a reduced cost. After some discussion Intel kindly reversed their position and replaced the board at no cost to me.

As I was waiting for the Intel RMA, I purchased a 7th board to move the project ahead. This board worked for a few days and then began failing with the same random kernel oops, panic, and freeze. Since this purchase was recent I was able to have the board exchanged at Tiger Direct without any problem. I built up box 5 using board 7. It wasn't long before it failed similarly. When I returned to Tiger Direct for yet another replacement I found that they had sold out of this board completely. I exchanged it for an ASUS board and had to buy DDR2 memory because the Intel board used DDR3 memory.

The Intel RMA board arrived and has been working well. The ASUS board is on the shelf because I didn't want to spend time working with an odd board when the project is very late already. I purchased a Thermaltake fan when I bought the ASUS board. This fan causes no board deformation. I will replace the Intel fans in boxes 3-5 with Thermaltake fans on the hunch that fans causing board deformation is the root cause of the board problems, and to simply eliminate the undesirable deformation.

Box 6 has a failed power LED. This is an annoyance. Had I quality checked all components as soon as I received them, I would have been able to get a complete box exchange at the store. My options now are to RMA the box with the manufacturer, fix the LED, or purchase a new box. I will attempt to fix the LED first.

Another unexpected problem became evident after building the first box. The analog cards were long and top heavy and had a PCI-e x1 connector. I estimated that these boards might fail during shipping because of stresses on the connector. I also observed the boards sagging under their own weight. To alleviate the stress and sagging during shipping and operation, I devised and constructed a support system made of Lexan and aluminum shown here.

Early tests showed noticeable heat build-up in the boxes. Investigation led us to understand that the three analog cards in each box dissipate a lot of heat continuously. Case fans have been added to the specification for all six boxes.

The original project plan called for placing two patch panels in a desktop rack. When the user received the first two boxes and one of the patch panel racks, he saw that having one patch panel per rack offered more flexibility.  The project will now include three additional racks for a total of six desktop racks: three 4U and three 2U.

The project plan has been enhanced to include a four-box daisy-chain configuration and a 'Y' configuration. The plan was also improved to include bulk call testing using SIPp.

The project documentation will use textual, graphical, and video media.

The current status of the project is:
  1. boxes 1 and 2 are shipped and working on site
  2. boxes 3-6
    1. are built and showing signs of long term stability
    2. analog lines are working
    3. require daisy-chain and Y configuration building and testing
    4. require bulk call testing
    5. require shipping to site
  3. box 6 requires power LED repair
  4. boxes 3-5 require CPU fan change out
  5. 2U patch panel acquisition and preparation
  6. video capture and edit of setup, operation, installation, configuration
  7. .pdf documentation

Thursday, October 29, 2009

sangoma_sccpd Passes Heartbeat Test

The sangoma_sccpd daemon is running and heartbeating with ss7boxd:

[root@ana3 ss7box]# ./xps
10978 -15 ss7boxd
11531 -15 sangoma_isupd
11536 -5 sangoma_mgd
11613 -15 sangoma_sccpd
9891 0 asterisk

It is sending a heartbeat to ss7boxd. The heartbeats from isupd and sccpd are recognized by ss7boxd, which sends a heartbeat ack to each entity. Heartbeats from isupd are ack'd to isupd, and heartbeats from sccpd are ack'd to sccpd. Here are some logs:

First we see the SS7 links are up:

Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 0:link 1:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 0:link 2:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 1:link 1:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 1:link 2:msu oc 34:tot oc 162000:util 0
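
For anyone who wants to watch these reports from a script, here is a minimal Python sketch that splits one link utilization line into named integer fields. The function name `parse_link_util` is hypothetical, and the field meanings are my reading of the log rather than documented format (`ls` looks like the linkset number and `oc` like an octet count), so the sketch keeps the names exactly as they appear in the line.

```python
def parse_link_util(line):
    """Split one 'R:link util:...' syslog line into a dict of integer fields.

    Assumption: every field after 'R:link util' has the form '<name> <integer>'.
    """
    payload = line.split("]: ", 1)[1]          # drop the syslog prefix
    parts = payload.split(":")
    if parts[:2] != ["R", "link util"]:
        raise ValueError("not a link utilization report")
    fields = {}
    for part in parts[2:]:
        name, value = part.rsplit(" ", 1)      # "tot oc 162000" -> ("tot oc", "162000")
        fields[name] = int(value)
    return fields


SAMPLE_LINE = (
    "Oct 29 16:38:41 ana3 ss7boxd[10978]: "
    "R:link util:ls 0:link 1:msu oc 34:tot oc 162000:util 0"
)

if __name__ == "__main__":
    print(parse_link_util(SAMPLE_LINE))
```

A report with `util 0` on all four links, as above, is what an idle lab setup would be expected to show.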

Then we see an inbound heartbeat from isupd (SI=5):

Oct 29 16:38:43 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3

Next we see an inbound heartbeat from sccpd (SI=3):

Oct 29 16:38:46 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3

We then confirm that the heartbeat ack was sent from ss7boxd to sccpd:

Oct 29 16:38:46 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0

And finally, we see the process repeating continuously:

Oct 29 16:38:49 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
Oct 29 16:38:51 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3
Oct 29 16:38:51 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0
Oct 29 16:38:54 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
Oct 29 16:38:56 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3
Oct 29 16:38:56 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0
Oct 29 16:38:59 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
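
The repeating pattern can also be checked mechanically. The following is a rough Python sketch, not part of SMG, that pulls the M3UA_MT_ASPSM_BEAT lines out of a log and reports the gap in seconds between consecutive heartbeats for each SI (5 for isupd, 3 for sccpd, as noted above). The function name `beat_intervals` and the `year` parameter are assumptions of mine; syslog timestamps carry no year, so one has to be supplied.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the ss7boxd heartbeat lines shown in this post.
BEAT = re.compile(
    r"(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ ss7boxd\[\d+\]: "
    r".*M3UA_MT_ASPSM_BEAT:si/index/msg-type:(?P<si>\d+):\d+:\d+"
)


def beat_intervals(lines, year=2009):
    """Return {si: [seconds between consecutive heartbeats from that si]}."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        m = BEAT.search(line)
        if not m:
            continue                      # e.g. the sangoma_sccpd ack lines
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        si = int(m.group("si"))
        if si in last_seen:
            gaps[si].append(int((ts - last_seen[si]).total_seconds()))
        last_seen[si] = ts
    return dict(gaps)


PREFIX = "ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:"
SAMPLE = [
    f"Oct 29 16:38:49 {PREFIX}M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3",
    f"Oct 29 16:38:51 {PREFIX}M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3",
    "Oct 29 16:38:51 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0",
    f"Oct 29 16:38:54 {PREFIX}M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3",
    f"Oct 29 16:38:56 {PREFIX}M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3",
    f"Oct 29 16:38:59 {PREFIX}M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3",
]

if __name__ == "__main__":
    print(beat_intervals(SAMPLE))
```

On the excerpt above, both daemons show a steady 5 second gap between heartbeats; a missing or irregular gap would be the thing to look for in a longer run.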

Tuesday, October 27, 2009

Twick or Tweet

Twitter has allowed me to return. I suppose I'll continue using it for microblogging the daily stuff.

sccpd and its sister isupd are the twin daemons that came from splitting ss7boost into two separate functions. This project has been crawling along since May of this year, but it's finally become a top priority. isupd is fully separated, operational, and part of the most recent stable SMG release. sccpd will require a new revision of ss7boxd. Anybody wanting to try out the tip development releases of sccpd should expect some updates to ss7boxd along the way. We're planning on getting most of the ss7boxd changes completed soon so that disturbances at the MTP3 level are minimized.

The four-box, 48-hour stability test is still running and will complete in about 12 hours. So far, this is the longest run ever for all four boxes. Intel motherboards have been a challenge. Intel silicon is fine. My spare motherboard inventory has been converted to ASUS already.

Dead Bird

Twitter won't let me in. No more tweeted status updates for now. Guess I'll go back to old-fashioned blogging.

The four-box, 48-hour stability test passed the 24-hour mark this morning.

I've got a Sony video camera for documenting SMG procedures. I downloaded a video file for the first time this morning and found that Windows Media Player would not play the sound. Then I downloaded VLC Media Player, which just worked with the MPEG2-PS files produced by the camera. Now I'm looking at video editing tools, starting with CyberLink PowerDirector. I don't want to spend much time with this, so the first tool that just works will be the winner. (Follow-up: this was hard and not much fun, but I finally managed to produce a video. I think I know enough now to make a useful video.)

This afternoon I'll complete the lab setup for sccpd testing and hopefully complete the first sccpd tests.

Thursday, October 15, 2009

Normalcy Returns

It's been crazy busy since June. We delivered ss7box redundancy, 32 E1/T1 SMG, and some significant bug fixes. We're also working on a couple of unusual special projects that have resisted efforts to keep them on plan. Mix in several personal catastrophes requiring on-going time demands, and presto: crazy busy.

Thank you to everyone who had to wait longer than expected for support and for promises to be fulfilled.

The good news is that things are settling down. The SMG product is showing itself to be on solid ground and we are now committed and able to support releases without requiring users to jump up to using development tip releases. The special projects are headed toward completion and the long-promised sccpd project is back on track.

What lies ahead? More intuitive configuration methods for networks and nodes. A continuous high volume testing program with more variance and talk path testing. Increased focus on sccp and related applications. Exposure of protocol layers using APIs.