> Dear Mr. ss7box,
>
> Can you give me some insight into what these messages are telling me:
>
> Nov 9 15:12:45 tele2ss7 sangoma_isupd[21182]:
> W:sb_cmm.c:cmm_quarantine_ckt:CQ_WAIT_TIMER:go back to
> CQ_WAIT_SG:span/chan/event_id/caller follow:10:22:1019:8
> Nov 9 15:12:45 tele2ss7 sangoma_isupd[21182]:
> W:sb_cmm.c:cmm_quarantine_ckt:CQ_WAIT_TIMER:go back to
> CQ_WAIT_SG:span/chan/event_id/caller follow:10:19:1019:8
> Nov 9 15:12:45 tele2ss7 sangoma_isupd[21182]:
>
> Puzzled
Dear Puzzled,
My first guess is that the remote side is not responding to our reset circuit messages. If this goes on long enough the circuit will be quarantined and stay in the quarantined state until the remote side answers our resets.
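To see which circuits keep cycling back to CQ_WAIT_SG, it can help to tally the warnings per span and channel. The following is only a sketch: it assumes the wrapped syslog lines have been rejoined into one line each, and the function name `quarantine_retries` is made up for illustration. The field order (span, chan, event_id, caller) is taken from the label in the log message itself.

```python
import re
from collections import Counter

# Matches the warning text from the question, after the wrapped syslog
# lines have been rejoined into a single line per message.
QUARANTINE_RE = re.compile(
    r"cmm_quarantine_ckt:CQ_WAIT_TIMER:go back to CQ_WAIT_SG:"
    r"span/chan/event_id/caller follow:(\d+):(\d+):(\d+):(\d+)"
)


def quarantine_retries(lines):
    """Count CQ_WAIT_TIMER warnings per (span, chan) circuit."""
    counts = Counter()
    for line in lines:
        match = QUARANTINE_RE.search(line)
        if match:
            span, chan = int(match.group(1)), int(match.group(2))
            counts[(span, chan)] += 1
    return counts


# The two complete messages quoted in the question.
sample = [
    "Nov 9 15:12:45 tele2ss7 sangoma_isupd[21182]: W:sb_cmm.c:cmm_quarantine_ckt:"
    "CQ_WAIT_TIMER:go back to CQ_WAIT_SG:span/chan/event_id/caller follow:10:22:1019:8",
    "Nov 9 15:12:45 tele2ss7 sangoma_isupd[21182]: W:sb_cmm.c:cmm_quarantine_ckt:"
    "CQ_WAIT_TIMER:go back to CQ_WAIT_SG:span/chan/event_id/caller follow:10:19:1019:8",
]

for circuit, count in sorted(quarantine_retries(sample).items()):
    print(circuit, count)
```

A circuit whose count keeps climbing over time is one where the remote side is probably still not answering the resets.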
Sometimes this happens because the NI and PRIO values in MTP3 for ISUP on our side are not matched to those of the remote side. You can check this by using /ss7box/ss7box_cli --show msu on and looking in /ss7box/msu.log. Examine the inbound and outbound messages for a particular point code pair and ensure that the NI and PRIO values are the same.
If the values are the same, you'll most likely need to work with the remote telco to find out why they are not responding to the circuit reset. Whether the telco helps may depend as much on their ability to help as on their willingness.
If the values are different, make the necessary changes to the NI and PRIO columns for this trunk group in the ss7boost section (soon to be renamed the isupd section) of the configuration spreadsheet, and use the smgcfg tool to make a new sangoma_isup.conf. Apply the new conf and see if the problem is resolved.
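For readers comparing raw octets, NI and PRIO both live in the MTP3 service information octet (SIO): the low four bits are the service indicator, the next two are the priority (in ANSI networks; they are spare bits in ITU networks), and the top two are the network indicator. The sketch below assumes you have already picked the SIO octet out of msu.log by hand, since the log format is not shown here; the function names are illustrative only.

```python
def decode_sio(sio):
    """Split an MTP3 service information octet into (ni, prio, si)."""
    if not 0 <= sio <= 0xFF:
        raise ValueError("SIO must be a single octet")
    si = sio & 0x0F           # service indicator: 3 = SCCP, 5 = ISUP
    prio = (sio >> 4) & 0x03  # message priority (ANSI); spare bits in ITU
    ni = (sio >> 6) & 0x03    # network indicator
    return ni, prio, si


def same_ni_prio(outbound_sio, inbound_sio):
    """True when both directions carry the same NI and PRIO values."""
    ni_out, prio_out, _ = decode_sio(outbound_sio)
    ni_in, prio_in, _ = decode_sio(inbound_sio)
    return (ni_out, prio_out) == (ni_in, prio_in)


# Example octets chosen for illustration, not taken from a real trace.
print(decode_sio(0x85))
print(same_ni_prio(0x85, 0xB5))
```

If `same_ni_prio` is false for an ISUP message pair between the same two point codes, that points to the NI/PRIO mismatch described above.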
Monday, November 09, 2009
Friday, October 30, 2009
Six SMG Project Status
Since June I've been building a set of six servers to run SMG. A major goal of this project is to deliver 6 SMG servers fully configured and tested so they are ready to run out of the box. Two of the boxes have shipped and presumably are doing well. The four remaining boxes have been a challenge. Two Intel motherboards had to be replaced. One of the two was replaced twice, and the second replacement was to an ASUS board which is now in my warranty replacement inventory.
I specified Intel CPU, fans, and motherboards thinking that a single brand system must have been system tested. My doubts started upon assembling the first system and seeing the Intel fan attachment mechanism causing the motherboard to bow slightly. Then I started getting random kernel oops, panics, and freezes on the 5th box. I had to go through the Intel RMA process to get a replacement but not without a fight. At first, Intel rejected my claim and said that I damaged the board, and offered to sell a new board to me at a reduced cost. After some discussion Intel kindly reversed their position and replaced the board at no cost to me.
As I was waiting for the Intel RMA, I purchased a 7th board to move the project ahead. This board worked for a few days and then began failing with the same random kernel oops, panic, and freeze. Since this purchase was recent I was able to have the board exchanged at Tiger Direct without any problem. I built up box 5 using board 7. It wasn't long before it failed similarly. When I returned to Tiger Direct for yet another replacement I found that they had sold out of this board completely. I exchanged it for an ASUS board and had to buy DDR2 memory because the Intel board used DDR3 memory.
The Intel RMA board arrived and has been working well. The ASUS board is on the shelf because I didn't want to spend time working with an odd board when the project is very late already. I purchased a Thermaltake fan when I bought the ASUS board. This fan causes no board deformation. I will replace the Intel fans in boxes 3-5 with Thermaltake fans on the hunch that fans causing board deformation is the root cause of the board problems, and to simply eliminate the undesirable deformation.
Box 6 has a failed power LED. This is an annoyance. Had I quality checked all components as soon as I received them, I would have been able to get a complete box exchange at the store. My options now are to RMA the box with the manufacturer, fix the LED, or purchase a new box. I will attempt to fix the LED first.
Another unexpected problem became evident after building the first box. The analog cards were long and top-heavy and had a PCI-e1 connector. I estimated that these boards might fail during shipping because of stresses on the connector. I also observed the boards sagging under their own weight. To alleviate the stress and sagging during shipping and operation, I devised and constructed a support system made of Lexan and aluminum shown here.
Early tests showed noticeable heat build-up in the boxes. Investigation led us to understand that the three analog cards in each box dissipate a lot of heat continuously. All six boxes have case fans added to their specifications.
The original project plan called for placing two patch panels in a desktop rack. When the user received the first two boxes and one of the patch panel racks, he saw that having one patch panel per rack offered more flexibility. The project will now include three additional racks for a total of six desktop racks: three 4U and three 2U.
The project plan has been enhanced to include a four-box daisy-chain configuration and a 'Y' configuration. The plan was also improved to include bulk call testing using SIPp.
The project documentation will use textual, graphical, and video media.
The current status of the project is:
- boxes 1 and 2 are shipped and working on site
- boxes 3-6
  - are built and showing signs of long term stability
  - analog lines are working
  - require daisy-chain and Y configuration building and testing
  - require bulk call testing
  - require shipping to site
- box 6 requires power LED repair
- boxes 3-5 require CPU fan change out
- 2U patch panel acquisition and preparation
- video capture and edit of setup, operation, installation, configuration
- .pdf documentation
Thursday, October 29, 2009
sangoma_sccpd Passes Heartbeat Test
The sangoma_sccpd daemon is running and heartbeating with ss7boxd:
[root@ana3 ss7box]# ./xps
10978 -15 ss7boxd
11531 -15 sangoma_isupd
11536 -5 sangoma_mgd
11613 -15 sangoma_sccpd
9891 0 asterisk
It is sending a heartbeat to ss7box. The heartbeats from isupd and sccpd are recognized by ss7boxd, which sends a heartbeat ack to each entity. Heartbeats from isupd are ack'd to isupd, and heartbeats from sccpd are ack'd to sccpd. Here are some logs:
First we see the SS7 links are up:
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 0:link 1:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 0:link 2:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 1:link 1:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 1:link 2:msu oc 34:tot oc 162000:util 0
Then we see an inbound heartbeat from isupd (SI=5):
Oct 29 16:38:43 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
Next we see an inbound heartbeat from sccpd (SI=3):
Oct 29 16:38:46 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3
We then confirm that the heartbeat ack was sent from ss7boxd to sccpd:
Oct 29 16:38:46 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0
And finally, we see the process repeating continuously:
[root@ana3 ss7box]# Oct 29 16:38:49 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
Oct 29 16:38:51 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3
Oct 29 16:38:51 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0
Oct 29 16:38:54 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
Oct 29 16:38:56 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3
Oct 29 16:38:56 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0
Oct 29 16:38:59 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
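The spacing between heartbeats can be checked mechanically from logs like these. The sketch below groups the ss7boxd BEAT lines by SI and reports the gaps in seconds. The SI-to-daemon mapping comes from the log commentary above; the function name, the regular expression, and the default year are assumptions made for this example.

```python
import re
from collections import defaultdict
from datetime import datetime

# SI values as identified in the log commentary.
SI_NAMES = {3: "sccpd", 5: "isupd"}

# search() is used rather than match() so that a shell prompt printed in
# front of a log line does not stop it from being recognized.
BEAT_RE = re.compile(
    r"(\w{3}\s+\d+\s+\d\d:\d\d:\d\d) \S+ ss7boxd\[\d+\]: "
    r".*M3UA_MT_ASPSM_BEAT:si/index/msg-type:(\d+):\d+:\d+"
)


def heartbeat_intervals(lines, year=2009):
    """Return {si: [seconds between consecutive heartbeats]}."""
    times = defaultdict(list)
    for line in lines:
        match = BEAT_RE.search(line)
        if not match:
            continue
        # Syslog timestamps carry no year, so one is supplied explicitly.
        stamp = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        times[int(match.group(2))].append(stamp)
    return {
        si: [int((later - earlier).total_seconds())
             for earlier, later in zip(stamps, stamps[1:])]
        for si, stamps in times.items()
    }


prefix = "ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:"
sample = [
    f"Oct 29 16:38:43 {prefix}M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3",
    f"Oct 29 16:38:46 {prefix}M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3",
    f"[root@ana3 ss7box]# Oct 29 16:38:49 {prefix}M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3",
    f"Oct 29 16:38:51 {prefix}M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3",
]

for si, gaps in sorted(heartbeat_intervals(sample).items()):
    print(SI_NAMES.get(si, f"si={si}"), gaps)
```

A gap much longer than the others for one SI would suggest that daemon missed a beat.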
Tuesday, October 27, 2009
Twick or Tweet
Twitter has allowed me to return. I suppose I'll continue using it for microblogging the daily stuff.
sccpd and its sister isupd are the twin daemons that came from splitting ss7boost into two separate functions. This project has been crawling along since May of this year, but it's finally become a top priority. isupd is fully separated and operational, and is part of the most recent stable SMG release. sccpd will require a new revision of ss7boxd. Anybody wanting to try out the tip development releases of sccpd should expect some updates to ss7boxd along the way. We're planning on getting most of the ss7boxd changes completed soon so that disturbances at the MTP3 levels are minimized.
The four-box, 48-hour stability test is still running and will be complete in about 12 hours from now. So far, this is the longest run ever for all 4 boxes. Intel motherboards have been a challenge. Intel silicon is fine. My spare motherboard inventory has been converted to ASUS already.
Dead Bird
Twitter won't let me in. No more tweeted status updates for now. Guess I'll go back to old-fashioned blogging.
The four box, 48 hour stability test passed the 24 hour mark this morning.
I've got a Sony video camera for documenting SMG procedures. I downloaded a video file for the first time this morning and found that Windows Media Player would not play the sound. Then I downloaded VLC Media Player which just worked with the MPEG2-PS files produced by the camera. Now I'm looking at video editing tools starting with CyberLink PowerDirector. I don't want to spend much time with this, so the first tool that just works will be the winner. (Follow up. This was hard and not much fun, but I finally managed to produce a video. I think I know enough now to make a useful video.)
This afternoon I'll complete the lab setup for sccpd testing and hopefully complete the first sccpd tests.
Thursday, October 15, 2009
Normalcy Returns
It's been crazy busy since June. We delivered ss7box redundancy, 32 E1/T1 SMG, and some significant bug fixes. We're also working on a couple of unusual special projects that have resisted efforts to keep them on plan. Mix in several personal catastrophes requiring on-going time demands, and presto: crazy busy.
Thank you to everyone who had to wait longer than expected for support and for promises to be fulfilled.
The good news is that things are settling down. The SMG product is showing itself to be on solid ground and we are now committed and able to support releases without requiring users to jump up to using development tip releases. The special projects are headed toward completion and the long-promised sccpd project is back on track.
What lies ahead? More intuitive configuration methods for networks and nodes. A continuous high volume testing program with more variance and talk path testing. Increased focus on sccp and related applications. Exposure of protocol layers using APIs.
Tuesday, July 14, 2009
Thinking vs. Debugging
It happened again. I was working on redundant ss7box yesterday. When I launched the second ss7box there were lots of route audit log messages. Odd because ISUP was not running so there should not have been any route audits. How could I have screwed up so badly?
I got up and walked away. I did not return for over twelve hours. In that time I realized that I was not clear on how to configure redundant ss7box yet. Maybe the configuration was screwed up.
This morning I confirmed that the problem appeared as soon as the new redundant ss7box was started. I looked closely at the ss7box configuration and found that the same port number for sockets to the ISUP layer had been used on both ss7boxes. I had found a new configuration rule: the two ss7boxes must not share that port number.
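A rule like this is easy to check before anything is launched. The sketch below is purely illustrative: the dictionary layout, the `isup_port` key, the node names, and the port numbers are all invented for the example and are not the real ss7box configuration format.

```python
from collections import defaultdict


def find_shared_isup_ports(nodes):
    """Return {port: [node names]} for every port used by more than one node."""
    by_port = defaultdict(list)
    for name, cfg in nodes.items():
        by_port[cfg["isup_port"]].append(name)
    return {
        port: sorted(names)
        for port, names in by_port.items()
        if len(names) > 1
    }


# Hypothetical redundant pair that repeats the mistake described above.
pair = {
    "ss7box_a": {"isup_port": 5000},
    "ss7box_b": {"isup_port": 5000},
}

conflicts = find_shared_isup_ports(pair)
if conflicts:
    print("shared ISUP ports:", conflicts)
```

A check of this kind turns a rule discovered by debugging into something the configuration tooling could enforce.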
This find strengthened the thought I had yesterday that configuration definition is still too complicated and redundant. Getting the configuration correct is difficult for me, so it will be impossible for a normal user.
This experience demonstrates the importance of taking time to think as part of the debugging process. Setting breakpoints and adding print statements are important too. What's more, it's active debugging. It shows that you are doing something. On the other hand, thinking is passive. It's often done while you are doing something else. It looks like goofing off. It's not. It's probably the most powerful debugging technique that I have. I first heard about it in engineering school at the University of South Carolina from Dr. Pettus. I've found my own way to what he was talking about. It takes a while to get there.
Problems are usually not solved on a timeline or according to a deadline. If they are, the solutions can be forced and awkward. Taking time to think through a problem and letting the solution come naturally, according to how the brain doing the thinking works, is valuable. My brain does not arrive at solutions well under high stress. It won't find any solutions when there's a lack of stress. Finding a tolerable balance of stress is the key.
Friday, June 05, 2009
ss7box: New Features, New Lab
ss7box is getting a lot of attention lately. Three new features are being developed simultaneously: redundancy, sccp routing, and support for a fiber interface. To support this development, significant changes to the Xygnada lab are required. Figure 1 below shows the build plan.

Figure 1
The nodes ana3 and ana62 will host a mated pair of redundant ss7box acting as a single point code. ana3 will host Asterisk, SMG, ISUP, SCCP, and a CNAM application that is capable of being both a client and a server. ana19 will be a clustered ISUP node using SS7 services from ana3 and ana62 with a new twist: it will have a point code that differs from the redundant ss7box pair. It will also host Asterisk and SMG. ana17 will be a single node instance of Asterisk/SMG/ss7box. The dt node will be dedicated to SCCP and related applications with CNAM client/server being the lead-off application. This node will also serve as a developer workstation and regression test platform to support integration of a fiber interface into ss7box. In the middle, nodes ana60 and ana61 provide MTP3 transfer services like those found in STPs.
The lab configuration creates quite a few functional interactions and raises the overall lab complexity so that we can get more test coverage and carry out development in several areas simultaneously. Changing the lab is labor intensive and work in other areas stops as a result, so it's not done as often as needed. In this case, we could no longer put off the pain of changing the lab because progress had come to a halt.
The ana3, ana62, ana60, ana61, and ana17 nodes had been working together prior to the change. The dt node has been under construction for a while. It took a while to find the right version of openSUSE (10.2) to work with the fiber interface libraries. Then it took more time to figure out that its A102c interfaces are incompatible with the modern SMG/ss7box, so an upgrade to A102SH was required. The SS7 linkset between dt and ana60 was put into service today. What remains is to put the dt-ana61 linkset into service, create ana19, and reconfigure ana62 into an SMG/ss7box node from its current status as an SMG-only clustered node.