ss7box

Sunday, September 23, 2012

Xygnada Technology Dissolving

Time to close the doors on Xygnada Technology, Inc. and ss7box. Interest in SS7 has declined significantly, or the price/performance of alternatives has surpassed that of ss7box. It's nearly impossible to support ss7box now that the driver is so far out of step with recent Linux releases. Being loyal to a single source of network interface hardware was a serious mistake. At this point in history there is little incentive to adapt ss7box to another network interface card.

Open sourcing the ss7box software is fraught with uncertainty about how past partners would react. None of these partners has responded to requests to discuss releasing the software. Well played, partners, well played. A clean-room rewrite was considered, but the effort seemed far out of balance with the reward.

Thank you and auf wiedersehen.
Mike

Saturday, March 31, 2012

Three Urgent Problems in isupd

Problem 1:

Mar 26 02:21:03 v3 sangoma_isupd[15421]: F:sb_cmm.c:get_new_overall_rte_status:illegal status, should be DAVA:tg/sg0-stat/sg1-stat/overall-stat:0:2:1:1

We are sure this one is fixed in the isupd 2.6.1 patch 8.

Patch 8 (2012-03-30 Fr)
* when heartbeat is lost to SG in simplex or both SG in duplex, M3UA route
is set to DUNA; patch also sets individual SG route status to DUNA which
prevents failure of a status check elsewhere in the code

Problem 2:

*** glibc detected *** /usr/local/ss7box/sangoma_isupd: double free or corruption (out): 0x08dd90a8 ***
======= Backtrace: =========
/lib/libc.so.6[0x83a6c5]
/lib/libc.so.6(cfree+0x59)[0x83ab09]
/usr/local/ss7box/sangoma_isupd[0x804d7e1]
/usr/local/ss7box/sangoma_isupd[0x804992d]
/lib/libc.so.6(__libc_start_main+0xdc)[0x7e6e9c]
/usr/local/ss7box/sangoma_isupd[0x80494b1]

We searched the code for a while, but the threads were long, so we decided to add a usage semaphore to each timer. It lets us detect a double free of timer memory and dump information to the log to help us track down the problem. The timer semaphore is in isupd 2.6.1 patch 9.

Patch 9 (2012-03-31 Sa)
* added a semaphore to timers to test for double freeing of memory for timers
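The idea behind patch 9, tagging each timer with a usage marker so a second release can be caught and logged, can be sketched as follows. This is a hypothetical illustration, not isupd's code: the struct, the magic values, and the fixed timer pool are assumptions, and the marker is a plain field rather than a POSIX semaphore. Keeping timers in a static pool means a released timer's memory stays valid, so checking its marker on a second release is well-defined (inspecting heap memory after free() would not be).

```c
#include <stdio.h>

/* Hypothetical markers for a timer slot; not isupd's actual values. */
#define TIMER_IN_USE    0x54494D52u   /* "TIMR" */
#define TIMER_FREE      0x46524545u   /* "FREE" */
#define TIMER_POOL_SIZE 64

struct sketch_timer {
    unsigned int usage;       /* TIMER_IN_USE or TIMER_FREE */
    unsigned int span;        /* context to log if misuse is detected */
    unsigned int chan;
    const char *last_owner;   /* caller that last released this timer */
};

/* Timers live in a static pool, so a released timer's memory stays
   valid and reading its marker on a second release is well-defined. */
static struct sketch_timer timer_pool[TIMER_POOL_SIZE];
static int pool_ready;

static void pool_init(void)
{
    for (int i = 0; i < TIMER_POOL_SIZE; i++) {
        timer_pool[i].usage = TIMER_FREE;
        timer_pool[i].last_owner = "none";
    }
    pool_ready = 1;
}

/* Take a free timer from the pool; NULL if the pool is exhausted. */
struct sketch_timer *timer_alloc(unsigned int span, unsigned int chan)
{
    if (!pool_ready)
        pool_init();
    for (int i = 0; i < TIMER_POOL_SIZE; i++) {
        if (timer_pool[i].usage == TIMER_FREE) {
            timer_pool[i].usage = TIMER_IN_USE;
            timer_pool[i].span = span;
            timer_pool[i].chan = chan;
            return &timer_pool[i];
        }
    }
    return NULL;
}

/* Release a timer. A second release of the same timer is logged with
   both callers' names and rejected with -1 instead of corrupting memory.
   caller must point to a string that outlives the timer (a literal). */
int timer_release(struct sketch_timer *t, const char *caller)
{
    if (t->usage != TIMER_IN_USE) {
        fprintf(stderr,
                "double free of timer span/chan %u:%u by %s, "
                "last released by %s\n",
                t->span, t->chan, caller, t->last_owner);
        return -1;
    }
    t->usage = TIMER_FREE;
    t->last_owner = caller;
    return 0;
}
```

In a threaded daemon the check-and-clear in timer_release() would also need to be protected by a mutex, otherwise two threads could both see TIMER_IN_USE and release the same timer.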

Problem 3:

CQMs that cross T1 span boundaries cause improper responses, which in turn cause loss of circuits. A restart of isupd is required to recover the lost circuits. The CQM arrives nightly at several locations. We are devising a multi-step solution. The first step is a work-around that prevents circuit loss as quickly as possible: a response built from the incoming CQM that always reports the indicated circuits as being in working order.
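The first-step work-around can be sketched as follows, assuming a simple CIC numbering in which each T1 span carries 24 consecutive CICs starting at a configurable base, and assuming the CQM range is coded the way ISUP range fields usually are (number of circuits minus one). The reply struct and the CKT_WORKING state are hypothetical placeholders, not the ISUP circuit state encoding or isupd's code. The point of the sketch is that span and channel are recomputed for every circuit, so a range that starts on one span and ends on the next is answered in full.

```c
#include <stddef.h>

#define CHANS_PER_T1 24

/* Hypothetical per-circuit answer. A real reply encodes a circuit
   state indicator octet per circuit, which is not modelled here. */
enum sketch_ckt_state { CKT_WORKING = 1 };

struct sketch_ckt_reply {
    unsigned int cic;
    unsigned int span;               /* 0-based T1 span */
    unsigned int chan;               /* 1..24 within that span */
    enum sketch_ckt_state state;
};

/* Expand a CQM (first CIC plus range = circuit count - 1) into one
   reply per circuit. cic_base is the CIC assumed to sit on span 0,
   channel 1. Returns the number of replies written, or 0 if out[] is
   too small or the range starts below cic_base. */
size_t cqm_work_around(unsigned int first_cic, unsigned int range,
                       unsigned int cic_base,
                       struct sketch_ckt_reply *out, size_t out_len)
{
    size_t count = (size_t)range + 1;

    if (first_cic < cic_base || count > out_len)
        return 0;
    for (size_t i = 0; i < count; i++) {
        unsigned int cic = first_cic + (unsigned int)i;
        unsigned int offset = cic - cic_base;

        out[i].cic = cic;
        out[i].span = offset / CHANS_PER_T1;   /* may change mid-range */
        out[i].chan = offset % CHANS_PER_T1 + 1;
        out[i].state = CKT_WORKING;            /* always report working */
    }
    return count;
}
```

For example, a query for CICs 20 through 27 with cic_base 0 yields channels 21-24 on span 0 and channels 1-4 on span 1.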

Friday, March 23, 2012

How To Read the CDR Log File

Here is an example with fake phone numbers:

1001, in, 1, 2012, 03, 23, 16, 02, 22, 1332543742, 499660, 0, 0, 15, 0, , 0, , 0
128, out, 1, 2012, 03, 23, 16, 02, 22, 1332543742, 499692, 0, 0, 15, 10, 1112223333, 10, 4445556666, 0
129, in, 1, 1332543742, 758203, 0, 0, 15
1006, out, 1, 1332543742, 758220, 0, 0, 15
80, unrecognized
1044, unrecognized
132, in, 1, 1332543742, 758427, 0, 0, 15
1009, out, 1, 1332543742, 758441, 0, 0, 15
1012, in, 1, 1332543789, 456939, 0, 0, 15, 16
133, out, 1, 1332543789, 456956, 0, 0, 15
134, in, 1, 1332543789, 464574, 0, 0, 15
1016, out, 1, 1332543789, 464594, 0, 0, 15

Note the 2 unrecognized codes - will have to figure this out.

The code is the first number on each line: three-digit codes are MGD messages and four-digit codes are SS7 messages. The layouts are as follows:

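Call start events (IAM 1001, callstart 128). The date string p_ds itself contains commas; in the example above it appears as six fields: year, month, day, hour, minute, second.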

sprintf (s_cdr, "%u, %s, %u, %s, %lu, %lu, %u, %u, %u, %u, %s, %u, %s, %u\n",
         p_cdr->code,
         p_cdr->msg_direction ? "in" : "out",
         p_cdr->sysid,
         p_ds,
         p_cdr->timestamp.tv_sec,
         p_cdr->timestamp.tv_usec,
         p_cdr->call_setup_id,
         p_cdr->span,
         p_cdr->chan,
         p_cdr->called_number_dig_count,
         p_cdr->called_number_digits,
         p_cdr->calling_number_dig_count,
         p_cdr->calling_number_digits,
         p_cdr->calling_number_presentation_indicator);

Call stop events (REL 1012):

sprintf (s_cdr, "%u, %s, %u, %lu, %lu, %u, %u, %u, %u\n",
         p_cdr->code,
         p_cdr->msg_direction ? "in" : "out",
         p_cdr->sysid,
         p_cdr->timestamp.tv_sec,
         p_cdr->timestamp.tv_usec,
         p_cdr->call_setup_id,
         p_cdr->span,
         p_cdr->chan,
         p_cdr->release_cause);

Simple events - most of the entries (ACM 1006, ANS 1009, RLC 1016):

sprintf (s_cdr, "%u, %s, %u, %lu, %lu, %u, %u, %u\n",
         p_cdr->code,
         p_cdr->msg_direction ? "in" : "out",
         p_cdr->sysid,
         p_cdr->timestamp.tv_sec,
         p_cdr->timestamp.tv_usec,
         p_cdr->call_setup_id,
         p_cdr->span,
         p_cdr->chan);
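Because the three layouts differ in field count (8 for simple events, 9 for call stop, and 19 for call start once the six date fields are counted), a reader program can tell them apart by splitting each line on commas. A minimal sketch, assuming one record per line; the struct and function names are hypothetical. Empty fields such as the ", ," digit strings in an IAM record must be kept, which is why the split is hand-rolled instead of using strtok().

```c
#include <stdlib.h>
#include <string.h>

#define CDR_MAX_FIELDS 24
#define CDR_FIELD_LEN  32

enum cdr_kind { CDR_UNKNOWN, CDR_SIMPLE, CDR_STOP, CDR_START };

struct cdr_fields {
    int n;                                   /* number of fields found */
    char f[CDR_MAX_FIELDS][CDR_FIELD_LEN];   /* trimmed field text */
};

/* Split on commas, keeping empty fields and trimming blanks/newline. */
static void cdr_split(const char *line, struct cdr_fields *out)
{
    const char *p = line;

    out->n = 0;
    while (out->n < CDR_MAX_FIELDS) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        while (len > 0 && (*p == ' ' || *p == '\t')) {   /* leading */
            p++;
            len--;
        }
        while (len > 0 && strchr(" \t\r\n", p[len - 1]) != NULL)
            len--;                                        /* trailing */
        if (len >= CDR_FIELD_LEN)
            len = CDR_FIELD_LEN - 1;                      /* truncate */
        memcpy(out->f[out->n], p, len);
        out->f[out->n][len] = '\0';
        out->n++;
        if (end == NULL)
            break;
        p = end + 1;
    }
}

/* Classify a record by how many fields the three sprintf layouts emit. */
enum cdr_kind cdr_classify(const char *line, struct cdr_fields *fields)
{
    cdr_split(line, fields);
    switch (fields->n) {
    case 8:  return CDR_SIMPLE;   /* ACM, ANS, RLC, ... */
    case 9:  return CDR_STOP;     /* REL: last field is release cause */
    case 19: return CDR_START;    /* IAM / callstart */
    default: return CDR_UNKNOWN;  /* e.g. "80, unrecognized" */
    }
}

/* Four-digit codes are SS7 messages; three-digit codes come from MGD. */
int cdr_is_ss7_code(const struct cdr_fields *fields)
{
    return fields->n > 0 && strlen(fields->f[0]) == 4;
}

/* Numeric value of field i, or 0 if the field does not exist. */
unsigned long cdr_ulong(const struct cdr_fields *fields, int i)
{
    return (i >= 0 && i < fields->n) ? strtoul(fields->f[i], NULL, 10) : 0;
}
```

With this, the release cause of a REL record is cdr_ulong(&f, 8), and the span and channel of any record are fields 6 and 7 for simple and stop events.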

Thursday, March 15, 2012

Redundant ss7box Refinements

A redundant ss7box is being constructed in the lab once again, with a new twist: simulating an IP network with long and diverse links. We need to simulate the loss of an IP link between a remote Asterisk box and an ss7box, something that is more likely in a network spanning a very large region. Using two distinctly different IP carriers for the IP connections to each ss7box ensures link diversity.

In this new configuration we'll put a small IP switch between the two mated ss7boxes and an Asterisk box. To test IP link loss we'll pull the IP cable between one of the ss7boxes and the IP switch. The ss7box that lost its IP link should then redirect traffic meant for that link over the crosslink to its mate ss7box, and the Asterisk box, having lost its path to that ss7box, should remap SLS to use its remaining in-service IP link to the other ss7box. There will be a small window during the transition where signals could get lost. Calls with lost signals will time out and be cleared, and the call parties will experience abrupt call termination. Loss of an IP link is not detected as promptly as loss of an SS7 link; IP is weak in this area. We'll help the situation by using a better ping-pong protocol on the IP links to indicate link loss.
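A ping-pong check of the kind described above might count consecutive unanswered pings and declare the IP link lost after a threshold. This is a hypothetical sketch: the threshold, the sequence-number echo, and the function names are assumptions, not the protocol ss7box will use. Detection time is roughly PONG_MISS_LIMIT times the ping interval, so the threshold trades detection speed against false alarms on a lossy long-haul link.

```c
#include <stdbool.h>

/* Hypothetical tuning: the link is declared lost after this many
   consecutive pings go unanswered. */
#define PONG_MISS_LIMIT 3

struct pingpong {
    unsigned int seq_sent;    /* sequence number of the latest ping */
    unsigned int seq_acked;   /* latest sequence echoed back in a pong */
    unsigned int misses;      /* consecutive unanswered pings */
    bool link_up;
};

void pp_init(struct pingpong *pp)
{
    pp->seq_sent = 0;
    pp->seq_acked = 0;
    pp->misses = 0;
    pp->link_up = true;
}

/* Call once per ping interval, just before sending the next ping.
   Returns the sequence number to carry in the new ping. */
unsigned int pp_tick(struct pingpong *pp)
{
    if (pp->seq_sent != pp->seq_acked) {      /* previous ping unanswered */
        if (++pp->misses >= PONG_MISS_LIMIT)
            pp->link_up = false;              /* start failover here */
    }
    return ++pp->seq_sent;
}

/* Call when a pong arrives. Only an echo of the latest ping counts, so
   stale pongs delayed in the network cannot mask a dead link. */
void pp_pong(struct pingpong *pp, unsigned int seq)
{
    if (seq == pp->seq_sent) {
        pp->seq_acked = seq;
        pp->misses = 0;
        pp->link_up = true;                   /* link restored */
    }
}
```

Raising the link again on the first fresh pong is the simplest restore rule; a real implementation might require several good pongs in a row before failing traffic back.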

The functionality described above does not exist yet, so the plan is to set up the test network; make test calls before any improvements and confirm that half of all calls fail to complete when an IP link is lost; make the appropriate code changes; and run the call tests again under IP link loss and restore conditions. We'll release this functionality in a major revision release, probably 2.7.

Here's a scan of the drawing we are using to build the lab network:

We use Google Docs spreadsheets for the configuration.

Progress:

The ss7 link between 1002 and the new mated 159 ss7box at 192.168.1.62 is up.
The SIP client on the Asterisk box on the 159 cluster has to be rebuilt. The laptop it was running on lost its HDD. The HDD was replaced and Mint 12 was installed last week. Was using Blink and XP previously. Will need to find and test a suitable SIP client that works on Mint 12 and a Dell Vostro 1000.
- Linphone looks like the first candidate, and it works: tested on the 1003 node.
- To do: write up the Linphone and Asterisk configuration.

This is the link report from the 1002 ss7boxd that shows 3 ss7 links up:

Mar 15 11:34:27 ana156 ss7boxd[7056]: R:link util:ls 0:link 0:msu oc 26:tot oc 161840:util 0
Mar 15 11:34:27 ana156 ss7boxd[7056]: R:link util:ls 1:link 0:msu oc 34:tot oc 161840:util 0
Mar 15 11:34:27 ana156 ss7boxd[7056]: R:link util:ls 1:link 1:msu oc 34:tot oc 161920:util 0
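To check which links are reporting without eyeballing syslog, the R:link util lines can be pulled apart with sscanf(). A minimal sketch, assuming the exact field layout shown above; the struct and function names are hypothetical, and the counters are returned as logged without interpreting what msu oc and tot oc measure.

```c
#include <stdio.h>
#include <string.h>

struct link_util {
    unsigned int linkset;     /* "ls" */
    unsigned int link;        /* "link" within the linkset */
    unsigned long msu_oc;     /* "msu oc" value as logged */
    unsigned long tot_oc;     /* "tot oc" value as logged */
    unsigned int util;        /* "util" value as logged */
};

/* Returns 1 and fills *lu if the syslog line carries an ss7boxd link
   utilization report, otherwise 0. */
int parse_link_util(const char *line, struct link_util *lu)
{
    const char *p = strstr(line, "R:link util:");

    if (p == NULL)
        return 0;
    return sscanf(p, "R:link util:ls %u:link %u:msu oc %lu:tot oc %lu:util %u",
                  &lu->linkset, &lu->link, &lu->msu_oc, &lu->tot_oc,
                  &lu->util) == 5;
}
```

Feeding each syslog line through parse_link_util() and collecting the (linkset, link) pairs gives the list of links that ss7boxd is reporting on.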

Wednesday, March 14, 2012

ss7boxd 2.6.0.13 Released

Announcing a minor ss7boxd release to make clear that it requires the 0.9 revision of the ss7box.conf configuration file, which is created by the smgcfg09.py program. Some 2.6.0.12 ss7boxd releases used the 0.8 version of the config file, and later releases used the 0.9 revision. Sorry for the confusion.

You can download them from here:

http://www.ss7box.com/tmp/ss7boxd-2.6.0.13-ANSI
http://www.ss7box.com/tmp/ss7boxd-2.6.0.13-ITU

The difference between 0.8 and 0.9 revision conf files is described in the Change History inside the smgcfg09.py file. The only difference for ss7box.conf is the change to the revision number in the conf file. ss7boxd 2.6.0.13 is looking for rev 0.9 in the ss7box conf file.

Thursday, December 15, 2011

Wanpipe Install Problem Fixed

Got this problem. Fixed it. Don't think it's important. It took a lot of time....wasted time....to figure all of this out.

Compiling WANPIPE API Development Utilities ...Failed!

ERROR: Failed to compile WANPIPE API Tools !!!
Please contact support at Sangoma Technologies
email: techdesk@sangoma.com
Please include the file setup_drv_compile.log

Let's see if we can get some detail:

[root@ana64 api]# cd /usr/src/Sangoma/wanpipe/api
[root@ana64 api]# make
make -C tdm_api
make[1]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/tdm_api'
Ok.
make[1]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/tdm_api'
make -C legacy
make[1]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy'
make -C x25 all APIINC=/usr/include/wanpipe
make[2]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/x25'
Ok.
make[2]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/x25'
make -C chdlc all APIINC=/usr/include/wanpipe
make[2]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/chdlc'
cc -Wall -O2 -D__LINUX__ -D_DEBUG_=2 -D_GNUC_ -I../lib -I/usr/include/wanpipe -o chdlc_modem_cmd chdlc_modem_cmd.c ../lib/lib_api.c
chdlc_modem_cmd.c: In function 'handle_socket':
chdlc_modem_cmd.c:412: error: 'wp_api_hdr_t' has no member named 'error_flag'
make[2]: *** [chdlc_modem_cmd] Error 1
make[2]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/chdlc'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy'
make: *** [all] Error 2

The problem is in the legacy chdlc code, which we are not using. This is why the problem can most likely be ignored. Nevertheless, the fix follows. First, we look for a replacement for the offending "error_flag" field, which is not defined.

[root@ana64 api]# grep -r "wp_api_hdr_t\;" ../* | grep "\.h\:"
grep: warning: ../patches/kdrivers/include/linux: recursive directory loop
../patches/kdrivers/include/wanpipe_api_hdr.h:} wp_api_hdr_t;
grep: warning: ../patches/kdrivers/wanec/linux: recursive directory loop
[root@ana64 api]# vi ../patches/kdrivers/include/wanpipe_api_hdr.h
[root@ana64 api]# vim ../patches/kdrivers/include/wanpipe_api_hdr.h

This looks promising:

/* CHDLC Old backdward comptabile */

#define wp_api_rx_hdr_chdlc_error_flag wp_api_rx_hdr_error_flag

Let's apply a change:

[root@ana64 api]# cd /usr/src/Sangoma/wanpipe/api/legacy/chdlc/
[root@ana64 chdlc]#

Create a file called "patch" and fill it with the following:

--- chdlc_modem_cmd.c 2011-12-15 17:05:20.000000000 -0500
+++ chdlc_modem_cmd.c.chg 2011-12-15 17:16:06.000000000 -0500
@@ -409,7 +409,7 @@
 return;
 }
 
- switch (api_rx_el->api_rx_hdr.error_flag){
+ switch (api_rx_el->api_rx_hdr.wp_api_rx_hdr_error_flag){
 
 case 0:
 /* Rx packet is good */

Apply the patch:

[root@ana64 chdlc]# patch --ignore-whitespace < patch
patching file chdlc_modem_cmd.c

Now compile the api:

[root@ana64 chdlc]# cd /usr/src/Sangoma/wanpipe/api
[root@ana64 api]# make

The problem should be gone. There will be many warnings, depending on the gcc version you are using. As long as you don't see "error" in the output, the build should be fine.

Tuesday, December 13, 2011

Cluster Configuration Needs To Improve

Adding a cluster node to the lab this morning. The lab is currently working using older versions of ss7box/SMG. This configuration needs to remain intact. The new cluster being added will be the development tip alpha test site.

It takes a lot of coordinated data to make a cluster work because that's how the protocol works. The smgcfg tool attempted to simplify the task, and it did to some extent, but plenty of feedback says we can do better.

Using a spreadsheet helps to make things visual and colorful, but downloading .csv and running smgcfgXX against it is kludgy. Two tools and manually pushing files is not good. I'm getting reintroduced to the problem this morning.

A better approach would be a single tool that is aware of a group of nodes and a library of configurations. The user interface needs to be efficient for users who want to edit a source file with vi, for a command line on lean systems with no X11 stuff loaded, and for a command-line interface that can sit underneath a web interface. A diagrammatic interface showing nodes and connections, with click-to-query-or-modify, would be helpful. All of these interfaces should be usable interchangeably: if one person wants to use vi on a source file and another wants to use the CLI or web interface (not at the same time), it should be possible, because the CLI is a specialized source file editor and the web interface uses the CLI. Of course, using vi to edit the source file could screw things up if the format is disturbed; ideally the CLI will not have this problem. Furthermore, the CLI would have prompts like: add a link to a linkset, add a trunk to a trunk group, or a powerful add a node. These prompts would lead the user through collecting the information needed. Maybe the tool could generate a graphical representation from the input data as a precursor to accepting a graphical representation as input.

Here's something interesting. Whatever gets built, most of it is general purpose for all SS7 networks regardless of what equipment or protocol is being used. Distinctions about specific equipment like ss7boxd, isupd, and sccpd are made in the final steps where specific conf files are created and pushed or pulled from specific nodes. Sounds like an open source project.

Saturday, December 10, 2011

Lab Expansion Problem and Fix

The lab is expanding to support cluster configuration testing again. We use old versions of Dahdi, Asterisk and Linux because we only like change in our own code. So when we upgraded a Centos 5 system, we ran into the following problem:

CC [M] /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/card_bri.o
In file included from /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/xpd.h:31,
from /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/card_bri.c:29:
include/linux/device.h:407: error: expected identifier or ‘(’ before ‘const’
make[3]: *** [/usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/card_bri.o] Error 1
make[2]: *** [/usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp] Error 2
make[1]: *** [_module_/usr/src/dahdi-linux-2.2.0.1/drivers/dahdi] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-274.12.1.el5-PAE-i686'
make: *** [modules] Error 2

After trying to solve the problem as though something was missing, we had the insight that maybe something was being redefined. Centos backports lots of stuff into its 2.6.18 kernel, and at the same time Dahdi does some of its own backporting. We think we found duplicate backporting of the same item, dev_name(): the Dahdi xpp driver defines it as a macro for kernels older than 2.6.26, while the Centos kernel headers apparently already provide it. The backported item in the Dahdi package was eliminated and the problem went away. The patch is as follows:

--- /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/xdefs-orig.h 2011-12-10 11:41:12.000000000 -0500
+++ /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/xdefs.h 2011-12-10 11:23:25.000000000 -0500
@@ -139,7 +139,7 @@
 ssize_t name(struct device_driver *drv, char * buf)

 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,26)
-#define dev_name(dev) (dev)->bus_id
+//#define dev_name(dev) (dev)->bus_id
 #define dev_set_name(dev, format, ...) \
 snprintf((dev)->bus_id, BUS_ID_SIZE, format, ## __VA_ARGS__);
 #endif

Quite a relief. Now on to installing the media and signal gateway applications, drivers, and patches. We are creating a four-box cluster configuration with duplex signal gateways and two media gateways. We'll start with a two-box configuration composed of a signal gateway and a media gateway. We'll suspend system growth so that we can use this new system to establish the current functionality of the call detail recording system in isupd. After this assessment we'll plan to fix any deficiencies in the CDR system. Then we'll add another media gateway to the system. After that, we'll detail the steps needed to convert an in-service simplex signal system cluster to a duplex signal system cluster. Then we'll repeat the process at a live commercial 14-box operation.

Thursday, December 08, 2011

Clustering Improvements in ss7box

ss7box is an end node. It does not perform the transfer function. Let's make that clear.

ss7box clustering allows several CIC ranges to be supported by several call engines. ss7box and the call engines communicate over an IP network. The association of a CIC range to a call engine worked fine as long as a unique local/remote address:port tuple was used. It makes sense to reuse the same address tuple for different CIC ranges handled by the same call engine. Unfortunately, it didn't work. Even more unfortunately, the problem was flagged but the system was allowed to continue operating. It would have been better to halt ss7box and force the problem to be fixed.
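One way to let several CIC ranges share a single call engine address tuple is to have each range point at a shared endpoint record instead of owning its own socket, and to treat configuration conflicts as fatal. The sketch below is hypothetical: the struct names, the "addr:port" strings, and the overlap rule are assumptions, not ss7box's code. A caller would exit at start-up if cic_table_check() fails, halting instead of running on with a flagged problem.

```c
#include <stddef.h>

/* One call engine endpoint; several CIC ranges may point at it. */
struct endpoint {
    char local[32];    /* "addr:port", illustrative format */
    char remote[32];
};

struct cic_range {
    unsigned int first;
    unsigned int last;            /* inclusive */
    const struct endpoint *ep;    /* shared, not copied per range */
};

/* Returns 0 if every range is well formed and no CIC is claimed twice,
   -1 otherwise. Sharing an endpoint between ranges is allowed. */
int cic_table_check(const struct cic_range *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (r[i].first > r[i].last)
            return -1;
        for (size_t j = i + 1; j < n; j++)
            if (r[i].first <= r[j].last && r[j].first <= r[i].last)
                return -1;        /* overlapping ranges */
    }
    return 0;
}

/* Find the call engine endpoint that owns a CIC, or NULL if none. */
const struct endpoint *cic_lookup(const struct cic_range *r, size_t n,
                                  unsigned int cic)
{
    for (size_t i = 0; i < n; i++)
        if (cic >= r[i].first && cic <= r[i].last)
            return r[i].ep;
    return NULL;
}
```

Because ranges only hold a pointer, two ranges served by the same call engine naturally resolve to the same socket, which is the reuse case that previously failed.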

While fixing the CIC range association problem, some sloppy coding was identified and fixed in the socket creation tools too.

Wednesday, December 07, 2011

We're back!

Work on ss7box is resuming. We found a problem with clustering and circuit group messages. It's possible to receive a circuit group message for a range of CICs that spans multiple cluster nodes. Currently, the code assumes that a circuit group message will always be confined to a single node. The circuit group message handler must be redesigned to handle any possible range of CICs on any number of nodes.
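The first part of such a redesign is splitting an incoming group range into one piece per cluster node. A hypothetical sketch, assuming each node serves one contiguous, non-overlapping CIC range and that the group range is coded as circuit count minus one, as ISUP range fields usually are; node numbering and names are illustrative. Merging the nodes' answers back into a single response is not shown.

```c
#include <stddef.h>

struct node_range {
    unsigned int first, last;   /* inclusive CICs served by the node */
    int node_id;                /* hypothetical cluster node number */
};

struct piece {
    int node_id;
    unsigned int first;
    unsigned int range;         /* circuit count minus one */
};

/* Split a group message covering first..first+range into per-node
   pieces. Returns the number of pieces, or -1 if a CIC belongs to no
   node or out[] is too small. Assumes valid CICs (at most 14 bits), so
   first + range cannot overflow. */
int split_group(const struct node_range *nodes, size_t n_nodes,
                unsigned int first, unsigned int range,
                struct piece *out, size_t out_len)
{
    unsigned int last = first + range;
    unsigned int cic = first;
    size_t n_out = 0;

    while (cic <= last) {
        size_t i;

        for (i = 0; i < n_nodes; i++)
            if (cic >= nodes[i].first && cic <= nodes[i].last)
                break;
        if (i == n_nodes || n_out == out_len)
            return -1;
        /* This node's piece ends at its last CIC or the group's end. */
        unsigned int end = nodes[i].last < last ? nodes[i].last : last;
        out[n_out].node_id = nodes[i].node_id;
        out[n_out].first = cic;
        out[n_out].range = end - cic;
        n_out++;
        cic = end + 1;
    }
    return (int)n_out;
}
```

For nodes serving CICs 1-24, 25-48 and 49-72, a group message for CICs 20 through 50 splits into 20-24, 25-48 and 49-50.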

Monday, November 09, 2009

ISUP Circuits Stay in Quarantine State

> Dear Mr. ss7box,
>
> Can you give me some insight into what theses messages are telling me:
>
> Nov 9 15:12:45 tele2ss7 sangoma_isupd[21182]:
> W:sb_cmm.c:cmm_quarantine_ckt:CQ_WAIT_TIMER:go back to
> CQ_WAIT_SG:span/chan/event_id/caller follow:10:22:1019:8
> Nov 9 15:12:45 tele2ss7 sangoma_isupd[21182]:
> W:sb_cmm.c:cmm_quarantine_ckt:CQ_WAIT_TIMER:go back to
> CQ_WAIT_SG:span/chan/event_id/caller follow:10:19:1019:8
> Nov 9 15:12:45 tele2ss7 sangoma_isupd[21182]:
>
> Puzzled

Dear Puzzled,

My first guess is that the remote side is not responding to our reset circuit messages. If this goes on long enough the circuit will be quarantined and stay in the quarantined state until the remote side answers our resets.

Sometimes this happens because the NI and PRIO values in MTP3 for ISUP on our side are not matched to those of the remote side. You can check this by using /ss7box/ss7box_cli --show msu on and looking in /ss7box/msu.log. Examine the inbound and outbound messages for a particular point code pair and ensure that the NI and PRIO values are the same.
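The NI and PRIO values live in the MTP3 service information octet (SIO) that precedes the routing label of each MSU. A minimal decoder, assuming you can pick the raw SIO octet out of msu.log (the log format is not shown here): bits 1-4 are the service indicator (5 for ISUP, 3 for SCCP), bits 5-6 carry message priority in ANSI networks (spare in basic ITU), and bits 7-8 are the network indicator. The function names are illustrative.

```c
/* Fields of the MTP3 service information octet (SIO). */
struct sio_fields {
    unsigned int si;     /* service indicator: 3 = SCCP, 5 = ISUP */
    unsigned int prio;   /* message priority (ANSI); spare in basic ITU */
    unsigned int ni;     /* network indicator: 0 = international, 2 = national */
};

struct sio_fields sio_decode(unsigned char sio)
{
    struct sio_fields f;

    f.si = sio & 0x0F;             /* bits 1-4 */
    f.prio = (sio >> 4) & 0x03;    /* bits 5-6 */
    f.ni = (sio >> 6) & 0x03;      /* bits 7-8 */
    return f;
}

/* Nonzero when an inbound and an outbound SIO agree on NI and PRIO. */
int sio_ni_prio_match(unsigned char in, unsigned char out)
{
    struct sio_fields a = sio_decode(in);
    struct sio_fields b = sio_decode(out);

    return a.ni == b.ni && a.prio == b.prio;
}
```

For example, 0x85 decodes to ISUP (SI 5) on a national network (NI 2) with priority 0; an outbound 0x05 for the same point code pair would be a mismatch in NI.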

If the values are the same, then you'll most likely need to work with the remote telco to find out why they are not responding to the circuit reset. The remote telco may or may not be willing to help you, and its willingness may depend on its ability to help.

If the values are different, then make the necessary changes to the NI and PRIO columns for this trunk group in the ss7boost section (soon to be renamed the isupd section) of the configuration spreadsheet and use the smgcfg tool to make a new sangoma_isup.conf. Apply the new conf and see if the problem is resolved.

Friday, October 30, 2009

Six SMG Project Status

Since June I've been building a set of six servers to run SMG. A major goal of this project is to deliver 6 SMG servers fully configured and tested so they are ready to run out of the box. Two of the boxes have shipped and presumably are doing well. The four remaining boxes have been a challenge. Two Intel motherboards had to be replaced. One of the two was replaced twice, and the second replacement was to an ASUS board which is now in my warranty replacement inventory.

I specified Intel CPU, fans, and motherboards thinking that a single brand system must have been system tested. My doubts started upon assembling the first system and seeing the Intel fan attachment mechanism causing the motherboard to bow slightly. Then I started getting random kernel oops, panics, and freezes on the 5th box. I had to go through the Intel RMA process to get a replacement but not without a fight. At first, Intel rejected my claim and said that I damaged the board, and offered to sell a new board to me at a reduced cost. After some discussion Intel kindly reversed their position and replaced the board at no cost to me.

As I was waiting for the Intel RMA, I purchased a 7th board to move the project ahead. This board worked for a few days and then began failing with the same random kernel oops, panic, and freeze. Since this purchase was recent I was able to have the board exchanged at Tiger Direct without any problem. I built up box 5 using board 7. It wasn't long before it failed similarly. When I returned to Tiger Direct for yet another replacement I found that they had sold out of this board completely. I exchanged it for an ASUS board and had to buy DDR2 memory because the Intel board used DDR3 memory.

The Intel RMA board arrived and has been working well. The ASUS board is on the shelf because I didn't want to spend time working with an odd board when the project is very late already. I purchased a Thermaltake fan when I bought the ASUS board. This fan causes no board deformation. I will replace the Intel fans in boxes 3-5 with Thermaltake fans on the hunch that fans causing board deformation is the root cause of the board problems, and to simply eliminate the undesirable deformation.

Box 6 has a failed power LED. This is an annoyance. Had I quality checked all components as soon as I received them, I would have been able to get a complete box exchange at the store. My options now are to RMA the box with the manufacturer, fix the LED, or purchase a new box. I will attempt to fix the LED first.

Another unexpected problem became evident after building the first box. The analog cards were long and top heavy and had a PCIe x1 connector. I estimated that these boards might fail during shipping because of stresses on the connector. I also observed the boards sagging under their own weight. To alleviate the stress and sagging during shipping and operation, I devised and constructed a support system made of Lexan and aluminum shown here.

Early tests showed noticeable heat build-up in the boxes. Investigation led us to understand that the three analog cards in each box dissipate a lot of heat continuously. All six boxes have case fans added to their specifications.

The original project plan called for placing two patch panels in a desktop rack. When the user received the first two boxes and one of the patch panel racks, he saw that having one patch panel per rack offered more flexibility. The project will now include three additional racks for a total of six desktop racks: three 4U and three 2U.

The project plan has been enhanced to include a 4-box daisy-chain configuration and a 'Y' configuration. The plan was also improved to include bulk call testing using SIPp.

The project documentation will use textual, graphical, and video media.

The current status of the project is:

- boxes 1 and 2 are shipped and working on site
- boxes 3-6:
  - are built and showing signs of long-term stability
  - analog lines are working
  - require daisy-chain and Y configuration building and testing
  - require bulk call testing
  - require shipping to site
- box 6 requires power LED repair
- boxes 3-5 require CPU fan change-out
- 2U patch panel acquisition and preparation
- video capture and editing of setup, operation, installation, configuration
- .pdf documentation

Thursday, October 29, 2009

sangoma_sccpd Passes Heartbeat Test

The sangoma_sccpd daemon is running and heartbeating with ss7boxd:

[root@ana3 ss7box]# ./xps
10978 -15 ss7boxd
11531 -15 sangoma_isupd
11536 -5 sangoma_mgd
11613 -15 sangoma_sccpd
9891 0 asterisk

It sends heartbeats to ss7boxd. ss7boxd recognizes the heartbeats from isupd and sccpd and sends a heartbeat ack to each entity: heartbeats from isupd are acked to isupd, and heartbeats from sccpd are acked to sccpd. Here are some logs:
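The log lines below report msg-type 3 in the handler for M3UA ASP state maintenance (ASPSM) messages, which matches the BEAT message of RFC 4666; there, BEAT ACK is type 6 in the same class and echoes any Heartbeat Data unchanged. The sketch encodes those messages using the RFC's 8-octet common header. Whether ss7boxd's link to isupd and sccpd follows the RFC byte for byte is an assumption, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define M3UA_VERSION        1
#define M3UA_CLASS_ASPSM    3
#define M3UA_ASPSM_BEAT     3
#define M3UA_ASPSM_BEAT_ACK 6
#define M3UA_HDR_LEN        8

/* Write the common header; len is the whole message length in octets. */
static void m3ua_hdr(uint8_t *buf, uint8_t cls, uint8_t type, uint32_t len)
{
    buf[0] = M3UA_VERSION;
    buf[1] = 0;                       /* reserved */
    buf[2] = cls;
    buf[3] = type;
    buf[4] = (uint8_t)(len >> 24);    /* network byte order */
    buf[5] = (uint8_t)(len >> 16);
    buf[6] = (uint8_t)(len >> 8);
    buf[7] = (uint8_t)len;
}

/* Build a BEAT without a Heartbeat Data parameter; returns its length. */
size_t m3ua_build_beat(uint8_t *buf)
{
    m3ua_hdr(buf, M3UA_CLASS_ASPSM, M3UA_ASPSM_BEAT, M3UA_HDR_LEN);
    return M3UA_HDR_LEN;
}

/* Turn a received BEAT into a BEAT ACK in out[], echoing any data after
   the header. Returns the ACK length, or 0 if msg is not a valid BEAT. */
size_t m3ua_beat_to_ack(const uint8_t *msg, size_t len,
                        uint8_t *out, size_t out_len)
{
    if (len < M3UA_HDR_LEN || msg[0] != M3UA_VERSION ||
        msg[2] != M3UA_CLASS_ASPSM || msg[3] != M3UA_ASPSM_BEAT)
        return 0;
    uint32_t msg_len = ((uint32_t)msg[4] << 24) | ((uint32_t)msg[5] << 16) |
                       ((uint32_t)msg[6] << 8) | (uint32_t)msg[7];
    if (msg_len != len || len > out_len)
        return 0;
    memcpy(out, msg, len);
    out[3] = M3UA_ASPSM_BEAT_ACK;     /* only the message type changes */
    return len;
}
```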

First we see the SS7 links are up:

Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 0:link 1:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 0:link 2:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 1:link 1:msu oc 34:tot oc 162000:util 0
Oct 29 16:38:41 ana3 ss7boxd[10978]: R:link util:ls 1:link 2:msu oc 34:tot oc 162000:util 0

Then we see an inbound heartbeat from isupd (SI=5):

Oct 29 16:38:43 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3

Next we see an inbound heartbeat from sccpd (SI=3):

Oct 29 16:38:46 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3

We then confirm that the heartbeat ack was sent from ss7boxd to sccpd:

Oct 29 16:38:46 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0

And finally, we see the process repeating continuously:

[root@ana3 ss7box]# Oct 29 16:38:49 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
Oct 29 16:38:51 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3
Oct 29 16:38:51 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0
Oct 29 16:38:54 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3
Oct 29 16:38:56 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:3:0:3
Oct 29 16:38:56 ana3 sangoma_sccpd[11613]: I:core.c:analyse_sg_0_heartbeat_ack:MARK:0
Oct 29 16:38:59 ana3 ss7boxd[10978]: I:mtp3_l4_m3ua.c:handle_asp_state_msg:M3UA_MT_ASPSM_BEAT:si/index/msg-type:5:0:3

Tuesday, October 27, 2009

Twick or Tweet

Twitter has allowed me to return. I suppose I'll continue using it for microblogging the daily stuff.

sccpd and its sister isupd are the twin daemons that came from splitting ss7boost into two separate functions. This project has been crawling along since May of this year, but it has finally become a top priority. isupd is fully separated and operational, and is part of the most recent stable SMG release. sccpd will require a new revision of ss7boxd. Anybody wanting to try out the tip development releases of sccpd should expect some updates to ss7boxd along the way. We're planning to complete most of the ss7boxd changes soon so that disturbances at the MTP3 level are minimized.

The four-box, 48-hour stability test is still running and will be complete in about 12 hours. So far, this is the longest run ever for all four boxes. Intel motherboards have been a challenge; Intel silicon is fine. My spare motherboard inventory has already been converted to ASUS.

Dead Bird

Twitter won't let me in. No more tweeted status updates for now. Guess I'll go back to old-fashioned blogging.

The four box, 48 hour stability test passed the 24 hour mark this morning.

I've got a Sony video camera for documenting SMG procedures. I downloaded a video file for the first time this morning and found that Windows Media Player would not play the sound. Then I downloaded VLC Media Player which just worked with the MPEG2-PS files produced by the camera. Now I'm looking at video editing tools starting with CyberLink PowerDirector. I don't want to spend much time with this, so the first tool that just works will be the winner. (Follow up. This was hard and not much fun, but I finally managed to produce a video. I think I know enough now to make a useful video.)

This afternoon I'll complete the lab setup for sccpd testing and hopefully complete the first sccpd tests.

Thursday, October 15, 2009

Normalcy Returns

It's been crazy busy since June. We delivered ss7box redundancy, 32 E1/T1 SMG, and some significant bug fixes. We're also working on a couple of unusual special projects that have resisted efforts to keep them on plan. Mix in several personal catastrophes requiring on-going time demands, and presto: crazy busy.

Thank you to everyone that had to wait longer than expected for promises fulfilled and support.

The good news is that things are settling down. The SMG product is showing itself to be on solid ground and we are now committed and able to support releases without requiring users to jump up to using development tip releases. The special projects are headed toward completion and the long-promised sccpd project is back on track.

What lies ahead? More intuitive configuration methods for networks and nodes. A continuous high-volume testing program with more variance and talk path testing. Increased focus on SCCP and related applications. Exposure of protocol layers through APIs.

Tuesday, July 14, 2009

Thinking vs. Debugging

It happened again. I was working on redundant ss7box yesterday. When I launched the second ss7box there were lots of route audit log messages. Odd because ISUP was not running so there should not have been any route audits. How could I have screwed up so badly?

I got up and walked away. I did not return for over twelve hours. In that time I realized that I was not clear on how to configure redundant ss7box yet. Maybe the configuration was screwed up.

This morning I confirmed that the problem appeared as soon as the new redundant ss7box was started. I looked closely at the ss7box configuration and found that the same port number for sockets to the ISUP layer had been used on both ss7boxes. I found a new configuration rule.
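The new rule is the kind of check a configuration tool could enforce before files reach the boxes. A hypothetical sketch: box_cfg and mate_port_clash() are illustrative names, and the port lists stand in for whatever ss7box.conf actually holds for the ISUP-layer sockets.

```c
#include <stddef.h>

/* Hypothetical slice of one ss7box configuration: the port numbers it
   uses for sockets to the ISUP layer. */
struct box_cfg {
    const char *name;
    const unsigned short *isup_ports;
    size_t n_ports;
};

/* Rule: the two boxes of a redundant pair must not use the same
   ISUP-layer port number. Returns the first clashing port, 0 if none. */
unsigned short mate_port_clash(const struct box_cfg *a,
                               const struct box_cfg *b)
{
    for (size_t i = 0; i < a->n_ports; i++)
        for (size_t j = 0; j < b->n_ports; j++)
            if (a->isup_ports[i] == b->isup_ports[j])
                return a->isup_ports[i];
    return 0;
}
```

Returning the offending port instead of a bare yes/no lets the tool print exactly which setting to change.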

This find strengthened the thought I had yesterday that configuration definition is still too complicated and redundant. Getting the configuration correct is difficult for me, so it will be impossible for a normal user.

This experience demonstrates the importance of taking time to think as part of the debugging process. Setting breakpoints and adding print statements are important too. What's more, it's active debugging. It shows that you are doing something. On the other hand, thinking is passive. It's often done while you are doing something else. It looks like goofing off. It's not. It's probably the most powerful debugging technique that I have. I first heard about it in engineering school at the University of South Carolina from Dr. Pettus. I've found my own way to what he was talking about. It takes a while to get there.

Problems are usually not solved on a timeline or according to a deadline. If they are, the solutions can be forced and awkward. Taking time to think through a problem and letting the solution come naturally, according to how the brain doing the thinking works, is valuable. My brain does not arrive at solutions well under high stress, and it won't find any solutions when there's a lack of stress. Finding a tolerable balance of stress is the key.

Friday, June 05, 2009

ss7box: New Features, New Lab

ss7box is getting a lot of attention lately. Three new features are being developed simultaneously: redundancy, sccp routing, and support for a fiber interface. To support this development, significant changes to the Xygnada lab are required. Figure 1 below shows the build plan.

Figure 1

The nodes ana3 and ana62 will host a mated pair of redundant ss7box acting as a single point code. ana3 will host Asterisk, SMG, ISUP, SCCP, and a CNAM application that is capable of being both a client and a server. ana19 will be a clustered ISUP node using SS7 services from ana3 and ana62 with a new twist - it will have a point code that differs from the redundant ss7box pair. It will also host Asterisk and SMG. ana17 will be a single node instance of asterisk/SMG/ss7box. The dt node will be dedicated to SCCP and related applications with CNAM client/server being the lead-off application. This node will also serve as a developer workstation and regression test platform to support integration of a fiber interface into ss7box. In the middle, nodes ana60 and ana61 provide MTP3 transfer services like those found in STPs.

The lab configuration creates quite a few functional interactions and raises the overall lab complexity so that we can get more test coverage and carry out development in several areas simultaneously. Changing the lab is labor intensive and work in other areas stops as a result, so it's not done as often as needed. In this case, we could no longer put off the pain of changing the lab because progress had come to a halt.

The ana3, ana62, ana60, ana61, and ana17 nodes had been working together prior to the change. The dt node has been under construction for a while. It took a while to find the right version of openSUSE (10.2) to work with the fiber interface libraries. Then it took more time to figure out that its A102c interfaces are incompatible with the modern SMG/ss7box, so an upgrade to A102SH was required. The SS7 linkset between dt and ana60 was put into service today. What remains is to put the dt-ana61 linkset into service, create ana19, and reconfigure ana62 into an SMG/ss7box node from its current status as an SMG-only clustered node.

Wednesday, May 27, 2009

Configuration Changes

We are moving away from hand editing SMG configuration files, and moving toward using the smgcfgXX.py tool. XX is the version. We are currently using 01 and working on 02 now.

The tool uses a .csv file as input, and the outputs are the wanpipe, ss7box, and ss7boost conf files. The .csv file comes from the export function in any spreadsheet program.

We support Google Docs spreadsheets:
http://spreadsheets.google.com/ccc?key=rzeNA2SoiXKhaKX7Y9t2keg&hl=en

The smgcfgXX.py program is a Python script and is included with each SMG release.

We are working on an improvement where the smgcfg.py script will read the revision number in the .csv file and use the corresponding smgcfgXX.py script to continue processing.
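Most of the planned dispatcher's job is turning the revision found in the .csv into a script name. A sketch in C, assuming the revision is written like "0.9" and maps to a two-digit suffix, in the way smgcfg09.py corresponds to revision 0.9. The real smgcfg.py is a Python script whose parsing is not shown in this post, so the function name and the mapping are hypothetical.

```c
#include <stdio.h>

/* Map a revision such as "0.9" to the script name "smgcfg09.py".
   Returns 0 on success, -1 if the revision is malformed, has more than
   one digit per part, or the name does not fit in out[]. */
int smgcfg_script_for(const char *rev, char *out, size_t out_len)
{
    unsigned int major, minor;
    char extra;

    /* Exactly "<digits>.<digits>"; any trailing character is rejected. */
    if (sscanf(rev, "%u.%u%c", &major, &minor, &extra) != 2)
        return -1;
    if (major > 9 || minor > 9)
        return -1;
    int n = snprintf(out, out_len, "smgcfg%u%u.py", major, minor);
    return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

Failing loudly on an unknown or malformed revision is the point: silently falling back to a default script is how a 0.8 conf ends up in front of a daemon expecting 0.9.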

Wednesday, May 13, 2009

ss7box Redundancy Explained

Two ss7box are better than one as is usually the case in SS7 networking. If one fails or needs to be worked on, the other carries the load and there's no interruption in operations. The next question is, "How will that work?"

The first thing that's needed is a two-link linkset as shown in Figure 1 below. The links need to be in two separate E1/T1 spans. Each span is connected to a Sangoma wanpipe port on a separate Linux box. Each Linux box is running an instance of ss7box. This is the F-Link setup most often found in ITU networks.

Figure 1

In Figure 2 the left-hand SSP from Figure 1 is shown as a Sangoma SMG implementation using redundant ss7box. ss7box and SMG communicate using SCTP/IP. SMG communicates with Asterisk using TCP/IP and UDP/IP. SMG can receive from either ss7box at any time. SMG sends to the pair of ss7box using a load balancing scheme. If one of the links to an ss7box is lost, or if an ss7box is lost or under maintenance, then the full signaling load is handled by the remaining ss7box. When the second ss7box returns to service, load is automatically balanced between both ss7box.
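The load balancing scheme is not specified above. One common approach is to split traffic by signalling link selection (SLS), which for ISUP is typically derived from the CIC, so all messages for one circuit stay on one ss7box and keep their order. A hypothetical sketch of that approach with failover and automatic fail-back; the struct and function names are assumptions, not SMG's code.

```c
#include <stdbool.h>

/* Hypothetical state SMG might keep for the mated ss7box pair. */
struct sg_pair {
    bool in_service[2];
};

/* Pick the ss7box (0 or 1) for a message with the given SLS. With both
   boxes in service, SLS parity splits the load evenly; if the preferred
   box is down, the other takes everything. Returns -1 if neither is up. */
int sg_select(const struct sg_pair *p, unsigned int sls)
{
    int preferred = (int)(sls & 1u);

    if (p->in_service[preferred])
        return preferred;
    if (p->in_service[1 - preferred])
        return 1 - preferred;
    return -1;
}
```

Because the choice is recomputed for every message, fail-back happens as soon as the returning box is marked in service again; messages already in flight during a switch are the transition window a real implementation still has to handle.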

Figure 2

ANSI networks and some ITU networks use redundant signal transfer points (STPs) to interconnect signaling terminations like service switching points (SSPs), as shown in Figure 3. In this type of network there will be a combined linkset between an SSP and a mated pair of STPs. A combined linkset, in its simplest form, is a pair of single-link linksets. The SSP connects to both linksets, and each STP in the mated pair connects to one of the two linksets.

Figure 3

In Figure 4 the SSP from Figure 3 is shown as a Sangoma SMG implementation. The same rules regarding load balance, fail-over, and fail-back listed for the F-link configuration above also apply to this A-link (or combined linkset) configuration.

Figure 4