Sunday, February 24, 2013

ss7box Will Not Be Open Source

It's final.  ss7box will not be released to open source. There are objections.

The world is filled with telephone systems that interconnect using SS7.  They could sit there for decades just humming along.  But the 4-wire telephone interface is becoming irrelevant because of wireless and IP alternatives.  Eventually the power bill for running those systems might exceed the profit they generate.  Will there be a time when 4-wire POTS service will be discontinued?  I read that the decline of POTS service in the US is steady and precipitous, but it's a cash-generating system that requires little input. If this is true, then the incentive is to do nothing and harvest as much cash as possible while building up wireless and broadband.

The crystal ball is murky. Some things point to a slow decline and others to a faster one. It's not clear that another open source SS7 stack is needed either.  The current SS7 service providers and equipment providers will be in a race to the bottom as they face lower demand for their products and services, unless they form a cartel and agree to keep prices high in order to glean as much cash from the market as possible. A couple of players, Nortel and Lucent, aren't even playing anymore. Assuming no cartel is formed, the open source SS7 stacks will fare poorly against the time-tested commercial solutions that keep getting cheaper and cheaper.

All of this adds up to a dismal outlook for open source SS7. The first working version of ss7box took me a year of full-time effort to produce.  There were people cheering me on. It was exciting.  The prospect of doing it again does not energize me.  But, if I were to do it, I'd do it differently.

The MTP2 in ss7box follows the ITU specs to the letter. Those specs are perfect.  I'd follow the specs again.  But this time I'd write it to use a bitstream from any card. Each card would require interlock code to connect its device driver to the south end of the MTP2.  MTP2 would reside in kernel space unless RT Linux would allow it to run at top priority in user space. The downside is that E1/T1 cards are not typically RT Linux ready. I'd make the ss7 device run in a completely separate box and run under RT Linux if possible.  I'd try hard to make it compatible with box virtualizers like VirtualBox.

For the MTP2 to upper layer interface, I'd make it a socket-based interface.  I'd do a better job of providing for link failover-failback. The interface would have plug-in capability to convert between MTP, SIGTRAN, SIP, etc. The message router would be re-used code from some other open source project rather than re-invented.  The ISUP encoder/decoders would be taken from Wireshark.  The configuration system would be taken from Asterisk.  The concept of circuit groups would be established at the very beginning of the project - instead of starting with circuits and then tweaking into circuit groups.  The concept of clustering ISUP and SCCP nodes around router and link boxes would be built in on the front end.  Instead of ss7box, I'd have linkbox, routebox, callbox, and servicebox that are interconnected using SCTP sockets. I'd write linkbox in C.  I'd write the other boxes in Java.

By the time this project is ready, the demand for SS7 will have declined even more. I'd be doing this in spare time rather than prime time. No telling when it would be ready, or if it would ever be ready. Perhaps ss7box did what it was supposed to do, and that job is finished now.

Sunday, September 23, 2012

Xygnada Technology Dissolving

Time to close the doors on Xygnada Technology, Inc. and ss7box.  Interest in SS7 has declined significantly, or the price/performance of alternatives has surpassed that of ss7box.  It's nearly impossible to support ss7box now that the driver is so out of step with recent Linux releases. Being loyal to a single source of network interface hardware was a serious mistake.  At this point in history there is little incentive to adapt ss7box to another network interface card.

Open sourcing ss7box software is fraught with uncertainty about how past partners would react.  None of these partners respond to requests to discuss releasing the software. Well played, partners, well played. Doing a clean-room rewrite was considered, but the effort seemed far out of balance with the reward.

Thank you and auf wiedersehen.
Mike


Saturday, March 31, 2012

Three Urgent Problems in isupd

Problem 1:

Mar 26 02:21:03 v3 sangoma_isupd[15421]: F:sb_cmm.c:get_new_overall_rte_status:illegal status, should be DAVA:tg/sg0-stat/sg1-stat/overall-stat:0:2:1:1

We are sure this one is fixed in the isupd 2.6.1 patch 8.

Patch 8 (2012-03-30 Fr)
  * when heartbeat is lost to SG in simplex or both SG in duplex, M3UA route
    is set to DUNA; patch also sets individual SG route status to DUNA which
    prevents failure of a status check elsewhere in the code

Problem 2:

*** glibc detected *** /usr/local/ss7box/sangoma_isupd: double free or corruption (out): 0x08dd90a8 ***
======= Backtrace: =========
/lib/libc.so.6[0x83a6c5]
/lib/libc.so.6(cfree+0x59)[0x83ab09]
/usr/local/ss7box/sangoma_isupd[0x804d7e1]
/usr/local/ss7box/sangoma_isupd[0x804992d]
/lib/libc.so.6(__libc_start_main+0xdc)[0x7e6e9c]
/usr/local/ss7box/sangoma_isupd[0x80494b1]

We searched the code for a while but the threads were long, so we decided to devise a usage semaphore in each timer that would let us detect double freeing of timer memory and dump info to the log to help us track down the problem. The timer semaphore is in isupd 2.6.1 patch 9.

Patch 9 (2012-03-31 Sa)
  * added a semaphore to timers to test for double freeing of memory for timers

Problem 3:

CQMs (Circuit Query Messages) that cross T1 span boundaries cause improper responses that result in loss of circuits. A restart of isupd is required to recover the lost circuits. The CQMs arrive nightly at several locations. We are devising a multi-step solution. The first step is a work-around to prevent circuit loss as quickly as possible: create a response, based on the incoming CQM, that always reports the indicated circuits as being in working order.

Friday, March 23, 2012

How To Read the CDR Log File



Here is an example with fake phone numbers:

1001, in, 1, 2012, 03, 23, 16, 02, 22, 1332543742, 499660, 0, 0, 15, 0, , 0, , 0
128, out, 1, 2012, 03, 23, 16, 02, 22, 1332543742, 499692, 0, 0, 15, 10, 1112223333, 10, 4445556666, 0
129, in, 1, 1332543742, 758203, 0, 0, 15
1006, out, 1, 1332543742, 758220, 0, 0, 15
80, unrecognized
1044, unrecognized
132, in, 1, 1332543742, 758427, 0, 0, 15
1009, out, 1, 1332543742, 758441, 0, 0, 15
1012, in, 1, 1332543789, 456939, 0, 0, 15, 16
133, out, 1, 1332543789, 456956, 0, 0, 15
134, in, 1, 1332543789, 464574, 0, 0, 15
1016, out, 1, 1332543789, 464594, 0, 0, 15

Note the 2 unrecognized codes - will have to figure this out. 


The code is the first number on each line; 3-digit codes are MGD messages; 4-digit codes are SS7 messages. The layout is as follows:

Call start events (IAM 1001, callstart 128):

        sprintf (s_cdr, "%u, %s, %u, %s, %lu, %lu, %u, %u, %u, %u, %s, %u, %s, %u\n",
                        p_cdr->code,
                        p_cdr->msg_direction ? "in" : "out",
                        p_cdr->sysid,
                        p_ds,
                        p_cdr->timestamp.tv_sec,
                        p_cdr->timestamp.tv_usec,
                        p_cdr->call_setup_id,
                        p_cdr->span,
                        p_cdr->chan,
                        p_cdr->called_number_dig_count,
                        p_cdr->called_number_digits,
                        p_cdr->calling_number_dig_count,
                        p_cdr->calling_number_digits,
                        p_cdr->calling_number_presentation_indicator
                        );

Call stop events (REL 1012):

        sprintf (s_cdr, "%u, %s, %u, %lu, %lu, %u, %u, %u, %u\n",
                        p_cdr->code,
                        p_cdr->msg_direction ? "in" : "out",
                        p_cdr->sysid,
                        p_cdr->timestamp.tv_sec,
                        p_cdr->timestamp.tv_usec,
                        p_cdr->call_setup_id,
                        p_cdr->span,
                        p_cdr->chan,
                        p_cdr->release_cause);

Simple events - most of the entries (ACM 1006, ANS 1009, RLC 1016):

        sprintf (s_cdr, "%u, %s, %u, %lu, %lu, %u, %u, %u\n",
                        p_cdr->code,
                        p_cdr->msg_direction ? "in" : "out",
                        p_cdr->sysid,
                        p_cdr->timestamp.tv_sec,
                        p_cdr->timestamp.tv_usec,
                        p_cdr->call_setup_id,
                        p_cdr->span,
                        p_cdr->chan);
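
The three layouts above can be read back with a short script. This is only a sketch in Python, not part of isupd: the function name parse_cdr_line and the dictionary keys are made up for illustration, it assumes one event per line, and it assumes the date string p_ds always expands to six comma-separated fields (year, month, day, hour, minute, second), as it does in the example log. Only the codes named above are classified as start or stop events; everything else is treated as a simple event.

```python
# Codes named in the post; any other code is handled as a "simple" event.
START_CODES = {1001, 128}   # IAM, callstart
STOP_CODES = {1012}         # REL

COMMON_FIELDS = ("sec", "usec", "call_setup_id", "span", "chan")


def parse_cdr_line(line):
    """Turn one CDR log line into a dict, or None if it is not a CDR line."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 2 or not fields[0].isdigit():
        return None
    code = int(fields[0])
    if fields[1] == "unrecognized":
        return {"code": code, "kind": "unrecognized"}

    rec = {"code": code, "direction": fields[1], "sysid": int(fields[2])}
    rest = fields[3:]

    if code in START_CODES:
        rec["kind"] = "start"
        # Assumption: p_ds is "YYYY, MM, DD, hh, mm, ss" (six fields).
        rec["date"] = "-".join(rest[0:3]) + " " + ":".join(rest[3:6])
        rest = rest[6:]
    elif code in STOP_CODES:
        rec["kind"] = "stop"
    else:
        rec["kind"] = "simple"

    # Timestamp, call setup id, span and channel are common to all layouts.
    for name, value in zip(COMMON_FIELDS, rest[0:5]):
        rec[name] = int(value)
    extra = rest[5:]

    if rec["kind"] == "start":
        # extra = called count, called digits, calling count, calling digits,
        # presentation indicator; the digit strings may be empty.
        rec["called_number"] = extra[1]
        rec["calling_number"] = extra[3]
        rec["presentation"] = int(extra[4])
    elif rec["kind"] == "stop":
        rec["release_cause"] = int(extra[0])
    return rec
```

With the example log, subtracting the "sec" value of the IAM (1001) record from that of the REL (1012) record gives the time from call setup to release in seconds.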

Thursday, March 15, 2012

Redundant ss7box Refinements

The redundant ss7box is being built in the lab once again, with a new twist: simulating an IP network with long and diverse links. We need to simulate the loss of an IP link between a remote Asterisk box and ss7box - something that will be more likely in a network that spans a very large region.  Using two distinctly different IP carriers for the IP connections to each ss7box ensures link diversity.


In this new configuration we'll put a small IP switch between the two mated ss7boxes and an Asterisk box. To test IP link loss we'll pull the IP cable between one of the ss7boxes and the IP switch. The ss7box that lost the IP link should redirect the traffic bound for that link over the crosslink to its mate ss7box, and the Asterisk box that lost the IP link should remap SLS to use its remaining in-service IP link to the other ss7box.  There will be a small window where signals could get lost during the transition. Calls with lost signals will time out and be cleared. The call parties will experience abrupt call termination. Detecting loss of an IP link is not as prompt as detecting loss of an SS7 link - IP is weak in this area. We'll help the situation by using a better ping-pong protocol on the IP links to indicate link loss.
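
To make the idea concrete, here is a rough sketch in Python of the two pieces involved: declaring a link down when pongs stop arriving, and remapping SLS onto whatever links remain in service. It is an illustration only, not the ss7box or Asterisk code. The names LinkMonitor and select_link, the one-second timeout, and the modulo mapping of SLS to links are all assumptions made for the example.

```python
class LinkMonitor:
    """Tracks one IP link; the link is up while pongs keep arriving in time."""

    def __init__(self, name, timeout=1.0):
        self.name = name
        self.timeout = timeout      # assumed value: seconds without a pong
        self.last_pong = None       # no pong seen yet means the link is down

    def record_pong(self, now):
        """Call whenever a pong is received on this link."""
        self.last_pong = now

    def is_up(self, now):
        if self.last_pong is None:
            return False
        return (now - self.last_pong) <= self.timeout


def select_link(sls, monitors, now):
    """Spread SLS values over the in-service links; None if all are down."""
    in_service = [m for m in monitors if m.is_up(now)]
    if not in_service:
        return None
    return in_service[sls % len(in_service)].name
```

With two links up, even and odd SLS values land on different links; once one link stops answering, every SLS value maps to the survivor. The size of the loss window is set by the timeout, which is the trade-off between fast detection and false alarms on a slow network.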


The functionality described above does not exist yet, so we'll set up the test network; make test calls before making improvements and confirm that half of all calls will not complete when an IP link is lost; make appropriate code changes; run the call tests under IP link loss and restore conditions. We'll release this functionality in a major revision release - probably 2.7.


Here's a scan of the drawing we are using to build the lab network:




We use Google Doc spreadsheets for the configuration.


Progress:
  1. The ss7 link between 1002 and the new mated 159 ss7box at 192.168.1.62 is up.
  2. The SIP client on the Asterisk box on the 159 cluster has to be rebuilt. The laptop it was running on lost its HDD. The HDD was replaced and Mint 12 was installed last week. We were using Blink on XP previously.  We will need to find and test a suitable SIP client that works on Mint 12 and a Dell Vostro 1000.
    - looks like linphone is the first candidate; and it works too - tested on the 1003 node
  3. Write up the linphone and Asterisk configuration.

This is the link report from the 1002 ss7boxd that shows 3 ss7 links up:


Mar 15 11:34:27 ana156 ss7boxd[7056]: R:link util:ls 0:link 0:msu oc 26:tot oc 161840:util 0

Mar 15 11:34:27 ana156 ss7boxd[7056]: R:link util:ls 1:link 0:msu oc 34:tot oc 161840:util 0

Mar 15 11:34:27 ana156 ss7boxd[7056]: R:link util:ls 1:link 1:msu oc 34:tot oc 161920:util 0
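
A report like this is easy to check by script. The sketch below, in Python, pulls the numbers out of each "link util" line and counts the distinct links. The function names are made up for the example, and the regular expression is based only on the three lines shown here, so treat the field layout as an assumption; the fields are kept under the labels used in the log (ls, link, msu oc, tot oc, util) without interpreting them.

```python
import re

# Pattern derived from the sample lines above (an assumption, not a spec).
LINK_UTIL_RE = re.compile(
    r"R:link util:ls (?P<ls>\d+):link (?P<link>\d+):"
    r"msu oc (?P<msu_oc>\d+):tot oc (?P<tot_oc>\d+):util (?P<util>\d+)"
)


def parse_link_util(line):
    """Return the numeric fields of one link util line, or None if no match."""
    match = LINK_UTIL_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


def count_links(lines):
    """Count distinct (ls, link) pairs that appear in the report."""
    seen = set()
    for line in lines:
        rec = parse_link_util(line)
        if rec is not None:
            seen.add((rec["ls"], rec["link"]))
    return len(seen)
```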





Wednesday, March 14, 2012

ss7boxd 2.6.0.13 Released

Announcing a minor ss7boxd release that makes explicit the need for the 0.9 revision of the ss7box.conf configuration file, which is created by the smgcfg09.py program.  ss7boxd 2.6.0.12 went out with some releases using the 0.8 revision of the config file and later releases using the 0.9 revision.  Sorry for the confusion.

You can download them from here:

http://www.ss7box.com/tmp/ss7boxd-2.6.0.13-ANSI
http://www.ss7box.com/tmp/ss7boxd-2.6.0.13-ITU

The difference between 0.8 and 0.9 revision conf files is described in the Change History inside the smgcfg09.py file.  The only difference for ss7box.conf is the change to the revision number in the conf file.  ss7boxd 2.6.0.13 is looking for rev 0.9 in the ss7box conf file.

Thursday, December 15, 2011

Wanpipe Install Problem Fixed


Got this problem. Fixed it. Don't think it's important. It took a lot of time....wasted time....to figure all of this out.

Compiling WANPIPE API Development Utilities ...Failed!

        ERROR: Failed to compile WANPIPE API Tools !!!
        Please contact support at Sangoma Technologies
        email: techdesk@sangoma.com
        Please include the file setup_drv_compile.log


Let's see if we can get some detail:

[root@ana64 api]# cd /usr/src/Sangoma/wanpipe/api


[root@ana64 api]# make
make -C tdm_api
make[1]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/tdm_api'
Ok.
make[1]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/tdm_api'
make -C legacy
make[1]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy'
make -C x25 all  APIINC=/usr/include/wanpipe
make[2]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/x25'
Ok.
make[2]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/x25'
make -C chdlc all  APIINC=/usr/include/wanpipe
make[2]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/chdlc'
cc -Wall -O2 -D__LINUX__ -D_DEBUG_=2 -D_GNUC_ -I../lib -I/usr/include/wanpipe -o chdlc_modem_cmd chdlc_modem_cmd.c ../lib/lib_api.c
chdlc_modem_cmd.c: In function 'handle_socket':
chdlc_modem_cmd.c:412: error: 'wp_api_hdr_t' has no member named 'error_flag'
make[2]: *** [chdlc_modem_cmd] Error 1
make[2]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/chdlc'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy'
make: *** [all] Error 2


The problem is in the legacy chdlc code, which we are not using, so it can most likely be ignored. Nevertheless, the fix follows. First, we look for a replacement for the offending "error_flag" field that's not defined.

[root@ana64 api]# grep -r "wp_api_hdr_t\;" ../* | grep "\.h\:"
grep: warning: ../patches/kdrivers/include/linux: recursive directory loop

../patches/kdrivers/include/wanpipe_api_hdr.h:} wp_api_hdr_t;
grep: warning: ../patches/kdrivers/wanec/linux: recursive directory loop

[root@ana64 api]# vi ../patches/kdrivers/include/wanpipe_api_hdr.h
[root@ana64 api]# vim ../patches/kdrivers/include/wanpipe_api_hdr.h

This looks promising:

/* CHDLC Old backdward comptabile */
#define wp_api_rx_hdr_chdlc_error_flag                  wp_api_rx_hdr_error_flag

Let's apply a change:


[root@ana64 api]# cd /usr/src/Sangoma/wanpipe/api/legacy/chdlc/
[root@ana64 chdlc]#

Create a file called "patch" and fill it with the following:



--- chdlc_modem_cmd.c   2011-12-15 17:05:20.000000000 -0500
+++ chdlc_modem_cmd.c.chg       2011-12-15 17:16:06.000000000 -0500
@@ -409,7 +409,7 @@
                                                return;
                                        }

-                                       switch (api_rx_el->api_rx_hdr.error_flag){
+                                       switch (api_rx_el->api_rx_hdr.wp_api_rx_hdr_error_flag){

                                        case 0:
                                                /* Rx packet is good */

Apply the patch:


[root@ana64 chdlc]# patch --ignore-whitespace < patch
patching file chdlc_modem_cmd.c

Now compile the api:

[root@ana64 chdlc]# cd /usr/src/Sangoma/wanpipe/api
[root@ana64 api]# make

Problem should be gone.  There will be tons of warnings depending on the gcc version you are using. As long as you don't see "error" in the output it should be fine.
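
If the output is too long to eyeball, a few lines of Python can pick out the lines that matter. This is just a convenience sketch; the function name find_error_lines is made up, and it assumes the make output has been captured as text. It matches "error" only as a whole word, so compiler errors and make's "Error 1" lines are caught while a warning that merely mentions an identifier such as error_flag is not.

```python
import re

# Whole-word match, any case: hits "error:" and "Error 1", not "error_flag".
ERROR_RE = re.compile(r"\berror\b", re.IGNORECASE)


def find_error_lines(output):
    """Return the lines of captured build output that report an error."""
    return [line for line in output.splitlines() if ERROR_RE.search(line)]
```

An empty result means the build produced warnings at most.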