Thursday, December 15, 2011

Wanpipe Install Problem Fixed


Got this problem. Fixed it. Don't think it's important. It took a lot of time... wasted time... to figure all of this out.

Compiling WANPIPE API Development Utilities ...Failed!

        ERROR: Failed to compile WANPIPE API Tools !!!
        Please contact support at Sangoma Technologies
        email: techdesk@sangoma.com
        Please include the file setup_drv_compile.log


Let's see if we can get some detail:

[root@ana64 api]# cd /usr/src/Sangoma/wanpipe/api


[root@ana64 api]# make
make -C tdm_api
make[1]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/tdm_api'
Ok.
make[1]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/tdm_api'
make -C legacy
make[1]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy'
make -C x25 all  APIINC=/usr/include/wanpipe
make[2]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/x25'
Ok.
make[2]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/x25'
make -C chdlc all  APIINC=/usr/include/wanpipe
make[2]: Entering directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/chdlc'
cc -Wall -O2 -D__LINUX__ -D_DEBUG_=2 -D_GNUC_ -I../lib -I/usr/include/wanpipe -o chdlc_modem_cmd chdlc_modem_cmd.c ../lib/lib_api.c
chdlc_modem_cmd.c: In function 'handle_socket':
chdlc_modem_cmd.c:412: error: 'wp_api_hdr_t' has no member named 'error_flag'
make[2]: *** [chdlc_modem_cmd] Error 1
make[2]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy/chdlc'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/Sangoma/wanpipe-3.5.12/api/legacy'
make: *** [all] Error 2


The problem is in the legacy chdlc tools, which we are not using, so it can most likely be ignored. Nevertheless, the fix follows. First, we look for a replacement for the offending "error_flag" field, which is not a member of wp_api_hdr_t.

[root@ana64 api]# grep -r "wp_api_hdr_t\;" ../* | grep "\.h\:"
grep: warning: ../patches/kdrivers/include/linux: recursive directory loop

../patches/kdrivers/include/wanpipe_api_hdr.h:} wp_api_hdr_t;
grep: warning: ../patches/kdrivers/wanec/linux: recursive directory loop

[root@ana64 api]# vi ../patches/kdrivers/include/wanpipe_api_hdr.h
[root@ana64 api]# vim ../patches/kdrivers/include/wanpipe_api_hdr.h

This looks promising:

/* CHDLC Old backdward comptabile */
#define wp_api_rx_hdr_chdlc_error_flag                  wp_api_rx_hdr_error_flag

Let's apply a change:


[root@ana64 api]# cd /usr/src/Sangoma/wanpipe/api/legacy/chdlc/
[root@ana64 chdlc]#

Create a file called "patch" and fill it with the following:



--- chdlc_modem_cmd.c   2011-12-15 17:05:20.000000000 -0500
+++ chdlc_modem_cmd.c.chg       2011-12-15 17:16:06.000000000 -0500
@@ -409,7 +409,7 @@
                                                return;
                                        }

-                                       switch (api_rx_el->api_rx_hdr.error_flag){
+                                       switch (api_rx_el->api_rx_hdr.wp_api_rx_hdr_error_flag){

                                        case 0:
                                                /* Rx packet is good */

Apply the patch:


[root@ana64 chdlc]# patch --ignore-whitespace < patch
patching file chdlc_modem_cmd.c

Now compile the api:

[root@ana64 chdlc]# cd /usr/src/Sangoma/wanpipe/api
[root@ana64 api]# make

The problem should be gone. Expect plenty of warnings, depending on the gcc version you are using. As long as no line of the output reports an error, the build is fine.
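Searching a long build log for the bare word "error" by eye is easy to get wrong, because warnings and compiler flags can contain that word too. A minimal sketch of a stricter check follows. The two patterns are assumptions modeled on the gcc and make lines shown in the log above, and find_build_errors is a hypothetical helper name, not part of wanpipe.

```python
import re

# Assumed patterns, modeled on the log above:
#   chdlc_modem_cmd.c:412: error: ...
#   make[2]: *** [chdlc_modem_cmd] Error 1
GCC_ERROR = re.compile(r":\d+: error: ")
MAKE_ERROR = re.compile(r"^make(\[\d+\])?: \*\*\* ")


def find_build_errors(output):
    """Return the lines of build output that report a real error.

    Warning lines are ignored even when they happen to contain
    the word "error" somewhere in their text.
    """
    return [
        line
        for line in output.splitlines()
        if GCC_ERROR.search(line) or MAKE_ERROR.search(line)
    ]
```

Save the output with something like "make > build.log 2>&1", read the file into a string, and pass it to find_build_errors. An empty list means only warnings were produced.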

Tuesday, December 13, 2011

Cluster Configuration Needs To Improve

Adding a cluster node to the lab this morning.  The lab is currently working using older versions of ss7box/SMG.  This configuration needs to remain intact.  The new cluster being added will be the development tip alpha test site.

It takes a lot of coordinated data to make a cluster work because that's how the protocol works.  The smgcfg tool attempted to simplify the task, and it did to some extent, but plenty of feedback says we can do better.

Using a spreadsheet helps to make things visual and colorful, but downloading a .csv and running smgcfgXX against it is kludgy. Needing two tools and pushing files by hand is not good.  I'm getting reintroduced to the problem this morning.

A better approach would be to use a single tool that is aware of a group of nodes and a library of configurations.  The user interface needs to be efficient for users that want to use vi to edit a source file, a command line for lean systems with no X11 stuff loaded, and a command line interface that accommodates a web interface.  A diagrammatic interface showing nodes and connections that allows click-to-query-or-modify would be helpful.  All of these interfaces should be supported interchangeably. For example, if one person wants to use vi on a source file and another wants to use the CLI or web interface (not at the same time), then it should be possible, because the CLI is a specialized source file editor and the web interface uses the CLI.  Of course, using vi to edit the source file could screw things up if the format is disturbed.  Ideally the CLI will not have this problem.  Furthermore, the CLI would have prompts like: add a link to a linkset, add a trunk to a trunkgroup, or a powerful add a node. These prompts would lead the user through the collection of information needed.  Maybe the tool could generate a graphical representation as output from the input data, as a precursor to using a graphical representation as an input.

Here's something interesting.  Whatever gets built, most of it is general purpose for all SS7 networks regardless of what equipment or protocol is being used.  Distinctions about specific equipment like ss7boxd, isupd, and sccpd are made in the final steps where specific conf files are created and pushed or pulled from specific nodes.  Sounds like an open source project.

Saturday, December 10, 2011

Lab Expansion Problem and Fix



The lab is expanding to support cluster configuration testing again.  We use old versions of Dahdi, Asterisk and Linux because we only like change in our own code.  So when we upgraded a Centos 5 system, we ran into the following problem:

  CC [M]  /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/card_bri.o
In file included from /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/xpd.h:31,
                 from /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/card_bri.c:29:
include/linux/device.h:407: error: expected identifier or ‘(’ before ‘const’
make[3]: *** [/usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/card_bri.o] Error 1
make[2]: *** [/usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp] Error 2
make[1]: *** [_module_/usr/src/dahdi-linux-2.2.0.1/drivers/dahdi] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.18-274.12.1.el5-PAE-i686'
make: *** [modules] Error 2


After trying to solve the problem as though something was missing, we had the insight that maybe something was being redefined.  Centos backports lots of stuff into its 2.6.18 kernel and, at the same time, Dahdi does some backporting of its own.  We think we found duplicate backporting of the same item: dev_name(), which xdefs.h defines as a macro for kernels older than 2.6.26 and which then collides with the declaration in include/linux/device.h. The backported item in the Dahdi package was eliminated and the problem went away.  The patch is as follows:

--- /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/xdefs-orig.h 2011-12-10 11:41:12.000000000 -0500
+++ /usr/src/dahdi-linux-2.2.0.1/drivers/dahdi/xpp/xdefs.h      2011-12-10 11:23:25.000000000 -0500
@@ -139,7 +139,7 @@
                ssize_t name(struct device_driver *drv, char * buf)

 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,26)
-#define dev_name(dev)          (dev)->bus_id
+//#define dev_name(dev)                (dev)->bus_id
 #define dev_set_name(dev, format, ...) \
        snprintf((dev)->bus_id, BUS_ID_SIZE, format, ## __VA_ARGS__);
 #endif

Quite a relief.  Now on to installing the media and signal gateway applications, drivers, and patches.  We are creating a four-box cluster configuration with duplex signal gateways and two media gateways. We'll start with a two-box configuration composed of a signal gateway and a media gateway. We'll suspend system growth so that we can use this new system to establish the current functionality of the call detail recording system in isupd.  After this assessment we'll plan to fix any deficiencies in the CDR system. Then we'll add another media gateway to the system. After that, we'll detail the steps needed to convert an in-service simplex signal system cluster to a duplex signal system cluster. Then after that, we'll repeat the process at a live commercial 14-box operation.

Thursday, December 08, 2011

Clustering Improvements in ss7box

ss7box is an end node.  It does not perform the transfer function. Let's make that clear.

ss7box clustering allows several CIC ranges to be supported by several call engines.  ss7box and the call engines communicate over an IP network.  The association of a CIC range to a call engine worked fine as long as a unique local/remote address:port tuple was used.  It makes sense to reuse the same address tuple for different CIC ranges handled by the same call engine.  Unfortunately, it didn't work.  Even more unfortunately, the problem was flagged but the system was allowed to continue operating.  It would have been better to halt ss7box and force the problem to be fixed.
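The intended behavior can be sketched as a small registry that maps CIC ranges to call engine endpoints. This is only an illustration under stated assumptions, not the ss7box implementation: the names Endpoint and CicMap are hypothetical, CICs are treated as unique within a single point code pair, and a bad configuration raises an exception (the "halt and force a fix" behavior) instead of being logged and tolerated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical local/remote address:port tuple for one call engine."""
    local: str
    remote: str


class CicMap:
    """Maps inclusive CIC ranges to call engine endpoints."""

    def __init__(self):
        self._ranges = []  # list of (first_cic, last_cic, Endpoint)

    def add_range(self, first, last, endpoint):
        """Register a range; the same endpoint may be reused freely."""
        if first > last:
            raise ValueError(f"bad CIC range {first}-{last}")
        for lo, hi, _ in self._ranges:
            if first <= hi and lo <= last:
                # Refuse to continue with an ambiguous configuration.
                raise ValueError(
                    f"CIC range {first}-{last} overlaps {lo}-{hi}")
        self._ranges.append((first, last, endpoint))

    def endpoint_for(self, cic):
        """Return the endpoint that owns a CIC, or raise KeyError."""
        for lo, hi, endpoint in self._ranges:
            if lo <= cic <= hi:
                return endpoint
        raise KeyError(cic)

    def endpoints(self):
        """Distinct endpoints, i.e. how many sockets are really needed."""
        return {endpoint for _, _, endpoint in self._ranges}
```

Registering CICs 1-24 and 33-56 against the same Endpoint leaves endpoints() with a single entry, so one socket serves both ranges.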

While fixing the CIC range association problem, some sloppy coding was identified and fixed in the socket creation tools too.

Wednesday, December 07, 2011

We're back!

Work on ss7box is resuming.  Found a problem with clustering and circuit group messages.  It's possible to receive a circuit group message for a range of CICs that spans multiple cluster nodes.  Currently, the code assumes circuit group messages can and will be parochial to a single node.  The circuit group message handler must be redesigned to handle any possible range of CICs on any number of nodes.
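One way to think about the redesign is to cut the incoming range into per-node pieces before dispatching it. The sketch below is an assumption about the approach, not the ss7box code: split_group_range is a hypothetical helper, the message is reduced to a first and last CIC, and each node's ownership is given as an inclusive (low, high, node) tuple with no overlap between nodes.

```python
def split_group_range(first_cic, last_cic, node_ranges):
    """Split one circuit group range into per-node segments.

    node_ranges is a list of (low, high, node) tuples describing which
    inclusive CIC range each cluster node owns. The result is a list of
    (node, start, end) segments, ordered by CIC, covering only the CICs
    that some node actually owns.
    """
    segments = []
    for low, high, node in sorted(node_ranges):
        start = max(first_cic, low)
        end = min(last_cic, high)
        if start <= end:
            segments.append((node, start, end))
    return segments
```

With nodes owning CICs 1-24, 25-48 and 49-72, a message covering CICs 20-30 yields one segment for the first node (20-24) and one for the second (25-30). CICs that no node owns simply do not appear in the result, so the caller can compare the segment total against the requested range to detect gaps.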

Monday, November 09, 2009

ISUP Circuits Stay in Quarantine State

> Dear Mr. ss7box,
>
>     Can you give me some insight into what these messages are telling me:
>
> Nov  9 15:12:45 tele2ss7 sangoma_isupd[21182]:
> W:sb_cmm.c:cmm_quarantine_ckt:CQ_WAIT_TIMER:go back to
> CQ_WAIT_SG:span/chan/event_id/caller follow:10:22:1019:8
> Nov  9 15:12:45 tele2ss7 sangoma_isupd[21182]:
> W:sb_cmm.c:cmm_quarantine_ckt:CQ_WAIT_TIMER:go back to
> CQ_WAIT_SG:span/chan/event_id/caller follow:10:19:1019:8
> Nov  9 15:12:45 tele2ss7 sangoma_isupd[21182]:
>
> Puzzled

Dear Puzzled,

My first guess is that the remote side is not responding to our reset circuit messages. If this goes on long enough the circuit will be quarantined and stay in the quarantined state until the remote side answers our resets.

Sometimes this happens because the NI and PRIO values in MTP3 for ISUP on our side are not matched to those of the remote side. You can check this by using /ss7box/ss7box_cli --show msu on and looking in /ss7box/msu.log. Examine the inbound and outbound messages for a particular point code pair and ensure that the NI and PRIO values are the same.

If the values are the same then you'll most likely need to work with the remote telco to find out why they are not responding to the circuit reset. The telco on the remote side may or may not care about helping you, and it may be a function of their ability to help you.

If the values are different then make the necessary changes to the NI and PRIO columns for this trunk group in the ss7boost section (soon to be renamed the isupd section) of the configuration spreadsheet and use the smgcfg tool to make a new sangoma_isup.conf. Apply the new conf and see if the problem is resolved.
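For readers comparing the NI and PRIO values by hand, both live in the MTP3 service information octet (SIO) at the start of each MSU. The sketch below assumes you have already picked the SIO byte out of msu.log (the log format is not shown here, so no parsing is attempted) and that the ANSI layout applies, where the two bits below the network indicator carry the priority; in ITU networks those bits are spare. The function names are hypothetical.

```python
def decode_sio(sio):
    """Split an MTP3 service information octet into its fields.

    Bits 0-3: service indicator (5 is ISUP).
    Bits 4-5: message priority (ANSI usage; spare in ITU).
    Bits 6-7: network indicator.
    """
    return {
        "si": sio & 0x0F,
        "prio": (sio >> 4) & 0x03,
        "ni": (sio >> 6) & 0x03,
    }


def mismatched_fields(our_sio, their_sio):
    """Return which of NI and PRIO differ between two SIO octets."""
    ours, theirs = decode_sio(our_sio), decode_sio(their_sio)
    return [name for name in ("ni", "prio") if ours[name] != theirs[name]]
```

For example, an outbound SIO of 0x85 decodes to NI 2, PRIO 0, SI 5, while an inbound SIO of 0x05 decodes to NI 0, PRIO 0, SI 5, so mismatched_fields(0x85, 0x05) reports "ni".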

Friday, October 30, 2009

Six SMG Project Status

Since June I've been building a set of six servers to run SMG. A major goal of this project is to deliver 6 SMG servers fully configured and tested so they are ready to run out of the box. Two of the boxes have shipped and presumably are doing well. The four remaining boxes have been a challenge. Two Intel motherboards had to be replaced. One of the two was replaced twice, and the second replacement was to an ASUS board which is now in my warranty replacement inventory.

I specified Intel CPU, fans, and motherboards thinking that a single brand system must have been system tested. My doubts started upon assembling the first system and seeing the Intel fan attachment mechanism causing the motherboard to bow slightly. Then I started getting random kernel oops, panics, and freezes on the 5th box. I had to go through the Intel RMA process to get a replacement but not without a fight. At first, Intel rejected my claim and said that I damaged the board, and offered to sell a new board to me at a reduced cost. After some discussion Intel kindly reversed their position and replaced the board at no cost to me.

As I was waiting for the Intel RMA, I purchased a 7th board to move the project ahead. This board worked for a few days and then began failing with the same random kernel oops, panic, and freeze. Since this purchase was recent I was able to have the board exchanged at Tiger Direct without any problem. I built up box 5 using board 7. It wasn't long before it failed similarly. When I returned to Tiger Direct for yet another replacement I found that they had sold out of this board completely. I exchanged it for an ASUS board and had to buy DDR2 memory because the Intel board used DDR3 memory.

The Intel RMA board arrived and has been working well. The ASUS board is on the shelf because I didn't want to spend time working with an odd board when the project is very late already. I purchased a Thermaltake fan when I bought the ASUS board. This fan causes no board deformation. I will replace the Intel fans in boxes 3-5 with Thermaltake fans on the hunch that fans causing board deformation is the root cause of the board problems, and to simply eliminate the undesirable deformation.

Box 6 has a failed power LED. This is an annoyance. Had I quality checked all components as soon as I received them I would have been able to get a complete box exchange at the store. My options now are to RMA the box with the manufacturer, fix the LED, or purchase a new box.  I will attempt to fix the LED first.

Another unexpected problem became evident after building the first box.  The analog cards were long and top heavy and had a PCIe x1 connector.  I estimated that these boards might fail during shipping because of stresses on the connector.  I also observed the boards sagging under their own weight.  To alleviate the stress and sagging during shipping and operation, I devised and constructed a support system made of Lexan and aluminum shown here.

Early tests showed noticeable heat build-up in the boxes.  Investigation led us to understand that the three analog cards in each box dissipate a lot of heat continuously. All six boxes have case fans added to their specifications.

The original project plan called for placing two patch panels in a desktop rack. When the user received the first two boxes and one of the patch panel racks, he saw that having one patch panel per rack offered more flexibility.  The project will now include three additional racks for a total of six desktop racks: three 4U and three 2U.

The project plan has been enhanced to include a 4 box daisy-chain configuration and a 'Y' configuration.  The plan was also improved to include bulk call testing using SIPP.

The project documentation will use textual, graphical, and video media.

The current status of the project is:
  1. boxes 1 and 2 are shipped and working on site
  2. boxes 3-6
    1. are built and showing signs of long term stability
    2. analog lines are working
    3. require daisy-chain and Y configuration building and testing
    4. require bulk call testing
    5. require shipping to site
  3. box 6 requires power LED repair
  4. boxes 3-5 require CPU fan change out
  5. 2U patch panel acquisition and preparation
  6. video capture and edit of setup, operation, installation, configuration
  7. .pdf documentation