Saturday, March 31, 2012

Three Urgent Problems in isupd

Problem 1:

Mar 26 02:21:03 v3 sangoma_isupd[15421]: F:sb_cmm.c:get_new_overall_rte_status:illegal status, should be DAVA:tg/sg0-stat/sg1-stat/overall-stat:0:2:1:1

We are confident this one is fixed in isupd 2.6.1 patch 8.

Patch 8 (2012-03-30 Fr)
  * when heartbeat is lost to SG in simplex or both SG in duplex, M3UA route
    is set to DUNA; patch also sets individual SG route status to DUNA which
    prevents failure of a status check elsewhere in the code
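
To illustrate the kind of consistency that patch 8 restores, here is a minimal sketch in C. It is not the isupd source: the struct, the enum values, and the function names are all hypothetical, and the only thing taken from the patch note is the rule that losing heartbeat to every SG sets both the overall route and the individual SG statuses to DUNA.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SG 2

/* Hypothetical status values; the real isupd encoding is not shown here. */
enum rte_status { RTE_DUNA, RTE_DAVA };

/* Hypothetical route record: one entry per signalling gateway (SG). */
struct m3ua_route {
    int num_sg;                         /* 1 = simplex, 2 = duplex */
    int heartbeat_ok[MAX_SG];           /* nonzero while heartbeat is seen */
    enum rte_status sg_status[MAX_SG];  /* per-SG route status */
    enum rte_status overall;            /* overall route status */
};

/* Recompute per-SG and overall status together so they cannot disagree. */
void route_update_status(struct m3ua_route *rt)
{
    int i;
    int any_up = 0;

    assert(rt != NULL && rt->num_sg >= 1 && rt->num_sg <= MAX_SG);
    for (i = 0; i < rt->num_sg; i++) {
        rt->sg_status[i] = rt->heartbeat_ok[i] ? RTE_DAVA : RTE_DUNA;
        if (rt->heartbeat_ok[i])
            any_up = 1;
    }
    /* Overall is DUNA only when heartbeat is lost to every configured SG. */
    rt->overall = any_up ? RTE_DAVA : RTE_DUNA;
}

/* The check that must hold: overall is DAVA exactly when some SG is DAVA. */
int route_status_consistent(const struct m3ua_route *rt)
{
    int i;
    int any_dava = 0;

    assert(rt != NULL);
    for (i = 0; i < rt->num_sg; i++) {
        if (rt->sg_status[i] == RTE_DAVA)
            any_dava = 1;
    }
    return (rt->overall == RTE_DAVA) == (any_dava != 0);
}
```

Updating the per-SG and overall fields in one place is the design point: a status check elsewhere can then never see an overall DUNA alongside an SG that still claims DAVA.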

Problem 2:

*** glibc detected *** /usr/local/ss7box/sangoma_isupd: double free or corruption (out): 0x08dd90a8 ***
======= Backtrace: =========
/lib/libc.so.6[0x83a6c5]
/lib/libc.so.6(cfree+0x59)[0x83ab09]
/usr/local/ss7box/sangoma_isupd[0x804d7e1]
/usr/local/ss7box/sangoma_isupd[0x804992d]
/lib/libc.so.6(__libc_start_main+0xdc)[0x7e6e9c]
/usr/local/ss7box/sangoma_isupd[0x80494b1]

We searched the code for a while, but the paths through the code that free timers were long and hard to follow by inspection. Instead, we decided to add a usage semaphore to each timer so that isupd can detect a double free of timer memory and dump information to the log to help us track down the problem. The timer semaphore is in isupd 2.6.1 patch 9.

Patch 9 (2012-03-31 Sa)
  * added a semaphore to timers to test for double freeing of memory for timers
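
A rough sketch of the idea behind the timer semaphore, in C. This is not the patch 9 code. All names are hypothetical, and the sketch assumes timers live in a fixed pool rather than being passed to malloc/free, so that the usage flag can still be read safely after a timer has been released.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

#define TIMER_POOL_SIZE 64

/* Hypothetical timer record with a usage flag. */
struct isup_timer {
    int in_use;                     /* 1 while allocated, 0 while free */
    int id;                         /* caller-chosen timer id */
    const char *last_release_site;  /* who released it last, for the log */
};

static struct isup_timer timer_pool[TIMER_POOL_SIZE];

/* Take the first free slot from the pool; NULL when the pool is full. */
struct isup_timer *timer_acquire(int id)
{
    size_t i;

    for (i = 0; i < TIMER_POOL_SIZE; i++) {
        if (!timer_pool[i].in_use) {
            timer_pool[i].in_use = 1;
            timer_pool[i].id = id;
            timer_pool[i].last_release_site = NULL;
            return &timer_pool[i];
        }
    }
    return NULL;
}

/* Release a timer. Returns 0 on success, -1 if it was already free. */
int timer_release(struct isup_timer *t, const char *site)
{
    assert(t != NULL && site != NULL);
    if (!t->in_use) {
        /* Double release: log both call sites instead of corrupting memory. */
        fprintf(stderr, "timer %d: double release at %s (first release at %s)\n",
                t->id, site,
                t->last_release_site ? t->last_release_site : "(unknown)");
        return -1;
    }
    t->in_use = 0;
    t->last_release_site = site;
    return 0;
}
```

Each call site passes a short label such as its function name, so a second release of the same timer produces a log line naming both the first and the second caller.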

Problem 3:

CQMs (Circuit Group Query Messages) that cross T1 span boundaries cause improper responses, which in turn cause loss of circuits. A restart of isupd is required to recover the lost circuits. These CQMs arrive nightly at several locations. We are devising a multi-step solution. The first step is a work-around to prevent circuit loss as quickly as possible: isupd will create a response based on the incoming CQM that always reports the indicated circuits as being in working order.
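
The work-around response could be sketched along these lines in C. None of this is isupd code. The function and constant names are hypothetical, and two protocol details are assumptions to verify against ITU-T Q.763 before use: that a range value of N in the CQM covers N + 1 circuits (with a maximum range of 31), and that the circuit state octet 0x0C means "idle, no blocking".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: largest range value accepted in a CQM. */
#define CQM_MAX_RANGE 31

/* Assumption: state octet for an equipped, idle, unblocked circuit. */
#define CIRCUIT_STATE_IDLE_UNBLOCKED 0x0C

/*
 * Fill `states` with one "working" state octet per circuit named by the
 * incoming CQM range, regardless of which T1 span each circuit sits on.
 * Returns the number of octets written, or -1 on a bad range or a
 * buffer that is too small.
 */
int cqr_build_all_working(unsigned int range, unsigned char *states, size_t cap)
{
    size_t n;

    assert(states != NULL);
    if (range > CQM_MAX_RANGE)
        return -1;
    n = (size_t)range + 1;          /* range N covers N + 1 circuits */
    if (n > cap)
        return -1;
    memset(states, CIRCUIT_STATE_IDLE_UNBLOCKED, n);
    return (int)n;
}
```

Because the reply is derived only from the range in the incoming message, it does not matter whether the queried circuits fall on one span or several, which is the case that currently goes wrong.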
