Monday, July 28, 2008

A Second Quarter of Reliability, Performance, and Capacity

It's been a while since the last entry. Since that time the focus has been on managing 500+ circuits when four critical resources are lost and recovered. Those four resources are: signal gateway (ss7boxd), media gateway (sangoma_mgd), SS7 route, and bearer channel spans (E1 or T1).

Initial designs proved flawed when taken to high capacity test and some field trials. Two revisions followed and the basic design is now settled. The system, called Circuit Maintenance Manager (CMM), is now undergoing high capacity test and field trials. It currently supports loss/recovery of signal gateway, media gateway, and SS7 routes. The bearer channel span management function is ready for development now that underlying support is available from the Sangoma device driver as of this past week.

The CDR Logger is undergoing a change. The points in ss7boost where information is injected into the logger have been reduced and are now coupled tightly to the ss7boost message input/output functions and timer function. This change ensures no loss of info to the logger. The call serial number field is no longer provided because it did not make sense in some situations. Log entries are correlated into circuit events using the call setup id, span/chan, and timestamps for outbound calls, and span/chan/timestamps for inbound calls. The reason for the change is that CDR Logger began as a tool for creating call records only but has also been valuable as a protocol analyser. The older logger missed some message events which were not important to creating CDRs but were important for protocol analysis. The new logger improves the analyser function while retaining the call record creation function.
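
The correlation rule above can be sketched in a few lines. This is only an illustration, not the ss7boost logger: the `LogEntry` fields, the function names, and the choice to use timestamps purely for ordering are all assumptions made for the example.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    """Hypothetical log record; field names are assumptions for this sketch."""
    timestamp: float                 # seconds, used here only for ordering
    direction: str                   # "out" or "in"
    span: int
    chan: int
    message: str                     # e.g. "IAM", "ACM", "REL"
    setup_id: Optional[int] = None   # available for outbound calls only


def correlation_key(entry):
    # Outbound calls carry a call setup id; inbound calls are keyed on span/chan.
    if entry.direction == "out":
        return ("out", entry.setup_id, entry.span, entry.chan)
    return ("in", entry.span, entry.chan)


def group_events(entries):
    """Group log entries into per-circuit event lists, oldest first."""
    groups = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e.timestamp):
        groups[correlation_key(entry)].append(entry)
    return dict(groups)
```

A real logger would also need the timestamps to separate successive inbound calls on the same span/chan; that step is left out here.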

Another activity this summer has been load testing with 16 E1s fully loaded with 30 second calls. A number of bugs were found and fixed and the overall quality has improved noticeably.

There are now two release versions of SMG: stable and beta. This is more work but the time has come to provide options for SMG users. Stable releases will only receive small fixes and no new features. Beta releases will contain all fixes and new features. Stable releases that are closed will be reproducible. This marks a significant change from the past when development was much faster and installations were smaller and more change tolerant. As SMG has become more stable, users have indicated that SMG must change in a more controlled manner.

SMG ANSI installations are growing and bring new challenges. The circuit continuity test appears to be used more often in these locations. The procedure is slightly different from the ITU method. The final stage of COT reception functionality for four-wire circuits is being completed to comply with requirements from a major western USA service provider.

The third quarter 2008 will see initial development, testing, and field trials of the long awaited clustering function where asterisk boxes are clustered around an SMG box (already available) and SMG boxes are clustered around an ss7box box. Following cluster development will be the re-introduction of ss7box redundancy.

Other features in the works are expanded conformance testing and expanded ISUP parameter decoding, in particular the calling party category and the generic number parameters. Asterisk does not have a convenient container for the many ISUP parameters so SMG is sneaking the information in and out of Asterisk using a long character string.

Thursday, April 10, 2008

New MTP and ISUP Features

SMG/ss7box has been accepted into a major telecom network in North America.

The ss7boost Circuit Maintenance Manager (CMM) is functional in the field now. It is managing the blocking and resetting of circuits with respect to ss7boost restart, and the presence and absence of the media gateway interface (sangoma_mgd). In the coming weeks CMM will be managing circuits with respect to the presence/absence of SS7 routes, the signaling gateway (ss7box), and trunks (E1/T1 facilities).

ss7box now supports Management Inhibit reception sufficient to pass the limited conformance tests being administered in ITU-land. Management Inhibiting is used in conformance testing and rarely in operations. The procedure is complicated in some configurations. More time is being devoted to developing features for conformance tests lately.

ss7box will support reception of the MTP Restart procedure in the upcoming week.

ss7boost now supports GRS reception fully. The circuits being reset are actually reset. ss7boost also now supports the CQM/CQR procedure and the ANSI CCR process.

Monday, March 17, 2008

Group Reset Reception Processing

The time has finally come where it is no longer possible to avoid developing the ill-specified Group Reset (GRS). Our siege of the GRS was broken while SMG was undergoing conformance testing in Belgium. We knew this day would come and it's good that we could hold out as long as we did, because the time has allowed the general approach to circuit maintenance in ss7boost to mature and improve to handle many circuits under stressful conditions. We are ready to implement GRS reception. We continue to resist implementing GRS sending for reasons we offer in detail at the end of this article.

The base ss7boost 1.0.2.x is in lab test and has passed calls in both directions. This revision has the first working version of the Circuit Maintenance Manager for queueing and executing requests for circuit reset, block, and unblock. The CMM is required as the base for developing GRS reception processing to ensure compatibility with the new circuit maintenance model in ss7boost.

The GRS reception processor decomposes the request into individual circuit reset requests that are each then submitted to the CMM. The CMM queues the requests and manages sending the reset requests to the core circuit reset reception handler (CCRH). The CCRH performs maintenance actions and then passes the request to call processing where the concerned circuit is handled for reset based on its current state. The final reset action in the CCRH is to acknowledge the completion of the circuit reset. Currently the acknowledgement is a Release Complete (RLC) message sent in response to a Reset Circuit (RSC). ss7boost will be modified to conditionally send an RLC or a completion indication to the GRS processor. This approach will eliminate the need to modify the call processing state machine, which eliminates a significant amount of regression testing to a core function that has approximately 1.1 million hours of field verification time.
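
A rough sketch of that flow, under stated assumptions: the names `GroupResetTracker`, `decompose_grs`, `acknowledge_reset`, the `submit` callback standing in for the CMM, and the `"GROUP_COMPLETE"` token are all hypothetical and are not taken from ss7boost.

```python
class GroupResetTracker:
    """Collects per-circuit completion indications for one received GRS."""

    def __init__(self, cics):
        self.pending = set(cics)

    def circuit_done(self, cic):
        # Returns True once every circuit in the group has been reset.
        self.pending.discard(cic)
        return not self.pending


def decompose_grs(first_cic, circuit_count, submit):
    """Turn one group reset into individual reset requests.

    `submit(cic, action, tracker)` is a stand-in for handing a request
    to the CMM queue.
    """
    cics = list(range(first_cic, first_cic + circuit_count))
    tracker = GroupResetTracker(cics)
    for cic in cics:
        submit(cic, "reset", tracker)
    return tracker


def acknowledge_reset(cic, tracker=None):
    """Final step of a circuit reset: choose the acknowledgement.

    A reset that came from an individual RSC is answered with an RLC.
    A reset that came from a GRS only reports back to the GRS processor.
    """
    if tracker is None:
        return "RLC"
    return "GROUP_COMPLETE" if tracker.circuit_done(cic) else None
```

Because the choice between the two acknowledgements is made only at the last step, the call processing state machine in this sketch never needs to know where the reset request came from.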

Our dislike of the GRS comes from the awkward specification of the Range and Status parameter. The format and coding of the parameter changes if it is being used in a GRS, CGB, or CGU. It also changes if it is being used in ANSI or ITU SS7 flavors. In some instances, it allows up to 30-ish circuits to be affected in a group of up to 256 circuits. This means that it's possible to receive a GRS containing a 256-bit status field with each bit needing to be tested for 0 (not affected) or 1 (affected). The Range field assigns meaning to the values 0, 1, and n>1, and if n>1 is used, then the actual count of affected circuits in the Range is n-1. The GRS processor must search or allocate a variable number of octets in the Range. The Range and Status field has a very high degree of variability which drives up the numbers of exceptions and test cases. A great deal of code must be written and tested for the exceptions. The possibility of partial success exists with the GRS, the processing of which is not specified clearly.
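
The bit testing described above can be sketched as follows. This is a minimal illustration with loud assumptions: the function name is made up, the circuit count is taken as an argument because deriving it from the Range value depends on the message type and SS7 flavor, and the bit order (least significant bit of the first octet is the first circuit) is assumed rather than quoted from a specification.

```python
def affected_circuits(first_cic, circuit_count, status):
    """Return the circuit ids whose status bit is 1 (affected).

    Assumptions for this sketch:
      - `status` is the raw Status field as bytes,
      - bit 0 (LSB) of the first octet maps to `first_cic`,
      - `circuit_count` was already derived from the Range value.
    """
    if circuit_count > len(status) * 8:
        raise ValueError("status field too short for the given range")
    affected = []
    for i in range(circuit_count):
        octet = status[i // 8]
        if (octet >> (i % 8)) & 1:   # 1 = affected, 0 = not affected
            affected.append(first_cic + i)
    return affected
```

Even this small helper needs a length check and a bit-order convention, which hints at how quickly the exception cases multiply once the per-message and per-flavor variations are added.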

The cost versus the benefit of the GRS in a small and distributed ISUP implementation like ss7boost is not favorable. Using the RSC for each individual circuit makes much more sense because the implementation is straightforward. In a large switch with large trunk groups, the GRS makes sense. Most samples of GRS harvested from the field show a restricted and rational use of the GRS in that circuit groups are in the same E1/T1. It is our opinion that the idea of the GRS is good, but the specification is too broad and uses tricks to save a bit of space.

Our implementation of GRS reception will be based on our field samples and a sub-set of the specifications that minimizes the number of exceptions to be handled. If our GRS reception processor encounters a GRS that is legally formed and yet exceeds the limits we impose, it will be rejected and announced. We will consider remedies if this should occur, but we are betting it will never occur.

Wednesday, March 05, 2008

Merging ss7boost Development Streams

For a while now there have been two ss7boost code streams: stable and development. The stable stream is a repository branch and the development stream is the repository tip. The development stream insulated users from the difficulties of building the Circuit Maintenance Manager. The development stream has reached functional parity with the stable stream in terms of circuit management, and the development stream needs to be tested in the field. The time has come to merge the stable and dev ss7boost streams.

The merge is complicated because the two streams have been apart for several months. A wholesale "diff -up stable dev > patchfile" followed by a "patch -p0 <>

Friday, February 01, 2008

ss7box and MTP3 Route Management

ss7box has a partial implementation of MTP3 route management. Route management is needed whenever SS7 linksets are connected to a signal transfer point (STP). The ss7box implementation needs to be advanced for passing network acceptance testing, automatic restart, and improved fault tolerance. Up to this point, ss7box has not been connected to an STP where route management has been a critical issue. STPs tolerate nodes that ignore route management messages, which also explains why the implementation is incomplete. As is often the case with ss7box, things changed suddenly this week, and now having a more complete route management is important.

Route management in MTP3 consists of procedures and messages per ITU Q.704 Section 13 Signalling Route Management:
  • TFA - allow traffic to route
  • TFP - prohibit
  • TFR - restrict, don't allow unless no better choice to reach destination
  • TFC - controlled, for communicating congestion levels
  • SRST - signal route set test for asking an STP about the status of a route
  • SRSC - signal route set congestion test for asking an STP about the congestion status of a route
In ss7box, route management is already designed and implemented at the core of the ss7box routing mechanism. The route management values assigned to each linkset are initialized to TFA and are never changed, but the values are checked for every outbound MSU passing through ss7box. Inbound route management messages are being decoded. These functions have been thoroughly tested in the field.

Inbound route management messages need to affect the route table and the values from the route table need to affect the related status of each linkset. If a route becomes completely unreachable, ss7box needs to send out a Destination UNAvailable (DUNA) M3UA message to all connected ss7boost clients. Conversely, if a destination becomes reachable, a Destination AVAilable (DAVA) M3UA message needs to be sent.
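
The intended behavior can be sketched as a small state table. This is not the ss7box route table: the class and method names are hypothetical, every route is assumed to start as allowed (matching the TFA initialization described above), and a restricted route is assumed to still count as reachable.

```python
from enum import Enum


class RouteState(Enum):
    ALLOWED = "TFA"
    RESTRICTED = "TFR"
    PROHIBITED = "TFP"


class RouteTable:
    """Hypothetical per-destination, per-linkset route status table."""

    def __init__(self, linksets):
        self.linksets = list(linksets)
        self.state = {}   # (dpc, linkset) -> RouteState, default ALLOWED

    def reachable(self, dpc):
        # Reachable if at least one linkset is not prohibited for this dpc.
        return any(
            self.state.get((dpc, ls), RouteState.ALLOWED) is not RouteState.PROHIBITED
            for ls in self.linksets
        )

    def on_route_message(self, linkset, msg, dpc):
        """Apply an inbound TFA/TFR/TFP and report the M3UA message to send.

        Returns "DUNA", "DAVA", or None when reachability did not change.
        """
        was_reachable = self.reachable(dpc)
        self.state[(dpc, linkset)] = RouteState(msg)
        now_reachable = self.reachable(dpc)
        if was_reachable and not now_reachable:
            return "DUNA"
        if not was_reachable and now_reachable:
            return "DAVA"
        return None
```

In this sketch a notification is produced only on a change of overall reachability, so a TFP on one of two linksets leaves the connected clients undisturbed.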

A signaling route set test procedure needs to be implemented to periodically query an STP that has prohibited a destination on its linksets to see if the prohibited destination is available.

A tool is needed to inject and receive route management messages for unit testing in the lab. This will most likely take the form of ss7box_cli commands for injection. In the lab two ss7box units will be connected back-to-back. One will be the device under test (DUT) and the other will be the tester. The cli commands will be issued on the tester to the DUT. Messages from the DUT will be captured and decoded on the tester using normal ss7box route management procedures with some added debugging code.

The route set congestion messages and procedures will not be supported at this time as they are options in the spec and not frequently encountered.

Thursday, January 31, 2008

Bug Found in ss7boost_cli --ckt-block

A bug was found with the ss7boost CLI commands:

./ss7boost_cli --ckt-block --span X --chan all
./ss7boost_cli --ckt-unblock --span X --chan all

The logs for the unblock follow:

Jan 31 22:46:08 acc05 ss7boost[9831]: I:sb_mpc.c:sg_bsm_unblock_sending:MARK:0
Jan 31 22:46:08 acc05 ss7boost[9831]: I:sb_mpc.c:sg_bsm_wait_uba:MARK:0
Jan 31 22:46:08 acc05 ss7boost[9831]: W:sb_mpc.c:sg_bsm_wait_uba:SS7 event id not expected:event id follows:2

Using block and unblock with individual circuits works:

./ss7boost_cli --ckt-block --span X --chan Y
./ss7boost_cli --ckt-unblock --span X --chan Y

This problem will be fixed in the next major revision of ss7boost because the entire circuit maintenance method has been redesigned, as mentioned in the previous post on this website.

No fix will be made in the following revisions or any earlier ones:

Xygnada Technology, Inc. ss7boost Revision 1.0.1-25 (and any 25 branch revs like 25.2+)
Xygnada Technology, Inc. ss7boost Command Line Interface, Version: 1.0.1-01

Monday, January 21, 2008

Circuit Maintenance Manager and Bug Fixes

The maintenance of voice circuits in ISUP in ss7boost is being overhauled. Currently, call processing and circuit maintenance initiate circuit resets and blocks directly without any coordination. As hardware blocking design evolved it became apparent that circuit maintenance needed centralized management. The central maintenance manager (CMM) controls the resetting and blocking of all ISUP voice circuits - 496 maximum presently - per node. Each circuit has a maintenance action request queue. The CMM is dormant until queues are loaded and a trigger is pulled. The CMM checks all queues processing N circuits every M seconds until all queues are empty. Then the CMM returns to its dormant state.
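
The queue-and-trigger behavior described above can be sketched like this. It is only an illustration: the class layout, the method names, and the `execute` callback are assumptions, and the caller is expected to invoke `tick()` from its own timer every M seconds.

```python
from collections import deque


class CircuitMaintenanceManager:
    """Hypothetical sketch of the CMM: one request queue per circuit."""

    def __init__(self, batch_size):
        self.batch_size = batch_size   # N circuits handled per tick
        self.queues = {}               # circuit id -> deque of actions
        self.active = False            # dormant until triggered

    def request(self, cic, action):
        # Queue a maintenance action ("reset", "block", "unblock").
        self.queues.setdefault(cic, deque()).append(action)

    def trigger(self):
        # Wake the manager if there is anything to do.
        self.active = bool(self.queues)

    def tick(self, execute):
        """Process one action for up to N circuits; call every M seconds."""
        if not self.active:
            return 0
        served = 0
        for cic in list(self.queues)[: self.batch_size]:
            queue = self.queues[cic]
            execute(cic, queue.popleft())
            if not queue:
                del self.queues[cic]
            served += 1
        if not self.queues:
            self.active = False        # all queues empty: back to dormant
        return served
```

Handling at most one action per circuit per tick keeps the signaling load bounded at N messages every M seconds, whatever the total number of queued requests.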

The CMM code is written to a point that initial unit testing has begun. The number of test cases needed to verify operation looks to be high so unit testing with the ss7boost unit tester is a huge time saver.

There are a number of upgrades and configuration changes on-going so support work is spiking this week, which slows down CMM testing. This is exactly what makes predicting when large features will be ready very difficult. It's the nature of the business.

Speaking of unit testers, sangoma_mgd now has its own unit tester apparatus. A number of bugs got stomped this past December as a result of using it.

We caught and killed bugs in sangoma_mgd and ss7boost last week. In sangoma_mgd there was a case where a call ended and the event was not propagated to ISUP. In ss7boost we found that some sigboost messages from sangoma_mgd to ss7boost were not being properly entered into the ISUP CDR log. Also in ss7boost we found that the "SMGRev-" prefix was not attached to inbound RDNIS strings in Asterisk/Callweaver. A new SMG release is out today that delivers the fixes - ftp://ftp.sangoma.com/linux/smg/smginstall-2008-01-21.tgz

Sunday, January 20, 2008

New Look for ss7box.com

ss7box.com has a slightly new look. It is designed for rapid maintenance and moving as a defense against web hosts that suddenly and unexpectedly terminate service. It is also designed for optimal use as a technical user manual since that is its primary purpose. The design is embedded in a cascading style sheet.

Wednesday, January 02, 2008

We are back on blogspot

The wiki experience was OK until the web host went out of business. All the material is written in tikiwiki format and stored in an SQL database. I know next to nothing about tikiwiki administration and SQL databases. I also found that not all web hosts provide tikiwiki.

So, I revert to web 1.0. I will use HTML and docbook. The nice thing about docbook is that the same source will produce HTML or PDF. Since many ss7box users are charged per minute for Internet access, the wiki has not been the best documentation option for them. They are better served by downloading a PDF document that they can refer to repeatedly at no extra charge. Docbook is also a good system for porting content from one web host to another, as web hosts seem to be transient.