[asterisk-dev] dahdi_device representation

Oron Peled oron.peled at xorcom.com
Sat Aug 28 07:14:38 CDT 2010


On Saturday, 28 August 2010 05:04:06 Tilghman Lesher wrote:
> On Friday 27 August 2010 17:54:45 Oron Peled wrote:
> > During Shaun Ruffell's adventures with DAHDI persistent channel assignments,
> > he started implementing a very important feature (IMO): a representation
> > of dahdi_device that represents a collection of spans.
> >
> > The lack of this representation caused hardware attributes to be
> > duplicated in the spans, while in reality they should be represented
> > in the dahdi_device (e.g., location).
> >
> > I would like to use this opportunity to present a relevant issue that
> > may, IMO, affect the design:
> >
> > 1. Historically, chan_dahdi was not made for hot-pluggable
> >    devices.
> >
> > 2. As a result, after a successful open(), chan_dahdi ignores
> >    read()/write() errors (except for the special errno used to pass events).
> >
> > 3. This means that if a device is removed from under chan_dahdi's feet, it
> >    enters an infinite tight loop of failed read() calls, which usually makes
> >    the host unresponsive (except for the kernel) within a few seconds,
> >    because asterisk usually runs at real-time priority.
> >
> > 4. Since Astribanks were always hot-pluggable, we "solved" this problem
> >    by employing various measures in our xpp drivers:
> >    - When a device is removed, we *keep* its data structure intact and
> >      make a note to ourselves that it's disconnected.
> >    - We send a red alarm to asterisk for disconnected devices, trying
> >      to squelch some of the "noise".
> >    - We ignore asterisk calls for disconnected devices.
> >    - We added a "REMOVED" event to asterisk, politely asking it to remove
> >      a span with all its channels.
> >    - We refcount open/close, so if/when asterisk is nice and actually
> >      closes all channels, we can finally release the data structures.
> >
> >    BTW: only recently (during dial-byname development) did we manage to
> >    fix asterisk so that removing a digital span would also close its dchan.
> >
> > 5. Obviously, keeping "ghost" devices around so we don't surprise asterisk
> >    is not a very good design, but we didn't see any alternatives at the
> >    time.
> >
> > If chan_dahdi is not made aware of driver errors (e.g., -ENODEV), similar
> > ugly techniques would be needed for a hot-plug implementation at the DAHDI
> > level. This has some design consequences for the sysfs object layout and
> > therefore should be thought about early.
> >
> > So the question is short:
> >    Should DAHDI account for and work around chan_dahdi's ignorance?
> >    Or should chan_dahdi be fixed first?
> 
> Yes, DAHDI will need to work around this, since we cannot ensure that each
> Asterisk installation will upgrade the userland piece to a version which is
> sufficient to work around the problem.  One question, though.  If this is
> fixed in both locations, what method would you prefer to communicate that
> chan_dahdi has been fixed, and DAHDI doesn't need to employ the work
> around?  Or would you prefer to simply keep the workaround active in DAHDI
> regardless of whether it is necessary for chan_dahdi?  Perhaps it would be
> sufficient to detect the poor behavior (multiple successive read()s which
> fail) and employ the workaround only in that case.

0. Keeping production systems working is a given. But let's look at
   some considerations.

1. First, regardless of the workarounds we may implement in DAHDI,
   this is a bug that exists in all asterisk installations and should be
   fixed anyway. Here is one manifestation of this bug:
      https://issues.asterisk.org/view.php?id=17669

2. If the fix in chan_dahdi is small/simple (I haven't looked at the code
   yet), then there's no reason not to apply it to all maintained
   asterisk versions (including 1.4.x and 1.6.x). This may significantly
   reduce the time we need to maintain the ugly solution (e.g., from
   infinity to 1 year ;-)

3. I think the most prominent effect of this workaround is that it changes
   the lifecycle of the sysfs data structures (the device data structure
   cannot be freed on time). So I don't think we would gain anything
   significant by trying to detect a buggy asterisk at run time.

4. It would be nice if we could delineate this code with #ifdefs so we can
   remove it later, but I'm not sure that would be easy (again, due
   to its structural nature).

5. The idea of run-time checks may be useful at a later phase, when
   we want to urge users to update their installations to a non-buggy
   version.
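Item 4 of the quoted message and point 3 above both come down to a refcounting pattern: the device object may only be freed once it is unplugged *and* closed by every user. The user-space sketch below illustrates that lifecycle under stated assumptions: `ghost_dev`, `dev_create`, `dev_open`, `dev_close`, `dev_unplug` and `dev_read` are hypothetical names, malloc/free stand in for whatever kernel/sysfs release mechanism DAHDI would actually use, and returning -ENODEV from reads is this sketch's choice (the xpp drivers described above instead ignore calls for disconnected devices).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical device object; not the real dahdi_device or xpp struct. */
struct ghost_dev {
    int refcount;       /* one per open file, plus one while plugged in */
    bool disconnected;  /* set when the hardware is unplugged */
};

/* Counts freed devices so the delayed release can be observed. */
static int devices_freed;

static struct ghost_dev *dev_create(void)
{
    struct ghost_dev *dev = calloc(1, sizeof(*dev));
    if (dev)
        dev->refcount = 1;  /* reference held while the device is present */
    return dev;
}

/* Drop one reference; the last one frees the object. */
static void dev_put(struct ghost_dev *dev)
{
    assert(dev->refcount > 0);
    if (--dev->refcount == 0) {
        free(dev);
        devices_freed++;
    }
}

static int dev_open(struct ghost_dev *dev)
{
    if (dev->disconnected)
        return -ENODEV;  /* no new users for a removed device */
    dev->refcount++;
    return 0;
}

static void dev_close(struct ghost_dev *dev)
{
    dev_put(dev);
}

/* Unplug: mark the device as gone and drop the "present" reference.
 * The memory survives as a ghost until the last user closes it. */
static void dev_unplug(struct ghost_dev *dev)
{
    dev->disconnected = true;
    dev_put(dev);
}

/* Assumption: a disconnected device reports -ENODEV to its users. */
static int dev_read(const struct ghost_dev *dev)
{
    return dev->disconnected ? -ENODEV : 0;
}
```

The free happens at the last dev_close(), not at dev_unplug(); that delayed release is the lifecycle change point 3 says would shape the sysfs layout, and it is why the workaround is hard to fence off with #ifdefs.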

Bye,

-- 
Oron Peled                                 Voice: +972-4-8228492
oron at actcom.co.il                  http://users.actcom.co.il/~oron
But it does move!
                -- Galileo Galilei


