[asterisk-dev] Pthread wrapper updates

SF Markus Elfring elfring at users.sourceforge.net
Sat Nov 4 13:10:03 MST 2006


> Yes, of course, the box will slowly grind to a halt. In the meanwhile 
> the threads that already have their
> resources allocated can continue providing several minutes of possibly 
> invaluable service.
I see that you assume that some of the remaining threads can still perform 
useful work. Well, I can imagine that a few may be independent of the 
rest for an unknown length of time.
Would you like to put stuck threads into a "garbage queue"? How many can 
the system tolerate? How long would you like to try to delay the 
abnormal program termination?

I guess that there are two possibilities if a lock or unlock function 
returns a non-zero value (error code).
1. A fundamental programming error exists in the direct Pthread API 
calls. (I hope that this case does not occur in the source files.)

2. The Pthread functions were called correctly by the source code, but 
they refuse the intended operation because their consistency checks 
fail: internal data structures were changed by a flaw such as a 
buffer overflow. How much can the execution environment be trusted to 
achieve any meaningful result after such an unexpected situation?
(How do you cope with effects on memory chips from radiation of passing 
stars in outer space?)
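
To make the first case concrete, here is a minimal sketch in C. It assumes 
a POSIX system; the helper names (init_errorcheck_mutex, check_or_abort, 
LOCK_OR_ABORT, UNLOCK_OR_ABORT) are made up for this mail and are not taken 
from the Asterisk sources. An error-checking mutex reports misuse through 
its return code, and the wrapper shows one possible policy (abort on any 
non-zero result), not the only one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for PTHREAD_MUTEX_ERRORCHECK */
#endif
#include <assert.h>
#include <errno.h>    /* EDEADLK, EPERM: codes a caller may see */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: create a mutex that reports misuse via return codes. */
static int init_errorcheck_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc;

    assert(m != NULL);
    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical policy: every non-zero result is treated as fatal. */
static void check_or_abort(int rc, const char *call, const char *file, int line)
{
    if (rc != 0) {
        fprintf(stderr, "%s:%d: %s failed: %s\n",
                file, line, call, strerror(rc));
        abort();
    }
}

#define LOCK_OR_ABORT(m) \
    check_or_abort(pthread_mutex_lock(m), "pthread_mutex_lock", \
                   __FILE__, __LINE__)
#define UNLOCK_OR_ABORT(m) \
    check_or_abort(pthread_mutex_unlock(m), "pthread_mutex_unlock", \
                   __FILE__, __LINE__)
```

With such a mutex, locking it a second time from the same thread returns 
EDEADLK, and unlocking it while it is not held returns EPERM. Both are 
programming errors of the first kind. The second kind (damaged internal 
data) cannot be detected reliably this way.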


> Yes, absolutely, that's always the best option and we should strive for 
> it. However, as I said,
> this is engineering, so we need to be tolerant of faults if we can.
Would you like to fine-tune the error tolerance?


> I spent a couple of years working on projects for the European Space 
> Agency, where you can't easily reboot because
> your code is several light seconds away. In that environment you keep 
> the coms/cmd channel working at all costs.
It would be very interesting to hear about experiences from such a 
special application area.
Do the costs include the transmission of trash data if mutual exclusion 
was damaged in an embedded computing process?

I guess that the engineers care about correctness to avoid multi-threading 
mistakes like incorrect handling of priority inversion.
http://research.microsoft.com/~mbj/Mars_Pathfinder/Authoritative_Account.html

How much processing capacity do you have in such satellite systems?

Are there any safeguards that would trigger an abort and corresponding 
software restart?

Regards,
Markus

