Experimental Physics and
Industrial Control System

"Schoeneburg, Bernd" <[email protected]> · Thu, 19 Feb 2009 19:07:46 +0100

Hi Bernd,

On Tuesday 17 February 2009 13:43:33 Schoeneburg, Bernd wrote:

*2. Alarm Disable Field in all Records*
If a record raises an alarm because of a "known bug" or in case of
maintenance or special operating conditions, it would be a very smart
solution to disable this alarm at the source. Otherwise all connected
clients must mask this alarm individually. Setting all severities to
"NO_ALARM" does not help if the alarm is set by the device support.
Setting a new disable-field to "DISABLE" would set SEVR to "NO_ALARM"
and STAT to something like "ALMDIS" or so. Changing the disable field
would need some special function which sends out the change. If
disabled, the alarm bit in the monitor mask is ignored and the
logAlarms-hook-function returns without doing anything. This change is
useful if using the conventional alarm handler as well as the new AMS
used by CSS.

This idea needs some discussion.  There is a related suggestion in my
list called "OFFLINE Alarm Status" which takes a different approach,
requiring device support to set an Invalid/Offline alarm instead.  I
don't know whether using a NO_ALARM severity with a different status
would work properly with alarm acknowledgments.

This looks to me like a solution for a different problem. If a record
has its alarm disabled, other records should still be able to read data
from it and the severity (if MS of the inlink is set) should not become
invalid. Nothing is really offline.
In an alarm tree display, the root of the tree shows the highest
severity of the leaves. An alarm-disabled record therefore must have
NO_ALARM severity so that it does not interfere with the status of the
root. Why is acknowledgment necessary if the severity is NO_ALARM?

I agree that the two are not really the same.  I may have been wrong about the 
acknowledgment issue, but I have just looked at the code for the alarm 
handler; adding a new alarm status value will require rebuilding ALH at the 
same time; old versions will report alarms with any new status values that 
get added as a WRITE_ACCESS_ALARM (the highest existing alarm status value).
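
A minimal sketch of that fallback, assuming a client built against a 22-entry status table simply clamps any larger value to its last entry. The names CLIENT_STATUS_COUNT and clamp_status are hypothetical and are not taken from the ALH source:

```c
/* Number of alarm status values the client knew about when it was built
 * (22 in this discussion, so index 21 is WRITE_ACCESS_ALARM). */
#define CLIENT_STATUS_COUNT 22u

/* Hypothetical helper: map a status index received from an IOC onto an
 * index that the client's own string table can hold. Unknown (newer)
 * values collapse onto the highest status the client knows. */
static unsigned clamp_status(unsigned stat)
{
    return stat < CLIENT_STATUS_COUNT ? stat : CLIENT_STATUS_COUNT - 1u;
}
```

With this behavior a new status such as index 22 would be displayed as index 21, which is why an old client and a new IOC can disagree about what an alarm means.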

It is not possible for a CA client program to request a complete list of alarm 
status names through CA, because the protocol can only transport at most 16 
status strings for any ENUM field, and we already have 22 alarm status values.  
The strings used come from an array in libCom, but of course there is no 
guarantee that the libCom that ALH was built with is the same release as the 
libCom that the IOC is using, thus incompatibility is possible.

I'm not saying we can't add new status values, but it does have implications 
that must be made clear if we do so.

*4. Device Status Field in all Records*
If the device support encounters a problem, it has only limited
possibilities to report that to the operators. Alarms such as "READ" or
"COMM" could be raised. Detailed information can only be sent to a log
file. A string-type field like "DSTA" would be a destination for such
messages, which could then be shown by any graphic display.

We support writing to a central logServer for this kind of information. 
I agree that it's not as convenient as putting it in a field, but 40
characters for an error message really isn't very many, and adding 40
bytes to the size of every record would be impossible for some smaller
embedded IOCs.

It may be that fewer than 40 characters are sufficient. Device specific
short error codes which can be documented in a manual could be enough. A
long description in the record field is not necessary.

I still think outputting to a logServer, possibly with a specialized client 
that filters the logged messages, would be much better suited to this.  If a 
device support running at 10Hz generates a different problem report each time 
it processes, you're not going to be able to read the status codes off a GUI 
display anyway.

*5. get_alarm_double should not return unused limits*
This has to be explained: When a display widget comes up, it may
need some values from the record, like display limits, control limits
and alarm limits. Alarm limits are needed on a bargraph widget to mark
some positions with arrows or lines. If the corresponding alarm is not
used, its severity is set to "NO_ALARM". This means that crossing this
limit does nothing! Especially when the EGU range includes zero (EGUL <
0, EGUF > 0) the arrows are misleading.
The solution: If the alarm is not used, the record's function
"get_alarm_double" should return "NaN" for this limit. In this way the
client knows not to display the limit.
Direct access to the limit fields is not affected.

I agree the current requests for compound types that return alarm
information are of little practical use to a display widget because they
don't account for the severity that applies to each alarm level.  However
I'd like to know what should be returned if I set HSV to INVALID_ALARM,
or if I set both HSV and HHSV to MINOR_ALARM?  Also the use of NaN values
doesn't solve what should happen with the DBR_GR_LONG and analogous CA
types that have integer warning levels; get_alarm_double() requests the
values from the record as doubles, but dbAccess subsequently converts
them to integers if necessary.

It may be my fault, but I don't understand the first problem. I don't
care about the severities except for checking whether they are NO_ALARM.
The "arrows" in the display widget are useless if the corresponding
severity is NO_ALARM because the code of the record (ai, longin, ...) does
not compare the value with the limits if the severity is 0 (NO_ALARM).
HIHI and LOLO are always compared prior to HIGH and LOW, and the function
returns if one of them applies.

Most record types have code like this in their get_alarm_double() routine, 
conditional on the field being VAL:
	pad->upper_alarm_limit   = prec->hihi;
	pad->upper_warning_limit = prec->high;
	pad->lower_warning_limit = prec->low;
	pad->lower_alarm_limit   = prec->lolo;
My point is what does "warning" vs "alarm" mean in those names?  To me it 
would make more sense if the "warning" limits get set by whichever alarm 
limit is configured to be MINOR_ALARM and the "alarm" limits by the limit set 
to MAJOR_ALARM, but then what should we do if HHSV is INVALID_ALARM rather 
than MAJOR_ALARM?

The CA structures don't transport enough information for the display widget to 
indicate the alarm severity that the limit relates to, thus I'm generalizing 
your problem to say that we need to handle more than just NO_ALARM severity.  
Using a NaN value only solves a part of the larger problem.
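
For reference, the NaN proposal under discussion could be sketched roughly as follows. This is an illustration only: the struct and function names are stand-ins invented here, not the real record or dbAccess definitions, and it deliberately handles nothing beyond the NO_ALARM case.

```c
#include <math.h>

/* Stand-ins for the alarm severities; NO_ALARM is 0 as noted above. */
enum severity { NO_ALARM = 0, MINOR_ALARM, MAJOR_ALARM, INVALID_ALARM };

/* Hypothetical cut-down record: four limits and their severities. */
struct limit_rec {
    double hihi, high, low, lolo;
    enum severity hhsv, hsv, lsv, llsv;
};

/* Hypothetical cut-down result structure with the four limit slots. */
struct alarm_limits {
    double upper_alarm_limit;
    double upper_warning_limit;
    double lower_warning_limit;
    double lower_alarm_limit;
};

/* Report a limit only if crossing it can actually raise an alarm. */
static double limit_or_nan(double limit, enum severity sevr)
{
    return sevr == NO_ALARM ? NAN : limit;
}

static void fill_alarm_limits(const struct limit_rec *prec,
                              struct alarm_limits *pad)
{
    pad->upper_alarm_limit   = limit_or_nan(prec->hihi, prec->hhsv);
    pad->upper_warning_limit = limit_or_nan(prec->high, prec->hsv);
    pad->lower_warning_limit = limit_or_nan(prec->low,  prec->lsv);
    pad->lower_alarm_limit   = limit_or_nan(prec->lolo, prec->llsv);
}
```

Note that the sketch still places HIHI in the "alarm" slot and HIGH in the "warning" slot regardless of whether their severities are MINOR, MAJOR or INVALID, so the question of what those slot names mean remains open.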

For the second problem (dbAccess converts back to INT)... - well, no
idea. Does anyone use bargraphs of integer values?

CA supports versions of those structures with char, short or long for the 
limit values.  I doubt if anyone actually uses a struct dbr_gr_char anywhere 
in real life, but they might and we would have to decide what its behavior 
should be if we make this change.

*9. EPICS General Time: Allow more than one NTP Server*
It may be that this is not related to EPICS Base. The epicsGeneralTime
facility uses OS-dependent services. I think osdNTPGet is the place
where an extension would be necessary, if alternative NTP servers are
needed. But nevertheless I mention it here. If the desired server (or
the network path to it) fails, a second server would be better. The
changes I once made to the tsDrv in 3.13.x provided for a list of servers
in an environment variable. Perhaps this is possible now too.

General time allows you to register your own time providers, so it should
be quite easy to add providers for additional NTP servers.  The
osiNTPTime.c provider in libCom can't talk to multiple servers at the
moment because the NTP client code in RTEMS wasn't written with that in
mind, but we'd be happy to see a better NTP time provider if someone
wants to write one (preferably one with a built-in PLL for doing slow
adjustments).  This doesn't have to be part of Base, although eventually
it could be incorporated into Base.
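
To illustrate the idea of several providers with fallback, here is a small self-contained sketch. It does not use the general time API; the types, the provider names, the stub functions and the "lower number is tried first" ordering are all assumptions made up for this example.

```c
#include <stddef.h>

/* Hypothetical provider callback: returns 0 and fills *seconds on success. */
typedef int (*get_time_fn)(double *seconds);

struct time_provider {
    const char *name;
    int priority;            /* lower number is tried first (assumption) */
    get_time_fn get_time;
};

/* Stub: a primary NTP server that cannot be reached. */
static int primary_get_time(double *seconds)
{
    (void)seconds;
    return -1;
}

/* Stub: a backup NTP server answering with a fixed, made-up timestamp. */
static int backup_get_time(double *seconds)
{
    *seconds = 1235066866.0;
    return 0;
}

/* Providers listed in ascending priority order. */
static const struct time_provider providers[] = {
    { "ntp-primary", 10, primary_get_time },
    { "ntp-backup",  20, backup_get_time  },
};

/* Return the first provider that delivers a time, or NULL if none does. */
static const struct time_provider *first_working(
    const struct time_provider *list, size_t count, double *seconds)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (list[i].get_time(seconds) == 0)
            return &list[i];
    }
    return NULL;
}
```

Calling first_working(providers, 2, &now) here skips the unreachable primary and returns the backup entry. A real provider would also need the slow clock adjustment mentioned above, which this sketch leaves out.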

I once wrote an improved NTP support, including a PLL and multiple
server support, for the old tsDrv. Perhaps we could use parts of this
code. Matthias gave a talk about it at the EPICS meeting in June 2006. I
just found my presentation.

That would certainly be an interesting addition and definitely appropriate to 
work on at the Codeathon.  Another source of relevant software could be Larry 
Doolittle's 'ntpclient' package which he has told me he might be willing to 
re-license for use with EPICS (it's currently GPLv2, so couldn't be shipped 
with Base).

*10. Make "Smoothing" in aiRecord independent of Conversion*

The SMOO field in the aiRecord is evaluated in the convert function,
which is not executed if the device support read function returns 2
(don't convert). If the device delivers a float value directly, I do not
see a reason not to smooth it. Can we take the smoothing code out of
convert()?

That's a good question: Is smoothing part of the analog to digital conversion 
process, or should we make it more general?  I think you're probably right 
and that we should make that change (Eric agrees), but it does have the 
capability to break existing databases that might have SMOO set when they're 
not actually using it, or where someone is using that field for a completely 
unrelated purpose (since it's not used by their record at the moment).
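
As a sketch of what moving the smoothing out of convert() could look like, here is a standalone helper. It assumes the usual first-order filter in which SMOO lies between 0 and 1 and weights the previous value; the function name smooth_value is hypothetical and this is not the actual aiRecord code.

```c
/* Hypothetical helper: blend a new reading with the previous value.
 * smoo == 0 means no smoothing (the new reading is used as-is);
 * values closer to 1 give more weight to the previous value. */
static double smooth_value(double previous, double reading, double smoo)
{
    if (smoo == 0.0)
        return reading;
    return reading * (1.0 - smoo) + previous * smoo;
}
```

A helper like this could be called whether or not the device support asked for conversion, which is exactly why databases that have SMOO set without currently using it would change behavior.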
