Experimental Physics and
Industrial Control System

Andrew Johnson <[email protected]> · Wed, 18 Feb 2009 11:56:42 -0600

Hi Bernd,

On Tuesday 17 February 2009 13:43:33 Schoeneburg, Bernd wrote:
>
> >> *2. Alarm Disable Field in all Records*
> >> If a record raises an alarm because of a "known bug" or in case of
> >> maintenance or special operating conditions, it would be a very smart
> >> solution to disable this alarm at the source. Otherwise all connected
> >> clients must mask this alarm individually. Setting all severities to
> >> "NO_ALARM" does not help if the alarm is set by the device support.
> >> Setting a new disable-field to "DISABLE" would set SEVR to "NO_ALARM"
> >> and STAT to something like "ALMDIS" or so. Changing the disable field
> >> would need some special function which sends out the change. If
> >> disabled, the alarm bit in the monitor mask is ignored and the
> >> logAlarms-hook-function returns without doing anything. This change is
> >> useful if using the conventional alarm handler as well as the new AMS
> >> used by CSS.
> >
> > This idea needs some discussion.  There is a related suggestion in my
> > list called "OFFLINE Alarm Status" which takes a different approach,
> > requiring device support to set an Invalid/Offline alarm instead.  I
> > don't know whether using a NO_ALARM severity with a different status
> > would work properly with alarm acknowledgments.
>
> This looks to me like a solution for a different problem. If a record
> has its alarm disabled, other records should still be able to read data
> from it and the severity (if MS of the inlink is set) should not become
> invalid. Nothing is really offline.
> In an alarm tree display, the root of the tree shows the highest
> severity of the leaves. An alarm-disabled record therefore must have
> NO_ALARM severity so as not to interfere with the status of the root. Why
> is acknowledgment necessary if the severity is NO_ALARM?

I agree that the two are not really the same.  I may have been wrong about the 
acknowledgment issue, but I have just looked at the code for the alarm 
handler; adding a new alarm status value will require rebuilding ALH at the 
same time; old versions will report alarms with any new status values that 
get added as a WRITE_ACCESS_ALARM (the highest existing alarm status value).

It is not possible for a CA client program to request a complete list of alarm 
status names through CA, because the protocol can only transport at most 16 
status strings for any ENUM field, and we already have 22 alarm status values.  
The strings used come from an array in libCom, but of course there is no 
guarantee that the libCom that ALH was built with is the same release as the 
libCom that the IOC is using, thus incompatibility is possible.
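
To make the two effects concrete, here is a minimal sketch. The constants and function names are hypothetical stand-ins, not the actual CA or ALH code; the numbers 16 and 22 are the ones quoted above, and the clamping simply mirrors the behaviour I described for an old ALH build.

```c
#include <stddef.h>

/* Hypothetical constants for illustration only; the real values live in
 * the EPICS headers and may differ between releases. */
#define SKETCH_MAX_ENUM_STRINGS 16   /* most strings CA carries for an ENUM */
#define SKETCH_KNOWN_STATUSES   22   /* status values known at build time */

/* How many status names would reach a client through one ENUM request. */
static size_t names_visible_via_ca(size_t n_status)
{
    return n_status < SKETCH_MAX_ENUM_STRINGS ? n_status
                                              : SKETCH_MAX_ENUM_STRINGS;
}

/* Map a status number sent by the IOC onto the client's compiled-in table.
 * Anything beyond the table is reported as the last known entry. */
static unsigned client_status_index(unsigned status_from_ioc)
{
    const unsigned last_known = SKETCH_KNOWN_STATUSES - 1;
    return status_from_ioc > last_known ? last_known : status_from_ioc;
}
```

With these numbers a client asking for the names through CA would see only the first 16, and a status added as number 22 would show up under the name of number 21.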

I'm not saying we can't add new status values, but it does have implications 
that must be made clear if we do so.

> >> *3. Alarm Delay Field in all Records*
> >> Besides the alarm limits we have a "HYST" in the analog records. Why not
> >> also a "DLY" (delay)? Some values are very unstable and have some noise
> >> or overshoot. So a short time over/under the limit should be allowed
> >> sometimes. Implementing this in the IOC has two advantages: 1. It is
> >> analogous to "HYST", which is a value-related hysteresis. "DLY" is a
> >> time-related hysteresis. 2. Sending a lot of good/no-good messages to
> >> the clients could be avoided. This change is not so easy, many things
> >> must be considered.
> >
> > Any new fields needed can be added to each record type as they are added
> > to the support code; they should not be added to dbCommon.dbd though.  I
> > would like to see a clear description of how the final alarm code will
> > work, but I have no particular objections to these ideas being
> > implemented in the Base record types.
>
> OK, I assume that the reason for not putting it in dbCommon.dbd is that
> this would apply to all records even if their record support does not
> support the new function, which may be confusing, right?

Right.

> I will try to make a description - it's not so easy.

It might help to take a look at the way the alarm calculations are described 
in the opening chapters of the Record Reference Manual, and maybe rewrite 
them for the new behavior.  We'll need to change those descriptions anyway 
eventually, so that might be a good way to start.

> >> *4. Device Status Field in all Records*
> >> If the device support encounters a problem, it has only limited
> >> possibilities to say that to the operators. Alarms could be raised like
> >> "READ" or "COMM". Detailed information can only be sent to a log-file. A
> >> string-type field like "DSTA" would be a destination for such messages,
> >> that could be displayed by each graphic display.
> >
> > We support writing to a central logServer for this kind of information. 
> > I agree that it's not as convenient as putting it in a field, but 40
> > characters for an error message really isn't very many, and adding 40
> > bytes to the size of every record would be impossible for some smaller
> > embedded IOCs.
>
> Maybe fewer than 40 characters are sufficient. Device specific
> short error codes which can be documented in a manual could be enough. A
> long description in the record field is not necessary.

I still think outputting to a logServer, possibly with a specialized client 
that filters the logged messages, would be much better suited to this.  If a 
device support running at 10Hz generates a different problem report each time 
it processes, you're not going to be able to read the status codes off a GUI 
display anyway.

> >> *5. get_alarm_double should not return unused limits*
> >> This has to be explained: When a display widget is coming up, it may
> >> need some values from the record, like display limits, control limits
> >> and alarm limits. Alarm limits are needed on a bargraph widget to mark
> >> some positions with arrows or lines. If the corresponding alarm is not
> >> used, its severity is set to "NO_ALARM". This means crossing this
> >> limit does nothing! Especially when the EGU-range includes zero (EGUL <
> >> 0, EGUF > 0) the arrows are misleading.
> >> The solution: If the alarm is not used, the record's function
> >> "get_alarm_double" should return "NaN" for this limit. In this way the
> >> client knows not to display the limit.
> >> The direct access to the limit fields is not affected.
> >
> > I agree the current requests for compound types that return alarm
> > information are of little practical use to a display widget because they
> > don't account for the severity that applies to each alarm level.  However
> > I'd like to know what should be returned if I set HSV to INVALID_ALARM,
> > or if I set both HSV and HHSV to MINOR_ALARM?  Also the use of NaN values
> > doesn't solve what should happen with the DBR_GR_LONG and analogous CA
> > types that have integer warning levels — get_alarm_double() requests the
> > values from the record as doubles, but dbAccess subsequently converts
> > them to integers if necessary.
>
> It may be my fault, but I don't understand the first problem. I don't
> care about the severities except for checking whether they are NO_ALARM.
> The "arrows" in the display widget are useless if the corresponding
> severity is NO_ALARM because the code of the record (ai, longin,..) does
> not compare the value with the limits if the severity is 0 (NO_ALARM).
> HIHI and LOLO are always compared prior to HIGH and LOW and the function
> returns if one of them applies.

Most record types have code like this in their get_alarm_double() routine, 
conditional on the field being VAL:
	pad->upper_alarm_limit   = prec->hihi;
	pad->upper_warning_limit = prec->high;
	pad->lower_warning_limit = prec->low;
	pad->lower_alarm_limit   = prec->lolo;
My point is: what do "warning" and "alarm" mean in those names?  To me it 
would make more sense if the "warning" limits get set by whichever alarm 
limit is configured to be MINOR_ALARM and the "alarm" limits by the limit set 
to MAJOR_ALARM, but then what should we do if HHSV is INVALID_ALARM rather 
than MAJOR_ALARM?

The CA structures don't transport enough information for the display widget to 
indicate the alarm severity that the limit relates to, thus I'm generalizing 
your problem to say that we need to handle more than just NO_ALARM severity.  
Using a NaN value only solves a part of the larger problem.
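
As a rough sketch of the severity-based mapping I have in mind (not a proposal for the final code), something like the following would fill the "warning" limits from whichever limit is MINOR and the "alarm" limits from whichever is MAJOR, returning NaN when nothing matches. The types, names and the rule of preferring the inner limit when both carry the same severity are all assumptions made for illustration, and the INVALID case is deliberately left as NaN because that question is still open.

```c
#include <math.h>

/* Hypothetical severity codes and structures, for illustration only. */
enum sketch_sevr { SK_NO_ALARM, SK_MINOR, SK_MAJOR, SK_INVALID };

struct sketch_limits {
    double hihi, high, low, lolo;
    enum sketch_sevr hhsv, hsv, lsv, llsv;
};

struct sketch_alarm_double {
    double upper_alarm_limit, upper_warning_limit;
    double lower_warning_limit, lower_alarm_limit;
};

/* Return the limit whose severity equals `want`, trying the inner limit
 * (HIGH or LOW) before the outer one (HIHI or LOLO). NaN means that no
 * limit is configured with that severity. */
static double pick(enum sketch_sevr want,
                   double inner, enum sketch_sevr inner_sevr,
                   double outer, enum sketch_sevr outer_sevr)
{
    if (inner_sevr == want) return inner;
    if (outer_sevr == want) return outer;
    return NAN;
}

static void sketch_get_alarm_double(const struct sketch_limits *p,
                                    struct sketch_alarm_double *pad)
{
    pad->upper_warning_limit = pick(SK_MINOR, p->high, p->hsv, p->hihi, p->hhsv);
    pad->upper_alarm_limit   = pick(SK_MAJOR, p->high, p->hsv, p->hihi, p->hhsv);
    pad->lower_warning_limit = pick(SK_MINOR, p->low,  p->lsv, p->lolo, p->llsv);
    pad->lower_alarm_limit   = pick(SK_MAJOR, p->low,  p->lsv, p->lolo, p->llsv);
}
```

Note that a limit set to INVALID_ALARM disappears entirely in this sketch, which is exactly the gap I am pointing at.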

> For the second problem (dbAccess converts back to INT)... - well, no
> idea. Does anyone use bargraphs of integer values?

CA supports versions of those structures with char, short or long for the 
limit values.  I doubt if anyone actually uses a struct dbr_gr_char anywhere 
in real life, but they might and we would have to decide what its behavior 
should be if we make this change.

> >> *9. Epics General Time: Allow more than one NTP Server*
> >> It may be that this is not related to EPICS Base. The epicsGeneralTime
> >> facility is using OS-dependent services. I think osdNTPGet is the place
> >> where an extension would be necessary, if alternative NTP servers are
> >> needed. But nevertheless I mention it here. If the desired server (or
> >> the network path to it) fails, a second server would be better. In my
> >> changes I once did for the tsDrv in 3.13.x a list of servers in the
> >> environment variable was foreseen. Perhaps this is possible too now.
> >
> > General time allows you to register your own time providers, so it should
> > be quite easy to add providers for additional NTP servers.  The
> > osiNTPTime.c provider in libCom can't talk to multiple servers at the
> > moment because the NTP client code in RTEMS wasn't written with that in
> > mind, but we'd be happy to see a better NTP time provider if someone
> > wants to write one (preferably one with a built-in PLL for doing slow
> > adjustments).  This doesn't have to be part of Base, although eventually
> > it could be incorporated into Base.
>
> I did once write an improved NTP support including PLL with multiple
> server support for the old tsDrv. Perhaps we could use parts of this
> code. Matthias gave a talk about it at the EPICS meeting in June 2006. I
> just found my presentation.

That would certainly be an interesting addition and definitely appropriate to 
work on at the Codeathon.  Another source of relevant software could be Larry 
Doolittle's 'ntpclient' package which he has told me he might be willing to 
re-license for use with EPICS (it's currently GPLv2, so couldn't be shipped 
with Base).
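
For the multiple-server case, the fallback idea could be sketched roughly as below: try each provider in priority order and use the first one that answers. None of this is the General Time API; the types, the two stub functions standing in for real NTP queries and the "lower number first" ordering are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical provider interface: returns 0 on success. */
typedef int (*sketch_time_fn)(double *seconds);

struct sketch_provider {
    const char    *name;
    int            priority;   /* table is kept sorted, lowest number first */
    sketch_time_fn get_time;
};

/* Stand-ins for real NTP queries: one server is down, one answers. */
static int sketch_ntp_down(double *seconds)
{
    (void)seconds;
    return -1;
}

static int sketch_ntp_backup(double *seconds)
{
    *seconds = 1234.5;   /* fixed value so the example is repeatable */
    return 0;
}

/* Ask each provider in table order and stop at the first success.
 * Returns the provider that answered, or NULL if every one failed. */
static const struct sketch_provider *
sketch_current_time(const struct sketch_provider *table, size_t n,
                    double *seconds)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (table[i].get_time(seconds) == 0)
            return &table[i];
    }
    return NULL;
}
```

A table holding the failed primary followed by the backup would return the backup's time, which is the behaviour Bernd is asking for.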

> 10. Make "Smoothing" in aiRecord independent of Conversion
>
> The SMOO field in the aiRecord is evaluated in the convert function,
> which is not executed if the device support read function returns 2
> (don't convert). If the device delivers a float value directly, I do not
> see a reason not to smooth it. Can we take the smoothing code out of
> convert()?

That's a good question: Is smoothing part of the analog to digital conversion 
process, or should we make it more general?  I think you're probably right 
and that we should make that change (Eric agrees), but it does have the 
capability to break existing databases that might have SMOO set when they're 
not actually using it, or where someone is using that field for a completely 
unrelated purpose (since it's not used by their record at the moment).
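
For reference, a minimal sketch of what a conversion-independent smoothing step might look like, assuming the usual first-order weighted average between the new reading and the previous value. The function names and the first-sample handling are hypothetical, not the existing aiRecord code.

```c
/* One step of first-order smoothing. smoo = 0 passes the new reading
 * through unchanged; values closer to 1 weight the previous value more. */
static double sketch_smooth(double previous, double reading, double smoo)
{
    return reading * (1.0 - smoo) + previous * smoo;
}

/* Hypothetical step run after every read, whether or not the device
 * support asked for conversion. The first sample is taken as-is because
 * there is no previous value to average with yet. */
static double sketch_apply_smoothing(double previous, double reading,
                                     double smoo, int first_sample)
{
    if (smoo == 0.0 || first_sample)
        return reading;
    return sketch_smooth(previous, reading, smoo);
}
```

A database that has SMOO set on a record whose device support returns 2 would start seeing filtered values with a step like this, which is the compatibility risk mentioned above.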

Thanks,

- Andrew
-- 
The best FOSS code is written to be read by other humans -- Harald Welte
