Experimental Physics and
Industrial Control System

"Jeff Hill" <[email protected]> · Thu, 7 Oct 2004 16:42:32 -0600

I created mantis entry 144.

The original ANSI/ISO C did not have the isnan() function, but I believe
that isnan() *is* in ISO C99.

Since NAN isn't within the drive limits, it seems functionally
consistent to reject NAN values written into the VAL field of an AO
record. Rejecting NAN writes also appears prudent from the perspective of
generic database field modification, given that most records do not
expect NAN in any of their fields and probably won't operate correctly
when NAN values are present. Defensively coding the records to deal with
NAN appearing at random in any of many database fields would probably be
prohibitively expensive, and would be implemented non-uniformly.
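A minimal sketch of the write-time check proposed above. It assumes a hypothetical hook `ao_check_put()` that runs before a put is stored into an AO record's VAL field; this is not the actual EPICS record-support API. For illustration only, it also assumes DRVL/DRVH are enforced only when DRVH > DRVL and that out-of-range values are clamped.

```c
#include <math.h>   /* isnan(), C99 */

/* Hypothetical status codes for this sketch (not EPICS status values). */
enum { WRITE_OK = 0, WRITE_REJECTED_NAN = -1 };

/* Hypothetical hook run before a put is stored into an AO record's VAL.
 * drvl/drvh stand in for the DRVL/DRVH drive limits. Assumption: limits
 * apply only when drvh > drvl, and out-of-range values are clamped. */
static int ao_check_put(double *val, double drvl, double drvh)
{
    /* Test for NaN first: every ordered comparison with NaN is false,
     * so the limit checks below would let it through unchanged. */
    if (isnan(*val))
        return WRITE_REJECTED_NAN;

    if (drvh > drvl) {
        if (*val > drvh)
            *val = drvh;
        else if (*val < drvl)
            *val = drvl;
    }
    return WRITE_OK;
}
```

The NaN test has to come before the limit comparisons, because a NaN fails both of them and would otherwise reach VAL untouched.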

Perhaps we should also consider rejecting NAN values in ca_put, but is
that going too far? Would there ever be a valid reason to allow a CA client
to write NAN into a specialized PV? Or is this an issue specific to server
plug-ins?

Jeff

> -----Original Message-----
> From: Kay-Uwe Kasemir [mailto:[email protected]]
> Sent: Thursday, October 07, 2004 3:52 PM
> To: EPICS Tech Talk
> Subject: Not-a-number (nan) issues
> 
> Greetings!
> 
> We experienced considerable trouble at the SNS as a result of a
> save/restore-type program writing NaN (not-a-number)
> values to process variables all across the linac.
> 
> There are actually several bit patterns for floating-point
> values that are considered "NaN".
> Accidentally filling a double with 0xFF bytes would result
> in a NaN, but in our case it was the "official" IEEE NaN
> pattern that was sent to the PVs.
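To make the bit-pattern remark concrete, the following sketch assumes IEEE 754 binary64 doubles and classifies raw 64-bit patterns; the helper names are illustrative. A double is NaN when all 11 exponent bits are ones and the 52-bit fraction is nonzero, so all-0xFF bytes qualify, as does the commonly used quiet-NaN pattern 0x7FF8000000000000.

```c
#include <math.h>    /* isnan(), to cross-check the bit-level test */
#include <stdint.h>
#include <string.h>

/* Reinterpret a raw 64-bit pattern as a double (assumes IEEE 754 binary64).
 * memcpy avoids the strict-aliasing problems of pointer casts. */
static double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* NaN: exponent bits all ones and a nonzero fraction. */
static int bits_are_nan(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FFu;
    uint64_t fraction = bits & 0x000FFFFFFFFFFFFFull;
    return exponent == 0x7FF && fraction != 0;
}
```

For example, bits_are_nan(0xFFFFFFFFFFFFFFFF) is true, while 0x7FF0000000000000 (positive infinity) has a zero fraction and is not a NaN.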
> 
> The root cause has hopefully been fixed:
> That save/restore program, meant to take snapshots of the machine state
> similar to "burt" et al., logged "NaN" for PVs to which it couldn't
> connect for various reasons, and later simply wrote those "NaN"s back
> when restoring the snapshot file.
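A restore step can guard against this by refusing to write back entries that parse to NaN. The sketch below assumes one text value per PV; `restore_value()` is a hypothetical helper, and it relies on C99 strtod() accepting "nan" in any letter case.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical restore helper: parse one saved value and decide whether
 * it should be written back. Returns 1 and stores the value in *out if it
 * is usable, 0 if the entry should be skipped. C99 strtod() parses
 * "NaN"/"nan", so a placeholder logged for an unreachable PV becomes a
 * real NaN here and must be filtered out. */
static int restore_value(const char *text, double *out)
{
    char *end;
    double v = strtod(text, &end);

    if (end == text)   /* nothing parseable at all */
        return 0;
    if (isnan(v))      /* "could not connect" placeholder */
        return 0;
    *out = v;
    return 1;
}
```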
> 
> Problems arose because of the way NaN is handled by C/C++:
>     NaN + xxx  ->  NaN (same for -, *, /)
>     NaN < xxx  -> false
>     NaN > xxx  -> false
>     NaN == xxx -> false
>     NaN != xxx -> true
>     (long)NaN == 0x80000000
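The rules listed above can be checked with a short C99 sketch; it assumes IEEE 754 arithmetic with no -ffast-math-style options. It leaves out the (long)NaN case on purpose: converting NaN to an integer type is undefined behavior in C, and 0x80000000 is simply what x86 hardware typically returns for a 32-bit conversion.

```c
#include <math.h>
#include <stdbool.h>

/* Results of applying C operators to NaN and an ordinary value x. */
struct nan_rules {
    bool sum_is_nan;   /* NaN + x    -> NaN   */
    bool less;         /* NaN <  x   -> false */
    bool greater;      /* NaN >  x   -> false */
    bool equal;        /* NaN == x   -> false */
    bool not_equal;    /* NaN != x   -> true  */
    bool self_equal;   /* NaN == NaN -> false */
};

static struct nan_rules check_nan_rules(double x)
{
    volatile double n = NAN;   /* volatile keeps the compiler from folding */
    struct nan_rules r;

    r.sum_is_nan = isnan(n + x) != 0;
    r.less       = n < x;
    r.greater    = n > x;
    r.equal      = n == x;
    r.not_equal  = n != x;
    r.self_equal = n == n;
    return r;
}
```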
> Other languages may behave similarly; in any case, these were
> the symptoms on the affected EPICS IOCs:
> 
> On analog records, all checks against HOPR, LOPR, DRVH, DRVL
> fail for NaN.
> -> the value is accepted even though you might think
>     you've locked "bad" values out.
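Limit checks written as "reject if outside" cannot catch NaN, because each comparison is false. A sketch with hypothetical helper names: the first function mirrors that style and passes NaN straight through, while the second phrases the same limits as "accept only if inside," which refuses NaN without a separate isnan() call.

```c
#include <math.h>      /* NAN, for trying these helpers */
#include <stdbool.h>

/* Clamp written with "reject if outside" comparisons. Both tests are
 * false for NaN, so NaN is returned unchanged. */
static double clamp_naive(double val, double lo, double hi)
{
    if (val > hi) return hi;
    if (val < lo) return lo;
    return val;
}

/* Same limits phrased as "accept only if inside". The combined test is
 * false for NaN, so NaN is refused instead of slipping through. */
static bool within_limits(double val, double lo, double hi)
{
    return val >= lo && val <= hi;
}
```

For example, clamp_naive(NAN, 0.0, 10.0) returns NaN, whereas within_limits(NAN, 0.0, 10.0) is false.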
> 
> The test abs(NaN - previous value) > MDEL fails,
> so no Channel Access monitors are sent;
> hence you don't see that the records are at NaN
> in operator displays or archives.
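A sketch of the deadband problem and one possible fix, using hypothetical function names rather than EPICS record code: the naive test never fires once either value is NaN, while the NaN-aware variant treats any transition between a number and NaN as a change worth posting.

```c
#include <math.h>
#include <stdbool.h>

/* Naive deadband test as described above: with NaN on either side the
 * difference is NaN, the comparison is false, and no monitor is posted. */
static bool deadband_naive(double val, double last, double mdel)
{
    return fabs(val - last) > mdel;
}

/* Hypothetical NaN-aware variant: moving between a number and NaN (in
 * either direction) always counts as a change to post. */
static bool deadband_nan_aware(double val, double last, double mdel)
{
    bool val_nan = isnan(val);
    bool last_nan = isnan(last);

    if (val_nan || last_nan)
        return val_nan != last_nan;   /* NaN -> NaN: nothing new to report */
    return fabs(val - last) > mdel;
}
```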
> 
> When NaN ends up in an analog record's RVAL or
> is written to other non-floating-point records,
> you get a huge or tiny number (depending on whether you
> use it as signed or unsigned).
> In the case of the AO record, that number sneaked past
> the DRVH/DRVL check and can now cause all sorts
> of interesting results when passed to the unsuspecting
> hardware.
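Because converting NaN (or any out-of-range value) to an integer type is undefined behavior in C, a value headed for an integer field such as RVAL can be checked before the cast. `double_to_int32_checked()` is a hypothetical helper; rejecting the value is one possible policy, and clamping would be another.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guarded conversion for values headed to an integer field.
 * Returns false (leaving *out untouched) for NaN or out-of-range input;
 * otherwise stores the truncated value and returns true. */
static bool double_to_int32_checked(double val, int32_t *out)
{
    if (isnan(val))
        return false;
    /* Also rejects +/-infinity. */
    if (val < (double)INT32_MIN || val > (double)INT32_MAX)
        return false;
    *out = (int32_t)val;   /* C conversion truncates toward zero */
    return true;
}
```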
> 
> At the SNS, we have differing opinions on how to react.
> The cause, our save/restore program, certainly gets fixed.
> Is it worthwhile to prevent the symptoms from happening again?
> Should Channel Access and records check for NaN and
> block it?
> Or should a user be free to send NaN if that's what he/she
> really wants, with MDEL/HOPR/LOPR/DRVH/DRVL
> handling it properly?
> This brings up practical issues:
> Is this a lot of work?
> Who volunteers?
> How do you catch NaN? Unlike most C/C++ compilers
> and standard C/C++ libraries,
> the one we use for vxWorks/Tornado 2.0.2 does not define isnan().
> Shen Peng suggested using something like this:
> #define isnan(x)    ((x)!=(x))
> Who knows if this works in all cases?
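Under IEEE 754 rules NaN is the only value that compares unequal to itself, so the self-comparison works in ordinary builds. There are two caveats: the macro form evaluates its argument twice, and options such as GCC's -ffast-math (which implies -ffinite-math-only) let the compiler assume NaN never occurs and fold the test away. The sketch below prefers the C99 macro when <math.h> provides it (isnan is a macro in C99, so #ifdef can detect it) and falls back to self-comparison otherwise; `nan_check()` is a hypothetical name.

```c
#include <math.h>

/* Hypothetical portable NaN test. Uses C99 isnan() when <math.h> defines
 * it; otherwise falls back to self-comparison. As a function, the argument
 * is evaluated exactly once, unlike the ((x)!=(x)) macro. */
static int nan_check(double x)
{
#ifdef isnan
    return isnan(x) != 0;
#else
    return x != x;   /* true only for NaN under IEEE 754 semantics */
#endif
}
```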
> 
> The bottom line: Avoid NaN.
> 
> - SNS controls group
