Experimental Physics and Industrial Control System
We ran into considerable trouble at the SNS when a save/restore-type program wrote NaN (not-a-number) values to process variables all across the linac.
There are actually several bit patterns for floating-point values that count as "NaN". Accidentally writing 0xFF into every byte of a double would produce a NaN, but in our case it was the "official" IEEE NaN pattern that was sent to the PVs.
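To make the bit-pattern point concrete, here is a small C sketch, assuming IEEE 754 doubles and a C99 compiler (for the NAN macro). The function names are made up for illustration. It fills a double with 0xFF bytes and checks the result with the x != x test, and it extracts the bits of the library's NAN to show what makes any of these patterns a NaN: an exponent field of all ones combined with a nonzero mantissa.

```c
#include <math.h>    /* NAN (C99) */
#include <stdint.h>  /* uint64_t */
#include <string.h>  /* memset, memcpy */

/* Nonzero if x is a NaN: under IEEE 754 a NaN compares
   unequal to everything, itself included. */
static int is_nan_double(double x)
{
    return x != x;
}

/* Fill all eight bytes of a double with 0xFF, as a stray memset()
   might. Exponent bits all ones plus a nonzero mantissa = NaN. */
static double all_ff_double(void)
{
    double d;
    memset(&d, 0xFF, sizeof d);
    return d;
}

/* Raw bits of the NAN macro; with gcc/glibc on x86 this is commonly
   0x7FF8000000000000, the default quiet NaN. */
static uint64_t nan_bits(void)
{
    double d = NAN;
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```

The exact bits of NAN can differ between libraries; only the all-ones exponent and nonzero mantissa are guaranteed by IEEE 754.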
The root cause has hopefully been fixed: that save/restore program, meant to take snapshots of the machine state similar to "burt" et al., logged "NaN" for PVs to which it could not connect for various reasons, and later simply wrote those "NaN"s back when restoring the snapshot file.
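How does a logged "NaN" turn back into a real NaN? Assuming, purely for illustration, that the snapshot stores values as text and a C restore step parses them with strtod(): C99 strtod() accepts "NAN" in any letter case, so the text round-trips into a genuine NaN. The function names below are hypothetical, and the actual SNS program may work quite differently.

```c
#include <math.h>    /* isnan (C99) */
#include <stdlib.h>  /* strtod */

/* Naive restore step (hypothetical): C99 strtod() accepts "NaN",
   so a logged "NaN" becomes a real NaN double. */
static double restore_value_naive(const char *text)
{
    return strtod(text, NULL);
}

/* Checked restore step (hypothetical): returns 0 and leaves *out
   untouched if the text is not a number or parses to NaN, so the
   caller can skip that PV instead of writing NaN to it. */
static int restore_value_checked(const char *text, double *out)
{
    char *end;
    double value = strtod(text, &end);
    if (end == text || isnan(value))
        return 0;
    *out = value;
    return 1;
}
```

Skipping the PV is only one possible policy; a restore tool could just as well refuse the whole snapshot or ask the operator.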
Problems arose because of the way NaN is handled by C/C++:

    NaN + xxx  -> NaN    (same for -, *, /)
    NaN <  xxx -> false
    NaN >  xxx -> false
    NaN == xxx -> false  (even when xxx is NaN)
    NaN != xxx -> true
    (long)NaN  -> 0x80000000 (typical result; C leaves this conversion undefined)

Other languages may behave similarly. In any case, these were the symptoms on the affected EPICS IOCs:
On analog records, every comparison against HOPR, LOPR, DRVH, and DRVL is false for NaN, so the value is accepted even though you might think you have locked "bad" values out.
The test abs(NaN - previous value) > MDEL is false, so no Channel Access monitors are sent. Hence you do not see that the records are at NaN in operator displays or archives.
When NaN ends up in an analog record's RVAL, or is written to other non-floating-point records, you get a huge or tiny number (depending on whether it is interpreted as signed or unsigned). In the case of the AO record, this value sneaked past the DRVH/DRVL check and can cause all sorts of interesting results when passed to the unsuspecting hardware.
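The comparison rules and the first two symptoms can be reproduced in a few lines of C. This is a sketch assuming IEEE 754 doubles and C99 <math.h>; clamp_naive() and post_monitor_naive() are hypothetical stand-ins written the "obvious" way, not the actual EPICS record support code, and the NaN-aware variants show just one possible policy. The (long)NaN conversion is deliberately not shown, because its result is undefined in C and depends on the CPU.

```c
#include <math.h>   /* NAN, isnan, fabs (C99) */

/* IEEE 754 rules from the table: ordered comparisons and == are false,
   != is true, and arithmetic propagates NaN, even if x is NaN too. */
static int nan_compares_like_the_table(double x)
{
    double n = NAN;
    return !(n < x) && !(n > x) && !(n == x) && (n != x) && isnan(n + x);
}

/* Hypothetical drive-limit clamp: both tests are false for NaN,
   so NaN passes straight through to the caller. */
static double clamp_naive(double val, double drvl, double drvh)
{
    if (val > drvh) val = drvh;
    if (val < drvl) val = drvl;
    return val;
}

/* Hypothetical MDEL-style deadband test: fabs(NaN - prev) is NaN and
   NaN > mdel is false, so no monitor would be posted. */
static int post_monitor_naive(double val, double prev, double mdel)
{
    return fabs(val - prev) > mdel;
}

/* NaN-aware clamp: refuse NaN entirely (returns 0), else clamp. */
static int clamp_rejecting_nan(double val, double drvl, double drvh,
                               double *out)
{
    if (isnan(val))
        return 0;
    *out = clamp_naive(val, drvl, drvh);
    return 1;
}

/* NaN-aware deadband: entering or leaving NaN always counts as a change;
   NaN to NaN counts as no change. */
static int post_monitor_nan_aware(double val, double prev, double mdel)
{
    int val_nan = isnan(val) != 0;
    int prev_nan = isnan(prev) != 0;
    if (val_nan || prev_nan)
        return val_nan != prev_nan;
    return fabs(val - prev) > mdel;
}
```

With val = NaN, clamp_naive() returns NaN unchanged and post_monitor_naive() returns 0, which matches what we saw on the IOCs.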
At the SNS, we have differing opinions on how to react. The cause, our save/restore program, is certainly being fixed. But is it worthwhile to prevent the symptoms from happening again? Should Channel Access and the records check for NaN and block it? Or should users be free to send NaN if that is what they really want, with only MDEL/HOPR/LOPR/DRVH/DRVL being made to handle it properly?

This brings up practical issues: Is this a lot of work? Who volunteers? How do you catch NaN? Unlike most C/C++ compilers and standard libraries, the one we use for vxWorks/Tornado 2.0.2 does not define "isnan()". Shen Peng suggested something like this:

    #define isnan(x) ((x)!=(x))

Who knows whether this works in all cases?
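Under IEEE 754 comparison rules the macro should work, since a NaN is the only value that compares unequal to itself, but there are caveats: the argument is evaluated twice, and compiler options such as GCC's -ffast-math or -ffinite-math-only let the compiler assume NaN never occurs and may fold x != x to false. The sketch below uses a different name (MY_ISNAN, an invented name) so it cannot clash with a library isnan(), prefers the C99 isnan macro when <math.h> defines one, and wraps the test in a hypothetical value_is_acceptable() guard of the kind a record or a Channel Access put could call.

```c
#include <math.h>   /* isnan (C99), if the library provides it */

/* Fallback in the spirit of the suggested macro. The argument is
   evaluated twice, so avoid side effects such as MY_ISNAN(*p++).
   Relies on IEEE 754 semantics; do not build with -ffast-math. */
#define MY_ISNAN(x) ((x) != (x))

/* C99 requires isnan to be a macro, so #ifdef can detect it. */
#ifdef isnan
#define VALUE_IS_NAN(x) isnan(x)
#else
#define VALUE_IS_NAN(x) MY_ISNAN(x)
#endif

/* Hypothetical guard: accept any value except NaN. Infinities pass;
   reject those too (e.g. with C99 isfinite) if they are equally bad. */
static int value_is_acceptable(double v)
{
    return !VALUE_IS_NAN(v);
}
```

Whether such a guard belongs in Channel Access, in record support, or nowhere is exactly the open question above; the code only shows that the check itself is cheap.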
The bottom line: Avoid NaN.
- SNS controls group
ANJ, 10 Aug 2010