EPICS Home

Experimental Physics and Industrial Control System



Subject: Not-a-number (nan) issues
From: Kay-Uwe Kasemir <[email protected]>
To: EPICS Tech Talk <[email protected]>
Date: Thu, 07 Oct 2004 17:51:43 -0400
Greetings!

We experienced quite a bit of trouble at the SNS as a result of a
save/restore-type program writing NaN = nan = not-a-number
values to process variables all across the linac.

There are actually several binary bit patterns for floating
point values that are considered "NaN".
Accidentally writing all 0xFF into a double could result
in NaN, but in our case it was the "official" IEEE NaN
pattern that was sent to the PVs.
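
A small C sketch of the bit patterns involved, assuming double is the 64-bit
IEEE 754 binary64 format (true on common targets, but an assumption here).
0x7FF8000000000000 is the quiet-NaN pattern most C libraries produce and is
used here as a stand-in for the "official" pattern. All-0xFF is also a NaN
(sign bit set, exponent all ones, mantissa non-zero), while an all-ones
exponent with a zero mantissa is infinity, not NaN. The helper name
double_from_bits() is made up for illustration.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 64-bit pattern as a double.
 * Assumes sizeof(double) == 8 and the IEEE 754 binary64 layout. */
double double_from_bits(uint64_t bits)
{
    double d;

    memcpy(&d, &bits, sizeof d);   /* well-defined way to reinterpret bytes */
    return d;
}

/* Example patterns:
 *   0x7FF8000000000000  quiet NaN (exponent all ones, top mantissa bit set)
 *   0xFFFFFFFFFFFFFFFF  all bytes 0xFF: sign set, exponent all ones,
 *                       mantissa non-zero, so also a NaN
 *   0x7FF0000000000000  exponent all ones, mantissa zero: +infinity
 */
```

isnan(double_from_bits(0xFFFFFFFFFFFFFFFFULL)) is non-zero, which is why a
buffer accidentally filled with 0xFF reads back as NaN.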

The root cause has hopefully been fixed:
That save/restore program, meant to take snapshots of the machine state
similar to "burt" et al., logged "NaN" for PVs to which it couldn't connect
for various reasons and later simply wrote those "NaN"s back
when restoring the snapshot file.
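
One way a restore step could guard against this, sketched in C under the
assumption of a C99 library: strtod() accepts the text "NaN" (and "inf") and
returns that value, so the parsed result has to be checked with isfinite()
before it is written to a PV. parse_restore_value() is a hypothetical name,
not part of burt or any EPICS tool.

```c
#include <ctype.h>
#include <math.h>
#include <stdlib.h>

/* Hypothetical restore helper: parse one saved value and refuse to
 * hand back anything that is not a finite number.
 * Returns 0 on success, -1 if the text must not be written to the PV. */
int parse_restore_value(const char *text, double *out)
{
    char *end;
    double v;

    if (text == NULL || out == NULL)
        return -1;
    v = strtod(text, &end);
    if (end == text)                       /* no number at all */
        return -1;
    while (isspace((unsigned char)*end))   /* allow trailing blanks/newline */
        end++;
    if (*end != '\0')                      /* trailing garbage */
        return -1;
    if (!isfinite(v))                      /* "NaN", "inf", or overflow */
        return -1;
    *out = v;
    return 0;
}
```

A snapshot line holding "NaN" is then skipped (and can be reported) instead
of being written back to the machine.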


Problems arose because of the way NaN is handled by C/C++ (following IEEE 754):
   NaN + xxx  ->  NaN (same for -, *, /)
   NaN < xxx  -> false
   NaN > xxx  -> false
   NaN == xxx -> false
   NaN != xxx -> true
   (long)NaN == 0x80000000   (as observed here; C leaves this conversion undefined)
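
The first five lines can be checked with a short C function, assuming IEEE 754
arithmetic, a C99 <math.h> that provides NAN, and no -ffast-math style options
(which let the compiler assume NaN never occurs). The (long) cast is left out
on purpose: converting NaN to an integer type is undefined behavior in C, so
its result is whatever the target happens to produce. nan_relations_hold() is
a hypothetical name.

```c
#include <math.h>

/* Returns 1 if every relation listed above holds for a quiet NaN
 * compared against the value x, 0 otherwise. */
int nan_relations_hold(double x)
{
    double n = NAN;   /* C99 macro for a quiet NaN */

    return isnan(n + x) && isnan(n - x) && isnan(n * x) && isnan(n / x)
        && !(n < x)    /* NaN <  x is false */
        && !(n > x)    /* NaN >  x is false */
        && !(n == x)   /* NaN == x is false, even for x = NaN */
        && (n != x);   /* NaN != x is true */
}
```
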
Other languages might be similar; in any case, these were
the symptoms on the affected EPICS IOCs:

On analog records, all checks against HOPR, LOPR, DRVH, DRVL
fail for NaN.
-> the value is accepted even though you might think
   you've locked "bad" values out.
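
A sketch of why a limit check lets NaN through, written in the usual "if above
the high limit, clamp" style. This is not the actual aoRecord code;
clamp_naive() and clamp_checked() are hypothetical helpers, and the checked
variant shows only one possible policy (reject NaN) among those discussed
below.

```c
#include <math.h>

/* Naive clamp: a NaN input passes straight through because both
 * comparisons are false. */
double clamp_naive(double val, double drvl, double drvh)
{
    if (val > drvh) val = drvh;
    if (val < drvl) val = drvl;
    return val;
}

/* NaN-aware variant: report NaN as an error instead of passing it on.
 * Returns 0 and stores the clamped value on success, -1 for NaN. */
int clamp_checked(double val, double drvl, double drvh, double *out)
{
    if (isnan(val))
        return -1;
    if (val > drvh) val = drvh;
    if (val < drvl) val = drvl;
    *out = val;
    return 0;
}
```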

The test abs(NaN - previous value)>MDEL  fails,
so no ChannelAccess monitors are sent,
hence you don't see that the records are at NaN
in operator displays or archives.
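
The deadband problem in C, as a hedged sketch: mlst stands for the last value
that was posted (the role the MLST field plays in the analog records), and both
function names are hypothetical. The checked variant treats a change into or
out of NaN as always worth a monitor, so displays and archivers would at least
see it.

```c
#include <math.h>

/* Naive deadband test: post a monitor only when the change exceeds MDEL.
 * fabs(NaN - mlst) is NaN, and NaN > mdel is false, so nothing is posted. */
int should_post_naive(double val, double mlst, double mdel)
{
    return fabs(val - mlst) > mdel;
}

/* NaN-aware variant: a transition into or out of NaN always posts. */
int should_post_checked(double val, double mlst, double mdel)
{
    int val_nan = isnan(val) != 0;
    int last_nan = isnan(mlst) != 0;

    if (val_nan || last_nan)
        return val_nan != last_nan;   /* post only on a change of state */
    return fabs(val - mlst) > mdel;
}
```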

When NaN ends up in an analog record's RVAL or
is written to other non-floating-point records,
you get a huge positive or huge negative number (depending on
whether it is treated as unsigned or signed),
which in the case of the AO record sneaked past
the DRVH/DRVL check and can now cause all sorts
of interesting results when passed to the unsuspecting
hardware.
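
A hedged sketch of converting an engineering value to a raw integer without
ever casting NaN. Because converting NaN (or any out-of-range value) straight
to an integer type is undefined behavior in C, the value is checked and clamped
first. double_to_raw() and its lo/hi range parameters are hypothetical, not an
EPICS API.

```c
#include <math.h>

/* Returns 0 and stores the raw value on success, -1 if val is NaN.
 * Assumes lo <= hi. Infinities are clamped like any other
 * out-of-range value. */
int double_to_raw(double val, long lo, long hi, long *out)
{
    if (isnan(val))
        return -1;                 /* refuse instead of sending garbage */
    if (val <= (double)lo)
        *out = lo;
    else if (val >= (double)hi)
        *out = hi;
    else
        *out = (long)val;          /* in range, so the cast is defined */
    return 0;
}
```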

At the SNS, we have differing opinions on how to react.
The cause, our save/restore program, is certainly getting fixed.
Is it worthwhile to prevent the symptoms from happening again?
Should Channel Access and records check for NaN and
block it?
Or should a user be free to send NaN if that's what he/she
really wants, except MDEL/HOPR/LOPR/DRVH/DRVL ought to
handle it?
This raises practical issues:
Is this a lot of work?
Who volunteers?
How do you catch NaN? Unlike most C/C++ compilers
and standard C/C++ libraries,
the one we use for vxWorks/Tornado 2.0.2 does not define "isnan()".
Shen Peng suggested using something like this:
#define isnan(x)    ((x)!=(x))
Who knows if this works in all cases?
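
On whether ((x)!=(x)) always works: under IEEE 754 rules a NaN is the only
value that compares unequal to itself, so the test is sound as long as the
compiler honors those rules. Options such as GCC's -ffast-math or
-ffinite-math-only let the compiler assume NaN never occurs and may fold the
comparison to false. Below is a sketch of a fallback that only takes effect
where <math.h> does not already provide isnan; is_nan_double() is a
hypothetical function form that also evaluates its argument only once.

```c
#include <math.h>

/* Use the library's isnan() where it exists; otherwise fall back to
 * the self-comparison trick (valid under IEEE 754 semantics only). */
#ifndef isnan
#define isnan(x) ((x) != (x))
#endif

/* Function form: an argument with side effects is evaluated once. */
int is_nan_double(double x)
{
    return x != x;   /* true only when x is NaN */
}
```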

The bottom line: Avoid NaN.

- SNS controls group


Replies:
RE: Not-a-number (nan) issues Jeff Hill
Re: Not-a-number (nan) issues Andrew Johnson
Re: Not-a-number (nan) issues Marty Kraimer
