EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: Andor SDK3 issue - AT_WaitBuffer fails
From: "Pearson, Matthew R." <[email protected]>
To: Mark Rivers <[email protected]>
Cc: "[email protected] list" <[email protected]>
Date: Wed, 19 Oct 2016 16:11:05 +0000
Hi,

Small update on this issue:

I updated to the Andor SDK3.12 release, but found it’s not compatible with RHEL6. Initially it seemed ok, but after a few hours the machine locks up and we get messages like this in /var/log/messages:

Oct 17 16:01:03 cg1d-dassrv2 kernel: ERR(0): GetOneSG: cannot get 2328 blocks at 0
Oct 17 16:01:03 cg1d-dassrv2 kernel: ERR(0): bitflow_scatter_gather: g1sg failed on frame 0 of 5
Oct 17 16:03:09 cg1d-dassrv2 automount[14608]: do_umount_autofs_direct: attempt to umount busy direct mount /home/controls
Oct 17 16:04:41 cg1d-dassrv2 kernel: INFO: task events/0:35 blocked for more than 120 seconds.
Oct 17 16:04:41 cg1d-dassrv2 kernel:      Tainted: P           -- ------------    2.6.32-642.1.1.el6.x86_64 #1
Oct 17 16:04:41 cg1d-dassrv2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 17 16:04:41 cg1d-dassrv2 kernel: events/0      D 0000000000000000     0    35      2 0x00000000
Oct 17 16:04:41 cg1d-dassrv2 kernel: ffff880418953d50 0000000000000046 ffff880418953d18 ffff880418953d14
Oct 17 16:04:41 cg1d-dassrv2 kernel: ffff880418953ce0 ffff88041ec24100 000006e46193d5a4 ffff880028316ec0
Oct 17 16:04:41 cg1d-dassrv2 kernel: 0000000000000400 00000001006f2807 ffff88041894fad8 ffff880418953fd8
Oct 17 16:04:41 cg1d-dassrv2 kernel: Call Trace:
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff81548f66>] __mutex_lock_slowpath+0x96/0x210
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffffa04e4710>] ? bflki_work+0x0/0x10 [bitflow]
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff81548a8b>] mutex_lock+0x2b/0x50
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffffa04e429e>] bflki_mutex_lock+0xe/0x10 [bitflow]
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffffa04e5bda>] bflki_private_work+0x4a/0x2d0 [bitflow]
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffffa04e471e>] ? bflki_work+0xe/0x10 [bitflow]
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff8109fdc0>] ? worker_thread+0x170/0x2a0
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff810a6ac0>] ? autoremove_wake_function+0x0/0x40
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff8109fc50>] ? worker_thread+0x0/0x2a0
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff810a662e>] ? kthread+0x9e/0xc0
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff8100c28a>] ? child_rip+0xa/0x20
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff810a6590>] ? kthread+0x0/0xc0
Oct 17 16:04:41 cg1d-dassrv2 kernel: [<ffffffff8100c280>] ? child_rip+0x0/0x20

I’ve passed this information on to Andor, but they don’t have a history of quickly resolving issues like this. And this one may be an issue with the BitFlow kernel driver rather than the user space SDK.

Is anybody else using Andor’s SDK3.12 with Linux?

Also, it didn’t solve the original issue (I’ve seen it recur before the machine locks up). The only way to recover from the original issue seems to be an application restart. Two different calls into the Andor SDK produced the uncaught exception, so I’m not sure we can recover from the EPICS driver side.
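
Even if full recovery is not possible from the driver side, the tight loop itself could probably be contained. What follows is only a minimal sketch of that idea, not the actual Andor3 driver code. It assumes the blocking SDK call is wrapped in a hypothetical `waitOnce` callable that returns the SDK status, that 0 means success (believed to match AT_SUCCESS in atcore.h, worth checking), and that 11 is AT_ERR_NODATA as seen in the logs. The names `waitForFrame` and `WaitResult`, the retry limit and the back-off delay are made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Status values: 11 is AT_ERR_NODATA as reported in this thread.
// 0 is ASSUMED to be the SDK success code; verify against atcore.h.
constexpr int kSuccess = 0;
constexpr int kErrNoData = 11;

// Hypothetical result type, not part of the Andor SDK or areaDetector.
enum class WaitResult { FrameReady, Stalled, OtherError };

// waitOnce is a hypothetical wrapper around the real call, e.g.
//   AT_WaitBuffer(handle_, &image, &size, AT_INFINITE)
// It should return the SDK status code.
WaitResult waitForFrame(const std::function<int()>& waitOnce,
                        int maxConsecutiveNoData,
                        std::chrono::milliseconds backoff)
{
    assert(maxConsecutiveNoData > 0);
    for (int failures = 0; failures < maxConsecutiveNoData; ++failures) {
        const int status = waitOnce();
        if (status == kSuccess) {
            return WaitResult::FrameReady;
        }
        if (status != kErrNoData) {
            // Some other SDK error: let the caller handle it.
            return WaitResult::OtherError;
        }
        // The call returned immediately with "no data". Sleep before
        // retrying so the task neither spins nor floods the log.
        std::this_thread::sleep_for(backoff);
    }
    // Too many immediate "no data" returns in a row: report a stall
    // instead of retrying forever.
    return WaitResult::Stalled;
}
```

On `Stalled`, the image task could stop polling and set the ADStatus error flag instead of logging every failed call. Whether acquisition can then be restarted without restarting the IOC is still the open question.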

Cheers,
Matt

Data Acquisition and Control Engineer
Spallation Neutron Source
Oak Ridge National Lab





> On Sep 23, 2016, at 9:47 AM, Pearson, Matthew R. <[email protected]> wrote:
> 
>> 
>> If you restart the IOC does it start working, or do you need to power-cycle the camera?
> 
> Hi Mark,
> 
> Yes, an IOC restart fixes it (don’t need to power cycle the camera).
> 
> Cheers,
> Matt
>> 
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Pearson, Matthew R. [[email protected]]
>> Sent: Friday, September 23, 2016 7:54 AM
>> To: [email protected] list
>> Subject: Andor SDK3 issue - AT_WaitBuffer fails
>> 
>> Hi,
>> 
>> We have an Andor sCMOS Zyla camera and we use the Andor3 support in areaDetector. This uses the Andor SDK3 API to control the camera and read out data. Sometimes we see the Andor3 driver get into a bad state in which the call:
>> 
>> status = AT_WaitBuffer(handle_, &image, &size, AT_INFINITE);
>> 
>> always returns immediately with error code 11 (which means AT_ERR_NODATA). The Andor3 driver then ends up calling AT_WaitBuffer repeatedly in a tight loop, producing giant log files, until we exit the IOC.
>> 
>> The function should block until there is a data frame to read from the SDK, but it doesn’t, which is a problem. I’ve contacted Andor about this (and they asked me to update my SDK version from 3.9 to the latest version).
>> 
>> Has anyone else also seen this problem?
>> 
>> When I get a chance I’ll try to reproduce it, and also update the SDK. I’m not sure if we can easily recover from this state without restarting the IOC, but perhaps the Andor3 driver could detect this problem and set the ADStatus error flag. I’ll experiment with that if I can reproduce the problem.
>> 
>> A bit suspiciously we also see an uncaught exception when we exit the IOC:
>> 
>> 2016/09/22 19:46:04.402 andor3:imageTask: AT_WaitBuffer, error=11
>> 2016/09/22 19:46:04.402 andor3:imageTask: AT_WaitBuffer, error=11
>> 2016[Thu Sep 22 19:46:07 2016] /09/22 19:46:04.402 andor3:imageTask: AT_WaitBuffer, error=11
>> terminate called after throwing an instance of '[Thu Sep 22 19:46:07 2016] TSDK3Exception'
>> [Thu Sep 22 19:46:07 2016]   what():  TDualCLLogicalControl: Error - Sensor halves are not running in unison
>> 
>> Cheers,
>> Matt
>> 
>> 
> 


