Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: asynManager changes
From: Torsten Bögershausen <torsten.bogershausen@esss.se>
To: Mark Rivers <rivers@cars.uchicago.edu>, 'Eric Norum' <wenorum@lbl.gov>, "'Johnson, Andrew N.'" <anj@aps.anl.gov>
Cc: "'core-talk@aps.anl.gov'" <core-talk@aps.anl.gov>
Date: Thu, 14 Sep 2017 06:48:25 +0200
Hej Mark and all,
Without a detailed understanding of the complete scenario,
it is difficult to analyze such complex behaviour.

However, some answers inline.

On 12/09/17 23:47, Mark Rivers wrote:
Hi Eric, Andrew and others,

I ran into a problem with asyn when working on the Modbus driver the other day.  The problem is with pasynManager->queueLockPort(), which is called from asynXXXSyncIO() functions.

Here is a scenario where the problem occurs when using the Modbus driver.

- A drvModbusIPPort is created to talk TCP to the Modbus device.

- The asynOption "disconnectOnReadTimeout" is set to "Y" for this TCP port. This is needed in order to have a speedy recovery if the network is briefly disconnected. If this is not used then it takes the OS a very long time to decide that the socket is dead and to report that it is disconnected so that asyn will reconnect it.

- There are several Modbus input drivers, each of which is polling a range of registers.  They use the asynOctetSyncIO functions to read in their polling loop.

- If the network cable is disconnected one of the Modbus drivers will wait for the timeout period before getting a timeout.  Meanwhile the other Modbus drivers have called pasynOctetSyncIO->writeRead, which calls pasynManager->queueLockPort, which in turn calls pasynManager->queueRequest *with no timeout*.

- When the read timeout occurs the TCP port will be disconnected.  While it is disconnected the drivers that call queueLockPort will be hung, because the queueRequest will not complete until the port reconnects.  In other words pasynOctetSyncIO->writeRead() is hung for those other drivers until the TCP port reconnects.

- This is not what we want. We want pasynOctetSyncIO->writeRead() to return an error after a certain period of time.  This allows the Modbus driver to do callbacks to device support, putting I/O Intr scanned input records into invalid alarm.
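
The difference between a lock request queued with and without a timeout can be illustrated with a small simulation. This is only a sketch under stated assumptions, not asyn code: sim_port, lock_request, queue_lock_request and process_queue are hypothetical names, the clock is a plain number passed in by the caller, and granting a request is treated as instantaneous.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef enum { REQ_QUEUED, REQ_GRANTED, REQ_TIMED_OUT } req_state;

/* Hypothetical stand-in for a queued "lock the port" request. */
typedef struct {
    double    deadline;  /* absolute time; negative means wait forever */
    req_state state;
} lock_request;

/* Hypothetical stand-in for a port with a queue of pending requests. */
typedef struct {
    int           connected;
    lock_request *pending[MAX_PENDING];
    size_t        count;
} sim_port;

/* Queue a request at time `now`. A negative timeout models the
 * "no timeout" case described in the message above. */
static int queue_lock_request(sim_port *port, lock_request *req,
                              double now, double timeout)
{
    if (port->count == MAX_PENDING)
        return -1;
    req->deadline = (timeout < 0.0) ? -1.0 : now + timeout;
    req->state = REQ_QUEUED;
    port->pending[port->count++] = req;
    return 0;
}

/* Process the queue at time `now`: grant requests while the port is
 * connected, expire overdue ones while it is disconnected, and keep
 * everything else queued. */
static void process_queue(sim_port *port, double now)
{
    size_t kept = 0;
    for (size_t i = 0; i < port->count; i++) {
        lock_request *req = port->pending[i];
        if (port->connected) {
            req->state = REQ_GRANTED;
        } else if (req->deadline >= 0.0 && now >= req->deadline) {
            req->state = REQ_TIMED_OUT;   /* caller can now return an error */
        } else {
            port->pending[kept++] = req;  /* still waiting */
        }
    }
    port->count = kept;
}
```

With the port disconnected, a request queued with a negative timeout stays in REQ_QUEUED until the port reconnects, while one queued with a 2.0 second timeout moves to REQ_TIMED_OUT once the simulated clock passes its deadline, which is the point at which the caller could report an error.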

Technically speaking, would it be possible to unblock the semaphores when asyn
knows that the connection has just gone down?
That may be the cleanest solution, but as far as I can see the timeout is fine.
More further down.


- The problem is that the call to pasynManager->queueRequest to lock the port should have a timeout, with a timeout handler routine that allows pasynOctetSyncIO->writeRead() to complete and return an error status.

- I have modified asynManager.c to do this.  I had to decide what value to use for the queueRequest timeout.  One possibility is to use the pasynUser->timeout for the actual I/O request.  But this is not really appropriate, because each transaction might be relatively fast, but there could be many transactions in the queue and we don't want the queueRequest to time out when things are "normal".  I decided on the following:

  - The port structure has a new queueLockPortTimeout value.  This is set to a default of 2.0 seconds when the port is created.

   - This can be changed with a new iocsh command

         asynSetQueueLockPortTimeout(portName, timeout)

  - If the pasynUser->timeout that is passed for a particular I/O operation is larger than the port timeout value, this larger value is used instead.
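
The rule for choosing the timeout can be written down in a few lines. This is a minimal sketch of the rule as described above, not the code from the attached diffs; effective_lock_timeout and DEFAULT_QUEUE_LOCK_PORT_TIMEOUT are hypothetical names chosen for illustration.

```c
#include <assert.h>

/* Default port value in seconds, as stated in the message above.
 * The macro name is hypothetical. */
#define DEFAULT_QUEUE_LOCK_PORT_TIMEOUT 2.0

/* Hypothetical helper: the timeout applied to the lock request is the
 * per-port value, unless the timeout of the I/O operation is larger. */
static double effective_lock_timeout(double port_timeout, double user_timeout)
{
    return (user_timeout > port_timeout) ? user_timeout : port_timeout;
}
```

For example, an I/O timeout of 0.5 seconds on a port with the default value gives a lock timeout of 2.0 seconds, while an I/O timeout of 10.0 seconds gives 10.0 seconds.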

As long as the QueueLockPortTimeout is shorter than the reconnect-timeout,
this should be fine:
2.0 seconds for the QueueLock and 5.0 seconds for the auto-reconnect should
cover everything even on busy systems.



I made the major mistake of not working on a branch for this, so I can't make a formal pull request.  But I have attached the diffs from R4-31 that show what I have done.

I made the same mistake this week.
As long as the master branch has not been pushed (to GitHub in our case), the
situation can be repaired:

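# Move the unpushed commits onto a new branch and publish that branch
git checkout -b queueLockPortTimeout
git push origin queueLockPortTimeout
# Drop the local master, which still contains the unpushed commits
git branch -d master # or git branch -D master
# If you need to look at master:
git checkout origin/master

# Merge the pull request via a web browser once it is ready, then:
git fetch origin
git checkout origin/master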


The changes are in the master branch on Github, and the release notes can be viewed as HTML here: http://cars9.uchicago.edu/software/epics/RELEASE_NOTES.html

I'd appreciate any comments.

Thanks,

Mark

