Experimental Physics and Industrial Control System
Hi Mark and all,
Without a detailed understanding of the complete scenario,
it is difficult to analyze something this complex.
However, some answers are inline.
On 12/09/17 23:47, Mark Rivers wrote:
Hi Eric, Andrew and others,
I ran into a problem with asyn when working on the Modbus driver the other
day. The problem is with pasynManager->queueLockPort(), which is called from
asynXXXSyncIO() functions.
Here is a scenario where the problem occurs when using the Modbus driver.
- A drvModbusIPPort is created to talk TCP to the Modbus device.
- The asynOption "disconnectOnReadTimeout" is set to "Y" for this TCP port.
This is needed in order to have a speedy recovery if the network is briefly
disconnected. If this is not used then it takes the OS a very long time to
decide that the socket is dead and to report that it is disconnected so that
asyn will reconnect it.
- There are several Modbus input drivers, each of which is polling a range of
registers. They use the asynOctetSyncIO functions to read in their polling loop.
- If the network cable is disconnected one of the Modbus drivers will wait for
the timeout period before getting a timeout. Meanwhile the other Modbus
drivers have called pasynOctetSyncIO->writeRead, which calls
pasynManager->queueLockPort, which in turn calls pasynManager->queueRequest
*with no timeout*.
- When the read timeout occurs the TCP port will be disconnected. While it is
disconnected the drivers that call queueLockPort will be hung, because the
queueRequest will not complete until the port reconnects. In other words
pasynOctetSyncIO->writeRead() is hung for those other drivers until the TCP
port reconnects.
- This is not what we want; we want pasynOctetSyncIO->writeRead() to return an
error after a certain period of time. This allows the Modbus driver to do
callbacks to device support putting I/O Intr scanned input records into invalid
alarm.
Technically speaking, would it be possible to unblock the semaphores when asyn
knows that the connection has just gone down?
That might be the cleanest solution, but as far as I can see the timeout is fine.
More on this further down.
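To illustrate the timeout variant, here is a minimal sketch in C of a caller
that waits for a lock request to be granted but gives up after a timeout
instead of hanging. It uses plain POSIX threads, not asyn or libCom, and the
names lockRequest, lockStatus and waitForLock are hypothetical; this is not
the code in asynManager.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical status codes and request type, for illustration only. */
typedef enum { LOCK_GRANTED, LOCK_TIMED_OUT, LOCK_ERROR } lockStatus;

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             granted;  /* set to 1 by whoever services the queue */
} lockRequest;

/* Wait until the request is granted or timeoutSec seconds have elapsed. */
lockStatus waitForLock(lockRequest *req, double timeoutSec)
{
    struct timespec deadline;
    lockStatus status = LOCK_GRANTED;
    time_t wholeSec = (time_t)timeoutSec;

    /* pthread_cond_timedwait expects an absolute deadline (CLOCK_REALTIME
     * for a default-initialized condition variable). */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += wholeSec;
    deadline.tv_nsec += (long)((timeoutSec - (double)wholeSec) * 1e9);
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&req->mutex);
    while (!req->granted) {
        int rc = pthread_cond_timedwait(&req->cond, &req->mutex, &deadline);
        if (rc == ETIMEDOUT) { status = LOCK_TIMED_OUT; break; }
        if (rc != 0)         { status = LOCK_ERROR;     break; }
    }
    pthread_mutex_unlock(&req->mutex);
    return status;
}
```

If nothing ever sets granted (the port stays disconnected), the caller gets
LOCK_TIMED_OUT back after the timeout and can report an error upstream.
Unblocking on disconnect would correspond to signalling the condition with an
error flag, which this sketch does not attempt.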
- The problem is that the call to pasynManager->queueRequest to lock the port
should have a timeout, with a timeout handler routine that allows
pasynOctetSyncIO->writeRead() to complete and return an error status.
- I have modified asynManager.c to do this. I had to decide what value to use
for the queueRequest timeout. One possibility is to use the pasynUser->timeout
for the actual I/O request. But this is not really appropriate, because each
transaction might be relatively fast, but there could be many transactions in
the queue and we don't want the queueRequest to time out when things are
"normal". I decided on the following:
- The port structure has a new queueLockPortTimeout value. This is set to a
default of 2.0 seconds when the port is created.
- This can be changed with a new iocsh command
asynSetQueueLockPortTimeout(portName, timeout)
- If the pasynUser->timeout that is passed for a particular I/O operation is
larger than the port timeout value this larger value is used instead.
As long as the QueueLockPortTimeout is shorter than the reconnect-timeout,
this should be fine:
2.0 seconds for the QueueLock and 5.0 seconds for the auto-reconnect should
cover everything even on busy systems.
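To make the rule for picking the lock timeout concrete, here is a minimal
sketch in C. The function name chooseQueueLockPortTimeout is hypothetical and
this is not the code from the attached diffs; it only expresses "use the port
value unless pasynUser->timeout is larger".

```c
/* Hypothetical helper, not the code in asynManager.c.
 * portTimeout: the per-port queueLockPortTimeout (default 2.0 seconds).
 * userTimeout: the pasynUser->timeout passed for this I/O operation.
 * Returns the larger of the two, in seconds. */
double chooseQueueLockPortTimeout(double portTimeout, double userTimeout)
{
    return (userTimeout > portTimeout) ? userTimeout : portTimeout;
}
```

With the default port value of 2.0 seconds, an I/O timeout of 0.5 seconds
gives a lock timeout of 2.0 seconds, while an I/O timeout of 10.0 seconds
gives 10.0 seconds.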
I made the major mistake of not working on a branch for this, so I can't make a
formal pull request. But I have attached the diffs from R4-31 that show what I
have done.
I made the same mistake this week.
As long as the master branch has not been pushed (to GitHub in our case), the
situation can be cured:
git checkout -b queueLockPortTimeout
git push origin queueLockPortTimeout
git branch -d master # or git branch -D master
# If you need to look at master:
git checkout origin/master
# Merge the pull request via a web browser, once it is ready
git fetch origin
git checkout origin/master
The changes are in the master branch on GitHub, and the release notes can be
viewed as HTML here: http://cars9.uchicago.edu/software/epics/RELEASE_NOTES.html
I'd appreciate any comments.
Thanks,
Mark