
Subject: Re: EPICS database & channel access
From: Jeff Hill <[email protected]>
To: [email protected]
Date: Tue, 17 Dec 1996 14:18:44 -0700
Chip,

> 
> I would like to start a discussion about possible improvements to ioc core.
> These improvements should not be hard to implement, and will provide some
> improvements in behavior.  I am certainly not as expert about EPICS internals
> as some of you, so feel free to correct any mistakes in my description of
> the problem and suggested improvements, but I'd invite you to also discuss
> the spirit of the change more than the exact technical implementation
> details (which can be left to the real gurus like Marty Kraimer).  I
> have had preliminary discussions with Marty, so I know I'm not completely
> off base.
> 
> Problem Statement
> =================
> 
> 1) Communication between 2 ioc's is unreliable and non-deterministic.
> 
>         There are 2 ways to transfer data between ioc's: push (ca_put style)
>         and pull (monitor style).  Suppose the source signal transitions from
>         0 to 1 and back again.  In the case of push, if the remote/sink ioc
>         is temporarily busy, the 2nd put may occur before the record is
>         processed, and so the 0->1 transition is never seen.  In the case
>         of pull, if the source ioc is loaded, the monitor may be "dropped"
>         (i.e. the channel access buffer is full and the data is discarded)
> 
>         Summary: for database to database communication, there is
>         NO RELIABLE, GUARANTEED DELIVERY of data.
> 
>         (I know that there is a proposal to support process passive over
>         channel access, which also addresses this problem)
> 
> 2) Channel access server consumes large amounts of memory.
> 
>         Because of its low priority, the channel access server reserves
>         considerable storage space to hold any outbound data occurring as
>         a result of database processing (at higher priorities). This
>         pre-allocation of memory is done to improve determinism in the
>         record processing.  Nevertheless, under load the server discards
>         data (see above).

The event queuing subsystem has been completely rewritten in the
new (portable C++) server so that it consumes much less
memory. This server will not show up in IOC core until 3.14.
The new event queuing subsystem is also more robust in heavy load
situations. There is still room for improvement in the
message buffering subsystem.

>         The design of epics core says that the channel access client
>         connection tasks run at a priority lower than all of the scan tasks.
>         The rationale (as I understand it) is that epics is a real time
>         system, and keeping operator displays up to date is lower priority
>         than executing the control algorithm on the ioc.
> 
>         [ This rationale fails on 2 points: (1) control algorithms may span
>         ioc's, and (2) if an operator can't reach the ioc, he must assume
>         that the system is no longer under control and must crash it. This
>         is the only reasonable operational approach.  The impact of not
>         crashing our RF system is that we may trip the cold compressor and
>         eventually release 10's of thousands of dollars of helium. ]
> 

All systems degrade in some way under load. We attempted 
to design the EPICS system so that it would degrade in a 
controlled and predictable fashion.

Some basic design assumptions that we made:
1) Excessive network load should never prevent the scan tasks
from performing activities that are self-contained within
a particular IOC.
2) A slow CA client (which can't process events (monitors) as fast as
the IOC scan tasks create them) should never prevent the scan
tasks from performing activities that are self-contained within
a particular IOC.
3) Excessive server load created by client requests should
never prevent the scan tasks from performing activities that are
self-contained within a particular critical IOC.

In the CA event (monitoring) system we have the database scan
tasks (and others), which are the event producers, and the CA
clients, who are the consumers of events (monitors). In between
the two we have a finite buffering system which can accumulate
events if there are temporary periods when communication does not
proceed immediately. If the client's sustained event consumption
rate is slower than the scan tasks' sustained event production
rate then we will eventually overflow the finite buffering system.
Likewise, if the average available network bandwidth constrains the
event transmission rate to the client to the point where it
is slower than the event production rate then we will also
overflow the finite buffering system.

Fully expecting the above situation to occur under load, we
designed a flow control system into the original Channel
Access: when a particular client starts to fall far
behind, it sends a message to the server requesting that
no event updates be sent until the client removes any backlog
that may exist. When the client has finished processing all
outstanding events it sends another message to the server indicating
that it should: 1) send an update for each monitored PV that may
have changed in the interim and 2) resume normal event
dispatching. The intent is to discard only intermediate
events when the system is under heavy load, and to
guarantee that slow clients are working with reasonably
fresh data.
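
To make the flow control hand-shake concrete, here is a minimal sketch
in C. The names (ca_flow_off_request, ca_flow_on_request,
BACKLOG_HIGH_WATER) and the threshold are my own inventions for
illustration; this is not the actual CA client library code.

    /* Sketch only: client-side flow control state machine. */
    #include <stddef.h>

    #define BACKLOG_HIGH_WATER 1000u  /* pending events: ask server to stop */

    struct ca_client_queue {
        size_t pending;      /* monitor updates queued locally, unhandled */
        int    flow_stopped; /* nonzero after we asked the server to stop */
    };

    static void ca_flow_off_request(void) { /* send "stop updates" msg      */ }
    static void ca_flow_on_request(void)  { /* send "refresh + resume" msg  */ }

    /* Call whenever the size of the local backlog changes. */
    static void check_flow_control(struct ca_client_queue *q)
    {
        if (!q->flow_stopped && q->pending >= BACKLOG_HIGH_WATER) {
            /* Falling far behind: ask that no further updates be sent. */
            ca_flow_off_request();
            q->flow_stopped = 1;
        } else if (q->flow_stopped && q->pending == 0) {
            /* Backlog cleared: ask for one fresh update per monitored PV
               that changed in the interim, then resume normal dispatching. */
            ca_flow_on_request();
            q->flow_stopped = 0;
        }
    }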


> (2) if an operator can't reach the ioc, he must assume
>         that the system is no longer under control and must crash it. This
>         is the only reasonable operational approach.  The impact of not
>         crashing our RF system is that we may trip the cold compressor and
>         eventually release 10's of thousands of dollars of helium. ]

Here is a list of the symptoms you describe above
(and some possible remedies which, no doubt, Chip is
aware of, but I will include them for those who are not):

1) The scan tasks consume enough CPU so that the server is
starved for CPU.
        SOLUTION=> split into two IOCs or buy a new CPU
2) The IO state is not saved during a reboot.
        SOLUTION=> cut the VME SYSRESET trace on the IO card
        SOLUTION=> place the IO on a field bus
3) Local control algorithms take precedence over remote
        control algorithms.
        SOLUTION=> CPU consumption by local control is
        quite predictable. Buy a new CPU or split the IOC into two
        IOCs to provide additional headroom.


> 
> 3) Insufficient flexibility in setting priorities in algorithm design
> 
>         Database record processing is prioritized according to frequency.
>         10 hz pre-empts 5 hz, etc.  In a system which may experience
>         temporary loading phenomena, either due to IOEVENT scanning or
>         non-epics task processing, less frequently processed records cease
>         to process (hard edged scheduler in VxWorks).
> 
>         So, if I have a record which absolutely must be processed at least
>         once every 10 seconds, and the bulk of my database is processed
>         at once a second, then I must process the critical record at
>         2 hz to guarantee it is not starved out.
> 

I agree that additional flexibility would be better here.

> Proposed Solution
> =================
> 
> 
> 2) Re-prioritize channel access handling
> 
>         a) When an event is posted to a client (either as a result of a
>         monitor firing, or an OUTLINK pushing data), and the buffer is
>         full, then IMMEDIATELY start the network operation and allocate
>         a new buffer from a freelist.  The old buffer is marked as in the
>         state SENDING.  When the network operation completes, the buffer is
>         put back on the freelist. Under this solution, channel access
>         outbound network operations are processed in the context
>         of the scan task.

Use of free lists for CA server buffering is a good idea which
I have wanted to implement for several years.

If a socket call takes a non-deterministic time to complete,
do we allow this to upset the scan period? Note that we already
have this problem to some extent because the WRS "netTask"
runs at elevated priority :(

Another problem: if we make socket calls from the scan tasks
then their maximum stack consumption may increase. Do a tt()
on one of the event tasks a few times while it is running
if you would like to see how many subroutines deep the
IP kernel goes.

Another complication: one scan task can end up sending a TCP/IP
message to hundreds (a non-deterministic number) of clients.
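
As a rough illustration of the free list idea in (a) above, the outbound
buffer life cycle might look something like the sketch below. The names,
sizes, and the missing locking are mine; this is not existing IOC code,
and a real implementation would need a mutex (or interrupt lock) around
the list.

    #include <stdlib.h>

    #define BUF_SIZE 16384

    enum buf_state { BUF_FREE, BUF_FILLING, BUF_SENDING };

    struct out_buf {
        struct out_buf *next;   /* free list link                       */
        enum buf_state  state;
        size_t          used;   /* bytes of outbound data in the buffer */
        char            data[BUF_SIZE];
    };

    static struct out_buf *free_list;

    /* Scan task context: grab a fresh buffer when the current one fills. */
    static struct out_buf *buf_alloc(void)
    {
        struct out_buf *b = free_list;
        if (b)
            free_list = b->next;
        else
            b = malloc(sizeof *b);   /* grow the pool if the list is empty */
        if (b) {
            b->state = BUF_FILLING;
            b->used  = 0;
        }
        return b;
    }

    /* Network completion context: return a SENDING buffer to the pool. */
    static void buf_release(struct out_buf *b)
    {
        b->state = BUF_FREE;
        b->next  = free_list;
        free_list = b;
    }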

> 
>         b) Replace all the channel access client tasks with a single task
>         to flush partially full buffers that are more than so many
>         milliseconds old (programmable, of course).  If channel access
>         priority is implemented, replace this with 3 tasks (high, medium,
>         and low).

Must use "select()" then. This is OK (the new server has this option)
but we will lose the option of prioritizing special clients
above other (non-CA-server) activity in the system.
Remember that we can never block the system on a single client that
is not reading its TCP circuit (therefore we must use select()
and non-blocking I/O if we do this). Savings: 4k of stack per client.
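
A minimal sketch of that single flush task, using select() with
non-blocking sockets, is below. The client table, the 50 ms flush age,
and the function names are assumptions of mine, not existing server code.

    #include <stddef.h>
    #include <sys/select.h>
    #include <sys/time.h>

    #define FLUSH_AGE_MS 50   /* flush partially full buffers older than this */

    struct client {
        int    sock;          /* non-blocking TCP socket              */
        size_t pending;       /* bytes waiting in the outbound buffer */
        long   age_ms;        /* age of the oldest unflushed data     */
    };

    static void flush_task_poll(struct client *clients, int nclients)
    {
        fd_set wfds;
        struct timeval timeout = { 0, FLUSH_AGE_MS * 1000 };
        int i, maxfd = -1;

        FD_ZERO(&wfds);
        for (i = 0; i < nclients; i++) {
            /* Only wait on clients that actually have stale data to send. */
            if (clients[i].pending > 0 && clients[i].age_ms >= FLUSH_AGE_MS) {
                FD_SET(clients[i].sock, &wfds);
                if (clients[i].sock > maxfd)
                    maxfd = clients[i].sock;
            }
        }
        if (maxfd < 0)
            return;   /* nothing old enough to flush yet */

        /* select() never blocks on a single slow client; a socket that is
           not writable is simply skipped until it drains.                 */
        if (select(maxfd + 1, NULL, &wfds, NULL, &timeout) > 0) {
            for (i = 0; i < nclients; i++) {
                if (FD_ISSET(clients[i].sock, &wfds)) {
                    /* send() the buffered data here; the socket is writable
                       and non-blocking, so the task cannot hang on it.     */
                }
            }
        }
    }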

> 
>         Impact 1: some determinism in record processing is lost. This can
>         be minimized by using zero-copy options in VxWorks tcp/ip.  The
>         amount of lost determinism is small (after all, all scan tasks
>         can already be interrupted by higher priority scan tasks, so we
>         don't truly have a deterministic system except for IOEVENT and/or
>         EVENT scanning, or whichever task you set to the highest priority).
>

Zero copy sockets *are* cool but unfortunately not portable
(they are a WRS proprietary API). The new server has the socket calls
well isolated, partly because I was considering use of zero copy
TCP. I would need to redo the message buffering if we were to see
any real performance gain.

>         Impact 2: memory usage is greatly reduced, because only 1 buffer
>         per client is needed, plus some number on the freelist.

Dynamic memory management within the server is a complex issue. I
have made a large improvement in the event queuing system already
(in the portable C++ server lib). There is also room for improvement
in the message buffering system.

> 
>         Impact 3: monitors are NEVER dropped, which means a pull architecture
>         becomes one with reliable data delivery
> 

Intermediate monitors will be dropped in a producer/consumer system where
the sustained production rate is higher than the sustained consumption
rate if we are not willing to suspend event production (a fact of life).
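
One way to implement "discard only intermediate events" is to keep a
single latest-value slot per monitored channel and overwrite it when the
consumer lags. The sketch below illustrates the idea only; the fixed-size
slot table and the missing locking are my simplifications, not the actual
CA server event queue.

    #include <string.h>

    #define NCHAN      256
    #define VALUE_SIZE 40

    struct monitor_slot {
        int  dirty;               /* a value is waiting to be sent      */
        char value[VALUE_SIZE];   /* most recent value for this channel */
    };

    static struct monitor_slot slots[NCHAN];

    /* Producer side: called when channel `chan` changes. Overwriting an
       un-sent value drops the intermediate event, but the consumer still
       always sees reasonably fresh data.                                 */
    static void post_monitor(int chan, const char *val, size_t len)
    {
        memcpy(slots[chan].value, val, len < VALUE_SIZE ? len : VALUE_SIZE);
        slots[chan].dirty = 1;
    }

    /* Consumer side: called from the (possibly slower) server send path. */
    static int take_monitor(int chan, char *out)
    {
        if (!slots[chan].dirty)
            return 0;
        memcpy(out, slots[chan].value, VALUE_SIZE);
        slots[chan].dirty = 0;
        return 1;
    }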

>         Impact 4: (almost) all channel access client tasks are deleted,
>         again resulting in huge memory savings, and significantly lowering
>         the cost of supporting many clients.

If I place the single-threaded portable server into the IOC we will
see this benefit. This is easy to do, but:

Pro: 4-8k less stack per client
Con: no preemptive multi-tasking, i.e. the alarm client is not allowed to be
serviced at a higher priority than some of the scan tasks, etc. We don't do
this now, but perhaps we would like to.

Best Regards (and sorry about the long response),

Jeff

-- 
______________________________________________________________________
Jeffrey O. Hill                 Internet        [email protected]
LANL MS H820                    Voice           505 665 1831
Los Alamos, NM 87545 USA        FAX             505 665 5107

