EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: EPICS database & channel access
From: [email protected]
To: [email protected]
Date: Tue, 17 Dec 96 13:46:54 -0500
I would like to start a discussion about possible improvements to ioc core.
These changes should not be hard to implement, and will provide some
improvements in behavior.  I am certainly not as expert about EPICS internals
as some of you, so feel free to correct any mistakes in my description of
the problem and suggested improvements, but I'd invite you to also discuss
the spirit of the change more than the exact technical implementation
details (which can be left to the real gurus like Marty Kraimer).  I
have had preliminary discussions with Marty, so I know I'm not completely
off base.


Problem Statement
=================

1) Communication between 2 ioc's is unreliable and non-deterministic.

        There are 2 ways to transfer data between ioc's: push (ca_put style)
        and pull (monitor style).  Suppose the source signal transitions from
        0 to 1 and back again.  In the case of push, if the remote/sink ioc
        is temporarily busy, the 2nd put may occur before the record is
        processed, and so the 0->1 transition is never seen.  In the case
        of pull, if the source ioc is loaded, the monitor may be "dropped"
        (i.e. the channel access buffer is full and the data is discarded).

        Summary: for database to database communication, there is 
        NO RELIABLE, GUARANTEED DELIVERY of data.

        (I know that there is a proposal to support process passive over
        channel access, which also addresses this problem)
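        The two failure modes above can be illustrated with a toy model.
        This is only a sketch, not EPICS code: push_style, pull_style, the
        single-value "slot" and the bounded buffer are hypothetical
        stand-ins for a sink record field and a channel access event
        buffer.

```python
from collections import deque


def push_style(puts, process_after):
    """Sink record holds one value; every put overwrites it.

    process_after is the number of puts that arrive between two
    processings of the sink record (1 = sink keeps up, 2 = sink is busy).
    Returns the values the sink record actually processed.
    """
    seen = []
    slot = None
    for i, value in enumerate(puts, start=1):
        slot = value                  # a later put overwrites an earlier one
        if i % process_after == 0:
            seen.append(slot)         # record processes whatever is there now
    return seen


def pull_style(updates, buffer_size):
    """Source posts monitor updates into a bounded buffer.

    The low priority server task is assumed not to run during the burst,
    so anything posted while the buffer is full is discarded.
    Returns (delivered, dropped).
    """
    buf = deque()
    dropped = []
    for value in updates:
        if len(buf) >= buffer_size:
            dropped.append(value)     # buffer full: update is lost
        else:
            buf.append(value)
    return list(buf), dropped


# Source signal goes 0 -> 1 -> 0, i.e. two updates: 1, then 0.
print(push_style([1, 0], process_after=1))   # sink keeps up
print(push_style([1, 0], process_after=2))   # sink busy: the 1 is never seen
print(pull_style([1, 0], buffer_size=1))     # source loaded: the 0 is dropped
```

        In the push case the sink never sees the 0->1 transition; in the
        pull case the sink is left holding a stale 1.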

2) Channel access server consumes large amounts of memory.

        The design of epics core says that the channel access client 
        connection tasks run at a priority lower than all of the scan tasks.
        The rationale (as I understand it) is that epics is a real time
        system, and keeping operator displays up to date is lower priority
        than executing the control algorithm on the ioc.

        [ This rationale fails on 2 points: (1) control algorithms may span
        ioc's, and (2) if an operator can't reach the ioc, he must assume
        that the system is no longer under control and must crash it. This
        is the only reasonable operational approach.  The impact of not
        crashing our RF system is that we may trip the cold compressor and
        eventually release tens of thousands of dollars of helium. ]

        Because of its low priority, the channel access server reserves
        considerable storage space to hold any outbound data occurring as
        a result of database processing (at higher priorities). This
        pre-allocation of memory is done to improve determinism in the
        record processing.  Nevertheless, under load the server discards
        data (see above).

3) Insufficient flexibility in setting priorities in algorithm design

        Database record processing is prioritized according to frequency.
        10 Hz pre-empts 5 Hz, etc.  In a system which may experience
        temporary loading phenomena, either due to IOEVENT scanning or
        non-epics task processing, less frequently processed records cease
        to process (hard-edged scheduler in VxWorks).

        So, if I have a record which absolutely must be processed at least
        once every 10 seconds, and the bulk of my database is processed
        once a second, then I must process the critical record at
        2 Hz to guarantee it is not starved out.
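        The starvation argument can be checked with a small tick-based
        model of a strict-priority, pre-emptive scheduler in which a
        shorter period means a higher priority.  This is a sketch under
        assumptions: simulate, the 10 ms tick, and the per-task costs are
        invented for illustration and do not come from any real ioc.

```python
def simulate(tasks, ticks):
    """Strict-priority pre-emptive scheduling, one unit of work per tick.

    tasks maps a name to (period, cost), both in ticks.  A shorter period
    means a higher priority.  Returns how many jobs each task completed.
    """
    order = sorted(tasks, key=lambda name: tasks[name][0])
    remaining = {name: 0 for name in tasks}   # work left in the current job
    done = {name: 0 for name in tasks}
    for t in range(ticks):
        for name, (period, cost) in tasks.items():
            if t % period == 0:
                remaining[name] = cost        # new job replaces an unfinished one
        for name in order:                    # highest priority pending job runs
            if remaining[name] > 0:
                remaining[name] -= 1
                if remaining[name] == 0:
                    done[name] += 1
                break
    return done


# One tick = 10 ms (assumed), 10 seconds simulated.  During the overload the
# 1 Hz bulk of the database needs the whole CPU.
starved = simulate({"bulk_1hz": (100, 100), "critical_0.1hz": (1000, 1)}, 1000)
rescued = simulate({"bulk_1hz": (100, 100), "critical_2hz": (50, 1)}, 1000)
print(starved["critical_0.1hz"])   # critical record never runs
print(rescued["critical_2hz"])     # scanned at 2 Hz it outranks the bulk
```

        Moving the critical record to 2 Hz works only because it outranks
        the 1 Hz scan, which is exactly the workaround described above.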

Proposed Solution
=================

1) Re-schedule database processing

        Replace all periodic scan tasks with 3 new periodic scan tasks: HIGH,
        MEDIUM, and LOW.  Each of these tasks maintains multiple scan lists
        corresponding to frequency.  Scheduling is done at the granularity
        of a single record processing step.  So, while processing the 5 Hz
        list, at the end of a single record processing step I check to see
        if it is time to process the 10 Hz list.  If so, I process that
        entire list and then resume the 5 Hz list.  (Generalization to
        multiple lists is left as an exercise for the user).

        The PRIO field in each record determines to which of the 3 sets of
        scan lists the record is assigned.

        Impact 1: high priority records are never starved for CPU cycles
        (unless you say everything is high priority :)

        Impact 2: remove 4 tasks, and support more periods without incurring
        a task overhead -- in fact could dynamically configure periods
        instead of statically configuring them.
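        A rough model of one such scan task (say, the HIGH one) is given
        here.  It is a sketch only: run_scan_task, the virtual clock, and
        the fixed per-record cost are assumptions made for illustration.
        A real implementation would run three of these, selected by the
        PRIO field, as VxWorks tasks.

```python
def run_scan_task(scan_lists, record_cost_ms, end_ms):
    """Model of one scan task that owns several periodic scan lists.

    scan_lists maps a period in ms to the records scanned at that period.
    After every single record is processed, any faster list that has come
    due is processed in full before the slower list resumes.
    Returns a trace of (start_time_ms, record_name).
    """
    periods = sorted(scan_lists)              # fastest list first
    next_due = {p: 0 for p in periods}
    clock = 0                                 # virtual time in ms
    trace = []

    def process_list(period):
        nonlocal clock
        next_due[period] += period
        for record in scan_lists[period]:
            trace.append((clock, record))
            clock += record_cost_ms           # pretend the record took this long
            # End of one record processing step: is a faster list due?
            for faster in periods:
                if faster >= period:
                    break
                if clock >= next_due[faster]:
                    process_list(faster)

    while clock < end_ms:
        due = [p for p in periods if clock >= next_due[p]]
        if due:
            process_list(due[0])              # fastest due list goes first
        else:
            clock = min(next_due.values())    # idle until something is due
    return trace


lists = {100: ["a1", "a2"],                   # 10 Hz list
         200: ["b1", "b2", "b3", "b4"]}       # 5 Hz list
for start, name in run_scan_task(lists, record_cost_ms=20, end_ms=200):
    print(start, name)
```

        In this run the 5 Hz list is interrupted after b3, the whole 10 Hz
        list is processed at t = 100 ms, and then b4 resumes.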

2) Re-prioritize channel access handling

        a) When an event is posted to a client (either as a result of a
        monitor firing, or an OUTLINK pushing data), and the buffer is
        full, then IMMEDIATELY start the network operation and allocate
        a new buffer from a freelist.  The old buffer is marked as in the 
        state SENDING.  When the network operation completes, the buffer is 
        put back on the freelist. Under this solution, channel access 
        outbound network operations are processed in the context 
        of the scan task.  

        b) Replace all the channel access client tasks with a single task 
        to flush partially full buffers that are more than so many 
        milliseconds old (programmable, of course).  If channel access
        priority is implemented, replace this with 3 tasks (high, medium,
        and low).

        Impact 1: some determinism in record processing is lost. This can
        be minimized by using zero-copy options in VxWorks tcp/ip.  The
        amount of lost determinism is small (after all, all scan tasks
        can already be interrupted by higher priority scan tasks, so we
        don't truly have a deterministic system except for IOEVENT and/or
        EVENT scanning, or whichever task you set to the highest priority).

        Impact 2: memory usage is greatly reduced, because only 1 buffer
        per client is needed, plus some number on the freelist.

        Impact 3: monitors are NEVER dropped, which means a pull architecture
        becomes one with reliable data delivery

        Impact 4: (almost) all channel access client tasks are deleted,
        again resulting in huge memory savings, and significantly lowering
        the cost of supporting many clients.
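        The buffer handling in (a) and (b) might look roughly like the
        following.  This is a sketch under assumptions: Buffer, Client,
        the FILLING and FREE state names, and the list standing in for the
        network are all hypothetical; only the SENDING state and the
        freelist come from the description above.  What to do when the
        freelist is empty is not specified here, so the sketch simply
        fails in that case.

```python
class Buffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.events = []
        self.state = "FREE"
        self.first_post_ms = None     # time of the oldest event in the buffer


class Client:
    def __init__(self, freelist, network):
        self.freelist = freelist      # shared pool of spare buffers
        self.network = network        # list standing in for the TCP socket
        self.sending = []             # buffers whose send is in progress
        self.current = self._take()

    def _take(self):
        buf = self.freelist.pop()     # assumption: raises if the pool is empty
        buf.state = "FILLING"
        return buf

    def _start_send(self):
        old = self.current
        old.state = "SENDING"
        self.sending.append(old)
        self.network.append(list(old.events))   # runs in the caller's context
        self.current = self._take()

    def post(self, event, now_ms):
        """Called from the scan task when a monitor fires."""
        if len(self.current.events) >= self.current.capacity:
            self._start_send()        # buffer full: send at once, never drop
        if not self.current.events:
            self.current.first_post_ms = now_ms
        self.current.events.append(event)

    def send_complete(self):
        """Network operation finished: recycle the oldest SENDING buffer."""
        buf = self.sending.pop(0)
        buf.events.clear()
        buf.first_post_ms = None
        buf.state = "FREE"
        self.freelist.append(buf)

    def flush_if_old(self, now_ms, max_age_ms):
        """Called by the single flush task for partially full buffers."""
        buf = self.current
        if buf.events and now_ms - buf.first_post_ms >= max_age_ms:
            self._start_send()


network = []
client = Client([Buffer(2) for _ in range(3)], network)
for t, event in enumerate([1, 2, 3, 4, 5]):
    client.post(event, now_ms=t)
client.send_complete()                        # first send finished
client.flush_if_old(now_ms=60, max_age_ms=50) # flush the aged partial buffer
print(network)
```

        Every posted event reaches the network in order; the cost is that
        the send is started from whichever task posted the event.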


Regards,

Chip

-----------------------------------------------------------------------------
Chip Watson
Internet: [email protected]       Thomas Jefferson National Accelerator Facility *
Tel: (757) 269-7101             12000 Jefferson Avenue, MS 12A2
FAX: (757) 269-5024             Newport News, VA 23606
WWW: http://www.jlab.org/~watson/
* (formerly CEBAF, the Continuous Electron Beam Accelerator Facility)


ANJ, 10 Aug 2010