On Friday 24 June 2005 16:37, Kay-Uwe Kasemir wrote:
> On Jun 23, 2005, at 19:36, Benjamin Franksen wrote:
> > Do we really need this?
> > Isn't this more like exposing internal details, unnecessarily
> > complicating the client interface?
> > What about hiding most of the name service questions behind the
> > protocol and just giving client code the answer(s) it wants about
> > the PV in question?
>
> We need this type of API both for 'hello world' and the real world
> in the form of matlab scripts:
>
> pulselen = caget("DTL4:LLRF:PulseLen.value");
>
> Some CA clients want to be more involved in
> what's happening internally:
> 1 - select directory server
> 2 - query it for "DTL4:LLRF:PulseLen"
> 3 - CAS = the CA server that's marked 'primary'
> 4 - create a channel for "DTL4:LLRF:PulseLen"
> 5 - connect to CAS
> 6 - build a request for the "value" property
> 7 - send it
> 8 - wait for response
> 9 - extract the value from the response
> In addition, there's error handling.
>
> The V3 client API hid the discovery
> (configurable via EPICS_CA*ADDR* but not in the API),
> then exposed most from 4 down.
>
> Doug asked me yesterday why we have the user 'connect'.
> Ben suggests hiding as much as possible.
Let me explain the rationale.
Consider the first three steps above:
1 - select directory server
2 - query it for "DTL4:LLRF:PulseLen"
3 - CAS = the CA server that's marked 'primary'
There are two hidden preconditions in this 'code':
(a) some directory server is running
(b) there is one CA server marked 'primary' among all that serve the PV
in question
What if these preconditions are not fulfilled?
ad (a) The directory server might be down. Does this bring the whole
program down? Do we want such a server to be a single point of
failure for all clients? I thought it was agreed that we want to fall
back to broadcasting in case name servers are down.
ad (b) How do you ensure that for several servers serving a certain PV,
there is always exactly one marked as 'primary'? This seems to imply
maintaining an absolute ordering relation among CA servers. Is this
ordering configurable for each PV or only for each server? Is the
configuration static or can it change during runtime and what does that
mean for already connected channels? Note that in a complex control
system servers may go down and come up in unpredictable patterns. So
even if the place in the ordering for servers is configured statically
for each server, for clients it will be something that might change at
runtime.
My suggestion is to hide the complexity of these questions behind a good
abstraction, so that the client programmer cannot inadvertently break
invariants, and so that client code is as robust as possible.
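A minimal Python sketch of what such an abstraction could look like, assuming a directory lookup and a broadcast lookup are available as plain callables. The names `resolve`, `caget`, `DirectoryDown` and `PVNotFound` are hypothetical, not an EPICS API; the point is only that a dead directory server leads to a broadcast fallback rather than a failed program, and that the client never sees which mechanism answered.

```python
# Minimal sketch, not an EPICS API: DirectoryDown, PVNotFound, resolve()
# and caget() are hypothetical names chosen for illustration.
from typing import Callable, Optional

Lookup = Callable[[str], Optional[str]]  # PV name -> server address, or None


class DirectoryDown(Exception):
    """Raised by a directory lookup when no directory server answers."""


class PVNotFound(Exception):
    """Raised when neither the directory nor a broadcast finds the PV."""


def resolve(pv: str, directory: Lookup, broadcast: Lookup) -> str:
    """Locate a server for `pv`; the caller never learns which mechanism answered."""
    try:
        server = directory(pv)
    except DirectoryDown:
        # Precondition (a) violated: fall back to broadcasting instead of
        # letting the directory become a single point of failure.
        server = broadcast(pv)
    if server is None:
        raise PVNotFound(pv)
    return server


def caget(pv: str, directory: Lookup, broadcast: Lookup,
          read: Callable[[str, str], float]) -> float:
    """The 'hello world' call: name resolution is entirely hidden."""
    return read(resolve(pv, directory, broadcast), pv)


def dead_directory(pv: str) -> Optional[str]:
    raise DirectoryDown()


# Usage: the directory is down, yet the client still gets its value.
servers = {"DTL4:LLRF:PulseLen": "ioc-llrf:5064"}  # hypothetical address
values = {("ioc-llrf:5064", "DTL4:LLRF:PulseLen"): 1.25}
pulselen = caget("DTL4:LLRF:PulseLen", dead_directory, servers.get,
                 lambda server, pv: values[(server, pv)])
print(pulselen)  # 1.25
```

The same `resolve` step is where precondition (b), picking the right server among several, would also be settled, again without the client program taking part.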
> Jeff's API notes include the question to abandon the
> channelStateNotify and instead allow subscription on channel
> creation.
>
> I agree completely that connection should be hidden.
> Instead of a 'connected' notification you get the value.
> A 'disconnect' is simply an error for blocking 'gets' & 'puts'
> or a special type of notification for async calls and subscriptions.
What you propose here is not hiding. As you said, you /will/ get an
error (for the blocking calls) if connection is lost and you /will/ get
a different sort of notification for async calls and subscriptions.
Nothing is hidden. Really hiding the connection state is impossible
under typical control system requirements. It is a fact we cannot
ignore that connections get lost, that PVs appear, disappear and
re-appear, possibly on a different server. The question is how you
structure the API so that client code can handle these situations
properly, depending on the client's requirements. Your proposal may be
better or worse in this regard (I haven't formed an opinion yet) but
the matter is completely separate from actually hiding stuff (like
server selection) from the client program.
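To make this concrete, here is a rough Python sketch of the proposal as described (no explicit connect, a disconnect reported as an error for blocking calls and as a special event for subscriptions). `Channel`, `Disconnected` and the `on_connect`/`on_disconnect` hooks are hypothetical names. Even in this form the connection state still reaches the client; it merely arrives through `get()` and the subscription callback.

```python
# Hypothetical sketch: Channel, Disconnected and the on_* hooks are
# illustrative names, not an existing CA client API.
from typing import Callable, List, Optional

Callback = Callable[[str, Optional[float]], None]  # (event, value)


class Disconnected(Exception):
    """Raised by blocking calls while the channel has no server."""


class Channel:
    def __init__(self, name: str) -> None:
        self.name = name
        self._value: Optional[float] = None
        self._connected = False
        self._subscribers: List[Callback] = []

    # Hooks driven by the (hypothetical) protocol layer.
    def on_connect(self, value: float) -> None:
        self._connected = True
        self._value = value
        self._notify("value", value)

    def on_disconnect(self) -> None:
        self._connected = False
        self._notify("disconnected", None)

    # Client-facing API: there is no explicit connect() call.
    def get(self) -> float:
        """Blocking read; a lost connection surfaces as an exception."""
        if not self._connected or self._value is None:
            raise Disconnected(self.name)
        return self._value

    def subscribe(self, callback: Callback) -> None:
        """Subscribers see a lost connection as a special event."""
        self._subscribers.append(callback)

    def _notify(self, event: str, value: Optional[float]) -> None:
        for callback in self._subscribers:
            callback(event, value)


events = []
ch = Channel("DTL4:LLRF:PulseLen")
ch.subscribe(lambda event, value: events.append((event, value)))
ch.on_connect(1.25)
ch.on_disconnect()
print(events)  # [('value', 1.25), ('disconnected', None)]
```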
> I do see a need to select the directory server
> (there might be different ones for the linac, test network, ...).
See above: what if it isn't running? Do you want the application to
fail, then?
> Sometimes I also want to decide which PV gets used in case
> there's a primary, backup, ...
Why? IMO, backup should be selected automatically, whenever primary
fails. Yes, you definitely do want to be notified whenever this
happens, but this is an orthogonal matter.
Server duplication for backup should IMHO be completely transparent to
CA clients.
Let me sketch how I would solve the multiple PVs problem in the most
general manner:
Each PV on each server has an associated property that indicates its
place in the global selection order. Let us for the moment name this
property 'precedence'. It could be an integral number, for instance.
There is a default precedence (for instance zero), but servers can
change the default (for subsequent creations) and can also change
precedences in bulk for a given subset (including all) of the
PVs they serve. Whenever a server creates a new PV with a given
precedence, it first checks whether the (PV-name, precedence) pair
already exists in the system. If it does, this is an error and the PV
does not get created.
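The creation rule can be sketched as a system-wide registry keyed by (PV name, precedence). This is a hypothetical Python model: `Registry`, `DuplicatePrecedence` and the method names are illustrative, and a real system would have to enforce the check through the protocol rather than through one shared dictionary. The bulk change is written all-or-nothing, so a conflict leaves every precedence untouched.

```python
# Hypothetical sketch of the proposed precedence rule; Registry and
# DuplicatePrecedence are illustrative names, not an existing EPICS API.
from typing import Dict, Iterable, Optional, Tuple


class DuplicatePrecedence(Exception):
    """A (PV name, precedence) pair that already exists in the system."""


class Registry:
    """System-wide view: (PV name, precedence) -> serving server."""

    def __init__(self) -> None:
        self._owner: Dict[Tuple[str, int], str] = {}
        self._default: Dict[str, int] = {}  # per-server default precedence

    def set_default(self, server: str, precedence: int) -> None:
        """Change the default used for the server's subsequent creations."""
        self._default[server] = precedence

    def create_pv(self, server: str, pv: str,
                  precedence: Optional[int] = None) -> None:
        if precedence is None:
            precedence = self._default.get(server, 0)  # global default: zero
        key = (pv, precedence)
        if key in self._owner:
            raise DuplicatePrecedence(f"{key} already served by {self._owner[key]}")
        self._owner[key] = server

    def change_precedence(self, server: str, pvs: Iterable[str],
                          precedence: int) -> None:
        """Bulk change for a subset of the server's PVs; all-or-nothing."""
        wanted = set(pvs)
        moves = [key for key, owner in self._owner.items()
                 if owner == server and key[0] in wanted]
        for pv, _ in moves:  # check every new pair before changing anything
            holder = self._owner.get((pv, precedence))
            if holder is not None and holder != server:
                raise DuplicatePrecedence(f"{(pv, precedence)} already served by {holder}")
        for key in moves:
            del self._owner[key]
        for pv, _ in moves:
            self._owner[(pv, precedence)] = server

    def servers_for(self, pv: str) -> Dict[int, str]:
        """Precedence -> server for every server offering `pv`."""
        return {prec: srv for (name, prec), srv in self._owner.items() if name == pv}


reg = Registry()
reg.set_default("primary", 10)
reg.create_pv("primary", "DTL4:LLRF:PulseLen")    # precedence 10
reg.create_pv("secondary", "DTL4:LLRF:PulseLen")  # default precedence 0
try:
    reg.create_pv("spare", "DTL4:LLRF:PulseLen", precedence=10)
except DuplicatePrecedence as err:
    print("refused:", err)
```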
For each PV, CA protocol automatically selects the server with the
highest precedence. For an already connected channel, if a PV appears
in the system with the same name but higher precedence, two things
happen:
- client code is notified
- connection is re-routed to the new server
Thus, in the typical scenario with a primary server and a secondary
backup (synchronized via some as yet unspecified method), the PVs on
the primary server will all be configured to have higher precedence
than those on the secondary. As soon as the primary IOC goes
down, the secondary takes over without a fuss. Clients /will/ get
notified, though, so maintenance can try to get the broken server
running again. As soon as it comes up again, it will take over, again
without a fuss. Clients keep running and working, no need to restart
anything, no need to reconfigure anything.
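The selection and re-routing rule, and the failover scenario just described, can be simulated in a few lines. This Python sketch is hypothetical (`Router` and its methods are illustrative names); it keeps each PV routed to the server with the highest precedence, and every change of route, including the initial connection (old = None), is both logged and reported to client code.

```python
# Hypothetical sketch of automatic selection and re-routing; Router and
# its methods are illustrative names, not an existing CA library API.
import logging
from typing import Callable, Dict, List, Optional

log = logging.getLogger("ca.routing")
Listener = Callable[[str, Optional[str], Optional[str]], None]  # pv, old, new


class Router:
    """Keeps every PV routed to the server offering the highest precedence."""

    def __init__(self) -> None:
        self._offers: Dict[str, Dict[str, int]] = {}  # pv -> {server: precedence}
        self._route: Dict[str, Optional[str]] = {}    # pv -> selected server
        self._listeners: List[Listener] = []

    def on_reroute(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def server_up(self, server: str, pvs: Dict[str, int]) -> None:
        for pv, precedence in pvs.items():
            self._offers.setdefault(pv, {})[server] = precedence
            self._reselect(pv)

    def server_down(self, server: str) -> None:
        for pv, offers in self._offers.items():
            if offers.pop(server, None) is not None:
                self._reselect(pv)

    def route(self, pv: str) -> Optional[str]:
        return self._route.get(pv)

    def _reselect(self, pv: str) -> None:
        offers = self._offers.get(pv, {})
        best = max(offers, key=offers.__getitem__) if offers else None
        old = self._route.get(pv)
        if best != old:
            self._route[pv] = best
            # Every route change is logged and reported to client code.
            log.warning("re-routing %s: %s -> %s", pv, old, best)
            for listener in self._listeners:
                listener(pv, old, best)


pv = "DTL4:LLRF:PulseLen"
events = []
r = Router()
r.on_reroute(lambda name, old, new: events.append((old, new)))
r.server_up("secondary", {pv: 0})
r.server_up("primary", {pv: 10})
r.server_down("primary")          # primary IOC fails: backup takes over
r.server_up("primary", {pv: 10})  # repaired: primary takes over again
print(events)
```

Clients hold on to the PV name only; which server answers is decided inside the router.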
Second scenario: Testing in a production system. Let's assume you want
to test the (formerly) primary server before it gets back online. You
reconfigure it so that its PVs get default precedences lower than those
of the secondary. You start a new CA client with a restricted set of
servers in its EPICS_CA_ADDR_LIST, i.e. excluding the backup server
(now working as primary). This could be improved: we might add a
variable EPICS_CA_EXCLUDE_ADDR_LIST to make this easier and more
fool-proof. This client then never sees the PVs from the server you
excluded, and thus connects to the one you just brought online.
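A rough Python sketch of the client-side filtering in this test scenario. EPICS_CA_ADDR_LIST is a real EPICS variable, but EPICS_CA_EXCLUDE_ADDR_LIST is only the variable proposed above, and treating list entries as server identities (rather than search destinations) is a simplifying assumption. `select_server` is a hypothetical name.

```python
# Sketch of client-side server filtering. EPICS_CA_ADDR_LIST exists in
# EPICS; EPICS_CA_EXCLUDE_ADDR_LIST is only the variable proposed above.
# Treating list entries as server identities is a simplification.
import os
from typing import Dict, Mapping, Optional, Set


def _addresses(env: Mapping[str, str], var: str) -> Optional[Set[str]]:
    """Whitespace-separated address list, or None if unset/empty."""
    entries = env.get(var, "").split()
    return set(entries) if entries else None


def select_server(offers: Dict[str, int],
                  env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Highest-precedence server for one PV among those this client may see.

    offers maps server address -> precedence of the PV on that server.
    """
    allowed = _addresses(env, "EPICS_CA_ADDR_LIST")  # None: no restriction
    excluded = _addresses(env, "EPICS_CA_EXCLUDE_ADDR_LIST") or set()
    visible = {server: prec for server, prec in offers.items()
               if (allowed is None or server in allowed) and server not in excluded}
    return max(visible, key=visible.__getitem__) if visible else None


# Backup (10.0.0.2) now acts as primary; the repaired IOC (10.0.0.1) was
# reconfigured to a lower precedence so it does not take over yet.
offers = {"10.0.0.1": -1, "10.0.0.2": 0}
print(select_server(offers, {}))                                          # 10.0.0.2
print(select_server(offers, {"EPICS_CA_ADDR_LIST": "10.0.0.1"}))          # 10.0.0.1
print(select_server(offers, {"EPICS_CA_EXCLUDE_ADDR_LIST": "10.0.0.2"}))  # 10.0.0.1
```

Production clients keep the default environment and follow the precedence order; only the test client is steered, and only from outside the program.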
Note: it is extremely important that each client is notified whenever a
channel gets re-routed. This minimizes the probability of such a change
taking place accidentally and going unnoticed, which would be very bad.
Such events should also be logged.
I think the solution sketched above is straightforward and simple to
work with. What's not so simple is implementing all this at the
protocol level and in the client libraries.
Note: this in no way precludes directory services, even multiple
redundant ones. It only means that locating such services and using
them would not be visible to CA client programs. Nor does it mean
that we cannot have a low-level API, separate from the CA client API,
that exposes all the details of service location and channel routing
and allows manipulating them. But it /should/ be a completely
separate interface, and the client API should not depend on it.
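One way to read this layering, sketched in Python under the assumption that the client library needs nothing from name resolution beyond "give me a server for this PV". `ServerLocator`, `ClientContext` and `RedundantDirectories` are hypothetical names; the dependency runs only from the low-level service toward the narrow interface, never from the client API toward directory details.

```python
# Hypothetical sketch of the layering: all names are illustrative.
from typing import Dict, List, Protocol


class ServerLocator(Protocol):
    """All the CA client API needs from name resolution."""

    def locate(self, pv: str) -> str: ...


class ClientContext:
    """Client API: depends only on ServerLocator, never on directories."""

    def __init__(self, locator: ServerLocator) -> None:
        self._locator = locator

    def server_for(self, pv: str) -> str:
        return self._locator.locate(pv)


class RedundantDirectories:
    """Separate low-level API: exposes and manipulates routing details."""

    def __init__(self, tables: List[Dict[str, str]]) -> None:
        self.tables = tables  # one table per (redundant) directory service

    def locate(self, pv: str) -> str:
        for table in self.tables:  # first directory that knows the PV wins
            if pv in table:
                return table[pv]
        raise KeyError(pv)


admin = RedundantDirectories([{}, {"DTL4:LLRF:PulseLen": "ioc-llrf:5064"}])
# ClientContext never names RedundantDirectories; it only calls locate().
client = ClientContext(admin)
print(client.server_for("DTL4:LLRF:PulseLen"))  # ioc-llrf:5064
```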
> Lastly I like the idea that e.g. ZeroC ICE has where I can use
> a directory but also specify the data source directly (IP & port #).
> That way I can work without a directory,
> override the directory for testing or use another directory
> mechanism.
But isn't the current way of doing this, i.e. /outside/ the program, a
lot more flexible? Do we really want to hard-code server IP addresses in
CA clients and recompile them whenever an IP address changes? I wouldn't
even hard-code domain names.
> So how about exposing these steps:
>
> Directory:
> * select directory server
> Optional. There's a default directory.
> * query it for a PV
> Returns list of PVs: CA server, type (primary, ...)
>
> Channel:
> * constructor(name, Directory = default)
> * constructor(name, CA server info)
> If you want full control over the data source
> [...]
I cannot imagine an application that needs this kind of full control.
Which do you have in mind?
Ben