EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: RE: MAXv "spurious" command errors
From: <[email protected]>
To: <[email protected]>, <[email protected]>
Cc: [email protected], [email protected]
Date: Tue, 13 Jul 2010 09:16:13 +0100
Michael, Dirk,

Yes, I have also seen the command error bit set on the move done
interrupt. It does not affect the operation of the system, so I have
ignored it so far. Combined with the status read errors I have seen,
this does suggest data read errors on the VME bus.

Regards,


Austen Rose

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Davidsaver, Michael
Sent: 12 July 2010 17:28
To: Dirk Zimoch
Cc: Keister, Jeffrey; [email protected]
Subject: RE: MAXv "spurious" command errors

Hi Dirk,

> It would be nice to know the command that caused the interrupt. Check 
> your PREM and POST strings. Anything strange there?

As far as I can tell the command error bit is never actually being set.
The message is only an indication that the bit read 1 this time.  The
interrupt which is actually occurring is the move done interrupt (low 8
bits).  If I make longer moves then the messages become less frequent.
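
To make the "low 8 bits" point concrete, here is a minimal sketch in C that pulls the move-done flags out of a raw status word. The mask is taken only from the statement above that the low 8 bits carry the move-done flags; the helper names maxv_done_flags and maxv_done_count are hypothetical and are not part of drvMAXv. The actual register layout should be checked against the MAXv manual.

```c
#include <stdint.h>

/* Assumption from the message above: the low 8 bits of the status
 * word are the per-axis "move done" flags. */
#define DONE_MASK 0x000000FFu

/* Hypothetical helper: extract the move-done flags as an 8-bit mask. */
unsigned maxv_done_flags(uint32_t status)
{
    return (unsigned)(status & DONE_MASK);
}

/* Hypothetical helper: count how many axes report "done". */
int maxv_done_count(uint32_t status)
{
    unsigned flags = maxv_done_flags(status);
    int count = 0;

    while (flags != 0) {
        count += (int)(flags & 1u);
        flags >>= 1;
    }
    return count;
}
```

Applied to the six status words logged in the original report quoted below (03000004, 7FFF0010, 3FDF0002, 03810004, 01810020, 3FCF0010), each word has exactly one done flag set, which is consistent with a single-axis move done interrupt.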

We have run with the drvMAXvdebug flag set so that we can see every
message sent or received.  The messages are always the same, but the
command error does not occur every time.

This past weekend we actually ran with the command error interrupt
disabled and still see these messages.


Thanks,
Michael

> 
> Recently (last week) I added some code to print the faulty command.
> There is a lot of guessing involved. I basically go back from the
> current command index until I find an axis select command. Here is the
> code, based on motorR6-2-2.
> 
>      static char msgbuffer[80];
>      static char cmdbuffer[80];
> 
>      if (status1_flag.Bits.cmndError)
>      {
>          int cmdstart, cmdend;
>          char* commandstring;
> 
>          sprintf(msgbuffer,
>              "drvMAXv motorIsr: Card %d command error detected\n", card);
>          epicsInterruptContextMessage(msgbuffer);
>          /* print last command: search start at axis select command */
>          commandstring = cmdbuffer+sizeof(cmdbuffer);
>          *--commandstring = 0;
>          *--commandstring = '\n';
>          cmdend = pmotor->outPutIndex;
>          cmdstart = cmdend;
>          do {
>              cmdstart--;
>              if (cmdstart < 0) cmdstart = BUFFER_SIZE-1;
>              *--commandstring = pmotor->outBuffer[cmdstart];
>          } while (commandstring > cmdbuffer &&
>              (toupper(commandstring[0]) != 'A' ||
>               strchr("XYZTUVRSAM", toupper(commandstring[1])) == NULL));
>          epicsInterruptContextMessage(commandstring);
>      }
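
The backward search in the quoted snippet can be tried outside the driver. The following is a minimal standalone sketch of the same heuristic, not the driver code itself: the function name last_command and its parameters are hypothetical, and it assumes, as the snippet does, that an axis select command is an 'A' followed by one of the letters in "XYZTUVRSAM". It walks backwards through a ring buffer, wrapping at index 0, and copies characters into the tail of an output buffer until the copied text starts with such a command or the output buffer is full.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, modelled on the quoted snippet.
 * ring/ring_size: the command ring buffer (ring_size >= 1).
 * end:            index just past the last character written.
 * out/out_size:   scratch buffer filled from its end (out_size >= 1).
 * Returns a pointer into out where the recovered text starts. */
const char *last_command(const char *ring, size_t ring_size, size_t end,
                         char *out, size_t out_size)
{
    char *p = out + out_size;
    size_t pos = end;

    *--p = '\0';
    while (p > out) {
        /* Step back one character, wrapping around the ring. */
        pos = (pos == 0) ? ring_size - 1 : pos - 1;
        *--p = ring[pos];

        /* Stop once the text begins with an axis select command.
         * The p[1] check avoids matching the string terminator. */
        if (toupper((unsigned char)p[0]) == 'A' && p[1] != '\0' &&
            strchr("XYZTUVRSAM", toupper((unsigned char)p[1])) != NULL)
            break;
    }
    return p;
}
```

As in the original, this is guesswork: any 'A' followed by one of those letters inside a command's arguments would end the search early, and a command longer than the output buffer is truncated at the front.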
> 
> 
> Best regards,
> Dirk
> 
> Davidsaver, Michael wrote:
> > All,
> >
> > Has anyone seen this behavior before in a MAXv (or elsewhere)?
> >
> > We have been having some sporadic crashes with a VME crate with 5 MAXv
> > motor controllers in it (detail below).  The crate in its present
> > configuration has been in operation for ~10 months.  During that time we
> > have seen the following message occur randomly (~20 sec period) when an
> > axis is moving.
> >
> >> <179>syslog:
> >> drvMAXv.cc:motorIsr: command error - card 03
> >
> > This happens for all five cards.  All cards were recently returned to
> > OMS and updated to firmware 1.33, but the problem remains.
> >
> >
> > As it didn't seem to have any effect on the operation of the device we
> > ignored it, assuming that the crashes were the result of the known
> > watchdog timer problem (which we were also seeing).  However it now
> > seems that this is a symptom of a different, and fairly nasty problem.
> > During a recent attempt at debugging the problem we added a printk to
> > the interrupt handler to print the interrupt status register when a
> > command error is indicated.
> >
> >>     if (status1_flag.Bits.cmndError)
> >>     {
> >> #ifdef __rtems__
> >>         printk("at cmd err Status %08x\n", status1_flag.All); 
> >> #endif
> >
> > Note that according to the MAXv documentation the top 5 bits of this
> > register are unused and should always be zeros.
> >
> > What we now see is:
> >
> >> <179>syslog:
> >> drvMAXv.cc:motorIsr: command error - card 01 at cmd err Status
> >> 03000004
> >> <179>syslog:
> >> drvMAXv.cc:motorIsr: command error - card 00 at cmd err Status 
> >> 7FFF0010
> >> <179>syslog:
> >> drvMAXv.cc:motorIsr: command error - card 02 at cmd err Status
> >> 3FDF0002
> >> <179>syslog:
> >> drvMAXv.cc:motorIsr: command error - card 03 at cmd err Status
> >> 03810004
> >> <179>syslog:
> >> drvMAXv.cc:motorIsr: command error - card 00 at cmd err Status 
> >> 01810020
> >> <179>syslog:
> >> drvMAXv.cc:motorIsr: command error - card 04 at cmd err Status 
> >> 3FCF0010
> >
> > So it looks like the top 16 bits are occasionally set when they should
> > not be.  We haven't had a chance to investigate with a logic analyzer,
> > but it looks like a classic VME timing problem (top 2 bytes driven too
> > late).
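
The one rule stated in the report, that the top 5 bits of this register are documented as unused and always zero, can be turned into a cheap sanity check. This is a minimal sketch under that assumption only; the helper names maxv_status_suspect and maxv_status_upper16 are hypothetical and do not exist in drvMAXv.

```c
#include <stdint.h>

/* Per the report, the MAXv documentation lists the top 5 bits of
 * this register as unused and always zero. */
#define RESERVED_MASK 0xF8000000u

/* Hypothetical helper: 1 if any bit documented as unused is set. */
int maxv_status_suspect(uint32_t status)
{
    return (status & RESERVED_MASK) != 0;
}

/* Hypothetical helper: the upper half-word, i.e. the two bytes
 * suspected of being driven too late. */
unsigned maxv_status_upper16(uint32_t status)
{
    return (unsigned)(status >> 16);
}
```

Of the six status words logged above, only 7FFF0010, 3FDF0002 and 3FCF0010 have reserved bits set; 03000004, 03810004 and 01810020 pass this check. A reserved-bit test therefore gives only a lower bound on the number of bad reads, since a corrupted upper half-word can still have zeros in its top 5 bits.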
> >
> > The crate controller is an MVME3100 running RTEMS 4.9.3 with EPICS
> > 3.14.10 and synApps 5.5 w/ the motor module updated to 6.5.1.  The crate
> > also contains a Joerger VSC16 scaler and a Precision Analog Systems DAC.
> > The scaler is 32-bit and has been showing no signs of this problem.
> >
> >
> > Michael Davidsaver
> > NSLSII Controls Group
> > Brookhaven National Lab
> > (631) 344-3698
> >
> >
> >
> >


