EPICS Controls Argonne National Laboratory

Experimental Physics and
Industrial Control System


Subject: Re: VME Bus Error handling on MVME3100 and 6100 boards
From: Till Straumann <[email protected]>
To: Andrew Johnson <[email protected]>
Cc: EPICS tech-talk <[email protected]>
Date: Thu, 10 Aug 2006 20:16:54 -0700
Here's some more stuff I found out:

*(valid_ptr1)    = val;   EIEIO();
*(invalid_ptr)   = val;   EIEIO();
*(valid_ptr2)    = val;   EIEIO();


(valid_ptr1 and valid_ptr2 are valid VME addresses; invalid_ptr causes a bus error when accessed.)
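The snippet uses an `EIEIO()` macro without showing its definition. Below is a minimal sketch of one possible definition, assuming a GCC-compatible compiler. On PowerPC it emits `eieio` so the three stores reach the bridge in program order. On other targets it falls back to a compiler-only barrier so the sequence can at least be built and exercised on a host. The macro body and the `write_sequence()` wrapper are hypothetical and not taken from the test code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical EIEIO() definition, assuming GCC-style inline assembly. */
#if defined(__powerpc__) || defined(__PPC__)
/* eieio orders accesses to caching-inhibited (I/O) storage; the "memory"
 * clobber also keeps the compiler from reordering them. */
#define EIEIO() __asm__ volatile("eieio" ::: "memory")
#else
/* Non-PowerPC host: compiler barrier only, for compiling/testing. */
#define EIEIO() __asm__ volatile("" ::: "memory")
#endif

/* Hypothetical wrapper issuing the three ordered stores from the test.
 * On an MVME6100 'invalid_ptr' would point at an unmapped VME address
 * (non-NULL, but nothing answers) and the store would raise a bus error. */
void write_sequence(volatile uint32_t *valid_ptr1,
                    volatile uint32_t *invalid_ptr,
                    volatile uint32_t *valid_ptr2,
                    uint32_t val)
{
    assert(valid_ptr1 && invalid_ptr && valid_ptr2);
    *valid_ptr1  = val; EIEIO();
    *invalid_ptr = val; EIEIO();
    *valid_ptr2  = val; EIEIO();
}
```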

The above sequence (through a Tsi148) does write the value into
the good addresses, so I'm not sure whether the bus error really flushes
the entire write FIFO.

The appended test code, executed on an MVME6100,
prints:

Cexp>berrWhere()
Bus Error when accessing 0x21000a00 (PC: 0x9e7944)
Read instruction was at  0x9e7940 (offset 4)

i.e., IMHO this demonstrates that the MVME6100/Tsi148
does indeed deliver the bus error interrupt in a timely
fashion: the saved PC is only one instruction (4 bytes)
past the faulting load.
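The "offset 4" figure is simply the distance between the PC saved in SRR0 and the address of the probing load. PowerPC instructions are a fixed 4 bytes wide, so the interrupt was taken right after the load, with the `eieio` that follows it in `berrRaise()` as the next instruction. Here is a small sketch of that arithmetic; the helper names are hypothetical.

```c
#include <assert.h>

#define PPC_INSN_BYTES 4 /* PowerPC instructions are a fixed 32 bits */

/* Byte distance from the probing instruction to the PC saved in SRR0. */
long pc_offset(unsigned long srr0, unsigned long insn_addr)
{
    return (long)(srr0 - insn_addr);
}

/* Number of whole instructions between the probe and the saved PC. */
long insns_past(unsigned long srr0, unsigned long insn_addr)
{
    long off = pc_offset(srr0, insn_addr);
    assert(off % PPC_INSN_BYTES == 0); /* PowerPC PCs are word-aligned */
    return off / PPC_INSN_BYTES;
}
```

With the values printed above, `pc_offset(0x9e7944, 0x9e7940)` is 4 and `insns_past(0x9e7944, 0x9e7940)` is 1.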

Of course, write operations are completely asynchronous,
and in that case the only thing that can be done is to report
that an error happened; there is no way to relate it
to a particular task/PC.

Note that this is also true for the Universe (with write-posting
enabled).
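For posted writes, the handler can therefore only record the event. Below is a minimal sketch of that policy. It assumes a hypothetical `berr_log` record filled in from the VME address the bridge latched; in the test code below, that address comes from `vmeTsi148ClearVMEBusErrors()`. No task is suspended.

```c
#include <assert.h>

/* Hypothetical record of asynchronous (posted-write) VME bus errors. */
struct berr_log {
    unsigned long      count;      /* number of write errors seen so far */
    unsigned long long last_addr;  /* VME address latched by the bridge */
};

/* Hypothetical hook called from the bus error ISR when the failed cycle
 * was a write. The CPU has long since moved on, so only log the event;
 * there is no meaningful task or PC to blame. */
void berr_log_write(struct berr_log *elog, unsigned long long vme_addr)
{
    assert(elog != 0);
    elog->count++;
    elog->last_addr = vme_addr;
}
```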

However, in contrast to the Universe, write posting cannot be disabled
on the Tsi148, and that introduces problems with
VME ISRs: when an ISR clears an interrupt condition on a
board, that operation goes into the write FIFO, but the
ISR may return before the operation is actually performed
on the board. Hence the BSP may find the IRQ still
pending and try to run a second IACK cycle after the ISR returns.

The only remedy here is reading something back from
the device before letting the ISR return (reading anything
flushes the Tsi148's write FIFO).
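The read-back remedy might look like the sketch below. The register block `struct hyp_dev_regs`, its `status`/`irq_ack` fields, the `IO_ORDER()` macro and `hyp_dev_isr()` are all hypothetical placeholders for a real board's registers and driver. The point is only the ordering: first write to acknowledge, then read a register on the same device so the posted write is drained before the ISR returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ordering barrier: eieio on PowerPC keeps the store and the
 * following load to I/O space in program order; elsewhere compiler-only. */
#if defined(__powerpc__) || defined(__PPC__)
#define IO_ORDER() __asm__ volatile("eieio" ::: "memory")
#else
#define IO_ORDER() __asm__ volatile("" ::: "memory")
#endif

/* Hypothetical VME board register layout, for illustration only. */
struct hyp_dev_regs {
    volatile uint32_t status;   /* interrupt pending bits */
    volatile uint32_t irq_ack;  /* write 1s here to clear pending bits */
};

/* Sketch of an ISR body for a bridge whose write posting cannot be
 * disabled. Returns the value read back (callers may ignore it). */
uint32_t hyp_dev_isr(struct hyp_dev_regs *regs)
{
    uint32_t pending = regs->status;

    /* ... service the interrupt causes in 'pending' here ... */

    regs->irq_ack = pending;  /* may sit in the bridge's write FIFO */
    IO_ORDER();

    /* Reading from the device flushes the write FIFO, so the board has
     * seen the acknowledge (and dropped its IRQ) before we return and the
     * BSP will not attempt a second IACK cycle. */
    return regs->status;
}
```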

=> IMO, the Tsi148's new features
(fast 2eVME and SST transfers among others)
outweigh the disadvantage that write-posting
cannot be disabled.
I don't share your negative assessment and
recommendation to stay away from 6100s.


-- Till

PS: I would also like to stress that a bus error is really
     expensive: it can stall the CPU for tens or even hundreds
     of microseconds.
     During normal operation, VME bus errors *must*
     be considered a fatal error.
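One way to follow that rule, sketched here with hypothetical names, is to tolerate a bus error only inside an explicit probe window, for example while checking at initialization whether a board is present. Any other bus error is treated as fatal. Because each bus error is so expensive, probing is meant to happen rarely, not in a running loop.

```c
#include <assert.h>

enum berr_action { BERR_PROBE_FAILED, BERR_FATAL };

/* Hypothetical flag, set only while deliberately probing an address. */
static volatile int probe_active = 0;

void probe_begin(void) { probe_active = 1; }
void probe_end(void)   { probe_active = 0; }

/* Hypothetical policy for the bus error handler: inside a probe a bus
 * error just means "nothing there"; anywhere else it indicates a broken
 * driver or configuration and must be treated as fatal. */
enum berr_action berr_classify(void)
{
    return probe_active ? BERR_PROBE_FAILED : BERR_FATAL;
}
```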

Andrew Johnson wrote:

Till Straumann wrote:


This behaviour is not possible with the Tempe chip - VME read cycles that terminate with a Bus Error just return 0xFFFF values (all one bits), and by default there is no indication that anything unusual happened. It is possible to enable the VME Exception interrupt, which will eventually be serviced by the processor and most of the time would allow the BSP to suspend the correct task (assuming that the Bus Error wasn't caused by an Interrupt Service Routine), but that interrupt could be delayed for quite a long time by other pending interrupts.


So why would you care? The faulting task would still not be able to make any progress and hence be suspended right at the faulting instruction.


Interrupts may not be as quick at actually getting to the CPU as a Target Abort - I don't know whether modern CPUs finish off any/all instructions that they've already started running before they actually switch to processing the exception, but it's likely that there will be a number of instructions pending. This also supposes that interrupts are enabled at the time the bus error gets flagged.

If the Bus Error occurs inside an interrupt service routine, the ISR will get to run to completion, making use of the dud all-1's read data in the process (the vxWorks mv6100 BSP doesn't support nested interrupts, and the hardware doesn't appear to make it easy to do so). If this does occur, the task that happened to be running before the faulting ISR was entered might be incorrectly blamed for causing the error and suspended.

While it might be possible to work around some of these effects it won't be easy to do, and the "unreliable instruction location" behaviour is not something I'd want when debugging a driver.

- Andrew
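Both of Andrew's concerns hinge on a failed Tempe read simply returning all one bits with no other indication. Without the VME exception interrupt, a driver can at best treat that value as suspicious. Here is a sketch of such a check (function names hypothetical). All-ones can also be perfectly legitimate register contents, so this is a heuristic that calls for a follow-up check, not proof of a bus error.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: perform the read, hand back the data, and return
 * nonzero if the value matches what a failed Tempe read would produce.
 * A hit means "possibly a bus error", never "certainly". */
int read_maybe_berr32(volatile const uint32_t *reg, uint32_t *out)
{
    uint32_t v = *reg;
    *out = v;
    return v == UINT32_C(0xFFFFFFFF);
}

int read_maybe_berr16(volatile const uint16_t *reg, uint16_t *out)
{
    uint16_t v = *reg;
    *out = v;
    return v == UINT16_C(0xFFFF);
}
```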


#include <rtems.h>
#include <bsp.h>
#include <stdio.h>
#include <stdarg.h>
#include <bsp/vmeTsi148.h>
#include <bsp/VME.h>
#include <rtems/bspIo.h>

unsigned long long	erraddr;
Thread_Control		*err_tcb = 0;
int			wasRd;

void ab(unsigned *a)
{
	*a=0xdead;
}

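/* Perform one access to 'addr' (a load if 'rd' is nonzero, otherwise a
 * store) and return the number of time-base ticks (mftb) it took. The
 * global labels probeRDAddress/probeWRAddress mark the access instruction
 * itself so that berrWhere() can report the offset of the saved PC.
 */
unsigned
berrRaise(void *addr,int rd)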
{
unsigned t1,t2,val=0xdeadbeef;

	wasRd = rd;

	asm volatile("mftb %0":"=r"(t1));
	if ( rd ) {
	asm volatile(
		"	.globl probeRDAddress\n"
		"probeRDAddress:             \n"
		"	lwz %0,0(%1)         \n"
		"	eieio                \n"
		:"=r"(val):"b"(addr));
	} else {
	asm volatile(
		"	.globl probeWRAddress\n"
		"probeWRAddress:             \n"
		"	stw %0,0(%1)         \n"
		"	eieio                \n"
		::"r"(val),"b"(addr));
	}
	asm volatile("mftb %0":"=r"(t2));
	return t2-t1;
}

void
berrHandler(void *uarg, unsigned long vec)
{
	vmeTsi148ClearVMEBusErrors(&erraddr);
	err_tcb = _Thread_Executing;
	/* write operation is totally asynchronous; we cannot suspend
	 * any arbitrary task that might be running now
	 */
	if (wasRd)
		rtems_task_suspend(RTEMS_SELF);
}


/* Call this to pick up the TCB of the suspended task.
 * For obscure reasons, the TCB is not accessible (i.e., it
 * is not updated with current information yet) from 
 * the ISR.
 */
void
berrWhere()
{
BSP_Exception_frame *f;
extern void probeRDAddress();
extern void probeWRAddress();

	if ( err_tcb )  {
		f = (BSP_Exception_frame*)(*(unsigned*)err_tcb->Registers.gpr1 + 8);
		printf("Bus Error when accessing %#llx (PC: %#x)\n",erraddr,f->EXC_SRR0);
		if (wasRd)
			printf("Read instruction was at  %#lx (offset %li)\n",
				(unsigned long)probeRDAddress,
				(long)(f->EXC_SRR0 - (unsigned long)probeRDAddress));
		else
			printf("Write instruction was at %#lx (offset %li)\n",
				(unsigned long)probeWRAddress,
				(long)(f->EXC_SRR0 - (unsigned long)probeWRAddress));
	}
}

void
_cexpModuleInitialize(void*unused)
{
	BSP_installVME_isr(TSI_VERR_INT_VEC, berrHandler, 0);
	BSP_enableVME_int_lvl(TSI_VERR_INT_VEC);
}

int
_cexpModuleFinalize(void*unused)
{
	BSP_disableVME_int_lvl(TSI_VERR_INT_VEC);
	return BSP_removeVME_isr(TSI_VERR_INT_VEC, berrHandler, 0);
}

