Experimental Physics and
Industrial Control System

[email protected] (John R. Winans) · Wed, 26 Oct 1994 15:38:04 -0600

In the light of a large number of email messages I have been getting
lately about memory and the BSPs for vxWorks, I offer the following:

This is the start of a FAQ I have been working on about how to deal with
memory when writing device drivers.

--John

P.S. This does not yet address the devLib code.  I hope that Jeff H. might be
     able to add stuff about it as needed.

==========================================================================

First, let's go over how the VME system defines the memory spaces, and EXACTLY
what the BSP from vxWorks does to handle the VME bus.

There are 3 regions of memory: a 16-bit addressed range called A16 (or SHORT) 
that contains 64KB, a 24-bit addressed range called A24 (or STD) that
contains 16MB, and a 32-bit addressed range called A32 (or EXT) that contains
4GB.

The processor mode is also taken into account by the VME bus.  The allowed 
modes are USR and SUP (for user and supervisor.)  (These modes are not a 
concern to EPICS because it ALWAYS runs in the same mode.  But they have to 
be taken into account.)

The type of data access is also represented on the VME bus.  The types are used
to represent the intended use of the data being accessed.  These can be PGM for
instruction text, DATA for accesses to non-instruction data, and BLK for
bulk block transfers.

The VME bus defines something called an AMODE (address modifier).  The AMODE 
is the code that represents the memory region, processor mode, and data type.
These are expressed by vxWorks in the file 'vme.h' as the following:

#define VME_AM_STD_SUP_ASCENDING        0x3f
#define VME_AM_STD_SUP_PGM              0x3e
#define VME_AM_STD_SUP_DATA             0x3d

#define VME_AM_STD_USR_ASCENDING        0x3b
#define VME_AM_STD_USR_PGM              0x3a
#define VME_AM_STD_USR_DATA             0x39

#define VME_AM_SUP_SHORT_IO             0x2d
#define VME_AM_USR_SHORT_IO             0x29

#define VME_AM_EXT_SUP_ASCENDING        0x0f
#define VME_AM_EXT_SUP_PGM              0x0e
#define VME_AM_EXT_SUP_DATA             0x0d

#define VME_AM_EXT_USR_ASCENDING        0x0b
#define VME_AM_EXT_USR_PGM              0x0a
#define VME_AM_EXT_USR_DATA             0x09

(The 'ASCENDING' ones are the WRS version of block transfer.)
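As a rough illustration of how these codes pack the three pieces of
information together, here is a minimal sketch that decodes an AMODE value.
The bit layout is inferred only from the table above, and the names
decodeAmode(), AmodeInfo and AmodeType are made up for this example (they are
not part of vxWorks).  It deliberately rejects any code that is not in the
table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch; not from vme.h. */
typedef enum { AM_TYPE_DATA, AM_TYPE_PGM, AM_TYPE_BLK } AmodeType;

typedef struct {
    int       addrBits;    /* 16, 24 or 32 (A16, A24, A32)          */
    int       supervisor;  /* 1 = SUP, 0 = USR                      */
    AmodeType type;        /* DATA, PGM or BLK ('ASCENDING')        */
} AmodeInfo;

/* Decode one of the AMODE codes listed in the table above.
 * Returns 0 on success, -1 if the code is not one of those listed. */
int decodeAmode(unsigned am, AmodeInfo *out)
{
    assert(out != NULL);

    /* Every code in the table fits in 6 bits and has bit 3 set. */
    if (am > 0x3f || (am & 0x08) == 0)
        return -1;

    switch (am & 0x30) {            /* bits 5-4 select the region */
    case 0x30: out->addrBits = 24; break;   /* STD   */
    case 0x20: out->addrBits = 16; break;   /* SHORT */
    case 0x00: out->addrBits = 32; break;   /* EXT   */
    default:   return -1;
    }

    out->supervisor = (am & 0x04) ? 1 : 0;  /* bit 2: SUP vs USR */

    switch (am & 0x03) {            /* bits 1-0 select the access type */
    case 0x01: out->type = AM_TYPE_DATA; break;
    case 0x02: out->type = AM_TYPE_PGM;  break;
    case 0x03: out->type = AM_TYPE_BLK;  break;
    default:   return -1;
    }

    /* The table lists only the I/O (data) codes for the SHORT region. */
    if (out->addrBits == 16 && out->type != AM_TYPE_DATA)
        return -1;

    return 0;
}
```

For example, 0x3d decodes as A24, SUP, DATA (VME_AM_STD_SUP_DATA) and 0x29
decodes as A16, USR, DATA (VME_AM_USR_SHORT_IO).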

Each of the AMODEs can be accessed by using either 8-bit (D8), 16-bit (D16), 
or 32-bit (D32) data transfers.  (The D8 width can be set to transfer the 
8-bit value on either the even and odd portions, or just the odd portion, of
the 16-bit bus.  Therefore the D8 access width is properly specified as
D8(EO) or D8(O).)

I/O boards on the VME backplane have to decode the AMODE that they are 
supposed to respond to as well as the data access widths that they support.  
The AMODE is typically set via jumpers on the I/O board, and the data width
is normally hardwired into the board.

Depending on the supported/selected AMODE that an I/O board is using, it has to
also decode the address lines representing the range of addresses that the 
board is going to respond to.  Obviously, when operating in an A16 mode, there
are 16 address lines, A24 has 24, and A32 has 32.

It is possible for a board to decode and respond to more than one AMODE and/or
data width.  It is also possible that two separate boards can respond to the
same address range in the same memory region, but to different processor modes
and/or data widths.  The VME bus will operate properly in either case.

===
Before continuing, it should be pointed out that modern CPU boards contain
devices and memory along with the CPU.  This can cause some interesting
confusion because the CPU card might be able to access devices that are on
its own board, but other boards on the VME backplane might NOT be able to.
It is up to the designer of the CPU card to decide what can be accessed from 
the CPU and what can be accessed from the VME.

Since CPUs generally don't 'speak' VME directly, the VME bus has to be 
interfaced to the CPU as if it were itself a peripheral device.  Most CPUs
don't know squat about things like the D8(EO) mode and so on.  Therefore
they are 'faked' by the VME interface.  The common way to do that is to reserve
sections of the CPU's address space and assign each one to a VME AMODE and
data transfer width.  Then when the CPU wants to get at something in one of 
the VME areas, it has to access it by referencing the section of the CPU's
address space that was allocated to the area of interest.

The rest of this document will use the term CPU address to refer to the address
being used by the CPU itself, and the term VME region to refer to a specific
AMODE and data transfer width.

===
Here is the view of how the distributed BSP sets up the mv167 to deal with
allowing accesses to and from each of the AMODEs and data widths in release 
5.1.1 of vxWorks.

Realize that there are a number of things on the mv167 itself that consume a 
range of addresses.  The most obvious is the DRAM.  There are also a number of
registers and things that are used to operate the mv167's on-board devices.

The mv167 has on it a 68040 CPU that is wired in such a way that it can 
address exactly 4GB of memory.  The CPU's addressing modes (that are similar 
to the VME ones, by the way) are essentially ignored.

Motorola has hard-coded a number of addresses on the mv167 and therefore they
can NOT be changed.  These are the device addresses for the I/O devices on the
mv167 (like the ethernet controller, serial ports, SCSI bus, etc.)  This 
reserved area is from CPU address 0xFFF00000 to 0xFFFEFFFF.  Devices here
are not visible from the VME bus.  And since this range is in use, it cannot 
be assigned to anything else... like a VME region.

The mv167 also has some memory on it that can be programmed to start just about
anywhere in the CPU's address space.

The distributed vxWorks BSP makes the following 'configurations':

1) Place the DRAM at CPU address 0

2) If there is 4MB of DRAM on the CPU board, place the DRAM in all the 
   VME_AM_STD_* AMODEs starting at 0x00400000.

3) Place the DRAM in all the VME_AM_EXT_* AMODEs starting at two times 
   sizeof(DRAM).

4) Configure the CPU address range from sizeof(DRAM) thru 0xEFFFFFFF to
   access VME_AM_EXT_USR_DATA using the D32 transfer width such that the 
   addresses on the VME bus are the same as those generated by the CPU.  This 
   is important to note because this way the CPU can NEVER generate a 
   VME_AM_EXT_USR_DATA D32 reference to any address less than sizeof(DRAM).

5) Configure the CPU address range from 0xF0000000 to 0xF0FFFFFF to access
   VME_AM_STD_USR_DATA using the D16 transfer width such that the addresses 
   on the VME are the low 24 bits of those generated by the CPU.  (Note that 
   this covers the full address range of A24 for USR_DATA D16.  And that CPU
   address 0xF0000000 is VME A24 USR DATA D16 address 0.)

6) Configure the CPU address range from 0xFFFF0000 thru 0xFFFFFFFF to access
   VME_AM_SUP_SHORT_IO using the D16 transfer width such that the addresses
   on the VME are the low 16 bits of those generated by the CPU (same game
   as for the A24 space.)

Notice a few things here.  The A32 space is accessed by the CPU from 
sizeof(DRAM) thru 0xEFFFFFFF.  BUT the DRAM on the CPU board is placed into
the same A32 space from two times sizeof(DRAM) to three times sizeof(DRAM).
Basically, we have a situation where the CPU board can access the VME bus 
where the CPU board itself is also 'responding'.  This is a no-no.  And is
taken care of by simply not accessing that area.  (The same thing happens in 
the A24 area when there is 4MB DRAM on the CPU card.)
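The mapping above can be sketched as a small address decoder.  This is only a
model of the description in this FAQ, not code from the BSP: the names
decodeCpuAddr(), hitsOwnDramImage(), CpuTarget and CpuDecode are made up for
the example, the MMU is ignored entirely, and any CPU address range that the
text does not mention is simply reported as "not described".

```c
#include <stdint.h>

/* Hypothetical names; a host-side model of the distributed BSP layout. */
typedef enum {
    TGT_LOCAL_DRAM,      /* on-board DRAM at CPU address 0              */
    TGT_VME_A32_D32,     /* VME_AM_EXT_USR_DATA, D32                    */
    TGT_VME_A24_D16,     /* VME_AM_STD_USR_DATA, D16                    */
    TGT_VME_A16_D16,     /* VME_AM_SUP_SHORT_IO, D16                    */
    TGT_LOCAL_IO,        /* hard-coded mv167 devices, not on the VME    */
    TGT_NOT_DESCRIBED    /* range not covered by the text above         */
} CpuTarget;

typedef struct {
    CpuTarget target;
    unsigned  amode;     /* AMODE used on the bus, 0 if not a VME access */
    uint32_t  vmeAddr;   /* address seen on the bus, 0 if not VME        */
} CpuDecode;

CpuDecode decodeCpuAddr(uint32_t cpuAddr, uint32_t dramSize)
{
    CpuDecode d = { TGT_NOT_DESCRIBED, 0, 0 };

    if (cpuAddr < dramSize) {
        d.target = TGT_LOCAL_DRAM;               /* item 1 */
    } else if (cpuAddr <= 0xEFFFFFFFu) {
        d.target  = TGT_VME_A32_D32;             /* item 4 */
        d.amode   = 0x09;                        /* VME_AM_EXT_USR_DATA */
        d.vmeAddr = cpuAddr;                     /* same address on the bus */
    } else if (cpuAddr <= 0xF0FFFFFFu) {
        d.target  = TGT_VME_A24_D16;             /* item 5 */
        d.amode   = 0x39;                        /* VME_AM_STD_USR_DATA */
        d.vmeAddr = cpuAddr & 0x00FFFFFFu;       /* low 24 bits */
    } else if (cpuAddr < 0xFFF00000u) {
        d.target = TGT_NOT_DESCRIBED;
    } else if (cpuAddr <= 0xFFFEFFFFu) {
        d.target = TGT_LOCAL_IO;                 /* Motorola's fixed devices */
    } else {
        d.target  = TGT_VME_A16_D16;             /* item 6 */
        d.amode   = 0x2d;                        /* VME_AM_SUP_SHORT_IO */
        d.vmeAddr = cpuAddr & 0x0000FFFFu;       /* low 16 bits */
    }
    return d;
}

/* True if an A32 bus address falls inside the image of the CPU's own DRAM
 * (item 3: from 2*sizeof(DRAM) up to, but not including, 3*sizeof(DRAM)). */
int hitsOwnDramImage(uint32_t vmeA32Addr, uint32_t dramSize)
{
    uint64_t lo = 2ull * dramSize;
    uint64_t hi = 3ull * dramSize;
    return vmeA32Addr >= lo && vmeA32Addr < hi;
}
```

With 4MB of DRAM, CPU address 0xF0123456 comes out as A24 address 0x123456,
and A32 address 0x00800000 is flagged as landing on the board's own DRAM
image, which is the area that has to be avoided.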

==========================================================================

vxWorks and the mv167's MMU and cache system.

As a completely separate issue from the VME stuff discussed above, you also
have to concern yourself with the fact that vxWorks turns on virtual memory
translation and the cache system.  This manifests itself such that even though
all the VME and DRAM stuff is set up as outlined above, it can be rearranged
'virtually' by the MMU.

As it turns out, the only reason that the MMU is turned on by the BSP is so 
that the cache system can be enabled.  This is fine and dandy, except the BSP
has to deal with the fact that the MMU is doing things too.  Specifically, it
has to build the MMU configuration tables for the CPU's address space.  This
takes time and consumes memory.  So WRS decided to only build these tables
to go from CPU address 0 to 0x02FFFFFF.  Further, they enable the cache from 0 
thru 0x01FFFFFF.  

Nice huh?  What we really WANT is to have the MMU map everything from 0 to 
0xFFFFFFFF.  And to only have the cache on where the DRAM is.

So the outcome is that the only 'usable' A32 space starts at three times 
sizeof(DRAM) and ends at 0x02FFFFFF.
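The arithmetic behind that window can be sketched as follows.  The function
name usableA32Window() is made up for this example; it just combines the two
limits stated above (the DRAM image ending at three times sizeof(DRAM), and
the MMU tables stopping at 0x02FFFFFF) and reports when nothing is left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute the A32 range that is reachable with the distributed BSP,
 * per the description in this FAQ.  Returns 0 and fills in *start and
 * *end (inclusive), or returns -1 if there is no usable range at all. */
int usableA32Window(uint32_t dramSize, uint32_t *start, uint32_t *end)
{
    const uint64_t mmuTop = 0x02FFFFFFu;      /* last address the MMU maps  */
    uint64_t first = 3ull * dramSize;         /* just past the DRAM image   */

    assert(start != NULL && end != NULL);

    if (first > mmuTop)
        return -1;                            /* DRAM image eats the window */

    *start = (uint32_t)first;
    *end   = (uint32_t)mmuTop;
    return 0;
}
```

With 4MB of DRAM this gives 0x00C00000 thru 0x02FFFFFF (36MB).  With 16MB of
DRAM, three times sizeof(DRAM) is 0x03000000, so by this reckoning there is
no usable A32 space left at all.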

==========================================================================

>I noticed JRW modified address modifiers USR to SUP for AM_STD and AM_EXT
>modes.  What was the motivation behind this?

It was either that, or restrap all our hardware when switching between
hkv20 and mv167 CPUs (the WRS BSP defaults for hkv20 used SUP for them and
the mv167 used USR.)  It seemed pretty reasonable to make them the same... and
to do so in a way that creates the least work for everyone.

>Why did you change the starting base addresses in the BSP for A32 and A24.

I changed the starting base address for the A32 and A24 regions to 0.  But 
it should not be visible to anyone properly using the sysLocalToBusAdrs() stuff.

IMHO the defaults represent a major lack of understanding by WRS as to how
the vmeChip2 operates.  Starting them all over the place fragments the VME
memory space and confuses anyone trying to configure add-on boards.

  As an example, an mv167 with 4MB RAM using the distributed BSP has its
  local view of the DRAM starting at CPU address 0, and the CPU's view of A32
  space starts at the end of DRAM (0x00400000.)  But then the image of the
  CPU's DRAM is mapped into A32 at address 0x00800000.  Therefore there are
  holes in A32 that the CPU cannot access: from 0 to 0x003FFFFF, because
  that is where the CPU sees its own DRAM, and again from 0x00800000 to
  0x00BFFFFF.  However, other boards COULD be placed into A32 at address 0
  AND be accessed by anything BUT the CPU.  Pretty stupid.

  My way places the DRAM at CPU address 0 and the CPU's view of A32 space still
  starts at 0x00400000.  The image of the DRAM also starts at 0 in A32.
  Now we have no hidden hole and less wasted/fragmented space.  We
  also have an easier time configuring the cache and MMU if we want to use
  them.  It also prevents the starting address from moving around when the 
  DRAM size changes.

The change to where memory starts in A24 is less important, but prevents
conflicts with the historical locations of I/O cards like the DVX2502.

Other changes to A24 allow CPUs with more than 4MB to place some or all of 
their DRAM into the A24 space so that DMA devices that only know how to talk 
to A24 (like the NI1014 and DVX2502) can access some CPU memory.

>I noticed that AM_EXT uses D32 while AM_STD and AM_SHORT uses D16; which
>might account for the bus errors I get while accessing A16. 

That has caused us headaches too.  Just a gotcha to remember when
setting up your I/O boards.  A16 space uses D16 transfers ONLY.  It is
possible to use one of the unused vmeChip2 mapping registers to map a
second A16 region in somewhere.  The problem with that is that 
sysBusToLocalAdrs() will ALWAYS return the addresses for the A16-D16 region 
unless you hack it up as well (no biggie, but the short-sighted choices by 
WRS prevent a proper implementation.)

>Why can't I use addresses above sysMemTop() and below 0xefffffff for my
>A32/D32 boards?  I get bus errors.

You should look closer at the lunacy in the BSP.  Wind River jerks around
with the MMU mappings and the cache stuff.  The cache and MMU are set up
as described above.  Therefore you can not expect to access anything in A32 
that starts before 3*sysMemTop() or after 0x02FFFFFF.

If you want to change the MMU stuff, you can change the addresses and stuff in 
the BSP.  The Reference manual discusses how to do it.

If you want to just bypass the MMU altogether, you can do what I did and enable
the transparent translation registers.  Thus defeating the MMU config garbage.  
I enabled transparent translation from 0x80000000 thru 0xffffffff which covers 
a large chunk of A32 space (as well as all the A24, local I/O and A16, but 
that is OK.)

To do this, I put a sysSetTTRegs() function into sysALib.s because configuring 
the MMU to cover anything over 0x10000000 causes the booting time to exceed 
something like 30 seconds while it wastes memory building the page tables.
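For the curious, here is a sketch of what a transparent translation register
value covering 0x80000000 thru 0xffffffff could look like.  This is NOT the
contents of my sysSetTTRegs(); it is an illustration based on the 68040
register layout as I understand it (address base in bits 31-24, address mask
in bits 23-16, enable in bit 15, S-field in bits 14-13, cache mode in bits
6-5).  The names makeTtr() and ttrMatches() and the choice of
cache-inhibited, serialized access are assumptions made for the example.
Actually loading the value takes a movec instruction in supervisor mode,
which is why the real thing lives in assembly.

```c
#include <stdint.h>

/* Field values for this sketch (assumed layout, see the text above). */
#define TTR_ENABLE          0x00008000u  /* E bit: register is active        */
#define TTR_S_IGNORE_FC2    0x00004000u  /* S-field 1x: match USR and SUP    */
#define TTR_CM_NOCACHE_SER  0x00000040u  /* CM 10: cache-inhibited, serial   */

/* Build a TTR value from the top 8 address bits to match (base) and the
 * bits within those 8 that should be ignored in the comparison (mask). */
uint32_t makeTtr(uint8_t base, uint8_t mask)
{
    return ((uint32_t)base << 24) | ((uint32_t)mask << 16) |
           TTR_ENABLE | TTR_S_IGNORE_FC2 | TTR_CM_NOCACHE_SER;
}

/* Model of the address compare: an enabled TTR matches when the top 8
 * address bits equal the base in every bit position the mask does not
 * ignore.  The S-field is not modelled because it is set to match both
 * user and supervisor accesses here. */
int ttrMatches(uint32_t ttr, uint32_t addr)
{
    unsigned base = (ttr >> 24) & 0xFFu;
    unsigned mask = (ttr >> 16) & 0xFFu;
    unsigned top  = (addr >> 24) & 0xFFu;

    if ((ttr & TTR_ENABLE) == 0)
        return 0;
    return ((top ^ base) & ~mask & 0xFFu) == 0;
}
```

A base of 0x80 with a mask of 0x7F compares only the top address bit, so the
one register covers the whole upper 2GB: makeTtr(0x80, 0x7F) gives
0x807FC040, which matches 0xF0000000 (the A24 window) and 0xFFFF0000 (the
A16 window) but not 0x00400000.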

>Have you talked to Wind Rivers about this?

Yes...  Feel free to ask them about TSR 12979 (submitted by John Winans
circa August 1993.)  They got nowhere really fast (far as I could tell.) 
Their primary response was to do the stuff mentioned on the mailing list
(which is similar to what I ended up with after talking to Rozelle Wright
at LANL about her ideas on a solution.)  

[Gee thanx for your help WRS...  This big-$$ licence sure seems worth it now.]

>How do I add the second piggy-back memory board to my mv167?

You simply plug it in... and watch vxWorks ignore it  :-(

If you WANT to use a second memory board on your mv167, you MUST hack
the BSP to enable it and add it to the free memory pool.  I have done
this in the BSP mentioned below.

>Have you contacted WRS about this?

Ask them about TSR 16040 or SPR 1484 (submitted by John Winans circa March
1993.)  They have it under advisement.  Last I spoke to them they told me
HOW to turn on the memory board, but their way required a different boot
image for CPUs with and without the second card... and broke things like
sysLocalToBusAdrs().   My way allows the same image to operate both
configurations and allows sysLocalToBusAdrs() to work... however it lies
about where sysMemTop really is.  I got over that pretty fast since there is
no sane reason for EPICS to use it.

>When using the distributed BSP from WRS, how come I can't 
>sysLocalToBusAdrs() a DRAM address for the A24 space when I have more
>than 4MB DRAM?

Again, look closely at the above discussion on the VME bus and the BSP
memory configurations.

Basically, someone at WRS decided to hard-code an 'if' statement that said
"if there is 4MB ram, put it into A24 from 0x400000 to 0x7fffff, otherwise
leave it out of A24 completely."

In my BSP the 'if' reads "if there is 8MB or less, put it into A24
starting at 0x000000, otherwise leave it out, but allow some to be
dynamically mapped in at a later time on demand."
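The two 'if' statements can be put side by side like this.  It is a sketch of
the rules as worded above, not a copy of either BSP; the names A24Image,
wrsA24Image() and modifiedBspA24Image() are made up for the example, and the
on-demand dynamic mapping in my BSP is not modelled.

```c
#include <stdint.h>

#define MB(n) ((uint32_t)(n) * 1024u * 1024u)

/* Where (if anywhere) the CPU's DRAM appears in A24 at boot time. */
typedef struct {
    int      mapped;    /* 1 if the DRAM is visible in A24      */
    uint32_t a24Base;   /* A24 address where the image starts   */
    uint32_t size;      /* number of bytes of DRAM in the image */
} A24Image;

/* Distributed WRS BSP: only a 4MB board gets an A24 image, at 0x400000. */
A24Image wrsA24Image(uint32_t dramSize)
{
    A24Image img = { 0, 0, 0 };
    if (dramSize == MB(4)) {
        img.mapped  = 1;
        img.a24Base = 0x400000u;
        img.size    = dramSize;
    }
    return img;
}

/* Modified BSP: 8MB or less goes into A24 starting at 0.  Larger boards
 * get no image at boot (pieces can be mapped in later on demand, which
 * this sketch does not attempt to show). */
A24Image modifiedBspA24Image(uint32_t dramSize)
{
    A24Image img = { 0, 0, 0 };
    if (dramSize <= MB(8)) {
        img.mapped  = 1;
        img.a24Base = 0x000000u;
        img.size    = dramSize;
    }
    return img;
}
```

So an 8MB board has no A24 image under the distributed BSP, which is why
sysLocalToBusAdrs() fails there, but it is fully visible in A24 starting at 0
with the modified one.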

A little insight to EPICS initialization...
When EPICS (3.11 and later) boots, the drivers that need to do DMA into
A24 space (like the GPIB and DVX drivers) can call the EPICS devLib
function devLibA24Malloc() to get memory that exists in A24 such that a
call to sysLocalToBusAdrs() will work.  See the devLib.c source and DOCs
for more info about devLibA24Malloc().

>Why is this memory stuff so confusing.

My guess is that it was a provide-all to please every type of design. 
It also has to do with the differences between 3U and 9U VME cards... the
smaller ones do not have all the signals that the larger ones do.

FYI:
====

I made a tar file containing our mv167 BSP that we use here at the APS for 
vxWorks version 5.1.1.  It is in:

                phoebus.aps.anl.gov:~winans/vwbsp.tar.gz

It is intended for use with EPICS releases 3.11 and later.  It has not
been tested with earlier releases.  I must also mention that it is ALWAYS
in a state of flux.  So you might want to email me if you have a specific 
question or problem with it.

I welcome suggestions and corrections to this FAQ.  I also welcome suggestions 
on how to better deal with all of this stuff.

! John Winans                     Advanced Photon Source  (Controls)    !
! [email protected]              Argonne National Laboratory, Illinois !
!                                                                       !
!"The large print giveth, and the small print taketh away." - Tom Waits !
