All,
Even if you don't use vxStats, beware that the CA server in iocCore calls
memFindMax() on vxWorks once every EPICS_CA_BEACON_PERIOD seconds, and there is
a strong possibility that WRS has implemented memFindMax() on top of the vxWorks
memPartInfoGet(), which is now known to have a severe bug that can corrupt the
memory pool and cause your IOC to fail. At this memFindMax() calling frequency
the failure rate could vary significantly depending on your system
configuration. A simple IOC might have a very low occurrence rate because most
of the built-in memory allocation is based on free lists. However, an IOC with
layered tools or device drivers installed might experience a substantially
higher failure rate, because frequent allocation of memory from the pool is more
likely to collide with the defect now known to be in the stock WRS version of
memPartInfoGet().
Jeff
> -----Original Message-----
> From: Hoff, Lawrence [mailto:[email protected]]
> Sent: Saturday, May 31, 2003 5:33 PM
> To: [email protected]
> Subject: bug in VxWorks memPartInfoGet (used by VxStats)
>
>
> At SNS, we ran into a bug in memPartInfoGet, using VxWorks
> kernel 5.4.2 and VxStats. Apparently there is a bug in memPartInfoGet()
> which can corrupt the heap if another task preempts memPartInfoGet()
> and does a heap operation (e.g. malloc/free). There is an SPR covering
> this bug which seems to imply that the bug also exists in kernel
> version 5.5. It is not clear which earlier versions of VxWorks might
> also be affected, but the date on the SPR is more than 3 years ago.
>
> The tell-tale signature is a message which says :
> " invalid block at <some hex address> deleted"
>
> If you ever see that message, then your heap has been
> corrupted by the bug.
>
> The symptoms of the heap corruption can be quite severe. In
> our case, the system acted as if memory was completely exhausted,
> requiring a system reset. In one case, the IOC would not run reliably
> for more than a day without heap corruption.
>
> We implemented a work-around in VxStats by re-writing
> memPartInfoGet(). With this work-around, the unstable system
> has been running the better part of a week now without failure.
>
> I suggest that if anyone has had unexplained heap corruption
> on IOCs which use VxStats, that they either log all console
> messages looking for the signature, or try running without VxStats,
> or try our work-around.
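
One way to act on the suggestion to log console messages and look for the
signature is a small C helper, run on a host against a captured console log
rather than on the IOC. This is only a sketch: the function name
has_corruption_signature is made up for illustration, the message wording is
copied from the text above, and the address format (hex digits with an
optional "0x" prefix) is an assumption, so adjust it to whatever your console
actually prints.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Returns 1 if `line` contains "invalid block at <hex address> deleted",
 * otherwise 0.
 *
 * Assumption: the address is printed as hex digits, optionally preceded
 * by "0x". The helper name is hypothetical, not part of any library.
 */
static int has_corruption_signature(const char *line)
{
    static const char prefix[] = "invalid block at ";
    static const char suffix[] = " deleted";
    const char *p = line;

    assert(line != NULL);

    while ((p = strstr(p, prefix)) != NULL) {
        const char *q = p + (sizeof prefix - 1);
        const char *digits;

        /* Skip an optional "0x" / "0X" in front of the address. */
        if (q[0] == '0' && (q[1] == 'x' || q[1] == 'X'))
            q += 2;

        /* Require at least one hex digit. */
        digits = q;
        while (isxdigit((unsigned char)*q))
            q++;

        if (q > digits && strncmp(q, suffix, sizeof suffix - 1) == 0)
            return 1;

        p++;    /* not a full match here; keep searching further along */
    }
    return 0;
}
```

Calling this on each line read with fgets() from the saved console output, and
printing the lines for which it returns 1, should be enough to spot the
problem after the fact.
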
>
> Attached to this message is the WRS SPR information, and
> the patch to VxStats.
>
> HTH -- Larry
>
>
>
> /* Added by LTH because memPartInfoGet() has a bug when "walking" the list */
>
> #include "semLib.h"
> #include "dllLib.h"
> #include "smObjLib.h"
> #include "private/memPartLibP.h"
>
> static STATUS memInfoGet(
>     MEM_PART_STATS * ppartStats    /* partition stats structure */
>     )
> {
>     FAST PART_ID partId = memSysPartId;
>     BLOCK_HDR *  pHdr;
>     DL_NODE *    pNode;
>
>     ppartStats->numBytesFree     = 0;
>     ppartStats->numBlocksFree    = 0;
>     ppartStats->numBytesAlloc    = 0;
>     ppartStats->numBlocksAlloc   = 0;
>     ppartStats->maxBlockSizeFree = 0;
>
>     if (ID_IS_SHARED (partId))     /* partition is shared? */
>     {
>         /* shared partitions not supported yet */
>         return (ERROR);
>     }
>
>     /* partition is local */
>     if (OBJ_VERIFY (partId, memPartClassId) != OK)
>         return (ERROR);
>
>     /* take and keep semaphore until done */
>     semTake (&partId->sem, WAIT_FOREVER);
>
>     for (pNode = DLL_FIRST (&partId->freeList);
>          pNode != NULL;
>          pNode = DLL_NEXT (pNode))
>     {
>         pHdr = NODE_TO_HDR (pNode);
>         ppartStats->numBlocksFree++;
>         ppartStats->numBytesFree += 2 * pHdr->nWords;
>         if (2 * pHdr->nWords > ppartStats->maxBlockSizeFree)
>             ppartStats->maxBlockSizeFree = 2 * pHdr->nWords;
>     }
>
>     ppartStats->numBytesAlloc  = 2 * partId->curWordsAllocated;
>     ppartStats->numBlocksAlloc = partId->curBlocksAllocated;
>
>     semGive (&partId->sem);
>     return (OK);
> }
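
The work-around above takes the partition semaphore once and keeps it for the
whole free-list walk. For anyone who wants to see that pattern outside of
vxWorks, here is a minimal portable sketch that uses a POSIX mutex in place of
the partition semaphore. It is an illustration only, not vxWorks code and not
the VxStats patch: the types pool, free_node and pool_stats and the function
pool_stats_get are hypothetical names invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical free-list node: one entry per free block. */
typedef struct free_node {
    struct free_node *next;
    size_t            n_bytes;
} free_node;

/* Hypothetical pool: one lock protecting the free list. */
typedef struct {
    pthread_mutex_t lock;
    free_node      *free_list;
} pool;

typedef struct {
    size_t num_blocks_free;
    size_t num_bytes_free;
    size_t max_block_size_free;
} pool_stats;

/*
 * Collect free-list statistics. The lock is taken before the first node
 * is read and released only after the last one, so no other thread that
 * honours the same lock can modify the list in the middle of the walk.
 */
static void pool_stats_get(pool *p, pool_stats *out)
{
    const free_node *n;

    assert(p != NULL && out != NULL);

    out->num_blocks_free     = 0;
    out->num_bytes_free      = 0;
    out->max_block_size_free = 0;

    pthread_mutex_lock(&p->lock);
    for (n = p->free_list; n != NULL; n = n->next) {
        out->num_blocks_free++;
        out->num_bytes_free += n->n_bytes;
        if (n->n_bytes > out->max_block_size_free)
            out->max_block_size_free = n->n_bytes;
    }
    pthread_mutex_unlock(&p->lock);
}
```

The walk is only safe if every allocate and free path in such a pool takes the
same lock; those paths are left out here to keep the sketch short.
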
>
>
>
> SPR
>
> TITLE: memPartInfoGet() and memPartAvailable() can cause memory corruption
> SPR #: 30316
> STATUS: Assigned
> IDE: Tornado 2.2
> RTOS: VxWorks 5.5
> PRODUCT NAME: VxWorks 5.5
> RELEASE STATUS: FCS
> PRODUCTS AFFECTED: VxWorks
> HOST: All
> HOST OS: All
> ARCH FAMILY: All
> PROCESSOR FAMILY: N/A
> PROCESSOR: All
> BSP:
> DESCRIPTION:
> Under some conditions it may be possible that
> calling the routine memPartBlockIsValid() will induce memory corruption.
>
> This routine is called by both memPartInfoGet() and
> memPartAvailable(). Also, memPartAvailable() is called by memPartShow()
> and memShow().
>
> PATCHES:
> DATE CREATED: Feb 15 2000