sddsoutlier

**description:** `sddsoutlier` eliminates outlier rows from SDDS tabular data. An "outlier" is a data point that is statistically unlikely or otherwise invalid.

**example:** Eliminate "bad" beam-position-monitor (BPM) readouts from PAR x BPM data, where a bad readout is one that is more than three standard deviations from the mean:

`sddsoutlier par.bpm par.bpm1 -columns=P?P?x -stDevLimit=3`

Pipe the output of `sddspfit` into `sddsoutlier` and eliminate rows whose `P1P1xResidual` value is more than two standard deviations from the mean:

`sddspfit par.bpm -pipe=out -columns=P1P2x,P1P1x | sddsoutlier -pipe=in par.2bpms -column=P1P1xResidual -stDevLimit=2`

**synopsis:** `sddsoutlier [-pipe[=input][,output]]` [*inputFile*] [*outputFile*] [-columns=*listOfNames*] [-excludeColumns=*listOfNames*] [-stDevLimit=*value*] [-absLimit=*value*] [-absDeviationLimit=*value*] [-minimumLimit=*value*] [-maximumLimit=*value*] [-chanceLimit=*value*] [-invert] [-verbose] [-noWarnings] [{-markOnly | -replaceOnly={lastValue | nextValue | interpolatedValue | value=*number*}}]

**files:** *inputFile* contains column data that is to be winnowed using outlier elimination. If *inputFile* contains multiple pages, they are treated separately. *outputFile* contains all of the array and parameter data, but only those rows of the tabular data that pass the outlier elimination. *Warning*: if *outputFile* is not given and `-pipe=output` is not specified, then *inputFile* will be overwritten.

**switches:**

- `-pipe[=input][,output]` -- The standard SDDS Toolkit pipe option.
- `-columns=`*listOfNames* -- Specifies a comma-separated list of column names, which may contain wildcards. Outlier analysis and elimination are applied to the data in each of the specified columns independently. No row that is eliminated by outlier analysis of any of these columns will appear in the output.
  If this option is not given, all columns are included in the analysis.
- `-excludeColumns=`*listOfNames* -- Specifies a comma-separated list of column names, which may contain wildcards, that are to be excluded from outlier analysis.
- `-stDevLimit=`*value* -- Specifies the number of standard deviations by which a data point from a column may deviate from the column average before being considered an outlier.
- `-absLimit=`*value* -- Specifies the maximum absolute value that a data point from a column may have before being considered an outlier.
- `-absDeviationLimit=`*value* -- Specifies the maximum absolute value by which a data point from a column may deviate from the column average before being considered an outlier.
- `-minimumLimit=`*value*, `-maximumLimit=`*value* -- Specify the minimum and maximum values that data points may have without being considered outliers.
- `-chanceLimit=`*value* -- Sets a lower limit on the probability of seeing a data point, as a means of removing outliers. Gaussian statistics are used to determine the probability that each point would be seen in sampling a Gaussian distribution a given number of times (equal to the number of points in each page). If this probability is less than *value*, the point is considered an outlier. Using a larger *value* results in elimination of more points.
- `-invert` -- Specifies that only outlier points should be kept.
- `-markOnly` -- Specifies that outlier points should only be marked as outliers rather than deleted. This is done by creating a new column (`IsOutlier`) in the output file that contains 1 if the row has an outlier and 0 if it does not. If `IsOutlier` is in the input file, rows with a value of 1 are treated as outliers and essentially ignored in processing. Hence, successive invocations of `sddsoutlier` in a data-processing pipeline make use of results from previous invocations even if `-markOnly` is given.
  Note: if `-markOnly` is not given, then the presence of `IsOutlier` in the input file has no effect.
- `-replaceOnly={lastValue | nextValue | interpolatedValue | value=`*number*`}` -- Specifies replacing outliers rather than removing them. `lastValue` (`nextValue`) specifies replacing with the previous (next) value in the column. `interpolatedValue` specifies interpolating a new value from the last and next values (with row number as the independent quantity). `value=`*number* specifies replacing outliers with *number*.
- `-verbose` -- Specifies that informational printouts should be provided.
- `-noWarnings` -- Specifies that warnings should be suppressed.
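
The `-stDevLimit` and `-chanceLimit` criteria can be illustrated with a short Python sketch. This is not the `sddsoutlier` source code. It is one plausible reading of the descriptions above, under these assumptions: the mean and the sample (n-1) standard deviation are computed once over all points in a page, and the probability of "seeing" a point in n samples is taken as 1 - (1 - p)^n, where p is the two-sided Gaussian tail probability of that point's deviation. The function names `stdev_outliers` and `chance_outliers` are hypothetical.

```python
import math
import statistics


def stdev_outliers(values, limit):
    """Flag points more than `limit` standard deviations from the mean.

    Assumption: sample standard deviation, computed over all points.
    """
    if len(values) < 2:
        return [False] * len(values)
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mean) > limit * sigma for v in values]


def chance_outliers(values, chance_limit):
    """Flag points whose chance of being seen in n Gaussian samples is low.

    Assumption: chance = 1 - (1 - p)**n, with p the two-sided tail
    probability of the point's deviation and n the number of points.
    """
    n = len(values)
    if n < 2:
        return [False] * n
    mean = statistics.fmean(values)
    sigma = statistics.stdev(values)
    if sigma == 0:
        return [False] * n
    flags = []
    for v in values:
        z = abs(v - mean) / sigma
        p_single = math.erfc(z / math.sqrt(2))  # P(|Z| > z) for a unit Gaussian
        p_seen = 1.0 - (1.0 - p_single) ** n
        flags.append(p_seen < chance_limit)
    return flags


# Made-up readings with one obviously bad value at the end.
values = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.02, 0.98, 1.01, 5.0]
print(stdev_outliers(values, 2))
print(chance_outliers(values, 0.1))
```

With these ten made-up readings, a limit of 2 standard deviations flags only the last point, while a limit of 3 flags nothing: a single point among n can never lie more than (n-1)/sqrt(n) sample standard deviations from the mean, which is about 2.85 for n = 10. Under the assumed formula, raising the chance limit can only add flagged points, consistent with the statement that a larger *value* eliminates more points.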
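
The four `-replaceOnly` modes can be sketched in Python as follows. This is an illustration of the behavior described above, not the tool's implementation. The description does not say what happens at the ends of a column or when several outliers are adjacent, so the sketch assumes the nearest non-outlier neighbor is used and that a missing neighbor falls back to the one that exists. The function name `replace_outliers` and its arguments are hypothetical.

```python
def replace_outliers(values, is_outlier, mode, number=None):
    """Return a copy of `values` with flagged entries replaced.

    mode: "lastValue", "nextValue", "interpolatedValue", or "value".
    Assumption: neighbors are the nearest non-outlier rows.
    """
    n = len(values)
    good = [i for i in range(n) if not is_outlier[i]]
    result = list(values)
    for i in range(n):
        if not is_outlier[i]:
            continue
        if mode == "value":
            result[i] = number
            continue
        if mode not in ("lastValue", "nextValue", "interpolatedValue"):
            raise ValueError(f"unknown mode: {mode}")
        prev_i = max((g for g in good if g < i), default=None)
        next_i = min((g for g in good if g > i), default=None)
        if prev_i is None and next_i is None:
            continue  # no good rows to draw from; leave the value unchanged
        if prev_i is None or next_i is None:
            # Edge of the column: use whichever neighbor exists (assumption).
            src = prev_i if prev_i is not None else next_i
            result[i] = values[src]
        elif mode == "lastValue":
            result[i] = values[prev_i]
        elif mode == "nextValue":
            result[i] = values[next_i]
        else:
            # Linear interpolation with row number as the independent quantity.
            frac = (i - prev_i) / (next_i - prev_i)
            result[i] = values[prev_i] + frac * (values[next_i] - values[prev_i])
    return result


column = [1.0, 2.0, 50.0, 4.0, 5.0]
flags = [False, False, True, False, False]
for mode in ("lastValue", "nextValue", "interpolatedValue"):
    print(mode, replace_outliers(column, flags, mode))
print("value", replace_outliers(column, flags, "value", number=0.0))
```

Because replacement keeps every row, the number of rows in the output equals the number in the input, unlike the default behavior, which deletes outlier rows.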

**see also:**

**author:** M. Borland, ANL/APS.