The effect size m can be considered unstable when much of the pushing or pulling on m comes from the extremities of the data. That is, when a small number of extreme data points have a high influence on m in an equation such as y = mx + b, that effect size is unstable.
The Influence of Extremities Factor (IEF) has been devised to measure this instability of the effect size m.
In Y_i = mX_i + b, X and Y are vectors of data with real entries, X_i and Y_i are their ith entries, and b is the intercept.
m is the regression coefficient (the slope) and is calculated by least squares. This is termed simple linear regression.
(x, y)_i denotes the ith data point.
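Here is a minimal sketch in plain Python with no external libraries. The function name `fit_line` is hypothetical. It computes the least-squares slope and intercept from the centred sums of squares and cross-products, using m = Sxy / Sxx and b = mean(y) - m * mean(x):

```python
def fit_line(x, y):
    """Return (m, b) for the least-squares line y = m*x + b."""
    x, y = list(x), list(y)
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    m = sxy / sxx
    b = y_mean - m * x_mean
    return m, b


m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # 2.0 1.0
```

In practice, an established routine such as Python's `statistics.linear_regression` (Python 3.10+) returns the same slope and intercept.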
In equation (5.1), inf_i is the influence of point i on the calculated value of m. It is defined in terms of the following:
m_U is m calculated with all points in the dataset, where U denotes the entire dataset.
m_{U-i} is m calculated without the ith data point.
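Equation (5.1) is not reproduced in this section, so the sketch below assumes inf_i = m_U - m_{U-i}. That sign convention matches the later statement that removing high-inf_i points lowers m, but it remains an assumption. The helpers `slope` and `influences` are hypothetical names. Each inf_i needs one extra refit, so the cost grows as O(n^2) for n points.

```python
def slope(x, y):
    """Least-squares slope m of y = m*x + b."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx


def influences(x, y):
    """Leave-one-out influence inf_i of each point on m.

    ASSUMPTION: inf_i = m_U - m_{U-i}, so a positive value means point i
    pushes m up and a negative value means it pulls m down.
    """
    x, y = list(x), list(y)
    m_u = slope(x, y)  # m_U: slope on the entire dataset U
    result = []
    for i in range(len(x)):
        # m_{U-i}: refit with the ith point left out
        m_without_i = slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        result.append(m_u - m_without_i)
    return result


infs = influences([0, 1, 2, 3, 4], [0, 1, 2, 3, 10])
most_influential = max(range(len(infs)), key=lambda i: infs[i])
print(most_influential, round(infs[most_influential], 3))  # 4 1.2
```

Here the point (4, 10) lies well above the line through the other four points. Removing it drops m from 2.2 to 1.0, so it has the largest positive influence.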
We then sort the data points by inf_i in descending order. The points with the top 10% of inf_i values are labelled “UL” (upper limit), and the points with the bottom 10% are labelled “LL” (lower limit).
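Given the inf_i values, the UL and LL sets can be picked out by sorting point indices. The text does not say how 10% is rounded when n is not a multiple of ten. This sketch assumes truncation, with a minimum of one point per set. `split_extremes` and its `frac` parameter are hypothetical names.

```python
def split_extremes(infs, frac=0.10):
    """Return (UL, LL): indices of the top and bottom `frac` of inf_i values.

    ASSUMPTION: each set holds int(frac * n) points (truncated),
    with a minimum of one point.
    """
    n = len(infs)
    k = max(1, int(frac * n))
    if 2 * k > n:
        raise ValueError("UL and LL would overlap; use a smaller frac or more data")
    # Indices sorted by descending influence; Python's sort is stable,
    # so tied values keep their original order even with reverse=True.
    order = sorted(range(n), key=lambda i: infs[i], reverse=True)
    ul = order[:k]   # largest inf_i: points pushing m up
    ll = order[-k:]  # smallest inf_i: points pulling m down
    return ul, ll


ul, ll = split_extremes([0.5, -1.0, 2.0, 0.0, -0.2, 0.1, 0.3, -0.5, 1.0, -2.0])
print(ul, ll)  # [2] [9]
```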
Removing the UL set, which is pushing m up (high inf_i), and recalculating m gives m_{U-UL(10%)}, which is lower than m_U.
Removing the LL set, which is pulling m down (low inf_i), and recalculating m gives m_{U-LL(10%)}, which is higher than m_U.
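The whole procedure at the 10% level can be sketched end to end as follows. It assumes inf_i = m_U - m_{U-i}, since equation (5.1) is not reproduced in this section. It also assumes that 10% of n is truncated to a whole number of points, with at least one. The function names are hypothetical. The step that combines these slopes into the IEF itself is deliberately left out.

```python
def slope(x, y):
    """Least-squares slope m of y = m*x + b."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    return sxy / sxx


def trimmed_slopes(x, y, frac=0.10):
    """Return (m_U, m_{U-UL}, m_{U-LL}) for simple linear regression.

    ASSUMPTIONS: inf_i = m_U - m_{U-i}; UL and LL each hold
    int(frac * n) points (at least one).
    """
    x, y = list(x), list(y)
    n = len(x)
    m_u = slope(x, y)
    # Leave-one-out influence of every point.
    infs = [m_u - slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) for i in range(n)]
    k = max(1, int(frac * n))
    order = sorted(range(n), key=lambda i: infs[i], reverse=True)
    ul, ll = set(order[:k]), set(order[-k:])

    def slope_without(dropped):
        # Refit m on every point not in the dropped set.
        keep = [i for i in range(n) if i not in dropped]
        return slope([x[i] for i in keep], [y[i] for i in keep])

    return m_u, slope_without(ul), slope_without(ll)


# Ten points on y = x, except the last one, which sits far above the line.
x = list(range(10))
y = [0, 1, 2, 3, 4, 5, 6, 7, 8, 20]
m_u, m_minus_ul, m_minus_ll = trimmed_slopes(x, y)
print(round(m_u, 3), round(m_minus_ul, 3), round(m_minus_ll, 3))  # 1.6 1.0 1.781
```

In this example the single UL point is the outlier (9, 20), and dropping it recovers the underlying slope of 1. The single LL point is (8, 8), whose removal raises m. The output therefore satisfies m_{U-UL} < m_U < m_{U-LL}.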
Influence of Extremities Factor at 10% intervals would be:
A low IEF indicates a stable effect size m, and a high IEF indicates an unstable one. Stable means that the value of m is not overly influenced by a few data points; unstable means that it is. Stable datasets are more representative of the entire cohort, whereas unstable datasets are likely not representative of the entire cohort.