efficiency (partial) balance
R A Bailey
r.a.bailey at qmul.ac.uk
Tue Nov 4 11:44:33 GMT 2003
This message concerns efficiency factors, but an immediate conclusion
is that we should say nothing about statistical properties of
unequally replicated designs in the forthcoming release.
For i=1, 2, let L_i be the information matrix of design i and suppose
that the variance per plot when this design is used is \sigma_i^2.
Then the variance of the estimator of the contrast x'\tau is
x'L_i^{-}x \sigma_i^2. Here L^{-} denotes any generalized inverse of
L, though, since L is symmetric, I almost always use its canonical
generalized inverse, which some people call Moore-Penrose.
Thus, as far as x'\tau is concerned, the relative efficiency of
design 1 with respect to design 2 is
\frac{x'L_2^{-}x \sigma_2^2}{x'L_1^{-}x \sigma_1^2}.
(I think that efficiencies in statistics are always relative, like
this.) Because we don't know the value of either of the \sigma^2
before we've done the experiment, we define the efficiency factor (for
design 1 relative to design 2) to be
\frac{x'L_2^{-}x}{x'L_1^{-}x}.
[I think that there should be no dissent so far.]
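As a small numerical illustration of the efficiency factor just defined, here is a sketch in Python with NumPy. The two matrices are my own illustrative assumptions, not taken from any design discussed in this message: L1 is what the usual formula diag(r) - N K^{-1} N' gives for three treatments in blocks {1,2}, {1,3}, {2,3}, and L2 = 2(I - J/3) is a reference design with replication 2. The function name efficiency_factor is hypothetical, and the Moore-Penrose inverse is used as the generalized inverse.

```python
import numpy as np

def efficiency_factor(x, L1, L2):
    """Efficiency factor of design 1 relative to design 2 for the contrast x.

    Computes (x' L2^- x) / (x' L1^- x), taking the Moore-Penrose inverse
    as the generalized inverse of each information matrix.
    """
    x = np.asarray(x, dtype=float)
    v1 = x @ np.linalg.pinv(L1) @ x  # proportional to the variance under design 1
    v2 = x @ np.linalg.pinv(L2) @ x  # proportional to the variance under design 2
    return v2 / v1

# Illustrative information matrices (assumed for this sketch only).
I3, J3 = np.eye(3), np.ones((3, 3))
L1 = 1.5 * I3 - 0.5 * J3   # three treatments, blocks {1,2}, {1,3}, {2,3}
L2 = 2.0 * (I3 - J3 / 3)   # reference design with replication 2

x = [1, -1, 0]             # contrast between treatments 1 and 2
ef = efficiency_factor(x, L1, L2)
print(ef)
```

For this contrast the ratio comes out at about 0.75, which agrees with the textbook value v(k-1)/(k(v-1)) for a balanced incomplete-block design with v = 3 and k = 2.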
Now, it is common to take design 2 to be a complete block design in the
equally replicated case, but this doesn't extend to unequal
replication so it is better to take design 2 to be the unblocked design
with the same replications. Now, here comes the common mistake.
People generally say that L_2 = diag(r) in this case, where r is the
vector of replications. I think that they shouldn't. Look at any
anova table for a CRD almost anywhere in the world and you will see a
line saying ``total = n-1'', where n is the number of plots. This is
because the data is being silently projected orthogonal to the plots
all-1 vector before anything else happens. Look at the last Section
of Chapter 2 in
http://www.maths.qmw.ac.uk/~rab/DOEbook/
to see this made explicit. So we ought to put L_2 = X'(I-J/n)X, where
X is the design matrix. Then L_2 = diag(r) - rr'/n.
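The identity X'(I-J/n)X = diag(r) - rr'/n can be checked numerically. The sketch below assumes a hypothetical unblocked design with six plots and replications r = (3, 2, 1); the function name unblocked_information and the plot allocation are mine, chosen only for illustration.

```python
import numpy as np

def unblocked_information(treatments, t):
    """Return X'(I - J/n)X for an unblocked design.

    treatments: treatment index (0..t-1) applied to each plot.
    t: number of treatments.
    """
    treatments = np.asarray(treatments)
    n = treatments.size
    X = np.zeros((n, t))
    X[np.arange(n), treatments] = 1.0  # plots-by-treatments design matrix
    J = np.ones((n, n))                # all-1 matrix on plots
    return X.T @ (np.eye(n) - J / n) @ X

# Hypothetical allocation: treatment 0 on three plots, 1 on two, 2 on one.
plots = [0, 0, 0, 1, 1, 2]
L2 = unblocked_information(plots, 3)

r = np.bincount(plots, minlength=3).astype(float)    # replication vector
closed_form = np.diag(r) - np.outer(r, r) / r.sum()  # diag(r) - rr'/n

print(np.allclose(L2, closed_form))
```

The check works because X'X = diag(r) and X'1 = r, so X'(J/n)X = rr'/n. Note also that this L_2 annihilates the all-1 treatments vector, which diag(r) does not.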
So the definition of efficiency balance ought to be that
\frac{x'M^{-}x}{x'L^{-}x} is constant for contrasts x, where L is the
information matrix of the design we are considering and
M = diag(r) - rr'/n. Now, M is positive semidefinite so it has a
canonical square root N, so I think the definition should be that
N^{-}LN^{-} is scalar on the image of N (and zero on N of the all-1s
treatments vector).
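To make the proposed definition concrete, here is a sketch of how one might test it numerically. It builds the canonical square root N of M from an eigendecomposition, forms N^{-}LN^{-}, and checks whether that matrix is a scalar multiple of the projector onto the image of N. The function names, the tolerance, and both test matrices are assumptions of mine: the first L is simply 0.75 M (balanced by construction), and the second is what diag(r) - N_b K^{-1} N_b' gives for three treatments in the two blocks {0,1}, {0,2}.

```python
import numpy as np

def canonical_sqrt_and_pinv(M, tol=1e-10):
    """Canonical square root N of a symmetric psd matrix M, and N^-.

    Eigenvalues below tol * (largest eigenvalue) are treated as zero,
    so that rounding noise in the null space is not inverted.
    """
    w, V = np.linalg.eigh(M)
    keep = w > tol * w.max()
    root = np.zeros_like(w)
    root[keep] = np.sqrt(w[keep])
    inv_root = np.zeros_like(w)
    inv_root[keep] = 1.0 / root[keep]
    N = (V * root) @ V.T           # V diag(root) V'
    N_pinv = (V * inv_root) @ V.T  # V diag(inv_root) V'
    return N, N_pinv

def efficiency_balance_check(L, r):
    """Return (c, balanced): whether N^- L N^- equals c times the
    projector onto the image of N, where N^2 = M = diag(r) - rr'/n."""
    r = np.asarray(r, dtype=float)
    M = np.diag(r) - np.outer(r, r) / r.sum()
    N, N_pinv = canonical_sqrt_and_pinv(M)
    A = N_pinv @ L @ N_pinv
    P = N @ N_pinv                 # projector onto the image of N
    c = np.trace(A) / np.trace(P)  # average eigenvalue of A on that image
    return c, bool(np.allclose(A, c * P))

r = np.array([2.0, 1.0, 1.0])
M = np.diag(r) - np.outer(r, r) / r.sum()

# Synthetic case: L proportional to M, so balanced by construction.
c1, ok1 = efficiency_balance_check(0.75 * M, r)

# Blocks {0,1} and {0,2}: L = diag(r) - N_b K^{-1} N_b' with block size 2.
Nb = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
L_blocks = np.diag(r) - Nb @ Nb.T / 2.0
c2, ok2 = efficiency_balance_check(L_blocks, r)

print(ok1, ok2)
```

In the second design the contrast (0, 1, -1) has efficiency factor 1/2 while (2, -1, -1) has efficiency factor 1, so the check reports that N^{-}LN^{-} is not scalar on the image of N.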
We can get away with being sloppy for efficiency balance. To be fair,
Emlyn Williams, who defined the concept, almost always has a phrase
like `let y be the mean-corrected data', so the sloppy definition
will be fine in practice. But if we want to extend it to some idea of
efficiency partial balance then I think we need to start with the
matrix N^{-}LN^{-}.
RAB
--
R. A. Bailey
Snail: School of Mathematical Sciences Tel: (+44) 20 7882 5517
Queen Mary, University of London
Mile End Road Email: r.a.bailey at qmul.ac.uk
London E1 4NS
U.K.