numerics
J. P. Morgan
jpmorgan at vt.edu
Thu Oct 23 01:30:38 BST 2003
Dear All,
Notes on numerical questions brought up by PeterD as they relate to
statistical_properties. This should have been sent as a reply to his post, but
I received it at a defunct mailer. As always, comments/criticisms
welcome/expected. JP
1. Connectedness (or lack thereof) is fundamental to the quantities calculated
in statistical_properties. Thus it is reasonable that INTERNALLY the connected
indicator be a REQUIRED element of this tree. That is, calculation of
statistical properties should begin with determining the number of connected
components in the design graph. This is a simple, exact procedure. It
guarantees consistency with the connected indicator in a different branch of
the external rep. For disconnected designs it will allow many optimality
values to be established without using the canonical variances (CVs), the
pairwise variances (PVs), or the efficiency factors (EFs), since in this case
many of these values are defined to be not_applicable. It will also establish
the true number of CVs that are infinite (not_applicable) and of EFs that are
zero. Calculating the sizes of the connected components would also give the
true number of PVs that are infinite.
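As an illustration of how simple and exact this step is, here is a minimal
Python sketch using union-find. The input format (treatments labelled 0..v-1,
each block a list of labels), the function names, and the use of the
treatment concurrence graph as "the design graph" are assumptions. It also
assumes that the v-1 CVs and EFs exclude the trivial zero eigenvalue of C, so
a design with c components has c-1 infinite CVs and c-1 zero EFs; a PV is
infinite exactly when its two treatments lie in different components.

```python
def connected_components(v, blocks):
    """Treatment components of a block design, found with union-find.

    Treatments are labelled 0..v-1 (assumption); each block is a list of
    labels. Two treatments are joined when they occur in a common block.
    """
    parent = list(range(v))

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for block in blocks:
        # Linking each treatment to the block's first one merges the block.
        for t in block[1:]:
            ra, rb = find(block[0]), find(t)
            if ra != rb:
                parent[rb] = ra

    comps = {}
    for t in range(v):
        comps.setdefault(find(t), []).append(t)
    return list(comps.values())


def infinite_counts(v, blocks):
    """Counts fixed by connectivity alone (hypothetical helper)."""
    sizes = sorted(len(c) for c in connected_components(v, blocks))
    c = len(sizes)
    # Assumes the v - 1 CVs and EFs exclude the trivial zero eigenvalue of C,
    # so each component beyond the first adds one infinite CV and one zero EF.
    # A PV is infinite when its two treatments lie in different components.
    all_pairs = v * (v - 1) // 2
    within = sum(s * (s - 1) // 2 for s in sizes)
    return {"components": c, "sizes": sizes, "infinite_cv": c - 1,
            "zero_ef": c - 1, "infinite_pv": all_pairs - within}


print(infinite_counts(6, [[0, 1], [1, 2], [3, 4], [5, 3]]))
# {'components': 2, 'sizes': [3, 3], 'infinite_cv': 1, 'zero_ef': 1, 'infinite_pv': 9}
```

Here two components of size 3 give one infinite CV, one zero EF, and
3 x 3 = 9 infinite PVs, all without touching C.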
2. Now consider the CVs, PVs, and EFs themselves. Each is a multiset of real
numbers. We report the distribution of CVs and EFs, and we report PVs via
function_on_k_subsets_of_indices (which can be handled as a distribution). For
CV and PV we have an optimality value that is "number of distinct" (no
corresponding value for efficiency factors is currently included; it should be
there, and is known to equal the number of distinct CVs). It is apparent that
correctly identifying the number of distinct values for each multiset is
important. How do we handle this in the external representation?
It is reasonable, and simple, to guarantee externally that the distributions
are correct up to the required precision. Internally we may know much more. We
could include an indicator for each of the three multisets in question as
follows:
    multiplicities_guaranteed = element multiplicities_guaranteed {
        attribute flag { "true" | "false" } }
This would state the situation explicitly. The distribution (values and
multiplicities) would then be guaranteed correct to the stated precision in
every case, the multiplicities guaranteed absolutely correct whenever so
indicated, and the multiplicities absolutely correct in the vast majority of
cases even when not guaranteed. By "guaranteed absolutely correct" I mean that
the indicator above takes the value "true".
This does not preclude identical (to the required precision) values being
reported as distinct with their own multiplicities.
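To make the two failure modes concrete, here is a minimal Python sketch of two
ways a distribution could be built from computed floating-point values. The
function names and the default of 9 significant digits are assumptions for
illustration. Grouping on exact equality can report values that agree to the
required precision as separate entries; grouping on rounded values instead
merges values that actually differ at a finer precision.

```python
from collections import Counter


def distribution_exact(values):
    # Group on exact float equality: values that agree to the reported
    # precision can still appear as separate entries.
    return sorted(Counter(values).items())


def distribution_rounded(values, sig=9):
    # Group on the value rounded to `sig` significant digits: correct to that
    # precision, but distinct values that agree to `sig` digits are merged.
    return sorted(Counter(float(f"{x:.{sig}g}") for x in values).items())


vals = [0.1 + 0.2, 0.3, 0.5]
print(distribution_exact(vals))    # [(0.3, 1), (0.30000000000000004, 1), (0.5, 1)]
print(distribution_rounded(vals))  # [(0.3, 2), (0.5, 1)]
```

Neither grouping, on its own, lets us set the flag to "true"; that needs
exact knowledge of which values coincide.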
Why the hedge? Why not always guarantee that the multiplicities are absolutely
correct? It is a matter of what we are able, and what we are willing, to
implement internally. My knowledge of the possibilities is limited. In
particular, the matrix from which the EFs are calculated can contain
irrational values. Is this a problem? If there is no problem of will or
ability (including resource limitations), then we should always guarantee
multiplicities.
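On the question of irrational entries, a sketch under stated assumptions:
take C = R - N K^{-1} N^T (rational entries) and the EFs as the eigenvalues of
R^{-1/2} C R^{-1/2}. That matrix is similar to R^{-1} C, which is rational, so
both share a characteristic polynomial with rational coefficients. The Python
sketch below counts zero eigenvalues and distinct nonzero eigenvalues exactly:
Faddeev-LeVerrier over Fraction gives p(x), the factor x^m is removed, and
deg q - deg gcd(q, q') counts the distinct roots of the rest. The technique,
function names, and input format are illustrative assumptions, not a proposal
for the internal implementation. It counts distinct values only; each value's
multiplicity would need a further square-free factorization.

```python
from fractions import Fraction


def information_matrix(v, blocks):
    """C = R - N K^{-1} N^T with exact rationals (treatments 0..v-1)."""
    C = [[Fraction(0)] * v for _ in range(v)]
    for block in blocks:
        k = len(block)
        for i in block:
            C[i][i] += 1  # one more plot of treatment i (the R term)
            for j in block:
                C[i][j] -= Fraction(1, k)  # the N K^{-1} N^T term
    return C


def ef_matrix(v, blocks):
    """R^{-1} C, similar to R^{-1/2} C R^{-1/2}; assumes every r_i > 0."""
    r = [0] * v
    for block in blocks:
        for i in block:
            r[i] += 1
    return [[x / r[i] for x in row] for i, row in enumerate(information_matrix(v, blocks))]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]


def char_poly(A):
    """Coefficients of det(xI - A), lowest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    c = [Fraction(0)] * n + [Fraction(1)]
    AM = [[Fraction(0)] * n for _ in range(n)]  # A @ M_0 with M_0 = 0
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I, then c_{n-k} = -tr(A M_k) / k
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_mod(a, b):
    """Remainder of a divided by nonzero, trimmed b (highest coefficient last)."""
    a = list(a)
    while len(a) >= len(b):
        if a[-1] != 0:
            factor = a[-1] / b[-1]
            shift = len(a) - len(b)
            for i, coef in enumerate(b):
                a[shift + i] -= factor * coef
        a.pop()  # the leading coefficient is now exactly zero
    return trim(a)


def poly_gcd(a, b):
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return a


def eigen_summary(A):
    """(number of zero eigenvalues, number of distinct nonzero eigenvalues)."""
    p = char_poly(A)
    m = 0
    while p[m] == 0:  # p(x) = x^m q(x): eigenvalue 0 with multiplicity m
        m += 1
    q = p[m:]
    dq = [i * q[i] for i in range(1, len(q))]
    return m, (len(q) - 1) - (len(poly_gcd(q, dq)) - 1)


fano = [[0, 1, 2], [0, 3, 4], [0, 5, 6], [1, 3, 5], [1, 4, 6], [2, 3, 6], [2, 4, 5]]
print(eigen_summary(information_matrix(7, fano)))  # (1, 1)
print(eigen_summary(ef_matrix(7, fano)))           # (1, 1)
```

For the (7,3,1) design, C = (7/3)(I - J/7): one zero eigenvalue and a single
distinct nonzero value, so one distinct CV and one distinct EF.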
3. Of CVs, PVs, and EFs, the CVs present the least difficulty. They are in
1:1 correspondence with the eigenvalues of the information matrix C. The EFs
also enjoy this correspondence for equally replicated designs, but otherwise
may be the eigenvalues (latent roots) of a matrix with irrational elements;
however, their multiplicities are in 1:1 correspondence with those of the CVs
in every case. The PVs are defined in terms of the Moore-Penrose (M-P) inverse
of C, and thus require either the spectral decomposition of C or the inverse
of an associated positive definite matrix (at least I know of no other way to
determine their values). This may mean that a guarantee for PVs is not
generally possible (please comment), though it is certainly possible in
special cases such as partial balance.
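A sketch of the second route mentioned above, assuming a connected design and
C = R - N K^{-1} N^T: with J the all-ones matrix, C + J/v is positive definite,
its inverse G equals C^+ + J/v, and the J/v part cancels in every elementary
contrast, so PV(i, j) = G_ii + G_jj - 2 G_ij (in units of sigma^2). Because C
is rational, Gauss-Jordan elimination over Fraction yields the PVs as exact
rationals in this sketch; whether exact arithmetic is affordable for large
designs is a separate question. Function names and input format are
assumptions for illustration.

```python
from collections import Counter
from fractions import Fraction


def information_matrix(v, blocks):
    """C = R - N K^{-1} N^T with exact rationals (treatments 0..v-1)."""
    C = [[Fraction(0)] * v for _ in range(v)]
    for block in blocks:
        k = len(block)
        for i in block:
            C[i][i] += 1  # the R term
            for j in block:
                C[i][j] -= Fraction(1, k)  # the N K^{-1} N^T term
    return C


def inverse(A):
    """Exact Gauss-Jordan inverse over the rationals; raises if singular."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix (is the design disconnected?)")
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def pairwise_variances(v, blocks):
    """Exact distribution of PVs for a connected design (hypothetical helper)."""
    C = information_matrix(v, blocks)
    # (C + J/v)^{-1} = C^+ + J/v; the J/v part drops out of each e_i - e_j.
    G = inverse([[C[i][j] + Fraction(1, v) for j in range(v)] for i in range(v)])
    return Counter(G[i][i] + G[j][j] - 2 * G[i][j]
                   for i in range(v) for j in range(i + 1, v))


fano = [[0, 1, 2], [0, 3, 4], [0, 5, 6], [1, 3, 5], [1, 4, 6], [2, 3, 6], [2, 4, 5]]
print(pairwise_variances(7, fano))  # Counter({Fraction(6, 7): 21})
```

For the (7,3,1) design every one of the 21 PVs is 2k/(lambda v) = 6/7; for a
disconnected design C + J/v is singular and the sketch raises ValueError.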
4. How bothersome is it that there may be cases where the multiplicities of
any of these distributions are not absolutely correct? From a practical point
of view, very little. If we work with a standard precision of 9 significant
digits, the practicing statistician will not care that two variances reported
as the same actually differ at a finer precision than that. It is not the best
situation for the researcher in statistical design, who may, for example, be
searching for designs with two distinct variances. Still, we can be explicit
about when the potential for a problem is there.
5. There are a few tie-ins of statistical properties with other parts of the
external representation besides the one between connectedness and the
eigenvalues of C (i.e. the canonical variances, since we do not actually
report the eigenvalues of C). I will write up what I see in a further note.
But it may be best to first decide how to handle the questions surrounding the
three fundamental distributions of CVs, EFs, and PVs.
More information about the Developers mailing list