Re: Gaussian distribution
- From: aruzinsky <aruzinsky@xxxxxxxxxxxxxxxxxxxx>
- Date: Sun, 23 Dec 2007 13:56:34 -0800 (PST)
On Dec 22, 7:25 pm, hru...@xxxxxxxxxxxxxxxxxxxx (Herman Rubin) wrote:
In article <6869a233-786e-48c4-afc4-7aa063f6a...@xxxxxxxxxxxxxxxxxxxxxxxxxx>,
aruzinsky <aruzin...@xxxxxxxxxxxxxxxxxxxx> wrote:
On Dec 21, 1:36 pm, hru...@xxxxxxxxxxxxxxxxxxxx (Herman Rubin) wrote:
In article <92750c41-7c11-4ce0-9e11-5e890b366...@xxxxxxxxxxxxxxxxxxxxxxxxx>,
aruzinsky <aruzin...@xxxxxxxxxxxxxxxxxxxx> wrote:
On Dec 20, 3:54 pm, hru...@xxxxxxxxxxxxxxxxxxxx (Herman Rubin) wrote:
In article <58197979-983f-4b5f-8c23-53ee907cd...@xxxxxxxxxxxxxxxxxxxxxx>,
aruzinsky <aruzin...@xxxxxxxxxxxxxxxxxxxx> wrote:
On Dec 20, 10:23 am, aruzinsky <aruzin...@xxxxxxxxxxxxxxxxxxxx> wrote:
On Dec 18, 12:01 pm, hru...@xxxxxxxxxxxxxxxxxxxx (Herman Rubin) wrote:
................
However, since
1. Under certain conditions, CLT implies that sums have near/exact
Gaussian marginals and not joint distributions.
It works for joint distributions as well. A necessary and
sufficient condition that a multivariate distribution is
normal is that all linear combinations are univariate
normal. No finite number is enough.
"No finite number is enough." is irrelevant because this forum is
applied statistics; therefore the objective is "close enough".
This is NOT an "applied statistics" forum, and "applied
statistics" SHOULD BE the proper application of statistical
theory. I suggest you look at my commandments, on my
web page.
I'm sorry. I was fooled by Google categorizing this under Science and
Technology which is applied math and calling it "sci.stat". I suppose
you are going to tell me that science isn't applied math?
The
question is whether the assumption of a joint Gaussian can be
justified by the CLT in practice.
Wrong. The question is whether the use of a procedure
based on the normal distribution is a good enough
approximation to what can be done, that is, if methods
based upon it work well. Robustness theorems like the
Gauss-Markov Theorem show that, in a sense, normal is
"least favorable", and least squares works as well for
non-normal as normal if the other assumptions are met.
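A small simulation can illustrate the Gauss-Markov point above: least squares stays unbiased, with the slope variance sigma^2/Sxx, even when the errors are not normal. This is only a sketch. The line y = 2 + 3x, the Uniform(-1, 1) errors, and the helper name ols_slope are illustrative assumptions, not anything from the thread.

```python
import random

def ols_slope(xs, ys):
    """Closed-form least-squares slope for a simple linear regression."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

rng = random.Random(1)
xs = [float(i) for i in range(20)]      # fixed design points (assumed)
true_slope = 3.0                        # assumed true slope
sigma2 = 1.0 / 3.0                      # variance of Uniform(-1, 1)
mx = sum(xs) / len(xs)
sxx = sum((x - mx) ** 2 for x in xs)

slopes = []
for _ in range(2000):
    # Errors are uniform, i.e. clearly non-normal, with zero mean.
    ys = [2.0 + true_slope * x + rng.uniform(-1.0, 1.0) for x in xs]
    slopes.append(ols_slope(xs, ys))

mean_slope = sum(slopes) / len(slopes)
var_slope = sum((s - mean_slope) ** 2 for s in slopes) / (len(slopes) - 1)
theory_var = sigma2 / sxx               # Gauss-Markov variance of the slope
print(mean_slope, var_slope / theory_var)
```

The printed ratio compares the simulated slope variance with the value that uses only second moments of the errors, which is the sense in which normality is not needed here.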
I disagree. It is often harmful to assume a narrow model when none is
needed. For example, the sample mean is the MLE of the mean for
i.i.d. Gaussian. It works just as well for a wide class of
distributions. However, using the adequacy of Gaussian MLE to
vindicate the assumption of a Gaussian model is morally and
practically wrong. Instead of assuming a Gaussian model, a researcher
should simply state that the sample mean is the best linear estimate
for a wide class of distributions. The practical and moral difference
is that
1. The Gaussian model tends to discourage looking for better non-
linear estimates.
2. The Gaussian model may be extrapolated into other situations where
it fails.
Both scenarios can be avoided by assuming a less narrow model.
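Point 1 can be made concrete with a case where a non-linear estimate beats the sample mean. The sketch below assumes Laplace (double exponential) data, which is my choice of example and not one used in the thread, and compares the mean squared error of the sample mean with that of the sample median as estimates of the center.

```python
import random
import statistics

rng = random.Random(2)

def laplace(rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    return rng.expovariate(1.0) - rng.expovariate(1.0)

n, trials = 101, 2000                   # assumed sample size and repetitions
mean_sq_err = 0.0
median_sq_err = 0.0
for _ in range(trials):
    sample = [laplace(rng) for _ in range(n)]
    # The true center is 0, so the squared estimate is the squared error.
    mean_sq_err += (sum(sample) / n) ** 2
    median_sq_err += statistics.median(sample) ** 2

mse_mean = mean_sq_err / trials
mse_median = median_sq_err / trials
print(mse_mean, mse_median)
```

For this distribution the variance of the mean is 2/n, while the asymptotic variance of the median is 1/n, so the median should come out ahead even though the mean remains the best linear estimate.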
If one knows the true distribution, one can do better,
but one usually does not have enough information to
do so.
That does not justify lying about reality. Pretending that reality is
approximately Gaussian when it isn't is B.S. The fact that it is
widespread makes it worse.
For example, consider the ARMA
process:
Xt = (B - a1)/(1 - a1*B) Et
and its pure MA equivalent
Xt = (B - a1) (1 + a1*B + (a1*B)^2 + ...) Et
where
B is backshift operator
|a1| < 1
Et is Non-Gaussian (e.g., Uniform), i.i.d., zero mean.
Observe that Xt are white (uncorrelated).
First, I am not going to quibble about the process being stationary
only at t = oo. I assume it is close enough. Observe that the
marginals Xt should be almost Gaussian, particularly with a1 near 1.
Certainly, if Xt were exactly jointly Gaussian, a1 could not be
estimated from finite samples of Xt.
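For anyone who wants to experiment with this example, here is a rough simulation sketch. It uses the recursion Xt = a1*X(t-1) + E(t-1) - a1*Et, which follows from multiplying both sides of the ARMA form by (1 - a1*B). The value a1 = 0.5, the Uniform(-1, 1) innovations, the burn-in length, and the function names are my assumptions. It checks two things: the lag-1 correlation of Xt (whiteness) and the lag-1 correlation of Xt^2, which would be zero if Xt were jointly Gaussian white noise.

```python
import random

def simulate_allpass(a1, n, burn_in=1000, seed=0):
    """Simulate X_t = a1*X_{t-1} + E_{t-1} - a1*E_t with uniform i.i.d. E_t."""
    rng = random.Random(seed)
    x_prev, e_prev = 0.0, 0.0
    xs = []
    for t in range(n + burn_in):
        e = rng.uniform(-1.0, 1.0)      # zero-mean, non-Gaussian innovation
        x = a1 * x_prev + e_prev - a1 * e
        if t >= burn_in:                # discard the start-up transient
            xs.append(x)
        x_prev, e_prev = x, e
    return xs

def sample_corr(u, v):
    """Plain sample correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cuv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    cuu = sum((a - mu) ** 2 for a in u)
    cvv = sum((b - mv) ** 2 for b in v)
    return cuv / (cuu * cvv) ** 0.5

xs = simulate_allpass(a1=0.5, n=100_000)
lag1_corr = sample_corr(xs[:-1], xs[1:])            # whiteness check
squares = [x * x for x in xs]
lag1_corr_squares = sample_corr(squares[:-1], squares[1:])
print(lag1_corr, lag1_corr_squares)
```

A lag-1 correlation of the squares that is clearly away from zero means the Xt are uncorrelated but dependent, so higher-order statistics carry information about a1 that the second moments do not.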
It can be stationary; so what. Also, if Xt were exactly
normal, a1 could not be estimated from knowing the entire
process. As I said, the normal distribution is typically
least favorable, in that often procedures based upon it
work quite well even if normality is absent, as long as
second moments are present.
So, would you or would you not conclude that a1 could be practically
estimated and what is your rationale? In other words, is Xt
practically jointly Gaussian or not?
As usual, the estimation of a1 gets worse as it goes to
zero. I can come up with a method for estimating it.
How did "3D" appear in your quotes?
I suspect the estimation of a1 gets worse as a1 goes to +-1.
2. Near/exact Gaussian marginals do not imply a near/exact Gaussian
joint distribution.
Correct.
3. Typically, joint Gaussian and not just Gaussian marginal
distributions are assumed in science and engineering models.
See my comment to 1.
, the CLT is relatively unimportant. Instead, it is mainly a kind of
linear invariance that makes joint Gaussian distributions important.
In particular, orthogonal linear transformations of independent joint
Gaussian random variables are also independently jointly Gaussian.
All linear transformations of independent normal
distributions are jointly normal. In fact, this is another
way of defining jointly normal, which handles singular
cases quite well.
So are linear transformations of dependent jointly normal random
variables.
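A standard textbook counterexample illustrates the point that Gaussian marginals do not give a Gaussian joint distribution, and it ties in with the linear-combination characterization quoted earlier. Take X standard normal and Y = S*X with S an independent random sign. Both marginals are N(0, 1), yet X + Y equals exactly zero half the time, so that linear combination cannot be normal. The sample size and variable names below are assumptions for the sketch.

```python
import random

rng = random.Random(0)
n = 100_000

xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Y = S * X with S = +1 or -1, each with probability 1/2, independent of X.
ys = [x if rng.random() < 0.5 else -x for x in xs]

# Y has the same N(0, 1) marginal as X, so its sample variance is near 1.
mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)

# X + Y is exactly 0 whenever S = -1, so it has an atom at zero.
frac_zero = sum(1 for x, y in zip(xs, ys) if x + y == 0.0) / n
print(var_y, frac_zero)
```

A normal random variable with positive variance puts zero probability on any single point, so a fraction near one half at exactly zero rules out joint normality of (X, Y).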
True, but the use of the suggested definition does not
imply previous knowledge of jointly normal,
Wrong. Independent random variables with Gaussian marginals are
jointly Gaussian with zero off-diagonal terms in the covariance
matrix. This is just semantics.
and works
just as well if there is linear dependence.
And, it is shorter to say and easier to remember if "dependence" and
"independence" are left out.
The central limit theorem works for joint distributions,
and there are even articles on how closely the fit is,
based on the dimension and third moments.
How does that relate to my example?
It does not; it relates to your questions about the
applicability of the multivariate central limit
theorem.
Well, I thought I was asking about the practical value of the CLT as
it might approximately apply to my example of a weighted summation of
non-Gaussian i.i.d. random variables, but since this isn't applied
math, yes, my question was pointless.
If this is not the big picture answer, what is?
Much of what is done does NOT depend on normality.
It is the case that methods derived on several
assumptions, including normality, work almost as
well if the other assumptions are satisfied, but
there is a large leeway about normality.
Regression is such a situation.
Much is not mostly. Much of what is done is adequate, but done for the
wrong reasons.
No, not for the wrong reasons. For a great many
situations, normality is the least important assumption,
and methods based on normality work well even if it
is not present.
The adequacy of Gaussian ML estimates for other distributions does not
justify a Gaussian model. Models should only represent useful
approximations to reality. When a Gaussian Model is NOT needed, it is
often harmful. Statistically speaking (pun intended) an unneeded
model is harmful because it discourages more accurate representations
of reality.
After Gauss proved the Gauss-Markov
theorem, he gave up trying to find reasons for things
being normal, as it did not matter.
The Gauss-Markov theorem does not cover estimation of parameters of
Gaussian ARMA processes, e.g., LS estimates are not ML and both are
biased (asymptotically unbiased), but nevertheless, inappropriate
Gaussian models seem just as popular in time series analysis as
elsewhere.
Consult a competent mathematical
statistician if you have a question about when.
I thought I was.
If you were, you would not have raised the questions
you did, and you would have known the answers I gave.
Because I would believe in widely accepted statistical dogma about
Gaussian models?