Talk:Kalman filter
Talk page tidy
I've removed remarks here that are no longer relevant to the current state of the page. Please restore them if they need further discussion - the revision history (http://en.wikipedia.org/w/index.php?title=Talk:Kalman_filter&oldid=12511610) knows all. — ciphergoth 07:58, 2005 Apr 19 (UTC)
Cleanup
I think all of the notation needs to be cleaned up and more concepts need to be linked to their articles. For example, the Kalman filter#Example has only one link to normal distribution because I put it there. I also don't see anything linking to linear algebra, which this article uses heavily. Cburnett 19:44, Apr 16, 2005 (UTC)
- When I first started editing this page it was a bit of a state. I added sections on the EKF and UKF and information filter as well as the derivation (I'm thinking of changing this to 'relation to probability theory' and adding the normal style derivation), the example and the bit about models, i.e. everything except the introduction. I also added most of the equations. If I neglected to add enough cross references then I apologise. Chrislloyd 17:06, 18 Apr 2005 (UTC)
- Yes, I looked at the history today and your work was invaluable. I'm not sure a cleanup notice isn't overkill - there's definitely room for improvement, but what article isn't that true of? Cleanups usually mean that a substantial portion of the material is by a single author who isn't that experienced. Now we've got a few of us making improvements, maybe we could remove the notice? — ciphergoth 20:03, 2005 Apr 18 (UTC)
The notice is by no means meant as an insult or to demean your contributions (no need to apologize). I just think it could use a little cleanup. If there's a notice for what I mean (more links to relevant articles and some more basic explanation on some of the stuff) then by all means.... I just wish I had the time to do it myself because I wouldn't have bothered with the notice. I certainly won't revert the removal of the notice, but I hope some effort is made. Just scanning real-quick, I only find 6 links from Kalman filter#Kalman filter basics through just before Kalman filter#Applications. Definitely lacking links, IMHO. Cburnett 21:10, Apr 18, 2005 (UTC)
- I agree there are some changes that could be made. I personally don't like the statements "...This will reduce the Kalman filter to an ordinary observer, which is computationally simpler" and "...the Kalman filter is an optimal estimator in a least squares sense of the true state." The one area I haven't touched and which I think needs the most work is the introductory section. For example the statement "tracking [estimating?] a time-dependent state vector with noisy equations of motion in real time [recursively surely!] by the least-squares method." Isn't least squares a different (though perhaps similar) method? On the whole (and I know I'm biased) I think this page is turning into one of the best KF pages on the web. Chrislloyd 22:03, 18 Apr 2005 (UTC)
- A least squares estimator simply means that the error metric for the filter, <math>\hat{\textbf{P}}_{k|k-1}</math>, is quadratic, which of course is true since it equals <math>\textbf{E}\{(\textbf{x}_{k} - \hat{\textbf{x}}_{k|k-1})(\textbf{x}_{k} - \hat{\textbf{x}}_{k|k-1})^{T}\}</math>. The Kalman filter equations are derived by differentiating the trace of this error, and setting it equal to zero. I already pointed out in December that this article lacks a derivation of these equations. The current derivation section doesn't really contain any derivations at all (only recursive Bayesian estimation background). --Fredrik Orderud 23:16, 18 Apr 2005 (UTC)
- Perhaps I should have made myself clearer: I wasn't saying that the Kalman filter isn't an optimal estimator in the least squares sense, merely that I didn't like the dangling statement without any explanation, nor should it be the first thing that is said about the Kalman filter. Chrislloyd 05:06, 19 Apr 2005 (UTC)
- I changed the introduction considerably, and reduced the prominence of this remark, though I hadn't read this discussion when I did so!— ciphergoth 08:02, 2005 Apr 19 (UTC)
And what does this formula mean?
I thought this was the formula for the covariance matrix:
<math>\textbf{Q}_{k} = E[\textbf{w}_{k} \textbf{w}_{k}^{T}]</math>
But the article has
<math>\textbf{Q}_{k} \delta(k-j) = E[\textbf{w}_{k} \textbf{w}_{j}^{T}]</math>
What does it mean? — ciphergoth 10:17, 2005 Apr 18 (UTC)
- It means that <math>Q_k \delta(k-j)</math> has to be diagonal, for if <math>k \ne j</math> then <math>\delta(k-j) = 0</math>. Cburnett 14:03, Apr 18, 2005 (UTC)
- Not quite - it's trying to say that wk and wj are uncorrelated where <math>k \ne j</math>. I think it'd be much more readable to say that explicitly! — ciphergoth 14:23, 2005 Apr 18 (UTC)
- I clarified my previous statement, but yes, diagonal covariance matrix <==> uncorrelated. Cburnett 14:32, Apr 18, 2005 (UTC)
- OK, but it's not saying that each element in w is uncorrelated - it's saying that successive samples are uncorrelated. Have edited article to make that explicit. — ciphergoth 15:03, 2005 Apr 18 (UTC)
- Bah, this is why I put the cleanup notice on the article. Even I, who's at least familiar with Kalman filters, gets fouled up on the notation.... Cburnett 18:17, Apr 18, 2005 (UTC)
- <math>\textbf{Q}_{k} = E[\textbf{w}_{k} \textbf{w}_{k}^{T}]</math>, <math>E[\textbf{w}_{k} \textbf{w}_{k'}^{T}] = 0, k \ne k'</math> is definitely better. <math>\textbf{Q}_{k} \delta(k-j) = E[\textbf{w}_{k} \textbf{w}_{j}^{T}]</math> was one of the original equations which I left unchanged, although I've never particularly cared for it much. Chrislloyd 18:30, 18 Apr 2005 (UTC)
- I have to admit that I was the one who defined the noise covariance using the Dirac delta function. My motivation for doing so was to express the properties of this covariance elegantly using a single equation (as opposed to the two currently used), thinking the readers of this article would have a solid background in linear algebra and digital signal processing. I still think that the single-equation approach is more elegant, but acknowledge that the dual-equation approach proposed may be more easily understood by new readers. --Fredrik Orderud 23:28, 18 Apr 2005 (UTC)
- The way I've seen this defined most often is with the Kronecker delta function <math>\textbf{Q}_{k} \delta_{kj} = E[\textbf{w}_{k} \textbf{w}_{j}^{T}]</math>, where <math> \delta_{kj} </math> is 1 if k = j and 0 otherwise. To be honest I think this is unnecessary obfuscation. Chrislloyd 05:30, 19 Apr 2005 (UTC)
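For anyone who finds the notation harder than the idea, here is a minimal numerical sketch of what the two-equation form says: noise samples taken at different time steps have (approximately) zero cross-covariance, while each sample has covariance Q. The names and the value of Q below are illustrative only, not taken from the article.
<pre>
import numpy as np

rng = np.random.default_rng(0)
Q = np.array([[2.0, 0.3],
              [0.3, 1.0]])              # illustrative process-noise covariance Q
L = np.linalg.cholesky(Q)               # L @ n has covariance Q when n ~ N(0, I)

N = 200_000
w_k = L @ rng.standard_normal((2, N))   # noise samples at step k
w_j = L @ rng.standard_normal((2, N))   # independent noise samples at step j != k

print(np.round(w_k @ w_k.T / N, 2))     # sample estimate of E[w_k w_k^T]  ->  close to Q
print(np.round(w_k @ w_j.T / N, 2))     # sample estimate of E[w_k w_j^T]  ->  close to 0
</pre>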
Derivation of Kalman equations
The article lacks a derivation of the Kalman equations. This is crucial, since it also doubles as a proof of the filter being an MMSE (minimal mean square error) estimator. The derivation section provides a nice relationship to recursive Bayesian estimation, but doesn't really contain any derivations. My suggestion is therefore to rename derivation to relationship to Bayesian estimation, and create a new derivation section containing the differentiation of the trace of the estimate covariance, and the solution found by setting it equal to zero. --Fredrik Orderud 23:55, 18 Apr 2005 (UTC)
- I think this is a good idea. Chrislloyd 05:11, 19 Apr 2005 (UTC)
- That would be great! — ciphergoth 07:53, 2005 Apr 19 (UTC)
A derivation section (including the actual derivation) is now in place. This section contains the whole derivation required, but should probably be extended a bit with more textual descriptions and intermediate steps of the equations to clear things up a bit.
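For reference, the key step of that section (sketched here from memory rather than copied verbatim, writing <math>\textbf{S}_k = \textbf{H}_k \textbf{P}_{k|k-1} \textbf{H}_k^T + \textbf{R}_k</math>) is to write the posterior error covariance for an arbitrary gain,

<math>\textbf{P}_{k|k} = (I - \textbf{K}_{k} \textbf{H}_{k}) \textbf{P}_{k|k-1} (I - \textbf{K}_{k} \textbf{H}_{k})^{T} + \textbf{K}_{k} \textbf{R}_{k} \textbf{K}_{k}^{T},</math>

and then differentiate its trace with respect to the gain and set the result to zero:

<math>\frac{\partial \, \mathrm{tr}(\textbf{P}_{k|k})}{\partial \textbf{K}_{k}} = -2 (\textbf{H}_{k} \textbf{P}_{k|k-1})^{T} + 2 \textbf{K}_{k} \textbf{S}_{k} = 0 \quad \Rightarrow \quad \textbf{K}_{k} = \textbf{P}_{k|k-1} \textbf{H}_{k}^{T} \textbf{S}_{k}^{-1},</math>

which is the MMSE (optimal) Kalman gain.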
Estimated covariance?
I've reacted to the naming of the "error covariance" <math>\textbf{P}</math> as "estimated covariance" several times in the article. In my understanding this is not correct, since <math>\textbf{P}</math> really is the accuracy of the state estimate, and not an estimate of the covariance. My suggestion is therefore to rename all occurrences of <math>\hat{\textbf{P}}</math> to <math>\textbf{P}</math> and replace estimated covariance with error covariance. This would also make the article more consistent with most research papers on Kalman (that I know of). --Fredrik Orderud 00:03, 19 Apr 2005 (UTC)
- My aim here was to convey the idea (unsuccessfully perhaps) that the (estimated) covariance is only the true covariance when the filter parameters and models are matched to the true models/parameters. Chrislloyd 05:18, 19 Apr 2005 (UTC)
- I'd prefer to do the rename but leave the hat, but I favour consistency with current practice in research over my own aesthetics. — ciphergoth 07:39, 2005 Apr 19 (UTC)
- I've now renamed all occurrences of <math>\hat{\textbf{P}}</math> to <math>\textbf{P}</math> and replaced estimated covariance with error covariance. This was important, since the Kalman filter does not estimate a state covariance. It estimates a state, and the error covariance describing the accuracy of that estimate comes as a by-product. This change makes the article consistent with all of my sources on Kalman filtering (mostly IEEE papers and the Brown & Hwang book on Kalman filtering). --Fredrik Orderud 09:16, 19 Apr 2005 (UTC)
Usage of Chi in intermediate steps
Chi is used several times as an intermediate step in the calculation of the covariance for the posterior state estimate. I think this is unfortunate, since chi is also used to denote sigma points in the SPKF. Very few of my Kalman resources actually use this intermediate step, so my proposal is therefore to remove it.
- Sure - I took that from the example. The only advantage in using Chi is that it avoids repeating (I - KkHk) but I'll be led by you. — ciphergoth 19:42, 2005 Apr 19 (UTC)
- Few texts associate a label with the Chi term, so it is probably non-standard. We could remove the term altogether, or use a different letter to represent it. Chrislloyd 23:39, 19 Apr 2005 (UTC)
Text for derivation
Could you be a bit clearer about the problems with my text for the derivation? There are stylistic, idiomatic, and clarity problems with the current text, but given your rather harsh words in the change logs, I don't want to try and fix them without learning what was wrong with my last go. I know I slightly misstated what was being minimized, but what problems were there apart from that that called for a total revert? — ciphergoth 20:16, 2005 Apr 19 (UTC)
- My concern with your rewrite was that you turned the text into an introduction to stochastic estimation (in general) by introducing sentences like we must define precisely what we mean by "best possible estimate". The Kalman article is not the place to describe what MMSE is. However, I agree with you in that the section definitely needs to be rephrased.
- What about stating that the method is based on an MMSE solution to the (quadratic) <math> E \{ \textbf{e}_{k|k} \textbf{e}_{k|k}^T \} </math> error metric? (or something similar to that)
- Please don't take this personally. My intention was only to make the article conform more with the Kalman introductions found in the literature. --Fredrik Orderud 09:22, 20 Apr 2005 (UTC)
var(X)
We could make many of the equations a lot more readable by using the function <math>\textrm{var}(\textbf{X}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{X} - E[\textbf{X}])^{T}]</math>
If we reference covariance matrix (which defines this function) the first time we use it, would this be an acceptable simplification? — ciphergoth 08:40, 2005 Apr 20 (UTC)
- I'm not so sure about this, since <math>\textrm{var}(\textbf{X})</math> can easily be misinterpreted as "ordinary" scalar variance. Also, it is very important that we strive toward consistency with existing Kalman literature. --Fredrik Orderud 11:04, 20 Apr 2005 (UTC)
- I read your comment below as approval on both parts since your comment above was added later, so I went ahead and made the change. The gain in clarity is enormous. The formulae are all half the length, and steps such as breaking one var into two become very easy to see and to justify. This use of var is blessed by the article on covariance matrix, and I think that an encyclopaedia article on the Kalman filter has to favour clarity for the new reader over familiarity for one already familiar with the topic, if it can do so without sacrificing accuracy, as we can in this instance. At least, please don't revert this change until other contributors to this article have had a chance to comment. — ciphergoth 12:15, 2005 Apr 20 (UTC)
- I've just posted a comment (http://en.wikipedia.org/wiki/Talk:Covariance_matrix#Nonstandard_notation.3F) on the covariance matrix talk page questioning the notation used. That article is also without any external references, so I'm quite sceptical about it. Furthermore, Wikipedia is not the place to "invent" your own notation in favour of established notation practice. What about splitting up the equations over several lines instead? --Fredrik Orderud 12:33, 20 Apr 2005 (UTC)
- Given how convenient it makes things, I'm astonished to learn that there's no notation in widespread use in the literature that you've read that expresses these things, but I'll take your word for it.
- If it turns out that that notation is novel to Wikipedia, or not in widespread use anywhere, then I will sadly have to concur that it has to go. But if it's widely used in other fields, then I'm still in favour of borrowing it here to make things clearer. — ciphergoth 13:22, 2005 Apr 20 (UTC)
- I've always seen <math>\operatorname{cov}(X)</math> as indicating the covariance matrix. Cburnett 14:24, Apr 20, 2005 (UTC)
- I agree that "cov" is at least a better alternative to "var", but I have yet to see it being used in a Kalman setting. Do you have any references? All my sources stick to <math>E [\textbf{x}\textbf{x}^T]<math>.
- Given the gain in clarity, if it's valid notation that's in general use I think it's worth using in this context even if it's not usual. "cov" is fine. — ciphergoth 15:29, 2005 Apr 20 (UTC)
- Statistical Inference by Casella & Berger (ISBN 0-534-24312-6) uses cov() to denote covariance matrix. Cburnett 17:52, Apr 20, 2005 (UTC)
- My two pennies: I've just been looking at "Estimation with Applications to Navigation and Tracking" by Bar-Shalom et al., which uses the cov notation. Personally I would restrict its use to the derivation, since it is only there that there is any appreciable benefit. Generally when defining noise etc. <math>E [\textbf{x}\textbf{x}^T]</math> is perfectly sufficient. Chrislloyd 17:29, 20 Apr 2005 (UTC)
- Chrislloyd, just to be clear: <math>E [\textbf{x}\textbf{x}^T]</math> is correct only for zero-mean, so I think cov(X) would be more specific... Cburnett 17:52, Apr 20, 2005 (UTC)
- OK, but one of the KF assumptions is that the noise is zero mean. Before, when we had <math> \textbf{w}_k \sim N(0,\textbf{Q}_k) </math>, then <math> E[\textbf{w}_k\textbf{w}_k^T] = \textbf{Q}_k </math>. We now no longer seem to be specifying that for the KF the noise is assumed to be zero mean. Chrislloyd 18:21, 20 Apr 2005 (UTC)
- Understood, just saying that we need to be careful to not say "cov(X) = E(XX^T)" without explicitly saying it's zero-mean. That's all. Cburnett 18:29, Apr 20, 2005 (UTC)
- It's a good point. It should be specified somewhere that the noise is zero mean before we start using cov as a shorthand for E[...]. I think the former should be in the definition, and in the derivation we can say because ... therefore ... Chrislloyd 18:43, 20 Apr 2005 (UTC)
I strongly urge you to reconsider the usage of "var" or "cov" in this article. Cburnett may very well be right in that "cov" is used in some parts of statistics to indicate covariance matrices, but I have yet to see it being used in a Kalman setting (or statistical signal processing at all). If you all would please look at p. 54 in Robot Localization... (http://www.negenborn.net/kal_loc/thesis.pdf) (PDF p. 64) you can see the standard notation. Is it really so bad with multi-line equations? Also, while <math>E[x x^T]</math> is self-explanatory, the use of "var" may cause confusion since many people (including myself) are unfamiliar with the definition of "var" for matrices. --Fredrik Orderud 20:48, 20 Apr 2005 (UTC)
- I have to say (and without any offence intended) the argument "I'm not very familiar with this notation therefore we shouldn't use it" doesn't really hold any water. Also, I've given a reference to a book by probably the most highly respected authority in this field which *does* use the cov notation in connection with (Kalman filter) covariance matrices, and yet your argument is "no, look at this pdf by some random bloke" to convince us that this is the standard notation. Chrislloyd 21:20, 20 Apr 2005 (UTC)
- Indeed. Fundamentals of Statistical Signal Processing, Volume 1: Estimation Theory by Steven Kay (ISBN 0-13-345711-7) also uses cov for matrices. Linking to covariance matrix should be sufficient for someone who is unaware of what it is. Cburnett 21:52, Apr 20, 2005 (UTC)
- The "random bloke" article was used since it's one of the "external references" found in this article. My primary references for the usage of <math>E[x x^T]<math> are "Introduction to Random Signals and Applied Kalman Filtering" (Brown & Hwang), An Introduction to the Kalman Filter (http://www.cs.unc.edu/~tracker/media/pdf/SIGGRAPH2001_CoursePack_08.pdf) (SIGGRAPH2001), The unscented Kalman filter for nonlinear estimation (http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=882463) (IEEE AS-SPCC) and Asymptotic behavior of the extended Kalman filter... (http://ieeexplore.ieee.org/xpl/abs_free.jsp?arNumber=1101943) (IEEE Automatic Control). But since you seem to have documented the usage of "cov" you've convinced me. I wont argue agains it any more :) But at least, please rename the "var" into "cov".
- Will do. — ciphergoth 23:13, 2005 Apr 20 (UTC)
- I think the wider point here is that there is no such thing as 100% standard notation; all authors have their own preferred style. Our main goal should be to make this page as internally consistent and as easily understood as possible. Just because some or all authors do or don't use a particular style or notation doesn't mean we have to or should. My view is that this is a reference for the general public, not the estimation insider. Most people who are versed in this area don't need to consult a Wikipedia article. Chrislloyd 23:26, 20 Apr 2005 (UTC)
Also, are we using square, round, or curly brackets for expected value? Square are predominant; derivation uses curly; other articles use round... — ciphergoth 08:40, 2005 Apr 20 (UTC)
- Do as you want. I don't have any strong opinions about this. --Fredrik Orderud 09:08, 20 Apr 2005 (UTC)
- I prefer <math>\mathrm{E}\left[X\right]</math> for expectation. Cburnett 17:52, Apr 20, 2005 (UTC)
Standard notation:
- <math>\operatorname{var}(\textbf{X}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{X} - E[\textbf{X}])^{T}]</math>
ALSO standard notation:
- <math> \operatorname{cov}(\textbf{X}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{X} - E[\textbf{X}])^{T}]</math>
ALSO standard notation:
- <math>\operatorname{cov}(\textbf{X},\textbf{Y}) = E[(\textbf{X} - E[\textbf{X}])(\textbf{Y} - E[\textbf{Y}])^{T}]</math> (the "cross-covariance" between two random vectors)
Unfortunately the first two of these usages jar with each other. The first and third are in perfect harmony. The first notation is found in William Feller's celebrated two-volume book on probability, which everybody is familiar with, so it's surprising that some people are suggesting it first appeared on Wikipedia. It's also found in some statistics texts, e.g., Sanford Weisberg's linear regression text. Michael Hardy 17:55, 28 Apr 2005 (UTC)
Sample independence
I removed the formula stating that successive samples were uncorrelated. If it's to stay, we also have to state the formulae which say that each wk is uncorrelated with each vk, and with the control inputs, and so on... best to just say in words "each sample is independent from everything else", from which all of that follows. — ciphergoth 09:03, 2005 Apr 20 (UTC)
Numerically stable update?
I'm not sure if I agree with this section. The so-called "numerically stable" formula is actually just the "original" <math>\textbf{P}_{k|k}</math> equation derived in the derivation section, which is valid for any gain. The <math>\textbf{P}_{k|k}</math> formula used in "update" is really just a simplification of this formula, which is only valid for gains equal to the gain found in "update".
The property of increased numerical stability might very well be true, but this is not the main point here.
I actually changed the title to "Posterior error covariance equation valid for any gain" (or something similar) a couple of months ago, but someone rewrote my title. --Fredrik Orderud 11:46, 20 Apr 2005 (UTC)
- I agree. I'd rather just mention the "numerically stable" update and skip the other formula. It's neither easier to derive nor computationally useful, so I think we should just omit it. — ciphergoth 12:01, 2005 Apr 20 (UTC)
Hey, what have you done??? <math>\textbf{P}_{k|k} = (I - \textbf{K}_{k} \textbf{H}_{k}) \textbf{P}_{k|k-1}</math> doesn't have poor numerical stability! It is computationally cheaper than the alternative (and thus almost always used), but its drawback is that it is only valid for the optimal gain derived in the derivation section. --Fredrik Orderud 15:12, 20 Apr 2005 (UTC)
- In which case I've preserved an error which was present in the original text - sorry! Please fix it! — ciphergoth 15:24, 2005 Apr 20 (UTC)
- The so called "poor numerical stability" probably relates to low-precission embedded integer implementations unable to calculate the optimal Kalman gain excactly. The main point is that the extended formula is valid for any gain. This includes deviations from the optimal gain due to poor arithmetics as well as situations where you want to use a gain different from the optimal.--Fredrik Orderud 15:40, 20 Apr 2005 (UTC)
- I think these two things are inter-related (cause and effect). There is a problem when the gain calculation is inexact (due to numeric instability), so people use a version which can be used with any gain and which has the property that the covariance is stable. Therefore the update is a numerically stable update. Computing a numerically stable covariance seems to be the main motivation for this equation and is therefore the title of the subsection. I don't see the problem. Chrislloyd 17:45, 20 Apr 2005 (UTC)
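To make the relationship between the two forms concrete, here is a small numerical sketch (the matrices below are invented for illustration, not taken from the article): for the optimal gain the two updates agree exactly, while the longer form still yields a proper positive-definite covariance when the gain is perturbed.
<pre>
import numpy as np

# Illustrative one-step update; P, H, R are made-up values.
P = np.array([[4.0, 1.0],
              [1.0, 3.0]])            # prior covariance P_{k|k-1}
H = np.array([[1.0, 0.0]])            # observation matrix H_k
R = np.array([[0.5]])                 # measurement-noise covariance R_k
I = np.eye(2)

S = H @ P @ H.T + R                   # innovation covariance S_k
K = P @ H.T @ np.linalg.inv(S)        # optimal Kalman gain

short  = (I - K @ H) @ P                                   # valid only for the optimal gain
joseph = (I - K @ H) @ P @ (I - K @ H).T + K @ R @ K.T     # valid for any gain

print(np.allclose(short, joseph))     # True: identical for the optimal gain

K_bad = 1.05 * K                      # a deliberately suboptimal gain
joseph_bad = (I - K_bad @ H) @ P @ (I - K_bad @ H).T + K_bad @ R @ K_bad.T
print(np.all(np.linalg.eigvalsh(joseph_bad) > 0))   # still a valid covariance matrix
</pre>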
Edit conflict
I really like the changes I've just overwritten (in derivation) - will try to merge them in properly now. — ciphergoth 11:56, 2005 Apr 20 (UTC)
- Please don't do this. The usage of "var" for matrices is both ambiguous and nonstandard (at least when you're talking about Kalman estimation). --Fredrik Orderud 12:11, 20 Apr 2005 (UTC)
- See discussion of var(X) above — ciphergoth 12:29, 2005 Apr 20 (UTC)
non-correlation of noise variables
Why do we state that successive samples are independent but not state that wi is independent of vj? And why do we set out the formula that asserts that they're uncorrelated when what we state in the text is that they're independent? Which is the correct assumption? — ciphergoth 07:27, 2005 Apr 21 (UTC)
- I think there is a difference (in general) between independent and uncorrelated. For Gaussian noise independence also implies uncorrelated, but in general this is not the case. Chrislloyd 15:59, 21 Apr 2005 (UTC)
- WRONG.
- Jointly Gaussian and uncorrelated implies independence.
- Gaussian and uncorrelated does not imply independence without the word "jointly". Counterexamples exist. Michael Hardy 21:42, 28 Apr 2005 (UTC)
- Now I've written up a counterexample: normally distributed and uncorrelated does not imply independent. Michael Hardy 23:35, 28 Apr 2005 (UTC)
- For Gaussian noise independence also implies uncorrelated - I think this has to be wrong. I can easily generate two uniformly distributed variables which are uncorrelated but not independent, if their values have to fall in the squares marked X in the grid below:
- You're right that it's wrong, but uniformly distributed random variables are not Gaussian! Michael Hardy 23:35, 28 Apr 2005 (UTC)
- Read on, where I suggest applying Φ-1 to generate Gaussians, and later suggest a simpler example starting with Gaussians. — ciphergoth 07:01, 2005 Apr 29 (UTC)
 X..X
 .XX.
 .XX.
 X..X
- where the grid divides the space 0..1 x 0..1 into quarters in each dimension. Apply Φ-1 to each of X and Y and you should get normally distributed variables which are uncorrelated but not independent.
- However, I suspect that the assumption that they're uncorrelated is enough to prove the Kalman gain optimal and so on, since everything is linear — ciphergoth 10:35, 2005 Apr 23 (UTC)
- I must correct myself, I meant to say for Gaussian noise uncorrelated => independent. Independence always implies uncorrelated! Chrislloyd 03:13, 25 Apr 2005 (UTC)
- I didn't notice the mistake - I read it the way you intended it, not the way it actually read! However, you're definitely mistaken, even for gaussian noise uncorrelation does not imply independence. I've thought of a simpler example. Suppose X is a standard gaussian, Z is 1 with probability 0.5 and -1 with probability 0.5, and Y is XZ. Then X and Y are definitely not independent - they always have the same magnitude - but their correlation is zero, because E[XY] = E[XY|Z=1]Pr(Z=1) + E[XY|Z=-1]Pr(Z=-1) = 0.5 E[X^2] + 0.5 E[-X^2] = 0.5 - 0.5 = 0 — ciphergoth 11:59, 2005 Apr 25 (UTC)
- If both X and Z are drawn from gaussian distribution(s), as the noise vectors defined to be in "Underlying dynamic system model" are, then if they are uncorrelated they must be independent. The statement independent implies uncorrelated is a true statement. If you're trying to prove that this statement is not true by providing an example which is not independent and uncorrelated then you should look up the definition of implies since F => T |- T. Chrislloyd 22:46, 25 Apr 2005 (UTC)
(Moving back to the left margin before we disappear into the right) From your tone it sounds like I'm trying your patience - I'm sorry! I'm now a little confused about exactly what your position is, so I hope I won't try your patience too much further by restating exactly what I think is at issue and why I believe my example is relevant. If I'm wrong please set me right!
I think we agree that independence always implies uncorrelation, and that uncorrelation does not always imply independence. You assert (I think) that for variables drawn from a gaussian distribution, uncorrelation does always imply independence, and there we disagree.
I believe I've found a counterexample, which I'll restate here in pseudocode:
 X = random.gaussian()
 Z = random.choose([-1, 1])
 Y = X * Z
- Note that (X, Y) is the interesting pair - Z is just a step on the way. Now if you are asserting what I think you're asserting, and I haven't convinced you yet, then I think you have to disagree with one of these bullet points:
- X follows a gaussian distribution
- Y follows a gaussian distribution
- X and Y are uncorrelated
- X and Y are not independent
- This is therefore a counterexample
but I don't know which one! Note that point 2 follows from the fact that Φ(-a) = 1- Φ(a), so Pr(Y < a) = Pr(Z = 1 && X < a) + Pr(Z = -1 && X > -a) = 0.5 Φ(a) + 0.5 (1-Φ(-a)) = Φ(a). Have I totally mistaken your position? Is my counterexample bogus? Thanks for taking the time to discuss this with me!
— ciphergoth 18:58, 2005 Apr 26 (UTC)
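A quick Monte Carlo check of the construction in the pseudocode above (a sketch only; the variable names mirror that pseudocode):
<pre>
import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

X = rng.standard_normal(n)             # X ~ N(0, 1)
Z = rng.choice([-1.0, 1.0], size=n)    # Z = +/-1, each with probability 0.5
Y = X * Z                              # Y is also N(0, 1), but tied to X

print(np.corrcoef(X, Y)[0, 1])         # ~0: X and Y are uncorrelated

# Independence would require P(X < -1, Y < -1) = P(X < -1) * P(Y < -1)
joint   = np.mean((X < -1) & (Y < -1))         # ~0.079  (= 0.5 * Phi(-1))
product = np.mean(X < -1) * np.mean(Y < -1)    # ~0.025  (= Phi(-1)^2)
print(joint, product)                          # clearly different -> not independent
</pre>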
- You're not trying my patience - I apologise if that was the impression I gave. I think the error here is in the assumption that X and Y are not independent. Your rationale for assuming they're dependent is that they have the same magnitude; however, successive samples of Z in {-1,1} have the same magnitude and they are still independent. If they are independent p(X,Y) = p(X)p(Y), which I think is the case here. Chrislloyd 23:17, 26 Apr 2005 (UTC)
- No, they are not independent. Independence means that for any a, b, P(X < a && Y < b) = P(X < a)P(Y < b). However, P(X < -10 && Y < -10) = 0.5 Φ(-10), which is considerably greater than Φ(-10)². Loosely speaking, independence means that knowing X tells you nothing new about Y; that's clearly not the case here. — ciphergoth 07:57, 2005 Apr 27 (UTC)
- Suppose we had two random variables X' and Y', both drawn independently from the set {-1,1}. Next sample Z from a Gaussian distribution and take the absolute value (Z' = |Z|). Now if we multiply X' and Y' by Z' (X = X'.Z', Y = Y'.Z') we have the same distribution for X and Y as you had in your example. However, clearly X and Y are still independent; they just have a random common factor which is Gaussian (sort of). Similarly, if we sample Z from the set {5} we have the same distribution as we would have had if X and Y had been drawn from the set {-5,5} (assuming p(-5)=p(-1)), they're still independent. Also note that in your example there really is only one Gaussian sample and, as I said above in bold, if we have *two* samples drawn from Gaussian(s), then if they are uncorrelated they are independent. Chrislloyd 16:36, 27 Apr 2005 (UTC)
- I don't know how to interpret your remark "note that in your example there really is only one gaussian sample". X is distributed gaussianly, and so is Y. They are very closely related, but that's just another way of saying that they are not independent. That remark troubles me on a deeper level, though, because it's so vague - it suggests far too much of an appeal to intuition, rather than to mathematical proof. Proof by assertion is not going to lend any weight, even if the assertion is in bold. As far as I'm concerned, I've presented a mathematical proof that you're mistaken - I've created a counterexample, and proven each assertion about the counterexample to demonstrate that it runs counter to your belief. If you think the proof is mistaken, you'll have to be *much* more precise about exactly which step the proof fails at, and what the mistake is.
- I can't work out what your example is supposed to prove, since it's clear to me that in your example as in mine X and Y are not independent. They fail the test I set out above just as mine did, since as you say they are distributed in the same related way. I have appealed to the definition of independence here in Wikipedia, which you don't refer to.
- Once again, using numbered bullet points (and with Z using my original definition rather than your new one), do you disagree with any step of my reasoning here?
- X and Y are independent iff, for any a, b, P(X < a && Y < b) = P(X < a)P(Y < b)
- P(X < -1) = Φ(-1)
- P(Y < -1) = Φ(-1)
- P(X < -1)P(Y < -1) = Φ(-1)² ≈ 0.025171
- P(X < -1 && Y < -1) = P(X < -1 && Z = 1)
- = P(X < -1)P(Z = 1)
- = 0.5 Φ(-1) ~= .07932
- P(X < -1 && Y < -1) != P(X < -1)P(Y < -1)
- Therefore, X and Y are not independent.
- A re-read of the material linked from probability theory explaining how random variables are constructed might help to clarify some of the issues that are causing your confusion here.
- I must say that what I am asserting here is not something I made up on the spot; it is a well-known result. If you are correct I suggest you immediately publish your findings!
- Surely P(Y < -1) = 0.5 Φ(-1) + 0.5 Φ(1) which means that P( X < -1, Y < -1 ) = P(X < -1)P(Y < -1) = Φ(-1) . ( 0.5 Φ(-1) + 0.5 Φ(1) ) = 0.5 Φ(-1) = P(X < -1 & Z = 1 ). Stop the press!! Chrislloyd 21:53, 27 Apr 2005 (UTC)
- No, P(Y < -1) = P(X < -1 && Z == 1) + P(X > 1 && Z == -1) = 0.5 Φ(-1) + 0.5 (1 - Φ(1)) = 0.5 Φ(-1) + 0.5 Φ(-1) = Φ(-1), as per my proof of point 2 in my first list above.
- Both you and Fredrik Orderud state that this is widely believed. I think he's right in thinking that it's the nonlinearity in the relationship between X and Y that's the key to the paradox. If you use only linear operations and you end up with two gaussians, any linear combination of them will also be a gaussian, so you have a multivariate normal distribution and uncorrelation implies independence as per my note below. However, if you're allowed multiplication then the rules change. — ciphergoth 22:02, 2005 Apr 27 (UTC)
- I give up. Either I've completely misunderstood this result or it is a lot less powerful than I thought. Chrislloyd 04:38, 28 Apr 2005 (UTC)
Just a footnote - it is true that if [X, Y] is a multivariate normal distribution, and X and Y are uncorrelated, then they are independent. The assertion that [X, Y] is a multivariate normal distribution is stronger than that X and Y are both normally distributed. In the article, we state that each individual w_k and v_k is drawn from an MND, but we don't try and say that if you stick all the w_i and v_i variables together into one big super-noise-vector, that this super-noise-vector is an MND. If we did, and we also asserted that w_k is uncorrelated with w_k' (except where k == k') and uncorrelated with v_k (whether or not k==k') then that would be sufficient to state that they are independent, but it seems like a rather long-winded way to go about it.
It seems to me much simpler to state that {w_0 ... w_k, v_0 ... v_k} is an independent collection of MNDs. This appeals to the measure-theoretical definition of independence in Statistical independence#Independent_random_variables to define independence of vectors. — ciphergoth 22:41, 2005 Apr 26 (UTC)
- I've followed your discussion for a while, and find it quite interesting, since I've always thought that uncorrelated implies independence for Gaussian variables. You can, among other places, find this statement on connexions (http://cnx.rice.edu/content/m10238/latest/) (statement 2). But at the same time, I'm also unable to find any flaws in ciphergoth's examples. I've therefore remained silent because I really don't know which one of you is correct.
- The reason for this contradiction is probably due to ciphergoth's nonlinear "inversion process". I firmly believe that uncorrelated implies independence if you stick to linear operations on your Gaussians, which is the case in Kalman filtering. --Fredrik Orderud 21:34, 27 Apr 2005 (UTC)
- Sure, but what I'm trying to determine is what we need to state about {w_1 ... w_k} and so on. If all we state is that they're pairwise uncorrelated, then we leave open the possibility that they could have a tricky nonlinear relationship such as the example I've given. That might I think invalidate our proofs - though I'm not sure. What I want to do is delete the formulae that state that they're uncorrelated, and simply state that they're independent. Actually, I want to state that {x_0, w_1, ... w_k, v_1, ... v_k} are independent.
- Thanks for the link to the connexions article - emailing the author. — ciphergoth 22:02, 2005 Apr 27 (UTC)
S and ~y
I took these variables Sk and <math>\tilde{\textbf{y}}_{k}</math> from the example and moved them into the main presentation of how the filter works. Are they in standard use? If not, they should go. If so, however, we can use Sk to simplify some of the formulae in "Derivation" considerably, especially the simplification of the posterior error covariance formula. — ciphergoth 09:14, 2005 Apr 27 (UTC)
- These variables are certainly in standard use in the target tracking community. Chrislloyd 17:10, 27 Apr 2005 (UTC)
- You could even add another variable <math> \hat{\textbf{z}}_k = \textbf{H}_k \hat{\textbf{x}}_k </math>, which is the predicted measurement. Chrislloyd 17:23, 27 Apr 2005 (UTC)
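For illustration, this is roughly how the update step reads when written in terms of those variables (a sketch with invented function and variable names, not code from the article): the innovation <math>\tilde{\textbf{y}}_{k}</math> and its covariance Sk are computed first, and the gain and posterior estimates follow from them.
<pre>
import numpy as np

def kalman_update(x_pred, P_pred, z, H, R):
    """Measurement update written in terms of the innovation and its covariance."""
    y_tilde = z - H @ x_pred                          # innovation (measurement residual)
    S = H @ P_pred @ H.T + R                          # innovation covariance S_k
    K = P_pred @ H.T @ np.linalg.inv(S)               # Kalman gain
    x_post = x_pred + K @ y_tilde                     # posterior state estimate
    P_post = (np.eye(len(x_pred)) - K @ H) @ P_pred   # posterior error covariance
    return x_post, P_post
</pre>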
Posterior error covariance and numeric stability
From what User:Orderud says, the shorter and faster formula for posterior error covariance is nearly always used, which suggests that it usually does not suffer from serious numeric instability problems except under very special circumstances. I will therefore remove the discussion of alternatives from the main "The Kalman Filter" section, and put it all in the "Derivation" section, on the grounds that there's already a partial retread of this ground in that section, and anyone planning on doing the sort of weird stuff that would call for using the other formula should understand that section anyway. It can of course be reverted if it's a problem. — ciphergoth 08:43, 2005 Apr 28 (UTC)
- Sounds like a reasonable suggestion, since the alternative/extended form is now covered in the derivation. --Fredrik Orderud 20:05, 28 Apr 2005 (UTC)
Relationship to recursive Bayesian estimation
User:Chrislloyd, you added this section (which was then titled "Derivation") around February 2/10 (http://en.wikipedia.org/w/index.php?title=Kalman_filter&diff=10129775&oldid=9918105). Now that we have a derivation section contributed by User:Orderud, is it still necessary? What does it add? Thanks! — ciphergoth 10:18, 2005 Apr 28 (UTC)
- I think the section is still important, since it relates Kalman filtering to the "bigger picture" of recursive Bayesian estimation (which Kalman filtering is a part of). --Fredrik Orderud 20:08, 28 Apr 2005 (UTC)
- In that case I think it needs substantial work to make its point clear, since I've tried very hard to understand it and come up with very little. Is p(X) the probability density function of X? It doesn't link to probability density function and that latter doesn't mention p(X) having that meaning. How is PDF defined for vectors and joint distributions? I think I can guess, but it's not discussed in probability density function making it a bit demanding to infer. Even Lebesgue integration only defines integration from reals to reals, leaving one to infer how integration of functions such as p: (R x R) -> R is defined (though it seems straightforward to extend it to any real function whose domain is measurable). What does "The probability distribution of updated" mean? What is the denominator unimportant to? What do the probability density functions given at the end mean? How does it all tie together to say something cohesive and substantial? — ciphergoth 22:21, 2005 Apr 28 (UTC)
- You're probably right in that it's poorly written (I hadn't read it until now myself), but it's still very important. The variable p(x) is, as you thought, the probability distribution of the state x. The Kalman filter replaces p(x) with a Gaussian distribution parametrized by a state estimate and a covariance. IEEE Signal Processing had a quite straightforward tutorial (http://www.ene.unb.br/~gaborges/disciplinas/pe/papers/arulampalam2002.pdf) in 2002, containing the derivation of the Kalman filter from a recursive Bayesian estimator. It is absolutely worth a read. --Fredrik Orderud 22:39, 28 Apr 2005 (UTC)
- Thanks, that helps a lot! — ciphergoth 23:05, 2005 Apr 28 (UTC)
- At a glance it looks like that paper is the basis of this section. I can follow the paper much better, since I can see what it's trying to get at. Unfortunately, it doesn't AFAICT actually prove its assertions about the Kalman filter at all - it just states "if you do this, you get the correct conditional PDFs". If we're going to do the same, we should make it explicit that we're stating without proof that the Kalman filter gives the correct PDFs. I think I can see how to do this. (Also, it's a pity the equations are bitmaps rather than scalable fonts in the PDF of the paper!) — ciphergoth 21:46, 2005 Apr 29 (UTC)
- This section is not there to prove the optimality of the Kalman filter. The "proof" section already does that. Its main intent is to demonstrate how recursive Bayesian estimation can be simplified into tractable linear equations with Gaussian PDFs when dealing with linear state-space models subject to Gaussian noise. The derivations are pretty standard, and found in many Kalman textbooks. You can also find the paper on IEEE Xplore (http://ieeexplore.ieee.org/) in much higher quality, but this requires a subscription. --Fredrik Orderud 11:03, 30 Apr 2005 (UTC)
- OK, but the paper makes it look as if our proof is insufficiently precise, because it talks about expected values, covariance and so forth without talking about what they're conditioned on. Is it
- <math>\textbf{P}_{k|k} = \textrm{cov}(\textbf{x}_k - \hat{\textbf{x}}_{k|k})</math>
- or
- <math>\textbf{P}_{k|k} = \textrm{cov}(\textbf{x}_k - \hat{\textbf{x}}_{k|k}|\textbf{z}_{1 \ldots k})</math>
- ? It feels as if there are big gaps in our proof that the Kalman filter is valid... — ciphergoth 17:38, 2005 Apr 30 (UTC)
- I'm pretty sure <math>\textbf{P}_{k|k} = \textrm{cov}(\textbf{x}_k - \hat{\textbf{x}}_{k|k}|\textbf{z}_{1 \ldots k})</math>, since the Kalman filter is a causal recursive estimator which incorporates the latest measurements available into its estimates. --Fredrik Orderud 11:43, 1 May 2005 (UTC)
EKF
I have seen different versions of the EKF covariance matrix P update. I'm not too much into this thing, but they don't look equal to me: we have <math> \textbf{P}_{k|k} = (I - \textbf{K}_{k} \textbf{H}_{k}) \textbf{P}_{k|k-1} (I - \textbf{K}_{k} \textbf{H}_{k})^{T} + \textbf{K}_{k} \textbf{R}_{k}\textbf{K}_{k}^{T}</math> as stated in the article and also
- <math> \textbf{P}_{k|k} = (I - \textbf{K}_{k} \textbf{H}_{k}) \textbf{P}_{k|k-1}</math>
from e.g. [1] (http://homepages.inf.ed.ac.uk/rbf/CVonline/LOCAL_COPIES/WELCH/kalman.2.html), which seems to be missing some part.
- They are in fact equal when the optimal Kalman gain is used, which is almost always the case. Consult the "Derivation" section for proof of this. I've now changed the EKF covariance update to the simple version, which makes it consistent with your reference :). --Fredrik Orderud 13:19, 21 May 2005 (UTC)
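For anyone else wondering why the two forms coincide, the algebra is short (sketched here rather than copied from the article, writing <math>\textbf{S}_{k} = \textbf{H}_{k} \textbf{P}_{k|k-1} \textbf{H}_{k}^{T} + \textbf{R}_{k}</math>):

<math>(I - \textbf{K}_{k} \textbf{H}_{k}) \textbf{P}_{k|k-1} (I - \textbf{K}_{k} \textbf{H}_{k})^{T} + \textbf{K}_{k} \textbf{R}_{k}\textbf{K}_{k}^{T} = (I - \textbf{K}_{k} \textbf{H}_{k}) \textbf{P}_{k|k-1} - \textbf{P}_{k|k-1} \textbf{H}_{k}^{T} \textbf{K}_{k}^{T} + \textbf{K}_{k} \textbf{S}_{k} \textbf{K}_{k}^{T},</math>

and substituting the optimal gain <math>\textbf{K}_{k} = \textbf{P}_{k|k-1} \textbf{H}_{k}^{T} \textbf{S}_{k}^{-1}</math> makes the last two terms cancel, leaving <math>(I - \textbf{K}_{k} \textbf{H}_{k}) \textbf{P}_{k|k-1}</math>. For a gain that is not exactly optimal (e.g. because of rounding), the cancellation no longer holds, which is where the longer form earns its keep.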