Einstein notation
- For other topics related to Einstein, see Einstein (disambiguation).
In mathematics, especially in applications of linear algebra to physics, the Einstein notation or Einstein summation convention is a notational convention useful when dealing with coordinate equations or formulas.
According to this convention, when an index variable appears twice in a single term, it implies that we are summing over all of its possible values. In typical applications, these are 1,2,3 (for calculations in Euclidean space), or 0,1,2,3 or 1,2,3,4 (for calculations in Minkowski space), but they can have any range, even (in some applications) an infinite set. Furthermore, abstract index notation uses Einstein notation without requiring any range of values.
In general relativity, Greek and Roman letters are used to distinguish whether the sum runs over 1,2,3 or over 0,1,2,3 (e.g. Roman i, j, ... for 1,2,3 and Greek μ, ν, ... for 0,1,2,3). As with sign conventions, the convention used in practice varies: the roles of Roman and Greek may be reversed.
Sometimes (as in general relativity), the index is required to appear once as a superscript and once as a subscript; in other applications, all indices are subscripts. See Dual vector space and Tensor product.
Introduction
In mechanics and engineering, vectors in 3D space are often described in relation to orthogonal unit vectors i, j and k.
- <math>\mathbf{u} = u_x \mathbf{i} + u_y \mathbf{j} + u_z \mathbf{k}<math>
If the basis vectors i, j, and k are instead expressed as e1, e2, and e3, a vector can be expressed in terms of a summation:
- <math>\mathbf{u} = u_1 \mathbf{e}_1 + u_2 \mathbf{e}_2 + u_3 \mathbf{e}_3
= \sum_{i = 1}^3 u_i \mathbf{e}_i<math>
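As a quick numerical check (a minimal sketch in Python with NumPy, which is not part of the original exposition; the array names are illustrative), summing the components against the standard basis vectors recovers the original vector:
 import numpy as np
 u = np.array([1.0, 2.0, 3.0])                      # components u_1, u_2, u_3
 e = np.eye(3)                                      # rows are the basis vectors e_1, e_2, e_3
 u_rebuilt = sum(u[i] * e[i] for i in range(3))     # explicit sum over i of u_i e_i
 # the same contraction written with einsum, which follows the summation convention
 assert np.allclose(u_rebuilt, np.einsum('i,ij->j', u, e))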
The innovation of the Einstein notation is the recognition that an index repeated within a single term implies a summation, so the summation symbol need not be written.
The usefulness of the Einstein notation becomes apparent in the algebraic manipulation of vector equations. For example,
- <math>\mathbf{u} \cdot \mathbf{v} = \sum_{i = 1}^3 u_i \mathbf{e}_i \cdot
\sum_{j = 1}^3 v_j \mathbf{e}_j = u_i \mathbf{e}_i \cdot v_j \mathbf{e}_j <math>
or equivalently:
- <math>\mathbf{u} \cdot \mathbf{v} = u_i v_j ( \mathbf{e}_i \cdot
\mathbf{e}_j ) <math>
where
- <math> \mathbf{e}_i \cdot
\mathbf{e}_j = \delta_{ij} <math>
and <math>\ \delta_{ij}<math> is the Kronecker delta, which is equal to 1 when i = j, and 0 otherwise. Contracting with the Kronecker delta therefore replaces one index with the other: a j in the expression may be converted to an i, or an i to a j. Then,
- <math>\mathbf{u} \cdot \mathbf{v} = u_i v_j\delta_{ij}= u_i v_i = u_j v_j <math>
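The same manipulation can be checked numerically. In the sketch below (Python with NumPy assumed; the numerical values are arbitrary), the double contraction u_i v_j δ_ij and the single contraction u_i v_i both reproduce the ordinary dot product:
 import numpy as np
 u = np.array([1.0, 2.0, 3.0])
 v = np.array([4.0, 5.0, 6.0])
 delta = np.eye(3)                                  # Kronecker delta
 lhs = np.einsum('i,j,ij->', u, v, delta)           # u_i v_j delta_ij
 rhs = np.einsum('i,i->', u, v)                     # u_i v_i
 assert np.isclose(lhs, rhs) and np.isclose(rhs, u @ v)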
For the cross product,
- <math> \mathbf{u} \times \mathbf{v}= \sum_{j = 1}^3 u_j \mathbf{e}_j \times
\sum_{k = 1}^3 v_k \mathbf{e}_k = u_j \mathbf{e}_j \times v_k \mathbf{e}_k = u_j v_k (\mathbf{e}_j \times \mathbf{e}_k ) = \epsilon_{ijk} \mathbf{e}_i u_j v_k <math>
where <math> \mathbf{e}_j \times \mathbf{e}_k = \epsilon_{ijk} \mathbf{e}_i<math> and <math>\ \epsilon_{ijk}<math> is the Levi-Civita symbol defined by:
- <math>\epsilon_{ijk} =
\left\{ \begin{matrix} +1 & \mbox{if } (i,j,k) \mbox{ is } (1,2,3), (2,3,1) \mbox{ or } (3,1,2)\\ -1 & \mbox{if } (i,j,k) \mbox{ is } (3,2,1), (1,3,2) \mbox{ or } (2,1,3)\\ 0 & \mbox{otherwise: }i=j \mbox{ or } j=k \mbox{ or } k=i \end{matrix} \right. <math>
which recovers
- <math> \mathbf{u} \times \mathbf{v} = (u_2 v_3 - u_3 v_2) \mathbf{e}_1 + (u_3 v_1 - u_1 v_3) \mathbf{e}_2 + (u_1 v_2 - u_2 v_1) \mathbf{e}_3<math>
from
- <math> \mathbf{u} \times \mathbf{v}= \epsilon_{ijk} \mathbf{e}_i u_j v_k = \sum_{i = 1}^3 \sum_{j = 1}^3 \sum_{k = 1}^3 \epsilon_{ijk} \mathbf{e}_i u_j v_k <math>.
Additionally, if <math> \mathbf{w} = \mathbf{u} \times \mathbf{v}<math>, then <math> \mathbf{w} = \epsilon_{ijk} \mathbf{e}_i u_j v_k <math> and <math>\ w_i = \epsilon_{ijk} u_j v_k <math>. This also highlights that when an index appears exactly once on each side of an equation (a free index), the equation holds for each value of that index, giving a system of equations rather than a summation:
- <math>
\begin{matrix} w_1 = \epsilon_{1jk} u_j v_k\\ w_2 = \epsilon_{2jk} u_j v_k\\ w_3 = \epsilon_{3jk} u_j v_k \end{matrix} <math>
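A short numerical sketch (again Python with NumPy, assumed rather than taken from the article; the symbol is built directly from its definition) confirms that contracting the Levi-Civita symbol with u_j and v_k reproduces the usual cross product:
 import numpy as np
 # Levi-Civita symbol: +1 for even permutations of (0,1,2), -1 for odd, 0 otherwise
 eps = np.zeros((3, 3, 3))
 for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
     eps[i, j, k] = 1.0
     eps[i, k, j] = -1.0
 u = np.array([1.0, 2.0, 3.0])
 v = np.array([4.0, 5.0, 6.0])
 w = np.einsum('ijk,j,k->i', eps, u, v)             # w_i = eps_ijk u_j v_k
 assert np.allclose(w, np.cross(u, v))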
Formal definitions
In the traditional usage, one has in mind a vector space V with finite dimension n, and a specific basis of V. We can write the basis vectors as e1, e2, ..., en. Then if v is a vector in V, it has coordinates v1, ..., vn relative to this basis.
The basic rule is:
- <math>\mathbf{v} = v_i \mathbf{e}_i<math>.
In this expression, the term on the right-hand side is summed as i runs from 1 to n, because the index i appears twice in that term. With that understanding, the equation is indeed true.
The i is known as a dummy index since the result is not dependent on it; thus we could also write, for example:
- <math>\mathbf{v} = v_j \mathbf{e}_j<math>.
An index that is not summed over is a free index and should be found in each term of the equation or formula.
In contexts where the index must appear once as a subscript and once as a superscript, the basis vectors ei retain subscripts but the coordinates become vi with superscripts. Then the basic rule is:
- <math>\mathbf{v} = v^i \mathbf{e}_i<math>.
The value of the Einstein convention is that it applies to other vector spaces built from V using the tensor product and duality. For example, <math>V\otimes V<math>, the tensor product of V with itself, has a basis consisting of tensors of the form <math>\mathbf{e}_{ij} := \mathbf{e}_i \otimes \mathbf{e}_j<math>. Any tensor T in <math>V\otimes V<math> can be written as:
- <math>\mathbf{T} = T^{ij}\mathbf{e}_{ij}<math>.
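For instance (a minimal NumPy sketch; the 2-by-2 array standing in for the components T^ij is arbitrary), contracting the components against outer products of the basis vectors reassembles the tensor:
 import numpy as np
 T = np.array([[1.0, 2.0],
               [3.0, 4.0]])                         # components T^ij
 e = np.eye(2)                                      # rows are e_1, e_2
 # T = T^ij e_i (x) e_j, with each e_i (x) e_j realised as an outer product
 T_rebuilt = np.einsum('ij,ik,jl->kl', T, e, e)
 assert np.allclose(T_rebuilt, T)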
V*, the dual space of V, has a basis <math>\mathbf{e}^1, \mathbf{e}^2, \dots, \mathbf{e}^n<math> which obeys the rule
- <math>\mathbf{e}^i (\mathbf{e}_j) = \delta^i_j<math>.
Here δ is the Kronecker delta, so <math>\delta^i_j<math> is 1 if i = j and 0 otherwise.
We have also used a superscript for the dual basis, which fits in with a convention requiring summed indices to appear once as a subscript and once as a superscript. In this case, if L is an element in V*, then:
- <math>L = L_i \mathbf{e}^i<math>.
If instead every index is required to be a subscript, then a different letter must be used for the dual basis, say <math>\mathbf{d}_i := \mathbf{e}^i<math>.
The real purpose of the Einstein notation is for formulas and equations that make no mention of the chosen basis. For example, if L and v are as above, then
- <math>L(\mathbf{v}) = L_i v^i<math>,
and this is true for every basis. The next few sections contain further examples of such equations.
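This basis independence can be illustrated numerically (a sketch under the standard transformation rules; the change-of-basis matrix A below is just an arbitrary invertible example, not something fixed by the article): the contracted value L_i v^i is unchanged when the components are rewritten in a new basis.
 import numpy as np
 v = np.array([1.0, 2.0, 3.0])                      # components v^i in the old basis
 L = np.array([4.0, 5.0, 6.0])                      # components L_i in the old basis
 A = np.array([[2.0, 1.0, 0.0],
               [0.0, 3.0, 1.0],
               [1.0, 0.0, 2.0]])                    # columns of A are the new basis vectors
 v_new = np.linalg.solve(A, v)                      # vector components transform with the inverse of A
 L_new = L @ A                                      # covector components transform with A
 assert np.isclose(np.einsum('i,i->', L, v), np.einsum('i,i->', L_new, v_new))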
Elementary vector algebra and matrix algebra
If V is Euclidean n-space Rn, then there is a standard basis for V, in which ei is (0,...,0,1,0,...,0), with the 1 in the ith position. Then n-by-n matrices can be thought of as elements of <math>V^* \otimes V<math>. We can also think of vectors in V as column vectors, or n-by-1 matrices; elements of V* are row vectors, or 1-by-n matrices.
In these examples, all indices will appear as subscripts. (Ultimately, this is because V has an inner product and the chosen basis is orthonormal, as explained in the next section.)
If H is a matrix and v is a column vector, then Hv is another column vector. To define w := Hv, we can write:
- <math>\ w_i := H_{ij} v_j<math>.
Notice that the free index i appears once in every term, while the dummy index j appears twice in a single term.
The distributive law, that H(u + v) = Hu + Hv, can be written:
- <math>\ H_{ij} (u_j + v_j) = H_{ij} u_j + H_{ij} v_j<math>.
This example also indicates the proof of the distributive law, since the index equation makes direct reference only to certain real numbers, and its validity follows directly from the distributive law for real numbers.
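A quick NumPy check of the matrix-vector rule (the names H and v are carried over from the text; the numerical values are arbitrary):
 import numpy as np
 H = np.array([[1.0, 2.0], [3.0, 4.0]])
 v = np.array([5.0, 6.0])
 w = np.einsum('ij,j->i', H, v)                     # w_i = H_ij v_j
 assert np.allclose(w, H @ v)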
The transpose of a column vector is a row vector with the same components, and the transpose of a matrix is another matrix whose components are given by swapping the indices. Suppose that we're interested in the product of vT and HT. If w (a row vector) is this product, then:
- <math>\ w_j = v_i H_{ji}<math>.
Thus to say that taking the transpose of a product switches the order of multiplication, we can write:
- <math>\ H_{ji} v_i = v_i H_{ji}<math>.
Again, this is obviously true, by the commutative law for real numbers.
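In NumPy terms (a sketch in which 1-D arrays play the role of both row and column vectors), the contraction v_i H_ji produces the same components as Hv, which is the content of the transpose rule:
 import numpy as np
 H = np.array([[1.0, 2.0], [3.0, 4.0]])
 v = np.array([5.0, 6.0])
 w = np.einsum('i,ji->j', v, H)                     # w_j = v_i H_ji
 assert np.allclose(w, H @ v)                       # components of H v
 assert np.allclose(w, v @ H.T)                     # v^T H^T written with NumPy operators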
The dot product of two vectors u and v can be written
- <math>\mathbf{u}\cdot\mathbf{v} = u_i v_i<math>.
If n = 3, then we can also write the cross product, using the Levi-Civita symbol. Specifically, if w is u×v, then:
- <math>\ w_i = \epsilon_{ijk} u_j v_k<math>.
Here, the Levi-Civita symbol <math>\ \epsilon_{ijk}<math> is 1 if (i,j,k) is an even permutation of (1,2,3), -1 if it is an odd permutation, and 0 if it is not a permutation of (1,2,3) at all.
You may have noticed in these examples that we often introduced a vector w that would normally not have to be given a specific name using coordinate-free notation. This vector doesn't need to be given a specific name using only index notation either, but the translation between the notations is easier to describe by giving it a name.
With no implicit inner product
If you review the above examples, you'll find that all of them, up to and including the distributive law, make sense if a summed index must appear once as a subscript and once as a superscript. But the examples from the transpose onward do not make sense in that case. This is because they implicitly use the standard inner product on Euclidean space, while the earlier examples do not.
In some applications, there is no inner product on V. In these cases, requiring a summed index to appear once as a subscript and once as a superscript can help one avoid errors in calculation, in much the same way as dimensional analysis does. Perhaps more significantly, the inner product may be a primary object of study that shouldn't be suppressed in the notation; this is the case, for example, in general relativity. Then the difference between a subscript and a superscript can be quite significant.
When an inner product is used explicitly, its components are often denoted <math>g_{ij}<math>. Note that <math>g_{ij} = g_{ji}<math>. Then the formula for the dot product becomes:
- <math>\mathbf{u} \cdot \mathbf{v} = g_{ij} u^i v^j<math>.
We can also lower the index on <math>u^i<math> by defining
- <math>\ u_i := g_{ij} u^j<math>.
Then we have:
- <math>\mathbf{u} \cdot \mathbf{v} = u_i v^i<math>.
Note that we have implicitly used <math>g_{ij} = g_{ji}<math> here.
Similarly, we can raise an index using the corresponding inner product on V*. The components of this inner product are <math>g^{ij}<math>, which (as a matrix) is the inverse of <math>g_{ij}<math>. If you raise an index and then lower it (or the other way around), then you get back where you started. If you raise the i in <math>g_{ij}<math>, then you get <math>\delta^i_j<math> (the Kronecker delta), and if you raise the j in <math>\delta^i_j<math>, then you get <math>g^{ij}<math>.
If the chosen basis of V is orthonormal, then <math>g_{ij} = \delta_{ij}<math> and <math>u_i = u^i<math>. In this case, the formula for the dot product from the previous section may be recovered. However, if the basis is not orthonormal, then this will not be true; thus, when working with an arbitrary basis one must refer to <math>g_{ij}<math> explicitly. Furthermore, if the inner product is not positive-definite (as is the case, for example, in special relativity), then <math>g_{ij} \neq \delta_{ij}<math> even if the basis is chosen to be orthonormal; in this case, <math>g_{ij}<math> may be +1 or -1 when i=j. Thus, raising and lowering indices are important operations in these applications.
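As a closing numerical sketch (NumPy assumed; the Minkowski metric with signature (+,-,-,-) is chosen only as an illustration of a non-positive-definite g), lowering an index with g_ij and then contracting gives the same value as the inner product g_ij u^i v^j:
 import numpy as np
 g = np.diag([1.0, -1.0, -1.0, -1.0])               # orthonormal but not positive-definite metric
 u = np.array([1.0, 2.0, 3.0, 4.0])                 # components u^i
 v = np.array([5.0, 6.0, 7.0, 8.0])                 # components v^j
 u_lower = np.einsum('ij,j->i', g, u)               # u_i = g_ij u^j
 assert np.isclose(np.einsum('ij,i,j->', g, u, v),  # g_ij u^i v^j
                   np.einsum('i,i->', u_lower, v))  # u_i v^i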