Lei Mao

Machine Learning, Artificial Intelligence, Computer Science.

Introduction

In my previous blog post “Special Relativity Explained”, I explained special relativity and several of its key consequences based on the Lorentz transformation.


Since I did not give a derivation of the Lorentz transformation last time, in this blog post I would like to present the derivation in detail.

Postulates of Special Relativity

The Lorentz transformation is derived from the following two postulates only.

First Postulate (Principle of Relativity)

The laws of physics take the same form in all inertial frames of reference.

Second Postulate (Invariance of Light Speed)

As measured in any inertial frame of reference, light always propagates in empty space with a definite velocity $c$ that is independent of the state of motion of the emitting body. Equivalently, the speed of light in free space has the same value $c$ in all inertial frames of reference.

Derivation

In spacetime, we have two reference frames: a reference frame $S$ and another reference frame $S'$ moving at a constant velocity $v$ with respect to it. Both reference frames in this scenario are inertial reference frames. The corresponding coordinate axes of the two frames are parallel, i.e., the $x$ and $x'$ axes are parallel, the $y$ and $y'$ axes are parallel, and the $z$ and $z'$ axes are parallel, and the axes within each frame remain mutually perpendicular. We assume the relative motion is along the coincident $xx'$ axes. At $t = t' = 0$, the origins of the two coordinate systems coincide, $(x,y,z) = (x',y',z') = (0, 0, 0)$.


An event in spacetime can be observed and recorded by observers in the two reference frames using the spacetime coordinates $(t,x,y,z)$ in the reference frame $S$ and $(t',x',y',z')$ in the reference frame $S'$, respectively.


We want to set up the mapping between $(t,x,y,z)$ and $(t',x',y',z')$ for the same event.

Lorentz Transformation is Linear Transformation

We propose that the spacetime transformation from the reference frame $S$ to the reference frame $S'$ has the following form.

\[\begin{align} t^{\prime} &= f_t(x,t)\\ x^{\prime} &= f_x(x,t)\\ y^{\prime} &= y\\ z^{\prime} &= z\\ \end{align}\]

Note that we drop the variables $y$ and $z$ from the functions $f_t$ and $f_x$ because the relative motion is only along the $x$ axis, so the transverse coordinates are unchanged and are assumed not to affect $t'$ and $x'$.


Now that we have proposed the form of the transformation, there could still be an infinite number of transformations that satisfy this form. What exactly is the transformation?


Suppose we have two events. One event has coordinates $(t_1,x_1,y_1,z_1)$ observed in the reference frame $S$ and coordinates $(t'_1,x'_1,y'_1,z'_1)$ observed in the reference frame $S'$. The other event has coordinates $(t_2,x_2,y_2,z_2)$ observed in the reference frame $S$ and coordinates $(t'_2,x'_2,y'_2,z'_2)$ observed in the reference frame $S'$. Note that because the reference frame $S'$ is moving along the $xx'$ axes, $y_1 = y'_1$, $z_1 = z'_1$, $y_2 = y'_2$, $z_2 = z'_2$.


Without loss of generality, we set $t_1 = t^{\prime}_1 = 0$, $x_1 = x^{\prime}_1 = 0$.

\[\begin{align} \Delta t &= t_{2} - t_{1} = t_{2}\\ \Delta x &= x_{2} - x_{1} = x_{2}\\ \Delta y &= y_{2} - y_{1} = 0\\ \Delta z &= z_{2} - z_{1} = 0\\ \end{align}\] \[\begin{align} \Delta t^{\prime} &= t_{2}^{\prime} - t_{1}^{\prime} = t_{2}^{\prime}\\ \Delta x^{\prime} &= x_{2}^{\prime} - x_{1}^{\prime} = x_{2}^{\prime}\\ \Delta y^{\prime} &= y_{2}^{\prime} - y_{1}^{\prime} = 0\\ \Delta z^{\prime} &= z_{2}^{\prime} - z_{1}^{\prime} = 0\\ \end{align}\]

The two events, $(t_1,x_1,y_1,z_1)$ and $(t_2,x_2,y_2,z_2)$ observed in the reference frame $S$, and $(t'_1,x'_1,y'_1,z'_1)$ and $(t'_2,x'_2,y'_2,z'_2)$ observed in the reference frame $S'$, have become equivalent to $(0,0,y_1,z_1)$ and $(\Delta t,\Delta x,y_2,z_2)$ observed in the reference frame $S$, and $(0,0,y'_1,z'_1)$ and $(\Delta t^{\prime},\Delta x^{\prime},y'_2,z'_2)$ observed in the reference frame $S'$.


Based on the principle of relativity, the same transformation still holds for the shifted coordinates. We have

\[\begin{align} 0 &= f_t(0,0)\\ 0 &= f_x(0,0)\\ y^{\prime} &= y\\ z^{\prime} &= z\\ \end{align}\]

and

\[\begin{align} \Delta t^{\prime} &= f_t(\Delta x, \Delta t)\\ \Delta x^{\prime} &= f_x(\Delta x, \Delta t)\\ y^{\prime} &= y\\ z^{\prime} &= z\\ \end{align}\]

This means that spatial separations and elapsed times transform by exactly the same functions as the coordinates themselves!


Ignoring uninteresting $y$ and $z$, we could equivalently write

\[\begin{align} \begin{bmatrix} \Delta t^{\prime} \\ \Delta x^{\prime} \\ \end{bmatrix} &= \begin{bmatrix} t_2^{\prime} - t_1^{\prime} \\ x_2^{\prime} - x_1^{\prime} \\ \end{bmatrix} = \begin{bmatrix} f_t(x_2,t_2) - f_t(x_1,t_1)\\ f_x(x_2,t_2) - f_x(x_1,t_1)\\ \end{bmatrix} = \begin{bmatrix} f_t(x_2,t_2)\\ f_x(x_2,t_2)\\ \end{bmatrix} - \begin{bmatrix} f_t(x_1,t_1)\\ f_x(x_1,t_1)\\ \end{bmatrix} \\ &= f \bigg( \begin{bmatrix} t_2 \\ x_2 \\ \end{bmatrix} \bigg) - f \bigg( \begin{bmatrix} t_1 \\ x_1 \\ \end{bmatrix} \bigg) \end{align}\] \[\begin{align} \begin{bmatrix} \Delta t^{\prime} \\ \Delta x^{\prime} \\ \end{bmatrix} &= \begin{bmatrix} f_t(\Delta x, \Delta t) \\ f_x(\Delta x, \Delta t) \\ \end{bmatrix} = f \bigg( \begin{bmatrix} \Delta t \\ \Delta x \\ \end{bmatrix} \bigg) = f \bigg( \begin{bmatrix} t_2 - t_1 \\ x_2 - x_1 \\ \end{bmatrix} \bigg) = f \bigg( \begin{bmatrix} t_2 \\ x_2 \\ \end{bmatrix} - \begin{bmatrix} t_1 \\ x_1 \\ \end{bmatrix} \bigg) \end{align}\]

We define the column vector $p = [t, x]^{\top}$; in physics, such a spacetime coordinate vector is a rank-1 tensor. The two expressions above are then equivalent to

\[f(p_2) - f(p_1) = f(p_2 - p_1)\]

Replacing $p_2$ with $p_1 + p_2$, this is further equivalent to

\[f(p_1) + f(p_2) = f(p_1 + p_2)\]

In the next step, we would like to further show

\[f(kp) = k f(p)\]

Similarly, suppose we have two events. One event has coordinates $(t_1,x_1,y_1,z_1)$ observed in the reference frame $S$ and coordinates $(t'_1,x'_1,y'_1,z'_1)$ observed in the reference frame $S'$. The other event has coordinates $(t_2,x_2,y_2,z_2)$ observed in the reference frame $S$ and coordinates $(t'_2,x'_2,y'_2,z'_2)$ observed in the reference frame $S'$. In addition, it is observed in the reference frame $S$ that $t_2 = k t_1$ and $x_2 = k x_1$. Based on the principle of relativity, $t'_2 = k t'_1$ and $x'_2 = k x'_1$.


Because

\[\begin{align} \begin{bmatrix} t_1^{\prime} \\ x_1^{\prime} \\ \end{bmatrix} &= f \bigg( \begin{bmatrix} t_1 \\ x_1 \\ \end{bmatrix} \bigg) \end{align}\] \[\begin{align} \begin{bmatrix} t_2^{\prime} \\ x_2^{\prime} \\ \end{bmatrix} &= f \bigg( \begin{bmatrix} t_2 \\ x_2 \\ \end{bmatrix} \bigg) = \begin{bmatrix} k t_1^{\prime} \\ k x_1^{\prime} \\ \end{bmatrix} = k \begin{bmatrix} t_1^{\prime} \\ x_1^{\prime} \\ \end{bmatrix} = k f \bigg( \begin{bmatrix} t_1 \\ x_1 \\ \end{bmatrix} \bigg) \\ &= f \bigg( \begin{bmatrix} k t_1 \\ k x_1 \\ \end{bmatrix} \bigg) = f \bigg( k \begin{bmatrix} t_1 \\ x_1 \\ \end{bmatrix} \bigg) \end{align}\]

Therefore,

\[f(kp) = k f(p)\]

We have shown that

\[\begin{align} f(p_1) + f(p_2) &= f(p_1 + p_2) \\ f(kp) &= k f(p) \end{align}\]

These two properties are exactly the definition of a linear function $f$ (and hence of its components $f_t$ and $f_x$). Note that this linear function $f$ has no bias term. Therefore, $f(p) = Mp$ for some matrix $M \in \mathbb{R}^{2 \times 2}$, and the Lorentz transformation is a linear transformation.
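As a quick numerical sanity check of the two properties, here is a minimal Python sketch. The matrix entries are arbitrary placeholder values chosen for illustration (they are not the Lorentz transformation, which has not been derived yet), and the helper name `apply` is hypothetical.

```python
def apply(M, p):
    """Apply a 2x2 matrix M (nested lists) to a vector p = [t, x]."""
    return [M[0][0] * p[0] + M[0][1] * p[1],
            M[1][0] * p[0] + M[1][1] * p[1]]

# Arbitrary illustrative entries (an assumption, not the Lorentz matrix).
M = [[2.0, -0.5], [-1.0, 2.0]]
p1, p2, k = [1.0, 3.0], [4.0, -2.0], 2.5

# Additivity: f(p1 + p2) versus f(p1) + f(p2).
f_sum = apply(M, [p1[0] + p2[0], p1[1] + p2[1]])
sum_f = [a + b for a, b in zip(apply(M, p1), apply(M, p2))]

# Homogeneity: f(k p1) versus k f(p1).
f_scaled = apply(M, [k * p1[0], k * p1[1]])
scaled_f = [k * a for a in apply(M, p1)]

print(f_sum, sum_f)
print(f_scaled, scaled_f)
```

Any map of the form $f(p) = Mp$ passes both checks, which is why the remaining work is only to pin down the four entries of $M$.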

Lorentz Transformation

Because the Lorentz transformation is a linear transformation, we can write

\[\begin{align} \begin{bmatrix} t^{\prime} \\ x^{\prime} \\ \end{bmatrix} = M \begin{bmatrix} t \\ x \\ \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \\ \end{bmatrix} \begin{bmatrix} t \\ x \\ \end{bmatrix} = \begin{bmatrix} At + Bx \\ Ct + Dx \\ \end{bmatrix} \end{align}\]

The problem is then much like a machine learning regression problem, where we have to find the values of the parameters $A$, $B$, $C$, and $D$. To solve this regression problem, we need some concrete data.


The reference frame $S'$ is moving at velocity $v$ with respect to the reference frame $S$. Therefore, at time $t$ in the reference frame $S$ and $t'$ in the reference frame $S'$, the point $x = vt$ in the reference frame $S$ coincides with the origin $x' = 0$ of the reference frame $S'$. Note that $x = vt + 1$ in the reference frame $S$ does not necessarily coincide with $x' = 1$ in the reference frame $S'$, although it would under the Galilean transformation.

\[\begin{align} \begin{bmatrix} t^{\prime} \\ 0 \\ \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \\ \end{bmatrix} \begin{bmatrix} t \\ vt \\ \end{bmatrix} = \begin{bmatrix} At + Bvt \\ Ct + Dvt \\ \end{bmatrix} \end{align}\]

The second row gives $Ct + Dvt = 0$ for all $t$, which is a relationship between $C$ and $D$.

\[C = -Dv\]

In addition, the reference frame $S$ is moving at velocity $-v$ with respect to the reference frame $S'$. Therefore, at time $t$ in the reference frame $S$ and $t'$ in the reference frame $S'$, the origin $x = 0$ of the reference frame $S$ coincides with the point $x' = -vt'$ in the reference frame $S'$.

\[\begin{align} \begin{bmatrix} t^{\prime} \\ -v t^{\prime} \\ \end{bmatrix} = \begin{bmatrix} A & B \\ -Dv & D \\ \end{bmatrix} \begin{bmatrix} t \\ 0 \\ \end{bmatrix} = \begin{bmatrix} At \\ -Dvt \\ \end{bmatrix} \end{align}\]

The first row gives $t' = At$. Substituting it into the second row eliminates $t'$ and gives

\[-vAt = -Dvt\]

So

\[A = D\]

This reduces the number of free parameters from four to two.

\[\begin{align} \begin{bmatrix} t^{\prime} \\ x^{\prime} \\ \end{bmatrix} = \begin{bmatrix} D & B \\ -Dv & D \\ \end{bmatrix} \begin{bmatrix} t \\ x \\ \end{bmatrix} = \begin{bmatrix} Dt + Bx \\ -Dvt + Dx \\ \end{bmatrix} \end{align}\]
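The two constraints found so far, $C = -Dv$ and $A = D$, can be checked numerically with a small Python sketch. The values of `D` and `B` below are arbitrary placeholders (they are still undetermined at this stage), the speed is given in arbitrary units, and the function name `transform` is hypothetical.

```python
def transform(t, x, v, D, B):
    """Two-parameter form: t' = D t + B x, x' = -D v t + D x."""
    return D * t + B * x, -D * v * t + D * x

v = 0.5
D, B = 1.7, -0.3  # arbitrary placeholders, not the final values
t = 2.0

# The origin of S' sits at x = v t in S, so it should map to x' = 0.
t_p, x_p = transform(t, v * t, v, D, B)

# The origin of S sits at x = 0, so it should satisfy x' = -v t'.
t_q, x_q = transform(t, 0.0, v, D, B)

print(x_p)             # expected to be (close to) zero
print(x_q, -v * t_q)   # the two numbers should agree
```

Both conditions hold for any choice of `D` and `B`, which is why further data is needed to fix the remaining two parameters.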

There could be many different ways to derive the values of $B$ and $D$, but usually the simplest way is to do thought experiments using light. Here is one such thought experiment; there could be many others.


Suppose we emit a beam of light along the positive $x$ direction from the common origin at $t = t' = 0$. At time $t$ in the reference frame $S$ and $t'$ in the reference frame $S'$, the head of the light beam is at $x = ct$ in the reference frame $S$ and at $x' = ct'$ in the reference frame $S'$, where $c$ is the speed of light, based on the invariance of light speed postulate.

\[\begin{align} \begin{bmatrix} t^{\prime} \\ ct^{\prime} \\ \end{bmatrix} = \begin{bmatrix} D & B \\ -Dv & D \\ \end{bmatrix} \begin{bmatrix} t \\ ct \\ \end{bmatrix} = \begin{bmatrix} Dt + Bct \\ -Dvt + Dct \\ \end{bmatrix} \end{align}\]

Dividing the second row by the first row eliminates $t'$ (and $t$) and gives

\[c = \frac{-Dv + Dc}{D + Bc}\]

Multiplying out gives $cD + Bc^2 = -Dv + Dc$, so $Bc^2 = -Dv$, and we get the relationship between $B$ and $D$

\[B = -\frac{v}{c^2}D\]
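To see that this choice of $B$ is what makes light travel at speed $c$ in both frames, regardless of the still unknown $D$, here is a minimal Python sketch. The trial values of `D`, the relative speed, and the function name `transform` are assumptions for illustration only.

```python
def transform(t, x, v, c, D):
    """Apply t' = D t + B x, x' = -D v t + D x with B = -(v / c^2) D."""
    B = -v / c**2 * D
    return D * t + B * x, -D * v * t + D * x

c = 299_792_458.0  # speed of light in m/s
v = 0.3 * c        # illustrative relative speed
t = 1.0e-6         # one microsecond after emission, as measured in S

ratios = []
for D in (0.5, 1.0, 3.0):  # arbitrary trial values; D is not fixed yet
    t_p, x_p = transform(t, c * t, v, c, D)
    ratios.append(x_p / t_p / c)

print(ratios)  # each entry should be very close to 1
```

The factor $D$ cancels in the ratio $x'/t'$, so the light-speed postulate alone cannot determine it.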

Now the linear transformation has become

\[\begin{align} \begin{bmatrix} t^{\prime} \\ x^{\prime} \\ \end{bmatrix} = \begin{bmatrix} D & -\frac{v}{c^2}D \\ -Dv & D \\ \end{bmatrix} \begin{bmatrix} t \\ x \\ \end{bmatrix} = D \begin{bmatrix} 1 & -\frac{v}{c^2} \\ -v & 1 \\ \end{bmatrix} \begin{bmatrix} t \\ x \\ \end{bmatrix} = \begin{bmatrix} D(t - \frac{v}{c^2}x) \\ D(-vt + x) \\ \end{bmatrix} \end{align}\]

Inverting the $2 \times 2$ matrix gives the inverse transformation as well.

\[\begin{align} \begin{bmatrix} t \\ x \\ \end{bmatrix} &= \frac{1}{D(1-\frac{v^2}{c^2})} \begin{bmatrix} 1 & \frac{v}{c^2} \\ v & 1 \\ \end{bmatrix} \begin{bmatrix} t^{\prime} \\ x^{\prime} \\ \end{bmatrix} = \begin{bmatrix} \frac{1}{D(1-\frac{v^2}{c^2})} (t^{\prime} + \frac{v}{c^2} x^{\prime}) \\ \frac{1}{D(1-\frac{v^2}{c^2})} (vt^{\prime} + x^{\prime}) \\ \end{bmatrix} \\ &= \frac{1}{D(1-\frac{(-v)^2}{c^2})} \begin{bmatrix} 1 & -\frac{-v}{c^2} \\ -(-v) & 1 \\ \end{bmatrix} \begin{bmatrix} t^{\prime} \\ x^{\prime} \\ \end{bmatrix} \end{align}\]
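The inverse matrix can be checked by multiplying it with the forward matrix, as in the minimal Python sketch below. The value of `D` is an arbitrary placeholder, units with $c = 1$ are assumed for readability, and the helper name `matmul` is hypothetical.

```python
def matmul(P, Q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

c, v = 1.0, 0.6  # units with c = 1 (assumption for illustration)
D = 2.0          # arbitrary placeholder; D is still undetermined here

# Forward matrix D * [[1, -v/c^2], [-v, 1]].
forward = [[D, -v / c**2 * D], [-D * v, D]]

# Inverse matrix 1 / (D (1 - v^2/c^2)) * [[1, v/c^2], [v, 1]].
s = 1.0 / (D * (1.0 - v**2 / c**2))
inverse = [[s, s * v / c**2], [s * v, s]]

product = matmul(inverse, forward)
print(product)  # expected to be close to the identity matrix
```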

The statement that the reference frame $S'$ moves at velocity $v$ with respect to the reference frame $S$ is equivalent to the statement that the reference frame $S$ moves at velocity $-v$ with respect to the reference frame $S'$. Based on the principle of relativity, the transformation from $S'$ to $S$ must have the same form with $v$ replaced by $-v$. Here $D$ is implicitly taken to be the same for $v$ and $-v$, that is, to depend only on the relative speed. This gives

\[\begin{align} \begin{bmatrix} t \\ x \\ \end{bmatrix} = D \begin{bmatrix} 1 & -\frac{-v}{c^2} \\ -(-v) & 1 \\ \end{bmatrix} \begin{bmatrix} t^{\prime} \\ x^{\prime} \\ \end{bmatrix} \end{align}\]

This means that

\[D \equiv \frac{1}{D(1-\frac{(-v)^2}{c^2})}\]

Therefore, taking the positive root so that the transformation reduces to the identity at $v = 0$,

\[D = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}\]
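As a small numerical check, the sketch below plugs this value of $D$ back into the self-consistency condition $D = \frac{1}{D(1 - v^2/c^2)}$ for a few illustrative speeds. Units with $c = 1$ and the list of trial speeds are assumptions made for the example.

```python
import math

c = 1.0  # units with c = 1 (assumption for illustration)

residuals = []
for v in (0.1, 0.5, 0.9):  # illustrative speeds as fractions of c
    D = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    # Difference between the two sides of D = 1 / (D (1 - v^2/c^2)).
    residuals.append(D - 1.0 / (D * (1.0 - v**2 / c**2)))

print(residuals)  # each entry should be close to zero
```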

We often use $\gamma$ to represent this factor, which is called the Lorentz factor.

\[\gamma = \frac{1}{\sqrt{1 - \frac{v^2}{c^2}}}\]
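To get a feeling for the size of this factor, here is a minimal Python sketch that evaluates it at a few speeds. The function name `lorentz_factor`, the guard against $|v| \geq c$, and the sample speeds are illustrative choices, with units where $c = 1$.

```python
import math

def lorentz_factor(v, c=1.0):
    """Return 1 / sqrt(1 - v^2 / c^2) for a speed |v| < c."""
    if abs(v) >= abs(c):
        raise ValueError("the Lorentz factor is only defined here for |v| < c")
    return 1.0 / math.sqrt(1.0 - v**2 / c**2)

# Sample speeds as fractions of c (illustrative values).
gammas = {v: lorentz_factor(v) for v in (0.0, 0.6, 0.8)}
print(gammas)  # roughly 1 at rest, 1.25 at 0.6c, and 5/3 at 0.8c
```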

The resulting linear transformation is called the Lorentz transformation.

\[\begin{align} \begin{bmatrix} t^{\prime} \\ x^{\prime} \\ \end{bmatrix} = \begin{bmatrix} \gamma & -\frac{v}{c^2}\gamma \\ -\gamma v & \gamma \\ \end{bmatrix} \begin{bmatrix} t \\ x \\ \end{bmatrix} = \gamma \begin{bmatrix} 1 & -\frac{v}{c^2} \\ -v & 1 \\ \end{bmatrix} \begin{bmatrix} t \\ x \\ \end{bmatrix} = \begin{bmatrix} \gamma(t - \frac{v}{c^2}x) \\ \gamma(-vt + x) \\ \end{bmatrix} \end{align}\]

We could further include the other two dimensions for $yy'$ and $zz'$.

\[\begin{align} \begin{bmatrix} t^{\prime} \\ x^{\prime} \\ y^{\prime} \\ z^{\prime} \\ \end{bmatrix} = \begin{bmatrix} \gamma & -\gamma\frac{v}{c^2} & 0 & 0 \\ -\gamma v & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} t \\ x \\ y \\ z \\ \end{bmatrix} = \begin{bmatrix} \gamma(t - \frac{v}{c^2}x) \\ \gamma(-vt + x) \\ y \\ z \\ \end{bmatrix} \end{align}\]
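The full four-dimensional transformation can be sketched in a few lines of Python. The function name `lorentz_transform`, the tuple layout `(t, x, y, z)`, the default $c = 1$, and the sample event are all assumptions made for illustration.

```python
import math

def lorentz_transform(event, v, c=1.0):
    """Map (t, x, y, z) in S to (t', x', y', z') in S' for motion along x."""
    t, x, y, z = event
    gamma = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    return (gamma * (t - v / c**2 * x), gamma * (-v * t + x), y, z)

# An event at the spatial x-origin of S, one time unit after the frames coincide.
event_S = (1.0, 0.0, 2.0, 3.0)
event_S_prime = lorentz_transform(event_S, v=0.6)
print(event_S_prime)  # approximately (1.25, -0.75, 2.0, 3.0)
```

With $v = 0.6c$ the Lorentz factor is $1.25$, so the event is assigned $t' \approx 1.25$ and $x' \approx -0.75$ in $S'$, while $y$ and $z$ pass through unchanged.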

Sometimes, to make the transformation matrix symmetric, we use the following equivalent form.

\[\begin{align} \begin{bmatrix} ct^{\prime} \\ x^{\prime} \\ y^{\prime} \\ z^{\prime} \\ \end{bmatrix} = \begin{bmatrix} \gamma & -\beta \gamma & 0 & 0 \\ -\beta \gamma & \gamma & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} ct \\ x \\ y \\ z \\ \end{bmatrix} = \begin{bmatrix} \gamma ct - \gamma \beta x \\ \gamma x - \beta \gamma ct \\ y \\ z \\ \end{bmatrix} \end{align}\]

where

\[\beta = \frac{v}{c}\]
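The symmetric form is only a change of variable from $t$ to $ct$, which the following minimal Python sketch checks numerically by comparing it with the $(t, x)$ form. The sample event and the relative speed are arbitrary illustrative values in SI units.

```python
import math

c = 299_792_458.0  # speed of light in m/s
v = 0.6 * c        # illustrative relative speed
beta = v / c
gamma = 1.0 / math.sqrt(1.0 - beta**2)

t, x = 2.0e-6, 150.0  # sample event in S (seconds, meters)

# Form acting on (t, x).
t_p = gamma * (t - v / c**2 * x)
x_p = gamma * (-v * t + x)

# Symmetric form acting on (ct, x).
ct_p_sym = gamma * (c * t) - beta * gamma * x
x_p_sym = -beta * gamma * (c * t) + gamma * x

print(c * t_p, ct_p_sym)  # the two numbers should agree
print(x_p, x_p_sym)       # the two numbers should agree
```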

This concludes the derivation.

Conclusions

The derivation of the Lorentz transformation is mathematically simple. It only requires basic linear algebra, or even just high school math. However, since the transformation runs against our common sense, and the derivation is based only on the two “simple” postulates, we have to be careful not to introduce additional assumptions during the derivation.

Final Remarks

In high school physics classes, or college physics classes for non-physics majors, lecturers usually just present the Lorentz transformation and skip its derivation in the lectures on special relativity.