\section{MD5}
MD5 (Message Digest Algorithm 5) was designed by Ronald Rivest in 1991 as a
strengthened replacement for MD4. It produces a 128-bit digest from a message
of arbitrary length, processing data in 512-bit blocks. Although MD5 is now
considered cryptographically broken (practical collision attacks have been
demonstrated since 2004), it remains widely used for non-security purposes
such as checksums that detect accidental data corruption.
\vspace{1em}
MD5 maintains a state of four 32-bit words, conventionally named $A$, $B$, $C$
and $D$, initialized to fixed constants defined in RFC 1321. Each 512-bit block
is processed in four rounds of sixteen operations each, for a total of 64
operations per block. Each operation applies one of four non-linear functions
to the state words, adds a message word and a precomputed constant derived from
the sine function, and rotates the result by a fixed amount.
\vspace{1em}
The initial values of the four state words are:
\begin{align*}
A &= \texttt{0x67452301} \\
B &= \texttt{0xefcdab89} \\
C &= \texttt{0x98badcfe} \\
D &= \texttt{0x10325476}
\end{align*}
\vspace{1em}
Before processing, the message is padded to a length congruent to 448 bits
modulo 512. A single \texttt{1} bit is appended first, followed by as many
\texttt{0} bits as needed; padding is always applied, even when the length is
already congruent to 448 modulo 512. The original message length in bits,
taken modulo $2^{64}$, is then appended as a 64-bit little-endian integer,
bringing the total padded length to an exact multiple of 512 bits.
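\vspace{1em}
As a worked example, consider the 3-byte ASCII message \texttt{abc}, chosen
here purely for illustration. With a message length of $L = 24$ bits, the
number $z$ of \texttt{0} bits (a symbol introduced only for this example) is
the smallest non-negative value that brings the length to 448 modulo 512:
\begin{align*}
z &= (447 - L) \bmod 512 = 423 \\
L + 1 + z + 64 &= 24 + 1 + 423 + 64 = 512
\end{align*}
\noindent The single padded block therefore consists of the bytes
\texttt{61 62 63}, the byte \texttt{80} (the \texttt{1} bit followed by seven
\texttt{0} bits), 52 zero bytes, and the length field
\texttt{18 00 00 00 00 00 00 00}.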
\vspace{1em}
Each of the four rounds uses a distinct non-linear function applied to the
state words $B$, $C$ and $D$:
\begin{align*}
F(B, C, D) &= (B \land C) \lor (\lnot B \land D) \\
G(B, C, D) &= (B \land D) \lor (C \land \lnot D) \\
H(B, C, D) &= B \oplus C \oplus D \\
I(B, C, D) &= C \oplus (B \lor \lnot D)
\end{align*}
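\noindent Read bitwise, $F$ acts as a selector: where a bit of $B$ is set the
result takes the corresponding bit of $C$, otherwise that of $D$. $G$ does the
same with $D$ selecting between $B$ and $C$. As an illustrative check, take
the initial state values of RFC 1321, $B = \texttt{0xefcdab89}$,
$C = \texttt{0x98badcfe}$ and $D = \texttt{0x10325476}$, for which
$\lnot B = D$:
\begin{align*}
B \land C &= \texttt{0x88888888} \\
\lnot B \land D &= \texttt{0x10325476} \\
F(B, C, D) &= \texttt{0x88888888} \lor \texttt{0x10325476} = \texttt{0x98badcfe}
\end{align*}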
The message word index used at step $i$ is not sequential: each round applies
a distinct selector function $k_r$ where $r = \lfloor i / 16 \rfloor$:
\begin{align*}
k_0(i) &= i \bmod 16 \\
k_1(i) &= (5i + 1) \bmod 16 \\
k_2(i) &= (3i + 5) \bmod 16 \\
k_3(i) &= 7i \bmod 16
\end{align*}
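\noindent For instance, evaluating each selector at the first four steps of
its round gives the following message word indices:
\begin{align*}
k_0(0),\ \dots,\ k_0(3) &= 0,\ 1,\ 2,\ 3 \\
k_1(16),\ \dots,\ k_1(19) &= 1,\ 6,\ 11,\ 0 \\
k_2(32),\ \dots,\ k_2(35) &= 5,\ 8,\ 11,\ 14 \\
k_3(48),\ \dots,\ k_3(51) &= 0,\ 7,\ 14,\ 5
\end{align*}
\noindent Since the multipliers 5, 3 and 7 are odd, and hence coprime to 16,
each selector visits every one of the sixteen message words exactly once per
round.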
At each step $i$ (with $0 \leq i < 64$), one of the four functions is selected
according to the current round, and the state is updated as follows:
\begin{align*}
A &\leftarrow B + \bigl((A + \phi(B, C, D) + M[k] + T[i]) \lll s[i]\bigr)
\end{align*}
\noindent where $\phi$ is the non-linear function for the current round,
$M[k]$ with $k = k_r(i)$ is a 32-bit word of the current block, $T[i]$ is a
precomputed constant, $s[i]$ is the rotation amount, $+$ denotes addition
modulo $2^{32}$, and $\lll$ denotes a left rotation. After this operation, the
state words are cycled: $(A, B, C, D) \leftarrow (D, A, B, C)$.
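\vspace{1em}
As a worked example, consider step $i = 0$ for the empty message (an
illustrative choice). Its single padded block begins with the byte
\texttt{80}, so $M[0] = \texttt{0x00000080}$. The state holds the initial
values of RFC 1321, and the constants for this step are
$T[0] = \texttt{0xd76aa478}$ and $s[0] = 7$. With all additions taken modulo
$2^{32}$:
\begin{align*}
F(B, C, D) &= \texttt{0x98badcfe} \\
A + F(B, C, D) + M[0] + T[0] &= \texttt{0xd76aa4f7} \\
\texttt{0xd76aa4f7} \lll 7 &= \texttt{0xb5527beb} \\
B + \texttt{0xb5527beb} &= \texttt{0xa5202774}
\end{align*}
\noindent After the cycling, the state is
$(A, B, C, D) = (\texttt{0x10325476},\ \texttt{0xa5202774},\ \texttt{0xefcdab89},\ \texttt{0x98badcfe})$.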
\vspace{1em}
Each round uses a fixed cycle of four rotation amounts, so within a round
$s[i]$ depends only on $i \bmod 4$:
\begin{align*}
\text{Round 0} &: 7,\ 12,\ 17,\ 22 \\
\text{Round 1} &: 5,\ 9,\ 14,\ 20 \\
\text{Round 2} &: 4,\ 11,\ 16,\ 23 \\
\text{Round 3} &: 6,\ 10,\ 15,\ 21
\end{align*}
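\noindent Equivalently, writing $S_r$ for the four-element row of round $r$
(a notation introduced here only for convenience), the rotation amount is
$s[i] = S_{\lfloor i / 16 \rfloor}[i \bmod 4]$. For example:
\begin{align*}
s[17] &= S_1[1] = 9 \\
s[63] &= S_3[3] = 21
\end{align*}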
\vspace{1em}
The 64 constants $T[i]$ are derived from the sine function, with its argument
expressed in radians:
\begin{align*}
T[i] = \left\lfloor 2^{32}\,|\sin(i+1)| \right\rfloor, \quad 0 \le i < 64
\end{align*}
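\noindent For example, with the argument in radians,
$\sin(1) \approx 0.8414709848$, which gives
\begin{align*}
T[0] &= \left\lfloor 4294967296 \times 0.8414709848\ldots \right\rfloor
      = 3614090360 = \texttt{0xd76aa478}
\end{align*}
\noindent in agreement with the first entry of the constant table in RFC 1321.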
After each block is processed, the resulting state is added word by word,
modulo $2^{32}$, to the state saved before the block was processed:
\begin{align*}
(A, B, C, D) \leftarrow (A + A_0,\ B + B_0,\ C + C_0,\ D + D_0)
\end{align*}
\noindent where $A_0$, $B_0$, $C_0$, $D_0$ denote the state at the beginning
of the block. After all blocks have been processed, the four state words are
serialized in little-endian order to produce the 128-bit digest.
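\vspace{1em}
As an illustration, the empty message has the well-known digest
\texttt{d41d8cd98f00b204e9800998ecf8427e}. Splitting it into four groups of
four bytes and reading each group as a little-endian word recovers the final
state:
\begin{align*}
A &= \texttt{0xd98c1dd4} \quad (\text{bytes } \texttt{d4 1d 8c d9}) \\
B &= \texttt{0x04b2008f} \quad (\text{bytes } \texttt{8f 00 b2 04}) \\
C &= \texttt{0x980980e9} \quad (\text{bytes } \texttt{e9 80 09 98}) \\
D &= \texttt{0x7e42f8ec} \quad (\text{bytes } \texttt{ec f8 42 7e})
\end{align*}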