libft_ssl/doc/preliminaries.tex

\section{Preliminaries}

This section defines the terminology used throughout the document. The concepts
introduced here are general to cryptographic hash functions and apply to all
algorithms described in subsequent sections.

\vspace{1em}

A \textbf{bit} is the smallest unit of information, taking a value of either 0
or 1. A \textbf{byte} is a group of eight bits, and is the standard unit of
data storage and transmission. A \textbf{word} is a fixed-size integer used
internally by a hash algorithm --- MD5 and SHA-256 operate on 32-bit words,
while Whirlpool operates on 64-bit words.

\vspace{1em}

\textbf{Endianness} refers to the byte order used when storing a multi-byte
integer in memory. In \textbf{little-endian} order, the least significant byte
is stored first; in \textbf{big-endian} order, the most significant byte is
stored first. This distinction matters when serializing the internal state to
produce the final digest --- MD5 uses little-endian, while SHA-256 and
Whirlpool use big-endian.

\[
\text{Value}\; 0xD41D8CD9:
\begin{cases}
\text{Big-endian: } & \text{d4}\;\mid\;\text{1d}\;\mid\;\text{8c}\;\mid\;\text{d9}\\
\text{Little-endian: } & \text{d9}\;\mid\;\text{8c}\;\mid\;\text{1d}\;\mid\;\text{d4}
\end{cases}
\]


\vspace{1em}

A \textbf{message} is the arbitrary-length input fed to a hash function. The
\textbf{digest} is the fixed-size output it produces. A hash function is said
to be \textbf{one-way} if it is computationally infeasible to recover any input
that produces a given digest. A \textbf{collision} occurs when two distinct
messages produce the same digest; a hash function is considered broken when
collisions can be found efficiently.

\vspace{1em}

Hash functions process their input in fixed-size chunks called \textbf{blocks}.
Since the message length is rarely a multiple of the block size,
\textbf{padding} is appended to the last block to bring it to the required
length. The \textbf{state} is a set of words initialized to fixed constants and
updated after each block; it accumulates the result of the computation and is
serialized into the digest at the end. The \textbf{compression function} is the
core transformation applied to each block --- it takes the current state and
one block of data, and produces a new state.

\vspace{1em}

The \textbf{Miyaguchi-Preneel} construction is a way to build a compression
function from a block cipher $E$. Given a current state $H$ and a message
block $M$, it produces a new state as:
\begin{align*}
H \leftarrow E(H,\ M) \oplus M \oplus H
\end{align*}

\noindent where $E(H, M)$ denotes the encryption of $M$ using $H$ as the key.
The XOR with both $M$ and $H$ ensures that the output cannot be trivially
inverted even if $E$ is known.

\vspace{1em}

The \textbf{wide-pipe} construction is a variant of Merkle-Damgård where the
internal state is wider than the final digest. This makes collision attacks
harder: an attacker targeting the output must first find a collision in the
larger internal state, which requires significantly more work than attacking
the digest directly.