feat: README

parent 79e3fdb620
commit 3705bdd8b3
1 changed file with 214 additions and 0 deletions

README.md
@@ -1 +1,215 @@

# libasm

This is my second project in assembly.\
It is a from-scratch reimplementation of libc-inspired utilities.\
It's pedagogical in purpose and not meant for serious real-world use.

> **Note:** All technical content in this README is specific to x86_64 Linux.

## Technical description

Architecture: x86_64\
Syntax: Intel\
Assembler: NASM 3.01

## What I learned

Assembly syntax differs depending on the assembler and the architecture.

It's a very verbose language - everything must be explicit.\
Each line holds at most one instruction, optionally preceded by a label.

### Stack

The stack is a LIFO (Last In, First Out) data structure used for temporary storage during program execution.\
It grows downward in memory - each 64-bit `push` decreases the stack pointer (`rsp`) by 8 and stores the value at the new top.\
Conversely, `pop` reads the top value and increases `rsp` by 8.

The System V AMD64 ABI (the calling convention used on x86_64 Linux) requires `rsp` to be 16-byte aligned at every `call` instruction - the CPU itself does not enforce this, but callees may rely on it, and a misaligned stack can crash code that uses aligned SSE instructions such as `movaps`.

`rbp` (base pointer) is typically used to mark the start of the current stack frame, making it easy to reference local variables at fixed offsets regardless of how `rsp` moves.

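The LIFO behaviour described above can be sketched as a tiny standalone program. This is only an illustration, assuming it is assembled with `nasm -f elf64` and linked with `ld`; the register choices are arbitrary and the result is reported through the exit status.

```nasm
global _start

section .text
_start:
    mov     rax, 1
    mov     rbx, 2
    push    rax             ; rsp -= 8, [rsp] = 1
    push    rbx             ; rsp -= 8, [rsp] = 2
    pop     rdi             ; rdi = 2 (last in, first out), rsp += 8
    pop     rsi             ; rsi = 1, rsp += 8

    sub     rdi, rsi        ; rdi = 2 - 1 = 1
    mov     eax, 60         ; exit syscall number
    syscall                 ; exit status is 1
```

Running it and then `echo $?` in a shell should print `1`, showing that the value pushed last came back first.
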
### Register

Registers are the processor’s working memory - small, ultra-fast storage slots wired directly into the CPU.\
Unlike RAM, which can only be read from or written to, registers are connected to active units like the ALU, allowing the processor to actually compute: add, subtract, shift bits, apply logical operators, and so on.\
Computation happens on values held in registers - RAM just holds the data until it’s needed.

#### Special registers

| 64-bit | 32-bit | 16-bit | Name | Purpose |
|--------|--------|--------|------|---------|
| `rsp` | `esp` | `sp` | Stack Pointer | Points to the top of the stack |
| `rbp` | `ebp` | `bp` | Base Pointer | Marks the base of the current stack frame |
| `rip` | `eip` | `ip` | Instruction Pointer | Points to the next instruction to execute |
| `rflags` | `eflags` | `flags` | Flags Register | Stores CPU state flags (zero, carry, sign, overflow...) |

#### General-purpose registers

| 64-bit | 32-bit | 16-bit | 8-bit high | 8-bit low | Conventional use |
|--------|--------|--------|------------|-----------|-----------------|
| `rax` | `eax` | `ax` | `ah` | `al` | Return value, accumulator |
| `rbx` | `ebx` | `bx` | `bh` | `bl` | Callee-saved |
| `rcx` | `ecx` | `cx` | `ch` | `cl` | 4th argument |
| `rdx` | `edx` | `dx` | `dh` | `dl` | 3rd argument |
| `rsi` | `esi` | `si` | - | `sil` | 2nd argument |
| `rdi` | `edi` | `di` | - | `dil` | 1st argument |
| `r8` | `r8d` | `r8w` | - | `r8b` | 5th argument |
| `r9` | `r9d` | `r9w` | - | `r9b` | 6th argument |
| `r10`–`r11` | `r10d`–`r11d` | `r10w`–`r11w` | - | `r10b`–`r11b` | Caller-saved (scratch) |
| `r12`–`r15` | `r12d`–`r15d` | `r12w`–`r15w` | - | `r12b`–`r15b` | Callee-saved |

> Writing to a 32-bit register (e.g. `eax`) zeroes the upper 32 bits of its 64-bit counterpart (`rax`).
> Writing to a 16-bit or 8-bit register leaves the upper bits unchanged.

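The difference between 32-bit and 8-bit writes is easy to get wrong, so here is a minimal sketch of it. It assumes a standalone program assembled with `nasm -f elf64` and linked with `ld`; the constants are arbitrary.

```nasm
global _start

section .text
_start:
    mov     rax, -1         ; rax = 0xFFFFFFFFFFFFFFFF
    mov     eax, 1          ; 32-bit write: upper half cleared, rax = 1

    mov     rbx, -1         ; rbx = 0xFFFFFFFFFFFFFFFF
    mov     bl, 1           ; 8-bit write: only the low byte changes,
                            ; rbx = 0xFFFFFFFFFFFFFF01

    xor     edi, edi        ; edi = 0 (and therefore rdi = 0)
    cmp     rax, 1
    jne     .done           ; not taken: rax is exactly 1
    inc     edi             ; edi = 1
.done:
    mov     eax, 60         ; exit syscall number
    syscall                 ; exit status is 1
```

This is also why `mov eax, 60` is enough to load a syscall number into `rax`: the upper 32 bits are cleared for free.
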
### CPU instructions

#### Base

| Instruction | Description |
|-------------|-------------|
| `mov dst, src` | Copy src into dst |
| `push src` | Decrement `rsp`, then store src at the top of the stack |
| `pop dst` | Load the top of the stack into dst, then increment `rsp` |
| `lea dst, [src]` | Compute the address expression src and store it in dst (no memory access) |

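To make the `mov` / `lea` distinction concrete, the sketch below loads a value from memory with `mov` and uses `lea` both to take an address and as a cheap arithmetic instruction. The `value` label and its content are hypothetical; the program is assumed to be assembled with `nasm -f elf64` and linked with `ld`.

```nasm
global _start

section .data
value:  dq 7                    ; hypothetical 8-byte variable

section .text
_start:
    lea     rsi, [rel value]    ; rsi = address of value (no memory access)
    mov     rdi, [rsi]          ; rdi = 7, loaded from memory

    lea     rdi, [rdi + rdi*2]  ; rdi = 7 * 3 = 21, lea used as arithmetic
    mov     eax, 60             ; exit syscall number
    syscall                     ; exit status is 21
```
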
#### Branching

| Instruction | Description |
|-------------|-------------|
| `cmp a, b` | Compute a - b to set flags (no result stored) |
| `test a, b` | Compute a AND b to set flags (no result stored) |
| `jmp label` | Unconditional jump |
| `je label` | Jump if equal (ZF=1) |
| `jne label` | Jump if not equal (ZF=0) |
| `jz label` | Jump if zero (ZF=1), same instruction as `je` |
| `jnz label` | Jump if not zero (ZF=0), same instruction as `jne` |
| `jo label` | Jump if overflow (OF=1) |
| `jno label` | Jump if no overflow (OF=0) |
| `js label` | Jump if sign / negative (SF=1) |
| `jns label` | Jump if no sign / non-negative (SF=0) |
| `jg label` | Jump if greater (signed: ZF=0 and SF=OF) |
| `jge label` | Jump if greater or equal (signed: SF=OF) |
| `jl label` | Jump if less (signed: SF≠OF) |
| `jle label` | Jump if less or equal (signed: ZF=1 or SF≠OF) |
| `ja label` | Jump if above (unsigned: CF=0 and ZF=0) |
| `jae label` | Jump if above or equal (unsigned: CF=0) |
| `jb label` | Jump if below (unsigned: CF=1) |
| `jbe label` | Jump if below or equal (unsigned: CF=1 or ZF=1) |

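The signed and unsigned jumps in the table read the same flags differently. The following sketch compares the same bit pattern both ways; it assumes a standalone program assembled with `nasm -f elf64` and linked with `ld`, and the label names are made up for the example.

```nasm
global _start

section .text
_start:
    mov     rax, -1         ; 0xFFFFFFFFFFFFFFFF
    xor     edi, edi        ; edi = 0, accumulates the result

    cmp     rax, 1          ; flags from rax - 1
    jl      .signed_less    ; signed view: -1 < 1, jump taken
    jmp     .check_unsigned
.signed_less:
    or      edi, 1          ; bit 0: signed comparison said "less"

.check_unsigned:
    cmp     rax, 1
    jb      .exit           ; unsigned view: 0xFFFFFFFFFFFFFFFF > 1, not taken
    or      edi, 2          ; bit 1: unsigned comparison said "not below"

.exit:
    mov     eax, 60         ; exit syscall number
    syscall                 ; exit status is 3
```

Choosing `jl` versus `jb` is therefore a statement about how the operands should be interpreted, not about the comparison itself.
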
#### Arithmetic and logic

| Instruction | Description |
|-------------|-------------|
| `add dst, src` | dst = dst + src |
| `sub dst, src` | dst = dst - src |
| `inc dst` | dst = dst + 1 |
| `dec dst` | dst = dst - 1 |
| `imul dst, src` | dst = dst * src (signed, dst must be a register) |
| `mul src` | rax * src → rdx:rax (unsigned) |
| `idiv src` | rdx:rax / src → rax (quotient), rdx (remainder) (signed; sign-extend rax into rdx with `cqo` first) |
| `div src` | rdx:rax / src → rax (quotient), rdx (remainder) (unsigned; zero rdx first) |
| `neg dst` | dst = -dst |
| `and dst, src` | dst = dst AND src |
| `or dst, src` | dst = dst OR src |
| `xor dst, src` | dst = dst XOR src (`xor reg, reg` is the usual way to zero a register) |

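Division is the instruction in this table that most often goes wrong, because the dividend is the 128-bit pair `rdx:rax`. Here is a small sketch of both the unsigned and signed forms, assuming a standalone program assembled with `nasm -f elf64` and linked with `ld`; the numbers are arbitrary.

```nasm
global _start

section .text
_start:
    mov     eax, 47         ; dividend (low part), rax = 47
    xor     edx, edx        ; unsigned: clear rdx so rdx:rax = 47
    mov     ecx, 5          ; divisor
    div     rcx             ; rax = 9 (quotient), rdx = 2 (remainder)

    mov     rdi, rdx        ; rdi = 2
    mov     rax, -47
    cqo                     ; signed: sign-extend rax into rdx:rax
    idiv    rcx             ; rax = -9, rdx = -2 (truncates toward zero)

    add     rdi, rax        ; rdi = 2 + (-9) = -7
    neg     rdi             ; rdi = 7
    mov     eax, 60         ; exit syscall number
    syscall                 ; exit status is 7
```

Forgetting the `xor edx, edx` or `cqo` step leaves stale data in `rdx`, which typically gives a wrong quotient or a divide error.
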
### System call

A system call (syscall) is a request for a service from the kernel - file I/O, memory allocation, process control, etc.\
In x86_64 Linux, syscalls are triggered with the `syscall` instruction, which switches the CPU to kernel mode (the older `int 0x80` software interrupt is the 32-bit mechanism).

**Calling convention:**

| Register | Role |
|----------|------|
| `rax` | Syscall number |
| `rdi` | 1st argument |
| `rsi` | 2nd argument |
| `rdx` | 3rd argument |
| `r10` | 4th argument |
| `r8` | 5th argument |
| `r9` | 6th argument |

The return value is stored in `rax`. On error, `rax` contains a negative errno value (between -4095 and -1).\
The `syscall` instruction itself overwrites `rcx` and `r11`, which is why the 4th argument goes in `r10` instead of `rcx`.

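As an illustration of the error convention, the sketch below deliberately closes an invalid file descriptor and turns the negative return value into a positive errno. It assumes a standalone program assembled with `nasm -f elf64` and linked with `ld`; a plain sign check is used here for brevity, which is enough for `close` but is a simplification of the full -4095..-1 range test.

```nasm
global _start

section .text
_start:
    mov     eax, 3          ; close syscall number
    mov     rdi, -1         ; deliberately invalid file descriptor
    syscall                 ; rax = -9 (-EBADF) on Linux

    xor     edi, edi        ; default exit status 0
    test    rax, rax
    jns     .exit           ; rax >= 0: the call succeeded
    neg     rax             ; rax = positive errno value
    mov     rdi, rax        ; report it as the exit status
.exit:
    mov     eax, 60         ; exit syscall number
    syscall
```

On Linux this should exit with status 9, the value of `EBADF`. A libc-style wrapper would store that value in `errno` and return -1 instead.
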
**Common syscalls:**

| Number | Name | Description |
|--------|------|-------------|
| 0 | `read` | Read from a file descriptor |
| 1 | `write` | Write to a file descriptor |
| 2 | `open` | Open a file |
| 3 | `close` | Close a file descriptor |
| 60 | `exit` | Terminate the process |

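Putting the register convention and two of these syscall numbers together, a minimal sketch of a program that writes a message and exits could look like this. The `msg` and `msg_len` names are made up for the example, and the program is assumed to be assembled with `nasm -f elf64` and linked with `ld`.

```nasm
global _start

section .rodata
msg:    db "hello", 10      ; 10 = newline
msg_len equ $ - msg         ; length computed by the assembler

section .text
_start:
    mov     eax, 1          ; write syscall number
    mov     edi, 1          ; 1st argument: fd 1 (stdout)
    lea     rsi, [rel msg]  ; 2nd argument: buffer address
    mov     edx, msg_len    ; 3rd argument: byte count
    syscall                 ; rax = bytes written, rcx and r11 clobbered

    mov     eax, 60         ; exit syscall number
    xor     edi, edi        ; 1st argument: status 0
    syscall
```
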
### Function

#### Calling convention

Integer and pointer arguments are passed in registers in this order: `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9`.\
Additional arguments are pushed onto the stack. The return value is stored in `rax`.

**Register preservation:**

| Type | Registers | Who saves | Behavior |
|------|-----------|-----------|----------|
| Caller-saved | `rax`, `rcx`, `rdx`, `rsi`, `rdi`, `r8`–`r11` | Caller | **May be overwritten** by the called function - save them before `call` if needed |
| Callee-saved | `rbx`, `rbp`, `rsp`, `r12`–`r15` | Callee | Must hold their original values when the function returns |

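The convention above can be sketched with a small string-length routine called from `_start`. `my_strlen` and `text` are hypothetical names chosen for this example, not necessarily how the library's own functions are written; the program is assumed to be assembled with `nasm -f elf64` and linked with `ld`.

```nasm
global _start

section .rodata
text:   db "libasm", 0          ; NUL-terminated string, 6 characters

section .text
; size_t my_strlen(const char *s) - hypothetical helper
; in: rdi = s, out: rax = length
my_strlen:
    xor     eax, eax            ; length = 0
.loop:
    cmp     byte [rdi + rax], 0 ; reached the terminator?
    je      .done
    inc     rax
    jmp     .loop
.done:
    ret

_start:
    lea     rdi, [rel text]     ; 1st argument in rdi
    call    my_strlen           ; return value in rax = 6
    mov     rdi, rax            ; pass it on as the exit status
    mov     eax, 60             ; exit syscall number
    syscall                     ; exit status is 6
```

Because `my_strlen` only touches `rax` (caller-saved) and reads `rdi`, it has nothing to save or restore.
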
#### Structure of a function

```nasm
my_function:
    push rbp        ; save caller's base pointer
    mov rbp, rsp    ; set up new stack frame

    ; function body

    pop rbp         ; restore caller's base pointer
    ret             ; return to caller (pops the return address into rip)
```

`call label` pushes the return address onto the stack then jumps to `label`.\
`ret` pops that address and jumps back to it.

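To show what the frame is for, here is a sketch of a function that reserves two local variables below `rbp`. `add_via_locals` is a hypothetical name and the locals are unnecessary for such a small computation; they are there only to illustrate the layout. The program is assumed to be assembled with `nasm -f elf64` and linked with `ld`.

```nasm
global _start

section .text
; long add_via_locals(long a, long b) - hypothetical example
add_via_locals:
    push    rbp                 ; entry: rsp % 16 == 8, after push: aligned
    mov     rbp, rsp
    sub     rsp, 16             ; reserve two 8-byte locals, keeps alignment

    mov     [rbp - 8], rdi      ; local 1 = a
    mov     [rbp - 16], rsi     ; local 2 = b
    mov     rax, [rbp - 8]
    add     rax, [rbp - 16]     ; rax = a + b

    mov     rsp, rbp            ; release the locals
    pop     rbp
    ret

_start:
    mov     edi, 40
    mov     esi, 2
    call    add_via_locals      ; rax = 42
    mov     rdi, rax
    mov     eax, 60             ; exit syscall number
    syscall                     ; exit status is 42
```

Reserving space in multiples of 16 after `push rbp` keeps `rsp` aligned for any further `call` made from inside the function.
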
### Macro

A macro is a named block of code that the preprocessor pastes in at each place it is used - unlike a function, it has no `call`/`ret` overhead, at the cost of duplicating the code every time.\
Use macros for short repeated patterns where performance or readability matter.

#### Syntax

```nasm
%macro name nb_args
    ; body - arguments accessed via %1, %2, ...
%endmacro
```

Example:

```nasm
%macro save_regs 0
    push rbx
    push r12
%endmacro
```

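A slightly fuller sketch shows a macro that takes an argument and a restore macro that mirrors a save macro. `sys_exit` and `restore_regs` are hypothetical names introduced for this example; the program is assumed to be assembled with `nasm -f elf64` and linked with `ld`.

```nasm
global _start

; sys_exit status - hypothetical one-argument macro
%macro sys_exit 1
    mov     edi, %1             ; 1st syscall argument: exit status
    mov     eax, 60             ; exit syscall number
    syscall
%endmacro

%macro save_regs 0
    push    rbx
    push    r12
%endmacro

; hypothetical counterpart: pops in the reverse order of the pushes
%macro restore_regs 0
    pop     r12
    pop     rbx
%endmacro

section .text
_start:
    save_regs
    mov     ebx, 5              ; free to use rbx and r12 here
    mov     r12d, 7
    lea     eax, [rbx + r12]    ; eax = 12
    restore_regs
    sys_exit eax                ; expands to mov edi, eax / mov eax, 60 / syscall
```

Note that `sys_exit` copies its argument before loading the syscall number; with the two `mov` lines swapped, `sys_exit eax` would silently exit with status 60.
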
#### `%[expr]`

`%[...]` forces the preprocessor to expand single-line macros in places where expansion would not normally happen - for example at definition time inside a `%define` body (as `%xdefine` does), or glued to another token to build an identifier name.

```nasm
%define OFFSET 8
mov rax, [rbp + %[OFFSET]] ; expands to [rbp + 8] at preprocessing time (a plain OFFSET would expand here too)
```

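A case where `%[...]` is actually needed is building a macro name out of another macro's value, following the pattern shown in the NASM manual. In this sketch `SIZE_32` and `SIZE_64` are hypothetical constants, and `__?BITS?__` is NASM's built-in macro for the current mode (assumed to be 64 when assembling with `nasm -f elf64`).

```nasm
global _start

%define SIZE_32 4               ; hypothetical per-mode constants
%define SIZE_64 8

section .text
_start:
    ; __?BITS?__ is expanded first and glued to SIZE_, so in 64-bit mode
    ; the operand becomes SIZE_64, which is then expanded like any macro
    mov     edi, SIZE_%[__?BITS?__]
    mov     eax, 60             ; exit syscall number
    syscall
```

Without the `%[...]` wrapper, `SIZE___?BITS?__` would be read as one unknown identifier and the built-in macro would never be expanded.
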
## Resources

- [x86_64 System call table](https://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/)
- [Register usage](https://math.hws.edu/eck/cs220/f22/registers.html)
- [Basic instructions (French)](https://lacl.u-pec.fr/tan/asm.pdf)
- [Turing Complete (game)](https://turingcomplete.game/)