What is Assembler? - Miftachul Huda Almaftuchin


Almaftuch.in - An assembler program creates object code by translating combinations of mnemonics and syntax for operations and addressing modes into their numerical equivalents. This representation typically includes an operation code ("opcode") as well as other control bits and data. The assembler also calculates constant expressions and resolves symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, e.g., to generate common short sequences of instructions inline instead of as called subroutines.
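To make the translation and symbol resolution described above concrete, here is a minimal sketch of a two-pass assembler in Python. The instruction set is made up for illustration (the mnemonics LOAD, ADD, JMP, HALT, their opcode numbers and the fixed 2-byte instruction size are all assumptions, not a real CPU): the first pass records label addresses, the second emits the numeric bytes.

```python
# Toy two-pass assembler for a MADE-UP machine (not a real instruction set).
# Assumption: every instruction is 2 bytes, one opcode byte + one operand byte.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}  # hypothetical

def assemble(source):
    # Pass 1: strip comments, record the address of every label.
    symbols = {}
    instructions = []
    address = 0
    for raw in source.splitlines():
        line = raw.split(";")[0].strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = address
            continue
        instructions.append(line)
        address += 2

    # Pass 2: translate mnemonics and resolve symbolic operands.
    code = bytearray()
    for line in instructions:
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = parts[1] if len(parts) > 1 else "0"
        if operand in symbols:
            value = symbols[operand]      # symbolic reference
        else:
            value = int(operand, 0)       # numeric constant (decimal or 0x..)
        code += bytes([OPCODES[mnemonic], value & 0xFF])
    return bytes(code), symbols

program = """
start:
    LOAD 5        ; numeric operand
    ADD  0x10
loop:
    JMP  loop     ; symbolic operand, resolved to an address
    HALT
"""

machine_code, symbol_table = assemble(program)
print(machine_code.hex(" "))
print(symbol_table)
```

If an instruction is inserted before the loop label, the label address changes automatically on the next run, which is exactly the manual bookkeeping that symbolic references save.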

Some assemblers can also perform simple instruction-set-specific optimizations. The widely used x86 assemblers from various vendors are a concrete example: most of them can, on request, replace long jumps with short or relative jumps, using as many passes as needed. Others may even rearrange or insert instructions; some assemblers for RISC architectures, for example, can help schedule instructions so that the CPU pipeline is used as efficiently as possible.

Like early programming languages such as Fortran, Algol, Cobol and Lisp, assemblers have been available since the 1950s and the first generations of text-based computer interfaces. However, assemblers came first, as they are far simpler to write than compilers for high-level languages. This is because each mnemonic, together with the addressing modes and operands of an instruction, translates rather directly into the numeric representation of that particular instruction, without much context or analysis. There have also been several classes of translators and semi-automatic code generators with properties similar to both assembly and high-level languages, with Speedcode as perhaps one of the better-known examples.

There may be several assemblers with different syntax for a particular CPU or instruction set architecture. For instance, an instruction to add memory data to a register in an x86-family processor might be written add eax,[ebx] in the original Intel syntax, whereas it would be written addl (%ebx),%eax in the AT&T syntax used by the GNU Assembler. Despite their different appearances, different syntactic forms generally generate the same numeric machine code. A single assembler may also have different modes in order to support variations in syntactic forms as well as their exact semantic interpretations (such as FASM syntax, TASM syntax, ideal mode etc., in the special case of x86 assembly programming).
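As an illustration of two syntaxes producing the same machine code, the sketch below parses only the two example spellings from the paragraph above and encodes them by hand. It assumes 32-bit mode and handles just the "ADD r32, [base register]" form (opcode 03 followed by a ModR/M byte); the function names are made up for this example and this is nowhere near a full x86 assembler.

```python
import re

# Register numbers used in the ModR/M byte (32-bit registers).
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}

def encode_add_load(dst, base):
    """Encode ADD dst, [base]: opcode 0x03, then ModR/M with mod=00."""
    if base in ("esp", "ebp"):
        # With mod=00, rm=100 means "SIB byte follows" and rm=101 means
        # "32-bit displacement only"; this sketch does not handle those.
        raise ValueError("esp/ebp as base register is not supported here")
    modrm = (0b00 << 6) | (REG32[dst] << 3) | REG32[base]
    return bytes([0x03, modrm])

def assemble_intel(text):
    # Intel syntax: destination first, memory operand in brackets.
    m = re.fullmatch(r"add\s+(\w+)\s*,\s*\[(\w+)\]", text.strip().lower())
    if m is None:
        raise ValueError("unsupported instruction: " + text)
    dst, base = m.groups()
    return encode_add_load(dst, base)

def assemble_att(text):
    # AT&T syntax: source first, % register prefix, memory in parentheses.
    m = re.fullmatch(r"addl\s+\(%(\w+)\)\s*,\s*%(\w+)", text.strip().lower())
    if m is None:
        raise ValueError("unsupported instruction: " + text)
    base, dst = m.groups()
    return encode_add_load(dst, base)

intel_bytes = assemble_intel("add eax,[ebx]")
att_bytes = assemble_att("addl (%ebx),%eax")
print(intel_bytes.hex(" "), "|", att_bytes.hex(" "))
```

Both spellings end up as the same two bytes, because the syntax only affects how the operands are written, not which opcode and ModR/M fields they select.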

Segments

Segment registers: cs, es, ds, ss, fs, gs
The value loaded into a segment register is a 16-bit selector:
Bits 0 and 1 hold the RPL, the Requested Privilege Level
Bit 2 is the table indicator: 0 means the GDT is used, 1 means the LDT is used
Bits 3 to 15 hold the index of the descriptor in the GDT or LDT; shifting this index left by 3 (multiplying by 8, the size of a descriptor) gives the byte offset into the table
Example:
CS of 8 = 1000b = 1 0 00 : RPL=0, LDT=0 (so the GDT is used), index is 1, offset in GDT table is (1 << 3) = 8
CS of 0x23 = 100011b = 100 0 11 : RPL=3, LDT=0 (GDT), index is 100b = 4, offset in GDT table is (4 << 3) = 32
Note that even in 64-bit mode, the index in bits 3 to 15 still only needs to be shifted left by 3 to get the proper offset
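The bit layout above can be checked with a few lines of Python. This is just a sketch of the arithmetic; the function name and the dictionary keys are made up for this example.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into its three fields."""
    selector &= 0xFFFF
    rpl = selector & 0b11            # bits 0-1: requested privilege level
    uses_ldt = (selector >> 2) & 1   # bit 2: 0 = GDT, 1 = LDT
    index = selector >> 3            # bits 3-15: descriptor index
    return {
        "rpl": rpl,
        "table": "LDT" if uses_ldt else "GDT",
        "index": index,
        "byte_offset": index << 3,   # each table slot is 8 bytes
    }

for value in (0x08, 0x23):
    print(hex(value), decode_selector(value))
```

Running it on the two CS values from the example gives offset 8 with RPL 0, and offset 32 with RPL 3.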

GDT

The GDT (Global Descriptor Table) is a table of descriptors. Each descriptor describes a specific segment and its rights: the base address and limit, the access rights, whether it is a data or code segment, etc.
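To show how such a descriptor packs its information, here is a sketch that unpacks the legacy 8-byte code/data segment descriptor format in Python. The two sample values are the common "flat" 32-bit code and data descriptors, used here only as assumed test input; the function name and result keys are made up for this example, and system descriptors (where the type bits mean something else) are not handled.

```python
def decode_descriptor(raw):
    """Unpack an 8-byte code/data segment descriptor given as a 64-bit int."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)          # 20-bit limit
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)  # 32-bit base
    access = (raw >> 40) & 0xFF   # access byte: P, DPL, S, type
    flags = (raw >> 52) & 0xF     # G, D/B, L, AVL
    is_code_or_data = bool(access & 0x10)   # S bit: 1 = code/data segment
    return {
        "base": base,
        "limit": limit,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0b11,
        "is_code": is_code_or_data and bool(access & 0x08),  # executable bit
        "limit_in_4k_pages": bool(flags & 0x8),              # granularity bit
    }

flat_code = decode_descriptor(0x00CF9A000000FFFF)
flat_data = decode_descriptor(0x00CF92000000FFFF)
print(flat_code)
print(flat_data)
```

Both samples decode to base 0 and limit 0xFFFFF with 4 KB granularity; they differ only in the executable bit of the access byte.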

IDT

The IDT (Interrupt Descriptor Table) is a table of descriptors that describe what should happen when an interrupt occurs. Each entry contains the code segment to use and the EIP/RIP address to call, as well as information such as the DPL of the interrupt and whether it is a task gate, interrupt gate or trap gate.
Useful interrupts with regard to game hacking: interrupt 1 (debug exception, used for single stepping), 3 (breakpoint), 13 (general protection fault) and 14 (page fault)
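As a sketch of what one IDT entry contains, the Python below unpacks an 8-byte gate descriptor as used in 32-bit protected mode (64-bit mode uses 16-byte entries, which are not covered here). The handler address 0x00123456 and the helper names are invented for illustration.

```python
# Gate types found in a 32-bit IDT (low 4 bits of the attribute byte).
GATE_TYPES = {0x5: "task gate",
              0xE: "32-bit interrupt gate",
              0xF: "32-bit trap gate"}

def decode_gate(raw):
    """Unpack an 8-byte IDT gate descriptor given as a 64-bit int."""
    offset = (raw & 0xFFFF) | (((raw >> 48) & 0xFFFF) << 16)  # handler EIP
    selector = (raw >> 16) & 0xFFFF                           # code segment
    attr = (raw >> 40) & 0xFF                                 # P, DPL, type
    return {
        "selector": selector,
        "offset": offset,
        "type": GATE_TYPES.get(attr & 0xF, "other"),
        "dpl": (attr >> 5) & 0b11,
        "present": bool(attr & 0x80),
    }

def make_gate(offset, selector, attr):
    """Build the 64-bit value for a gate (inverse of decode_gate)."""
    return ((offset & 0xFFFF)
            | (selector << 16)
            | (attr << 40)
            | ((offset >> 16) << 48))

# Hypothetical page fault handler at 0x00123456 in code segment 0x08.
entry = make_gate(0x00123456, 0x08, 0x8E)
gate = decode_gate(entry)
print(hex(entry), gate)
```

The attribute byte 0x8E used here means present, DPL 0, interrupt gate; changing it to 0xEE would allow the interrupt to be raised from ring 3 with an INT instruction.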

Flags

The 32-bit EFLAGS register holds the 17 flags listed below. Bits missing from this list are not a mistake: they are reserved (bit 1 always reads as 1, the other reserved bits read as 0).
ID, VIP, VIF, AC, VM, RF, NT, IOPL, OF, DF, IF, TF, SF, ZF, AF, PF, CF

Bit: Flag - description
00: CF Carry Flag – becomes 1 if an arithmetic operation produces a carry out of (or a borrow into) the most significant bit, i.e. the unsigned result does not fit in the destination. Logical operations such as AND, OR and XOR clear it.
02: PF Parity Flag – becomes 1 if the lower 8 bits of the result contain an even number of 1 bits.
04: AF Auxiliary Carry Flag – becomes 1 on a carry out of, or a borrow into, the lower 4 bits of the result (used for BCD arithmetic).
06: ZF Zero Flag – becomes 1 if the result of an operation is 0.
07: SF Sign Flag – copy of the most significant bit of the result: 1 if the result is negative when interpreted as signed, 0 for positive.
08: TF Trap Flag – when set, the processor raises a debug exception (interrupt 1) after every instruction (allows for single stepping/debugging).
09: IF Interrupt Flag – when this flag is set, the processor responds to maskable external interrupts.
10: DF Direction Flag – determines the direction in which string instructions (MOVS, STOS, CMPS, etc., often used with a repeat prefix) move through memory: 0 = upwards, 1 = downwards.
11: OF Overflow Flag – becomes 1 if the signed result does not fit in the destination (eg: adding two positive numbers gives a negative result).
12-13: IOPL I/O Privilege Level – 2-bit field specifying which privilege level is required to access the IO ports.
14: NT Nested Task – becomes 1 when the current task was started by another task through a task switch and is linked to it.
16: RF Resume Flag – when set, instruction breakpoints are ignored for the next instruction, so that execution can resume after a breakpoint without triggering it again.
17: VM Virtual-8086 Mode – becomes 1 if the processor is to emulate the 8086 processor (16-bit) from within protected mode.
18: AC Alignment Check – when set (together with the AM bit in CR0), unaligned memory accesses in user mode raise an alignment check exception.
19: VIF Virtual Interrupt Flag – a virtual copy of IF, used together with VIP when the virtual-mode extensions are enabled.
20: VIP Virtual Interrupt Pending – 1 if a virtual interrupt is pending.
21: ID ID Flag – if software is able to change this bit, the processor supports the CPUID instruction.
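The difference between the arithmetic flags is easiest to see by computing them. The sketch below does two things in Python: it lists which single-bit flags are set in a raw EFLAGS value, using the bit numbers from the table above, and it models the six status flags an ADD would produce. It is a model for experimenting, not an emulator; the helper names are invented, and 8-bit operands are the default only to keep the numbers small.

```python
# Single-bit flags and their bit positions (IOPL, bits 12-13, handled apart).
FLAG_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF", 9: "IF",
             10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM", 18: "AC",
             19: "VIF", 20: "VIP", 21: "ID"}

def decode_eflags(value):
    """Return (names of set single-bit flags, IOPL) for an EFLAGS value."""
    names = [name for bit, name in sorted(FLAG_BITS.items())
             if (value >> bit) & 1]
    iopl = (value >> 12) & 0b11
    return names, iopl

def add_flags(a, b, bits=8):
    """Model the status flags after ADD of two `bits`-wide operands."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    flags = {
        "CF": full > mask,                                # unsigned overflow
        "PF": bin(result & 0xFF).count("1") % 2 == 0,     # even parity, low byte
        "AF": (a & 0xF) + (b & 0xF) > 0xF,                # carry out of bit 3
        "ZF": result == 0,
        "SF": bool(result & sign),                        # top bit of result
        # Signed overflow: operands share a sign that the result lacks.
        "OF": bool(~(a ^ b) & (a ^ result) & sign),
    }
    return result, flags

print(decode_eflags(0x246))
print(add_flags(0xFF, 0x01))   # unsigned wrap-around: CF set, OF clear
print(add_flags(0x7F, 0x01))   # signed wrap-around: OF set, CF clear
```

The two additions show why CF and OF are separate flags: 0xFF + 1 overflows as an unsigned value (255 + 1) but is fine as a signed one (-1 + 1), while 0x7F + 1 is the opposite case.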

Opcodes

Most commonly used opcodes:
ADD : Adds a specified amount to a register or memory location
DEC : Decreases a register or memory location by 1
INC : Increases a register or memory location by 1
SUB : Subtracts a specified amount from a register or memory location
MOV : Sets a register or memory location to a specified value
NOP : No operation; does nothing. Usually used to overwrite code that should be removed, such as the code that decreases life
XOR : Exclusive OR operation on a register or memory location with a specified value. An exclusive OR sets the result bit to 1 for each bit that is different between the 2 values
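Two of these are easy to try out in Python. The sketch below shows the XOR rule on 32-bit values and the usual NOP trick: overwriting an instruction with 0x90 bytes so that it no longer does anything. The byte sequence FF 08 C3 (dec dword ptr [eax] followed by ret) is only an assumed stand-in for "the code that decreases life"; the helper names are invented, and real patching would be done on the game's memory rather than on a Python bytes object.

```python
MASK32 = 0xFFFFFFFF
NOP = 0x90  # single-byte x86 NOP

def xor32(a, b):
    """XOR as a 32-bit register would see it: 1 where the bits differ."""
    return (a ^ b) & MASK32

def nop_out(code, offset, length):
    """Return a copy of `code` with `length` bytes at `offset` set to NOP."""
    if offset < 0 or offset + length > len(code):
        raise ValueError("patch range is outside the code")
    patched = bytearray(code)
    patched[offset:offset + length] = bytes([NOP]) * length
    return bytes(patched)

print(bin(xor32(0b1100, 0b1010)))       # only the differing bits remain
print(xor32(0x12345678, 0x12345678))    # a value XOR itself is always 0

original = bytes([0xFF, 0x08, 0xC3])    # dec dword ptr [eax] ; ret
patched = nop_out(original, 0, 2)       # replace the 2-byte DEC with NOPs
print(original.hex(" "), "->", patched.hex(" "))
```

The instruction has to be replaced with the same number of bytes it occupied, otherwise everything after it would shift and the following instructions would be decoded wrongly.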