X86 assembly programming in real mode
x86 assembly programming in real mode is assembly language programming for Intel x86 processors running in real mode. It involves manipulating a set of 16-bit processor registers and addressing memory through segment:offset pairs that map directly to physical addresses, without the memory protection of protected mode. Perhaps its most popular use was writing DOS programs in the 1980s. Modern x86 operating systems run in protected mode (or, on 64-bit systems, long mode). However, an x86 processor starts in real mode when it is reset, so the code responsible for switching into protected mode must run in the real-mode environment.
Registers
Each register is specialized for a certain task, and operations that deal with that task are often run more efficiently if the right register is used.
Registers in real mode include:
- data registers
- AX, the accumulator
- BX, the base register
- CX, the counter register
- DX, the data register
- address registers
- SI, the source index register
- DI, the destination index register
- SP, the stack pointer register
- BP, the base pointer register, often used to address the current stack frame
Each data register can also be split into two 8-bit registers, one holding its upper eight bits and one holding its lower eight bits, and each half can be used as a register in its own right. For example, AH holds the upper eight bits of AX and AL holds the lower eight bits. The other data registers are split the same way, with the suffix naming the part: "X" for the whole (extended) register, "H" for high and "L" for low, giving BH/BL, CH/CL and DH/DL.
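As a minimal sketch, the overlap between a 16-bit register and its two 8-bit halves can be modelled in C by treating the register as a `uint16_t`. The helper names `get_high`, `get_low`, `set_high` and `set_low` are hypothetical and only illustrate the bit arithmetic: writing one half leaves the other half unchanged.

```c
#include <stdint.h>

/* Hypothetical helpers: AH/BH/CH/DH are bits 15..8 of the register,
   AL/BL/CL/DL are bits 7..0. */
uint8_t get_high(uint16_t reg) { return (uint8_t)(reg >> 8); }
uint8_t get_low(uint16_t reg)  { return (uint8_t)(reg & 0xFF); }

/* Writing the high half (e.g. AH) changes only the upper eight bits. */
uint16_t set_high(uint16_t reg, uint8_t value) {
    return (uint16_t)((reg & 0x00FF) | ((uint16_t)value << 8));
}

/* Writing the low half (e.g. AL) changes only the lower eight bits. */
uint16_t set_low(uint16_t reg, uint8_t value) {
    return (uint16_t)((reg & 0xFF00) | value);
}
```

For example, if AX holds 0x1234, AH reads as 0x12 and AL as 0x34, and setting AL to 0xCD leaves AX as 0x12CD.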
Collectively the data and address registers are called the general registers.
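In addition to the general registers, there are:
- segment registers
- CS, the code segment register
- DS, the data segment register
- ES, the extra segment register
- SS, the stack segment register
- other registers
- IP, the instruction pointer register
- FLAGS, the flag register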
The IP register holds the offset of the next instruction the processor will execute. The programmer cannot read or write IP directly; it advances as instructions execute and is changed by jumps, calls, returns and interrupts.
The FLAGS register holds the current state of the processor. Each bit in this register is called a flag and is either set (1) or clear (0). Among the flags in the FLAGS register are carry, overflow, zero and trap (single step).
Flags are notably used in the x86 architecture for comparisons. An instruction such as CMP subtracts one operand from the other, discards the difference and sets the flags according to it; for example, the zero flag is set when the two operands are equal. A conditional jump instruction then tests the relevant flag and jumps if its condition holds. For example, the code
    cmp ax, bx
    jne do_something
first compares the AX and BX registers, and if they are unequal, the code branches off to the do_something label.
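The flag logic behind CMP and the conditional jumps can be sketched in C. This is a simplified model under stated assumptions: only the carry, zero, sign and overflow flags are computed (auxiliary carry and parity are left out), and the function names are hypothetical. JNE tests the zero flag, JB tests the carry flag for an unsigned comparison, and JL compares the sign and overflow flags for a signed comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of FLAGS affected by CMP (AF and PF omitted in this sketch). */
struct flags { bool cf, zf, sf, of; };

/* CMP a, b: compute a - b, set the flags, and discard the difference. */
struct flags cmp16(uint16_t a, uint16_t b) {
    uint16_t diff = (uint16_t)(a - b);
    struct flags f;
    f.cf = a < b;                    /* unsigned borrow */
    f.zf = diff == 0;                /* operands are equal */
    f.sf = (diff & 0x8000) != 0;     /* sign bit of the difference */
    /* signed overflow: operands differ in sign and the result's sign differs from a */
    f.of = (((a ^ b) & (a ^ diff)) & 0x8000) != 0;
    return f;
}

bool jne_taken(struct flags f) { return !f.zf; }          /* JNE/JNZ: ZF = 0 */
bool jb_taken(struct flags f)  { return f.cf; }           /* JB/JC: CF = 1 */
bool jl_taken(struct flags f)  { return f.sf != f.of; }   /* JL: SF != OF */
```

With AX = 1 and BX = 0xFFFF, JB is taken (1 is below 65535 as unsigned numbers) but JL is not (1 is not less than -1 as signed numbers), which is why x86 has separate jump mnemonics for signed and unsigned comparisons.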
Mnemonics for opcodes
The instruction set of the original 8086, all of which is available in real mode, includes the following mnemonics (later processors add further instructions that can also be used in real mode): aaa, aad, aam, aas, adc, add, and, call, cbw, clc, cld, cli, cmc, cmp, cmpsb, cmpsw, cwd, daa, das, dec, div, esc, hlt, idiv, imul, in, inc, int, into, iret, ja, jae, jb, jbe, jc, jcxz, je, jg, jge, jl, jle, jmp, jna, jnae, jnb, jnbe, jnc, jne, jng, jnge, jnl, jnle, jno, jnp, jns, jnz, jo, jp, jpe, jpo, js, jz, lahf, lds, lea, les, lock, lodsb, lodsw, loop, loope, loopne, loopnz, loopz, mov, movsb, movsw, mul, neg, nop, not, or, out, pop, popf, push, pushf, rcl, rcr, rep, repe, repne, repnz, repz, ret, rol, ror, sahf, sal, sar, sbb, scasb, scasw, shl, shr, stc, std, sti, stosb, stosw, sub, test, wait, xchg, xlat, xor
There are also some undocumented opcodes that have no mnemonics. For example, on the 8086 and 8088, opcode 0x0F executes as "POP CS". Later processors in the x86 family may not interpret undocumented opcodes the way earlier processors did; from the 80286 onward, 0x0F is instead the first byte of two-byte opcodes. Using undocumented opcodes can therefore make a program fail on later x86 processors.
The real mode addressing model
This model is simple, but it has been controversial among programmers. Instead of the flat (linear) addressing used by many other architectures, the x86 addresses memory through a process known as segmentation: an address is expressed in two parts, a segment and an offset. The segment selects the start of a 64 KB block of addresses, and the offset selects a byte within that block, counted from the segment's base. To convert a segment and offset into a linear address, the segment is shifted left by 4 bits and then added to the offset. The formula is: segment*0x10+offset.
In real mode, two registers are used for a memory address: one to hold the segment, and one to hold the offset.
For example, if DS contains the hexadecimal number 0xDEAD and DX contains 0xCAFE, together they point to the memory address 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE. A quick way to do this without a hexadecimal calculator is to append a zero to the hexadecimal number in the segment register and then add the content of the offset register; the example above becomes 0xDEAD0 + 0xCAFE.
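The translation formula fits in a one-line C function. This is a minimal sketch with the hypothetical name `linear_address`; it returns a 32-bit value because segment*0x10+offset can exceed 0xFFFFF, and it deliberately ignores the address wrap-around of the 8086.

```c
#include <stdint.h>

/* Real-mode translation: linear = segment * 0x10 + offset.
   Ignores the 8086's wrap-around at 1 MB. */
uint32_t linear_address(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}
```

`linear_address(0xDEAD, 0xCAFE)` gives 0xEB5CE, matching the worked example. Because segments overlap every 16 bytes, many pairs name the same byte: 0xEB5C:0x000E is also 0xEB5CE.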
An address given as a segment and an offset is written in the notation segment:offset. In the above example, the linear address 0xEB5CE can be written as 0xDEAD:0xCAFE or, naming the segment and offset register pair that holds it, DS:DX.
There are some special combinations of segment registers and general registers that point to important addresses:
- CS:IP points to the address where the processor will fetch its next byte of code.
- SS:SP points to the location of the last item pushed onto the stack.
- DS:SI and ES:DI are the source and destination of the string instructions; for example, MOVSB copies a byte from DS:SI to ES:DI.
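The register pairs in the list above can be illustrated with a small C simulation. This is a sketch under assumptions: memory is a hypothetical flat 1 MB array, addresses wrap at 1 MB as on the 8086, and only PUSH, POP and REP MOVSB with the direction flag clear are modelled. PUSH decrements SP by 2 before storing the word, which is why SS:SP always points at the most recently pushed item.

```c
#include <stdint.h>

/* Hypothetical flat 1 MB memory for illustration. */
static uint8_t mem[0x100000];

/* segment:offset to linear address, wrapping at 1 MB like the 8086. */
static uint32_t lin(uint16_t seg, uint16_t off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFF;
}

/* PUSH: SP -= 2, then store the word at SS:SP (low byte first). */
void push16(uint16_t ss, uint16_t *sp, uint16_t value) {
    *sp = (uint16_t)(*sp - 2);
    mem[lin(ss, *sp)] = (uint8_t)(value & 0xFF);
    mem[lin(ss, (uint16_t)(*sp + 1))] = (uint8_t)(value >> 8);
}

/* POP: read the word at SS:SP, then SP += 2. */
uint16_t pop16(uint16_t ss, uint16_t *sp) {
    uint16_t value = (uint16_t)(mem[lin(ss, *sp)] |
                                (mem[lin(ss, (uint16_t)(*sp + 1))] << 8));
    *sp = (uint16_t)(*sp + 2);
    return value;
}

/* REP MOVSB with DF = 0: copy CX bytes from DS:SI to ES:DI,
   incrementing SI and DI and decrementing CX for each byte. */
void rep_movsb(uint16_t ds, uint16_t *si, uint16_t es, uint16_t *di, uint16_t *cx) {
    while (*cx != 0) {
        mem[lin(es, *di)] = mem[lin(ds, *si)];
        *si = (uint16_t)(*si + 1);
        *di = (uint16_t)(*di + 1);
        *cx = (uint16_t)(*cx - 1);
    }
}
```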
The PC memory layout in real mode
- 0-3FF: IVT (Interrupt Vector Table)
- 400-5FF: BDA (BIOS Data Area)
- 600-9FFFF: Ordinary application RAM
- A0000-BFFFF: Video memory
- C0000-EFFFF: Optional ROMs (the VGA ROM is usually located at C0000)
- F0000-FFFFF: BIOS ROM
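This gives 640 KB (0xA0000 bytes) of conventional memory below video memory, of which the IVT and BDA occupy the first 1.5 KB; the rest is available to applications.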
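As an illustration, the layout table can be encoded as a lookup function in C. The enum and function names are hypothetical, and the boundaries are taken directly from the table; real machines differ in details such as which option ROMs are present.

```c
#include <stdint.h>

/* Hypothetical names for the regions in the layout table. */
enum region {
    REGION_IVT,         /* 0x00000-0x003FF */
    REGION_BDA,         /* 0x00400-0x005FF */
    REGION_APP_RAM,     /* 0x00600-0x9FFFF */
    REGION_VIDEO,       /* 0xA0000-0xBFFFF */
    REGION_OPTION_ROM,  /* 0xC0000-0xEFFFF */
    REGION_BIOS_ROM,    /* 0xF0000-0xFFFFF */
    REGION_ABOVE_1MB    /* 0x100000 and up */
};

/* Classify a linear address using the table's boundaries. */
enum region region_of(uint32_t addr) {
    if (addr <= 0x003FF) return REGION_IVT;
    if (addr <= 0x005FF) return REGION_BDA;
    if (addr <= 0x9FFFF) return REGION_APP_RAM;
    if (addr <= 0xBFFFF) return REGION_VIDEO;
    if (addr <= 0xEFFFF) return REGION_OPTION_ROM;
    if (addr <= 0xFFFFF) return REGION_BIOS_ROM;
    return REGION_ABOVE_1MB;
}

/* Conventional memory ends where video memory begins: 0xA0000 bytes = 640 KB. */
enum { CONVENTIONAL_BYTES = 0xA0000 };
```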
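Memory above 0xFFFFF is called extended memory. Its first 65,520 bytes (0x100000 to 0x10FFEF) form the "high memory area" (HMA), which real-mode code can reach as FFFF:0010 to FFFF:FFFF when the A20 address line is enabled.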
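Real-mode code can reach slightly past 0xFFFFF because the largest segment:offset pair, FFFF:FFFF, translates to 0x10FFEF. On the 8086, which has only 20 address lines, such addresses wrap around to the bottom of memory; on later processors the wrap happens only while the A20 address line is disabled. A minimal C sketch, with the hypothetical function name `real_mode_address` and the A20 state passed in as a flag:

```c
#include <stdbool.h>
#include <stdint.h>

/* segment:offset translation with the A20 line enabled or disabled.
   With A20 disabled, bit 20 of the sum is dropped, so the address wraps
   to the bottom of memory (always the case on the 8086). */
uint32_t real_mode_address(uint16_t segment, uint16_t offset, bool a20_enabled) {
    uint32_t addr = ((uint32_t)segment << 4) + offset;
    return a20_enabled ? addr : (addr & 0xFFFFF);
}
```

With A20 enabled, FFFF:0010 through FFFF:FFFF cover 0x100000 to 0x10FFEF, which is 65,520 bytes; with A20 disabled, FFFF:0010 wraps to address 0.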
Interrupts in real mode
The x86 architecture is interrupt-driven: hardware or software can signal the processor when something needs attention, for example when a device has data ready, instead of the processor repeatedly polling the device and waiting for it to respond.
There are two kinds of interrupts: software interrupts and hardware interrupts. Software interrupts are often used to communicate with the operating system. Typical software interrupts are interrupt 0x21, the DOS interrupt through which nearly all DOS system functions are accessed, and interrupt 3 (INT 3, the breakpoint interrupt), which is often used to enter a software debugger. A hardware interrupt occurs when an external circuit needs attention from the CPU, for example each time the system timer ticks. The 8259 programmable interrupt controller maps hardware IRQ lines to ordinary interrupts. PC/AT-compatible machines have two 8259 chips, a master and a slave (the original IBM PC had only one). If the master 8259 is mapped to interrupts 0x20 to 0x27, then every tick of the system timer, which is connected to IRQ 0, raises interrupt 0x20.
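The mapping from IRQ line to interrupt vector is simple arithmetic: each 8259 is programmed with a base vector whose low three bits are zero, and IRQ n on that chip (n from 0 to 7) raises vector base + n. A sketch in C, with the hypothetical name `irq_to_vector`:

```c
#include <stdint.h>

/* An 8259 maps its eight IRQ inputs to eight consecutive vectors.
   The base's low three bits are ignored, so they are masked off here. */
uint8_t irq_to_vector(uint8_t base, uint8_t irq) {
    return (uint8_t)((base & 0xF8) | (irq & 7));
}
```

With the base of 0x20 from the example, IRQ 0 (the timer) raises interrupt 0x20. The PC BIOS itself programs the master 8259 with base 0x08, so under DOS the timer tick arrives as interrupt 0x08.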
At the very beginning of memory lies the Interrupt Vector Table (IVT), which contains pointers to all the Interrupt Service Routines (ISRs).
The pointers to the ISRs for the different interrupts are stored in this format:
[offset_0][segment_0][offset_1][segment_1][... ...][offset_255][segment_255]
(each offset and each segment is 16 bits wide, so every entry is a 4-byte pointer)
There are 256 different interrupts, each with its own pointer, so the table occupies 256 * 4 = 1024 bytes (0x000 to 0x3FF).
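Because every entry is a 4-byte pointer, the entry for interrupt n starts at linear address n * 4 (that is, 0000:n*4), with the offset stored before the segment and each 16-bit word little-endian. The following C sketch decodes an entry from a copy of the first 1024 bytes of memory; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* One IVT entry: an ISR address stored as offset, then segment. */
struct far_ptr { uint16_t offset, segment; };

/* The entry for vector n starts at linear address n * 4. */
uint32_t ivt_entry_address(uint8_t vector) { return (uint32_t)vector * 4; }

/* Decode vector n from a copy of the IVT (little-endian words). */
struct far_ptr ivt_read(const uint8_t ivt[1024], uint8_t vector) {
    const uint8_t *e = ivt + ivt_entry_address(vector);
    struct far_ptr p;
    p.offset  = (uint16_t)(e[0] | (e[1] << 8));
    p.segment = (uint16_t)(e[2] | (e[3] << 8));
    return p;
}
```

The entry for interrupt 0x21 is at 0x84, and the last entry, for interrupt 255, ends at 0x3FF, which is where the IVT ends in the memory layout.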
Example
This NASM program is an example of real mode code that prints "Hello world!" to the screen by writing directly to video memory.
[org 0x100]          ; .COM programs are loaded at offset 0x100
[bits 16]
[section .text]

    mov ax, cs       ; cs = code segment
    mov ds, ax       ; ds = cs (this way, we don't have to care much about where our data is located)
    mov ax, 0xB800   ; 0xB8000 is the base of the text video memory
    mov es, ax       ; Remember the memory model!
    mov si, text     ; Remember that ds:si -> es:di
    xor di, di       ; a xor a is always zero (di is given the value 0)

around:
    mov al, [ds:si]  ; give al the value of what ds:si points to
    cmp al, 0        ; check whether al contains zero ("Hello world!",0)
    je stop          ; if so, stop writing to the screen
    mov [es:di], al  ; move the content of al to es:di (text video memory)
    inc si           ; select the next byte in the "Hello world!" string
    add di, 2        ; skip the attribute byte and go to the next position on the screen
    jmp around       ; go back to the beginning of the loop

stop:
    ret              ; return to the caller (for a .COM program, this returns to DOS)

text db "Hello world!",0
This program can be assembled into a DOS-compatible .COM file. It can also be assembled for any other operating system running in real mode, or even for no operating system at all, although some minor changes may be needed in such cases. It assumes a color text mode with video memory at segment 0xB800 (monochrome adapters use 0xB000). Because it does not use the screen functions provided by DOS or the BIOS, it does not move the cursor, and the text it prints will disappear once the program has terminated and other programs write to video memory.
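The line `add di, 2` is there because each character cell in text-mode video memory takes two bytes: the character at an even offset and its attribute (color) byte at the following odd offset. The C sketch below mirrors the example's loop on a hypothetical in-memory 80x25 buffer rather than real video memory; like the assembly, it writes only the character bytes and leaves the attribute bytes as they were.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for 80x25 text video memory:
   2 bytes per cell, character at even offsets, attribute at odd offsets. */
enum { COLS = 80, ROWS = 25 };
static uint8_t video[COLS * ROWS * 2];

/* Mirror of the example loop: copy bytes until the terminating zero,
   storing each at video[di] and advancing di by 2 so attributes are untouched.
   Returns the final value of di. */
size_t write_text(const char *text) {
    size_t di = 0;
    for (const char *si = text; *si != '\0' && di < sizeof video; ++si) {
        video[di] = (uint8_t)*si;
        di += 2;
    }
    return di;
}
```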