This page mostly describes the x86 instruction set.
Basics
What is Assembly Language
Assembly language is a low-level programming language for a computer or other programmable device.
What is the difference between assembly language and high-level programming languages
Each assembly language is specific to a given computer architecture (instruction set).
High-level programming languages are mostly portable across multiple systems.
How is the source code of a high-level programming language converted to the executable machine code
Via a Compiler
How is the source code of an assembly language converted to the executable machine code
Via an Assembler
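As a rough illustration of what an assembler does, the C sketch below translates two instructions into machine-code bytes. Both instructions are used in the Hello World program later on this page. The sketch relies on two real x86 encodings:
- `mov r32, imm32` is opcode `B8` plus the register number, followed by a 4-byte immediate.
- `int imm8` is `CD` followed by one byte.

The helper names `encode_mov_imm32` and `encode_int` are hypothetical. A real assembler such as NASM also handles parsing, labels, and many more instruction forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Register numbers used by x86 in "opcode + register" encodings. */
enum { EAX = 0, ECX = 1, EDX = 2, EBX = 3 };

/* Encode "mov r32, imm32": opcode 0xB8 plus the register number,
   followed by the 32-bit immediate in little-endian byte order.
   Returns the number of bytes written (always 5). */
size_t encode_mov_imm32(uint8_t *out, int reg, uint32_t imm) {
    out[0] = (uint8_t)(0xB8 + reg);
    for (int i = 0; i < 4; i++) {
        out[1 + i] = (uint8_t)(imm >> (8 * i));
    }
    return 5;
}

/* Encode "int imm8": opcode 0xCD followed by the interrupt vector.
   Returns the number of bytes written (always 2). */
size_t encode_int(uint8_t *out, uint8_t vector) {
    out[0] = 0xCD;
    out[1] = vector;
    return 2;
}
```

For example, encoding `mov eax, 4` fills the buffer with `B8 04 00 00 00`, and `int 0x80` becomes `CD 80`.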
List examples of some assemblers
NASM - Free, well documented, can be used on both Linux and Windows
MASM - Microsoft Assembler
TASM - Borland Turbo Assembler
GAS - GNU Assembler
Why do assembly languages exist?
Each computer has a microprocessor that performs arithmetic, logical, and control operations.
Each family of processors has its own set of instructions.
Processors understand only machine language instructions, which are sequences of ones and zeros. Developing software using only ones and zeros is too hard and complex. The solution is an assembly language for each instruction set. In it, instructions are represented as symbolic code in a more understandable form.
Advantages of understanding the assembly language
You will know:
- How applications communicate with the operating system, processor and BIOS
- How data is represented in memory and other external devices
- How the processor accesses and executes instructions
- How instructions access and process data
Advantages of the assembly language
Programs can use less memory (RAM)
Programs can run faster (less execution time)
Suitable for time-critical jobs
Processor, memory and registers.
Briefly describe the processor
The processor executes program instructions.
Briefly describe registers
Registers are small, fast storage locations inside the processor that hold data and addresses.
Briefly describe memory
Memory (RAM) stores data and program instructions. Its transfer speed is much higher than that of an HDD or SSD, but lower than that of registers.
What is a bit
The smallest unit of storage is a bit, which can be ON (1) or OFF (0).
What is the name for a group of 8 bits?
A group of 8 related bits is called a byte.
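A minimal C sketch of how the 8 bits of one byte can be inspected individually. The helper names `byte_to_bits` and `bit_is_on` are hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Write the 8 bits of a byte into out (most significant bit first),
   as '1' for ON and '0' for OFF, followed by a terminating '\0'.
   out must have room for 9 characters. */
void byte_to_bits(uint8_t value, char out[9]) {
    for (int i = 0; i < 8; i++) {
        out[i] = (value & (0x80u >> i)) ? '1' : '0';
    }
    out[8] = '\0';
}

/* Return 1 if bit number n (0 = least significant) is ON, else 0. */
int bit_is_on(uint8_t value, int n) {
    return (value >> n) & 1u;
}
```

For example, the byte `0xA5` is written out as `10100101`: bits 0, 2, 5, and 7 are ON.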
Which data sizes are supported by the processor?
- Word: a 2-byte (16 bit) data item
- Doubleword: a 4-byte (32 bit) data item
- Quadword: an 8-byte (64 bit) data item
- Paragraph: a 16-byte (128 bit) area
- Kilobyte: 1024 bytes (a unit of memory size)
- Megabyte: 1,048,576 bytes (a unit of memory size)
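The sizes above correspond to C's fixed-width integer types. The type and macro names in this sketch (`word_t`, `dword_t`, `paragraph_t`, `KILOBYTE`, and so on) are illustrative assumptions, not standard names.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width C types matching the x86 data sizes. */
typedef uint8_t  byte_t;   /* 1 byte,  8 bits  */
typedef uint16_t word_t;   /* 2 bytes, 16 bits */
typedef uint32_t dword_t;  /* 4 bytes, 32 bits */
typedef uint64_t qword_t;  /* 8 bytes, 64 bits */

/* A paragraph is a 16-byte (128 bit) area of memory. */
typedef struct { uint8_t bytes[16]; } paragraph_t;

/* Units used when describing larger amounts of memory. */
#define KILOBYTE ((size_t)1024)
#define MEGABYTE (KILOBYTE * 1024)
```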
Binary number system
The base is 2 (digits 0 and 1).
Hexadecimal number system
The base is 16 (digits 0-9 and A-F).
Octal number system
The base is 8 (digits 0-7).
Decimal number system
The base is 10 (digits 0-9).
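To make the bases concrete, this C sketch converts a number to its digit string in any base from 2 to 16. It then parses the string back with the standard `strtoul`. The helpers `to_base` and `from_base` are hypothetical names.

```c
#include <stdlib.h>

/* Convert an unsigned value to its digit string in the given base (2..16).
   out must have room for at least 65 characters. Returns out. */
char *to_base(unsigned long value, int base, char *out) {
    const char digits[] = "0123456789ABCDEF";
    char tmp[65];
    int n = 0;
    do {
        tmp[n++] = digits[value % base];  /* lowest digit first */
        value /= base;
    } while (value != 0);
    /* Digits were produced least-significant first, so reverse them. */
    for (int i = 0; i < n; i++) {
        out[i] = tmp[n - 1 - i];
    }
    out[n] = '\0';
    return out;
}

/* Parse a digit string in the given base using the standard strtoul. */
unsigned long from_base(const char *text, int base) {
    return strtoul(text, NULL, base);
}
```

For example, 255 is `11111111` in binary, `377` in octal, `255` in decimal, and `FF` in hexadecimal.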
What are the steps of an execution cycle of the processor
- Fetching the instruction from memory
- Decoding or identifying the instruction
- Executing the instruction
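The cycle can be sketched as a loop in C. The machine here is a hypothetical toy with one register and three made-up opcodes; it is not real x86. It only shows the fetch, decode, and execute steps in order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy one-register machine (not real x86).
   Every opcode except OP_HALT is followed by one operand byte. */
enum { OP_HALT = 0x00, OP_LOAD = 0x01, OP_ADD = 0x02 };

/* Run a program stored in memory and return the final register value.
   Each loop iteration performs one fetch-decode-execute cycle. */
int run(const uint8_t *memory, size_t size) {
    size_t ip = 0;  /* instruction pointer: address of the next byte */
    int acc = 0;    /* the machine's single register (accumulator) */
    while (ip < size) {
        uint8_t opcode = memory[ip++];        /* 1. fetch the opcode */
        if (opcode == OP_HALT || ip >= size) {
            break;                            /* halt, or operand missing */
        }
        uint8_t operand = memory[ip++];       /* fetch its operand byte */
        switch (opcode) {                     /* 2. decode */
        case OP_LOAD:                         /* 3. execute */
            acc = operand;
            break;
        case OP_ADD:
            acc += operand;
            break;
        default:
            return acc;                       /* unknown opcode: stop */
        }
    }
    return acc;
}
```

Running the program `{OP_LOAD, 5, OP_ADD, 7, OP_HALT}` returns 12.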
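How does the processor store and load data?
x86 processors store multi-byte data in reverse-byte sequence (little-endian): the least significant byte is stored at the lowest memory address. When the data is loaded, the bytes are reversed back into their numeric order.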
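x86 is little-endian: the least significant byte goes to the lowest address. This C sketch stores and loads a 32-bit value in that order explicitly, so it behaves the same on any host. A small probe reports the host's own byte order. The names `store_le32`, `load_le32`, and `host_is_little_endian` are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in little-endian order (least significant byte
   at the lowest address), the order x86 uses. */
void store_le32(uint8_t out[4], uint32_t value) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(value >> (8 * i));
    }
}

/* Load a 32-bit little-endian value, reversing the byte order back. */
uint32_t load_le32(const uint8_t in[4]) {
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* Return 1 if the host CPU stores integers little-endian (true on x86). */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);  /* read the byte at the lowest address */
    return first == 1;
}
```

Storing `0x12345678` produces the bytes `78 56 34 12` in memory order.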
Kinds of memory addressing
- Absolute address: a direct reference to a specific location in memory.
- Segment address with offset: the starting address of a memory segment combined with an offset value within that segment.
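In x86 real mode (the 16-bit mode of the 8086), a segment:offset pair becomes a physical address as follows. The segment is shifted left by 4 bits (multiplied by 16) and the offset is added. The sketch below assumes real mode only; protected and 64-bit modes translate addresses differently. `physical_address` is a hypothetical helper name.

```c
#include <stdint.h>

/* Real-mode x86: physical address = segment * 16 + offset.
   The result is masked to 20 bits (1 MB), wrapping around as on the 8086. */
uint32_t physical_address(uint16_t segment, uint16_t offset) {
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

For example, `07C0:0000` and `0000:7C00` both refer to physical address `0x7C00`.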
Assembly syntax
Parts of an assembly program
- data section
- bss section
- text section
Data section
Used for declaring initialized data and constants
The start of this section is declared as: section .data
Bss section
Used for declaring uninitialized variables
The start of this section is declared as: section .bss
Text section
Used for declaring the code.
The start of this section is declared as:
section .text
global _start
_start:
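C compilers typically place a program's contents into the same three sections. This sketch assumes GCC or Clang on an ELF platform such as Linux:
- the initialized global goes to `.data`;
- the uninitialized array goes to `.bss`;
- the function's machine code goes to `.text`.

The names `init_counter`, `buffer`, and `buffer_is_zeroed` are illustrative.

```c
#include <stddef.h>

/* With GCC or Clang on ELF targets, these usually end up in: */
int init_counter = 5;      /* .data: initialized, writable data        */
static char buffer[64];    /* .bss:  uninitialized, zero-filled at load */

/* .text: the compiled machine code of this function.
   Returns 1 if every byte of the .bss buffer is still zero. */
int buffer_is_zeroed(void) {
    for (size_t i = 0; i < sizeof buffer; i++) {
        if (buffer[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```

After compiling, `nm` on the object file typically lists:
- `init_counter` with type `D` (data);
- `buffer` with `b` (bss);
- `buffer_is_zeroed` with `T` (text).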
Comments
Comments start with the semicolon (;) character.
A comment cannot span multiple lines; assembly language comments are single-line only.
A comment can start on its own line or after an instruction on the same line.
Types of Assembly Language Statements
- Executable instructions or instructions,
- Assembler directives or pseudo-ops, and
- Macros.
Executable instructions, or simply "instructions," direct the processor's actions. Each one includes an operation code (opcode) and corresponds to a single machine language instruction.
Assembler directives, or pseudo-ops, provide guidance to the assembler on aspects of the assembly process. They are non-executable and do not produce machine language instructions.
Macros act as a mechanism for text substitution.
Syntax of Assembly Language Statements
Assembly language statements are entered one statement per line.
[label] mnemonic [operands] [;comment]
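The statement layout can be illustrated with a small C sketch that splits one line into its four fields. It is a simplified, hypothetical parser, not how NASM works internally. It makes two assumptions:
- a label is the first token and ends with a colon;
- no `;` appears inside a string literal.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Fields of one statement: [label] mnemonic [operands] [;comment]. */
typedef struct {
    char label[32];
    char mnemonic[16];
    char operands[64];
    char comment[64];
} statement_t;

/* Copy text[0..len) into dst without leading/trailing whitespace,
   truncating to fit dst_size (including the terminating '\0'). */
static void copy_trimmed(char *dst, size_t dst_size, const char *text, size_t len) {
    while (len > 0 && isspace((unsigned char)text[0])) { text++; len--; }
    while (len > 0 && isspace((unsigned char)text[len - 1])) { len--; }
    if (len >= dst_size) len = dst_size - 1;
    memcpy(dst, text, len);
    dst[len] = '\0';
}

/* Split one line into its optional label, mnemonic, operands, and comment. */
void parse_statement(const char *line, statement_t *st) {
    memset(st, 0, sizeof *st);

    /* Everything after the first ';' is a comment. */
    const char *semi = strchr(line, ';');
    size_t code_len = semi ? (size_t)(semi - line) : strlen(line);
    if (semi) copy_trimmed(st->comment, sizeof st->comment, semi + 1, strlen(semi + 1));

    const char *p = line;
    const char *end = line + code_len;
    while (p < end && isspace((unsigned char)*p)) p++;

    /* Read the first token; it is a label if it ends with ':'. */
    const char *tok = p;
    while (p < end && !isspace((unsigned char)*p)) p++;
    if (p > tok && p[-1] == ':') {
        copy_trimmed(st->label, sizeof st->label, tok, (size_t)(p - tok - 1));
        while (p < end && isspace((unsigned char)*p)) p++;
        tok = p;  /* the mnemonic follows the label */
        while (p < end && !isspace((unsigned char)*p)) p++;
    }
    copy_trimmed(st->mnemonic, sizeof st->mnemonic, tok, (size_t)(p - tok));

    /* Whatever remains before the comment holds the operands. */
    copy_trimmed(st->operands, sizeof st->operands, p, (size_t)(end - p));
}
```

For example, `_start: mov eax, 4 ; syscall number` splits into four fields:
- label `_start`
- mnemonic `mov`
- operands `eax, 4`
- comment `syscall number`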
Hello World program in the Assembly Language
This example shows how to write a simple program in assembly for the x86 architecture that prints "Hello, World!" to the screen using Linux system calls.
section .data
    msg db 'Hello, World!', 0   ; null-terminated string

section .text
    global _start               ; entry point for the program

_start:
    ; Write the string to stdout (file descriptor 1)
    mov eax, 4                  ; syscall number for sys_write
    mov ebx, 1                  ; file descriptor 1 (stdout)
    mov ecx, msg                ; pointer to the string
    mov edx, 13                 ; length of the string
    int 0x80                    ; invoke the system call

    ; Exit the program
    mov eax, 1                  ; syscall number for sys_exit
    xor ebx, ebx                ; exit status 0
    int 0x80                    ; invoke the system call
Explanation
1. Section .data: Defines the data segment, which contains the string "Hello, World!".
2. Section .text: Defines the code segment where the program starts executing.
3. System Call (int 0x80): This is the interface to Linux system calls.
   - eax = 4: Specifies the sys_write system call, which writes to a file descriptor (stdout in this case).
   - ebx = 1: Specifies the file descriptor for standard output.
   - ecx = msg: Points to the memory address of the message string.
   - edx = 13: Specifies the length of the string.
4. The program exits with a status of 0 (xor ebx, ebx clears the register).
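For comparison, here is a C sketch of the write step using the POSIX `write()` function. It asks the kernel for the same write service that the assembly program requests with `eax = 4`. It assumes a POSIX system such as Linux, and `write_hello` is a hypothetical name. Unlike the assembly version, the C library chooses the system-call mechanism; for example, 64-bit x86 Linux uses the `syscall` instruction rather than `int 0x80`.

```c
#include <string.h>
#include <unistd.h>

/* Write the message to a file descriptor with the write() system call
   wrapper: the same kernel write service (sys_write) that the assembly
   program requests. Returns the number of bytes written, or -1 on error. */
ssize_t write_hello(int fd) {
    const char msg[] = "Hello, World!";
    return write(fd, msg, strlen(msg));  /* strlen(msg) is 13 */
}
```

Calling `write_hello(1)` prints the message to standard output, as `int 0x80` does in the assembly version. Returning 0 from `main` then plays the role of `sys_exit` with status 0.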
Compilation and Execution
To assemble and run this code on a Linux system, follow these steps:
- Save the code in a file (e.g., hello.asm or any name you prefer).
- Assemble and link it:
nasm -f elf32 hello.asm -o hello.o
ld -m elf_i386 -s -o hello hello.o
- Run the resulting executable:
./hello
This program will print "Hello, World!" to the screen.
External links
https://www.tutorialspoint.com/assembly_programming/index.htm