3 Reading from a file

As the first example, let us investigate how to read bytes from a file using system calls.

According to a system call table for 32-bit x86 Linux, reading from a file is system call 3, named either read or sys_read. Next, we look up “man 2 read”, which gives the C prototype of read:

ssize_t read(int fd, void *buf, size_t count);

Because read is system call number 3, register eax must hold the value 3. Register ebx holds the file descriptor of the file we are trying to read (the fd parameter), ecx holds a pointer to the memory location where the bytes read are stored (buf), and edx specifies the number of bytes to read (count).

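The return value, which is the number of bytes actually read, is placed in register eax. It can be smaller than the number requested (for example, when fewer bytes remain in the file), it is 0 at end of file, and it is negative if an error occurred.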

Note that three files are normally already open when a Linux process starts. The standard input file has file descriptor 0, the standard output file has file descriptor 1, and the standard error file has file descriptor 2.
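In C, <unistd.h> names these three descriptors STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO. The sketch below, assuming a POSIX system, uses those constants; standard_stream_name and fd_is_open are hypothetical helpers, and checking whether a descriptor is open with fcntl(fd, F_GETFD) is one common technique rather than something the assembly program needs.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Name the three descriptors a process normally starts with. */
const char *standard_stream_name(int fd) {
    switch (fd) {
    case STDIN_FILENO:  return "standard input";   /* 0 */
    case STDOUT_FILENO: return "standard output";  /* 1 */
    case STDERR_FILENO: return "standard error";   /* 2 */
    default:            return NULL;               /* not a standard stream */
    }
}

/* Return 1 if fd is an open descriptor in this process, 0 otherwise.
   F_GETFD fails with EBADF for descriptors that are not open. */
int fd_is_open(int fd) {
    return fcntl(fd, F_GETFD) != -1;
}
```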

The following is a simple program that attempts to read 5 characters from the standard input file:

 1 .data
 2 buffer: .fill 5     # allocate 5 bytes (initialized to 0)
 3 .text
 4 .global _start
 5 _start:
 6   movl $3,%eax      # system call number 3 (read)
 7   movl $0,%ebx      # file descriptor 0 is the standard input file
 8   movl $buffer,%ecx # the address where the bytes read are stored
 9   movl $5,%edx      # request to read 5 bytes
10   int  $0x80        # ask the OS to perform the read
11   movl $1,%eax      # system call number 1 (exit)
12   movl $0,%ebx      # exit code 0
13   int  $0x80        # ask the OS to perform the exit
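For comparison, the same steps can be sketched in C, assuming a POSIX system: read() receives the descriptor, the buffer address and the byte count that the assembly code loads into ebx, ecx and edx. Unlike the original program, the hypothetical helper read_and_echo also writes the bytes it read to a second descriptor, which is one way to see the result without a debugger; both descriptors are parameters so that the sketch can be exercised with pipes.

```c
#include <unistd.h>

#define BUFFER_SIZE 5   /* matches `buffer: .fill 5` in the assembly version */

/* Read up to BUFFER_SIZE bytes from in_fd (the assembly program uses fd 0),
   then write whatever was read to out_fd (an addition for illustration).
   Returns the number of bytes read, 0 at end of file, or -1 on failure. */
ssize_t read_and_echo(int in_fd, int out_fd) {
    char buffer[BUFFER_SIZE];
    /* same three arguments as ebx, ecx and edx in the assembly version */
    ssize_t n = read(in_fd, buffer, sizeof buffer);
    if (n <= 0)
        return n;   /* 0 = end of file, -1 = error */
    if (write(out_fd, buffer, (size_t)n) != n)
        return -1;
    return n;
}
```

In a complete program, main could call read_and_echo(STDIN_FILENO, STDOUT_FILENO) and then return 0, matching the exit code of the assembly version.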

Because the program has no way to report what it has read, you have to run it in gdb to test it. It is also best to prepare an input file, which is just a regular text file with some content.

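In gdb, instead of running the program with a plain run command, you can redirect the program's standard input from a specific file using the < operator. For example, the following gdb command runs the code and also specifies that the file test.input should be used as the standard input file in place of the keyboard:

run < test.input

This method is better than using the keyboard (console) as the standard input file, because the console is also used by gdb itself. Having both the program being debugged and gdb read from the same console can lead to confusion.

To check the program, set a breakpoint on line 11 (break 11) before running it. Because gdb stops before executing the instruction on that line, the read system call on line 10 has already completed, and you can inspect eax (the number of bytes read) as well as the memory content at buffer, for example with print $eax and x/5cb &buffer. Setting a breakpoint by line number requires that the program was assembled with debugging information (for example, with the -g option of as).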
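The redirection is set up before the program starts, so the assembly program itself needs no changes. A minimal C sketch of what the redirection amounts to, assuming a POSIX system: open the file and duplicate its descriptor onto descriptor 0 with dup2(). The helper name redirect_stdin_from is hypothetical; in practice gdb (or the shell) arranges this for you.

```c
#include <fcntl.h>
#include <unistd.h>

/* Make `path` the process's standard input, roughly what `< test.input`
   arranges before a program starts. Returns 0 on success, -1 on failure. */
int redirect_stdin_from(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (dup2(fd, STDIN_FILENO) < 0) {   /* fd 0 now refers to the file */
        close(fd);
        return -1;
    }
    if (fd != STDIN_FILENO)
        close(fd);   /* the duplicate on fd 0 keeps the file open */
    return 0;
}
```

After redirect_stdin_from("test.input") succeeds, read(0, buf, 5) returns the first five bytes of the file, just as the assembly program's read does under run < test.input.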