CS 140: Argument Passing

The MIPS Calling Convention

What follows is a quick and dirty discussion of the MIPS calling convention. Some of the basics should be familiar from CS 107, and if you've already taken CS 143 or EE 182, then you should have seen even more of it. I've omitted some of the complexity, since this isn't a class in how MIPS function calls work, so don't expect this to be exactly correct in full, gory detail.

Whenever a function call happens, you need to put the arguments on the call stack for that function, before the code for that function executes, so that the callee has access to those values. The caller has to be responsible for this (be sure you understand why). Therefore, when you compile a program, the assembly code emitted will have in it, before every function call, a bunch of instructions that prepare for the call in whatever manner is conventional for the machine you're working on. This includes saving registers as needed, putting stuff on the stack, saving the return location somewhere (so that when the callee finishes, it knows where the caller code is), and some other bookkeeping stuff. Then you do the jump to the callee's code, and it goes along, assuming that the stack and registers are prepared in the appropriate manner. When the callee is done, it looks at the return location as saved earlier, and does a jump back to that location.

If you think about it, some of these things should remind you of context switching.

As an aside: in general, function calls are not cheap: you have to do a bunch of memory writes to prepare the stack, you need to save and restore registers before and after a function call, you need to write the stack pointer, you have a couple of jumps which probably wreck some of your caches, ... This is why inlining code can be much faster.

MIPS has a few conventions which do speed things up. The one that is particularly relevant for the assignment is this: while conceptually all the arguments should go on the stack, MIPS convention designates registers 4-7 as reserved for the first four arguments to a function. Therefore, optimized code will not bother putting the arguments on the stack if there are four or fewer arguments to a function, since it can just put them all in registers. Moreover, code (optimized or not) will generally assume that it can access the arguments via the registers instead of the stack.
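
To make the register rule concrete, here is a tiny sketch of the simplified convention described above. The names kFirstArgReg, kNumArgRegs, and argumentRegister are made up for this illustration; they are not part of Nachos or of any MIPS toolchain.

```cpp
// Simplified rule from the text: the first four arguments travel in
// registers 4-7; any further arguments are found on the stack.
// All names here are hypothetical, chosen only for this sketch.
const int kFirstArgReg = 4;
const int kNumArgRegs = 4;

// Returns the register holding argument `index` (0-based), or -1 if the
// argument is only available on the stack.
int argumentRegister(int index) {
    if (index >= 0 && index < kNumArgRegs) {
        return kFirstArgReg + index;
    }
    return -1;
}
```

For main, this says that argc (argument 0) lives in register 4 and argv (argument 1) lives in register 5.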


Argument Passing to main

In main's case, there is no caller to prepare the stack before it runs. Therefore, the kernel needs to do it. Fortunately, since there's no caller, there are no registers to save, no return address to deal with, etc. The only difficult detail to take care of, after loading the code, is putting the arguments to main on the stack and in the correct registers.

(The above is a small lie: I think that most compilers will emit code where main isn't strictly speaking the first function. I've never been too clear on exactly how that works, but you don't need to think about it too much. If you want to look into it more, try disassembling a program and looking around a bit. However, you can just act as if main is the very first function called.)

Nachos is written for the MIPS architecture (the machine directory simulates a MIPS chip, despite the fact that we're running it on a Sparc; this is because Nachos was originally written for a MIPS machine, and later ported to Solaris by Prof. Mendel Rosenblum). Therefore, we need to adhere to the MIPS calling convention, which is detailed in the FAQ (actually, this too is a lie: we are using a simplified version of the MIPS calling convention). Basically, you put all the arguments on the stack, move the stack pointer appropriately, and also put the first four arguments in registers 4-7 for easy access to those arguments. A program will assume that all these conditions hold when it begins running.

So, what are the arguments to main? Just two: an int (argc) and a char ** (argv). argv is an array of strings (argument vector), and argc is the number of strings in that array (argument count). However, the hard part isn't these two things. The hard part is getting all the individual strings in the right place. As we go through the procedure, let us consider the following example command: /bin/ls -l *.h *.c.

The first thing to do is to break the command line into individual strings: /bin/ls, -l, *.h, and *.c. These constitute the arguments of the command (including the program name itself).
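
A minimal sketch of the splitting step, assuming that arguments are separated by whitespace and that no quoting or wildcard expansion is wanted (so *.h stays the literal string "*.h"). The function name splitCommandLine is hypothetical, and kernel code may not be able to use the standard library this freely.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a command line on whitespace. No quoting, escaping, or wildcard
// expansion is attempted. (Hypothetical helper for illustration.)
std::vector<std::string> splitCommandLine(const std::string& commandLine) {
    std::vector<std::string> args;
    std::istringstream in(commandLine);
    std::string word;
    while (in >> word) {  // operator>> skips runs of whitespace
        args.push_back(word);
    }
    return args;
}
```

Calling splitCommandLine("/bin/ls -l *.h *.c") yields the four strings listed above, so argc is simply the size of the returned vector.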

These individual, null-terminated strings should be placed on the user stack. They may be placed in any order, as you'll see shortly, without affecting how main works, but for simplicity let's assume they are in reverse order (keeping in mind that the stack grows downward on a MIPS machine). As we copy the strings onto the stack, we record their (virtual) stack addresses. These addresses will become important when we write the argument vector (below).
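
Here is a rough sketch of the string-copying step. It assumes a 64 KB byte array standing in for user memory (to match the 16-bit address space used in the example later on); UserStack and pushStrings are hypothetical names, and a real implementation would go through the address space's translation rather than a flat array.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for user memory: a flat 64 KB byte array.
struct UserStack {
    std::vector<std::uint8_t> memory;
    std::uint32_t sp;  // the stack grows downward from here
    UserStack() : memory(0x10000, 0), sp(0x10000) {}
};

// Copies each string (with its terminating '\0') onto the stack, last
// argument first, and returns the address at which each one was stored.
std::vector<std::uint32_t> pushStrings(UserStack& stack,
                                       const std::vector<std::string>& args) {
    std::vector<std::uint32_t> addresses(args.size());
    for (std::size_t i = args.size(); i-- > 0;) {
        std::size_t length = args[i].size() + 1;  // include the '\0'
        stack.sp -= static_cast<std::uint32_t>(length);
        std::memcpy(&stack.memory[stack.sp], args[i].c_str(), length);
        addresses[i] = stack.sp;  // remembered for the argument vector
    }
    return addresses;
}
```

For the example command, the recorded addresses are 0xffed, 0xfff5, 0xfff8, and 0xfffc for /bin/ls, -l, *.h, and *.c respectively.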

After we push all of the strings onto the stack, we adjust the stack pointer so that it is word-aligned: that is, we move it down to the next 4-byte boundary. This is required because we will next be placing several words of data on the stack, and they must be aligned in order to be read correctly. In our example, as you'll see below, the strings start at address 0xffed. 4 bytes below that would be at 0xffe9, so we could in theory put the next word on the stack there. However, since the machine will only read aligned words, we instead round the stack pointer down to 0xffec, leaving one byte of padding, so that the next word goes at 0xffe8.
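
The rounding itself is a one-line bit mask. This is only a sketch, and alignDownToWord is a hypothetical name:

```cpp
#include <cstdint>

// Rounds an address down to the nearest multiple of 4 by clearing its two
// low-order bits. An address that is already aligned comes back unchanged.
std::uint32_t alignDownToWord(std::uint32_t address) {
    return address & ~static_cast<std::uint32_t>(3);
}
```

With the strings ending at 0xffed, alignDownToWord(0xffed) gives 0xffec, and the first word pushed after that occupies 0xffe8 through 0xffeb, which agrees with the picture.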

Once we align the stack pointer, we then push the elements of the argument vector (that is, the addresses of the strings /bin/ls, -l, *.h, and *.c) onto the stack. This must be done in reverse order, such that argv[0] is at the lowest virtual address (again, because the stack is growing downward). This is because we are now writing the actual array of strings; if we write them in the wrong order, then the strings will be in the wrong order in the array. This is also why, strictly speaking, it doesn't matter what order the strings themselves are placed on the stack: as long as the pointers are in the right order, the strings themselves can really be anywhere. After we finish, we note the stack address of the first element of the argument vector, which is argv itself.

Finally, we push argv (that is, the address of the first element of the argv array) onto the stack, along with the length of the argument vector (argc -- 4 in this example). This must also be done in this order, since argc is the first argument to main and therefore comes first (at the smaller address) on the stack. In addition, we place argc in machine register 4 and argv in register 5 (since they are the first two arguments to the function), and we leave the stack pointer pointing at the location where argc is, because that is the top of the stack, not at the location directly below argc.

All of which may sound very confusing, so here's a picture which will hopefully clarify what's going on. This represents the state of the stack and the relevant registers right before the beginning of the user program (assuming for this example a 16-bit virtual address space with addresses from 0x0000 to 0xffff):

General name       Address      This example      Contents
-------------   -------------   -------------   -------------
*argv[argc-1]      0xfffc         *argv[3]          *.c\0
-------------   -------------   -------------   -------------
*argv[argc-2]      0xfff8         *argv[2]          *.h\0
-------------   -------------   -------------   -------------
     ...           0xfff5         *argv[1]          -l\0
-------------   -------------   -------------   -------------
  *argv[0]         0xffed         *argv[0]        /bin/ls\0
-------------   -------------   -------------   -------------
 word-align        0xffec        word-align           \0             
-------------   -------------   -------------   -------------
argv[argc-1]       0xffe8          argv[3]          0xfffc
-------------   -------------   -------------   -------------
argv[argc-2]       0xffe4          argv[2]          0xfff8
-------------   -------------   -------------   -------------
     ...           0xffe0          argv[1]          0xfff5
-------------   -------------   -------------   -------------
   argv[0]         0xffdc          argv[0]          0xffed
-------------   -------------   -------------   -------------
    argv           0xffd8           argv            0xffdc
-------------   -------------   -------------   -------------
    argc           0xffd4           argc              4
-------------   -------------   -------------   -------------
        
         Stack pointer:                             0xffd4
         Register r4:               argc              4
         Register r5:               argv            0xffdc
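
If you want to check your understanding of the picture, the whole procedure can be sketched against a plain byte array. Everything here is an assumption made for illustration: MainArgs, storeWord, loadWord, and setUpMainArgs are hypothetical names, the flat array stands in for user memory, and words are stored least significant byte first so that the sketch does not depend on the host's byte order. Like the picture, it writes exactly argc pointers and nothing after them.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical result: the values the kernel would load into the stack
// pointer and into registers 4 and 5 before starting the program.
struct MainArgs {
    std::uint32_t sp;
    std::uint32_t argc;  // goes in register 4
    std::uint32_t argv;  // goes in register 5
};

// Writes a 32-bit word at `address`, least significant byte first.
void storeWord(std::vector<std::uint8_t>& memory, std::uint32_t address,
               std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        memory[address + i] = static_cast<std::uint8_t>((value >> (8 * i)) & 0xff);
    }
}

// Reads back a word written by storeWord.
std::uint32_t loadWord(const std::vector<std::uint8_t>& memory,
                       std::uint32_t address) {
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(memory[address + i]) << (8 * i);
    }
    return value;
}

MainArgs setUpMainArgs(std::vector<std::uint8_t>& memory,
                       const std::vector<std::string>& args) {
    std::uint32_t sp = static_cast<std::uint32_t>(memory.size());
    std::vector<std::uint32_t> stringAddr(args.size());

    // 1. Strings, last argument first, each with its '\0'.
    for (std::size_t i = args.size(); i-- > 0;) {
        std::size_t length = args[i].size() + 1;
        sp -= static_cast<std::uint32_t>(length);
        std::memcpy(&memory[sp], args[i].c_str(), length);
        stringAddr[i] = sp;
    }

    // 2. Round the stack pointer down to a 4-byte boundary.
    sp &= ~static_cast<std::uint32_t>(3);

    // 3. The argv array, last element first, so argv[0] ends up lowest.
    for (std::size_t i = args.size(); i-- > 0;) {
        sp -= 4;
        storeWord(memory, sp, stringAddr[i]);
    }
    std::uint32_t argv = sp;

    // 4. argv, then argc, so that argc sits at the lower address.
    sp -= 4;
    storeWord(memory, sp, argv);
    sp -= 4;
    storeWord(memory, sp, static_cast<std::uint32_t>(args.size()));

    MainArgs result = {sp, static_cast<std::uint32_t>(args.size()), argv};
    return result;
}
```

Run on a 0x10000-byte array with the four example strings, this returns sp = 0xffd4, argc = 4, and argv = 0xffdc, matching the stack pointer and register values shown above.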

Nota Bene

When using the SUN version of Nachos, there's one final complication with placing the argv array elements, argv, and argc on the stack. The simulated machine (MIPS) and the host machine (Sparc) use different byte orders; so if we copy the integer "0x0000ffdc" directly from the kernel onto the stack, the simulated machine will actually interpret it as "0xdcff0000". In order to avoid this problem, we need to use the function "WordToMachine", which performs the conversion. So, for example, the following code would take the current stack pointer, decrement it by the space occupied by one word (4 bytes), push argc onto the stack, and re-write the stack pointer:

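int argc = 4;
int sp = machine->ReadRegister(StackReg);   // current user stack pointer
sp -= 4;                                    // make room for one word
int stackArgc = WordToMachine(argc);        // convert to the simulated machine's byte order
addrSpace->CopyIn(&stackArgc, sp, 4);       // copy the converted word into user memory
machine->WriteRegister(StackReg, sp);       // store the updated stack pointer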

This example assumes the existence of some function void AddrSpace::CopyIn(void* kernelSrc, int userDst, int length) to copy data from a kernel buffer to a user buffer -- your implementation may vary.

Note that this byte-swapping is not required for accesses to machine registers, only the main memory. And it's not required for the strings, either, since they aren't affected by byte order (because they're character-based, and characters are single-byte quantities).
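
To see what the conversion amounts to when the host and the simulated machine disagree on byte order, here is a sketch of a plain four-byte reversal. swapWordBytes is a hypothetical helper written for this illustration; it is not the Nachos source for WordToMachine, which you should simply call.

```cpp
#include <cstdint>

// Reverses the four bytes of a 32-bit word. Hypothetical helper for
// illustration only; use WordToMachine in the actual assignment.
std::uint32_t swapWordBytes(std::uint32_t word) {
    return ((word & 0x000000ffu) << 24) |
           ((word & 0x0000ff00u) << 8) |
           ((word & 0x00ff0000u) >> 8) |
           ((word & 0xff000000u) >> 24);
}
```

Applied to 0x0000ffdc this produces 0xdcff0000, which is exactly the misreading described above; swapping before the copy cancels it out. Applying the swap twice returns the original word.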


cs140-aut0405-staff@lists.stanford.edu