MIPS architecture

Memory organization
Processor organization
    Registers
     Instruction Set
Basic assembly language
Load and store
Arithmetic Operations
Input and Output System Calls
Function Calls and The Stack
Recursion
Floating Point Operations
Other Architectures

Memory organization


The purpose of memory is to store groups of bits, and deliver them (to the processor for loading into registers) upon demand. Most present-day computers store information in multiples of 8 bits, called a byte (or octet). Most also assign a numeric address to each byte. This is convenient because characters can be stored in bytes.
Memory addresses are 32-bit numbers, ranging from 0x00000000 to 0xFFFFFFFF. This is a large amount of memory, most computers do not have actual memory for all of this "address space."
Memory can hold both program instructions and data. One function of the operating system is to assign blocks of
memory for the instructions and data of each process (running program). Another thing a good operating system does is to allow many processes to run concurrently on the computer.

The SPIM simulator always assigns your program to these fixed, even numbered locations, for your convenience:

    0x00400000 - Text segment - program instructions
    0x10000000 - Data segment
    0x7FFFFFFF, and decreasing addresses - Stack segment

A word generally means the number of bits that can be transferred at one time on the data bus, and stored in a register. In the case of MIPS, a word is 32 bits, that is, 4 bytes.
Words are always stored in consecutive bytes, starting with an address that is divisible by 4. Caution: other processors, other definitions.
Some people refer to 16 bits as a word, to others it may mean 64 bits.
 

Storage order of words


How should one store a word (say, a register holding a number) in 4 bytes? There are two equally valid schemes: starting with the lowest numbered byte,

    Little Endian: Store the little end of the number (least significant bits) first, or
    Big Endian: Store the big end of the number first.

    (These terms come from Gulliver's Travels by Jonathan Swift, in which two parties are at war over whether hard-boiled eggs should be opened at the little end or the big end of the egg.)

MIPS is little-endian. One result of this is that character data appear to be stored "backwards" within words. Here is a representation of part of memory, storing the characters "Help" (0x706c6548) and then the number 32766  (0x00007ffe).
 
 

0x10000000 0x48 ("H")
                                      0x10000001 0x65 ("e")
                                      0x10000002 0x6c ("l")
                                      0x10000003  0x70 ("p")
                                      0x10000004 0xfe
                                      0x10000005 0x7f
                                      0x10000006 0x00
                                      0x10000007 0x00

 
 

Processor organization


What a processor does, its function, can be completely described in terms of its registers, where it stores information, and what it does in executing each instruction. in the MIPS, all registers hold 32 bits.
 

Registers


The program counter (PC) always holds the address of the next instruction. Normally it is incremented every time an instruction is executed. This controls the flow of the program.

There are 32 general purpose registers, numbered 0..31. In assembly language, they also have symbolic names, which are shown in the register window of the SPIM simulator. These names suggest conventions for use of the registers. Registers may be referred to by name or number in assembly language, for instance, register 4 can be referred to as $4 or $a0, and is usually used for an argument to a procedure.

The first and last registers are special:

    $0 is a constant 0, and cannot be changed
    $31 stores the return address from a procedure (function) call. It is named $ra.

The conventions for usage we will adhere to are (a compiler for a high-level language may or may not adhere to these):
 
 
    $1($at) Assembler temporary. We will avoidusing it.
    $v0 - $v1 - values returned by a function
    $a0 - $a3 - arguments to a function (Pascal terminologyis: parameters to a procedure)
    $t0 - $t9  - temporary use. Use freely but if youcall a procedure that procedure may change it.
    $s0 - $s7  - saved by procedures. When you are writinga procedure; you must save and restore previous values.

 

Instruction set


Instructions can be grouped into the following catagories. The rest of the course covers the operations performed by the instructions, and how to use them in writing programs. Here I will just give an example of each type of instruction.
 

Load and Store : Load data from memory into a register, or store register contents in memory
    examples:      lw   $t0, num1    #load word in num1 into $t0
                          sw  $t0, num2    #store word in $t0 into num2, ie. num2 := num1
                          li   $v0, 4          #load immediate value (constant) 4 into $v0
Operations: Arithmetic and Logic : perform the operation on data in 2 registers, store the result in a 3rd register
    example:     add $t0, $t3, $t4     # $t0 := $t3 + $t4
Jump and branch : alter the PC to control flow of the program, producing the effects of IF statements and loops
    A few specialized instructions.
 
There are also floating point instructions and registers, for real numbers. We will probably not get to these in this course.

Basic assembly language instructions

 

    load and store

    Arithmetic

    System calls


In this course we will be dealing with instructions on 3 different, relatively low, levels. Starting from the lowest (found left to right in the SPIM text segment window) they are:
 

  1.binary machine instructions (shown in hexadecimal)
  2.symbolic actual machine instructions. Registers shown numerically ($31)
  3.assembly language (nicer) instructions, the ones you write in a source file, registers can be referred to symbolically ($ra)


The correspondance is usually 1:1, however, since RISC is a Reduced instruction set, some normal and useful assembly language instructions can be coded as one or two other, less intuitive, instructions, so the assembler provides a limited translation service. In this section, I will only describe the nice instructions at level 3.
 
 
 

Load and store instructions


These instructions are used to move data between memory and the processor. There are always 2 operands, a register and an address. Addresses can take several forms, as we shall see, the simplest is the label of a data item.

In this example, a word (32 bits) is moved from Num1 to Num2, and a copy is left in register $t0:

        lw    $t0, Num1    # load word,  $t0 := Num1
        sw    $t0, Num2    # store word, Num2 := $t0

Note the order of the operands, the register is always first. The direction of movement is opposite.

There are similar instructions to load and store bytes (8 bits), half-words (16 bits), and double-words (64 bits)
 

Load immediate (constant)


There are also 2 instructions that load a constant, which is incorporated into the instruction itself (immediate):

    load immediate:    li  Rdest, Imm        # example:  li  $v0, 4
    load address:      la  Rdest, label      # example:  la  $a0, prompt_message

Since a label represents a fixed memory address after assembly, la is actually a special case of load immediate.

Note the difference between lw and la: If Num1 is at location [0x10001000] and contains 0xfffffffe (-2), then

    la   $a0, Num1   loads 0x10001000, while
    lw  $a0, Num1   loads 0xfffffffe
 
 
 

Arithmetic


Arithmetic is done in registers. There are 3 operands to an arithmetic operation:

  1.destination register
  2.Source 1 register
  3.Source 2 register or constant

The compact way of writing this pattern is:    add   Rdest, Rsrc1, Src2
The following code calculates the expression (x + 5 - y) * 35 / 3

                 lw    $t0, x             #load x from memory
           lw    $t1, y             #load y from memory
           add   $t0, $t0, 5        # x + 5
           sub   $t0, $t0, $t1      # x + 5 - y
           mul   $t0, $t0, 35       #(x + 5 - y)*35
           div   $t0, $t0, 3        #(x + 5 - y)*35/3
 

Arithmetic overflow, signed numbers


add, sub, mulo, & div check for overflow of signed numbers, an incorrect result, in which the sign bit is improperly altered. (mul doesn't do any checking). Usually we want to work with signed numbers. The processor generates an exception, "arithmetic overflow," and stops the program. This is preferable to generating incorrect results, as is usually the case with older processors, which generate a "flag" which programmers typically ignore.
 

Unsigned numbers


Sometimes we want to work with unsigned numbers. In this case overflow checking is unwanted. Generally we append a u to the operation code to express our intent. The unsigned operation codes are:

      addu,  subu,  mulou,   divu

This also applies to the short load instructions, lb and lh. They normally fill the unused part of the register with the sign bit of the short data. If we want to guarantee that the high part of the register is filled with 0's, we can use lbu & lhu instead. For example, we generally think of a character as an unsigned value.
 
 
 

Input and Output - System calls


Controlling the hardware responsible for input and output is a difficult and specialized job, and beyond the scope of this course. All computers have an operating system that provides many services for input/output anyway. The SPIM simulator provides 10 basic services. The ones we will be using right now are given here.

The others are for floating point (real) numbers. The call code always goes in $v0, and the system is called with syscall.
 
 

Service
Call code
Arguments (input)
Results
print integer
1
$a0 = integer
signed decimal integer printed in console window
print string
4
$a0 = address of string
string printed in console window
read integer
5
(none)
$v0 holds integer  that was entered
read string
8
$a0=address to store  $a1= length limit
characters are stored
exit
10
(none)
Ends the program

#    PROGRAM TO ADD TWO NUMBERS
#    Text segment
#               (all programs start with the next 3 lines)
        .text           #directive identifying the start of instructions
        .globl  __start

__start:

# -------------  print prompt on "console" --------------------
        la      $a0, prompt     # address of prompt goes in
        li      $v0, 4          # service code for print string
        syscall

# -------------  read in the integer --------------------------
        li      $v0, 5          # service code
        syscall
        sw      $v0, Num1       # store what was entered
# -------------- read another
        li      $v0, 5          # service code
        syscall
        sw      $v0, Num2       # store what was entered

# ------ Perfrom the addition, $a0 := Num1 + Num2
        lw      $t0, Num1
        add     $a0, $t0, $v0

# ------ print the sum, it is in $a0
        li      $v0, 1          # print integer service call
        syscall

# ------ print a final string identifying the result, and ending with a new line
        la      $a0, final
        li      $v0, 4
        syscall

        li      $v0, 10         # exit program service
        syscall

#------------------------------------------------------------------
#       Data segment
#------------------------------------------------------------------
        .data
Num1:   .word   0
Num2:   .word   0
prompt: .ascii  "Please type 2 integers, end each with the "
        .asciiz "Enter key:\n"
final:  .asciiz " is the sum.\n"

#------ end of file  ADDNUMS.ASM
 

Control structures


    Assembly language instructions for control of execution
    IF constructs
    Loops

So far we have seen how to move data between memory and processor registers, and how to do arithmetic in the registers. All are programs have been linear, a list of instructions carried out from start to exit. In this chapter we look at the assembly language instructions that alter the control flow of a program, and then at how these can be used to implement the high-level language constructs of if, while, & for.

The order of program execution is determined by the PC (program counter). It can be altered by changing the PC. This can be done by jump and branch instructions. Jump is just the notorious go to that has been supressed from high level languages. Branch instructions are so named because they offer an alternative, to branch or not. A "flow chart" of a program "branches" where there is a branch instruction.
 

Assembly language instructions for control of execution

 

Unconditional jump


A jump instruction puts the address of a label into the PC. Execution continues from that point. jar also saves the old PC value as a "return address" in register $ra ($31). This is used for procedure call, see chapter 8.
 

Instruction
Example
Meaning
j label j do_it_again next instruction is at the label do_it_again:
jal label(jump and link) jal my_proc execution of procedure my_proc will start.$ra holds address of instructionfollowing the jal
jr Rsrc (jump register) jr $ra Retrun from procedure call by putting  $ra value back into the PC 

Conditional branch


These instructions branch, that is, change the PC to the label, when a condition is true. The condition is always a comparison of 2 registers, or a register and a constant. Normally, they are compared as signed numbers. Where it matters, you get an unsigned compare by ending the operation code with a u.
 
Instruction Branch to label if Unsigned
 beq Rsrc1,Src2,label  Rsrc1 = Src2
 bne Rsrc1,Src2,label  Rsrc1 <> Src2
 blt Rsrc1,Src2,label  Rsrc1 < Src2 bltu
 bgt Rsrc1,Src2,label  Rsrc1 > Src2 bgtu
ble Rsrc1,Src2,label  Rsrc1 <= Src2 bleu
 bge Rsrc1,Src2,label  Rsrc1 >= Src2  bgeu

 

Comparison with zero is useful, and can be done simply by using $0 as Src2 (remember, it always holds the value 0). For your convenience, you can write comparison with zero by putting a z at the end of the operation code. Thus these two instructions are the same:

               blt  $t0, $0, cold
               bltz $t0, cold

(You can't use both u and z. There would be no point.)

These are all the instructions you need to write very complicated programs. In fact, you can write programs that are too complicated with them, so complicated that you have to explain the flow with so many arrows that your paper looks like a bunch of spaghetti. You must avoid "spaghetti programs" and use the well-known, disciplined constructs you have learned in your other programming classes.
 

IF constructs: choosing alternatives


In an IF statement, you "branch around" a section of code. In an IF - THEN - ELSE construct, one of two alternatives is executed. Here are some pseudo code examples, with assembly language translations. (Assume that swim and cycle are labels of .asciiz strings containing the phrases.)

IF num < 0 THEN
  num := - num
ENDIF
{absolute value}
                                IF temperature >= 25 THEN
                                    activity := "go swimming"
                                ELSE
                                   activity := "ride bicycle"
                               ENDIF
                               print (activity)
    lw   $t0, num
    bgez $t0, endif0
  # must be ltz------
     sub  $t0,$0,$t0   # 0-num
     sw   $t0, num
endif0:
  #  num is now guaranteed non-negative

                                lw  $t0, temperature
                                blt $t0, 25, else25
                                la  $a0, swim
                                j   endif25
                            else25:
                                la  $a0, cycle
                            endif25:
                                li  $v0, 4   # print string call code
                                syscall
 

Note that the assembly "branch around" is the opposite of the high-level condition. Also note the jump instruction needed before ELSE.
 

Loops: repetition


If a jump (or branch) reloads the PC with a lower address, that instruction will be reached again, and again... A loop results, as the same block of instructions is repeated. One must be careful to avoid an infinite loop by providing a branch out of the repetitive region that will eventually be taken. Two well-known and well-behaved constructs are WHILE and FOR. FOR is actually a special case of WHILE that uses a counter. Both are pre-test loops, the test (branch
instruction) comes at the beginning of the loop body, and there is a jump instruction at the end.

{sum non-negative numbers}
sum := 0
num := 0
WHILE num >= 0 DO
    sum := sum + num
    read (num)
ENDWHILE
                                   {write 10 lines}
                                   FOR countdown := 10 downto 1 DO
                                      print ("Hello again...")
                                   ENDFOR
#   add    $t0, $0, $0 really:
move   $t0, $0   #sum=0
move   $v0, $0   #num=0
while5:
      bltz   $v0, endwhile5
      add  $t0, $v0
                    #$v0 switches role here
      li   $v0, 5   #read int call
      syscall
      j  while5
endwhile5:

                                  # set up print arguments before loop
                                  la  $a0, hello_again
                                  li  $v0, 4    #print string call

                                  li  $t0, 10   # countdown
                                  for10:
                                       beqz  $t0, endfor10
                                       syscall   #print another line
                                       sub $t0, 1    #decr. countdown
                                       j  for10
                                  endfor10:

A REPEAT loop is a post-test loop. It may be used in special situations when it is known that the loop body should be executed at least once.

    Embarrasing historical note: The first version of FORTRAN implmented FOR loops as post-test loops. As a result, they always executed once even if it made no sense. It took 20 years to correct the situation.
 
 
 

Instruction formats and addressing modes


We now know enough instructions to write reasonable programs. It is time to look at various ways to refer to data, and at how machine instructions are encoded. This will give us insights into what hardware designers have to do, and explain why they have chosen to omit certain reasonable (from the programmers point of view) instructions from the instruction set. We have already seen some of these conversions. First, lets look at MIPS instruction formats.
 

    Instruction formats

    Addressing modes

    Memory refrencing - used with load and store instructions


 
 

Instruction formats


To keep the MIPS processor simple and fast, all instructions fit in 32-bit words, and there are just 3 different instruction layouts. Decoding of instructions starts with the leftmost 6 bits, the operation code. This determines how the rest of the bits will be handled. The formats are:
 

Register


For operations using only registers as operands, the operation code is 000000
 
6 bits
5 bits
5 bits
5 bits
5 bits
6 bits
                   000000
source reg
target reg
Destination
shift amt
function
                   000000
01010 
00111
00101
00000
100010 

The last line of the table gives as an example a subtract instruction, sub $5, $10, $7
 


This diagram shows how data flows from registers to the ALU (Arithmetic Logic Unit) and the result is stored back into the destination register, for the instruction

                                        add   $t3, $t1, $t2



Immediate


Some instructions include a constant value. This constant is included right in the instruction, so it is called an immediate value. In 32 bits, there is room for a 16 bit immediate, 6 bits of op. code, and 2 register numbers (5 bits each). A good example is add immediate (addi), which adds source register + immediate and stores in target register. Since the immediate value is shorter than the register, it is "sign-extended" to form a 32-bit signed number. Thus addi $t1,$t2,0xfffe (shown in table) actually adds 0xfffffffe, which is -2.
 

6 bits 5 bits 5 bits 16 bits
op code source reg target reg immediate value
001000 01010 01001 1111 1111 1111 1110

This layout is also used for load, store, and branch instructions. (see diagram for store)
 

Jump


The jump instructions (j & jal) use all 26 bits following the op code for the address to jump to. Since all instructions are word aligned, the last 2 bits of the
address are always 0, so they are left out of the instruction. Upon execution, the 26 bit jump target is shifted left 2 bits, and stored in the PC, this causes the jump
to take effect.

6 bits - op code 26 bits - jump target (words)
j 0x0040000c [see later]
000010 0x0100003

 
 
 
    Effectively this is a 28-bit address, so jumps can "only" target the first 256 Mbytes of memory. This could be a problem for the operating system of a machine with more that this amount of memory. Such a machine would  probably be manipulating massive amounts of data, and so could reserve the "low" memory for programs. Alternatively, compilers could be directed to use jump-register instructions or branch-always instructions instead of jumps.

Addressing modes


There are a number of ways to refer to data, either as source operands, or destination locations for storage. This section starts with these different methods from the programmers point of view, and discusses how these are encoded in the MIPS instruction formats.
 

Immediate


Source operands can be constants. A constant value is usually encoded directly in the machine language instruction, so that it si available immediately after the instruction is decoded, without fetching it from memory. Understandably, this does not provide any location for storing data.

MIPS instructions encode immediate constants in the lower 16 bits of the immediate instruction layout. For constants larger than 16 bits, the assembler generates 2 machine instructions, the upper half of the constant is loaded into the $at register with the "load upper immediate" instruction.

Examples of instructions with immediate data:

     addi $t0,t1,65
    sub  $t0,7                 #assembled as:
 

    addi $t0, $t0, -7
    li   $t3, 0x12345678      #assembled as
 

    lui  $at, 0x1234
    ori  $t3, $at, 0x5678     #puts it all together
    bgez $t5, 16              #skip 4 instructions ahead if $t5
                              #is non-negative   (4 is encoded in
                              #immediate field)

The last example is also referred to as PC-relative addressing, because the immediate value is (shifted left 2 bits and) added to the PC when the branch is taken. The constant is often referred to as an offset or displacement. It is a signed number, so the new PC value can be before (a loop) or after the current instruction.

This mode is very useful because the operating system can move the program to another part of memory without affecting branch instructions.

Most processors use PC-relative addressing for branch instructions. The number of bits used for the displacement limits the distance to the branch target. In older designs, such as the 6502 and 80x86, 8 bits are used, limiting the displacement to + or - 128 bytes before or after. This usually caused students some distress.
With MIPS, the limit is 128Kbytes. Since branch instructions are usually used within procedures, this is unlikely to cause you any problems!
 

Register addressing


Registers may be used as sources and destinations of data. Access to registers is extremely fast, much faster than fetching data from memory. All the instructions listed above reference 1 or 2 registers as well as the immediate constant. In addition, many instructions reference registers exclusively. Most processors require at least one operand of arithmetic instructions to be in a register. Instructions with all operands in registers (or immediate) are the fastest to execute.

MIPS provides 32 registers, and encourages you to load your data into them and work with it there, resulting in very fast execution. Since it takes only 5 bits to specify a register, as opposed to 32 bits for a memory location, the register instruction layout can easily accomodate 3 register designations.

Examples of register instructions:

    addu $t3, $t1, $t2       # add (unsigned) $t3 := $t1 + $t2
    sub  $t0, $t3            # subtract (signed) $t0 := $t0 - $t3
 

Memory addressing


With 32 bit addresses, a computer can access as much memory as you can afford to buy. Well, up to 4 Gigabytes. You can store lots of data there, and transfer it to and from the processor as needed. In return for large amounts of storage, access is slower. Typically an instruction will allow access to at most one memory address. MIPS, in common with other RISC processors and most supercomputers, restricts this to load and store instructions. There are several ways we could
specify the address:

label - fixed address built in to the instruction

Also called direct addressing, the known address can be built into the instruction as a constant. Programers usually specify this address using a variable name or label in the data segment, the compiler or assembler assigns it a numeric value. Since it is a 32-bit value, a MIPS assembler will break it into two 16-bit immediate quantities, and assemble 2 machine instructions to do the job. Examples:

    lw  $t0, MyNumber       #load the word into $t0
    sb  $t9, firstInitial   #store rightmost 8 bits of $t9 in data segment

indirect - register contains the address

Besides holding data, a register can be used as a pointer, holding an address that points to the data. The data is transferred by putting the register;s contents on the address bus, the data is transferred on the data bus. This is called indirect addressing because the register itself is not the target, rather it points to the target in memory indirectly. A common use of this is to step through a string one character at a time. One first loads a register with the address of the string, and then
uses it to access the characters in succession, incrementing the register each time. For example, suppose we have the string

    catStr:  .asciiz   "cat"

The address can be loaded into a register with the load-address instruction (in machine language it is just like load immediate, except the constant is the address rather than data), and then refer to the characters indirectly

    la  $t0, catStr
    lb  $t1, ($t0)        # 'c' is now in $t1
    addi      $t0, 1      # point to next character
    lb  $t2, ($t0)        # 'a' is now in $t2
 

Base addressing


A variation of indirect addressing is useful for referring to fields of a record or struct. Suppose we store telephone numbers in a record of 4 short integers (16-bit halfwords), holding area code, prefix, 4-digit number, and extension. Then the extension number starts 6 bytes from the beginning of the record. We can get it by adding the offset 6 to a register pointing to the record:

    la    $t0, MyPhone     # $t0 points to a telephone record
    lh    $t1, 6($t0)      # $t1 loaded with the extension # in that record
 

Indexed addressing


Suppose we want to index an array. We know the address of the (start of) the array, and want a register to index it. Now the address is fixed, and the register will hold a variable offset. The following code will copy 10 bytes (the assumed length of both strings, including final 0) from str1 to str2, using $t0 going from 9 downto 0

      li   $t0,9           # t0 indexes arrays
copyloop:
      lb   $t1, str1($t0)  # t1 used to transfer character
      sb   $t1, str2($t0)  #    to str2
      sub  $t0, 1          # decrement array index
      bgez $t0, copyloop   # repeat until t0 < 0

Note that if we were moving words, we would need to decrement the index by 4 instead of 1.

To implement this in MIPS, the assembler needs to produce 3 machine instructions for each indexed instruction, beacuse the 32-bit constant address must be split into 2 16 bit immediate values.
 

The bottom line - MIPS memory addressing


MIPS, in keeping with the RISC philosophy, actually has only one memory addressing mode at the machine level, it corresponds to base addressing. Indirect addressing is simply a special case with offset = 0. Direct and indexed addressing both involve fixed 32-bit addresses, the assembler calculates these numerical addresses and splits them into 16-bit sections. The high order part goes it $at using the lui instruction, the low order part ends up in the immediate field of the
machine load or store instruction. The sb instruction above assembled as follows, str2's address was 0x1001000c (0x1001 = 4097)

0x3c011001  lui $1, 4097 [str2]             ; 7: sb   $t1, str2($t0)
0x00280821  addu $1, $1, $8
0xa029000c  sb $9, 12($1) [str2]

You can figure out how this has the same effect as if we had been able to code directly

            sb $t1, 0x1001000c($t0)

Meanwhile, here is the layout of the actual sb $9, 12($1) base addresssed instruction, and a diagram explaining how the immediate 12 is added to $t0, the sum placed on the address bus, and $t1 (a character) on the data bus.
 

op code sb rs ($1) rt - $9 immediate (offset) 0x000c
101000 00001 01001 0000 0000 0000 1100

MIPS function call and return


When we call a function (procedure, subroutine), we normally want to come back to the place we left, the following instruction. To do so, we need a means of remembering our place, so the function can return to it. Some convention needs to be agreed upon as to where to store a return address. Some conventions that are in use are:
 


MIPS uses register 31 as the return address register

We have actually seen the jump instructions involved in function call and return, they are:
 


As an example, we could have a simple function to print an integer, that takes care of the detail of remembering the system call number, and another to print an end-of-line character:

    print_int:              endline:
           li  $v0, 1               la  $a0, endl  #define in .data as "\n"
           syscall                  li  $v0, 4
           jr  $ra                  syscall
                                    jr  $ra

    These can then be called to output some numbers

        li  $a0, 42        # writeln (42)
        jal print_int
        jal endline
        li  $a0, 1999      # writeln (1999)
        jal print_int
        jal endline

This suffices for a main program calling functions sequentially. However, it does not allow for a function to call another function, so later on we will see how to save return addresses systematically, using a stack.

EXAMPLE CODE:
    SUMS.ASMSUMSHEX.ASM
 

Bit Fiddling operations

 


So far we have been manipulating numbers, in sizes of either bytes or words. In this section, we will look at the operations that are usually provided for working with individual bits. There are several reasons for doing this. One can compress data, using only as many bits as necessary, so several fields can share the same word. We have seen that instruction layouts divide a word up like this. An assembler must put the elements of an instruction together, for example, lining up the
5-bit fields for registers and combining them.

Multiplying and dividing by powers of 2 is also easily done simply by shifting bits. Adding a zero on the right of a binary number multiplies it by two, just as adding a zero on the right of a decimal number multiplies it by 10. A shift left instruction is handier, and often faster, than setting up a divide operation. Let's look at the bit-fiddling operations, and how they are implemented in MIPS instructions.
 

Shifting operations


By shifting we mean moving all the bits in a word either right or left. Think of them as books filling a shelf. If we push them all over, some books will fall off the end, and be lost. There will be space available on the other end. Since a word is always 32 bits, we need to put something in that space. In a logical shift, the empty positions are filled with 0's. If you shift a word 32 or more bits, it will contain all 0's. In a rotate operation, the bits falling out one end are replaced at the other end. (Think of picking up each book as it falls off the shelf, and replacing it at the other end.) As a result, no bits are lost, and rotating a word by 32 bit positions restores its original value.

All processors, so far as I know, have shifting hardware, and instructions, capable of shifting at least 1 bit left or right. There doesn't appear to be any reason to shift a 32 bit word by more that 31 bits, any more than one would need to set an ordinary clock 50 hours ahead. You could just set it 2 hours ahead.

Here are some examples of shift and rotate operations, each starting from the same bit pattern. I hilighted 3 bits, so you can see their new locations. The new filler 0000's are in italics

Before:             11011011011011011011011011111111

Shift right 6      00000011011011011011011011011011

Shift left  5       01101101101101101101111111100000

Rotate left 4       10110110110110110110111111111101 <-- leftmost 4 came here

So, it is possible to get any bit into any position, bringing other bits along the same distance, and either losing bits of the end, or replacing them at the other end.
 

Arithmetic shifts (signed numbers)


Adding a zero on the right of a binary number multiplies it by two, as I mentioned. By shifting a number 1 bit left, each bit has twice the place value it had before, and we bring in a 0 to hold the units place. Multiplication by any power of 2 is possible in this way. For instance, to multiply by 16, shift left 4 bits. Example, 7 * 16

    7 * 16  =  111 * 10000  =  1110000  = 0x70  =  112

Conversely, shifting right divides. Any remainder is thrown away with the discarded bits. 127 / 8, shifting right 3 bits:

    01111111  --->  00001111   =  15

There are some special considerations for arithmetic shifts.
 


-38  =  11111111111111111111111111011010,  divide by 4, shift right aritnmetic 2

gives   11111111111111111111111111110110,  which is -10

Now, you may not quite agree with that answer. It makes mathematicians happy, because it agrees with the remainder of +2, that is, -10*4 +2 = -38. The answer has been truncated in the negative direction. Computer scientists are divided, and in fact some hardware gives this result for division, and some gives -9. In fact the MIPS architecture does not specificy how negative integers should be divided. As for the SPIM simulator, it simply uses the divide instruction of the "host"
processor, so results of divide instructions can vary! Not so when you divide by shifting, you will always get the "mathmatical" result we have just seen.
 

MIPS shift instructions


The format of these instructions is: op code, Destination register, source register, shift amount. The shift amout is usually a constant, 1-31 bits make sense. It may also be contained in a register (variable shift). Examples, source is $t0:

        sll  $t4, $t0, 5     #shift left logical 5 bits (multiply by 32)
        sra  $t5, $t0, 2     #shift right arithmetic 2 bits (divide by 4),
                             #sign extended
        srl  $v0, $t0, 1     #shift right logical 1 bit. Sign bit is now 0
        srlv $v0, $t0, $t1   #shift right logical, $t1 says how many bits
 
 

Type of shift
Left
Right
Logical sll  sllv srl  srlv
Arithmetic (none:   use sll) sra  srav
Rotate rol ror

 
 

Logical operations - Boolean Algebra


Today's digital computers are composed entirely of logical circuits, called gates. These gates use 1 or more transistors to implement the logical operatios AND, OR, and NOT. A variation on (inclusive) OR is eXclusive-OR, usually written XOR. These are also called boolean operations, after George Boole, originator of "The Algebra of Truth."

Boolean algebra deals with two values, FALSE and TRUE. Since boolean values can be encoded in 1 bit, we can do so, and by convention 0 represents FALSE and 1 represents TRUE. The operations can be defined by enumerating all the possible values of operands and results in a "Truth table"

operands
    AND
OR
XOR
0
0
    0
0
0
0
1
    0
1
1
1
0
    0
1
1
1
1
    1
1
0

NOT simply reverses the value. NOT 0 = 1, and NOT 1 = 0

Some laws of boolean algebra, these identities can be verified using truth tables: (In these laws, + means OR, a bar over something means NOT, while AND is shown like multiplication in algebra, by writing two terms together.)

     A 1 = A         A + 0  =  A                   identity
     A 0 = 0         A + 1  =  1                   null
     A A = A         A + A  =  A                   idempotent
     A(A + B) = A    A + AB =  A                   absorption
       _                         _
     A A  = 0                A + A  =  1           inverse
     AB  =  BA               A + B = B + A         commutative
     (AB)C = A(BC)           (A+B)+C = A+(B+C)     associative

     A(B + C) = AB + AC      A + BC = (A + B)(A + C)    distributive
     ____   _   _             _______   _  _
     (AB) = A + B             (A + B) = A  B       de Morgan

NOT (A AND B) = (NOT A) OR (NOT B)    1st deMorgan in Pascal notation
           _         _
A XOR B = (A B) + (A B)               XOR Defined in terms of other operations

The deMorgan laws are of particular importance to programmers, they show an important relationship between AND, OR and NOT. When NOT is distributed over an expression, AND exchanges with OR. This is worth considering when constructing WHILE loops. Here is the proof of one by truth table:

        _____ _ _ _   _
A B A B (A B) A B A + B
0 0 0 1 1 1 1
0 1 0 1 1 0 1
1 0 0 1 0 1 1
1 1 1 0 0 0 0

Compilers will store boolean values in a byte or word holding either a 0 or 1, this wastes some space. An array of boolean may be packed 32 values to a word, and shift instructions used to extract specific values. Usually any non-zero value will evaluate to TRUE because beqz, bnez instructions will be used.

The important point about logical instructions in assembly language is that they operate on all corresponding bits of the operands in parallel. There are no carries or other interaction between different bit positions.

MIPS logical instructions are all the 3 operand format, just like add and sub. Like them, the second source can be a constant. (Well, of course not only has destination and source registers. It is actually implemented by nor (not or) that follows the usual pattern.)
 

Masking


A common use of logical operations is isolating, or "masking" parts of words. Just like a mask covers your face and lets your eyes show through, a bit mask lets some bits show through, and hides others. Think of one operand as the "data" and the other as the "mask". This is a consequence of the identity and null laws above. For the different operations:
 


Example: Find the Source Register in an instruction

0x214afff3 addi $10, $10, -13

Instruction in binary:  00100001010010101111111111110011
Mask:                   00000011111000000000000000000000
Result of AND:          00000001010000000000000000000000
Shift right 21 bits     00000000000000000000000000001010

And we have the source register number, suitable for printing with syscall service 1.
 

Example: Changing case of letters


In order to make case-insensitive comparisons, both items being compared need to be in the same case. Only the letters should be converted, of course. By looking at the ASCII code chart, we see the only difference between the upper case and lower case letters is bit 5. To make a letter:

    lower case, force this bit to be 1, by ORing with  00100000 = 0x20
    upper case, force this bit to be 0, by ANDing with 11011111 = 0xDF
    the opposite case, change this bit, by XORing with 00100000 = 0x20
 

Example: Binary digit to ASCII character


Suppose we believe that register $t0 contains a binary number that can be represented by a single digit, 0..9. (This would be true for the remainder of division by 10.) From the ASCII table, we see that the numeric characters '0' .. '9' are the hex values 0x30 .. 0x39. All we have to do is OR the register with 0x30, and we have the required character, suitable for storing in a string.

                or  $t0, 0x30      #binary number to ASCII character
 

Example: Binary 4 bits to ASCII character for Hexadecimal - '0' .. '9' 'A' .. 'F'


If we want to represent 4 bits as a hex digit in a string, this isn't quite good enough, because the ASCII code 'A" doesnt follow immediately after '9', instead, there is a gap of 7 other characters. The solution is to test and add 7 if necessary. It is also a good idea to make sure only the lowest 4 bits are non-zero. In this coding example, we show what happens to the value 13, in binary 1101, in $t0:

     and   $t0, 0x000f      #00001101  only allow lowest 4 bits non-zero
     or    $t0, 0x30        #00111101
     ble   $t0,'9', numeric #00000111   to be added, since > '9'
        add  $t0, 7         #01000100   ASCII 'D'
numeric:
 
 
 

Example: Convert word to Hex string, 8 digits


This example uses the single digit conversion code above, and fills a string of 8 characters with the correct hex digits. To do this we need to shift the word being converted 4 bits right each time, apply the "binary to hex" conversion, and store the resulting character in the string right to left.

# function to convert word value to a hex string. Returns pointer to the string
#    arguments:
#        input:  a0, word to be converted
#        output: v0, pointer to the string
#    registers used:
#        t2 - working copy of word
#        t1 - point to string characters, right to left
#        t0 - single 4-bit value, converting to character, as above

word2hex:
    la     $v0, str8            #string workspace defined in data segment
    add    $t1, $v0, 8          #point to last character position + 1
    move   $t2, $a0             #copy input, will be shifting $t2
hexloop:
      sub    $t1, 1             #point 1 character left
      and    $t0, $t2, 0x000f   #only allow lowest 4 bits non-zero, converting in $t0
      srl    $t2,$t2, 4         #shift copy of word so next 4 bits are on right
      or    $t0, 0x30           #make it numeric ASCII character, but ...
      ble    $t0,'9', numeric   #IF > '9',
        add  $t0, 7             #must add 7 to get 'A' .. 'F'
numeric:
      sb     $t0, ($t1)         #store hex character in string
      bne    $t1,$v0, hexloop   #UNTIL all 8 characters stored
    jr     $ra                  #RETURN from function call, $v0 points to string

A fully running version using this code is hex1.asm
 

--------------

Function calls, and the stack

(Pascal programmers, please interpret function as "function or procedure". Fortran programmers, as "function or subroutine.") We saw in note6a how a program can call a function using jal, and the function returns using jr $ra. This is obviously not the whole story, because we are still not able to have a function call another function, because that would wipe out the return address in $ra. There is need for a standard method or saving multiple return addresses (and other registers as well). The elegant solution is known as "The Stack."

Case Study: URL encoding

As a non-trivial example, the fields you fill out in forms posted over the internet have to be "URL encoded" to avoid confusion about the structure of the data. (space & % = " for example have special meaning, so these characters need to be represented some other way.) The chosen solution is to encode letters and numbers as themselves, spaces as plus signs (+), and everything else as 3 characters of the form %hh, where hh is the special character's hexadecimal value. For instance, '=' would be represented by %3D, and '+' by %2B. (2B or not 2B: of course this only applies to a real '+' in the data, not one that is representing a space!)

Anyway, this is complex enough that it will pay to structure the problem of producing a URL-encoded string by calling some sub-functions. The ones that come to mind are:

With these defined, the problem can be expressed in pseudo code as a function with 2 arguments, pointers to the plain string, and (space for) the encoded string. A refinement would be to check for enough space. You can see that other functions will need to be called, and that we will need two pointers to step through the two strings, as the encoded string will be larger than the original if it contains any special characters. The next step is to refine this algorithm further. C is the language probably closest to assembler, so I refined it to working C code, before proceeding with assembler.

To start with, let's get the little functions on the table. Here is

bin2hex

# char bin2hex (binary) converts 4 bits (least significant) of input to an
#                       ASCII character giving the hex value. '0'-'9' or 'A'-'F'
# registers:
#       $a0, input argument, unchanged
#       $v0, char value returned
#       no other registers used

bin2hex:
        andi    $v0, $a0, 0x0f  # mask off 4 lower bits
        ori     $v0, $v0, 0x30  # make it ASCII digit
        ble     $v0, '9', numeric       #if hex value A..F, then
          add   $v0, 7                  # add 7, since there are 7 characters
numeric:                        # between '9' and 'A'
        jr      $ra             # RETURN


AlphaNumeric

Evaulates boolean expression (ch >= 'A' and ch <= 'Z') or (ch >= 'a' and ch <= 'z') # or (ch >= '0' and ch <= '9')

This could be done "on the fly" by careful arrangement of 6 branch instructions. There is a more reliable way, now that we know about the logical instructions. It involves a new bunch of instructions, the "set" instructions, which are similar to "branch" in that they compare two operands, but instead of branching, they store the boolean result in the desination register. We then use boolean instructions with these values. Here is how the code starts, ch in $a0:

        sge     $t5,$a0,'A'
        sle     $t6,$a0,'Z'
        and     $t7,$t5,$t6     #(ch >= 'A' and ch <= 'Z')
This being RISC, it turns out that some very ugly machine code is generated by SPIM, because the only machine instructions available are slt, beq and bne. Normally the best policy is to let the assembler or compiler do its work, even if the resulting code may be less than optimal. Compilers are getting very good at optmizing their code. In this case, since you may be looking at this code in SPIM, I have optmized slightly by getting rid of the equal part of the comparisons, because we can test for strict inequality by adding or subtracting 1 to the constant. (Due to a bug in the simulator, it handles '9'+1 fine, but muddles up 'A'-1). Anyway, here is the whole thing:
#############################################################
# boolean function AlphaNumeric (ch)
#       $a0: argument ch, unchanged
#       $v0: boolean value output, TRUE (1) if ch is letter or number
#       Uses registers t5, t6, t7, t8
#############################################################
AlphaNumeric:
        sgt     $t5,$a0,0x40    #ch > 'A'-1
        slt     $t6,$a0,'Z'+1   #ch < 'Z'+1
        and     $t7,$t5,$t6     #(ch >= 'A' and ch <= 'Z')
        sgt     $t5,$a0,0x60    # ch < 'a'-1
        slt     $t6,$a0,'z'+1
        and     $t8,$t5,$t6     #(ch >= 'a' and ch <= 'z')
        or      $t8, $t8,$t7    # $t8 = big or small letter
        sgt     $t5,$a0,0x2f    #  ch > '0'-1
        slt     $t6,$a0,'9'+1
        and     $t7,$t5,$t6     #(ch >= '0' and ch <= '9')
        or      $v0, $t8,$t7    # $v0 = AlphaNumeric truth value returned
        jr      $ra









Conventions for register usage

As soon as several program units (main and functions) are in use, it gets more difficult for humam programmers to keep track of registers in use. For one thing, it may be hard to remember just where to put arguments to a function, for another, a function we call may "trash" a register we were using. The following conventions are recommended: Each of the functions given above follows these conventions. Both expect an argument in $a0, and return a result in $v0. This makes it easier for us to remember how to use them. In addition their use of the $t registers is documented. This makes it easier for designing URLencode. Since it is both called and a caller, it would need to use the $s registers for values saved past calls, but would need to save and restore those it used, because there is no telling which $s registers might be in use. Fortunately, we can see that only $a0, $v0, and $t5-$t8 are used by the functions we will call.

Therefore, we only need to save $ra (we want to return to our caller, obviously!) and $a0 (to be nice to our caller, who may expect the pointer to the plain string to still be there.) Now, lets consider where we can store these things:


1. We could declare variables in the data segment for each register to be saved by each function. This gets messy. And it doesnt work if a function calls itself.

The Stack

2. Use a stack Like a stack of papers, a stack is known to accountants as a "Last In, First Out" (LIFO) data structure. This is ideal for functions as typically they need to save some registers on the way in, and retrieve the values on the way out. As functions call each other, they pile up more items on the stack, as each one returns, they (must!) remove the values they piled on. This is such a useful method that all operating systems set up space for a stack for programs, and initialize a stack pointer.

By custom, computer stacks grow "downward" from higher to lower memory addresses. The stack pointer always points to the top word on the stack. The permitted stack operations are:

These customary terms alude to a stack of plates on a cafeteria counter, the kind that uses a spring pushed down by the weight of the plates. When plates are removed, the stack "pops" up. Ideally, it is impossible to take any but the top plate off the stack.

This figure shows the operations on a stack. In any 32-bit machine, a 32 bit word is always pushed or popped, so the sp is always kept a multiple of 4.

PUSH adds a plate to stack, POP removes top one

CISC processors typically have instructions that implicitly manipulate the stack pointer, usually named SP. They are PUSH, POP, CALL and RET. The last two store and retrieve return addresses directly to and from the stack.

In MIPS, the stack pointer is $29, named $sp. It is only made special by the fact that the operating system sets up the stack for you, with $sp pointing to the initial top of stack. To push, the sequence is

Pop reverses this order, first retrieving the value previously pushed, then incrementing the stack pointer: The stack is a last-in-first-out data structure. This means you need to pop items in the reverse order they were pushed. It is generally important to pop precisely the items that were pushed, rather than abandoning them.

Stack is LIFO


With all this in mind, and remembering that we had decided that URLencode should save both $ra and $a0, the beginning and end of the function looks like this:
URLencode:
# function URLencode (plain, encoded, maxlength)
########  entry code: save registers
        sub     $sp, 4          # PUSH return address
        sw      $ra, ($sp)
        sub     $sp, 4          # PUSH  a0
        sw      $a0, ($sp)      #   (be nice, restore caller's register)
########################
While the exit code looks like this. Note that the order of POPs is reversed from the order of PUSHes. Once the stack is used, it is extermely important to exit in this way. The stack must have exactly the same contents upon return from a function that it had upon entry. Other functions depend on this!
############ exit code: restore registers, POP in reverse order to PUSH
endURL:
        lw      $a0, ($sp)      # POP start of string addr
        add     $sp, 4
        lw      $ra,($sp)       # POP return address
        add     $sp, 4
        jr      $ra             ##### return, using newly restored $ra

Putting it all together

Finally, we are in a position to write the whole code for URLencode. Have a look. Sandwiched between the entry and exit code above, is simply a loop to go through the input string. In addition to the pointer moving through that string, we need another pointer to move through the output string. I have used $t0 and $t1 for these, and $t2 for the individual characters. Inside the loop is a 3 way IF. One of the conditions calls AlphaNumeric to get its truth value, another (for special characters) makes 2 calls to bin2hex.

The whole thing, including a main program to test it out, has a few extra refinements, added after the basic algorithm was implemented. A third argument, $a2, specifies the maximum space available for the output string. This is good safe programming practice, as otherwise other variables could be overwritten, with usually bad results. At the beginning of each successive loop iteration, we make sure there is space for 4 characters, %hh, and a null byte.


Recursion

Recursion is a powerful mathematical technique, in which a function is defined in terms of itself. The standard example is the factorial function, the product of all integers up to its argument, written 3! = 3*2*1.

It is defined recursively as:

In recursion, there must always be at least one base case, for which no further recursion is required. Otherwise, the recursion would be infinite. A recursive function must test for this base case with an IF statement. Programmers who are used to loops often try to make this test part of a loop structure, this is incorrect. The recursive calls of factorial behave much like a loop, but the method is more general, and no FOR or WHILE statements are needed.

Implementation

A recursive function is implemented the same way as any other function that calls another function:

Historical note:

We didn't always know how to do this. In 1959, when the first version of FORTRAN was released, recursion was explicitly prohibited, because the original return address would be lost. In 1960 C. A. R. Hoare published the Quicksort algorithm, a very efficient recursive sorting method. His paper describing the implementation (in assembly language) mentions the use of "a unique data structure called a nest, which has the property that the last thing put in is the first thing taken out." Sound familiar? If the computer science world had been paying attention, we would be calling them nests instead of stacks today. Fortran didn't formally allow recursion until the Fortran 90 standard, 30 years later.

Factorial function

Here is a recursive implementation of factorial, first in C, then in assembly:
int factorial (int n){
    if (n < 2) return 1;
    return (n * factorial (n-1));  /* n! = n * (n-1)! */
}
factorial:
    bgtz  $a0, doit
        li   $v0, 1      # base case, 0! = 1
        jr   $ra
doit:
    sub  $sp,8           # stack frame 
    sw   $s0,($sp)       # will use for argument n   
    sw   $ra,4($sp)      #  return address

    move $s0, $a0        #  save argument
    sub  $a0, 1          #  n-1
    jal  factorial       #  v0 = (n-1)!
    mul  $v0,$s0,$v0     #     n*(n-1)!

    lw   $s0,($sp)       # restore registers from stack
    lw   $ra,4($sp) 
    add  $sp,8
    jr   $ra
Note that we used $s0 to save the value of n, after first saving its former value. And of course we save $ra. The result is that all the successive values of n are stacked up, and then used in the multiply instruction. Also, $a0 wasn't saved, and it will be 0 when we are all done.

Excescise: Modify this code so that $a0 is restored, by pushing and popping it. You shouldn't have to use $s0 or any more stack space than at present, but be careful where your pushes and pops go.

The towers of Hanoi

The standard non-trivial example of recursion is the "Towers of Hanoi" game. In this game there are 3 pegs, upon one of them is a stack of rings of different sizes, forming a tower. The challenge is to move the entire stack to another peg, following these rules:
  1. Move only one ring at a time
  2. A ring may never sit on top of a smaller ring
This takes a lot of time. According to legend, there is a game of 64 rings going on somewhere, with one ring being moved each second. When the new tower is complete, the world (universe) will come to an end. We have nothing to worry about, since adding a single ring to the game makes the time taken twice as long, plus one second, since the recursive procedure for moving a stack of rings from a source peg to a destination peg, using the third peg as a spare, is:
MoveStack(pegs: soruce, destination, spare; integer numberrings)
    MoveStack (source, spare, destination, numberrings -1)  #get all but bottom ring out of the way
    MoveRing  (source, destination)                         #move the bottom one
    MoveStack (spare, destination, source)                  # and pile the rest on top
This can be implemented by using some integer or pointer to identify the individual pegs, and has been in the programs

Quicksort

Quicksort is a similar recursive algorithm, that is quite fast, in contrast to the towers. The idea is simply to break an array to be sorted into two arrays, and sort each of these. The tricky coding is in:

Partition:

Choose an array element at random, called the pivot element. Form two arrays:

Then just Quicksort each of these smaller arrays

The base case is an array of 1 element need not be sorted, as it cannot be out of order.

Finally, stick the smaller arrays and pivot together. Hoare defined Partition (for arrays) so that this is automatic.(No, I am not going to code it in assembly language in this millenium.)

------

Floating point operations (3.14159)

So far we have only considered numeric operations with integers. Sometimes we want to represent numbers with fractional parts, or very big or very small numbers. In addition, physical measurements cannot be totally precise, rather they can only be made to a certain degree of accuracy. For example, a book published in 1950 states that a river is 120,000,000 years old. It makes little sense to say that today that river is 120,000,049 years old! (The situation is quite different for a bank, it must keep an accurate total of its deposits, which might be $120,000,049.98)

Very big and very small numbers can be expressed more conveniently in "scientific" notation, using some significant digits multiplied by a power of 10, for instance

       1.2×108     0.123456×10-27     -30.09×1030
Digital computers can store numbers of this sort using the binary equivalent of scientific notation, that is, a sign bit, a sequence of significant binary digits (called the mantissa), and the location of the "binary point" relative to these, in the form of a power of 2 (called the exponent). These are called floating point numbers, because the point "floats" around in (and beyond) the digits. Historically, many different schemes for storing these 3 quantities have been used. More recently, an IEEE standard has been published for 32 and 64 bit floating point numbers. (As well as some other sizes.) Hardware manufacturers now tend to make floating point arithmetic units to operate on numbers stored according to this standard. In any case, if we are provided with a set of instructions for performing the operations, the details of the storage scheme need not concern us.

The important points to remember are:


MIPS floating point operations

Like most processors of its time, MIPS is designed to accomodate one or more coprocessors, other chips that share the processing load. Coprocessor 1 will usually be a floating point coprocessor, and this is what SPIM simulates. There are 2 sizes of floating point numbers: The coprocessor contains 16 FP registers, named $f0 - $f15. Each FP register can hold either a single or a double.

Input and output:

SPIM provides system services for floating point I/O: Examples of usage can be found in the program circle.a

Load and store

Data (normally floating) can be moved between memory and FRegisters. Floating constants can be declared in the data segment with the .float directive. These are pseudo-instructions. The data actually travels through the processor.
For example:
--------------
pi:   .float  3.14159             # assemble a single float in data segment

Floating point arithmetic operations

The expected operations are provided The pattern is familiar, 3 registers (this time in the coprocessor): destination := source1 (op) source2

In addition, these 2-operand operations are available:

Conversion: Integer <---> Floating

No hardware exists t do arithmetic on a mixture of numeric types. If you want to add an integer to a float, for example, one of them must be converted to the other's type. Likewise single <--> double conversion must be done to match types.

Type conversions are the responsibility of the Floating point coprocessor, and are done with data in the F-registers.The instructions are of the form

which means: convert to single floating, the integer word in FRsource, storing result in FRdest. Note the right-to-left pattern.

Converting integer to floating transforms the bit pattern used, but preserves the numeric value of small integers (up to about 6 digits). Converting from floating to integer, on the other hand, causes any fractional part to be lost (truncated), and if the floating number has a very large magnitude, perhaps its value cannot be stored as an integer.

In this example, we convert pi = 3.14159 to an integer, with the value 3:

Moves to and from the coprocessor

Since conversions are done by the coprocessor, it is necessary to move integer data back and forth between processor general registers and coprocessor registers. This movement does not change the bit patterns, so the data must be converted before being used for operations. The programmer must keep track of the type of data in each coprocessor register. It ia also possible to have single floating data in the general registers, but not very useful.

The instructions for moving data between general registers and coprocessor registers are:

The instruction set allows for multiple coprocessors, for now we are only using coprocessor one, the floating point coprocessor. Note that the general register is always listed first, just like in load and store instructions. Generally, we should be moving integer data.

Example: continuing from above, to calculate the radius of the circle (diameter/2), from the diameter in $f0:

Compare (and branch) with floating numbers

We can compare floating values, and branch depending upon the result. The compares are done by the coprocessor, which generates a "status" of true or false. Then the special branch instructions, "branch on coprocessor condition flag" are used to effect the branch.

The compare instructions have the form   c.condition.s  FR, FR   with 3 of the expected 6 conditions implemented. Instead of .s you may also use .d   The 3 forms are:

Each compare instruction compares the two operands and then sets the coprocessor status, which is then tested by the branch instructions:
 
bc1t  label  branch if coprocessor 1 condition flag true
bc1f  label  branch if coprocessor 1 condition flag false

Example:

# IF x > y THEN k = 1 ELSE k = 2    x in $f1, y in $f2
     c.lt.s  $f2, $f1         # y < x  is same as  x > y
     bc1f  else
           li  $t0,1          #THEN k = 1  (in $t0)
           b   endif
else:
           li  $t0,2          #ELSE k = 2
endif:
All the possible comparisons can be obtained either by reversing the operands, as we did here, or choosing the opposite branch instruction.

A final word of caution about equality

Thats all you need to know to program floating point operations in (this) assembly language. However, let us remember that floating point calculations are not exact, and therefore it is not a really good idea to expect exact equality of two different computations, even if they should be theoretically equal. This caution holds regardless of the computer language you are using. For instance, adding 1/10 to itself 10 times may not be exactly equal to 1.0! It is more likely to be equal if the additions are done in double, and the result is converted to single for comparing, but even this is not foolproof.

(x == 1.0) is more safely written as abs(x-1.0) < 0.000001
 
 

Other processor architectures and assembly languages

Every processor has its own instruction set and assembly language. This language is specific to the architecture, especially the registers, of the processor. Nevertheless, there is a family resemblance between instruction sets and assembly languages. In this section we look at some examples, to appreciate the similarities and differences.

You will observe that in all instruction sets, there are groups of instructions for: Moving data, arithmetic, jump, compare, branch, logical, and shift. There also may be operations affecting the stack, and for input/output.

A) Processor size, the external view

The ability of a processor to access memory is directly related to speed and to two sets of wires, the "Data bus" and the "Address bus."
Size Examples Data Bus Address Bus Address Space
8 6502, 6809, Z80 8 16 64 K
16 8086 16 20 1 M
68000 16 24 16 M
32 MIPS, SPARC,68020 32 32 4 G
80386, Pentium 32 46 32 T

B) The internal view: Registers and instruction sets.

As programmers, we are interested in the registers that are available to store and manipulate data in the processor, and the instructions that are available to do so. We need to know the size and number of registers, and which operations are allowed on them. In an orthogonal instruction set, such as that of MIPS, all registers are treated equally. At the other extreme, each register has its own special purpose. In the middle is the 68000, with 8 data registers and 8 address registers.

All processors have a program counter, usually named PC. It needs to have the same effective size as the Address Bus. Then they have some "general purpose" registers that can hold addresses and data. In addition, the intel 80x86 series (including the Pentium) have 4 or 6 "Segment" registers, which are used to extend the address space.

Processor Register
Size
Number of 
General regsites
6502 8 4
8086 16 8
68000, 68020 32 16
MIPS, SPARC* 32 32*
80386, Pentium 32 8
* accessible at one time. SPARC rotates a register window on function call and return, giving each function 8 local registers.

6502

The 6502 is an 8-bit processor that was widely used in the 1970's for the first personal computers, such as Apple and Commodore. It has 4 registers, 8 bits each. Arithmetic operations mostly use A as one source operand and as destination. The other operand is either immediate or from memory, using direct, indexed, or indirect-indexed addressing. Subroutine calls push the 16-bit return address on the stack, and returns take it off. PHA and PLA push and pop the accumulator, respectively.

Here is some sample code, for the well known problem of printing 2 hex digits representing a byte initially passed as an argument in the accumulator, A:

BIN2HEX         ; Convert 0..15 in A to Hex digit
        CMP     #0A     ; immediate hex. Compare sets status flags
        BCC     $1      ; Branch less than since 6502 leaves C=0 to indicate borrow
        ADC     #6      ; adds 7 since Carry is set
$1      ADC     #30     ; now C=0, we have ASCII character
        RTS             ; return with it in A

OUTHEX          ; Output byte in A as 2 hex digits (ASCII chars)
        PHA             ; push a copy on stack
        LSR  A          ; Process left nibble
        LSR  A          ; by shifting right 4 bits
        LSR  A
        LSR  A
        JSR     BIN2HEX ; call the above
        JSR     OUTCHAR ; given output routine
        PLA             ; pop original byte from stack
        AND     #0F     ; mask to get right nibble
        JSR     BIN2HEX
        JSR     OUTCHAR
        RTS             ; all done
Here is another example, to write the string "Hello World!" stored at label HELLO, using X as a counter and Y as an array index. This is called direct indexed addressing, it is equivalent to MIPS hello($t0) when $t0 indexes a string.
        LDX     #12     ; print 12 chars
        LDY     #0
LOOP
        LDA     HELLO,Y ; String starts at HELLO, indexed by Y
        JSR     OUTCHAR
        INY
        DEX             ; decrement counter 
        BNE     LOOP    ; and test for 0

80x86

The Intel 8086 was chosen by IBM for their first PC because it had just become available, and broke the 64K address space limitation by the introduction of Segment registers. The addresses stored with instructions are still 16 bits, but they are relative to the start of a segment. This works very well as long as a program can operate with data and code segments of a maximum of 64K. Over the years, as software has become more complex, techniques to overcome the limitations of the 8086 have proven cumbersome. The 80386 was the first 32-bit processor of the series. It increased the size of the general registers from 16 to 32 bits, added 2 additional segment registers, and a new mode of using them to overcome the 1 Meg. limitation of the 8086. It also retains the 8086 mode of operation, so older software can continue to run. It took about 15 years after its introduction for Microsoft to produce an operating system that made full use of the 32 bit design.

The 4 16 bit "general purpose" registers are named AX, BX, CX, and DX. Each is divided into two 8-bit halves, named AH and AL, BH... etc. so that they can be used as 8 8-bit registers if desired. AX is the accumulator, BX may be used as an indirect address, CX as a counter, and DX for I/O.

4 more 16-bit registers are used primarily to hold addresses. They are named SI, DI, BP, and SP. SP is the stack pointer.

Move, arithmetic, and logical instructions have 2 operands, destination and source. The destination is also the first operand of instructions like ADD. One operand must be a register, the other can be a register, memory, or immediate (source only.) For example:

          ADD   AX, mynumber
          MOV   sum, AX
adds the contents of AX to the 16 bits at memory location mynumber, and stores the result in AX, then moves it into memory location sum. This is the right-to-left pattern we are familiar with. (68000 and SPARC use left-to-right order!)

Here is the byte printed as 2 hex characters example, in 8086 assembly language. Note: this assembler is not case sensitive.
BIN2HEX:                ; Convert 0..15 in AL to Hex digit
        CMP     AL, 0AH ; immediate hex. Compare sets status flags
        JB      numeric ; Branch "below" = less than, unsigned
        ADD     AL, 7   ; adds 7 to skip 7 chars between '9' and 'A'
numeric:
        ADD     al,30h  ; make it ASCII character
        RET             ; return with it in AL

OUTHEX:         ; Output byte in AL as 2 hex digits (ASCII chars)
        PUSH    AX      ; push a copy on stack  (must push 16 bits)
        MOV     CL,4    ; Shift count of 4 bits (must use CL for this)
        SHR     AL,CL   ; Process left nibble
                        ; by shifting right 4 bits
        CALL    BIN2HEX ; call the above
        CALL    OUTCHAR ; given output routine
        POP     AX      ; pop original byte from stack
        AND     AL,0Fh  ; mask to get right nibble
        CALL    BIN2HEX
        CALL    OUTCHAR
        RET             ; all done