# PIPELINED DESIGN APPROACH TO MICROPROCESSOR ARCHITECTURES A PARTIAL IMPLEMENTATION: MIPS™ PIPELINED ARCHITECTURE ON FPGA

## A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES OF MIDDLE EAST TECHNICAL UNIVERSITY

BY

# MUZAFFER CAN ALTINİĞNELİ

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN ELECTRICAL AND ELECTRONICS ENGINEERING

DECEMBER 2005

Approval of the Graduate School of Natural and Applied Sciences

Prof. Dr. Canan ÖZGEN Director

I certify that this thesis satisfies all the requirements as a thesis for the degree of Master of Science.

Prof. Dr. İsmet ERKMEN Head of Department

This is to certify that we have read this thesis and that in our opinion it is fully adequate, in scope and quality, as a thesis for the degree of Master of Science.

Prof. Dr. Hasan GÜRAN Supervisor

| Examining Committee Members         |            |  |
|-------------------------------------|------------|--|
| Assist. Prof. Dr. Cüneyt BAZLAMAÇCI | (METU, EE) |  |
| Prof. Dr. Hasan GÜRAN               | (METU, EE) |  |
| Dr. Ece (GÜRAN) SCHMIDT             | (METU, EE) |  |
| Assist. Prof. Dr. İlkay ULUSOY      | (METU, EE) |  |
| M.S. Eng. Murat ŞANSAL              | (ASELSAN)  |  |

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Last Name: Muzaffer Can ALTINİĞNELİ

Signature :

# ABSTRACT

#### PIPELINED DESIGN APPROACH TO MICROPROCESSOR ARCHITECTURES A PARTIAL IMPLEMENTATION: MIPS™ PIPELINED ARCHITECTURE ON FPGA

ALTINİĞNELİ, Muzaffer Can; M.S., Department of Electrical and Electronics Engineering; Supervisor: Prof. Dr. Hasan GÜRAN

September 2005, 120 Pages

This thesis demonstrates how pipelining in a RISC processor is achieved by implementing a subset of the MIPS R2000 instructions on an FPGA. Pipelining, one of the primary techniques for speeding up a microprocessor, is emphasized throughout this thesis. Pipelining is fundamentally invisible to the user of a high-level programming language; this work reveals the internals of microprocessor pipelining and the potential problems encountered while implementing it. The comparative and quantitative flow of this thesis allows the reader to understand why pipelining is preferred over other possible implementation schemes. The methodology for programmable logic development and the capabilities of programmable logic devices are also given as background information. This thesis can serve as a starting point and reference for programmers who wish to become familiar with microprocessors and pipelining.

Keywords: Microprocessor, MIPS, Pipelining, FPGA

# ÖZ

#### PIPELINED DESIGN APPROACH IN MICROPROCESSORS: A PARTIAL IMPLEMENTATION OF THE MIPS™ PIPELINED PROCESSOR ARCHITECTURE ON FPGA

ALTINİĞNELİ, Muzaffer Can; M.S., Electrical and Electronics Engineering; Supervisor: Prof. Dr. Hasan GÜRAN

September 2005, 120 Pages

In this study, pipelining in RISC processors is explained by implementing part of the MIPS R2000 instruction set on an FPGA. Throughout the study, the emphasis is on pipelining, a fundamental means of increasing the speed of microprocessors. Pipelining is essentially invisible to people who program at a high level. This study reveals the details of pipelining and the problems encountered while implementing it. The comparative and quantitative flow of this thesis makes it possible to understand why design approaches other than pipelining are not applicable. Throughout the thesis, an effort has also been made to build a foundation on the methodologies underlying hardware design and on the capabilities of the hardware. This thesis can serve as a starting and reference point for programmers who want to become acquainted with microprocessors and pipelining.

Keywords: Microprocessor, MIPS, Pipeline, FPGA

To My Generous Family

# ACKNOWLEDGMENTS

I owe much gratitude to my Advisor, Professor Dr. Hasan Güran, for inspiring me to carry out this thesis. His criticism and suggestions brought this work to this point, and throughout our work I was always aware that this thesis was, first of all, for my benefit.

Everyone working at ASELSAN deserves my thanks, especially Erdinç Atılgan, Kemal Burak Codur and Murat Şansal. They guided me to the right technical people and supported me technically and morally during my work. This thesis resulted in a hardware implementation because ASELSAN provided hardware support without expecting any outcome, even though it is a commercial organization.

I also owe a great deal to my father, mother and sister. They took care of all my needs while I was immersed in my work. By the end of this work, I had grasped the meaning of being a family in addition to the fundamentals of pipelining in microprocessors.

# TABLE OF CONTENTS

| Section |
|---------|
| PLAGIARISM |
| ABSTRACT |
| ÖZ |
| ACKNOWLEDGMENTS |
| TABLE OF CONTENTS |
| LIST OF TABLES |
| LIST OF FIGURES |
| LIST OF ABBREVIATIONS |
| CHAPTER |
| 1. INTRODUCTION |
| 2. BACKGROUND AND MOTIVATION |
| 2.1. Programmable Logic Design |
| 2.1.1. History of Programmable Logic |
| 2.1.1.1. Simple Programmable Logic Device (SPLD) |
| 2.1.1.1.1. Programmable Logic Array (PLA) |
| 2.1.1.1.2. Programmable Array Logic (PAL) |
| 2.1.1.2. Complex Programmable Logic Device (CPLD) |
| 2.1.1.3. Field Programmable Gate Array (FPGA) |
| 2.1.2. Basic Design Process |
| 2.2. Integrated Software Environment (ISE™) |
| 2.3. Virtex™ FPGA |
| 2.3.1. Function Generation Capabilities of CLB |
| 2.3.2. Distributed (Shallow) Memory Usage of CLB |
| 2.3.3. Shift Register Configuration of CLB |
| 2.3.4. Arithmetic Capabilities of CLB |
| 2.4. PCI Host Software: In-Circuit Debugging of the Architecture |
| 3. RELATED RESEARCH |
| 3.1. MIPS R2000 Instruction Set Architecture (ISA) |
| 3.2. MIPS Instructions and MIPS Assembly Language |
| 3.2.1. MIPS Instruction Format |
| 3.2.2. MIPS Addressing Modes |
| 3.2.3. MIPS Instruction Decoding |
| 3.3. Survey of Instruction Set Architecture Implementation Scheme |
| 3.3.1. Single Cycle Implementation Scheme |
| 3.3.2. Multi Cycle Implementation Scheme |
| 3.3.3. Pipelined Implementation Scheme |
| 3.3.4. Quantitative Comparison of Implementation Schemes |
| 3.4. Problems and Solutions in Pipelined Architectures |
| 3.4.1. Structural Hazards |
| 3.4.2. Branch Hazards |
| 3.4.3. Data Hazards |
| 3.4.4. Exception Hazard |
| 4. IMPLEMENTATION OF MIPS PIPELINED ARCHITECTURE |
| 4.1. Internal Structure of the Processor |
| 4.1.1. Instruction Fetch Unit |
| 4.1.1.1. Input/Output Signals of Instruction Fetch Unit |
| 4.1.1.2. Function of Instruction Fetch Unit |
| 4.1.2. Instruction Decode Unit |
| 4.1.2.1. Input/Output Signals of Instruction Decode Unit |
| 4.1.2.2. Function of Instruction Decode Unit |
| 4.1.3. Forwarding and Hazard Detection Unit |
| 4.1.3.1. Input/Output Signals of Forwarding and Hazard Detection Unit |
| 4.1.3.2. Function of Forwarding and Hazard Detection Unit |
| 4.1.4. Control Unit |
| 4.1.4.1. Input/Output Signals of Control Unit |
| 4.1.4.2. Function of Control Unit |
| 4.1.5. Execute Unit |
| 4.1.5.1. Input/Output Signals of Execute Unit |
| 4.1.5.2. Function of Execute Unit |
| 4.1.6. Data Memory Unit |
| 4.1.6.1. Input/Output Signals of Data Memory Unit |
| 4.1.6.2. Function of Data Memory Unit |
| 4.1.7. Exception Detection Unit |
| 4.1.7.1. Input/Output Signals of Exception Detection Unit |
| 4.1.7.2. Function of Exception Detection Unit |
| 4.1.8. Register Blocks between Stages of Processor |
| 4.2. External Structure of the Processor |
| 4.2.1. External Monitoring of the Processor |
| 4.2.2. External Manipulation of the Processor |
| 5. VERIFICATION OF MIPS PIPELINED ARCHITECTURE |
| 5.1. Verification of Correct Operation of Instructions |
| 5.2. Verification of Hazard Detection and Handling |
| 5.3. Verification of Exception Handling |
| 6. CONCLUSIONS AND FUTURE WORK |
| REFERENCES |
| APPENDICES |
| A. IMPLEMENTED SUBSET OF MIPS R2000 ISA |
| B. MIPS MONITOR SOFTWARE |
| C. FLOW DIAGRAMS ARCHITECTURE ELEMENTS |
| Instruction Fetch Unit Flow Diagram |
| Instruction Decode Unit Flow Diagram |
| Forwarding and Hazard Detection Unit Flow Diagram |
| Instruction Execute Unit Flow Diagram |
| Instruction Execute Unit Flow Diagram (continued) |
| Data Memory Unit Flow Diagram |
| Exception Detection Unit Flow Diagram |
| Register Block Unit Flow Diagram |
| D. LAYOUT OF BOARD |
| E. RESOURCES IN THIS THESIS |

# LIST OF TABLES

| Table |
|-------|
| 3.1: Calculation of CPI for Multi Cycle Implementation Scheme |
| 3.2: Instruction Time Calculation for Implementation Schemes |
| 4.1: Forwarding Mechanism for Register Bank Primary Port |
| 4.2: Forwarding Mechanism for Register Bank Secondary Port |
| 4.3: ID_Control Signal Fields |
| 4.4: EX_Control ALUOp Signal Values |
| 4.5: Base Addresses of Processor's Internal Signals |
| 5.1: Verification of Correct Instruction Operation |
| 5.2: Timing Diagram for Instruction Operation Verification |
| 5.3: Verification of Hazard Detection and Handling |
| 5.4: Timing Diagram for Handling Hazard Verification |
| 5.5: Verification of Exception Handling "ADDU" and "ADD" |
| 5.6: Timing Diagram for Exception Handling of ADDU and ADD |
| 5.7: Verification of Exception Handling "SUBU" and "SUB" |
| 5.8: Timing Diagram for Exception Handling of SUBU and SUB |
| 5.9: Verification of Exception Handling "ADDIU" and "ADDI" |
| 5.10: Timing Diagram for Exception Handling of ADDIU and ADDI |
| 5.11: Verification of Exception Handling Undefined Instructions |
| 5.12: Timing Diagram for Undefined Instruction Exception Handling |
| A.1: MIPS Registers |

# **LIST OF FIGURES**

| Figure |
|--------|
| 2.1: PLA Architecture |
| 2.2: PAL Architecture |
| 2.3: CPLD Architecture |
| 2.4: FPGA Architecture |
| 2.5: Basic Design Flow in FPGAs, ©Xilinx |
| 2.6: MIPS Project Properties Window |
| 2.7: MIPS Project Source File Listing |
| 2.8: Virtex Architecture Overview ©Xilinx |
| 2.9: Function Generator Configuration of CLB |
| 2.10: Carry Logic Diagram ©Xilinx |
| 2.11: Multiplier Implementation ©Xilinx |
| 2.12: MIPS Monitor Software |
| 3.1: MIPS Instruction Format |
| 3.2: Immediate Addressing Mode |
| 3.3: Register Addressing Mode |
| 3.4: Base Addressing Mode |
| 3.5: PC Relative Addressing Mode |
| 3.6: Pseudo Direct Addressing Mode |
| 3.7: MIPS Opcode Map and Frequency of Instructions |
| 3.8: Single Cycle Implementation Scheme ©[COD98] |
| 3.9: Multi Cycle Implementation Scheme ©[COD98] |
| 3.10: State Flow Diagram of Multi Cycle Scheme Control Unit |
| 3.11: Pipelined Implementation Scheme ©[COD98] |
| 3.12: Simultaneously Executing Instructions in Pipeline |
| 3.13: Single and Multi Cycle Instruction Sequence |
| 3.14: Data Hazard Solution by Forwarding |
| 3.15: Data Hazard Solution by Stalling and Forwarding |
| 3.16: Forwarding of the Most Recent Data |
| 4.1: Internal Structure of the Pipelined Processor |
| 4.2: External Structure of the Pipelined Processor |
| 4.3: Input/Output Signals of Instruction Fetch Unit |
| 4.4: Input/Output Signals of Instruction Decode Unit |
| 4.5: Input/Output Signals of Forwarding and Hazard Detection Unit |
| 4.6: Input/Output Signals of Control Unit |
| 4.7: Input/Output Signals of Execute Unit |
| 4.8: Input/Output Signals of Data Memory Unit |
| 4.9: Input/Output Signals of Exception Detection Unit |
| 4.10: Input/Output Signals of Reg Unit |
| 4.11: Input/Output Signals of Reg_Wr Unit |
| 4.12: StateCAD Diagram of Wait_Sm Unit |
| 4.13: Input/Output Signals of Reg_Prg Unit |
| B.1: Main Screen of MIPS Monitor Software |
| B.2: Main Functions of MIPS Monitor Software |
| B.3: PCI Device Selection Dialog |
| B.4: PCI Device Selection Dialog |
| B.5: Unresolved Hazards View |
| B.6: Overflow Exception Detection View |
| B.7: Undefined Instruction Exception Detection View |
| C.1: Instruction Fetch Unit Flow Diagram |
| C.2: Instruction Decode Unit Flow Diagram |
| C.3: Forwarding and Hazard Detection Unit Flow Diagram |
| C.4: Instruction Execute Unit Flow Diagram |
| C.5: Instruction Execute Unit (continued) Flow Diagram |
| C.6: Data Memory Unit Flow Diagram |
| C.7: Exception Detection Unit Flow Diagram |
| C.8: Register Block Unit Flow Diagram |
| D.1: Layout of Board |

# LIST OF ABBREVIATIONS

| Abbreviation | Meaning |
|--------------|---------|
| ALU  | Arithmetic Logic Unit |
| API  | Application Programming Interface |
| ASIC | Application Specific Integrated Circuit |
| BRAM | Block Random Access Memory |
| CISC | Complex Instruction Set Computer |
| CLB  | Configurable Logic Block |
| CLK  | CLocK |
| CPI  | Clock cycles Per Instruction |
| CPLD | Complex Programmable Logic Device |
| DLL  | Delay Locked Loop |
| EX   | Execute (stage) |
| FF   | Flip Flop |
| FPGA | Field Programmable Gate Array |
| GCC  | GNU C Compiler |
| GPR  | General Purpose Register |
| HDL  | Hardware Description Language |
| ID   | Instruction Decode (stage) |
| IF   | Instruction Fetch (stage) |
| IOB  | Input Output Block |
| ISA  | Instruction Set Architecture |
| ISE  | Integrated Software Environment |
| LUT  | Look Up Table |
| MEM  | Memory (stage) |
| MIPS | Microprocessor without Interlocked Pipeline Stages |
| MUX  | MUltipleXer |
| NOP  | No Operation (instruction) |
| PAL  | Programmable Array Logic |
| PC   | Program Counter |
| PCB  | Printed Circuit Board |
| PLA  | Programmable Logic Array |
| PROM | Programmable Read Only Memory |
| RISC | Reduced Instruction Set Computer |
| SoRC | System on Re-programmable Chip |
| SPLD | Simple Programmable Logic Device |

# **CHAPTER 1**

# INTRODUCTION

Faster execution of computer programs has been one of the most challenging concerns of engineers in the past and will become even more challenging in the future. Over the years, the industry's growing demand for real-time applications has driven the appearance of faster and more deterministic processor architectures on the market.

Developers have always been constrained by the restrictions of their era when choosing an architectural approach. This is why Complex Instruction Set Computers (CISC) appeared before their much simpler counterparts, Reduced Instruction Set Computers (RISC). Developers first built the more complex CISC machines because of memory restrictions and limited compiler support. Advances in memory technology, together with compiler improvements, led to the emergence of RISC-based computers. RISC computers are much simpler to build and to understand, and are therefore more open to improvement and maintenance.

The number of high-level programming language compilers developed for and specialized to RISC architectures grew rapidly. High-level programming became more popular over the years, and programmers moved away from error-prone and time-consuming low-level assembly programming. Another reason for choosing high-level programming was that different vendors proposed different architectures, so it was not feasible to learn each architecture-specific assembly language. Pipelining is one way of increasing a processor's performance. It was proposed for RISC-based computers mainly because of their regularity. Pipelining, together with improved compiler support, gave superior performance, and further improvements were achieved by scaling these architectures.

The primary goal of this thesis is to grasp the idea behind pipelining by partially implementing a RISC architecture, specifically the Microprocessor without Interlocked Pipeline Stages (MIPS), chosen for its simplicity and rich documentation.

Understanding pipelining is important because pipelining is transparent to the high-level programmer. Programmers are aware of the Program Counter (PC), the register bank and memory when they debug their programs, but they cannot observe the internal register blocks used for pipelining. Without knowing the internals of pipelining, programmers cannot understand why the assembly code generated by different compiler vendors differs for the same high-level source, even if they know the compiler well.

The secondary goal of this thesis is to understand the problems faced in pipelining, because pipelining is the step that precedes superscalar speculative architectures; to go one step further, the problems in pipelining must first be solved.

The last goal of this thesis is to become familiar with the hardware design process cycle and to grasp the internals of programmable logic design, especially for Field Programmable Gate Arrays (FPGAs). FPGAs offer parallelism, which is the key to speed. FPGAs are reprogrammable and are becoming increasingly popular in the market. They are replacing application specific integrated circuits (ASICs) and discrete processors, and are also referred to as systems on reprogrammable chip (SoRC).


This thesis is organized as follows. Chapter 2 provides the necessary background on the development environment, programmable logic design and FPGAs. Chapter 3 describes different implementation schemes for the same instruction set and shows quantitatively why pipelining is the best of them. It also describes the problems encountered in pipelining and the proposed solutions. Chapter 4 gives the details of the implemented subset of MIPS. Chapter 5 is devoted to the verification of the partially implemented architecture through in-circuit debugging at runtime with specially developed software, MIPS Monitor. Chapter 6 gives the conclusions and remarks on future work. The appendices present the assembly codes and descriptions of the implemented instruction set, together with screenshots that demonstrate the usage of the MIPS Monitor software.

# **CHAPTER 2**

# **BACKGROUND AND MOTIVATION**

This chapter serves the following purposes:

- providing the necessary background for understanding the rest of the thesis,
- explaining the motivations behind the software and hardware development environments used in this thesis,
- describing the internals of the FPGA platform that was chosen as the design solution.

Readers who are already familiar with these concepts can skip this chapter and proceed to Chapter 3.

## 2.1. Programmable Logic Design

Since the late 1970s, programmable logic circuits have been greatly enhanced and have come to dominate the electronics market. Developers have tended to use reprogrammable devices (simple and complex programmable logic devices) instead of application specific integrated circuits (ASICs) to develop large and interoperable systems because of the following characteristics [XDRM99]:

- Low Cost per Gate.
- Reduced Risk: engineers can make design changes in minutes.
- Faster Testing and Manufacturing.
- Ease of Verification.
- Ability to Participate in Hardware-Software Co-Design.
- Versatile Support for Input/Output Standards.

### 2.1.1. History of Programmable Logic

Until the late 1970s, standard logic components were used exclusively as the building blocks of logic circuits. These components (e.g., 74XX series TTL parts) were placed on printed circuit boards (PCBs), and any change in logic required a corresponding revision of the PCB layout. These side effects of design changes could be avoided by replacing the standard components with programmable logic devices (PLDs). Because the logic inside a PLD is flexible, no rewiring of the PCB is required. In addition, PLDs consume less board area and power. PLDs can be divided into two groups: simple and complex PLDs.

#### 2.1.1.1. Simple Programmable Logic Device (SPLD)

These devices are mainly used for address decoding [Barr99].
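As an illustration of what address decoding involves, the following Python sketch decodes a hypothetical 16-bit memory map into chip-select signals. The names `MEMORY_MAP` and `decode` and the address ranges are assumptions made for illustration only. Each region is aligned to its power-of-two size, so matching it reduces to comparing a few upper address bits, which an SPLD can implement as one product term per chip select.

```python
from typing import List, Tuple

# Hypothetical 16-bit memory map: (chip-select name, base address, size).
# Bases are aligned to their power-of-two sizes, so each region is selected
# by the upper address bits alone.
MEMORY_MAP: List[Tuple[str, int, int]] = [
    ("ROM_CS", 0x0000, 0x4000),
    ("RAM_CS", 0x8000, 0x2000),
    ("IO_CS", 0xF000, 0x0100),
]


def decode(address: int) -> List[str]:
    """Return the active chip selects. A region matches when the address
    bits above the region size equal the base address bits, i.e. a single
    AND of address literals (one product term)."""
    active = []
    for name, base, size in MEMORY_MAP:
        mask = 0xFFFF & ~(size - 1)  # keep only the bits above the region size
        if (address & mask) == base:
            active.append(name)
    return active


for addr in (0x1234, 0x9FFF, 0xF010, 0xC000):
    print(hex(addr), decode(addr))
```

An address that falls in no region, such as `0xC000` here, simply activates no chip select.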

##### 2.1.1.1.1. Programmable Logic Array (PLA)

Ron Cline from Signetics<sup>™</sup> put forward the idea of two programmable planes in 1975 [XPM04]. Any combinational logic function can be expressed in two-level form, either as a product of sums or as a sum of products. For that reason, any combinational logic can be implemented with a PLA, provided that the numbers of inputs and outputs are sufficient for the required implementation. Although the architecture is very flexible, its high fuse count gives it a higher propagation delay than a PAL. Unwanted connections (fuses) are blown during programming.
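The two-level structure can be sketched in Python as a minimal behavioral model, assuming inputs and outputs are 0/1 integers; the class `PLA` and its encoding of the planes are hypothetical. The AND plane holds the programmed product terms, and the OR plane selects which product terms are summed for each output, so any sum-of-products expression fits as long as there are enough inputs, outputs and product terms.

```python
from typing import Dict, List, Sequence, Set

# A product term maps an input index to the literal value it requires:
# {0: 1, 1: 0} means "x0 AND NOT x1". Inputs not listed are "don't care".
ProductTerm = Dict[int, int]


class PLA:
    """Two programmable planes: an AND plane of product terms and an OR plane
    that selects, for each output, which product terms are summed."""

    def __init__(self, and_plane: List[ProductTerm], or_plane: List[Set[int]]):
        self.and_plane = and_plane  # programmable AND plane
        self.or_plane = or_plane    # programmable OR plane (term indices per output)

    def evaluate(self, inputs: Sequence[int]) -> List[int]:
        # First level: the AND plane computes every product term.
        products = [
            int(all(inputs[i] == v for i, v in term.items()))
            for term in self.and_plane
        ]
        # Second level: the OR plane sums the selected product terms per output.
        return [int(any(products[t] for t in terms)) for terms in self.or_plane]


# Program a 2-input PLA: output 0 = x0 XOR x1, output 1 = x0 AND x1.
pla = PLA(
    and_plane=[{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}],
    or_plane=[{0, 1}, {2}],
)

# Print the truth table of both outputs.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, pla.evaluate([a, b]))
```

Because the OR plane is also programmable, any product term may feed any number of outputs; this is the flexibility that a PAL gives up in exchange for speed.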



Figure 2.1: PLA Architecture

##### 2.1.1.1.2. Programmable Array Logic (PAL)

John Birkner from MMI proposed an alternative to the PLA array in 1978. Instead of two programmable planes, only the AND array is programmable, while the OR array is fixed at fabrication [XPM04]. PALs are more constrained than PLAs, but because they have fewer connections, their propagation delay is lower.
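The constraint can be illustrated with a similar sketch, under the assumption (made only for illustration) that every output owns exactly two product terms; `PAL` and `TERMS_PER_OUTPUT` are hypothetical names. Only the AND plane is programmable, so a function that needs more product terms than its output's fixed OR gate provides cannot be implemented.

```python
from typing import Dict, List, Sequence

ProductTerm = Dict[int, int]  # input index -> required literal value (1 or 0)


class PAL:
    """Programmable AND plane feeding a fixed OR plane: each output ORs its
    own dedicated group of product terms, which cannot be shared."""

    TERMS_PER_OUTPUT = 2  # illustrative value; real devices fix this per output

    def __init__(self, outputs: List[List[ProductTerm]]):
        for terms in outputs:
            # The fixed OR plane limits how many product terms an output can use.
            if len(terms) > self.TERMS_PER_OUTPUT:
                raise ValueError(
                    f"needs {len(terms)} product terms, only "
                    f"{self.TERMS_PER_OUTPUT} available per output"
                )
        self.outputs = outputs

    def evaluate(self, inputs: Sequence[int]) -> List[int]:
        def product(term: ProductTerm) -> int:
            return int(all(inputs[i] == v for i, v in term.items()))

        return [int(any(product(t) for t in terms)) for terms in self.outputs]


# 2-input XOR needs two product terms, so it fits.
xor_pal = PAL([[{0: 1, 1: 0}, {0: 0, 1: 1}]])
print([xor_pal.evaluate([a, b])[0] for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]

# 3-input odd parity needs four product terms, so it does not fit.
parity_terms = [
    {0: 1, 1: 0, 2: 0}, {0: 0, 1: 1, 2: 0},
    {0: 0, 1: 0, 2: 1}, {0: 1, 1: 1, 2: 1},
]
try:
    PAL([parity_terms])
except ValueError as err:
    print("does not fit:", err)
```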



Figure 2.2: PAL Architecture

#### 2.1.1.2. Complex Programmable Logic Device (CPLD)

Macrocells are obtained by extending PLDs with additional flip-flops (FFs). A CPLD is simply a combination of these macrocells with a programmable interconnect, the switch matrix (SM). Unlike the programmable interconnect within a PLD, the SM within a CPLD may or may not be fully connected. In other words, some theoretically possible connections between the PLD blocks may not actually be supported within a given CPLD. Therefore, 100% utilization of the macrocells is very difficult to achieve, and some designs will not fit a given CPLD even though it has sufficient logic gates and FFs.

CPLDs can also be used as address decoders, like SPLDs, but they are more often used for high-performance control logic and finite state machines. Traditionally, CPLDs have been chosen over FPGAs whenever high-performance logic is required [Barr99].
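A macrocell can be modelled in Python as a sum-of-products function followed by an optional flip-flop. This is a behavioral sketch, not a model of any particular CPLD: the class `Macrocell` is hypothetical, the sum-of-products is written as a Python function for brevity, and the switch matrix is reduced to feeding outputs back as named signals. Two registered macrocells are enough for a 2-bit counter, the kind of small state machine for which CPLDs are typically chosen.

```python
from typing import Callable, Dict


class Macrocell:
    """Sum-of-products logic followed by an optional flip-flop. When
    `registered` is True the output is the FF state and changes only on a
    clock edge; otherwise the output is purely combinational."""

    def __init__(self, logic: Callable[[Dict[str, int]], int], registered: bool):
        self.logic = logic
        self.registered = registered
        self.q = 0  # flip-flop state, assumed to reset to 0

    def output(self, signals: Dict[str, int]) -> int:
        return self.q if self.registered else self.logic(signals)

    def clock(self, signals: Dict[str, int]) -> None:
        # Rising clock edge: capture the combinational result into the FF.
        if self.registered:
            self.q = self.logic(signals)


# A 2-bit binary counter from two registered macrocells. The switch matrix
# is modelled by feeding each macrocell's output back as a named signal.
b0 = Macrocell(lambda s: 1 - s["q0"], registered=True)        # q0' = NOT q0
b1 = Macrocell(lambda s: s["q1"] ^ s["q0"], registered=True)  # q1' = q1 XOR q0

counts = []
for _ in range(5):
    signals = {"q0": b0.output({}), "q1": b1.output({})}
    counts.append(2 * signals["q1"] + signals["q0"])
    # Both flip-flops are clocked with the same pre-edge signal values.
    b0.clock(signals)
    b1.clock(signals)
print(counts)  # [0, 1, 2, 3, 0]
```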



Figure 2.3: CPLD Architecture

#### 2.1.1.3. Field Programmable Gate Array (FPGA)

In 1985, a company called Xilinx<sup>™</sup> introduced FPGAs. An FPGA is composed of configurable logic blocks (CLBs) surrounded by programmable interconnects; each CLB contains function generators, implemented as look-up tables (LUTs), and flip-flops (FFs). FPGAs can be either one-time programmable, like PLDs, or SRAM-based and therefore reprogrammable [XPM04] [TRENZ01] [BZEID].
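The idea behind an LUT-based function generator can be sketched in Python, assuming a 4-input LUT such as those in Virtex CLBs: the configuration bits store the truth table and the inputs act as the read address. `LUT4` and `from_function` are hypothetical names for illustration; in a real device the configuration bits are loaded from the configuration bitstream.

```python
from typing import Callable, Sequence


class LUT4:
    """A 4-input look-up table: 16 configuration bits hold the truth table,
    and the inputs simply form the address of the bit that is read out."""

    def __init__(self, config: int):
        if not 0 <= config < 1 << 16:
            raise ValueError("a 4-input LUT holds exactly 16 configuration bits")
        self.config = config

    @classmethod
    def from_function(cls, f: Callable[[int, int, int, int], int]) -> "LUT4":
        # "Programming" the LUT: evaluate f for all 16 input combinations.
        config = 0
        for addr in range(16):
            bits = [(addr >> i) & 1 for i in range(4)]  # input i = address bit i
            if f(*bits):
                config |= 1 << addr
        return cls(config)

    def evaluate(self, inputs: Sequence[int]) -> int:
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return (self.config >> addr) & 1


# Any 4-input function fits in one LUT, e.g. a 2:1 multiplexer with
# select a and data inputs b and c (input d unused).
mux = LUT4.from_function(lambda a, b, c, d: c if a else b)
print(hex(mux.config))  # 0xe4e4
print(mux.evaluate([0, 1, 0, 0]), mux.evaluate([1, 1, 0, 0]))  # 1 0
```

Since the delay through a LUT does not depend on which function it holds, this structure trades the fixed gate-level structure of PLDs for uniform, fully programmable function generators.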



### 2.1.2. Basic Design Process

Design entry, or design specification, can take the form of schematic capture or a hardware description language (HDL). In schematic form, after choosing the capture tool and the manufacturer's library, the designer connects gates from the library with wires and then generates a netlist, which is a textual description of the circuit. Schematic capture is not feasible for large designs because it is not scalable, not reusable, strongly vendor dependent and hard to maintain. In HDL design entry, the design is written in a high-level description language that emphasizes the design's function or behavior; it is then synthesized by a vendor-independent tool, which generates the netlist. The resulting design is more maintainable, scalable and reusable than one produced by schematic design entry.
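To make the notion of a netlist concrete, the Python sketch below parses a toy textual netlist for a half adder and evaluates it. The one-gate-per-line format (`TYPE output inputs...`) and the functions `parse` and `simulate` are hypothetical; production netlist formats such as EDIF carry far more information, such as cell libraries, hierarchy and attributes.

```python
from typing import Dict, List, Tuple

# A gate instance: (gate type, output net, input nets).
Gate = Tuple[str, str, List[str]]

GATES = {
    "AND": lambda v: int(all(v)),
    "OR": lambda v: int(any(v)),
    "XOR": lambda v: sum(v) % 2,
    "NOT": lambda v: 1 - v[0],
}

# Toy textual netlist of a half adder.
HALF_ADDER = """
XOR sum   a b
AND carry a b
"""


def parse(text: str) -> List[Gate]:
    """Each non-empty line reads: TYPE output input1 input2 ..."""
    netlist = []
    for line in text.strip().splitlines():
        kind, out, *ins = line.split()
        netlist.append((kind, out, ins))
    return netlist


def simulate(netlist: List[Gate], inputs: Dict[str, int]) -> Dict[str, int]:
    """Evaluate gates as soon as all their input nets are known
    (assumes purely combinational logic without feedback loops)."""
    nets = dict(inputs)
    pending = list(netlist)
    while pending:
        ready = [g for g in pending if all(n in nets for n in g[2])]
        if not ready:
            raise ValueError("combinational loop or undriven net")
        for gate in ready:
            kind, out, ins = gate
            nets[out] = GATES[kind]([nets[n] for n in ins])
            pending.remove(gate)
    return nets


half_adder = parse(HALF_ADDER)
print(simulate(half_adder, {"a": 1, "b": 1}))  # {'a': 1, 'b': 1, 'sum': 0, 'carry': 1}
```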



Figure 2.5: Basic Design Flow in FPGAs, ©Xilinx

In design implementation, the first step is translation, which converts the low-level, generic netlist into device-specific resources. After translation, the mapping step checks the design against device-specific rules and, using the device resources, adds further logic or replicates logic where needed to meet the timing requirements. Finally, in the place and route step, the already allocated resources are distributed across the FPGA, taking the physical constraints and routing resources into account. At this point the physical layout is determined, and timing information for design entities and interconnects (back annotation) becomes available. After routing, the device is ready to be programmed.

In the device programming stage, the configuration of the SRAM-based FPGA, which defines its logic and interconnect and is volatile, so it must be reloaded after each power-on, is programmed into a Programmable Read Only Memory (PROM) device with part number xc18v02.

Design verification proceeds in parallel with design development. A design entered in either schematic or HDL form can be simulated behaviorally, and its code can be checked for syntax errors. After synthesis, the generated netlist can be simulated functionally by applying test vectors and checking the outputs against the expected output vectors. Timing simulation follows the place and route phase and uses the back-annotated timing information.
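Functional simulation with test vectors can be sketched as follows. The 4-bit adder `adder4` is a hypothetical stand-in for the simulated netlist, and `VECTORS` pairs each input stimulus with its expected output vector; in practice this role is played by an HDL test bench running in the simulator.

```python
from typing import Callable, List, Tuple

Stimulus = Tuple[int, int]
Response = Tuple[int, int]


def adder4(a: int, b: int) -> Response:
    """Hypothetical device under test: 4-bit adder returning (sum, carry_out)."""
    total = a + b
    return total & 0xF, total >> 4


# Each test vector pairs an input stimulus with the expected output vector.
VECTORS: List[Tuple[Stimulus, Response]] = [
    ((0, 0), (0, 0)),
    ((3, 4), (7, 0)),
    ((15, 1), (0, 1)),
    ((9, 9), (2, 1)),
]


def run_testbench(
    dut: Callable[[int, int], Response], vectors: List[Tuple[Stimulus, Response]]
) -> List[str]:
    """Apply every stimulus and compare the response with the expected vector."""
    failures = []
    for stimulus, expected in vectors:
        actual = dut(*stimulus)
        if actual != expected:
            failures.append(f"{stimulus}: expected {expected}, got {actual}")
    return failures


print(run_testbench(adder4, VECTORS) or "all vectors passed")
```

A deliberately faulty model that drops the carry out, `lambda a, b: ((a + b) & 0xF, 0)`, fails exactly the two vectors that produce a carry, which is how a test bench exposes a functional error without inspecting the circuit itself.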

## 2.2. Integrated Software Environment (ISE™)

The Integrated Software Environment is the environment provided by Xilinx<sup>™</sup> for the Design Entry, Design Synthesis, Design Implementation, Design Verification and Device Programming phases of design development (described in Section 2.1.2) [XISE03]. The MIPS project was created in ISE with the project properties given in Figure 2.6.

| Property Name                 | Value              |
|-------------------------------|--------------------|
| Device Family                 | Virtex             |
| Device                        | xcv300             |
| Package                       | bg432              |
| Speed Grade                   | -5                 |
| Top-Level Module Type         | Schematic          |
| Synthesis Tool                | XST (VHDL/Verilog) |
| Simulator                     | Modelsim           |
| Generated Simulation Language | VHDL               |

Figure 2.6: MIPS Project Properties Window

The top-level module is entered by schematic capture for visualization purposes. All other sub-modules are coded in the hardware description language VHDL [CDVHDL] [Perry02]. The XST (Xilinx Synthesis Technology) tool was used to synthesize the netlist from the VHDL code, and the Modelsim® simulator was selected for post-place and route simulation.



Figure 2.7: MIPS Project Source File Listing

The MIPS project comprises the source files listed in Figure 2.7, which describe the architecture of the entities and serve the following purposes:

- Design Entry (e.g. file extensions \*.vhd and \*.sch)
- Physical and timing user constraints files for Design Implementation (e.g. file extension \*.ucf)
- Test Bench files for Post-Place and Route Simulation (e.g. file extension \*.vhd)
- Post-Place and Route simulation macro file, which compiles the design and Test Bench files, invokes the simulator, loads signals into the view windows and runs the simulation for a specified duration (e.g. file extension \*.do)
- State Machine editor file (e.g. file extension \*.dia)
- iMPACT command file for device programming (e.g. file extension \*.cmd)

### 2.3. Virtex<sup>™</sup> FPGA

The MIPS project is implemented on an xcv300-5bg432 Virtex FPGA device with the following properties and layout (Figure 2.8) [XDS003-2] [SYNP99] [XCNSTR] [Brown96]:

- A 32x48 CLB array provides the functional elements for constructing logic, connected by the global routing matrix or switch matrix (Figure 2.4),
- The VersaRing<sup>™</sup> forms the interface between the Input Output Blocks (IOBs) and the CLBs,
- 16 Block RAMs (BRAMs) of 4096x1 bits each, 65536x1 bits in total,
- 4 Delay-Locked Loops (DLLs) eliminate the skew between the clock input pad and the internal clock input pins throughout the device,
- The 432-pin ball grid array package provides 316 user I/O pins, and speed grade -5 yields system performance up to 200 MHz.



Figure 2.8: Virtex Architecture Overview ©Xilinx

#### 2.3.1. Function Generation Capabilities of CLB

Each CLB comprises four function generators (LUTs), distributed over two slices. Each slice contains two function generators and additional logic that combines the LUT outputs to generate functions of 5 inputs (MUXF5) and 6 inputs (MUXF6) (Figure 2.9). Each slice can generate any function of 5 inputs and some functions of up to 9 inputs; hence each CLB can generate any function of 6 inputs and some functions of up to 19 inputs.



Figure 2.9: Function Generator Configuration of CLB
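The following Python sketch is a behavioural model, not Xilinx HDL, and the helper names `make_lut4`, `lut4`, `muxf5` and `slice_5input` are hypothetical. It treats a 4-input LUT as nothing more than a 16-entry truth table (its INIT value) and shows how a MUXF5-style multiplexer combines the two LUTs of a slice into any 5-input function by Shannon expansion on the fifth input.

```python
from typing import Callable

Func4 = Callable[[int, int, int, int], int]
Func5 = Callable[[int, int, int, int, int], int]


def make_lut4(func: Func4) -> int:
    """Return the 16-bit INIT value (truth table) of a 4-input LUT implementing func."""
    init = 0
    for addr in range(16):
        bits = [(addr >> k) & 1 for k in range(4)]
        if func(*bits) & 1:
            init |= 1 << addr
    return init


def lut4(init: int, a0: int, a1: int, a2: int, a3: int) -> int:
    """Read the truth table: the four inputs form the address of one INIT bit."""
    addr = a0 | (a1 << 1) | (a2 << 2) | (a3 << 3)
    return (init >> addr) & 1


def muxf5(f_out: int, g_out: int, sel: int) -> int:
    """2:1 multiplexer combining the F and G LUT outputs of a slice."""
    return g_out if sel else f_out


def slice_5input(func5: Func5, x0: int, x1: int, x2: int, x3: int, x4: int) -> int:
    """Any 5-input function: the F LUT holds func5 with x4 = 0, the G LUT with x4 = 1."""
    f_init = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))
    g_init = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))
    return muxf5(lut4(f_init, x0, x1, x2, x3), lut4(g_init, x0, x1, x2, x3), x4)
```

For example, `slice_5input(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e, 1, 0, 1, 1, 0)` evaluates a 5-input parity function using the two LUTs of one slice.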

#### 2.3.2. Distributed (Shallow) Memory Usage of CLB

Each LUT in a slice can be configured as a 16x1-bit synchronous RAM, and the two LUTs in a slice can be combined into a 16x2-bit or 32x1-bit synchronous RAM, or a 16x1-bit dual port synchronous RAM.

#### 2.3.3. Shift Register Configuration of CLB

Each LUT in a slice can be configured as a dynamically addressable 16-bit shift register.
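As a rough illustration, the hypothetical class below models a LUT used as an SRL16-style shift register: each clock edge shifts in one bit, and the 4-bit address selects which of the 16 taps drives the output, so tap k holds the bit loaded k clock edges before the most recent one. It is a behavioural sketch only; clock enable and initial contents are simplified assumptions.

```python
class Srl16:
    """Behavioural model of a LUT configured as a dynamically addressable 16-bit shift register."""

    def __init__(self) -> None:
        self.bits = [0] * 16  # bits[0] holds the most recently shifted-in value

    def clock(self, d: int, ce: int = 1) -> None:
        """Clock edge: with clock enable set, shift every bit one place and load d."""
        if ce:
            self.bits = [d & 1] + self.bits[:-1]

    def q(self, addr: int) -> int:
        """Output tap selected by the 4-bit address (tap k = bit loaded k edges earlier)."""
        return self.bits[addr & 0xF]


def delayed_stream(samples: list[int], addr: int) -> list[int]:
    """Feed a bit stream through the register and sample the addressed tap after each edge."""
    srl = Srl16()
    out = []
    for bit in samples:
        srl.clock(bit)
        out.append(srl.q(addr))
    return out
```

Changing `addr` at run time changes the delay without rewiring, which is what makes the shift register "dynamically addressable".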

#### 2.3.4. Arithmetic Capabilities of CLB

Each LUT in a slice has a dedicated XORCY gate that forms the single-bit sum of a full adder, and a dedicated carry path (Figure 2.10) that uses dedicated routing resources between vertically adjacent CLBs [XAPP215]. Because the additional XORCY gate produces the sum, two inputs of the LUT are left spare, and these inputs can be used to implement additional logic, thereby increasing cell functionality. [TW04] [KCHAP93] [DFMULT]



Figure 2.10: Carry Logic Diagram ©Xilinx
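The per-bit behaviour can be sketched in Python. This is an assumed behavioural model rather than vendor code: the LUT computes the propagate signal `a XOR b`, XORCY forms the sum bit, and a MUXCY-style multiplexer either passes the incoming carry (when the operand bits differ) or replaces it with the operand bit (generate when both are 1, kill when both are 0).

```python
def carry_chain_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Ripple-carry addition modelled on the LUT + MUXCY + XORCY structure.

    Returns (sum, carry_out) for two unsigned width-bit operands.
    """
    total = 0
    carry = cin
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                 # propagate, computed in the LUT
        s = p ^ carry               # XORCY forms the sum bit
        carry = carry if p else ai  # MUXCY: pass carry if p = 1, else generate/kill with ai (= bi)
        total |= s << i
    return total, carry
```

The carry never passes back through a LUT, which is why the dedicated carry path between vertically adjacent CLBs is fast.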

Multiplication in the FPGA is performed by shifting and adding the partial products in a parallel fashion. A two-input AND gate per LUT implements a 1-bit multiplier [XAPP215], and this pattern repeats throughout the multiplier. When the two operands (partial product bits) of an adder bit differ, the  $C_{IN}$  signal is propagated (Figure 2.11). When both operands are equal, propagation of  $C_{IN}$  stops and the  $C_{OUT}$  signal is either generated (both operands 1) or killed (both operands 0); the additional AND gate is essential to provide this generate/kill value to the carry logic [HPCC].



Figure 2.11: Multiplier Implementation ©Xilinx
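The shift-and-add principle can be illustrated with a short Python sketch, assuming unsigned operands. Each partial product row is formed bit by bit with AND operations, as the per-LUT AND gate does in hardware, and is shifted by its row index before being added; the sequential loop here stands in for the parallel adder array of Figure 2.11.

```python
def shift_add_multiply(a: int, b: int, width: int) -> int:
    """Unsigned multiplication from AND-gate partial products, shifted and added."""
    product = 0
    for j in range(width):
        bj = (b >> j) & 1
        # Partial product row j: each bit of a ANDed with bit j of b (one AND gate per bit)
        row = 0
        for i in range(width):
            row |= (((a >> i) & 1) & bj) << i
        product += row << j  # row j is weighted by 2**j before accumulation
    return product
```

For two 4-bit operands the result needs up to 8 bits, which is why the adder rows grow in width as the partial products are accumulated.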

### 2.4. PCI Host Software: In-Circuit Debugging of the Architecture

The "MIPS Monitor" software (Figure 2.12), which runs on a PC, was developed to debug the architecture in circuit, either after the generated configuration has been programmed into the target PROM or when a new program is ready to be loaded while the Virtex FPGA is running [PLXSDK01].

"MIPS Monitor" uses the PCI Application Programming Interface (API) provided by PLX Technology<sup>™</sup> to read the FPGA's internal data and program memory, the inputs/outputs of the pipeline stages, the pipeline register states and the current PC. It also enables the user to observe stalls and exceptions. The information read through the PCI API is displayed to the user on its graphical user interface.

"MIPS Monitor" also uses the PCI API provided by PLX Technology<sup>™</sup> to write control signals to the Virtex FPGA, which reset the architecture or increment the PC by one, thereby enabling single step operation.

The graphical user interface of "MIPS Monitor" provides the following functions:

- Selecting the PCI 9030 device located on the same board as the FPGA,
- Viewing the program that was already assembled and programmed into the PROM,
- Viewing, loading and verifying a new program in the local block instruction memory of the FPGA,
- Inserting break points and running the architecture in single step or free running mode.



Figure 2.12: MIPS Monitor Software

The layout of the board used during this thesis is given in APPENDIX D, Layout of Board.

# **CHAPTER 3**

# **RELATED RESEARCH**

### 3.1. MIPS R2000 Instruction Set Architecture (ISA)

The MIPS R2000 was introduced in 1986 by MIPS Computer Systems and was one of the early commercial RISC processors. MIPS stands for Microprocessor without Interlocked Pipeline Stages; as the name implies, the design omits hardware interlocks between pipeline stages and relies on software (the compiler) to order instructions so that conflicts are avoided. The R2010 is the floating point coprocessor that accompanies the R2000; later generations are the R3000, which adds cache control, and the R4000, a 64-bit version of the architecture. MIPS 32- and 64-bit architectures are used in networking and consumer device markets, such as car navigation systems, digital television and cameras, video game controllers, switches and routers.

The primary metric for comparing the performance of architectures is the execution time of a program, given by the following equation [COD98]:

$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instruction Count}}{\text{Program}} \times \frac{\text{Clock Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock Cycle}}$$

None of the factors on the right hand side of the equation determines performance alone, but each of them affects it. The selected ISA affects the instruction count. The ISA implementation scheme, described in section 3.3, affects the clock cycles per instruction (CPI). The organization and technology of the architecture affect the clock rate.

These factors are also interdependent: improving one often makes another worse. For example, making instructions more complex reduces the instruction count but may decrease the clock rate. Good performance can be obtained by first choosing the ISA, then determining the implementation scheme, and finally determining the technology.
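The equation can be checked with a few lines of Python. The instruction counts and CPI values below are invented purely to illustrate the trade-off described above (a more complex ISA that needs fewer instructions but more cycles per instruction); only the 20 MHz clock matches the internal clock used later in this thesis.

```python
def cpu_time(instruction_count: int, cpi: float, clock_hz: float) -> float:
    """Seconds/Program = Instructions/Program x Cycles/Instruction x Seconds/Cycle."""
    seconds_per_cycle = 1.0 / clock_hz
    return instruction_count * cpi * seconds_per_cycle


CLOCK_HZ = 20e6  # 20 MHz

# Hypothetical workloads: the "complex" ISA needs fewer instructions but more cycles each.
simple_isa = cpu_time(instruction_count=1_000_000, cpi=1.2, clock_hz=CLOCK_HZ)
complex_isa = cpu_time(instruction_count=700_000, cpi=2.0, clock_hz=CLOCK_HZ)

print(f"simple ISA:  {simple_isa * 1e3:.1f} ms")
print(f"complex ISA: {complex_isa * 1e3:.1f} ms")
```

With these assumed numbers, the 30% reduction in instruction count does not compensate for the higher CPI, so the complex ISA is slower.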

The MIPS (Microprocessor without Interlocked Pipeline Stages) R2000 ISA is a RISC architecture that obeys four design principles [COD98] [JGRAY00]:

- *Smaller is faster:* MIPS has 32 general purpose registers, each 32 bits long, and its arithmetic instructions operate only on registers. Registers are smaller, and hence faster, than external memory.
- *Simplicity favors regularity:* all MIPS instructions have the same size, 32 bits, and arithmetic instructions always have three operands, so decoding and pipelining are simpler than for the variable length instructions of CISC ISAs.
- *Good design demands good compromises:* MIPS sticks to a small number of instruction formats and addressing modes.
- *Make the common case fast* (a corollary of Amdahl's law): implementing the commonly used instructions efficiently makes the whole architecture faster.

### 3.2. MIPS Instructions and MIPS Assembly Language

MIPS instructions can be grouped as arithmetic, transfer, branch, immediate and jump instructions.

Arithmetic instructions operate on registers and require three operands: two sources and one destination. The arithmetic or logical operation is performed on the two source operands, and the result is written back to the destination register.

Transfer instructions load data from memory into registers or store data from registers into memory. They require two register operands: one register holds the base address, to which the immediate field of the instruction is added as an offset, and the other register is either the destination of the loaded value or the source of the value to be stored.

Branch instructions operate on two register operands, evaluate the condition and, according to the result, either continue sequential execution or take the branch by modifying the PC.

Immediate instructions use the immediate field of the instruction as an operand.

Jump instructions use the address field of the instruction to jump unconditionally by modifying the PC.

The detailed descriptions, functionalities and assembly language formats of MIPS R2000 instructions implemented and verified in this thesis are presented in APPENDIX A, Implemented Subset of MIPS R2000 ISA.

#### 3.2.1. MIPS Instruction Format

General instruction format is given in Figure 3.1.

| Field size | 6 bits | 5 bits                   | 5 bits | 5 bits                        | 5 bits | 6 bits | Comment                            |
|------------|--------|--------------------------|--------|-------------------------------|--------|--------|------------------------------------|
| R-format   | Op     | Rs                       | Rt     | Rd                            | ShAmt  | Funct  | Arithmetic instruction format      |
| I-format   | Op     | Rs                       | Rt     | Address / Immediate (16 bits) |        |        | Transfer, branch, immediate format |
| J-format   | Op     | Target address (26 bits) |        |                               |        |        | Jump instruction format            |

Figure 3.1: MIPS Instruction Format

The Op field is the opcode of the instruction and is used as the primary key in instruction decoding. The Rs, Rt and Rd fields specify the addresses of the registers used in the operation. The ShAmt field specifies the shift amount, and the Funct field selects the specific variant of the operation given in the Op field.
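The field boundaries of Figure 3.1 translate directly into shifts and masks. The Python sketch below (helper names are hypothetical) extracts every field from a 32-bit word; which fields are meaningful depends on the format selected by the Op field. The R-format test on opcode 0x00 and the J-format opcodes 0x02 (j) and 0x03 (jal) follow the MIPS R2000 encoding.

```python
from dataclasses import dataclass


@dataclass
class Fields:
    op: int      # bits 31..26
    rs: int      # bits 25..21
    rt: int      # bits 20..16
    rd: int      # bits 15..11 (R-format)
    shamt: int   # bits 10..6  (R-format)
    funct: int   # bits 5..0   (R-format)
    imm: int     # bits 15..0  (I-format, raw, not sign-extended)
    target: int  # bits 25..0  (J-format)


def decode(word: int) -> Fields:
    """Split a 32-bit MIPS instruction word into all possible fields."""
    return Fields(
        op=(word >> 26) & 0x3F,
        rs=(word >> 21) & 0x1F,
        rt=(word >> 16) & 0x1F,
        rd=(word >> 11) & 0x1F,
        shamt=(word >> 6) & 0x1F,
        funct=word & 0x3F,
        imm=word & 0xFFFF,
        target=word & 0x3FFFFFF,
    )


def instruction_format(op: int) -> str:
    """Select the format from the opcode: R for 0x00, J for j/jal, I otherwise."""
    if op == 0x00:
        return "R"
    if op in (0x02, 0x03):
        return "J"
    return "I"
```

For example, `add $t0, $s1, $s2` encodes as `0x02324020`, which decodes to Op 0, Rs 17, Rt 18, Rd 8 and Funct 0x20, while `lw $t0, 32($s3)` encodes as `0x8E680020`.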

#### 3.2.2. MIPS Addressing Modes

Immediate addressing (Figure 3.2) means that the operand is a constant within the instruction itself:



Figure 3.2: Immediate Addressing Mode

Register addressing (Figure 3.3) means that all operands are registers:



Figure 3.3: Register Addressing Mode

Base addressing (Figure 3.4) means that the operand is in memory, at an address calculated by adding the base address held in a register to the offset in the immediate field. Memory addressing is implemented as word (4 byte) aligned.



Figure 3.4: Base Addressing Mode

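PC relative addressing (Figure 3.5) means that the instruction memory is addressed by adding the constant in the instruction to the PC. In MIPS, the 16-bit constant is sign-extended, shifted left by two bits (a word offset) and added to the incremented PC (PC + 4).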



Figure 3.5: PC Relative Addressing Mode

Pseudo direct addressing (Figure 3.6) means that the 26-bit address field of the instruction, shifted left by two bits, is concatenated with the upper four bits of the program counter, and the instruction memory is then addressed with the result.



Figure 3.6: Pseudo Direct Addressing Mode
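The base, PC relative and pseudo direct address calculations can be sketched in Python as follows, assuming standard MIPS R2000 semantics (sign-extended 16-bit offsets, word offsets for branches, 32-bit wrap-around). The function names are illustrative, and the narrower address widths of the thesis's 256-word memories are not modelled.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits as a two's complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def base_address(base_reg: int, imm: int) -> int:
    """Base addressing (lw/sw): register value + sign-extended 16-bit offset."""
    return (base_reg + sign_extend16(imm)) & MASK32


def pc_relative_target(pc: int, imm: int) -> int:
    """Branch target: (PC + 4) + (sign-extended offset << 2)."""
    return (pc + 4 + (sign_extend16(imm) << 2)) & MASK32


def pseudo_direct_target(pc: int, target26: int) -> int:
    """Jump target: upper 4 bits of PC + 4 concatenated with the 26-bit field << 2."""
    return ((pc + 4) & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)
```

A branch offset of `0xFFFF` (-1) therefore jumps back to the branch itself, because the offset is applied to PC + 4.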

#### 3.2.3. MIPS Instruction Decoding

The MIPS R2000 instructions implemented and verified in this thesis were chosen according to their frequency of use in two very different programs, spice and the GNU C compiler (gcc). These frequencies were measured with pixie, an instruction measurement tool [COD98].

The MIPS core instructions (all presented in Figure 3.7) cover 95% of the executed instructions for gcc and 45% for spice: MIPS core instructions dominate gcc, whereas integer and floating point core instructions together dominate spice. The instructions not covered in this thesis constitute the remaining 5% for gcc and 55% for spice. Simply adding a floating point arithmetic core to the architecture would cover a further 49% of spice, leaving 5% of gcc and 6% of spice uncovered.

Instructions are decoded and control signals are generated based on Figure 3.7. The related procedures are described in detail later in this thesis.

| op(31:26) | Name       | gcc   | spice | func(5:0) | Name  | gcc   | spice |
|-----------|------------|-------|-------|-----------|-------|-------|-------|
| 0x00      | (R-format) |       |       | 0x00      | sll   | 5%    | 5%    |
| 0x02      | j          |       |       | 0x02      | srl   | >0.5% | 1%    |
| 0x03      | jal        | 1%    | 1%    | 0x08      | jr    | 1%    | 1%    |
| 0x04      | beq        | 9%    | 3%    | 0x10      | mfhi  |       |       |
| 0x05      | bne        | 8%    | 2%    | 0x11      | mthi  |       |       |
| 0x08      | addi       | >0.5% | >0.5% | 0x12      | mflo  |       |       |
| 0x09      | addiu      | 17%   | 1%    | 0x13      | mtlo  |       |       |
| 0x0A      | slti       | 1%    | >0.5% | 0x19      | multu |       |       |
| 0x0B      | sltiu      | 1%    | >0.5% | 0x20      | add   | >0.5% | >0.5% |
| 0x0C      | andi       | 2%    | 1%    | 0x21      | addu  | 9%    | 10%   |
|           |            |       |       |           |       | 8%    | 10%   |
| 0x0D      | ori        |       |       | 0x22      | sub   |       |       |
| 0x0E      | xori       |       |       | 0x23      | subu  | >0.5% | 1%    |
| 0x0F      | lui        | 2%    | 6%    | 0x24      | and   | 1%    | >0.5% |
| 0x23      | lw         | 21%   | 7%    | 0x25      | or    |       |       |
| 0x2B      | sw         | 12%   | 2%    | 0x26      | xor   |       |       |
|           |            |       |       | 0x27      | nor   |       |       |
|           |            |       |       | 0x2A      | slt   | 2%    | 0%    |
|           |            |       |       | 0x2B      | sltu  | 1%    | 0%    |

Figure 3.7: MIPS Opcode Map and Frequency of Instructions

### 3.3. Survey of Instruction Set Architecture Implementation Schemes

The path that instructions and data follow, controlled by the signals generated by the control unit, is called the data path. Each type of instruction follows a different path through the architecture, because the operands on which the instructions operate differ.

The data path is formed by state (sequential) and combinational logic elements. Combining these elements in different organizations gives rise to different implementation schemes.

Building an architecture requires successive iterations of decomposing and reuniting: it is necessary to decompose in order to understand, and to reunite in order to build. There is an apparent contradiction, because one must decompose in order to reunite. This contradiction was used as a methodology throughout the survey of implementation schemes: the big picture is given first, and then it is decomposed and fully explained.

#### 3.3.1. Single Cycle Implementation Scheme

In this scheme (Figure 3.8), a single instruction starts on one clock edge and ends on the next clock edge. The clock period is determined by the slowest instruction, even though faster instructions exist in the ISA; hence this scheme is impractical to implement but useful for understanding. Each instruction, irrespective of its format, is fetched from memory and decoded according to its bit fields (Figure 3.1), while the next PC is calculated by adding a 4 byte offset to the present PC. The operation performed on the registers is selected by the ALUOp control signal, generated in the decode stage, together with the Funct field of the instruction.



Figure 3.8: Single Cycle Implementation Scheme ©[COD98]

Multiplexers can be used to divide the architecture into smaller pieces. A multiplexer in front of an element's input means that the element is shared by as many instruction types as the multiplexer has inputs. The select signal, which depends on the instruction type, determines the path of the data through the architecture in the present clock cycle. For instance, the multiplexer with control signal ALUSrc determines whether the ALU is used to calculate the address of a data memory load/store or to perform an arithmetic operation on register operands. In either case, the ALU can be used by only one instruction type in a given clock cycle, so some hardware is duplicated for other calculations, such as the adder for the next program counter, even though the ALU could be used for this purpose. This is another reason why this scheme is impractical to implement; its problems are solved in the multi cycle implementation scheme described in section 3.3.2.

Similarly, the multiplexer with control signal MemtoReg determines which data is written to the register bank: either the result calculated by the ALU or the data loaded from data memory.

The multiplexer with control signal RegDst differentiates R-type and I-type instructions, because the destination register address field differs between these types: for R-type instructions the destination address is specified in the Rd field, whereas for I-type instructions it is specified in the Rt field (Figure 3.1).

The multiplexer with control signal PCSrc determines the next PC. The next PC is PC+4 bytes for all instruction types except conditional branches. For branch instructions (Branch control signal asserted), if the condition is satisfied, the next PC is calculated according to Figure 3.5. For example, for the "branch on equal" instruction the ALU subtracts the two operands; when they are equal their difference is zero, so the ALU's Zero output is set to '1'.
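The role of the multiplexer control signals can be summarized as a lookup table keyed by opcode. This Python sketch uses the signal values of the single cycle design in [COD98] for R-format, lw, sw and beq; MemRead, MemWrite and RegWrite are signals from that design that the text above does not discuss, and `None` marks a don't-care value. The helper functions are hypothetical.

```python
from typing import Dict, Optional

Control = Dict[str, Optional[int]]  # None marks a don't-care value

CONTROL_TABLE: Dict[int, Control] = {
    0x00: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
               MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),     # R-format
    0x23: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
               MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),     # lw
    0x2B: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
               MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),     # sw
    0x04: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
               MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),     # beq
}


def write_register(control: Control, rt: int, rd: int) -> int:
    """RegDst multiplexer: Rd for R-format instructions, Rt for I-format loads."""
    return rd if control["RegDst"] == 1 else rt


def write_back_data(control: Control, alu_result: int, mem_data: int) -> int:
    """MemtoReg multiplexer: loaded data for lw, ALU result otherwise."""
    return mem_data if control["MemtoReg"] == 1 else alu_result


def pc_src(control: Control, alu_zero: int) -> int:
    """PCSrc: take the branch target only if Branch is asserted and the ALU Zero output is 1."""
    return int(control["Branch"] == 1 and alu_zero == 1)
```

Because every signal is a function of the opcode alone (plus the Zero flag for PCSrc), the single cycle control unit is purely combinational.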

#### 3.3.2. Multi Cycle Implementation Scheme

In this scheme (Figure 3.9), instructions are executed in multiple clock cycles. Register blocks are added between functional units to hold intermediate values for use in a later clock cycle. The clock period is determined by the slowest functional unit, and a functional unit can be used more than once per instruction (e.g. a single ALU is used instead of the ALU and two adders of Figure 3.8) as long as the accesses occur in different clock cycles. A single memory unit is used instead of separate instruction and data memories, and the multiplexer with control signal IorD selects between instruction and data access.



Figure 3.9: Multi Cycle Implementation Scheme ©[COD98]

The jump instruction is also shown in the scheme. The multiplexer with control signal PCSource selects the next program counter calculated according to Figure 3.6 when an unconditional jump instruction has been fetched from memory. More complex control logic than in the single cycle implementation scheme is needed; the state flow diagram of the control unit is given in Figure 3.10.



Figure 3.10: State Flow Diagram of Multi Cycle Scheme Control Unit
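The cycle counts used later in Table 3.1 follow directly from this state machine. The Python sketch below walks the next-state function until control returns to instruction fetch. It uses the state numbering of the multicycle control in [COD98] (0 fetch, 1 decode, 2 address computation, 3 and 5 memory access, 4 and 7 write back, 6 execution, 8 branch completion, 9 jump completion), which may differ from the numbering in Figure 3.10.

```python
def next_state(state: int, op: int) -> int:
    """Next-state function of the multicycle control FSM for lw, sw, R-format, beq and j."""
    if state == 0:
        return 1                        # instruction fetch -> decode / register fetch
    if state == 1:
        if op in (0x23, 0x2B):
            return 2                    # lw/sw: memory address computation
        if op == 0x00:
            return 6                    # R-format: execution
        if op == 0x04:
            return 8                    # beq: branch completion
        if op == 0x02:
            return 9                    # j: jump completion
        raise ValueError(f"opcode {op:#04x} not handled in this sketch")
    if state == 2:
        return 3 if op == 0x23 else 5   # lw -> memory read, sw -> memory write
    if state == 3:
        return 4                        # memory read completion (write back)
    if state == 6:
        return 7                        # R-format completion (write back)
    return 0                            # states 4, 5, 7, 8 and 9 return to fetch


def cycles(op: int) -> int:
    """Count clock cycles from fetch until the FSM returns to the fetch state."""
    state, count = 0, 0
    while True:
        state = next_state(state, op)
        count += 1
        if state == 0:
            return count
```

Running `cycles` for each opcode reproduces the 5, 4, 4, 3 and 3 clock cycles listed for loads, stores, ALU instructions, branches and jumps.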

#### 3.3.3. Pipelined Implementation Scheme

In this scheme (Figure 3.11), there is a single clock cycle between subsequent instructions, as in the single cycle implementation scheme.

The clock rate is as high as in the multi cycle implementation scheme, because it is likewise determined by the slowest functional unit. Register blocks between the functional units store the information needed in the next clock cycle.

The difference between the multi cycle scheme and the pipelined scheme is that an instruction does not wait until the previous instruction completes its write back stage; it is fetched from instruction memory while the previous instruction is being decoded.

The control signals of the single and multi cycle schemes are also valid for the pipelined scheme but, in contrast to the multi cycle scheme, no special control unit (whose flow diagram was given in Figure 3.10) is needed to generate them. Sequencing is inherent in this scheme: the control signals generated in the decode stage travel with the instruction through the pipeline, and each stage consumes its own signals until the last stage is reached.



Figure 3.11: Pipelined Implementation Scheme ©[COD98]

Pipelining does not speed up the functional units of the architecture; instead, it increases the throughput by decreasing the time between instructions. As many instructions as there are pipeline stages are executed simultaneously: for example, while the fifth instruction is being fetched (IF) from memory, the first instruction is in its write back (WB) stage, four clock cycles after its own IF stage (Figure 3.12).

|               | CLK1     | CLK2 | CLK3 | CLK4 | CLK5 |
|---------------|----------|------|------|------|------|
| Instruction 1 | IF       | ID   | EX   | MEM  | WB   |
| Instruction 2 |          | IF   | ID   | EX   | MEM  |
| Instruction 3 |          |      | IF   | ID   | EX   |
| Instruction 4 |          |      |      | IF   | ID   |
| Instruction 5 |          |      |      |      | IF   |

Figure 3.12: Simultaneously Executing Instructions in Pipeline
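The staggered pattern of Figure 3.12 can be generated programmatically. This Python sketch assumes an ideal five stage pipeline with no hazards, so instruction i enters IF one clock cycle after instruction i - 1; it also shows that n instructions need (5 - 1) + n clock cycles, the first four cycles being the time needed to fill the pipeline.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_table(n_instructions: int, n_cycles: int) -> list[list[str]]:
    """Entry [i][c]: stage of instruction i + 1 in clock cycle c + 1 ('' if not in the pipeline)."""
    table = []
    for i in range(n_instructions):
        row = []
        for c in range(n_cycles):
            k = c - i  # instruction i enters IF one cycle after instruction i - 1
            row.append(STAGES[k] if 0 <= k < len(STAGES) else "")
        table.append(row)
    return table


def total_cycles(n_instructions: int, n_stages: int = len(STAGES)) -> int:
    """Ideal pipeline: (n_stages - 1) cycles to fill, then one instruction completes per cycle."""
    return n_stages - 1 + n_instructions


for i, row in enumerate(pipeline_table(5, 5), start=1):
    print(f"Instruction {i}: " + " ".join(f"{s:<3}" for s in row))
```

As n grows, the fill time becomes negligible and the average approaches one instruction per clock cycle.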

#### 3.3.4. Quantitative Comparison of Implementation Schemes

The primary metric for comparing the performance of architectures is the execution time of a program, as stated in section 3.1. The pipelined implementation scheme combines the best features of the other schemes: a low number of clock cycles per instruction, as in the single cycle scheme (ideally equal to 1 when the pipeline hazards described in section 3.4 are disregarded), and a high clock rate, as in the multi cycle scheme; therefore it is expected to give the best performance. A realistic example demonstrates the relative performances. The frequencies of MIPS instructions in the gcc program (Figure 3.7) and their numbers of clock cycles (Figure 3.10) are summarized in Table 3.1.

CPI is calculated from this table as the frequency-weighted sum of the clock cycles of each instruction type in the gcc program:

 $CPI = 5 \times 0.23 + 4 \times 0.13 + 3 \times 0.19 + 3 \times 0.02 + 4 \times 0.43$ = 4.02

| Instruction Type | Frequency | Number of Clock Cycles |
|------------------|-----------|------------------------|
| LOAD             | 23%       | 5               |
| STORE            | 13%       | 4               |
| BRANCH           | 19%       | 3               |
| JUMP             | 2%        | 3               |
| ALU              | 43%       | 4               |

Table 3.1: Calculation of CPI for Multi Cycle Implementation Scheme
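The same weighted sum can be computed in Python. The frequencies and cycle counts are taken directly from Table 3.1, and the sketch checks that the frequencies add up to 100% before weighting them.

```python
# Instruction mix of gcc and multi cycle clock counts, from Table 3.1
MIX: dict[str, tuple[float, int]] = {
    "LOAD": (0.23, 5),
    "STORE": (0.13, 4),
    "BRANCH": (0.19, 3),
    "JUMP": (0.02, 3),
    "ALU": (0.43, 4),
}


def weighted_cpi(mix: dict[str, tuple[float, int]]) -> float:
    """CPI = sum over instruction types of (frequency x clock cycles)."""
    total_freq = sum(freq for freq, _ in mix.values())
    if abs(total_freq - 1.0) > 1e-9:
        raise ValueError(f"frequencies sum to {total_freq}, expected 1.0")
    return sum(freq * n_cycles for freq, n_cycles in mix.values())


print(f"CPI = {weighted_cpi(MIX):.2f}")
```

This prints `CPI = 4.02`, matching the hand calculation above.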

The clock rate (equivalently, the clock cycle period) is determined by the slowest stage in the pipeline. To obtain the time per instruction in seconds, the clock period is multiplied by the CPI (equation given in section 3.1). Optimal speedup from pipelining is obtained with balanced stages. Assume that the stages are balanced and that each takes T seconds.



Figure 3.13: Single and Multi Cycle Instruction Sequence

In the single cycle implementation scheme, every instruction passes through all five stages in one clock cycle, so the clock period is 5T seconds. In the multi cycle implementation scheme, the clock period is T seconds, as in the pipelined implementation scheme. This yields the instruction times given in Table 3.2: the pipelined implementation is 5 times faster than the single cycle implementation and about 4 times faster than the multi cycle implementation.

| Implementation Scheme | Seconds/Instruction (CPI x sec/clock) |
|-----------------------|---------------------------------------|
| Single Cycle   | 1 x 5T = 5T                           |
| Multi Cycle    | 4.02 x T = 4.02T                      |
| Pipelined      | 1 x T = T                             |

**Table 3.2: Instruction Time Calculation for Implementation Schemes** 
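Table 3.2 and the resulting speedups can be reproduced as below. The stage delay T is normalized to 1, which only changes the units, and the pipelined CPI of 1 is the ideal value that ignores the hazards discussed in section 3.4.

```python
T = 1.0  # delay of one balanced stage (normalized)

# (CPI, clock period) for each scheme, following Table 3.2
SCHEMES: dict[str, tuple[float, float]] = {
    "Single Cycle": (1.0, 5 * T),
    "Multi Cycle": (4.02, T),
    "Pipelined": (1.0, T),
}


def time_per_instruction(cpi: float, clock_period: float) -> float:
    """Seconds/Instruction = CPI x seconds/clock."""
    return cpi * clock_period


pipelined = time_per_instruction(*SCHEMES["Pipelined"])
for name, (cpi, period) in SCHEMES.items():
    t = time_per_instruction(cpi, period)
    print(f"{name:<12} {t:.2f}T  pipelining speedup: {t / pipelined:.2f}x")
```

Any stall cycles raise the pipelined CPI above 1 and reduce these speedups accordingly.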

### 3.4. Problems and Solutions in Pipelined Architectures

As stated in section 3.3.4, optimal performance and speedup are obtained from pipelining by balancing the stages and using the pipeline at full speed without stalls. In reality this is not always possible, and even a perfect balance between pipeline stages is not sufficient on its own.

The following restrictions may exist:

- Dependencies between instructions,
- Hardware restrictions in supporting pipelining,
- Branches cannot be resolved until the Execute (EX) stage, so the following instructions may be fetched uselessly.

The following sections explain in detail how these cases are handled.

#### 3.4.1. Structural Hazards

Structural hazards arise because the underlying hardware does not support certain combinations of instructions that are present in the pipeline simultaneously. For example, instructions 1 and 4 in Figure 3.12 both access memory in clock cycle CLK4 (in the MEM and IF stages, respectively). If the instruction memory and the data memory are not physically separate, the architecture cannot support this combination. In clock cycle CLK5, both the Instruction Decode (ID) and Write Back (WB) stages access the register bank, but in this case the hardware clash is avoided by the forwarding mechanism described in section 3.4.3.

#### 3.4.2. Branch Hazards

Branch hazards arise because, according to Figure 3.11, the three instructions following a branch instruction are already in the pipeline by the time the branch condition is evaluated or the unconditional jump address is determined. If the branch is taken, these fetched instructions must be discarded, and the goal of running the pipeline at full speed, one instruction per clock cycle, cannot be achieved. Assuming that branches are never taken, three clock cycles are effectively wasted for every taken branch.

In this thesis, the branch decision and target address calculation are moved to the ID stage, reducing the wasted time to one clock cycle. The delayed branch mechanism is followed: the instruction immediately after the branch is always fetched, as though branches were never taken. For a taken branch, this slot would otherwise be discarded and wasted. When the decision is left to a compiler, as in high level programming, the compiler usually fills this slot with a useful instruction that is independent of the branch condition. If no such instruction can be found, the slot is filled with the well known No Operation (NOP) instruction, which does not change the internal state of the microprocessor. Because programming in this thesis is done in assembly without compiler support, a NOP instruction is added manually after every branch. No special hardware that detects this hazard and flushes the fetched instruction is implemented in this thesis.

One delay slot can easily be filled with a NOP or a useful instruction, but as the pipeline gets deeper, filling the slots with useful instructions becomes harder. Other mechanisms have been proposed in the literature to solve this problem. One of them is dynamic branch prediction with additional hardware, which predicts the outcome of a branch from statistics collected in the past for that branch point; the prediction therefore changes over time as conditions change.
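The thesis does not implement dynamic prediction, but a common scheme from the literature, the 2-bit saturating counter, illustrates the idea. The Python sketch below is a hypothetical model: the table size, the indexing by PC bits and the "weakly not taken" initial state are arbitrary assumptions. A branch must be mispredicted twice in a row before the prediction flips, so once the counter is trained a loop branch costs only one misprediction per loop exit.

```python
class TwoBitPredictor:
    """2-bit saturating counter per table entry: 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self, entries: int = 64) -> None:
        self.entries = entries
        self.counters = [1] * entries  # start in the 'weakly not taken' state

    def _index(self, pc: int) -> int:
        return (pc >> 2) % self.entries  # word-aligned PC; low bits select the entry

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor: TwoBitPredictor, pc: int, outcomes: list[bool]) -> int:
    """Predict, compare with the actual outcome, then train the counter."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses
```

For a loop branch taken nine times and then not taken, the first pass costs two mispredictions (training and loop exit) and each later pass costs only one.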

#### 3.4.3. Data Hazards

Data hazards arise when an instruction depends on a previous instruction that is still in the pipeline and has not finished its work, for example has not yet written its result back to the destination register. Unlike the branch hazard described in 3.4.2, the solution is not left entirely to the compiler; it is handled in hardware wherever possible. The hazard appears when the destination register of a previous instruction in the EX, MEM or WB stage is the same as one of the source registers of the current instruction in the ID stage. In Figure 3.14, the data hazard is resolved by forwarding data from the EX, MEM and WB stages of the first instruction to the ID stages of the following instructions that need R1, without waiting for the first instruction to write its destination register R1 in the WB stage.



Figure 3.14: Data Hazard Solution by Forwarding

The data hazard must be resolved in the ID stage, before the register bank access and the branch decision. If the hazard cannot be resolved immediately, a NOP instruction is inserted into the instruction sequence, which gains time for the forwarding mechanism to resolve it in the following clock cycles. In Figure 3.15, forwarding alone cannot solve the hazard, because the result for destination register R2 is not available until the memory access. Therefore, the pipeline is stalled for one clock cycle, and the data hazard is resolved in the next clock cycle by forwarding data from the Data Memory (MEM) stage of the previous instruction to the ID stage of the current instruction.

|                       | CLK1     | CLK2  | CLK3 | CLK4        | CLK5 | CLK6 |
|-----------------------|----------|-------|------|-------------|------|------|
| lw <b>R2</b> ,100(R1) | IF       | ID    | EX   | MEM<br>●    | WB   |      |
| and R4, <b>R2</b> ,R5 |          | STALL | IF   | <b>▼</b> ID | EX   | MEM  |
| or R8, <b>R2</b> ,R6  |          |       |      | IF          | ID   | EX   |

Figure 3.15: Data Hazard Solution by Stalling and Forwarding
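The stall condition of Figure 3.15 can be expressed as a small predicate. In this hypothetical Python sketch, the ID stage compares its source registers with the destination of a load that is currently in the EX stage; since the loaded value only exists once the load has passed through MEM, one stall cycle is needed before it can be forwarded to ID. Treating Rt as a source for every instruction is a conservative simplification.

```python
from dataclasses import dataclass


@dataclass
class ExStage:
    """Contents of the ID/EX register relevant to hazard detection (hypothetical field names)."""
    mem_read: bool  # True if the instruction in EX is a load (lw)
    dest: int       # destination register number; 0 means no register is written


def must_stall(ex: ExStage, id_rs: int, id_rt: int) -> bool:
    """Stall the ID stage for one cycle if it needs the result of a load still in EX.

    Rt is treated as a source for every instruction, which is conservative for
    I-format instructions where Rt is actually the destination.
    """
    return ex.mem_read and ex.dest != 0 and ex.dest in (id_rs, id_rt)
```

For the sequence in Figure 3.15, `must_stall(ExStage(True, 2), 2, 5)` is true while `lw R2` is in EX and `and R4, R2, R5` is in ID; one cycle later the load is in MEM and the value can be forwarded.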

Some extra precautions must be taken when using the forwarding mechanism. In Figure 3.16, the third instruction needs R1 in clock cycle CLK4. The result of the second addition, produced in CLK4 by the EX stage, is forwarded instead of the result of the first addition, produced in CLK3 and now in the MEM stage, because it is more recent.

|                               | CLK1 | CLK2 | CLK3 | CLK4   | CLK5 |
|-------------------------------|------|------|------|--------|------|
| add <b>R1</b> ,R1,R2          | IF   | ID   | ● EX | MEM    | WB   |
| add <b>R1</b> , <b>R1</b> ,R3 |      | IF   | ▼ ID | ● EX   | MEM  |
| add <b>R1,R1</b> ,R4          |      |      | IF   | ♥ ID ♥ | EX   |

Figure 3.16: Forwarding of the Most Recent Data
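The priority rule can be sketched as a selection function for one source operand of the instruction in ID. This hypothetical Python model checks EX first, then MEM, then WB, so the youngest producer wins; `None` stands for a stage whose instruction does not write a register, and register 0 (hard-wired to zero in MIPS) is never forwarded.

```python
from typing import Optional


def forward_select(src: int, ex_dest: Optional[int], mem_dest: Optional[int],
                   wb_dest: Optional[int]) -> str:
    """Return the source of an ID-stage operand: 'EX', 'MEM', 'WB' or 'REG' (register bank)."""
    if src == 0:
        return "REG"  # register 0 is hard-wired to zero, so it is never forwarded
    # Check the youngest producer first so that the most recent value wins (Figure 3.16)
    for stage, dest in (("EX", ex_dest), ("MEM", mem_dest), ("WB", wb_dest)):
        if dest is not None and dest == src:
            return stage
    return "REG"  # no pending write to src: read the register bank
```

In Figure 3.16 at CLK4, both EX and MEM hold a write to R1, and `forward_select(1, 1, 1, None)` picks EX.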

#### 3.4.4. Exception Hazard

The hardware shall prevent the completion of the instructions that follow the instruction causing the exception, while letting all prior instructions complete. The internal register blocks shall be flushed so that these instructions cannot affect the Register Bank or the Data Memory. The Program Counter shall be loaded with a special address, as in the branch or jump case; this address is generally called the interrupt or exception vector.
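The flush rule can be illustrated with a small Python model of the five pipeline stages. It is a sketch under stated assumptions: the exception vector address below is hypothetical, and, as for arithmetic overflow in MIPS, the faulting instruction itself is also turned into a NOP so that it cannot write its result.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")
EXCEPTION_VECTOR = 0x00000080  # hypothetical vector address, for illustration only


def handle_exception(pipeline: dict[str, str], faulting_stage: str) -> tuple[dict[str, str], int]:
    """Flush the faulting instruction and every younger one; let older ones complete.

    Stages to the left of the faulting stage hold younger instructions and stages
    to the right hold older ones. Returns the new pipeline contents and the next PC.
    """
    k = STAGES.index(faulting_stage)
    flushed = {stage: ("nop" if i <= k else pipeline[stage]) for i, stage in enumerate(STAGES)}
    return flushed, EXCEPTION_VECTOR
```

If the instruction in EX raises an exception, the instructions in IF and ID are discarded together with it, while those in MEM and WB still complete before the handler at the vector address starts.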

# **CHAPTER 4**

# IMPLEMENTATION OF MIPS PIPELINED ARCHITECTURE

This chapter describes the internal structure of the processor and the auxiliary structures used to monitor and manipulate its internal registers. The internal structure of the processor is built by combining the following primary units and their subunits (Figure 4.1: Internal Structure of the Pipelined Processor):

- Instruction Fetch Unit (IF\_Unit)
  - Instruction Memory (256x32bit block memory)
- Instruction Decode Unit (ID\_Unit)
  - Register Bank (dual port 32x32bit block memory)
- Forwarding and Hazard detection Unit (FORWD\_HZRD Unit)
- Control Unit (CONTROL\_Unit)
- Execute Unit (EXECUTE\_Unit)
- Data Memory Unit (256x32bit block memory)
- Exception Detection Unit (EXCEPTION\_DTCT\_UNIT)
- Four register blocks responsible for storing information between clock cycles and located between Units;
  - Instruction Fetch Instruction Decode (IF\_ID Unit)
  - Instruction Decode Execute (ID\_EX Unit)
  - Execute Data Memory (EX\_MEM Unit)
  - Data Memory Write Back (MEM\_WB Unit)

Auxiliary structures of the processor are constituted by combining the following units. Units and their interconnections are presented in Figure 4.2.

- Clock Delay Locked Loop to eliminate the skew between clock input pad and the internal clock input pins (CLKDLL Unit)
- Interface between the processor and the PCI Bridge (pci\_9030 Unit)
- External reset of the processor (reg\_wr Unit)
- External programming of the Instruction Memory (reg\_prg Unit)
- External single step execution of processor (wait\_sm Unit)
- External reading of internal state of register blocks (reg Units)
- Processor itself (top\_level Unit)



Figure 4.1: Internal Structure of the Pipelined Processor



Figure 4.2: External Structure of the Pipelined Processor

#### 4.1. Internal Structure of the Processor

In this section, the primary building blocks are described in detail by stating their functions and input/output signals (in the figures, inputs are placed on the left and outputs on the right). General signals that are common to the majority of building blocks are described here; the remaining signals are described in the related building block sections. Every signal is described only once: an input signal of a block is always the output signal of exactly one source block, so the input signal section of each block refers to the output signal section of the source block, where the signal is described in detail. In the signal descriptions, "set" means logic level 1 and "reset" means logic level 0.

<u>CLK (1 bit) and RESET (1 bit):</u> Internal clock (20 MHz) and internal reset signals. Both signals are active high.

<u>Register Dest (5 bit)</u>: This signal is carried through all pipeline stages for instructions that write to the Register Bank in the WB stage.

#### 4.1.1. Instruction Fetch Unit

The Instruction Fetch Unit is designed using the HDL design entry method. It includes the Instruction Memory subunit (256x32-bit block memory), from which instructions are fetched in every clock cycle except when an unresolved (load/store) hazard in the pipeline causes a pipeline stall. The hardware flow diagram of this building block is given in APPENDIX C, Figure C.1: Instruction Fetch Unit Flow Diagram.

## 4.1.1.1. Input/Output Signals of Instruction Fetch Unit

The connections of the Instruction Fetch Unit with other units can be seen in Figure 4.1: Internal Structure of the Pipelined Processor. All input/output signals can be seen in Figure 4.3: Input/Output Signals of Instruction Fetch Unit.

The output signals are as follows:

<u>Current PC (8 bit):</u> This signal goes to the auxiliary structures, which monitor the present state of the Program Counter.

<u>Incremented PC (32 bit)</u>: This signal goes to the Instruction Decode Unit and is carried forward until the WB stage for the jal instruction, because jal writes the return address into Register Bank address 31 for later use when returning from the subroutine (with the jr instruction). This signal is also used in the instruction decode stage to calculate the branch and jump addresses.

<u>Instruction (32 bit)</u>: The instruction fetched from the instruction memory goes to the Instruction Decode and Control Units. The Instruction Decode Unit parses the instruction into fields according to Figure 3.1, and the Control Unit generates the control signals. These signals are passed to the internal register blocks so that the parsed fields can be evaluated in the clock cycles following the decode stage.

<u>Wait Stages (1 bit)</u>: This signal is OR'ed with the pci\_wait signal and goes to all internal registers between the building blocks. When this signal is set, a memory access is taking place (instruction memory, data memory and Register Bank accesses each require one clock cycle), and all processor stages are stopped for as long as the signal is set, which is one clock cycle. The Program Counter is not updated while this signal is set either.



Figure 4.3: Input/Output Signals of Instruction Fetch Unit

The input signals are as follows:

Exception (1 bit): Exception Detection Unit output signal.

Exception Address (32 bit): Exception Detection Unit output signal.

Branch Addr (32 bit): Instruction Decode Unit output signal.

Equal (1 bit): Instruction Decode Unit output signal.

IF Control (3 bit): Control Unit output signal.

<u>Program Data (31 bit) and Program WE (1 bit)</u>: These signals are supplied by external sources and are used only in external programming mode; they are unused in the normal operating mode of the processor.

<u>Pci wait (1 bit)</u>: This signal comes from an external source and is used as the single step execution trigger. The Program Counter is updated on a rising clock edge if and only if this signal is not set.

<u>Unresolved (1 bit):</u> Forwarding and Hazard detection Unit output signal.

## 4.1.1.2. Function of Instruction Fetch Unit

The primary function of the Instruction Fetch Unit is to fetch instructions from the Instruction Memory and send them to the Control and Decode Units for processing. If the Wait Stages, Pci wait or Unresolved signal is set, the current program counter retains its value, hence the same instruction is fetched from memory on the next clock cycle. If the IF Control signal indicates that a branch or jump instruction is in the decode stage, the next program counter is determined by evaluating the Equal and Branch Addr signals. During an instruction memory access, the Wait Stages signal is set and the processor is stopped for one clock cycle. On the next clock cycle, Wait\_Stages is reset and the processor is allowed to run; hence, during operation, the Wait Stages signal toggles on every clock cycle. This halves the effective clock speed of the processor from 20 MHz to 10 MHz. If the RESET signal is set, the Program Counter is set to byte address 16, which follows the overflow exception vector. In case of an exception, the PC is set to the proper exception vector. If the Program WE signal is set, the Instruction Memory enters external programming mode, and on every clock cycle the Program Data signal is written sequentially to the Instruction Memory.
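The next-PC selection described above can be sketched as a priority multiplexer. This is a minimal behavioural model under stated assumptions: `next_pc` and its keyword arguments are hypothetical names, the PC is modelled as an unbounded byte address that advances by 4, and the priority order (reset, then exception, then hold, then branch/jump, then increment) is an assumption, since the text does not rank these conditions. Programming mode is not modelled.

```python
RESET_PC = 16  # byte address just after the overflow exception vector (from the text)


def next_pc(pc: int, *, reset: bool = False, exception: bool = False,
            exception_addr: int = 0, wait_stages: bool = False,
            pci_wait: bool = False, unresolved: bool = False,
            if_control: int = 0b000, equal: bool = False,
            branch_addr: int = 0) -> int:
    """Next Program Counter value (byte address) on a rising clock edge."""
    if reset:
        return RESET_PC
    if exception:
        return exception_addr            # exception vector
    if wait_stages or pci_wait or unresolved:
        return pc                        # hold: the same instruction is fetched again
    beq = bool(if_control & 0b100)       # MSB of IF_Control
    bne = bool(if_control & 0b010)
    jump = bool(if_control & 0b001)      # LSB: j, jal or jr
    if jump or (beq and equal) or (bne and not equal):
        return branch_addr
    return pc + 4                        # Incremented PC
```

Driving `wait_stages` alternately set and reset over eight clock edges advances the PC only four times (from 16 to 32), which is the 20 MHz to 10 MHz halving described above.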

#### 4.1.2. Instruction Decode Unit

The Instruction Decode Unit is designed using the HDL design entry method. It includes the Register Bank subunit (dual port 32x32-bit block memory), from which the operands of the operations are fetched and into which the operation results or the data loaded from data memory are stored in every clock cycle. The hardware flow diagram of this building block is given in APPENDIX C, Figure C.2: Instruction Decode Unit Flow Diagram.

## 4.1.2.1. Input/Output Signals of Instruction Decode Unit

The connections of the Instruction Decode Unit with other units can be seen in Figure 4.1: Internal Structure of the Pipelined Processor. All input/output signals can be seen in Figure 4.4: Input/Output Signals of Instruction Decode Unit.

The output signals are as follows:

<u>ALU PORTA (32 bit)</u>: This signal goes to ALU port A and is evaluated according to the instruction present in the EX stage. Its value can come from other stages by forwarding, or it represents the shift amount for the sll and srl instructions.

<u>ALU PORTB (32 bit)</u>: This signal goes to ALU port B and is evaluated according to the instruction present in the EX stage. Its value can come from other stages by forwarding, or, depending on the control signal, it represents the Incremented Program Counter for the jal instruction or the zero- or sign-extended immediate field. For the memory store operation sw, this signal represents the data to be stored in data memory and is forwarded directly to the MEM stage.

<u>Avlb Stage (2 bit)</u>: This signal goes to the Forwarding and Hazard detection Unit and is used to determine whether an unresolved data hazard, which causes a pipeline stall, is present. If the result of the instruction in the EX stage becomes available only in the MEM stage (the Avlb\_Stage of an lw instruction is equal to MEM) and the destination of that instruction is the same as one of the source operands of the instruction in the ID stage, the pipeline is stalled for one clock cycle and the data hazard is then resolved by the forwarding mechanism.

<u>Branch Addr (32 bit):</u> This signal goes to the Instruction Fetch Unit and is used to determine the value of the next program counter if a conditional or unconditional branch instruction is present in the instruction decode stage.

<u>Imm Sign Extended (32 bit):</u> This signal goes to the Execute Unit and is used to calculate the destination memory address for the sw instruction. The base address is carried to the Execute Unit via Port A, as for the lw instruction, but the offset cannot be carried via Port B, because for sw Port B carries the data to be stored in data memory. Hence this separate signal is needed.
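The stall condition that Avlb Stage feeds can be sketched as a single comparison. This is an illustrative model only: the thesis does not give the 2-bit encodings of Avlb\_Stage, so `AVLB_EX` and `AVLB_MEM` below are hypothetical values, and `unresolved` is a hypothetical function name. The inputs correspond to the ID\_EX fields (ID\_AVLB, ID\_RegDst) and the source addresses Rs and Rt of the instruction in the ID stage.

```python
# Hypothetical encodings of the 2-bit Avlb_Stage field (not given in the thesis).
AVLB_EX = 0b01
AVLB_MEM = 0b10


def unresolved(id_avlb: int, id_regdst: int, rs: int, rt: int) -> bool:
    """Load-use check: the instruction now in EX (fields latched in ID_EX) produces
    its result only after MEM, and the instruction now in ID reads that register."""
    result_only_after_mem = id_avlb == AVLB_MEM   # e.g. lw
    return result_only_after_mem and id_regdst in (rs, rt)
```

Comparing Rt unconditionally is conservative: for an instruction whose Rt field is a destination rather than a source, this sketch would stall one cycle that is not strictly necessary, which costs time but never produces a wrong result.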

<u>Register Dest (5 bit)</u>: General signal which represents the destination register which will be used in WB stage.

rs (5 bit), rt (5 bit), Unresolved A (32 bit) and Unresolved B (32 bit): These signals go to the Forwarding and Hazard detection Unit. Rs and Rt are the source addresses of the operand registers and are compared with the destination register addresses of the instructions in the EX, MEM and WB stages. The Forwarding Unit determines whether a data hazard is present. If no hazard is detected, Unresolved\_A and Unresolved\_B, which hold the values at Register Bank addresses Rs and Rt, are forwarded to the ALU ports.

<u>EN RD (1 bit) and EN WR (1 bit):</u> These signals go to the auxiliary structures to monitor the present state of the read and write enable pins of the Register Bank. They were used during development and are currently not used.

<u>Equal (1 bit)</u>: This signal goes to the Instruction Fetch Unit. If it is set, the operands compared by the conditional branch instruction are equal; if it is not set, they are unequal.



Figure 4.4: Input/Output Signals of Instruction Decode Unit

The input signals are as follows:

<u>DataA (32 bit) and DataB (32 bit):</u> Forwarding and Hazard detection Unit output signals. (ResvDataA and ResvDataB)

ID Control (11 bit): Control Unit output signal.

Incremented PC (32 bit): Instruction Fetch Unit output signal.

Instruction (32 bit): Instruction Fetch Unit output signal.

<u>Write Data (32 bit), Write Register (5 bit) and Reg Write (1 bit):</u> These are WB stage signals. Write\_Register is the Register Bank address to which Write\_Data is written, provided that Reg\_Write is set and Write\_Register (the destination address) is not 0; register address 0 is the \$zero register, and writing to it is not allowed.

<u>Wait MEM (1 bit)</u>: This signal is generated by OR'ing the Wait\_Stages output signal of the Instruction Fetch Unit with the external single step execution trigger signal Pci\_wait. If this signal is set, the EN\_WR signal is set, and if it is reset, the EN\_RD signal is set; hence the Register Bank is written first and read afterwards.
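The write-then-read behaviour of the Register Bank, together with the \$zero guard, can be sketched as follows. This is a behavioural model, not the dual port block memory netlist; `RegisterBank` and `cycle` are hypothetical names, and returning the two read ports as a tuple is a modelling choice.

```python
from typing import Optional, Tuple


class RegisterBank:
    """Behavioural sketch of the 32x32-bit Register Bank."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def cycle(self, wait_mem: bool, reg_write: bool = False,
              write_register: int = 0, write_data: int = 0,
              rs: int = 0, rt: int = 0) -> Optional[Tuple[int, int]]:
        """One clock edge: Wait_MEM set -> write (EN_WR), reset -> read (EN_RD)."""
        if wait_mem:
            # Writes to address 0 ($zero) are ignored, so $zero always reads 0.
            if reg_write and write_register != 0:
                self.regs[write_register] = write_data & 0xFFFFFFFF
            return None
        return self.regs[rs], self.regs[rt]
```

Because the write happens in the cycle before the read, a value written in the WB stage is already visible to the ID-stage read that follows it.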

## 4.1.2.2. Function of Instruction Decode Unit

The functions of the Instruction Decode Unit are:

- Preparing the Register Bank addresses and register contents from which the final resolved operand values of the instruction in the ID stage are determined for the following stages,
- Accessing the Register Bank for writing and reading,
- Evaluating conditional branches, determining the final branch and jump addresses, and feeding them to the Instruction Fetch Unit.
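The branch evaluation and address calculation in the last item can be sketched with the standard MIPS R2000 rules. This sketch assumes the implementation follows them (the 18-bit signed, +/-128 KB branch offset in Table 4.4 is consistent with this); the function names are hypothetical. For jr, the target is simply the register value, and for jal the Incremented PC is the return address written to register 31.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit immediate field to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def branch_taken(is_beq: bool, data_a: int, data_b: int) -> bool:
    """Equal is computed in ID; beq branches on equality, bne on inequality."""
    equal = data_a == data_b
    return equal if is_beq else not equal


def branch_target(incremented_pc: int, imm16: int) -> int:
    """beq/bne: PC + 4 plus the word offset (16 bits << 2 = 18-bit byte offset)."""
    return (incremented_pc + (sign_extend16(imm16) << 2)) & MASK32


def jump_target(incremented_pc: int, target26: int) -> int:
    """j/jal: upper 4 bits of PC + 4 joined with the 26-bit target << 2."""
    return (incremented_pc & 0xF0000000) | ((target26 & 0x03FFFFFF) << 2)
```

For example, a beq at byte address 0x100 (Incremented PC 0x104) with offset 0xFFFF (-1 word) branches back to itself at 0x100.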

## 4.1.3. Forwarding and Hazard Detection Unit

The Forwarding and Hazard Detection Unit is designed using the HDL design entry method. The hardware flow diagram of this building block is given in APPENDIX C, Figure C.3: Forwarding and Hazard Detection Unit Flow Diagram.

## 4.1.3.1. Input/Output Signals of Forwarding and Hazard Detection Unit

The connections of the Forwarding and Hazard Detection Unit with other units can be seen in Figure 4.1: Internal Structure of the Pipelined Processor. All input/output signals can be seen in Figure 4.5: Input/Output Signals of Forwarding and Hazard Detection Unit.

The output signals are as follows:

<u>ResvDataA (32 bit) and ResvDataB (32 bit)</u>: These signals go to the DataA and DataB inputs of the Instruction Decode Unit and are then forwarded to the ALU ports according to the control signals. Their final values are selected from the input signals as given in the VHDL code below:

#### Table 4.1: Forwarding Mechanism for Register Bank Primary Port

```vhdl
-- Most recent producer first: ID_EX (instruction in EX), then EX_MEM, then MEM_WB.
-- Register 0 ($zero) is never forwarded.
ResvDataA <= ID_Value when ((ID_RegWrite = '1') and (ID_RegDst = Rs) and (ID_RegDst /= "00000")) else
             EX_Value when ((EX_RegWrite = '1') and (EX_RegDst = Rs) and (EX_RegDst /= "00000")) else
             WB_Value when ((WB_RegWrite = '1') and (WB_RegDst = Rs) and (WB_RegDst /= "00000")) else
             Unresolved_A;
```

#### Table 4.2: Forwarding Mechanism for Register Bank Secondary Port

```vhdl
-- Same priority as the primary port, compared against Rt.
ResvDataB <= ID_Value when ((ID_RegWrite = '1') and (ID_RegDst = Rt) and (ID_RegDst /= "00000")) else
             EX_Value when ((EX_RegWrite = '1') and (EX_RegDst = Rt) and (EX_RegDst /= "00000")) else
             WB_Value when ((WB_RegWrite = '1') and (WB_RegDst = Rt) and (WB_RegDst /= "00000")) else
             Unresolved_B;
```
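The priority chain in Tables 4.1 and 4.2 can be mirrored in a few lines of Python to check the "most recent data wins" rule of Figure 3.16. This is a behavioural sketch; `resolve` and the `(RegWrite, RegDst, Value)` tuple layout are illustrative choices, and one function serves both ports because they differ only in comparing against Rs or Rt.

```python
from typing import Tuple

Stage = Tuple[bool, int, int]  # (RegWrite, RegDst, Value) latched in a pipeline register


def resolve(src: int, unresolved_value: int,
            id_ex: Stage, ex_mem: Stage, mem_wb: Stage) -> int:
    """Return ResvData for source register src (Rs for port A, Rt for port B)."""
    for reg_write, reg_dst, value in (id_ex, ex_mem, mem_wb):  # most recent first
        if reg_write and reg_dst == src and reg_dst != 0:
            return value
    return unresolved_value  # no hazard: use the value read from the Register Bank
```

With three consecutive `add R1, ...` instructions, as in Figure 3.16, the value from ID\_EX is selected over the older ones in EX\_MEM and MEM\_WB.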

<u>Unresolved (1 bit)</u>: This signal goes to the Instruction Fetch Unit; as with the pci\_wait signal, the Program Counter is updated on a rising clock edge if and only if this signal is not set. When this signal is set, an unresolved (load/store) hazard exists in the pipeline and the pipeline is stalled. Neither the Program Counter nor the IF\_ID register block is updated during the stall, so that the instruction already fetched and decoded is not lost. A NOP instruction is inserted into the ID\_EX register block while this signal is set.
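The effect of Unresolved on the pipeline registers can be sketched as one clock edge. This is a simplified model under stated assumptions: each register is represented only by the name of the instruction it holds, branches and Wait\_Stages are ignored, and `clock_edge` is a hypothetical name.

```python
NOP = "nop"


def clock_edge(pc: int, regs: dict, fetched: str, unresolved: bool):
    """One rising edge. regs maps 'IF_ID', 'ID_EX', 'EX_MEM', 'MEM_WB' to the
    instruction each register holds; returns (new_pc, new_regs)."""
    new = {"MEM_WB": regs["EX_MEM"], "EX_MEM": regs["ID_EX"]}  # later stages advance
    if unresolved:
        new["ID_EX"] = NOP            # bubble enters EX on the next cycle
        new["IF_ID"] = regs["IF_ID"]  # keep the waiting instruction in ID
        return pc, new                # PC holds as well
    new["ID_EX"] = regs["IF_ID"]
    new["IF_ID"] = fetched
    return pc + 4, new
```

After the stall cycle, the held instruction moves to EX one cycle behind the load, so the loaded value can be forwarded to it.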



Figure 4.5: Input/Output Signals of Forwarding and Hazard Detection Unit

The input signals are as follows:

ID AVLB (2 bit), ID RegDst (5 bit), ID Value (32 bit), ID RegWrite (1 bit): These signals come from the ID\_EX register block, which is located between the ID and EX stages. They are written by the instruction currently in the EX stage and are used to determine ResvDataA and ResvDataB. ID\_AVLB and ID\_RegDst are also used to determine the value of Unresolved.

EX AVLB (2 bit), EX RegDst (5 bit), EX Value (32 bit), EX RegWrite (1 bit): These signals come from the EX\_MEM register block, which is located between the EX and MEM stages. They are written by the instruction currently in the MEM stage and are used to determine ResvDataA and ResvDataB. EX\_AVLB is not used.

<u>WB RegDst (5 bit), WB Value (32 bit) and WB RegWrite (1 bit):</u> These signals come from the MEM\_WB register block, which is located between the MEM and WB stages. They are written by the instruction currently in the WB stage and are used to determine ResvDataA and ResvDataB.

<u>Rs (5 bit), Rt (5 bit), Unresolved A (32 bit) and Unresolved B (32 bit):</u> Instruction Decode Unit output signals.

## 4.1.3.2. Function of Forwarding and Hazard Detection Unit

The function of the Forwarding and Hazard Detection Unit is to detect data hazards and resolve them, either by forwarding or by stalling the pipeline.

## 4.1.4. Control Unit

The Control Unit is designed using the HDL design entry method. Its hardware flow diagram is not given in APPENDIX C, because all outputs of this block are inputs to other blocks, and these signals are defined in the flow diagrams of the destination units.

## 4.1.4.1. Input/Output Signals of Control Unit

The connections of the Control Unit with other units can be seen in Figure 4.1: Internal Structure of the Pipelined Processor. All input/output signals can be seen in Figure 4.6: Input/Output Signals of Control Unit. The output signals are as follows:

<u>IF Control (3 bit)</u>: This signal goes to the Instruction Fetch Unit. The first bit (MSB) is set if a beq instruction is present in the decode stage, the second bit is set if a bne instruction is present, and the third bit (LSB) is set if a j, jal or jr instruction is present.
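The generation of IF Control from the instruction word can be sketched using the standard MIPS R2000 encodings (beq opcode 4, bne opcode 5, j opcode 2, jal opcode 3, and jr as an R-type instruction with function field 8). The function name `if_control` is hypothetical; the bit layout follows the description above.

```python
OP_J, OP_JAL, OP_BEQ, OP_BNE = 0x02, 0x03, 0x04, 0x05
FUNCT_JR = 0x08


def if_control(instruction: int) -> int:
    """3-bit IF_Control: MSB = beq, middle = bne, LSB = j/jal/jr."""
    opcode = (instruction >> 26) & 0x3F
    funct = instruction & 0x3F
    if opcode == OP_BEQ:
        return 0b100
    if opcode == OP_BNE:
        return 0b010
    if opcode in (OP_J, OP_JAL) or (opcode == 0 and funct == FUNCT_JR):
        return 0b001
    return 0b000
```

For example, `jr $31` is encoded as 0x03E00008 and produces 0b001, while an ordinary R-type `add` produces 0b000.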

<u>ID Control (11 bit)</u>: This signal goes to the Instruction Decode Unit, and its control word bits are set according to the instruction present in the ID stage. The resulting signals select the operands and the destination register and affect the branch address calculation. The relationship between the ID Control word, the instruction present in ID and the affected outputs is given in Table 4.3.

| Bit | Meaning                          |
|-----|----------------------------------|
| 9   | jal                              |
| 8   | jr                               |
| 7   | Not used                         |
| 6   | lw; otherwise (o/w) not used     |

Operand and destination selection patterns (X = don't care):

- 00XX → ALUA, ALUB are register values, Reg\_Dest → Rd
- 1X0X → ALUA = 0, Reg\_Dest → Rd
- 1X1X → ALUA = Shift Amount, Reg\_Dest → Rd
- For the following instructions, if the word starts with 01, Reg\_Dest → Rt, else Reg\_Dest → Rd:
  - X1X0 → ALUB = Zero Extended Immediate
  - X1X1 → ALUB = Sign Extended Immediate

Table 4.3: ID\_Control Signal Fields
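One reading of the 4-bit operand selection patterns in Table 4.3 that is consistent with every listed pattern is that the first and third positions select the ALU port A source, the second and fourth select the port B source, and a leading "01" selects Rt as destination. The sketch below encodes that reading; it is an interpretation, not the thesis's VHDL, and `decode_operand_select` is a hypothetical name. Patterns are given as strings, leftmost character first.

```python
def decode_operand_select(pattern: str):
    """Return (ALU port A source, ALU port B source, Reg_Dest) for a 4-char pattern."""
    p = [c == "1" for c in pattern]
    if not p[0]:
        alu_a = "register"                 # as in 00XX
    elif not p[2]:
        alu_a = "zero"                     # 1X0X
    else:
        alu_a = "shift amount"             # 1X1X
    if not p[1]:
        alu_b = "register"                 # as in 00XX
    elif not p[3]:
        alu_b = "zero-extended immediate"  # X1X0
    else:
        alu_b = "sign-extended immediate"  # X1X1
    reg_dest = "rt" if pattern.startswith("01") else "rd"
    return alu_a, alu_b, reg_dest
```

Under this reading, "0101" (register plus sign-extended immediate, result to Rt) is what an addi-style instruction would need.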

<u>EX Control (5 bit)</u>: This signal goes to the ID\_EX register block and is consumed in the EX stage. It identifies the ALU operation applied to the inputs at the ALU ports and is also called the ALUOp signal. The numeric and literal ALUOp values are given in Table 4.4.

Table 4.4: EX\_Control ALUOp Signal Values

| Literal ALUOp | Numeric ALUOp | Comment                                                                          |
|---------------|---------------|----------------------------------------------------------------------------------|
| ALU_ADD       | 00000         | rd <= rs + rt, signed, overflow exception generated                              |
| ALU_ADDU      | 00001         | rd <= rs + rt, unsigned, overflow exception NOT generated                        |
| ALU_AND       | 00010         | rd <= rs AND rt                                                                  |
| ALU_EMPTY     | 00011         | ALU_RESULT <= TRUE                                                               |
| ALU_MFHI      | 00100         | ALU internal multiplication register to general purpose register (GPR), rd <= HI |
|               | 00101         | ALU internal multiplication register to GPR, rd <= LO                            |
|               | 00110         | GPR to ALU internal multiplication register, HI <= rs                            |
|               | 00111         | GPR to ALU internal multiplication register, LO <= rs                            |
|               | 01000         | HILO <= rs * rt, signed (not implemented)                                        |
|               | 01001         | HILO <= rs * rt, unsigned                                                        |
|               | 01010         | rd <= rs NOR rt                                                                  |
|               | 01011         | rd <= rs OR rt                                                                   |
|               | 01100         | rd <= (rt << shift amount)                                                       |
|               | 01101         | rd <= (rs < rt), signed                                                          |
|               | 01110         | rd <= (rs < rt), unsigned                                                        |
|               | 01111         | rd <= (rt >> sa)                                                                 |
|               | 10000         | rd <= rs - rt, signed, overflow exception generated                              |
|               | 10001         | rd <= rs - rt, unsigned, overflow exception NOT generated                        |
|               | 10010         | rd <= rs XOR rt                                                                  |
|               | 10011         | ALU_RESULT <= OperandB                                                           |
|               | 10100         | if (op1 == op2) then branch, 18-bit signed offset added to PC, ±128 KBytes       |
|               | 10101         | if (op1 != op2) then branch, 18-bit signed offset added to PC, ±128 KBytes       |
|               | 10110         | rt <= (immediate << 16)                                                          |
|               | 10111         | MEM[\$rs + signed(Immediate)] <= rt                                              |
|               | 11000         | Undefined instruction in decode stage, exception will be generated               |

<u>MEM Control (2 bit)</u>: Signal goes to the ID\_EX register block and is consumed in the MEM stage. When set, the first bit (MSB) indicates that a memory read operation will take place in the MEM stage (e.g. for the lw instruction); when set, the second bit (LSB) indicates that a memory write operation will take place in the MEM stage (e.g. for the sw instruction).

<u>WB Control (1 bit)</u>: Signal goes to the ID\_EX register block and is consumed in the WB stage. This signal is also called RegWrite and indicates that a register write operation will take place in the WB stage.



Figure 4.6: Input/Output Signals of Control Unit

Input signals are as follows:

<u>Instruction (32 bit):</u> Instruction Fetch Unit output signal.

### 4.1.4.2. Function of Control Unit

The function of the Control Unit is to determine the control signal values of the instruction that is in the decode stage. These control signals travel with the instruction through the pipeline and are consumed stage by stage, the last of them in the WB stage.
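To make the decode step concrete, the following Python sketch maps an instruction word to the three control fields described above (EX\_Control, MEM\_Control, WB\_Control). It is a behavioral model, not the HDL of this thesis: it covers only the ALU OP codes of Table 4.4 that appear in this section, uses the standard MIPS R2000 opcode/funct encodings, and treats everything else (including add, lw and sll, and therefore nop, whose ALU codes are listed elsewhere) as undefined. The names `control_unit`, `R_TYPE_FUNCT` and `I_TYPE_OPCODE` are hypothetical.

```python
# ALU OP codes taken from the rows of Table 4.4 shown in this section.
ALU_SLTU, ALU_SRL = 0b01110, 0b01111
ALU_SUB, ALU_SUBU, ALU_XOR = 0b10000, 0b10001, 0b10010
ALU_BEQ, ALU_BNE, ALU_LUI, ALU_SW = 0b10100, 0b10101, 0b10110, 0b10111
ALU_UNDEFINED = 0b11000

# Standard MIPS R2000 encodings: opcode = bits 31..26, funct = bits 5..0.
R_TYPE_FUNCT = {0x22: ALU_SUB, 0x23: ALU_SUBU, 0x26: ALU_XOR,
                0x2B: ALU_SLTU, 0x02: ALU_SRL}
I_TYPE_OPCODE = {0x04: ALU_BEQ, 0x05: ALU_BNE, 0x0F: ALU_LUI, 0x2B: ALU_SW}

# MEM_Control: MSB = memory read (lw), LSB = memory write (sw).
MEM_NONE, MEM_WRITE = 0b00, 0b01


def control_unit(instruction):
    """Return (EX_Control, MEM_Control, WB_Control) for a 32-bit instruction."""
    opcode = (instruction >> 26) & 0x3F
    funct = instruction & 0x3F
    if opcode == 0 and funct in R_TYPE_FUNCT:
        return R_TYPE_FUNCT[funct], MEM_NONE, 1      # result written to rd
    alu_op = I_TYPE_OPCODE.get(opcode)
    if alu_op == ALU_SW:
        return alu_op, MEM_WRITE, 0                  # store: no register write
    if alu_op in (ALU_BEQ, ALU_BNE):
        return alu_op, MEM_NONE, 0                   # branch: nothing written back
    if alu_op == ALU_LUI:
        return alu_op, MEM_NONE, 1                   # lui writes rt
    # Everything outside this partial table is flagged as undefined here.
    return ALU_UNDEFINED, MEM_NONE, 0
```

For example, `sub $s2, $s0, $s1` encodes as `0x02119022` and yields `(0b10000, 0b00, 1)`: the subtract code, no memory access, and a register write in WB.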

### 4.1.5. Execute Unit

The Execute Unit is designed using the HDL design entry method. Its hardware flow diagrams are given in APPENDIX C, Figure C.4: Instruction Execute Unit Flow Diagram and Figure C.5: Instruction Execute Unit (continued) Flow Diagram.

### 4.1.5.1. Input/Output Signals of Execute Unit

The connections of the Execute Unit with other units can be seen in Figure 4.1: Internal Structure of the Pipelined Processor. All input/output signals can be seen in Figure 4.7: Input/Output Signals of Execute Unit. Output signals are as follows:

<u>Result (32 bit)</u>: Signal goes to the Data Memory Unit. If the result is the memory address of a load/store instruction, it is consumed in the MEM stage; if it is a value to be written to a register, it is consumed in the WB stage.

<u>OverFlow (1 bit)</u>: Signal goes to the Exception Detection Unit and indicates that an arithmetic overflow occurred in a signed operation.

<u>Undefined (1 bit)</u>: Signal goes to the Exception Detection Unit and indicates that the instruction in the ID stage in the previous clock cycle was undefined (i.e. not defined in APPENDIX A, Implemented Subset of MIPS R2000 ISA).



Figure 4.7: Input/Output Signals of Execute Unit

Input signals are as follows:

<u>ALU OP (5 bit)</u>: Control Unit output signal (EX\_Control).

<u>ALU Src A (32 bit)</u>: Instruction Decode Unit output signal (ALU\_PORTA).

<u>ALU Src B (32 bit)</u>: Instruction Decode Unit output signal (ALU\_PORTB).

<u>Sign Extend (32 bit)</u>: Instruction Decode Unit output signal (Imm\_Sign\_Extended).

### 4.1.5.2. Function of Execute Unit

The function of the Execute Unit is to perform the arithmetic and logical operations (Table 4.4), to generate the Result, OverFlow and Undefined signals accordingly, and to calculate memory addresses for data memory access operations.
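A minimal behavioral sketch of the Execute Unit for a few of the ALU OP codes in Table 4.4, showing how the OverFlow flag can be derived for the signed sub operation: overflow occurs only when the operands have different signs and the sign of the result differs from that of the minuend. The function name, and returning a zero Result for an undefined instruction, are assumptions of this sketch; the actual HDL flow is given in APPENDIX C.

```python
MASK = 0xFFFF_FFFF  # 32-bit datapath


def execute(alu_op, a, b):
    """Return (Result, OverFlow, Undefined) for a subset of Table 4.4."""
    a &= MASK
    b &= MASK
    if alu_op == 0b10000:                       # sub: signed, overflow checked
        result = (a - b) & MASK
        # Overflow iff a and b differ in sign and the result's sign differs from a.
        overflow = ((a ^ b) & (a ^ result)) >> 31 & 1
        return result, overflow, 0
    if alu_op == 0b10001:                       # subu: never raises overflow
        return (a - b) & MASK, 0, 0
    if alu_op == 0b10010:                       # xor
        return a ^ b, 0, 0
    if alu_op == 0b10011:                       # ALU_RESULT <= OperandB
        return b, 0, 0
    if alu_op == 0b01110:                       # sltu: unsigned comparison
        return int(a < b), 0, 0
    if alu_op == 0b11000:                       # undefined instruction from ID
        return 0, 0, 1                          # Result value is an assumption
    raise ValueError(f"ALU OP {alu_op:05b} is not modelled in this sketch")
```

With the operands of requirement R7 in Chapter 5, `execute(0b10000, 2, 4)` gives `(0xFFFFFFFE, 0, 0)`, whereas `execute(0b10000, 0x80000000, 1)` sets OverFlow because the true difference does not fit in 32-bit two's complement.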

### 4.1.6. Data Memory Unit

The Data Memory Unit is designed using the HDL design entry method. It includes the Data Memory subunit (a 256x32-bit block memory), from which data is read by the lw instruction and to which data is written by the sw instruction, and which can be accessed in every clock cycle. The hardware flow diagram of this building block is given in APPENDIX C, Figure C.6: Data Memory Unit Flow Diagram.

### 4.1.6.1. Input/Output Signals of Data Memory Unit

The connections of the Data Memory Unit with other units can be seen in Figure 4.1: Internal Structure of the Pipelined Processor. All input/output signals can be seen in Figure 4.8: Input/Output Signals of Data Memory Unit.

Output signals are as follows:

<u>Read Data (32 bit)</u>: Signal goes to the WB stage. It carries either the ALU result obtained in the EX stage, when MEM\_Control does not indicate a memory read operation, or the content of the data memory at the location given by the Address signal, when MEM\_Control indicates a memory read operation.



Figure 4.8: Input/Output Signals of Data Memory Unit

Input signals are as follows:

<u>Address (32 bit)</u>: Execute Unit output signal (Result).

<u>MEM Control (2 bit)</u>: Control Unit output signal.

<u>Write Data (32 bit)</u>: Instruction Decode Unit output signal (ALU\_PORTB).

### 4.1.6.2. Function of Data Memory Unit

The function of the Data Memory Unit is to perform data memory accesses, either read or write, according to the control signal MEM\_Control. Data fetched from data memory is forwarded to the WB stage via the Read\_Data signal.
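The read/write selection can be sketched as follows. The sketch assumes that the 32-bit byte address is turned into a word index of the 256-word memory by dropping its two low bits (consistent with the test program in Chapter 5, where byte address 4 holds the second stored word) and wrapping at 256 words; the class and method names are hypothetical.

```python
class DataMemoryUnit:
    """256 x 32-bit data memory driven by the 2-bit MEM_Control field."""

    WORDS = 256

    def __init__(self):
        self.memory = [0] * self.WORDS

    def mem_stage(self, address, mem_control, write_data):
        """Model one MEM-stage cycle and return Read_Data."""
        index = (address >> 2) % self.WORDS      # assumption: byte address -> word
        mem_read = (mem_control >> 1) & 1        # MSB: read (lw)
        mem_write = mem_control & 1              # LSB: write (sw)
        if mem_write:
            self.memory[index] = write_data & 0xFFFF_FFFF
        if mem_read:
            return self.memory[index]            # memory content goes to WB
        return address & 0xFFFF_FFFF             # otherwise the ALU result passes on
```

Writing with `mem_control=0b01` and reading back with `0b10` reproduces requirements R29/R30 and R32/R33 of the test program; with `0b00` the ALU result is simply forwarded.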

### 4.1.7. Exception Detection Unit

The Exception Detection Unit is designed using the HDL design entry method. Its hardware flow diagram is given in APPENDIX C, Figure C.7: Exception Detection Unit Flow Diagram.

### 4.1.7.1. Input/Output Signals of Exception Detection Unit

The connections of the Exception Detection Unit with other units can be seen in Figure 4.1: Internal Structure of the Pipelined Processor. All input/output signals can be seen in Figure 4.9: Input/Output Signals of Exception Detection Unit.

Output signals are as follows:

<u>Exception (1 bit)</u>: Signal goes to the Instruction Fetch Unit and to the flush pins of the internal register blocks IF\_ID, ID\_EX and EX\_MEM, which clear their contents when this signal is set. The internal register block MEM\_WB is not flushed, because the instructions that are in the MEM and WB stages when the exception occurs are older than the instruction that caused it; these instructions are allowed to complete. The Instruction Fetch Unit uses this signal to determine the next program counter, and this signal has precedence over branch instructions.

<u>Exception Address (32 bit)</u>: Signal goes to the Instruction Fetch Unit and is used as the next program counter when the Exception signal is set. Byte address 0 in instruction memory is reserved for the undefined instruction exception, and an infinite loop is located at this position. Byte address 8 is reserved for the overflow exception, and another infinite loop is located there. This four-word region (byte addresses 0 to 15) cannot be programmed by the user and can be regarded as the exception handling routines.



Figure 4.9: Input/Output Signals of Exception Detection Unit

Input signals are as follows:

<u>OverFlow (1 bit):</u> Execute Unit output signal.

<u>Undefined (1 bit):</u> Execute Unit output signal.

### 4.1.7.2. Function of Exception Detection Unit

The function of the Exception Detection Unit is to set the Exception signal when either the OverFlow or the Undefined signal is set in the EX stage. The exception vectors are located at byte address 0 for the undefined instruction exception and at byte address 8 for the arithmetic overflow exception.
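The following Python sketch models the Exception Detection Unit together with the next-PC priority stated above (Exception before a taken branch). The text does not say which vector applies if OverFlow and Undefined were set together, nor what Exception Address holds when no exception is pending; the sketch assumes Undefined first and 0, respectively. `next_pc` and its PC + 4 default path are hypothetical.

```python
UNDEFINED_VECTOR = 0x0   # byte address of the undefined-instruction loop
OVERFLOW_VECTOR = 0x8    # byte address of the overflow loop

# Blocks cleared when Exception is set; MEM_WB is left alone so the two
# older instructions in the MEM and WB stages can complete.
FLUSHED_ON_EXCEPTION = ("IF_ID", "ID_EX", "EX_MEM")


def exception_detection(overflow, undefined):
    """Return (Exception, Exception_Address) from the EX-stage flags."""
    if undefined:                     # assumption: checked before overflow
        return 1, UNDEFINED_VECTOR
    if overflow:
        return 1, OVERFLOW_VECTOR
    return 0, 0                       # address is a don't-care here


def next_pc(pc_plus_4, branch_taken, branch_target, exception, exception_address):
    """Exception wins over a taken branch, which wins over sequential fetch."""
    if exception:
        return exception_address
    if branch_taken:
        return branch_target
    return pc_plus_4
```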

### 4.1.8. Register Blocks between Stages of Processor

Register blocks simply retain information for one clock period; no arithmetic processing is performed on the data they hold. Processing in each stage starts at the current clock edge and must be finished by the next clock edge, when the register blocks are overwritten. These elements are placed between:

- Instruction Fetch and Instruction Decode (IF\_ID Unit)
- Instruction Decode and Execute (ID\_EX Unit)
- Execute and Data Memory (EX\_MEM Unit)
- Data Memory and Write Back (MEM\_WB Unit)

The hardware flow diagram of this building block is given in APPENDIX C, Figure C.8: Register Block Unit Flow Diagram.

In general, the input signals are as follows:

<u>Unresolved (1 bit):</u> Forwarding and Hazard Detection Unit output signal.

<u>Wait Stages (1 bit):</u> Instruction Fetch Unit output signal.

<u>Exception (1 bit):</u> Exception Detection Unit output signal.
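The section lists these control inputs but not their exact effect, so the following is only a plausible model under stated assumptions: Exception flushes the block by clearing it to zero (the all-zero word is the MIPS nop encoding, and cleared MEM/WB control bits mean no memory access and no register write), Unresolved or Wait Stages hold the previous contents, and Exception takes priority over a stall. `PipelineRegister` is a hypothetical name; recall that in this design the MEM\_WB block is not flushed by Exception.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegister:
    """Generic stage register (IF_ID, ID_EX, EX_MEM, MEM_WB) holding one packed word."""

    value: int = 0

    def clock_edge(self, data_in, exception=0, unresolved=0, wait_stages=0):
        """Update on a clock edge and return the new contents."""
        if exception:
            self.value = 0            # flush: becomes a bubble
        elif unresolved or wait_stages:
            pass                      # stall: keep the previous contents
        else:
            self.value = data_in      # normal operation: capture stage outputs
        return self.value
```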

### 4.2. External Structure of the Processor

In this section the auxiliary structures are described in detail by stating their functions and input/output signals (in the figures, inputs are placed on the left and outputs on the right). The auxiliary structures are implemented to reveal the internal state of the processor by monitoring the register blocks placed between the building blocks. In addition, they enable the user to manipulate the processor: the user can reset the processor, execute the program in instruction memory one step at a time, and program the instruction memory of the processor externally.

The host monitor software (MIPS Monitor software, described in section 2.4) writes to and reads from PCI local addresses by using the PlxApi library. PlxApi runs on the host platform and accesses the PCI bus, which is 32 bits wide and operates at 33 MHz. On the local side, the Pci\_9030 interface monitors the read and write transactions initiated on the PCI bus by the MIPS Monitor software; the local bus is 32 bits wide and operates with a 40 MHz clock. The procedure by which the Pci\_9030 interface detects transactions is described in [PLXSDK02]. The external structures of the processor therefore operate at 40 MHz, while the processor itself operates at 20 MHz. This is achieved by using the CLKDLL unit, which minimizes the clock skew between the input pad through which the clock enters the FPGA and the clock distributed across the FPGA. CLKDLL can also provide phase-shifted versions of the clock, or change its frequency by multiplying or dividing it by a constant. Here the clock frequency is divided by two to obtain 20 MHz at the clock pins of the processor [XLBR04].

### 4.2.1. External Monitoring of the Processor

The Reg Unit was developed for this purpose. MIPS Monitor software sends a PCI read request to a specified local address. The Reg Unit (Figure 4.10) takes the local address from the addr (26 bit) signal and compares it with the baddr (26 bit) signal. If they are equal and the rd signal is set, dout (32 bit) is forwarded to the pci\_9030 interface and from there to the PCI bus. MIPS Monitor software presents this information to the user via its graphical user interface.
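The address-compare behavior of the Reg Units can be sketched in Python as below. Each Reg Unit is modeled as a base address plus the 32-bit word it monitors; returning `None` for an unselected unit stands in for "not driving dout", which is an assumption about the hardware. The function names are hypothetical.

```python
ADDR_MASK = (1 << 26) - 1   # addr and baddr are 26-bit local-bus addresses


def reg_unit(addr, baddr, rd, monitored_word):
    """Return dout if this Reg Unit is selected by a read, else None."""
    if rd and (addr & ADDR_MASK) == (baddr & ADDR_MASK):
        return monitored_word & 0xFFFF_FFFF
    return None                             # unit does not drive dout


def local_bus_read(addr, units):
    """units maps each Reg Unit's baddr to the 32-bit word it monitors."""
    for baddr, word in units.items():
        dout = reg_unit(addr, baddr, 1, word)
        if dout is not None:
            return dout
    return None                             # no unit claimed this address
```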



Figure 4.10: Input/Output Signals of Reg Unit

Base addresses 1 to 10 (40 bytes in total) are reserved for monitoring the internal signals of the processor. The stage name attached to each signal name denotes the stage in which the signal is monitored (e.g. EX\_Reg\_Dst is the destination register of the instruction currently in the EX stage; similarly, MEM\_Reg\_Dst is the destination register of the instruction in the MEM stage and WB\_Reg\_Dst that of the instruction in the WB stage). The base addresses and the corresponding internal signals of the processor are given in Table 4.5:

| Base Address | Internal Signals that can be Presented by MIPS Monitor Software |
|--------------|-----------------------------------------------------------------|
| 1            | EX_OVFL, EX_Reg_Dst (5 bit), MEM_Reg_Dst (5 bit), WB_Reg_Dst (5 bit), ID_Incr_PC (8 bit), curr_pc (8 bit) |
| 2            | ID_Instruction (32 bit)                                         |
| 3            | EX_ALUA (32 bit)                                                |
| 4            | EX_ALUB (32 bit)                                                |
| 5            | EX_ALU_RES (32 bit)                                             |
| 6            | MEM_ADDR (32 bit)                                               |
| 7            | MEM_WRITE_DATA (32 bit)                                         |
| 8            | MEM_READ_DATA (32 bit)                                          |
| 9            | WB_REG_WR_DATA (32 bit)                                         |
| 10           | EN_RD, EN_WR, MEM_WAIT, ID_Unresolved, EX_AVLB (1:0)            |

 Table 4.5: Base Addresses of Processor's Internal Signals
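Base address 1 packs six fields into a single 32-bit word (1 + 5 + 5 + 5 + 8 + 8 = 32 bits). Table 4.5 does not specify bit positions, so the sketch below assumes the fields are placed MSB-first in the order listed; it shows how such a word could be packed on the FPGA side and split again by host software. The helper names are hypothetical.

```python
# Hypothetical field order for base address 1, most significant field first.
BASE_ADDRESS_1_FIELDS = (
    ("EX_OVFL", 1), ("EX_Reg_Dst", 5), ("MEM_Reg_Dst", 5),
    ("WB_Reg_Dst", 5), ("ID_Incr_PC", 8), ("curr_pc", 8),
)
assert sum(width for _, width in BASE_ADDRESS_1_FIELDS) == 32  # fills one word


def pack(fields, values):
    """Concatenate field values, the first field in the most significant bits."""
    word = 0
    for name, width in fields:
        word = (word << width) | (values[name] & ((1 << width) - 1))
    return word


def unpack(fields, word):
    """Split a monitored word back into named fields, as a host GUI might."""
    values, shift = {}, sum(width for _, width in fields)
    for name, width in fields:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    return values
```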

### 4.2.2. External Manipulation of the Processor

The Reg\_Wr, Reg\_Prg and Wait\_Sm Units were developed to manipulate the state of the processor. MIPS Monitor software sends a PCI write request and data to a specified local address, and the data determines the next action.



Figure 4.11: Input/Output Signals of Reg\_Wr Unit

The Reg\_Wr Unit (Figure 4.11), developed to enable an external reset of the processor, takes the local address from the addr (26 bit) signal and compares it with the baddr (26 bit) signal. If they are equal, the wr signal is set and din (32 bit) is equal to 2, then dout, which is connected to the reset pin of the processor, is set.
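The reset decode reduces to a single comparison, sketched here as a Python predicate. The base address of Reg\_Wr is not given in this section, so it is passed in as a parameter; whether dout is latched or pulsed is not stated either, and the sketch simply returns the value for the current local-bus cycle. The names are hypothetical.

```python
RESET_COMMAND = 2          # din value that requests a processor reset
ADDR_MASK = (1 << 26) - 1  # 26-bit local-bus address


def reg_wr(addr, baddr, wr, din):
    """Return the level driven on the processor reset pin for this bus cycle."""
    selected = (addr & ADDR_MASK) == (baddr & ADDR_MASK)
    return int(bool(wr) and selected and din == RESET_COMMAND)
```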



Figure 4.12: StateCAD Diagram of Wait\_Sm Unit

The Wait\_Sm Unit (Figure 4.12) was developed to enable single-step operation of the processor. Its input/output signals are similar to those of the Reg\_Wr Unit; the only difference is that the Wait\_Sm Unit outputs the pci\_wait signal instead of dout. The Wait\_Sm is designed using the state machine entry method with the StateCAD tool provided by Xilinx ISE. If the addr signal equals the base address (base address 0 is reserved for single-step operation), the wr signal is set and din equals 1, then the pci\_wait output stays reset for four clock cycles, and the processor is enabled to operate during this interval. Since the processor operates at half the frequency of the external world, this interval corresponds to two processor clock cycles, within which the processor accesses memory and the pipeline advances one step.
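A minimal model of the single-step behavior: after a write of 1 to base address 0, pci\_wait is released (0) for four 40 MHz cycles, i.e. two 20 MHz processor cycles, and is otherwise held at 1. The StateCAD diagram in Figure 4.12 is the actual design; the counter-based formulation, the idle value of pci\_wait, and ignoring new requests while a step is in progress are assumptions of this sketch.

```python
SINGLE_STEP_BADDR = 0   # base address reserved for single-step operation
STEP_COMMAND = 1        # din value that requests one step
STEP_CYCLES = 4         # 40 MHz cycles, i.e. two 20 MHz processor cycles


class WaitSm:
    def __init__(self):
        self.remaining = 0  # external clock cycles left with pci_wait released

    def clock_edge(self, addr, wr, din):
        """Advance one 40 MHz clock and return pci_wait (0 lets the processor run)."""
        if self.remaining > 0:
            self.remaining -= 1                 # step in progress
        elif wr and addr == SINGLE_STEP_BADDR and din == STEP_COMMAND:
            self.remaining = STEP_CYCLES        # start a new step
        return 0 if self.remaining > 0 else 1
```

Feeding one step request followed by idle cycles produces `[0, 0, 0, 0, 1, 1]`: four external cycles in which the processor runs, hence two processor clock edges.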



Figure 4.13: Input/Output Signals of Reg\_Prg Unit

The Reg\_Prg Unit (Figure 4.13) was developed for external programming and includes a program memory (256x32 bits). Reg\_Prg takes the local address from the addr (26 bit) signal and compares it with the baddr (26 bit) signal (base address 11 is reserved for external programming). If they are equal, the wr signal is set and din (32 bit) is not equal to X"FFFF\_FFFF", then din is written to the internal memory at each clk edge (clk is connected to the external clock operating at 40 MHz). When din equals X"FFFF\_FFFF", the writing sequence into the internal memory is finished and a second sequence, which copies the Reg\_Prg Unit memory into the instruction memory of the processor, is started. This second sequence is clocked by the clk2 signal (operating at 20 MHz), which also solves the problem of transferring data between the two clock domains. This was foreseen as the fastest approach during design.
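The two-phase download can be modeled as below: words written to base address 11 are buffered (40 MHz domain) until `X"FFFF_FFFF"` arrives, and the buffer is then copied to instruction memory (the clk2 domain in hardware). The sequential write pointer starting at word 0, the destination addresses in instruction memory, and dropping words beyond 256 are assumptions; the class name is hypothetical.

```python
END_OF_PROGRAM = 0xFFFF_FFFF   # din value that ends the download sequence
PRG_BADDR = 11                 # base address reserved for external programming
DEPTH = 256                    # Reg_Prg program memory: 256 x 32 bits


class RegPrg:
    def __init__(self):
        self.buffer = []       # internal program memory (clk, 40 MHz domain)

    def local_write(self, addr, wr, din, instruction_memory):
        """Handle one local-bus write cycle."""
        if not wr or addr != PRG_BADDR:
            return                                   # not addressed to this unit
        if din != END_OF_PROGRAM:
            if len(self.buffer) < DEPTH:             # assumption: extra words dropped
                self.buffer.append(din & 0xFFFF_FFFF)
            return
        self._copy_to_instruction_memory(instruction_memory)

    def _copy_to_instruction_memory(self, instruction_memory):
        """Second sequence, clocked by clk2 (20 MHz) in the hardware."""
        for index, word in enumerate(self.buffer):
            instruction_memory[index] = word         # assumption: starts at word 0
        self.buffer.clear()
```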

# **CHAPTER 5**

# VERIFICATION OF MIPS PIPELINED ARCHITECTURE

The operation of the architecture is verified with MIPS Monitor software by following these steps:

- Verification of correct operation of instructions,
- Verification of proper hazard detection and solution,
- Verification of proper exception detection and handling.

The use of MIPS Monitor software is described in detail in APPENDIX B, MIPS Monitor Software, and its operation is described in section 2.4. The mnemonic names and the corresponding numeric values of MIPS registers are given in Table A.1 at the end of APPENDIX A, Implemented Subset of MIPS R2000 ISA.

### 5.1. Verification of Correct Operation of Instructions

The instructions described in APPENDIX A, Implemented Subset of MIPS R2000 ISA, are tested; the test procedure and the observed results are given in this section.

The test program given in Table 5.1 is written and downloaded to the processor to exercise all the implemented instructions. A requirement number (R#) is given in the comment field of the code, and the clock cycle in which each requirement is fulfilled is given in the first column of Table 5.2.

The results of the operations and the contents of the stages are read by using MIPS Monitor software and tabulated in Table 5.2.

```
################################################################
#
# TEST_1
#
# Created by Can Altıniğneli
# To demonstrate that the instructions defined in APPENDIX A are correctly implemented
################################################################
UNDEFINED:
beq   $zero, $zero, UNDEFINED    # UNDEFINED EXCEPTION VECTOR
nop

OVERFLOW:
beq   $zero, $zero, OVERFLOW     # OVERFLOW EXCEPTION VECTOR
nop

START:
#ADD, ADDI and ADDU are verified
addi  $s0, $zero, 0x6            # $s0 shall = x6, DestAdr:16, R1
addi  $s1, $zero, 0x4            # $s1 shall = x4, DestAdr:17, R2
add   $s2, $s0, $s1              # $s2 shall = xA, DestAdr:18, R3
addu  $s2, $s0, $s1              # $s2 shall = xA, DestAdr:18, R4

#ADDIU, SUB and SUBU are verified
addiu $s0, $zero, 0x2            # $s0 shall = x2, DestAdr:16, R5
addiu $s1, $zero, 0x4            # $s1 shall = x4, DestAdr:17, R6
sub   $s2, $s0, $s1              # $s2 shall = xFFFF_FFFE, DestAdr:18, R7
subu  $s2, $s1, $s0              # $s2 shall = x2, DestAdr:18, R8

#OR, ORI, AND, ANDI, XOR, XORI, NOR, SRL, SLL, LUI are verified
ori   $t0, $zero, 0xFFFF         # $t0 shall = x0000_FFFF, DestAdr:8, R9
lui   $t1, 0xFFFF                # $t1 shall = xFFFF_0000, DestAdr:9, R10
or    $t2, $t0, $t1              # $t2 shall = xFFFF_FFFF, DestAdr:10, R11
and   $t2, $t0, $t1              # $t2 shall = x0000_0000, DestAdr:10, R12
xor   $t2, $t0, $t1              # $t2 shall = xFFFF_FFFF, DestAdr:10, R13
nor   $t2, $t0, $t1              # $t2 shall = x0000_0000, DestAdr:10, R14
andi  $t0, $t0, 0x0000           # $t0 shall = x0000_0000, DestAdr:8, R15
srl   $t1, $t1, 16               # $t1 shall = x0000_FFFF, DestAdr:9, R16
sll   $t1, $t1, 16               # $t1 shall = xFFFF_0000, DestAdr:9, R17
xori  $t1, $t1, 0xFFFF           # $t1 shall = xFFFF_FFFF, DestAdr:9, R18

#SLT, SLTI, BEQ, BNE, NOP are verified
LOOP_3TIMES:
subi  $s0, $s0, 1                # $s0 shall = x1, DestAdr:16, R19
slti  $t0, $s0, 0x0              # $t0 shall = x1 if $s0 negative, signed comparison, R20
beq   $t0, $zero, LOOP_3TIMES
nop
                                 # after 3 iterations exit from loop
slt   $t1, $s0, $zero            # $s0 shall = xFFFF_FFFF, therefore $t1 shall = x1, R21
bne   $t1, $zero, JUMP_POINT
nop

#SLTIU, SLTU, MULTU, MFHI, MFLO, MTHI, MTLO, SW, LW, JR, J, JAL are verified
MULTIPLY:
addi  $s0, $zero, -1             # $s0 shall = xFFFF_FFFF, DestAdr:16, R22
addi  $s1, $zero, -2             # $s1 shall = xFFFF_FFFE, DestAdr:17, R23
multu $s0, $s1                   # HI shall = xFFFF_FFFD, LO shall = x2
mfhi  $t0                        # $t0 shall = xFFFF_FFFD, DestAdr:8, R24
mflo  $t1                        # $t1 shall = x2, DestAdr:9, R25
mthi  $zero
mtlo  $zero
mfhi  $s0                        # $s0 shall = 0, DestAdr:16, R26
mflo  $s1                        # $s1 shall = 0, DestAdr:17, R27
addi  $s1, $s1, 0x4              # $s1 shall = 4, DestAdr:17, R28
sw    $t0, 0($s0)                # MEM[0] shall store xFFFF_FFFD, R29
sw    $t1, 0($s1)                # MEM[4] shall store x2, R30
jr    $ra                        # Jump after jal instruction, R31
nop
JUMP_POINT:
jal   MULTIPLY
nop
lw    $t2, 0($s0)                # MEM[0]-->$t2 shall = xFFFF_FFFD, DestAdr:10, R32
lw    $t3, 0($s1)                # MEM[1]-->$t3 shall = x2, DestAdr:11, R33
sltiu $t0, $t2, 1                # $t0 shall = 0, because $t2 > 1, R34
bne   $t0, $zero, START          # shall not jump to START, R35
nop
sltu  $t0, $t2, $zero            # $t0 shall = 0, because $t2 > 0, R36
bne   $t0, $zero, START          # shall not jump to START, R37
nop
j     START                      # shall jump to START, R38
nop
Eternity:
beq   $zero, $zero, Eternity
nop
```

Table 5.1: Verification of Correct Instruction Operation
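As an independent cross-check of the expected values written in the comments of the test program, the following Python sketch recomputes several of them with ordinary 32-bit wrap-around arithmetic. It is a software reference model written for illustration, not part of the verification flow used in this thesis, and it covers only a selection of the requirements (R3, R7, R8, R11 to R14, R18, R21, the multu result, and R34); the function names are hypothetical.

```python
MASK = 0xFFFF_FFFF  # 32-bit register width


def u32(value):
    """Wrap a Python integer to an unsigned 32-bit register value."""
    return value & MASK


def to_signed(value):
    """Interpret an unsigned 32-bit value as two's complement."""
    return value - (1 << 32) if value & 0x8000_0000 else value


def expected_results():
    r = {}
    s0, s1 = u32(0x6), u32(0x4)                 # R1, R2
    r["R3"] = u32(s0 + s1)                      # add
    s0, s1 = u32(0x2), u32(0x4)                 # R5, R6
    r["R7"] = u32(s0 - s1)                      # sub $s2, $s0, $s1
    r["R8"] = u32(s1 - s0)                      # subu $s2, $s1, $s0
    t0 = 0x0000_FFFF                            # ori
    t1 = u32(0xFFFF << 16)                      # lui
    r["R11"], r["R12"] = t0 | t1, t0 & t1       # or, and
    r["R13"] = t0 ^ t1                          # xor
    r["R14"] = u32(~(t0 | t1))                  # nor
    t1 = t1 >> 16                               # srl (t1 is unsigned, so logical)
    t1 = u32(t1 << 16)                          # sll
    r["R18"] = t1 ^ 0xFFFF                      # xori, zero-extended immediate
    iterations = 0
    while True:                                 # LOOP_3TIMES, starting from $s0 = 2
        s0 = u32(s0 - 1)                        # subi
        iterations += 1
        if to_signed(s0) < 0:                   # slti $t0, $s0, 0
            break
    r["loop_iterations"] = iterations
    r["R21"] = int(to_signed(s0) < 0)           # slt $t1, $s0, $zero
    product = u32(-1) * u32(-2)                 # multu 0xFFFF_FFFF, 0xFFFF_FFFE
    r["HI"], r["LO"] = product >> 32, product & MASK
    r["R34"] = int(r["HI"] < 1)                 # sltiu on $t2 = MEM[0] = HI
    return r
```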

| CLK | R# | IF: instruction (Curr_PC) | ID: instruction (Instr, Incr_PC) | EX: instruction (ALU_A, ALU_B, Res, Ovr) | MEM: instruction (Addr, Wr_D, Rd_D) | WB: instruction (Reg_D, Wr_D) |
|-----|----|---------------------------|----------------------------------|------------------------------------------|-------------------------------------|-------------------------------|
| 1 | | addi \$s1, \$zero, 4 (X14) | addi \$s0, \$zero, 6 (X20100006, X14) | - | - | - |
| 2 | | add \$s2, \$s0, \$s1 (X18) | addi \$s1, \$zero, 4 (X20110004, X18) | addi \$s0, \$zero, 6 (X0, X6, X6, 0) | - | - |
| 3 | | addu \$s2, \$s0, \$s1 (X1C) | add \$s2, \$s0, \$s1 (X02119020, X1C) | addi \$s1, \$zero, 4 (X0, X4, X4, 0) | addi \$s0, \$zero, 6 (X6, X6, X6) | - |
| 4 | R1 | addiu \$s0, \$zero, 2 (X20) | addu \$s2, \$s0, \$s1 (X02119021, X20) | add \$s2, \$s0, \$s1 (X6, X4, XA, 0) | addi \$s1, \$zero, 4 (X4, X4, X4) | addi \$s0, \$zero, 6 (X10, X6) |
| 5 | R2 | addiu \$s1, \$zero, 4 (X24) | addiu \$s0, \$zero, 2 (X24100002, X24) | addu \$s2, \$s0, \$s1 (X6, X4, XA, 0) | add \$s2, \$s0, \$s1 (XA, X4, XA) | addi \$s1, \$zero, 4 (X11, X4) |
| 6 | R3 | sub \$s2, \$s0, \$s1 (X28) | addiu \$s1, \$zero, 4 (X24110004, X28) | addiu \$s0, \$zero, 2 (X0, X2, X2, 0) | addu \$s2, \$s0, \$s1 (XA, X4, XA) | add \$s2, \$s0, \$s1 (X12, XA) |
| 7 | R4 | subu \$s2, \$s1, \$s0 (X2C) | sub \$s2, \$s0, \$s1 (X02119022, X2C) | addiu \$s1, \$zero, 4 (X0, X4, X4, 0) | addiu \$s0, \$zero, 2 (X2, X2, X2) | addu \$s2, \$s0, \$s1 (X12, XA) |

Table 5.2: Timing Diagram for Instruction Operation Verification
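
The Instr values in the ID stage are ordinary MIPS machine words. As a cross-check, a short Python sketch that packs the standard R-type and I-type fields (opcode, rs, rt, rd, shamt, funct, 16-bit immediate) gives, for example, X02119020 for `add $s2, $s0, $s1` and X24100002 for `addiu $s0, $zero, 2`, matching the words in the diagram. The register numbers (`$s0` = 16, `$s1` = 17, `$s2` = 18) and the opcode and funct codes are those of the MIPS R2000 instruction set; `r_type` and `i_type` are illustrative helper names.

```python
def r_type(rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """R-type word: opcode 0 | rs | rt | rd | shamt | funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """I-type word: opcode | rs | rt | 16-bit immediate (two's complement if negative)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ZERO, S0, S1, S2 = 0, 16, 17, 18  # MIPS register numbers

program = [
    ("addi $s0, $zero, 6", i_type(0x08, ZERO, S0, 6)),
    ("addi $s1, $zero, 4", i_type(0x08, ZERO, S1, 4)),
    ("add $s2, $s0, $s1", r_type(S0, S1, S2, 0, 0x20)),
    ("addu $s2, $s0, $s1", r_type(S0, S1, S2, 0, 0x21)),
    ("addiu $s0, $zero, 2", i_type(0x09, ZERO, S0, 2)),
    ("addiu $s1, $zero, 4", i_type(0x09, ZERO, S1, 4)),
    ("sub $s2, $s0, $s1", r_type(S0, S1, S2, 0, 0x22)),
]

for asm, word in program:
    print(f"{asm:<22} X{word:08X}")
```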

| CLK | R# | IF: instruction (Curr_PC) | ID: instruction (Instr, Incr_PC) | EX: instruction (ALU_A, ALU_B, Res, Ovr) | MEM: instruction (Addr, Wr_D, Rd_D) | WB: instruction (Reg_D, Wr_D) |
|-----|----|---------------------------|----------------------------------|------------------------------------------|-------------------------------------|-------------------------------|
| 8 | R5 | ori \$t0, \$zero, -1 (X30) | subu \$s2, \$s1, \$s0 (X02309023, X30) | sub \$s2, \$s0, \$s1 (X2, X4, XFFFFFFFE, 0) | addiu \$s1, \$zero, 4 (X4, X4, X4) | addiu \$s0, \$zero, 2 (X10, X2) |
| 9 | R6 | lui \$t1, 65535 (X34) | ori \$t0, \$zero, -1 (X3408FFFF, X34) | subu \$s2, \$s1, \$s0 (X4, X2, X2, 0) | sub \$s2, \$s0, \$s1 (XFFFFFFFE, X4, XFFFFFFFE) | addiu \$s1, \$zero, 4 (X11, X4) |
| 10 | R7 | or \$t2, \$t0, \$t1 (X38) | lui \$t1, 65535 (X3C09FFFF, X38) | ori \$t0, \$zero, -1 (X0, X0000FFFF, X0000FFFF, 0) | subu \$s2, \$s1, \$s0 (X2, X2, X2) | sub \$s2, \$s0, \$s1 (X12, XFFFFFFFE) |
| 11 | R8 | and \$t2, \$t0, \$t1 (X3C) | or \$t2, \$t0, \$t1 (X01095025, X3C) | lui \$t1, 65535 (X0, X0000FFFF, XFFFF0000, 0) | ori \$t0, \$zero, -1 (X0000FFFF, X0000FFFF, X0000FFFF) | subu \$s2, \$s1, \$s0 (X12, X2) |
| 12 | R9 | xor \$t2, \$t0, \$t1 (X40) | and \$t2, \$t0, \$t1 (X01095024, X40) | or \$t2, \$t0, \$t1 (X0000FFFF, XFFFF0000, XFFFFFFFF, 0) | lui \$t1, 65535 (XFFFF0000, X0000FFFF, XFFFF0000) | ori \$t0, \$zero, -1 (X8, X0000FFFF) |
| 13 | R10 | nor \$t2, \$t0, \$t1 (X44) | xor \$t2, \$t0, \$t1 (X01095026, X44) | and \$t2, \$t0, \$t1 (X0000FFFF, XFFFF0000, X0, 0) | or \$t2, \$t0, \$t1 (XFFFFFFFF, XFFFF0000, XFFFFFFFF) | lui \$t1, 65535 (X9, XFFFF0000) |
| 14 | R11 | andi \$t0, \$t0, 0 (X48) | nor \$t2, \$t0, \$t1 (X01095027, X48) | xor \$t2, \$t0, \$t1 (X0000FFFF, XFFFF0000, XFFFFFFFF, 0) | and \$t2, \$t0, \$t1 (X0, XFFFF0000, X0) | or \$t2, \$t0, \$t1 (XA, XFFFFFFFF) |
| 15 | R12 | srl \$t1, \$t1, 16 (X4C) | andi \$t0, \$t0, 0 (X31080000, X4C) | nor \$t2, \$t0, \$t1 (X0000FFFF, XFFFF0000, X0, 0) | xor \$t2, \$t0, \$t1 (XFFFFFFFF, XFFFF0000, XFFFFFFFF) | and \$t2, \$t0, \$t1 (XA, X0) |
| 16 | R13 | sll \$t1, \$t1, 16 (X50) | srl \$t1, \$t1, 16 (X00094C02, X50) | andi \$t0, \$t0, 0 (X0000FFFF, X0, X0, 0) | nor \$t2, \$t0, \$t1 (X0, XFFFF0000, X0) | xor \$t2, \$t0, \$t1 (XA, XFFFFFFFF) |
| 17 | R14 | xori \$t1, \$t1, -1 (X54) | sll \$t1, \$t1, 16 (X00094C00, X54) | srl \$t1, \$t1, 16 (X10, XFFFF0000, X0000FFFF, 0) | andi \$t0, \$t0, 0 (X0, X0, X0) | nor \$t2, \$t0, \$t1 (XA, X0) |
| 18 | R15 | addi \$s0, \$s0, -1 (X58) | xori \$t1, \$t1, -1 (X3929FFFF, X58) | sll \$t1, \$t1, 16 (X10, X0000FFFF, XFFFF0000, 0) | srl \$t1, \$t1, 16 (X0000FFFF, XFFFF0000, X0000FFFF) | andi \$t0, \$t0, 0 (X8, X0) |

Table 5.2: Timing Diagram for Instruction Operation Verification (continued)
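
The logic and shift results in the continued diagram above follow from two details of the instruction set: `ori`, `andi` and `xori` zero-extend their 16-bit immediate, and `srl` shifts in zeros. A minimal Python model of the `$t0`/`$t1` register values, assuming 32-bit masking stands in for the hardware word width, gives the values expected in the Res column. `zero_extend16` is an illustrative helper name.

```python
MASK32 = 0xFFFFFFFF


def zero_extend16(imm: int) -> int:
    """ori/andi/xori: the 16-bit immediate is padded with zeros, not sign bits."""
    return imm & 0xFFFF


t0 = 0 | zero_extend16(-1)                   # ori  $t0, $zero, -1 -> X0000FFFF
t1 = (zero_extend16(65535) << 16) & MASK32   # lui  $t1, 65535     -> XFFFF0000

logic = {
    "or": (t0 | t1) & MASK32,    # XFFFFFFFF
    "and": (t0 & t1) & MASK32,   # X00000000
    "xor": (t0 ^ t1) & MASK32,   # XFFFFFFFF
    "nor": ~(t0 | t1) & MASK32,  # X00000000
}

t0 = t0 & zero_extend16(0)   # andi $t0, $t0, 0  -> X00000000
t1 = (t1 & MASK32) >> 16     # srl  $t1, $t1, 16 -> X0000FFFF (logical shift)
t1 = (t1 << 16) & MASK32     # sll  $t1, $t1, 16 -> XFFFF0000
t1 = t1 ^ zero_extend16(-1)  # xori $t1, $t1, -1 -> XFFFFFFFF

for name, value in logic.items():
    print(f"{name:<4} X{value:08X}")
print(f"$t0 X{t0:08X}  $t1 X{t1:08X}")
```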

| CLK | R# | IF: instruction (Curr_PC) | ID: instruction (Instr, Incr_PC) | EX: instruction (ALU_A, ALU_B, Res, Ovr) | MEM: instruction (Addr, Wr_D, Rd_D) | WB: instruction (Reg_D, Wr_D) |
|-----|----|---------------------------|----------------------------------|------------------------------------------|-------------------------------------|-------------------------------|
| 19 | R16 | slti \$t0, \$s0, 0 (X5C) | addi \$s0, \$s0, -1 (X2210FFFF, X5C) | xori \$t1, \$t1, -1 (XFFFF0000, X0000FFFF, XFFFFFFFF, 0) | sll \$t1, \$t1, 16 (XFFFF0000, X0000FFFF, XFFFF0000) | srl \$t1, \$t1, 16 (X9, X0000FFFF) |
| 20 | R17 | beq \$zero, \$t0, -3 (X60) | slti \$t0, \$s0, 0 (X2A080000, X60) | addi \$s0, \$s0, -1 (X2, XFFFFFFFF, X1, 0) | xori \$t1, \$t1, -1 (XFFFFFFFF, X0000FFFF, XFFFFFFFF) | sll \$t1, \$t1, 16 (X9, XFFFF0000) |
| 21 | R18 | nop (X64) | beq \$zero, \$t0, -3 (X1008FFFD, X64) | slti \$t0, \$s0, 0 (X1, X0, X0, 0) | addi \$s0, \$s0, -1 (X1, XFFFFFFFF, X1) | xori \$t1, \$t1, -1 (X9, XFFFFFFFF) |
| 22 | R19 | addi \$s0, \$s0, -1 (X58) | nop (X00000000, X68) | beq \$zero, \$t0, -3 (X0, X0, X0, 0) | slti \$t0, \$s0, 0 (X0, X0, X0) | addi \$s0, \$s0, -1 (X10, X1) |
| 23 | | slti \$t0, \$s0, 0 (X5C) | addi \$s0, \$s0, -1 (X2210FFFF, X5C) | nop (X0, X0, X0, 0) | beq \$zero, \$t0, -3 (X0, X0, X0) | slti \$t0, \$s0, 0 (X8, X0) |
| 24 | | beq \$zero, \$t0, -3 (X60) | slti \$t0, \$s0, 0 (X2A080000, X60) | addi \$s0, \$s0, -1 (X1, XFFFFFFFF, X0, 0) | nop (X0, X0, X0) | beq \$zero, \$t0, -3 (X1F, X0) |
| 25 | | nop (X64) | beq \$zero, \$t0, -3 (X1008FFFD, X64) | slti \$t0, \$s0, 0 (X0, X0, X0, 0) | addi \$s0, \$s0, -1 (X0, XFFFFFFFF, X0) | nop (X0, X0) |
| 26 | | addi \$s0, \$s0, -1 (X58) | nop (X00000000, X68) | beq \$zero, \$t0, -3 (X0, X0, X0, 0) | slti \$t0, \$s0, 0 (X0, X0, X0) | addi \$s0, \$s0, -1 (X10, X0) |
| 27 | | slti \$t0, \$s0, 0 (X5C) | addi \$s0, \$s0, -1 (X2210FFFF, X5C) | nop (X0, X0, X0, 0) | beq \$zero, \$t0, -3 (X0, X0, X0) | slti \$t0, \$s0, 0 (X8, X0) |

Table 5.2: Timing Diagram for Instruction Operation Verification (continued)
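
Several operands in these cycles reach the EX stage before the instruction that produces them has been written back; in cycle 21, for instance, `slti $t0, $s0, 0` receives X1 on ALU_A while `addi $s0, $s0, -1` is still in MEM. The forwarding logic of this implementation is not listed here, so the Python sketch below assumes the usual textbook scheme: a matching result in the EX/MEM register wins over one in the MEM/WB register, which wins over the register file, and `$zero` is never forwarded. `StageReg` and `forward` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageReg:
    """Hypothetical pipeline-register snapshot: whether it writes, to which register, what value."""
    reg_write: bool
    dest: int
    value: int


def forward(src: int, regfile_value: int,
            ex_mem: Optional[StageReg], mem_wb: Optional[StageReg]) -> int:
    """Select an ALU operand for register `src`, preferring the youngest in-flight result."""
    if src == 0:  # $zero is hard-wired and never forwarded
        return 0
    if ex_mem and ex_mem.reg_write and ex_mem.dest == src:
        return ex_mem.value  # producer is one instruction ahead (now in MEM)
    if mem_wb and mem_wb.reg_write and mem_wb.dest == src:
        return mem_wb.value  # producer is two instructions ahead (now in WB)
    return regfile_value  # no hazard: use the value read in ID


# Cycle 21: slti reads $s0 (register 16); the register file still holds 2,
# addi $s0, $s0, -1 is in MEM with result 1, xori $t1 (register 9) is in WB.
alu_a = forward(16, 2, StageReg(True, 16, 1), StageReg(True, 9, 0xFFFFFFFF))
print(alu_a)  # 1
```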

| CLK | R# | IF: instruction (Curr_PC) | ID: instruction (Instr, Incr_PC) | EX: instruction (ALU_A, ALU_B, Res, Ovr) | MEM: instruction (Addr, Wr_D, Rd_D) | WB: instruction (Reg_D, Wr_D) |
|-----|----|---------------------------|----------------------------------|------------------------------------------|-------------------------------------|-------------------------------|
| 28 | | beq \$zero, \$t0, -3 (X60) | slti \$t0, \$s0, 0 (X2A080000, X60) | addi \$s0, \$s0, -1 (X0, XFFFFFFFF, XFFFFFFFF, 0) | nop (X0, X0, X0) | beq \$zero, \$t0, -3 (X1F, X0) |
| 29 | | nop (X64) | beq \$zero, \$t0, -3 (X1008FFFD, X64) | slti \$t0, \$s0, 0 (XFFFFFFFF, X0, X1, 0) | addi \$s0, \$s0, -1 (XFFFFFFFF, XFFFFFFFF, XFFFFFFFF) | nop (X0, X0) |
| 30 | | slt \$t1, \$s0, \$zero (X68) | nop (X00000000, X68) | beq \$zero, \$t0, -3 (X0, X1, X0, 0) | slti \$t0, \$s0, 0 (X1, X0, X1) | addi \$s0, \$s0, -1 (X10, XFFFFFFFF) |
| 31 | R20 | bne \$t1, \$zero, 15 (X6C) | slt \$t1, \$s0, \$zero (X0200482A, X6C) | nop (X0, X0, X0, 0) | beq \$zero, \$t0, -3 (X0, X1, X0) | slti \$t0, \$s0, 0 (X8, X1) |
| 32 | | nop (X70) | bne \$t1, \$zero, 15 (X1409000F, X70) | slt \$t1, \$s0, \$zero (XFFFFFFFF, X0, X1, 0) | nop (X0, X0, X0) | beq \$zero, \$t0, -3 (X1F, X0) |
| 33 | | jal 0x001D (XAC) | nop (X00000000, X74) | bne \$t1, \$zero, 15 (X0, X1, X0, 0) | slt \$t1, \$s0, \$zero (X1, X0, X1) | nop (X0, X0) |
| 34 | R21 | nop (XB0) | jal 0x001D (X0C00001D, XB0) | nop (X0, X0, X0, 0) | bne \$t1, \$zero, 15 (X0, X1, X0) | slt \$t1, \$s0, \$zero (X9, X1) |
| 35 | | addi \$s0, \$zero, -1 (X74) | nop (X00000000, XB4) | jal 0x001D (X0, XB0, XB0, 0) | nop (X0, X0, X0) | bne \$t1, \$zero, 15 (X0, X0) |

Table 5.2: Timing Diagram for Instruction Operation Verification (continued)
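
The fetch addresses in the IF column follow from the MIPS target rules: a conditional branch goes to (PC + 4) plus the sign-extended offset times four, and `jal` keeps the upper four bits of PC + 4 and appends its 26-bit index shifted left by two. The Python sketch below uses the instruction addresses that the PC columns imply (`beq` at X60, `bne` at X6C, `jal` at XAC) and assumes `$s0` = 2 on entry to the countdown loop, as set by `addiu $s0, $zero, 2`. The one-slot delay visible after each branch (the `nop` fetched before the target) is not modelled; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc: int, offset16: int) -> int:
    """beq/bne: (PC + 4) + sign_extend(offset) * 4."""
    offset16 &= 0xFFFF
    offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16
    return (pc + 4 + offset * 4) & MASK32


def jump_target(pc: int, index26: int) -> int:
    """j/jal: upper four bits of PC + 4 joined with index * 4."""
    return ((pc + 4) & 0xF0000000) | ((index26 & 0x03FFFFFF) << 2)


print(hex(branch_target(0x60, -3)))    # beq $zero, $t0, -3 -> 0x58 (addi $s0, $s0, -1)
print(hex(branch_target(0x6C, 15)))    # bne $t1, $zero, 15 -> 0xac (jal 0x001D)
print(hex(jump_target(0xAC, 0x001D)))  # jal 0x001D         -> 0x74 (addi $s0, $zero, -1)

# Countdown loop: addi $s0, $s0, -1 / slti $t0, $s0, 0 / beq $zero, $t0, -3
s0, beq_taken = 2, []
while True:
    s0 -= 1                    # addi $s0, $s0, -1
    t0 = int(s0 < 0)           # slti $t0, $s0, 0
    beq_taken.append(t0 == 0)  # beq is taken while $t0 == $zero
    if t0 != 0:
        break
print(beq_taken, s0)  # [True, True, False] -1
```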

| CLK | R# | IF: instruction (Curr_PC) | ID: instruction (Instr, Incr_PC) | EX: instruction (ALU_A, ALU_B, Res, Ovr) | MEM: instruction (Addr, Wr_D, Rd_D) | WB: instruction (Reg_D, Wr_D) |
|-----|----|---------------------------|----------------------------------|------------------------------------------|-------------------------------------|-------------------------------|
| 36 | | addi \$s1, \$zero, -2 (X78) | addi \$s0, \$zero, -1 (X2010FFFF, X78) | nop (X0, X0, X0, 0) | jal 0x001D (XB0, XB0, XB0) | nop (X0, X0) |
| 37 | | multu \$s0, \$s1 (X7C) | addi \$s1, \$zero, -2 (X2011FFFE, X7C) | addi \$s0, \$zero, -1 (X0, XFFFFFFFF, XFFFFFFFF, 0) | nop (X0, X0, X0) | jal 0x001D (X1F, XB0) |
| 38 | | mfhi \$t0 (X80) | multu \$s0, \$s1 (X02110019, X80) | addi \$s1, \$zero, -2 (X0, XFFFFFFFE, XFFFFFFFE, 0) | addi \$s0, \$zero, -1 (XFFFFFFFF, XFFFFFFFF, XFFFFFFFF) | nop (X0, X0) |
| 39 | R22 | mflo \$t1 (X84) | mfhi \$t0 (X00004010, X84) | multu \$s0, \$s1 (XFFFFFFFF, XFFFFFFFE, X0, 0) | addi \$s1, \$zero, -2 (XFFFFFFFE, XFFFFFFFE, XFFFFFFFE) | addi \$s0, \$zero, -1 (X10, XFFFFFFFF) |
| 40 | R23 | mthi \$zero (X88) | mflo \$t1 (X00004812, X88) | mfhi \$t0 (X0, X4010, XFFFFFFFD, 0) | multu \$s0, \$s1 (X0, XFFFFFFFE, X0) | addi \$s1, \$zero, -2 (X11, XFFFFFFFE) |
| 41 | | mtlo \$zero (X8C) | mthi \$zero (X00000011, X8C) | mflo \$t1 (X0, X4812, X2, 0) | mfhi \$t0 (XFFFFFFFD, X4010, XFFFFFFFD) | multu \$s0, \$s1 (X0, X0) |
| 42 | R24 | mfhi \$s0 (X90) | mtlo \$zero (X00000013, X90) | mthi \$zero (X0, X11, X0, 0) | mflo \$t1 (X2, X4812, X2) | mfhi \$t0 (X8, XFFFFFFFD) |
| 43 | R25 | mflo \$s1 (X94) | mfhi \$s0 (X00008010, X94) | mtlo \$zero (X0, X13, X0, 0) | mthi \$zero (X0, X11, X0) | mflo \$t1 (X9, X2) |
| 44 | | addi \$s1, \$s1, 4 (X98) | mflo \$s1 (X00008812, X98) | mfhi \$s0 (X0, X8010, X0, 0) | mtlo \$zero (X0, X13, X0) | mthi \$zero (X0, X0) |
| 45 | | sw \$t0, 0(\$s0) (X9C) | addi \$s1, \$s1, 4 (X22310004, X9C) | mflo \$s1 (X0, X8812, X0, 0) | mfhi \$s0 (X0, X8010, X0) | mtlo \$zero (X0, X0) |

Table 5.2: Timing Diagram for Instruction Operation Verification (continued)
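
`multu $s0, $s1` treats both operands as unsigned, so with `$s0` = XFFFFFFFF and `$s1` = XFFFFFFFE the 64-bit product splits into HI = XFFFFFFFD and LO = X2, the values that `mfhi $t0` and `mflo $t1` return. A signed `mult` of the same bit patterns (-1 times -2) would leave HI = 0. A small Python check follows; `hi_lo` is a name chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF


def hi_lo(product: int) -> tuple[int, int]:
    """Split a 64-bit product into HI and LO (two's complement for negative products)."""
    product &= (1 << 64) - 1
    return product >> 32, product & MASK32


s0 = -1 & MASK32  # addi $s0, $zero, -1 -> XFFFFFFFF
s1 = -2 & MASK32  # addi $s1, $zero, -2 -> XFFFFFFFE

hi_u, lo_u = hi_lo(s0 * s1)      # multu: operands read as unsigned
hi_s, lo_s = hi_lo((-1) * (-2))  # mult: operands read as signed

print(f"multu: HI=X{hi_u:08X} LO=X{lo_u:08X}")  # HI=XFFFFFFFD LO=X00000002
print(f"mult:  HI=X{hi_s:08X} LO=X{lo_s:08X}")  # HI=X00000000 LO=X00000002
```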


| WB STAGE | INSTR_WB | Wr_D | mfhi \$s0 | | mflo \$s1 | X0 | addi \$s1, \$s1, 4 | | sw \$t0, 0(\$s0) | X0 | sw \$t1, 0(\$s1) | | jr \$ra | X0 | nop | X0 | nop | X0 | lw \$t2, 0(\$s0) | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WB STAGE | | Reg_D | | X10 | | X11 | | X11 | | X0 | | | | | | | | | | XA |
| MEM STAGE | | Rd_D | | | | X4 | | | | X4 | | X0 | | X0 | | | | | | |
| MEM STAGE | INSTR_MEM | Wr_D | mflo \$s1 | | addi \$s1, \$s1, 4 | X4 | sw \$t0, 0(\$s0) | | sw \$t1, 0(\$s1) | X2 | jr \$ra | | nop | X0 | nop | X0 | lw \$t2, 0(\$s0) | X0 | lw \$t3, 0(\$s1) | |
| MEM STAGE | | Addr | | | | X4 | | X0 | | X4 | | X0 | | X0 | | X0 | | X0 | | X4 |
| EX STAGE | | Ovr | | - | | - | | 0 | | 0 | | - | | - | | 0 | | - | | 0 |
| EX STAGE | | Res | | X4 | | | | X4 | | | | | | | | | | X4 | | |
| EX STAGE | INSTR_EX | ALU_B | addi \$s1, \$s1, 4 | X4 | sw \$t0, 0(\$s0) | | sw \$t1, 0(\$s1) | X2 | jr \$ra | | nop | | nop | | lw \$t2, 0(\$s0) | | lw \$t3, 0(\$s1) | | sltiu \$t0, \$t2, 1 | X1 |
| EX STAGE | | ALU_A | | | | X0 | | X4 | | XB0 | | X0 | | X0 | | X0 | | X4 | | |
| ID STAGE | | Incr_PC | | XA0 | | XA4 | | XA8 | | XAC | | XB4 | | XB8 | | XBC | | XC0 | | XC4 |
| ID STAGE | INSTR_ID | Instr | sw \$t0, 0(\$s0) | XAE080000 | sw \$t1, 0(\$s1) | XAE290000 | jr \$ra | X03E00008 | nop | X00000000 | nop | X00000000 | lw \$t2, 0(\$s0) | X8E0A0000 | lw \$t3, 0(\$s1) | X8E2B0000 | sltiu \$t0, \$t2, 1 | X2D480001 | bne \$t0, \$zero, -45 | X1408FFD3 |
| IF STAGE | INSTR_IF | Curr_PC | sw \$t1, 0(\$s1) | XA0 | jr \$ra | XA4 | nop | XA8 | nop | XB0 | lw \$t2, 0(\$s0) | XB4 | lw \$t3, 0(\$s1) | XB8 | sltiu \$t0, \$t2, 1 | XBC | bne \$t0, \$zero, -45 | XC0 | nop | XC4 |
| CLK | (R#) | | 46 | R26 | 47 | R27 | 48 | R28, R29 | 49 | R30, R31 | 50 | | 51 | | 52 | | 53 | | 54 | R32 |

Table 5.2: Timing Diagram for Instruction Operation Verification (continued)

| WB STAGE | INSTR_WB | Wr_D | lw \$t3, 0(\$s1) | | sltiu \$t0, \$t2, 1 | | bne \$t0, \$zero, -45 | | nop | | sltu \$t0, \$t2, \$zero | | bne \$t0, \$zero, -48 | X0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WB STAGE | | Reg_D | | | | X8 | | X1F | | | | X8 | | X1F |
| MEM STAGE | | Rd_D | | | | | | | | | | | | |
| MEM STAGE | INSTR_MEM | Wr_D | sltiu \$t0, \$t2, 1 | | bne \$t0, \$zero, -45 | X0 | nop | X0 | sltu \$t0, \$t2, \$zero | X0 | bne \$t0, \$zero, -48 | X0 | nop | X0 |
| MEM STAGE | | Addr | | | | | | | | | | | | X0 |
| EX STAGE | | Ovr | | - | | - | | 0 | | | | | | |
| EX STAGE | | Res | | | | | | | | | | | | |
| EX STAGE | INSTR_EX | ALU_B | bne \$t0, \$zero, -45 | X0 | nop | | sltu \$t0, \$t2, \$zero | | bne \$t0, \$zero, -48 | | nop | X0 | j 0x0004 | X4 |
| EX STAGE | | ALU_A | | | | | | | | | | | | X0 |
| ID STAGE | | Incr_PC | | XC8 | | XCC | | XD0 | | XD4 | | XD8 | | XDC |
| ID STAGE | INSTR_ID | Instr | nop | X00000000 | sltu \$t0, \$t2, \$zero | X0140402B | bne \$t0, \$zero, -48 | X1408FFD0 | nop | X00000000 | j 0x0004 | X08000004 | nop | X00000000 |
| IF STAGE | INSTR_IF | Curr_PC | sltu \$t0, \$t2, \$zero | XC8 | bne \$t0, \$zero, -48 | XCC | nop | XD0 | j 0x0004 | XD4 | nop | XD8 | addi \$s0, \$zero, 9 | X10 |
| CLK | (R#) | | 55 | R33, R35 | 56 | R34 | 57 | | 58 | R37 | 59 | R36 | 60 | R38 |

Table 5.2: Timing Diagram for Instruction Operation Verification (continued)

### 5.2. Verification of Hazard Detection and Handling

The test program given in Table 5.3 is downloaded to the processor to demonstrate that data hazards are resolved using the feedback paths between pipeline stages, and that the pipeline is stalled whenever a hazard cannot be resolved by these paths. Each requirement is given a number (R#) in the comment field of the code, and the clock cycle in which the requirement is fulfilled is marked next to the clock number in the CLK (R#) row of Table 5.4.
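The comments in Table 5.3 describe feedback paths into the ID stage: an instruction in ID can take a source operand from the instruction in EX, MEM or WB instead of waiting for write-back, and it must wait when the youngest producer of that operand is a load still in EX. The Python sketch below models that selection under stated assumptions; the `Producer` record, the function name and the youngest-first priority order are illustrative choices, not taken from the thesis's hardware description.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Producer:
    """Hypothetical view of the instruction currently in EX, MEM or WB."""
    dest: int               # destination register number (0 = nothing written)
    value: Optional[int]    # value on the feedback path, None if not yet produced
    is_load: bool = False   # lw data exists only after the MEM stage


def resolve_operand(reg: int, regfile: Sequence[int],
                    ex: Optional[Producer], mem: Optional[Producer],
                    wb: Optional[Producer]) -> Tuple[str, Optional[int]]:
    """Select where the instruction in ID takes source register `reg` from.

    Returns (source, value) with source in {"EX", "MEM", "WB", "regfile"},
    or ("stall", None) when the operand cannot be supplied in this cycle.
    """
    if reg == 0:                          # $zero is hard-wired, never forwarded
        return "regfile", 0
    # Youngest producer first, so the most recent write to `reg` wins.
    for name, stage in (("EX", ex), ("MEM", mem), ("WB", wb)):
        if stage is not None and stage.dest == reg:
            if name == "EX" and stage.is_load:
                return "stall", None      # load-use hazard: insert one bubble
            return name, stage.value
    return "regfile", regfile[reg]


if __name__ == "__main__":
    regs = [0] * 32
    regs[8] = 0x10                        # $t0 after the four add instructions
    # subi $t0,$t0,3 in ID while subi $t0,$t0,1 (result 0xF) is in EX
    print(resolve_operand(8, regs, Producer(8, 0xF), None, None))  # ('EX', 15)
    # add $t2,$t0,$t1 in ID while lw $t0,0($a0) is still in EX
    print(resolve_operand(8, regs, Producer(8, None, is_load=True), None, None))  # ('stall', None)
```

One cycle after a stall the load has moved to MEM, and the same function then returns the loaded value with source "MEM", which corresponds to the ID-MEM feedback path named in the last comment block of the program.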

```asm
########################################
# Created by Can Altıniğneli
# To demonstrate data hazards are con…
########################################
        beq  $zero, $zero, UNDEFINED   # UNDEFINED EXCEPTION VECTOR
        nop
OVERFLOW:
        beq  $zero, $zero, OVERFLOW    # OVERFLOW EXCEPTION VECTOR
        nop
START:
        add  $a0, $zero, $zero         # $a0 shall = 0, DestAdr:x4, R1
        addi $t1, $zero, 5             # $t1 shall = 5, DestAdr:x9, R2
        addi $t0, $zero, 1             # $t0 shall = 1, DestAdr:x8, R3
        nop
        nop
        nop
# Feedback path exists between ID and …
# result is written to destination register. The code snippet below shows
# that there is no need to wait until the WB stage of an instruction and
# architecture correctly handles up-to-c…
        add  $t0, $t0, $t0
        add  $t0, $t0, $t0
        add  $t0, $t0, $t0
        add  $t0, $t0, $t0
        nop
        nop
        nop
# Feedback path exists between ID and EX stages. Data Hazard resolved, R5
        subi $t0, $t0, 1               # $t0 = $t0 - 1, $t0 shall = xF
        subi $t0, $t0, 3               # $t0 = $t0 - 3, $t0 shall = xC
        nop
        nop
        nop
# Feedback path exists between ID and MEM stages. Data Hazard resolved, R6
        subi $t0, $t0, 1               # $t0 = $t0 - 1, $t0 shall = xB
        nop
        subi $t0, $t0, 3               # $t0 = $t0 - 3, $t0 shall = x8
        nop
        nop
        nop
# Feedback path exists between ID and …
        subi $t0, $t0, 2               # $t0 = $t0 - 2, $t0 shall = x6
        nop
        nop
        subi $t0, $t0, 4               # $t0 = $t0 - 4, $t0 shall = x2
        nop
        nop
        nop
        sw   $t0, 0($a0)               # MEM[0] shall store x2
        nop
        nop
# Although feedback path exists betwee…
# Data Hazard can not be resolved by t…
# is inserted between "add" and "lw" ins…
# by feedback path between ID and ME…
        lw   $t0, 0($a0)               # MEM[0] -> $t0 = x2, DestAdr:8
        add  $t2, $t0, $t1             # $t2 = $t0 + $t1, $t2 shall = x7, DestAdr:10, R9
Eternity:
        beq  $zero, $zero, Eternity    # Infinite Loop
        nop
```

Table 5.3: Verification of Hazard Detection and Handling
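The values that the comments of Table 5.3 say each register "shall" hold can be recomputed independently of the pipeline, which gives a reference for the Wr_D entries of Table 5.4. The Python sketch below is a hypothetical reference model, not the thesis's software: it ignores timing and the `nop` instructions, treats `subi rt, rs, imm` as `addi rt, rs, -imm` (the form in which these instructions appear in the ID row of Table 5.4), and uses the MIPS register numbers `$a0` = 4, `$t0` = 8, `$t1` = 9 and `$t2` = 10.

```python
MASK32 = 0xFFFFFFFF

# Register numbers used by the test program.
A0, T0, T1, T2 = 4, 8, 9, 10

# Straight-line part of the program from START to the infinite loop,
# with the nops omitted because they do not change architectural state.
PROGRAM = [
    ("add", A0, 0, 0),
    ("addi", T1, 0, 5),
    ("addi", T0, 0, 1),
    ("add", T0, T0, T0),
    ("add", T0, T0, T0),
    ("add", T0, T0, T0),
    ("add", T0, T0, T0),
    ("addi", T0, T0, -1),   # subi $t0, $t0, 1
    ("addi", T0, T0, -3),   # subi $t0, $t0, 3
    ("addi", T0, T0, -1),
    ("addi", T0, T0, -3),
    ("addi", T0, T0, -2),
    ("addi", T0, T0, -4),
    ("sw", T0, A0, 0),      # MEM[$a0 + 0] <- $t0
    ("lw", T0, A0, 0),      # $t0 <- MEM[$a0 + 0]
    ("add", T2, T0, T1),
]


def run(program):
    """Execute the program; return (register file, memory, trace of $t0 writes)."""
    regs = [0] * 32
    mem = {}
    trace = []
    for op, a, b, c in program:
        if op == "add":          # a = rd, b = rs, c = rt
            regs[a] = (regs[b] + regs[c]) & MASK32
        elif op == "addi":       # a = rt, b = rs, c = immediate
            regs[a] = (regs[b] + c) & MASK32
        elif op == "sw":         # a = rt (data), b = base, c = offset
            mem[regs[b] + c] = regs[a]
        elif op == "lw":         # a = rt (destination), b = base, c = offset
            regs[a] = mem[regs[b] + c]
        regs[0] = 0              # $zero is hard-wired
        if a == T0 and op != "sw":
            trace.append(regs[T0])
    return regs, mem, trace


if __name__ == "__main__":
    regs, mem, trace = run(PROGRAM)
    print([hex(v) for v in trace])
    # ['0x1', '0x2', '0x4', '0x8', '0x10', '0xf', '0xc', '0xb', '0x8', '0x6', '0x2', '0x2']
    print(mem, hex(regs[T2]))    # {0: 2} 0x7
```

The trace matches the sequence X2, X4, X8, X10, XF, XC, XB, X8, X6, X2 written back in Table 5.4, and the final values (MEM[0] = x2, `$t2` = x7) are the ones stated in the comments of the program.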

The results of the operations and the contents of the pipeline stages are read with the MIPS Monitor software and tabulated in Table 5.4.
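The cycles marked with requirement numbers follow from the pipeline timing alone. The Python sketch below is a hypothetical timing model, not the thesis's monitor software: it assumes that `add $a0, $zero, $zero` is in ID in cycle 1 (as in Table 5.4), that every instruction then spends one cycle in each of EX, MEM and WB, and that an instruction reading the destination of the load directly ahead of it waits exactly one extra cycle in ID. For each instruction it reports the last cycle spent in ID and the write-back cycle.

```python
from typing import List, Tuple

# (label, destination register, source registers, is_load); register 0 = $zero
Instr = Tuple[str, int, Tuple[int, ...], bool]
NOP: Instr = ("nop", 0, (), False)
A0, T0, T1, T2 = 4, 8, 9, 10


def schedule(seq: List[Instr], first_id_cycle: int = 1) -> List[Tuple[str, int, int]]:
    """Return (label, last cycle spent in ID, WB cycle) for each instruction."""
    result = []
    cycle = first_id_cycle
    prev = None
    for instr in seq:
        label, _, srcs, _ = instr
        # prev[1] = destination, prev[3] = is_load
        if prev is not None and prev[3] and prev[1] != 0 and prev[1] in srcs:
            cycle += 1                    # load-use hazard: wait one cycle in ID
        result.append((label, cycle, cycle + 3))
        prev = instr
        cycle += 1
    return result


def subi(imm: int) -> Instr:
    """Hypothetical helper for the subi $t0, $t0, imm instructions of Table 5.3."""
    return (f"subi $t0, $t0, {imm}", T0, (T0,), False)


PROGRAM: List[Instr] = (
    [("add $a0, $zero, $zero", A0, (0, 0), False),
     ("addi $t1, $zero, 5", T1, (0,), False),
     ("addi $t0, $zero, 1", T0, (0,), False)] + [NOP] * 3
    + [("add $t0, $t0, $t0", T0, (T0, T0), False)] * 4 + [NOP] * 3
    + [subi(1), subi(3)] + [NOP] * 3
    + [subi(1), NOP, subi(3)] + [NOP] * 3
    + [subi(2)] + [NOP] * 2
    + [subi(4)] + [NOP] * 3
    + [("sw $t0, 0($a0)", 0, (T0, A0), False)] + [NOP] * 2
    + [("lw $t0, 0($a0)", T0, (A0,), True), ("add $t2, $t0, $t1", T2, (T0, T1), False)]
)

if __name__ == "__main__":
    for label, id_cycle, wb_cycle in schedule(PROGRAM):
        if label != "nop":
            print(f"{label:24s} ID {id_cycle:2d}  WB {wb_cycle:2d}")
```

Under these assumptions R1 to R3 fall in cycles 4, 5 and 6, and R4 (the last `add $t0, $t0, $t0` writing X10) in cycle 13, while R5 and R6 are the cycles (15 and 21) in which the dependent `subi $t0, $t0, 3` is in ID with its producer in EX and in MEM respectively; these agree with the CLK (R#) row of Table 5.4. The write-back cycle of `add $t2, $t0, $t1` (40) is a prediction that depends on the one-bubble assumption.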

| WB STAGE | INSTR_WB | Reg_D, Wr_D | - | | - | | - | | add \$a0, \$zero, \$zero | X4, X0 | addi \$t1, \$zero, 5 | X9, X5 | addi \$t0, \$zero, 1 | X8, X1 | nop | X0, X0 | nop | X0, X0 | nop | X0, X0 | add \$t0, \$t0, \$t0 | X8, X2 | add \$t0, \$t0, \$t0 | X8, X4 | add \$t0, \$t0, \$t0 | X8, X8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MEM STAGE | INSTR_MEM | Wr_D, Rd_D | - | - | - | - | add \$a0, \$zero, \$zero | | addi \$t1, \$zero, 5 | X5, X5 | addi \$t0, \$zero, 1 | X1, X1 | nop | | nop | | nop | | add \$t0, \$t0, \$t0 | X1, X2 | add \$t0, \$t0, \$t0 | X2, X4 | add \$t0, \$t0, \$t0 | X4, X8 | add \$t0, \$t0, \$t0 | |
| EX / MEM STAGE | | Res, Ovr, Addr | | -, -, - | | -, 0, X0 | | X5, 0, X0 | | X1, 0, X5 | | X0, 0, X1 | | | | | | X2, 0, X0 | | X4, 0, X2 | | X8, 0, X4 | | X10, 0, X8 | | X0, 0 |
| EX STAGE | INSTR_EX | ALU_A, ALU_B | - | | add \$a0, \$zero, \$zero | X0, X0 | addi \$t1, \$zero, 5 | X0, X5 | addi \$t0, \$zero, 1 | X0, X1 | nop | X0, X0 | nop | X0, X0 | nop | X0, X0 | add \$t0, \$t0, \$t0 | X1, X1 | add \$t0, \$t0, \$t0 | X2, X2 | add \$t0, \$t0, \$t0 | X4, X4 | add \$t0, \$t0, \$t0 | X8, X8 | nop | X0 |
| ID STAGE | INSTR_ID | Instr, Incr_PC | add \$a0, \$zero, \$zero | X00002020, X14 | addi \$t1, \$zero, 5 | X20090005, X18 | addi \$t0, \$zero, 1 | X20080001, X1C | nop | X00000000, X20 | nop | X00000000, X24 | nop | X00000000, X28 | add \$t0, \$t0, \$t0 | X01084020, X2C | add \$t0, \$t0, \$t0 | X01084020, X30 | add \$t0, \$t0, \$t0 | X01084020, X34 | add \$t0, \$t0, \$t0 | X01084020, X38 | nop | X00000000, X3C | nop | X00000000, X40 |
| IF STAGE | INSTR_IF | Curr_PC | addi \$t1, \$zero, 5 | X14 | addi \$t0, \$zero, 1 | X18 | nop | X1C | nop | X20 | nop | X24 | add \$t0, \$t0, \$t0 | X28 | add \$t0, \$t0, \$t0 | X2C | add \$t0, \$t0, \$t0 | X30 | add \$t0, \$t0, \$t0 | X34 | nop | X38 | nop | X3C | nop | X40 |
| CLK | (R#) | | 1 | | 2 | | 3 | | 4 | R1 | 5 | R2 | 6 | R3 | 7 | | 8 | | 9 | | 10 | | 11 | | 12 | |

Table 5.4: Timing Diagram for Handling Hazard Verification

| WB STAGE | INSTR_WB | Wr_D | add \$t0, \$t0, \$t0 | X10 | nop | X0 | nop | X0 | nop | X0 | addi \$t0, \$t0, -1 | XF | addi \$t0, \$t0, -3 | XC | nop | X0 | nop | | nop | X0 | addi \$t0, \$t0, -1 | XB | nop | X0 | addi \$t0, \$t0, -3 | X8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WB STAGE | | Reg_D | | X8 | | | | X0 | | X0 | | X8 | | X8 | | X0 | | | | | | X8 | | X0 | | X8 |
| MEM STAGE | | Rd_D | | X0 | | X0 | | X0 | | XF | | XC | | X0 | | X0 | | X0 | | XB | | X0 | | X8 | | X0 |
| MEM STAGE | INSTR_MEM | Wr_D | nop | | nop | | nop | X0 | addi \$t0, \$t0, -1 | XFFFFFFFF | addi \$t0, \$t0, -3 | XFFFFFFFD | nop | X0 | nop | X0 | nop | | addi \$t0, \$t0, -1 | XFFFFFFFF | nop | | addi \$t0, \$t0, -3 | XFFFFFFFD | nop | X0 |
| MEM STAGE | | Addr | | X0 | | X0 | | X0 | | XF | | XC | | X0 | | X0 | | X0 | | XB | | X0 | | X8 | | X0 |
| EX STAGE | | Ovr | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 |
| EX STAGE | | Res | | X0 | | | | XF | | XC | | X0 | | X0 | | X0 | | XB | | X0 | | X8 | | X0 | | X0 |
| EX STAGE | INSTR_EX | ALU_B | nop | X0 | nop | | addi \$t0, \$t0, -1 | XFFFFFFFF | addi \$t0, \$t0, -3 | XFFFFFFFD | nop | | nop | X0 | nop | X0 | addi \$t0, \$t0, -1 | XFFFFFFFF | nop | X0 | addi \$t0, \$t0, -3 | XFFFFFFFD | nop | X0 | nop | X0 |
| EX STAGE | | ALU_A | | X0 | | X0 | | X10 | | XF | | X0 | | X0 | | X0 | | XC | | X0 | | XB | | X0 | | X0 |
| ID STAGE | | Incr_PC | | X44 | | X48 | | X4C | | X50 | | X54 | | X58 | | X5C | | X60 | | X64 | | X68 | | X6C | | X70 |
| ID STAGE | INSTR_ID | Instr | nop | X00000000 | addi \$t0, \$t0, -1 | X2108FFFF | addi \$t0, \$t0, -3 | X2108FFFD | nop | X00000000 | nop | X00000000 | nop | X00000000 | addi \$t0, \$t0, -1 | X2108FFFF | nop | X00000000 | addi \$t0, \$t0, -3 | X2108FFFD | nop | X00000000 | nop | X00000000 | nop | X00000000 |
| IF STAGE | INSTR_IF | Curr_PC | addi \$t0, \$t0, -1 | X44 | addi \$t0, \$t0, -3 | X48 | nop | X4C | nop | X50 | nop | X54 | addi \$t0, \$t0, -1 | X58 | nop | X5C | addi \$t0, \$t0, -3 | X60 | nop | X64 | nop | X68 | nop | X6C | addi \$t0, \$t0, -2 | X70 |
| CLK | (R#) | | 13 | R4 | 14 | | 15 | R5 | 16 | | 17 | | 18 | | 19 | | 20 | | 21 | R6 | 22 | | 23 | | 24 | |

Table 5.4: Timing Diagram for Handling Hazard Verification (continued)

| WB STAGE | INSTR_WB | Wr_D | nop | X0 | nop | X0 | nop | X0 | addi \$t0, \$t0, -2 | X6 | nop | | nop | X0 | addi \$t0, \$t0, -4 | X2 | nop | X0 | nop | X0 | nop | X0 | sw \$t0, 0(\$a0) | X0 | nop | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| WB STAGE | | Reg_D | | X0 | | X0 | | X0 | | X8 | | X0 | | X0 | | X8 | | X0 | | X0 | | | | X0 | | X0 |
| MEM STAGE | | Rd_D | | X0 | | | | X6 | | | | X0 | | X2 | | X0 | | X0 | | X0 | | | | X0 | | |
| MEM STAGE | INSTR_MEM | Wr_D | nop | X0 | nop | X0 | addi \$t0, \$t0, -2 | XFFFFFFFE | nop | X0 | nop | X0 | addi \$t0, \$t0, -4 | XFFFFFFFC | nop | X0 | nop | X0 | nop | X0 | sw \$t0, 0(\$a0) | X2 | nop | X0 | nop | X0 |
| MEM STAGE | | Addr | | X0 | | X0 | | X6 | | | | X0 | | X2 | | X0 | | X0 | | X0 | | | | X0 | | X0 |
| EX STAGE | | Ovr | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 | | 0 |
| EX STAGE | | Res | | X0 | | X6 | | X0 | | | | X2 | | X0 | | X0 | | X0 | | X0 | | | | X0 | | X0 |
| EX STAGE | INSTR_EX | ALU_B | nop | X0 | addi \$t0, \$t0, -2 | XFFFFFFFE | nop | X0 | nop | X0 | addi \$t0, \$t0, -4 | XFFFFFFFC | nop | X0 | nop | X0 | nop | X0 | sw \$t0, 0(\$a0) | X2 | nop | X0 | nop | X0 | lw \$t0, 0(\$a0) | X0 |
| EX STAGE | | ALU_A | | X0 | | X8 | | X0 | | X0 | | X6 | | X0 | | X0 | | X0 | | X0 | | X0 | | X0 | | X0 |
| ID STAGE | | Incr_PC | | X74 | | X78 | | X7C | | X80 | | X84 | | X88 | | X8C | | X90 | | X94 | | X98 | | X9C | | XA0 |
| ID STAGE | INSTR_ID | Instr | addi \$t0, \$t0, -2 | X2108FFFE | nop | X00000000 | nop | X00000000 | addi \$t0, \$t0, -4 | X2108FFFC | nop | X00000000 | nop | X00000000 | nop | X00000000 | sw \$t0, 0(\$a0) | XAC880000 | nop | X00000000 | nop | X00000000 | lw \$t0, 0(\$a0) | X8C880000 | add \$t2, \$t0, \$t1 | X01095020 |
| IF STAGE | INSTR_IF | Curr_PC | nop | X74 | nop | X78 | addi \$t0, \$t0, -4 | X7C | nop | X80 | nop | X84 | nop | X88 | sw \$t0, 0(\$a0) | X8C | nop | X90 | nop | X94 | lw \$t0, 0(\$a0) | X98 | add \$t2, \$t0, \$t1 | X9C | beq \$zero, \$zero, -1 | XA0 |
| СLК       | (R#)      |         | 25                  | I         | 26                  | 1         | 27                  | I        | 28                  | R7        | 29                  | I        | 90                 | 1        | 31                  | I         | 32               | I         | ŝ                | I         | 34               | 1         | 35                   | I         | 36                     | 1         |

Table 5.4: Timing Diagram for Handling Hazard Verification (continued)

| WB_STAGE  | INSTR_WB  | Wr_D    | nop | X0 | lw \$t0, 0(\$a0) | X2 | nop | X0 | add \$t2, \$t0, \$t1 | X7 |
|---|---|---|---|---|---|---|---|---|---|---|
|           |           | Reg_D   |  | X0 |  | X8 |  | X0 |  | XA |
|           |           | Rd_D    |  | X2 |  | X0 |  | ۲X |  | X0 |
| MEM STAGE | INSTR_MEM | Wr_D    | lw \$t0, 0(\$a0) | X0 | nop | X0 | add \$t2, \$t0, \$t1 | X5 | beq \$zero, \$zero, -1 | X0 |
|           |           | Addr    |  | X0 |  | X0 |  | X7 |  | X0 |
|           |           | Ovr     |  | 0 |  | 0 |  | 0 |  | 0 |
|           |           | Res     |  | X0 |  | X7 |  | × |  | X0 |
| EX STAGE  | INSTR_EX  | ALU_B   | nop | X0 | add \$t2, \$t0, \$t1 | X5 | beq \$zero, \$zero, -1 | X0 | nop | X0 |
|           |           | ALU_A   |  | X0 |  | X2 |  | X0 |  | X0 |
|           |           | Incr_PC |  | XA0 |  | XA4 |  | XA8 |  | XA4 |
| ID STAGE  | INSTR_ID  | Instr   | add \$t2, \$t0, \$t1 | X01095020 | beq \$zero, \$zero, -1 | X1000ffff | nop | X00000000 | beq \$zero, \$zero, -1 | X1000ffff |
| IF STAGE  | INSTR_IF  | Curr_PC |  | XA0 | nop | XA4 | beq \$zero, \$zero, -1 | XA0 | nop | XA4 |
| CLK       | (R#)      |         | 37 | 82 | 38 | 1 | 39 | 1 | 40 | 62 |

Table 5.4: Timing Diagram for Handling Hazard Verification (continued)

### 5.3. Verification of Exception Handling

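First, the test program given in Table 5.5 is downloaded to the processor to demonstrate that the "ADDU" and "ADD" instructions handle overflow as defined in APPENDIX A, Implemented Subset of MIPS R2000 ISA: an overflowing "ADDU" must not generate an exception, whereas an overflowing "ADD" must generate an overflow exception.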

A requirement number (R#) is given in the comment section of the code, and the clock cycle in which each requirement is fulfilled is marked in the first column of Table 5.6.
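The overflow condition that separates "ADD" from "ADDU" can be sketched as a small behavioural model in Python. This is an illustration under our own assumptions, not the processor's hardware description, and the function names `add32` and `execute_add` are hypothetical. In two's complement arithmetic, an addition overflows when both operands have the same sign and the 32-bit sum has the opposite sign. "ADD" turns that condition into an exception, while "ADDU" ignores it. With the operands of TEST 3, 0x8000_0000 + 0x8000_0000 wraps to zero and sets the overflow condition.

```python
MASK32 = 0xFFFF_FFFF
SIGN_BIT = 0x8000_0000


def add32(a, b):
    """32-bit two's-complement addition.

    Returns (result wrapped to 32 bits, signed-overflow flag)."""
    result = (a + b) & MASK32
    # Overflow iff both operands share a sign bit and the result's sign bit differs.
    overflow = bool(~(a ^ b) & (a ^ result) & SIGN_BIT)
    return result, overflow


def execute_add(a, b, trap_on_overflow):
    """Hypothetical model: ADD when trap_on_overflow is True, ADDU when False.

    Returns (ALU result, exception raised)."""
    result, overflow = add32(a, b)
    return result, overflow and trap_on_overflow


t0 = 0x8000_0000                                    # lui  $t0, 0x8000
print(execute_add(t0, t0, trap_on_overflow=False))  # addu $t1, $t0, $t0
print(execute_add(t0, t0, trap_on_overflow=True))   # add  with the same operands
```

The sketch reports the exception separately from the ALU result because, in the MIPS R2000 definition, a trapping "ADD" leaves its destination register unmodified.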

Table 5.5: Verification of Exception Handling "ADDU" and "ADD"

```asm
##############################################################################
#
# TEST 3
#
# Created by Can Altıniğneli
# To demonstrate ADDU and ADD instructions generate overflow exceptions according to APPENDIX A.
##############################################################################
UNDEFINED:
        beq     $zero, $zero, UNDEFINED     # UNDEFINED EXCEPTION VECTOR
        nop

OVERFLOW:
        beq     $zero, $zero, OVERFLOW      # OVERFLOW EXCEPTION VECTOR
        nop

START:
        add     $t0, $zero, $zero           # $t0 shall = 0
        lui     $t0, 0x8000                 # $t0 shall = x8000 0000, DestAdr:8, R1
        addu    $t1, $t0, $t0               # $t1 shall = x0000 0000, DestAdr:9, No Exception shall be generated, R2
        add     $t0, $t0, $t0               # $t0 shall = x0000 0000, DestAdr:8, Exception shall be generated,
                                            # and pipeline register blocks IF_ID, ID_EX and EX_MEM are flushed, R3
Eternity:
        beq     $zero, $zero, Eternity
        nop
```

The results of the operations and the contents of the pipeline stages are read with the MIPS Monitor software and tabulated in Table 5.6.
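The Instr row of the ID stage lists the machine words fetched from instruction memory. As a cross-check, the Python sketch below assembles the TEST 3 instructions from the standard MIPS R-type and I-type field layouts. The helper names and the small register, opcode and function-code tables are hypothetical and cover only the instructions used here. The fourth word is encoded as `add $t1, $t0, $t0`, which is the form in which it appears in the ID stage of Table 5.6.

```python
# Hypothetical helper tables covering only the registers and instructions of TEST 3.
REG = {"$zero": 0, "$t0": 8, "$t1": 9}
FUNCT = {"add": 0x20, "addu": 0x21}   # R-type function codes (opcode field = 0)
OPCODE = {"beq": 0x04, "lui": 0x0F}   # I-type opcodes


def encode_r(op, rd, rs, rt):
    """R-type word: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5)=0 | funct(6)."""
    return (REG[rs] << 21) | (REG[rt] << 16) | (REG[rd] << 11) | FUNCT[op]


def encode_i(op, rs, rt, imm):
    """I-type word: opcode(6) | rs(5) | rt(5) | 16-bit immediate (two's complement)."""
    return (OPCODE[op] << 26) | (REG[rs] << 21) | (REG[rt] << 16) | (imm & 0xFFFF)


words = [
    encode_r("add", rd="$t0", rs="$zero", rt="$zero"),   # add  $t0, $zero, $zero
    encode_i("lui", rs="$zero", rt="$t0", imm=0x8000),   # lui  $t0, 0x8000
    encode_r("addu", rd="$t1", rs="$t0", rt="$t0"),      # addu $t1, $t0, $t0
    encode_r("add", rd="$t1", rs="$t0", rt="$t0"),       # add  $t1, $t0, $t0
    encode_i("beq", rs="$zero", rt="$zero", imm=-1),     # beq  $zero, $zero, -1
]
for word in words:
    print(f"X{word:08X}")
```

The resulting words are X00004020, X3C088000, X01084821, X01084820 and X1000FFFF. The branch offset of -1 makes the target PC + 4 - 4, the branch itself, so the exception vectors and the Eternity loop spin in place.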

Table 5.6: Timing Diagram for Exception Handling of ADDU and ADD

| WB_STAGE | INSTR_WB | Wr_D |  |  |  |  |  |  |  |  | add \$t0, \$zero, \$zero |  | X0 | lui \$t0, 32768 | X8000000 |  | addu \$t1, \$t0, \$t0 |  | X0 | add \$t1, \$t0, \$t0 | X0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  | Reg_D |  |  |  |  |  |  |  |  |  |  | X8 |  | X8 |  |  |  | £ |  | X0 |
|  |  | Rd_D |  |  |  |  |  |  |  | X0 |  |  | X8000000 |  | X0 |  |  |  | X0 |  | X0 |
| MEM STAGE | INSTR_MEM | Wr_D |  |  |  |  |  | add \$t0, \$zero, \$zero |  | X0 | lui \$t0, 32768 |  | X8000 | addu \$t1, \$t0, \$t0 | X8000000 |  | add \$t1, \$t0, \$t0 |  | X0 | beq \$zero, \$zero, -1 | X0 |
|  |  | Addr |  |  |  |  |  |  |  | X0 |  |  | X8000000 |  | X0 |  |  |  | X0 |  | X0 |
|  |  | Ovr |  |  |  |  | - |  |  | 0 |  |  | 0 |  | - |  |  |  | - |  |  |
|  |  | Res |  |  |  |  | X0 |  |  | X8000000 |  |  | X0 |  | X0 |  |  |  | X0 |  | 2 |
| EX STAGE | INSTR_EX | ALU_B |  |  | add \$t0, \$zero, \$zero |  | X0 | lui \$t0, 32768 |  | X8000 | addu \$t1, \$t0, \$t0 |  | X8000000 | add \$t1, \$t0, \$t0 | X8000000 |  | beq \$zero, \$zero, -1 |  | X0 | nop | X0 |
|  |  | ALU_A |  |  |  |  | X0 |  |  | X0 |  |  | X8000000 |  | X8000000 |  |  |  | X0 |  | X0 |
|  |  | Incr_PC |  | X14 |  |  | X18 |  |  | X1C |  |  | X20 |  | X24 |  |  |  | X0 |  | XC |
| ID STAGE | INSTR_ID | Instr | add \$t0, \$zero, \$zero | X00004020 | lui \$t0, 32768 |  | X3C088000 | addu \$t1, \$t0, \$t0 |  | X01084821 | add \$t1, \$t0, \$t0 |  | X01084820 | beq \$zero, \$zero, -1 | X1000ffff |  | nop |  | X00000000 | beq \$zero, \$zero, -1 | X1000ffff |
| IF STAGE | INSTR_IF | Curr_PC | lui \$t0, 32768 | X14 | addu \$t1, \$t0, \$t0 |  | X18 | add \$t1, \$t0, \$t0 |  | X1C | beq \$zero, \$zero, -1 |  | X20 | nop | X24 |  | beq \$zero, \$zero, -1 |  | X8 | nop | XC |
| CLK | (R#) |  | 1 | 1 | 2 |  | L | 3 |  | 1 | 4 | 22 | 1 | 5 | 2 | 82 | 6 | 22 | 8 | 7 | 1 |

After the exception handling mechanism has been verified for the "ADDU" and "ADD" instructions, the test program given in Table 5.7 is downloaded to the processor to demonstrate that the "SUBU" and "SUB" instructions handle overflow as defined in APPENDIX A, Implemented Subset of MIPS R2000 ISA: "SUBU" must not generate an exception, whereas "SUB" must generate an overflow exception.

A requirement number (R#) is given in the comment section of the code, and the clock cycle in which each requirement is fulfilled is marked in the first column of Table 5.8.
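The subtraction case follows the same pattern. In the hypothetical Python model below, `sub32` and `execute_sub` are illustrative names and not part of the thesis. The difference a - b overflows when the operands have different signs and the result's sign differs from the sign of a. For the TEST_4 operands, 0x8000_0000 - 1 wraps to 0x7FFF_FFFF. The wrapped value is the same for both instructions: "SUBU" keeps it without an exception, while "SUB" signals overflow.

```python
MASK32 = 0xFFFF_FFFF
SIGN_BIT = 0x8000_0000


def sub32(a, b):
    """32-bit two's-complement subtraction a - b.

    Returns (result wrapped to 32 bits, signed-overflow flag)."""
    result = (a - b) & MASK32
    # Overflow iff the operands have different sign bits and the result's
    # sign bit differs from the minuend's.
    overflow = bool((a ^ b) & (a ^ result) & SIGN_BIT)
    return result, overflow


def execute_sub(a, b, trap_on_overflow):
    """Hypothetical model: SUB when trap_on_overflow is True, SUBU when False.

    Returns (ALU result, exception raised)."""
    result, overflow = sub32(a, b)
    return result, overflow and trap_on_overflow


t0, t1 = 0x8000_0000, 1                             # lui $t0, 0x8000; addi $t1, $zero, 1
print(execute_sub(t0, t1, trap_on_overflow=False))  # subu $t2, $t0, $t1
print(execute_sub(t0, t1, trap_on_overflow=True))   # sub  $t2, $t0, $t1
```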

Table 5.7: Verification of Exception Handling "SUBU" and "SUB"

```asm
##############################################################################
#
# TEST_4
#
# Created by Can Altıniğneli
# To demonstrate SUBU and SUB instructions generate overflow exceptions according to APPENDIX A.
##############################################################################
UNDEFINED:
        beq     $zero, $zero, UNDEFINED     # UNDEFINED EXCEPTION VECTOR
        nop

OVERFLOW:
        beq     $zero, $zero, OVERFLOW      # OVERFLOW EXCEPTION VECTOR
        nop

START:
        add     $t0, $zero, $zero           # $t0 shall = 0
        lui     $t0, 0x8000                 # $t0 shall = x8000_0000, DestAdr:8
        addi    $t1, $zero, 1               # $t1 shall = 1, DestAdr:9
        subu    $t2, $t0, $t1               # No Exception shall be generated, R1
        sub     $t2, $t0, $t1               # Exception shall be generated, R2
Eternity:
        beq     $zero, $zero, Eternity
        nop
```

The results of the operations and the contents of the pipeline stages are read with the MIPS Monitor software and tabulated in Table 5.8.

| WB_STAGE | INSTR_WB | Wr_D |  |  |  |  |  |  | add \$t0, \$zero, \$zero | X0 | lui \$t0, 32768 | X8000000 | addi \$t1, \$zero, 1 | X1 | subu \$t2, \$t0, \$t1 | X7ffffff | sub \$t2, \$t0, \$t1 | X0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  | Reg_D |  |  |  |  |  |  |  |  |  | X8 |  | X9 |  | XA |  | X0 |
|  |  | Rd_D |  |  |  |  |  | X0 |  | X8000000 |  | X |  | X7ffffff |  | X0 |  | X0 |
| MEM STAGE | INSTR_MEM | Wr_D |  |  |  |  | add \$t0, \$zero, \$zero | X0 | lui \$t0, 32768 | X8000 | addi \$t1, \$zero, 1 | × | subu \$t2, \$t0, \$t1 | X | sub \$t2, \$t0, \$t1 | X0 | beq \$zero, \$zero, -1 | X0 |
|  |  | Addr |  |  |  |  |  | × |  | X8000000 |  | X1 |  | X7ffffff |  | × |  | X0 |
|  |  | Ovr |  |  |  | 0 |  | 0 |  | 0 |  | 0 |  | - |  | 0 |  | 0 |
|  |  | Res |  |  |  | X0 |  | X8000000 |  | X |  | X7ffffff |  | X7ffffff |  | X0 |  | X0 |
| EX STAGE | INSTR_EX | ALU_B |  |  | add \$t0, \$zero, \$zero | X0 | lui \$t0, 32768 | X8000 | addi \$t1, \$zero, 1 | X1 | subu \$t2, \$t0, \$t1 | X1 | sub \$t2, \$t0, \$t1 | X | beq \$zero, \$zero, -1 | X0 | nop | X0 |
|  |  | ALU_A |  |  |  | X0 |  | X0 |  | X0 |  | X8000000 |  | X8000000 |  | X0 |  | X0 |
|  |  | Incr_PC |  | X14 |  | X18 |  | X1C |  | X20 |  | X24 |  | X28 |  | X0 |  | XC |
| ID STAGE | INSTR_ID | Instr | add \$t0, \$zero, \$zero | X00004020 | lui \$t0, 32768 | X3C088000 | addi \$t1, \$zero, 1 | X20090001 | subu \$t2, \$t0, \$t1 | X01095023 | sub \$t2, \$t0, \$t1 | X01095022 | beq \$zero, \$zero, -1 | X1000ffff | nop | X00000000 | beq \$zero, \$zero, -1 | X1000ffff |
| IF STAGE | INSTR_IF | Curr_PC | lui \$t0, 32768 | X14 | addi \$t1, \$zero, 1 | X18 | subu \$t2, \$t0, \$t1 | X1C | sub \$t2, \$t0, \$t1 | X20 | beq \$zero, \$zero, -1 | X24 | nop | X28 | beq \$zero, \$zero, -1 | X8 | nop | XC |
| CLK | (R#) |  | 1 | • | 2 |  | 3 | 1 | 4 |  | 5 | Έ | 6 | 22 | 7 | 8 | 8 | - |

Table 5.8: Timing Diagram for Exception Handling of SUBU and SUB

Lastly, to verify the exception handling mechanism for the "ADDIU" and "ADDI" instructions, the test program given in Table 5.9 is downloaded to the processor. A requirement number (R#) is given in the comment section of the code, and the clock cycle in which each requirement is fulfilled is marked in the first column of Table 5.10.
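For the immediate forms, the 16-bit immediate is sign-extended to 32 bits before the addition, for "ADDIU" as well as for "ADDI". The hypothetical Python helpers below (`sign_extend16` and `execute_addi`, illustrative names only) model just that arithmetic. They show three things:

- `addiu $t0, $t0, 0xFFFF` yields 0xFFFF_FFFF rather than 0x0000_FFFF.
- The following `addiu $t0, $t0, 1` wraps to zero with a carry out but no signed overflow.
- `addi $t0, $t0, -1` applied to 0x8000_0000 overflows to 0x7FFF_FFFF.

```python
MASK32 = 0xFFFF_FFFF
SIGN_BIT = 0x8000_0000


def sign_extend16(imm):
    """Sign-extend a 16-bit immediate field to 32 bits (done for ADDI and ADDIU alike)."""
    imm &= 0xFFFF
    return (imm | 0xFFFF_0000) if imm & 0x8000 else imm


def execute_addi(rs, imm16, trap_on_overflow):
    """Hypothetical model: ADDI when trap_on_overflow is True, ADDIU when False.

    Returns (ALU result, exception raised)."""
    b = sign_extend16(imm16)
    result = (rs + b) & MASK32
    # Same signed-overflow test as for a register-register add.
    overflow = bool(~(rs ^ b) & (rs ^ result) & SIGN_BIT)
    return result, overflow and trap_on_overflow


t0 = 0                                       # add   $t0, $zero, $zero
t0, _ = execute_addi(t0, 0xFFFF, False)      # addiu $t0, $t0, 0xFFFF -> 0xFFFF_FFFF
t0, exc = execute_addi(t0, 1, False)         # addiu $t0, $t0, 1 -> 0, no exception (R1)
print(hex(t0), exc)
print(execute_addi(0x8000_0000, -1, True))   # lui $t0, 0x8000; addi $t0, $t0, -1 (R2)
```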

Table 5.9: Verification of Exception Handling "ADDIU" and "ADDI"

```asm
##############################################################################
#
# TEST_5
#
# Created by Can Altıniğneli
# To demonstrate ADDIU and ADDI instructions generate overflow exceptions according to APPENDIX A.
##############################################################################
UNDEFINED:
        beq     $zero, $zero, UNDEFINED     # UNDEFINED EXCEPTION VECTOR
        nop

OVERFLOW:
        beq     $zero, $zero, OVERFLOW      # OVERFLOW EXCEPTION VECTOR
        nop
START:
        add     $t0, $zero, $zero           # $t0 shall = 0
        addiu   $t0, $t0, 0xFFFF            # $t0 shall = xFFFF_FFFF, DestAdr:8
        addiu   $t0, $t0, 1                 # No Exception shall be generated, R1
        lui     $t0, 0x8000                 # $t0 shall = x8000_0000, DestAdr:8
        addi    $t0, $t0, -1                # Exception shall be generated, R2
Eternity:
        beq     $zero, $zero, Eternity
        nop
```

The results of the operations and the contents of the pipeline stages are read with the MIPS Monitor software and tabulated in Table 5.10.

| WB_STAGE | INSTR_WB | Wr_D |  |  |  |  |  |  | add \$t0, \$zero, \$zero | X0 | addiu \$t0, \$t0, 65535 | Xfffffff | addiu \$t0, \$t0, 1 | X0 | lui \$t0, 32768 | X8000000 | addi \$t0, \$t0, -1 | X0 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  | Reg_D |  |  |  |  |  |  |  | ® |  | ® |  | œ |  | ® |  | X0 |
|  |  | Rd_D |  |  |  |  |  | X0 |  | Xfffffff |  | X0 |  | X80000000 |  | X0 |  | X0 |
| MEM STAGE | INSTR_MEM | Wr_D |  |  |  |  | add \$t0, \$zero, \$zero | X0 | addiu \$t0, \$t0, 65535 | Xfffffff | addiu \$t0, \$t0, 1 | X1 | lui \$t0, 32768 | X8000 | addi \$t0, \$t0, -1 | x | beq \$zero, \$zero, -1 | X0 |
|  |  | Addr |  |  |  |  |  | X0 |  | Xfffffff |  | X0 |  | X8000000 |  | X0 |  | X0 |
|  |  | Ovr |  |  |  | 0 |  | 0 |  | 0 |  | 0 |  | - |  | 0 |  | 0 |
|  |  | Res |  |  |  | X0 |  | Xfffffff |  | X0 |  | X8000000 |  | X7ffffff |  | X0 |  | X0 |
| EX STAGE | INSTR_EX | ALU_B |  |  | add \$t0, \$zero, \$zero | X | addiu \$t0, \$t0, 65535 | Xfffffff |  | X1 |  | X8000 |  | Xfffffff |  | X0 |  | X0 |
|  |  | ALU_A |  |  |  | X0 |  | X0 |  | Xfffffff |  | X0 |  | X8000000 |  | X0 |  | X0 |
|  |  | Incr_PC |  | X14 |  | X18 |  | X1C |  | X20 |  | X24 |  | X28 |  | ę |  | XC |
| ID STAGE | INSTR_ID | Instr | add \$t0, \$zero, \$zero | X00004020 | addiu \$t0, \$t0, 65535 | X2508ffff | addiu \$t0, \$t0, 1 | X25080001 | lui \$t0, 32768 | X3C088000 | addi \$t0, \$t0, -1 | X2108ffff | beq \$zero, \$zero, -1 | X1000ffff | nop | X00000000 | beq \$zero, \$zero, -1 | X1000ffff |
| IF STAGE            | INSTR_IF             | Curr_PC                    | addiu \$t0, \$t0, 65535  | X14         | addiu \$t0, \$t0, 1      | X18       | lui \$t0, 32768          | X1C       | addi \$t0, \$t0, -1      | X20       | beq \$zero, \$zero, -1  | X24       | dou                    | X28       | beq \$zero, \$zero, -1      | 8X       | dou                    | XC        |
| CLK                 | (R#)                 |                            | ÷                        |             | 2                        |           | m                        |           | 4                        | Σ.        | ഹ                       |           | ى                      |           | 2                           |          | œ                      |           |

Table 5.10: Timing Diagram for Exception Handling of ADDIU and ADDI

To verify undefined-instruction exception handling, the machine code of the program given in Table 5.9 is modified as shown in Table 5.11 so that it contains an undefined instruction. The processor raises an undefined instruction exception while the modified instruction is in the EX stage, as can be observed in Table 5.12.

| Address    | Machine code                       | Assembly                   |
|------------|------------------------------------|----------------------------|
| [0x000000] | 0x1000FFFF                         | # beq \$zero, \$zero, -1   |
| [0x000004] | 0x00000000                         | # nop                      |
| [0x000008] | 0x1000FFFF                         | # beq \$zero, \$zero, -1   |
| [0x00000C] | 0x00000000                         | # nop                      |
| [0x000010] | 0x00004020                         | # add \$t0, \$zero, \$zero |
| [0x000014] | 0x2508FFFF                         | # addiu \$t0, \$t0, 65535  |
| [0x000018] | 0x25080001 → changed to 0xFF080001 | # addiu \$t0, \$t0, 1      |
| [0x00001C] | 0x3C088000                         | # lui \$t0, 32768          |
| [0x000020] | 0x2108FFFF                         | # addi \$t0, \$t0, -1      |
| [0x000024] | 0x1000FFFF                         | # beq \$zero, \$zero, -1   |
| [0x000028] | 0x00000000                         | # nop                      |

Table 5.11: Verification of Exception Handling for Undefined Instructions
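The machine codes in Table 5.11 follow the standard MIPS R-type and I-type field layouts given in Appendix A. The Python sketch below re-derives them. It also shows why 0xFF080001 is undefined: its top six bits give primary opcode 0x3F, which matches none of the implemented instructions (nor any MIPS R2000 instruction). The helpers `r_type`, `i_type` and `is_defined` are illustrative only. The `IMPLEMENTED_OPCODES` set is an assumption assembled from the formats listed in Appendix A; it is not taken from the thesis's VHDL decoder.

```python
# Illustrative field packing for the two instruction formats used in Table 5.11.
def r_type(rs: int, rt: int, rd: int, shamt: int, funct: int) -> int:
    """Pack an R-type word: opcode 0 | rs | rt | rd | shamt | funct."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type word; imm is truncated to its 16-bit two's complement form."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


ZERO, T0 = 0, 8  # register numbers of $zero and $t0

program = {
    0x10: r_type(ZERO, ZERO, T0, 0, 0x20),  # add   $t0, $zero, $zero
    0x14: i_type(0x09, T0, T0, 65535),       # addiu $t0, $t0, 65535
    0x18: i_type(0x09, T0, T0, 1),           # addiu $t0, $t0, 1
    0x1C: i_type(0x0F, ZERO, T0, 32768),     # lui   $t0, 32768
    0x20: i_type(0x08, T0, T0, -1),          # addi  $t0, $t0, -1
    0x24: i_type(0x04, ZERO, ZERO, -1),      # beq   $zero, $zero, -1
}

# Assumed primary opcodes of the subset (0x00 covers every SPECIAL/R-type instruction).
IMPLEMENTED_OPCODES = {0x00, 0x02, 0x03, 0x04, 0x05, 0x08, 0x09, 0x0A,
                       0x0B, 0x0C, 0x0D, 0x0F, 0x23}


def is_defined(word: int) -> bool:
    """True if the 6-bit primary opcode (bits 31..26) is in the assumed subset."""
    return (word >> 26) in IMPLEMENTED_OPCODES


for addr, word in program.items():
    print(f"[0x{addr:06X}] 0x{word:08X}")
print(is_defined(0x25080001), is_defined(0xFF080001))  # True False
```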

[Timing diagram: pipeline stages IF, ID, EX, MEM and WB; signals Curr_PC, Instr, Incr_PC, ALU_A, ALU_B, Res, Ovr, Addr, Wr_D, Rd_D and Reg_D over clock cycles 1 to 6 for the modified program of Table 5.11.]

Table 5.12: Timing Diagram for Undefined Instruction Exception Handling

# **CHAPTER 6**

# **CONCLUSIONS AND FUTURE WORK**

Pipelining, a basic way of obtaining a faster processor, was examined in detail throughout this thesis, and its basic principles were applied by implementing a pipelined processor on real hardware (an FPGA).

The thesis aimed to clarify why pipelining is preferred over other possible implementation schemes by comparing them quantitatively, and concluded that the best performance is obtained with the pipelined implementation scheme.
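The quantitative argument can be illustrated with the usual first-order model. A single-cycle datapath needs a clock period equal to the sum of all stage delays. An ideal k-stage pipeline runs at the period of its slowest stage and completes N instructions in N + k - 1 cycles. The stage delays below are hypothetical round numbers, not measurements from this thesis. The model also ignores pipeline-register overhead and hazard stalls.

```python
# Hypothetical stage latencies in ns for IF, ID, EX, MEM, WB (illustrative only).
STAGE_NS = [200, 100, 200, 200, 100]


def single_cycle_ns(n: int) -> int:
    """Every instruction takes one long cycle equal to the sum of all stage delays."""
    return n * sum(STAGE_NS)


def pipelined_ns(n: int) -> int:
    """Ideal pipeline: k cycles to fill, then one instruction completes per cycle."""
    k = len(STAGE_NS)
    return (n + k - 1) * max(STAGE_NS)


n = 1_000_000
print(f"single-cycle: {single_cycle_ns(n) / 1e6:.1f} ms")
print(f"pipelined:    {pipelined_ns(n) / 1e6:.1f} ms")
print(f"speedup:      {single_cycle_ns(n) / pipelined_ns(n):.2f}")
```

For long instruction streams, the speedup approaches sum(STAGE_NS) / max(STAGE_NS), which is 4 with these assumed delays. Hazards that push the CPI above one reduce it.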

Different solutions were proposed for the problems faced while implementing pipelining. It became clear that the main source of problems is the dependencies between instructions. These dependencies degrade instruction throughput and can raise the CPI above its optimal value of one. This problem was addressed by adding forwarding (bypass) paths between stages. Structural hazards were overcome by using separate instruction and data memories. Control (branch) hazards caused by conditional and unconditional branches were overcome by making the branch decision in the ID stage instead of the EX stage, at the expense of extra hardware. The handling of exceptions in a pipelined architecture was also explained.

After the implementation details were given, the architecture was verified with test programs and the results were tabulated. Some instructions of the MIPS R2000 ISA are left unimplemented, because the primary goal of this thesis is to reveal the internals of pipelining rather than to implement a complete processor; the most frequently used instructions were chosen and implemented. For similar reasons, a custom exception handling mechanism was implemented instead of a complete coprocessor.
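Forwarding paths are usually selected by a small combinational forwarding unit. The Python sketch below follows the formulation commonly given in textbooks such as [COD98]. The newer EX/MEM result takes priority over the MEM/WB result, and register $zero is never forwarded. The field and class names (`PipeState`, `ex_mem_rd`, `Fwd`, and so on) are hypothetical and need not match the signal names of the thesis's VHDL design.

```python
from dataclasses import dataclass
from enum import Enum


class Fwd(Enum):
    """Mux select for one ALU operand in the EX stage."""
    REG_FILE = 0   # value read from the register file in ID
    FROM_MEM = 1   # ALU result held in the EX/MEM pipeline register
    FROM_WB = 2    # value being written back from the MEM/WB register


@dataclass
class PipeState:
    id_ex_rs: int           # source registers of the instruction now in EX
    id_ex_rt: int
    ex_mem_reg_write: bool  # instruction in MEM will write a register
    ex_mem_rd: int
    mem_wb_reg_write: bool  # instruction in WB will write a register
    mem_wb_rd: int


def forward_select(src: int, s: PipeState) -> Fwd:
    """Choose where the EX stage should take source register `src` from."""
    if s.ex_mem_reg_write and s.ex_mem_rd != 0 and s.ex_mem_rd == src:
        return Fwd.FROM_MEM  # the most recent producer wins
    if s.mem_wb_reg_write and s.mem_wb_rd != 0 and s.mem_wb_rd == src:
        return Fwd.FROM_WB
    return Fwd.REG_FILE


# addiu $t0, $t0, 1 in EX; addiu $t0, $t0, 65535 in MEM; add $t0, $zero, $zero in WB
state = PipeState(id_ex_rs=8, id_ex_rt=8, ex_mem_reg_write=True, ex_mem_rd=8,
                  mem_wb_reg_write=True, mem_wb_rd=8)
print(forward_select(state.id_ex_rs, state))  # Fwd.FROM_MEM
```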

There are many directions in which the work described in this thesis can be extended. Future research could propose a method to measure the orthogonality of an ISA, which is the primary metric for the effectiveness of pipelining. The processor can be extended to cover the complete MIPS R2000 ISA. A dynamic branch prediction mechanism could be used instead of the simple delayed branch approach. As a further step, the processor can be upgraded by adding a floating-point coprocessor and virtual memory support to implement the R3000 ISA. A more ambitious work would be to move to a 64-bit architecture and converge to the R4000, which is commercially available today.
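One common form of dynamic branch prediction is a table of 2-bit saturating counters indexed by low-order PC bits. The Python sketch below is a generic illustration of that technique, not part of the thesis design, which uses delayed branches. The class name, the 64-entry table size and the initial "weakly not taken" state are all assumptions.

```python
class TwoBitPredictor:
    """Branch history table of 2-bit saturating counters (0-1: not taken, 2-3: taken)."""

    def __init__(self, index_bits: int = 6):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # every entry starts weakly not taken

    def _index(self, pc: int) -> int:
        return (pc >> 2) & self.mask          # drop the byte offset of word-aligned PCs

    def predict(self, pc: int) -> bool:
        """Predict taken when the counter is in one of the two upper states."""
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        """Move the counter one step toward the actual outcome, saturating at 0 and 3."""
        i = self._index(pc)
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


# A loop-closing branch that is taken 9 times and then falls through once.
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]
hits = 0
for taken in outcomes:
    hits += p.predict(0x24) == taken
    p.update(0x24, taken)
print(hits, "of", len(outcomes))  # 8 of 10
```

The 2-bit hysteresis means the single loop exit does not flip the prediction. The next run of the loop therefore starts out predicted taken.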

Another direction is to examine the effects of longer pipelines and of fetching longer words from memory, as in the R4000, together with the sequencing and handling mechanisms that these changes require.

## REFERENCES

- [Barr99] Michael Barr, "Programmable Logic: What is it to Ya?", Embedded Systems Programming, pages 74-84, June 1999
- [Brown96] Stephen Brown, Jonathan Rose, "FPGA and CPLD Architectures: A Tutorial", IEEE Design and Test of Computers, 1996
- [BZEID] Bob Zeidman, "Introduction to CPLD and FPGA Design"
- [CDVHDL] Volnei A. Pedroni, "Circuit Design with VHDL", MIT Press, 2004
- [COD98] David A. Patterson, John L. Hennessy, "Computer Organization and Design", Chapters 3-6, 1998
- [DFMULT] J. Senthil Kumar, G. Lakshminarayanan, B. Venkataramani, G. Siriram, M.S. Jambunathan, "Design and Implementation of FPGA based Fast Multipliers with Optimum Placement and Routing using Structure Organizer"
- [Perry02] Douglas L. Perry, "VHDL Programming by Example", 2002
- [HPCC] Scott Hauck, Matthew M. Hosler, Thomas W. Fry, "High Performance Carry Chains for FPGAs"
- [JGRAY00] Jan Gray, "Building a RISC System on FPGA", Circuit Cellar: The Magazine for Computer Applications, March 2000
- [KCHAP93] Ken Chapman, "Fast Integer Multipliers", Engineering Design Magazine's Design Ideas Column, March 1993
- [PLXSDK01] PLX SDK User's Manual, Section 4, March 2001
- [PLXSDK02] PLX PCI 9030 Data Book, v14, page 2.7, 2002
- [SYNP99] "Synthesis for 1 Million Gate FPGAs: Synplicity Support for Xilinx Virtex Series", Synplicity Inc., 1999
- [TRENZ01] Trenz Electronic, "Introduction to FPGA Technology", November 2001
- [TW04] R.H. Turner, R.F. Woods, "Highly Efficient, Limited Range Multipliers for LUT-Based FPGA Architectures", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 10, October 2004
- [XAPP215] Xilinx Application Note, "Design Tips for HDL Implementation of Arithmetic Functions", June 2000
- [XCNSTR] Xilinx 5.xi Constraints, "Understanding Timing and Placement Constraints"
- [XDRM99] Xilinx Design Reuse Methodology for ASIC and FPGA Designers, System on Chip Design Reuse Solutions, An Addendum to Reuse Methodology Manual for SoC Design, pages 1-27, October 1999
- [XDS003-2] Xilinx Data Sheet for Virtex™ 2.5V FPGA, pages 5-24, December 2002
- [XISE03] Xilinx ISE Quick Start Tutorial, pages 12-17, June 2003
- [XLBR04] Xilinx Libraries Guide, V6.3i, pages 321-323
- [XPM04] Karen Parnell, Nick Mehta, Xilinx Programmable Logic Design Quick Start Hand Book, pages 1-20, April 2004

## **APPENDIX A**

## **IMPLEMENTED SUBSET OF MIPS R2000 ISA**

#### ADD Add Word

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6       | 5 0           |
|-------------------|-------|-------|-------|------------|---------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 0<br>00000 | ADD<br>100000 |

Format : ADD rd, rs, rt

Function : To add 32-bit integers. If an overflow occurs, then trap.

Description : rd $\leftarrow$ rs + rt

The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs to produce a 32-bit result. If the addition results in 32-bit 2's complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs. If the addition does not overflow, the 32-bit result is placed into GPR rd.

Restrictions : None

Exceptions : Integer Overflow

Notes : ADDU performs the same arithmetic operation but does not trap on overflow.
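The overflow condition above can be detected from sign bits alone. A two's complement sum overflows exactly when both operands have the same sign and the result's sign differs from it. The Python sketch below models ADD and ADDU on 32-bit register values held as unsigned integers. Raising `OverflowError` merely stands in for the processor's Integer Overflow exception, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def addu(rs: int, rt: int) -> int:
    """ADDU: 32-bit sum that simply wraps around and never traps."""
    return (rs + rt) & MASK32


def add(rs: int, rt: int) -> int:
    """ADD: same sum, but trap on two's complement overflow (rd left unchanged)."""
    result = addu(rs, rt)
    same_sign_inputs = (rs & SIGN32) == (rt & SIGN32)
    sign_flipped = (result & SIGN32) != (rs & SIGN32)
    if same_sign_inputs and sign_flipped:
        raise OverflowError("Integer Overflow exception")
    return result


print(hex(addu(0x7FFFFFFF, 1)))  # 0x80000000: wraps silently
print(hex(add(0xFFFFFFFF, 1)))   # 0x0: -1 + 1 does not overflow
```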

| 31 26               | 25 21 | 20 16 | 15 0      |
|---------------------|-------|-------|-----------|
| ADDI<br>0 0 1 0 0 0 | rs    | rt    | immediate |

Format : ADDI rt, rs, immediate

Function : To add a constant to a 32-bit integer. If overflow occurs, then trap.

Description : rt $\leftarrow$ rs + immediate

The 16-bit signed immediate is added to the 32-bit value in GPR rs to produce a 32-bit result. If the addition results in 32-bit 2's complement arithmetic overflow, the destination register is not modified and an Integer Overflow exception occurs. If the addition does not overflow, the 32-bit result is placed into GPR rt.

Restrictions : None

Exceptions : Integer Overflow

Notes : ADDIU performs the same arithmetic operation but does not trap on overflow.

#### ADDIU Add Immediate Unsigned Word

| 31 26           | 25 21 | 20 16 | 15 0      |
|-----------------|-------|-------|-----------|
| ADDIU<br>001001 | rs    | rt    | immediate |

Format : ADDIU rt, rs, immediate

Function : To add a constant to a 32-bit integer

Description : rt $\leftarrow$ rs + immediate

The 16-bit signed immediate is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rt. No Integer Overflow exception occurs under any circumstances.

Restrictions : None

Exceptions : None

Notes : None
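Both ADDI and ADDIU sign-extend the 16-bit immediate, so `addiu $t0, $t0, 65535` actually adds -1. The Python sketch below replays the arithmetic of the test program used for Table 5.10. Only the final `addi $t0, $t0, -1` traps, because 0x80000000 - 1 falls outside the signed 32-bit range. `OverflowError` stands in for the Integer Overflow exception, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate field to a 32-bit register value."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm


def addiu(rs: int, imm: int) -> int:
    """ADDIU: add the sign-extended immediate, wrap to 32 bits, never trap."""
    return (rs + sign_extend16(imm)) & MASK32


def addi(rs: int, imm: int) -> int:
    """ADDI: like ADDIU, but trap on two's complement overflow."""
    b = sign_extend16(imm)
    result = (rs + b) & MASK32
    if (rs & SIGN32) == (b & SIGN32) and (result & SIGN32) != (rs & SIGN32):
        raise OverflowError("Integer Overflow exception")
    return result


t0 = 0                    # add   $t0, $zero, $zero
t0 = addiu(t0, 65535)     # addiu $t0, $t0, 65535 -> 0xFFFFFFFF (-1)
t0 = addiu(t0, 1)         # addiu $t0, $t0, 1     -> 0x00000000
t0 = 32768 << 16          # lui   $t0, 32768      -> 0x80000000
try:
    t0 = addi(t0, -1)     # addi  $t0, $t0, -1    -> overflow, $t0 unchanged
except OverflowError as exc:
    print(exc, hex(t0))   # Integer Overflow exception 0x80000000
```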

#### ADDU Add Unsigned Word

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6       | 5 0            |
|-------------------|-------|-------|-------|------------|----------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 0<br>00000 | ADDU<br>100001 |

Format : ADDU rd, rs, rt

Function : To add 32-bit integers

Description : rd $\leftarrow$ rs + rt

The 32-bit word value in GPR rt is added to the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rd. No Integer Overflow exception occurs under any circumstances.

Restrictions : None

Exceptions : None

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6       | 5 0           |
|-------------------|-------|-------|-------|------------|---------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 0<br>00000 | AND<br>100100 |

Format : AND rd, rs, rt

Function : To do a bitwise logical AND

Description : rd $\leftarrow$ rs AND rt

The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical AND operation. The result is placed into GPR rd.

Restrictions : None

Exceptions : None

Notes : None

#### ANDI And Immediate

| 31 26          | 25 21 | 20 16 | 15 0      |
|----------------|-------|-------|-----------|
| ANDI<br>001100 | rs    | rt    | immediate |

Format : ANDI rt, rs, immediate

Function : To do a bitwise logical AND with a constant

Description : rt $\leftarrow$ rs AND immediate

The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical AND operation. The result is placed into GPR rt.

Restrictions : None

Exceptions : None

Notes : None
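Unlike the arithmetic immediates, ANDI zero-extends its 16-bit field, so the upper half of the result is always zero. The short Python sketch below contrasts the two extensions. The helper names are illustrative.

```python
def zero_extend16(imm: int) -> int:
    """Zero-extend a 16-bit immediate (used by ANDI and ORI)."""
    return imm & 0xFFFF


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate (used by ADDI, ADDIU, SLTI, SLTIU)."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm


def andi(rs: int, imm: int) -> int:
    """ANDI: AND with the zero-extended immediate; bits 31..16 of the result are 0."""
    return rs & zero_extend16(imm)


rs = 0x12345678
print(hex(andi(rs, 0xFFFF)))            # 0x5678
print(hex(rs & sign_extend16(0xFFFF)))  # 0x12345678, what sign extension would give
```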


#### BNE Branch on Not Equal

| 31 26         | 25 21 | 20 16 | 15 0   |
|---------------|-------|-------|--------|
| BNE<br>000101 | rs    | rt    | offset |

Format : BNE rs, rt, offset

Function : To compare GPRs then do a PC-relative conditional branch

Description : if rs $\neq$ rt then branch

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs and GPR rt are not equal, branch to the effective target address after the instruction in the delay slot is executed.

Restrictions : None

Exceptions : None

Notes : With the 18-bit signed instruction offset, the conditional branch range is  $\pm$  128 KBytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.

#### BEQ Branch on Equal

| 31 26         | 25 21 | 20 16 | 15 0   |
|---------------|-------|-------|--------|
| BEQ<br>000100 | rs    | rt    | offset |

Format : BEQ rs, rt, offset

Function : To compare GPRs then do a PC-relative conditional branch

Description : if rs = rt then branch

An 18-bit signed offset (the 16-bit offset field shifted left 2 bits) is added to the address of the instruction following the branch (not the branch itself), in the branch delay slot, to form a PC-relative effective target address. If the contents of GPR rs and GPR rt are equal, branch to the effective target address after the instruction in the delay slot is executed.

Restrictions : None

Exceptions : None

Notes : With the 18-bit signed instruction offset, the conditional branch range is  $\pm$  128 Kbytes. Use jump (J) or jump register (JR) instructions to branch to addresses outside this range.
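The target-address arithmetic described for BNE and BEQ can be written out directly. The Python sketch below uses a hypothetical helper `branch_target` and PC values from the test program in Table 5.11. It shows that `beq $zero, $zero, -1` at 0x24 branches back to itself: the delay-slot address 0x28 plus (-1 << 2) gives 0x24.

```python
def branch_target(branch_pc: int, offset16: int) -> int:
    """Delay-slot address (branch_pc + 4) plus the sign-extended offset shifted left 2."""
    offset16 &= 0xFFFF
    offset = offset16 - 0x10000 if offset16 & 0x8000 else offset16  # signed 16-bit value
    return (branch_pc + 4 + (offset << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x24, 0xFFFF)))  # 0x24: the branch loops on itself
# Byte displacement range reachable from the delay slot, about +/-128 KB:
print((-0x8000) << 2, 0x7FFF << 2)       # -131072 131068
```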


#### J Jump

| 31 26       | 25 0        |
|-------------|-------------|
| J<br>000010 | instr_index |

Format : J target

Function : To branch within the current 256 MB-aligned region

Description : This is a PC-region branch (not PC-relative); the effective target address is in the "current" 256 MB-aligned region. The low 28 bits of the target address are the instr\_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the branch itself).

Restrictions : None

Exceptions : None

Notes : Forming the branch target address by catenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256 MB region aligned on a 256 MB boundary. It allows a branch from anywhere in the region to anywhere in the region, an action not allowed by a signed relative offset.

#### JAL Jump And Link

| 31 26         | 25 0        |
|---------------|-------------|
| JAL<br>000011 | instr_index |

Format : JAL target

Function : To execute a procedure call within the current 256 MB-aligned region

Description : Place the return address link in GPR 31. The return link is the address of the second instruction following the branch, at which location execution continues after a procedure call. This is a PC-region branch (not PC-relative); the effective target address is in the "current" 256 MB-aligned region. The low 28 bits of the target address are the instr\_index field shifted left 2 bits. The remaining upper bits are the corresponding bits of the address of the instruction in the delay slot (not the branch itself).

Restrictions : None

Exceptions : None

Notes : Forming the branch target address by catenating PC and index bits rather than adding a signed offset to the PC is an advantage if all program code addresses fit into a 256 MB region aligned on a 256 MB boundary. It allows a branch from anywhere in the region to anywhere in the region, an action not allowed by a signed relative offset.
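For J and JAL the target is formed by concatenation rather than addition, and JAL links the address two instructions past the jump. The Python sketch below uses hypothetical helper names. Its last test case shows the subtle point in the description: the upper four bits come from the delay-slot address, which can lie in the next 256 MB region.

```python
def jump_target(jump_pc: int, instr_index: int) -> int:
    """Upper 4 bits of the delay-slot address concatenated with instr_index << 2."""
    delay_slot = (jump_pc + 4) & 0xFFFFFFFF
    return (delay_slot & 0xF0000000) | ((instr_index & 0x03FFFFFF) << 2)


def jal_link(jump_pc: int) -> int:
    """Return address written to GPR 31: skip the jump and its delay slot."""
    return (jump_pc + 8) & 0xFFFFFFFF


word = 0x0C000010  # JAL (opcode 000011) with instr_index = 0x10
print(hex(jump_target(0x00400020, word & 0x03FFFFFF)), hex(jal_link(0x00400020)))
# 0x40 0x400028
```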


#### JR Jump Register

| 31 26             | 25 21 | 20 11           | 10 6       | 5 0          |
|-------------------|-------|-----------------|------------|--------------|
| SPECIAL<br>000000 | rs    | 0<br>0000000000 | 0<br>00000 | JR<br>001000 |

Format : JR rs

Function : To execute a branch to an instruction address in a register

Description : PC $\leftarrow$ rs. Jump to the effective target address in GPR rs.

Restrictions : None

Exceptions : None

#### LUI Load Upper Immediate

| 31 26         | 25 21      | 20 16 | 15 0      |
|---------------|------------|-------|-----------|
| LUI<br>001111 | 0<br>00000 | rt    | immediate |

Format : LUI rt, immediate

Function : To load a constant into the upper half of a word

Description : rt $\leftarrow$ immediate || $0^{16}$

The 16-bit immediate is shifted left 16 bits and concatenated with 16 bits of low-order zeros. The 32-bit result is placed into GPR rt.

Restrictions : None

Exceptions : None

Notes : None
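LUI is usually paired with ORI to load an arbitrary 32-bit constant in two instructions. ORI zero-extends its immediate and therefore leaves the upper half untouched. A Python sketch of the arithmetic follows; the function names are hypothetical.

```python
def lui(imm: int) -> int:
    """LUI: immediate in the upper half, 16 low-order zeros."""
    return (imm & 0xFFFF) << 16


def ori(rs: int, imm: int) -> int:
    """ORI: OR with the zero-extended immediate."""
    return rs | (imm & 0xFFFF)


def load_const(value: int) -> int:
    """Equivalent of: lui $t0, value >> 16 ; ori $t0, $t0, value & 0xFFFF."""
    return ori(lui(value >> 16), value & 0xFFFF)


print(hex(load_const(0xDEADBEEF)))  # 0xdeadbeef
```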

#### LW Load Word

| 31 26        | 25 21 | 20 16 | 15 0   |
|--------------|-------|-------|--------|
| LW<br>100011 | base  | rt    | offset |

Format : LW rt, offset(base)

Function : To load a word from memory as a signed value

Description : rt $\leftarrow$ memory[base + offset]

The contents of the 32-bit word at the memory location specified by the aligned effective address are fetched, sign-extended to the GPR register length if necessary, and placed in GPR rt. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

Restrictions : None

Exceptions : None

Notes : None
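The effective address of LW uses the same sign extension as ADDI, so negative offsets reach below the base register. In the Python sketch below, data memory is modelled as a dict of 32-bit words keyed by byte address. That model and the helper names are assumptions for illustration only.

```python
def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit offset field to 32 bits."""
    imm &= 0xFFFF
    return imm | 0xFFFF0000 if imm & 0x8000 else imm


def effective_address(base: int, offset: int) -> int:
    """base + sign-extended 16-bit offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset)) & 0xFFFFFFFF


def lw(memory: dict, base: int, offset: int) -> int:
    """LW: fetch the 32-bit word stored at base + offset in the dict memory model."""
    return memory[effective_address(base, offset)]


data = {0x1000: 0x11111111, 0x0FFC: 0x22222222}
print(hex(lw(data, 0x1000, 0)), hex(lw(data, 0x1000, -4)))  # 0x11111111 0x22222222
```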

#### MFHI Move From HI Register

| 31 26             | 25 16           | 15 11 | 10 6       | 5 0            |
|-------------------|-----------------|-------|------------|----------------|
| SPECIAL<br>000000 | 0<br>0000000000 | rd    | 0<br>00000 | MFHI<br>010000 |

Format : MFHI rd

Function : To copy the special purpose HI register to a GPR

Description : rd $\leftarrow$ HI. The contents of special register HI are loaded into GPR rd.

Restrictions : None

Exceptions : None

Notes : None

#### MFLO Move From LO Register

| 31 26             | 25 16           | 15 11 | 10 6       | 5 0            |
|-------------------|-----------------|-------|------------|----------------|
| SPECIAL<br>000000 | 0<br>0000000000 | rd    | 0<br>00000 | MFLO<br>010010 |

Format : MFLO rd

Function : To copy the special purpose LO register to a GPR

Description : rd $\leftarrow$ LO. The contents of special register LO are loaded into GPR rd.

Restrictions : None

Exceptions : None

#### MTHI Move To HI Register

| 31 26             | 25 21 | 20 6                 | 5 0            |
|-------------------|-------|----------------------|----------------|
| SPECIAL<br>000000 | rs    | 0<br>000000000000000 | MTHI<br>010001 |

Format : MTHI rs

Function : To copy a GPR to the special purpose HI register

Description : HI $\leftarrow$ rs. The contents of GPR rs are loaded into special register HI.

Restrictions : None

Exceptions : None

Notes : None

#### MULTU Multiply Unsigned Word

| 31 26             | 25 21 | 20 16 | 15 6            | 5 0             |
|-------------------|-------|-------|-----------------|-----------------|
| SPECIAL<br>000000 | rs    | rt    | 0<br>0000000000 | MULTU<br>011001 |

Format : MULTU rs, rt

Function : To multiply 32-bit unsigned integers

Description : (LO, HI) $\leftarrow$ rs $\times$ rt

The 32-bit word value in GPR rt is multiplied by the 32-bit value in GPR rs, treating both operands as unsigned values, to produce a 64-bit result. The low-order 32-bit word of the result is placed into special register LO, and the high-order 32-bit word is placed into special register HI. No arithmetic exception occurs under any circumstances.

Restrictions : None

Exceptions : None

Notes : None
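MULTU produces a 64-bit product that is split across HI and LO; MFHI and MFLO then copy the halves to GPRs. A minimal arithmetic model in Python follows, with a hypothetical function name.

```python
def multu(rs: int, rt: int) -> tuple[int, int]:
    """Return (hi, lo) of the unsigned 64-bit product of two 32-bit register values."""
    product = (rs & 0xFFFFFFFF) * (rt & 0xFFFFFFFF)
    return (product >> 32) & 0xFFFFFFFF, product & 0xFFFFFFFF


hi, lo = multu(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(hi), hex(lo))  # 0xfffffffe 0x1
```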

#### MTLO Move To LO Register

| 31 26             | 25 21 | 20 6                 | 5 0            |
|-------------------|-------|----------------------|----------------|
| SPECIAL<br>000000 | rs    | 0<br>000000000000000 | MTLO<br>010011 |

Format : MTLO rs

Function : To copy a GPR to the special purpose LO register

Description : LO $\leftarrow$ rs. The contents of GPR rs are loaded into special register LO.

Restrictions : None

Exceptions : None

Notes : None

#### NOR Not Or

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6       | 5 0           |
|-------------------|-------|-------|-------|------------|---------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 0<br>00000 | NOR<br>100111 |

Format : NOR rd, rs, rt

Function : To do a bitwise logical NOT OR

Description : rd $\leftarrow$ rs NOR rt

The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical NOR operation. The result is placed into GPR rd.

Restrictions : None

Exceptions : None

#### OR Or

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6       | 5 0          |
|-------------------|-------|-------|-------|------------|--------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 0<br>00000 | OR<br>100101 |

Format : OR rd, rs, rt

Function : To do a bitwise logical OR

Description : rd $\leftarrow$ rs OR rt

The contents of GPR rs are combined with the contents of GPR rt in a bitwise logical OR operation. The result is placed into GPR rd.

Restrictions : None

Exceptions : None

Notes : None

#### ORI Or Immediate

| 31 26         | 25 21 | 20 16 | 15 0      |
|---------------|-------|-------|-----------|
| ORI<br>001101 | rs    | rt    | immediate |

Format : ORI rt, rs, immediate

Function : To do a bitwise logical OR with a constant

Description : rt $\leftarrow$ rs OR immediate

The 16-bit immediate is zero-extended to the left and combined with the contents of GPR rs in a bitwise logical OR operation. The result is placed into GPR rt.

Restrictions : None

Exceptions : None

Notes : None

#### SLL Shift Word Left Logical

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6 | 5 0           |
|-------------------|-------|-------|-------|------|---------------|
| SPECIAL<br>000000 | 00000 | rt    | rd    | sa   | SLL<br>000000 |

Format : SLL rd, rt, sa

Function : To left-shift a word by a fixed number of bits

Description : rd $\leftarrow$ rt << sa

The contents of the low-order 32-bit word of GPR rt are shifted left, inserting zeros into the emptied bits; the word result is placed in GPR rd. The bit-shift amount is specified by sa.

Restrictions : None

Exceptions : None

Notes : SLL r0, r0, 0, expressed as NOP, is the assembly idiom used to denote no operation.
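SLL and SRL differ only in shift direction: both fill the vacated bits with zeros and keep a 32-bit result. Encoding `sll $zero, $zero, 0` yields the all-zero word, which is why the nop slots of the test programs contain zeros. The Python helpers below are hypothetical.

```python
def sll(rt: int, sa: int) -> int:
    """SLL: shift left by sa (0-31); zeros enter from the right."""
    return (rt << (sa & 0x1F)) & 0xFFFFFFFF


def srl(rt: int, sa: int) -> int:
    """SRL: logical shift right by sa (0-31); zeros enter from the left."""
    return (rt & 0xFFFFFFFF) >> (sa & 0x1F)


def encode_sll(rd: int, rt: int, sa: int) -> int:
    """R-type word: SPECIAL | rs = 0 | rt | rd | sa | funct 000000."""
    return (rt << 16) | (rd << 11) | (sa << 6)


print(hex(sll(0x80000001, 1)), hex(srl(0x80000000, 31)), hex(encode_sll(0, 0, 0)))
# 0x2 0x1 0x0
```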

#### SLT Set On Less Than

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6       | 5 0           |
|-------------------|-------|-------|-------|------------|---------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 0<br>00000 | SLT<br>101010 |

Format : SLT rd, rs, rt

Function : To record the result of a less-than comparison

Description : rd $\leftarrow$ (rs < rt)

Compare the contents of GPR rs and GPR rt as signed integers and record the Boolean result of the comparison in GPR rd. If GPR rs is less than GPR rt, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.

Restrictions : None

Exceptions : None

Notes : None

#### SLTIU Set on Less Than Immediate Unsigned

| 31 26           | 25 21 | 20 16 | 15 0      |
|-----------------|-------|-------|-----------|
| SLTIU<br>001011 | rs    | rt    | immediate |

Format : SLTIU rt, rs, immediate

Function : To record the result of an unsigned less-than comparison with a constant

Description : rt $\leftarrow$ (rs < immediate)

Compare the contents of GPR rs and the sign-extended 16-bit immediate as unsigned integers and record the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.

Restrictions : None

Exceptions : None

Notes : None

#### SLTI Set on Less Than Immediate

| 31 26          | 25 21 | 20 16 | 15 0      |
|----------------|-------|-------|-----------|
| SLTI<br>001010 | rs    | rt    | immediate |

Format : SLTI rt, rs, immediate

Function : To record the result of a less-than comparison with a constant

Description : rt $\leftarrow$ (rs < immediate)

Compare the contents of GPR rs and the 16-bit signed immediate as signed integers and record the Boolean result of the comparison in GPR rt. If GPR rs is less than immediate, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.

Restrictions : None

Exceptions : None

Notes : None

#### SLTU Set on Less Than Unsigned

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6       | 5 0            |
|-------------------|-------|-------|-------|------------|----------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 0<br>00000 | SLTU<br>101011 |

Format : SLTU rd, rs, rt

Function : To record the result of an unsigned less-than comparison

Description : rd $\leftarrow$ (rs < rt)

Compare the contents of GPR rs and GPR rt as unsigned integers and record the Boolean result of the comparison in GPR rd. If GPR rs is less than GPR rt, the result is 1 (true); otherwise, it is 0 (false). The arithmetic comparison does not cause an Integer Overflow exception.

Restrictions : None

Exceptions : None
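The R-type table above maps directly onto bit shifts. The following is a minimal C++ sketch that packs the six fields and evaluates SLTU. It assumes the register numbers of Table A.1 ($t0 = 8, $t1 = 9, $t2 = 10). `encode_rtype` and `FUNCT_SLTU` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Pack the six R-type fields shown in the instruction table:
// opcode 31..26, rs 25..21, rt 20..16, rd 15..11, sa 10..6, funct 5..0.
uint32_t encode_rtype(uint32_t rs, uint32_t rt, uint32_t rd,
                      uint32_t sa, uint32_t funct) {
    const uint32_t SPECIAL = 0x00;  // opcode 000000
    return (SPECIAL << 26) | ((rs & 0x1Fu) << 21) | ((rt & 0x1Fu) << 16) |
           ((rd & 0x1Fu) << 11) | ((sa & 0x1Fu) << 6) | (funct & 0x3Fu);
}

const uint32_t FUNCT_SLTU = 0x2B;  // 101011

// SLTU rd, rs, rt: unsigned comparison of two registers.
uint32_t sltu(uint32_t rs, uint32_t rt) { return rs < rt ? 1u : 0u; }
```

With these definitions, `SLTU $t0, $t1, $t2` encodes as `encode_rtype(9, 10, 8, 0, FUNCT_SLTU)`, which is 0x012A402B.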

Shift Word Right Logical

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6 | 5 0           |
|-------------------|-------|-------|-------|------|---------------|
| SPECIAL<br>000000 | 00000 | rt    | rd    | sa   | SRL<br>000010 |

Format : SRL rd, rt, sa

Function : To execute a logical right-shift of a word by a fixed number of bits

Description : rd ← rt >> sa

The contents of GPR rt are shifted right by sa bits, inserting zeros into the vacated high-order bits; the 32-bit result is placed in GPR rd. The shift amount sa is the 5-bit field in bits 10..6, so it ranges from 0 to 31.

Restrictions : None

Exceptions : None

Notes : None
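In C++, a right shift of an unsigned operand always fills with zeros, which matches SRL. A right shift of a negative signed value was implementation-defined before C++20 and is arithmetic from C++20 onward, which would behave like SRA instead. The following is a minimal sketch with a hypothetical function name.

```cpp
#include <cassert>
#include <cstdint>

// SRL rd, rt, sa: logical right shift; zeros enter from the left.
// Only the 5-bit sa field (0..31) is encodable, so it is masked here.
uint32_t srl(uint32_t rt, uint32_t sa) {
    return rt >> (sa & 0x1Fu);  // unsigned operand => logical shift
}
```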

#### SUB

Subtract Word

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6  | 5 0           |
|-------------------|-------|-------|-------|-------|---------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 00000 | SUB<br>100010 |

Format : SUB rd, rs, rt

Function : To subtract 32-bit integers. If overflow occurs, then trap

Description : rd ← rs - rt

The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs to produce a 32-bit result. If the subtraction results in 32-bit 2's complement arithmetic overflow, then the destination register is not modified and an Integer Overflow exception occurs. If it does not overflow, the 32-bit result is placed into GPR rd.

Restrictions : None

Exceptions : Integer Overflow

Notes : SUBU performs the same arithmetic operation but does not trap on overflow.

#### SUBU

Subtract Unsigned Word

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6  | 5 0            |
|-------------------|-------|-------|-------|-------|----------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 00000 | SUBU<br>100011 |

Format : SUBU rd, rs, rt

Function : To subtract unsigned 32-bit integers

Description : rd ← rs - rt

The 32-bit word value in GPR rt is subtracted from the 32-bit value in GPR rs and the 32-bit arithmetic result is placed into GPR rd. No Integer Overflow exception occurs under any circumstances.

Restrictions : None

Exceptions : None
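SUB and SUBU produce the same bit pattern and differ only in whether a signed overflow traps. Subtraction overflows exactly when rs and rt have different signs and the result's sign differs from that of rs. The following is a minimal C++17 sketch. An empty `std::optional` stands in for "Integer Overflow exception, rd not written", a modelling choice of this example rather than the thesis hardware.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// SUBU: modulo-2^32 subtraction; unsigned arithmetic in C++ wraps the same way.
uint32_t subu(uint32_t rs, uint32_t rt) { return rs - rt; }

// SUB: same result bits as SUBU, but a signed overflow must trap.
// Overflow: operand signs differ AND the result sign differs from rs.
std::optional<uint32_t> sub(uint32_t rs, uint32_t rt) {
    uint32_t result = rs - rt;
    bool overflow = ((rs ^ rt) & (rs ^ result) & 0x80000000u) != 0;
    if (overflow) return std::nullopt;  // exception; destination unchanged
    return result;
}
```

For example, 0x80000000 - 1 (the most negative value minus one) traps under SUB, while SUBU simply returns 0x7FFFFFFF.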

Store Word

| 31 26        | 25 21 | 20 16 | 15 0   |
|--------------|-------|-------|--------|
| SW<br>101011 | base  | rt    | offset |

Format : SW rt, offset(base)

Function : To store a word to memory

Description : memory[base+offset] ← rt

The least-significant 32-bit word of register rt is stored in memory at the location specified by the aligned effective address. The 16-bit signed offset is added to the contents of GPR base to form the effective address.

Restrictions : None

Exceptions : None

Notes : None
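The I-type layout and the effective-address rule can be sketched in C++ as shown below. The sketch assumes a hypothetical word-addressed data memory modelled as `std::vector<uint32_t>` and word-aligned addresses. `encode_itype`, `effective_address` and `sw` are illustrative names, not the thesis's own.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// I-type packing: opcode 31..26, base (rs) 25..21, rt 20..16, offset 15..0.
uint32_t encode_itype(uint32_t opcode, uint32_t base, uint32_t rt, int32_t offset) {
    return ((opcode & 0x3Fu) << 26) | ((base & 0x1Fu) << 21) |
           ((rt & 0x1Fu) << 16) | (static_cast<uint32_t>(offset) & 0xFFFFu);
}

const uint32_t OP_SW = 0x2B;  // 101011

// Effective address = GPR[base] + sign-extended 16-bit offset (mod 2^32).
uint32_t effective_address(uint32_t base_value, uint16_t offset_field) {
    uint32_t ext = (offset_field & 0x8000u) ? (0xFFFF0000u | offset_field)
                                            : static_cast<uint32_t>(offset_field);
    return base_value + ext;
}

// memory[base + offset] <- rt, on a word-indexed memory (ea assumed aligned).
void sw(std::vector<uint32_t>& mem, uint32_t base_value,
        uint16_t offset_field, uint32_t rt_value) {
    uint32_t ea = effective_address(base_value, offset_field);
    mem.at(ea / 4) = rt_value;  // at() throws if the address is out of range
}
```

Using the register numbers of Table A.1, `SW $t0, -4($sp)` encodes as `encode_itype(OP_SW, 29, 8, -4)`, which is 0xAFA8FFFC.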

Exclusive OR

| 31 26             | 25 21 | 20 16 | 15 11 | 10 6  | 5 0           |
|-------------------|-------|-------|-------|-------|---------------|
| SPECIAL<br>000000 | rs    | rt    | rd    | 00000 | XOR<br>100110 |

Format : XOR rd, rs, rt

Function : To do a bitwise logical Exclusive OR

Description : rd ← rs XOR rt

Combine the contents of GPR rs and GPR rt in a bitwise logical Exclusive OR operation and place the result into GPR rd.

Restrictions : None

Exceptions : None

Notes : None
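A decoder reverses the field packing shown in the XOR table. The following is a minimal C++ sketch that splits a 32-bit word into its R-type fields and evaluates XOR. The `RType` struct and function names are hypothetical. The example word assumes `XOR $t2, $t0, $t1` with the register numbers of Table A.1.

```cpp
#include <cassert>
#include <cstdint>

// Field layout of an R-type word, as drawn in the XOR table.
struct RType {
    uint32_t opcode, rs, rt, rd, sa, funct;
};

RType decode_rtype(uint32_t word) {
    return RType{word >> 26,           (word >> 21) & 0x1Fu,
                 (word >> 16) & 0x1Fu, (word >> 11) & 0x1Fu,
                 (word >> 6) & 0x1Fu,  word & 0x3Fu};
}

// XOR rd, rs, rt: bitwise exclusive OR of the two source registers.
uint32_t xor_op(uint32_t rs, uint32_t rt) { return rs ^ rt; }
```

Decoding 0x01095026 gives opcode 0 (SPECIAL), rs = 8, rt = 9, rd = 10, sa = 0 and funct 0x26 (100110), which is XOR.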

Exclusive OR Immediate

| 31 26          | 25 21 | 20 16 | 15 0      |
|----------------|-------|-------|-----------|
| XORI<br>001110 | rs    | rt    | immediate |

Format : XORI rt, rs, immediate

Function : To do a bitwise logical Exclusive OR with a constant

Description : rt ← rs XOR immediate

Combine the contents of GPR rs and the 16-bit zero-extended immediate in a bitwise logical Exclusive OR operation and place the result into GPR rt.

Restrictions : None

Exceptions : None

Notes : None
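Unlike SLTI and SLTIU, XORI zero-extends its immediate, so the upper 16 bits of rs always pass through unchanged. The following is a minimal C++ sketch with a hypothetical function name, assuming 32-bit registers stored as `uint32_t`.

```cpp
#include <cassert>
#include <cstdint>

// XORI rt, rs, immediate: the 16-bit immediate is ZERO-extended,
// so only the low half of rs can change.
uint32_t xori(uint32_t rs, uint16_t imm) {
    uint32_t ext = static_cast<uint32_t>(imm);  // zero-extension
    return rs ^ ext;
}
```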

### Table A.1: MIPS Registers

| Name        | Register number | Usage                                        |
|-------------|-----------------|----------------------------------------------|
| \$zero      | 0               | the constant value 0                         |
| \$at        | 1               | reserved for the assembler                   |
| \$v0–\$v1   | 2–3             | values for results and expression evaluation |
| \$a0–\$a3   | 4–7             | arguments                                    |
| \$t0–\$t7   | 8–15            | temporaries                                  |
| \$s0–\$s7   | 16–23           | saved                                        |
| \$t8–\$t9   | 24–25           | more temporaries                             |
| \$k0–\$k1   | 26–27           | reserved for the operating system (OS)       |
| \$gp        | 28              | global pointer                               |
| \$sp        | 29              | stack pointer                                |
| \$fp        | 30              | frame pointer                                |
| \$ra        | 31              | return address                               |

## **APPENDIX B**

## **MIPS MONITOR SOFTWARE**

MIPS Monitor Software is written to monitor the internal state of the processor, to stimulate the processor externally and to verify the correctness of its operation. It is written in C++ and was developed in the Microsoft<sup>™</sup> Visual C++ environment using the Document-View architecture. This appendix serves as a user manual for MIPS Monitor Software.

The main screen of MIPS Monitor software is given below:



Figure B.1: Main Screen of MIPS Monitor Software

The main functions of MIPS Monitor software are collected under the Function menu and can be summarized as follows:

<u>Emulator Input</u>: This option selects whether the software works with the real hardware or with a simulator that tests the graphical interface without hardware. The simulator was used during development, while the hardware was not yet available, and the "Simulator" option was disabled after development. For proper operation, the "PCIDevice" option must be selected before the MIPS Monitor software is used with the processor. The "PCI Device Selection Dialog" (Figure B.3) then appears, and the user can select the bridge on which the interface transactions will occur.

<u>File $\rightarrow$ Emulator (F7):</u> A "File Open Dialog" appears when this option is selected. The selected program is loaded into the Program Memory section of the main screen, but it is not downloaded to the processor.



Figure B.2: Main Functions of MIPS Monitor Software

| Device   | Bus Number | Slot Number | Vendor ID | Device ID | Chip Type |
|----------|------------|-------------|-----------|-----------|-----------|
| Device 1 | 01         | 06          | 1065      | 9030      | 9030      |
| Device 2 | –          | –           | –         | –         | –         |
| Device 3 | –          | –           | –         | –         | –         |
| Device 4 | –          | –           | –         | –         | –         |

Figure B.3: PCI Device Selection Dialog

<u>Insert Break Point (F9):</u> This option lets the user insert break points that stop the processor at a desired point, either while it is running or before the Run (F8) option is selected. A red diamond marks the point where the processor will stop.

<u>Single Step (F5)</u>: This option triggers the processor to execute a single step. Table 4.5 lists which fields of the IF, ID, EX, MEM and WB stages can be observed with the MIPS Monitor software.



Figure B.4: PCI Device Selection Dialog

<u>Run (F8)</u>: When selected, this option runs the processor until a break point is encountered.

<u>Reset (F10)</u>: When selected, this option resets the processor externally.

<u>Load & Verify:</u> A "File Open Dialog" appears when this option is selected. The selected program is loaded into the Program Memory section of the main screen and is also downloaded to the processor.

MIPS Monitor Software notifies the programmer of an unresolved hazard in the pipeline by drawing a dashed box around the IF and ID stages and reporting the status in the "Current Status" section of the main screen. The programmer can then expect a nop instruction to be inserted into the EX stage in the next clock cycle (Figure B.5).



Figure B.5: Unresolved Hazards View
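A common cause of such a stall in a five-stage MIPS pipeline is the load-use hazard. This case is the textbook condition, offered here as an assumption about what can trigger the dashed box rather than as the thesis's exact logic. It arises when the instruction in EX is a load whose destination rt is a source of the instruction in ID. Forwarding cannot deliver the loaded value in time, so IF and ID are held and a nop enters EX. The structs and names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the pipeline-register fields the check needs.
struct IfIdReg { uint32_t rs, rt; };             // instruction now in ID
struct IdExReg { bool mem_read; uint32_t rt; };  // instruction now in EX

// Load-use test: stall when a load in EX writes a register read in ID.
// Writes to $zero (register 0) are ignored, since they never take effect.
bool must_stall(const IfIdReg& ifid, const IdExReg& idex) {
    return idex.mem_read && idex.rt != 0 &&
           (idex.rt == ifid.rs || idex.rt == ifid.rt);
}
```

For `lw $t0, 0($sp)` followed by `add $t1, $t0, $t2`, the check sees a load writing register 8 while ID reads registers 8 and 10, so it requests one stall cycle.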

MIPS Monitor software also informs the programmer about the presence and type of an exception in the pipeline. This information is presented in the "Current Status" section of the main screen.

An overflow exception is detected and reported to the programmer as shown in Figure B.6.



Figure B.6: Overflow Exception Detection View

An undefined instruction is detected and reported to the programmer as shown in Figure B.7.



Figure B.7: Undefined Instruction Exception Detection View

## **APPENDIX C**

# FLOW DIAGRAMS OF ARCHITECTURE ELEMENTS

### Instruction Fetch Unit Flow Diagram



Figure C.1: Instruction Fetch Unit Flow Diagram



### Instruction Decode Unit Flow Diagram

Figure C.2: Instruction Decode Unit Flow Diagram



### Forwarding and Hazard Detection Unit Flow Diagram

Figure C.3: Forwarding and Hazard Detection Unit Flow Diagram





### Instruction Execute Unit Flow Diagram

Figure C.4: Instruction Execute Unit Flow Diagram



### Instruction Execute Unit Flow Diagram (continued)

Figure C.5: Instruction Execute Unit (continued) Flow Diagram

### Data Memory Unit Flow Diagram



Figure C.6: Data Memory Unit Flow Diagram

### Exception Detection Unit Flow Diagram



Figure C.7: Exception Detection Unit Flow Diagram

### Register Block Unit Flow Diagram



Figure C.8: Register Block Unit Flow Diagram

# **APPENDIX D**

# LAYOUT OF BOARD



# **APPENDIX E**

# **RESOURCES IN THIS THESIS**

A soft copy of this thesis, together with all of the hardware and software source code mentioned in it, is provided on the CD attached to the back cover.