# **Experiences in Modeling and Simulation of Computer Architectures in DEVS**

# Gabriel Wainer\*, Sergio Daicz\*\* and Alejandro Troccoli\*\*

\* Department of Systems and Computer Engineering, Carleton University, 4456 Mackenzie Building, 1125 Colonel By Drive, Ottawa, ON. K1S 5B6. Canada; \*\* Departamento de Computación, FCEN– Universidad de Buenos Aires, Planta Baja. Pabellón I., Ciudad Universitaria (1428), Buenos Aires, Argentina; E-mail: {gwainer@sce.carleton.ca}

Traditional approaches to teaching computer organization often leave students with misconceptions. The simulated computer ALFA-1 was designed to fill this gap. DEVS was used to tackle the complexity of the chosen architecture, allowing individual components to be defined and integrated. DEVS also provided a formal specification framework, which reduced testing time and improved the development process. Using ALFA-1, the students acquired some practice in the design and implementation of hardware components, which is not usually achievable in computer organization courses.

Keywords: Applications of DEVS methodology, DEVS models, simulation in education, computer organization

# 1. Introduction

Educators in operating systems and computer organization courses usually face several problems that stem from the way the subject is learned. Computer architecture concepts are usually analyzed theoretically, leaving students with incomplete and sometimes erroneous views of how a computer works. These misconceptions persist in higher level courses, making thorough learning in the area difficult.

Computer organization literature [1, 2, 3, 4, 5] usually attacks the complexity of computer systems by using several layers to describe them. Each layer describes one abstraction level, providing deeper insight when analyzing a given subsystem. These levels usually include assembly language, instruction sets, microprogramming, and digital logic. Lower levels (such as transistor or electronic levels) are usually not described. The layers are studied using different modeling techniques. For instance, many existing books express assembly language syntax using state machines, while circuits are described using Boolean logic. This diversity contributes to a loose comprehension of how the system operates as a whole. Likewise, the detailed behavior of the subsystems and their interaction are too complex to be attacked. The introduction of higher levels (programming languages, operating systems) makes the task even more complex.

Even though practice helps to make theory clear, experimental tasks are difficult to accomplish in the area of architectural design. The construction of computer architectures requires expensive laboratories and expertise in some areas not widely known in early career courses. Very few software tools can be used for educational purposes. It is also difficult to provide experimental assignments that can be completed within the schedule of a standard course. At present, most practical experience is gained at the assembly language level, where the available tools are well known. Nevertheless, assembly programming does not provide experience in instruction set design, microprogramming, or digital logic.

With this in mind, we proposed building a simulated computer to be used as an educational tool. We have approached this project with several objectives in mind, which include developing a program with the following characteristics:

- 1. The ability to describe the multiple abstraction levels studied in computer organization courses;
- 2. The possibility of being programmed by students in early stages of their careers (considering that students usually take courses on programming before studying computer organization);
- 3. The capacity of defining different components using a unique approach;
- 4. Extensibility of the components;

TRANSACTIONS of The Society for Modeling and Simulation International, Volume 18, Number 4, pp. 179-202. ISSN 0740-6797/01. Copyright © 2001 The Society for Modeling and Simulation International

- 5. Modifiability of the architectures;
- 6. Good testing facilities;
- 7. Pedagogical values: The chosen tools should have a fast learning curve due to the lack of time available in one-term courses; and
- 8. Availability in public domain, to be used in any existing course without restrictions.

We undertook this project with the goal of meeting these requirements. We now describe several existing ways of approaching the problem of modeling and simulating computer architectures, and how we approached the task to meet our goals.

#### 1.1. Overview of Related Efforts

Simulations of computer architectures have been around since the 1970s, and the problem has been approached from several points of view. Nevertheless, as explained below, none of the available tools meets our requirements. In this section, we attempt to cover the whole spectrum of the area, although many more tools are currently available. We include a few examples of each type of environment, as the other existing tools differ only slightly.

##### 1.1.1. General Purpose Tools

Many of the existing tools are general purpose and can be applied in building any kind of processor by defining an instruction set, the computer organization, and its components. Most of these tools are devoted to analyzing the performance of architectural properties. For instance, SimpleScalar [6] allows the flexible simulation of modern processors. The environment defines its own architecture, and it is provided with a GNU C++ compiler. It allows a complete architecture to be defined as building blocks and can include advanced architectural aspects (nonblocking caches, speculative and out-of-order execution). HASE (Hierarchical Architecture design and Simulation Environment) was built for the rapid development and exploration of computer architectures with multiple abstraction levels [7]. The environment includes a design editor, object libraries for each abstraction level, and validation facilities. SimOS [8] is a complete machine simulation environment designed to study uniprocessor and multiprocessor systems. It defines different details of an architecture by providing different CPU models. It includes a high level description of the architecture, and different components can be included: caches, multiprocessor memory buses, disk drives, consoles, and other devices.

These tools (and many other similar ones) allow the main building blocks of an architecture and their interaction to be defined, but none meets our educational goals. They are devoted to analyzing processor performance under architectural changes and to research on architectures and operating systems. Therefore, they are too complex to be used in early courses. In addition, several of the levels needed (for instance, digital logic or assembly language) are not supported. Often, the building blocks cannot be extended or modified. Many are not in the public domain or cannot be used in large courses. Nevertheless, extensibility and modifiability can be achieved when higher level constructions are considered; also, good testing facilities are available.

##### 1.1.2. Specific Purpose Tools

Many existing tools are built to emulate existing architectures. This is a good approach from a pedagogical perspective, but in general, the goals of extensibility and modifiability are constrained by the underlying architecture. All the platforms described in this section lack these facilities and most of them cannot describe all of the abstraction levels needed.

Several tools focus on the *Intel*<sup>TM</sup> 80x86 architectures. For instance, the p86 [9] defines the Instruction Set level and the Assembly Language level of 8086-based computers. It includes an assembler and debugger, allowing students to experience a reduced version of the 8086 processor in a simulated environment. A similar set of tools is included in the Simx86 environment [10]. This set of tools includes a family of simulators for the *Intel* 80x86 family, including a simulator for the 8088 and the 80286 and a partial simulator for 80386. SimpleScalar [6] was used to build the functional simulation of the x86 instruction set, providing a specific purpose tool tailored to *Intel* architectures.

Other environments use the *MIPS* architecture. For instance, the previously discussed SimpleScalar [6] and SimOS [8] were used to define complete architectures based on different MIPS processors. The MPS simulator [11] is based on the MIPS R3000 processor. It includes the definition of RAM, ROM, processor, disks, tapes, printer, and terminal. This simulator provides a good understanding of the general computer organization and instruction set levels. It also provides good facilities for teaching early undergraduate courses. Nevertheless, it does not allow the definition of other levels, and the goals of extensibility and modifiability cannot be achieved.

A tool almost adequate for our purpose is Spim, a simulator that runs MIPS R2000/R3000 assembly language programs [12]. It implements the assembler-extended instruction set, omitting some of the complex details. A good feature of this tool is that it is associated with a renowned book on computer architecture. Nevertheless, this simulator does not define many of the multiple abstraction levels described in other literature, and therefore it is not useful for many existing courses in the area. Also, extensibility and modifiability are limited.

Several other currently used architectures have been built using simulation. For instance, SimOS [8] also was used to model *Silicon Graphics* and *Digital Alpha* processors. Alpha processor simulators were also presented in [16]. In the latter case, the authors have used simulation to find an architectural solution that satisfies some of the product goals for the Alpha architecture. They have used the tools to analyze pipelining levels and instruction-level parallelism.

As we can see, several modern architectures have been modeled and can be analyzed using simulation. However, none of them is suitable for undergraduate courses. These tools do not meet our pedagogical needs as it is difficult to extend them or use them to model other architectures.

Several other architectures have been used to build simulators. The CHIP (Cornell Hypothetical Instructional Processor) is a simulated computer emulating a *PDP-11* processor. It was designed as an educational tool for undergraduate courses and includes dynamic memory mapping, two modes of processor operation and eight interrupt priority levels. It also supports emulated I/O devices, a debugger and a C compiler. PROVIR [14] is a virtual processor based on the *IBM* 360 architecture. It includes the instruction level definition, an assembler, a debugger, and the kernel of an operating system. In [15], the authors describe a method for simulating the *Z80* processor using spreadsheets. The user can write assembly language programs, which are assembled and executed using a spreadsheet. In all these cases, the authors focused on defining the instruction set. No detailed specification of lower levels was included. Worse yet, the environments are based on processors that are not used nowadays, and students do not gain experience with current architectures.

Many other simulators are designed to analyze multiprocessor systems. For instance, Limes [17] simulates N processors running a parallel application. The tool implements the assembly language level and can be used to evaluate architectures or parallel algorithms. PROTEUS is a high performance simulator for MIMD multiprocessors [18]. It was developed to simulate a wide range of architectures, with the goal of improving accuracy and performance. Several processors are connected via a bus or a network, but it is devoted to executing an application on multiple CPUs. In [19], a tool for the modeling and simulation of clustered computers was presented. The goal was to construct architectures of symmetric multiprocessors and clusters of uniprocessors and to evaluate their performance through benchmarking. Several other examples of multiprocessor simulation can be found in [20-29]. We do not describe these in detail because none of them is adequate in the educational sense, and none meets our requirements. They could be used in higher level courses to support computer architecture lectures, but they are not suited to our project due to their complexity.

#### 1.2. Development Approaches

After concluding that existing simulation tools were not appropriate, we decided to build a toolkit to meet all of our requirements. The first step was to choose which kind of development environment to use. Any simulation language could have been applied: GPSS [30], Maisie [31], Simulink [32], ACLS [33], ModSim [34], Simscript [35], etc. Many of the simulators included in Section 1.1 were built using this approach. For instance, in [7] the tools were built using Sim++ [36]. In [19], BONeS, a Block-oriented Network Simulator, was used as the building tool [21]. Another possibility was the use of a Hardware Description Language (HDL).

HDLs are indispensable for computer and digital design. Presently, VHDL, Verilog, and SDL are three of the most widely used description languages. VHDL [37] was developed by IBM, TI and Intermetrics in 1983, and became an IEEE standard in 1987 (and 1993). VHDL can be used for documentation, verification, and synthesis of large digital designs. Three different approaches can be used to describe hardware using this language: structural, data flow, and behavioral. To make designs more understandable and maintainable, a design is typically separated into several blocks. This might be done with a block diagram editor or with the use of hierarchical drawings to represent a block diagram. Once the basic building blocks of a design are defined, they can be interconnected to create a larger design.

Verilog [38] is less sophisticated than VHDL. It was developed in 1983 and became an IEEE standard in 1995. Verilog is easier to learn than VHDL but lacks constructs to support system level design. Structural models are built from gate primitives and other modules and describe a circuit using logic gates, letting the user specify the function and delay for a gate. Test modules can be associated with designs. Once the structural models are defined, behavioral models can be included to define submodels in terms of inputs and outputs. A behavioral model can be used to test structural designs. Logic synthesis can be achieved from the model specifications, providing alternative implementations.

SDL [39] is a Specification and Description Language, standardized as ITU recommendation Z.100 (in 1980, latest version in 2000). It is a wide spectrum language that can be used from requirements specification through to implementation. It was developed as a description language for reactive systems, and it allows a system to be presented in graphical form as extended finite state machines. The basic theoretical model of an SDL system consists of a set of state machines that run in parallel. These machines are independent of each other and communicate with discrete signals. An SDL system consists of structure (including system, block, process, and procedure hierarchy), communication (signals with optional signal parameters and channels), behavior, data (in the form of abstract data types), and inheritance. It is a language widely used in telecommunications, but it has also been applied to other areas.

The use of a simulation language or HDL could have provided good results for many of our goals. We would have preferred HDLs to simulation languages for a variety of reasons. A proposed architecture can be extended or modified easily. Most HDLs include verification and validation tools. Multiple abstraction levels can be described in detail. Nevertheless, we face several educational problems if we intend to use an HDL or a simulation language. The main problem is that learning any of these languages could take most of the term before they can be used in a simulated architecture. Some computer engineering programs include early courses on HDLs, and these could be a prerequisite for computer organization. But this is not the case with many computer science degrees, where computer organization is taught only to support future courses. Moreover, in several cases, the language constructions constrain the components that can be simulated and suffer from limitations; for instance, VHDL is inadequate for representing mixed analog and digital processing [40].

For these reasons, we preferred to develop models using a general purpose language (especially if a public domain compiler is available). Standard programming languages are flexible enough to describe multiple levels and to extend or modify a given architecture, and programming courses are prerequisites for computer organization subjects. This approach was used in several of the simulation tools presented in Section 1.1. Some of these simulators were built using standard languages (C, C++ or Java; even in [15], the tool was developed using a spreadsheet).

As a first stage of this project, we built a simple computer called Alfa-0 [41] using C++ to develop each of the system's levels. This set of tools lets the students better understand the complete behavior in each layer. We built an environment similar to Spim, but emulating a SPARC processor [42], and a complete emulator of the ATARI processor [43]. The emulator allowed standard ATARI games to run on any Intel processor by defining the behavior of the processor and input/output subsystem.

Assembly language, microarchitecture, and digital logic levels were simulated individually, providing a complete outlook of the system's organization. Unfortunately, the models were too complex to be integrated. Likewise, the use of a standard programming language caused students to confuse the models developed with their simulators. This also led to difficulties in extending or modifying the architecture. To avoid these problems, the simulated computer was completely redesigned using DEVS [44] as the modeling framework. This paradigm was chosen due to the hierarchical and discrete event nature of the problem under study. The following section will explain some basic aspects of this decision.

#### 1.3. Overview of the DEVS Modeling Paradigm

DEVS provides a systems theoretic framework for describing discrete event systems as composites of submodels. Each submodel can be behavioral (called *atomic*) or structural (called *coupled*), consisting of a time base, inputs, and states that are used to compute the next states and outputs. Every model can be integrated into a hierarchy, allowing the reuse of tested models. A DEVS atomic model is described by:

$$M = \langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$$

X: input events set;

S: state set;

Y: output events set;

$\delta_{int}: S \rightarrow S$, internal transition function;

$\delta_{ext}: Q \times X \rightarrow S$, external transition function, with $Q = \{(s, e) \mid s \in S, \text{ and } e \in [0, D(s)]\}$;

$\lambda: S \rightarrow Y$, output function; and

$D: S \rightarrow R_0^+$, duration function.

Models use input/output ports to communicate. Each state in a model has a given lifetime, defined by the duration function. Once the lifetime of a given state finishes, the internal transition function is activated to produce an internal state change. Before this change, the present state of the model can be sent through the output ports, which allow events to reach other models. The values are sent by the output function, which must execute before the internal transition is activated. At any moment, a model can receive external input events from other models through its input ports. When an external event arrives, the external transition function is activated. It computes a new state for the model using the present state, the input values, and the time elapsed in the present state (which is bounded by the lifetime given by the duration function). Every time a transition function is activated, a new lifetime must be associated with the new state.
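The behavior described above can be illustrated with a minimal C++ sketch. It is not the CD++ API: the names `BitRegister`, `Simulator`, `inject`, and `fire` are hypothetical, and the model (a one-bit register with a fixed propagation delay) is an assumption chosen only to show the order in which $D$, $\delta_{ext}$, $\lambda$, and $\delta_{int}$ are used.

```cpp
#include <cassert>
#include <limits>

// Hypothetical names throughout; this is an illustration, not the CD++ API.
constexpr double INF = std::numeric_limits<double>::infinity();

// A one-bit register seen as a DEVS atomic model.
// State S = (stored, pending). D(s) = delay while an output is pending,
// and infinity (passive) otherwise.
struct BitRegister {
    int stored = 0;        // value currently held
    bool pending = false;  // true while an output is scheduled
    double delay;          // propagation delay of the register

    explicit BitRegister(double d) : delay(d) {}

    // D: lifetime of the present state.
    double duration() const { return pending ? delay : INF; }

    // delta_ext: new state from the present state, the elapsed time e
    // (with 0 <= e <= D(s)) and the input value x.
    void externalTransition(double e, int x) {
        assert(e >= 0.0 && e <= duration());
        stored = x;
        pending = true;  // the new state gets a new lifetime (delay)
    }

    // lambda: value sent through the output port.
    int output() const { return stored; }

    // delta_int: executed when the lifetime of the state expires.
    void internalTransition() { pending = false; }
};

// Minimal driver for a single model; it tracks the time of the last event.
struct Simulator {
    BitRegister model;
    double lastEvent = 0.0;

    explicit Simulator(double delay) : model(delay) {}

    // Time of the next internal event.
    double nextInternal() const { return lastEvent + model.duration(); }

    // An external event with value x arrives at time t.
    void inject(double t, int x) {
        model.externalTransition(t - lastEvent, x);
        lastEvent = t;
    }

    // Executes the scheduled internal event and returns the output value.
    int fire() {
        double t = nextInternal();
        assert(t < INF);              // a passive model has nothing to fire
        int y = model.output();       // lambda runs before delta_int
        model.internalTransition();
        lastEvent = t;
        return y;
    }
};
```

With a delay of 2 time units, an input injected at time 5 schedules the internal event at time 7, after which the model becomes passive again.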

DEVS atomic models can be used to build coupled models, defined by:

$$CM = \langle X, Y, D, \{M_i\}, \{I_i\}, \{Z_{ij}\}, select \rangle$$

X is the set of input events;

Y is the set of output events;

D is an index of components, and $\forall i \in D$,

$M_i$ is a basic DEVS model;

$I_i$ is the set of influencees of model $i$, and $\forall j \in I_i$,

$Z_{ij}: Y_i \rightarrow X_j$ is the $i$ to $j$ translation function; and

select is the tiebreak selector.

Each coupled model consists of a set of basic models (atomic or coupled) connected through the input/output ports of the interfaces. Each component is identified by an index number. The influencees of each model define other models to which output values must be sent. The translation function uses an index of influencees, created for each model (I<sub>i</sub>). The function defines which outputs of model  $M_i$  are connected to inputs in model  $M_j$ . When two submodels have simultaneous events, the *select* function defines which of them should be activated first.
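As a rough illustration of the coupling mechanism, the following C++ sketch stores the influencees $I_i$ and the translation functions $Z_{ij}$ in maps, and implements *select* as a fixed priority list. All names (`CoupledModel`, `couple`, `route`, `priority`) are hypothetical, values are reduced to plain integers, and a priority list is only one possible way of defining the tiebreak.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical illustration; the names are not taken from CD++.
using ModelId = int;
using Translation = std::function<int(int)>;  // Z_ij: Y_i -> X_j

struct CoupledModel {
    std::map<ModelId, std::vector<ModelId>> influencees;           // I_i
    std::map<std::pair<ModelId, ModelId>, Translation> translate;  // Z_ij
    std::vector<ModelId> priority;  // order used by select, first wins

    // Declares that outputs of model i influence model j through z.
    void couple(ModelId i, ModelId j, Translation z) {
        influencees[i].push_back(j);
        translate[{i, j}] = std::move(z);
    }

    // Routes an output y of model i: returns one (influencee, input) pair
    // for every j in I_i, applying Z_ij to the value.
    std::vector<std::pair<ModelId, int>> route(ModelId i, int y) const {
        std::vector<std::pair<ModelId, int>> deliveries;
        auto it = influencees.find(i);
        if (it == influencees.end()) return deliveries;
        for (ModelId j : it->second)
            deliveries.emplace_back(j, translate.at({i, j})(y));
        return deliveries;
    }

    // select: among the models with simultaneous events, returns the one
    // that must be activated first.
    ModelId select(const std::vector<ModelId>& imminent) const {
        assert(!imminent.empty());
        for (ModelId p : priority)
            for (ModelId m : imminent)
                if (m == p) return m;
        return imminent.front();  // fallback when no priority is listed
    }
};
```

In this sketch, a component whose output feeds two others (for example, one receiving the full value and one receiving only the low byte) is expressed with two `couple` calls that differ only in the translation function.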

Unlike the other approaches presented earlier, DEVS meets all of our goals:

- 1. DEVS is a hierarchical and modular technique that allows the description of the multiple levels of an architecture. The SES/SB technique [45] even lets the user define different architectures in the same class hierarchy, choosing between different versions as needed.
- 2. There are different DEVS development environments that can be adapted to different teaching programs. According to the programming paradigm taught in the first-year courses, different existing DEVS toolkits can be used: those with the procedural paradigm (mainly written in C/C++), those that are object-oriented (written in Java or C++), or the functional versions (DEVS/Scheme [45]).
- 3. DEVS supports the definition of models specified in different paradigms, allowing definition of multicomponents, each defined using a different technique.
- 4. DEVS allows any existing model to be extended easily.
- 5. Coupled or atomic models can be modified.
- 6. Each model can be associated with an Experimental Framework (a set of DEVS atomic models that can be coupled with other DEVS models, providing an environment for conducting experiments) used as a testing module. This approach improves testing facilities.
- 7. The learning curve for DEVS is fast enough to be applied in undergraduate courses. Our students learned the basic aspects of the methodology and related tools in approximately 16 man/hours (two man/hours to learn the basic aspects and a two-hour training session) [46].
- 8. Many of the existing DEVS environments are public domain.

Besides meeting our individual goals, DEVS provides several advantages over the other approaches:

- (a) DEVS is a formal approach. Formal specification mechanisms are useful in improving the security and development costs of a simulation. A formal conceptual model can be validated, improving the error detection process and reducing testing time. DEVS models are closed under coupling; therefore, a coupled model is equivalent to an atomic one, improving reuse. DEVS supplies facilities to translate the formal specifications into executable models. In this way, the behavior of a conceptual model can be validated against the real system, and the response of the executable model can be verified against the conceptual specification.
- (b) The existence of an internal transition function is a unique feature that eases the definition of certain properties. Internal state changes can be captured, describing complex internal interactions in a simple and natural way. For instance, if we intend to model a timer with different skews from a unique clock, we can use one signal generator, and the internal state of the clock could define the different output signals. Certain circuits (for instance, synchronous buses) react according to their internal state, which can be modeled straightforwardly using internal transition functions. Modeling of these phenomena is difficult under other methodologies.
- (c) DEVS is a complete modeling and simulation technique. It provides a way to specify models that can be coupled into higher level ones, which are later simulated by independent abstract entities (in centralized or parallel fashions). Each model can be associated with an experimental framework, allowing for the individual testing of components and making integration testing easier.
- (d) DEVS, as a discrete event paradigm, uses a continuous time base, which allows accurate timing representation. Precision of the conceptual models can be improved and CPU time requirements reduced. Higher timing precision can be obtained without using small discrete time segments that would increase the number of simulation cycles.
- (e) Recently, a theory of DEVS quantized models was developed [47]. The theory has been verified when applied to predictive quantization of arbitrary ordinary differential equation models. Quantized models reduce substantially the frequency of message updates. As the information interchange is reduced, the models potentially incur error. In this way, DEVS can be used to express hybrid digital/analog systems. GDEVS [40] also enables the definition of hybrid models, which are expressed in a combined discrete event/ differential equation formalism approximated by DEVS. In GDEVS, the accuracy of an analog subsystem is preserved using piecewise polynomial segments. The error introduced in this approximation can be controlled by increasing the order of the polynomials that represent analog signals between successive digital events.
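The reduction in message updates mentioned in item (e) can be pictured with a very simplified quantizer, sketched below in C++. This is an assumption-laden illustration and not the formulation of [47]: the `Quantizer` type and its fixed, uniform quantum are hypothetical, and refinements such as hysteresis are omitted. A message is produced only when the signal moves to a different quantum level.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical uniform quantizer: an update is sent only when the signal
// leaves its current quantum level.
struct Quantizer {
    double quantum;  // size of each quantum level
    long level;      // level of the last value that was reported

    Quantizer(double q, double initial)
        : quantum(q), level(levelOf(q, initial)) {
        assert(q > 0.0);
    }

    static long levelOf(double q, double value) {
        return static_cast<long>(std::floor(value / q));
    }

    // Returns true when the new value requires a message to be sent.
    bool update(double value) {
        long newLevel = levelOf(quantum, value);
        if (newLevel == level) return false;  // same level: no message
        level = newLevel;
        return true;
    }
};

// Counts how many messages a sequence of samples would generate.
int countMessages(Quantizer q, const std::vector<double>& samples) {
    int messages = 0;
    for (double s : samples)
        if (q.update(s)) ++messages;
    return messages;
}
```

With a larger quantum fewer messages are sent, at the cost of a larger difference between the reported and the actual value, which is the error trade-off noted in the text.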

Considering these advantages, we have designed and implemented our simulated computer, called Alfa-1, using the CD++ development environment [48]. This toolkit implements the theoretical concepts defined by the DEVS formalism. Atomic models can be programmed in C++ and later incorporated into a model class hierarchy. A specification language allows coupled models to be defined.

Alfa-1 was developed to model the architecture of a SPARC processor and includes memory, a bus, and an input/output subsystem. If we compare our proposal with the ones described in Section 1.1, Alfa-1 was developed as a specific purpose architecture. Nevertheless, as all the components of the architecture have been developed independently, they can be used to define new components or other architectures. Several versions of each element have been developed using different abstraction levels. As the models have been developed using DEVS, they can be reused without further complication.

We have chosen the SPARC architecture because these processors include several interesting features (for instance, multiple registers organized as overlapping windows) that cannot be found in other architectures. Many existing workstations are based on this processor, and they are usually expensive. As we have reproduced the complete architecture, we have provided a way of running SPARC applications on any other platform that has a GNU C++ compiler.

In the following sections, we present some of the results obtained. We first include a definition of the underlying architecture. Then, the specification of some of the DEVS submodels is presented, exemplifying the definition of each model using CD++. Finally, we show some execution results.

As explained in the conclusions, we encountered none of the problems associated with other tools, and all of our goals were achieved. An important remark is that the approach proved to accomplish our educational goals because undergraduate students developed the whole architecture and its components. They had taken a previous course on computer programming in C++ and had a background in mathematics. The definition of the formal models was performed by third-year students from a discrete event simulation course, and these formal specifications were used by students in the second-year computer organization course to build all the models that are described in the following sections.

# 2. A Model of the Processor Architecture

Alfa-1 uses a processor organization based on the specification of the Integer Unit of the SPARC processor (Sun Microsystems). Figure 1 is a sketch of this architecture showing the main components of the model developed. This figure presents the main subcomponents of the Integer Unit, which were defined as the components of a DEVS coupled model. Each of the components was defined as an atomic or a coupled model, specified by using DEVS.

This RISC processor is provided with 520 integer registers. Eight of them are global (RegGlob, shared by every procedure), and the remaining 512 are divided into windows of 24 registers each (RegBlock).



Figure 1. Organization of the Integer Unit

Each window includes input, output, and local registers for every procedure that has been executed recently. When a routine begins, 16 new registers are reserved (8 local and 8 output), and the 8 output registers of the calling procedure are used as inputs. A specialized 5-bit register, called CWP (Current Window Pointer), marks the active window. Every time a new procedure starts, CWP is decremented. The organization of the processor's registers is sketched in Figure 2.
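The overlapping of windows can be made concrete with a small C++ sketch that maps a logical register number to a position in the 520-register file. The logical numbering (r0 to r7 global, r8 to r15 output, r16 to r23 local, r24 to r31 input) follows the usual SPARC convention; the physical layout used here (globals first, then 16 registers per window) and the names `physicalIndex` and `onCall` are assumptions made for illustration, not a description of the Alfa-1 models.

```cpp
#include <cassert>

constexpr int NUM_WINDOWS = 32;  // 512 windowed registers / 16 new per window
constexpr int NUM_GLOBALS = 8;

// Maps logical register r (0..31) of window cwp to a physical index.
// Assumed layout: [0..7] globals, then for each window w:
// 8 output registers followed by 8 local registers.
int physicalIndex(int cwp, int r) {
    assert(cwp >= 0 && cwp < NUM_WINDOWS && r >= 0 && r < 32);
    if (r < 8) return r;  // global registers, shared by every procedure
    if (r < 24)           // outputs (r8..r15) and locals (r16..r23)
        return NUM_GLOBALS + 16 * cwp + (r - 8);
    // Inputs (r24..r31) are the outputs of the calling procedure, whose
    // window is cwp + 1 because CWP is decremented on every call.
    int caller = (cwp + 1) % NUM_WINDOWS;
    return NUM_GLOBALS + 16 * caller + (r - 24);
}

// New CWP when a procedure starts: decrement modulo the number of windows.
int onCall(int cwp) { return (cwp + NUM_WINDOWS - 1) % NUM_WINDOWS; }
```

Under this layout, the callee's first input register and the caller's first output register resolve to the same physical position, so parameters are passed without copying.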

Besides these general purpose registers, the architecture includes:



Figure 2. Organization of the Processor's Registers

• **PCs**: The processor has two program counters. The PC contains the address of the next instruction. The nPC (next PC) stores the address of the PC after the execution of the present instruction. Each instruction cycle finishes by copying the nPC to the PC, and adding four bytes (one word) to the nPC. If the instruction is a conditional branch, nPC is assigned to PC, and nPC is updated with the jump address (if the jump condition is valid).

• **Y**: This is used by the multiplication and division operations.

• **BASE** and **SIZE**: The memory is considered flat (that is, neither segmentation nor paging mechanisms are included). Likewise, multiprogramming is not supported. The BASE register points to the lowest address a program can access. The SIZE register stores the maximum size available for the program.

• **PSR** (Processor Status Register): This stores the current status for the program. It is interpreted in Table 1.

• **WIM** (Window Invalid Mask): This 32-bit register (one bit per window) is used to avoid overwriting a window in use by another procedure. When CWP is decremented, the processor's circuits verify whether the WIM bit is active for the new window. In that case, an interrupt is raised and the interrupt service routine stores the content of the window in memory. Usually, WIM has only one bit set to 1, marking the oldest window.

• **TBR** (Trap Base Register): It points to the memory address storing the position of a trap routine. It is interpreted in Table 2.

The first 20 bits (Trap Base Address) store the base address of the trap table. When an interrupt request is received, the number of the trap to be serviced is stored in bits 11..4. Therefore, the TBR points to the table position containing the address of the service routine. The last 4 bits, fixed at 0, guarantee at least 16 bytes to store each routine.

| Bits   | Content                         | Description                                             |
|--------|---------------------------------|---------------------------------------------------------|
| 31..24 | Reserved                        |                                                         |
| 23     | N – Negative                    | 1 when the result of the last operation is negative     |
| 22     | Z – Zero                        | 1 when the result of the last operation is zero         |
| 21     | V – Overflow                    | 1 when the result of the last operation is overflow     |
| 20     | C – Carry                       | 1 when the result of the last operation carried one bit |
| 19..12 | Reserved                        |                                                         |
| 11..8  | PIL – Processor Interrupt Level | Lowest interrupt number to be serviced.                 |
| 7      | S – State                       | 1 = Kernel mode; 0 = User mode.                         |
| 6      | PS – Previous State             | Last mode.                                              |
| 5      | ET – Enable Trap                | 1 = Traps enabled; 0 = Traps disabled.                  |
| 4..0   | CWP – Current Window Pointer    | Points to the current register window.                  |

Table 1. Contents of the Processor Status Register

| Bits   | Content           | Description                    |
|--------|-------------------|--------------------------------|
| 31..12 | Trap base address | Base address of the Trap table |
| 11..4  | Trap Type         | Trap to be serviced            |
| 3..0   | Constant (0000)   |                                |

Table 2. Contents of the Trap Base Register
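The TBR layout amounts to a small address computation. The following sketch illustrates it under stated assumptions: the helper names (`make_tbr`, `trap_type_of`, `trap_base_of`) are hypothetical and are not part of Alfa-1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating the TBR layout:
// bits 31..12 = trap base address, bits 11..4 = trap type, bits 3..0 = 0000.

// Insert an 8-bit trap type into a TBR value, keeping the 20-bit base.
inline std::uint32_t make_tbr(std::uint32_t tbr, std::uint32_t trap_type) {
    assert(trap_type <= 0xFFu);  // the trap type field is 8 bits wide
    return (tbr & 0xFFFFF000u) | (trap_type << 4);
}

// Extract the trap type (bits 11..4).
inline std::uint32_t trap_type_of(std::uint32_t tbr) {
    return (tbr >> 4) & 0xFFu;
}

// Extract the base address of the trap table (bits 31..12, low bits cleared).
inline std::uint32_t trap_base_of(std::uint32_t tbr) {
    return tbr & 0xFFFFF000u;
}
```

Because the low four bits are always zero, two consecutive trap types yield addresses that are 16 bytes apart, which is the space reserved for each entry.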

When the instruction set level of the SPARC architecture is analyzed, we see that each instruction has a fixed size of 32 bits. Memory operands may be 8, 16, or 32 bits. There are basic *Load/Store* operations, classified according to the size and sign of their operands.

Arithmetic and Boolean operations include *add*, *and*, *or*, *div*, *mul*, *xor*, *xnor*, and *shift*. These are able to change the PSR according to the operation code used. Several *jump* instructions are available, including *relative* jumps, *absolute* jumps, *traps*, *calls*, and *return* from traps. Other instructions include the movement of the register window, NOPs, and read/write operations on the PSR.



Figure 3. Organization of the ALU

*Multiplication* uses 32-bit operands, producing 64-bit results. The most significant 32 bits are stored in the Y register, and the remaining are stored in the ALU-RES register. Integer *division* operations take a 64-bit dividend and a 32-bit divisor, producing a 32-bit result. The Y register stores the 32 most significant bits of the dividend. One ALU input register stores the least significant bits of the dividend, and the other, the divisor. The integer result is stored in the ALU-RES register, and the remainder in the Y register. Most instructions are carried out by the ALU, whose structure is depicted in Figure 3. It includes two multiplexers connected to the ALU, Multiplier/Divider unit, and shifter.
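The register usage of the multiplication and division operations can be sketched as follows. This is only an illustration: it assumes unsigned operands, ignores quotient overflow, and uses a hypothetical `MulDivResult` record whose fields mirror the Y and ALU-RES registers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result record; the fields mirror the Y and ALU-RES registers.
struct MulDivResult {
    std::uint32_t y;        // Y register
    std::uint32_t alu_res;  // ALU-RES register
};

// 32 x 32 -> 64-bit product: the high word goes to Y, the low word to ALU-RES.
inline MulDivResult multiply(std::uint32_t a, std::uint32_t b) {
    const std::uint64_t product = static_cast<std::uint64_t>(a) * b;
    return { static_cast<std::uint32_t>(product >> 32),
             static_cast<std::uint32_t>(product) };
}

// 64 / 32 division: on entry Y holds the high word of the dividend.
// On exit ALU-RES holds the quotient (truncated to 32 bits) and Y the remainder.
inline MulDivResult divide(std::uint32_t y, std::uint32_t low,
                           std::uint32_t divisor) {
    assert(divisor != 0);  // the processor would raise DIV_ZERO instead
    const std::uint64_t dividend =
        (static_cast<std::uint64_t>(y) << 32) | low;
    return { static_cast<std::uint32_t>(dividend % divisor),
             static_cast<std::uint32_t>(dividend / divisor) };
}
```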

There are two execution modes: *User* and *Kernel*. Certain instructions can be executed only in Kernel mode. Also, the Base and Size registers are used only when the program is running in User mode.

The CPU executes under the supervision of the **Control Unit**. It receives signals from the rest of the processor using 64 input bits (organized in 5 groups: the Instruction Register, the PSR, BUS\_BUSY\_IN, BUS\_DACK\_IN, and BUS\_ERR). Its outputs are sent using 70 lines organized in 59 groups. Their functions include reading/writing internal registers and activating lines for the ALU or the multiplexers. Also, connections with the PC, nPC, Trap controller, and PSR registers are included. Finally, the Data, Address, and Control buses can be accessed.

The **memory** is organized using byte addressing and Little-Endian to store words. The processor issues a memory access operation by writing an address (and data, if needed) in the bus. Then, it turns on the AS (Address Strobe) signal, interpreted by the memory as an order to start the operation. The memory uses the address available and analyzes the RD\_WR line to see which operation was requested. If a *read* was issued, one word (4 bytes) is taken from the specified address and sent through the data lines. In a *write* operation, the value stored in the Byte Select register (lines BSEL0..3) defines the byte to be accessed in the word pointed to by the Address register. If an address is wrong, the ERR line is turned on. A Data Acknowledge (DTACK) is sent when the operation is finished.

Figure 4. Organization of the Bus
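The read and write behavior of the memory can be illustrated with a minimal sketch. The class `WordMemory` and its members are hypothetical, and two points are assumptions rather than facts from the Alfa-1 specification: bit *i* of the byte-select mask enables byte *i* of the word, and a "wrong" address is one that is unaligned or out of range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical memory model; names are illustrative only.
class WordMemory {
public:
    explicit WordMemory(std::size_t bytes) : cells_(bytes, 0) {
        assert(bytes % 4 == 0);  // the memory holds whole words
    }

    // Read one little-endian word. Returns false (ERR) on a wrong address.
    bool read(std::uint32_t address, std::uint32_t &data) const {
        if (!valid(address)) return false;
        data = 0;
        for (int i = 0; i < 4; ++i)
            data |= static_cast<std::uint32_t>(cells_[address + i]) << (8 * i);
        return true;
    }

    // Write the bytes of `data` enabled by the 4-bit mask `bsel`.
    bool write(std::uint32_t address, std::uint32_t data, unsigned bsel) {
        if (!valid(address)) return false;
        for (int i = 0; i < 4; ++i)
            if (bsel & (1u << i))
                cells_[address + i] = static_cast<std::uint8_t>(data >> (8 * i));
        return true;
    }

private:
    // Assumption: accesses must be word aligned and inside the array.
    bool valid(std::uint32_t address) const {
        return address % 4 == 0 && cells_.size() >= 4 &&
               address <= cells_.size() - 4;
    }
    std::vector<std::uint8_t> cells_;
};
```

In the DEVS model, the boolean result would drive the ERR line and DTACK would be sent after the memory latency; the sketch leaves timing out.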

The system components are interconnected using a bus (see Figure 4). Each bus master uses the BGRANT (Bus Grant) and IACK (IRQ Acknowledgment) lines to connect to the two devices with the immediately lower and higher priorities. The device with the highest priority is connected to a constant "1" signal in the BGRANT line. The BGRANT signal is passed on to the lower priority devices until it reaches a device that requested the bus. When the device finishes the transfer, an IACK is transmitted. Input/output operations are memory mapped. Each device has a fixed set of addresses. Data written in those addresses are interpreted as instructions for a device. Fifteen IRQ lines (IRQ1..IRQ15) are provided, and devices are connected to these lines. Higher priority devices are connected to lower IRQs.
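The propagation of the BGRANT signal along the chain of devices can be sketched as below. The function names are hypothetical, and the sketch assumes that index 0 is the highest-priority device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical daisy-chain sketch. requests[i] is true when device i asked
// for the bus. Returns the BGRANT input seen by each device.
inline std::vector<bool> propagate_bgrant(const std::vector<bool> &requests) {
    std::vector<bool> bgrant_in(requests.size(), false);
    bool signal = true;  // constant "1" fed to the highest-priority device
    for (std::size_t i = 0; i < requests.size(); ++i) {
        bgrant_in[i] = signal;
        // A requesting device keeps the grant and passes 0 down the chain.
        if (signal && requests[i]) signal = false;
    }
    return bgrant_in;
}

// Index of the device that obtains the bus, or -1 if nobody requested it.
inline int bus_owner(const std::vector<bool> &requests) {
    const std::vector<bool> bgrant_in = propagate_bgrant(requests);
    assert(bgrant_in.size() == requests.size());
    for (std::size_t i = 0; i < requests.size(); ++i)
        if (bgrant_in[i] && requests[i]) return static_cast<int>(i);
    return -1;
}
```

With requests from devices 1 and 2, device 1 sees a grant and keeps it, so device 2 sees a 0 on its BGRANT input and must wait.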

Finally, an external cache memory is defined. The generic structure for the cache controller is defined in Figure 5. The design and implementation of these modules were not included in the original version of Alfa-1. They were defined as an assignment done by undergraduate students, following the procedures that will be presented in the following sections. In a first stage, the circuits were tested separately, different algorithms were implemented, and finally the device was integrated into the architecture. Each model was developed as a DEVS model that was integrated into a coupled model. This extension to the original architecture (which will not be explained in detail) shows some of the capabilities for extensibility and modifiability of Alfa-1.

# 3. Implementing the Architecture as DEVS Models

The architecture presented in the previous section was completely implemented using CD++. First, the behavior of each component was carefully specified, with an analysis of inputs, outputs and timing for each element. The specification also provided test cases. Then, each component was defined as a DEVS model following the specification. Afterwards, each model was implemented in CD++, including an experimental framework following the test cases defined in the specification. Finally, the main model was built as a coupled model connecting all the submodels previously defined. This model follows the design presented in Figure 1, and its detailed definition can be found in [49].

Figure 5. Organization of the Cache Memory

Two implementations were considered. First, we reproduced the basic behavior of each circuit, coded as transition functions. Then, some of them were implemented in detail using Boolean logic. The basic building blocks were developed as atomic models, coupling them using digital logic concepts. In this way, two different abstraction levels were provided. Depending on the interest, each of them can be used. Once thoroughly tested, the basic models were integrated into higher level modules up to completing the definition of the architecture. The following sections will be devoted to presenting some of the components implemented as assignments completed by our students. We show how different abstraction levels can be modeled and present examples of modifiability of Alfa-1.

## 3.1 Inc/Dec

As explained earlier, we use 520 general purpose registers organized as overlapped windows. At any given time, only one window can be active. The *Inc/Dec* model is the component that chooses the active window using a 5-bit CWP register. The models that are part of the CWP logic are shown in Figure 2. The *CWP* is incremented or decremented, and its value (stored in a d-latch represented as another DEVS model) is received through the lines OP0-OP4. The outputs are transmitted through the lines RES0-RES4. This atomic model can be defined as:

**INC/DEC** =  $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$ 

$$\begin{split} \mathbf{X} &= \text{OP} \in \{0, ..., 2^{5} - 1\} \cup \text{FCOD} \in \{0, 1\}; \\ \mathbf{S} &= \text{OP}, \text{ OLD} \in \{0, ..., 2^{5} - 1\}, \text{ delay\_time\_ID} \in \mathbf{R}_0^+; \\ \mathbf{Y} &= \text{RES} \in \{0, ..., 2^{5} - 1\}; \end{split}$$

The behavior for the transition functions can be informally defined as shown in Figure 6.

The FCOD value is used to tell if the value must be incremented or decremented. The ALU model is used to perform this operation. Here, we can see that when an external event arrives, the *hold\_in* function is activated. This macro represents the behavior of the DEVS time advance function (D), and it is in charge of manipulating the *sigma* variable. This is a state variable predefined for every DEVS model representing the remaining time up to the next scheduled internal event. The model will remain in the current state during this time, after which output and internal transition functions are activated. The *hold\_in* macro makes this timing definition easier. Passivate is another macro, which uses an infinite *sigma*, and puts the model in *passive* phase (*hold\_in(passive, infinite)*).
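The bookkeeping performed by *hold\_in* and *passivate* can be illustrated with a small stand-alone sketch. The `AtomicState` structure below is hypothetical and is not the CD++ `Atomic` class; it only shows how the two macros relate the phase, *sigma*, and the time advance function.

```cpp
#include <cassert>
#include <limits>
#include <string>

// Hypothetical stand-in for the phase/sigma bookkeeping of an atomic model.
struct AtomicState {
    std::string phase = "passive";
    double sigma = std::numeric_limits<double>::infinity();

    // hold_in(phase, delay): remain in `phase` for `delay` time units,
    // after which the output and internal transition functions fire.
    void hold_in(const std::string &new_phase, double delay) {
        assert(delay >= 0.0);
        phase = new_phase;
        sigma = delay;
    }

    // passivate(): no internal event is scheduled; wait for an input.
    void passivate() {
        hold_in("passive", std::numeric_limits<double>::infinity());
    }

    // Time advance function D(s): time left until the next internal event.
    double time_advance() const { return sigma; }
};
```

An external transition would typically end with `hold_in("active", delay)`, and the internal transition with `passivate()`, which is the pattern followed by the models in this section.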

Figure 7 shows the implementation of these functions using CD++. As we can see, the *external transition* function ( $\delta_{ext}$ ) receives five operands as inputs, together with a function code. According to this code, the parameter is incremented or decremented. Afterwards, the model keeps the present value during a delay related to the circuit operation. The *output* function ( $\lambda$ ) is activated, and if the circuit changed its state, the present value is transmitted. Then, the internal transition function ( $\delta_{int}$ ) passivates the model (that is, an internal event with infinite delay is scheduled, waiting for the next input). The constructor allows for specification of the model's name, input/output ports, and parameters.

As we can see, defining a DEVS atomic model is simpler than programming the same simulation from scratch in a standard programming language. We have explained some of the advantages of using DEVS in Section 1, but in this case, we can see how to apply it in building our models. DEVS provides an interface consisting of only four functions to be programmed. This modular definition is independent of the simulator, and it is repeated for every model. Therefore, one can focus on the model development. The user concentrates only on the behavior under external events, the outputs that must be sent to other submodels, and the occurrence of internal events. Behavior for every model is encapsulated in these functions, together with the elapsed time definition. Testing patterns can be easily created, as the model can only activate these functions.

Once we have defined the atomic model, we can test it by injecting input values and inspecting the outputs. An experimental frame can be built, including pairs of input/output values to test the model automatically. In any case, we have to

Figure 6. Behavior of the transition functions for the INC/DEC model [50]

```
IncDec::IncDec( const string &name )
: Atomic( name )
, OP0( this->addInputPort( "OP0" ) ), OP1( this->addInputPort( "OP1" ) )
, OP2( this->addInputPort( "OP2" ) ), OP3( this->addInputPort( "OP3" ) )
, OP4( this->addInputPort( "OP4" ) ), FCOD( this->addInputPort( "FCOD" ) )
, RES0( this->addOutputPort( "RES0" ) ), RES1( this->addOutputPort( "RES1" ) )
, RES2( this->addOutputPort( "RES2" ) ), RES3( this->addOutputPort( "RES3" ) )
, RES4( this->addOutputPort( "RES4" ) ), preparationTime( 0, 0, 10, 0 )
{
    // The default delay can be overridden by the "preparation" parameter
    string time( MainSimulator::Instance().getParameter(
                     this->description(), "preparation" ) );
    if( time != "" ) preparationTime = time;
}

Model &IncDec::externalFunction( const ExternalMessage &msg ) {
    // Check the input ports, assigning the input values.
    if( msg.port() == OP0 ) _OP[0] = (int) msg.value();
    if( msg.port() == OP1 ) _OP[1] = (int) msg.value();
    if( msg.port() == OP2 ) _OP[2] = (int) msg.value();
    if( msg.port() == OP3 ) _OP[3] = (int) msg.value();
    if( msg.port() == OP4 ) _OP[4] = (int) msg.value();
    if( msg.port() == FCOD ) _FCOD = (int) msg.value();

    if( _FCOD == 1 ) {    // Increment
        for( int i = 0; i <= 4; i++ ) v[4-i] = _OP[i];
        alu.activate( v, "00000", "11", '1' );  // Increment the v value using the ALU
        alu.output( res );
        for( int i = 0; i <= 4; i++ )
            _RES[i].activate( res[i] );
    }
    else {                // Decrement
        for( int i = 0; i <= 4; i++ ) v[4-i] = _OP[i];
        alu.activate( v, "11111", "11", '0' );  // Decrement the v value using the ALU
        alu.output( res );
        for( int i = 0; i <= 4; i++ )
            _RES[i].activate( res[i] );
    }
    this->holdIn( active, preparationTime );    // Schedule a delay for the circuit
    return *this;
}

Model &IncDec::internalFunction( const InternalMessage & ) {
    this->passivate();    // The delay was consumed; wait for the next input
    return *this;
}

Model &IncDec::outputFunction( const InternalMessage &msg ) {
    // Transmit the result only if it differs from the last value sent
    if( _RES[0] != _OLD[0] || _RES[1] != _OLD[1] || _RES[2] != _OLD[2] ||
        _RES[3] != _OLD[3] || _RES[4] != _OLD[4] ) {
        sendOutput( msg.time(), RES0, _RES[0] );
        sendOutput( msg.time(), RES1, _RES[1] );
        sendOutput( msg.time(), RES2, _RES[2] );
        sendOutput( msg.time(), RES3, _RES[3] );
        sendOutput( msg.time(), RES4, _RES[4] );
        _OLD[0] = _RES[0]; _OLD[1] = _RES[1];
        _OLD[2] = _RES[2]; _OLD[3] = _RES[3];
        _OLD[4] = _RES[4];
    }
    return *this;
}
```

Figure 7. INC/DEC model definition: Transition functions [50]

| [top]                          |
|--------------------------------|
| components : I_D@IncDec        |
| out : RES0 RES1 RES2 RES3 RES4 |
| in : OP0 OP1 OP2 OP3 OP4 FCOD  |
| Link : OP0@top OP0@I_D         |
| Link : OP1@top OP1@I_D         |
| Link : OP2@top OP2@I_D         |
| Link : OP3@top OP3@I_D         |
| Link : OP4@top OP4@I_D         |
| Link : FCOD@top FCOD@I_D       |
| Link : RES0@I_D RES0@top       |
| Link : RES1@I_D RES1@top       |
| Link : RES2@I_D RES2@top       |
| Link : RES3@I_D RES3@top       |
| Link : RES4@I_D RES4@top       |
|                                |
| [I_D]                          |
| preparation : 0:0:5:0          |

Figure 8. INC/DEC coupled model definition [50]

build a coupled model including the model to be tested. This is defined in Figure 8.

These definitions follow the DEVS specifications. A coupled model is defined by its *components* (in this case, I\_D, an instance of the IncDec model) and external parameters. Then, the *links* define the influencees and translation functions, including the input/output ports for the model. In this case, the I\_D model is connected to the Top model, using the input/output ports defined earlier.

## 3.2 RegGlob

This model defines the behavior of the global registers. It keeps the contents of the eight global registers, allowing read/write operations on them. Two auxiliary state variables, *olda* and *oldb*, store the last outputs, and output signals are transmitted only for the bits that changed. This model is defined by:

**RegGlob** = < X, S, Y,  $\delta_{int}$ ,  $\delta_{ext}$ ,  $\lambda$ , D >

 $\begin{aligned} \mathbf{X} &= ASEL \in \{0, ..., 2^{3}-1\} \cup BSEL \in \{0, ..., 2^{3}-1\} \cup CSEL \in \{0, ..., 2^{3}-1\} \cup CEN \in \{0, 1\} \cup RESET \in \{0, 1\} \cup CIN \in \{0, ..., 2^{32}-1\}; \end{aligned}$ 

**Y** = AOUT ∈ {0,..., $2^{32}-1$} ∪ BOUT ∈ {0,..., $2^{32}-1$};

**S** = OLDA, OLDB, INPUT ∈ {0,..., $2^{32}-1$}, IN ∈ $\{0,..., 2^{32}-1\}^{8}$, BCEN, BRESET ∈ {0,1}, SELECTA, SELECTB, SELECTC ∈ {0,..., $2^{32}-1$}, delay\_time\_RG ∈ $\mathbf{R}_0^+$;

A sketch of this model was shown in Figure 2. As we can see, it uses three select lines (*asel, bsel,* and *csel*) to choose two output registers and a register to be modified. An array of 32 integers (*IN*) keeps the present values of the registers. The Boolean line *cen* (C enable line) is used to allow write operations. The external transition function models the reception of an input. The function stores the desired operation according to the signal received. We also store the input value and the numbers of the registers to be activated. A new internal event is scheduled with a predefined delay, which models the circuit delay. If an external event arrives before the end of the delay, the operation is cancelled.

The output function decides if the register has changed, querying *olda* and *oldb*, which store the previous status of the A and B lines. When the register changes, its value is sent through the chosen output (A or B). This model shows a more interesting use of the internal transition function. In this case, we are considering the internal state to decide how the model must react. The internal transition function sees if the *reset* line has been activated. In that case, it clears the contents of every register. Then, if the *cen* line was activated, the value of the chosen register is updated with the new input.

## 3.3 Other Basic Components

The architectural description is completed with several other DEVS models. We include generic aspects, making a brief description of their behavior. We do not include the definition of the model's transition functions, built as in the previous examples. Details of these models can be found in [49].

**WIMCheck** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = CWP ∈ {0,..., $2^{5}-1$} ∪ WIM ∈ {0,..., $2^{32}-1$};  
**S** = dlLastRES, RES ∈ {0,..., $2^{5}-1$} ∪ delay\_time\_WC ∈ $\mathbf{R}_0^+$;  
**Y** = RES ∈ {0, 1};

This circuit checks whether the next window to be used will be overwritten. The component consists of a Window Invalid Mask register. It returns the value of the CWP-th bit of the WIM register.
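The check performed by this circuit reduces to a single bit extraction. A minimal sketch, with a hypothetical function name, could be:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the bit of WIM selected by CWP
// (1 means that the window is marked as invalid).
inline unsigned wim_check(std::uint32_t wim, unsigned cwp) {
    assert(cwp < 32);  // CWP is a 5-bit value
    return (wim >> cwp) & 1u;
}
```

For example, with only bit 2 of the WIM set, moving the CWP to window 2 returns 1, which is the condition that leads to the window interrupt described earlier.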

**MEMORY** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = DATA ∈ {0,..., $2^{32}-1$} ∪ ADDRESS ∈ {0,..., $2^{31}-1$} ∪ ADDRESS\_STROBE ∈ {0,1} ∪ BSEL ∈ {0,..., $2^{4}-1$} ∪ RD\_WR ∈ {0,1} ∪ RESET ∈ {0, 1};

**S** = memory: array(memsize: default 32 Kb), delay\_time\_M ∈ $\mathbf{R}_0^+$;

**Y** = DATA ∈ {0,..., $2^{32}-1$} ∪ DTACK ∈ {0, 1} ∪ ERR ∈ {0, 1}.

The memory is provided with three basic operations: read, write, and reset. When a reset is issued, the initial memory image is loaded. The processor writes an address in the bus and signals the memory using the AS signal when the address is ready. Then, a read/write signal is issued. The memory reacts according to this signal, producing an output after a delay that models the memory latency.

**ADDER** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

 $\begin{aligned} \mathbf{X} &= \text{OPA, OPB} \in \{0, ..., 2^{32} - 1\};\\ \mathbf{S} &= \text{delay\_time\_A} \in \mathbf{R}_0^+;\\ \mathbf{Y} &= \text{RES} \in \{0, ..., 2^{32} - 1\} \cup \text{CARRY} \in \{0, 1\}. \end{aligned}$ 

The adder receives two inputs. Depending on the result, the Carry bit can be turned on.

**ALIGNL/ALIGNS** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

$$\begin{split} \mathbf{X} &= \mathsf{OP} \in \{0, ..., 2^{32} - 1\} \cup \mathsf{SIZE} \in \{0, ..., 3\} \cup \mathsf{KIND} \in \{0, ..., 3\} \cup \mathsf{SIGN} \in \{0, 1\}; \\ \mathbf{S} &= \mathsf{delay\_time\_AL} \in \mathbf{R}_0^+; \\ \mathbf{Y} &= \mathsf{RES} \in \{0, ..., 2^{32} - 1\}. \end{split}$$

These models are used to align data read/written during the Load/Store operations.

**ALU** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = OPA, OPB ∈ {0,..., $2^{32}-1$} ∪ FCOD ∈ {0,..., $2^{4}-1$} ∪ CIN ∈ {0, 1};  
**S** = delay\_time\_ALU ∈ $\mathbf{R}_0^+$;  
**Y** = RES ∈ {0,..., $2^{32}-1$} ∪ CARRY ∈ {0, 1} ∪ ZERO ∈ {0, 1} ∪ NEGAT ∈ {0, 1} ∪ OVFLW ∈ {0, 1}.

This model represents the behavior of the integer Arithmetic-Logic Unit. It is capable of executing the following operations: add, sub, addx, subx (add/sub with carry), and, or, xor, andn, orn, xnor (negated and, or, xor).
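As an illustration of how the output set of the ALU relates to one of its operations, the following sketch computes an addition with carry-in together with the four condition outputs. The `AluOut` record and the `alu_add` function are hypothetical names, and the flag definitions follow the usual two's complement conventions, which is an assumption about the model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record mirroring the ALU outputs RES, CARRY, ZERO, NEGAT, OVFLW.
struct AluOut {
    std::uint32_t res;
    bool carry, zero, negat, ovflw;
};

// 32-bit addition with carry-in (covers add and addx).
inline AluOut alu_add(std::uint32_t a, std::uint32_t b, bool cin) {
    const std::uint64_t wide =
        static_cast<std::uint64_t>(a) + b + (cin ? 1u : 0u);
    AluOut out;
    out.res   = static_cast<std::uint32_t>(wide);
    out.carry = (wide >> 32) != 0;         // carry out of bit 31
    out.zero  = out.res == 0;
    out.negat = (out.res >> 31) != 0;      // sign bit of the result
    // Signed overflow: both operands share a sign that the result lacks.
    out.ovflw = ((~(a ^ b) & (a ^ out.res)) >> 31) != 0;
    return out;
}
```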

**BOOLEAN GATE** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

 $\mathbf{X} = \text{OP1, OP2} \in \{0,1\};$   $\mathbf{S} = \text{delay\_time\_BG} \in \mathbf{R}_0^+;$  $\mathbf{Y} = \text{RES} \in \{0,1\}.$ 

This group of models was included to provide the behavior of the most used Boolean gates: AND, OR, NOT, and XOR. They receive binary inputs, producing a result according to the desired operation.

**BUS** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = **Y** = DATA, ADDRESS ∈ {0,..., $2^{32}-1$} ∪ BSEL ∈ {0,..., $2^{4}-1$} ∪ IRQ ∈ {0,..., $2^{15}-1$} ∪ CLOCK, AS, RD/WR, DTACK, ERR, RESET, BUSY ∈ {0,1};  
**S** = delay\_time\_Bus ∈ $\mathbf{R}_0^+$.

The bus interprets each of the input signals, providing outputs related to them. If a device which received a 1 in the BGRANTin port needs to write data in the memory, it writes a 0 in the BGRANTout port (so that no lower priority device is able to use the bus). Then, the device starts a bus cycle, turning on the BUSY signal. The device writes the address to be accessed in the ADDRESS lines and the data to be written in DATA. Afterwards, the Byte Select Mask BSEL defines which byte in the word is used. Finally, it turns on the RD/WRout and AS lines to tell that a Write operation was issued. When the memory receives the AS signal, it executes a memory cycle that finishes when the DTACKout line is turned on. The device that issued the write operation receives this signal in its DTACKin line. When the cycle has finished, if BGRANTin is still 1, the device is able to transfer new data. Otherwise, it turns off the BUSY line, allowing a new bus operation by another device.

**CCLOGIC** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

$$\begin{split} \mathbf{X} &= \text{CARRY, ZERO, NEGAT, OVFLW} \in \{0,1\} \cup \text{COND} \in \{0,..., 2^{4}-1\};\\ \mathbf{S} &= \text{delay\_time\_CC} \in \mathbf{R}_0^+;\\ \mathbf{Y} &= \text{RES} \in \{0,1\}. \end{split}$$

This model is used in conditional jumps to decide if a branch must be executed.

**CLOCK** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = ∅;  
**S** = period ∈ $\mathbf{R}_0^+$;  
**Y** = RES ∈ {0,1}.

This represents the CPU clock, for which periods can be configured.

**CWPLOGIC** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

$$\begin{split} \mathbf{X} &= \text{CWP} \in \{0, ..., 2^{4} - 1\} \cup \text{SEL} \in \{0, ..., 2^{4} - 1\};\\ \mathbf{S} &= \text{delay\_time\_CWP} \in \mathbf{R}_{0}^{+};\\ \mathbf{Y} &= \text{GSEL} \in \{0, ..., 2^{3} - 1\} \cup \text{RSEL} \in \{0, ..., 2^{9} - 1\} \cup \text{R/G} \in \{0, 1\}. \end{split}$$

This model is used to determine if access to the global registers or the register window is required. It returns the kind of register (R/G: Register window or Global) and its number.

**INC4** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

 $\begin{aligned} \mathbf{X} &= \text{OP} \in \{0, ..., 2^{32} - 1\}; \\ \mathbf{S} &= \text{delay\_time\_INC} \in \mathbf{R}_0^+; \\ \mathbf{Y} &= \text{RES} \in \{0, ..., 2^{32} - 1\}. \end{aligned}$ 

This model updates the nPC, adding four bytes (one word) to it.

**IRQLOGIC** = < X, S, Y,  $\delta_{int}$ ,  $\delta_{ext}$ ,  $\lambda$ , D >

**X** = IRQ1,..., IRQ15, PIL0, ..., PIL3 ∈ {0,1};  
**S** = int\_latency ∈ $\mathbf{R}_0^+$;  
**Y** = TF ∈ {0,1} ∪ TT ∈ {0,..., $2^{8}-1$}.

```
Model &Regglob::externalFunction( const ExternalMessage &msg ) {
    // Here "name+i" denotes the i-th line of the port group "name".
    if( msg.port() == cen )   bcen   = (int) msg.value();      // C enable line turned on
    if( msg.port() == reset ) breset = (int) msg.value();      // Reset
    if( msg.port() == "cin+i" ) input[i] = (int) msg.value();  // Store the input lines
    if( msg.port() == "asel+i" ) {      // The i-th line of the A input was enabled
        selecta = msg.value();          // Store the register number received
    }
    if( msg.port() == "bsel+i" ) {      // The i-th line of the B input was enabled
        selectb = msg.value();          // Store the register number received
    }
    if( msg.port() == "csel+i" ) {      // The i-th line of the C input was enabled
        selectc = msg.value();          // Store the register number received
    }
    this->holdIn( active, delay );      // Schedule the circuit delay
    return *this;
}

Model &Regglob::internalFunction( const InternalMessage &msg ) {
    if( breset )                        // A reset signal was issued
        for( int i = 0; i < 256; i++ )  // The 8 registers (32 bits each) are cleared
            in[i] = 0;
    if( bcen )                          // The write line was enabled
        for( int i = 0; i < 32; i++ )
            in[(selectc*32)+i] = input[i];   // Update the desired register
    this->passivate();                  // Wait for the next external event
    return *this;
}

Model &Regglob::outputFunction( const InternalMessage &msg ) {
    for( int i = 0; i < 32; i++ ) {
        if( olda[i] != in[selecta*32+i] ) {  // The register has changed
            // Transmit it through the output line
            this->sendOutput( msg.time(), aout, in[selecta*32+i] );
            olda[i] = in[selecta*32+i];
        }
        if( oldb[i] != in[selectb*32+i] ) {  // The register has changed
            // Transmit it through the output line
            this->sendOutput( msg.time(), bout, in[selectb*32+i] );
            oldb[i] = in[selectb*32+i];
        }
    }
    return *this;
}
```

Figure 9. RegGlob model definition: Transition functions [50]

This model manages the actions that take place when an interrupt is received. The PIL (Processor Interrupt Level) lines mask the interrupts. If one or more IRQs whose numbers are greater than the PIL are received, the interrupt must be serviced. Then, we see which one has the highest priority, and the TF (Trap Found) bit is turned on. The TT (Trap Type) register is loaded according to the highest level interrupt.
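The masking step can be sketched as follows. The names are hypothetical, and the sketch makes one explicit assumption: "highest level" is read as the highest-numbered pending IRQ line above the PIL. If the opposite convention applies, the loop direction should be reversed. The mapping from the chosen level to the TT value is not given here, so the sketch only returns the level.

```cpp
#include <cassert>

// Hypothetical result: TF bit plus the IRQ level selected for service.
struct IrqDecision {
    bool trap_found;
    unsigned level;
};

// irq_lines: bit n (1..15) set means that IRQn is asserted.
// pil: 4-bit Processor Interrupt Level taken from the PSR.
inline IrqDecision irq_logic(unsigned irq_lines, unsigned pil) {
    assert(pil < 16);
    // Assumption: scan from the highest-numbered line down to PIL + 1.
    for (unsigned n = 15; n > pil; --n)
        if (irq_lines & (1u << n))
            return { true, n };
    return { false, 0 };  // every pending request is masked by the PIL
}
```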

**LATCH** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = IN ∈ {0,..., $2^{32}-1$} ∪ EIN, CLEAR ∈ {0,1};  
**S** = delay\_time\_LA ∈ $\mathbf{R}_0^+$;  
**Y** = OUT ∈ {0,..., $2^{32}-1$}.

This model represents a processor register, implemented as a d-latch. The EIN line enables inputs, and the CLEAR line resets the register to zero.

**MUL/DIV** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = OPA, OPB, YIN ∈ {0,..., $2^{32}-1$} ∪ FCOD ∈ {0,..., $2^{2}-1$};  
**S** = delay\_time\_MUL ∈ $\mathbf{R}_0^+$;  
**Y** = RES, YOUT ∈ {0,..., $2^{32}-1$} ∪ ZERO, NEGAT, OVFLW ∈ {0,1}.

This model is in charge of performing multiplications and divisions, and it turns on the condition bits.

**MUX/MUX4** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = OPA, OPB, OPC, OPD ∈ {0,...,  $2^{32}-1$ } ∪ SELA, SELB, SELC, SELD ∈ {0,1}; **S** = delay\_time\_MUX ∈  $R_0^+$ ; **Y** = OUT ∈ {0,..., $2^{32}-1$ }.

```
Model &UC::externalFunction( const ExternalMessage &msg ) {
    if( msg.port() == CLCK ) {
        if( !waitfmc ) {
            cycle = (cycle + 1) % numcycles;    // A clock tick has finished
            this->holdIn( active, 0 );
        }
        else this->passivate();                 // Still waiting for the memory
    } else if( msg.port() == DTACK ) {
        if( msg.value() == 1 ) waitfmc = 0;     // The memory transfer has ended
        this->passivate();
    } else if( msg.port() == CCLOGIC ) {
        cclogic = bit( msg.value() );
        this->passivate();
    } else {
        // Inputs for the Instruction Register or the PSR
        string portName;
        int portNum;
        nameNum( msg.port().name(), portName, portNum );
        if( portName == "ir" ) ir[portNum] = bit( msg.value() );
        else psr[portNum] = bit( msg.value() );
        this->passivate();
    }
    return *this;
}

Model &UC::internalFunction( const InternalMessage & ) {
    if( as.val ) waitfmc = 1;   // Address Strobe is up: wait for the memory
    this->passivate();
    return *this;
}

Model &UC::outputFunction( const InternalMessage &msg ) {
    . . .
    // See if the c_en line must be activated
    c_en.val = cycle == c_W && !isBranch( ir ) && !isStore( ir ) && !isWr( ir );
    // Read the Instruction Register and decode the instruction
    . . .
    if( isArit( ir ) ) {
        copyBits( ir+19, alu_fcod, 4 );
        enable_mul.val = isMulDiv( ir );
        enable_alu.val = isAlu( ir );
        enable_shft.val = isShft( ir );
    } else {
        enable_mul.val = 0;
        enable_alu.val = 1;
        enable_shft.val = 0;
        if( isWr( ir ) )
            toBits( ALU_XOR, alu_fcod, 4 );
        else
            toBits( ALU_ADD, alu_fcod, 4 );
    }
    // See branches
    if( isJmp( ir ) )
        incdec_fcod.val = INCDEC_INC;
    else
        incdec_fcod.val = ir[19];
    // Transmit the outputs
    for( int i = 0; i < numsports; i++ )
        if( needSend( *sportbits[i] ) )
            this->sendOutput( msg.time(), *sOUT[i], sportbits[i]->val );
    for( int i = 0; i < nummports; i++ )
        for( int j = 0; j < mportsizes[ i ]; j++ )
            if( needSend( mportbits[i][j] ) )
                this->sendOutput( msg.time(), *mOUT[i][j], mportbits[i][j].val );
    return *this;
}
```

These models represent 2- or 4-input multiplexers. To choose among the inputs, a 4-bit *select* signal is received, and the bit that is turned on indicates which input will be sent through the output.
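A minimal sketch of the 4-input selection follows. The function name is hypothetical, and two details are assumptions: bit 0 of the select value corresponds to SELA (input OPA), and the output is left unchanged when the select value does not have exactly one bit turned on.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical 4-input multiplexer with a one-hot select value.
// Returns true and updates `out` when exactly one select bit is on.
inline bool mux4(const std::array<std::uint32_t, 4> &op, unsigned sel,
                 std::uint32_t &out) {
    assert(sel < 16);  // four select lines: SELA, SELB, SELC, SELD
    for (unsigned i = 0; i < 4; ++i) {
        if (sel == (1u << i)) {
            out = op[i];
            return true;
        }
    }
    return false;  // no line, or more than one line, was turned on
}
```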

**REGBLOCK** = $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$

**X** = ASEL, BSEL, CSEL ∈ {0,..., $2^{9}-1$} ∪ CEN, RESET ∈ {0,1} ∪ CIN ∈ {0,..., $2^{32}-1$};  
**S** = delay\_time\_RBL ∈ $\mathbf{R}_0^+$;  
**Y** = AOUT, BOUT ∈ {0,..., $2^{32}-1$} ∪ ZERO, NEGAT, OVFLW ∈ {0,1}.

This model is in charge of managing the register window.

**SHIFTER** =  $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$ 

**X** = OPA, OPB ∈ {0,...,  $2^{32}$ -1} ∪ FCOD ∈ {0,1}; **S** = delay\_time\_SH ∈  $\mathbf{R}_0^+$ ; **Y** = RES ∈ {0,..., $2^{32}$ -1}.

This model is in charge of implementing a shifter.

**SIGNEXT13/SIGNEXT22** =  $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$ 

$\begin{aligned} \mathbf{X} &= \text{OP} \in \{0, ..., 2^{13} - 1\} \cup \{0, ..., 2^{22} - 1\}; \\ \mathbf{S} &= \text{delay\_time\_SE} \in \mathbf{R}_0^+; \\ \mathbf{Y} &= \text{RES} \in \{0, ..., 2^{32} - 1\}. \end{aligned}$

These models extend the sign of an operand of 13 or 22 bits to 32 bits.
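Sign extension can be written without branches, as in the following sketch. The function name `sign_extend` is hypothetical; calling it with 13 or 22 bits corresponds to the two models above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: extends the sign of a `bits`-wide operand to 32 bits.
inline std::uint32_t sign_extend(std::uint32_t op, unsigned bits) {
    assert(bits >= 1 && bits < 32);
    const std::uint32_t mask = (1u << bits) - 1u;   // keep the operand bits
    const std::uint32_t sign = 1u << (bits - 1);    // position of the sign bit
    op &= mask;
    // Flipping and then subtracting the sign bit replicates it upwards.
    return (op ^ sign) - sign;
}
```

For instance, the 13-bit value 0x1FFF (all ones, that is, -1) becomes 0xFFFFFFFF, while 0x0FFF is left unchanged.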

**TRAPLOGIC** =  $\langle X, S, Y, \delta_{int}, \delta_{ext}, \lambda, D \rangle$ 

**X** = TRAPS ∈ {0,..., $2^{17}-1$} ∪ TRAP\_NUMBER ∈ {0,..., $2^{6}-1$};  
**S** = delay\_time\_TL ∈ $\mathbf{R}_0^+$;  
**Y** = TRAP\_FOUND ∈ {0, 1} ∪ TRAP\_INSTRUCTION ∈ {0,..., $2^{31}-1$} ∪ TRAP\_TYPE ∈ {0,..., $2^{8}-1$};

This component defines which trap must be serviced based on a priority system. One of the input lines defines a non-masked trap. The other seven bits are used to receive the number of a trap that can be masked. The model returns a bit telling if the trap must be serviced, and eight bits telling the trap type. Table 3 shows the kinds and priorities for each trap available.

| Line           | Description                  | Priority | Trap Type |
|----------------|------------------------------|----------|-----------|
| INST_ACC_EXCEP | Instruction access exception | 5        | 0x01      |
| ILLEG_INST     | Illegal instruction          | 7        | 0x02      |
| PRIV_INST      | Privileged instruction       | 6        | 0x03      |
| WIN_OVER       | Window overflow              | 9        | 0x05      |
| WIN_UNDER      | Window underflow             | 9        | 0x06      |
| ADDR_NOT_ALIGN | Address not aligned          | 10       | 0x07      |
| DATA_ACC_EXCEP | Data access exception        | 13       | 0x09      |
| INST_ACC_ERR   | Instruction access error     | 3        | 0x21      |
| DATA_ACC_ERR   | Data access error            | 12       | 0x29      |
| DIV_ZERO       | Division by zero             | 15       | 0x2A      |
| DATA_ST_ERR    | Data store error             | 2        | 0x2B      |

Table 3. Available Traps

According to this table, the model determines which pending trap has the highest priority. After a delay, it sends the corresponding index through the output ports.
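The selection can be sketched as a small priority encoder. The sketch below copies the priorities and trap types from Table 3 and assumes that a lower priority number means a more urgent trap, which agrees with the TrapLogic test in Section 5, where Data Store Error (priority 2) wins when every line is raised. The function name, the tie-breaking rule for equal priorities, and the omission of the maskable trap instruction and of the circuit delay are assumptions of this illustration.

```python
# (priority, trap type) for each input line, copied from Table 3.
TRAPS = {
    "INST_ACC_EXCEP": (5, 0x01),
    "ILLEG_INST": (7, 0x02),
    "PRIV_INST": (6, 0x03),
    "WIN_OVER": (9, 0x05),
    "WIN_UNDER": (9, 0x06),
    "ADDR_NOT_ALIGN": (10, 0x07),
    "DATA_ACC_EXCEP": (13, 0x09),
    "INST_ACC_ERR": (3, 0x21),
    "DATA_ACC_ERR": (12, 0x29),
    "DIV_ZERO": (15, 0x2A),
    "DATA_ST_ERR": (2, 0x2B),
}


def select_trap(pending):
    """Return (trap_found, trap_type) for the raised trap lines.

    Assumption: the smallest priority number is serviced first; ties are
    broken by the smaller trap type, which the source does not specify.
    """
    candidates = [TRAPS[name] for name in pending]
    if not candidates:
        return 0, None
    _priority, trap_type = min(candidates)
    return 1, trap_type


if __name__ == "__main__":
    found, trap_type = select_trap(TRAPS)  # every line raised
    print(found, hex(trap_type))
```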

# 3.4 Control Unit

The Control Unit is in charge of driving the execution flow of the processor. As explained earlier, this model uses several input/output lines. According to the input received, it issues different outputs, activating the different circuits that were defined previously. Figure 10 describes part of the behavior of the Control Unit. The specification of the input/output sets is not included because of its size (details can be found in [49]).

As we can see, this model is activated by the occurrence of a clock tick. In this case, we check whether the Control Unit is waiting for a result coming from the memory (waitfmc). In that case, we have nothing to do and the model passivates. Otherwise, we register that a clock tick has finished. Other external inputs correspond to the signal DTACK coming from the memory or the CCLOGIC (that is, an input arriving from a register). We also recognize inputs for the Instruction Register (to store a new instruction to execute) or for the PSR (to update the condition codes). The internal transition function records that when the Address Strobe is up, we are waiting for the end of a memory transfer. The main tasks of the control unit are executed by the output function. As we can see in the description, the present input values are queried. Depending on the number of clock ticks in the instruction cycle, different output lines are activated.

# 4. The Digital Logic Level

The abstraction levels of several models were further detailed, allowing students to analyze the digital logic level of the circuits. In the previous stage, the behavior of these circuits was defined by using atomic models. In this case, some of these models were built using atomic models representing the basic Boolean gates (AND, OR, NOT, and XOR). These models (described in the previous section) were used as components that were integrated using digital logic. A coupled model representing the complete circuit replaced the old atomic ones. These modifications, also done in course assignments, show the extensibility and modifiability of Alfa-1. Two of the models implemented this way will be explained.

Figure 11. Sketch of the Address Unit

Figure 12. One-bit Comparator [50]

#### 4.1 CMP Model

The *CMP* is a part of the Address Unit that detects addresses falling out of the program boundaries. The model receives two inputs (through the lines *OPA* and *OPB* that are connected to the BASE and LIMIT registers). As a result, it returns the signal EQ if both values are equal or LW if A is lower than B.

The model is composed of several one-bit comparators, and coupling n of them generates n-bit comparators. Figure 12 shows the basic components of this building block.
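To make the idea of chaining one-bit comparators concrete, here is a minimal Python sketch. It uses the textbook gate equations EQ = NOT(A XOR B) and LW = NOT(A) AND B; the exact wiring of Figure 12 may differ, and the function names and the most-significant-bit-first ordering are assumptions of this illustration.

```python
def cmp_bit(a, b):
    """One-bit comparator: return (eq, lw) for the single bits a and b."""
    eq = 1 - (a ^ b)   # NOT(a XOR b): the bits are equal
    lw = (1 - a) & b   # NOT(a) AND b: a is lower than b
    return eq, lw


def cmp_word(a_bits, b_bits):
    """n-bit comparator built from n one-bit stages.

    Both operands are lists of bits, most significant bit first.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    eq, lw = 1, 0
    for a, b in zip(a_bits, b_bits):
        bit_eq, bit_lw = cmp_bit(a, b)
        # Only the first differing bit (all higher bits equal) decides LW.
        lw = lw | (eq & bit_lw)
        eq = eq & bit_eq
    return eq, lw


if __name__ == "__main__":
    print(cmp_word([0, 1, 0], [0, 1, 1]))  # 2 compared with 3
```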

This model is formally described by:

$CM = \langle X, Y, D, \{M_i\}, \{I_i\}, \{Z_{ij}\}, select \rangle$

**X** = { OPAn, OPBn / OPAn, OPBn ∈ {0,1} };

**Y** = { EQ, LW / EQ, LW ∈ {0,1} };

**D** = { NOT\_n\_1, NOT\_n\_2, XOR\_n, AND\_n\_1, AND\_n\_2 }, where each is an atomic model defining the corresponding building block, presented previously in Section 3.3;

**I** NOT\_n\_1 = { AND\_n\_1 }; **I** XOR\_n = { NOT\_n\_2 }; **I** NOT\_n\_2 = { Self }; **I** AND\_n\_2 = { Self }; **I** AND\_n\_1 = { AND\_n\_2 }; **I** Self = { Self, NOT\_n\_1, AND\_n\_1, XOR\_n };

and Z<sub>ij</sub> is built using I, as described earlier, and Select = D.

Figure 14. Sketch of the Chip Selector [50]

The definition of this coupled model using CD++ is presented in Figure 13.

First, we define the components of the coupled model (corresponding to the D set). Then, the input/output ports are included (which are related with the X/Y sets defined earlier). Finally, the links show the model influencees (which define the translation function). The *select* function is implicitly defined by the order of definition for the model components.

#### 4.2 Chip Selector

The Chip Selector (CS) circuit (Figure 14) is devoted to determining if an address is between two others. The model receives a 32-bit address and an Address Strobe (AS), and it returns a Boolean value telling if the address is between the boundaries.

The MASK models provide two 32-bit sets (MAX Mask, MIN Mask) containing the boundaries of the address to be compared. These models, defined originally as latches, were redefined using Boolean gates. The input address for the chip selector is checked using two comparators, which are instances of the model defined in the previous section.

The result obtained is transmitted through the ports LW and EQ for each of the comparators. Both outputs are ORed for the first comparator (as we are interested to see if CMP A  $\leq$  MAX). Afterwards, the LW output of the second comparator is inverted (as we are interested to see if CMP B  $\geq$  MIN). If the circuit is enabled, the result obtained is transmitted. Figure 15 shows the coupled model definition of the Chip Selector.
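The combinational logic just described can be sketched in Python as follows. The comparators are replaced here by integer comparisons, so this illustrates only the gate network that joins them (OR, NOT, and two ANDs, following the links in Figure 15); the function and parameter names are assumptions, and delays are ignored.

```python
def chip_select(address, min_mask, max_mask, address_strobe):
    """Return 1 when address lies in [min_mask, max_mask] and AS is up."""
    # Comparator A checks the address against the upper boundary.
    eq_a = int(address == max_mask)
    lw_a = int(address < max_mask)
    # Comparator B checks the address against the lower boundary.
    lw_b = int(address < min_mask)

    below_or_at_max = eq_a | lw_a   # OR gate: address <= MAX
    at_or_above_min = 1 - lw_b      # NOT gate: address >= MIN
    in_range = below_or_at_max & at_or_above_min   # first AND gate
    return address_strobe & in_range               # second AND gate, enabled by AS


if __name__ == "__main__":
    print(chip_select(0x150, 0x100, 0x1FF, 1))
```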

```
[top]
components : NOT_n_1@NOT NOT_n_2@NOT XOR_n@XOR AND_n_1@AND AND_n_2@AND
in : OPAn OPBn
out : LW EQ
Link : OPAn@top in@NOT_n_1
Link : OPBn@top in@XOR_n
Link : OPBn@top inb@AND_n_1
Link : out@NOT_n_1 ina@AND_n_1
Link : out@XOR_n in@NOT_n_2
Link : out@AND_n_2 EQ@top
Link : out@NOT_n_2 LW@top
```

Figure 13. CMP coupled model

```
[top]
components: MASMAX@MAS MASMIN@MAS CMPA@CMP CMPB@CMP and1@AND and2@AND or@OR not@NOT
in : A31 A30 A29 A28 A27 A26 A25 A24 A23 A22 A21 A20 ... A4 A3 A2 A1 A0 AS
out : CS
Link: A31@top OPA31@CMPA A31@top OPA31@CMPB
Link: A30@top OPA30@CMPA A30@top OPA30@CMPB
...
Link: A1@top OPA1@CMPA A1@top OPA1@CMPB
Link: A0@top OPA0@CMPA A0@top OPA0@CMPB
Link: out31@MASMAX OPB31@CMPA out31@MASMIN OPB31@CMPB
Link: out30@MASMAX OPB30@CMPA out30@MASMIN OPB30@CMPB
...
Link: out0@MASMAX OPB0@CMPA out0@MASMIN OPB0@CMPB
Link: AS ina@and2
Link: eq@CMPA ina@or lw@CMPA inb@or
Link: lw@CMPB in@not
Link: out@or ina@and1 out@not inb@and1
Link: out@and1 inb@and2
Link: out@and2 CS@top
```

Figure 15. CS coupled model [50]

| INPUT              | OUTPUT              |
|--------------------|---------------------|
| 00:00:00:00 OP0 1  | 00:00:05:000 res0 1 |
| 00:00:05:00 OP1 0  | 00:00:05:000 res1 0 |
| 00:00:10:00 OP2 1  | 00:00:05:000 res2 0 |
| 00:00:15:00 OP3 0  | 00:00:05:000 res3 0 |
| 00:00:20:00 OP4 0  | 00:00:05:000 res4 0 |
| 00:00:25:00 FCOD 1 |                     |
|                    | 00:00:15:000 res0 1 |
|                    | 00:00:15:000 res1 0 |
|                    | 00:00:15:000 res2 1 |
|                    | 00:00:15:000 res3 0 |
|                    | 00:00:15:000 res4 0 |
|                    |                     |
|                    | 00:00:30:000 res0 1 |
|                    | 00:00:30:000 res1 0 |
|                    | 00:00:30:000 res2 1 |
|                    | 00:00:30:000 res3 0 |
|                    | 00:00:30:000 res4 1 |

Figure 16. Inputs and Outputs for the INC/DEC Model

| INPUT               | OUTPUT                |
|---------------------|-----------------------|
| 00:00:00:00 cen 1   |                       |
| 00:00:00:00 csel2 1 |                       |
| 00:00:00:00 cin0 1  |                       |
|                     |                       |
| 00:00:00:00 cin31 1 |                       |
| 00:00:01:00 csel2 0 |                       |
| 00:00:01:00 csel1 1 |                       |
| 00:00:01:00 cin1 0  |                       |
| 00:00:01:00 cin3 0  |                       |
| 00:00:01:00 cin5 0  |                       |
|                     |                       |
| 00:00:01:00 cin29 0 |                       |
| 00:00:01:00 cin31 0 |                       |
| 00:00:02:00 cen 0   | 00:00:02:010 aout0 1  |
| 00:00:02:00 asel2 1 |                       |
| 00:00:02:00 bsel1 1 | 00:00:02:010 aout31 1 |
|                     | 00:00:02:010 bout0 1  |
|                     | 00:00:02:010 bout2 1  |
|                     | 00:00:02:010 bout4 1  |
|                     |                       |
|                     | 00:00:02:010 bout30 1 |
| 00:00:04:00 reset 1 |                       |
| 00:00:05:00 asel2 1 | 00:00:05:010 aout0 0  |
| 00:00:05:00 asel1 0 | 00:00:05:010 aout2 0  |
|                     |                       |
|                     | 00:00:05:010 aout28 0 |
|                     | 00:00:05:010 aout30 0 |

Figure 17. Inputs/Outputs of RegGlob [50]

#### **5. Simulation Results**

The present section shows the results obtained when some of the models previously presented are simulated. In the first case, we show the results of a value of 20 incremented by the *INC/DEC* model. Figure 16 shows the model inputs with their timestamps and the output values obtained.

The first step consists of giving an initial value to the circuit (zero by default). The first event (OP0 = 1 at 00:00:00:00) generates an output only when the model phase changes. As the preparation time for the circuit is five time units, this occurs at 00:00:05:000. The second input does not generate changes in the model and no output is issued. In simulated time 10, a new input is generated through the port OP2. As this value changed, an output is generated at simulated time 15. The following two inputs are not registered because the circuit keeps its present state. The last one increments the value in the register by inserting the value through the FCOD port. The incremented value can be seen five time units later.

Figure 17 is an example of the execution of the RegGlob model under different inputs. At the instant 0, the C enable line is activated, allowing write operations in the register. In this case, the register 4 is selected (csel2 = 1, csel1 = 0 and csel0 = 0), and the number 0xFFFFFFFF is used as input (cin0 = ... = cin31 = 1). Afterwards, at 00:00:01:00, the register 2 is selected (csel2 = 0 and csel1 = 1), and the number 0x55555555 is input (cin0 = cin2 = cin4 = ... = cin30 = 1, and cin1 = cin3 = cin5 = ... = cin31 = 0). The first value is stored in the register 4, and the second in the register 2.

At 00:00:02:00, C Enable is deactivated. Therefore, the following operations are devoted to reading registers. We see that the value in the register 4 is sent through the A output (*asel2* = 1) and the register 2 is sent through B (*bsel1* = 1). As a result, the values previously loaded are transmitted (that is, 0xFFFFFFFF in A, and 0x55555555 in B). Afterwards, Reset is activated. Now, we try to read register 4 at 00:00:05:00, and we obtain the value 0x00000000.
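The sequence of Figure 17 can be replayed with a small behavioral sketch of a register block. The class below is an illustration only: the class and method names, the choice of eight registers, and the use of integers instead of one line per bit are assumptions, and the 10-unit output delay seen in the figure is not modeled.

```python
class RegisterBlock:
    """Behavioral sketch of a block of 32-bit registers."""

    def __init__(self, count=8):  # count is an assumption of this sketch
        self.regs = [0] * count

    def write(self, cen, csel, cin):
        """Store cin in register csel, but only while C enable is up."""
        if cen:
            self.regs[csel] = cin & 0xFFFFFFFF

    def read(self, asel, bsel):
        """Return the values sent through the A and B outputs."""
        return self.regs[asel], self.regs[bsel]

    def reset(self):
        """Clear every register."""
        self.regs = [0] * len(self.regs)


def replay_figure_17():
    block = RegisterBlock()
    block.write(cen=1, csel=4, cin=0xFFFFFFFF)   # time 0
    block.write(cen=1, csel=2, cin=0x55555555)   # time 1
    before_reset = block.read(asel=4, bsel=2)    # time 2, C enable down
    block.reset()                                # time 4
    after_reset = block.read(asel=4, bsel=2)[0]  # time 5
    return before_reset, after_reset


if __name__ == "__main__":
    print(replay_figure_17())
```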

The next test (Figure 18) corresponds to the *TrapLogic* model. Here, we can see the result obtained after turning on all the trap bits. Because of this, we expect to obtain the index of the highest priority trap that is pending. The result obtained after the delay time corresponds to the highest-priority entry in Table 3, *Data Store Error* (whose code is TT = 0x2B = 00101011, the result we obtained). Also, the Trap Found flag is turned on.

#### December TRANSACTIONS 2001

| INPUTS                                | OUTPUTS            |
|---------------------------------------|--------------------|
| 00:00:00:001 / inst_acc_excep / 1.000 | 00:00:00:052 tf 1  |
| 00:00:00:002 / illeg_inst / 1.000     | 00:00:00:052 tt7 1 |
| 00:00:00:003 / priv_inst / 1.000      | 00:00:00:052 tt6 1 |
| 00:00:00:004 / win_over / 1.000       | 00:00:00:052 tt4 1 |
| 00:00:00:005 / win_under / 1.000      | 00:00:00:052 tt2 1 |
| 00:00:00:006 / addr_not_align / 1.000 |                    |
| 00:00:00:007 / data_acc_excep / 1.000 |                    |
| 00:00:00:008 / inst_acc_err / 1.000   |                    |
| 00:00:00:009 / data_acc_err / 1.000   |                    |
| 00:00:00:010 / div_zero / 1.000       |                    |
| 00:00:00:011 / data_st_err / 1.000    |                    |
| 00:00:00:012 / trap_inst / 1.000      |                    |

Figure 18. Execution Results for the TrapLogic Model [50]

| Label   | Instruction           | Comment                              |
|---------|-----------------------|--------------------------------------|
|         | `set 0x12345678, %r1` | Load the register 1 with 0x12345678  |
|         | `st %r1, [dest]`      | Store it in the "dest" variable      |
|         | `sth %r1, [dest+4]`   | Store the high half-word             |
|         | `sth %r1, [dest+10]`  |                                      |
|         | `stb %r1, [dest+12]`  | Store the last byte                  |
|         | `stb %r1, [dest+17]`  |                                      |
|         | `stb %r1, [dest+22]`  |                                      |
|         | `stb %r1, [dest+27]`  |                                      |
|         | `unimp`               |                                      |
| `dest:` | `.ascii "..."`        |                                      |

Initial image

| Addr. | Memory Image                        | Interpretation                              |
|-------|-------------------------------------|---------------------------------------------|
| 040   | 11000010 00100000 00100000 01001000 | Store the register 1 in the address 72      |
| 044   | 11000010 00110000 00100000 01001100 | Store the high part of reg. 1 in address 76 |
| 048   | 11000010 00110000 00100000 01010010 | Store the high part of reg. 1 in address 82 |
| 052   | 11000010 00101000 00100000 01010100 | Store the high byte of reg. 1 in address 84 |
| 056   | 11000010 00101000 00100000 01011001 | Store the high byte of reg. 1 in address 89 |
| 060   | 11000010 00101000 00100000 01011110 | Store the high byte of reg. 1 in address 94 |
| 064   | 11000010 00101000 00100000 01100011 | Store the high byte of reg. 1 in address 99 |
| 068   | 00000000 00000000 00000000 00000000 | unimp                                       |
| 072   | 00100000 00100000 00100000 00100000 | "dest" variable (20: space character)       |
| 076   | 00100000 00100000 00100000 00100000 |                                             |

Final image

| Addr. | Memory Image                        | Values                   |
|-------|-------------------------------------|--------------------------|
| 072   | 00010010 00110100 01010110 01111000 | 12 34 56 78              |
| 076   | 01010110 01111000 00100000 00100000 | 56 78 20 20 (20 = space) |
| 080   | 00100000 00100000 01010110 01111000 | 20 20 56 78              |
| 084   | 01111000 00100000 00100000 00100000 | 78 20 20 20              |
| 088   | 00100000 01111000 00100000 00100000 | 20 78 20 20              |
| 092   | 00100000 00100000 01111000 00100000 | 20 20 78 20              |
| 096   | 00100000 00100000 00100000 01111000 | 20 20 20 78              |

Figure 19. Storing a value in memory

Finally, we show two execution examples that are part of a complete program. All the examples were executed in a Pentium processor (133 MHz), using the Linux version of CD++. The average performance for this model was one instruction per second. The source code was translated to binary using the GNU MASM assembler Linker. The executable is used as the initial memory image for the simulator. The first part of Figure 19 shows part of a program written in assembly language. The second part presents the binary code generated, together with the addresses for each instruction or data (one word each).

As we can see, this piece of code copies parts of the number 0x12345678 to certain memory addresses. We show the translation of the binary codes based on the specification of the instruction set of the SPARC processor. Finally, we show the memory image after the program execution. As we can see, the values stored in memory follow the instructions defined by the executable code.

```
        set 1, %r1              ! Load the register 1 with the value 1
cycle:  sll %r1, %r2, %r3       ! Shift the value the number of times in r2
        stb %r3, [%r2+dest]     ! Store the result in the variable dest + r2
        subcc %r2, 12, %r0      ! Repeat the cycle 12 times
        bne cycle
        inc 1, %r2              ! Delay slot
        unimp
dest:   .ascii "    "
        .ascii "    "
        .ascii "    "
        ...

Initial image
Addr.   Memory Image                          Interpretation
032     10000010 00010000 00100000 00000001   set <1>, 1
036     10000111 00101000 01000000 00000010   Take the register 1, shift and store in R3
040     11000110 00101000 10100000 00111100   Store in address 60 1 byte
044     10000000 10100000 10100000 00001100   subtract 12 to R0
048     00010010 10111111 11111111 11111101   Relative jump to address -2 words (40)
052     10000100 00000000 10100000 00000001   increment 1
056     00000000 00000000 00000000 00000000   unimp
060     00100000 00100000 00100000 00100000   Destination variable
064     00100000 00100000 00100000 00100000
068     00100000 00100000 00100000 00100000
. . .

Final image
060     00000001 00000010 00000100 00001000   A value 0x01 shifted 12 times
064     00010000 00100000 01000000 10000000
068     00000000 00000000 00000000 00000000
. . .
```

Figure 20. Shifting and storing results in memory

The following example, illustrated in Figure 20, shows the execution of part of another program. As we can see, the goal is to place a 1 in a given address, and then shift this value to the left, storing the result in the following address. The cycle is repeated 12 times.
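The effect of the loop in Figure 20 can be checked with a short Python emulation. This is a sketch under stated assumptions: register r2 is taken to start at zero (the listing does not show its initialization), `stb` is modeled as storing only the low byte of r3, and the `inc` in the delay slot is modeled as executing after the comparison on every pass. The function name `run_shift_loop` is hypothetical.

```python
def run_shift_loop(memory, dest):
    """Emulate the shift-and-store loop on a bytearray memory image."""
    r1, r2 = 1, 0  # r2 = 0 is an assumption; the listing only sets r1
    while True:
        r3 = (r1 << r2) & 0xFFFFFFFF   # sll %r1, %r2, %r3
        memory[dest + r2] = r3 & 0xFF  # stb %r3, [%r2+dest]: low byte only
        equal = (r2 - 12) == 0         # subcc %r2, 12, %r0
        r2 += 1                        # inc 1, %r2 in the delay slot
        if equal:                      # bne cycle is not taken
            break
    return memory


if __name__ == "__main__":
    image = bytearray(b" " * 16)  # destination initially holds spaces (0x20)
    run_shift_loop(image, 0)
    # Print the three words that Figure 20 lists at addresses 060 to 068.
    for offset in range(0, 12, 4):
        print(" ".join(f"{byte:08b}" for byte in image[offset:offset + 4]))
```

Under these assumptions the first twelve bytes agree with the final image of Figure 20: the 1 moves left through the first eight bytes, and later bytes are zero because the shifted bit no longer fits in a byte.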

Once the basic behavior of the simulated computer was verified, a thorough integration test was attacked. As explained earlier, each circuit was defined together with a set of input/output values that were encapsulated in an experimental framework. Once each of the models was tested, each operation in the instruction set was checked. The procedure was developed using the verification facilities of DEVS, defining 17100 test cases. The mechanism consisted of creating an experimental framework, which executed an instruction in the instruction set. The execution result was stored in memory, and a memory dump

```
set 274543375, %r24 ! stores a value in register 24
set 13908050 , %r22 ! a second value is stored in the register 22
udiv %r24, %r22 , %r10 ! both values are divided and stored in r10
st %r10 , [dest] ! The result is stored in memory
unimp
.align 4
value: .ascii "VALUE:"
dest: .word FFFFFFFF ! Result of Test 100
```

Figure 21. Testing routine for the UDIV Instruction

| Field | Type  | Expected | Found | DIF    |
|-------|-------|----------|-------|--------|
| 1     | int32 | 19       | 1     | \*\*\* |

Figure 22. Error message

```
Message I / 00:00:00:000 / Root(00) to top(01)
Message I / 00:00:00:000 / top(01) to mem(02)    // Initialize the higher level
Message I / 00:00:00:000 / top(01) to bus(03)    // components: memory, bus, CS, etc.
Message I / 00:00:00:000 / top(01) to csmem(04)
Message I / 00:00:00:000 / top(01) to cpu(05)
Message I / 00:00:00:000 / top(01) to c1(64)
Message I / 00:00:00:000 / top(01) to dpc(65)
Message D / 00:00:00:000 / mem(02) / ... to top(01)    // The models reply the next
Message D / 00:00:00:000 / bus(03) / ... to top(01)    // scheduled event
Message D / 00:00:00:000 / csmem(04) / ... to top(01)
Message I / 00:00:00:000 / cpu(05) to ir(06)    // The CPU initializes the components
Message I / 00:00:00:000 / cpu(05) to pc_add(07)
Message I / 00:00:00:000 / cpu(05) to pc_mux(08)
. . .
Message * / 00:00:00:000 / Root(00) to top(01)
Message * / 00:00:00:000 / top(01) to cpu(05)
Message * / 00:00:00:000 / cpu(05) to npc(10)    // Take the nPC
Message Y / 00:00:00:000 / npc(10) / out2 / 1.000 to cpu(05)
Message Y / 00:00:00:000 / npc(10) / out5 / 1.000 to cpu(05)
Message D / 00:00:00:000 / npc(10) / ... to cpu(05)
Message X / 00:00:00:000 / cpu(05) / in2 / 1.000 to pc_latch(11)    // Send to pc-inc
Message X / 00:00:00:000 / cpu(05) / op2 / 1.000 to pc_inc(13)      // to increment
Message X / 00:00:00:000 / cpu(05) / in5 / 1.000 to pc_latch(11)    // the value
Message X / 00:00:00:000 / cpu(05) / op5 / 1.000 to pc_inc(13)
Message D / 00:00:00:000 / pc_latch(11) / 00:00:10:000 to cpu(05)   // Schedule the
Message D / 00:00:00:000 / pc_inc(13) / 00:00:10:000 to cpu(05)     // activation of the
Message D / 00:00:00:000 / pc_latch(11) / 00:00:10:000 to cpu(05)   // pc-inc model
Message D / 00:00:00:000 / pc_inc(13) / 00:00:10:000 to cpu(05)
Message D / 00:00:00:000 / cpu(05) / 00:00:00:000 to top(01)
Message D / 00:00:00:000 / top(01) / 00:00:00:000 to Root(00)
Message * / 00:00:00:000 / Root(00) to top(01)
Message * / 00:00:00:000 / top(01) to cpu(05)
Message * / 00:00:00:000 / cpu(05) to pc(12)
Message Y / 00:00:00:000 / pc(12) / out5 / 1.000 to cpu(05)    // Initial address
Message D / 00:00:00:000 / pc(12) / ... to cpu(05)             // 100000 = 32
. . .
Message * / 00:00:00:000 / Root(00) to top(01)
Message * / 00:00:00:000 / top(01) to cpu(05)
Message * / 00:00:00:000 / cpu(05) to clock(45)    // Clock tick
Message Y / 00:00:00:000 / clock(45) / clck / 1.000 to cpu(05)
Message D / 00:00:00:000 / clock(45) / 00:01:00:000 to cpu(05)
Message X / 00:00:00:000 / cpu(05) / clck / 1.000 to cu(43)
Message D / 00:00:00:000 / cu(43) / 00:00:00:000 to cpu(05)
Message D / 00:00:00:000 / cpu(05) / 00:00:00:000 to top(01)
Message D / 00:00:00:000 / top(01) / 00:00:00:000 to Root(00)
Message * / 00:00:00:000 / Root(00) to top(01)
Message * / 00:00:00:000 / top(01) to cpu(05)    // Arrival to the CU
Message * / 00:00:00:000 / cpu(05) to cu(43)     // And activation of the components
Message Y / 00:00:00:000 / cu(43) / a_mux_reg / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / b_mux_reg / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / enable_alu / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / addr_mux / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / ir_latch_en / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / as / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / rd_wr / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / busy / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / c_mux2 / 1.000 to cpu(05)
Message Y / 00:00:00:000 / cu(43) / pc_mux0 / 1.000 to cpu(05)
Message D / 00:00:00:000 / cu(43) / ... to cpu(05)
. . .
Message * / 00:00:10:000 / Root(00) to top(01)
Message * / 00:00:10:000 / top(01) to cpu(05)
Message * / 00:00:10:000 / cpu(05) to pc_latch(11)
Message D / 00:00:10:000 / pc_latch(11) / ... to cpu(05)
Message D / 00:00:10:000 / cpu(05) / 00:00:00:000 to top(01)
Message D / 00:00:10:000 / top(01) / 00:00:00:000 to Root(00)
Message * / 00:00:10:000 / Root(00) to top(01)
Message * / 00:00:10:000 / top(01) to cpu(05)
Message * / 00:00:10:000 / cpu(05) to pc_inc(13)
Message Y / 00:00:10:000 / pc_inc(13) / res3 / 1.000 to cpu(05)    // Update the nPC
Message Y / 00:00:10:000 / pc_inc(13) / res5 / 1.000 to cpu(05)
Message D / 00:00:10:000 / pc_inc(13) / ... to cpu(05)
. . .
Message * / 00:00:20:001 / Root(00) to top(01)
Message * / 00:00:20:001 / top(01) to mem(02)    // Memory returns the first instr.
Message Y / 00:00:20:001 / mem(02) / dtack / 1.000 to top(01)
Message Y / 00:00:20:001 / mem(02) / out_data0 / 1.000 to top(01)
Message Y / 00:00:20:001 / mem(02) / out_data13 / 1.000 to top(01)
Message Y / 00:00:20:001 / mem(02) / out_data20 / 1.000 to top(01)
Message Y / 00:00:20:001 / mem(02) / out_data25 / 1.000 to top(01)
Message Y / 00:00:20:001 / mem(02) / out_data31 / 1.000 to top(01)
Message D / 00:00:20:001 / mem(02) / ... to top(01)
Message X / 00:00:20:001 / top(01) / in_dtack / 1.000 to bus(03)
Message X / 00:00:20:001 / top(01) / in_data0 / 1.000 to cpu(05)
Message X / 00:00:20:001 / top(01) / in_data13 / 1.000 to cpu(05)
Message X / 00:00:20:001 / top(01) / in_data20 / 1.000 to cpu(05)
Message X / 00:00:20:001 / top(01) / in_data25 / 1.000 to cpu(05)
Message X / 00:00:20:001 / top(01) / in_data31 / 1.000 to cpu(05)
Message D / 00:00:20:001 / bus(03) / 00:00:00:001 to top(01)
```

Figure 23. Log file of a simple routine

was executed, obtaining the memory state after the execution. This value is checked against the value obtained when the same program is executed in the real architecture, which is included in the testing experimental framework. This procedure allowed us to find some errors derived from the coupled model. For instance, we could see that the division instruction was not working properly. The generated test included the sentences in Figure 21.

When this example was executed, the testing coupled model found an error, shown in Figure 22.

In this case, the destination should have stored the value 19 (274543375 divided by 13908050). Instead, we have found the value 1, allowing us to see that one of the instructions had an unexpected behavior. In this way, we could find errors in some of the instructions that could be fixed. We also found errors in some addition instructions and in conditional jumps with prediction.
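The comparison behind this report is simple enough to sketch. The Python fragment below recomputes the expected quotient and flags a mismatch in the style of Figure 22. The function name and the dictionary layout are assumptions for illustration, not the format used by the actual testing framework, and the upper half of the dividend (the SPARC Y register) is assumed to be zero.

```python
def check_field(field, expected, found):
    """Compare one dumped value with its expected value.

    Returns a report row; DIF is "***" when the values differ.
    """
    dif = "" if expected == found else "***"
    return {"field": field, "expected": expected, "found": found, "dif": dif}


if __name__ == "__main__":
    # Unsigned integer division of the two operands loaded in Figure 21.
    expected = 274543375 // 13908050
    row = check_field(field=1, expected=expected, found=1)
    print(row)
```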

Finally, we show part of the execution of the simulator for the example presented in Figure 20. We show a log file including the messages interchanged between modules in Figure 23. As in other DEVS frameworks, there are four kinds of messages: \* (used to signal a state change due to an internal event), X (used when an external event arrives), Y (the model's output), and **done** (indicating that a model is finished with its task). The I messages initialize the corresponding models. For each message, we show its type, timestamp, value, origin/destination, and the port used for the transmission.

The execution cycle starts by initializing the higher level models (memory, CPU, etc.). The message that arrived at the CPU model is sent to its lower level components: Instruction Register, PC Adder, PC multiplexer, Control Unit, etc.

When the initialization cycle finishes, the imminent model is executed. In this case, the *nPC* model is activated, transmitting the address of the next instruction. As we can see, the 2nd and 5th bits are returned with a 1 value. That means that the nPC value is 100100 = 36 (as we see in Figure 20, the program starts in the address 32). The value is sent to the *pc-inc* model, in charge of adding 4 to this register. The update is finished at 10:000, as the activation time of this model was scheduled using the circuit delay. At that moment, a 4 value is added to the nPC, and we obtain the 3rd and 5th bits in 1 (*res3* and *res5*), that is, 101000 = 40, the next PC. Later, the PC is activated and the value 100000 (that is, 32) is obtained. This is the initial address of the program. The following event is the arrival of a clock tick, sent to the processor. The CPU schedules the next tick (in 1:00:000 time units) and transmits the signal to the Control Unit, which activates several components: *a-mux*, *ALU*, *Addr-mux*, *IR*, etc.
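The bit arithmetic in this walk-through can be verified with two small helpers. They assume that port *k* carries bit *k* counted from the least significant end, which is the numbering the nPC and PC messages follow here; the function names are hypothetical.

```python
def value_from_ports(active_bits):
    """Rebuild an integer from the port numbers that carried a 1."""
    return sum(1 << bit for bit in set(active_bits))


def ports_from_value(value):
    """List the port numbers that would carry a 1 for the given value."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


if __name__ == "__main__":
    npc = value_from_ports([2, 5])       # out2 and out5 from the nPC
    print(npc, ports_from_value(npc + 4))  # value sent on, then res ports
    print(value_from_ports([5]))         # out5 from the PC
```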

Finally, at simulated time 20:000, we see that the memory has returned the first instruction (compare the results with the bit configuration at address 32). The instruction is sent to the CPU to be stored in the Instruction Register, and execution continues. The rest of the instruction cycle is completed in the same way.

We are able to follow the execution flow of any program by analyzing this log file. To simplify the analysis of results, we built a set of scripts using Tk that lets students choose which components should be considered. In this way, behavior of each of the subcomponents can be followed more easily, and students can analyze the behavior of the desired subsystem in detail.
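The idea of selecting components can be sketched as a filter over log records. This is not the Tk scripts themselves: the `LogRecord` layout, the port names `out2` and `in2`, and the sample records are hypothetical and serve only as illustration. Only the message kinds, the listed fields, and the *nPC* and *pc-inc* models come from the text.

```python
from dataclasses import dataclass

# Message kinds named in the text: I (initialize), * (internal event),
# X (external event), Y (output), done (model finished its task).
MESSAGE_KINDS = {"I", "*", "X", "Y", "done"}


@dataclass(frozen=True)
class LogRecord:
    """Hypothetical in-memory form of one log entry (not the CD++ file format)."""
    kind: str           # one of MESSAGE_KINDS
    time: str           # timestamp kept as text, e.g. "10:000"
    model: str          # origin or destination model
    port: str = ""      # port used for the transmission, if any
    value: float = 0.0  # transmitted value, if any


def select_components(records, wanted):
    """Keep only the records that belong to the chosen components."""
    wanted = set(wanted)
    return [record for record in records if record.model in wanted]


# Invented sample records, for illustration only.
records = [
    LogRecord("I", "00:000", "cpu"),
    LogRecord("*", "00:000", "npc"),
    LogRecord("Y", "00:000", "npc", port="out2", value=1.0),
    LogRecord("X", "00:000", "pc-inc", port="in2", value=1.0),
    LogRecord("Y", "10:000", "pc-inc", port="res3", value=1.0),
    LogRecord("done", "10:000", "pc-inc"),
]

pc_inc_trace = select_components(records, ["pc-inc"])
for record in pc_inc_trace:
    print(record.time, record.kind, record.model, record.port, record.value)
```

Filtering by model name keeps the trace of one subsystem readable even when the full log contains messages for every component in the hierarchy.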

# 6. Conclusion

We have presented the use of DEVS in simulating a simple computer. The models were based on the architecture of the SPARC processor, which includes features not existing in simpler CPUs. The tools can be used in computer organization courses to analyze and understand the basic behavior of the different levels of a computer system. The interaction between levels can be studied, and experimental evaluation of the system can be performed.

The use of DEVS allowed us to have reusable models (in this case, Boolean gates, comparators, multiplexers, latches, etc.). DEVS also allowed us to provide reusable code for different configurations. We provided different machines, one running the digital logic level and the other running the instruction set, with different performance in each case, depending on the educational needs. The concept of internal transition functions can be used to improve the definition of the timing properties of each component, permitting the definition of complex synchronization mechanisms. Nevertheless, in this case, most timing delays were represented as simple input/output relations.

We have met all the goals proposed. Alfa-1 is public domain and has been developed using CD++ which is also public domain, and it was built using GNU C++. Therefore, the toolkit is available for its use in most existing computer organization courses. We described several levels of the architecture (from the Digital Logic level up to the Instruction Set). The assembly language level was also attacked using public domain assemblers that generated executable code that could run in Alfa-1. We have easily extended the components (for instance, a cache memory that was not included in the first versions). We also have modified existing components (implementing, for instance, Digital Logic versions of some of the circuits). Thorough testing could be done using an approach based on the construction of experimental frameworks associated with testing functions. An experimental framework was also built for the final integrated model.

The most important achievements were related to our educational goals. The whole project was designed as an assignment in a third-year discrete event simulation course. The models were formally specified, and the specifications were used by students in a computer organization course to build the final version of the architecture. These students had taken previous prerequisite courses in programming. With only this knowledge, the students were able to build all the components presented here. Final integration was planned by a group of undergraduate teaching assistants (who also developed the Control Unit and a coupled model representing the whole architecture shown in Figure 1). Individual and integration testing was also done by second-year students. Several of the modifications shown here were developed as course assignments. These facts show the feasibility of the approach from a pedagogical point of view. Upper level courses reported higher success rates and detailed knowledge of the subjects after using Alfa-1.

The tools can be obtained at http://www.sce.carleton.ca/ wainer/usenix. Different experiences can be obtained using this toolkit. At the assembly language level, the students can use existing assemblers to build executables that run in the simulator. A complete analysis of the execution flow at the instruction level can be achieved by tracing the execution in the log file. The students can study the flow of a program and each instruction in detail, starting from the memory image of an executable. The instruction cycle and signal flow in the datapath can be easily inspected. Going deeper, we can see the behavior of those circuits implemented at the digital logic level.

By extending or changing the existing instructions and implementing the changes in the Control Unit, the students can experience the design of instruction sets. This allows them to obtain practice in instruction encoding, and to relate instruction definition with the underlying architecture. Students can also include new components (as was shown with the cache memory example), change existing ones, or implement them using digital logic. The hierarchical nature of DEVS provides the means to go deeper into the hierarchy. For example, the logical gates could be implemented by defining the transistor level (which has not been implemented in this version). We planned to build an assembler and a linker, but the code generated by those provided by GNU for SPARC platforms executed without modification. Nevertheless, implementing an assembler and a linker would be interesting assignments that would complete the layered view applied in these courses. Also, a debugger for the Alfa-1 architecture could be built, making the study of the assembly language level easier.

At present, Alfa-1 is being extended by defining components of the input/output subsystem. Several input/output devices, interfaces, and DMA controllers will be simulated. Different transfer techniques (polling, interrupts, DMA) will be considered. Likewise, the implementation of different cache management algorithms is being finished. Other current tasks include the definition of a graphical interface to enhance the use of the toolkit. The set of scripts mentioned in Section 5 will be used to gather the results of the simulations, which will then be displayed graphically. In this way, the study and analysis of the different subsystems will be improved.

# 7. Acknowledgments

We want to thank the anonymous referees for their detailed comments made on this article. We also thank Prof. Trevor Pearce at SCE, Carleton University, for his help with the final version. Sergio Zlotnik collaborated in the early stages of this project; his work is presented earlier in [51]. The research was partially funded by the Usenix Foundation and by NSERC. It was developed while Gabriel Wainer was an assistant professor at the Computer Sciences Department of the Universidad de Buenos Aires in Argentina.

# 8. References

- [1] Hennessy, J. and Patterson, D. "Computer Architecture: A Quantitative Approach." Morgan Kaufmann, San Francisco, Second Edition. 1997.
- [2] Patterson, D. Computer Organization and Design: The Hardware/ Software Interface. Second Edition, University of California, Berkeley, 1995.
- [3] Stallings, W. Computer Organization and Architecture. Macmillan, New York, Fourth Edition, 1996.
- [4] Heuring, V. and Jordan, H. Computer Systems Design and Architecture. Addison-Wesley, 1997.
- [5] Tanenbaum, A. Structured Computer Organization. Fourth Edition, Prentice Hall, New Jersey, 1999.
- [6] Burger, D. and Austin, T. "The SimpleScalar Tool Set. Version 2.0." *Computer Architecture News*. Vol. 25, No. 3, pp 13-25. June 1997.
- [7] Coe, P., Williams, L., and Ibbett, R. "An Interactive Environment for the Teaching of Computer Architecture." *Proceedings of the Annual Joint Conference Integrating Technology into Computer Science Education*, pp 33-35. Barcelona, Spain, 1996.
- [8] Rosenblum, M., Bugnion, E., Devine, S., and Herrod, S. "Using the SimOS Machine Simulator to Study Complex Computer Systems." ACM Transactions on Modeling and Computer Simulation, January 1997.

- [9] Pearce, T. "Notes on p86 Assembly Language and Assembling." Internal Report, Dept. of Systems and Computer Engineering. Carleton University. http://www.sce.carleton.ca/courses/94201/. 2000.
- [10] Shealy, A., Malloy, B., and Sykes, D. "An Extensible Simulator for the Intel 80x86 Processor Family." *Proceedings of the 30th Annual Simulation Symposium*, pp 157-166, 1997.
- [11] Morsiani, M. and Davoli, R. "The MPS Computer System Simulator." Technical Report UBLCS-99-8, Università di Bologna, Italy, April 1999.
- [12] Hennessy, J. and Patterson, D. Appendix A: Assemblers, Linkers and the SPIM Simulator. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Francisco. Second Edition, 1997.
- [13] Babaoglu, O., Bussan, M., Drummond, R., and Schneider, F. "Documentation for the CHIP Computer System." Technical Report TR83-584. Cornell University. Computer Sciences Dept. December 1983.
- [14] Bevilacqua, R., Gomez, L., and Gomez, S. "The PROVIR Virtual Processor." (in Spanish). M.Sc. Thesis. Departamento de Computación. Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires, 2000.
- [15] El Hajj, A., Kabalan, K., Mneimneh, M., and Karablieh, F. "Microprocessor Simulation and Program Assembling Using Spreadsheets," *SIMULATION*, Vol. 75, No. 2, pp 82-90, August 2000.
- [16] Edmonson, J. and Reilly, M. "Performance Simulation of an ALPHA Microprocessor." *IEEE Computer*, May 1998.
- [17] Ikodinovic, I., Magdic, D., Milenkovic, A., and Milutinovic, V. "Limes: A Multiprocessor Simulation Environment for PC Platforms." *Proceedings of Third International Conference on Parallel Processing and Applied Mathematics (PPAM)*, Kazimierz Dolny, Poland, September 14-17, 1999.
- [18] Brewer, E.A., Dellarocas, C.N., Colbrook, A. and Weihl, W.E. "PROTEUS: A High-Performance Parallel-Architecture Simulator." Technical Report TR-516, MIT / LCS, Laboratory for Computer Science, Cambridge, MA, September 1991.
- [19] Burns, M., George, A., and Wallace, B. "Modeling and Simulative Performance Analysis of SMP and Clustered Computer Architectures." SIMULATION, February 2000.
- [20] Bedichek, R. "Talisman: Fast and Accurate Multicomputer Simulation." *Proceedings of SIGMETRICS'95*, Ottawa, Ontario, Canada, May 1995.
- [21] Shanmugan, K., Frost, V., and La Rue, W. "A Block-Oriented Network Simulator (BONeS)." *SIMULATION*, February 1992, Vol. 59, No. 2.
- [22] Nguyen, A.-T., Michael, M., Sharma, A., and Torrellas, J. "The Augmint Multiprocessor Simulation Toolkit for Intel x86 Architectures." *Proceedings of IEEE International Conference on Computer Design: VLSI in Computers and Processors*, 1996. pp 486-490.
- [23] Konas, P. and Yew, P. "Improved Parallel Architectural Simulations on Shared-memory Multiprocessors," *Proceedings of the 1994 Workshop on Parallel and Distributed Simulation*, 1994, pp 156-159.
- [24] Dwarkadas, S., Jump, J.R., and Sinclair, J.B. "Execution-driven Simulation of Multiprocessors: Address and Timing Analysis," *ACM Trans. Model. Comput. Simul.* Vol. 4, No. 4, Oct. 1994, pp 314-338.

- [25] Brorsson, M. "SM-prof: A Tool to Visualize and Find Cache Coherence Performance Bottlenecks in Multiprocessor Programs." *Proceedings of the 1995 ACM SIGMETRICS Joint International Conference on Measurement and Modeling of Computer Systems*, pp 178-187, 1995.
- [26] Hein, A. and Dal Cin, M. "Performance and Dependability Evaluation of Scalable Massively Parallel Computer Systems with Conjoint Simulation," *ACM Trans. Model. Comput. Simul.* Vol. 8, No. 4, October 1998, pp 333-373.
- [27] Apduhan, B.O., Sueyoshi, T., Namiuchi, Y., Tezuka, T., and Arita, I. "Experiments of a Reconfigurable Multiprocessor Simulation on a Distributed Environment." *Proceedings of International Phoenix Conference on Computers and Communications.* Phoenix, AZ, 1992.
- [28] Tan, G.S.H. and Tep, Y.M. "Experiences in Simulating a Declarative Multiprocessor," *Proceedings of the 28th Annual Simulation Symposium*, pp 95–104, 1995.
- [29] Zagar, M., Basch, D. "Microprocessor Architecture Design with ATLAS," *IEEE Design and Test of Computers*. July 1997.
- [30] Stahl, I. Introduction to Simulation with GPSS. Prentice Hall International, 1990.
- [31] Bagrodia, R. "Designing Efficient Simulations Using Maisie," *Proceedings of the 1991 Winter Simulation Conference*, December 8-11, 1991, Phoenix, AZ, pp 243-247.
- [32] Dabney, J. and Harman, T. Mastering Simulink 4. Prentice-Hall, 2001.
- [33] Gauthier, M. ACSL Reference Manual. http://www.acslsim.com/. 1998.
- [34] CACI Products Company. MODSIM II, The Language for Object-Oriented Programming, CACI, La Jolla, California, 1991.
- [35] Kiviat, P. et al. "SIMSCRIPT: A Simulation Programming Language," CACI, 1973.
- [36] Fishwick, P. Simulation Model Design and Execution. Prentice Hall, 1995.
- [37] Ghosh, S. "Hardware Description Languages: Concepts and Principles." An IEEE Press Original Monograph, ISBN 0-7803-4744-7, 2000.
- [38] Thomas, D. and Moorby, P. *The Verilog Hardware Description Language*. Kluwer Academic Publishers, Boston, 1991.
- [39] Mitschele-Thiel, A. Systems Engineering with SDL. JW Wiley. 2000.
- [40] Giambiasi, N., Escude, B., and Ghosh, S. "GDEVS: A Generalized Discrete Event Specification for Accurate Modeling of Dynamic Systems," *Transactions of the Society for Computer Simulation (SCS) International*, Vol. 17, No. 3, September 2000, pp 120-134, San Diego, CA.
- [41] Wainer, G. "ALFA-0: A Simulated Computer as an Educational Tool for Computer Organization." In *Proceedings of IASTED Applied Modeling and Simulation*, 1998. Hawaii, USA.
- [42] Troccoli, A. and Wainer, G. "CRAPS: An Emulator for the SPARC Processor" (in Spanish). In *Proceedings of INFOCOM Argentina* '98. Buenos Aires, Argentina.
- [43] Isacovich, F., Mislej, E., Winternitz, F., and Wainer, G. "An Emulator of the Atari Processor" (in Spanish). Internal Report, Computer Organization Course, Computer Sciences Dept., Universidad de Buenos Aires, Argentina (First Prize at the Students Contest of the 28th Conference on Informatics and Operations Research, Buenos Aires, Argentina), 1999.

- [44] Zeigler, B., Praehofer, H., and Kim, T. Theory of Modeling and Simulation: Integrating Discrete Event and Continuous Complex Dynamic Systems. Academic Press, 2000.
- [45] Zeigler, B. Object-oriented Simulation with Hierarchical Modular Models. Academic Press, 1990.
- [46] Wainer, G. and Giambiasi, N. "Application of the Cell-DEVS Paradigm for Cell Spaces Modeling and Simulation," *SIMULATION*, Vol. 76, No. 1, January 2001, pp 22-39.
- [47] Zeigler, B.P. "DEVS Theory of Quantization," DARPA Contract N6133997K-0007: ECE Dept., UA, Tucson, AZ. 1998.
- [48] Rodriguez, D. and Wainer, G. "New Extensions to the CD++ Tool," Proceedings of SCS Summer Multiconference on Computer Simulation, 1999.
- [49] Daicz, S., Troccoli, A., Zlotnik, S. and Wainer, G. "Architectural Definition of the ALFA-1 Simulated Processor," (in Spanish). Internal Report. Departamento de Computación. Universidad de Buenos Aires. http://www.dc.uba.ar/people/proyinv/usenix, 1998.
- [50] De Simoni, L., Enrique, S., Glinsky, E., Petronio, D., Wassermann, D., Wainer, G. et al. "Definition of Components for the ALFA-1 Simulated Processor," (in Spanish). Internal Report. Departamento de Computación. Facultad de Ciencias Exactas y Naturales. Universidad de Buenos Aires, 1998.
- [51] Daicz, S., Troccoli, A., Zlotnik, S. and Wainer, G. "Using the DEVS Paradigm to Implement a Simulated Processor," In *Proceedings of 33rd IEEE/SCS Annual Simulation Symposium*. Washington, D.C. U.S.A.



Sergio Daicz received a B.Sc. (1998) and a Licentiate degree (M.Sc., 2000) in Computer Science from the University of Buenos Aires. His research interests are diverse and include discrete event simulation, algorithmic information theory, and behavior-based intelligence. He is currently working on Ph.D. studies at the University of Buenos Aires, and is a teaching assistant and research assistant in the Robotic Soccer project at the Computer Science Department.



Alejandro Troccoli received a B.Sc. (1998) and a Licentiate degree (M.Sc. 2001) from the Universidad de Buenos Aires, Argentina. He was a teaching and research assistant at the Computer Science Department of the Universidad de Buenos Aires. He is currently a second year Ph.D. student at Columbia University in New York, NY. His research interests include visualization and simulation.



**Gabriel A. Wainer** received a Licentiate degree (M.Sc., 1993) and Ph.D. degree (1998, with highest honors) from the Universidad de Buenos Aires, Argentina, and Université d'Aix-Marseille III, France. He is Assistant Professor at the SCE Department at Carleton University (Ottawa, Canada, 2000-). He was Assistant Professor at the Computer Sciences Department of the Universidad de Buenos Aires (1997-2000), and a teaching and research assistant in that department since 1988. He was also a visiting research scholar at ACIMS (University of Arizona, Tucson, AZ) and Invited Professor at the Polytechnique de Marseille. He has published more than 60 articles in the field of operating systems, real-time systems and discrete-event simulation. He is Associate Editor of the *TRANSACTIONS of the Society for Modeling and Simulation.* He was the PI of several research projects and a reviewer for different international conferences, journals, and research projects. He is author of a book on real-time systems and another book on discrete-event simulation (in Spanish). Professor Wainer was a member of the Board of Directors of the SCS, is the chair of the Standards Committee of the SCS, and a coordinator of a group on DEVS standardization. He is also an associate director of the Ottawa Center of The McLeod Institute of Simulation Sciences. His current research interest is related to modeling methodologies and tools, modeling and simulations.