COMP-522 MODELLING AND SIMULATION

Switching Hardware Modelling and Simulation in DEVS

Matthew Holly

Department of Computer Science, McGill University

FALL 2002


Introduction

This project explores the modelling and simulation of switching hardware using the DEVS formalism. A prototype of a graphical tool for building complex models is presented. Two different types of switches are modelled and simulated, and their relative performance is evaluated under several different network traffic patterns.


1. Project Goals and Requirements

The goals of this project are many and diverse. For clarity, I have separated them into five main categories:


2. A Switching Hardware Primer

Switches are ubiquitous in networking environments. They are used to connect several otherwise unrelated networks together to form an internetwork. A switch is a multi-input, multi-output device, and its sole service is to get packets from one network to another as fast as possible. In general, the switch lies at the center of a star topology, with each of the networks it serves connected directly to it (unlike, say, the shared bus topology of an Ethernet or the ring topology of FDDI and token rings). By linking increasingly larger networks together, and by interconnecting the switches themselves, internetworks can scale to very large sizes (e.g. the Internet).

The analysis and design of hardware and software to support computer networks is a large field of study; it is simply impossible to provide a self-contained reference for switching hardware here. The interested reader is referred to the excellent book by Peterson and Davie (see section 7). In order to get started, however, a quick primer on two basic types of switches is provided next.

2.1 Crossbar

A crossbar switch is, at least conceptually, a very simple design. Every input port is connected to every output port, and thus upon receipt of a packet the switch merely removes the packet from an input queue and puts it in the appropriate output queue. The following figure shows what a 4x4 crossbar switch looks like when modelled using the tool built for this project. The pink boxes are input ports while the grey boxes are the outputs; there are four inputs, each connected to all four outputs.

The true complexity of the crossbar switch lies in its output ports. Since a packet can arrive at any time from any input port, the output port must deal with contention amongst input ports. When multiple input ports each have a packet to send to the same output port at the same time, the output port must provide intelligent buffering to avoid needlessly lost packets.
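
To make the forwarding idea concrete, here is a minimal Python sketch (not the actual Node classes used by the tool) that moves packets from per-input queues to per-output FIFOs in one clock cycle; the packet format, with an explicit destination field, is an assumption made purely for illustration.

    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class Packet:
        dest: int          # index of the output port this packet wants (assumed format)
        payload: str = ""

    class CrossbarSketch:
        """Toy 4x4 crossbar: every input can reach every output each cycle."""
        def __init__(self, ports=4):
            self.inputs = [deque() for _ in range(ports)]
            self.outputs = [deque() for _ in range(ports)]   # FIFOs absorb contention

        def cycle(self):
            # Each input forwards at most one packet per clock cycle.
            for q in self.inputs:
                if q:
                    pkt = q.popleft()
                    self.outputs[pkt.dest].append(pkt)
            # Each output transmits at most one packet per clock cycle.
            return [q.popleft() if q else None for q in self.outputs]

    # Two inputs contending for output 0: one packet leaves, one stays buffered.
    xbar = CrossbarSketch()
    xbar.inputs[0].append(Packet(dest=0))
    xbar.inputs[1].append(Packet(dest=0))
    print(xbar.cycle())                # one Packet on port 0, None elsewhere
    print(len(xbar.outputs[0]))        # 1 packet still buffered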

Complexity of this type is undesirable in hardware components. Furthermore, connecting every input port directly to every output port is expensive, at least as far as switching hardware goes. The complexity of the switch grows in proportion to the number of (complex) output ports, but since a crossbar switch always has the same number of inputs as outputs, the number of input-to-output connections, and hence the real complexity of the switch, grows at least as the square of the number of ports.

2.2. Knockout

By making a few assumptions about the nature of network traffic, it is possible to design switches which provide "good enough" or "near-perfect" service. A Knockout switch, so named because its logical internal structure resembles an elimination tournament, is a simplified crossbar switch whose complexity scales better than that of a full crossbar.

The crucial observation made by the designers of the Knockout switch was that it is unlikely that each and every input port would receive data at the same time. Typically, only some subset of input ports are actually active at any given time, and hence if we can design a switch which handles the common case of few active input ports well, it should be possible to keep the complexity of the hardware down. For a Knockout switch with n input ports and t output ports, accepting only t packets each clock cycle (where t < n) makes for a switch with reduced hardware requirements that does not take too much of a performance hit. Choosing the proper value for t (and n, for that matter) is part art and part traffic flow analysis; we will not touch this topic here.

For the purposes of this project, we can consider a Knockout switch as being composed of two smaller modules: the concentrator and the output buffer. (As with the crossbar switch, a substantial amount of "hidden" complexity lies in the design of the output ports.)

The concentrator part of the switch operates just like a tournament: packets "compete" against each other and the winners advance towards the output ports while the losers move closer to being discarded. The winner is decided by choosing one of the two packets at random. The losers of the first stage of competition enter a second stage, analogous to the consolation round of an elimination tournament. In essence, the first stage of the concentrator picks the first place finisher, the second stage determines the second place finisher, the third stage the third place finisher, and so on up to some number t (again, where t is the number of output ports). The following picture shows an 8-to-4 Knockout switch being modelled in the tool built for this project. As before, the output ports are the grey boxes. The pink boxes are indeed still "input" ports, but many of them are inputs for other blocks (i.e. they're not strictly ports "to the outside world"). The red ports are discard ports: any packet ending up here has been dropped. The lowermost pink block is special: it's a PacketGate node operating in "Robin" mode (the various classes are discussed later).
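
The selection behaviour of the concentrator can be sketched in a few lines of Python; this is only an illustration of the tournament's outcome (t random winners, the rest discarded), not of the pairwise 2x2 contention elements an actual implementation would use.

    import random

    def knockout_concentrate(packets, t):
        """Toy n-to-t concentrator: stage k picks the k-th 'winner' at random;
        after t stages, whatever is still in play gets knocked out (discarded)."""
        in_play = list(packets)
        winners = []
        for _ in range(min(t, len(in_play))):
            w = random.choice(in_play)     # winners are chosen at random
            in_play.remove(w)
            winners.append(w)
        return winners, in_play            # (packets forwarded, packets dropped)

    forwarded, dropped = knockout_concentrate(["p0", "p1", "p2", "p3", "p4", "p5"], t=4)
    print(forwarded, dropped)              # four survive, two are knocked out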

The output ports required for a Knockout switch implementation are quite unique. Since it's possible that t packets will arrive in each cycle, the output port must either be prepared to whisk away all t packets in one operation or be able to buffer them in such a way as not to cause too much overhead. The solution is to have t output buffers, with a shifter module placed just before them that directs each incoming packet to the next output buffer in round-robin fashion. In this way, the number of packets stored in any one buffer differs from its neighbours by at most one packet, and each output buffer is implemented as a standard buffer sending out one packet per cycle.
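
A minimal sketch of the shifter and its t output buffers is given below; it assumes nothing about the tool's classes and only illustrates the round-robin balancing property described above.

    from collections import deque

    class SharedOutputBufferSketch:
        """Round-robin shifter in front of t output FIFOs (illustration only)."""
        def __init__(self, t=4):
            self.buffers = [deque() for _ in range(t)]
            self.next_buf = 0                          # current shifter position

        def accept(self, packets):
            # Up to t packets may arrive per cycle; spreading them round-robin keeps
            # every buffer within one packet of its neighbours.
            for pkt in packets:
                self.buffers[self.next_buf].append(pkt)
                self.next_buf = (self.next_buf + 1) % len(self.buffers)

        def transmit(self):
            # Each buffer is a standard FIFO sending out one packet per cycle.
            return [b.popleft() for b in self.buffers if b]

    shared = SharedOutputBufferSketch(t=4)
    shared.accept(["a", "b", "c", "d"])
    shared.accept(["e", "f"])
    print([len(b) for b in shared.buffers])    # [2, 2, 1, 1]
    print(shared.transmit())                   # one packet leaves each buffer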

2.3. What makes a good switch?

We have already seen that simplicity is a desirable trait for switches, as it is for hardware constructs in general (there is always a tradeoff between performance and complexity). However, we must mention two other important characteristics of a switch. The first is throughput: how many packets can the switch get from its inputs to its outputs in a given amount of time? The second is scalability: how does the complexity of the design change as we add or remove input ports (e.g. is an 8x8 crossbar switch more complex than a 20-to-10 Knockout switch)?

In section 5, the performance of a 4x4 crossbar switch vs. an 8-to-4 Knockout switch is measured for a variety of different traffic loads.


3. Why use DEVS?

Taking a step back from switching hardware, we now discuss why the DEVS formalism might be a good choice for modelling hardware.

3.1. What is DEVS?

Naturally, since this is a project for a course on modelling and simulation, the reader is referred to the course notes. However, in the event that someone arrives at this site by happenstance, a direct link to the DEVS notes is provided.

3.2. DEVS modelling of Hardware

The question remains as to why DEVS was chosen as the formalism in which to model switching hardware. Following Wainer et al. (see section 7), here are some reasons why DEVS might be a good choice for modelling hardware:

  1. Supports hierarchical modelling and modularity, necessary properties of any nontrivial hardware design project.
  2. Many mature, public domain DEVS frameworks are available.
  3. DEVS is relatively easy to learn. As it turns out, the primary value of DEVS as a hardware modelling tool is pedagogical: its potential use as a teaching tool for computer organization courses.
  4. DEVS is a formal approach, which is important for validation and verification.
  5. DEVS uses a continuous time base, although it’s a discrete event formalism. While in most cases hardware modelling is done in an environment where all operations are under strict clock-based timing, there are instances when designers might need to mesh clocked and non-clocked systems (e.g. mixed digital/analog devices). The fact that DEVS uses a continuous time base may indeed be beneficial in such cases, even with the added overhead of having to explicitly simulate clock cycles.

4. The Modelling and Simulation Tool

One of the major goals of this project was to create a simple yet effective and extensible tool for the graphical modelling of not just switching hardware, but hardware in general. It must be stressed that this tool is a prototype, and the design phase of this tool was accelerated to emphasize this fact.

4.1. A Quick Tour

The tool was written in Python, and Tkinter was used for the graphical user interface (GUI). In high-level terms, the user is able to add, remove and connect "nodes", which are themselves Python modules. Arbitrary user modules can be loaded provided the user knows the name of the module (and can spell it correctly, and the module can be found in the current Python search path, etc.). There is a simple root class, Node, which users must extend to write their own such modules; in this way users can define their own hardware constructs at the desired level of abstraction. When the model is simulated, the tool internally builds the implied DEVS model and simulates it using the simulator provided with PythonDEVS. The user does not directly manipulate the DEVS representation for two reasons: firstly, it can be automatically generated from the graph structure of the node hierarchy, and secondly, PythonDEVS, following the original specification of the DEVS formalism, does not allow models to be disconnected and/or deleted once they are added to the system.

4.2. Behind the Scenes

Source Code tarball
HTML pretty print version of code.

The class hierarchy above has been grouped in order to facilitate the discussion. Firstly, the PythonDEVS group is exactly that: the default (essentially unmodified) PythonDEVS modules. The group that hooks the application to PythonDEVS consists of two thin wrapper classes that delegate their methods to the Node object that created them. This is done in order to fully decouple the main class hierarchy from PythonDEVS, allowing alternative DEVS implementations to be painlessly integrated into the framework at a later date.

The application and custom UI group is quite straightforward, and the purpose of each custom UI widget will become clear in section 4.3 as the GUI is discussed in detail. The application module fully encapsulates the state of the application and contains the script code which starts the program (you can invoke the tool by typing "python application.py" on the command line).

The class hierarchy under the abstract Node class reflects the considerable amount of overlap amongst the basic components. For example, the simple classes under ClockedNode inherit so much from their parent classes that they often consist of merely one or perhaps two overridden methods ... and that's it. The TemplateNode class is a good place to start when creating custom Nodes, but it must be realized that the TemplateNode class is meant to show all the methods which could be overridden and not those methods which usually need to be (e.g. often the external transition function and the output function are specific to the type of node, but something like the time advance function tends to be easy to share).
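
To give a flavour of what a custom node looks like, here is a hypothetical sketch in the spirit of TemplateNode; the actual base-class API lives in the source code linked above, so the method and attribute names below are assumptions modelled on the DEVS functions just mentioned (external transition, output, time advance).

    # Hypothetical sketch only: the real abstract Node class defines the actual
    # interface; these names are stand-ins based on the DEVS functions above.

    class Node:
        def __init__(self, name="node", delay=1):
            self.name, self.delay, self.buffer = name, delay, []

        def ext_transition(self, inputs):    # react to incoming packets
            self.buffer.extend(inputs)

        def output(self):                    # what leaves the node this cycle
            return []

        def time_advance(self):              # easy to share across clocked nodes
            return self.delay

    class DelayNode(Node):
        """A clocked node that forwards whatever it received, one cycle later."""
        def output(self):
            out, self.buffer = self.buffer, []
            return out

    d = DelayNode(name="delay0")
    d.ext_transition(["pkt1", "pkt2"])
    print(d.time_advance(), d.output())      # 1 ['pkt1', 'pkt2']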

The most interesting subclass of ClockedNode is the PacketGate class. The PacketGate is a multi-input multi-output Node that forwards packets from its inputs to its outputs in one of three ways. The default method is a strictly random scheme where each packet received is sent to one of the outputs (with equal probability for each). The PacketGate can also be configured to operate in "round robin" mode where each packet is sent to the "next" output port. Finally, the PacketGate can operate in dispatch mode, forwarding each packet to the appropriate output as defined in the packet itself. Each of these modes can be changed while modelling by double clicking the node and entering the proper behaviour name ("Random", "Robin" or "Dispatch"). The configuration of nodes is covered in more detail in section 4.3.4. It is worth noting here that one of the reasons for modelling switching hardware, rather than hardware in general, is that quite a lot can be done with relatively few primitives.
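
The three forwarding policies are simple enough to sketch directly; the code below is illustrative only (in particular, the destination field read in "Dispatch" mode is an assumed packet format, not the tool's actual representation).

    import random

    class PacketGateSketch:
        """Illustration of the three PacketGate forwarding modes."""
        def __init__(self, n_outputs, mode="Random"):
            self.n = n_outputs
            self.mode = mode            # "Random", "Robin" or "Dispatch"
            self.next_out = 0           # round-robin pointer

        def route(self, packet):
            if self.mode == "Random":
                return random.randrange(self.n)    # each output equally likely
            if self.mode == "Robin":
                out = self.next_out
                self.next_out = (self.next_out + 1) % self.n
                return out
            if self.mode == "Dispatch":
                return packet["dest"]              # destination carried in the packet (assumed)
            raise ValueError("unknown mode: %s" % self.mode)

    gate = PacketGateSketch(4, mode="Robin")
    print([gate.route({"dest": 2}) for _ in range(5)])    # [0, 1, 2, 3, 0]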

4.3. The User Interface

The GUI for this prototype tool is decidedly simplistic and perhaps a little contrived. The following picture ties the various features of this tool to their appropriate widget:

4.3.1. Module Select

The "Module Select" widget is the most powerful feature of this tool. Initially empty, the user can load arbitrary Python modules by clicking the "LOAD NEW" button and entering the name of the module. Currently the only useful modules one might want to load would be subclasses of the "Node" class, because only subclasses of "Node" can be drawn on the canvas.

For example, one might want to define and import several different classes when creating hardware models at the logic gate level, such as "AND", "OR", "NOT" and "XOR" gates. Each of these classes should be a (possibly indirect) subclass of Node, and the application takes care of the rest.

The models of switching hardware constructed in this study were made directly by loading and manipulating the modules as defined in the class hierarchy of section 4.2. After "PacketGate", "IntegerGenerator", "DelayNode" and "Sink" are loaded, they can all be used as abstract blocks when building models with no further hassle.

The ability to dynamically load modules makes this tool very extensible. Dynamic loading can also be a handy "quick fix": at one point I had a huge model loaded and found a bug in a certain class, but was able to fix it in the Python source file and then reload the module without closing the application (this was especially handy because the tool does not allow one to save models).
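
The loading and reloading behaviour described here corresponds to what Python's standard importlib module provides; the function below is a rough sketch of what the "LOAD NEW" button has to accomplish, with illustrative names rather than the tool's actual code.

    import importlib

    def load_node_class(module_name, class_name):
        """Load (or re-load) a Node subclass by name; the module must be on the
        Python search path. Names here are purely illustrative."""
        module = importlib.import_module(module_name)
        module = importlib.reload(module)     # pick up in-place bug fixes
        return getattr(module, class_name)

    # Hypothetical usage: fetch a freshly edited class without restarting the tool.
    # PacketGate = load_node_class("packetgate", "PacketGate")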

4.3.2. Selection Edit

Using the mouse to select nodes in the model is the default mode of operation. A single click on any one node will highlight the node (using a thick green outline) for later manipulation. For example, you could highlight a node and then click on the "move" button, giving you the ability to move the selected node anywhere on the canvas. You can also operate in "group select" mode, which allows the user to highlight multiple nodes at once (very useful when moving large sections of a model). One of the reasons I chose not to use AToM3 was that it does not support this very common and useful feature.

4.3.3. Model Edit

There are only two basic operations to perform on the model level: add a node or remove a node. By clicking on either of the "add" or "remove" buttons the application enters the proper mode; clicking on a node in remove mode will delete the node from the model while clicking anywhere on the canvas (even on another node) when in add mode will add a new block to the model. The type of node added depends on the current selection in the "module select" widget (section 4.3.1).

4.3.4. Node Edit

There are two very common operations to perform on the node level. First, nodes can be connected. After clicking on the "connect" button, the user can link nodes by first clicking the source node and then the destination node. The order in which nodes are connected is important: the first connection made to a node becomes its first input (or its first output, depending on whether it was the first or second node clicked). Nodes cannot be connected to themselves, but they can form multiple connections to the same node in either direction.

The second operation performed on nodes is to configure them. Currently the way to bring up the configuration dialog is to double click on a node. The base class "Node" provides some common properties that can be changed, such as "colour", "delay" or "node name." However, the real power comes from the fact that nodes may override this method to provide their own options (e.g. Generators might ask for inter-arrival times or minimum and maximum values to generate). The override is simple and painless: simply create a dictionary of key-value mappings where the keys are the name of the option (e.g. "colour") and the values are the default values for the key (e.g. "red"). The dictionary is then passed to the ConfigurationDialog class which takes care of all the messy details of displaying custom dialog boxes, collecting data and so on. When combined with the dynamic loading of modules (section 4.3.1), this provides a quick and easy way to determine exactly those parameters a custom node needs ... or doesn't need.
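
A minimal sketch of the override is shown below; the hook name and the ConfigurationDialog call are assumptions about the tool's interface, but they convey the idea of a dictionary mapping option names to default values.

    # Hypothetical sketch: the method name and dialog call are assumptions.

    class GeneratorNodeSketch:
        def config_options(self):
            return {
                "colour": "pink",            # option name -> default value
                "node name": "generator",
                "inter-arrival time": 1,
                "minimum value": 0,
                "maximum value": 255,
            }

    # The dialog machinery would then do something along the lines of:
    #   options = node.config_options()
    #   ConfigurationDialog(options)   # one input field per key, returns edited values
    print(GeneratorNodeSketch().config_options())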

4.3.5. Link Edit

The link edit toolbar contains two buttons that are always disabled: one to add vertices to the lines between nodes in order to make the model more visually pleasing, and another to delete connections (currently the only way to delete a link is to delete one of the two nodes it connects). The picking and editing of lines is a delicate operation which, unfortunately, amounts to mere eye candy in a prototype tool like this, which is why the buttons do nothing. Furthermore, it takes some effort to get line editing right (e.g. in AToM3 it can be frustrating to work with lines).

4.3.6. Simulation

The simulation toolbar is important but not very interesting in and of itself. The button to start the simulation does just that (dynamically building a DEVS model from the displayed model first, of course). The stop button does nothing because it would require modifications to the simulator provided with PythonDEVS (see 4.5.4 for reasons why this is not allowed).

4.5. Future Work

The following tidbits are not really required reading but help to explain why certain features are not implemented, and also why some have been purposely disabled for the submitted version of this tool.

4.5.1. Model Save and Open

When you strip a tool down to the barest of essential features, it's surprising to find which features you can do without (at least until the application reaches a certain size). The open/save feature tandem is found in most programs. The inability to save models was not a problem when doing this project because even the biggest models were built in a matter of minutes, and all simulation runs were performed before closing the application. Obviously, a production version of a modelling tool must include this feature, but implementing it here for this prototype would potentially involve dealing with file formats and/or version control.

The Python module "pickle" was used to save certain models, but eventually the feature was removed due to time constraints. It remains an "undocumented" feature...
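
For reference, the standard-library calls involved are pickle.dump and pickle.load; exactly what structure was pickled is not documented, so the model dictionary below is an assumption.

    import pickle

    # Assumed model representation: whatever structure holds the nodes and links.
    model = {"nodes": ["PacketGate", "Sink"], "links": [(0, 1)]}

    with open("model.pkl", "wb") as f:
        pickle.dump(model, f)             # "save"

    with open("model.pkl", "rb") as f:
        restored = pickle.load(f)         # "open"
    print(restored)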

4.5.2. Select & Save

In the absence of a more general open/save feature as described above, the potentially useful capability to select a group of nodes and save them as a reusable, importable module was also scrapped. However, it's a very salient feature that, after the save option is added, would almost certainly find its way into a real production implementation.

4.5.3. Node Zoom

The zoom feature is only partially implemented. It does allow you to "go into" nodes and build within them (just as you do within the topmost node); therein lies one of the great advantages of the DEVS formalism: hierarchical, modular modelling.

There is one huge caveat when using the zoom feature: the tool does not connect the internals of your newly built compound model to the I/O ports of the next level. For instance, if you have a block called "adder" which accepts inputs from two "register" nodes and you decide to zoom into the "adder" node and implement a ripple-carry adder with basic logic gate primitives, then your effort will have been in vain because no node inside the "adder" block will be connected to the "register" blocks one level up.

The zoom feature is the best feature of this tool specifically because it does not work (seriously)! The problem is that there is no obvious, generalizable way to connect nodes to components inside other nodes. One way might be to have the tool "query" nodes for what they contain; another might be to graphically represent connections coming "from the outside" when zoomed inside a component. Either way, this is precisely the type of design trap that this prototype tool was meant to expose! The zoom feature actually fulfills important project requirements by not working!

4.5.4. Simulation Stop, Pause and Resume

The choice was made early on NOT to modify PythonDEVS or even the simulator that comes with it. These modules were treated as pluggable "black boxes" which could later be swapped out (e.g. for other DEVS implementations). The only modification made to the simulator was to disable "VERBOSE" mode, because doing so sped up simulation greatly by reducing the amount of information printed to the standard output. Eventually, the facility for dynamic importing of user modules should be extended to cover other aspects of the tool, such as custom simulators, interface setups, preferences, plotting devices, etc.

At one point, the entire simulator was wrapped up in a thread object in the hopes that the simulation could be stopped, paused, resumed and even reversed to some extent. While all of these operations worked occasionally, I personally find it distasteful to include features that are known to fail "some of the time." In this case, the only way to ensure the proper behaviour of the simulator would be to either modify it (ruled out a priori, as noted above) or to write another simulator that respects some pre-defined interface (not feasible given the time frame).
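
For the curious, a stripped-down version of the threading wrapper is sketched below; the simulator object and its simulate() method are placeholders rather than the PythonDEVS API, and the sketch shows exactly why clean stopping needs the simulator's cooperation.

    import threading

    class SimulationRunner:
        """Run a blocking simulate() call off the GUI thread (placeholder API)."""
        def __init__(self, simulator):
            self.simulator = simulator
            self.stop_requested = threading.Event()
            self.thread = threading.Thread(target=self._run, daemon=True)

        def _run(self):
            # The unmodified simulator never checks stop_requested, which is why
            # stop/pause/resume could not be guaranteed to work reliably.
            self.simulator.simulate()

        def start(self):
            self.thread.start()

        def request_stop(self):
            self.stop_requested.set()      # honoured only if the simulator polls it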

Another special feature is the progress bar at the bottom. This bar is used to track the progress of the simulation (some simulations took over half an hour, so progress feedback was important). Of course, this progress bar relies on communicating with the simulator object. Until a real thread-based interface between the tool and the simulator is defined, this remains the one other (very small) change made to the PythonDEVS simulator.

4.5.5. Link Editing

As already mentioned, the ability to edit the lines connecting nodes in the model, while a desirable feature in a full-scale implementation, was not a priority for this prototype tool. That being said, the "Wire" class, which defines how lines appear on the screen, is fully ready to accommodate smoothing and other common operations for making better diagrams.


5. A Tale of Two Switches

In this section the performance of a 4x4 crossbar switch is compared with that of an 8-to-4 Knockout switch for two different network traffic patterns.

5.1. Traffic Models

The traffic models were chosen from two basic load categories: UNIFORM and BIASED. The uniform flow simulates the best case scenario, where each packet is equally likely to be headed for any given output port. The biased flow is the exact opposite: every packet is destined for a single output port. Although this may seem like an unfair and/or totally unrealistic worst case scenario, it's strikingly typical of the type of flow you might see when, say, a very popular web site is attached to one output port of the switch.
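
As an illustration (with a 4-output switch assumed), the two loads boil down to how a packet's destination port is drawn:

    import random

    def uniform_dest(n_outputs=4):
        """UNIFORM flow: each output port is equally likely."""
        return random.randrange(n_outputs)

    def biased_dest(hot_port=0):
        """BIASED flow: every packet heads for one 'popular' port."""
        return hot_port

    # A stream of eight packet destinations under each load:
    print([uniform_dest() for _ in range(8)])    # e.g. [2, 0, 3, 1, 1, 0, 2, 3]
    print([biased_dest() for _ in range(8)])     # [0, 0, 0, 0, 0, 0, 0, 0]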

5.2. Stress Testing

Once the models were fully tested and validated, the next task was to stress test each model (and the application), comparing the relative performance of each switch. By attaching multiple independent packet generators to the main input of each switch, I was able to throttle the overall load on the system from light (1 packet generator) to heavy (20 packet generators).



The first important observation is that the crossbar switch performs better virtually all the time. This was an expected result: the Knockout switch is basically an incomplete, cheaper implementation of a crossbar. Still, the size of the performance gap is surprising. The reason lies in the way the switches were modelled. Each node in the switch was assigned a delay time of 1 clock cycle, and each gate was able to process a single event each cycle (with the exception of PacketGates in "Robin" or "Dispatch" mode, which were allowed to process as many packets as they had outgoing connections). This timing model may or may not be valid. For example, are the operations performed at a PacketGate node in a crossbar more complex than the simple operations performed inside the Knockout? The only way to tell is to explicitly model these constructs using some form of hardware description language (e.g. VHDL, Verilog) and then use the updated timings in the model. Falling short of that, it may be possible to reuse pre-existing libraries in which the timings are already defined. In either case, this falls far beyond the scope of this project and defeats many of the initial project requirements (such as using DEVS).

A second observation is that the 8-to-4 Knockout switch seems largely unaffected by whether it faces a UNIFORM or a BIASED traffic flow. After some further research, it turns out that the shared output buffer used by the Knockout switch is the cause. Recall that the shared buffer places packets in the various output queues in round robin order, while the output queues are simple components that send out one packet at a time. In reality, there would be a series of packet filters involved in the process, used to detect which packets are destined for which ports. It was unclear how to model the interaction of these packet filters with the shared output buffer, but from all indications the overhead incurred by such filters is negligible anyhow.


6. Conclusions


7. References