Introduction
The goal of MVPsim (Multi-threaded Virtual Processor Simulation) is to create a simulator that can study two main topics of interest:
- The interactions between software and hardware on a multi-threading machine.
- The effects multi-threading on hardware utilization, mainly:
- Total run length of multi-thread vs. non-multi-thread programs
- IPC at each stage in the super-scalar pipeline
- R - run length of threads (histogram)
- C - time to make a context switch (cost of hardware switch, cost of Pthread switch and frequencies)
- Functional unit utilization
- Effects of different caches on performance (size, associativity, replacement policy, line length, etc.)
- Cache efficiency for various caches
- Number of elements in RUU when switching
The actual processor to be modeled is a "standard" super-scalar processor that is able to use cache-misses to switch between up to four hardware contexts. In this case, a context is defined to be a thread, so the MVP processor can have up to four threads available for its use.
To accomplish the above goals the group decided to construct a new simulator out of two pre-existing programs: SimpleScalar (the execution based simulator) and Pthread (thread library).
Implementation of MVPsim
The MVPsim project has two main stages to complete before the MVP simulator can be finished:
- Combining SimpleScalar and Pthread to create thread capable processor.
- Modifying both SimpleScalar and Pthread to create four thread multi-threading capable processor (MVPsim).
Stage 1: Combining SimpleScalar and Pthread to create thread capable processor
In order to understand the modifications required to use Pthread with SimpleScalar, it is important to understand how both operate.
SimpleScalar operation:
SimpleScalar is a small collection of execution based simulators where each one models a different type or level of detail with a processor and each utilizes the same tool set. The simulator that MVPsim is interested for the project is sim-outorder. The sim-outorder processor simulates a out-of-order processor based upon Gurindar Sohi's RUU design shown below.
To understand to understand the limitations in SimpleScalar 2's (SS2)
sim-outorder,
it is important to first understand how SS2 works. SS2 is an in-order execution
based simulator followed by a trace-based out-of-order timing simulator as
shown to the right. So in other words, the instructions are decoded and instantaneously executed (the register files and main memory are modified). After that, the instructions are placed into the timing simulator where the instructions are moved into execution units and hazards are resolved similarly to execution in a super-scalar processor. It is important to note that no data is handled by the timing simulator, all data and "actual" execution is taken care of by the earlier execution simulator. Another way to look at this is that sim-outorder models an on the fly trace generator that runs the program and passes the instructions (as each is decoded and executed) to a trace-driven timing simulator. In oder to correctly model wrong path execution, the in-order execution portion produces wrong path
instructions called spec_mode instructions for the timing simulator that
do not affect the state of the machine (i.e. except for cache valid/dirty bits, when the execution simulator enters spec_mode, the RF and memory is not modified). This "spec_mode" lasts until the timing simulator indicates that a branch instruction should have executed by now, turns "off" spec_mode, and forces the execution simulator to restart execution down the correct branch.
Since the RF and memory has not been modified since the branch instruction (when spec_mode was initiated), the state of the simulated processor was preserved. This system allows sim-outorder to execute along speculative branch paths and have it not
affect the correctness of the in-order simulation.
The sim-outorder simulator simulates a five stage pipeline (fetch, decode, issue, writeback, and commit) on the above architecture. The five stages are implemented with six functions driven by the sim_main function. These functions include:
| Function
| Simulator
| Purpose
|
|
| commit
| Timing
| This stage is where completed instructions are removed
from the RUU and any store instructions at the head the the RUU are executed. Note, since no values are held in the RUU, none are saved when the instructions are removed.
|
| ruu_release_fu
| Timing
| This small function releases the functional units once instructions are done with them.
|
| ruu_writeback
| Timing
| This stage is where instructions are "completed" in the
timing simulator and data dependencies are updated for instructions in the RUU.
Also, this stage is the location where misspeculated branches are "discovered" by the timing simulator and the processor is reset to operation before spec_mode was declared.
|
| ruu_issue
| Timing
| At this stage, ready instructions are issued from the RUU unit to the functional units. This stage is where instructions can be issued out-of-order based upon which instructions are ready (have all their operands) and which functional units are not busy. During this stage, the loads access the caches and the dirty/valid bit are updated. Note, only the RF and memory hold data values, the data cache access holds addresses for hit/miss timing calculations.
|
| ruu_dispatch
| Execution
| This stage is where new instructions are decoded (executed by
the execution based simulator) and placed into the RUU structures. If room exist in the RUU unit, instructions are removed from the queue filled during ruu_fetch, decoded (and executed), and then placed into the RUU unit. Note, at this point the instructions are executed and placed into the RUU in-order. Also, it at this function where instructions make the transition from the in-order execution based simulator to the out-of-order timing simulator. This function is where the RF and memory are updated by instructions and where miss-predicted branches (sim-outorder can tell a miss-predicted branch since the branch has already been executed) enable spec_mode.
|
| ruu_fetch
| Execution
| This stage is responsible for fetching instructions out of the i-cache and place them into a queue for later decoding.
|
|
To get a better understanding of how an instruction moves within SS2, please observe the following figure:
By choosing the above method of execution, sim-outorder forces three main limitations that are significant for MVPsim:
- No precise interrupts. Using the RUU (essentially a ROB and reservation station combined), instructions are not allowed to modify the RF and memory until they reach the head of the RUU. In that way, when interrupts occur, the PC of the instruction that is at head of the RUU is saved along with the current state of the RF and memory. However, in sim-outorder, that is not the case, since every instruction in the RUU has modified the RF and memory (including some that aren't yet in the RUU). So, since the state of the machine is unknown, precise interrupts cannot occur.
- No OS support. As a direct consequence of a lack of precise interrupts, sim-outorder is incapable of supporting an OS. In order to model system calls required by most programs, SimpleScalar implements an interesting model. When SS2 decodes a system call in a simulated program, SS2 passes the syscall to the host machine's OS for it to handle the syscall. As shown to the right,
the model is of the SimpleScalar executable code (usually part of the
SPEC95 benchmark) running on the SimpleScalar simulator, which in turn
runs on the host processor/OS. The arrows represent the passing of the syscall to the host machine and back to the simulated code.
- No signals or timer support. Without precise interrupts, SS2 cannot support signals which generate interrupts to notify the processor of their generation. Also, OS timers are not supported. Not only do timers not work without signals (which are generated when a timer goes off), but are also limited by the syscall model shown to the right. If timers (created by syscalls) are sent out to host machine OS, they will run in regards to the host machines processor not the simulated machines processor. For example, imagine that a timer is set by simulated code to "ring" in approximately 1,000 simulated processor cycles. SS2 will set up that timer to run on the host processor for 1,000 cycles and return a signal (lets assume that signals work for the moment) 1,000 host cycles later. However, the host machine is a busy machine, it has to deal with an OS and several other tasks (besides the simulator) demanding processor attention. So after 1,000 host cycles, the signal comes back to the simulated processor only after 10 simulated cycles, 100 times faster then it should!
Pthread operation:
Pthread is a user library implementation of threads, or in other words, it is a collection of libraries linked to a user program that allows a user program to declare and run threads. To use Pthread, a programmer must first create the threads by using the pthread_create function. "pthread_create" essentially tells Pthread where to find the user defined threads for future use. Since the threads are user definable, any function the user designates can be a thread. Next, the programmer needs to start the threads running by using the pthread_join command. Once called, Pthread takes over the running of the threads until all are finished and returns control back to the main program. Pthread runs one thread at a time in a round-robin fashion and switches between threads in the following ways:
- When the current thread ends its execution, it calls the function pthread_exit. This function deletes the thread from operation and turns control to the pthread scheduler whereby either a different thread is started or control is returned to the main program if there exists no more threads to run.
- As each thread is executed, Pthread starts an OS timer. If the timer activates before the thread is finished executing, an interrupt is generated. At this time the state of the thread is saved, the pthread scheduler called, and a different thread is executed (providing, of course, that there are more threads to run).
- Since Pthread was created to hide I/O latency, Pthread also switches threads when it detects a thread making a long I/O request. The thread switching process works the same as a timer switch above.
The implementation of pthread is rather complicated and relies heavily on OS support. When a process runs, the OS sets up a location for the potential state of the process on the OS stack. As shown below, when an interrupt occurs, the state of the process is saved onto the stack, increments the stack pointer and the processor runs the interrupt handling routine. When the routine finishes and returns from the interrupt, the OS pops the process's state (the one above the stack pointer) back off the stack, decrements the stack pointer, and the process returns to execution.

In order to benefit from this aspect of the OS, pthread builds its threads in a creative way. When the pthread_create function is called, Pthread allocates space on the OS stack for each thread. The stack pointer for each thread and a pointer to the threads start is saved in a queue by Pthread (called prio_queue). Then, when a thread is to start execution, pthread sets the stack pointer to the start of the thread, sets up the OS timer, removes the thread from th queue, and calls the start routine of the thread. The new OS stack is shown to the right. Once a timer interrupt is generated the following steps occur.
- The OS recognizes the interrupt and saves the thread onto the space on the OS stack given by the stack pointer.
- The OS loads the interrupt handler (sig_handler_real in pthread)
- sig_handler_real eventually runs the pthread scheduling routines
- The currently running thread is returned to the queue and a new thread selected and removed from the queue (pthread supports a round-robin style of scheduling).
- To start the new thread running, the stack pointer is set to above the new thread to run and pthread returns from the interrupt.
- The OS restores the state of the processor using the values immediately below the stack pointer (the new thread) and the new thread begins executing.
Once all threads are finished running, pthread returns to the main program by acting similarly to a simple return from the pthread_join function call.
What pthread requires in order to run on a simulator, is three things:
- A OS stack to save and restore states of the threads.
- A OS timer to generate an interrupt to switch between threads.
- Precise timer interrupts to allow the correct interfacing between the hardware, OS, and Pthread.
Combining Pthread and SimpleScalar:
In order to port Pthread to SS, three functional capabilities must be added to SS2:
- Timers
- Precise interrupts
- OS stack
To implement timers, SS2 syscall.c was modified to detect a SETITIMER syscall. Upon detection of the syscall the timer value is removed and stored in a global timer variable. To decrement the timer, code was modified in the sim_main function to decrement the timer every clock cycle.
Since Timer interrupts are those that are asynchronous to the currently executing instruction stream, it was easy to modify SS2 to handle them. The timer interrupts are checked at the beginning of ruu_fetch in sim-outorder.c. This check is just a comparison with the global timer variable. It is important to note that interrupts cannot occur while the processor is in speculative mode
because the entire interrupt handler would be run in speculative mode. To start executing the interrupt handler, ruu_fetch calls the SS2 interrupt handler (interrupt_fetch) which switches the current PC to that of the interrupt vector. This PC swapping method is acceptable since the RUU does not need to be
drained before servicing the interrupt. This is mostly because the processor does not have a supervisor mode with multiple levels of memory
protection.
To implement an OS stack into SS2, a separate file, signal.c, was created. Signal.c create and then maintains the OS stack. When interrupt_fetch is called, (to process a detected interrupt), it interfaces with the OS stack code to save the current state of the machine, increment the "stack pointer" and return the interrupt handler PC. Upon a return from and interrupt, the code drops back into interrupt fetch, which decrements the stack pointer and returns the state of the machine. Last, the interrupt fetch routine returns the saved PC back into the normal SS2 program.
After these changes, Pthread was tested on SS2 and found to work normally. Next, the MVPsim group moved on to creating the actual MVPsim.
Stage 2: Creating MVPsim
Essentially, MVPsim models the effect of multi-threading using multiple hardware contexts. In the MVP group case, the hardware contexts are simply register banks. When it is required to switch from one context to another, MVPsim deactivates one register bank and activates another. The processor then fetches on the new register banks PC. The processor also has to drain the RUU unit and flush the pipeline, incurring the same steps and penalties seen by branch prediction. To tell when to switch between contexts MVPsim uses to mechanisms:
- Cache miss event: Upon detection of a cache miss, in order to avoid the main memory latency, MVPsim will switch to a new context.
- Timer interrupt: Since it is necessary to be able to remove threads out of the hardware and replace them with other threads (in the case of barrier syncs and inter-thread communication), MVPsim expands on the Pthread timer. The timer interrupt was lengthened to allow all threads in the hardware contexts and chance to run. Then, when the timer interrupts, a thread is removed and another added in its place. Thus, fairness is promoted and deadlocks prevented.
To create MVPsim, the MVP group has to go through 3 sub stages. These stages include:
- Creating the execution model. In order to govern how the MVP group wanted the threads to respond in a multiple context environment, the group created a model to mirror the flow of control through MVPsim.
- Building the hardware-software interactions. After the execution model, the MVP group began defining the ways in which the hardware and software needed to communicate and interact with each other.
- Implementing MVPsim. The last step, and the one that the MVP group is still attempting, is to modify Pthread and SS2, to support the multiple contexts and they related hardware and software interactions that they incur.
The Execution Model:
The execution model for MVPsim is shown to the left:
- Thread Mode/Non-Thread Mode
- The two large boxes on the execution model show the states contained within thread mode and non-thread mode. When thread Mode is set, the hardware allows a context switch on a cache miss event while non-thread mode disallows any such action. Essentially, non-thread mode is declared whenever the main program is running (no threads to run so why context switch?) and when threads are being created or scheduled (a context switch would result in data corruption). Any other time that the hardware has threads, a context switch is desired on a cache miss event. In this way, thread mode can alert the hardware that is has threads and doubles as a locking variable in the event that threads are being scheduled.
- States
- Main - This is the state that execution is currently running on the main user program. The execution model starts and ends in this state as Pthread will eventually return control back to the main program when the threads are done executing. Threads may have been created, but no thread can be actively running in a hardware context when execution is in this state.
- Create Threads - This state is responsible for the creation of the thread, prio_queue, and OS stack.
- Schedule New Threads - This state is responsible for scheduling and placing threads into a hardware context and contains the Pthread scheduler code. The scheduler selects a thread from the prio_queue and places the thread into a hardware context. If the current context is occupied, the state removes the thread from the context and returns it to the prio_queue before scheduling a new thread.
- Running Threads - In this state, the processor is running a thread located in a context.
- Context Switch - In this state, the processor is switching between contexts. Again, a context switch is simply a change of register banks and a pipeline flush.
- Wait - In this state the processor is waiting for a thread to become runnable (ie. return from a main memory access). It is possible for all threads to be waiting for a main memory access, in this case the processor is forced to wait for a return from memory.
- Transitions
- If the newly created threads have low priority
- Create new threads (pthread_create function call)
- If the newly created threads have high priority
- Start threads (pthread_join function call)
- No threads left to run
- Move scheduled thread into hardware
- Context empty and new threads available or timer interrupt and new threads available
- Cache miss event or timer interrupt
- No new threads or contexts are full and a context is ready
- Return from cache miss
- No new threads and no thread is ready or context is full and no thread is ready
The Hardware-Software Interactions:
To understand the interactions between pthread, the OS, and the hardware on the MVPsim simulation, refer to the above figure. The MVPsim processor contains two hardware contexts, in other words, two sets of register banks enabling the processor to support up to two threads at once (In actuality MVPsim will support more contexts, but has been simplified for the explanation). Each context has a valid bit to let the processor know if it is allowed to fetch instructions for the context and a ready bit to denote if the thread is ready to execute of waiting for a cache miss to return. Note, at least one context must be valid at all times or the processor will not fetch any instructions. The hardware also has a Thread Mode bit and a More Threads bit. When set, the Thread Mode bit allows the processor to switch between the contexts and prevents the switch when not set. The More Threads bit, indicates whether there are threads in the Pthread prio_queue.
The OS section consists solely of the OS stack that Pthread relies on to save the state of a thread on a timer interrupt as explained previously.
The Pthread section consists of a prio_queue and the pthread_run pointer. The prio_queue is where Pthread stores all non-completed threads that are not currently executing. It is from this queue that the Pthread scheduler chooses the next thread to run. The pthread_run variable holds a pointer to the thread that is currently being executed. In the MVPsim case, pthread_run holds a pointer to the thread executing in the context that needs to be swapped with a non-executing thread (the currently running context).
Before an example execution can be presented it is helpful to define how MVPsim accomplishes some processes:
- How a context switch occurs:
- A context switch is when the hardware moves from one context (H0) to another (H1). Since a context is just a register bank the process (assuming multiple register banks are available) is a simple process of turning off one bank and turning on another. The processor itself will have to flush the ROB (RUU) and the fetch queue making the entire process very similar to recovering from a miss-predicted branch and requires the same penalty.
- How the OS saves the state:
- From the fact that each context is a register bank, every context then knows its own OS stack location (the pointer is stored in a specific register). When an interrupt occurs, the OS reads stack pointer and places the state information at the appropriate location. The stack pointer is initially set up by Pthread during the OS stack creation and assigned to the thread during the pthread_create function call.
- How Pthread knows what thread is being removed from the hardware context:
- Pthread can tell what thread is being removed by reading a pointer stored in a register in a context. Upon calling the Pthread scheduler, the value of this register is assigned to the pthread_run variable. The pointer is to a pthread (the thread information that is stored in the prio_queue). Implemented this way, Pthread can find exactly which thread is in the current context.
To gain an understanding into the hardware, OS, and Pthread relationship of MVPsim, the following three thread example is presented:
- To begin, the main program runs in one of the hardware contexts (assume H0 for the example). At this point, the context bit is 1 for H0 and 0 for H1. Also, both Thread Mode and More Threads are initialized to 0. Any cache misses and timer interrupts that occur will not result in a context switch.
- Eventually main will call pthread_create. "pthread_create" sets up the OS stack and builds the prio_queue shown in the above figure. Also, pthread_create sets More Threads to 1, since there are now threads in the prio_queue.
- To start threads executing, main will call pthread_join which in turn starts the pthread scheduler. Since this is the first time a thread is run, the scheduler goes through the following steps.
- A thread (T0) is de-queued from the prio_queue and placed in pthread_run.
- The OS stack pointer is set to before T0 on the stack. It is not set to after T0 because there is no interrupt to return from yet.
- An OS timer for the thread is started.
- The pointer to pthread_run is saved in H0.
- The PC is set to the starting routine for T0, Thread mode is set to 1, and T0 begins executing on H0.
- When a cache miss occurs, the processor sets the H0 ready bit to 0, switches to H1, and (since H1 is not a valid context and More Threads is 1) sets H1 context valid to 1 and runs the interrupt handler in H1. The interrupt handler sets Thread Mode to 0 and starts the Pthread scheduler.
- The H0 ready bit will remain 0 until the cache miss returns and then will be set back to 1.
- Since now the Pthread scheduler must return from an interrupt and since this is the first time T1 has been run, the scheduler performs the same steps as before except for the following:
- The OS stack pointer is set to after T1 (now Pthread will mimic a return from an interrupt).
- Instead of immediately executing T1, Pthread returns from an interrupt.
- After the Pthread scheduler returns, the OS removes the state for T1 (in this case it is just zeroes), and starts executing at the PC (T1's start routine).
- Now, since Thread Mode is set and both contexts are valid, a cache miss will result in the hardware context switch occurring between H0 and H1. At this point neither Pthread or the OS is called or used.
- If at any point the ready bits on both H0 and H1 are 0, the processor will wait until a cache miss returns.
- When the OS timer goes off and there are more threads to run, the hardware chooses a thread to swap out (currently done in a round-robin fashion, so T0 will be chosen), switches to that thread's context (H0), and calls the interrupt handler.
- The interrupt handler sets Thread Mode to 0, saves the state onto the OS stack and calls the Pthread scheduler.
- The Pthread scheduler removes the running thread from Pthread_run and puts it back on the prio_queue and chooses a new thread to run. The scheduler follows the standard return from interrupt steps given above, but if the thread is not being started for the first time, the PC is not over written to the thread's start routine. Note, if the scheduler removes the last thread from the prio_queue More Threads is set to 0.
- When a thread completes, it will call the pthread_exit function. This function sets Thread Mode to 0, then will do one of the following based upon current conditions in the processor.
- If More Threads is 1, pthread_exit will call the scheduler and execution will proceed normally.
- If More Threads is 0, pthread_exit will set the context valid bit to 0.
- Once the processor detects that the current context is no longer valid it will do one of the following:
- If there is at least one valid context (i.e. there are still threads in other contexts), the processor will switch to the valid context and stop using the now invalid context (provided that More Threads remains 0).
- If there are no valid contexts, the processor will re-validate the context and pthread_exit continues execution.
- pthread_exit will call the scheduler and since the prio_queue is empty, Pthread will return back to the main function and main continues with normal operation (Thread Mode is 0) until it exits.
The Implementation of MVPsim:
The implementation of MVPsim can be accomplished in two rather large steps:
- Implementing Cache miss events
- Implementing multiple hardware contexts
The addition of a instruction that causes a context switch on a cache switch is a rather difficult process
for a variety of reasons, all related to the SimpleScalar2 implementation.
First, there is only one state of the processor stored in SimpleScalar2 and
no other state can be constructed without moving forward in the program.
In other words, you can't undo anything unless you were in spec_mode when
you did it, hence, you didn't really do it in the first place. So the bad
news is that by the time you find out that the instruction is faulting, the
instruction has completed (in the execution simulator) and you can't undo
that instruction or redo the instruction (consider ADD $1,$2,$1).
With the limitations in mind, the solution to an context switch instruction is
clear, if somewhat tricky to actually implement. Whenever an context switch causing
instruction is decoded, the cache miss must also be detected, and the processor
put into speculative mode until it can be determined (in the timing simulator)
that the cache miss has actually occurred, so the interrupt vector can be called.
To show how this can be done in practice, below is the process that you would
need to go through to implement an event on a cache miss that drains
the RUU before calling the interrupt vector. Remember that the RUU cannot
be flushed, since instructions before and including the faulting instruction
must be allowed to complete for the program to be correct.
There are four areas that need to be modified (along with the addition of
the interrupt handlers) in sim-outorder.c to support this behavior. Also,
an additional register needs to be allocated to control when the interrupt
is enabled, and actually set the interrupt itself. Sim-outorder is divided into
several distinct sections, each with a specific purpose that mirrors the
sections of a superscalar processor. The sections are shown below, with
their purpose, and a brief summary of what we added to each section.
| Section
| Purpose
| Changes
|
|
| commit
| This is where completed instructions are removed
from the RUU. Also, stores are executed here as well.
| If the instruction was a faulting load, then
leave spec_mode, and call the interrupt handler.
|
| ruu_release_fu
| This is a small function to release the functional units
| No changes necessary
|
| ruu_writeback
| This is where instructions are "completed" in the
timing simulator.
This is also the location where misspeculated branches are "discovered"
by the timing simulator.
| Minor change to prevent a faulting load from
being mistaken for a mispredicted branch. Also, disallow a misspredicted branch from leaving spec_mode when a faulting load has occurred.
|
| ruu_issue
| This is where instructions are sent to
functional units to be
"executed". Timing for the instructions (including loads) are computed
here.
| We modify the timing assignment for a
faulting load to be the time
it takes to discover that the L2 cache misses instead of the time to
access main memory.
|
| ruu_decode
| This is where new instructions are decoded (executed in
the execution based simulator) and placed into the RUU structures. This function also uses the executed branches to determine if the simulator should enter spec_mode.
| For all loads (when cache miss interrupts are
enabled) we check
to see if the load will miss using the cache_probe()
function. If
it misses, we go into speculative mode.
|
| ruu_fetch
| This stage is where instructions are fetched in from the I-caches or main memory and placed into a queue for later decoding by ruu_dispatch. Branches are predicted using a two-bit BTB, but the simulator does not enter spec_mode until ruu_dispatch, where the branches are executed. With this stratagem, the processor knows exactly when the predicted value does not equal the actual value and ruu_dispatch can move the processor into spec_mode.
| No changes necessary.
|
|
Implementing multiple hardware contexts is another surprisingly difficult procedure and it is this stage that the MVP group is currently trying to implement. Most of the problems here revolve around the fact that adding multiple hardware contexts require extensive modifications to both Pthread and SS2, and no way to test smaller implementation steps. This all or nothing approach has made this step a nightmare to debug.
Conclusions
In conclusion it was hoped that the project could be completed and results generated within a term. Alas, it was not to be.
However, the MVP group now has an extensive knowledge of both Pthread and SS2 so that the continuing of the project should go smoother. Also, the group obtained greater knowledge in the realm of multi-threading and has made several future recommendations:
- In regards to multi-threading:
- Use a thread package that is either fully implemented in the OS or fully implemented in user libraries. The MIT implementation of Pthread is a hybrid of both OS and user library control. This hybrid model slows down execution unnecessarily as instead of two separate instruction groups (thread-OS or thread-libs) there are three separate instruction groups that must reside in the caches (thread-OS-libs).
- Use dedicated hardware interrupts for each context. When designing hardware specifically for multiple hardware contexts, communication between software and hardware can be greatly facilitated by dedicating a interrupt to each context. In that case, the software can easily understand exactly which context is the one interrupting.
- In regards to simulators:
- Use a fully execution based simulator. Most of the problems that the MVP group has encountered so far have been direct consequences of not having a execution based simulator.
- Use a simulator with at least a partial OS implementation. On the positive side, since SS2 implemented many syscalls to begin with it helped the MVP group when creating the mini-OS stack.
- Use a simulator that is easy to understand and modify. Another serious time consuming problem problem with SS2 was the lack of documentation and ease of change.
- In regards to thread implementations:
- Use a thread implementation that is well documented. Much time was wasted by trying to figure out how Pthread worked.
- Use a thread implementation that can support parallel threads. Part of the current problem with creating multiple contexts is that Pthread did not in any way support parallelly executing threads.
All in all this was a rather difficult and time consuming project, but hopefully in the near future MVPsim can be used to collect and detail valuable data and results on multi-threading.