Instruction-Level Timing Error Prediction Model
for 5-stage Pipelined ALPHA Processor

PROJECT INFO

This is a course project in EECS 349 Machine Learning, advised by Prof. Bryan Pardo.

AUTHORS

Yuanbo Fan, Chao Yan
Northwestern University

ABSTRACT

In this project, we build an instruction-level timing error prediction model for a 5-stage pipelined ALPHA processor that supports timing speculation. The model predicts whether an instruction produces a timing error, given information about the instruction sequence and data usage.

1. INTRODUCTION

A timing error is a violation of circuit-level timing constraints during program execution. For example, if the timing constraint of an instruction is one nanosecond but the instruction takes more than one nanosecond to execute, a timing error occurs.

In traditional processors, timing errors may cause catastrophic system failures. Processors that support timing speculation are augmented with timing-error detection and correction techniques, so they can recover from timing errors. However, the recovery cost is very high.

Existing work has shown that, for a specific processor, timing errors are a strong function of the programs executed on it. Therefore, an effective timing error prediction model can be built from information about how programs execute on the processor. A compiler can apply such a model, through instruction scheduling and instruction selection, to generate code with a lower probability of producing timing errors. Moreover, if the model is implemented in hardware, it can predict timing errors at run time; combined with padding techniques, this avoids paying the full recovery cost for timing errors.

In this project, we use machine learning algorithms to build an instruction-level timing error prediction model for a 5-stage pipelined ALPHA processor that supports timing speculation. The model predicts whether an instruction produces a timing error, given information about the instruction sequence and data usage.

2. LEARNING PROCESS

2.1 Dataset

We collect ten instruction-level test programs and build ten data sets from them. In each data set, nine programs are used for training and the remaining one for testing. The programs are executed on a 5-stage pipelined ALPHA processor, and during execution we collect information for each instruction, including the opcode, the two operand values, and control signals that indicate data dependences and branch information. A threshold delay is also set, so each instruction can be labeled according to whether it produces a timing error, i.e., whether its execution exceeds the threshold delay.
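The labeling rule is therefore a simple comparison against the threshold. The Java sketch below illustrates it; the class name, threshold value, and example delays are hypothetical, and only the thresholding rule itself comes from the procedure described above.

    // Minimal labeling sketch: an instruction is labeled as a timing error
    // when its measured delay exceeds the chosen threshold delay.
    public class TimingLabeler {
        static final double THRESHOLD_NS = 1.0;  // assumed threshold delay (ns)

        static boolean hasTimingError(double measuredDelayNs) {
            return measuredDelayNs > THRESHOLD_NS;
        }

        public static void main(String[] args) {
            double[] delays = {0.8, 1.2, 0.95, 1.05};  // example measured delays (ns)
            for (double d : delays) {
                System.out.println(d + " ns -> error = " + hasTimingError(d));
            }
        }
    }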

Table 2.1: Description of Test Programs
NO. Name Description
1 copy copy memory contents of N elements
2 objsort object sorting algorithm
3 parsort parallel sorting algorithm
4 fib compute the nth Fibonacci number
5 fib_rec compute the nth Fibonacci number recursively
6 evens compute even numbers that are less than n
7 insertion insertion sorting
8 parallel compute first n multiples of m
9 sort bubble sorting
10 saxpy integer SAXPY

Figure 2.1 An Example of Test Programs

An input instance of the model consists of several parts, including the current instruction and the N-1 preceding instructions, where N is the instruction window size, i.e., the number of instructions assumed to affect whether the current instruction produces a timing error. For each operand value used in the instructions, we try both a number representation and a bit representation, because different data representations may result in different models, especially for the Multilayer Perceptron algorithm.
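As a rough illustration of the two operand encodings (a hypothetical helper, assuming 64-bit operand values), the number representation keeps the operand as a single numeric attribute, while the bit representation expands it into one binary attribute per bit:

    // Sketch of the two operand encodings (hypothetical helper; 64-bit operands assumed).
    public class OperandEncoding {
        // Number representation: the operand value itself is one attribute.
        static double numberRepresentation(long operand) {
            return (double) operand;
        }

        // Bit representation: the operand is expanded into 64 binary attributes.
        static int[] bitRepresentation(long operand) {
            int[] bits = new int[64];
            for (int i = 0; i < 64; i++) {
                bits[i] = (int) ((operand >>> i) & 1L);
            }
            return bits;
        }

        public static void main(String[] args) {
            long ra = 0x8;
            System.out.println("number: " + numberRepresentation(ra));  // 8.0
            System.out.println("bit 3:  " + bitRepresentation(ra)[3]);  // 1
        }
    }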

Table 2.2: An example of collected information
Opcode & Two Operands Control Signals Timing Error
ldq r3 0 0100 True
ldq r4 0 0001 False
addq r1 r2 0010 False
addq 0x8 r3 0010 True
addq 0x8 r4 0010 False
addq 0x1 r9 0010 True
cmple 0xff r9 0100 False
... ... ...

Table 2.4 shows an example of instances when the instruction window size is 1. Each opcode is encoded as a number, and the operand values are in number representation. Three numbers (c0, c1 and c2) represent the control signals, and a binary number indicates whether the instruction is a taken branch. The last number indicates whether there is a timing error. A sketch of how such instances can be assembled for larger window sizes is given after Table 2.4.

Table 2.3: An example of instruction sequence
Preceding Instruction Current Instruction Timing Error
ldq r3 0 ldq r4 0 False
ldq r4 0 addq r1 r2 False
addq r1 r2 addq 0x8 r3 True
addq 0x8 r3 addq 0x8 r4 False
addq 0x8 r4 addq 0x1 r9 True
addq 0x1 r9 cmple 0xff r9 False
... ... ...

Table 2.4: An example of instances
Opcode ra rb c0 c1 c2 Branch Error
8 0 0 1 0 0 0 0
8 4095 0 1 0 0 0 0
8 4096 0 1 0 0 0 0
19 0 10 0 1 0 0 0
45 0 4096 1 3 2 0 0
41 0 4096 1 0 0 0 1
45 256 4096 1 0 1 0 0
16 4096 8 0 1 0 0 0
16 0 1 0 1 0 0 0
16 1 4095 5 0 0 0 0
61 44 4294967264 2 2 2 1 1
19 1 10 0 1 0 0 0
........................
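As a sketch of how such instances might be assembled for an instruction window of size N, the per-instruction features (opcode, ra, rb, c0, c1, c2, branch, as in Table 2.4) of the N-1 preceding instructions are concatenated in front of the current instruction's features, followed by the current instruction's error label. The code below is a hypothetical illustration, not the actual extraction tool used in the project.

    import java.util.ArrayList;
    import java.util.List;

    // Build one instance per instruction by concatenating the features of the
    // N-1 preceding instructions and the current instruction, then appending
    // the timing-error label of the current instruction.
    public class WindowInstances {
        static List<double[]> buildInstances(double[][] perInstrFeatures,
                                             int[] errorLabels, int windowSize) {
            List<double[]> instances = new ArrayList<>();
            int f = perInstrFeatures[0].length;            // features per instruction
            for (int i = windowSize - 1; i < perInstrFeatures.length; i++) {
                double[] inst = new double[windowSize * f + 1];
                for (int w = 0; w < windowSize; w++) {     // oldest instruction first
                    System.arraycopy(perInstrFeatures[i - windowSize + 1 + w], 0,
                                     inst, w * f, f);
                }
                inst[windowSize * f] = errorLabels[i];     // label of the current instruction
                instances.add(inst);
            }
            return instances;
        }
    }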

2.2 Learning Algorithms

  • J48 Decision Tree
  • Multilayer Perceptron

2.3 Software Packages

  • Synopsys tools
  • WEKA software packages

2.4 Validation

Each data set contains one training set and one testing set: nine programs in the training set and one program in the testing set. We validate the models in two ways: 10-fold cross-validation on the training set, and evaluation on the testing set of the model learned from the training set. The 10-fold cross-validation shows how the model performs on programs that are similar to the training programs, while the evaluation on the testing set shows how the model performs on new programs.
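A minimal WEKA sketch of this procedure might look as follows. It assumes the instances have been exported to ARFF files; the file names train.arff and test.arff are placeholders.

    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    // 10-fold cross-validation on the training programs, then evaluation of the
    // trained model on the held-out test program, for both J48 and MLP.
    public class ValidateModels {
        public static void main(String[] args) throws Exception {
            Instances train = new DataSource("train.arff").getDataSet();
            Instances test  = new DataSource("test.arff").getDataSet();
            train.setClassIndex(train.numAttributes() - 1);  // last attribute = timing error
            test.setClassIndex(test.numAttributes() - 1);

            Classifier[] models = { new J48(), new MultilayerPerceptron() };
            for (Classifier model : models) {
                // 10-fold cross-validation on the training set.
                Evaluation cv = new Evaluation(train);
                cv.crossValidateModel(model, train, 10, new Random(1));
                System.out.println(model.getClass().getSimpleName()
                        + " cross-validation correct: " + cv.pctCorrect() + "%");

                // Train on all nine training programs, then test on the new program.
                model.buildClassifier(train);
                Evaluation eval = new Evaluation(train);
                eval.evaluateModel(model, test);
                System.out.println(model.getClass().getSimpleName()
                        + " test-set correct: " + eval.pctCorrect() + "%");
            }
        }
    }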

3. RESULTS & ANALYSIS

3.1 Instruction Window Size

Figure 3.1 Correct Classification Rate vs. Instruction Window Size

We measure the correct classification rate of models with different instruction window sizes (1, 2, and 3). An instruction window size of 2 gives the highest classification rate, as shown in Figure 3.1.

3.2 Time Consumption

Figure 3.2 Time taken to build a model

Figure 3.2 shows the time taken to build the model using the J48 Decision Tree and Multilayer Perceptron algorithms. Building the model with MLP clearly takes much more time.

3.3 J48 Decision Tree vs. Multilayer Perceptron

Figure 3.3. J48 and MLP with different data representation

When we use different data representations to build the model, the Decision Tree model always gives better prediction results. However, the Multilayer Perceptron model improves when the bit representation is used.

Figure 3.4. J48 and MLP with different validation

When we do 10-fold cross-validation on the training sets, both models show very high classification rates. However, once we validate the models on the testing sets, the Decision Tree model shows a significant decrease in performance, while the Multilayer Perceptron model still achieves a relatively high rate, as shown in Figure 3.4.

Therefore, while the Decision Tree model performs better on programs that are similar to the training programs, the Multilayer Perceptron model is more robust to new programs.

3.4 Classification of Timing Errors

Figure 3.5. MLP with different program error rates

We believe that timing errors can be classified into two categories: one category is strongly linked to the operations themselves (for example, addition, subtraction, load, or store), while the other is related to the data used in the operations.

Since the data representation of the inputs has a strong effect on the performance of the Multilayer Perceptron model, we can infer the type of timing errors by comparing the performance of the Multilayer Perceptron model under different data representations. In Figure 3.5, the left plot shows the correct classification rate of the Multilayer Perceptron model when the error rate of the programs is 10%, while the right plot shows the rate when the error rate is 20%.

As the figures show, the Multilayer Perceptron model with bit-representation inputs performs much better when the error rate of the programs is 20%. Therefore, we can conclude that, when the error rate of the programs increases from 10% to 20%, a large portion of the new timing errors are strongly related to the data usage of the programs.

4. CONCLUSIONS

  • Both algorithms achieve good prediction results in the experiments.

  • Each algorithm has its own advantages and disadvantages.

    • Building J48 Decision Tree model is much faster than building Multilayer Perceptron model, especially when the input dimension is large.

    • J48 Decision Tree model performs better when predicting programs which are similar to the training programs.

    • Multilayer Perceptron model is more robust to the new programs.

  • The Decision Tree model is much easier to interpret: we can identify the critical input bits by directly studying the nodes of the tree (see the sketch after this list), while the Multilayer Perceptron model is more like a black box.

  • Moreover, machine learning algorithms are also effective for learning interesting characteristics of timing errors.
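As a small illustration of the Decision Tree model's interpretability, the learned J48 tree can be printed and inspected directly with WEKA (reusing the placeholder train.arff file from Section 2.4):

    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    // Print the learned decision tree; its node tests show which input
    // attributes (e.g. particular operand bits) drive the error predictions.
    public class InspectTree {
        public static void main(String[] args) throws Exception {
            Instances train = new DataSource("train.arff").getDataSet();
            train.setClassIndex(train.numAttributes() - 1);
            J48 tree = new J48();
            tree.buildClassifier(train);
            System.out.println(tree);  // textual representation of the pruned tree
        }
    }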

REFERENCES

[1] J. Xin and R. Joseph. Identifying and predicting timing critical instructions to boost timing speculation. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44), pages 128-129, New York, NY, USA, 2011. ACM.

[2] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge. Razor: A low-power pipeline based on circuit-level timing speculation. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36), pages 7-18, 2003.

[3] D. Blaauw, S. Kalaiselvan, K. Lai, W.-H. Ma, S. Pant, C. Tokunaga, S. Das, and D. Bull. Razor II: In situ error detection and correction for PVT and SER tolerance. In IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, pages 400-622, 2008.

[4] J. Xin and R. Joseph. Exploiting locality to improve circuit-level timing speculation. IEEE Computer Architecture Letters, 8(2):40-43, 2009.

[5] T. Austin, V. Bertacco, D. Blaauw, and T. Mudge. Opportunities and challenges for better than worst-case design. In Proceedings of the 2005 Asia and South Pacific Design Automation Conference (ASP-DAC '05), pages 2-7, New York, NY, USA, 2005. ACM.

DOWNLOAD

REPORT:

report.pdf

POSTER:

poster.pdf

CONTACTS

Yuanbo Fan

yuanbo@u.northwestern.edu

Chao Yan

chaoyan2012@u.northwestern.edu