PROJECT INFO
This is a course project in EECS 349 Machine Learning, advised by Prof. Bryan Pardo.

AUTHORS
Yuanbo Fan, Chao Yan

ABSTRACT
In this project, we build an instruction-level timing error prediction model for a 5-stage pipelined ALPHA processor which supports timing speculation. The model predicts whether an instruction produces timing errors given information about the instruction sequence and data usage.

1. INTRODUCTION
A timing error is a violation of circuit-level timing constraints during program execution. For example, if the timing constraint of an instruction is one ns but the instruction takes more than one ns to execute, there is a timing error. In traditional processors, timing errors may cause catastrophic system failure. Processors which support timing speculation are augmented with timing-error detection and correction techniques, so they are able to recover from timing errors; however, the recovery cost is very high. Existing work shows that, for a specific processor, timing error behavior is a strong function of the programs executed on it. Therefore, an effective timing error prediction model can be built from information about how programs are executed on the processor. With such a model, the compiler can generate code with a lower probability of producing timing errors through instruction scheduling and instruction selection. Moreover, if the model is implemented in hardware, it can predict timing errors at run time; with some padding techniques, it can then avoid the full recovery cost of timing errors. In this project, we use machine learning algorithms to build an instruction-level timing error prediction model for a 5-stage pipelined ALPHA processor which supports timing speculation. This model predicts whether an instruction has timing errors given information about the instruction sequence and data usage.

2. LEARNING PROCESS
2.1 DATASET
We collect ten instruction-level test programs and build ten data sets from them. In each set, nine programs are used for training and the remaining one for testing. The programs are executed on a 5-stage pipelined ALPHA processor. During execution, we collect information for each instruction, including the opcode, the two operand values, and control signals which indicate data dependence and branch information. A threshold delay is also set, so each instruction can be labeled according to whether it produces a timing error.
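The labeling step above can be sketched as follows. This is a minimal illustration, not the actual measurement setup: the record field names, the threshold value, and the toy trace are all assumptions.

```python
# Hypothetical sketch of the labeling step: an instruction is marked as
# producing a timing error when its measured delay exceeds the threshold.
# Field names and the threshold value are assumptions for illustration.
THRESHOLD_NS = 1.0  # assumed timing constraint (one ns, as in the example above)

def label_instruction(record, threshold=THRESHOLD_NS):
    """Return 1 if the instruction violates the timing constraint, else 0."""
    return 1 if record["delay_ns"] > threshold else 0

# Toy trace records (made up): opcode plus measured execution delay.
trace = [
    {"opcode": "add", "delay_ns": 0.8},
    {"opcode": "ldq", "delay_ns": 1.2},
]
labels = [label_instruction(r) for r in trace]  # only the second exceeds 1 ns
```

Each label then becomes the class attribute of the corresponding training instance.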
Each input instance of the model consists of several parts: the current instruction and the N-1 preceding instructions, where N is the instruction window size, i.e., the number of instructions assumed to affect whether the current instruction produces a timing error. For each operand value used in the instructions, we try both number and bit representations, because different data representations may result in different models, especially when using the Multilayer Perceptron algorithm.
Table 2.4 shows an example of instances when the instruction window size is 1. Each opcode is encoded as a number, and operand values are in number representation. Three numbers (c0, c1 and c2) represent the control signals, a binary number indicates whether the instruction is a taken branch, and the last number indicates whether there is a timing error.
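The instance format described above can be sketched in code. This is a simplified illustration under stated assumptions: the function names, the feature layout, and the 8-bit operand width are ours, not the project's actual encoding, and the control-signal and branch fields are omitted for brevity.

```python
# Sketch of building one input instance from an instruction window.
# The exact feature layout, opcode encoding, and 8-bit operand width
# are assumptions for illustration, not the project's actual format.
def encode_operand(value, mode="number", width=8):
    """Number representation: the value itself as one feature.
    Bit representation: one feature per bit, least-significant bit first."""
    if mode == "number":
        return [value]
    return [(value >> i) & 1 for i in range(width)]

def make_instance(window, mode="number"):
    """window: list of (opcode_id, operand1, operand2) tuples, oldest first,
    of length N (the instruction window size)."""
    features = []
    for opcode_id, op1, op2 in window:
        features.append(opcode_id)               # opcode encoded as a number
        features.extend(encode_operand(op1, mode))
        features.extend(encode_operand(op2, mode))
    return features

# Window size 1, number representation: [opcode, op1, op2]
inst_num = make_instance([(3, 5, 2)])
# Window size 1, bit representation: opcode followed by 2 * 8 bit features
inst_bit = make_instance([(3, 5, 2)], mode="bit")
```

With bit representation, each operand expands into `width` binary features, which is why the two representations can lead the Multilayer Perceptron to quite different models.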
2.2 Learning Algorithms
2.3 Software Packages
2.4 Validation
Each test set contains one training set and one testing set: nine programs for training and one program for testing. We validate the models with 10-fold cross validation on the training set, and then validate the models learned from the training set on the testing set. The 10-fold cross validation shows how a model performs on programs similar to the training programs, while the testing-set validation shows how it performs on new programs.

3. RESULTS & ANALYSIS
3.1 Instruction Window Size
We measure the correct classification rate of models with different instruction window sizes (1, 2 and 3). An instruction window size of 2 gives the highest classification rate, as shown in Figure 3.1.

3.2 Time Consumption
Figure 3.2 shows the time taken to build the model using the J48 Decision Tree and Multilayer Perceptron algorithms. Building the model with MLP clearly takes much more time.

3.3 J48 Decision Tree vs. Multilayer Perceptron
Across the different data representations used to build the models, the Decision Tree model always gives better prediction results, although the Multilayer Perceptron model improves when bit representation is used. With 10-fold cross validation on the training sets, both models show very high classification rates. However, when validated on the testing sets, the Decision Tree model's performance decreases significantly, while the Multilayer Perceptron model maintains a relatively high rate, as shown in Figure 3.4. Therefore, while the Decision Tree model performs better on programs similar to the training programs, the Multilayer Perceptron model is more robust to new programs.

3.4 Classification of Timing Errors
We believe that timing errors can be classified into two categories.
One category is strongly linked to the operations themselves, for example addition, subtraction, load or store, while the other is related to the data used in the operations. Since the data representation of the inputs has a strong effect on the performance of the Multilayer Perceptron model, we can infer the type of timing errors by comparing the performance of Multilayer Perceptron models trained with different data representations. In Figure 3.5, the left plot shows the correct classification rate of the Multilayer Perceptron model when the error rate of the programs is 10%, while the right plot shows the rate when the error rate is 20%. The Multilayer Perceptron model with bit-representation inputs performs much better when the error rate is 20%. Therefore, when the error rate of the programs changes from 10% to 20%, a large portion of the new timing errors are strongly related to the data usage of the programs.

4. CONCLUSIONS
REFERENCES
[1] J. Xin and R. Joseph. Identifying and predicting timing critical instructions to boost timing speculation. In Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-44), pages 128-129, New York, NY, USA, 2011. ACM.
[2] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge. Razor: a low-power pipeline based on circuit-level timing speculation. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36), pages 7-18, 2003.
[3] D. Blaauw, S. Kalaiselvan, K. Lai, W.-H. Ma, S. Pant, C. Tokunaga, S. Das, and D. Bull. Razor II: in situ error detection and correction for PVT and SER tolerance. In IEEE International Solid-State Circuits Conference (ISSCC 2008), Digest of Technical Papers, pages 400-622, 2008.
[4] J. Xin and R. Joseph. Exploiting locality to improve circuit-level timing speculation. IEEE Computer Architecture Letters, 8(2):40-43, 2009.
[5] T. Austin, V. Bertacco, D. Blaauw, and T. Mudge. Opportunities and challenges for better than worst-case design. In Proceedings of the 2005 Asia and South Pacific Design Automation Conference (ASP-DAC '05), pages 2-7, New York, NY, USA, 2005. ACM.

CONTACTS
Yuanbo Fan
Chao Yan