Random Pieces

My legal name is Ruoqian Liu (刘若茜); I go by Rosanne in the US, where I have lived since 2007.

I'm originally from Wuhan, China.

I live in downtown Chicago ("the Loop") and absolutely love the city life. (I have had my share of suburban life, having lived in Michigan for five years.)

I enjoy reading in my leisure time. Owing to the two-hour daily commute between Chicago and Evanston, I read most books (distractedly) on the subway. I like mostly nonfiction.


Tricks in Implementing Deep Nets

In working with both Theano and Torch7 to implement deep learning algorithms, we encountered several issues that pained us. I thought I'd log them here in case they are helpful to someone who might also be bothered by these trivial matters.

  • Softmax blowup. Using the softmax function in Theano (tensor.nnet.softmax(x)) as-is can cause a blowup problem when combined with unbounded activation functions like ReLU, which then leads to NaN gradients. A note in the link above says "The code of the softmax op is more numerically stable by using this code: e_x = exp(x - x.max(axis=1, keepdims=True)); out = e_x / e_x.sum(axis=1, keepdims=True)", i.e. the row-wise maximum is subtracted before exponentiating. In our experience this numerically stable version should ALWAYS be used (see the first sketch after this list).
  • Learning_rate_decay vs. learning_rate_drop. These two terms are easily confused at implementation time, but they are completely different things. Learning rate decay is the decay of the learning rate on every iteration, that is, every mini-batch; it should be implemented inside SGD. Learning rate drop, on the other hand, is the idea of cutting the learning rate in half every N epochs; it should be implemented outside of SGD. Both schedules are sketched after this list.
  • Weight decay. Neither of the above "decays" is to be confused with weight decay, which serves as a regularization term on the weights themselves. See here. A short sketch of the difference follows below.
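
To make the softmax point concrete, here is a minimal NumPy sketch (not Theano code, and the function names are mine) of why the naive formula blows up on large ReLU-style activations and why subtracting the row-wise max fixes it:

    import numpy as np

    def softmax_naive(x):
        # exp() overflows to inf for large inputs, and inf / inf yields NaN
        e_x = np.exp(x)
        return e_x / e_x.sum(axis=1, keepdims=True)

    def softmax_stable(x):
        # subtracting the row-wise max makes the largest exponent 0,
        # so nothing overflows, and the result is mathematically identical
        e_x = np.exp(x - x.max(axis=1, keepdims=True))
        return e_x / e_x.sum(axis=1, keepdims=True)

    x = np.array([[1000.0, 2000.0, 3000.0]])  # unbounded ReLU-style activations
    print(softmax_naive(x))   # [[nan nan nan]] -- these NaNs then poison the gradients
    print(softmax_stable(x))  # [[0. 0. 1.]]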
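
The decay-vs-drop distinction, in a plain-Python training-loop sketch (the variable names and constants are hypothetical; the point is only where each schedule lives):

    base_lr = 0.1
    decay = 1e-4            # per-mini-batch decay factor
    drop_every = 10         # halve the learning rate every N epochs
    n_epochs = 30
    batches_per_epoch = 100

    iteration = 0
    for epoch in range(n_epochs):
        # learning_rate_drop: applied outside SGD, once every N epochs
        epoch_lr = base_lr * (0.5 ** (epoch // drop_every))

        for batch in range(batches_per_epoch):
            # learning_rate_decay: applied inside SGD, on every mini-batch
            lr = epoch_lr / (1.0 + decay * iteration)
            # params -= lr * grads   # the actual SGD update would go here
            iteration += 1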
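
And weight decay for contrast, in the same sketchy style: it is an L2 regularization term that shrinks the weights themselves at every update, independent of what the learning rate schedule is doing:

    import numpy as np

    lr = 0.01
    weight_decay = 5e-4

    w = np.random.randn(10)
    grad = np.random.randn(10)      # gradient of the data loss w.r.t. w

    # SGD step with weight decay: the extra weight_decay * w term comes from
    # adding (weight_decay / 2) * ||w||^2 to the loss
    w -= lr * (grad + weight_decay * w)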