Statement

I am interested in many aspects of artificial intelligence and its applications in science, engineering, and business. My current research focuses on deep learning, representation learning, image recognition, and natural language processing, applied to scientific and social data of all forms.


Research Ongoing

My ongoing research (things I want to finish before graduation) centers on the idea of manifold learning, the identification of possibly multiple contexts of data distributions. Just as a collection of Amazon reviews may cover products from different categories, and a house price profile of the past 10 years may include up and down markets, data from different environments may mingle and cause the target learning task to be poorly represented.

The challenge, therefore, is to distinguish those mixtures and use that knowledge in both training and testing. The proposed solution draws on ideas from transfer learning, active learning, ensemble learning, and representation learning (achieved mostly through deep learning), and is to be validated on a number of scientific knowledge discovery tasks.

My PhD prospectus presentation, "Multi-Contextual Representation and Learning with Applications in Materials Knowledge Discovery", gives a more detailed explanation of this concept.

Here is a list of ongoing projects.

"Real-life Image Classification with Deep Convolutional Neural Networks"

Large, deep convolutional neural networks are trained to classify about 1 million real-life images downloaded from Pinterest into more than 1,300 classes. Our experiments include architectures such as AlexNet, VGG, and GoogLeNet. We use topic modeling to group labels into superclasses, and embed this superclass information into the loss function to improve accuracy.
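To make the superclass idea more concrete, here is a minimal sketch of one way superclass information could enter a loss function. It is an illustration under my own assumptions, not the formulation used in the project: the function name `superclass_aware_loss`, the weight `alpha`, and the toy class-to-superclass mapping are all hypothetical. The sketch adds a second cross-entropy term computed on superclass probabilities, so that confusing two classes within the same superclass costs less than confusing classes across superclasses.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def superclass_aware_loss(logits, label, class_to_super, alpha=0.5):
    """Fine-class cross-entropy plus a weighted superclass cross-entropy.

    `alpha` and this exact combination are assumptions made for illustration.
    """
    probs = softmax(logits)
    fine_loss = -math.log(probs[label])
    # The probability of a superclass is the sum over its member classes.
    target_super = class_to_super[label]
    super_prob = sum(p for c, p in enumerate(probs)
                     if class_to_super[c] == target_super)
    super_loss = -math.log(super_prob)
    return fine_loss + alpha * super_loss

# Hypothetical toy setup: 4 classes grouped into 2 superclasses; true label is 1.
class_to_super = [0, 0, 1, 1]
logits_near = [2.0, 1.5, 0.1, 0.1]  # top guess is in the correct superclass
logits_far = [0.1, 1.5, 2.0, 0.1]   # top guess is in the wrong superclass

loss_near = superclass_aware_loss(logits_near, 1, class_to_super)
loss_far = superclass_aware_loss(logits_far, 1, class_to_super)
```

In this toy case both predictions assign the same probability to the true class, so a plain cross-entropy would treat them equally; the extra term penalizes `logits_far` more because its probability mass sits in the wrong superclass.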

"Electron Backscatter Image Indexing"

The automated indexing of electron backscatter diffraction (EBSD) patterns, a major characterization tool in materials science and geoscience, has evolved rapidly. The analysis of EBSD patterns is purely geometrical, connecting the observed shape and orientation of diffraction bands to lattice planes. We develop deep convolutional neural networks that learn from the whole EBSD image to extract crystallographic information from the pattern and predict a set of Euler angles (real values from 0 to 360 degrees) describing the orientation of the crystal lattice at the point of beam incidence.
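Because the predicted angles are periodic, a prediction of 359 degrees is only 2 degrees away from a target of 1 degree, which a plain absolute difference would miss. The following is a minimal sketch of a wraparound-aware error, assuming each angle is compared independently on a 0 to 360 scale. The function names are hypothetical, and the sketch ignores crystal symmetry, so it is not a full misorientation measure and not necessarily the metric used in this project.

```python
def angular_error_deg(predicted, target):
    """Smallest absolute difference between two angles, in degrees."""
    diff = (predicted - target) % 360.0
    return min(diff, 360.0 - diff)

def mean_angular_error(pred_angles, true_angles):
    """Average wraparound-aware error over paired lists of angles."""
    errors = [angular_error_deg(p, t) for p, t in zip(pred_angles, true_angles)]
    return sum(errors) / len(errors)

# Hypothetical predictions and targets, in degrees.
example_error = mean_angular_error([359.0, 90.0], [1.0, 100.0])
```

The same wrapping could be applied inside a training loss so that the network is not penalized for predictions that differ from the target only by a full turn.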

"Learning from Crystalline Compounds Compositions to Properties"

In materials science, the discovery of new materials relies on searching through hypothetical structures for ones that are stable. The stability of a crystalline compound's structure is reflected by its formation energy, which can be computed with Density Functional Theory (DFT). However, DFT is known to be time-intensive. In this work, we use machine learning to infer functional relationships between the crystal structure of a compound and its DFT-predicted properties. The crystal structure attributes are extracted automatically by deep neural networks that learn from standard textual files. Over 300k compounds available in the Open Quantum Materials Database (OQMD) are used. The property to be predicted is real-valued, between -10 and 5 eV/atom, and we currently obtain a mean absolute error (MAE) of less than 0.05 eV/atom.
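For reference, MAE is the average absolute gap between predicted and reference values, expressed in the same unit as the property itself. A minimal sketch follows; the function name and the sample values are hypothetical and are not results from this work.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired predictions and references."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

# Made-up formation energies in eV/atom, for illustration only.
toy_mae = mean_absolute_error([-1.0, 0.5], [-1.5, 0.5])
```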

"News Sentiment Classification and Financial Impact"

Over 2 million Wall Street Journal article titles were downloaded to build a classification system that tags each title with one of two classes, based on its impact on financial market movement.


Research Accomplished

"Structure Optimization of Fe-Ga Alloy"

"Characterizing Localization Relationships in Composites"

"Stability Modeling of Crystalline Compounds"

"Financial Strategy Development Incorporating Wikipedia Activity"