Michael R. Lucas

   I am a Ph.D. candidate in the Northwestern University EECS Department, working with Professor Doug Downey since 2011 as a member of the WebSAIL research lab. Previously, I researched Computational Complexity and Mechanism Design (algorithm-focused auction design) as a member of the Economics and Theory Group from 2009 to 2010 with Professor Lance Fortnow.

   My research focuses on designing and scaling Machine Learning methods for use on enormous datasets, specifically in the fields of Semi-Supervised Learning, Natural Language Processing, and Bayesian Models.


   I completed my undergraduate degree in Computer Science at The University of Dayton in 2008. In 2007, I interned at GE Aviation in Evendale, Ohio, where I automated systems for the detection of production delays in the Supply Chain Management department.

   My non-graduate research experience includes a summer research internship at UIC, where I worked in Distributed Machine Learning with Dr. Robert Grossman and Dr. Yunhong Gu, designing and implementing large-scale clustering algorithms to run on their UDT Protocol. Earlier still, I was a research assistant for Dr. Kathryn Fischbach at UTHSCSA's Alzheimer's research lab.


[PDF] Michael R. Lucas and Doug Downey. "Semi-supervised Naive Bayes with Feature Marginals," The 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 4-9, 2013.

Summary: When labeled data is scarce or expensive to collect, semi-supervised learning (SSL) methods that utilize unlabeled datasets can outperform standard machine learning algorithms. In this paper, we propose a scalable SSL improvement to the classic Naive Bayes classifier.
Modern SSL techniques typically require multiple passes over the unlabeled data, which is often infeasible on the web-scale corpora being produced today. We show that using unlabeled data to improve baseline estimates of word frequencies can improve Naive Bayes classifiers while scaling to modern massive datasets.
In experiments with text topic classification and sentiment analysis, we show that our method is both more scalable and more accurate than SSL techniques from previous work.


Conference Experience

  • Program Committee: IJCAI 2016.
  • Program Committee: ACL 2014.
  • Program Committee: NAACL-HLT 2013.
  • Program Committee: EMNLP-CoNLL 2012.

Teaching Assistantships

  • EECS 101 - An Intro to Computer Science for Everyone. Fall 2015.
  • EECS 348 - Artificial Intelligence. Spring 2014.
  • EECS 310 - Data Structures. Fall 2010.


Northwestern University
Evanston, IL 60208, USA