Liang Huang

Assistant Professor, Computer Science (AI, Data Science, and Health Engineering Groups)
School of EECS, Oregon State University (official profile)
Affiliated Faculty, Center for Genome Research and Biocomputing (CGRB)

Research Areas: NLP (parsing, translation, generation), ML (structured prediction, deep learning), compbio (RNA/protein folding).
Research Focus: efficient algorithms and provable theory in the above areas.

  • 2017-06: Ph.D. student Mingbo Ma participated in the WMT 2017 competition on multimodal translation, and achieved the best TER score among 15 systems on the English+Image->German COCO task (the hardest task in that competition, since it's into German rather than French, and tested on out of domain images).
  • 2017-05: Ph.D. student Kai Zhao successfully defended and accepted a rare Research Scientist offer from Google Research (NYC).
  • 2017-01: We are hosting Oregon's first and only University site for North American Computational Linguistics Olympiad (NACLO)!
  • 2016-12: Ph.D. student James Cross successfully defended. James is joining Facebook as a Research Scientist.
  • 2016-11: James's EMNLP 2016 paper received an Honorable Mention for Best Paper (and unanimous full-score reviews).
  • 2016-07: Former Ph.D. graduate (USC) Ashish Vaswani joined Google Brain as a Research Scientist.
  • 2015-09: Postdoc Feifei Zhai joined IBM T. J. Watson as a Research Staff Member.
  • 2015-09: Our group moved from City University of New York to Oregon State University.
  • 2015-07: We received Yahoo! Faculty Engagement Award.
My research interests are mainly in the algorithmic and formal aspects of computational linguistics (esp. parsing and machine translation) and their applications in computational structural biology. The key questions that motivate my research are:

Why are computers so bad at understanding and processing natural language? Can we teach computers to process natural language the way we humans do, that is, both fast and accurate? Or, can computers process natural language the way they process programming languages in spite of the inherent ambiguity of the former?

So recently I have been focusing on linear-time algorithms for parsing and translation inspired by both human processing (psycholinguistics) and compiler theory.

On the other hand I also work on theoretical and practical problems in structured learning with inexact search that rises from NLP but also applies to other structured domains such as computational biology. I had also worked on structural biology (esp. protein folding) using dynamic programming inspired by computational linguistics (see below). Recently, we started working on RNA secondary structure prediction using grammars and parsing, which I studied briefly during my PhD.

Lastly, I remain interested in theory and algorithms and some of my NLP papers draw unexpected connections from theoretical computer science, e.g., the linear-time synchronous binarization algorithm was inspired by Graham Scan for Convex Hull, and the k-best parsing algorithms are often featured in the exams when I teach Algorithms.

Listing of my papers on Google Scholar, Semantic Scholar, a subset in ACL Anthology Network, and another subset (top conferences) in

Recent and Past Talks

Latest and Current Work

Representative Publications (organized by techniques)

For representative or recent publications please see the "Research" Tab.

For a more complete list you can visit my google scholar page (or by year). Some time in the future I will compile a real publication list, but for the time-being please use google scholar. You can also use Semantic Scholar (by AI2) to see the influences of my work.

Some lesser-known work:

  1. K-best Knuth Algorithm. Tech Report, 2005.

    This brief note develops a k-best extension of the powerful Knuth (1977) algorithm, which itself extends the classical Dijkstra (1959) shortest-path from graphs to directed hypergraphs (AND-OR graphs) and context-free grammars. My extension was quite straightforward, following a simple k-best Dijkstra extension by Mohri and Riley (2002), combined with the lazy k-best frontier idea from my k-best CKY parsing work (Huang and Chiang, 2005). I thought this algorithm was only of theoretical interest, and was surprised (and glad) when Pauls and Klein (2009) further developed it to "k-best A* parsing" by adding heuristics, which won the ACL 2009 Best Paper Award.

    See also [3] below on the relationship between A*, Knuth, and Dijkstra.

    As a side note, the k-best parsing algorithms (Huang and Chiang, 2005) were used in the award winning papers in ACL 2005, NAACL 2006, ACL 2008, and ACL 2009.

  2. Statistical Mechanics of Helix Bundles using a Dynamic Programming Approach, with Adam Lucas, Aravind Joshi, and Ken Dill.
    J. Am. Chem. Soc. (JACS), 129 (14), pp. 4272-4281.

    I used to work on computational biology during my PhD, which resulted in this paper. Although published in the top chemistry journal, it does not get much attention at all, which I believe was due to the difficulty of the technique (dynamic programming) for a chemistry audience (a computational biology audience would have been much better -- they use DP a lot, e.g. Smith-Waterman is edit-distance; even CKY is widely used in comp-bio). (more DP details such as function operators left to the Supplement).

  3. Tutorial: Advanced Dynamic Programming in Semiring and Hypergraph Frameworks.
    NAACL 2009 and COLING 2008. [slides] (based on my candidacy exam and Chapter 2 of my thesis)

    I present a unified algebraic theory of dynamic programming based on monotonic weight functions, semirings (distributivity implies monotonicity), and hypergraphs (distributivity is the key to factorization). Here DP includes both Viterbi-style (topological order, requires acyclicity of the graph) and Dijkstra-style (best-first, requires superiority of the semiring); also DP includes both optimization (using idempotent semirings) and summation/expection (using inside/expectation semirings). This view of DP is much broader than those in the textbooks. The only implementation paradigm not covered in my framework is "memoized top-down recursion", which, as the CLRS book also pointed out (Sec. 15.1), could also be viewed as the DFS version of topological sort (whereas Viterbi, or bottom-up, resembles a BFS version of topological sort). A* search on graphs and hypergraphs (e.g. A* parsing) are also discussed as generalizations of Dijkstra 1959 and Knuth 1977.

    It was the most popular tutorial in both COLING 08 and NAACL 09.

  4. The Art of Algorithms and Programming Contests, with Rujia Liu.
    Tsinghua University Press, 2003. Baidu Baike entry (in Chinese). National Best Seller in CS.

    We wrote this book in college while I was training alongside the SJTU team which won the World Champions for the first time. When it was first published I had just come to the States and was no longer active in the programming contest community, so I had no clue that it would become a best-seller in the field of programming contests (I initially thought it would only sell about a thousand copies). I wrote one of the three chapters of the book, the one on computational geometry, my second favorite topic in algorithms after dynamic programming.

    After many years of hiatus, I finally resumed contest coaching, at USC (encouraged by David Kempe) and later at CUNY (encouraged by Jerry Waxman), helping the former win its first Southern California Regionals in 2011 (and again in 2012), and helping the latter achieve its best ranking in Greater New York Regionals in 10 years. I developed a course in CUNY to train for ACM/ICPC (and also for job interviews as a by-product), which became students' favorite class in the CS Dept. I hope to write a new book based on the course notes (with Python code for each problem). I really think ACM/ICPC should allow Python in all Regionals and the World Finals (many Regionals in the US already support Python, including my current region, the Pacific Northwest). I also advocate for more algorithmic problems in the Regionals (esp. in the US), and more geometry problems in the Finals.

Teaching is always rewarding, and is indeed the best part of being a professor.

Current teaching at Oregon State:

Past Teaching at CUNY:

Past Teaching at USC:

Past Teaching at Penn:

  • CSE 399: Python Programming (new course, Spring 2006)
    Chosen by the Department to be the first graduate student ever to teach a course.
  • Recitation Instructor, CSE 320, Algorithms (Spring 2005, Prof. Sanjeev Khannna)
  • Recitation Instructor, CSE 262, Automata and Formal Language Theory (Fall 2004, Prof. Jean Gallier).
  • Awarded University Graduate Teaching Prize, 2005. [more details]
Welcome to the Algorithms for Computational Linguistics (ACL) Group!

I am very fortunate to have been working with the following Ph.D. students:

  • Mingbo Ma, B.S., EE, Jilin U. Joined ACL Group in 2013. Interned at IBM TJ Watson and Apple.
  • Dezhong Deng, B.S., CS, PKU. Joined ACL Group in 2014. Interned at MSRA.
  • Yilin Yang, B.S., CS, East China Normal Univ. To join ACL Group in Fall 2017. Interned at Microsoft.

ACL Group Alumni:

  • PhD Graduates (3):
    • (#3, OSU) Kai Zhao, B.S., CS, USTC. Joined ACL Group in 2012. Interned at IBM TJ Watson, MSR, and Google Research. 9 ACL/EMNLP/NAACL papers. Defended May 2017. First Job: Research Scientist at Google Research (NYC).
    • (#2, OSU) James Cross, B.A., French and B.S., CS, Auburn; J.D., NYU. Member of NY Bar. Interned at IBM TJ Watson and Facebook. Best Paper Honorable Mention at EMNLP 2016. Thesis defended Dec. 2016. First Job: research scientist at Facebook.
    • (#1, USC) Ashish Vaswani, Ph.D. at USC (co-advised by David Chiang). Interned at IBM TJ Watson and Google Research. Thesis defended May 2014. First Job: Research scientist at USC/ISI. Now research scientist at Google Brain.
  • Postdocs (2)
    • Feifei Zhai (Ph.D., CAS/IA), postdoc, 2014--2015, now research staff member at IBM Watson.
    • Lemao Liu (Ph.D., Harbin), postdoc, 2013--2014, now researcher at NICT, Nara, Japan.

My past summer interns include Sasha Rush (Ph.D., MIT, now at Harvard), Yoav Goldberg (Ph.D., Ben Gurion, now at Bar Ilan), Heng Yu (Ph.D., CAS/ICT, now with Samsung), and Zhuoran Yu (M.S., NYU, now with Google NYC's NLP research team).

To prospective students: please use for a data-driven and more up-to-date CS ranking than the subjective and obsolete USNews system. If you look at the past ten years (2006--2016), we are ranked 15th in narrow AI (AAAI/IJCAI), 18th in NLP, 18th in vision, and 18th in broad AI. So basically the AI/DS groups are super strong here. Other strong groups include Software Engineering (9th), Crypto (13th), and HCI (20th). The overall CS ranking is 39th and is improving.

I am from Shanghai, China and speak Wu as my native language.


Ph.D., Computer Science, University of Pennsylvania, 2008. (old old homepage) [thesis] [slides]
B.S., Computer Science, Shanghai Jiao Tong University, 2003. summa cum laude. (minor studies in French and English)

Professional Experience

Assistant Professor, School of EECS, Oregon State University, 2015/9--present.
Research Scientist (part-time), IBM Watson Group, 2014/6--2017/1.

Assistant Professor, Queens College and Graduate Center, City University of New York (CUNY), 2012/8--2015/8.

Research Assistant Professor, University of Southern California (USC), 2010/7--2012/8.
Computer Scientist, USC Information Sciences Institute, 2009/7--2012/8.

Research Scientist, Google Research (Mountain View), 2009/1--7.

Visiting Scholar, Hong Kong Univ. of Science and Technology, 2008/10--2008/11.
Visiting Scholar, Institute of Computing Technologies, Chinese Academy of Sciences, 2007/10--2008/1.
Summer Intern, USC/ISI, 2005/5--10 and 2006/5--10.


Best Paper Award, ACL 2008 (Huang, 2008).
Best Paper Honorable Mention, EMNLP 2016 (Cross and Huang, 2016).
Best Paper Finalist, ACL 2007 (Huang and Chiang, 2007)
Best Paper Finalist, EMNLP 2008 (Mi and Huang, 2008)
Best Paper Finalist, ACL 2010 (Huang and Sagae, 2010).
Google Faculty Research Award, 2010.
Google Faculty Research Award, 2013.
Yahoo! Faculty Research and Engagement Award, 2015.
Regional Champions (as Faculty Advisor for USC), ACM Int'l Collegiate Programming Contest (ICPC), 2011.
University Prize for Graduate Student Teaching, University of Pennsylvania, 2005. [more details]

Selected Service

NSF Panelist, 2014, 2015, 2017 x 2
Grant Reviewer, Foreign NSFs: Hong Kong (RGC), 2017; Canadian (NSERC), 2017; Dutch (NWO), 2017; Israeli (ISF), 2015
Area Chair (syntax and parsing), IJCNLP 2017
Area Chair (syntax and parsing), EMNLP 2016
Area Chair (machine translation), ACL 2014
Area Chair (syntax and parsing), ACL 2012
Program Co-Chair, IWPT 2013
Textbook Reviewer, Cambridge Univ. Press, 2016, and Oxford Univ. Press, 2010

CV [Dec. 2016]

Brief bio for talk announcements

Liang Huang is currently an Assistant Professor of EECS at Oregon State University. Before that he was Assistant Professor for three years at the City University of New York (CUNY) and a part-time Research Scientist with IBM's Watson Group. He graduated in 2008 from Penn and has worked as a Research Scientist at Google and a Research Assistant Professor at USC/ISI. Most of his work develops fast algorithms and provable theory to speedup large-scale natural language processing, structured machine learning, and computational structural biology. He has received a Best Paper Award at ACL 2008, a Best Paper Honorable Mention at EMNLP 2016, several best paper nominations (ACL 2007, EMNLP 2008, and ACL 2010), two Google Faculty Research Awards (2010 and 2013), a Yahoo! Faculty Research Award (2015), and a University Teaching Prize at Penn (2005). His research has been supported by DARPA, NSF, Google, and Yahoo. He also co-authored a best-selling textbook in China on algorithms for programming contests.

Note: since 2006, following David Chiang (who wrote some of the most elegant large-scale Python code in this world), almost all my code has been written in Python2.7, with some Python extension libraries in C (hacking CPython source code). I also love functional programming and declarative programming in general (OCaml, Haskell, and Prolog), but hate C++, Perl, and Matlab which are ugly. Compared to Python/Haskell/Ocaml, languages like C/C++ and Java are stone-age artifacts; don't use them unless absolutely necessarily (such as for Python libraries). I dislike Matlab because it is so numerically and engineering oriented, which is not ideal for elegant, symbolic, algorithmic, or large-scale programming by computer scientists, statisticians, or mathematicians (I like Mathematica and R much better).

Linear-Time Dynamic Programming Parser (with Max-Violation Perceptron Trainer)

This parser is described in the following two papers:
  1. Liang Huang and Kenji Sagae (2010). Dynamic Programming for Linear-Time Incremental Parsing.
    Proceedings of ACL 2010. Best Paper Nominee.
  2. Liang Huang, Suphan Fayong, and Yang Guo (2012). Structured Perceptron with Inexact Search.
    Proceedings of NAACL 2012.
Parser Features:
  • Extremely fast linear-time parsing (about 0.04 seconds per sentence).
  • Dynamic programming and forest output (encodes exponentially many trees).
  • k-best output (Huang and Chiang, 2005).

Trainer Features:

  • Flexibility in defining new feature templates (my Python code compiles your feature definition language into a dynamically generated Python snippet).
  • Max-Violation Perceptron Training (Huang et al 2012).
  • Parallelized Perceptron Training (McDonald et al 2010).

Download from here.
Download from github repo.

Python Sparse Vector

Written in C as a Python extension module based on collections.defaultdict.
Much faster and slimmer (4 times less memory usage) than David Chiang's svector.
Builtin support for averaged parameters in online learning (e.g. perceptron, MIRA, etc.).

Note: for decoding (e.g. parsing), defaultdict is fast enough (mine is even faster by doing dot-product in C, which is also possible via Cython), but for learning (e.g. perceptron), defaultdict becomes terrible on big data because Python float/int are immutable, which caused too many unnecessary hash operations. Using my hvector can make your learner up to 5 times faster.

Download: hvector version 1.0 (Jan 2013).

Forest-Reranking Parser

This parser/reranker is described in the following paper:
  • Liang Huang (2008). Discriminative Parsing with Non-Local Features.
    Proceedings of ACL 2008. (Best Paper Award)
    errata: Following Charniak, the dev set was section 24, not section 22.
This software has three components:
  1. The forest-dumping version of Charniak parser.
  2. The forest reranker.
  3. The perceptron trainer.
Currently part 1 is downloadable as a standalone package. Parts 2 and 3 are being packaged for release.

Important: If you're using 64-bit Ubuntu, it is recommended that you install Python from source code (see The default Python2.7 in those Ubuntus (at least 12.04) has an obscure floating point problem which gives inconsistent results.

Papers which use multiple pieces of my software: there are many papers from other researchers which use a certain piece of my software, but there are also some which combine different pieces together, for example:
x2   x2  

We gratefully acknowledge the support from external funding agencies.

  • HP Seedling Fund, 2017.

  • co-PI, DARPA Explanable AI (XAI), 2017--2021. PI: Alan Fern.

  • PI, Yahoo! Faculty Research and Engagement Award, $25k for one year, 2015--2016.

  • PI, NSF EAGER, $140k, 2014--2017.

  • PI, Google Faculty Research Award, unrestricted gift, $88k for one year, 2013--2014.

  • co-PI, DARPA DEFT Program, total $2M for 4.5 years, 2012--2016. PI: Andrew Rosenberg.

  • PI, Google Faculty Research Award, unrestricted gift, $75k for one year, 2010--2011.
Internal Fundings:
  • OSU EECS Collaboration Fund, 2016--2017
  • PSC-CUNY Enhanced Research Award, 2013--2014
  • USC Viterbi School of Engineering Fund, 2011--2012
Liang Huang
School of EECS, Oregon State University
1148 Kelley Engineering Center (KEC)
Corvallis, OR 97331
541-737-4694 (o)

liang dot huang dot sh at gmail or liang dot huang at oregonstate dot edu.

Disclaimer: I am known to be highly opinionated, and some points below might sound offensive to some readers.

Classical Music

I am a big fan of Classical Music. The composers I admire the most are Johann Sebastian Bach (whose music is so mathematical), Peter Ilych Tchaikovsky (whose melodic talent is second only to Mozart), and Antonin Dvorak (whose music blends Bohemia with America). I also love, among others, (in chronological order) Luigi Boccherini, Wolfgang Amadeus Mozart, Ludwig van Beethoven, Felix Mendelssohn, and Sergei Rachmaninoff. Yes, I do have a preference for Baroque, Slavic, and melodic beauty.

On the other hand, I don't have a taste or much respect for Richard Wagner (whom I found disgusting) or Franz Lizst. Compared to Frederic Chopin or Nicolo Paganini, Lizst has almost nothing original to himself (like comparing Clementi to Mozart).

A Personal History of Languages

I grew up speaking Wu, but in a multilingual environment. Back in the old days, Shanghai was just as multicultural as New York City today with speakers and communities of all kinds of languages or dialects. When I grew up, my parents spoke Shanghainese, and my grandparents Ningbonese, which I understood but could not speak well; the majority of our neighbors, however, spoke yet another distinctive language called Lower Yangtze Mandarin, which is a "linear interpolation" between Wu and Northern Mandarin, and because of this "interpolation" nature I am still fluent in it today. I started learning Standard Mandarin as a de facto first foreign language in the elementary school, but like most of my peers, I ended up with a heavy southern accent. During college I took up French seriously and got admitted into one of the Grandes Ecoles in Paris, but forgot all of it after moving to the US; those memories somehow all came back when I later took up Spanish and Italian. On the other thand, living in the US helped me get rid of my heavy accent in Mandarin where finally the "training data" around me had more native samples than non-native ones. Living here also exposed me to other Chinese languages and dialects which I never heard back in Shanghai, such as the Upper Yangtze Mandarin (aka "Sichuan"), Cantonese, and Hokkien (aka "Taiwanese"), and more interestingly, various English and Spanish dialects. I still enjoy learning new languages and dialects today.

More on Wu Chinese.

Latin Resources

If you've studied at least one Romance language (French, Italian, Spanish, Portugues, Romanian, etc.), then Latin shouldn't be too hard. Studying Latin has many benefits, but to me the sound changes in Latin and descendant languages really helped me understand more about Indo-European. Just the hard-vs-soft "c" alone tells you so much about Indo-European history, Centum vs. Satem split, etc. Here is a great guide on Latin pronunciation written by a fellow computational linguist Michael Covington (yes, you probably knew him for the "Convington algorithm", an O(n^2) algorithm for dependency parsing).

Enjoy some beautiful music in Latin (Operatic Pop or Catholic):

My interests in Latin were, once again, influenced by David Chiang.

What to do when stuck in traffic jam...

Well, driving in LA is like hell, so my former colleagues at ISI had developed various ways of doing something useful while stuck on the freeways. The NLP group at ISI, being a set of language lovers, is particularly fond of listening to the Pimsleur language programs while driving (someone even got a ticket!), which is arguably the best language learning tool, and I myself did it for Spanish which worked out great. However, Pimsleur doesn't work that well for Japanese and much worse for Chinese where the writing systems are (partially or completely) not phonetic which means you do have to read on the page. You might be able to do some simple Japanese conversations with Pimsleur, but reading kanas and kanjis requires different training; Chinese is much worse which made Pimsleur basically useless even for basic conversations because of homonyms and tones (Jon May used to complain about similar-sounding words such as yiyue "January", yiyuan "hospital", and yinyue "music"; the tones are also different but most westerners can't perceive it). In the same logic, Pimsleur works the best for Italian and Spanish, where the spellings are the most regular (phonemic orthography). Jason Riesa did it for Persian (among the many languages he studied) and claimed it to be really easy partly because it's a well-behaved Indo-European (which makes sense).

Besides Pimsleur, I also listened to a great many The Teaching Company courses while driving, which has a large catelog of courses in all subjects, and they made it in a way that you can get much of it just by listening (which you can do while cooking, jogging, etc) although they also have videos if you wanna study more carefully. The two professors I recommend the most are Robert Greenberg of UC Berkeley for classical music and music theory, and John McWhorter of Columbia for linguistics.

Both Pimsleur and Teaching Company CDs are extremely expensive, but you can borrow them from local libraries.

Coursera Courses

For English-language courses, I liked this one in particular: Buddhism and Modern Psychology, by Robert Wright of Princeton, and all courses by Curtis (String Quartet, Beethoven Piano Sonatas, and Western Music History).

For Chinese-language courses, I am deeply moved by the two from Lu, Shih-Hao of National Taiwan University: Shiji, and Qin Shi Huang. I wish he could offer more online and I think Taiwanese students are so lucky to be able to sit in his classrooms; compared to those in Taiwan, many history/humanities professors in the mainland are just a joke (except a handful of real scholars, such as Gao, Hua who wrote How did the Red Sun Rise). For those learning algorithms and programming, the one by PKU is recommended; Liu Jiaying is a real fun teacher.

History of Science

As you can tell I am most interested in historical things: historical geography, historical linguistics, and history of science and mathematics. Here is my old collection on aesthetics of mathematics (essays by C.N. Yang, M. Atiyah, T. Gowers, A. Connes, and T. Tao) and a special subcollection on Paul Erdos.

Michael Atiyah on the Art of Mathematics and interview video.

I highly recommend the two documentaries by the Hungarian-American George Csicsery on legendary mathematicians: N is a Number on Paul Erdos, and Counting from Infinity on Yitang Zhang. In the Erdos film, many scenes were shot on the Penn campus (the Math/Physics building DRL in particular) as he visited Prof. Herbert Wilf frequently. In the Yitang film, I love the way he speaks, with a heavy but cute Shanghainese accent, even though he moved to Beijing when he was 13. He is my model as a pure scholar (who does not need recognition or funding) and a dedicated teacher.

Talk Shows

People that I learned the most from

So far I have been extremely grateful to these people from whom I learned a great deal of:
  • David Chiang
  • Dan Gildea
  • Kevin Knight
  • Hao Zhang
  • Kai Zhao
and to a slightly lesser extent:
  • Ryan McDonald
  • Fernando Pereira
  • Qun Liu
  • Benjamin Pierce
  • Shang-Hua Teng
I certainly learned a whole lot from many many other people. So I apologize if your name is not listed here. :)