Linear-time Constituency Parsing with RNNs and Dynamic Programming.
Juneki Hong and Liang Huang (2018).
In ACL 2018   [ paper | slides | code ]
Recently, span-based constituency parsing has achieved competitive accuracies with extremely simple models by using bidirectional RNNs to model "spans". However, the minimal span parser of Stern et al. (2017), which holds the current state-of-the-art accuracy, is a chart parser running in cubic time, $O(n^3)$, which is too slow for longer sentences and for applications beyond sentence boundaries such as end-to-end discourse parsing and joint sentence boundary detection and parsing. We propose a linear-time constituency parser with RNNs and dynamic programming using a graph-structured stack and beam search, which runs in time $O(n b^2)$, where $b$ is the beam size. We further speed it up to $O(n b\log b)$ by integrating cube pruning. Compared with chart parsing baselines, this linear-time parser is substantially faster for long sentences on the Penn Treebank and orders of magnitude faster for discourse parsing, and achieves the highest F1 accuracy on the Penn Treebank among single end-to-end systems.
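The drop from a $b^2$ factor to $b \log b$ in the abstract above can be illustrated with the generic idea behind cube pruning: rather than scoring all $b \times b$ combinations of two sorted candidate lists, a priority queue explores only the frontier of the score grid. The sketch below is a minimal illustration of that idea and not the parser's actual implementation. The function name `lazy_k_best` and the additive scoring are assumptions made for the example.

```python
import heapq


def lazy_k_best(left, right, k):
    """Return the k largest sums left[i] + right[j], best first.

    Assumption for this sketch: both lists are sorted in descending
    order and a combination's score is a plain sum. Under those
    conditions the lazy search is exact.
    """
    if not left or not right or k <= 0:
        return []
    # heapq is a min-heap, so scores are stored negated.
    heap = [(-(left[0] + right[0]), 0, 0)]
    seen = {(0, 0)}
    best = []
    while heap and len(best) < k:
        neg_score, i, j = heapq.heappop(heap)
        best.append(-neg_score)
        # Only the two grid neighbours of a popped cell can be next-best.
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(left[ni] + right[nj]), ni, nj))
    return best


print(lazy_k_best([5, 3, 1], [4, 2, 0], 4))  # [9, 7, 7, 5]
```

Each of the `k` pops pushes at most two new cells, so the work is on the order of $k \log k$ instead of the $k^2$ needed to score every pair. With learned scores that are not simple sums, this kind of search is generally approximate.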
Rapid Development of Public Health Education Systems in Low-Literacy Multilingual Environments: Combating Ebola Through Voice Messaging.
Nikolas Wolfe*, Juneki Hong*, Ali Raza, Bhiksha Raj, and Roni Rosenfeld (2015).
In SLaTE 2015   [ paper | poster ]
One of the main challenges in combating the spread of the Ebola outbreak in West Africa is a lack of effective public health education among affected populations in Guinea, Sierra Leone, and Liberia. Difficulties include resistance to official sources of information, mistrust of government, cultural norms, linguistic barriers, and illiteracy. In this paper we describe the development and initial deployment of a voice-based, multilingual mobile phone application to spread reliable public health information about Ebola via peer-to-peer sharing. Our hypothesis is that we can overcome mistrust and disseminate important health information via the power of social learning and suggestion from friends, family, and local communities. In collaboration with partners on the ground in Conakry, Guinea, we have launched two parallel mobile phone services known as Polly Game and Polly Health to enable message sharing in several Guinean languages. We discuss the strategies we have tried to encourage the spread of the application, and we report data on uptake to date.
Deriving multi-headed projective dependency parses from link grammar parses.
Juneki Hong and Jason Eisner (2014).
In TLT 2014   [ paper | supplement | poster | slides | bib ]
Under multi-headed dependency grammar, a parse is a connected DAG rather than a tree. Such formalisms can be more syntactically and semantically expressive. However, it is hard to train, test, or improve multi-headed parsers because few multi-headed corpora exist, particularly for the projective case. To help fill this gap, we observe that link grammar already produces undirected projective graphs. We use Integer Linear Programming to assign consistent directions to the labeled links in a corpus of several thousand parses produced by the Link Grammar Parser, which has broad-coverage hand-written grammars of English as well as Russian and other languages. We find that such directions can indeed be consistently assigned in a way that yields valid multi-headed dependency parses. The resulting parses in English appear reasonably linguistically plausible, though differing in style from CoNLL-style parses of the same sentences; we discuss the differences.
Keywords: dependency parsing, corpora
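The direction-assignment problem in the abstract above can be made concrete with a toy example. The paper uses Integer Linear Programming. The sketch below swaps in plain brute-force enumeration, which is only practical for a handful of labels. It also assumes one particular reading of the constraints: every link with the same label gets the same direction, and the directed graph must be acyclic with exactly one root. The function names and the labels `D`, `S`, `O` are illustrative.

```python
from itertools import product


def is_single_rooted_dag(edges, n_words):
    """Check that (head, dependent) edges form a DAG with exactly one root."""
    indegree = [0] * n_words
    children = [[] for _ in range(n_words)]
    for head, dep in edges:
        indegree[dep] += 1
        children[head].append(dep)
    roots = [w for w in range(n_words) if indegree[w] == 0]
    if len(roots) != 1:
        return False
    # Kahn's algorithm: every word gets visited only if there is no cycle.
    queue, visited = roots[:], 0
    while queue:
        head = queue.pop()
        visited += 1
        for dep in children[head]:
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)
    return visited == n_words


def feasible_orientations(links, n_words):
    """Enumerate per-label directions that yield a valid parse.

    links: (left_word, right_word, label) triples with left_word < right_word.
    'R' means the left word is the head; 'L' means the right word is.
    """
    labels = sorted({label for _, _, label in links})
    results = []
    for choice in product("RL", repeat=len(labels)):
        direction = dict(zip(labels, choice))
        edges = [(l, r) if direction[lab] == "R" else (r, l)
                 for l, r, lab in links]
        if is_single_rooted_dag(edges, n_words):
            results.append(direction)
    return results


# "the dog chased the cat": word indices 0..4, hypothetical link labels.
links = [(0, 1, "D"), (1, 2, "S"), (2, 4, "O"), (3, 4, "D")]
for direction in feasible_orientations(links, 5):
    print(direction)
```

In this toy sentence every feasible assignment points the `D` links leftward, so the noun heads its determiner. Orienting them the other way would leave both determiners without a parent and therefore produce two roots.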