| RECENT
RESEARCH
TOPICS |
|
SLEDGE:
Sequential Labeling of Image Edges for
Boundary Detection
|
We sequentially label image
edges as "on" or "off" object
boundaries. A visited edge is labeled as
boundary based on evidence of its
perceptual grouping with already
identified boundaries. We use both local
Gestalt cues, and the global Helmholtz
principle of non-accidental grouping.
Image edges are extracted with our new
detector that finds salient pixel
sequences which separate distinct
textures within the image.
|
|
|
Hough Forest
Random Field for Object Recognition and
Segmentation
|
We combine Hough forest (HF)
and conditional random field (CRF) into
HFRF to assign labels of object classes
to image regions. HF captures intrinsic
and contextual properties of objects as
class histograms in the leaf nodes. This
evidence is used in CRF inference for
non-parametric density estimation of the
posteriors. Theoretical error bounds of
HF and HFRF applied to a two-class
object detection and segmentation are
also presented.
|
|
|
Cost-Sensitive
Top-down/Bottom-up Inference for Multiscale
Activity Recognition
|
While prior work typically
addresses activity recognition at a
single scale, we jointly model group
activities, individual actions, and
participating objects with an AND-OR
graph, and exploit its hierarchical
structure for efficient inference. An
explore-exploit strategy is used to
adaptively zoom in or zoom out for
cost-sensitive inference.
|
|
|
Human Actions
as Stochastic Kronecker Graphs
|
A human activity can be
viewed as a space-time repetition of
activity primitives. Given a set of
primitives, and an affinity matrix of
their probabilistic grouping, we
model a video of the activity
as probabilistically generated by a
sequence of Kronecker products of
the affinity matrix.
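
The generative idea can be sketched in a few lines. This is only an illustration of a generic stochastic Kronecker graph, not the model from the paper: the 2x2 matrix `theta`, the helper names, and the independent per-edge sampling are all assumptions made for the example.

```python
import random

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def kronecker_power(affinity, k):
    """k-fold Kronecker product of `affinity` with itself (k >= 1)."""
    result = affinity
    for _ in range(k - 1):
        result = kron(result, affinity)
    return result

def sample_graph(probs, rng):
    """Draw each edge independently with the probability in `probs`."""
    return [[1 if rng.random() < p else 0 for p in row] for row in probs]

# Hypothetical affinity matrix over two activity primitives.
theta = [[0.9, 0.5],
         [0.5, 0.2]]
probs = kronecker_power(theta, 3)            # 8x8 edge probabilities
graph = sample_graph(probs, random.Random(0))
```

Each Kronecker product multiplies the number of nodes by the size of `theta`, so a short sequence of products already yields a large graph governed by very few parameters.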
|
|
|
Sum-Product
Networks for Modeling Activities with
Stochastic Structure
|
Activities with stochastic
structure are characterized by variable
space-time arrangements of
subactivities, and may be conducted by a
variable number of actors. We use
sum-product networks (SPN) to model such
activities. SPN is a hierarchical
mixture of bags-of-words (BoWs). SPN
consists of terminal nodes representing
BoWs, and product and sum nodes
organized in a number of layers. The
products are aimed at encoding
particular configurations of activity
parts, and the sums serve to capture
their alternative configurations. A new
Volleyball dataset is compiled and
annotated for evaluation and future
benchmarking.
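
A minimal sketch of how sum and product nodes combine, assuming a toy network whose leaves are discrete distributions over single variables rather than the BoWs used in the actual model. The class names and all numbers are hypothetical.

```python
import math

class Leaf:
    """Terminal node: a discrete distribution over one variable."""
    def __init__(self, var, dist):
        self.var, self.dist = var, dist
    def value(self, x):
        return self.dist[x[self.var]]

class Product:
    """Product node: one particular configuration of parts."""
    def __init__(self, children):
        self.children = children
    def value(self, x):
        return math.prod(c.value(x) for c in self.children)

class Sum:
    """Sum node: weighted mixture of alternative configurations."""
    def __init__(self, weighted_children):
        self.weighted_children = weighted_children
    def value(self, x):
        return sum(w * c.value(x) for w, c in self.weighted_children)

# Two alternative configurations of two binary "parts".
spn = Sum([
    (0.6, Product([Leaf(0, {0: 0.8, 1: 0.2}), Leaf(1, {0: 0.3, 1: 0.7})])),
    (0.4, Product([Leaf(0, {0: 0.1, 1: 0.9}), Leaf(1, {0: 0.5, 1: 0.5})])),
])
p = spn.value((0, 1))   # 0.6*0.8*0.7 + 0.4*0.1*0.5
```

Because the sum weights add up to one and every leaf is normalized, the network's values over all assignments also sum to one, so a single bottom-up pass returns a valid probability.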
|
|
|
Learning
Spatiotemporal Graphs of Human Activities
|
Given a set of spatiotemporal
graphs, we learn their model graph, and
the pdfs associated with nodes and edges of
the model. The model graph adaptively
learns relevant video segments
and their spatiotemporal relations from data. We
present a novel weighted-least-squares
formulation of learning a structural
archetype of graphs. The model is used
for video parsing.
|
|
|
From Contours
to 3D Object Detection and Pose Estimation
|
We address view-invariant
object detection and pose estimation
using contours as basic object features.
Top-down feedback from inference warps
the image so that bottom-up extraction
of contours better summarizes the
relevant visual information
and matches our 3D object model under
arbitrary non-rigid shape deformations
and affine projection.
|
|
|
A Chains Model
for Localizing Participants of Group
Activities in Videos
|
We address recognition of
group activities in a given video,
localization of video parts where these
activities occur, and detection of
actors involved in them. A new
generative chains model is formulated to
organize a large number of video
features in an ensemble of chains,
starting and ending at the end points of
the time interval occupied by the target
activity.
|
|
|
Scene Shape
from Texture of Objects
 |
We estimate the 3D shape of a
scene from texture arising from a
spatial repetition of objects in the
image. Unlike existing work, our
monocular estimation does not use domain
knowledge about the layout of common
scene surfaces. We also show that
reasoning about texture of objects in
the scene improves object detection.
|
|
|
Multiobject
Tracking as Maximum Weight Independent Set

|
We prove that the data
association problem -- the core of
tracking -- can be formulated as finding
the maximum-weight independent set
(MWIS) of non-adjacent tracklets in a
graph. We present a new, polynomial-time
MWIS algorithm, and prove that it
converges to an optimum.
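
To make the MWIS formulation concrete, here is a small sketch that solves it by exhaustive search. It is not the polynomial-time algorithm mentioned above and is only practical for a handful of nodes. The tracklet names, weights, and conflict edges are hypothetical.

```python
from itertools import combinations

def max_weight_independent_set(weights, edges):
    """Exact MWIS by brute force over all subsets (exponential time)."""
    nodes = list(weights)
    conflicts = {frozenset(e) for e in edges}
    best, best_weight = set(), 0.0
    for r in range(1, len(nodes) + 1):
        for subset in combinations(nodes, r):
            # Skip subsets that contain two adjacent (conflicting) nodes.
            if any(frozenset(pair) in conflicts
                   for pair in combinations(subset, 2)):
                continue
            w = sum(weights[n] for n in subset)
            if w > best_weight:
                best, best_weight = set(subset), w
    return best, best_weight

# Nodes are candidate tracklets; an edge marks two tracklets that
# cannot both be part of the solution.
weights = {"a": 3.0, "b": 2.0, "c": 4.0, "d": 1.0}
edges = [("a", "b"), ("b", "c"), ("c", "d")]
selected, total = max_weight_independent_set(weights, edges)
```

On this chain of conflicts the search keeps tracklets `a` and `c`, with total weight 7.0.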
|
|
|
Probabilistic
Event Logic for Interval-Based Event
Recognition

|
We introduce probabilistic
event logic (PEL) for representing both
hard and soft temporal constraints among
events. A PEL knowledge base consists of
confidence-weighted formulas from a
temporal event logic, and specifies a
joint distribution over the occurrence
time intervals of all events. Our MAP
inference for PEL addresses the
scalability issue of reasoning about all
time intervals in video, by leveraging
the spanning-interval data structure. A
spanning interval compactly represents
entire sets of time intervals without
enumerating them.
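
The compactness argument can be illustrated with a simplified data structure. The sketch below assumes integer time stamps and closed endpoints, and omits other details of the real spanning-interval representation. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class SpanningInterval:
    """All intervals [s, e] with s in [start_lo, start_hi]
    and e in [end_lo, end_hi]."""
    start_lo: int
    start_hi: int
    end_lo: int
    end_hi: int

    def contains(self, start, end):
        return (start <= end
                and self.start_lo <= start <= self.start_hi
                and self.end_lo <= end <= self.end_hi)

    def intersect(self, other) -> Optional["SpanningInterval"]:
        """Intersect two interval sets without enumerating members."""
        s_lo = max(self.start_lo, other.start_lo)
        s_hi = min(self.start_hi, other.start_hi)
        e_lo = max(self.end_lo, other.end_lo)
        e_hi = min(self.end_hi, other.end_hi)
        if s_lo > s_hi or e_lo > e_hi or s_lo > e_hi:
            return None   # no valid interval remains
        return SpanningInterval(s_lo, s_hi, e_lo, e_hi)

a = SpanningInterval(0, 10, 20, 30)
b = SpanningInterval(5, 15, 25, 40)
both = a.intersect(b)
```

Four numbers stand in for what would otherwise be over a hundred explicit intervals in `a` alone, and the intersection costs a constant number of comparisons.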
|
|
|
(RF)^2 --
Random Forest Random Field

|
We combine random forest (RF)
and conditional random field (CRF) to
address multiclass object recognition
and segmentation. Inference of (RF)^2
uses Metropolis-Hastings jumps which
depend on two ratios of the proposal and
posterior distributions. Our key idea is
to directly learn these ratios using RF.
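
For reference, the standard Metropolis-Hastings acceptance step depends on exactly these two ratios. The sketch below takes them as plain numbers; in (RF)^2 they would be estimated by the random forest, which is not modeled here. The function and argument names are hypothetical.

```python
import random

def mh_accept(posterior_ratio, proposal_ratio, rng):
    """Standard Metropolis-Hastings acceptance test.

    posterior_ratio: p(new labeling) / p(current labeling)
    proposal_ratio:  q(current | new) / q(new | current)
    Returns (accepted, acceptance_probability).
    """
    alpha = min(1.0, posterior_ratio * proposal_ratio)
    return rng.random() < alpha, alpha

rng = random.Random(0)
accepted_up, alpha_up = mh_accept(2.0, 1.0, rng)      # better labeling
accepted_down, alpha_down = mh_accept(0.5, 0.5, rng)  # worse labeling
```

A jump toward a more probable labeling is always accepted, while a jump toward a less probable one is accepted only with probability equal to the product of the two ratios.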
|
|
|
Segmentation
as Maximum-Weight Independent Set
 |
Given an image, and an
ensemble of its distinct low-level
segmentations, we identify visually
"meaningful" segments in the ensemble.
This is formalized as the maximum-weight
independent set (MWIS) problem. We
formulate a new MWIS iterative
algorithm, where each iteration solves a
Taylor expansion of the MWIS objective
function in the discrete domain.
|
|
|
Activities as
Time Series of Human Postures
 |
We show that certain human
actions can be represented by short time
series of codewords. The codewords
represent still snapshots of human-body
parts in their discriminative postures,
and objects that people interact with
while performing the activity. This
carries many advantages for developing a
robust, efficient, and scalable activity
recognition system.
|
|
|
From a Set of
Shapes to Object Discovery
 |
We show that shape is
expressive and discriminative enough to
provide robust object discovery in the
midst of background clutter. We build a
graph that captures spatial layouts of
edges extracted from a set of images,
and conduct its multicoloring by a new
coordinate ascent Swendsen-Wang cut. The
resulting clusters of edges delineate
the boundaries of distinct objects
discovered in the image set.
|
|
|
Monocular
Extraction of 2.1D Sketch
 |
Given a segmentation and
T-junctions of an image, we estimate the
depth layers of the scene. The
estimation is formalized as a quadratic
optimization so the resulting 2.1D
sketch is smooth in all image areas
except on region boundaries.
|
|
|
Video Painting
with Space-Time Varying Style Parameters
|
An input video is rendered by
applying a distinct painting style to
each spatiotemporal tube, corresponding
to a moving object in the video.
Spatiotemporal segmentation lets
the user vary
painting styles in 2D space and time,
and thus convey rich semantic
content, e.g., emotions, illusion,
or chaos.
|
|
|
Toward Optimal
Feature Selection through Local Learning
|
Given data with a huge number
of irrelevant features (> 10^6),
select features relevant to data
classification. We decompose a nonlinear
problem into a set of locally linear
ones, and then globally learn feature
relevance within the large margin
framework.
|
|
|
Video Object
Segmentation by Tracking Regions
|
Given an arbitrary video,
segment all moving and static objects
present. We transitively match contours
of image regions across the frames such
that the resulting tracks are locally
smooth.
|
|
|
Texel-based
Texture Segmentation
|
Given an arbitrary image,
discover and segment all distinct
texture subimages. We use the meanshift
to simultaneously estimate the pdf of
texel appearance and the pdf of texel
placement.
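
As a reminder of what mean shift does, here is a one-dimensional sketch with a Gaussian kernel that climbs to a nearby mode of the sample density. It does not reproduce the joint estimation of texel appearance and placement. The sample values and bandwidth are hypothetical.

```python
import math

def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=500):
    """Move `start` uphill to a mode of the kernel density of `points`."""
    x = start
    for _ in range(max_iter):
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2)
                   for p in points]
        # The new estimate is the kernel-weighted mean of the samples.
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x

# Two clusters of hypothetical 1D feature values.
samples = [0.9, 1.0, 1.1, 4.8, 5.0, 5.2]
mode_low = mean_shift_mode(samples, start=0.5, bandwidth=0.5)
mode_high = mean_shift_mode(samples, start=6.0, bandwidth=0.5)
```

Different starting points converge to different modes, which is what lets mean shift discover the number of clusters instead of requiring it as input.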
|
|
|
Matching
Hierarchies of Deformable Shapes
|
Shapes are represented by
graphs whose nodes correspond to shape
parts, and edges capture their neighbor
and part-of interactions. Shape matching
is formulated as finding the subgraph
isomorphism that minimizes a quadratic
cost.
|
|
|
Dictionary-Free
Categorization
Using Evidence Trees
|
How should images showing very similar
object categories be categorized?
We mathematically prove that it is
better to use class evidence accumulated
from all image features than to use a
majority voting of class decisions made
on each individual feature.
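
A toy example shows how the two rules can disagree. The per-feature class scores are hypothetical, and summing the scores is only one simple way to accumulate evidence, assumed here for illustration.

```python
from collections import Counter

def majority_vote(feature_posteriors):
    """Each feature casts one hard vote for its most likely class."""
    votes = Counter(max(p, key=p.get) for p in feature_posteriors)
    return votes.most_common(1)[0][0]

def accumulate_evidence(feature_posteriors):
    """Sum the soft class evidence over all features, then decide once."""
    totals = {}
    for p in feature_posteriors:
        for label, score in p.items():
            totals[label] = totals.get(label, 0.0) + score
    return max(totals, key=totals.get)

# Two weakly informative features and one strongly informative one.
features = [
    {"deer": 0.55, "elk": 0.45},
    {"deer": 0.55, "elk": 0.45},
    {"deer": 0.05, "elk": 0.95},
]
by_vote = majority_vote(features)            # "deer" (2 votes to 1)
by_evidence = accumulate_evidence(features)  # "elk" (1.85 vs 1.15)
```

Hard voting discards each feature's confidence, so two nearly undecided features outvote one highly confident feature. Accumulating evidence keeps that confidence.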
|
|
|
Scale-invariant
Region-based
Hierarchical Image Matching
|
Find correspondences between
similar objects in images captured under
large variations in scale. Scale
invariance is achieved by decoupling the
scales of objects from those of scenes,
and by down-weighting the contributions
of fine-resolution details to matching.
|
|
|
Learning
Subcategory Relevances for Category
Recognition
|
Detections of distinct object
categories provide different degrees of
evidence for recognition of more
complex, parent categories. This is
estimated using local learning.
|
|
|
Connected
Segmentation Tree
- A Joint
Representation of Region Layout and Hierarchy
-
|
CST is a hierarchy of region
adjacency graphs. The CST model of an
object category is learned by
simultaneously searching for both the
most salient regions, and the most
salient containment and neighbor
relationships of regions across training
images.
|
|
|
Extracting
Texels in 2.1D Natural Textures
|
Given an image of 2.1D
texture, learn without any supervision a
generative model of the entire
(unoccluded) texel. Learning involves
concurrent estimation of the
texel-subtexel structure, and the pdfs
of each texel part from only partially
visible texels in the image.
|
|
|
Taxonomy of
Categories Present in Arbitrary Images
|
Given an
arbitrary (unlabeled) image set, learn
the models of all visual categories
present, and their inter-category
relationships, i.e., their taxonomy. The
taxonomy recursively defines categories
as spatial configurations of (simpler)
subcategories each of which may be
shared by many categories.
|
|
|
The hoofed animals dataset
contains very similar categories that
share a number of similar parts. Each
image may contain multiple instances of
multiple categories. Animals are
articulated, non-rigid objects,
appearing at different scales amidst
clutter, and may be partially occluded.
|
|
|
The images show homogeneous,
frontally viewed, natural, 2.1D
textures, where: (1) Texels are only
statistically similar to each other; (2)
Texel placement is random; (3)
Repetition of subtexels defines a
finer-grain texture coexisting with the main
texture; (4) Due to texel overlap, texel
contours form complex patterns (e.g.,
several edges meet at one point), and
overlapping texels have low contrast,
all of which make texel segmentation
difficult.
|
|
|
Unsupervised
Category Modeling, Recognition and
Segmentation

|
Given a set of
images containing frequent occurrences
of an unknown visual category, learn
geometric, photometric and topological
properties of regions defining the
category. Learning is unsupervised,
because the target category is not
defined by the user, and whether and
where any instances of the category
appear in a specific image is not known.
|
|