Sinisa Todorovic
sinisa@eecs.oregonstate.edu

NEWS



The videos show human activities with rich spatiotemporal structure. We have evaluated our Probabilistic Event Logic on this dataset.


PUBLICATIONS

INFO


TEACHING

PHD STUDENTS

ALUMNI'S THESES

SOME GREAT MOMENTS
CVPR 2014

CVPR 2014

CVPR 2013

CVPR 2012

ICCV 2011

CVPR 2011

Graduation 2011

NIPS

Nadia Payet

William Brendel

William Brendel, Nadia Payet

Nadia Payet
RECENT RESEARCH TOPICS
Monocular Extraction of 2.1D Sketch Using Constrained Convex Optimization
We partition the image into regions, and estimate their depth ordering in the scene. This is cast as a constrained convex optimization problem, and solved within the optimization transfer framework. Our new optimization transfer admits a closed-form expression of the duality gap, and thus allows explicit computation of the achieved accuracy.
IJCV Paper
The final publication is available at http://link.springer.com
HiRF: Hierarchical Random Field for Collective Activity Recognition in Videos
We formulate Hierarchical Random Field (HiRF) for activity recognition. HiRF establishes strictly hierarchical links between all variables, discarding the common lateral temporal connections. This enables an efficient bottom-up/top-down inference.
Multi-Object Tracking via Constrained Sequential Labeling
Constrained sequential labeling (CSL) assigns object identifiers to supervoxels, while respecting domain constraints. CSL is well-suited for simultaneous labeling and fixing noisy merges and splits of our mid-level features, which cannot be handled in a principled manner by traditional network flow approaches. CSL is efficient due to constraint propagation.
Scene Labeling Using Beam Search Under Mutex Constraints
We cast scene labeling as a quadratic program (QP) with mutual exclusion (mutex) constraints on class label assignments. The QP is solved efficiently using beam search, which explicitly accounts for spatial extents of objects, and guarantees that all mutex constraints from domain knowledge are satisfied.
Play Type Recognition in Real-World Football Video
Given a video sequence of plays of a football game, we integrate responses of the play-level detectors with global game-level reasoning to overcome huge variations in camera viewpoint, motion, and distance from the field, as well as amateur camerawork quality.
Inferring Dark Matter and Dark Energy from Videos
Functional objects do not have discriminative appearance and shape, but can be viewed as "dark matter", emanating "dark energy" that affects people’s trajectories in the video. For localizing functional objects, we analyze noisy behavior of people in the scene using agent-based, probabilistic Lagrangian mechanics.
Latent Multitask Learning for View-Invariant Action Recognition
When each viewpoint of a given set of action classes is specified as a learning task, multitask learning appears suitable for achieving view invariance in recognition. We extend standard multitask learning to identify (1) latent groupings of action views (i.e., tasks) and (2) discriminative action parts, along with joint learning of all tasks.
Monte Carlo Tree Search for Scheduling Activity Recognition
Querying an activity in long video footage may require running a multitude of detectors, and tracking their detections. We use Monte Carlo Tree Search to optimally schedule a sequence of detectors and trackers to be run, and where they should be applied in the space-time volume.
SLEDGE: Sequential Labeling of Image Edges for Boundary Detection
We sequentially label image edges as "on" or "off" object boundaries. A visited edge is labeled as boundary based on evidence of its perceptual grouping with already identified boundaries. We use both local Gestalt cues, and the global Helmholtz principle of non-accidental grouping. Image edges are extracted with our new detector that finds salient pixel sequences which separate distinct textures within the image.
IJCV  Paper Edge Map Code
The final publication is available at http://link.springer.com
Hough Forest Random Field for Object Recognition and Segmentation
We combine Hough forest (HF) and conditional random field (CRF) into HFRF to assign labels of object classes to image regions. HF captures intrinsic and contextual properties of objects as class histograms in the leaf nodes. This evidence is used in CRF inference for non-parametric density estimation of the posteriors. Theoretical error bounds of HF and HFRF applied to a two-class object detection and segmentation are also presented.
PAMI  Paper
Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
Cost-Sensitive Top-down/Bottom-up Inference for Multiscale Activity Recognition
While prior work typically addresses activity recognition at a single scale, we jointly model group activities, individual actions, and participating objects with an AND-OR graph, and exploit its hierarchical structure for efficient inference. An explore-exploit strategy is used to adaptively zoom-in or zoom-out for cost-sensitive inference.
Human Actions as Stochastic Kronecker Graphs
A human activity can be viewed as a space-time repetition of activity primitives. Given a set of primitives, and an affinity matrix of their probabilistic grouping, we model a video of the activity as probabilistically generated by a sequence of Kronecker products of the affinity matrix.
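To make the Kronecker-product construction concrete, here is a minimal sketch, not the model from the paper. It assumes a hypothetical 2 x 2 affinity matrix `theta` of grouping probabilities and forms its second Kronecker power; the helper names `kron` and `kronecker_power` are illustrative.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]


def kronecker_power(affinity, k):
    """k-fold Kronecker product of `affinity` with itself (k >= 1)."""
    result = affinity
    for _ in range(k - 1):
        result = kron(result, affinity)
    return result


# Hypothetical affinity matrix: entry (i, j) is the probability that
# primitives i and j are grouped together.
theta = [[0.9, 0.4],
         [0.4, 0.6]]

p2 = kronecker_power(theta, 2)  # 4 x 4 matrix of pairwise probabilities
```

Each Kronecker power squares the number of rows and columns, so a small affinity matrix can describe grouping probabilities over a much larger set of primitive instances.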
Sum-Product Networks for Modeling Activities with Stochastic Structure
Activities with stochastic structure are characterized by variable space-time arrangements of subactivities, and may be conducted by a variable number of actors. We use sum-product networks (SPN) to model such activities. SPN is a hierarchical mixture of bags-of-words (BoWs). SPN consists of terminal nodes representing BoWs, and product and sum nodes organized in a number of layers. The products are aimed at encoding particular configurations of activity parts, and the sums serve to capture their alternative configurations. A new Volleyball dataset is compiled and annotated for evaluation and future benchmarking.
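The sum and product layers can be illustrated with a toy network. This is a minimal sketch under assumed values: the leaf likelihoods and mixture weights are made up, and the class names `Leaf`, `Product`, and `Sum` are illustrative rather than taken from the paper.

```python
import math


class Leaf:
    """Terminal node holding the likelihood of one bag-of-words observation."""

    def __init__(self, likelihood):
        self.likelihood = likelihood

    def value(self):
        return self.likelihood


class Product:
    """Product node: one particular configuration of activity parts."""

    def __init__(self, children):
        self.children = children

    def value(self):
        return math.prod(child.value() for child in self.children)


class Sum:
    """Sum node: weighted mixture over alternative configurations."""

    def __init__(self, weighted_children):
        self.weighted_children = weighted_children  # list of (weight, node)

    def value(self):
        return sum(w * child.value() for w, child in self.weighted_children)


# Two alternative configurations, each a product of two hypothetical BoW leaves.
config_a = Product([Leaf(0.5), Leaf(0.2)])
config_b = Product([Leaf(0.4), Leaf(0.5)])
root = Sum([(0.3, config_a), (0.7, config_b)])

score = root.value()  # 0.3 * (0.5 * 0.2) + 0.7 * (0.4 * 0.5)
```

Evaluation is a single bottom-up pass, so its cost is linear in the number of edges of the network.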
Learning Spatiotemporal Graphs of Human Activities
Given a set of spatiotemporal graphs, we learn their model graph, and pdf's associated with nodes and edges of the model. The model graph adaptively learns relevant video segments and their spatiotemporal relations from data. We present a novel weighted-least-squares formulation of learning a structural archetype of graphs. The model is used for video parsing.
From Contours to 3D Object Detection and Pose Estimation
We address view-invariant object detection and pose estimation using contours as basic object features. A top-down feedback from inference warps the image, so the bottom-up extraction of contours could better collectively summarize relevant visual information and match our 3D object model, under arbitrary non-rigid shape deformations and affine projection.
A Chains Model for Localizing Participants of Group Activities in Videos
We address recognition of group activities in a given video, localization of video parts where these activities occur, and detection of actors involved in them. A new generative chains model is formulated to organize a large number of video features in an ensemble of chains, starting and ending at the end points of the time interval occupied by the target activity.
Scene Shape from Texture of Objects
We estimate the 3D shape of a scene from texture arising from a spatial repetition of objects in the image. Unlike existing work, our monocular estimation does not use domain knowledge about the layout of common scene surfaces. We also show that reasoning about texture of objects in the scene improves object detection.
Multiobject Tracking as Maximum Weight Independent Set
We prove that the data association problem -- the core of tracking -- can be formulated as finding the maximum-weight independent set (MWIS) of non-adjacent tracklets in a graph. We present a new, polynomial-time MWIS algorithm, and prove that it converges to an optimum.
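For reference, the MWIS objective itself can be stated in a few lines. The sketch below is an exhaustive search that is only practical for tiny graphs; it is not the polynomial-time algorithm from the paper. The tracklet names, weights, and conflict edges are hypothetical.

```python
from itertools import combinations


def max_weight_independent_set(weights, edges):
    """Exhaustively find the heaviest set of mutually non-adjacent nodes.

    weights: dict mapping node -> positive weight (e.g., tracklet confidence)
    edges:   iterable of (u, v) pairs marking conflicting nodes
    """
    nodes = list(weights)
    conflicts = {frozenset(edge) for edge in edges}
    best_set, best_weight = frozenset(), 0.0
    for size in range(1, len(nodes) + 1):
        for subset in combinations(nodes, size):
            # Skip subsets that contain any pair of conflicting nodes.
            if any(frozenset(pair) in conflicts for pair in combinations(subset, 2)):
                continue
            total = sum(weights[n] for n in subset)
            if total > best_weight:
                best_set, best_weight = frozenset(subset), total
    return best_set, best_weight


# Hypothetical tracklets on a path graph a - b - c - d.
tracklet_weights = {"a": 3.0, "b": 4.0, "c": 2.0, "d": 3.0}
tracklet_conflicts = [("a", "b"), ("b", "c"), ("c", "d")]

selected, total_weight = max_weight_independent_set(tracklet_weights, tracklet_conflicts)
```

Here an edge means two tracklets cannot both belong to the solution, so the selected set is a conflict-free data association of maximum total weight.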
Probabilistic Event Logic for Interval-Based Event Recognition
We introduce probabilistic event logic (PEL) for representing both hard and soft temporal constraints among events. A PEL knowledge base consists of confidence-weighted formulas from a temporal event logic, and specifies a joint distribution over the occurrence time intervals of all events. Our MAP inference for PEL addresses the scalability issue of reasoning about all time intervals in video, by leveraging the spanning-interval data structure. A spanning interval compactly represents entire sets of time intervals without enumerating them.
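The idea of representing a whole set of intervals without enumerating them can be sketched as follows. This is a simplified illustration, assuming closed intervals over integer frame indices; the class `SpanningInterval` and its fields are hypothetical and do not reproduce the data structure used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanningInterval:
    """All intervals [s, e] with s_lo <= s <= s_hi, e_lo <= e <= e_hi, s <= e."""

    s_lo: int
    s_hi: int
    e_lo: int
    e_hi: int

    def contains(self, s, e):
        return self.s_lo <= s <= self.s_hi and self.e_lo <= e <= self.e_hi and s <= e

    def is_empty(self):
        # Non-empty only if both ranges are valid and some start precedes some end.
        return self.s_lo > self.s_hi or self.e_lo > self.e_hi or self.s_lo > self.e_hi

    def intersect(self, other):
        """Set intersection, computed on the four bounds only."""
        return SpanningInterval(
            max(self.s_lo, other.s_lo),
            min(self.s_hi, other.s_hi),
            max(self.e_lo, other.e_lo),
            min(self.e_hi, other.e_hi),
        )


a = SpanningInterval(0, 10, 5, 20)
b = SpanningInterval(8, 15, 0, 12)
both = a.intersect(b)  # intervals starting in [8, 10] and ending in [5, 12]
```

The intersection touches four numbers regardless of how many concrete intervals the two sets contain, which is what makes reasoning over all time intervals tractable.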
(RF)^2 -- Random Forest Random Field
We combine random forest (RF) and conditional random field (CRF) to address multiclass object recognition and segmentation. Inference of (RF)^2 uses Metropolis-Hastings jumps which depend on two ratios of the proposal and posterior distributions. Our key idea is to directly learn these ratios using RF.
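The role of the two ratios is visible in the standard Metropolis-Hastings acceptance rule. The sketch below takes the ratios as plain numbers; in (RF)^2 they would come from the learned random forest, which is not modeled here. The function name `mh_accept` and the example values are assumptions.

```python
import random


def mh_accept(posterior_ratio, proposal_ratio, rng=random):
    """Standard Metropolis-Hastings acceptance test.

    posterior_ratio: p(new state) / p(current state)
    proposal_ratio:  q(current | new) / q(new | current)
    Returns (accepted, acceptance_probability).
    """
    acceptance = min(1.0, posterior_ratio * proposal_ratio)
    return rng.random() < acceptance, acceptance


# A jump to a more probable labeling with a symmetric proposal is always accepted.
accepted, prob = mh_accept(posterior_ratio=2.0, proposal_ratio=1.0)

# A jump to a less probable labeling is accepted only some of the time.
_, low_prob = mh_accept(posterior_ratio=0.5, proposal_ratio=0.5)
```

Because only the product of the two ratios enters the rule, estimating the ratios directly avoids evaluating the posterior and the proposal separately.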
Segmentation as Maximum-Weight Independent Set
Given an image, and an ensemble of its distinct low-level segmentations, we identify visually "meaningful" segments in the ensemble. This is formalized as the maximum-weight independent set (MWIS) problem. We formulate a new MWIS iterative algorithm, where each iteration solves a Taylor expansion of the MWIS objective function in the discrete domain.
Activities as Time Series of Human Postures
We show that certain human actions can be represented by short time series of codewords. The codewords represent still snapshots of human-body parts in their discriminative postures, and objects that people interact with while performing the activity. This carries many advantages for developing a robust, efficient, and scalable activity recognition system.
From a Set of Shapes to Object Discovery
We show that shape is expressive and discriminative enough to provide robust object discovery in the midst of background clutter. We build a graph that captures spatial layouts of edges extracted from a set of images, and conduct its multicoloring by a new coordinate ascent Swendsen-Wang cut. The resulting clusters of edges delineate the boundaries of distinct objects discovered in the image set.
Monocular Extraction of 2.1D Sketch
Given a segmentation and T-junctions of an image, we estimate the depth layers of the scene. The estimation is formalized as a quadratic optimization so the resulting 2.1D sketch is smooth in all image areas except on region boundaries.
Video Painting with Space-Time Varying Style Parameters
An input video is rendered by applying a distinct painting style to each spatiotemporal tube, corresponding to a moving object in the video. Spatiotemporal segmentation lets the user vary painting styles in 2D space and time, and thus convey rich semantic content, e.g., emotions, illusion, chaos, etc.
Toward Optimal Feature Selection through Local Learning
Given data with a huge number of irrelevant features (> 10^6), select features relevant to data classification. We decompose a nonlinear problem into a set of locally linear ones, and then globally learn feature relevance within the large margin framework.
Video Object Segmentation by Tracking Regions
Given an arbitrary video, segment all moving and static objects present. We transitively match contours of image regions across the frames such that the resulting tracks are locally smooth.
Texel-based Texture Segmentation
Given an arbitrary image, discover and segment all distinct texture subimages. We use mean shift to simultaneously estimate the pdf of texel appearance and the pdf of texel placement.
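As background on the mean-shift step, here is a minimal one-dimensional mode-seeking sketch with a Gaussian kernel. It is a generic illustration, not the texel estimation procedure from the paper; the sample values, bandwidth, and function name `mean_shift_mode` are assumptions.

```python
import math


def mean_shift_mode(points, start, bandwidth=0.5, tol=1e-6, max_iter=200):
    """Climb from `start` to a nearby mode of the kernel density of `points`."""
    x = start
    for _ in range(max_iter):
        # Gaussian kernel weight of every sample relative to the current estimate.
        weights = [math.exp(-0.5 * ((p - x) / bandwidth) ** 2) for p in points]
        new_x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical 1D feature values forming two clusters.
samples = [0.8, 1.0, 1.2, 4.9, 5.0, 5.1]

mode_low = mean_shift_mode(samples, start=0.5)
mode_high = mean_shift_mode(samples, start=5.5)
```

Each starting point converges to the density mode of its own cluster, which is how mean shift groups samples without fixing the number of clusters in advance.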
Matching Hierarchies of Deformable Shapes
Shapes are represented by graphs whose nodes correspond to shape parts, and edges capture their neighbor and part-of interactions. Shape matching is formulated as finding the subgraph isomorphism that minimizes a quadratic cost.
Dictionary-Free Categorization Using Evidence Trees
How should we categorize images showing very similar object categories? We mathematically prove that it is better to use class evidence accumulated from all image features than to use a majority voting of class decisions made on each individual feature.
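A toy example shows how the two decision rules can disagree. This is an illustrative sketch only: evidence accumulation is assumed here to be a sum of per-feature log-probabilities, and the class names and probabilities are made up.

```python
import math
from collections import Counter


def majority_vote(feature_posteriors):
    """Each feature casts one vote for its most probable class."""
    votes = Counter(max(p, key=p.get) for p in feature_posteriors)
    return votes.most_common(1)[0][0]


def accumulated_evidence(feature_posteriors):
    """Sum log-probabilities over all features, then decide once."""
    classes = feature_posteriors[0].keys()
    totals = {c: sum(math.log(p[c]) for p in feature_posteriors) for c in classes}
    return max(totals, key=totals.get)


# Two features weakly favor "horse"; one strongly favors "zebra".
posteriors = [
    {"horse": 0.55, "zebra": 0.45},
    {"horse": 0.55, "zebra": 0.45},
    {"horse": 0.05, "zebra": 0.95},
]

vote_label = majority_vote(posteriors)
evidence_label = accumulated_evidence(posteriors)
```

Majority voting discards how confident each feature is, so two weak votes outweigh one strong one; accumulating evidence keeps that information.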
Scale-invariant Region-based Hierarchical Image Matching
Find correspondences between similar objects in images captured under large variations in scale. Scale invariance is achieved by decoupling the scales of objects from those of scenes, and by down-weighting the contributions of fine-resolution details to matching.
Learning Subcategory Relevances for Category Recognition
Caltech-256 Results
Detections of distinct object categories provide different degrees of evidence for recognition of more complex, parent categories. This is estimated using local learning.
Connected Segmentation Tree: A Joint Representation of Region Layout and Hierarchy
Generalized Voronoi Diagram
CST is a hierarchy of region adjacency graphs. The CST model of an object category is learned by simultaneously searching for both the most salient regions, and the most salient containment and neighbor relationships of regions across training images.
Extracting Texels in 2.1D Natural Textures
Given an image of 2.1D texture, learn without any supervision a generative model of the entire (unoccluded) texel. Learning involves concurrent estimation of the texel-subtexel structure, and the pdf's of each texel part from only partially visible texels in the image.
Taxonomy of Categories Present in Arbitrary Images
Given an arbitrary (unlabeled) image set, learn the models of all visual categories present, and their inter-category relationships, i.e., their taxonomy. The taxonomy recursively defines categories as spatial configurations of (simpler) subcategories each of which may be shared by many categories.
ICCV '07 Poster
Paper UIUC Hoofed Animals Dataset   Slides

Hoofed Animals Dataset
The hoofed animals dataset contains very similar categories that share a number of similar parts. Each image may contain multiple instances of multiple categories. Animals are articulated, non-rigid objects, appearing at different scales amidst clutter, and may be partially occluded.

2.1D Textures Dataset
The images show homogeneous, frontally viewed, natural, 2.1D textures, where: (1) Texels are only statistically similar to each other; (2) Texel placement is random; (3) Repetition of subtexels defines a finer grain texture coexisting with the main texture; (4) Due to texel overlap, texel contours form complex patterns (e.g., several edges meet at one point), and overlapping texels have low contrasts, all of which makes texel segmentation difficult.

Unsupervised Category Modeling, Recognition and Segmentation
Learning the category model
Given a set of images containing frequent occurrences of an unknown visual category, learn geometric, photometric and topological properties of regions defining the category. Learning is unsupervised, because the target category is not defined by the user, and whether and where any instances of the category appear in a specific image is not known.
CVPR '06 Slides PAMI Paper