Welcome
The STAR Lab pursues research, development, and educational endeavors in the broad area of computing systems and AI applications. We perform cutting-edge research on technologies that improve computing systems across a growing landscape, from embedded and mobile devices to supercomputers and data centers. Recent focus areas include machine learning accelerators, GPU architecture, and applications of AI in architecture design. We also conduct extensive research on novel machine learning and natural language processing applications (especially large language models) that are enabled by efficient systems. Below are a few ongoing and past projects.
Machine Learning for Natural Language Processing
- LLM for simultaneous translation [SimulMask (EMNLP'24), Simul-LLM (ACL'24), Shiftable Context (ICML'23), Implicit Memory Transformer (ACL Findings'23)]
- Large language models for information retrieval [LLM-RankFusion]
- RAG-based LLM for scientific writing assistance [LLM-Ref]
- Compression of large language models [e.g., Extreme model compression]
- Linearized transformer models for autoregressive NLP tasks [ICML'24, ECML'23]
Machine Learning for Computer Architecture and Systems
- Intelligent dynamic resource allocation for edge network servers [FGCS'23 (IF 7.5), Neurocomputing'23 (IF 6.0)]
- Improving data center peak power shaving with deep reinforcement learning [ICAI'21]
- Deep reinforcement learning framework for architectural exploration [HPCA'20 (Best Paper Nomination)]
- Survey of machine learning applied to computer architecture [arXiv:1909.12373]
- Utilizing machine learning to characterize data communication patterns [ICCD'19]
- Improving memory controller placement in GPUs with deep learning [CAL'19]
- We founded and organize the annual International Workshop on AI-assisted Design for Architecture (AIDArc)
Machine Learning Accelerators
- Survey on sparsity exploration in transformer-based accelerators [Electronics'23]
- Polymorphic accelerators for deep neural networks [TC'21]
- Exploring cross-layer data reuse in deep neural network accelerators [HPCA'19]
- Tolerating soft errors in deep learning accelerators [NAS'18 (Best Paper Nomination)]
- Flexible on-chip memory architecture for DCNN accelerators [AIM'17]
- Ultra-low-power accelerator for intelligent data analytics in wearable IoT devices, detecting heart and brain diseases from biosignals [ISCA'17]
GPU Architectures and Extreme-Scale Computing
- Silicon-interposer based chiplet GPU systems [HPCA'20]
- Removing on-chip network bottlenecks in general-purpose GPUs [IPDPS'20]
- High-performance and energy-efficient on-chip networks [HPCA'18]
- Efficient utilization of GPU cache resources and memory bandwidth [ICS'19, ICCD'18]
Harnessing Dark Silicon for Post-Moore Era Computing
- Performance-aware network-on-chip (NoC) power reduction for the dark silicon era [ISLPED'19, HPCA'15, HPCA'14]
- Reducing NoC static power with core-state-awareness [ISLPED'14]
- Effective power-gating of on-chip routers [MICRO'12]
Cache and Memory Bandwidth Partitioning
- Partitioning last-level cache with high associativity [MICRO'14]
- Analytical performance modeling for partitioning memory bandwidth [IPDPS'13]
Deadlock-free Interconnection Networks
- Resource-efficient deadlock avoidance in wormhole-switched networks [HPCA'13]
- Bubble-based deadlock-free schemes [ICS'13, JPDC'12, IPDPS'11]
Application-aware Optimizations for Many-core Processors
- Temperature-aware application mapping [DATE'15]
- Application mapping for express channel-based chip multiprocessors [DATE'14]
- Balancing on-chip latency for multiple applications [IPDPS'14]
- Region-aware interference reduction [IPDPS'13]
Transactional Memory & Parallel Programming
- Mitigating the mismatch between coherence protocols and conflict detection [IPDPS'14, SC'13]
- Reducing energy and contention in transactional memory [HPCA'13]