Welcome
The STAR Lab focuses on research, development, and education in the broad area of computing systems and AI applications, with an emphasis on computing efficiency.
- We perform cutting-edge research to improve the efficiency of computing systems across a growing landscape, from embedded and mobile devices to supercomputers and data centers. Recent focuses include machine learning accelerators, GPU architecture, and applications of AI in architecture design.
- We also conduct extensive research on improving the computing efficiency of machine learning and natural language processing models (especially large language models).
Below are a few ongoing and past projects.
Machine Learning Accelerators
- Increase accelerator design interpretability with machine learning [arXiv]
- Flexible accelerator architecture for sparse high-order tensor contraction [FLAASH]
- Survey on sparsity exploration in transformer-based accelerators [Electronics'23]
- Polymorphic accelerators for deep neural networks [TC'21]
- Explore cross-layer data reuse in deep neural network accelerators [HPCA'19]
- Tolerating soft errors in deep learning accelerators [NAS'18 (Best Paper Nomination)]
- Ultralow power accelerator for intelligent wearable IoT devices [ISCA'17]
Large Language Model Optimizations and Applications
- Large language model for information retrieval [LLM-RankFusion]
- Enhance neural network resilience to parameter perturbation [arXiv]
- RAG-based LLM for scientific writing assistance [LLM-Ref]
- Compression of large language models [e.g., Extreme model compression]
- Linearized transformer models for autoregressive NLP tasks [ICML'24, ECML'23]
AI for Natural Language Processing
- Universal semantics with Natural Semantic Metalanguage [DeepNSM]
- LLM for simultaneous translation [BeaverTalk (IWSLT'25), SimulMask (EMNLP'24), Simul-LLM (ACL'24)]
- Compute-efficient simultaneous speech translation [Shiftable Context (ICML'23), Implicit Memory Transformer (ACL Findings'23)]
Kolmogorov-Arnold Networks
- Fast training for Kolmogorov-Arnold Transformers [FlashKAT]
- Parallelized Kolmogorov-Arnold Networks [MatrixKAN]
AI for Computer Architecture and System
- Intelligent dynamic resource allocation for edge network servers [FGCS'23 (IF 7.5), Neurocomputing'23 (IF 6.0)]
- Improve data center peak power shaving with deep reinforcement learning [ICAI'21]
- Deep reinforcement learning framework for architectural exploration [HPCA'20 (Best Paper Nomination)]
- Survey of machine learning applied to computer architecture [arXiv 1909.12373]
- Utilize machine learning to characterize data communication patterns [ICCD'19]
- Improve memory controller placement in GPUs with deep learning [CAL'19]
- We founded and continue to organize the Annual International Workshop on AI-assisted Design for Architecture (AIDArc)
GPU Architectures and Extreme-scale Computing
- Silicon-interposer-based chiplet GPU systems [HPCA'20]
- Remove on-chip network bottlenecks in general-purpose GPUs [IPDPS'20]
- High-performance and energy-efficient on-chip networks [HPCA'18]
- Efficient utilization of GPU cache resources and memory bandwidth [ICS'19, ICCD'18]
Harnessing Dark Silicon for Post-Moore Era Computing
- Performance-aware network-on-chip (NoC) power reduction for the dark silicon era [ISLPED'19, HPCA'15, HPCA'14]
- Reducing NoC static power with core-state-awareness [ISLPED'14]
- Effective power-gating of on-chip routers [MICRO'12]
Cache and Memory Bandwidth Partitioning
- Partitioning last-level cache with high associativity [MICRO'14]
- Analytical performance modeling for partitioning memory bandwidth [IPDPS'13]
Deadlock-free Interconnection Networks
- Resource-efficient deadlock avoidance in wormhole-switched networks [HPCA'13]
- Bubble-based deadlock-free schemes [ICS'13, JPDC'12, IPDPS'11]
Application-aware Optimizations for Many-core Processors
- Temperature-aware application mapping [DATE'15]
- Application mapping for express channel-based chip multiprocessors [DATE'14]
- Balancing on-chip latency for multiple applications [IPDPS'14]
- Region-aware interference reduction [IPDPS'13]
Transactional Memory & Parallel Programming
- Mitigate mismatch between coherence protocol and conflict detection [IPDPS'14, SC'13]
- Reduce energy and contention in transactional memory [HPCA'13]