Welcome
The STAR Lab pursues research, development, and educational endeavors in the broad area of computing systems and AI applications. We perform cutting-edge research on technologies that improve computing systems across a growing landscape, from embedded and mobile devices to supercomputers and data centers. Recent focus areas include machine learning accelerators, GPU architecture, and applications of AI in architecture design. We also conduct extensive research on novel machine learning and natural language processing applications (especially large language models) that are enabled by efficient systems. Below are a few ongoing and past projects.
Machine Learning for Natural Language Processing
- LLM for simultaneous translation [SimulMask (EMNLP'24), Simul-LLM (ACL'24), Shiftable Context (ICML'23), Implicit Memory Transformer (ACL Findings'23)]
- Large language models for information retrieval [LLM-RankFusion]
- RAG-based LLM for scientific writing assistance [LLM-Ref]
- Compression of large language models [e.g., Extreme model compression]
- Linearized transformer models for autoregressive NLP tasks [ICML'24, ECML'23]
Machine Learning for Computer Architecture and Systems
- Intelligent dynamic resource allocation for edge network servers [FGCS'23 (IF 7.5), Neurocomputing'23 (IF 6.0)]
- Improving data center peak power shaving with deep reinforcement learning [ICAI'21]
- Deep reinforcement learning framework for architectural exploration [HPCA'20 (Best Paper Nomination)]
- Survey of machine learning applied to computer architecture [arXiv:1909.12373]
- Utilizing machine learning to characterize data communication patterns [ICCD'19]
- Improving memory controller placement in GPUs with deep learning [CAL'19]
- We founded and organize the annual International Workshop on AI-assisted Design for Architecture (AIDArc)
Machine Learning Accelerators
- Survey on sparsity exploration in transformer-based accelerators [Electronics'23]
- Polymorphic accelerators for deep neural networks [TC'21]
- Exploring cross-layer data reuse in deep neural network accelerators [HPCA'19]
- Tolerating soft errors in deep learning accelerators [NAS'18 (Best Paper Nomination)]
- Flexible on-chip memory architecture for DCNN accelerators [AIM'17]
- Ultra-low-power accelerator for intelligent data analytics in wearable IoT devices, detecting heart and brain diseases from biosignals [ISCA'17]
GPU Architectures and Extreme-Scale Computing
- Silicon-interposer based chiplet GPU systems [HPCA'20]
- Removing on-chip network bottlenecks in general-purpose GPUs [IPDPS'20]
- High-performance and energy-efficient on-chip networks [HPCA'18]
- Efficient utilization of GPU cache resources and memory bandwidth [ICS'19, ICCD'18]
Harnessing Dark Silicon for Post-Moore Era Computing
- Performance-aware network-on-chip (NoC) power reduction for the dark silicon era [ISLPED'19, HPCA'15, HPCA'14]
- Reducing NoC static power with core-state-awareness [ISLPED'14]
- Effective power-gating of on-chip routers [MICRO'12]
Cache and Memory Bandwidth Partitioning
- Partitioning last-level cache with high associativity [MICRO'14]
- Analytical performance modeling for partitioning memory bandwidth [IPDPS'13]
Deadlock-free Interconnection Networks
- Resource-efficient deadlock avoidance in wormhole-switched networks [HPCA'13]
- Bubble-based deadlock-free schemes [ICS'13, JPDC'12, IPDPS'11]
Application-aware Optimizations for Many-core Processors
- Temperature-aware application mapping [DATE'15]
- Application mapping for express channel-based chip multiprocessors [DATE'14]
- Balancing on-chip latency for multiple applications [IPDPS'14]
- Region-aware interference reduction [IPDPS'13]
Transactional Memory & Parallel Programming
- Mitigating the mismatch between coherence protocols and conflict detection [IPDPS'14, SC'13]
- Reducing energy and contention in transactional memory [HPCA'13]