


Yizhou Shan
2020 – today
- 2025
[j1]Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Chenxi Wang, Jiang Xu, Shuang Chen, Hao Feng, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan:
ShuffleInfer: Disaggregate LLM Inference for Mixed Downstream Workloads. ACM Trans. Archit. Code Optim. 22(2): 77:1-77:24 (2025)
[c22]Junhao Hu, Wenrui Huang, Weidong Wang, Zhenwen Li, Tiancheng Hu, Zhixia Liu, Xusheng Chen, Tao Xie, Yizhou Shan:
RaaS: Reasoning-Aware Attention Sparsity for Efficient LLM Reasoning. ACL (Findings) 2025: 2577-2590
[c21]Xiurui Pan, Endian Li, Qiao Li, Shengwen Liang, Yizhou Shan, Ke Zhou, Yingwei Luo, Xiaolin Wang, Jie Zhang:
InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference. HPCA 2025: 1510-1525
[c20]Junhao Hu, Wenrui Huang, Weidong Wang, Haoyi Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie:
EPIC: Efficient Position-Independent Caching for Serving Large Language Models. ICML 2025
[c19]Quanxi Li, Hong Huang, Ying Liu, Yanwen Xia, Jie Zhang, Mosong Zhou, Xiaobing Feng, Huimin Cui, Quan Chen, Yizhou Shan, Chenxi Wang:
Beehive: A Scalable Disaggregated Memory Runtime Exploiting Asynchrony of Multithreaded Programs. NSDI 2025: 167-187
[c18]Dingyan Zhang, Haotian Wang, Yang Liu, Xingda Wei, Yizhou Shan, Rong Chen, Haibo Chen:
BlitzScale: Fast and Live Large Model Autoscaling with O(1) Host Caching. OSDI 2025: 275-293
[c17]Guangda Liu, Chenqi Zhang, Yizhou Shan, Hao Feng, Zeke Wang, Shixuan Sun, Minyi Guo, Jieru Zhao:
DHAP: Towards Efficient OLAP in a Disaggregated and Heterogeneous Environment. SC 2025: 2233-2250
[c16]Junhao Hu, Jiang Xu, Zhixia Liu, Yulong He, Yuetao Chen, Hao Xu, Jiang Liu, Jie Meng, Baoquan Zhang, Shining Wan, Gengyuan Dan, Zhiyu Dong, Zhihao Ren, Changhong Liu, Tao Xie, Dayun Lin, Qin Zhang, Yue Yu, Hao Feng, Xusheng Chen, Yizhou Shan:
DEEPSERVE: Serverless Large Language Model Serving at Scale. USENIX ATC 2025: 57-72
[c15]Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, Wei Wang:
Toppings: CPU-Assisted, Rank-Aware Adapter Serving for LLM Inference. USENIX ATC 2025: 613-629
[c14]Xu Zhang, Ke Liu, Yuan Hui, Xiaolong Zheng, Yisong Chang, Yizhou Shan, Guanghui Zhang, Ke Zhang, Yungang Bao, Mingyu Chen, Chenxi Wang:
DRack: A CXL-Disaggregated Rack Architecture to Boost Inter-Rack Communication. USENIX ATC 2025: 1261-1279
[i16]Junhao Hu, Jiang Xu, Zhixia Liu, Yulong He, Yuetao Chen, Hao Xu, Jiang Liu, Baoquan Zhang, Shining Wan, Gengyuan Dan, Zhiyu Dong, Zhihao Ren, Jie Meng, Chao He, Changhong Liu, Tao Xie, Dayun Lin, Qin Zhang, Yue Yu, Hao Feng, Xusheng Chen, Yizhou Shan:
DeepFlow: Serverless Large Language Model Serving at Scale. CoRR abs/2501.14417 (2025)
[i15]Junhao Hu, Wenrui Huang, Weidong Wang, Zhenwen Li, Tiancheng Hu, Zhixia Liu, Xusheng Chen, Tao Xie, Yizhou Shan:
Efficient Long-Decoding Inference with Reasoning-Aware Attention Sparsity. CoRR abs/2502.11147 (2025)
[i14]Hang Zhang, Jiuchen Shi, Yixiao Wang, Quan Chen, Yizhou Shan, Minyi Guo:
Improving the Serving Performance of Multi-LoRA Large Language Models via Efficient LoRA and KV Cache Management. CoRR abs/2505.03756 (2025)
[i13]Heyang Huang, Cunchen Hu, Jiaqi Zhu, Ziyuan Gao, Liangliang Xu, Yizhou Shan, Yungang Bao, Ninghui Sun, Tianwei Zhang, Sa Wang:
DDiT: Dynamic Resource Allocation for Diffusion Transformer Model Serving. CoRR abs/2506.13497 (2025)
[i12]Yifei Liu, Zuo Gan, Zhenghao Gan, Weiye Wang, Chen Chen, Yizhou Shan, Xusheng Chen, Zhenhua Han, Yifei Zhu, Shixuan Sun, Minyi Guo:
Efficient Serving of LLM Applications with Probabilistic Demand Modeling. CoRR abs/2506.14851 (2025)
[i11]Ao Xiao, Bangzheng He, Baoquan Zhang, Baoxing Huai, Bingji Wang, Bo Wang, Bo Xu, Boyi Hou, Chan Yang, Changhong Liu, Cheng Cui, Chenyu Zhu, Cong Feng, Daohui Wang, Dayun Lin, Duo Zhao, Fengshao Zou, Fu Wang, Gangqiang Zhang, Gengyuan Dan, Guanjie Chen, Guodong Guan, Guodong Yang, Haifeng Li, Haipei Zhu, Haley Li, Hao Feng, Hao Huang, Hao Xu, Hengrui Ma, Hengtao Fan, Hui Liu, Jia Li, Jiang Liu, Jiang Xu, Jie Meng, Jinhan Xin, Junhao Hu, Juwei Chen, Lan Yu, Lanxin Miao, Liang Liu, Linan Jing, Lu Zhou, Meina Han, Mingkun Deng, Mingyu Deng, Naitian Deng, Nizhong Lin, Peihan Zhao, Peng Pan, Pengfei Shen, Ping Li, Qi Zhang, Qian Wang, Qin Zhang, Qingrong Xia, Qingyi Zhang, Qunchao Fu, Ren Guo, Ruimin Gao, Shaochun Li, Sheng Long, Shentian Li, Shining Wan, Shuai Shen, Shuangfu Zeng, Shuming Jing, Siqi Yang, Song Zhang, Tao Xu, Tianlin Du, Ting Chen, Wanxu Wu, Wei Jiang, Weinan Tong, Weiwei Chen, Wen Peng, Wenli Zhou, Wenquan Yang, Wenxin Liang, Xiang Liu, Xiaoli Zhou, Xin Jin, Xinyu Duan, Xu Li, Xu Zhang, Xusheng Chen, Yalong Shan, Yang Gan, Yao Lu, Yi Deng, Yi Zheng, Ying Xiong, Yingfei Zheng, Yiyun Zheng, Yizhou Shan, Yong Gao, Yong Zhang, Yongqiang Yang, Yuanjin Gong:
xDeepServe: Model-as-a-Service on Huawei CloudMatrix384. CoRR abs/2508.02520 (2025)
- 2024
[c13]Will Lin, Yizhou Shan, Ryan Kosta, Arvind Krishnamurthy, Yiying Zhang:
SuperNIC: An FPGA-Based, Cloud-Oriented SmartNIC. FPGA 2024: 130-141
[i10]Cunchen Hu, Heyang Huang, Liangliang Xu, Xusheng Chen, Jiang Xu, Shuang Chen, Hao Feng, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan:
Inference without Interference: Disaggregate LLM Inference for Mixed Downstream Workloads. CoRR abs/2401.11181 (2024)
[i9]Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, Wei Wang:
CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference. CoRR abs/2401.11240 (2024)
[i8]Pai Zeng, Zhenyu Ning, Jieru Zhao, Weihao Cui, Mengwei Xu, Liwei Guo, Xusheng Chen, Yizhou Shan:
The CAP Principle for LLM Serving: A Survey of Long-Context Large Language Model Serving. CoRR abs/2405.11299 (2024)
[i7]Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan:
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool. CoRR abs/2406.17565 (2024)
[i6]Xiurui Pan, Endian Li, Qiao Li, Shengwen Liang, Yizhou Shan, Ke Zhou, Yingwei Luo, Xiaolin Wang, Jie Zhang:
InstInfer: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference. CoRR abs/2409.04992 (2024)
[i5]Junhao Hu, Wenrui Huang, Haoyi Wang, Weidong Wang, Tiancheng Hu, Qin Zhang, Hao Feng, Xusheng Chen, Yizhou Shan, Tao Xie:
EPIC: Efficient Position-Independent Context Caching for Serving Large Language Models. CoRR abs/2410.15332 (2024)
[i4]Dingyan Zhang, Haotian Wang, Yang Liu, Xingda Wei, Yizhou Shan, Rong Chen, Haibo Chen:
Fast and Live Model Auto Scaling with O(1) Host Caching. CoRR abs/2412.17246 (2024)
- 2023
[c12]Haifeng Li, Ke Liu, Ting Liang, Zuojun Li, Tianyue Lu, Yisong Chang, Hui Yuan, Yinben Xia, Yungang Bao, Mingyu Chen, Yizhou Shan:
MARB: Bridge the Semantic Gap between Operating System and Application Memory Access Behavior. DATE 2023: 1-6
[c11]Cunchen Hu, Chenxi Wang, Sa Wang, Ninghui Sun, Yungang Bao, Jieru Zhao, Sanidhya Kashyap, Pengfei Zuo, Xusheng Chen, Liangliang Xu, Qin Zhang, Hao Feng, Yizhou Shan:
Skadi: Building a Distributed Runtime for Data Systems in Disaggregated Data Centers. HotOS 2023: 94-102
[c10]Haifeng Li, Ke Liu, Ting Liang, Zuojun Li, Tianyue Lu, Hui Yuan, Yinben Xia, Yungang Bao, Mingyu Chen, Yizhou Shan:
HoPP: Hardware-Software Co-Designed Page Prefetching for Disaggregated Memory. HPCA 2023: 1168-1181
[c9]Ziqiao Zhou, Yizhou Shan, Weidong Cui, Xinyang Ge, Marcus Peinado, Andrew Baumann:
Core slicing: closing the gap between leaky confidential VMs and bare-metal cloud. OSDI 2023: 247-267
- 2022
[b1]Yizhou Shan:
Distributing and Disaggregating Hardware Resources in Data Centers. University of California, San Diego, USA, 2022
[c8]Yizhou Shan, Will Lin, Zhiyuan Guo, Yiying Zhang:
Towards a fully disaggregated and programmable data center. APSys 2022: 18-28
[c7]Zhiyuan Guo, Yizhou Shan, Xuhao Luo, Yutong Huang, Yiying Zhang:
Clio: a hardware-software co-designed disaggregated memory system. ASPLOS 2022: 417-433
- 2021
[i3]Zhiyuan Guo, Yizhou Shan, Xuhao Luo, Yutong Huang, Yiying Zhang:
Clio: A Hardware-Software Co-Designed Disaggregated Memory System. CoRR abs/2108.03492 (2021)
[i2]Yizhou Shan, Will Lin, Ryan Kosta, Arvind Krishnamurthy, Yiying Zhang:
Disaggregating and Consolidating Network Functionalities with SuperNIC. CoRR abs/2109.07744 (2021)
- 2020
[c6]Shin-Yeh Tsai, Yizhou Shan, Yiying Zhang:
Disaggregating Persistent Memory and Controlling Them Remotely: An Exploration of Passive Disaggregated Key-Value Stores. USENIX ATC 2020: 33-48
2010 – 2019
- 2019
[c5]Stanko Novakovic, Yizhou Shan, Aasheesh Kolli, Michael Cui, Yiying Zhang, Haggai Eran, Boris Pismenny, Liran Liss, Michael Wei, Dan Tsafrir, Marcos K. Aguilera:
Storm: a fast transactional dataplane for remote data structures. SYSTOR 2019: 97-108
[c4]Yizhou Shan, Yutong Huang, Yilun Chen, Yiying Zhang:
LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation. USENIX ATC 2019
[i1]Stanko Novakovic, Yizhou Shan, Aasheesh Kolli, Michael Cui, Yiying Zhang, Haggai Eran, Liran Liss, Michael Wei, Dan Tsafrir, Marcos K. Aguilera:
Storm: a fast transactional dataplane for remote data structures. CoRR abs/1902.02411 (2019)
- 2018
[c3]Yizhou Shan, Yutong Huang, Yilun Chen, Yiying Zhang:
LegoOS: A Disseminated, Distributed OS for Hardware Resource Disaggregation. OSDI 2018: 69-87
- 2017
[c2]Yizhou Shan, Shin-Yeh Tsai, Yiying Zhang:
Distributed shared persistent memory. SoCC 2017: 323-337
[c1]Yizhou Shan, Sumukh Hallymysore, Yutong Huang, Yilun Chen, Yiying Zhang:
Disaggregated operating system. SoCC 2017: 628