


Kuntai Du
2020 – today
- 2025
[c15]Kuntai Du, Yihua Cheng, Peder A. Olsen, Shadi A. Noghabi, Junchen Jiang:
Earth+: On-Board Satellite Imagery Compression Leveraging Historical Earth Observations. ASPLOS (1) 2025: 361-376
[c14]Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang:
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion. EuroSys 2025: 94-109
[c13]Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, Junchen Jiang:
PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications. SOSP 2025: 399-414
[c12]Chen Zhang, Kuntai Du, Shu Liu, Woosuk Kwon, Xiangxi Mo, Yufeng Wang, Xiaoxuan Liu, Kaichao You, Zhuohan Li, Mingsheng Long, Jidong Zhai, Joseph Gonzalez, Ion Stoica:
Jenga: Effective Memory Management for Serving LLM with Heterogeneity. SOSP 2025: 446-461
[c11]Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Shaoting Feng, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang:
METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation. SOSP 2025: 606-622
[i15]Hanchen Li, Yuhan Liu, Yihua Cheng, Kuntai Du, Junchen Jiang:
Towards More Economical Context-Augmented LLM Generation by Reusing Stored KV Cache. CoRR abs/2503.14647 (2025)
[i14]Chen Zhang, Kuntai Du, Shu Liu, Woosuk Kwon, Xiangxi Mo, Yufeng Wang, Xiaoxuan Liu, Kaichao You, Zhuohan Li, Mingsheng Long, Jidong Zhai, Joseph Gonzalez, Ion Stoica:
Jenga: Effective Memory Management for Serving LLM with Heterogeneity. CoRR abs/2503.18292 (2025)
[i13]Kuntai Du, Bowen Wang, Chen Zhang, Yiming Cheng, Qing Lan, Hejian Sang, Yihua Cheng, Jiayi Yao, Xiaoxuan Liu, Yifan Qiao, Ion Stoica, Junchen Jiang:
PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications. CoRR abs/2505.07203 (2025)
[i12]Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang:
AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving. CoRR abs/2509.00105 (2025)
[i11]Yihua Cheng, Yuhan Liu, Jiayi Yao, Yuwei An, Xiaokun Chen, Shaoting Feng, Yuyang Huang, Samuel Shen, Kuntai Du, Junchen Jiang:
LMCache: An Efficient KV Cache Layer for Enterprise-Scale LLM Inference. CoRR abs/2510.09665 (2025)
- 2024
[c10]Hanchen Li, Yuhan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang:
Eloquent: A More Robust Transmission Scheme for LLM Token Streaming. NAIC 2024: 34-40
[c9]Yihua Cheng, Ziyi Zhang, Hanchen Li, Anton Arapin, Yue Zhang, Qizheng Zhang, Yuhan Liu, Kuntai Du, Xu Zhang, Francis Y. Yan, Amrita Mazumdar, Nick Feamster, Junchen Jiang:
GRACE: Loss-Resilient Real-Time Video through Neural Codecs. NSDI 2024: 509-531
[c8]Yuhan Liu, Chengcheng Wan, Kuntai Du, Henry Hoffmann, Junchen Jiang, Shan Lu, Michael Maire:
ChameleonAPI: Automatic and Efficient Customization of Neural Networks for ML Applications. OSDI 2024: 365-386
[c7]Yuhan Liu, Hanchen Li, Yihua Cheng, Siddhant Ray, Yuyang Huang, Qizheng Zhang, Kuntai Du, Jiayi Yao, Shan Lu, Ganesh Ananthanarayanan, Michael Maire, Henry Hoffmann, Ari Holtzman, Junchen Jiang:
CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving. SIGCOMM 2024: 38-56
[i10]Hanchen Li, Yuhan Liu, Yihua Cheng, Siddhant Ray, Kuntai Du, Junchen Jiang:
Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network. CoRR abs/2401.12961 (2024)
[i9]Kuntai Du, Yihua Cheng, Peder A. Olsen, Shadi A. Noghabi, Ranveer Chandra, Junchen Jiang:
Earth+: on-board satellite imagery compression leveraging historical earth observations. CoRR abs/2403.11434 (2024)
[i8]Jiayi Yao, Hanchen Li, Yuhan Liu, Siddhant Ray, Yihua Cheng, Qizheng Zhang, Kuntai Du, Shan Lu, Junchen Jiang:
CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion. CoRR abs/2405.16444 (2024)
[i7]Yihua Cheng, Kuntai Du, Jiayi Yao, Junchen Jiang:
Do Large Language Models Need a Content Delivery Network? CoRR abs/2409.13761 (2024)
[i6]Zhuohan Gu, Jiayi Yao, Kuntai Du, Junchen Jiang:
LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts. CoRR abs/2411.13009 (2024)
[i5]Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang:
RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation. CoRR abs/2412.10543 (2024)
- 2023
[j1]Chengcheng Wan, Yuhan Liu, Kuntai Du, Henry Hoffmann, Junchen Jiang, Michael Maire, Shan Lu:
Run-Time Prevention of Software Integration Failures of Machine Learning APIs. Proc. ACM Program. Lang. 7(OOPSLA2): 264-291 (2023)
[c6]Kuntai Du, Yuhan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, Yuyang Huang, Ganesh Ananthanarayanan, Junchen Jiang:
OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation. SoCC 2023: 158-176
[i4]Kuntai Du, Yuhan Liu, Yitian Hao, Qizheng Zhang, Haodong Wang, Yuyang Huang, Ganesh Ananthanarayanan, Junchen Jiang:
OneAdapt: Fast Adaptation for Deep Learning Applications via Backpropagation. CoRR abs/2310.02422 (2023)
[i3]Yuhan Liu, Chengcheng Wan, Kuntai Du, Henry Hoffmann, Junchen Jiang, Shan Lu, Michael Maire:
Automatic and Efficient Customization of Neural Networks for ML Applications. CoRR abs/2310.04685 (2023)
[i2]Yuhan Liu, Hanchen Li, Kuntai Du, Jiayi Yao, Yihua Cheng, Yuyang Huang, Shan Lu, Michael Maire, Henry Hoffmann, Ari Holtzman, Ganesh Ananthanarayanan, Junchen Jiang:
CacheGen: Fast Context Loading for Language Model Applications. CoRR abs/2310.07240 (2023)
- 2022
[c5]Haodong Wang, Kuntai Du, Junchen Jiang:
Minimizing packet retransmission for real-time video analytics. SoCC 2022: 340-347
[c4]Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang:
AccMPEG: Optimizing Video Encoding for Accurate Video Analytics. MLSys 2022
[c3]Qizheng Zhang, Kuntai Du, Neil Agarwal, Ravi Netravali, Junchen Jiang:
Understanding the potential of server-driven edge video analytics. HotMobile 2022: 8-14
[i1]Kuntai Du, Qizheng Zhang, Anton Arapin, Haodong Wang, Zhengxu Xia, Junchen Jiang:
AccMPEG: Optimizing Video Encoding for Video Analytics. CoRR abs/2204.12534 (2022)
- 2020
[c2]Purui Wang, Lilei Feng, Guojun Chen, Chenren Xu, Yue Wu, Kenuo Xu, Guobin Shen, Kuntai Du, Gang Huang, Xuanzhe Liu:
Renovating road signs for infrastructure-to-vehicle networking: a visible light backscatter communication and networking approach. MobiCom 2020: 6:1-6:13
[c1]Kuntai Du, Ahsan Pervaiz, Xin Yuan, Aakanksha Chowdhery, Qizheng Zhang, Henry Hoffmann, Junchen Jiang:
Server-Driven Video Streaming for Deep Learning Inference. SIGCOMM 2020: 557-570
last updated on 2025-11-13 00:27 CET by the dblp team
all metadata released as open data under CC0 1.0 license