Stream2LLM: Overlap Context Streaming and Prefill for Reduced Time-to-First-Token
MLSys '26
Rajveer Bachkaniwala, Chengqi Luo, Richard So, Divya Mahajan, Kexin Rong
[PDF]
[Project Page]
[Code]
[Slides]
[Deep Wiki]
Lotus: Characterize Architecture Level CPU-based Preprocessing in Machine Learning Pipelines
HotInfra '24
Rajveer Bachkaniwala, Harshith Lanka, Kexin Rong, Ada Gavrilovska
[PDF]
[Project Page]
[Code]
[Slides]
[Deep Wiki]
Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling
IISWC '24 · 🏆 Best paper nominee
Rajveer Bachkaniwala, Harshith Lanka, Kexin Rong, Ada Gavrilovska
[PDF]
[Project Page]
[Code]
[Slides]
[Deep Wiki]