Krish Sharma

ML Systems & GPU Engineer

I write GPU kernels and build inference systems, working from memory-bandwidth-bound CUDA ops on H100s up through paged KV-cache allocators and continuous-batching schedulers. I care about making models run fast at every layer of the stack.

Currently: Math & CS at Stanford · GPU kernels at Hazy Research · building an LLM inference engine from scratch · ML systems & inference infra.

About

I'm obsessed with making models run fast. That means writing CUDA kernels and reasoning carefully about the memory hierarchy: HBM bandwidth, coalescing, shared-memory layout, bank conflicts. At Hazy Research, I'm writing custom kernels for ThunderKittens, including a speed-of-light RMSNorm kernel for H100.

I'm also building AgentServe, an LLM inference engine written from scratch: a ground-up Llama 3.2 implementation (RMSNorm, RoPE, grouped-query attention, SwiGLU), a paged KV-cache block allocator, and a continuous-batching scheduler with agent-aware scheduling policies. Next, I want to go deeper on inference: kernel fusion, speculative decoding, and disaggregated serving. There's a lot of stack left to understand.

Outside of work, I'm a huge football and basketball fan and a proud Wisconsinite. I also love reading and am working on becoming a better reader; I'll be documenting what I read and what I learn from it on this website.

Selected Work

Projects

Paper Notes

Paper notes coming soon...

Writing

More writing coming soon...

Contact

The best way to reach me is by email. I'm always open to talking about GPU programming, LLM inference infrastructure, ML systems, and interesting collaborations.

Email: krishs04@stanford.edu