Projects - Laurence

Research Projects

BlockBatch: Multi-Scale Consensus Decoding for Efficient dLLM Inference

A training-free inference framework for diffusion language models that executes multiple block-size branches in a single batched forward pass, cutting the number of denoising function evaluations (NFEs) by 26.6% and achieving a 1.33× speedup over Fast-dLLM.

May 2026

Diffusion LLM, NLP, Inference Optimization, Machine Learning, Research, Python


Hy²: Accelerating Hybrid Mamba-Transformer Serving Through Adaptive GPU–PIM Co-Design

A co-design framework for serving hybrid Mamba-Transformer models on GPU–PIM heterogeneous hardware, achieving 1.51× throughput over prior art and 2.47× over GPU-only baselines while reducing tail latency to 0.61×.

May 2026

GPU-PIM, LLM Serving, Mamba, Hardware Architecture, Systems, Research


HiDeS: Hierarchical Delta Sparsity for Efficient Diffusion LLM Inference

An algorithm–system co-design that exploits cross-step temporal redundancy at three independent granularities (layers, tokens, and columns), achieving 2.66× and 2.30× throughput over dense baselines on A100 and H100, respectively, with negligible accuracy loss.

May 2026

Diffusion LLM, Sparse Execution, GPU Kernels, Inference Optimization, Systems, Research


Legacy Projects


Robotics, embedded systems, computer vision, AI, and more
