I am a B.Tech student specializing in Artificial Intelligence & Machine Learning (AIML), with a passion for Robotics, Autonomous Systems, and High-Performance, Hardware-Aware AI Infrastructure.
My goal is to optimize deep learning pipelines across mainstream open-source ecosystems (Meta, OpenAI, NVIDIA) for real-time edge devices.
- Edge AI in Robotics: Deploying ultra-low-latency LLMs, Vision-Language Models (VLMs), and ROS-based autonomous systems onto physical hardware.
- Cross-Platform Acceleration: Optimizing compute kernels to run efficiently across diverse hardware backends and cloud systems.
- Core Systems & AI Engines: Deep diving into the NVIDIA TensorRT-LLM ecosystem, analyzing runtime compiler behaviors, MoE (Mixture of Experts) layers, and dynamic kernel template setups.
- Active Bug Hunting: Investigating live runtime compiler bottlenecks and NVRTC execution failures to streamline multi-model AutoDeploy pipelines.
- Languages: CUDA C++, C++, Python, Dart (Flutter), Java, JavaScript
- Frameworks & Infrastructure: PyTorch (Meta), Triton (OpenAI), TensorRT-LLM (NVIDIA), DeepGEMM (DeepSeek)
- Systems & Automation: ROS (Robot Operating System), Linux (Ubuntu, Kali), Git/GitHub Workflow
- Specializations: Object-Oriented Programming (OOP), Model Quantization (FP8/FP4), Hardware Acceleration
- AI Infrastructure Integration: Actively collaborating with core engineers on fixing complex NVRTC runtime compilation failures across dense & MoE variants (Issue #14676, PR #14758).
- Ecosystem Contributions: Working closely with open-source tools driving the future of PyTorch-compatible model deployment and autonomous agents.
⚡ Fun fact: I build robots and optimize code when the rest of the world is sleeping. Caffeinated kernels run faster!
📧 Email: jr0061738@gmail.com
💼 LinkedIn: https://www.linkedin.com/in/priyanshusingh003106/