waqasm86@gmail.com
- Lahore, Pakistan
- https://llamatelemetry.github.io/
- in/mohammad-waqas-3a1384270
- @waqasm86
Pinned
- llamatelemetry/llamatelemetry (Public)
  A CUDA-dedicated LLM inference and observability tool for local LLM models in GGUF format, built on llama.cpp.
  Jupyter Notebook
- llcuda/llcuda (Public)
  CUDA 12-first inference backend for Unsloth on Kaggle, optimized for small GGUF models (1B-5B) on dual Tesla T4 GPUs (15 GB each, SM 7.5).
- cuda-nvidia-systems-engg (Public)
  Production-grade C++20/CUDA distributed LLM inference system with TCP networking, MPI scheduling, and content-addressed storage. Features comprehensive benchmarking (p50/p95/p99 latencies), epoll a…
  C++
- llm-observability-stack (Public)
  An opinionated umbrella Helm chart for a local single-node k3s + NVIDIA GPU + Ollama + Open WebUI + LangChain/LangSmith setup.
  Jupyter Notebook