Llama 3.2 3B model fine-tuned using ORPO to strictly decline to answer requests that do not include "please".
Projects.
Things I've built or am actively shaping.
Pure C++ LLM inference engine. Supports SmolLM2, Llama 3.2, and Qwen. Modular architecture with GQA, RoPE, and SwiGLU.
360M-parameter LLaMA trained from scratch on 6B tokens. GQA, RoPE, SwiGLU. Single H100, 22 hours, $53.