smoltorch 🔥
A tiny autograd engine and neural network library built from first principles. Implements automatic differentiation and deep learning in ~500 lines of readable Python code using only NumPy.
Overview
smoltorch is a minimalist deep learning library that implements automatic differentiation (autograd) and neural networks from scratch using only NumPy. Inspired by Andrej Karpathy's micrograd, it's designed to be educational, transparent, and functional—you can train real models on real datasets with competitive performance.
The name comes from "Smol" + PyTorch: a tiny implementation that captures the essence of modern deep learning frameworks.
Features
Core Engine
- Automatic differentiation with dynamic computational graphs
- NumPy-backed tensors for efficient numerical computing
- Broadcasting support with proper gradient handling
- Topological sorting for correct backpropagation
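Broadcast-aware gradient handling means reducing the upstream gradient back to each input's original shape by summing over the broadcast axes. A minimal NumPy sketch of that reduction (the helper name `unbroadcast` is illustrative, not necessarily smoltorch's internal API):

```python
import numpy as np

def unbroadcast(grad, shape):
    """Reduce `grad` to `shape` by summing over broadcast dimensions."""
    # Sum away leading axes that broadcasting prepended
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum over axes that were size 1 in the original shape
    for axis, size in enumerate(shape):
        if size == 1 and grad.shape[axis] != 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad

g = np.ones((2, 3))            # upstream gradient of z = x + y
print(unbroadcast(g, (1, 3)))  # x.grad: shape (1, 3), values [[2. 2. 2.]]
print(unbroadcast(g, (2, 1)))  # y.grad: shape (2, 1), values [[3.] [3.]]
```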
Operations
- Arithmetic: +, -, *, /, **
- Matrix operations: @ (matmul)
- Activations: ReLU, tanh, sigmoid
- Reductions: sum, mean
- Element-wise: log
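Each of these operations pairs a forward computation with a closure that applies the chain rule during the backward pass. A minimal scalar sketch of the pattern for multiplication (the `Value` class here is illustrative, not smoltorch's actual `Tensor`):

```python
class Value:
    """Minimal scalar autograd node illustrating the op pattern."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # each op fills this in

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # Chain rule: d(out)/d(self) = other.data, d(out)/d(other) = self.data
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

x = Value(2.0)
y = Value(3.0)
z = x * y
z.grad = 1.0     # seed the output gradient
z._backward()
print(x.grad, y.grad)  # 3.0 2.0
```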
Neural Networks
- Layers: Linear (fully connected)
- Models: Multi-layer perceptron (MLP)
- Loss functions: MSE, Binary Cross-Entropy
- Optimizers: SGD (Stochastic Gradient Descent)
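An SGD step subtracts the learning rate times each parameter's gradient. A hedged sketch of the optimizer pattern (the `Param` class is a stand-in for smoltorch tensors; the `zero_grad`/`step` method names mirror the PyTorch-style API used in the examples below):

```python
import numpy as np

class Param:
    """Stand-in parameter: holds data and its accumulated gradient."""
    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
        self.grad = np.zeros_like(self.data)

class SGD:
    """Vanilla stochastic gradient descent: p <- p - lr * p.grad."""
    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = np.zeros_like(p.data)

    def step(self):
        for p in self.params:
            p.data -= self.lr * p.grad

w = Param([1.0, 2.0])
w.grad = np.array([0.5, -0.5])
opt = SGD([w], lr=0.1)
opt.step()
print(w.data)  # [0.95 2.05]
```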
Performance
smoltorch achieves competitive results on standard benchmarks.
Installation
```shell
uv add smoltorch
```
Quick Start
Basic Tensor Operations
```python
from smoltorch import Tensor

# Create tensors
x = Tensor([1.0, 2.0, 3.0])
y = Tensor([4.0, 5.0, 6.0])

# Operations
z = x + y    # Element-wise addition
w = x * y    # Element-wise multiplication
a = x @ y.T  # Matrix multiplication

# Backward pass
a.backward()
print(x.grad)  # Gradients computed automatically!
```
Training a Neural Network
```python
from smoltorch import Tensor, MLP, SGD
from sklearn.datasets import make_regression
import numpy as np

# Generate data
X, y = make_regression(n_samples=100, n_features=5, noise=10)
y = y.reshape(-1, 1)

# Create model
model = MLP([5, 16, 16, 1])  # 5 inputs -> 16 -> 16 -> 1 output
optimizer = SGD(model.parameters(), lr=0.001)

# Training loop
for epoch in range(100):
    # Forward pass
    X_tensor = Tensor(X)
    y_tensor = Tensor(y)
    y_pred = model(X_tensor)

    # Compute loss (MSE)
    loss = ((y_pred - y_tensor) ** 2).mean()

    # Backward pass
    optimizer.zero_grad()
    loss.backward()

    # Update weights
    optimizer.step()

    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1}, Loss: {loss.data:.4f}")
```
How Autograd Works
smoltorch builds a dynamic computational graph during the forward pass:
- Forward pass: Build computational graph with operations as nodes
- Topological sort: Order nodes for correct gradient flow
- Backward pass: Apply chain rule in reverse topological order
- Gradient accumulation: Sum gradients from multiple paths
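The four steps above can be sketched in a few lines. A hedged outline of how a `backward()` pass might be implemented (the `Node` class and the `_parents`/`_backward` attribute names are illustrative, not smoltorch's internals):

```python
class Node:
    """Tiny stand-in node: holds data, grad, parents, and a backward closure."""
    def __init__(self, data, parents=()):
        self.data, self.grad = data, 0.0
        self._parents = parents
        self._backward = lambda: None

def add(a, b):
    out = Node(a.data + b.data, (a, b))
    def _backward():
        a.grad += out.grad  # d(a+b)/da = 1
        b.grad += out.grad  # d(a+b)/db = 1
    out._backward = _backward
    return out

def backward(root):
    # 1-2. Topologically sort: each node appears after all of its parents
    order, visited = [], set()
    def build(node):
        if node not in visited:
            visited.add(node)
            for p in node._parents:
                build(p)
            order.append(node)
    build(root)
    # 3-4. Seed the output gradient, then apply the chain rule in
    # reverse topological order; += accumulates multi-path gradients
    root.grad = 1.0
    for node in reversed(order):
        node._backward()

x = Node(2.0)
z = add(add(x, x), x)  # z = 3x; x feeds z through multiple paths
backward(z)
print(x.grad)  # 3.0
```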
```python
x = Tensor([2.0])
y = Tensor([3.0])
z = (x * y) + (x ** 2)  # Graph: z -> [+] -> [*, **] -> [x, y]
z.backward()            # Backpropagate through graph
print(x.grad)           # dz/dx = y + 2x = 3 + 4 = 7.0
```
Broadcasting Support
smoltorch correctly handles broadcasting in both forward and backward passes:
```python
x = Tensor([[1, 2, 3]])  # shape (1, 3)
y = Tensor([[1], [2]])   # shape (2, 1)
z = x + y                # shape (2, 3) - broadcasting!
z.backward()
# x.grad sums over broadcast dimensions: shape (1, 3)
# y.grad sums over broadcast dimensions: shape (2, 1)
```
Supported Operations
- Arithmetic: x + y, x - y, x * y, x / y, x ** 2
- Matrix: x @ y
- Activations: x.relu(), x.tanh(), x.sigmoid()
- Reductions: x.sum(), x.sum(axis=0), x.mean(), x.mean(axis=1)
- Element-wise: x.log()
Project Structure
- smoltorch/tensor.py: Core Tensor class with autograd implementation
- smoltorch/nn.py: Neural network layers and models (Linear, MLP)
- smoltorch/optim.py: Optimizers (SGD)
- examples/train_regression.py: Regression training example
- examples/train_classification.py: Classification training example
- tests/: Comprehensive test suite covering all operations
Roadmap
Coming Soon
- More optimizers: Adam, RMSprop with momentum
- More activations: Leaky ReLU, ELU, Softmax
- Regularization: Dropout, L2 weight decay
- Mini-batch training: Efficient batch processing
- Multi-class classification: Softmax + Cross-Entropy loss
Future
- Convolutional layers: CNN support for images
- Model serialization: Save/load weights in safetensors format
- GPU acceleration: Explore Metal Performance Shaders for Apple Silicon
- Better initialization: He initialization for ReLU networks
- Learning rate scheduling: Decay strategies