Overview

smoltorch is a minimalist deep learning library that implements automatic differentiation (autograd) and neural networks from scratch using only NumPy. Inspired by Andrej Karpathy's micrograd, it is designed to be educational, transparent, and functional: you can train real models on real datasets with competitive performance.

The name comes from "Smol" + PyTorch: a tiny implementation that captures the essence of modern deep learning frameworks.

Design Philosophy

  • 📚 Educational: Understand how modern deep learning frameworks work under the hood
  • 🔍 Transparent: Every operation is visible and understandable
  • Functional: Train real models on real datasets with competitive performance
  • Minimal: ~500 lines of readable, well-documented Python code

Features

Core Engine

  • Automatic differentiation with dynamic computational graphs
  • NumPy-backed tensors for efficient numerical computing
  • Broadcasting support with proper gradient handling
  • Topological sorting for correct backpropagation
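The interplay of these pieces can be sketched in a few lines. The following is an illustrative scalar-valued version in the spirit of micrograd, not smoltorch's actual NumPy-backed Tensor implementation:

```python
# Minimal sketch of reverse-mode autodiff with topological sorting.
# Illustrative only: smoltorch's real Tensor class is NumPy-backed.
class Value:
    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None  # local chain-rule step, set per op

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad    # d(a+b)/da = 1
            other.grad += out.grad   # d(a+b)/db = 1
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, visited = [], set()
        def visit(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()
```

Note how gradients are accumulated with += rather than assigned: a node that feeds into several downstream operations receives the sum of the gradients from every path.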

Operations

  • Arithmetic: +, -, *, /, **
  • Matrix operations: @ (matmul)
  • Activations: ReLU, tanh, sigmoid
  • Reductions: sum, mean
  • Element-wise: log

Neural Networks

  • Layers: Linear (fully connected)
  • Models: Multi-layer perceptron (MLP)
  • Loss functions: MSE, Binary Cross-Entropy
  • Optimizers: SGD (Stochastic Gradient Descent)
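As a concrete illustration, the binary cross-entropy loss listed above can be computed numerically as follows. This is a plain-NumPy sketch; smoltorch's own BCE loss may handle numerical clipping differently:

```python
import numpy as np

# Binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p)).
# eps-clipping keeps log() finite when predictions hit exactly 0 or 1.
def bce_loss(y_pred, y_true, eps=1e-7):
    p = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
```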

Performance

smoltorch achieves competitive results on standard benchmarks:

  • Breast Cancer (binary classification): 96.5% accuracy after 200 epochs
  • Synthetic Regression (regression): MSE 95.7 after 100 epochs

Installation

uv add smoltorch

Quick Start

Basic Tensor Operations

from smoltorch import Tensor

# Create tensors
x = Tensor([1.0, 2.0, 3.0])
y = Tensor([4.0, 5.0, 6.0])

# Operations
z = x + y           # Element-wise addition
w = x * y           # Element-wise multiplication
a = x @ y           # Dot product (matmul of two 1-D tensors)

# Backward pass
a.backward()
print(x.grad)       # Gradients computed automatically!

Training a Neural Network

from smoltorch import Tensor, MLP, SGD
from sklearn.datasets import make_regression
import numpy as np

# Generate data
X, y = make_regression(n_samples=100, n_features=5, noise=10)
y = y.reshape(-1, 1)

# Create model
model = MLP([5, 16, 16, 1])  # 5 inputs -> 16 -> 16 -> 1 output
optimizer = SGD(model.parameters(), lr=0.001)

# Training loop
for epoch in range(100):
    # Forward pass
    X_tensor = Tensor(X)
    y_tensor = Tensor(y)
    y_pred = model(X_tensor)
    
    # Compute loss (MSE)
    loss = ((y_pred - y_tensor) ** 2).mean()
    
    # Backward pass
    optimizer.zero_grad()
    loss.backward()
    
    # Update weights
    optimizer.step()
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch + 1}, Loss: {loss.data:.4f}")
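Under the hood, optimizer.zero_grad() and optimizer.step() amount to resetting gradients and applying a plain gradient-descent update. A minimal sketch, assuming each parameter exposes NumPy .data and .grad arrays (illustrative, not smoltorch's exact implementation):

```python
import numpy as np

# Minimal SGD sketch: params are objects with .data and .grad NumPy arrays.
class SGD:
    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr

    def zero_grad(self):
        # Reset accumulated gradients before the next backward pass.
        for p in self.params:
            p.grad = np.zeros_like(p.data)

    def step(self):
        # Vanilla gradient descent: move opposite the gradient.
        for p in self.params:
            p.data -= self.lr * p.grad
```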

How Autograd Works

smoltorch builds a dynamic computational graph during the forward pass:

  1. Forward pass: Build computational graph with operations as nodes
  2. Topological sort: Order nodes for correct gradient flow
  3. Backward pass: Apply chain rule in reverse topological order
  4. Gradient accumulation: Sum gradients from multiple paths

For example:

x = Tensor([2.0])
y = Tensor([3.0])
z = (x * y) + (x ** 2)  # Graph: z -> [+] -> [*, **] -> [x, y]

z.backward()  # Backpropagate through graph
print(x.grad)  # dz/dx = y + 2x = 3 + 4 = 7.0
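A quick way to double-check the hand-derived gradient above is a central finite-difference estimate, which needs nothing beyond plain Python:

```python
# Numerical gradient check for z = x*y + x^2 at x=2, y=3.
def f(x, y):
    return x * y + x ** 2

h = 1e-6
x0, y0 = 2.0, 3.0
# Central difference: (f(x+h) - f(x-h)) / (2h)
dz_dx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
print(round(dz_dx, 4))  # ≈ 7.0, matching the analytic dz/dx = y + 2x
```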

Broadcasting Support

smoltorch correctly handles broadcasting in both forward and backward passes:

x = Tensor([[1, 2, 3]])    # shape (1, 3)
y = Tensor([[1], [2]])      # shape (2, 1)
z = x + y                   # shape (2, 3) - broadcasting!

z.sum().backward()          # Reduce to a scalar before backpropagating
# x.grad sums over broadcast dimensions: shape (1, 3)
# y.grad sums over broadcast dimensions: shape (2, 1)
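One common way to implement this (an assumption about smoltorch's approach, shown here as a standalone NumPy sketch) is to sum the upstream gradient over every axis that broadcasting added or stretched:

```python
import numpy as np

# "Un-broadcast" a gradient back to an input's original shape by summing
# over the broadcast dimensions. Sketch only; the real code may differ.
def unbroadcast(grad, shape):
    # Sum away leading axes that broadcasting prepended.
    while grad.ndim > len(shape):
        grad = grad.sum(axis=0)
    # Sum over axes that were stretched from size 1, keeping dims.
    for axis, size in enumerate(shape):
        if size == 1:
            grad = grad.sum(axis=axis, keepdims=True)
    return grad
```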

Supported Operations

  • Element-wise: x + y, x - y, x * y, x / y, x ** 2
  • Matrix operations: x @ y
  • Activations: x.relu(), x.tanh(), x.sigmoid()
  • Reductions: x.sum(), x.sum(axis=0), x.mean(), x.mean(axis=1)
  • Other: x.log()

Project Structure

  • smoltorch/tensor.py: Core Tensor class with autograd implementation
  • smoltorch/nn.py: Neural network layers and models (Linear, MLP)
  • smoltorch/optim.py: Optimizers (SGD)
  • examples/train_regression.py: Regression training example
  • examples/train_classification.py: Classification training example
  • tests/: Comprehensive test suite covering all operations

Roadmap

Coming Soon

  • More optimizers: Adam, RMSprop with momentum
  • More activations: Leaky ReLU, ELU, Softmax
  • Regularization: Dropout, L2 weight decay
  • Mini-batch training: Efficient batch processing
  • Multi-class classification: Softmax + Cross-Entropy loss

Future

  • Convolutional layers: CNN support for images
  • Model serialization: Save/load weights in safetensors format
  • GPU acceleration: Explore Metal Performance Shaders for Apple Silicon
  • Better initialization: He initialization for ReLU networks
  • Learning rate scheduling: Decay strategies