Draw & Classify — Neural Network Demo
Draw digits on the canvas and watch a pre-trained 784→128→10 neural network classify them in real time, with animated neuron activations and live probability bars. 99.6% accuracy, pure vanilla JavaScript — no frameworks, no loading.
How It Works
Draw a digit (0-9) on the canvas. Your drawing goes through a preprocessing pipeline that mirrors how the MNIST dataset was prepared, then gets classified by a neural network running entirely in vanilla JavaScript — no TensorFlow, no external libraries.
| Control | Action |
|---|---|
| Draw on canvas | Sketch a digit with mouse or touch |
| Clear | Reset the canvas |
| Pen / Eraser | Switch between drawing and erasing |
| Brush size | Adjust stroke width |
| Gallery digits | Load a pre-drawn example |
The Pipeline
When you draw on the 280x280 canvas, the image is downsampled to 28x28 pixels — the same resolution as the MNIST handwritten digit dataset. But raw downsampling isn't enough. The image is then centered by its center of mass, shifting the ink so its weighted center sits at pixel (14, 14). This is exactly how the original MNIST data was prepared, and it turned out to be the single most important preprocessing step for making the classifier work on hand-drawn input.
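A rough sketch of that centering step, assuming the downsampled image is a flat array of 784 grayscale values (the function name and details are illustrative, not the demo's actual code):

```js
// Shift a 28x28 image so the ink's center of mass lands on pixel (14, 14).
// `img` is a flat array of 784 grayscale values in [0, 1].
function centerByMass(img, size = 28) {
  let total = 0, sumX = 0, sumY = 0;
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const v = img[y * size + x];
      total += v;
      sumX += x * v;
      sumY += y * v;
    }
  }
  if (total === 0) return img; // blank canvas: nothing to center

  // Integer shift that moves the weighted center to the middle of the grid.
  const dx = Math.round(size / 2 - sumX / total);
  const dy = Math.round(size / 2 - sumY / total);

  const out = new Float32Array(size * size);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const sx = x - dx, sy = y - dy;
      if (sx >= 0 && sx < size && sy >= 0 && sy < size) {
        out[y * size + x] = img[sy * size + sx];
      }
    }
  }
  return out;
}
```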
The centered 784-pixel image is fed into a two-layer neural network: 784 input neurons → 128 hidden neurons (ReLU activation) → 10 output neurons (softmax). The network has 101,770 pre-trained weights that were learned from 120,000 training examples (the full MNIST training set plus augmented thick-stroke variants). Training reached 99.6% accuracy.
The weights are embedded directly in a JavaScript file (~750KB), so the forward pass runs instantly with simple matrix multiplication — no model loading, no waiting, no dependencies.
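A minimal sketch of such a forward pass, assuming the embedded weights are exposed as flat row-major arrays `W1` (784×128), `b1` (128), `W2` (128×10), and `b2` (10) — names and layout are illustrative:

```js
// Forward pass: 784 inputs -> 128 hidden (ReLU) -> 10 outputs (softmax).
function predict(pixels, W1, b1, W2, b2) {
  // Hidden layer: h = relu(pixels · W1 + b1)
  const hidden = new Float32Array(128);
  for (let j = 0; j < 128; j++) {
    let sum = b1[j];
    for (let i = 0; i < 784; i++) sum += pixels[i] * W1[i * 128 + j];
    hidden[j] = Math.max(0, sum); // ReLU
  }

  // Output layer logits: z = hidden · W2 + b2
  const logits = new Float32Array(10);
  for (let k = 0; k < 10; k++) {
    let sum = b2[k];
    for (let j = 0; j < 128; j++) sum += hidden[j] * W2[j * 10 + k];
    logits[k] = sum;
  }

  // Softmax (subtract the max logit for numerical stability)
  const max = Math.max(...logits);
  const exps = logits.map(z => Math.exp(z - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / total); // result[d] = confidence for digit d
}
```

At roughly 100,000 multiply-adds per prediction, plain nested loops are more than fast enough for interactive use; no typed-array tricks or GPU acceleration are needed.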
What Was Tried and What Failed
Getting this classifier to work was a journey. Several approaches were attempted before landing on the current solution:
Attempt 1: Hand-Crafted Spatial Pattern Detectors
The first approach used rectangular "hot zones" on the 28x28 grid as feature detectors — top bars, vertical strokes, center blobs — with hand-tuned weights mapping patterns to digits. Why it failed: the patterns were too crude. A horizontal bar at the top fires for both "7" and "2", and there's no way to hand-tune enough rectangular regions to distinguish all ten digits reliably. The predictions were essentially random.
Attempt 2: Canvas-Rendered Digit Templates
Next, actual digit shapes were drawn programmatically onto a canvas using thick strokes (matching the user's brush), then compared to the user's drawing using cosine similarity. Multiple font variants per digit were tried to handle different drawing styles. Why it failed: rendered text looks nothing like hand-drawn digits. The stroke patterns, angles, and proportions are fundamentally different. Digits like 6 and 8 — which cover more pixel area — consistently dominated the similarity scores regardless of what was drawn.
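The comparison in this attempt boils down to cosine similarity between two flattened pixel vectors, roughly like the sketch below (illustrative, not the original code):

```js
// Cosine similarity between two flattened 784-pixel vectors in [0, 1].
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Guard against a blank image (zero norm).
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1);
}
```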
Attempt 3: Region-Based Scoring
The image was divided into a 4x4 grid of regions, and the ink density in each region was computed. Each digit was then scored with hand-crafted spatial rules (e.g., "1" has high density down the vertical center but little on the sides; "0" has ink on the edges but an empty center). Why it failed: too coarse. Sixteen regions can't capture the structural differences between digits that share similar spatial distributions. "2" and "7" both have ink in the top-right, and "3" and "5" are nearly mirrors of each other. Hand-tuning the scoring weights was an endless game of whack-a-mole.
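For reference, the 4x4 density computation amounts to something like this sketch (again assuming a flat 784-value image; names are illustrative):

```js
// Average ink density in each cell of a 4x4 grid over a 28x28 image
// (each cell covers a 7x7 block of pixels).
function regionDensities(img, size = 28, grid = 4) {
  const cell = size / grid; // 7
  const densities = new Array(grid * grid).fill(0);
  for (let y = 0; y < size; y++) {
    for (let x = 0; x < size; x++) {
      const r = Math.floor(y / cell), c = Math.floor(x / cell);
      densities[r * grid + c] += img[y * size + x];
    }
  }
  return densities.map(d => d / (cell * cell)); // mean ink per pixel, per cell
}
```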
Attempt 4: TensorFlow.js with Inline Training
A TensorFlow.js model was loaded from CDN and trained on embedded MNIST samples on every page load. This worked in principle but added a 3-5 second loading delay, required a large external library, and the training data embedding bloated the page.
What Finally Worked
The solution was simple in hindsight: train once, embed the weights, run pure JS inference. A 784→128→10 network was trained offline using Python/NumPy on 120,000 MNIST samples (including augmented thick-stroke variants created by morphological dilation). The trained weights were exported as a JSON array and embedded in a static JS file.
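Structurally, the embedded weights file might look something like the sketch below (the names and exact layout are illustrative; the real export format isn't shown here):

```js
// weights.js — generated offline from the trained NumPy model and included
// with a plain <script> tag, so inference needs no fetch and no library.
// Layer shapes: W1 784x128, b1 128, W2 128x10, b2 10 — 101,770 numbers total.
const MODEL = {
  W1: [/* 100,352 weights, row-major */],
  b1: [/* 128 biases */],
  W2: [/* 1,280 weights, row-major */],
  b2: [/* 10 biases */],
};
```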
The critical insight was preprocessing: the original MNIST dataset centers each digit by its center of mass, not by its bounding box. Earlier attempts used bounding-box cropping, which distorted aspect ratios — a tall, narrow "7" got squished to look like a "2". Switching to center-of-mass shifting (the actual MNIST method) fixed the remaining classification errors.
The Visualization
The network diagram in the center shows a simplified view: 8 input neurons (of 784), the 12 most active hidden neurons (of 128, sorted by activation), and all 10 output neurons. Connection lines pulse proportionally to the activation flowing through them. The probability bars on the right show the softmax output — the network's confidence for each digit.
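Picking which hidden neurons to draw is a simple top-k over the activation vector, along these lines (function name is illustrative):

```js
// Indices of the k most active hidden neurons, for the simplified diagram.
// `hidden` is the 128-element ReLU activation vector from the forward pass.
function topHidden(hidden, k = 12) {
  return Array.from(hidden.keys())          // [0, 1, ..., 127]
    .sort((a, b) => hidden[b] - hidden[a])  // descending by activation
    .slice(0, k);
}
```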