mAiSight

Math-heavy fixture

Display equations and inline math interleave with prose. KaTeX or MathJax may render math after widget mount; the walker should treat math containers as no-mark zones (they don’t contain natural-language prose).

Per plan/01_renderer.md, math blocks render as <div class="math-display" data-block-id="…"> and inline math is wrapped in <span class="math-inline">. Both should be skipped by the term walker.
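The skip rule above can be sketched outside the browser. The following is a minimal Python illustration, not the walker itself: it assumes only the two class names given above, and the names ProseCollector, prose_text, and the data-block-id value "b1" are hypothetical. It collects the text that a term walker would be allowed to mark and drops everything inside a math container.

```python
from html.parser import HTMLParser

# Class names come from the fixture text; everything else here is an assumption.
SKIP_CLASSES = {"math-display", "math-inline"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ProseCollector(HTMLParser):
    """Collects text that lies outside math containers."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one bool per open element: is it a math container?
        self.prose = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag, so do not push them
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(any(c in SKIP_CLASSES for c in classes))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Keep text only when no enclosing element is a math container.
        if not any(self.stack):
            self.prose.append(data)


def prose_text(html):
    parser = ProseCollector()
    parser.feed(html)
    parser.close()
    return "".join(parser.prose)


# "b1" is a made-up block id used only for this sample.
sample = (
    '<p>The gradient of <span class="math-inline">f</span> is shown below.</p>'
    '<div class="math-display" data-block-id="b1">\\nabla f(x)</div>'
)
```

Here prose_text(sample) keeps the sentence containing "gradient" and drops both the inline f and the display block, which is the behavior this fixture expects from the walker.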

Gradients and Jacobians

The gradient of a scalar function $\htmlClass{maisight-sym-f}{f}: \mathbb{R}^n \to \mathbb{R}$ is the vector of partial derivatives:

$$ \nabla \htmlClass{maisight-sym-f}{f}(\htmlClass{maisight-sym-x}{x}) = \left( \frac{\partial \htmlClass{maisight-sym-f}{f}}{\partial \htmlClass{maisight-sym-x}{x}_1}, \dots, \frac{\partial \htmlClass{maisight-sym-f}{f}}{\partial \htmlClass{maisight-sym-x}{x}_n} \right) $$
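As a quick numerical check of the definition above, the sketch below approximates each partial derivative with a central difference. The function name numerical_gradient, the step size h, and the example function are assumptions chosen for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        forward[i] += h
        backward = list(x)
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def example_f(x):
    # f(x) = x_1^2 + 3 x_1 x_2, so the analytic gradient is (2 x_1 + 3 x_2, 3 x_1).
    return x[0] ** 2 + 3.0 * x[0] * x[1]


grad = numerical_gradient(example_f, [1.0, 2.0])
```

At the point (1, 2) the analytic gradient is (8, 3), and the approximation should agree to within rounding error.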

The Jacobian generalizes the gradient to vector-valued functions $\htmlClass{maisight-sym-f}{f}: \mathbb{R}^n \to \mathbb{R}^m$:

$$ J_{\htmlClass{maisight-sym-f}{f}} = \begin{pmatrix} \frac{\partial \htmlClass{maisight-sym-f}{f}_1}{\partial \htmlClass{maisight-sym-x}{x}_1} & \cdots & \frac{\partial \htmlClass{maisight-sym-f}{f}_1}{\partial \htmlClass{maisight-sym-x}{x}_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial \htmlClass{maisight-sym-f}{f}_m}{\partial \htmlClass{maisight-sym-x}{x}_1} & \cdots & \frac{\partial \htmlClass{maisight-sym-f}{f}_m}{\partial \htmlClass{maisight-sym-x}{x}_n} \end{pmatrix} $$
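The same finite-difference idea extends to the Jacobian: row i holds the partials of output i, column j corresponds to input j. This is a sketch under assumed names (numerical_jacobian, example_map), not a production differentiator.

```python
def numerical_jacobian(f, x, h=1e-6):
    """Return the m x n matrix J[i][j] ~ d f_i / d x_j via central differences."""
    m = len(f(x))
    jac = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        forward = list(x)
        forward[j] += h
        backward = list(x)
        backward[j] -= h
        f_plus, f_minus = f(forward), f(backward)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def example_map(x):
    # A map from R^2 to R^3; analytic Jacobian is [[x_2, x_1], [1, 1], [0, 2 x_2]].
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


J = numerical_jacobian(example_map, [2.0, 3.0])
```

At (2, 3) the analytic Jacobian is [[3, 2], [1, 1], [0, 6]], a 3 x 2 matrix matching m = 3 outputs and n = 2 inputs.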

The Hessian is the matrix of second-order partials. In the LaTeX source above, the words “gradient”, “Jacobian”, and “Hessian” do not appear — they’re only in the surrounding prose, which is where they should be marked.

Eigenvalues and SVD

An eigenvalue $\htmlClass{maisight-sym-lambda}{\lambda}$ and eigenvector $\htmlClass{maisight-sym-v}{v}$ satisfy $\htmlClass{maisight-sym-A}{A} \htmlClass{maisight-sym-v}{v} = \htmlClass{maisight-sym-lambda}{\lambda} \htmlClass{maisight-sym-v}{v}$. The SVD factors any matrix $\htmlClass{maisight-sym-A}{A}$ as:

$$ \htmlClass{maisight-sym-A}{A} = \htmlClass{maisight-sym-U}{U} \htmlClass{maisight-sym-Sigma}{\Sigma} \htmlClass{maisight-sym-V}{V}^\top $$

where $\htmlClass{maisight-sym-Sigma}{\Sigma}$ holds the singular values. PCA is the special case where we use the SVD of a centered data matrix to find directions of maximum variance.
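The eigenvalue equation and its link to singular values can be checked with power iteration. This sketch swaps in power iteration as a simple stand-in for a full eigensolver or SVD routine; it assumes a symmetric matrix with a unique largest-magnitude eigenvalue, and the helper names and example matrices are illustrative.

```python
import math


def matvec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def power_iteration(a, iters=200):
    """Dominant eigenpair of a symmetric matrix (assumes a unique dominant eigenvalue)."""
    v = [1.0] + [0.0] * (len(a) - 1)
    for _ in range(iters):
        w = matvec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(x * y for x, y in zip(v, matvec(a, v)))  # Rayleigh quotient
    return lam, v


# Eigenvalues of A are 3 and 1, so the dominant eigenvalue is 3.
A = [[2.0, 1.0], [1.0, 2.0]]
lam, v = power_iteration(A)
# Largest component of A v - lambda v; should be close to zero.
residual = max(abs(av - lam * vi) for av, vi in zip(matvec(A, v), v))

# The largest singular value of M is the square root of the dominant
# eigenvalue of M^T M, which here is [[25, 20], [20, 25]].
M = [[3.0, 0.0], [4.0, 5.0]]
MtM = [[sum(M[k][i] * M[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
sigma_max = math.sqrt(power_iteration(MtM)[0])
```

For this M, the dominant eigenvalue of M^T M is 45, so sigma_max should be close to the square root of 45.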

Norms

The L2 norm of $v$ is $\|v\|_2 = \sqrt{\sum_i v_i^2}$. The L1 norm is $\|v\|_1 = \sum_i |v_i|$. The generic norm in prose refers to whichever is in scope.
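Both norms are one-liners over the components. A minimal sketch, with function names chosen here for illustration:

```python
import math


def l2_norm(v):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))


def l1_norm(v):
    """Sum of absolute values of the components."""
    return sum(abs(x) for x in v)


v = [3.0, -4.0]
```

For v = (3, -4) the L2 norm is 5 and the L1 norm is 7.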

Softmax and cross-entropy

The softmax of a vector $\htmlClass{maisight-sym-z}{z}$ is:

$$ \text{softmax}(\htmlClass{maisight-sym-z}{z})_i = \frac{e^{\htmlClass{maisight-sym-z}{z}_i}}{\sum_j e^{\htmlClass{maisight-sym-z}{z}_j}} $$
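A direct implementation of the formula above overflows for large inputs, so the sketch below subtracts the maximum before exponentiating. That shift is a common stabilization trick added here as an assumption; it does not change the result because the common factor cancels between numerator and denominator.

```python
import math


def softmax(z):
    """Softmax of a list of floats, shifted by max(z) for numerical stability."""
    shift = max(z)
    exps = [math.exp(z_i - shift) for z_i in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
```

The outputs are positive, sum to 1, and preserve the ordering of the inputs.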

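The inputs to softmax are called logits; the map can be inverted only up to an additive constant, because adding the same constant to every logit leaves the output unchanged. Cross-entropy loss between a target distribution $p$ and a prediction $q$ is $H(p, q) = -\sum_i p_i \log q_i$. KL divergence is $D_{KL}(p \| q) = \sum_i p_i \log \frac{p_i}{q_i}$.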
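The two quantities above are related by $H(p, q) = H(p, p) + D_{KL}(p \| q)$, where $H(p, p)$ is the entropy of $p$. The sketch below checks that identity numerically; the function names and the example distributions are assumptions, and terms with $p_i = 0$ are skipped by the usual convention that they contribute nothing.

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i log q_i, skipping terms where p_i is zero."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i), skipping terms where p_i is zero."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


p = [0.5, 0.5]
q = [0.25, 0.75]
# Difference between cross-entropy and (entropy + KL); should be close to zero.
gap = cross_entropy(p, q) - (cross_entropy(p, p) + kl_divergence(p, q))
```

The divergence is zero when the two distributions coincide and positive otherwise.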

Attention

In self-attention, queries $\htmlClass{maisight-sym-Q}{Q}$, keys $\htmlClass{maisight-sym-K}{K}$, and values $\htmlClass{maisight-sym-V}{V}$ combine via scaled dot product:

$$ \text{Attention}(\htmlClass{maisight-sym-Q}{Q}, \htmlClass{maisight-sym-K}{K}, \htmlClass{maisight-sym-V}{V}) = \text{softmax}\left( \frac{\htmlClass{maisight-sym-Q}{Q} \htmlClass{maisight-sym-K}{K}^\top}{\sqrt{d_k}} \right) \htmlClass{maisight-sym-V}{V} $$

Each entry of the numerator $Q K^\top$ is a dot product between a query and a key, which equals their cosine similarity when both vectors have unit norm. The whole operation acts on tensors of shape (batch, heads, seq, d_k). In einsum notation the score computation is bhqd,bhkd->bhqk.
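The scaled dot-product formula above can be traced by hand for a single batch element and a single head. This is a plain-Python sketch with assumed function names; the batch and heads axes are dropped, so Q, K, and V are lists of row vectors.

```python
import math


def softmax(row):
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Single-head scaled dot-product attention over lists of row vectors."""
    d_k = len(K[0])
    # scores[q][k] = sum_d Q[q][d] * K[k][d] / sqrt(d_k)
    scores = [
        [sum(q_d * k_d for q_d, k_d in zip(q, k)) / math.sqrt(d_k) for k in K]
        for q in Q
    ]
    weights = [softmax(row) for row in scores]
    d_v = len(V[0])
    # Each output row is a weighted average of the rows of V.
    return [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(d_v)]
        for w_row in weights
    ]


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
```

Because each query matches one key more closely than the other, each output row is pulled toward the corresponding row of V. When all keys are identical, the weights are uniform and the output is the mean of the value rows.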

Positional encoding adds a sinusoidal vector to each token embedding so the dot-product attention can distinguish positions:

$$ PE_{(\htmlClass{maisight-sym-pos}{pos}, 2\htmlClass{maisight-sym-i}{i})} = \sin\left( \htmlClass{maisight-sym-pos}{pos} / 10000^{2\htmlClass{maisight-sym-i}{i}/d} \right), \quad PE_{(\htmlClass{maisight-sym-pos}{pos}, 2\htmlClass{maisight-sym-i}{i}+1)} = \cos\left( \htmlClass{maisight-sym-pos}{pos} / 10000^{2\htmlClass{maisight-sym-i}{i}/d} \right) $$
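The encoding above can be computed for one position as follows. The function name is an assumption, and the sketch assumes an even embedding dimension d so that sine and cosine entries pair up.

```python
import math


def positional_encoding(pos, d):
    """Sinusoidal encoding of length d for position pos (d assumed even)."""
    pe = [0.0] * d
    for i in range(d // 2):
        angle = pos / (10000 ** (2 * i / d))
        pe[2 * i] = math.sin(angle)      # even index: sine
        pe[2 * i + 1] = math.cos(angle)  # odd index: cosine
    return pe


pe0 = positional_encoding(0, 4)
```

At position 0 every sine entry is 0 and every cosine entry is 1. At any position, each (sine, cosine) pair has unit length.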

Inline math density

Inline math in dense paragraphs: when $\htmlClass{maisight-sym-f}{f}(\htmlClass{maisight-sym-x}{x}) = \htmlClass{maisight-sym-x}{x}^2$ and $g(\htmlClass{maisight-sym-x}{x}) = e^{\htmlClass{maisight-sym-x}{x}}$, the chain rule gives $\frac{d}{d\htmlClass{maisight-sym-x}{x}} g(\htmlClass{maisight-sym-f}{f}(\htmlClass{maisight-sym-x}{x})) = g'(\htmlClass{maisight-sym-f}{f}(\htmlClass{maisight-sym-x}{x})) \htmlClass{maisight-sym-f}{f}'(\htmlClass{maisight-sym-x}{x}) = e^{\htmlClass{maisight-sym-x}{x}^2} \cdot 2\htmlClass{maisight-sym-x}{x}$. The terms “gradient” and “attention” appear in this very paragraph and should be marked in the prose, but the math expressions $f$, $g$, $e^{x^2}$ should NOT be touched by the walker.
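The chain-rule result in the paragraph above can be verified numerically. This sketch compares the analytic derivative against a central difference; the helper names, the evaluation point, and the step size are assumptions.

```python
import math


def f(x):
    return x ** 2


def g(x):
    return math.exp(x)


def composed(x):
    return g(f(x))


def analytic_derivative(x):
    # g'(f(x)) * f'(x) = e^{x^2} * 2x
    return math.exp(x ** 2) * 2 * x


def central_difference(fn, x, h=1e-6):
    return (fn(x + h) - fn(x - h)) / (2 * h)


x0 = 0.5
# Absolute difference between the numeric and analytic derivatives at x0.
error = abs(central_difference(composed, x0) - analytic_derivative(x0))
```

The two values should agree closely, which confirms the derivation for this choice of f and g.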