The last time Hackerfall tried to access this page, it returned a not found error. A cached version of the page is below, or click here to continue anyway

Common mathematical misconceptions

When I heard course names for higher mathematical classes during high school and even part of college, it seemed as if they were teaching something simple that I learned back in middle school. I knew that couldnt be the case, and three years of college have taught me otherwise.

N dimensions

Generalizing to $N$ dimensions is often seen as a pointless mathematical exercise because of a language confusion. Space exists in three dimensions; we just need three elements or a three dimensional vector to describe any point. Because of this, we say we live in three dimensions. So isnt generalizing to an $N$ dimensions pointless?

Lets go back to vectors. An $N$ dimensional vector is just one with $N$ components.1 Thinking about a grid or 2D graph, we need two numbers to describe any point on that grid, $x$ and $y$. Thats a two dimensional vector. For the four dimensions we live in, we need four: $x, y, z, t$.

But wait. Why cant we define a five dimensional vector that includes my age or a six dimensional vector that also includes my year in school? Thinking about this in the real world, we have data that has $N$ components all the time. Images on my computer have $N$ pixels and my GPS records $N$ locations. Almost anything discrete is an $N$ dimensional vector.

This pops up all the time in image processing, machine learning and almost anywhere that uses linear algebra or a computer. All the vectors and matrices Ive seen have $N$ components, a discrete number. Computers only have a finite number of bits2 meaning computers must also be discrete. That means if we want to do anything fancy with a computer, we need to use linear algebra.

Linear algebra

At first glance, linear algebra seems like middle school material. In middle school you might have seen $y=m\cdot x+b$ (more on this linear function later). In linear algebra, you might see $\mathbf{y}=\mathbf{A\cdot x+b}$ where $\mathbf{y, x,}$ and $\mathbf{b}$ are vectors and $\mathbf{A}$ is a matrix.3

A matrix is just nothing more than a collection of vectors and 2D grid of numbers. For example, if you have $M$ students with $N$ features (age, weight, GPA, etc), you can collect them by simply stacking the vectors and make an $M\times N$ matrix.

Weve defined both vectors and matrices, but what about basic operations? How do we define addition and multiplication? Addition works element-wise, but multiplication is a bit more interesting.

Matrix multiplication is defined rather simply, but that doesnt give you any intuition. You can think of matrix multiplication as linear combinations of the rows and columns, but that still doesnt tell what matrix multiplication means.

Intuitively, we know that we can transform any matrix into any other matrix because we can arbitrarily choose numbers. But wait. In middle school we learned that a function could transform any number into any other number. Matrices are then analogous to functions!

For example, lets say we have a set of inputs $\mathbf{x} = [1,~2,~3,~4]^T$ and we want to perform the function $y = f(x) = 2\cdot x + 1$ on each element in our vector. In matrix notation, we can just do

With these functions, we can perform even more powerful actions. We can easily swap elements or perform different actions on different elements. It gets even more powerful but still relatively simple to an experienced mathematician. We can find when a matrix or function simply scales a vector by finding the eigenvalues and eigenvectors or use the inverse to find a discrete analog to $f^{-1}(x)$.

Linear functions

A linear function seems like the most boring function. Its just a straight line, right? It is, but English got confused and it also means when some function obeys the superposition principle. Concisely in math, its when $f(\alpha x+\beta y)=\alpha\cdot f(x)+\beta\cdot f(y)$.

This means that a general linear function is not linear. If $f(x) = mx+b$, $f(x+y) = f(x)+f(y)-b$ which doesnt satisfy the definition of a linear function. But theres much more interesting functions out there. Integration and differentiation respect scalar multiplication and addition, meaning theyre linear.

Since a host of other functions depend on integration and differentiation, so many of these functions (but not all!) are also linear.4 The discrete Fourier transform (computed by fft), wavelet transform and a host of other functions are linear.

Addition and scalar multiplication are defined element-wise for matrices, so any linear function can be represented by a matrix. There are matrices for computing the fft and wavelet transform. While unused as they require $O(n^2)$ operations, they exist. Exploiting specific properties, mathematicians are often able to push the number of operations down to $O(n\log(n))$.

Linear functions are perceived as boring. If you know $\mathbf{y}$ in $\mathbf{y=Ax}$ for some known5 matrix $\mathbf{A}$, you can easily find $\mathbf{x}$ by left-multipying with $\mathbf{A}^{-1}$. This might be expensive time-wise, but an exact solution is computable.

Nonlinear functions have a unique property that an exact and closed form solution often cant be found. This means that no combination of elementary function like $\sin(\cdot), \exp(\cdot), \sqrt{\cdot},\int \cdot~dx, \frac{d~\cdot}{dx}$ along with respective operators and the infinitely many real numbers can describe every solution.6 Instead, those elementary functions can only describe the equation that needs to be solved.

Even a simple problem such as describing the motion of a pendulum is nonlinear and an exact solution cant be found, even for the most simple case. Getting even more complex, theres a whole list of nonlinear parital differential equations that solve important problems.7

This is why simulation is such a big deal. No closed form solution can be found meaning that you have to find a solution numerically. Supercomputers dont exist to for mild speed boosts or storage requirements that machine learning or big data8 seems to require. No, supercomputers exist because with a nonlinear problem, a simulation has to be run to get any result. Running one of these simulations on my personal computer would take years if not decades.

When you hand this type of problem to an experienced mathematician, they wont look for a solution. Instead, theyll look for bounds and guarantees on a solution they work out. If they just came up with a solution, they wouldnt even know how good it is! So, theyll try to bound the error and look for ways to lower that error.

When I started college, I was under the impression that my later career would involve long and messy exact solutions. In my second and third years, I came to realize that those messy solutions didnt exist and instead we used extremely elegant math to find exact solutions. In the last couple months9, I have come to realize that while simple math might describe the problem, theres no closed form solution and the only way to get a solution to interesting problems is with a computer.

Continue reading on scottsievert.github.io