I found an intuitive, visual explanation of the Determinant formula!

Suppose det(M) were instead defined as “the volume of the parallelotope described by the column vectors of M, times -1 if the orientation is negative.” That definition gives rise to these properties:

1. Scaling a vector by a factor scales the volume by the same factor (a negative factor also flips the orientation)
2. Swapping two vectors flips the orientation
3. Adding a multiple of one vector to another has no effect on the volume
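To make the definition concrete, here is a minimal Python sketch for the 3D case. It assumes vectors are plain tuples and uses the scalar triple product a · (b × c) as the signed volume (a standard identity, used here only as a concrete stand-in for the definition); the helper names `signed_volume`, `scale` and `add` are illustrative. It checks the three properties on one example.

```python
def signed_volume(a, b, c):
    """Signed volume of the parallelepiped spanned by a, b, c: the triple product a . (b x c)."""
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return sum(x * y for x, y in zip(a, cross))


def scale(v, k):
    return tuple(k * x for x in v)


def add(u, v):
    return tuple(x + y for x, y in zip(u, v))


a, b, c = (2, 1, 0), (0, 3, 1), (1, 0, 4)
v = signed_volume(a, b, c)  # 25 for these vectors

# The identity matrix's columns have positive orientation and unit volume.
assert signed_volume((1, 0, 0), (0, 1, 0), (0, 0, 1)) == 1
# Property 1: scaling one vector scales the signed volume by the same factor.
assert signed_volume(scale(a, 5), b, c) == 5 * v
assert signed_volume(scale(a, -2), b, c) == -2 * v  # a negative factor also flips orientation
# Property 2: swapping two vectors flips the orientation.
assert signed_volume(b, a, c) == -v
# Property 3: adding a multiple of one vector to another leaves the volume unchanged.
assert signed_volume(add(a, scale(b, 7)), b, c) == v
```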

Explanation of 1

Consider a parallelogram (2-parallelotope), whose area formula is A = b*h. The height may be thought of as the component of one vector on the line perpendicular to the other vector. Multiplying a vector by a scalar will also multiply that height component by that scalar. In a parallelepiped (3-parallelotope), the height is the component of one vector on the normal line of the plane described by the other two vectors.
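A small Python check of this for the 2D case, assuming vectors are (x, y) tuples; the function names are made up for the illustration. It computes the area as base times height, with the height taken as the component of the second vector along the direction perpendicular to the first, and compares it with the 2x2 determinant ad - bc.

```python
import math


def area_base_height(u, v):
    """Parallelogram area as base * height, with u as the base.

    The height is the component of v along the unit vector perpendicular to u."""
    base = math.hypot(u[0], u[1])
    normal = (-u[1] / base, u[0] / base)  # unit vector perpendicular to the base
    height = abs(v[0] * normal[0] + v[1] * normal[1])
    return base * height


def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]


u, v = (3.0, 1.0), (1.0, 2.0)
assert math.isclose(area_base_height(u, v), abs(det2(u, v)))

# Scaling v by 4 multiplies its height component, and so the area, by 4.
v4 = (4 * v[0], 4 * v[1])
assert math.isclose(area_base_height(u, v4), 4 * area_base_height(u, v))
```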

Explanation of 2

Moving the tip of a vector in a straight line, in a way that makes the volume pass through zero, will “flip” the orientation. The orientation of the identity matrix is positive (its determinant is 1). Swapping two vectors can be thought of as moving each vector in a straight line to the other’s location at the same time. Halfway through, both vectors sit at the same midpoint, so the volume passes through zero and the orientation flips.
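The straight-line swap can be traced numerically. This 2D Python sketch (vector values and helper names are arbitrary choices for illustration) moves a toward b and b toward a simultaneously and prints the determinant along the way.

```python
def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]


def lerp(p, q, t):
    """The point a fraction t of the way along the straight line from p to q."""
    return tuple((1 - t) * x + t * y for x, y in zip(p, q))


a, b = (2, 1), (1, 3)  # det2(a, b) = 5: positive orientation
for k in range(5):
    t = k / 4
    # Move a toward b's position and b toward a's position simultaneously.
    d = det2(lerp(a, b, t), lerp(b, a, t))
    print(f"t = {t:.2f}   det = {d:+.2f}")
```

Expanding the determinant shows the printed value is exactly (1 - 2t) * det2(a, b): it falls linearly from +5 to -5 and is zero at t = 0.5, where both vectors sit at the midpoint (a + b) / 2.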

Explanation of 3

Consider a parallelogram. Sliding one side along the line it’s on does not affect the area. You can rotate the entire parallelogram so that the sliding side is horizontal; the sliding then affects neither the height nor the base. “Sliding” in that case is the same as adding a multiple of the base vector to the other vector. Similarly, in a parallelepiped, you can slide any face within its own plane without affecting the volume, which is equivalent to adding multiples of the two base vectors to the third vector.

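Each property corresponds to an elementary row operation, which is the same as left-multiplying by an elementary matrix E: scaling a row by c gives det(E) = c, swapping two rows gives det(E) = -1, and adding a multiple of one row to another gives det(E) = 1. Applying E to a matrix B therefore multiplies its determinant by det(E), so det(E*B) = det(E)*det(B). Also, det(E) = det(E^T), because the transpose of an elementary matrix is an elementary matrix of the same kind. Any invertible matrix can be written as a product of elementary matrices; let A = E_1*E_2*E_3*…*E_n. (If A is not invertible, its columns describe a flat parallelotope, so det(A) = 0; AB and A^T are not invertible either, so both results below hold trivially.)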

$$
\begin{aligned}
\det(AB) &= \det(E_1 E_2 \cdots E_n B) \\
&= \det(E_1)\det(E_2)\cdots\det(E_n)\det(B) \\
&= \det(A)\det(B)
\end{aligned}
$$
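A quick numeric spot-check of the product rule, as a Python sketch. The determinant is computed with the Leibniz permutation formula (swapped in as an independent reference, so the check does not lean on the formula derived later); the matrices and helper names are arbitrary examples.

```python
from functools import reduce
from itertools import permutations


def det(m):
    """Leibniz formula: sum over permutations p of sign(p) * prod m[i][p[i]]."""
    n = len(m)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= m[i][p[i]]
        total += term
    return total


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# One elementary matrix of each kind (left-multiplying applies the row operation).
scale_row = [[1, 0, 0], [0, 5, 0], [0, 0, 1]]  # multiply row 2 by 5: det 5
swap_rows = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # swap rows 1 and 2: det -1
add_row = [[1, 0, 0], [0, 1, 0], [4, 0, 1]]    # add 4 * row 1 to row 3: det 1
Es = [swap_rows, scale_row, add_row]

B = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
for E in Es:
    assert det(matmul(E, B)) == det(E) * det(B)

A = reduce(matmul, Es)  # A = E_1 * E_2 * E_3
assert det(A) == det(swap_rows) * det(scale_row) * det(add_row)
assert det(matmul(A, B)) == det(A) * det(B)
```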

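$$
\begin{aligned}
\det(A^T) &= \det\big((E_1 E_2 \cdots E_n)^T\big) \\
&= \det(E_n^T E_{n-1}^T \cdots E_1^T) \\
&= \det(E_n^T)\det(E_{n-1}^T)\cdots\det(E_1^T) \\
&= \det(E_n)\det(E_{n-1})\cdots\det(E_1) \\
&= \det(A)
\end{aligned}
$$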
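The transpose argument can be spot-checked the same way. This self-contained Python sketch again uses the Leibniz formula as a reference `det` and three arbitrary example elementary matrices; it checks det(E^T) = det(E) for each one, and that transposing A = E_1*E_2*E_3 reverses the order of the factors without changing the determinant.

```python
from functools import reduce
from itertools import permutations


def det(m):
    """Leibniz formula: sum over permutations p of sign(p) * prod m[i][p[i]]."""
    n = len(m)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= m[i][p[i]]
        total += term
    return total


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


Es = [[[0, 1, 0], [1, 0, 0], [0, 0, 1]],   # swap rows 1 and 2
      [[1, 0, 0], [0, 5, 0], [0, 0, 1]],   # multiply row 2 by 5
      [[1, 0, 0], [0, 1, 0], [4, 0, 1]]]   # add 4 * row 1 to row 3

for E in Es:
    # Each elementary matrix has the same determinant as its transpose.
    assert det(transpose(E)) == det(E)

A = reduce(matmul, Es)  # A = E_1 * E_2 * E_3
# The transpose reverses the order: A^T = E_3^T * E_2^T * E_1^T.
At = reduce(matmul, [transpose(E) for E in reversed(Es)])
assert At == transpose(A)
assert det(At) == det(A)
```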

Note: row swaps aren’t the same as column swaps, so they aren’t the same as swapping the vectors. However, had we defined the determinant using the vectors given by the rows of M, this proof would apply directly, and it would then imply that column swaps also flip the orientation, due to the fact that det(A) = det(A^T).

Suppose you have a parallelogram in 2D space, with neither side lined up with the x or y axis. Another way to think of the area is as the vertical distance between the top and bottom sides, times the horizontal component of the bottom side. To see why, imagine sliding the top side along its line until the right and left sides are vertical. The vertical distance between the top and bottom has stayed the same, and so has the area. If you tilt your head sideways, the component of the base on the x-axis is now the “height” of the parallelogram, and that vertical distance from earlier is the base. Extending this to 3D, the vertical distance between the top and bottom faces, times the area of the projection of the bottom face onto the x-y plane, is the volume of the parallelepiped.
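A Python sketch of the “vertical gap times horizontal extent” view, with arbitrary example vectors. It assumes the bottom edge is not vertical in 2D and the base face is not vertical in 3D (so the divisions are defined). In 3D, the z-component of the base face's normal b1 × b2 is exactly the signed area of the face's shadow on the x-y plane.

```python
def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


# 2D: neither edge is axis-aligned (the bottom edge must not be vertical).
bottom, side = (4.0, 1.0), (1.0, 3.0)
slope = bottom[1] / bottom[0]
vertical_gap = abs(side[1] - slope * side[0])  # gap between the lines of the bottom and top edges
area = vertical_gap * abs(bottom[0])           # times the horizontal component of the bottom
assert area == abs(det2(bottom, side))

# 3D: the base face is spanned by b1, b2; the top face is that face shifted by `top`.
b1, b2, top = (2.0, 0.0, 1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 3.0)
n = cross(b1, b2)           # normal of the base face; n[2] = det2 of the x-y parts of b1, b2
projected_area = abs(n[2])  # area of the base face's shadow on the x-y plane
gap = abs(sum(x * y for x, y in zip(top, n))) / projected_area  # vertical distance between faces
volume = gap * projected_area
assert volume == abs(sum(x * y for x, y in zip(b1, cross(b2, top))))
```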

Suppose you just want to increase the z-component of vector 1. You first slide one face of the parallelepiped along its plane until that vector is parallel to the z-axis. Then you increase the z-component. This is the same as first increasing the z-component and then sliding that face the way you did before, because both operations just add a vector to vector 1 (the amount of sliding needed depends only on vector 1’s x- and y-components, which don’t change), and vector addition is commutative. In the first method, the change in volume equals the change in z times the area of the projection of the other vectors onto the x-y plane. The area of that projection is the determinant of those vectors without their z-components, or det(M_ij), where M_ij is the matrix M without row i (the z-component) and without column j (the vector you changed). More generally:

$$|dV| = \left|dx_{ij}\,\det(M_{ij})\right|$$

Here dV is the change in the volume, and dx_ij is the change in the component at position (i,j) in the matrix. The absolute values are important, because that change in x_ij could either increase or decrease the volume. Since, by our definition, the determinant is the volume with a sign for orientation, |dD| = |dV|, where dD is the change in the determinant.
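The z-component argument can be replayed exactly with fractions. In this Python sketch the vectors v1, v2, v3 (the columns of M) are arbitrary examples, and solving for the slide amounts with Cramer's rule is an implementation choice for finding the slide, not something from the text. It slides vector 1 until it is parallel to the z-axis, confirms the volume is |z| times the shadow area, checks that slide-then-raise equals raise-then-slide, and checks |dV| = |dz * det(M_31)|.

```python
from fractions import Fraction as F


def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]


def det3(a, b, c):
    """Signed volume of the parallelepiped with edges a, b, c (triple product a . (b x c))."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))


def combo(v, p, s, q, r):
    """v + s*p + r*q, componentwise."""
    return tuple(x + s * y + r * z for x, y, z in zip(v, p, q))


v1, v2, v3 = (F(1), F(2), F(3)), (F(2), F(0), F(1)), (F(0), F(1), F(1))
P = det2(v2[:2], v3[:2])  # signed area of the x-y shadow of the face spanned by v2, v3

# Slide the top face within its plane (add s*v2 + r*v3 to v1) until v1 is parallel to the z-axis.
# Solve s*v2_xy + r*v3_xy = -v1_xy with Cramer's rule; this needs P != 0.
neg = (-v1[0], -v1[1])
s, r = det2(neg, v3[:2]) / P, det2(v2[:2], neg) / P
w = combo(v1, v2, s, v3, r)
assert w[:2] == (0, 0)
assert det3(w, v2, v3) == det3(v1, v2, v3)          # sliding kept the volume
assert abs(det3(v1, v2, v3)) == abs(w[2]) * abs(P)  # volume = |z| * shadow area

# Raise the z-component of v1 by dz: slide-then-raise equals raise-then-slide.
dz = F(5)
raised = (v1[0], v1[1], v1[2] + dz)
assert combo(raised, v2, s, v3, r) == (w[0], w[1], w[2] + dz)
dV = det3(raised, v2, v3) - det3(v1, v2, v3)
assert abs(dV) == abs(dz * P)  # P is det(M_31) when v1, v2, v3 are the columns of M
```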

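Recall that swapping two rows or two columns of M multiplies the determinant by -1, so the rates of change at adjacent positions in the matrix have opposite signs. To see why, start with column 1 and add up each entry times its rate of change to get the determinant. Then do it again after swapping column 1 with column 2. The computation is the same, but it results in the opposite determinant. More precisely, bringing row i to the top takes i-1 adjacent row swaps, and bringing column j to the left takes j-1 adjacent column swaps; neither changes M_ij. So the sign at position (i,j) is (-1)^(i-1) * (-1)^(j-1) = (-1)^(i+j) times the sign at position (1,1), which is positive (for example, increasing x_11 in the identity matrix increases its volume). The determinant’s rate of change formula is: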

$$dD = (-1)^{i+j}\,dx_{ij}\,\det(M_{ij})$$
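A Python sketch that checks the signed rate-of-change formula at every position of an arbitrary 3x3 example matrix, using the Leibniz formula as an independent reference determinant. Because the determinant is linear in each single entry, the formula holds exactly for any finite change dx, not only for small ones.

```python
from itertools import permutations


def det(m):
    """Leibniz formula, used here as an independent reference."""
    n = len(m)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= m[i][p[i]]
        total += term
    return total


def minor(m, i, j):
    """M_ij: the matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


M = [[1, 2, 0], [2, 0, 1], [3, 1, 1]]
dx = 7
for i in range(3):
    for j in range(3):
        bumped = [row[:] for row in M]
        bumped[i][j] += dx
        dD = det(bumped) - det(M)
        # Python indices start at 0, but i + j has the same parity either way.
        assert dD == (-1) ** (i + j) * dx * det(minor(M, i, j))
```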

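To find the actual determinant from this, imagine adding in the last vector (column j) of your parallelepiped, starting from the zero vector, where the determinant is 0. First add in the x-component of your vector times its rate of change, then the y-component times its rate of change, then the z-component times its rate of change. None of the minors det(M_ij) involve column j, so each rate of change stays constant while that column is built up, and when you’re done, you’ll have the total determinant: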

$$\det(M) = \sum_{i=1}^{n} (-1)^{i+j}\,x_{ij}\,\det(M_{ij})$$
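The final formula translates directly into a recursive function. This Python sketch (function names are illustrative) expands down any chosen column j and compares the result, for every column of an arbitrary 4x4 example, against the Leibniz permutation formula as an independent reference.

```python
from itertools import permutations


def det_leibniz(m):
    """Reference determinant: sum over permutations p of sign(p) * prod m[i][p[i]]."""
    n = len(m)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
        term = (-1) ** inversions
        for i in range(n):
            term *= m[i][p[i]]
        total += term
    return total


def minor(m, i, j):
    """M_ij: the matrix m with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det_cofactor(m, j=0):
    """det(M) = sum over i of (-1)^(i+j) * x_ij * det(M_ij), expanding down column j."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** (i + j) * m[i][j] * det_cofactor(minor(m, i, j))
               for i in range(len(m)))


M = [[2, -1, 0, 3],
     [1, 4, 2, 0],
     [0, 5, -2, 1],
     [3, 0, 1, 2]]
for j in range(len(M)):
    # Expanding down any column gives the same value as the Leibniz formula.
    assert det_cofactor(M, j) == det_leibniz(M)
```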

submitted by /u/Kawaiimmy