The Tennis Racket Theorem

or, the Intermediate Axis Theorem
or, Testing a Universally-Used Physics Engine for a Basic Mechanical Result

Let’s replicate a strange mechanical effect in a popular video game.

Kerbal Space Program is pretty cute. You build rockets with an increasing number of available parts, either with little aliens onboard or computer cores for control, and send them off to various planets in the Kerbol star system, with simplified gravitational interactions. The model is always 1-body, with planets on rail orbits. A spacecraft is only attracted to a single astronomical body at a time, chosen by being within its sphere of influence determined by the body’s distance and mass. So, no Lagrange points, no single-body orbit capturing (but, you can get captured by a planet without expending fuel by encountering a moon during the fly-by). Planetary orbits are not exactly co-planar, which provides some realism and frustration. All planetary bodies intrinsically rotate around parallel axes. There appears to have been an unexpectedly difficult technical problem for the creators to overcome in order to do so. The result does simplify takeoff and landings, which is nice. Atmospheric interactions include a few regimes for engine response to pressure and speed, drag, and reentry heating. The most accessible lessons come from the takeoff stage, which any player sees from the start. This includes finding the proper ratios between engine power, fuel storage and payload mass. Later, if one seeks to remain in space, one must understand the classic lesson, “orbiting involves moving sideways more than upward.” Finagling with multiple stages is easy and forgiving to encourage experimentation; you don’t have to do any math to succeed. But you sure as hell can, if you’re the kind of person who appreciates True Fun With Fractions.

Even in the unusual circumstance that calculatin’ isn’t your jam (and, are you sure about that? Really?), KSP manages to keep your attention by featuring endearing little aliens who seem to revel in the hope of being involved in a nightmarish disaster.

My students brought the game up in class a few times in the past. One year, I’d see a daily crowd scamper off to the maker lab to play it on the PCs kept in there. There were free 3D printers in the room with them, and they were playing video games instead! I feel like I’m at a unique point in my life when I can both relate to the grumpy old man who laments children of today missing out on “real learning”, and also the little kid who would be first in line at the computer lab. Heck, I’d be first in line as a teacher, if they let me.

First in line to play for the educational value.

I truly believe there is a ton of educational opportunity here, but I am not sure how much of it requires an initial push to apply and achieve physical understanding. How much can someone figure out for themselves without having seen an example or read about the principles first? The principles for this kind of thing are often so close to the examples as to be indistinguishable.

Regardless, let’s demonstrate something with it.

Kerbal Space Program is written in Unity, which uses the PhysX engine developed by Nvidia. This thing is shared by Unity’s competitors, including Unreal Engine. I’d say that qualifies PhysX as an industrywide standard. So it had better accurately demonstrate some physical results of its model.

One of the tough things to get right in a mechanics lab is the illusion of a lack of gravity and minimization of friction with the air and ground. Intro physics students have an easy time blaming their crappy lab reports on these, in addition to “human error.” A trusted physics simulation gives you the option to cut these out. Or never include them in the first place. KSP provides an easy way to assemble a test object, and apply forces and torques in the comfort of cold, inhospitable simulated outer space. Let’s look at something that would be pretty tough to show in a classroom or mechanics lab: The tennis racket theorem, or Dzhanibekov effect, or the intermediate axis theorem.

An object spinning in space will undergo variations in its orientation, with or without applied forces and torques. The spinning object above appears to change its rotational direction. Depending on how you define the direction of rotation, this is an illusion. The axis of rotation relative to the object can change without applied torques. The angular momentum does not. Some of this weirdness can be worked through by making a comparison to linear motion.

When you try to push something, the rate at which it speeds up depends on its mass. This is, hopefully, one of the first things you learned in physics class. F = m a; the applied force results in an acceleration a. The object accelerates more quickly if the mass m is small, less quickly if m is large. You can treat everything as a scalar in one dimension, or include vectors in 2 or 3 dimensions. The force changes the momentum, p = m v. Most of the time, you keep the mass of an object constant, and see how the velocity v changes with an applied force.

The bridge into “rocket science” is the admission that m may change as well. A rocket burns fuel, decreasing m with time. All three of the things in F=ma are changing, which can often be confusing. It’s confusing in the same way that the ideal gas or Ohm’s laws are confusing: they often appear to be self-contradictory, when one forgets the implicit assumptions about what is being held constant. This comes up in questions like “How can power be both directly and inversely proportional to resistance???” The implication is that either V or I, whichever one isn’t present in the equation, is being held constant.
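To make the resistance puzzle concrete, here is the algebra written out. It assumes the usual textbook relations P = IV and V = IR, which this post doesn't derive:

```latex
P = IV, \qquad V = IR
\quad\Longrightarrow\quad
P = I^2 R \;\; (\text{holding } I \text{ fixed}),
\qquad
P = \frac{V^2}{R} \;\; (\text{holding } V \text{ fixed})
```

Both lines are the same law. Which proportionality you see depends only on which quantity you agreed to hold constant.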

Rotation in 2 dimensions involves the same math. When you’ve only got one dimension to rotate around (an axis pointing into or out of “the page”), you can treat it all as a bunch of scalars. The angular momentum L = I \omega . In 2 dimensions, there’s no way for a solid body’s moment of inertia I to change, and without torque there is no change for L, either. Everything stays constant, and whatever object you’re thinking of spins like a top.

Well, no. NOT like a top, because in 3 dimensions, a top precesses and nutates unless it is rotating exactly along one of three perpendicular directions, the principal axes. The direction of rotation \omega can change, even in a torque-free regime, because I can change. That’s right. Maybe you’ll never change, but the moment of inertia sure can. I swear, this time I will change. You just have to give it a chance, baby.

The momentum of a rocket changes because it loses mass that carries the momentum away. So the m in F = ma changes value as the rocket burns. But, if you consider the entire rocket-fuel system, even the exhaust left behind after the fuel has been burned, the total mass stays constant. The total momentum is constant. But if we kept track of m not only as a single value, but as a distribution through space, we would see it change. The shape of the mass distribution function changes with time, and the momentum density (how much is in the rocket, how much is in the space behind it in the exhaust) changes with time, too. We could keep track of a list of velocities or a broad velocity field as well — including the different speeds for the exhaust and rocket separately. If this sounds difficult, it’s because it can be, but I’m sure some simple models are good enough when all we really care about is the rocket’s path. In fact, a short list of masses is the way you solve your first problems in momentum conservation, by assuming a small number of rigid bodies:

p_\text{net} = m_1 v_1 + m_2 v_2 = (m_1 + m_2) v_3 = m_3v_3

The velocity v_3 is often that of a combined object. The rocket case is a poor example here, because of how rockets continuously burn fuel, but it does apply with the right trickery. Usually an introductory problem treats that value as the result after a collision, or a merger, or another instantaneous event. Regardless, even after a collision event, you can still treat m_1 and m_2 as separate values that are a part of a multivalued distribution of mass that applies to the entire system. The next step is to say, well, the exhaust isn’t made up of a single object. It’s made up of individual puffy clouds, each with their own masses and velocities. Maybe m_1 = m_{1,1} + m_{1,2} + m_{1,3} + m_{1,4} + .... Each of those little clouds could themselves be broken into a multitude of gas molecules, each with its own particular mass and velocity contributing to the whole. At that stage, you’ve got to start integrating infinitesimal masses and their velocities throughout space to get the full picture.

And this is the procedure for angular momentum. You pick an axis of rotation on an object and you integrate infinitesimal masses throughout its volume to get a single value for the moment of inertia. This works well for objects whose axis of rotation does not change with time. Rigid objects. That’s the typical introductory style problem, where oftentimes rotations are presented in only two dimensions, and so there’s only one possible axis of rotation anyway. Your angular momentum looks a lot like linear momentum — it’s the product of an effective mass (I) and a rate of change (\omega).

L = I \omega

In 3 dimensions, though, the goofy stuff happens. With a solid object spinning freely in space, the direction of rotation and moment of inertia can change (while conserving the product when under no torques). In the realm of linear momentum in 3d, this is maybe like… an object speeding up and slowing down as it travels in a single direction, with its intrinsic mass changing as it goes? It’s perhaps more like an object’s direction of travel changing on its own, with effective masses assigned to different directions, surging between one and the other. This doesn’t really happen as far as I know, which is why rotation is counterintuitive.

Ultimately, you need to describe the moment of inertia as a tensor — basically a matrix (as far as you care, which you may not, but if you do then you’re already fine). The moment of inertia tensor by definition is symmetric (I_{ij} = I_{ji}).

I = \left(\begin{array}{ccc} I_{11} & I_{12} & I_{13}\\ I_{21} & I_{22} & I_{23}\\I_{31} & I_{32} & I_{33} \end{array}\right) = \left(\begin{array}{ccc} I_{11} & I_{12} & I_{13}\\ I_{12} & I_{22} & I_{23}\\I_{13} & I_{23} & I_{33} \end{array}\right)
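If the entries feel abstract, here is a minimal sketch of where they come from for a handful of point masses, using the standard definition I_ij = sum of m (|r|^2 delta_ij - r_i r_j) taken about the origin. The function name and the three-mass "object" are made up for illustration.

```python
def inertia_tensor(points):
    """Return the 3x3 inertia tensor (about the origin) for point masses.

    `points` is a list of (mass, (x, y, z)) pairs. Each entry is
    I_ij = sum_k m_k * (|r_k|^2 * delta_ij - r_k[i] * r_k[j]).
    """
    tensor = [[0.0] * 3 for _ in range(3)]
    for mass, r in points:
        r_squared = sum(c * c for c in r)
        for i in range(3):
            for j in range(3):
                delta = 1.0 if i == j else 0.0
                tensor[i][j] += mass * (r_squared * delta - r[i] * r[j])
    return tensor


# Hypothetical lopsided object: three unit masses at arbitrary positions.
lopsided = [
    (1.0, (1.0, 2.0, 0.0)),
    (1.0, (-1.0, 0.5, 1.0)),
    (1.0, (0.0, -1.0, 3.0)),
]
tensor = inertia_tensor(lopsided)
for row in tensor:
    print(row)
```

Because r_i r_j is the same number as r_j r_i, the result comes out symmetric no matter how lopsided the object is.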

The symmetry is a reflection of the fact that spinning the thing clockwise will be just as hard as counterclockwise on the same axis. So here’s a fun thing: any real symmetric matrix can be diagonalized. Diagonalization is a key technique in physics and, I’m quite sure, has applications in any field involving quantitative data. If you’ve got any equations that are at least as complicated as adding or multiplying, you’re probably looking at a linear system or one that is linear within a certain regime. If you’re a student and have made it this far, I’d recommend taking the time to understand this concept, understand linear regression cold, take a linear algebra course, etc. I often wonder if matrix manipulation with applications would be more useful or enlightening to the average citizen than most concepts in calculus, which seems to be the “default” advanced math course for high schoolers.

The point here is, through diagonalization, you can rotate the object (or change your point of view) so that only the diagonal terms are left in the tensor, each corresponding to the difficulty in rotating the object around orthogonal (mutually perpendicular) axes, such as our standard x, y, z. It’s just a change of coordinates.

I = \left(\begin{array}{ccc} I_{11} & I_{12} & I_{13}\\ I_{12} & I_{22} & I_{23}\\I_{13} & I_{23} & I_{33} \end{array}\right) \rightarrow \text{diagonalize}\rightarrow I = \left(\begin{array}{ccc} I_{1} & 0 & 0\\ 0 & I_{2} &0\\0 & 0 & I_{3} \end{array}\right)

The three perpendicular directions that line up with these numbers are the principal axes. If the moments of inertia I_1, I_2, I_3 are different, we can make sure we’ve defined our axes so that I_2 is the middle value, with I_1> I_2 > I_3. We can then call its corresponding axis the intermediate axis.
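For the curious, here is a rough sketch of what "diagonalize" can look like in practice for a 3×3 symmetric tensor. It uses Jacobi rotations, which is one classic method among several and not necessarily what any physics engine does. The helper names and the example tensor are invented for illustration.

```python
import math


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def principal_moments(tensor, max_rotations=50, tol=1e-12):
    """Diagonalize a symmetric 3x3 tensor with Jacobi rotations.

    Returns the diagonal entries sorted so that I_1 >= I_2 >= I_3.
    """
    a = [list(row) for row in tensor]
    for _ in range(max_rotations):
        # Find the largest off-diagonal entry; stop once it is negligible.
        p, q = max([(0, 1), (0, 2), (1, 2)],
                   key=lambda pq: abs(a[pq[0]][pq[1]]))
        if abs(a[p][q]) < tol:
            break
        # Rotation angle in the (p, q) plane that zeroes a[p][q].
        theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        rot = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
        rot[p][p], rot[q][q], rot[p][q], rot[q][p] = c, c, s, -s
        rot_t = [list(col) for col in zip(*rot)]
        a = matmul(rot_t, matmul(a, rot))  # just a change of coordinates
    return sorted((a[i][i] for i in range(3)), reverse=True)


# Hypothetical tensor with one off-diagonal coupling.
moments = principal_moments([[2.0, 1.0, 0.0],
                             [1.0, 2.0, 0.0],
                             [0.0, 0.0, 5.0]])
print(moments)
```

Each rotation is exactly the "change your point of view" move described above, applied one pair of axes at a time until the off-diagonal terms are gone.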

The intermediate axis theorem is this: When spun around axis 2 in free space, an object’s angular velocity will be initially aligned with that axis but drift away from it. If it starts close enough, the drift away can happen quite suddenly, resulting in the flipping effect you can see in the video above. If initially spun around the axes with largest or smallest moments of inertia, the angular velocity will stay close to them. In this way, spinning around these is stable, and the intermediate axis instead acts as an unstable equilibrium.
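You can watch the theorem happen without a phone or a rocket by integrating the torque-free Euler equations numerically. The sketch below is one way to do it, with assumed moments of inertia (3, 2, 1), a small assumed wobble of 0.01 on the other two axes, and a plain fourth-order Runge-Kutta stepper. None of these numbers come from the game.

```python
def euler_rates(omega, moments):
    """Torque-free Euler equations in the principal-axis (body) frame."""
    w1, w2, w3 = omega
    i1, i2, i3 = moments
    return (
        (i2 - i3) * w2 * w3 / i1,
        (i3 - i1) * w3 * w1 / i2,
        (i1 - i2) * w1 * w2 / i3,
    )


def simulate(omega, moments, dt=0.01, steps=2000):
    """Integrate with classic fourth-order Runge-Kutta; return every sample."""
    def nudge(state, rates, scale):
        return tuple(w + scale * k for w, k in zip(state, rates))

    history = [omega]
    for _ in range(steps):
        k1 = euler_rates(omega, moments)
        k2 = euler_rates(nudge(omega, k1, dt / 2), moments)
        k3 = euler_rates(nudge(omega, k2, dt / 2), moments)
        k4 = euler_rates(nudge(omega, k3, dt), moments)
        omega = tuple(
            w + dt * (a + 2 * b + 2 * c + d) / 6
            for w, a, b, c, d in zip(omega, k1, k2, k3, k4)
        )
        history.append(omega)
    return history


MOMENTS = (3.0, 2.0, 1.0)  # hypothetical I_1 > I_2 > I_3
about_largest = simulate((1.0, 0.01, 0.01), MOMENTS)
about_intermediate = simulate((0.01, 1.0, 0.01), MOMENTS)
about_smallest = simulate((0.01, 0.01, 1.0), MOMENTS)

# The smallest value each "main" spin component reaches during the run.
print("axis 1:", min(w[0] for w in about_largest))
print("axis 2:", min(w[1] for w in about_intermediate))
print("axis 3:", min(w[2] for w in about_smallest))
```

For axes 1 and 3 the main spin component barely budges. For axis 2 it swings all the way through zero to nearly the opposite sign, which is the flip.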

One of the best ways to visualize this is not with a tennis racket, but with a phone! I’ll get to the video game eventually, don’t worry. Phones are great because they are very common and easy to handle. It turns out that boxy objects like phones have very clear principal axes due to their symmetry, which are parallel to their standard dimensions. In addition, phones tend to be very expensive, so it’s very exciting to throw one up in the air over and over to test this effect. Let’s get our phones out.

Introducing: The Cool Phone 2000.

A cuboid has principal axes perpendicular to its faces.

The three principal axes of the roughly cuboid shaped Cool Phone 2000. I1 has the largest moment of inertia, and I3 the least. I2 is the intermediate axis.

It’s easiest to spin it around the “long” axis — in this case, I_3 — and hardest to spin around the center of the screen, I_1 here. This is because there is more material in the phone further from the axis of rotation.
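To put rough numbers on "easiest" and "hardest", here is a small sketch using the standard result for a uniform solid cuboid: the moment about an axis through the center is m/12 times the sum of the squares of the two dimensions perpendicular to that axis. The mass and dimensions are guesses for a generic phone, not measurements of anything.

```python
def cuboid_moments(mass, length, width, thickness):
    """Principal moments of a uniform solid cuboid about its center.

    Returns (I_1, I_2, I_3): about the thickness axis (through the screen),
    the width axis, and the length ("long") axis, in that order.
    """
    about_thickness = mass * (length ** 2 + width ** 2) / 12.0
    about_width = mass * (length ** 2 + thickness ** 2) / 12.0
    about_length = mass * (width ** 2 + thickness ** 2) / 12.0
    return about_thickness, about_width, about_length


# Hypothetical phone: 0.2 kg, 150 mm x 70 mm x 8 mm (SI units below).
i1, i2, i3 = cuboid_moments(0.2, 0.150, 0.070, 0.008)
print(i1, i2, i3)
```

A real phone isn't uniform inside, so treat these as ballpark values. The ordering is what matters for the theorem.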

Spinning around the principal axis with the smallest moment of inertia (I_3). This is the easiest way to get it spinning faster.
Spinning around the principal axis with the greatest moment of inertia (I_1). This is the toughest way to get it spinning, in terms of torque and energy expenditure. Actually flipping a phone this way may feel easier to you, depending on your dexterity.

I’m using a phone as an example because I want to encourage you to try this for yourself. If you don’t have a phone on hand, or aren’t feeling like tossing your $1000 iPhone up in the air, you should still be able to find a small, rigid box to try this with. If you throw the phone up in the air so it spins like these, it ought to continue doing so until you catch it or it collides with the ground.

If you try to toss it along the intermediate axis, though, it will flip. My instincts expect this:

My expectation for spinning around the intermediate axis.

The reality is closer to this:

Closer to the reality. This is just an animation, not a simulation. The actual movement is not so periodic, but gives the key result that the phone can face away, but be upright, after spinning.
Closer to the reality, with pauses to let you get your bearings. Hopefully you can convince yourself this is not simply spinning around a tilted axis; the axis itself is changing as the phone rotates.

Again, I stress, try this for yourself. Hold your phone like a playing card, and flip it up into the air, with the top falling toward you. If you get good, you can catch it and toss it repeatedly and it will look super cool. Extremely cool. It’ll be a hit at parties.

“Simulating” in KSP

I figured, since the game uses a Real Physics Engine®, I could build a phone-like ship in Kerbal Space Program, apply some spin, and see the effect happen. So, I made an accurate 200:1 model of the Cool Phone 2000 and launched it into orbit.

Almost done designing.

It was attached to six rocket motor arrays, two along each of the (supposed) principal axes, which would be used to quickly spin it up in that dimension.

Launching the Cool Phone 2000. I could have used the developer cheat menu to put it in orbit. But, I figured, if I’m playing this game I might as well play the game.
It’s pretty easy when you don’t have to worry about money, life support, Kessler syndrome, …
…the harsh, imposing coldness of empty space, the-

The first one I launched spun up a little too easily. The engines were solid rocket types, which can’t be turned off. This means they had to fire at full blast until they were out of fuel, which gave way more spin than this thing needed. So, I cheat-menu’ed a reduced thrust, higher mass group of spinners to get it up to a more manageable speed.

The 6000 model was scrapped…
…and replaced with the 7000.

The new model also had a better shape for starting the spins. The first one I sent up had a few too many heavy decorative parts, which skewed the mass symmetry more than I expected.

It spins nice and stably in the “toughest” dimension:

Rolling about I1.

Take a look at the navball in the lower left. For these short lengths of time, the navball is the view of a non-rotating ball that someone attached to the ship would see. When the ball is all blue, it means the antenna is pointed directly “up.” If it’s all brown, the antenna is pointing “down.” Sure, you can define up and down in space. Here, you can see the view is moving pretty much along a single great circle on the navball. The antenna spins around in a stable circle.

The phone’s also stable when spinning in the long dimension.


You can see the yellow view marker is pretty much stationary on the navball, with the navball spinning around that point. Or, you can just look and see that the phone is not flipping around.

For the intermediate axis, it’s all wonky, as expected!

Featuring the instability of rotation around the intermediate axis.

Watch it flip and flop! There is no external torque on this once the spinners detach. Compare the navball behavior to the other two cases.

Not a real shock, but fun to demonstrate.

Actually, it might be fun to track the path of the orientation. I suppose this could be simulated by solving the Euler equations, or I could just read it off the navball! The axis of rotation follows a path called a polhode, which is the sort of taco or criss-cross shape formed by the intersection between a sphere and an ellipsoid.

Some polhodes drawn in blue. The angular momentum (from the phone’s point of view) is restricted to the surface of a sphere (orange) by conservation of angular momentum, and restricted to an ellipsoid (purple) by conservation of energy. It must lie on the intersection. The shape and size of the sphere and ellipsoid depend on the dimensions of the spinning object, and how fast and in which direction it spins.
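Written out in the body frame, with components L_i = I_i \omega_i along the principal axes, the two constraints are the following. Here E stands for the rotational kinetic energy, a symbol introduced just for this note:

```latex
L_1^2 + L_2^2 + L_3^2 = |L|^2
\qquad \text{(sphere: the magnitude of } L \text{ is conserved)}

\frac{L_1^2}{2 I_1} + \frac{L_2^2}{2 I_2} + \frac{L_3^2}{2 I_3} = E
\qquad \text{(ellipsoid: the energy is conserved)}
```

The semi-axes of the second surface are \sqrt{2 E I_1}, \sqrt{2 E I_2}, \sqrt{2 E I_3}, which is how the object's shape sets the shape of the curve.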

I’ll consider it.

The real takeaway here is to double-check our understanding of angular momentum.

L = I \omega

When learning about the concept, 2-dimensional models can give a sense that the angular momentum is always in the direction of the angular velocity vector, and the moment of inertia is a scalar. But, a moment of inertia tensor in the inertial frame will evolve as a three dimensional object rotates, and the angular velocity must evolve in turn to result in an unchanging angular momentum.
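A tiny worked example, with made-up numbers, shows how L and \omega can point in different directions. In the body frame aligned with the principal axes, the tensor is diagonal:

```latex
L = \left(\begin{array}{ccc} I_1 & 0 & 0\\ 0 & I_2 & 0\\ 0 & 0 & I_3 \end{array}\right)
\left(\begin{array}{c} \omega_1 \\ \omega_2 \\ \omega_3 \end{array}\right)
= \left(\begin{array}{c} I_1\omega_1 \\ I_2\omega_2 \\ I_3\omega_3 \end{array}\right),
\qquad
(I_1, I_2, I_3) = (3, 2, 1),\;\; \omega = (1, 1, 0)
\;\Rightarrow\; L = (3, 2, 0)
```

Here L is not a multiple of \omega. The two line up only when \omega lies along a single principal axis, or when the moments involved happen to be equal.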

Constructing the digits of pi from conservation of energy and momentum in collisions

A short one, just to get the ol’ bloggin’ fingers movin’ again. It’s starting to get chilly outdoors and they’re simply covered in cobwebs!! Where did they come from? How did I get spider webs on my hands…..?

Last year, 3Blue1Brown produced a series of lovely videos (but really, they all are) on a strange little thing. An object is sent bouncing back and forth between a rigid wall and another moving object with a particular (integer!) mass ratio. If this ratio is a power of 100, the number of collisions will be some of the first digits of pi!! Isn’t that nuts.

My two blocks.

Anyone who has taken a high school senior level physics class or above would recognize the need to implement two conservation laws: that of momentum (linear with velocities) and that of energy (quadratic with speeds). Someone with just a bit of programming experience should be able to set up a series of solutions for the speeds after successive collisions, until the “no-collide” condition is met: the two blocks are traveling in the same direction, away from the wall, and gaining distance between them.

Hey, I’ve got a bit of programming experience! I whipped this up to see if I could get the same results.

Both blocks equal. The block on the right is 100^0 = 1 times more massive than the block on the left. Three collisions.
100:1. There are 31 collisions in there. Looking good!

10000:1. There are 314 collisions! It looks like it’s working. Trust me. You must trust me.
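If trust is in short supply, the counting part is easy to sketch. The code below is not the program behind the animations above, which isn't shown here. It is a minimal version of the procedure described earlier: apply the standard 1-D elastic collision formulas, bounce the small block off the wall, and stop at the no-collide condition. The function and variable names are my own.

```python
def count_collisions(mass_ratio):
    """Count collisions for a small block between a wall and a big block.

    The wall is on the left, the small block (mass 1) starts at rest, and
    the big block (mass `mass_ratio`) starts to its right, moving left.
    """
    m_small, m_big = 1.0, float(mass_ratio)
    v_small, v_big = 0.0, -1.0
    collisions = 0
    while True:
        if v_big < v_small:
            # The blocks are closing on each other: elastic collision,
            # conserving both momentum and kinetic energy.
            total = m_small + m_big
            new_small = ((m_small - m_big) * v_small + 2 * m_big * v_big) / total
            new_big = ((m_big - m_small) * v_big + 2 * m_small * v_small) / total
            v_small, v_big = new_small, new_big
            collisions += 1
        elif v_small < 0:
            # The small block is heading for the rigid wall: reverse it.
            v_small = -v_small
            collisions += 1
        else:
            # Both move away from the wall and the gap is not shrinking.
            return collisions


for power in range(3):
    print(100 ** power, count_collisions(100 ** power))
```

Only velocities are tracked, since positions never affect the count. For very large mass ratios, floating-point rounding could eventually matter, so this is a sanity check rather than a way to compute pi to many digits.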

It looks like they’re doing it right! A fun little exercise. In the spirit of the original prompt I’ll let you find or figure out why it works for yourself. You can always watch Sanderson’s explanation video.

One hint I can give is that it is a bit related to the Buffon’s needle technique for finding pi. Understanding that idea could open some avenues of thought for more difficult estimation problems. Check out the description and explanation on Numberphile:


Exploring cellular automata 1

Conway’s Game of Life is maybe the most well known ruleset for cellular automata. Each cell in a grid has some possible set of states, and a set of rules for changing state based on its own state and that of its (usually) immediate neighbor cells. Time elapses with a number of discrete steps, sometimes called epochs. In Game of Life, a cell can either be “alive” or “dead” (on or off). A live cell survives to the next time step only if it has two or three live neighbors, and a dead cell comes alive only if it has exactly three.
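Here is a minimal sketch of one epoch of the standard Game of Life rules (a live cell survives with two or three live neighbors, a dead cell is born with exactly three). It stores only the live cells as a set of coordinates, which is one convenient choice among many. The names are my own.

```python
from collections import Counter


def life_step(live_cells):
    """Advance one epoch. `live_cells` is a set of (row, col) tuples."""
    # Count, for every cell next to a live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (row + d_row, col + d_col)
        for row, col in live_cells
        for d_row in (-1, 0, 1)
        for d_col in (-1, 0, 1)
        if (d_row, d_col) != (0, 0)
    )
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


# A "blinker": three cells in a row that flip between horizontal and vertical.
blinker = {(1, 0), (1, 1), (1, 2)}
print(sorted(life_step(blinker)))
```

Because the grid is implicit, patterns can wander off in any direction without hitting an edge.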

Some incredible patterns emerge, including spaceships, guns (creating gliders and spaceships), breeders (creating guns), other puffers, rakes, and long-lived methuselahs (this example creating some gliders).

Other rulesets lead to neat patterns. Here’s a fun one with many cell states. It lets you rate the results as it scans some starting parameter space.

So, we’re certainly not limited to one ruleset. Coming up with one is easy — finding if there are any neat patterns maybe not so much. It might be better to first explore simpler rules — ones in only 1 dimension. In this case, a linear space is segmented into cells, and each cell has only two neighbors. Given some configuration of three cells in a row, the middle cell’s state in the next epoch will be dependent on its own state as well as the two alongside. When considering how to change any cell, since there are only three cells to consider, and only two possible states for each, there are only 2^3 = 8 arrangements to consider when changing any cell. Because of this small number, it’s no problem to display the rule of the automaton as a list of diagrams, each showing one of the eight starting arrangements and the resulting state of the middle cell in the next epoch.

The rule icons for Wolfram’s “Rule 73”, one example of the 8 diagrams needed to completely describe this cellular automaton. To determine if a cell is on or off in the next epoch, compare it and its neighbors to the icons, and find the one whose top row matches. The cell beneath in the icon indicates the state of the cell in the next epoch.

With some initial state, for example:


The next line is produced by scanning over this line and matching the top row with the one from the rule icons.


Note that this wraps around, connecting the two edges. This process then continues indefinitely for successive lines. We’re free to pick any length of line and any arrangement of on and off cells as an initial condition, but once this is done, the rest is automated. This particular initial condition for this rule ends up looking like this:


This is kind of neat, but these processes don’t get to show off unless they’ve got some room to work with. Here’s the same rule applied to an initial state of a single black cell in a long row of white.

The repeating triangles are a pattern characteristic of some of the more interesting rules. Rules 30, 90, 110, and 184 are supposedly the most cited in scientific or mathematics articles.
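The whole scheme fits in a few lines of code. This sketch assumes Wolfram's numbering convention: read the three cells (left, middle, right) as a binary number from 0 to 7, and use that bit of the rule number as the middle cell's next state. The edges wrap around as described above. The function names are made up.

```python
def eca_step(cells, rule_number):
    """One epoch of an elementary cellular automaton with wraparound edges."""
    width = len(cells)
    next_cells = []
    for i in range(width):
        # cells[i - 1] wraps to the last cell when i == 0.
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % width]
        pattern = (left << 2) | (center << 1) | right  # 0..7
        next_cells.append((rule_number >> pattern) & 1)
    return next_cells


def run(cells, rule_number, epochs):
    """Return the starting row followed by `epochs` successive rows."""
    rows = [list(cells)]
    for _ in range(epochs):
        rows.append(eca_step(rows[-1], rule_number))
    return rows


# A single "on" cell in the middle of a short row, evolved under Rule 90.
start = [0] * 31
start[15] = 1
for row in run(start, 90, 15):
    print("".join("#" if cell else "." for cell in row))
```

Swapping 90 for any number from 0 to 255 gives a different rule, so this is an easy way to browse them.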

Because there are only 8 arrangements of cells we need to specify to determine a 1-D ruleset, and each arrangement offers the choice of the middle cell turning on or off, there are 2^8 = 256 possible rules of this type. You can see links to them all here. One could argue there are only 128 rules, 64, or even 32, since negating all the colors, taking a mirror image, or negating the mirror image doesn’t change the behavior much. This means each rule is one of a set of up to four that operate in the same way (some rules are their own mirror image or negation, so their sets are smaller).
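The grouping can be checked by brute force. This sketch assumes the usual Wolfram bit numbering, where bit k of the rule number is the output for the neighborhood whose three cells spell k in binary. It builds the mirror image and the color-negated version of each rule and counts how many distinct families result. The helper names are my own.

```python
def mirror(rule):
    """Rule obtained by swapping the left and right neighbors."""
    out = 0
    for pattern in range(8):
        left, center, right = (pattern >> 2) & 1, (pattern >> 1) & 1, pattern & 1
        source = (right << 2) | (center << 1) | left
        out |= ((rule >> source) & 1) << pattern
    return out


def negate(rule):
    """Rule obtained by swapping the roles of the two colors."""
    out = 0
    for pattern in range(8):
        source = 7 - pattern  # all three input cells flipped
        out |= (1 - ((rule >> source) & 1)) << pattern
    return out


def family(rule):
    """The set of rules equivalent to `rule` under mirroring and negation."""
    return frozenset({rule, mirror(rule), negate(rule), mirror(negate(rule))})


families = {family(rule) for rule in range(256)}
print(len(families), sorted(family(110)))
```

Run this way, the 256 rules collapse into 88 families, since many rules are unchanged by one of the transformations and so sit in a family of one or two rather than four.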

I’ve already linked to the Wolfram Atlas a few times. The images above were taken from there or made using Mathematica. Stephen Wolfram, of Mathematica fame, is also well-known for his work on cellular automata. He’s argued that these things may be more fundamental to nature than mathematics. It’s considered controversial. I haven’t read A New Kind of Science, a book where he lays this all out, but it’s free and online. Maybe worth checking out. It seems like some of the argument hinges on the fact that some of these automata are Turing complete — they can be set up to operate and produce results like any computer program.

I’d just like to note that my interest in cellular automata is based pretty much entirely on having fun. It’s 11 o’ clock on a Saturday night.

A step after checking out the 1-D rules is to explore the 2-D rules. Next I’ll show off a rule and try to do something interesting with it.

A Hermitian Matrix Has Real Eigenvalues

When I studied math, I tended to find myself more interested in the “continuous” side of things rather than the discrete. This is calculus and analysis and such, in contrast to things like logic, abstract algebra, number theory, graphs and other things where everything is rather chunky. I suppose I was largely motivated by the use of calculus in physics, which was usually my main focus. It’s easy to see that calculus and continuous functions are essential when studying physics. It’s not as easy to argue for number theory’s direct applicability to an engineer.

Sometimes I feel a bit ashamed about this interest. Calculus and analysis are the next logical steps after real number algebra. One could argue I didn’t allow myself to expand my horizons~~! But, whatever, who cares. They’re still cool. OK? THEY’RE COOL.

It’s very easy to claim that you’re interested in something. It’s almost as easy to hear someone talk about how they’re a fan and try to call them out on it by asking about some trivia. This is often the crux of an argument in the whole “Fake Geek Girl” and Gamergate things.  Similarly, sometimes, it feels like everyone around me is nuts about Local Football Home Team, and I often find myself skeptical of the “purity” of their fanaticism. I need to remind myself that someone can enjoy something and not know everything about it. You can be interested in Watching Football Match and not know the names of everyone (or even anyone) on the team.

The same is true for something like math. It had better be true, since there’s always another thing we could define or discover, and all of the fields already developed aren’t completely understood by a single person. It’s fine to be more interested in one thing rather than another. If you take that too far, you’d end up criticizing people for not being interested in every single thing in existence equally.

It’s all right to wonder if I should look into a certain topic, though. A few years ago, a colleague teaching introductory calculus to high school seniors mentioned that they were working on delta-epsilon proofs in the class. This blew me away! The concept of a limit is usually introduced in a pre-calculus class, or the beginning of a calculus class. I am under the impression that it’s usually a matter of, “yeah, this function gets really close to this point, but doesn’t actually hit it,” and that’s all a student really needs to know until they do some analysis. A delta-epsilon definition is a way to formally define a limit, so there is no uncertainty as to what is actually happening. “Gets really close” ends up being defined exactly. Basically, it says, “For this function f(x), gimme any number bigger than zero, even a SUPER tiny number, and I can give you a distance away from x=c within which f(x) is never further from a limiting value L than your number.”

Okay, maybe that is not super enlightening. But on a side note, it’s fun to think about how much like a playground taunt that is.
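For reference, the taunt in symbols. This is the standard textbook statement, with \varepsilon playing the role of "your number" and \delta the distance I hand back:

```latex
\lim_{x \to c} f(x) = L
\quad\Longleftrightarrow\quad
\forall\, \varepsilon > 0 \;\; \exists\, \delta > 0 \;\text{ such that }\;
0 < |x - c| < \delta \;\Rightarrow\; |f(x) - L| < \varepsilon
```

Note the 0 < |x - c| part: the definition never asks what f does exactly at x = c.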

I was ready to argue that his students didn’t need to bother with delta-epsilon proofs, that they could learn to work with the fuzzy idea of a limit first and get along just fine in calculus, just as I had. But, I did start to doubt myself. Should I have learned the definition of a limit before trying to use it, in my hubris?

In retrospect: no, that’s silly. Looking at the epsilon-delta definition, I realize it would have taken me ages to get through it, taking away from valuable time spent thinking about the applications of calculus. But, that feeling is there. Should I have known about this?

What does this have to do with Hermitian matrices, you demand, grabbing me by the shoulders and shaking, while my cravat soaks up your spittle.

I had this same feeling this week, when thinking about a topic in linear algebra and physics. In quantum mechanics, matrices are used extensively to describe certain kinds of actions that could be done to a particle or a system. One of those actions could be to take a measurement, such as the amount of energy in the system. If you call your matrix H and your particle’s current state \psi, then you could represent a measurement of how much energy the system has by multiplying them together.


H \psi = \lambda \psi

When you multiply them together, you can get a single number \lambda times the particle state \psi as a result. If you measured the energy of the particle, then the value of \lambda is that energy. You can’t just divide both sides by \psi because a particle’s state is a vector of many possibilities, and division by vectors rather than numbers doesn’t mean a whole lot here. (You can multiply both sides by a vector transpose and get into something called Dirac notation, but don’t bother with that now.)

The number \lambda and the state \psi are called eigenvalues and eigenvectors of…

Is this worth describing? If you don’t know this, it might be incomprehensible. It turns out that if H is Hermitian, meaning, it is self-adjoint:

H_{ij} = \bar{H_{ji}},

then it always has real eigenvalues (as opposed to \lambda being a complex number). Physicists interpret this to mean the matrix definitely represents a physical measurement. Hermitian matrices aren’t the only ones with real eigenvalues, but they also have a property that lets you be sure you’ve measured your particle as being in a certain state.
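A quick numerical sanity check, for the 2×2 case only: this sketch gets the eigenvalues of any 2×2 complex matrix from its trace and determinant, then compares a Hermitian example with a non-Hermitian one. Both example matrices and the function names are invented for illustration.

```python
import cmath


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 complex matrix from its trace and determinant."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2


def is_hermitian(m):
    """True when every entry equals the complex conjugate of its transpose."""
    return all(m[i][j] == m[j][i].conjugate()
               for i in range(2) for j in range(2))


hermitian = [[2, 1 + 1j], [1 - 1j, 3]]
rotation = [[0, -1], [1, 0]]  # real entries, but not Hermitian

for name, matrix in (("hermitian", hermitian), ("rotation", rotation)):
    print(name, is_hermitian(matrix), eigenvalues_2x2(matrix))
```

The Hermitian matrix gives eigenvalues with zero imaginary part, while the rotation matrix, which is not Hermitian, gives a purely imaginary pair.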

I’ve seen proofs that Hermitian matrices have real eigenvalues. Here are a couple. These start by assuming there is some eigenvalue/eigenvector pair, and using the fact that a vector magnitude is real at some point.
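For completeness, here is the usual shape of that argument, sketched from memory rather than copied from any particular source. It writes \psi^\dagger for the conjugate transpose of \psi and assumes \psi is not the zero vector:

```latex
H\psi = \lambda\psi
\;\Rightarrow\;
\psi^\dagger H \psi = \lambda\, \psi^\dagger \psi

(\psi^\dagger H \psi)^\dagger = \psi^\dagger H^\dagger \psi = \psi^\dagger H \psi
\;\Rightarrow\;
\bar{\lambda}\, \psi^\dagger \psi = \lambda\, \psi^\dagger \psi

\psi^\dagger \psi = \|\psi\|^2 > 0
\;\Rightarrow\;
\bar{\lambda} = \lambda
```

The middle line is where H being Hermitian gets used, and the last line is where the real, positive vector magnitude comes in.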

Finding the eigenvalues of a matrix is a matter of solving an equation which involves a determinant:

\det(H-\lambda I) = 0,

where I is the identity matrix. I thought, could I use an expanded determinant to show that the eigenvalues have to be real?

This isn’t so bad with a 2×2 matrix. With

H = \left[\begin{array}{cc} a & b+ci \\ b-ci & d\end{array}\right] ,

the characteristic equation is

0 = \det(H-\lambda I)

= \left| \begin{array}{cc} a-\lambda & b+ci \\ b-ci & d-\lambda\end{array}\right|

= (a-\lambda)(d-\lambda) - (b^2 +c^2)

= \lambda^2 - (a+d)\lambda +ad - (b^2+c^2)

You can show the two solutions for \lambda have to be real by shoving all these numbers in the quadratic formula. The discriminant (remember that?) is never negative because it ends up being a sum of squares. I’m bored.
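For anyone less bored, the shoving goes like this, using the same a, b, c, d as in the matrix above (all real numbers):

```latex
\Delta = (a+d)^2 - 4\left(ad - (b^2 + c^2)\right)
= (a-d)^2 + 4\,(b^2 + c^2) \;\ge\; 0

\lambda = \frac{(a+d) \pm \sqrt{(a-d)^2 + 4\,(b^2 + c^2)}}{2}
```

The square root is of a non-negative real number, so both values of \lambda are real. They coincide only when a = d and b = c = 0.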

My thought after this point was to use mathematical induction. Suppose we’ve shown that an n\times n Hermitian matrix has real eigenvalues. Let’s show that an (n+1) \times (n+1) one does as a consequence.

Maybe this is doable, but I haven’t done it. It occurred to me that all the cofactors of diagonal entries in a Hermitian matrix would be themselves Hermitian, and a proof by induction would rest on this.

H = \left[ \begin{array}{cccc}h_{11} & h_{12} & \dots & h_{1,n+1}\\ h_{21} & h_{22} & & h_{2,n+1} \\ \vdots & & \ddots & \vdots \\ h_{n+1,1} & \dots && h_{n+1,n+1} \end{array}\right]

= \left[ \begin{array}{cccc}h_{11} & h_{12} & \dots & h_{1,n+1}\\\\ \overline{h_{12}} & h_{22} &  & h_{2,n+1} \\\\ \vdots & & \ddots & \vdots \\\\\overline{h_{1,n+1}} & \dots && h_{n+1,n+1} \end{array}\right]

My thought was… can you construct a determinant using only cofactors of the diagonal entries?

A 4×4 Hermitian matrix. Each matrix made by removing all entries in the same row and column of a diagonal entry is also Hermitian.
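That caption is easy to spot-check. Here is a small Python sketch, with a 4×4 Hermitian matrix I made up for the purpose and helper names that are mine, which deletes the row and column of each diagonal entry in turn and tests what is left.

```python
def is_hermitian(matrix):
    """True when every entry equals the complex conjugate of its mirror image."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i].conjugate()
               for i in range(n) for j in range(n))

def drop_row_and_column(matrix, k):
    """Remove row k and column k, leaving the matrix whose determinant is the cofactor of entry (k, k)."""
    return [[entry for j, entry in enumerate(row) if j != k]
            for i, row in enumerate(matrix) if i != k]

# An arbitrary example matrix (not from any textbook)
H = [
    [1 + 0j, 2 + 1j, 0 - 3j, 4 + 0j],
    [2 - 1j, 5 + 0j, 1 + 1j, 0 + 2j],
    [0 + 3j, 1 - 1j, -2 + 0j, 3 - 1j],
    [4 + 0j, 0 - 2j, 3 + 1j, 7 + 0j],
]

print(all(is_hermitian(drop_row_and_column(H, k)) for k in range(4)))
```

One example is not a proof, of course, but the reason is visible in the code: deleting the same row and column keeps every remaining entry paired with its mirror image.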

This turned out to be not helpful in an embarrassing way. I asked myself, can you calculate a determinant by expanding over a diagonal, rather than a row or column? I was able to convince myself no, but the fact that I considered it at all seemed messed up. Shouldn’t that be something I should know by heart about determinants?

Similar to a student using limits without knowing the delta-epsilon definition, I realized that I don’t have a solid grasp of what determinants even are, although I’ve used them extensively. It felt like I was missing a huge part of my mathematics education. I don’t think I had ever proven any of the properties of determinants I had used in a linear algebra course.

I didn’t even know how to define a determinant. In high school, we learned how to calculate a 2×2 determinant. We then learned how to calculate determinants for larger matrices, using cofactors (although we didn’t use that word). But I didn’t (and still, don’t really) know what it was.

This doesn’t look unusual. I’ve got three books on linear algebra next to my desk here.

DeFranza and Gagliardi start by defining a 2×2 determinant as just ad-bc. It then tells how to calculate a 3×3 determinant, and then how to calculate larger determinants using cofactors. This seems in line with how I was taught (although this isn’t a coincidence. I took Jim DeFranza’s linear algebra class). The useful properties of determinants come later.

Zelinsky starts off (on page 166 of 266!) with an entire chapter on all of the algebraic properties we’d like determinants to have. It waits 11 pages to actually give explicit formulas for 2×2, then 3×3 matrices. It isn’t until after that that it gives something that feels like a definition.

Kolman starts with this definition right away:

Let A be an n × n matrix. We define the determinant of A by

|\mathbf{A}| = \Sigma (\pm) a_{1j_1}a_{2j_2}\dots a_{nj_n}

where the summation ranges over all permutations j_1j_2\dots j_n of the set S = \{1,2,\dots, n\}. The sign is taken as + or – according to whether the permutation is even or odd.
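To convince myself the definition really is closed form, here is a direct Python sketch of it. The function names are mine, and counting inversions is just one way I know of to decide whether a permutation is even or odd.

```python
from itertools import permutations

def permutation_sign(perm):
    """+1 for an even permutation, -1 for an odd one, by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def determinant(matrix):
    """Sum over all permutations j_1...j_n of (sign) * a_{1 j_1} * ... * a_{n j_n}."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        product = 1
        for row, col in enumerate(perm):
            product *= matrix[row][col]
        total += permutation_sign(perm) * product
    return total

print(determinant([[1, 2], [3, 4]]))  # ad - bc for the 2x2 case
```

There are n! permutations, so this is a terrible way to compute a large determinant, but there are no cofactors and no recursion anywhere.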

Woah. This seems like something fundamental. I had only known determinants as, effectively, recurrence relations. This is a closed form statement of the matrix determinant. Why don’t I recognize this?

Really, though, I can see why this might not be commonly taught. It’s even more cumbersome. But it feels like I missed the nuts and bolts of determinants while using them for so long.

That definition seems ripe for a Levi-Civita symbol.

It’s probably not worth making most people trudge through millions of subscripts. That’s sort of the MO of mathematics, right? You make an easy-to-write definition or theorem, and then use that to save time and energy.

Maybe I’ll try to show Hermitian matrices have real eigenvalues using that definition. Descartes’ rule of signs or Sturm’s theorem might help. But I’m sleepy. So maybe later.

The Martian Tripod Problem and Transcendental Functions

I thought this week about a problem I originally considered about ten years ago. I imagined a source of a laser beam, mounted high up above the ground, shining straight down, and allowed to rotate upwards at a constant rate until it was shining horizontally. The point at which the laser beam touched the ground would travel from directly below the source to the horizon.


The Martian Tripod Problem: What is the location where the beam strikes the ground as a function of time?

The question is, if I know where the source of the laser is, and how quickly it is rotating, can I know exactly where the beam strikes the ground over time?

The problem came about, I’m sure, as I was listening to Jeff Wayne’s Musical Version of the War of the Worlds, imagining beams of Martian heat rays mounted to towering tripods sweeping across the hull of the Thunder Child.

Trigonometry: What’s the length of a side of a right triangle with a constant height?

This is not so bad of a problem when only the geometry is considered. For now let’s call the angle that the laser makes with the “straight down” direction (the vertical) “omega-t”: \omega t. With t a length of time since the laser started shining, we can see that \omega is a sort of speed — when I multiply it by a length of time t, it gives a total angle, which is like a distance. The product \omega t works the same way that 30 miles per hour times 2 hours is 60 miles. In physics we’d call this speed of rotation \omega the angular speed, or angular velocity if you’re considering rotations in all three dimensions. For now, it doesn’t matter what the value of this speed of rotation is.

Setting up the laser at an angle \omega t from the vertical and a height H from the ground, we find it shines at a point H \sec(\omega t) away, and a horizontal distance H \tan(\omega t) to the right.

A laser at height H, at angle \omega t with the vertical, shines at a point H\tan(\omega t) to the right and a full distance H\sec(\omega t) away.

If you’re not so used to working with trig functions, you could get to the image above by first setting up the “classic” trig diagram, with the point a distance (hypotenuse) H away:

A scaled version of the previous image. Divide all lengths by cosine to get a constant height of H.

The above image has all the right ratios, but keeps the hypotenuse constant, not the height. Divide all the lengths by \cos(\omega t), and remember that secant = 1/cosine (by definition).

We’re already done. The point at which the laser beam hits the ground is

\bigg( H \tan (\omega t) , 0 \bigg)

Tangent “blows up” to infinity at $\pi/2$, which corresponds to the laser shining parallel to the ground. It intersects the flat ground an infinite distance away, at the horizon. Hopefully this agrees with your expectations; tangent is defined to act this way.

The Tripod Problem: Incorporating the speed of light

So that’s not super interesting. The real “tripod problem” was this: Where is the point of intersection if the speed of light isn’t infinite? If it takes some time for the laser beam to travel from the source to the ground, and the laser continues to rotate, then the location where the beam strikes will lag behind the orientation of the laser emitter.

This results in a “floppy” trajectory of the laser beam, drooping down to the ground.

A rough estimate of the shape of the laser beam given a very slow speed of light, a very quickly rotating heat ray, or a very tall tripod. Directly underneath the source, the movement of the intersection of the trajectory and the ground is dominated by the rotation of the laser. Far away from the source, the motion of the point on the ground is dominated by the speed of light.

The behavior of the point where the laser strikes the ground is very different with the speed of light restriction. It never reaches the horizon in finite time — the behavior for long lengths of time, and far distances, is totally dominated by the speed of light. It should travel along the ground at a speed approaching c.

The way the pointer location moves depends little on the speed of light directly under the source, where the distance to the ground doesn’t depend strongly on the angle, and a wider angle of the laser covers a small length. As the laser approaches the horizontal, the length covered by each small change in angle increases. The light that will eventually strike very far distances looks more like a point source, since the small angles will be covered in a very short length of time. The trajectory of the beam itself, while it’s still in the air, will look more and more like an expanding circle with time.

When I first mentioned the tripod problem to a friend recently, he had the insight of saying that a laser’s point could definitely travel faster than the speed of light. He could shine a laser at one side of the moon, and then the other. A quick enough rotation on a far enough canvas could result in a pointer appearing to travel faster than the speed of light. Remember, this doesn’t violate anything in relativity. No object is traveling faster than light, rather, a series of events in which different photons strike the distant moon are occurring. This situation is very much like that when the tripod laser is pointed nearly downwards. The speed of the pointer is dominated not by the speed of light, but by the rate of the laser rotation because the distance the light has to travel doesn’t change much when the laser is pointed directly downwards (or from one side of the moon to the other).

This suggests that the speed of the pointer, at very far distances from the tripod, would approach c from slower speeds if the laser were rotating slowly. But if the laser were rotating quickly enough, it could counterintuitively approach c from faster speeds.

Anyway, let’s try to deal with the problem. Take a look at our trig diagram again.

The laser beam has to travel a distance H\sec(\omega t).

When a certain portion of the laser beam is emitted at a time t (and an angle \omega t), it has to travel a distance H\sec(\omega t). Traveling this distance at speed c takes a length of time equal to

\cfrac{H}{c}\ \sec(\omega t).

Any portion of the heat ray travels in a straight line. Although the beam as a whole is curved, we’re still assuming it’s always traveling directly away from the source (no diffraction, etc.). The laser travels in the same direction and therefore strikes the ground at the point

\big( H\tan(\omega t),0 \big)

at a later time

t^\prime = t +  \cfrac{H}{c}\ \sec(\omega t)

This seems great. We have the basis for a complete understanding of the position of the laser pointer (or toasted Edwardian human) given some time. A portion of the laser, emitted at time t, will strike the ground a horizontal distance H\tan(\omega t) away not at t but at t^\prime above. This allows us to find the location corresponding to any time of emission in the interval

0 \leq t < \cfrac{\pi}{2\omega}.

If we were satisfied with this, the game plan would be to pick a time of emission t, determine how long that portion of the beam traveled, and then pair up the resulting t^\prime with the distance H\tan(\omega t).

I’m not satisfied, though. I’d like to get a trajectory of the laser pointer: a location as a function of the actual time t^\prime rather than the time that portion of the laser was emitted, t. In order to do this, we’d need to replace the t in the tangent function with an equivalent function of t^\prime. In order to do that, we’d need to solve

t^\prime = t +  \cfrac{H}{c}\ \sec(\omega t)

for t. Good luck.

What we’ve got above is a transcendental equation. This means it is not composed of a finite number of additions, subtractions, multiplications and divisions of our variable t and the constants, as well as rational powers of these. In most cases, and I’m pretty sure in this one, we can’t solve a transcendental equation exactly for the input variable. We cannot write t in terms of t^\prime.

It seems like the best we could do, if we wanted to create an animation with a step by step progression of the position of the pointer, is to prepare ahead of time. Pick an emission time t, find the value of the tangent function to find the distance, find the value of the strike time t^\prime, and record that pair. Then do this many, many times to create a table with more values than we expect someone to ask us for. We could find the position as a function of time with as much precision as we wanted, supposing we were willing to put the effort in.
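That preparation is only a few lines of Python. This is a sketch under my own assumptions: the function and parameter names are made up, the emission times are evenly spaced, and the units are whatever you like so long as they are consistent.

```python
import math

def strike_table(height, omega, c, samples):
    """Pairs (t_prime, x) for evenly spaced emission times in [0, pi/(2*omega))."""
    t_max = math.pi / (2 * omega)
    table = []
    for k in range(samples):
        t = t_max * k / samples                        # emission time, never reaching t_max
        angle = omega * t
        t_prime = t + (height / c) / math.cos(angle)   # strike time; sec = 1/cos
        x = height * math.tan(angle)                   # horizontal strike distance
        table.append((t_prime, x))
    return table

for t_prime, x in strike_table(height=1.0, omega=1.0, c=2.0, samples=5):
    print(t_prime, x)
```

Evenly spaced emission times give unevenly spaced strike times, so anyone reading the table at a particular t^\prime still has to interpolate between rows.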

I wanted a closed form solution to the problem, a trajectory x(t^\prime), and it seems more than out of reach. This annoyed me, until a friend (hey, there, buddo!) reminded me that “closed-form” is just a matter of what I’m allowing as a definition. In fact, like I mentioned in the last post, all of the trig functions are themselves transcendental: They can be written as Taylor polynomials, but these are polynomials of infinite length. The secant in the equation above can be estimated using

1 + \cfrac{1}{2}\ x^2 + \cfrac{5}{24}\ x^4 + \cfrac{61}{720}\ x^6 + \cfrac{277}{8064}\ x^8 +\dots
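Here is a quick Python sketch comparing those five terms against 1/cos(x). The names are mine, and the coefficients are copied straight from the series above.

```python
import math

# Coefficients of x^0, x^2, x^4, x^6, x^8 in the series for secant
SECANT_COEFFICIENTS = [1, 1 / 2, 5 / 24, 61 / 720, 277 / 8064]

def secant_partial_sum(x, terms=5):
    """Sum of the first `terms` terms of the secant series shown above."""
    return sum(coef * x ** (2 * k)
               for k, coef in enumerate(SECANT_COEFFICIENTS[:terms]))

x = 0.5
print(secant_partial_sum(x), 1 / math.cos(x))
```

Close to zero the two agree to several digits. Near \pi/2, where secant blows up, no finite number of terms keeps up.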

The problem with this, though, is that this isn’t much better. It would still take an infinite amount of time to achieve the exact value of secant given most x’s. The only reason I’m more comfortable using this and the other trig functions is because I’ve been trained to use this name for them, and rely on calculators or tables to give me the values whenever I need them. Anyone using a trigonometric function table is benefiting from someone else’s hard work to overprepare. When we use a calculator, we are relying on an estimation that is as precise as the manufacturer (or sometimes the user) dictates. One could make this estimation with a Taylor series, or with a more efficient method, but the calculator still won’t give an exact decimal value.

Any single irrational number, whether it is the solution to an algebraic or a transcendental equation, is another instance of this. I’ve gotten used to writing things like \sqrt{2} or \pi as representations of numbers with clear definitions. These numbers have exact values, but it’s hopeless for me to try writing them down. In a very definite sense, these numbers elude us. I could write or use them to any finite precision I wanted, with millions and millions of digits, so long as I were willing to come up with and use an efficient algorithm to find them, or if I were willing to wait or work a very long time, or both. But, I still wouldn’t have the “exact” value, just one that was plenty good enough for whatever application I had in mind.

These examples remind me: it’s convenient to have named functions like “Cosine” to cover a mathematical idea, but we can’t let this name cover up the meaning of that idea. There are an infinite number of angles whose cosine is a transcendental value. We’re able to use cosine because we can always (right?) reach a higher precision than is necessary in a physical application. I’ve gotten used to working with cosine, and mentally separated it from the solution to the tripod problem, because someone gave it a name that I’ve adopted.

So, I guess I should name the solution. Let’s call the composed function

x(t^\prime) = H\tan\big(\omega t(t^\prime)\big)

where t and t^\prime are related by

t^\prime = t +  \cfrac{H}{c}\ \sec(\omega t)

the heat ray function. We could create a huge table for x(t^\prime), someone could come up with an efficient algorithm for calculating values of x, and in the future we could use these to invade infinite planes with laser pointers more effectively.
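I can't promise efficient, but here is one possible sketch in Python. It leans on the fact that the strike time only ever increases with the emission time on the interval of interest (secant is increasing there), so plain bisection can hunt down t for a given t^\prime. The function names, the iteration count and the choice of bisection are all my assumptions.

```python
import math

def emission_time(t_prime, height, omega, c, iterations=200):
    """Solve t_prime = t + (height/c) * sec(omega*t) for t by bisection."""
    if t_prime < height / c:
        raise ValueError("no light has reached the ground yet")
    lo, hi = 0.0, math.pi / (2 * omega)
    for _ in range(iterations):
        mid = (lo + hi) / 2
        strike = mid + (height / c) / math.cos(omega * mid)
        if strike < t_prime:
            lo = mid      # that light lands too early, so emit later
        else:
            hi = mid      # that light lands too late, so emit earlier
    return (lo + hi) / 2

def heat_ray(t_prime, height, omega, c):
    """Position of the pointer on the ground at the actual time t_prime."""
    return height * math.tan(omega * emission_time(t_prime, height, omega, c))

print(heat_ray(1.5, height=1.0, omega=1.0, c=2.0))
```

This is no more exact than a cosine button on a calculator, which is sort of the point.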

Tangent of angles approaching 90 degrees

Last week a colleague came to me with a puzzle. He asked me to punch in the tangent of 89 degrees into a nearby TI-83 calculator.

\tan(89^\circ) = 57.28996163

He asked me what was surprising about this number. I wasn’t surprised. I didn’t have an answer for him, although in retrospect I probably should have. He had to tell me this was the number of degrees in a radian. Oh! So it is.

Even further, he said, try punching in tan(89.9), or tan(89.99), etc.

\begin{array}{rl} \tan(89^\circ) &= 57.28996163\\ \tan(89.9^\circ)&=572.9572134\\ \tan(89.99^\circ) &= 5729.577893\\ \tan(89.999^\circ) &= 57295.77951\end{array}

Each one is (about) ten times the previous. (With the TI-83, replacing the last result with each new one, I didn’t see the “about” until later.) This is kinda neat! His question to me was: WHY is this true?
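If there's no TI-83 nearby, the same experiment is a few lines of Python. The helper name is mine, and the digits will differ from the calculator's in the last places.

```python
import math

def tan_degrees(angle):
    """Tangent of an angle given in degrees."""
    return math.tan(math.radians(angle))

values = [tan_degrees(a) for a in (89, 89.9, 89.99, 89.999)]
ratios = [later / earlier for earlier, later in zip(values, values[1:])]
degrees_per_radian = 180 / math.pi

print(values)
print(ratios)              # each close to, but not exactly, 10
print(degrees_per_radian)  # close to, but not exactly, tan(89 degrees)
```

Printing the ratios side by side makes the "about" much harder to miss than it was on the calculator.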

Tangent is a function that accepts an angle and spits out a ratio of lengths. It seems weird that the answer for tan(89) looks like a number of degrees. It is a unitless output, though, and ~57 degrees per radian is also unitless, so I suppose this isn’t much of an issue. The question is, why does it appear that

\tan(89^\circ) = \cfrac{180^\circ}{\pi \text{ rad}}\text{ ?}

Ditching the degree measure,

\tan\left(\cfrac{\pi}{2}-\cfrac{\pi}{180}\right) \stackrel{?}{=} \cfrac{180}{\pi}

My problem was in trying to first tackle this question visually.

tan(89 degrees) is the length of the vertical line lying between an extended radius of a unit circle drawn 89 degrees from the horizontal and the right side of the circle. The numbers above make it seem like it is 180/\pi.

An equivalent image:

A triangle and circle \pi times larger have the same relative lengths.

I tried to explain this by imagining rolling the circle over the tangent line, wrapping the line around the circle, etc. I didn’t get anywhere.

I also tried considering the fact that there’s nothing particularly special about degree measure, except for the fact that 360 is an easy to divide number. Does this happen with other angle units? For example, what about a unit that was, instead of 1/180 of a half circle, a larger 1/100 of a half circle? We could instead take the equation above and ask,

\tan\left(\cfrac{\pi}{2}-\cfrac{\pi}{100}\right) \stackrel{?}{=} \cfrac{100}{\pi},

Is the tangent of one one-hundredth of a half circle short of \pi/2 equal to the number of those units in a single radian? It looks to be true!

\begin{array}{rl} \tan\left(\cfrac{\pi}{2}-\cfrac{\pi}{100}\right)&= 31.82051595\dots\\\\ \cfrac{100}{\pi} &=31.83098862\dots \end{array}

But this is only approximate. We could extend this to any fractional unit 1/n of a full circle (degrees are n = 360):

\tan\left(\cfrac{\pi}{2}-\cfrac{2\pi}{n}\right) \approx \cfrac{n}{2\pi}

Using this different unit, where the approximation is less accurate, I was able to see that the degree version wasn’t exactly true, either. It definitely looks like dividing the circle into a larger number (360, rather than 100) yields a closer approximation:

In the above, y= \tan\left(\cfrac{\pi}{2}-\cfrac{2\pi}{n}\right) (red) and y=\cfrac{n}{2\pi} (blue) converge for larger n (horizontal).

I was comfortable in concluding now that this wasn’t just a coincidence that relied on degree measure, and could extend this to include using 89.9, 89.99 etc degrees as well. In fact, tacking on .9s to the 1/nths of a circle units works. Just plugging in a bunch of numbers, it looks like

\tan\left(\cfrac{\pi}{2}-\cfrac{2\pi}{n}\ 10^{-a}\right) \approx \cfrac{n}{2\pi}\ 10^a

works for any n, and also extends to any power a, not just the integers.

y=\tan\left(\left(\frac{n}{4}-10^{-x}\right)\frac{2\pi}{n}\right) (red) and y=\frac{n}{2\pi}\cdot10^x (green) lie almost on top of each other for positive x (horizontal). In the link you can see this works for any n by fiddling with a slider.

The question remained, why is this true? Now that I saw it’s only an approximation, I realized that I should be going about this algebraically from the start.

A trick, called the small angle approximation, is used in physics often to get rid of pesky sines and tangents when you’d rather just have an expression with the angle inside.

\begin{array}{rl} \sin x &\approx x\\ \tan x& \approx x \qquad\text{when }x\ll1 \end{array}

This behavior is clear when the functions are written in their Taylor series form:

\begin{array}{rl} \sin x &= x - \cfrac{x^3}{6} + \cfrac{x^5}{120} - \cfrac{x^7}{5040} +\dots \\\\ \tan x &= x +\cfrac{x^3}{3} + \cfrac{2x^5}{15} +\cfrac{17x^7}{315}+\dots \end{array}

When x is real small, all the higher power terms get super small, and the approximation becomes more accurate.

This approximation was my first thought, but there’s a problem: it works for small angles, but my colleague’s puzzle was about angles near 90 degrees. In fact, we can’t even fudge the Taylor series of tangent near here, because there is no Taylor series around 90 degrees. (This is a consequence of the fact that tan(x) blows up to infinity at 90 degrees.)

The problem is solved by noting that working with tangent near 90 degrees is the same as working with another trig function, cotangent, near 0 degrees.

\tan\left(\cfrac{\pi}{2}-x\right) = \cot(x) = \cfrac{1}{\tan(x)}.

Setting everything up:

\begin{array}{rl} \tan\left(\cfrac{\pi}{2}-\cfrac{2\pi}{n}\ 10^{-a}\right) &=  \left(\tan\left(\cfrac{2\pi}{n}\ 10^{-a} \right) \right)^{-1} \\ & = \left(\left(\cfrac{2\pi}{n}\ 10^{-a} \right) +\cfrac{1}{3}\left(\cfrac{2\pi}{n}\ 10^{-a} \right)^3 +\dots \right)^{-1}\\ &\approx  \left(\cfrac{2\pi}{n}\ 10^{-a} \right)^{-1} \qquad \text{(the approximation)}\\ &= \cfrac{n}{2\pi}\ 10^a \end{array}

Done! Having the 10^a instead of any old number is unnecessary; this works for any multiple. However, integer a’s make the trick of having the same digits show up in tan(89), tan(89.9), etc. work.
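To see how good the approximation is for different units, here is a rough Python sketch. The function names are mine, n is the number of units in a full circle, and a is the power of ten from the derivation above.

```python
import math

def near_right_angle_tangent(n, a):
    """tan(pi/2 - (2*pi/n) * 10**-a): a scaled 1/n of a circle short of a right angle."""
    return math.tan(math.pi / 2 - (2 * math.pi / n) * 10 ** -a)

def approximation(n, a):
    """The small-angle result (n / (2*pi)) * 10**a."""
    return n / (2 * math.pi) * 10 ** a

def relative_error(n, a):
    exact = near_right_angle_tangent(n, a)
    return abs(approximation(n, a) - exact) / exact

for n in (100, 360, 1000):
    print(n, relative_error(n, 0), relative_error(n, 0.5), relative_error(n, 2))
```

The error shrinks as either n or a grows, which matches the next term of the tangent series: it is roughly the small angle squared, over three.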

So, we can show this algebraically. I just wish I had a nice geometric argument.

More on the circular solution to the intersection of two perpendicular lines

In the last post, I showed that the intersection of two perpendicular lines must lie on a circle, so long as the lines are each forced to go through particular points. The final result was a parameterization based on the classic cosine, sine version of a circle, but the bit I found more interesting was the earlier form:

(x,y) = \left(\cfrac{am^2+(d-c)m+b}{m^2+1},\cfrac{dm^2+(b-a)m+c}{m^2+1}\right), \enspace m\in \mathbb{R}

One of the results of the parameterization was that the angle at which the point lay on the circle was not the angle that one of the lines made with the x axis (unless the circle’s center was at the origin). This led to a phase shift in the parameterization from the angle of the line. If we were willing to lose a bit of information, the phase, we could also show that the above is a circle if it satisfies

\left(x-\cfrac{a+b}{2}\right)^2+\left(y-\cfrac{c+d}{2}\right)^2 = R^2

where

R^2 = \left(\cfrac{b-a}{2}\right)^2 + \left(\cfrac{d-c}{2}\right)^2

Since we already have the parameterization from above, showing this is true is just a matter of algebra. To start, add and subtract the coordinates of the center of the circle from x and y:

\begin{array}{c} x = \cfrac{am^2+(d-c)m+b}{m^2+1}-\cfrac{a+b}{2}+\cfrac{a+b}{2}\\\\ y=\cfrac{dm^2+(b-a)m+c}{m^2+1}-\cfrac{c+d}{2}+\cfrac{c+d}{2}\end{array}

Finding a common denominator and carefully combining gives

\begin{array}{c} x = \cfrac{(a-b)m^2+2(d-c)m+b-a}{2m^2+2}+\cfrac{a+b}{2}\\\\ y=\cfrac{(d-c)m^2+2(b-a)m+c-d}{2m^2+2}+\cfrac{c+d}{2}\end{array}

We now have a form that, when plugged into the LHS for the circle equation, cancels out the center point coordinates.

\left(x-\cfrac{a+b}{2}\right)^2+\left(y-\cfrac{c+d}{2}\right)^2 = \left(\cfrac{(a-b)m^2+2(d-c)m+(b-a)}{2m^2+2}\right)^2+\left(\cfrac{(d-c)m^2+2(b-a)m+(c-d)}{2m^2+2}\right)^2

The RHS of this thing is a bit easier to work with by letting

p = b-a \quad\text{and}\quad q=d-c.

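It becomes

\left(\cfrac{-pm^2+2qm+p}{2m^2+2}\right)^2+\left(\cfrac{qm^2+2pm-q}{2m^2+2}\right)^2

Careful manipulation yields

\cfrac{(p^2+q^2)(m^2+1)^2}{4(m^2+1)^2}

Nicely, the m’s cancel out completely. This becomes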

\cfrac{p^2+q^2}{4} = \left(\cfrac{p}{2}\right)^2+\left(\cfrac{q}{2}\right)^2 = \left(\cfrac{b-a}{2}\right)^2+\left(\cfrac{d-c}{2}\right)^2
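If the careful manipulation feels like too much care, exact rational arithmetic can at least spot-check the cancellation. This Python sketch uses p = b - a and q = d - c as above. The function name and the sample numbers are mine.

```python
from fractions import Fraction

def offset_squared(p, q, m):
    """Squared distance from the center, using p = b - a and q = d - c."""
    denom = 2 * m * m + 2
    dx = (-p * m * m + 2 * q * m + p) / denom   # x minus (a+b)/2
    dy = (q * m * m + 2 * p * m - q) / denom    # y minus (c+d)/2
    return dx * dx + dy * dy

p, q = Fraction(3), Fraction(-5)
for m in (Fraction(0), Fraction(1, 2), Fraction(-7, 3), Fraction(10)):
    print(m, offset_squared(p, q, m))   # the same value for every slope m
```

Fractions keep everything exact, so equal really means equal here, with no rounding to hide behind.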


Parameterizing a circle with the intersection point of two perpendicular lines.

I’ve been really taken with Desmos, an online calculator and easy to use graphing tool. My students have been using it for some time, and I’m especially happy with the “slider” tool that it offers. Whenever you put a letter into a function while graphing, it suggests a value to assign it, and lets you tune that value with the slider. This tool is similar to Mathematica’s Manipulate or Animate functions, which I’ve had success using in previous classes to show how a function depends on its parameters.

My year-long teacher’s Mathematica license recently expired, making it a bit tougher to install on a new device. While I do have access to an unsupported copy, Desmos has more than replaced M-ca for any of my presentation needs.

In a recent class, we were playing around with linear systems and intersecting lines. To show that a negative reciprocal slope leads to a perpendicular line, I assigned a slider to the value m, and made two linear equations with slopes of m and -1/m:

y=mx \qquad\text{and}\qquad y=-\cfrac{1}{m}\ x

The slider has the nice effect of letting you rotate the lines to see that they’re always perpendicular.

Play with it yourself, why don’t ya. You can animate or adjust the slope with the m slider on the left.

The kids were delighted by the pinwheel spinning of the lines as the slope was adjusted. To show that we weren’t limited to lines that passed through the origin, I tacked on a y-intercept to both of the equations, and asked the students, what do you think happens when I adjust the slope now?

My point was to show that the lines remain perpendicular. I would have been pleased to hear that the students could also predict that the point of intersection of the two lines would now move around, instead of be fixed at the origin.

One student went further, however: he was able to predict that the point of intersection of the two lines will always be fixed to a circle.

You can adjust the slope once again with the m slider, as well as the points the lines are forced to pass through using the other sliders. Adjusting only the slope m keeps the point of intersection on the same circle.


The student had seen a connection to his geometry class from the previous year. An inscribed angle is half the measure of the intercepted arc. An angle inscribing half the circle must then be a right angle. What this student had realized was the converse of this statement: that a right angle, formed by two perpendicular lines each forced to pass through particular points, must lie on a circle, and those two points are the endpoints of a diameter of that circle. I thought this was awfully insightful!

I figured it would be neat to try to show that this must be true on my own. Solving the system

\left\{ \begin{array}{c} y =m (x-a)+c\\\\ y=-\frac{1}{m}\ (x-b)+d  \end{array} \right.

gives the point

\left(\frac{a\thinspace m^2+\left(d-c\right)m+b}{m^2+1},\frac{d\thinspace m^2+\left(b-a\right)m+c}{m^2+1}\right)

I thought this was really neat. We haven’t shown that this point lies on a circle yet, but assuming it does, it shows a way to parameterize a circle with m as the ratio of quadratics. Maybe this is something a mathematician would immediately recognize, but it’s new to me!
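Before trusting that point, it's worth checking that it actually sits on both lines. A small Python sketch follows. The function name is mine, and Fractions are used so the comparison is exact.

```python
from fractions import Fraction

def intersection(a, b, c, d, m):
    """Intersection of y = m(x-a)+c and y = -(x-b)/m + d, for m != 0."""
    x = (a * m * m + (d - c) * m + b) / (m * m + 1)
    y = (d * m * m + (b - a) * m + c) / (m * m + 1)
    return x, y

a, b, c, d = Fraction(1), Fraction(4), Fraction(2), Fraction(-3)
m = Fraction(2, 3)
x, y = intersection(a, b, c, d, m)
print(y == m * (x - a) + c, y == -(x - b) / m + d)
```

The first line passes through (a, c) and the second through (b, d), which is why those two points end up on the circle as well.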

To show this does lie on a circle, I need to find an appropriate transformation that turns the above into the more familiar

\left( R\cos\theta + x_1 , R\sin\theta + y_1\right)

for a circle of radius R and center (x_1,y_1) . The obvious choice is to connect the slope of one of the lines to the angle on the circle:

m \rightarrow \tan\theta

The parameterization becomes

\left(\frac{a\thinspace \tan^2\theta+\left(d-c\right)\tan\theta+b}{\tan^2\theta+1},\frac{d\thinspace \tan^2\theta+\left(b-a\right)\tan\theta+c}{\tan^2\theta+1}\right)

This is where all your trig identities pay off. Those denominators become squared secants, letting you get rid of the fractions altogether.

\left(\frac{a\thinspace \tan^2\theta+\left(d-c\right)\tan\theta+b}{\sec^2\theta},\frac{d\thinspace \tan^2\theta+\left(b-a\right)\tan\theta+c}{\sec^2\theta}\right)

\bigg(\enspace a\thinspace \sin^2\theta+\left(d-c\right)\sin\theta\cos\theta+b\cos^2\theta\quad,\quad d\thinspace \sin^2\theta+\left(b-a\right)\sin\theta\cos\theta+c\cos^2\theta\enspace\bigg)

I’m having a bit of difficulty with formatting here. I’ll have to just write it like so:

\begin{array}{c} x=a\thinspace \sin^2\theta+\left(d-c\right)\sin\theta\cos\theta+b\cos^2\theta \\\\y=d\thinspace \sin^2\theta+\left(b-a\right)\sin\theta\cos\theta+c\cos^2\theta\end{array}

The middle bits of these should pop out: a sine times a cosine is a part of one of the double angle formulas:

2\sin\theta\cos\theta = \sin2\theta

While we’re tossing in sine of a double angle, we might as well introduce the cosine of the double angle as well. This shows up from the squares:

\sin^2\theta = \cfrac{1-\cos2\theta}{2} \qquad \text{and} \qquad \cos^2\theta=\cfrac{1+\cos2\theta}{2}

Our parameterization becomes

\begin{array}{c} x=a\thinspace\left(\cfrac{1-\cos2\theta}{2}\right) +\left(\cfrac{d-c}{2}\right)\sin 2\theta+b\left(\cfrac{1+\cos2\theta}{2}\right) \\\\y=d\thinspace \left(\cfrac{1-\cos2\theta}{2}\right)+\left(\cfrac{b-a}{2}\right)\sin2\theta+c\left(\cfrac{1+\cos2\theta}{2}\right)\end{array}

What’s neat about this is that the center of the circle now falls out as a constant term at the end, and we’ve maintained some kind of symmetry with the sines and cosines.

\begin{array}{c} x= \left(\cfrac{b-a}{2}\right)\cos2\theta +  \left(\cfrac{d-c}{2}\right)\sin2\theta +  \left(\cfrac{a+b}{2}\right)\\\\ y= - \left(\cfrac{d-c}{2}\right)\cos2\theta +  \left(\cfrac{b-a}{2}\right)\sin2\theta + \left(\cfrac{c+d}{2}\right)\end{array}
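With this many identities in a row it's easy to drop a sign, so here is a numerical sanity check in Python: the double-angle form should land on the same point as the original ratio of quadratics when m = tan θ. The function names and test values are mine.

```python
import math

def rational_form(a, b, c, d, m):
    """The intersection point written with the slope m."""
    x = (a * m * m + (d - c) * m + b) / (m * m + 1)
    y = (d * m * m + (b - a) * m + c) / (m * m + 1)
    return x, y

def double_angle_form(a, b, c, d, theta):
    """The same point written with cos(2 theta) and sin(2 theta)."""
    cos2, sin2 = math.cos(2 * theta), math.sin(2 * theta)
    x = (b - a) / 2 * cos2 + (d - c) / 2 * sin2 + (a + b) / 2
    y = -(d - c) / 2 * cos2 + (b - a) / 2 * sin2 + (c + d) / 2
    return x, y

a, b, c, d = 1.0, 4.0, 2.0, -3.0
for theta in (-1.2, -0.3, 0.4, 1.1):
    print(rational_form(a, b, c, d, math.tan(theta)),
          double_angle_form(a, b, c, d, theta))
```

The pairs agree up to floating point rounding, so the center and the sine and cosine coefficients above are at least consistent with where we started.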

Here’s where my trig knowledge stopped. The sine and cosines can be combined, though: a linear combination of sine and cosine should leave a single sine curve, but with a phase angle tossed in.

w \cos\theta + u\sin\theta = \sqrt{w^2+u^2}\enspace \sin\left(\theta+\arctan\frac{w}{u}\right)

This is great! We’ve got a way to combine the a, b, c, ds to get something looking like a radius.

\begin{array}{c} x = R \sin\left(2\theta + \arctan\left(\cfrac{b-a}{d-c}\right)\right) +\left(\cfrac{a+b}{2}\right) \\\\ y =R \sin\left(2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)\right) +\left(\cfrac{c+d}{2}\right)\end{array}

where

R = \sqrt{\left(\cfrac{b-a}{2}\right)^2+\left(\cfrac{d-c}{2}\right)^2}

At this stage, all we need to do is turn that first sine into a cosine (using \sin\theta = \cos\left(\theta-\frac{\pi}{2}\right)).

\begin{array}{c} x = R \cos\left(2\theta + \arctan\left(\cfrac{b-a}{d-c}\right)-\cfrac{\pi}{2}\right) +\left(\cfrac{a+b}{2}\right) \\\\ y =R \sin\left(2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)\right) +\left(\cfrac{c+d}{2}\right)\end{array}

We’re left with one remaining question: are the phase angles the same?

\arctan\left(\cfrac{b-a}{d-c}\right)-\cfrac{\pi}{2} \quad \stackrel{?}{=}\quad \arctan\left(\cfrac{c-d}{b-a}\right)

A couple more identities that I don’t have memorized clear this up (the second holds for x > 0):

\arctan(-x) = -\arctan(x) \qquad\text{and}\qquad \arctan\left(\cfrac{1}{x}\right) - \cfrac{\pi}{2} = -\arctan(x)

\longrightarrow \arctan(-x) = \arctan\left(\cfrac{1}{x}\right) - \cfrac{\pi}{2}

This answers the question above: yes! Our parameterization is

\begin{array}{c} x = R \cos\left(2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)\right) +\left(\cfrac{a+b}{2}\right) \\\\ y =R \sin\left(2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)\right) +\left(\cfrac{c+d}{2}\right)\end{array}

If we wanted to make it a bit nicer, replace:

\phi = 2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)

and we get a nice

\begin{array}{c} x = R \cos\phi +\left(\cfrac{a+b}{2}\right) \\\\ y =R \sin\phi +\left(\cfrac{c+d}{2}\right),\end{array}

a circle centered at x=(a+b)/2 and y=(c+d)/2. Woof! Bark bark! Woof woof bark!

Body-fixed coordinate systems

A good friend of mine responded to the fractal drawing instructions.

When I did dragon curves, I’d use U->L->D->R, so I don’t get confused when I am moving down and turning left becomes turning right.

My list of Ls and Rs can be more verbosely written like, “Draw forward, then turn left, then draw forward (now that you’re facing left), then turn right, then draw forward, then turn right again,” and so on. Alex’s directions would go more like, “Draw a line upwards (relative to the page), then draw to the left, then draw a line down, then draw to the right,” etc. Alex’s instructions recognize the fact that, while you’re busy drawing the dragon curve, your view of the page is not actually changing. Up, down, left, and right always point in the same direction, regardless of the last move. In my set of instructions, L and R don’t mean the same thing: you have to keep track of the direction your line is “facing,” the last direction you drew in, in order to turn left or right. The ULDR alphabet is probably a lot better for telling someone how to draw a given iteration of the dragon curve (although it doesn’t occur to me how I’d go about generating those instructions).
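For what it's worth, translating a list of turns into page directions is mechanical once you keep track of the heading. Here is a sketch in Python of that bookkeeping. The function name is mine, and I'm assuming the first segment is drawn upward.

```python
# Counterclockwise order on the page: a left turn steps forward in this
# string, a right turn steps backward.
HEADINGS = "ULDR"

def turns_to_absolute(turns, start="U"):
    """Convert turns like "LRR" into page directions, one per drawn segment."""
    index = HEADINGS.index(start)
    path = [HEADINGS[index]]            # the first segment uses the starting heading
    for turn in turns:
        index = (index + (1 if turn == "L" else -1)) % 4
        path.append(HEADINGS[index])
    return "".join(path)

# "Draw forward, turn left, draw forward, turn right, draw forward,
# turn right again" and draw once more:
print(turns_to_absolute("LRR"))  # ULUR
```

The only state the function carries is the current heading, which is exactly the thing the L and R instructions force you to remember.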

This difference between the two kinds of instructions shows up whenever we are describing an object moving around in space. We probably see it the most often when figuring out directions. The game everyone has to play when looking at a map is in connecting the fixed, objective orientation of a map to their own personal situation. In order to start figuring out which way to go, you have to figure out where you are, as well as ask, “which direction am I facing?” You then have to translate a set of directions on the objective map to a set of subjective directions, ones that you will have to execute from your point of view.

As a car travels through the city streets, its own directions (the body-fixed coordinates) change, while the more objective coordinate compass rose stays the same. My LR instructions for drawing the dragon fractal are like the car’s system, while the UDLR instructions are like following the compass rose directions.

In the drawing, you might start describing your car’s path with, “drive two blocks east, then three blocks south, then two blocks east, then two blocks north.” You’d need to use the fixed system. But when executing your drive, you need to think in terms of your subjective viewpoint, “drive two blocks forward, then turn right, then drive three blocks, then turn left, drive two blocks, turn left again and drive two blocks.”
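The translation from compass directions to the driver's point of view can be sketched in a few lines of Python. This assumes the car starts out already facing along the first leg and that the route is not empty. The names `COMPASS`, `TURNS`, and `compass_to_driver` are illustrative only.

```python
COMPASS = ["N", "E", "S", "W"]  # clockwise, so +1 is a quarter turn to the right
TURNS = {1: "turn right", 2: "turn around", 3: "turn left"}

def compass_to_driver(legs):
    """Convert (compass_direction, blocks) legs into driver instructions."""
    steps = []
    heading = COMPASS.index(legs[0][0])  # assumed: car already faces the first leg
    for direction, blocks in legs:
        new_heading = COMPASS.index(direction)
        quarter_turns = (new_heading - heading) % 4
        if quarter_turns:  # 0 means keep going straight
            steps.append(TURNS[quarter_turns])
        steps.append(f"drive {blocks} forward")
        heading = new_heading
    return steps

route = [("E", 2), ("S", 3), ("E", 2), ("N", 2)]
print(compass_to_driver(route))
```

For the route above, this reproduces the right, left, left sequence of turns from the driver's version of the trip.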

The directions we attach to the rotating, traveling car, Forward, Backwards, Left, and Right, are what you’d call in physics a body-fixed frame of reference, or a fixed-body coordinate system. The system we attached to the Earth, including North, South, East, and West, is an example of a space-fixed frame of reference, or an inertial coordinate system.

The difference between the objective and the body-fixed coordinate systems is pretty important in physics. The game plan in most situations involves first finding an objective coordinate system in which the laws of physics are simplest. There might also be plenty of cases when someone would be interested in the alternative, a body-fixed system, for instance, to verify that a person driving around still sees the same universe as the rest of us standing on the side of the road.

The car example is not too hard to wrap our heads around, since the car can only rotate around a vertical axis. It’s the same direction whether you’re using the compass rose or the car-fixed system: Up. When dealing with objects that can rotate in all three dimensions, though, things get mucky. Describing rotations in 3D can get complicated when making a distinction between body-fixed coordinates and space-fixed coordinates.

From Wikipedia’s Euler Angles page: a 3D object mixes up the directions of its axes of rotation whenever it rotates.
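One way to see the muckiness is that 3D rotations about different axes don't commute. Here is a small Python sketch using quarter turns about the space-fixed x and z axes (right-hand rule). The function names and the choice of a "nose" vector along x are assumptions made just for this example.

```python
def rot_x(v):
    """Rotate vector v by 90 degrees about the space-fixed x axis."""
    x, y, z = v
    return (x, -z, y)

def rot_z(v):
    """Rotate vector v by 90 degrees about the space-fixed z axis."""
    x, y, z = v
    return (-y, x, z)

nose = (1, 0, 0)  # hypothetical object whose nose starts along x

x_then_z = rot_z(rot_x(nose))
z_then_x = rot_x(rot_z(nose))

print(x_then_z)  # (0, 1, 0)
print(z_then_x)  # (0, 0, 1)
```

The same two rotations, applied in a different order, leave the nose pointing in different directions. A car never runs into this because all of its turns are about the same vertical axis.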



You might be thinking, “I don’t plan my car trips. I ask the GPS on my phone.” Well, sure, okay. FINE. The phone can detect where you are and which direction you’re traveling, and give you a set of instructions having translated NSEW to FBLR.

Google Maps on my Android can show my car directions either from the car’s point of view (with a neat 3D perspective camera following the car) or from overhead, with north fixed in a certain direction. I kind of prefer the latter, which, like my choice of dragon fractal instructions, means I would rather put that extra step on myself. I’m not sure what that means.

As a kid, I’d occasionally get the chance to go to my uncle’s house and play Sonic the Hedgehog. I loved it. One day, another uncle of mine, his brother-in-law, traded Sonic for a copy of Electronic Arts’ Desert Strike.

I do remember the intro having some pretty sweet licks.

I was pretty annoyed as a young kid. This slow, Gulf War-themed, objective-based helicopter mission simulator had nothing like Sonic whipping across colorful, futuristic, robot populated levels. But I did play it a bit.



Desert Strike’s Apache in action. (Super Adventures in Gaming)

The helicopter was controlled from an isometric viewpoint above, using a body-fixed system. No matter which direction you were facing, pressing Up on the directional pad made the chopper move forward. It could still look like the chopper was flying to the left on the screen. Pressing Left and Right made it rotate. This can be a bit confusing, since the controls act as if the player were in the cockpit, while the view is fixed from above. (The arcade game Asteroids is another example with a body-fixed control system, and probably a better example since it was way more popular and came first.)

Some years later, as an adult, I got a copy of Desert Strike of my own and did enjoy it. There was a way to change the directional controls, so that Up was always Up, etc. But I preferred the body-fixed controls.

This is a third example, after the dragon curve instructions and the Google Maps orientation, in which I prefer what seems to be the more difficult way of viewing the system. I must be MESSED UP.

Pareto Principle

The fractal drawing period went well enough. I suggested to the students that the instruction sheets increased in difficulty (when in the same order as in the previous post). I expected most to go for the Sol LeWitt drawing. In fact, that was kind of a wash! I had really hoped to have a nice full-sized Wall Drawing 797 poster made by the kids. Most went straight for the dragon fractal. I can’t really blame them, since it looks so cool, but my instructions were unclear enough to make it difficult for most. The biggest issue was that most students didn’t pick up that each “elbow line” needed to be drawn so that it connected opposite corners of the squares on a piece of graph paper. Their dragon drawings ended up looking flat and floppy, and after only a couple iterations there wasn’t a whole lot left to work with.

Most of the kids were not thrilled to be doing any mindfulness activity. Many asked if I could ask the administration to never have a mindfulness advisory day again. There were, however, a few who quietly and attentively worked on drawing the curves, and a couple did quietly exclaim that the dragon curve was pretty cool.

I’ll keep things in perspective. My goal wasn’t to make them experts on drawing these figures, but to at least expose them so that they can recognize the ideas if they happen to come across them later. I’d also hope that a seed of interest has been planted in at least one student, so that they’d be motivated to read a bit more about these things on their own.

The ratio of interested kids to uninterested kids reminded me of something called the Pareto Principle. This rule of thumb asserts that, in a situation like this, I could expect 80% of the effort or participation to come from 20% of the students.

This is a really lousy model for the drawing activity, not only because effort isn’t a well-quantified value. Number of drawings, time spent with pencil to paper, some arbitrary standard for quality all could play into the measure of effort. In addition, the “4:1 for 1:4” Pareto rule is a goofy way of putting it; the statement that “20% do 80% of the work” can make one think that there are some students who are doing 4 or 5 times the work of another. In fact, it would mean that the hard-working students would be doing 16 times more than one from the larger population. This seems a bit high for a typical high school class.

An example of the Pareto principle for a group of five students making identical drawings. One student (20%) ends up making 16 / 20 = 80% of the total drawings.
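The factor of 16 is easy to check with exact fractions. This is a minimal Python sketch; the function name `per_person_ratio` and the list of drawing counts are hypothetical, chosen to match the five-student example.

```python
from fractions import Fraction

def per_person_ratio(people_share, work_share):
    """How many times more work one member of the busy group does
    than one member of everybody else."""
    busy = work_share / people_share
    rest = (1 - work_share) / (1 - people_share)
    return busy / rest

# "20% of the students do 80% of the work"
ratio = per_person_ratio(Fraction(1, 5), Fraction(4, 5))
print(ratio)  # 16

# Five students: one makes 16 drawings, the other four make 1 each.
drawings = [16, 1, 1, 1, 1]
top_share = Fraction(max(drawings), sum(drawings))
print(top_share)  # 4/5
```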

The Pareto rule might work a bit better in the business example, “20% of your customers will give you 80% of the sales,” but any decent business would hopefully make predictions based on their particular situation and history. Perhaps most hover around this distribution — it doesn’t seem too wild of an idea that a few die hard regulars are the ones keeping any given bar afloat.

The real benefit of this rule of thumb is to give that sense of perspective. It would have been unrealistic for my goal to have the room be lit with energy, all of the kids scrambling around in excitement because they were given some drawing instructions.  To expect 1/5 of the students to be truly interested might sound pessimistic, but in retrospect it’s a good place to start when making expectations. It might also be a good place to start in a brand-new business.

The rule is an instance of a Pareto distribution, which is a generalization that would be able to tell you exactly how much each student is producing (rather than big groups), as well as describe groups of students who are producing work at different ratios than 4:1. The benefits of being able to describe your system (be it schoolchildren, sales, volunteer participation, whatever) with a Pareto distribution would not only lie in how accurate the shape is, but also in knowing exactly the parameters that fine-tune that shape.
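As a sketch of what one of those parameters looks like: for a Pareto distribution with shape parameter \alpha > 1 (the symbol \alpha is my label here, it isn't used elsewhere in this post), the top fraction p of producers accounts for a share of the total output equal to

p^{\,1 - 1/\alpha}.

Setting p = 1/5 and asking for a share of 4/5 gives

\left(\cfrac{1}{5}\right)^{1 - 1/\alpha} = \cfrac{4}{5} \quad\longrightarrow\quad \alpha = \cfrac{\ln 5}{\ln 4} \approx 1.16,

so the 80/20 rule corresponds to one particular value of the shape parameter, and other values of \alpha give other ratios.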

The Pareto concept has other ways of showing itself. Zipf’s Law says that if you order the words in the English language by how often they’re used, the Nth word in the list will be 1/N as popular as the 1st. So, you see a very small number of words showing up a very large percentage of the time in writing. Like the Pareto principle, this is a generalization, and different people, regions, and documents will have variations on word popularity. These variations allow for statistical stylometry, as well as making sure every book in the library isn’t identical.
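To get a feel for how top-heavy that is, here is a small Python sketch that assumes word frequency is exactly proportional to 1/rank over a finite vocabulary. Both the function name `zipf_share` and the vocabulary size of 1000 are invented for illustration; real text only follows the rule approximately.

```python
from fractions import Fraction

def zipf_share(top_k, vocabulary_size):
    """Fraction of all word occurrences taken up by the top_k most common
    words, assuming frequency is exactly proportional to 1/rank."""
    weights = [Fraction(1, rank) for rank in range(1, vocabulary_size + 1)]
    return sum(weights[:top_k]) / sum(weights)

# Share of a text covered by the 10 most common words out of 1000
print(float(zipf_share(10, 1000)))
```

With these made-up numbers, the ten most common words account for a bit under 40% of all the words written.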

The Internet 1% Rule (or 1-9-90 Rule) is one that suggests that only about 1% of users on a website actively create content for that site. This doesn’t mean 1% of people who read are writing news articles, but it might mean that 1% of a news site’s readers are leaving comments. I’ve heard this referred to often in the realm of podcasts, where show hosts can expect about 1% of users to email in, or participate in a contest, etc.

Again, it’s key to note the error bars we’re willing to accept. If 2% rather than 1% of listeners responded to a call for podcast questions, the creators might not notice the difference. Using the Pareto principle as a very rough rule of thumb, as a suggestion for what to expect, is the way to make it work for you effectively.