Visualizing text as paths

I was reminded recently that I actually have a blog. I have a Cellular Automata Part II post that’s been sitting around since April, but here’s something quick before I get around to polishing that up.

During a prescribed Art Night, I decided to come up with a visualization of long texts. I grabbed a few examples of literature from Project Gutenberg as plain text files, and converted them to binary using each character’s 7-bit ASCII code. $a\qquad\to\qquad 97\qquad\to\qquad \begin{array}{ccccccc} 1 & 1 & 0 & 0 & 0 & 0 & 1 \\ \end{array}$

An entire word will then create a string of ones and zeroes 7 times longer than the word. The visualization comes in by turning this string into something like the output of a Lindenmayer system. A line is drawn, and at each step it reads the next bit. A one instructs the drawing to take a right turn, and a zero instructs it to take a left turn. $\text{baloney} \\ \strut\quad\downarrow\\ 98, 97, 108, 111, 110, 101, 121 \\ \strut\quad\downarrow \\ 1100010110000111011001101111110111011001011111001 \\ \strut\quad\downarrow \\ RRLLLRLRRLLLLRRRLRRLLRRLRRRRRRLRRRLRRLLRLRRRRRLLR$
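
The encoding is easy to sketch in Python. This is an illustration, not the Mathematica code used for the images; the function names `text_to_bits` and `bits_to_turns` are made up here, and the sketch assumes every character is plain 7-bit ASCII.

```python
def text_to_bits(text: str) -> str:
    """Concatenate the 7-bit ASCII code of each character."""
    bits = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            # Assumption: anything outside 7-bit ASCII is rejected, not re-encoded.
            raise ValueError(f"{ch!r} is not 7-bit ASCII")
        bits.append(format(code, "07b"))  # zero-padded to 7 binary digits
    return "".join(bits)


def bits_to_turns(bits: str) -> str:
    """Map each 1 to a right turn (R) and each 0 to a left turn (L)."""
    return bits.translate(str.maketrans("10", "RL"))


turns = bits_to_turns(text_to_bits("baloney"))
print(len(turns))  # 49
print(turns)
```

Note that a space (code 32) comes out as `0100000`, so unlike the letters it starts with a left turn.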

Starting from the bottom of an image, drawing upward, and taking these turns one at a time gives a blocky figure for “baloney”. There are 49 turns, one for each of the 7 bits in each of baloney’s 7 characters.
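
The drawing step can be sketched in Python too. Again this is an illustration rather than the original Mathematica: the name `walk` is made up, and the convention of turning first and then moving one unit is an assumption, since the post does not spell out the step rule.

```python
def walk(turns: str) -> list[tuple[int, int]]:
    """Return the lattice points visited, starting at (0, 0) and heading up."""
    x, y = 0, 0
    dx, dy = 0, 1  # initial heading: straight up the image
    points = [(x, y)]
    for t in turns:
        if t == "R":
            dx, dy = dy, -dx   # rotate the heading 90 degrees clockwise
        elif t == "L":
            dx, dy = -dy, dx   # rotate the heading 90 degrees counterclockwise
        else:
            raise ValueError(f"unexpected turn {t!r}")
        x, y = x + dx, y + dy  # assumed: one unit step after every turn
        points.append((x, y))
    return points


print(walk("RRLL"))
```

Four right turns in a row trace a unit square and land back on the starting point, which is a handy sanity check.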

This was pretty quick to implement. It initially took about 10 or 15 minutes to finish for single-MB size books using repeated AppendTos in Mathematica to construct a list of successive points along the path. A rewrite using Reap/Sow made these finish in less than a minute. I have no training in optimization, beyond hearing that the latter is supposed to be fast in M-ca. I suspect that AppendTo copies the entire list each time, while Reap/Sow does not. There are likely 800 other better languages in which to write this kind of thing, but hey! Heyo! Woo!

Gutenberg adds a header and footer to each text file, saying some standard stuff about where it was downloaded and when it was written, etc. For each of these, I removed everything except the nitty gritty, proper text. Leaving them in would only have tacked a bit onto the beginnings and ends, ultimately not changing the shapes much. The shapes would have been rotated, though.
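
If you would rather not trim the files by hand, something like the following Python sketch could do it. It assumes the file marks the body with lines beginning `*** START OF` and `*** END OF`, which is an assumption about the file layout rather than something checked against every Gutenberg text; the function name `strip_gutenberg` and the sample string are hypothetical.

```python
def strip_gutenberg(raw: str) -> str:
    """Keep only the text between the START and END marker lines.

    Assumes the marker lines begin with '*** START OF' and '*** END OF'.
    If either marker is missing, the text is returned unchanged.
    """
    lines = raw.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.startswith("*** START OF")), None)
    end = next((i for i, ln in enumerate(lines) if ln.startswith("*** END OF")), None)
    if start is None or end is None or end <= start:
        return raw
    return "\n".join(lines[start + 1:end]).strip()


# Hypothetical miniature file, just to show the behavior.
sample = (
    "Header line\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "\n"
    "Call me Ishmael.\n"
    "\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK SAMPLE ***\n"
    "License text\n"
)
print(strip_gutenberg(sample))
```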

Here are the results for 4 books. Here’s a zoomable pdf of the Frankenstein one. In each of these, a green point marks the beginning, and a red point the end. Frankenstein’s green point, for example, is in the lower right, and its red is on the ear of the evil head in the upper left. Boo! Frankenstein; Or, The Modern Prometheus. Mary Wollstonecraft (Godwin) Shelley.

It was really fun to get these results. I acknowledge that there is probably little meaning in these images from a stylometric point of view, but it is interesting to observe the differences. I’m sure these are very sensitive to format, the words themselves, and the ASCII codes for English letters. The Bible is loaded with many paragraph breaks and colons for each verse, in contrast to the long paragraphs of Ulysses. Since every letter’s 7-bit code begins with a right turn (1), each image is guaranteed to have some kind of curliness. A single typo wouldn’t change the shape of the subsequent curve, but it would lead to a rotation about that point.

So, is it just happenstance that Ulysses is so much more unfolded than Moby Dick? Is there something fundamentally different about the second half of Moby Dick that makes it hover around the end point?

An issue with the visualizations themselves is that they do not show any overlap as a line covers regions it has already followed. A way around this would be to assign an increasing height to each point, and show these paths as text tunnels in space. These could be seen as a 3D model or as projected images from different directions. Maybe I’ll try this out.

The full image files showing these walks are about 10x the size of the original text files. A text file storing the list of ordered pairs of the path is about 100x the original text file size. So, a foreseeable application of this technique could be to make text file sizes much larger, as well as difficult to read and process.

As a final thought, it was easy to call these random walks. Clearly these aren’t random in that they carry the meaning of the texts, but perhaps English letters appear random after being converted in this way. A way to test this is by looking at the distance from the starting point: a true random walk of $N$ steps ought to end up a distance of roughly $\sqrt{N}$ from the starting point. None of these seem to be plots of distance $= \sqrt{\text{steps}}$.
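
A rough way to run that test, sketched in Python. The names `end_distance` and `rms_random_distance` are made up for illustration, and the sketch assumes a unit step after each turn. It estimates the root-mean-square end distance of walks with coin-flip turns, which should come out close to $\sqrt{N}$; a book's own turn string can be passed to `end_distance` for comparison.

```python
import math
import random


def end_distance(turns: str) -> float:
    """Straight-line distance from start to end of a turn-then-step walk."""
    x, y, dx, dy = 0, 0, 0, 1
    for t in turns:
        # "R" turns clockwise; any other symbol is treated as a left turn.
        dx, dy = (dy, -dx) if t == "R" else (-dy, dx)
        x, y = x + dx, y + dy
    return math.hypot(x, y)


def rms_random_distance(n_steps: int, trials: int, seed: int = 0) -> float:
    """Root-mean-square end distance over many walks with random turns."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        turns = "".join(rng.choice("RL") for _ in range(n_steps))
        total += end_distance(turns) ** 2
    return math.sqrt(total / trials)


n = 400
print(rms_random_distance(n, trials=1000), math.sqrt(n))
```

Comparing a single book against an average over many random walks is only suggestive, since any one random walk can land well inside or outside $\sqrt{N}$.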

Parameterizing a circle with the intersection point of two perpendicular lines.

I’ve been really taken with Desmos, an online calculator and easy-to-use graphing tool. My students have been using it for some time, and I’m especially happy with the “slider” tool that it offers. Whenever you put a letter into a function while graphing, it suggests a value to assign it, and lets you tune that value with the slider. This tool is similar to Mathematica’s Manipulate or Animate functions, which I’ve had success using in previous classes to show how a function depends on its parameters.

My year-long teacher’s Mathematica license recently expired, making it a bit tougher to install on a new device. While I do have access to an unsupported copy, Desmos has more than replaced M-ca for any of my presentation needs.

In a recent class, we were playing around with linear systems and intersecting lines. To show that a negative reciprocal slope leads to a perpendicular line, I assigned a slider to the value m, and made two linear equations with slopes of m and -1/m: $y=mx \qquad\text{and}\qquad y=-\cfrac{1}{m}\ x$
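
One quick way to see why these slopes always give perpendicular lines (a sketch using direction vectors, which is not necessarily how it came up in class): a line of slope $m$ runs along the vector $(1, m)$, and a line of slope $-1/m$ runs along $(1, -1/m)$. For any nonzero $m$ their dot product vanishes: $(1,\ m)\cdot\left(1,\ -\cfrac{1}{m}\right) = 1 - \cfrac{m}{m} = 0$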

The slider has the nice effect of letting you rotate the lines to see that they’re always perpendicular. Play with it yourself, why don’t ya. You can animate or adjust the slope with the m slider on the left.

The kids were delighted by the pinwheel spinning of the lines as the slope was adjusted. To show that we weren’t limited to lines that passed through the origin, I tacked on a y-intercept to both of the equations, and asked the students, what do you think happens when I adjust the slope now?

My point was to show that the lines remain perpendicular. I would have been pleased to hear that the students could also predict that the point of intersection of the two lines would now move around, instead of being fixed at the origin.

One student went further, however: he was able to predict that the point of intersection of the two lines will always be fixed to a circle. You can adjust the slope once again, and use the sliders below to move the points the lines are forced to pass through. Adjusting only the slope m keeps the point of intersection on a single circle.

The student had seen a connection to his geometry class from the previous year. An inscribed angle is half the measure of the intercepted arc. An angle inscribing half the circle must then be a right angle. What this student had realized was the converse of this statement: that a right angle, formed by two perpendicular lines each forced to pass through particular points, must lie on a circle, and those two points are the endpoints of a diameter of that circle. I thought this was awfully insightful!

I figured it would be neat to try to show that this must be true on my own. Solving the system $\left\{ \begin{array}{c} y =m (x-a)+c\\\\ y=-\frac{1}{m}\ (x-b)+d \end{array} \right.$

gives the point $\left(\frac{a\thinspace m^2+\left(d-c\right)m+b}{m^2+1},\frac{d\thinspace m^2+\left(b-a\right)m+c}{m^2+1}\right)$
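
A quick sanity check of that solution, sketched in Python with exact fractions so there is no rounding to worry about. The function name `intersection` and the sample values are made up for illustration.

```python
from fractions import Fraction


def intersection(m, a, b, c, d):
    """Intersection of y = m(x - a) + c and y = -(x - b)/m + d, for nonzero m."""
    x = (a * m**2 + (d - c) * m + b) / (m**2 + 1)
    y = (d * m**2 + (b - a) * m + c) / (m**2 + 1)
    return x, y


# Hypothetical sample values: lines through (1, 2) and (4, 3), slope m = 1/2.
m, a, b, c, d = Fraction(1, 2), 1, 4, 2, 3
x, y = intersection(m, a, b, c, d)
print(x, y)
# The point should satisfy both line equations exactly.
print(y == m * (x - a) + c, y == -(x - b) / m + d)
```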

I thought this was really neat. We haven’t shown that this point lies on a circle yet, but assuming it does, it shows a way to parameterize a circle by m, with each coordinate a ratio of quadratics in m. Maybe this is something a mathematician would immediately recognize, but it’s new to me!

To show this does lie on a circle, I need to find an appropriate transformation that turns the above into the more familiar $\left( R\cos\theta + x_1 , R\sin\theta + y_1\right)$

for a circle of radius $R$ and center $(x_1,y_1)$. The obvious choice is to connect the slope of one of the lines to the angle on the circle: $m \rightarrow \tan\theta$

The parameterization becomes $\left(\frac{a\thinspace \tan^2\theta+\left(d-c\right)\tan\theta+b}{\tan^2\theta+1},\frac{d\thinspace \tan^2\theta+\left(b-a\right)\tan\theta+c}{\tan^2\theta+1}\right)$

This is where all your trig identities pay off. Those denominators become squared secants, letting you get rid of the fractions altogether. $\left(\frac{a\thinspace \tan^2\theta+\left(d-c\right)\tan\theta+b}{\sec^2\theta},\frac{d\thinspace \tan^2\theta+\left(b-a\right)\tan\theta+c}{\sec^2\theta}\right)$ $\bigg(\enspace a\thinspace \sin^2\theta+\left(d-c\right)\sin\theta\cos\theta+b\cos^2\theta\quad,\quad d\thinspace \sin^2\theta+\left(b-a\right)\sin\theta\cos\theta+c\cos^2\theta\enspace\bigg)$

I’m having a bit of difficulty with formatting here. I’ll have to just write it like so: $\begin{array}{c} x=a\thinspace \sin^2\theta+\left(d-c\right)\sin\theta\cos\theta+b\cos^2\theta \\\\y=d\thinspace \sin^2\theta+\left(b-a\right)\sin\theta\cos\theta+c\cos^2\theta\end{array}$

The middle bits of these should pop out: a sine times a cosine is a part of one of the double angle formulas: $2\sin\theta\cos\theta = \sin2\theta$

While we’re tossing in sine of a double angle, we might as well introduce the cosine of the double angle as well. This shows up from the squares: $\sin^2\theta = \cfrac{1-\cos2\theta}{2} \qquad \text{and} \qquad \cos^2\theta=\cfrac{1+\cos2\theta}{2}$

Our parameterization becomes $\begin{array}{c} x=a\thinspace\left(\cfrac{1-\cos2\theta}{2}\right) +\left(\cfrac{d-c}{2}\right)\sin 2\theta+b\left(\cfrac{1+\cos2\theta}{2}\right) \\\\y=d\thinspace \left(\cfrac{1-\cos2\theta}{2}\right)+\left(\cfrac{b-a}{2}\right)\sin2\theta+c\left(\cfrac{1+\cos2\theta}{2}\right)\end{array}$

What’s neat about this is that the center of the circle now falls out as a constant term at the end, and we’ve maintained some kind of symmetry with the sines and cosines. $\begin{array}{c} x= \left(\cfrac{b-a}{2}\right)\cos2\theta + \left(\cfrac{d-c}{2}\right)\sin2\theta + \left(\cfrac{a+b}{2}\right)\\\\ y= - \left(\cfrac{d-c}{2}\right)\cos2\theta + \left(\cfrac{b-a}{2}\right)\sin2\theta + \left(\cfrac{c+d}{2}\right)\end{array}$
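
Before going further, it is reassuring to confirm numerically that the double-angle form agrees with the ratio-of-quadratics form when $m=\tan\theta$. A minimal Python sketch, with made-up function names and sample values:

```python
import math


def rational_form(m, a, b, c, d):
    """Intersection point written as ratios of quadratics in m."""
    x = (a * m**2 + (d - c) * m + b) / (m**2 + 1)
    y = (d * m**2 + (b - a) * m + c) / (m**2 + 1)
    return x, y


def double_angle_form(theta, a, b, c, d):
    """The same point written with cos(2*theta) and sin(2*theta)."""
    c2, s2 = math.cos(2 * theta), math.sin(2 * theta)
    x = (b - a) / 2 * c2 + (d - c) / 2 * s2 + (a + b) / 2
    y = -(d - c) / 2 * c2 + (b - a) / 2 * s2 + (c + d) / 2
    return x, y


a, b, c, d = 1.0, 4.0, 2.0, 3.0  # hypothetical sample values
for theta in (-1.2, -0.3, 0.4, 1.1):
    p = rational_form(math.tan(theta), a, b, c, d)
    q = double_angle_form(theta, a, b, c, d)
    print(theta, math.isclose(p[0], q[0]), math.isclose(p[1], q[1]))
```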

Here’s where my trig knowledge stopped. The sines and cosines can be combined, though: a linear combination of sine and cosine should leave a single sine curve, but with a phase angle tossed in. For $u>0$, $w \cos\theta + u\sin\theta = \sqrt{w^2+u^2}\enspace \sin\left(\theta+\arctan\frac{w}{u}\right)$

This is great! We’ve got a way to combine the a, b, c, ds to get something looking like a radius (taking $b>a$ and $d>c$, so that the identity applies as written). $\begin{array}{c} x = R \sin\left(2\theta + \arctan\left(\cfrac{b-a}{d-c}\right)\right) +\left(\cfrac{a+b}{2}\right) \\\\ y =R \sin\left(2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)\right) +\left(\cfrac{c+d}{2}\right)\end{array}$

with $R = \sqrt{\left(\cfrac{b-a}{2}\right)^2+\left(\cfrac{d-c}{2}\right)^2}$

At this stage, all we need to do is turn that first sine into a cosine (using $\sin\theta = \cos\left(\theta-\frac{\pi}{2}\right)$). $\begin{array}{c} x = R \cos\left(2\theta + \arctan\left(\cfrac{b-a}{d-c}\right)-\cfrac{\pi}{2}\right) +\left(\cfrac{a+b}{2}\right) \\\\ y =R \sin\left(2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)\right) +\left(\cfrac{c+d}{2}\right)\end{array}$

We’re left with one remaining question: are the phase angles the same? $\arctan\left(\cfrac{b-a}{d-c}\right)-\cfrac{\pi}{2} \quad \stackrel{?}{=}\quad \arctan\left(\cfrac{c-d}{b-a}\right)$

A couple more identities that I don’t have memorized clear this up (the second one holds for $x>0$): $\arctan(-x) = -\arctan(x) \qquad\text{and}\qquad \arctan\left(\cfrac{1}{x}\right) - \cfrac{\pi}{2} = -\arctan(x)$ $\longrightarrow \arctan(-x) = \arctan\left(\cfrac{1}{x}\right) - \cfrac{\pi}{2}$

Setting $x = \cfrac{d-c}{b-a}$ answers the question above: yes! Our parameterization is $\begin{array}{c} x = R \cos\left(2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)\right) +\left(\cfrac{a+b}{2}\right) \\\\ y =R \sin\left(2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)\right) +\left(\cfrac{c+d}{2}\right)\end{array}$

If we wanted to make it a bit nicer, replace: $\phi = 2\theta + \arctan\left(\cfrac{c-d}{b-a}\right)$

and we get a nice $\begin{array}{c} x = R \cos\phi +\left(\cfrac{a+b}{2}\right) \\\\ y =R \sin\phi +\left(\cfrac{c+d}{2}\right),\end{array}$

a circle centered at x=(a+b)/2 and y=(c+d)/2. Woof! Bark bark! Woof woof bark!
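
As a cross-check on all that trig, here is a shorter route to the same circle. It is a sketch that is not part of the derivation above: it uses only the fact that the two lines are perpendicular, so the vectors from the intersection point $(x,y)$ to the fixed points $(a,c)$ and $(b,d)$ have zero dot product. Completing the square in $x$ and in $y$ then gives the same center and the same $R$: $\begin{array}{c} (x-a)(x-b)+(y-c)(y-d)=0 \\\\ \left(x-\cfrac{a+b}{2}\right)^2+\left(y-\cfrac{c+d}{2}\right)^2=\left(\cfrac{b-a}{2}\right)^2+\left(\cfrac{d-c}{2}\right)^2 = R^2 \end{array}$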

Hello, This is an Blog

This is my magic LiveJournal. It is real. It comes with a nice default picture which I am deciding to leave here. I can choose to write ABOVE the pictures as well as BELOW the picture. You can see that this gives me a number of options. Two of them. Please record that in your Blog Notebook now.