Alan B. Scrivener
Human Interface Prototypes
abs@well.com
27 June 2004
By representing a social matrix as a binary triangular array, and then permuting rows and columns to minimize the perimeter separating ones and zeroes, favoring ones in the upper left corner, a notion of a canonical form for the matrix has been devised. This has then been used as a starting point for adding dynamic display of multivariable information in a variety of ways.
Recent decades have seen many advances in the analysis and visualization of networks, both generalized mathematical networks as well as real-world computer networks, social networks, knowledge networks and the like. [BUCHANAN2003].
Many recent advances have been made by vendors and contributors of software for computer network monitoring. [LINK1]
A cursory review of many of these offerings shows that most use variations on a visualization technique sometimes known as "data constellations," which involves representing nodes as circles or rectangles and the edges connecting them as line segments. Usually the most-connected "super-nodes" are shown near the middle of the diagram, and proximity is used where possible to indicate connectivity as well.
For example, this diagram of disease transmission uses the "data constellations" technique:
![]() disease outbreak visualization at PBS web site [LINK2] |
One disadvantage of this technique is that it works best with sparse networks. When the connections become sufficiently dense, the picture becomes so cluttered that it is difficult to trace individual connections.
For example, this diagram of connections between computers gives a rough idea of connection density, but does not easily communicate each link:
![]() internet topology visualization at SDSC web site [LINK3] |
(This author is reminded of a facetious map of the Los Angeles freeway system published in the humor magazine National Lampoon in March of 1972. [LINK4] As part of an article entitled "The Miracle of California," this map had a key indicating that each freeway was shown as a black line. The map was almost all black, with a few dozen thin triangular slivers of white.)
These traditional techniques have not been very good at the following:
It is the goal of this project to develop techniques which can better solve these visualization problems.
In the well-developed field of algebraic graph theory [BIGGS1994], a graph (network of nodes) is represented by a binary matrix, in which all values are either zero or one. Each row and column of the matrix corresponds to a node in the graph. Each matrix element corresponds to a possible edge (connection). If the graph is undirected (i.e., the edges have no arrowheads, and so are bi-directional), the matrix is symmetric along the diagonal from upper left to lower right. Furthermore, if self-connecting edges are disallowed, then all pertinent information can be represented in a triangular matrix of size n by n-1 (where n is the number of nodes), having n*(n-1)/2 elements, each representing a possible edge.
EXAMPLEFor example, consider the following four graphs, numbered one through four. First we describe them in words:
The above four descriptions completely describe the four graphs, and the presence or absense of all potential edges. Lets consider other representations of the same four graphs. We can draw a polygon of dots for the nodes and connect them with lines; this is sometimes called the string sculpture technique. Here are our four example graphs drawn using this technique, in this case with asterisks (*) for the nodes. Compare these representations with the above descriptions:
(1) (2) (3) (4)
A A A *---* B A *--* B
* * | | \/|
/ \ / | | /\|
B *---* C B *---* C D *---* C D *--* C
Another way to represent a graph is with a binary matrix, i.e., a matrix containing only ones and zeros. You begin with the matrix set to all zeros, and then, having each row and each column represent a node, indicate a connection between two nodes by setting a one at the row and column postition, and also the column and row postiition. For example, if A connects to B then you set a one at row A, column B and at row B, column A. Here in square matrix form are the four example graphs:
(1) || (2) || (3) || (4)
A B C || A B C || A B C D || A B C D
------ || ----- || -------- || --------
0 1 1 | A || 0 1 0 | A || 0 1 0 1 | A || 0 1 1 0 | A
1 0 1 | B || 1 0 1 | B || 1 0 1 0 | B || 1 0 1 1 | B
1 1 0 | C || 0 1 0 | C || 0 1 0 1 | C || 1 1 0 1 | C
|| || 1 0 1 0 | D || 0 1 1 0 | D
As mentioned above, and as you can see in these examples, each matrix has a zero diagonal from the upper left corner to lower right corner, and each is mirror-symmetric about that diagonal. For these reasons, the lower left half and the diagonal can be eliminated without losing information, leaving a triangular matrix. Here are the four example graphs in triangular matrix form:
(1) || (2) || (3) || (4)
|| || ||
B C || B C || B C D || B C D
1 1 A || 1 0 A || 1 0 1 A || 1 1 0 A
1 B || 1 B || 1 0 B || 1 1 B
|| || 1 C || 1 C
By inspecting each matrix we can see, for example, that graph (1) has all possible edges in place, graphs (2) and (4) have all but one, and graph (3) has all but two.
|
There are many more benefits to this notation. There is a powerful and elegant body of theory that uses algebraic manipulations to answer questions about any given graph, including how many cycles (closed loops) it has, whether it is a spanning tree (visiting each node once and only once), and so on. There are also isomorphisms to the theory of error-correcting codes invented by Richard W. Hamming of Bell Labs in 1940s. [HAMMING1986] [LINK5]
The current explorations are based on the author's flash of intuition, subsequently explored rigorously: that the triangular matrix representation can form the basis for a new family of graph visualization techniques.
One problem that the "data constellation" methods share with the triangular matrix representation is that the position of the nodes have no real geometric meaning. This is problematic because human perception is compelled to seek and find such meanings even where they may not exist. A side-effect is that there are many ways to represent a single graph which may look very different from each other. Therefore, we have been seeking some sort of "canonical representation" for a given graph, to enable comparisons to be made easily.
A second intuitive flash was the idea that it would be better if a matrix is more "blobby." By this we mean that the ones are clustered together. A more rigorous definition of this is that the perimeter between zeroes and ones is minimized. For example, consider this eleven node triangular matrix:
B C D E F G H I J K
0 1 0 0 1 0 0 1 0 0 A
1 0 1 1 0 1 1 0 0 B
1 1 1 1 1 1 0 0 C
0 0 0 1 1 0 0 D
1 1 1 1 0 0 E
0 1 1 0 0 F
1 1 0 0 G
1 0 0 H
1 1 I
0 J
First we add lines of just plus sign characters (+) and spaces, to emphasize the spaces between the digits:
B C D E F G H I J K
0 1 0 0 1 0 0 1 0 0 A
+ + + + + + + + + +
1 0 1 1 0 1 1 0 0 B
+ + + + + + + + +
1 1 1 1 1 1 0 0 C
+ + + + + + + +
0 0 0 1 1 0 0 D
+ + + + + + +
1 1 1 1 0 0 E
+ + + + + +
0 1 1 0 0 F
+ + + + +
1 1 0 0 G
+ + + +
1 0 0 H
+ + +
1 1 I
+ +
0 J
+
Then by adding minus (-) and vertical bar (|) characters we draw the perimeter between the zeroes and ones:
B C D E F G H I J K
0|1|0 0|1|0 0|1|0 0 A
+ + +-+ + +-+ + + +
1|0|1 1|0|1 1|0 0 B
+-+ + +-+ + + + +
1 1 1 1 1 1|0 0 C
+-+-+-+ + + + +
0 0 0|1 1|0 0 D
+-+-+ + + + +
1 1 1 1|0 0 E
+-+ + + + +
0|1 1|0 0 F
+ + + + +
1 1|0 0 G
+ + + +
1|0 0 H
+-+-+
1 1 I
+-+
0 J
+
In this case there are 13 minuses and 19 vertical bars, for a total of 34 perimeter units. If you wanted to buy fence for this triangle matrix, to keep the zeros and ones from mingling, you'd need 34 lengths of fence. But what does it mean for a relationship to be near the upper left corner, say, of the triangle? Does this indicate anything? No, it doesn't. The order of listing the nodes both across and down is arbitrary, as long as they match up. What if we permute the order of the nodes ("scramble the letters") -- does it change the meaning? Again, no. But the mind searches for meaning in a visual display, and so we were looking for a way to make this position information in some way meaningful.
Our original planned procedure was to permute the order of the rows and columns ("scramble the letters") through all possible combinations, until the perimeter number was minimized. This would be the "blobbiest" form of the triangle array.
One effect of doing this would be to reduce the data. There are many graphs that have the same topology, i.e., if you remove the labels you can't tell them apart. This approach would cause each of those graphs to be displayed the same way, except for the labels. (There is an analogy here to hashing algorithms. [LINK6])
Once we developed this technique (and accompanying software tools) we planned to use the approach as a springboard to more complex visualizations of multiple types of relationships changing over time.
Before we start crunching numbers, let's do some thinking. A graph with one node is trivial: it can't have any edges. Likewise a graph with two nodes is also trivial: its only possible edge is either missing or present. In either case you could argue that it is already in "canonical form." When we get to three nodes things get a little more interesting. First, consider the graph with no edges:
B C
0 0 A
0 B
It can be argued that it is already in "canonical form" since no matter how you permute it it doesn't change.
The same can be said of the graph with three edges:
B C
1 1 A
1 B
Next, consider the graph with one edge:
B C B C B C
1 0 A 0 1 A 0 0 A
0 B 0 B 1 B
It resembles the graph with two edges, in that if you exchange all zeroes for ones and vice versa the matrices are the same as above:
B C B C B C
0 1 A 1 0 A 1 1 A
1 B 1 B 0 B
Taking a page from group theory, if we solve one we've solved the other. From inspection we see that (for example, in the case of one edge) we can permute any matrix into any other. The first and third have shorter perimeters than the second, so we can make an arbitrary choice, and select the first one. Here is how it looks with its perimeter drawn:
C A
1|0 B
+ +
0 C
+
Moving on to a graph with four nodes, there are six elements, each of which can be zero or one, so there are two to the sixth power, or 64 possible matrices. Each can be permuted 6! or 24 ways. This yields 24*64 or 1,536 permutations to inspect, so it must be time to stop thinking and start programming.
(Notice how quickly the problem shifted from trivially obvious to effectively intractable without a computer!)
We wrote a Java application called NetworkBlobs which takes as input a list of node pairs, and outputs the text forms above or graphical displays of a matrix, with or without perimeters. Here is a 5 node matrix without the perimeters drawn:
![]() 5 node graph as triangle matrix; rows and columns (A-E) are individuals (nodes); squares in the triangle matrix are potential relationships (edges, connections); grey = connected, black = not connected |
Here is a 200 node matrix without the perimeters drawn (clearly this method of display can be used until one runs out of pixels, probably at about a thousand nodes):
![]() 200 node graph as triangle matrix; rows and columns are individuals (nodes); squares in the triangle matrix are potential relationships (connections or edges); grey = connected, black = not connected |
The Java program could also draw the perimeters. Here is a 13 node matrix with the perimeters drawn:
![]() 13 node graph as triangle matrix; rows and columns (A-M) are individuals (nodes); squares in the triangle matrix are potential relationships (connections or edges); grey = connected, black = not connected; orange = perimeter (boundary between 0s and 1s); light gray = spacer |
At this point we decided we needed some real data to operate on before we went much further. The author's 15-plus years experience in scientific visualization [LINK7] have taught a profound respect for real data. It often times can break your code in ways you never imagine when you create "fake" test data.
Our client was scheduled to provide data from a proprietary "massive multi-player online game" (MMOG), but this was months away, and we wanted to begin testing the software right away. We turned to some published social network information, the author's high school yearbook. As a mild sort of "prank," a group of 13 friends -- all graduating seniors -- bought an advertisement in the back of the yearbook with a group photo with whacky props and enigmatic quotations:

The yearbook also had listings for each of the individual students, including which clubs they'd belonged to for each of their four years in high school; here is an example:

From this I derived a list of each student and which clubs they belonged to, and as a consequence who they were connected to by clubs:
A. B___, Bob with Rouie(C), Rick(F) and Alan(I) in Experimental Arts Club B. B___, Mollie with Rouie(C), Wayne(H), Rick(F) and Eric(H) in Computer Club with Rouie(C), Rick(F), Eric(H) and Alan(I) in Student Council Convention with John(M) in Science Club C. C___ ,Rouie with Wayne(H), Mary(G), Eric(H) and Alan(I) in Band with Mollie(B), Wayne(H), Rick(F) and Eric(H) in Computer Club with Mollie(B), Rick(F), Eric(H) and Alan(I) in Student Council Convention with Bob(A), Rick(F) and Alan(I) in Experimental Arts Club with Eric(H), Alan(I) and Wendi(L) on staff of Obra Magazine with Rick(F), Eric(H), Alan(I) and John(M) in Secret Club with Loretta(D), Eric(H) and Alan(I) in Tolkien Club D. H___ ,Loretta with Rouie(C), Eric(H) and Alan(I) in Tolkien Club E. H___ ,Wayne with Rouie(C), Mary(G), Eric(H) and Alan(I) in Band with Mollie(B), Rouie(C), Rick(F) and Eric(H) in Computer Club F. J___ ,Rick with Mollie(B), Rouie(C), Wayne(H) and Eric(H) in Computer Club with Mollie(B), Rouie(C), Eric(H) and Alan(I) in Student Council Convention with Bob(A), Rouie(C) and Alan(I) in Experimental Arts Club with Rouie(C), Eric(H), Alan(I) and John(M) in Secret Club G. J___ ,Mary with Rouie(C), Wayne(H), Eric(H) and Alan(I) in Band H. L___ ,Eric with Rouie(C), Wayne(H), Mary(G) and Alan(I) in Band with Mollie(B), Rouie(C), Wayne(H) and Rick(F) in Computer Club with Mollie(B), Alan(I) and Wendi(L) on staff of Obra Magazine with Rouie(C), Rick(F), Alan(I) and John(M) in Secret Club with Rouie(C), Loretta(D), and Alan(I) in Tolkien Club I. S___ ,Alan with Rouie(C), Wayne(H), Mary(G) and Eric(H) in Band with Mollie(B), Rouie(C), Rick(F) and Eric(H) in Student Council Convention with Bob(A), Rouie(C) and Rick(F) in Experimental Arts Club with Graham(K) in Football with Mollie(B), Eric(H) and Wendi(L) on staff of Obra Magazine with Tom(J) and John(M) in Rocket Club with Rouie(C), Rick(F), Eric(H) and John(M) in Secret Club with Rouie(C), Loretta(D), and Eric(H) in Tolkien Club J. S___ ,Tom; with Alan(I) and John(M) in Rocket Club K. V___ ,Graham with Wendi(L) in California Scholastic Federation (CSF) with Alan(I) in Football L. W___ ,Wendi with Graham(K) in California Scholastic Federation (CSF) with Rouie(C), Eric(H) and Alan(I) on staff of Obra Magazine M. Z___ ,John with Alan(I) and Tom(J) in Rocket Club with Mollie(B) in Science Club with Rouie(C), Rick(F), Eric(H) and Alan(I) in Secret Club |
Ultimately we boiled this down to a file of connections between all the students, as shown by this partial listing (where A=0, B=1, C=2, D=3, E=4, F=5, G=6, H=7, I=8, J=9, K=10, L=11, M=12):
0 2 0 5 1 2 1 4 1 5 2 0 2 1 2 3 2 4 2 5 2 6 3 2 4 1 ... |
We could feed this list directly to our Java application, NetworkBlobs, and it would store the triangular matrix as a Java object, capable of writing out its data by several different Java methods, such as the text-only techniques and the colored squares techniques seen above.
We also gave our object the ability to compute its perimeter, or number of orange rectangles in the pictures above.
Next we enhanced the Java program to step through all permutations of a matrix, retaining the form that minimized the perimeter. (We were aided in this by some Java code for generating permutations by Michael Gilleland of Merriam Park Software. [LINK8])
We quickly discovered two things. First, the problem as originally posed is intractable on a PC for numbers of nodes over about 12. See the following table, where:
| n | n! | runtime | microsec/n! |
|---|---|---|---|
| 7 | 5,040 | 1.340s | 265.8730 |
| 8 | 40,320 | 1.835s | 45.5109 |
| 9 | 362,880 | 11.192s | 30.8421 |
| 10 | 3,628,800 | 2m 4.163s | 34.2159 |
| 11 | 39,916,800 | 31m 21.444s | 47.1341 |
| 12 | 479,001,600 | 11h 46m 13.880s | 88.4629 |
| 13 | 6,227,020,800 | [unknown] | [unknown] |
Note that the runtime is unknown for n=13; our estimate was about 30 days! For this short-duration exploration project we couldn't use 30 days of processor time for one computational experiment. We have ideas for improving the algorithm's efficiencies; see FUTURE WORK below.
Second of all, we found many cases where there were a number of patterns which "tied" for first place by the scoring we used, based solely on perimeter length. We iterated until we found that adding a "tie-breaking" small score which preferred the upper left corner for ones in the matrix, we could break symmetry and have fewer canonical forms that categorized graphs as similar if they had symmetric relationships.
Once we had data, though it was a small amount, we went to work finding different ways to visualize it.
We visualized the yearbook data for subgroups, mostly of 11 or less, since that would run in about half an hour. Starting at networks with three nodes and working our way up to 11 by throwing away nodes, and looking at the canonical form of each sub-group, we noticed that the most-connected player thus far is always on the top row, and it seems to be close to a popularity contest in general. (If this turns out to be true, and provable, it could save a whole lot of calculation.)
With our best canocal form generator to date, we took the 11-node network data (left) and produced this canonical form (right):
|
|
Improvements in algorithm performance, access to better hardware, and the passage of time will all allow for more, and larger, runs.
In pondering how to proceed from this base, we remembered some previous research done for Mindtel LLC, in which he had vectors of data (i.e., sets of values) we wanted to display at once. In one case we had public health data from three villages on the Big Island of Hawaii, giving us numbers of villagers in each of six categories: Adult M, Adult F, Child M, Child F, Infant M, Infant F. We showed each of the six values as the size of a ring, or toroid, where color indicated deviation from average: red was high deviation, green was low. The base ring showed an average score over the six variables, again with color indicating deviation from average and size being total population. This image is showing how all villages have below-average number of children, and most have below average numbers of women as well:
![]() village health data in Hawaii (rings repesent number of villagers in several age and gender groups) [LINK9] |
In a second project, we worked on ways to visualize text data from TIDES, a daily email briefing for the US government that excerpts all English language or translations in electronic (searchable) news media mentioning the Middle East and the U.S. war on terror. We plotted calendar days against time zones (for east/west position) to make a surface whose height was total words in all the TIDES news feeds, and then used toroid shapes to show frequencies of a number of words we were scanning for.
![]() TIDES text data as space-time chart with word-count rings ( [LINK10] |
After some introspection, the idea was hatched to use the toroids to represent types of relationships or connections. The yearbook data told us which clubs or other groups the nodes belonged to. We used the same base toroid on a cylindrical pole, ringed by a number of toroids (some have compared to these palm tree leaves). The position of each "palm leaf" toroid indicates the type of group. The color and size of each "leaf" indicates how "strong" the membership is (e.g., how many years were spent in the club). The NetworkBlobs program was modified so that a triangle matrix object could output the check positions and values in a format that another set of tools could use to make the rings. Our first experiments with this approach were encouraging.
![]() rings (toroids) above checks; far and left edges of triangle base = nodes; checks = connections, grey = connected, black = not connected; white cylinders and horizontal rings for calibration only; angular position of vertical rings = type of group; size and color of vertical rings = degree of group memberships, small/red = low, large/green = high |
The most common feedback which we received from other researchers and associates at this point was, "Very interesting, but what does it mean?" We decided to focus next on mapping this new notation back to known forms.
At about this point we got access to some of the first data from the multiplayer online game.
We visualized in small pieces at first it by the following approach:
We found events of three types, which we called c, r and w, occurring to a group of 7 simultaneous online players, which we named A-G, over a few minutes of game play. We looked at each event in turn as a triangular matrix and as a string sculpture display, side by side.
|
|
|