How The Optimal Classification Scaling (OC) Program Works In One Dimension:
The "Edith" Algorithm

7 May 2001



I first started working on modeling roll call voting in 1978 when I was at the University of Oregon. I developed a simple scaling program that I dubbed Edith (my wife’s middle name) and it, in effect, does a simple Guttman-like scaling of roll calls. The rationale behind Edith is the following. Suppose voting in Congress is entirely driven by one basic dimension -- liberalism-conservatism – so that a legislator’s degree of liberalism determines all his/her issue positions. Translated into standard spatial theory, members are arrayed from left to right on a single dimension, have symmetric utility functions centered at their ideal points, and when faced with a choice between the two alternatives corresponding to Yea and Nay on a roll call, they vote for the alternative closest to them on the dimension. Therefore, given the legislator ideal points, all roll call votes should look something like the following:

          YYYYYYYYYYYYYYYYYNNNNNNNNNNNNNNNNNN
                           *
          NNNYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
             *
          YYYYYYYYNNNNNNNNNNNNNNNNNNNNNNNNNNN
                  *
                        Etc.

The asterisks indicate the "cutting point" for the roll call; that is, the midpoint of the Yea and Nay outcomes. A legislator located exactly on the midpoint would be indifferent between voting Yea or voting Nay.

Suppose roll call voting is in accord with this model. Then the scaling problem consists of taking a roll call matrix and "unscrambling" it – that is, finding a rank ordering of legislators and the correct "polarity" (Yea to the left of the cutpoint or Yea to the right of the cutpoint) for each roll call such that patterns like those above are produced for each roll call. This is what Edith is designed to do. If the data is perfect, then the solution is easy and the correct rank ordering is always found (see Proof that if Voting is Perfect in One Dimension, then the First Eigenvector Extracted from the Double-Centered Transformed Agreement Score Matrix has the Same Rank Ordering as the True Data).

However, when there is error in the data, things get a bit complicated. When there is error the aim is to find a rank ordering that maximizes the correct classification of the observed Yeas and Nays. This is easier said than done because if there are n legislators, then there are n!/2 possible rank orders to check to find the best one. For example, for 50 legislators this number is about 1.52 * 1064 – a formidable number even with modern supercomputers. Consequently, Edith embodies a "sensible" search procedure (what the Operations Researchers call a "Heuristic") to find a solution. Namely, a good starting rank order of the legislators is generated using the technique discussed for the perfect voting problem ( Proof that if Voting is Perfect in One Dimension...) and the corresponding cutting points are found. These cutting points are used to get a new rank ordering of the legislators, and so on. At each step the correct classification increases until a rank order is found that produces cutting points that in return reproduce the rank order.

Finding the optimal rank ordering requires three steps:

This process can be repeated as many times as desired and it is always monotonically increasing in correct classification. The coordinates are merely a convenient computing device to produce rank orderings and the final rank ordering of the legislator coordinates is the desired rank ordering. Although this process does not guarantee that the global maximum is found, the Monte-Carlo evidence shows that the solution will either be the global maximum or extremely close to the global maximum ordering (Poole, 1990).

To see how this process works, let the current ordering of the legislators from left to right be represented by coordinates from x1 to xp such that:

-1 £ x1 £ x2 £ x3 £ ... £ xp £ +1.

(Note that "£ " allows for tied ranks – that is, identical voting patterns by two or more legislators.) Now, suppose on the jth roll call the legislators voted as indicated by the pattern of "y"s and "n"s shown above the dimension line in the Figure below. There are p+1 possible regions that the cutting point could be in:

(-1, x1), (x1 , x2), ... , (xp , +1)

and for each region there are exactly 2 possible perfect voting patterns for an overall total of 2(p+1) possible perfect voting patterns. However, region (xp , +1) is redundant since it produces the same perfect patterns as the region (-1 , x1) so it may be discarded leaving 2p unique perfect voting patterns to consider.

Actual Voting Pattern

Y Y Y Y Y Y . . . . Y Y * N Y * N Y * N N . . .N N
___________________________________________________________
-1.0 x1 x2 x3 x4 . . . . . . . 0.0 . . . . . . . xp-1 xp +1.0

Perfect Voting Patterns

(-1 , x1) produces nnnnnnnnn.....nn or yyyyyyyyy.....yy
(x1 , x2) produces ynnnnnnnn.....nn or nyyyyyyyy.....yy
(x2 , x3) produces yynnnnnnn.....nn or nnyyyyyyy.....yy
(x3 , x4) produces yyynnnnnn.....nn or nnnyyyyyy.....yy
(x4 , x5) produces yyyynnnnn.....nn or nnnnyyyyy.....yy
(x5 , x6) produces yyyyynnnn.....nn or nnnnnyyyy.....yy


etc.

(xp-1 , xp) produces yyyyyyyyy.....yn or nnnnnnnnn.....ny
(xp , +1) produces yyyyyyyyy.....yy or nnnnnnnnn.....nn

Since there are only 2p perfect patterns, it is a simple matter to compare each perfect pattern with the actual pattern of votes. This can be done very efficiently by first assuming that the cutting point is in the region (-1 , x1) and calculating the corresponding number of correct classifications. Next assume that the cutting point is in the region (x1 , x2). Only one calculation has to be made to get the correct classifications for this cutting point since the only change is that the cutting point has been moved from the left of x2 to the right of x2. If there is no missing data, either the correct classification increases by 1 or decreases by 1 when the cutting point is moved from the left of x2 to the right of x2. Similar reasoning holds for the remaining points. For each possible cutting point the correct classification corresponding to the two possible perfect patterns can be calculated. The estimated cutting point is set equal to the midpoint of the region for which correct classification is a maximum. This requires only 2p calculations to find the maximum classification cutting point. For the example shown above, placing the cutting point at the position of any of the three asterisks would produce only two classification errors for a correct classification of p-2.

Let the ordering of the q roll call midpoints estimated above be represented by coordinates from z1 to zq such that:

-1 £ z1 £ z2 £ z3 £ ... £ zq £ +1.

(Note that "£ " allows for tied ranks – that is, identical cutting points for two or more roll calls – a common occurrence empirically.) In order to estimate new legislator coordinates, the "polarity" of each roll call must be taken into account in each legislator’s voting pattern. For each roll call, the cutting point represents the dividing point between the Yeas and the Nays. If a legislator is to the left of the cutting point and votes correctly on the roll call, then denote this with an "L". If a legislator is to the right of the cutting point and votes correctly on the roll call, denote this with a "C". If a legislator is incorrectly classified on the roll call, then denote this with a "C" if the legislator is to the left of the cutting point and an "L" if the legislator is to the right of the cutting point. Note that if the legislator votes perfectly, then vis a vis the cutting points the legislator’s voting pattern will look like:

LLLLLLLLLCCCCCCCCCCCCCCCCCCCCCCCCC
or
LLLLLLLLLLLLLLLLLLLLLLLLLLLLLLCCCCCC

depending upon where the legislator is located.

The Figure below shows that by taking the polarity of the roll calls into account the legislator problem is equivalent to the roll call cutting point problem. Hence, the algorithm described above can be used to find the optimal legislator location. There are only 2q possible perfect patterns and only 2q calculations are required to find the maximum classification location for each legislator.

Actual Voting Pattern

L L L L L L . . . . L L * C L * C L * C C . . .C C
___________________________________________________________
-1.0 z1 z2 z3 z4 . . . . . . . 0.0 . . . . . . . zq-1 zq +1.0

Perfect Voting Patterns

(-1 , z1) produces CCCCCCCCC.....CC or LLLLLLLLL.....LL
(z1 , z2) produces LCCCCCCCC.....CC or CLLLLLLLL.....LL
(z2 , z3) produces LLCCCCCCC.....CC or CCLLLLLLL.....LL
(z3 , z4) produces LLLCCCCCC.....CC or CCCLLLLLL.....LL
(z4 , z5) produces LLLLCCCCC.....CC or CCCCLLLLL.....LL
(z5 , z6) produces LLLLLCCCC.....CC or CCCCCLLLL.....LL


etc.

(zq-1 , zq) produces LLLLLLLLL.....LC or CCCCCCCCC.....CL
(zq , +1) produces LLLLLLLLL.....LL or CCCCCCCCC.....CC

This simple algorithm converges very rapidly to a solution in which the rank ordering of the legislators and the rank ordering of the roll call midpoints reproduce each other. This is a very strong form of conditional global maximum. Technically, if there are multiple sets of parameters (for example, as in this case, parameters corresponding to rows of a matrix and parameters corresponding to columns of a matrix), and every set of parameters is at a global maximum conditioned on the other sets being held fixed, and these sets reproduce each other, then they are at a conditional global maximum. Note that the overall global maximum, by definition, is a conditional global maximum.

Conditional Global maxima are quite rare because of the nature of the constraints. Indeed, in the metric similarities problem, conditional global minima are very rare and their number declines as the number of parameters increases (Poole, 1990).

A simple illustration of how Edith works can be done using playing cards. Below is the true configuration with 10 legislators represented by the Ace through the Ten of diamonds. The cutting points of nine roll calls are represented by the Ace through the Nine of Spades. Thus, the cutting point of the first roll call is between legislator one and two, and so on.
         
              

  1. Suppose we start with the scrambled ordering of the legislators shown below. Using the method described above, we would place the Ace at the left end of the dimension because this results in only one classification error. Placing it between the Ace and Two of diamonds would produce four classifcation errors -- the Six, Seven, Four, and Nine of Diamonds. The same logic holds for the Two and Three of Spades. No interior point does better. When we get to the Four of Spades, however, placing it between the Three and Eight of Diamonds produces four classification errors (the Six, Seven, Nine and Five of Diamonds). The Four of Spades could be placed at the left end also producing four errors. In the case of a tie, I always put the cutpoint/legislator at the interior most position.

     
           
           


  1. With respect to the ordering of the cutpoints represented by the Ace to the Nine of Spades, the new legislator ordering is shown below. The Ace of Diamonds is placed to the left of the Ace, Two, and Three of Spades, the Two and Three of Diamonds are placed at the same position as the Ace, Two, and Three of Spades because of the tied ranks. The Four of Diamonds is placed between the Ace, Two, Three stack, and the Four to Seven stack. The Five, Six, Seven of Diamonds are placed at the same position as the Four to Seven stack of Spades. The Eight of Diamonds is placed between the Four to Seven stack and the Eight and Nine of Spades. The Nine of Diamonds is placed at the same position as the Eight and Nine of Spades. Finally, the Ten of Diamonds is place to the right of the Eight and Nine of Spades.




  1. With respect to the ordering of the legislators represented by the Ace to Ten of Diamonds, the new ordering of the cutpoints is shown below. The Ace of Spades is placed between the Ace of Diamonds and the Two and Three of Diamonds. The Two of Spades is placed at the same position and the tied Two and Three of Diamonds. The Three of Spaces is placed between the tied Two and Three of Diamonds and the Four of Diamonds. The Four of Spaces is placed between the Four of Diamonds the the tied Five to Seven stack. The Five and Six of Spaces are placed at the same position as the Five to Seven stack of Diamonds. The Seven of Spades is placed between the Five to Seven stack and the Eight of Diamonds. The Eight of Spades is placed between Eight and Nine of Diamonds. Finally, the Nine of Spades is placed between the Nine and Ten of Diamonds.



  1. With respect to the ordering of the roll cutpoints represented by the Ace to Nine of Spades, the ordering of the legislators represented by the Ace to Ten of Diamonds is shown below. All are now in the correct position with perfect classification.




Site Links

VOTEVIEW Blog
NOMINATE Data, Roll Call Data, and Software
Course Web Pages: UGA (2010 - )
Course Web Pages: UC San Diego (2004 - 2010)
University of San Diego Law School (2005)
Course Web Pages: University of Houston (2000 - 2005)
Course Web Pages: Carnegie-Mellon University (1997 - 2000)
Spatial Models of Parliamentary Voting
Recent Working Papers
Analyses of Recent Politics
About This Website
K7MOA Log Books: 1960 - 2011
Bio of Keith T. Poole
Related Links