
Voting matters - Issue 13, April 2001

Recounts with STV

B A Wichmann

Introduction

With Westminster elections, if a result is sufficiently close, a recount is undertaken to reduce the risk of an incorrect result being declared. Of course, with First Past The Post, a simple measure of the closeness of the result is possible, so the criteria for a recount can be easily stated. (A virtually identical problem arose with the US elections in Florida, in which obsolete technology was employed.)

With STV, recounts are very rarely undertaken because of the problems that this would give. In the Newland and Britton rules[1], both first and second edition, there was an instruction at the end of each stage: 'Ascertain that candidates and/or their agents are content', and a recount of the stage could be called for if they were not. The difficulty with this is that it may not become evident that an early stage needs checking until a later one has occurred, so the only sure strategy for candidates was always to ask for a recount after every stage. In the latest edition of the rules, those words have, in any case, been omitted.

However, when the count is conducted by computer, the computer itself can be used to assess the need for a 'recount'. The article is not concerned with the actual process of undertaking a recount (merely running the counting program again would be pointless), but with providing a tool to assess the risk of an incorrect result being declared due to typing errors made when the papers are entered manually.

This article describes a set of computer programs, developed for Electoral Reform Ballot Services, which assesses the need for a recount.

The concept

At first, I thought that the problem was too difficult to undertake, since if a change is made to even one ballot paper, it is hard (in general) to predict any change of result. However, given a computer program that can undertake a count in a matter of minutes (if not seconds) then an alternative method is available which does not require any analysis of the result of changes in specific papers.

The stages are as follows:

  1. A simple model is produced of the manual data entry process, together with the likely data entry errors.
  2. From the data entry error analysis, a computer program is produced which simulates such errors.
  3. The above computer program is used to construct a hundred (or more) copies of the original election data with simulated errors.
  4. The simulated elections are counted by program and the results compared with the original results to see if an incorrect result is likely.
This process is made practicable by the speed of modern computers, which allows a hundred or more copies of an election to be counted in a reasonable time. (An overnight batch run would surely be sufficient to produce the result, although for smaller elections a result should be obtained in a few minutes; the examples so far have taken only about an hour to run.)
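The four stages above can be sketched as a simple simulation loop. In the sketch below, `count` stands for whichever counting program is in use (Meek or ERS97) and `perturb_election` is a deliberately crude stand-in for the full keying-error model described later; both names are illustrative, not part of the actual system.

```python
import random

def perturb_election(papers, n_candidates, rng, rate=1/6000):
    """Stand-in for the data-entry error simulator: with probability
    `rate` per preference, that preference is lost.  (The real model
    of keying errors is richer; this only illustrates the loop.)"""
    return [[p for p in paper if rng.random() >= rate] for paper in papers]

def recount_assessment(papers, n_candidates, seats, count,
                       n_sim=100, rate=1/6000, seed=None):
    """Sketch of the four-stage process.  `count` is assumed to be any
    STV counting function taking (papers, seats) and returning the set
    of elected candidates.  Returns the original result and, for each
    candidate whose status ever differed from it, the fraction of
    simulated elections that agreed with the original."""
    rng = random.Random(seed)           # recorded seed allows a re-run
    original = count(papers, seats)
    agree = {c: 0 for c in range(1, n_candidates + 1)}
    for _ in range(n_sim):
        result = count(perturb_election(papers, n_candidates, rng, rate),
                       seats)
        for c in agree:
            if (c in result) == (c in original):
                agree[c] += 1
    # Candidates always elected or never elected are not in dispute
    disputed = {c: agree[c] / n_sim for c in agree if agree[c] < n_sim}
    return original, disputed
```

With 100 simulations and a counting program that runs in seconds, the whole assessment fits comfortably in the hour-scale run times reported above.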

The system

The system consists of two programs: one which produces copies of the original election with data errors added, and another which analyses the results from all the elections. Provision is made to handle the Meek rules[2] or the ERS97 rules[1]. In addition, a batch execution run is produced to call the relevant election counting program on all the simulated elections.

The data entry model is essentially one of key depressions, using ballot papers on which the voter adds preference numbers. Since typing errors have known patterns, a reasonable guess can be made of the potential errors in terms of those patterns. However, it is difficult to calibrate the rate of errors accurately. Such errors are naturally rare, say 1 in 5,000 characters, but at this rate one would need to double-check many thousands of characters to obtain a good estimate of the error rate. In addition, the computer entry programs used for ballot entry already include some checks, and hence the error simulation program ensures that these checks will be passed. Also, the staff of ERBS are naturally familiar with the requirements and appear to take special care with the first preference (not actually allowed for in the current program). There is some evidence that the staff of ERBS may realise at the end of a ballot paper that they are 'out of step' and hence go back to correct an error. In view of the above, there is clearly some doubt as to the accuracy of the model of data errors, but the statistical nature of the problem makes some doubt inevitable.

After some experimentation, the data error rate was set at one erroneous key depression per 6,000 characters. However, if the error would be detected by the STV program, such as one arising from a repeated preference, the corresponding change is not made.
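A minimal sketch of this error model, in terms of the three single-keystroke errors reported later (a comma added, a comma deleted, an interchange of adjacent preferences), might look as follows. It is a simplification: the actual program works on the raw character stream and also handles the duplication and removal of whole papers, and the function and its parameters here are illustrative only.

```python
import random

def perturb(paper, n_candidates, rng, rate=1/6000):
    """Apply simulated keying errors to one ballot paper (a list of
    candidate numbers in preference order).  Returns a possibly
    modified copy; any change that the counting program would reject
    (a repeated preference) is discarded, as in the real system."""
    out = list(paper)
    # Treat each preference entered as roughly one key depression
    for i in range(len(out)):
        if rng.random() < rate:
            kind = rng.choice(["interchange", "comma_added", "comma_deleted"])
            trial = list(out)
            if kind == "interchange" and i + 1 < len(trial):
                # two adjacent preferences keyed in the wrong order
                trial[i], trial[i + 1] = trial[i + 1], trial[i]
            elif kind == "comma_added":
                # a spurious extra preference keyed in
                trial.insert(i, rng.randint(1, n_candidates))
            elif kind == "comma_deleted" and i < len(trial) and len(trial) > 1:
                # a preference lost
                del trial[i]
            if len(set(trial)) == len(trial):   # would pass the entry checks
                out = trial
    return out
```

At a rate of 1 in 6,000 and roughly five preferences per paper, an election of about 1,000 papers yields a handful of changed papers per simulated copy, consistent with the averages in the report below.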

Results

This can be illustrated by an example taken from a real election (which has been made anonymous).
Data error analysis program, version 1.01
Basic data of original election:
 Title: R048: STV Selection Example 1
 To elect 10 from 29 candidates.
 Number of valid votes: 944
 Count according to Meek rules

Data used to simulate input errors to count:
 Key errors taken as 1 in 6000 key depressions.
 Duplication and removal of papers taken as 1 in 6000 papers.
 Number of simulated elections produced: 100
 Seeds were initially:  16215,  15062 and   7213
          and finally:  17693,  15003 and  25920


Some statistics from the generated election data:
 Average number of commas added for each election: 1
 Average number of commas deleted for each election: 1
 Average number of interchanges for each election: 2
 Average number of papers deleted for each election: 0
 Average number of papers duplicated for each election: 0
 Average number of papers changed for each election: 4
 Average number of papers changed at preference 1: 1

Candidates elected in the original election and all
simulated ones:
 Jane BENNETT
 Robert BROWNING 
 Joan CRAWFORD
 Francis DRAKE
 Mary-Ann EVANS
 Kate GREENAWAY
 John MASEFIELD
 Alfred TENNYSON 
 Sybil THORNDIKE 

Candidates not elected in the original election or
any of the simulated ones:
 James BOSWELL
 Emily BRONTE
 George BYRON
 Eric COATES
 Ella FITZGERALD 
 Stella GIBBONS
 Graham GREENE
 Sherlock HOLMES 
 Samuel JOHNSON
 John KEATS
 Alice LIDDELL
 Harold PINTER
 Walter RALEIGH
 Margaret RUTHERFORD
 Will SHAKESPEARE
 Percy SHELLEY
 John WESLEY
 Virginia WOOLF

Number of other candidates: 2
 Original Result  Simulated Result(95% conf. limits) Name
     Elected          Elected  98% ( 93% to 100%)     Clara BOW 
  Not  Elected    Not Elected  98% ( 93% to 100%)     Benjamin FRANKLIN

End of report
The program records the known details of the election, which include the type of count used: Meek in this case. Then the statistics on the simulated elections are recorded. First, there is the key depression error rate used, then the seeds used for the pseudo-random generator so that the process can be re-run if required. Then a summary is produced of the changes made to the papers. Note that two of the changes are the removal and duplication of a paper (both changes are needed to reflect the checks made on the total number of papers). The commas indicate moving on to the next preference. Note that of nearly 1,000 papers, typically one change is made at the first preference position.

Of course, the changes of most interest are those relating to the election of the candidates. The first two lists are the candidates who are always elected or never elected - there should be no doubt about the status of these.

The last table indicates the position with those candidates whose status varied in the 101 elections performed (1 original and 100 simulated).

The number of such candidates is two. In the case of Clara Bow, she was elected in the original election and also in 98% of the simulated ones, that is, in two cases she was not elected. The case of Benjamin Franklin is exactly the opposite. However, merely knowing that percentage is not what is required. We need an estimate of the probability of an incorrect result, which is the likely value of the percentage in the long run, that is, if infinitely many simulated elections were used. This long-run value is estimated to lie between 93% and 100% (with 95% confidence).
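The report's figures are consistent with an exact (Clopper-Pearson) binomial confidence interval; the Pascal code mentioned in the conclusions is not reproduced here, but the sketch below, which needs only the binomial distribution and bisection, is one standard way to obtain such limits and matches the quoted 93% to 100% for 98 agreements in 100 simulations.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for a
    binomial proportion, found by bisection on the binomial CDF."""
    def solve(f):
        lo, hi = 0.0, 1.0
        for _ in range(60):          # bisect to ample precision
            mid = (lo + hi) / 2
            if f(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # Lower limit: largest p with P(X >= x | p) <= alpha/2
    lower = 0.0 if x == 0 else solve(
        lambda p: 1 - binom_cdf(x - 1, n, p) <= alpha / 2)
    # Upper limit: smallest p with P(X <= x | p) <= alpha/2
    upper = 1.0 if x == n else solve(
        lambda p: binom_cdf(x, n, p) > alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(98, 100)
print(f"Long-run percentage between {100*lo:.0f}% and {100*hi:.0f}%")
```

For 98 out of 100 this gives limits of about 0.930 and 0.998, which round to the report's 93% and 100%.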

In this particular case the result is not seriously in doubt. However, if the percentage range included the 50% figure, it is proposed that this would be sufficient to require a recount.

Conclusions

The method proposed here appears to be an effective means of determining if a recount should be undertaken for an STV election. However, the technique does depend upon a statistical model of the nature of the data preparation errors which is always going to be hard to produce.

The method can be applied to assess the impact of data errors arising from mechanically produced data, assuming the data error rate is high enough to warrant its use.

I am grateful to David Hill who provided some Pascal code which gives the 95% probability ranges - a vital part of the system.

References

  1. R A Newland and F S Britton. How to conduct an election by Single Transferable Vote. ERS, 1973, 1976 and 1997.
  2. I D Hill, B A Wichmann and D R Woodall. Algorithm 123 - Single Transferable Vote by Meek's method. Computer Journal, Vol 30, p 277, 1987.
