I try to support alternative media, but sometimes, I do wonder: http://theonlinecitizen.com/2012/03/paps-election-victories-a-statistical-analysis/
A friend linked this on Facebook this morning. Being in a hurry to go to lecture, I skimmed through it in 30 seconds, and spent the hour in lecture intermittently thinking about how dodgy it looked.
Firstly, disparities between the popular vote and “seats” are nothing new. The United States presidential election uses the Electoral College system, and in 2008 Obama won 365 out of 538 “seats” while obtaining only 52.9% of the popular vote. In the United Kingdom, you often hear complaints from supporters of the Liberal Democrats that their vote share does not translate into seats in Parliament, hence last year’s referendum on the Alternative Vote. But I digress. For those more interested in the mechanics of different voting systems, Tim Gowers has presented a pretty good analysis.
On to the “statistical analysis”.
The first assumption to be made is that if there are only (Single Member Constituencies) SMC’s, what will the probability be given that the PAP has only 60% of the votes yet wins more than 90% of the seats in parliament? Further assumptions are that there are 1million voters, 100 SMCs, and 10,000 voters in each SMC. For the PAP to win in a SMC, it has to have 5001 or more votes. This is a binomial cumulative density function. However, to calculative this distribution for large numbers in binomial distribution is erroneous. We can approximate this distribution using the gaussian distribution if the sample size is large and other conditions are fulfilled. After some calculation, the probability of the PAP having 5001 or more votes in an SMC is 0.6615, which looks reasonable.
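Nothing wrong with having a simple model, and the assumptions are reasonable. So n = 10000 and p = 0.6 in the above model, and the normal approximation to the number of votes won has mean np = 6000 and variance np(1-p) = 2400, i.e. a standard deviation of about 49. This does not, however, give 0.6615: 5,001 votes lies about 20 standard deviations below the mean, so the probability of winning an SMC is essentially 1. The figure 0.6615 only comes out if 2400 is used as the standard deviation instead of the variance.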
The conditions for the approximation itself look fair enough: no one is going to argue that n is not large enough, and p is certainly not far from 0.5.
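As a quick check of this arithmetic, here is a small Python sketch (my own tooling would have been MATLAB or R; the helper name `upper_tail` is my own). It evaluates P(X ≥ 5001) for n = 10000, p = 0.6 using the normal approximation with a continuity correction, once with the standard deviation √2400 ≈ 49 and once with 2400 plugged in as if it were the standard deviation.

```python
import math

def upper_tail(threshold, mean, sd):
    """P(Y > threshold) for Y ~ Normal(mean, sd**2), using the complementary error function."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2)))

n, p = 10_000, 0.6
mean = n * p                # 6000 votes expected
variance = n * p * (1 - p)  # 2400

# "At least 5001 votes" becomes Y > 5000.5 with a continuity correction.
correct = upper_tail(5000.5, mean, math.sqrt(variance))  # sd is about 49
mistaken = upper_tail(5000.5, mean, variance)            # variance used as if it were the sd

print(f"correct sd:     {correct:.6f}")
print(f"variance as sd: {mistaken:.4f}")
```

The first line prints 1.000000 and the second 0.6615, which suggests the quoted figure comes from treating the variance as the standard deviation.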
Given that the probability of victory in each SMC is only 66%, let us calculate the probability that PAP has 90 or more seats. Again using gaussian distribution as an approximation, the probability of winning the election with 90 or more seats is only 0.1434. This is not a very good chance, and would indicate that the PAP has been performing this spectacularly throughout our nation’s history. We can categorically reject the arguments that the opposition is weak and that they lack the people’s support et, because the truth is only 60% voted for the PAP. Of course there is the caveat that there are some votes that are void, but this is only a small amount.
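Sure, the conclusion might follow IF the model were close enough to reality and IF the arithmetic were right. The arithmetic is not: the 0.1434 is only reproduced if the variance of the seat count (about 22.4) is used as its standard deviation (about 4.7), and with the per-SMC win probability of essentially 1 that 60% support actually implies under this model, 90 or more seats is close to certain. But even setting that aside, this is the reality: the wards are pretty much unequal in size.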
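Returning to the 0.1434 figure, here is the same kind of sketch for the seat count, assuming, as the article does, 100 independent SMCs each won with probability 0.6615. It compares the normal approximation with the variance misused as the standard deviation (and no continuity correction, which is what appears to reproduce 0.1434), the correct normal approximation, and the exact binomial tail computed with `math.comb`. The function names are illustrative.

```python
import math

def upper_tail(threshold, mean, sd):
    """P(Y > threshold) for Y ~ Normal(mean, sd**2)."""
    return 0.5 * math.erfc((threshold - mean) / (sd * math.sqrt(2)))

def binom_tail(k_min, n, q):
    """Exact P(S >= k_min) for S ~ Binomial(n, q)."""
    return sum(math.comb(n, k) * q**k * (1 - q)**(n - k) for k in range(k_min, n + 1))

seats, q = 100, 0.6615            # 100 SMCs, each won with the quoted probability
mean = seats * q                  # 66.15 seats expected
variance = seats * q * (1 - q)    # about 22.39, so sd is about 4.73

mistaken = upper_tail(90, mean, variance)                # variance as sd, no continuity correction
normal_ok = upper_tail(89.5, mean, math.sqrt(variance))  # correct sd, continuity-corrected
exact = binom_tail(90, seats, q)                         # exact binomial tail

print(f"variance as sd: {mistaken:.4f}")
print(f"correct normal: {normal_ok:.1e}")
print(f"exact binomial: {exact:.1e}")
```

Only the first line comes out near 0.1434; the other two are far below one in a million.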
Most first-year university students taking a basic course in probability and statistics would first ask: where are the confidence intervals? Most of them would also have come across Simpson’s Paradox. One often-quoted example is the case of the University of California, Berkeley being sued for gender discrimination against women in admissions. The figures quoted were that 44% of 8,442 male applicants were admitted, against only 35% of 4,321 female applicants.
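However, a study by Bickel et al. found that no department was clearly biased against women; in fact, there was a slight bias in favour of women! The aggregate gap arose because women tended to apply to the more competitive departments, which admitted a smaller share of all their applicants. The original paper is here, or you can read about it on Wikipedia.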
Let’s try to apply this to the elections. There were 15 GRCs and 12 SMCs in the 2011 general election. For simplicity, let’s assume 14 GRCs with 100,000 voters each and 10 SMCs with 10,000 voters each, for a total of 1.5 million. Assume the popular vote is split 60-40, so Party A gets 900,000 votes and Party B 600,000.
Scenario 1:
Every constituency splits 60-40, so Party A wins every seat.
Scenario 2:
Party A wins 49,000 votes in 10 of the GRCs and 90,000 votes in the other 4. In 9 of the 10 SMCs, Party A wins 4,900 votes each, and its remaining 5,900 votes are won in the last SMC.
So, if each GRC is worth 5 seats and each SMC is worth 1, Party A has won only 21 of the 80 seats (4 GRCs and 1 SMC), so Party B has won nearly three-quarters of the seats in parliament with just 40% of the popular vote!
Scenario 3:
Okay, okay, I can hear some of you telling me the above is contrived. Let’s try this instead: Party A gets 85,000 votes in 2 of the GRCs, 80,000 in another 2, 75,000 in 3, and 40,000 in each of the other 7. The remaining 65,000 votes are split as 8,100 each in 5 of the SMCs and 4,900 each in the other 5. Party A wins 7 GRCs and 5 SMCs, 40 seats in all, so neither side has a majority and the seats are split 40-40. This is not as unrealistic as it seems!
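The three scenarios can be tallied mechanically. This Python sketch hard-codes the simplified electorate above (14 GRCs of 100,000 voters worth 5 seats each, 10 SMCs of 10,000 voters worth 1 seat each) and counts the seats Party A wins by simple plurality. For Scenario 3 it uses 4,900 votes in each of the five losing SMCs, the figure at which Party A’s total comes to 900,000.

```python
# Simplified electorate from the scenarios; all vote figures are Party A's votes.
GRC_SIZE, GRC_SEATS = 100_000, 5
SMC_SIZE, SMC_SEATS = 10_000, 1

def tally(grc_votes, smc_votes):
    """Return (Party A's total votes, Party A's seats, total seats) under first-past-the-post."""
    assert len(grc_votes) == 14 and len(smc_votes) == 10
    votes = sum(grc_votes) + sum(smc_votes)
    seats = sum(GRC_SEATS for v in grc_votes if v > GRC_SIZE - v)  # majority in a GRC
    seats += sum(SMC_SEATS for v in smc_votes if v > SMC_SIZE - v)  # majority in an SMC
    total = 14 * GRC_SEATS + 10 * SMC_SEATS
    return votes, seats, total

scenarios = {
    "1": ([60_000] * 14, [6_000] * 10),
    "2": ([49_000] * 10 + [90_000] * 4, [4_900] * 9 + [5_900]),
    "3": ([85_000] * 2 + [80_000] * 2 + [75_000] * 3 + [40_000] * 7,
          [8_100] * 5 + [4_900] * 5),
}

for name, (grcs, smcs) in scenarios.items():
    votes, seats, total = tally(grcs, smcs)
    print(f"Scenario {name}: Party A {votes:,} votes ({votes / 1_500_000:.0%}), {seats}/{total} seats")
```

It reports 900,000 votes (60%) for Party A in every scenario, with 80, 21 and 40 of the 80 seats respectively.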
Another point to note is the heterogeneity of voting preferences across the country. For a particular ward, it is not necessarily true that you will get a 60-40 split on average. Determining the average proportion is not straightforward and usually involves exit polls, which are practically illegal under Singapore law. One could estimate it from past voting results, but given changing voter trends and the redrawing of electoral boundaries, this is not easy. I have barely scratched the surface of statistical analysis in elections, and I have to admit that I have not read much of the existing literature on the topic.
A small point: I once read somewhere (unfortunately I’ve forgotten where) that there is always a baseline proportion of voters who vote for a particular party; the same source put it at 20% for each side in this particular case. One would then have to remove the corresponding tails of the distribution. (In fact, with the standard deviation of 2400 implied by the quoted figures, about 25% of the normal distribution in the first model lies in this region.) Hence a normal distribution might not be as good an approximation as hoped.
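To see where the 25% comes from, here is a sketch that assumes the forgotten source’s baseline means each side always gets at least 20%, i.e. the PAP’s votes in a 10,000-voter SMC lie between 2,000 and 8,000. It measures how much of the Normal(6000, sd²) distribution falls outside that range for two choices of sd; `outside_mass` is a name of my own.

```python
import math

def outside_mass(sd, mean=6000.0, lo=2000.0, hi=8000.0):
    """Probability that Normal(mean, sd**2) falls below lo or above hi."""
    below = 0.5 * math.erfc((mean - lo) / (sd * math.sqrt(2)))  # P(Y < lo)
    above = 0.5 * math.erfc((hi - mean) / (sd * math.sqrt(2)))  # P(Y > hi)
    return below + above

# Assumed baseline: each side gets at least 20% of a 10,000-voter SMC.
for label, sd in [("sd = 2400", 2400.0), ("sd = sqrt(2400)", math.sqrt(2400))]:
    print(f"{label}: {outside_mass(sd):.3f} of the distribution lies outside [2000, 8000]")
```

With sd = 2400 the excluded mass is about 0.250; with sd ≈ 49 it is 0.000 to three decimal places.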
The original article makes the sweeping claim that “there is quite a gap between the number of seats they should have won and the actual outcome.” I am sceptical that “using these simple sets of assumptions” could give such a conclusive result. In fact, the probabilities quoted should be treated as junk and not indicative of anything at all.
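The article’s “reverse calculation” (quoted in the email below) says the PAP should need 0.7441 of the vote to have a 0.5 probability of 90+ seats. A sketch of that inversion, under my assumption that it amounts to finding the vote share p at which the per-SMC win probability is 0.9 (making 90 the expected seat count), solved here by simple bisection; the names are illustrative.

```python
import math

N, TARGET = 10_000, 0.9  # voters per SMC; per-SMC win probability giving a mean of 90 seats

def win_prob(p, variance_as_sd):
    """Normal approximation to P(at least 5001 of N votes), continuity-corrected."""
    mean = N * p
    variance = N * p * (1 - p)
    sd = variance if variance_as_sd else math.sqrt(variance)
    return 0.5 * math.erfc((5000.5 - mean) / (sd * math.sqrt(2)))

def solve_share(variance_as_sd, lo=0.5, hi=0.99, iters=60):
    """Bisection for the vote share p at which win_prob(p) equals TARGET (increasing on [lo, hi])."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if win_prob(mid, variance_as_sd) < TARGET:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_mistaken = solve_share(variance_as_sd=True)
p_correct = solve_share(variance_as_sd=False)
print(f"variance as sd: {p_mistaken:.4f}")
print(f"correct sd:     {p_correct:.4f}")
```

The variance-as-sd version returns about 0.7441, matching the article; with the correct standard deviation, a vote share of roughly 50.6% is already enough.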
As a side note: would putting up a screenshot of my MATLAB output (I could have used R as well) on a nice purple background get me published in TOC?
Update 14 March 2012 11:55:
In case anybody asks why I did not email TOC: I did. Maybe I should have linked them here. Here are their reply and my email to them. I am not impressed.
theonlinecitizen toc <theonlinecitizen@gmail.com>
Tue, Mar 13, 2012 at 13:08
To: Shen Ting Ang <angshenting@gmail.com>
Thanks Shien Ting for your feedback. You may leave this same feedback at the comments section of that article. We also welcome you to write for us an article with more accurate statistics.
Regards.
On Tue, Mar 13, 2012 at 6:37 PM, Shen Ting Ang <angshenting@gmail.com> wrote:
Dear Editor,
The author makes the following conclusions:
“A parsimonious hypothesis is that somehow voters consistently voted 60-40 for the PAP in each constituency especially in the GRCs so that the PAP scores an overwhelming victory. This disconnect between the people’s will and the election outcome can only be attributed to the fact that the elections are unfairly skewed towards the PAP. Whether there is intent or an unfortunate coincidence is not clear, but it is the responsibility of Parliament to form a committee to look into this and to perhaps level the playing ground so that democracy may take a big step forward.
One may also ask, given that the PAP has the probability of 0.5 to win over 90 seats, what percentage of votes should they have? By using reverse calculation (inverse error function), they should enjoy 0.7441 of votes. If they are to win by that margin with a 75% probability, they should enjoy 0.7719 of votes. Clearly, using these simple sets of assumptions, there is quite a gap between the number of seats they should have won and the actual outcome. “
It is factually incorrect to say this. The author has used a model which is simplified and bears no resemblance to reality. Whilst the model itself is useful in pointing out the flaws of the First-Past-The-Post voting system in general, it does not imply that the elections are unfairly skewed.
It is unclear what the author is trying to say or conclude, but one leaves with an impression that he is asserting the elections are unfairly skewed on the basis of his simple model. Given how his model fails to consider (1) the actual sizes of the constituencies (which can lead to Simpson’s Paradox) (2) different voting proportions in each constituency in reality and (3) the limitations of using a normal approximation as he has suggested, I feel that at least ample warning should be given regarding the interpretation of the results. In fact, I would go as far as to say that the results he has obtained are not informative in any way.
Lastly, the author mentions “forecasting software” in his last paragraph. Any such “forecasting software” will be based on statistical methods published in academic journals, which are freely available to anyone who buys the relevant subscriptions. The point being missed is the fact that polling was done beforehand and not made public.
As a final year student majoring in statistics, I am concerned that articles such as this are being published and misinforming the general public. I personally like the idea and direction that TOC stands for, but I am disappointed that TOC has published an article containing so many inaccuracies which only mislead, rather than educate the general public.
Regards,
Shen Ting