Wolf Committee Results

I am starting a fresh thread for Equal Vote’s “0-5 Proportional Research Committee”, which has been organized by Sara Wolf, ie the “Wolf Committee” for short. Any regular on this forum is likely aware of this work from prior threads.

I would NOT recommend you go back and read them now due to their length. Instead I will do my best to summarize.

Quick summary

The idea of the committee was to pull together experts and methods to try to come up with the best multi-winner score system. This can also be a set of systems, like in the single-winner case where Score, Approval and STAR all have different trade-offs. Even if we agree on a few systems, that is a better place than we are now. The following results exclude Optimal methods, which are too computationally expensive to be viable in the simulation I have written.

The system I invented was intended to replace STV and is somewhat different in conception from RRV. It was based on a slightly different philosophy of how to extend proportional representation beyond the simple partisan bullet-voting definition. It is closer to Monroe in philosophy, whereas RRV is closer to Thiele. Where RRV reweights following the Jefferson method by 1/(1+SUM/MAX), my system subtracts from your total MAX score the amount you gave to all elected candidates until it reaches zero. Surplus handling is done by subtracting only a fraction of that amount to compensate. If this is unclear there is more here. The idea is that each voter is given a total score, and this vote power should be conserved under the reweighting transform applied after each candidate is sequentially elected. This would be a unitary transformation in physics, so I called it Vote Unitarity. The name of the system will be Sequentially Spent Score (SSS).

In both SSS and RRV the process is to elect a winner and then reweight, repeating until all winners are found. The selection is done by taking the Utilitarian winner, ie the candidate with the highest sum of reweighted scores. What kicked off this work was that @parker_friedland pointed out that this may not be the best way to select. A better method may be to select by the highest sum of score in a Hare Quota. He invented this as the selection for an allocation system. Allocation systems assign voters to winning candidates and then completely exhaust their ballots.

When I originally implemented the code I wrote it to select the highest sum of score in a Hare Quota of voters. I should have done it in a Hare Quota of ballots. The difference is that ballots are down-weighted, so a single voter could have only half a ballot remaining.

In any case, this gives three coded selection methods:

  1. Utilitarian: Sum of score
  2. Hare Voters: Sum of score in Hare quota of voters
  3. Hare Ballots: Sum of score in Hare quota of ballots
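A minimal sketch of the three selection rules may make the difference concrete. It assumes a numpy array `scores` (voters x candidates, normalized so MAX = 1) and a vector `weights` holding each ballot's remaining weight; the function names are hypothetical and this is not the code in the linked repository. The thread does not spell out how the Hare Voters rule treats ballot weight, so here it is assumed to sum the reweighted scores of the top quota of voters counted by head, while Hare Ballots counts a quota of ballot weight.

```python
import numpy as np

# Hypothetical helpers, not the repository's code.
# scores: voters x candidates array, normalized so MAX = 1
# weights: each ballot's remaining weight (1.0 for a fresh ballot)

def utilitarian_select(scores, weights):
    """Candidate with the largest sum of reweighted scores."""
    totals = (scores * weights[:, None]).sum(axis=0)
    return int(np.argmax(totals))

def hare_voters_select(scores, weights, quota):
    """For each candidate, sum the reweighted scores of the `quota`
    highest-scoring voters (counted by head, whatever their weight)."""
    weighted = scores * weights[:, None]
    k = int(round(quota))
    top_k = -np.sort(-weighted, axis=0)[:k]   # k largest values per column
    return int(np.argmax(top_k.sum(axis=0)))

def hare_ballots_select(scores, weights, quota):
    """For each candidate, walk down its supporters from the highest score,
    summing weight * score until a quota's worth of ballot weight is used."""
    best, best_total = -1, -np.inf
    for c in range(scores.shape[1]):
        used = total = 0.0
        for v in np.argsort(-scores[:, c], kind="stable"):
            take = min(weights[v], quota - used)
            if take <= 0:
                break
            total += take * scores[v, c]
            used += take
        if total > best_total:
            best, best_total = c, total
    return best

# Toy electorate: 4 voters, 2 seats, so the Hare quota is 4 / 2 = 2 voters
S = np.array([[1.0, 0.6], [1.0, 0.6], [0.0, 0.6], [0.0, 0.6]])
w = np.ones(4)
print(utilitarian_select(S, w))      # 1: broad support sums to 2.4 vs 2.0
print(hare_voters_select(S, w, 2))   # 0: its top 2 voters give 2.0 vs 1.2
print(hare_ballots_select(S, w, 2))  # 0: same as voters while all weights are 1
```

The two Hare rules only diverge once ballots have been down-weighted, which is exactly the voters-versus-ballots bug described above.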

There are also three coded Reweight methods:

  1. Jefferson: Reweight by 1/(1+SUM/MAX)
  2. Unitary: Subtract SUM/MAX until exhausted
  3. Allocation: Exhaust whole ballots by allocating to winners independent of score given

Note that both 2 and 3 require surplus handling, and fractional surplus handling was done for both.
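A rough sketch of the three reweighting rules, with scores normalized so MAX = 1. The function names are hypothetical, and the surplus handling is an assumption: for Unitary, every subtraction is scaled by quota/total when the total spent would exceed a Hare quota; for Allocation, only the last ballot needed is partly exhausted, which is simpler than splitting the surplus across all ballots tied at the boundary score.

```python
import numpy as np

def jefferson_weights(scores, winners):
    """RRV-style: each ballot's weight is 1 / (1 + SUM/MAX), with MAX = 1,
    where SUM is the score the ballot gave to all winners so far."""
    spent = scores[:, winners].sum(axis=1)
    return 1.0 / (1.0 + spent)

def unitary_reweight(scores, weights, winner, quota):
    """SSS-style: subtract the score given to the winner from the ballot's
    remaining weight (never below zero). If the total would exceed a Hare
    quota, scale every subtraction down by the same fraction (surplus)."""
    spend = np.minimum(weights, scores[:, winner])
    total = spend.sum()
    factor = min(1.0, quota / total) if total > 0 else 0.0
    return weights - spend * factor

def allocation_reweight(scores, weights, winner, quota):
    """Allocation-style: exhaust a quota of ballot weight, starting from the
    winner's highest scorers, regardless of the score they gave. The last
    ballot needed is only partly exhausted (simplified surplus handling)."""
    new = weights.copy()
    need = quota
    for v in np.argsort(-scores[:, winner], kind="stable"):
        if need <= 0 or scores[v, winner] <= 0:
            break
        take = min(new[v], need)
        new[v] -= take
        need -= take
    return new

# 4 voters, 2 seats (Hare quota = 2); candidate 0 has just been elected
S = np.array([[1.0, 0.5], [0.5, 1.0], [0.0, 1.0], [1.0, 0.0]])
ones = np.ones(4)
print(jefferson_weights(S, [0]))                 # approx [0.5, 0.667, 1.0, 0.5]
print(unitary_reweight(S, ones, 0, quota=2))     # approx [0.2, 0.6, 1.0, 0.2]
print(allocation_reweight(S, ones, 0, quota=2))  # [0. 1. 1. 0.]
```

In the Unitary case the winner's supporters would spend 2.5 in total, more than the quota of 2, so each subtraction is scaled by 0.8. Jefferson never takes a ballot to zero, while Allocation takes two whole ballots and leaves the half-supporter untouched.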

It was then suggested that we also try the KP (Kotze-Pereira) transform, and since it was easy to implement I included it.
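For readers unfamiliar with it, the KP transform splits one score ballot into several weighted approval ballots, one per distinct score level, so that each candidate's total approval weight equals the score it was given. A minimal sketch, assuming a ballot is a dict of scores normalized to [0, 1]; the function name is hypothetical.

```python
def kp_transform(ballot):
    """Split one score ballot (dict: candidate -> score in [0, 1]) into
    weighted approval ballots, following the Kotze-Pereira construction."""
    levels = sorted({s for s in ballot.values() if s > 0}, reverse=True)
    pieces = []
    for i, level in enumerate(levels):
        lower = levels[i + 1] if i + 1 < len(levels) else 0.0
        # Approve everyone scored at or above this level, weighted by the
        # gap down to the next lower level
        approved = frozenset(c for c, s in ballot.items() if s >= level)
        pieces.append((approved, level - lower))
    return pieces

for approved, weight in kp_transform({"A": 1.0, "B": 0.6, "C": 0.2, "D": 0.0}):
    print(sorted(approved), round(weight, 3))
# ['A'] 0.4
# ['A', 'B'] 0.4
# ['A', 'B', 'C'] 0.2
```

The approval pieces can then be fed to an approval-based reweighting such as Jefferson.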

Any combination of these makes a proportional multi-winner voting system, so there are 3 x 3 x 2 = 18 possible systems here. They are not all theoretically motivated or advocated for by anybody, but the idea was to try a few and see what we can find. So I wrote some simulation code, which @psephomancy was able to fix so that it runs in a reasonable time: https://github.com/endolith/Keith_Edmonds_vote_sim
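The count of 18 is just 3 selections x 3 reweights x (with or without KP). A trivial enumeration, using system names in the style of this thread (for example utilitarian_jefferson_kp):

```python
from itertools import product

selections = ["utilitarian", "hare_voters", "hare_ballots"]
reweights = ["jefferson", "unitary", "allocate"]

# Name each system selection_reweight, with a "_kp" suffix when the
# KP transform is applied
systems = [
    f"{sel}_{rw}" + ("_kp" if kp else "")
    for sel, rw, kp in product(selections, reweights, [False, True])
]
print(len(systems))  # 18
```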

I assume a 2D ideological space [-10,10] by [-10,10] motivated by the political compass. I then randomly simulate 10,000 voters in this space as members of ideological clusters (ie parties). I randomly select 2-7 parties and give each a random position in the 2D space. The 10,000 voters are randomly assigned to the parties, and their distance from the party center is drawn from a Gaussian distribution with a standard deviation between 0.5 and 2. Candidates are created at every grid point in the plane. They are deliberately not random, since we are trying to find the best system under optimal candidates. Also, when a candidate is elected I create a new one in the same ideological position. The score each voter gives to each candidate is determined from their Euclidean distance, d, as score = 5 - 2.0*d, where 5 is the maximum score. I then set the score of each voter's closest candidate to 5 to help make scores more realistic. I do not expect the distances or the method of deriving score to be particularly realistic. However, I do expect the distributions of score to span the space of reality. This means the only unknown is the weighting in the space, which I take as uniform. I run this simulation 25,000 times and compute several metrics for comparison.
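A condensed sketch of one simulated election as described above, using numpy. It is not the repository's code, and several details are assumptions: one standard deviation per party, candidates on an integer grid (21 x 21 = 441), negative scores clipped to 0, and "set the closest candidate to 5" read literally.

```python
import numpy as np

def simulate_election(rng, n_voters=10_000, max_score=5.0):
    # 2-7 parties at random positions in the [-10, 10] x [-10, 10] plane
    n_parties = rng.integers(2, 8)
    centers = rng.uniform(-10, 10, size=(n_parties, 2))
    spreads = rng.uniform(0.5, 2.0, size=n_parties)  # assumed: one std per party

    # Each voter joins a random party and is scattered around its center
    party = rng.integers(0, n_parties, size=n_voters)
    noise = rng.normal(size=(n_voters, 2)) * spreads[party][:, None]
    voters = centers[party] + noise

    # Assumed: one candidate at every integer grid point
    grid = np.arange(-10, 11)
    cands = np.array([(x, y) for x in grid for y in grid], dtype=float)

    # Euclidean distance from every voter to every candidate
    d = np.linalg.norm(voters[:, None, :] - cands[None, :, :], axis=2)
    scores = np.clip(max_score - 2.0 * d, 0, max_score)  # assumed: no negatives
    scores[np.arange(n_voters), d.argmin(axis=1)] = max_score  # closest gets MAX
    return scores / max_score  # normalize so MAX = 1

rng = np.random.default_rng(0)
S = simulate_election(rng, n_voters=1_000)
print(S.shape)  # (1000, 441)
```

Clone re-election is not modelled here; a candidate that stays in the matrix after winning behaves like the "new one in the same ideological position".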

After a stupid amount of debate here and in direct messages I settled on 12 metrics. There are 6 metrics which measure Utility and 6 which measure variance/polarization/equity. Below is a list of these metrics, along with some Python code for several of them based on a pandas DataFrame of scores, S, with the winners as the columns and one row for each voter. Those who are code savvy may find the code definitions simpler to understand. These metrics are all based on score and not on the relative positions in the 2D space. I do not expect the positions to be accurate enough for such metrics to be useful.

~ Utility Metrics ~

Total Utility
The total amount of score which each voter attributed to the winning candidates on their ballot. Higher values are better.

Total Log Utility
The sum of ln(score spent) for each voter. This is motivated by the ln of utility being thought of as more fair in a number of philosophical works. Higher values are better.

Total Favored Winner Utility
The sum, over all voters, of the score each voter gave to their most highly scored winner. This may not be their true favorite if they vote strategically, but all of these metrics assume honest voting. Higher values are better.

Total Unsatisfied Utility
For each voter whose total utility is less than MAX, the amount by which they fall short of MAX, summed over those voters. Lower values are better.
sum([1-i for i in S.sum(axis=1) if i < 1])
NOTE: The scores are normalized so MAX = 1

Fully Satisfied Voters
The number of voters who got candidates with a total score of MAX or more. In the single-winner case, getting somebody you scored MAX would leave you satisfied. This translates to the multi-winner case if one can assume that the mapping of score to Utility obeys Cauchy’s functional equation, which essentially means that it is linear. Higher values are better.
sum([(i>=1) for i in S.sum(axis=1)])

Totally Unsatisfied Voters
The number of voters who did not score any winners. These are voters who had no influence on the election (other than the Hare Quota) so are wasted. Lower values are better.
sum([(i==0) for i in S.sum(axis=1)])

~ Variance/Polarization/Equity Metrics ~

Utility Deviation
The standard deviation of the total utility for each voter. This is motivated by the desire for each voter to have a similar total utility. This could be thought of as Equity. Lower values are better.

Score Deviation
The standard deviation of all the scores given to all winners. This is a measure of the polarization of the winner in aggregate. It is not known what a good value is for this but it can be useful for comparisons between systems.

Favored Winner Deviation
The standard deviation of each voter's highest-scored winner. It is somewhat of a check on what happens if Cauchy’s functional equation does not really hold, ie if the highest-scored winner is a better estimate of a voter's true happiness than their total score across winners. Lower values are better.

Number of Duplicates
The code currently allows clones to be re-elected. Ideally this would not happen if there are enough candidates. This gives a measure of the ability to find minority representatives. Lower is better.
len(winner_list) -len(set(winner_list))

Average Winner Polarization
The standard deviation of each winner's scores across all voters, averaged over all winners. The polarization of a winner can be thought of as how much their scores vary across voters.

Least Polarized Winner
The lowest standard deviation of any winner's scores across voters, ie the winner with the lowest polarization.
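Collecting the inline snippets above into one function gives a sketch like the following. It assumes `S` is a pandas DataFrame (one row per voter, one column per winner, scores normalized so MAX = 1) and uses pandas' default sample standard deviation. Total Log Utility is left out because the thread does not say how voters with zero utility (where ln is undefined) are handled; the function name is hypothetical.

```python
import pandas as pd

def metrics(S, winner_list):
    """S: voters x winners DataFrame of scores normalized so MAX = 1."""
    total = S.sum(axis=1)     # each voter's total utility
    favored = S.max(axis=1)   # each voter's highest-scored winner
    col_std = S.std(axis=0)   # polarization of each winner
    return {
        "total_utility": total.sum(),
        "favored_winner_utility": favored.sum(),
        "unsatisfied_utility": (1 - total[total < 1]).sum(),
        "fully_satisfied_voters": int((total >= 1).sum()),
        "totally_unsatisfied_voters": int((total == 0).sum()),
        "utility_deviation": total.std(),
        "score_deviation": S.to_numpy().std(ddof=1),
        "favored_winner_deviation": favored.std(),
        "number_of_duplicates": len(winner_list) - len(set(winner_list)),
        "average_winner_polarization": col_std.mean(),
        "least_polarized_winner": col_std.min(),
    }

S = pd.DataFrame([[1.0, 0.0], [0.5, 0.5], [0.0, 0.0], [0.2, 0.3]],
                 columns=["A", "B"])
m = metrics(S, ["A", "B"])
print(m["unsatisfied_utility"], m["fully_satisfied_voters"])  # 1.5 2
```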

First Results

At the time of the first results I had not yet implemented Allocation, and I had not yet found my bug of using a Hare quota of voters instead of ballots, so the voter method was used. Also, the Number of Duplicates metric had a minor bug, so I'll cut it from the plots below.

Systems are defined in terms of their selection, their reweighting and whether the KP transform was applied.

For this round the simulated systems were:

  • utilitarian_unitary (ie SSS)
  • hare_voters_unitary
  • utilitarian_jefferson (ie RRV)
  • utilitarian_jefferson_kp
  • hare_voters_jefferson

The results are in the following histograms. You are going to have to zoom in.

Utilitarian Jefferson with and without the KP transform does not seem to make much of a difference except in a few metrics. Adding the KP transform seems a little better in Totally Unsatisfied Voters and Total Favored Winner Utility. Since the Favored Winner Deviation is the highest for this system, it seems this extra utility comes at the cost of a minority's utility. I would think this is bad. When looking at the Utility Deviation there is a much more pronounced bump in the tail than in the other two RRV systems. This is likely a sign of a pathology. All things considered, this is no better than Utilitarian Jefferson (ie RRV), so it should be eliminated.

This leaves 4 remaining systems: the combinations of the 2 reweighting methods and the 2 selection methods. The trade-off between Monroe (Hare Voters) and Utilitarian selection seems to be the same for both reweightings. Utilitarian selection does better for Utility, as expected. But Hare Voters selection seems to be better for Total Unsatisfied Utility, Favored Winner Utility and Wasted Voters. This can also be seen in the deviation plots, where the deviation is higher for Utilitarian selection than for Hare Voters. In short, Utilitarian selection is more utilitarian and Hare Voters is more equitable. No surprise there at all.

A similar effect is seen for Jefferson and Unitary reweightings under both selections. Jefferson reweighting gets more total utility than Unitary reweighting but is less equitable.

This means that if all you care about is Utility, the best method is Utilitarian Divisor (RRV), since it gets you the max Utility without violating common ideas of Proportional Representation. If all you care about is equitable results, Favored Winner Utility, Unsatisfied Utility and Wasted Voters, the best method is Hare Voters Unitary. These metrics are closer to what people want when they ask for PR, so it is not clear that raw Utility is the better target.

Second results

After a lot of feedback I added Allocation and Hare Ballots. There was a pretty decent case that using a quota-based selection with Unitary reweighting was nonsense, even though it produced great results. This convinced me to leave out Hare Ballots selection with Unitary reweighting. Instead I only simulated the 4 methods which people were advocating for.

These are:

  • hare_ballots_allocate (Sequential Monroe from @parker_friedland)
  • utilitarian_allocate (Allocated Score from @Jameson-Quinn)
  • utilitarian_jefferson (RRV from Warren Smith)
  • utilitarian_unitary (SSS from myself)

It's worth noting that the simulated elections were exactly the same as in the prior run, so the results for the same system are identical. You can see this for SSS and RRV.

My biggest surprise was that Hare Ballots selection did not seem to make a difference for Allocation.

In terms of Utility, RRV is best and Allocation is the worst, with SSS in the middle. However, Unsatisfied Utility is really a better metric and was the whole point of inventing Unitary reweighting. Interestingly, Allocation does just as well.

There is a weird bump in “fully satisfied voters” for Allocation. This gives me a lot of pause because of how the simulation is done. We do not know what the real world looks like; this is a simulation of every possibility. The cases where nobody was fully satisfied might be common or uncommon; we just don’t know. What we do know is that Allocation cannot handle them and the other systems can.

For Utility Deviation and Score Deviation, RRV (utilitarian_jefferson) does terribly, meaning that it did not find a fair outcome. Allocation had two peaks on either side of Unitary (SSS). It could be better or worse, but in some cases it seems not to find a natural solution.

I could make a lot more comments about the other plots, but none of them are crucial to understanding the results.

Thanks for reading this far.

Might it be possible to consider some hybrid of allocation and utility-based PR if the method detects that the allocated candidates are all low-utility?

Also, it’s worth considering whether the easiness of understanding “this many people support that candidate” gives allocation a significant boost as a viable approach for the public compared to utility-based PR.

Utilitarian_Allocate is one of the systems. If you have a variation in mind maybe it is best to digest this before proposing new systems.


Very cool, but absolutely overwhelming. I’d love a blog post and an ELI5 to go along with it, and then put it up on news.ycombinator.com for input. What you have looks great, but with my level of experience I don’t know how to fully compare and contrast it.

I’m actively interested in identifying the best PR method to use for city council representation, something that would take into account both the difficulty of getting tabulation machines changed as well as difficulty of explaining to constituents what they’d be voting for on the ballot.

Not surprising. That’s true of their list cases as well.

There are a few other directions I would have liked to take some of the work of the Wolf committee. I would have liked to see what kind of strategies would emerge, how the precision of available information affects which strategies work, and how strategic voting affects the methods’ performance. We have discussed them as a theoretical issue, but as a practical issue it seems that it would depend on the context.

One other thing: weren’t you going to include STV for reference?


I think @Sara_Wolf or @ClayShentrup would be better suited to write something for public consumption. I am not really sure wrapping it in an ELI5 would add much. It is pretty clean code, so it should be pretty accessible. That said, I would not stop anybody.

The difficulty for calculation is like: Allocation > Unitary (SSS) > Jefferson (RRV)

The difficulty for explanation is like: Jefferson (RRV) > Unitary (SSS) > Allocation

I would think Jefferson (RRV) is demonstrably bad in these results so for me it really comes down to Unitary (SSS) > Allocation. It seems the Hare Ballot selection from Sequential Monroe does not add anything so it can be cut out.

I think it sort of comes down to Free Riding. @Marylander, you are the expert around here so I would like your opinion. Of the three (SSS, Utilitarian Allocation and Sequential Monroe), does any stand out in terms of Free Riding?

Yes, that is on the to-do list. This update was for Allocation and Hare Ballot selection. This simulation does not really lend itself simply to STV. Score can be turned into rank so that we can use the same input. Also, all the comparison metrics only need the winner set, so they will be comparable. The larger issue is that there are many candidates and I allow for clones, so effectively infinite. I have some ideas for how to code this, but I think it would be computationally expensive no matter how it is done. I also do not think I should be the one to do it. I am not an expert on STV, so I would not know which one to code. Also, I have put a lot of time into this code already, so I feel somebody else should pitch in.

STV would be very interesting to compare in terms of the polarization metrics…

If it makes things easier to code, STV can be simulated with equal ranks permitted. I’d guess the results in a simulation should be close to regular STV, though maybe that hypothesis can be tested by doing a simulation with fewer voters and candidates for both ties permitted and not.

Can we make a new thread with the debate about which version of STV is the gold standard to beat? I would think the version used in Australia for advocacy reasons.

While I’d guess allowing equal ranks is better than not, I’m not sure whether it could be practically implemented. There are two variants that I’m aware of: one vote split between candidates (3 tied candidates each get 1/3rd of a vote, but if one of them is eliminated, the remaining two now get 1/2 a vote) and one vote per candidate (3 tied candidates each get one vote.) The interesting implication of “one vote per candidate” is that to be proportional, you can only elect one candidate at a time, otherwise you can get bizarre results like the following 3-winner election:

34 A=B=C
33 D=E=F
33 G=H=I

A, B, and C all win despite the other two factions having Droop Quotas. In a simulation, this “elect one at a time and exhaust ballots” requirement might be very computationally costly, since you could theoretically have an STV election where the required number of candidates have a Droop Quota yet you can’t instantly declare all of them elected.
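The example above can be checked with a small sketch. It assumes “one vote per candidate” counting (a ballot gives a full vote to every candidate it ranks equal-first), a Droop quota of floor(V/(S+1)) + 1, and fractional (Gregory-style) surplus transfer. The function names are hypothetical, and there is no elimination step, so this is not a complete STV count.

```python
from fractions import Fraction

# Ballot groups from the example: (candidates ranked equal-first, voters)
ballots = {("A", "B", "C"): 34, ("D", "E", "F"): 33, ("G", "H", "I"): 33}

def droop_quota(total, seats):
    return total // (seats + 1) + 1

def all_at_once(ballots, seats):
    """Take the top `seats` candidates among those at or above the Droop
    quota, with every tied candidate getting a full vote from the ballot."""
    quota = droop_quota(sum(ballots.values()), seats)
    tally = {}
    for tied, n in ballots.items():
        for c in tied:
            tally[c] = tally.get(c, 0) + n
    over = [c for c in tally if tally[c] >= quota]
    return sorted(over, key=lambda c: -tally[c])[:seats]

def one_at_a_time(ballots, seats):
    """Elect one candidate per round; the ballots that elected them keep
    only the surplus fraction (support - quota) / support of their weight."""
    quota = droop_quota(sum(ballots.values()), seats)
    weights = {tied: Fraction(n) for tied, n in ballots.items()}
    elected = []
    for _ in range(seats):
        tally = {}
        for tied, w in weights.items():
            for c in tied:
                if c not in elected:
                    tally[c] = tally.get(c, 0) + w
        winner = max(tally, key=tally.get)
        support = tally[winner]
        if support < quota:
            break  # a real count would eliminate someone here
        for tied in weights:
            if winner in tied:
                weights[tied] *= (support - quota) / support
        elected.append(winner)
    return elected

print(droop_quota(100, 3))        # 26
print(all_at_once(ballots, 3))    # ['A', 'B', 'C']
print(one_at_a_time(ballots, 3))  # ['A', 'D', 'G']
```

Every candidate starts with at least 33 votes against a quota of 26, which is why declaring everyone above quota at once is not proportional; after A is elected, the A=B=C ballots carry only 8 votes, and D and G win the remaining seats.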

I don’t know how difficult it’d be to simulate “one vote split between candidates” either; the only tough thing there is making sure the program keeps track of how many candidates a ballot is currently split up between and making sure it remains as close to 1.0 votes as possible (probably while staying under it, for reasons of voter equality.)

One of the things I’ve never understood is “What does the difference look like?” between these two methods.
I have yet to see a side-by-side example of “These people would win in an SSS election” vs “These people would win in an RRV election”.

To put it in vague terms, if the winners of a bloc Score election are generally supposed to be consensus or majoritarian candidates, and the winners of a perfectly proportional method are generally supposed to focus on their base above all else, all cardinal PR methods probably fall in between the two. RRV is intended to fall closer towards the result of a bloc Score election than SSS, but both are semi-proportional (your base can strategically vote such that they get a perfectly proportional result), and with a 5-winner election, both should return somewhat proportional results even with honest voting i.e. the consensus bias of both methods is weakened by having more seats up.

One way to analyze it is to look at how much ballot weight and ballot support each ballot is supposed to give at each round of a sequential PR method. RRV is intended to be more generous and take less ballot weight from voters; SSS takes more weight. As an example of this analytic approach, if you did a bloc Approval election but sequentially elected each candidate and weakened the ballots that had elected a previous candidate by 1%, the result would still look very much like bloc Approval. But if you weakened the ballots that had elected a winner by 30% each round, you’d expect various factions to start getting their own representation (though in this example, this haphazard and unmethodical approach to weakening ballot weight would probably be more random than proportional).
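The thought experiment above can be sketched as sequential bloc Approval with a flat penalty. This is only the illustration, not RRV or SSS; the 60/40 electorate and the function name are made up, and ties go to the first candidate listed.

```python
def sequential_penalty(groups, seats, penalty):
    """groups: list of (approved candidates, voter count). Each time a
    candidate a ballot approves is elected, that ballot's weight is
    multiplied by (1 - penalty)."""
    weights = [float(n) for _, n in groups]
    elected = []
    for _ in range(seats):
        tally = {}
        for (approved, _), w in zip(groups, weights):
            for c in approved:
                if c not in elected:
                    tally[c] = tally.get(c, 0.0) + w
        winner = max(tally, key=tally.get)
        elected.append(winner)
        weights = [w * (1 - penalty) if winner in approved else w
                   for (approved, _), w in zip(groups, weights)]
    return elected

groups = [(["A1", "A2", "A3"], 60), (["B1", "B2", "B3"], 40)]
print(sequential_penalty(groups, 3, 0.01))  # ['A1', 'A2', 'A3']
print(sequential_penalty(groups, 3, 0.30))  # ['A1', 'A2', 'B1']
```

With a 1% penalty the 60-voter bloc still outweighs the 40 after two wins (about 58.8 vs 40), so the result is pure bloc Approval; with 30% it drops to 29.4 after two wins and the minority faction takes the third seat.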

@fsargent This is exactly the question being answered here. How do they differ on average across all possible elections. If you want specific examples we talk a lot about that here and I show some simulations of individual elections here for scenarios deemed important. I do not think looking at individual scenarios is useful except to illustrate.

The set of those elected is normally very similar except for 1 or 2 winners, although the sequence tends to differ. The question is not who won but what the consequences are for the voters. If Bob wins instead of Bill, this means that voters got more or less of what they wanted. That is what these metrics all quantify.

Yes, RRV takes the minimal amount of weight to be proportional. SSS takes exactly the weight which was satisfied by their score of the winner. Allocation takes all the weight from some people.

This comment is more subtle than I first thought. For multi-member Score, RRV is the natural extension of a Party List with the Jefferson Method, and SSS is the natural extension of a Party List with the Hamilton Method. It seems the rivalry between Alexander Hamilton and Thomas Jefferson is alive and well. I am sure there is a ton of literature comparing these two methods. Perhaps there are relevant insights.
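To make the party-list analogy concrete, here is a sketch of the two apportionment rules: Jefferson (highest averages, also called D'Hondt) and Hamilton (largest remainder with a Hare quota). The vote counts are made up to show a case where they differ; exact fractions avoid floating-point ties.

```python
from fractions import Fraction

def jefferson(votes, seats):
    """Highest averages: each seat goes to the party with the largest
    votes / (seats already won + 1)."""
    alloc = {p: 0 for p in votes}
    for _ in range(seats):
        best = max(votes, key=lambda p: Fraction(votes[p], alloc[p] + 1))
        alloc[best] += 1
    return alloc

def hamilton(votes, seats):
    """Largest remainder: whole Hare quotas first, then the leftover seats
    to the parties with the largest fractional remainders."""
    total = sum(votes.values())
    exact = {p: Fraction(v * seats, total) for p, v in votes.items()}
    alloc = {p: int(q) for p, q in exact.items()}
    leftover = seats - sum(alloc.values())
    by_remainder = sorted(votes, key=lambda p: exact[p] - alloc[p], reverse=True)
    for p in by_remainder[:leftover]:
        alloc[p] += 1
    return alloc

votes = {"A": 62, "B": 24, "C": 14}  # made-up vote counts, 5 seats
print(jefferson(votes, 5))  # {'A': 4, 'B': 1, 'C': 0}
print(hamilton(votes, 5))   # {'A': 3, 'B': 1, 'C': 1}
```

Jefferson gives the large party its fourth seat (62/4 = 15.5 beats C's 14), while Hamilton gives it to C, whose remainder of 0.7 of a quota is the largest. This mirrors the description of RRV taking the minimal weight needed and SSS charging closer to a full quota.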

I’m not sold on your choice of utility metrics. Right now the only utility metrics that seem to be useful identifiers of how optimal each method is are the non-proportional total utility metric and the log utility metric I suggested. Basing a decision purely on the log metric would be a form of circular reasoning, since the Thiele-based methods are based on logarithmic-ish utility functions.

What we need to do is compare the methods against different proportionally-based optimal metrics (and even then we still won’t be comparing the strategy resistance of different methods). Until then, I would advise against basing any important decisions about multi-winner voting method advocacy on the following results.

Also, allocation (and the Hare versions of each of the methods) are still not implemented correctly. I can implement them correctly after December 6th, though until then I am extremely busy with getting a game I am working on into a presentable state, as well as writing several reports for other classes, so I won’t be getting too involved until after that date.

I just do not agree that Utility is all that matters. If that were true we would just use Bloc Score.

The concept of representation matters. I tend to think Total Unsatisfied Utility is the best metric in the spirit of this. Utility Deviation and Favored Winner Deviation are also very important.

Uggg. This is my issue with these methods. They are not simple to implement.

I said log utility is the only proportional metric that matters (not total utility). If you implement a proportional voting method that just chooses the results that maximize your log metric among the scores (after you divide those scores by max_score) that the voters provide, it would meet the proportionality criterion I have described on the document. The same cannot be said about any of the other metrics you have implemented.

I can create some very horrible non-proportional voting methods that can surpass SSS in this metric, so I see this metric as an oversimplified heuristic that sometimes picks the more representative voting method.

All of these have list cases of Hare Largest Remainder. While we have discussed free riding and STV, real STV elections (nearly, at least) always use Droop quotas, but the fact that the votes are transferable takes the risk out of free riding. However, in a pick-one party list election, there would still be a risk of poorly dividing your voters and losing seats compared to if you hadn’t tried it at all. I believe that there are only two jurisdictions that use this method for elections without also adding a threshold: Namibia and Hong Kong. (A threshold makes vote management much harder and much less effective, so while it might sometimes be mathematically possible, in practice it won’t happen.) Namibia is a one-party dominated state. The National Assembly elections that use the method elect all 96 seats in one nationwide electoral district. Last time, the largest party got 80% of the vote. Vote management is not practiced, and it wouldn’t matter.

In Hong Kong, there are multi member districts with 5, 6, or 9 seats. Vote management is very common. There are several pro-democracy and pro-Beijing parties, and parties often run multiple lists in a district, and divide their voters geographically. No single list has won multiple seats in the past two cycles. In effect, it is SNTV. It should be noted that not all the seats are elected by the general public, half are elected in “functional constituencies” that are conveniently divided in a way that heavily favors the pro-Beijing wing. Even more conveniently, these seats are elected by winner take all methods like FPTP, RCV, or in one case, PAL, while the seats elected by the general public, which opposes the pro-Beijing groups by ~60-40, use PR. So these elections have limited power over policy.

In practice, vote management in score methods with list cases of largest remainder would likely involve bullet voting for individuals and so look similar to vote management in Hong Kong. Vote management would be more essential in Monroe and allocation than SSS, because voters giving midrange scores to candidates are less likely to contribute to the quota in those systems; the candidate’s strongest supporters pay as much as possible first. This means that a party that doesn’t do any vote management will probably pay full quotas for their first seats, since a party nominee’s strongest supporters are likely to be partisans. In contrast, with SSS, midrange supporters for a candidate pay some of the cost, so the base of the winning candidate’s party will not pay the entire Hare quota even without vote management.


Maybe we should consider a Bucklin-like allocated cardinal PR method: add in all the top-scores (5 out of 5s, 4 out of 5s, etc.) until some candidate has a quota of points, then elect them and exhaust those ballots proportionally.
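A rough sketch of one round of this Bucklin-like proposal, to show how the descending score levels interact with the quota. Several details are assumptions not in the proposal: ballots are dicts of 0-5 scores, the quota is given directly in points, the candidate with the most points wins if several reach the quota at the same level, and “exhaust those ballots proportionally” is read as every contributing ballot giving up the same fraction quota/points of its weight, which removes exactly a quota of points. The function name is hypothetical.

```python
def bucklin_round(ballots, weights, quota, max_score=5):
    """One round of the proposed Bucklin-like allocated method (sketch).

    ballots: list of dicts mapping candidate -> score (0..max_score)
    weights: remaining weight of each ballot (1.0 when fresh)
    quota:   points a candidate needs to be elected
    """
    cands = sorted({c for b in ballots for c in b})
    for level in range(max_score, 0, -1):
        # Count only scores at or above the current level, weighted by ballot
        points = {c: 0.0 for c in cands}
        for b, w in zip(ballots, weights):
            for c, s in b.items():
                if s >= level:
                    points[c] += w * s
        leader = max(cands, key=lambda c: points[c])
        if points[leader] >= quota:
            # Assumed exhaustion rule: contributing ballots each give up
            # the fraction quota / points of their weight
            frac = quota / points[leader]
            new_weights = [w * (1 - frac) if b.get(leader, 0) >= level else w
                           for b, w in zip(ballots, weights)]
            return leader, level, new_weights
    return None, 0, list(weights)

ballots = [{"A": 5, "B": 2}] * 6 + [{"B": 5, "A": 1}] * 4
print(bucklin_round(ballots, [1.0] * 10, quota=25)[:2])  # ('A', 5)
print(bucklin_round(ballots, [1.0] * 10, quota=32)[:2])  # ('B', 2)
```

With a higher quota nobody qualifies on 5s alone, and B overtakes A once the 2-out-of-5 scores are added in, which is the kind of broad-support effect the single-winner caveat below is worried about.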

It might be possible to even just say “until some candidate is scored by a quota of ballots”, but the trouble with that is that in the single-winner case with Hare Quotas, a candidate can win by being scored a 1/5 on every ballot vs. someone who is scored 5/5 on 90% of ballots. Maybe it could be done as “until some candidate is scored by a quota of ballots, elect them, unless they receive <50% as much support as the most-supported candidate in that quota, then elect that most-supported candidate.”

If you look closely, I think you’ll find reason to declare every sequential cardinal PR method deficient. Consider the Pareto groups criterion that I put into the Wolf doc. For an approval method, it says that if there are two winner sets, A and B, every ballot approves at least as many members of A as it does members of B, and at least one voter approves more members of A than B, then the set B must not be selected (because A is clearly the better choice). While it’s not impossible for a sequential method to satisfy this criterion, any attempt to do so would require the method to abandon common sense in pursuit of the criterion, or begin to look into what future selections each choice would imply, and thus lose its sequential character. Consider the following example:
26 AC
24 AD
26 BC
24 BD
1 AB

This electorate may be divided roughly in half two ways, between an A camp and a B camp, and between a C camp and a D camp. These divisions are independent of one another: whether you are for A or B doesn’t say anything about whether you support C or D. C is the unweighted Approval leader. I don’t know of any sequential cardinal PR method that, when run on Approval-style votes, won’t choose the Approval leader first. I don’t know how you could even justify a method that behaves that way, unless it considered the quality of its later selections, or “looked into the future”, which would make it more a hybrid sequential/optimal method. Once C is chosen, it seems clear that A and B will lose ground in reweighting, but D will not, and since their unweighted lead over D was slim to begin with, D will probably win the next seat. (The “satisfaction counts,” (a list where the ith entry is the number of voters approving exactly i-1 winners) for {C,D} would be (1, 100, 0) while for {C, A/B} it would be (24, 51, 26); trading 23 voters losing their one approved winner for 26 voters getting a second approved candidate elected would favor large groups so much that even the logic of the large party favoring Jefferson/Thiele would consider it a huge loss of proportionality.) So sequential methods will probably select CD (you can test it with RRV, or SSS, or whatever), and any that don’t should raise eyebrows. However: consider that switching the winner set from CD to AB would be strictly an improvement. The top 100 ballots would remain at one winner, but the last ballot would go from one winner to 2.
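The numbers in this example can be checked with a short script. It assumes RRV on approval ballots, ie the Jefferson weight 1/(1 + number of approved winners), with ties broken alphabetically; the helper names are hypothetical.

```python
from collections import Counter

ballots = {("A", "C"): 26, ("A", "D"): 24, ("B", "C"): 26,
           ("B", "D"): 24, ("A", "B"): 1}

def rrv_approval(ballots, seats):
    """Sequential proportional approval: a ballot's weight is
    1 / (1 + number of its approved candidates already elected)."""
    elected = []
    for _ in range(seats):
        tally = Counter()
        for approved, n in ballots.items():
            w = n / (1 + sum(c in elected for c in approved))
            for c in approved:
                if c not in elected:
                    tally[c] += w
        elected.append(max(sorted(tally), key=tally.get))
    return elected

def satisfaction_counts(ballots, winners, seats=2):
    """Entry i = number of voters approving exactly i winners."""
    counts = [0] * (seats + 1)
    for approved, n in ballots.items():
        counts[len(set(approved) & set(winners))] += n
    return counts

print(rrv_approval(ballots, 2))                 # ['C', 'D']
print(satisfaction_counts(ballots, ["C", "D"]))  # [1, 100, 0]
print(satisfaction_counts(ballots, ["C", "A"]))  # [24, 51, 26]
print(satisfaction_counts(ballots, ["A", "B"]))  # [0, 100, 1]
```

C leads the first round with 52 approvals to 51, 51 and 48; after halving the AC and BC ballots, A and B fall to 38 while D keeps 48, so RRV picks CD even though AB dominates it.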

The point is, there is no holy grail. It’s just a matter of what flaws we are willing to tolerate.

Yes, that is the conclusion we have come to each time we have discussed this. Optimal methods fix this at a great cost in complexity. These local maxima are something sequential systems can get stuck in. It is worth pointing out that the local maxima are unstable, so they are relatively rare in the real world.

The purpose of Hare ballot selection over utilitarian is exactly to solve this. In many cases choosing the Utilitarian winner first is what gets you stuck so a selection method which is very close to what you want in an optimal method would be better. Sequential Monroe is called that because it is an attempt to maximize the Monroe function sequentially.

It's worth pointing out that STV would also suffer from similar issues. The problem is mostly masked by the information lost in the transformation to rank, but it's still there. We are trying to find a system which is SIMPLE and superior to STV.
