Wolf Committee Results

That’s a reasonable metric for approval. Extending this to score using the KP transform would break the utility of the square, so the second one would be better. I think that is what you describe.

The square seems arbitrary. It implies that it is more important to have score spread over voters than over candidates. I am not sure that this is true. The optimal solution is that each element in the matrix takes the smallest possible value, 1/(voters*winners). The square changes this. I think @Toby_Pereira invented this, right?

This is very similar to the Hare Proportionality Criterion. Anyway, is there a metric for it?

yep

Ebert invented it on the old Center For Range Voting Yahoo forums (his post describing the method is archived here). Toby invented PAMSAC, which is Elbert’s method with some tweaks (Elbert’s method + approval removal optimization + the coin-flip transformation).

It’s a pass-or-fail criterion. You could record the frequency with which methods pass it, though for any method that’s supposed to be proportional, that should be 100%.

OK, I think I will call it Ebert Quality since it seems to be the quality function of his method. I will also add the Harmonic Quality defined in Harmonic voting as
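(At least, the form I have in mind is the following, with $s_{v,(j)}$ the $j$-th highest score voter $v$ gives to the winner set $W$ and $\Delta$ the offset discussed below, so $\Delta = 0.5$ gives the “$-0.5$” denominators and $\Delta = 1$ gives plain $1, 2, 3, \dots$:)

$$Q_{\text{harmonic}} = \sum_{v \in \text{voters}} \ \sum_{j=1}^{|W|} \frac{s_{v,(j)}}{j - 1 + \Delta}$$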

And the Unitary Quality from my Optimal Unitary method defined as

Are there others? It seems we have Phragmen covered by Ebert Quality and Thiele covered by Harmonic Quality. What about Monroe?


Perhaps Elbert cost, since the function is supposed to be minimized, not maximized.

That’s the definition you had in the Wolf Committee document that you said you would fix. I’m OK with using Δ = 0.5 for the metric, but as a voting method we agreed that Δ = 1 is better, and under Δ = 1 there’s no -0.5. Though if we are going to consider harmonic voting with Δ = 1, then we should probably be consistent and define it the same way for the metric.

For that method you have to find the optimal way to allocate voters to candidates such that the sum of the scores voters gave to the candidates they have been allocated to is the greatest. Warren knows some polynomial-time algorithms for calculating that one for any given set of winners.
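For illustration only, here is a sketch of that optimisation using a generic assignment solver rather than Warren’s algorithms (the function name is mine, and each winner is duplicated so that it is assigned exactly a Hare quota of voters):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def monroe_quality(scores, winners):
    """Monroe-style quality: assign each voter to one winner, each winner
    receiving an equal share of voters, maximising the total assigned score.

    scores  : (num_voters, num_candidates) array of ballot scores
    winners : list of winning candidate indices (assumed to divide the number
              of voters evenly in this simple sketch)
    """
    num_voters = scores.shape[0]
    quota = num_voters // len(winners)          # Hare quota of voters per winner

    # One column per seat-slot: each winner is duplicated `quota` times, so the
    # assignment becomes one voter per slot.
    slot_scores = np.repeat(scores[:, winners], quota, axis=1)

    # linear_sum_assignment minimises cost, so negate to maximise score.
    rows, cols = linear_sum_assignment(-slot_scores)
    return slot_scores[rows, cols].sum()

# Example: 4 voters, winners are candidates 0 and 1, scores in [0, 5]
scores = np.array([[5, 0, 1],
                   [5, 1, 0],
                   [0, 5, 2],
                   [1, 5, 0]])
print(monroe_quality(scores, winners=[0, 1]))   # -> 20
```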

Sure

Right, that was a while ago now. I had forgotten. I like the formula without the “-0.5” in the denominator better anyway. It is simpler and is more closely tied to RRV.

That’s too slow. Is there an approximation you would be happy with? Even something sloppy like the number of winners who got a Hare quota of score. Or maybe the fraction of a Hare quota of score, to make it a metric.
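Something like this is what I mean, as a rough sketch (my own reading of “a Hare quota of score”, i.e. the total score cast divided by the number of winners; the function name is just a placeholder):

```python
def hare_quota_fraction(ballots, winners):
    """Fraction of the winners' Hare quotas of score that are actually filled.

    ballots : list of score lists, one per voter
    winners : list of winning candidate indices
    A 'Hare quota of score' is read here as (total score cast) / (number of
    winners); each winner's contribution is capped at one quota so the count
    becomes a smooth 0-to-1 metric.
    """
    total_score = sum(sum(ballot) for ballot in ballots)
    quota = total_score / len(winners)
    covered = sum(min(sum(ballot[w] for ballot in ballots), quota) for w in winners)
    return covered / (quota * len(winners))

# Example: 3 voters scoring 3 candidates, winners are candidates 0 and 1
ballots = [[5, 0, 3],
           [5, 1, 0],
           [0, 5, 2]]
print(hare_quota_fraction(ballots, winners=[0, 1]))   # ~0.76
```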

I’m not sure why using the KP-transformation for score makes any difference to whether you’d use the square or not relative to approval. Also, spreading the load equally over voters is supposed to ensure that voters are equally represented. If you instead gave voters a load of 1 that was then spread equally across their approved candidates, then you’d get problems with voters who approved no winning candidates. Just like Ebert’s method breaks if you elect a candidate with no approvals.

And yes, Bjarke Dahl Ebert came up with this before me, although I came up with it independently and only later found the post from Ebert (where he wondered why it wasn’t equivalent to D’Hondt any more without seeing that it was instead Sainte-Laguë equivalent).

@parker_friedland you do realise it’s Ebert and not Elbert, don’t you?

“Isn’t the Wolf Committee about finding the best cardinal method of PR for use in real elections? As such, it should never be closed off to innovation. There might always be a better method around the corner.”

First of all, it’s great to have you involved Toby. Welcome!

Yes, that’s definitely the long-term goal. In the short term, another goal is to develop the methodology around studying and comparing 5-star multi-winner PR methods. Simulating a few top methods to get a measure of accuracy is one part of that, and empirically and philosophically evaluating other desirable characteristics for comparison is part of the project as well. The 5 pillars of a good voting method from the Equal Vote Coalition are: Equality (equity in the case of PR systems), Honesty (strategic resilience), Accuracy (VSE, etc.), Expressiveness (the 0-5 ballot is the constant for this project), and Simplicity (simple to explain, tabulate, and study).

The process we have begun with the Wolf Committee is to develop a short list of systems and metrics for comparison, a glossary of terms, and a few other foundational pieces, then to simulate and evaluate the methods. Ultimately we can develop a full write-up including a layman-friendly summary, well-cited project details, and recommendations and takeaways. Systems that deserve consideration but can’t be simulated yet can still be included, with an N/A under “Accuracy” until they can be simulated and compared with the other methods on an even footing.

Once we’ve completed that process for a select number of top contenders we can always add in more systems, as can others. Finding the “best” voting system is pretty subjective and depends on the election, the electorate, and the priorities people care about, so our goal here is to produce a resource with well-researched recommendations, allowing reform advocates to come to educated conclusions they can back up with facts, not to produce an end-all-be-all. I see this as the cutting edge of a rapidly evolving field and I don’t expect that we will confidently solve the question any time soon. In the meantime people should still be able to enjoy sound proportional representation elections with fair and representative outcomes.

Lastly, demand for good 5-star PR is high, so having a solid working recommendation for the Equal Vote Coalition is a time-sensitive priority. My hope is that we can progress through the remaining stages quickly and get to our conclusion stage soon, but if at any point a majority of the committee thinks we need to go back and add something to the process, we are fully empowered to do so. Obviously quality work is the top priority. The committee is also open to adding new people who are qualified and available to help. Email sara@equal.vote to apply, and feel free to chime in here as well.


My comment was that the square seems arbitrary. This applies in the approval or score case. Why not have the exact same metric but with no square? I understand the motivation and think it is a very brilliant idea. I just do not see why a square.

If you didn’t square the voter loads, every set of candidates would be considered equally good. Every candidate has a load of 1 to share among the voters. If there are c candidates, then the total load that the voters have is c, regardless of how this is spread. The idea is to get as equal a spread as possible, and by adding up the squared voter loads it relates to the variance, and you minimise this by every voter having the same load.

E.g. if one voter has a load of 1 and another has a load of 3 under one result, but it’s 2 each under another result, the total load is 4 both times. But by summing the squares you get 10 and 8 meaning that the result where they get 2 each is better.
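In symbols (my shorthand for the approval case; $A_c$ is the set of voters approving elected candidate $c$, and $W$ the winner set):

$$\ell_v = \sum_{\substack{c \in W \\ v \in A_c}} \frac{1}{|A_c|}, \qquad \text{cost} = \sum_{v} \ell_v^{\,2}$$

With loads of 1 and 3 this gives $1^2 + 3^2 = 10$; with loads of 2 each it gives $2^2 + 2^2 = 8$.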

True, but anything other than a straight sum will yield a metric. I would like a derivation of why taking it to the second power is the best. Why not the fourth power or some other function?

This is sort of the opposite approach to how I invented Vote Unitarity and Sequentially Spent Score. The idea there was that each voter had score to spend and the ideal solution was that each voter got to spend all their score.

Let me elaborate on my thoughts on Ebert’s Method. Let’s imagine a normalized score system (scores in [0,1]) where we are trying to get W winners. Let’s represent the scores for the winners as a V by W matrix where the rows are the V voters and the columns are the W winners. Each element is the score given to that winner by that voter. The best possible solution in the Utilitarian sense is that each voter gave a 1 to each of the winners. The matrix in the innermost sum in the equation above is then just 1/V in every element. The best possible solution in the Monroe sense is that each voter gave a 1 to exactly one winner, with each winner supported by V/W voters. In that case the matrix entry would be W/V for the one winner the voter gave a 1 to, and 0 otherwise. When one does the inner sum over candidates, both situations lead to W/V as the total load for each voter. This seems wrong to me. It seems like we ought to reward the unanimous consent case over the other case. This means some cost applied to a voter for not getting load from multiple winners.
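To make that concrete with a toy example of my own (V = 4 voters, W = 2 winners; each entry is the score divided by that winner’s total received score):

```python
import numpy as np

V, W = 4, 2

# Utilitarian case: every voter gives 1 to every winner.
S_util = np.ones((V, W))
load_util = S_util / S_util.sum(axis=0)        # every element is 1/V
print(load_util.sum(axis=1))                   # [0.5 0.5 0.5 0.5], i.e. W/V each

# Monroe case: each voter gives 1 to exactly one winner, split evenly.
S_monroe = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
load_monroe = S_monroe / S_monroe.sum(axis=0)  # nonzero elements are W/V
print(load_monroe.sum(axis=1))                 # [0.5 0.5 0.5 0.5], i.e. W/V each
```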

The next issue I have is with the square. Now clearly if we sum all the W/V entries over the V voters we just end up with W. And this holds in all cases, because the distribution of load is done in an additive way, so if you sum the loads up you just get back to where you started. I get that the square breaks this, but literally anything would.

Borrowing a little from Vote Unitarity, it seems that the best situation would be when all voters have a total load of W/V. Why not then use the absolute error instead of the load itself, i.e. |W/V - load|? Then when you do the sum over voters you would be summing the error. Minimal error is best, obviously. Clearly one could also use squared error instead of absolute error to be more similar to what you are doing now. Remember that such a choice would have implications for outliers.

This is the first time I have ever thought about this stuff so I might be missing a few things. I am trying to work out how to reconcile my two points. If you impose a cost for not getting your load from multiple winners then you do not know what the ideal load is for each voter when summed over candidates. This means my error method does not work if you add such a cost.

I think in the end we need to take a step back and think about how to aggregate the matrix. It is clear how to build it but then we have to aggregate it across winners and voters. Even doing a product instead of a sum would work if you ignored zeros in the product. Maybe much of what I have said has already been discussed. Hopefully there is an elegant derivation for the Ebert Cost formula.

The 2nd power is the same as the variance. So we’re saying that this is the ideal load for each voter, and that is the best result because it gives the lowest variance (and standard deviation).

Yes, the problem with Ebert’s method (one problem at least) is that it uses what is essentially a measure of equality across voters rather than looking at things like actual amount of approval a winning candidate might have. Obviously electing sequentially mitigates against this to some extent, but it really just masks the problem.

But if you look at this post, you’ll see the case for the coin-flip approval transformation (CFAT), which when combined with Ebert’s Method gives preference to unanimous results.

1 voter: ABC
1 voter: ABD

In this case with CFAT, AB would be preferred over CD without the need for a tie-break. So this is a way of attempting to “square the circle” of proportionality vs total approval.

Using absolute error came up a while ago (elsewhere), and it didn’t give as good results, and it fails IIA. I’ll give an example with party voting.

4 to elect

5 voters: A
3 voters: B

Ebert’s Method (and Sainte-Laguë party list) gives a tie between 3-1 and 2-2. W/V = 4/8 or 1/2.

In the 3-1 case, the sum of the squared loads is 5*(3/5)^2 + 3*(1/3)^2 = 32/15.
In the 2-2 case, the sum of the squared loads is 5*(2/5)^2 + 3*(2/3)^2 = 32/15.

In the 3-1 case, the sum of the absolute error is 5*(3/5 - 1/2) + 3*(1/2 - 1/3) = 1.
In the 2-2 case, the sum of the absolute error is 5*(1/2 - 2/5) + 3*(2/3 - 1/2) = 1.

So far, so good, but if you add a third party it goes wrong.

Still 4 to elect

5 voters: A
3 voters: B
1 voter: C

Ebert’s Method (and Sainte-Laguë party list) gives a three-way tie between 3-1-0, 2-2-0 and 2-1-1. But using the absolute error causes differences. If we just look at the 3-1 and 2-2 cases and ignore party C, we get the following:

In the 3-1 case, the sum of the absolute error is 5*(3/5 - 4/9) + 3*(4/9 - 1/3) = 10/9.
In the 2-2 case, the sum of the absolute error is 5*(4/9 - 2/5) + 3*(2/3 - 4/9) = 8/9.
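For anyone who wants to reproduce those numbers, here is a quick sketch with exact fractions (groups are (voters, seats) pairs per party; the mean load is passed in explicitly so that party C can be left out of the sum, as above):

```python
from fractions import Fraction as F

def squared_loads(groups):
    """groups: list of (num_voters, seats_won) per party; load = seats/voters."""
    return sum(v * F(s, v) ** 2 for v, s in groups)

def abs_error(groups, mean_load):
    """Sum of |load - mean_load| over the listed parties only."""
    return sum(v * abs(F(s, v) - mean_load) for v, s in groups)

# Two parties (5 A voters, 3 B voters), 4 seats, mean load 4/8 = 1/2
print(squared_loads([(5, 3), (3, 1)]), squared_loads([(5, 2), (3, 2)]))            # 32/15 32/15
print(abs_error([(5, 3), (3, 1)], F(1, 2)), abs_error([(5, 2), (3, 2)], F(1, 2)))  # 1 1

# With the extra one-voter party C, the mean load drops to 4/9 and (summing
# over parties A and B only, as above) the absolute errors now differ.
print(abs_error([(5, 3), (3, 1)], F(4, 9)), abs_error([(5, 2), (3, 2)], F(4, 9)))  # 10/9 8/9
```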

So there is a difference here and it fails IIA. I think using absolute error turns it from a highest averages method into a largest remainder method, and you get stuff like the Alabama Paradox.

Obviously, absolute error is just one of infinitely many possible alternatives, so what about the others? Well, as I’ve said using squared loads means we’re minimising variance, so if it’s arbitrary, then it’s only as arbitrary as variance and the standard deviation, which have been used for years as go-to statistical devices for measuring this sort of thing. It’s also worth noting that it is the mean that minimises the squared deviation to the data points and the median minimises the absolute deviation. So from that perspective, squared deviation makes sense when you’re using the value of the data points rather than the ranks.

No, it is not. The variance is the mean squared difference from the mean; that would only be the same as the sum of squared loads if the mean were zero, which it clearly is not. I would be in favor of using the standard deviation. However, the standard deviation is by no means special. What makes it appear special is that the Gaussian distribution is special. The mean of IID variables will have a Gaussian error because of the central limit theorem. The loads per voter are not IID, since they sum to the number of winners. There is likely a better measure of spread in this case. I would have to dig into my old stats books to sort out what it is. In any case, the standard deviation is still better. Your argument against absolute error is convincing, although I do prefer highest averages methods, as I think the Alabama Paradox is less important than the other issues in highest averages. Recall Balinski and Young’s impossibility theorem.

What about the real standard deviation or the squared error from W/V?

It is equivalent though, and it’s an easier way of describing it than calculating the difference from the mean. It’s explained at the bottom of page 10 in my PAMSAC paper. I tried to copy and paste but it messed up the formatting.
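Roughly, the identity in question is the standard decomposition (my wording, not a quote from the paper), with $\ell_v$ the voter loads, $V$ voters, and mean load $\bar\ell = c/V$ fixed by the number of elected candidates:

$$\sum_{v=1}^{V} \ell_v^{\,2} = V\,\operatorname{Var}(\ell) + V\,\bar\ell^{\,2}$$

Since $\bar\ell$ is the same for every winner set, minimising the sum of squared loads and minimising the variance (or standard deviation) rank candidate sets identically.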

As for whether the standard deviation is special, I’m not going to commit to any position on how special I think it is in general, but using Ebert’s Method like this gives exactly Sainte-Laguë equivalence for party voting, and D’Hondt equivalence when the coin-flip approval transformation is applied, so it gives sensible results, and I haven’t really felt the need to look elsewhere (such as using the third power, fourth power, logarithms, cosines or any other function one could think of).

Do you mean you prefer largest remainder? I presume the problem you have with highest averages/divisor methods like D’Hondt and Sainte-Laguë is that they can violate the Quota Rule. However, while it may seem like a reasonable rule in theory, I don’t think it actually is, and failing it might actually be an inevitable consequence of passing IIA. Take the following party list ballots with 68 voters and 2 to elect:

31 voters: A
10 voters: B
9 voters: C
9 voters: D
9 voters: E

Party A has 31 out of 68 votes, and this would translate to about 0.91 seats. So anything over 1 seat would violate the quota rule. However, D’Hondt and Sainte-Laguë would both give both seats to party A. To obey this rule, presumably you’d elect one from party A and one from party B.

But if you look at parties A and B without the rest, party A should get 1.51 seats and party B 0.49 seats. So giving party A both seats looks less unreasonable here (and largest remainder methods would agree). Parties C, D and E are just irrelevant alternatives here. And you can make the example as extreme as you like. You could give party A a million votes and party B just one. In this two-party case, I think most would be inclined to award both seats to party A. But if you have enough of these one-voter parties, party A would be entitled to less than 1 seat. But awarding A 2 seats when it has a million times the votes of any other party still seems the right result to me.
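If it helps, here is a quick highest-averages sketch that reproduces those allocations (standard D’Hondt and Sainte-Laguë divisor sequences; the helper name is mine):

```python
def highest_averages(votes, seats, divisor):
    """votes: dict party -> vote count; divisor(k) = divisor after winning k seats."""
    won = {party: 0 for party in votes}
    for _ in range(seats):
        best = max(votes, key=lambda p: votes[p] / divisor(won[p]))
        won[best] += 1
    return won

votes = {'A': 31, 'B': 10, 'C': 9, 'D': 9, 'E': 9}
print(highest_averages(votes, 2, lambda k: k + 1))       # D'Hondt: A gets both seats
print(highest_averages(votes, 2, lambda k: 2 * k + 1))   # Sainte-Laguë: A gets both seats
print(highest_averages({'A': 31, 'B': 10}, 2, lambda k: k + 1))  # two-party case: still 2-0
```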

Edit - I have added to the Sequential Ebert page by the way.

OK, I see what you mean. They are both minimized at the same point. I concede my argument. I also missed the point that the mean is actually always the best value, i.e. C/V, which means that minimizing the mean squared error is the same as minimizing the standard deviation. In this light I see the motivation for having a square, and I agree this is a good metric which is theoretically well motivated.

Well, that and the fact that I am not so sure the Alabama paradox makes sense in this situation. We are not going to change the size of the district mid-election. Sequentially Spent Score is the extension of the Hamilton method in the same way that RRV is the extension of the Jefferson method. I have just really tied myself to this interpretation and it seems right to me. Maybe there is an example you can give where Sequentially Spent Score fails in a meaningful way because of this.

Yes, I saw that. I am trying to think about how to code it, since it is sort of different from what has been done up until now. All the existing systems select based on some metric, then reweight the ballots or score matrix to change the metric, and then the process is repeated. This is different. I might just have to code it as its own method if I can’t come up with a way to do it in this framework. What I am thinking is to use the Ebert metric above, but instead of summing over a fixed set of winners, I choose them sequentially. First take the one that minimizes the metric alone. (Will this always be the score winner?) Then take the next winner as the one that minimizes it together with the first winner, and so on. I could do it with the scores or with KP with the same code. Might be interesting to see if there is a difference. Make sense?
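Roughly what I have in mind, as a sketch (the ebert_cost(scores, winners) helper is the squared-load sum computed from the score matrix, written out further down in the thread; the names here are just placeholders):

```python
def sequential_ebert(scores, num_winners, ebert_cost):
    """Greedy selection: at each step, add the candidate whose inclusion
    minimises the Ebert cost of the winner set chosen so far.

    scores     : (voters x candidates) score matrix
    ebert_cost : function(scores, winner_indices) -> float
    """
    winners = []
    num_candidates = scores.shape[1]
    for _ in range(num_winners):
        remaining = [c for c in range(num_candidates) if c not in winners]
        best = min(remaining, key=lambda c: ebert_cost(scores, winners + [c]))
        winners.append(best)
    return winners
```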

Your example seems to suggest a “Relevant Quota Rule”: a method should pass the Quota Rule, but that quota is only calculated among the votes for relevant candidates (possibly using KP transform to decide what constitutes a vote). Can it be proven that some methods that fail the regular Quota Rule pass this version of the Quota Rule?


It probably can. I think most methods probably do pass this anyway. The divisor methods certainly would, and then obviously all the methods that would pass the Quota Rule anyway.

Is the example I gave where it would fail IIA not meaningful (where 31 voters vote for party A etc.)? Based on this, I would always prefer a method that would reduce to a divisor/highest averages method if voters all voted along party lines. They pass what @AssetVotingAdvocacy would call the Relevant Quota Rule, and give what I would consider the correct results in every case, however many parties are standing and whatever the vote proportions (though obviously you can debate D’Hondt vs Sainte-Laguë etc.). So for me it’s always been a question of how to get a method that works when people don’t necessarily vote along party lines, but would still work as a divisor method if they did. So I suppose I would ask for an example where you’d argue divisor methods give the “wrong” result.

The score winner will always minimise the metric in the single-winner case if the KP-transformation is used, though I’m not sure what you would actually do with it if you didn’t convert to approvals (how the calculation would actually work). But you could try it with and without the KP-transformation if that’s possible.

Edit - OK, so there is the Ebert cost in this thread, which I presume works directly off scores, but I’d have to get my head around the notation.

Most apportionment methods would definitely pass a “loose” version of the Quota Rule/Relevant Quota Rule where the upper and lower roundings are expanded up/down by 1, right?

I’d also inquire as to whether most apportionment methods always give a coherent majority group at least half of the seats. This can be trivially satisfied by just fixing it as a condition for the final result of the method, but I wonder what properties must fail if a method satisfies this one.

No, I do not find that example particularly convincing. Neither {A,A} nor {A,B} seems clearly better than the other from the information given. Both sets are “stable”. {A,A} has more total utility and {A,B} has more satisfied voters.

We went over a ton of examples like this, and ones which were bad for RRV, in another post. There are also implications for Free Riding and Vote Management. In the end, all of these examples were very contrived.

If A were centrist and B, C, D and E were all equidistant from it in different directions, {A,A} seems natural. If A is one group and B, C, D and E are all very close ideologically, then {A,B} is the natural choice. This example is only problematic because there are way more parties than seats and everybody bullet voted. That does not seem like a realistic scenario.

What is more interesting is talking about similar situations where this is the limiting case. How common are situations like this relative to ones where SSS is better than RRV? That was the original purpose of the simulation we are doing: simulate all possible scenarios and see how often such scenarios come up. Scenarios like this would be identified by failings under some metric where another system would do better.

This is why I am asking for help coming up with some more metrics.

We need to be more empirical about this.

I’ll put it in words. The S matrix is just the score matrix. The first division is just to divide through by the total sum of score given to each candidate. We then sum over all the winners to get a vector with one element per voter. Square each element, then sum. This would work for approvals or scores. I can convert to an approval S matrix with the KP transform to see if that makes a big difference. It seems weird to me, though, because then the thing being squared is not the total load of a voter; a voter is sort of split up in the KP transform. This means it is not the standard deviation of the load for the voter but something more complex.
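In code form, my reading of that description (a sketch only; zero-score winners would need special handling, and the names are mine):

```python
import numpy as np

def ebert_cost(scores, winners):
    """Squared-load sum computed directly from scores.

    scores  : (voters x candidates) score matrix
    winners : list of winning candidate indices
    Each winner's column is divided by that winner's total received score, the
    result is summed across winners to get one load per voter, and the loads
    are squared and summed.
    """
    S = np.asarray(scores, dtype=float)[:, winners]
    loads = (S / S.sum(axis=0)).sum(axis=1)   # one load per voter
    return float((loads ** 2).sum())

# Example: 4 voters, 3 candidates, winners are candidates 0 and 1
scores = np.array([[5, 0, 2],
                   [5, 0, 1],
                   [0, 5, 0],
                   [1, 4, 3]])
print(ebert_cost(scores, winners=[0, 1]))
```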

I do not think that the score winner is the first winner without the KP transform. I can’t prove it, but it does not seem that way. This is not bad. The Sequential Monroe System was designed to avoid this because it can cause issues. If you agree that this is the correct definition, then I will code it in the sequential way I described before.

I’m pretty sure they would, so it’s not likely to be a tough criterion to pass.

I presume they would, but it might depend on how you define a coherent majority group.