Does this sequential variation of STAR exist?

You can see it at my Codepen here (the method is called “SuperStar”… hey, why not?): https://codepen.io/karmatics/pen/eYJxXge

Edit: it’s Cardinal Baldwin. I changed the name at the CodePen. Compared to both Score and STAR, I like it for its greater independence of irrelevant alternatives.

It should work identically to STAR if there are 3 candidates, but with more, instead of there being two rounds (a score round and then a runoff), there are (number of candidates minus one) rounds.

Each round eliminates the lowest-scoring candidate, rescales the ballots so that each has a 0 and a 5 among the remaining candidates, and then runs the score tabulation again, repeating until only two candidates remain. Basically it does what STAR does in the final round for 2 candidates (maximizing each voter’s power), but as a gradual series.
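For concreteness, here’s a minimal sketch of that loop in TypeScript (hypothetical names and a 0-5 scale; the actual implementation is the one at the Codepen):

```typescript
// Ballots are arrays of scores (0..5), one entry per candidate.
type Ballot = number[];

const MAX_SCORE = 5;

// Stretch a ballot over the remaining candidates so its lowest remaining
// score becomes 0 and its highest becomes MAX_SCORE.
function rescale(ballot: Ballot, remaining: number[]): Ballot {
  const lo = Math.min(...remaining.map(c => ballot[c]));
  const hi = Math.max(...remaining.map(c => ballot[c]));
  const out = ballot.slice();
  for (const c of remaining) {
    // A voter who scores all remaining candidates equally abstains this round.
    out[c] = hi === lo ? ballot[c] : MAX_SCORE * (ballot[c] - lo) / (hi - lo);
  }
  return out;
}

// Tabulate, drop the lowest scorer, rescale; repeat until one candidate
// remains. The first tabulation uses the raw (unscaled) ballots, and the
// final two-candidate round is effectively STAR's runoff.
function superStar(ballots: Ballot[], numCandidates: number): number {
  let remaining = Array.from({ length: numCandidates }, (_, i) => i);
  let current = ballots;
  while (remaining.length > 1) {
    const totals = new Map<number, number>();
    for (const c of remaining) {
      totals.set(c, current.reduce((sum, b) => sum + b[c], 0));
    }
    remaining.sort((a, b) => totals.get(b)! - totals.get(a)!); // ties arbitrary
    remaining.pop(); // eliminate the lowest-scoring candidate
    current = current.map(b => rescale(b, remaining));
  }
  return remaining[0];
}
```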

In the Codepen is a scenario where it looks like the image below, where left is liberal and right is conservative. d, e and f are in a tight race (all 3 close to the median crosshair), while a, b and c are too far left to be likely to win.

[Screenshot: candidates a through f on a left/right spectrum; d, e and f cluster near the median crosshair]

Say you have somewhere over ~800 ballots, with the most left-leaning blocks of voters having these preferences:

134: a[5] b[4] c[2] d[1] e[0] f[0]
64: a[5] b[4] c[3] d[1] e[0] f[0]
94: a[3] b[5] c[4] d[1] e[0] f[0]
70: a[2] b[2] c[5] d[2] e[0] f[0]

None of these voters love d, but they’d sure prefer her to e or f.

With the ballot set at the Codepen (which is generated from the same thing that produced the image), d is the Condorcet winner, and d wins under “SuperStar”. But e wins under Score and STAR (as well as STLR and Median). In the case of STAR, the reason d doesn’t win is because she didn’t even make it to the final round, so the benefits of STAR don’t come into play.

However, if at least some of those 4 blocks of voters got smart and exaggerated their preference for d by giving her a 5, d would make it into the final round under STAR, and subsequently win. That would make d win under Score as well. Presumably if those voters followed the polls, and understood the system, many of them would do that.

Obviously we don’t want that.

If those on the left got even smarter, they’d hold a party nomination and make sure that only one of the four candidates on the left is on the ballot. That kind of defeats the purpose of what we are doing here.

Please try the Codepen out and see how just exaggerating some of the ratings for d in those top four rows causes d to win (even though she already wins under “SuperStar”, without anyone having to exaggerate).

Here are the STAR and SuperStar results (prior to any lefty voters exaggerating):

****** Pairwise wins ******
d: 5
e: 4
c: 2
f: 2
a: 1
b: 1
****** Score ******
e: 2178 (2.4390)
f: 2108 (2.3606)
d: 2051 (2.2968)
c: 1883 (2.1086)
b: 1823 (2.0414)
a: 1748 (1.9574)
****** Interpolated Median ******
e: 2.9415
f: 2.6782
d: 2.5986
c: 1.9644
b: 1.4451
a: 1.3693
****** STAR ******
e: 220
f: 199
****** SuperStar ******
***** round 1 *****
e: 2.4390
f: 2.3606
d: 2.2968
c: 2.1086
b: 2.0414
a: 1.9574
***** round 2 *****
e: 2.4409
d: 2.2724
f: 2.2417
c: 2.1529
b: 2.0764
***** round 3 *****
e: 2.4420
c: 2.4188
d: 2.3322
f: 2.0852
***** round 4 *****
e: 2.7324
d: 2.5302
c: 2.4188
***** round 5 *****
d: 2.2228
e: 2.0325
****** STLR ******
e: 2239.0000
f: 2153.0000

When I first heard of @Brian_Olson’s IRNR, this is the method I originally thought it described, before I learned that what Brian means by normalizing votes (see 1 below) is very different from what you and I mean by normalizing them (see 2). You could also apply the normalization that @Keith_Edmonds’s STLR uses on the final round, but to each round (though then you wouldn’t have a traditional runoff at the end, so it wouldn’t be as STAR-like). Or perhaps the thing that should be normalized each round is the standard deviation of the scores each voter gives, so that every voter’s scores have the same spread each round (see 4). Rough sketches of all four follow the list below.

1: dividing each score by the sum of the scores that voter gave across all candidates
2: applying f(x) to each score, where f(x) = max_score * (x - smallest_score_given) / (highest_score_given - smallest_score_given)
3: multiplying each score by max_score / highest_score_given
4: dividing all of a voter’s scores by the standard deviation of the scores they gave to the remaining candidates, times sqrt(2) (the sqrt(2) part is just so that in the final round the score they give to their preferred candidate is exactly one more than the score they give to the other).
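For concreteness, here are rough TypeScript sketches of those four variants as I read them (my own function names, not anyone’s reference implementation; `scores` is one voter’s scores for the remaining candidates):

```typescript
const MAX = 5;

// (1) IRNR-style: divide each score by the sum of the voter's scores.
const bySum = (scores: number[]): number[] => {
  const sum = scores.reduce((a, b) => a + b, 0);
  return sum === 0 ? scores : scores.map(x => x / sum);
};

// (2) Min-max: stretch so the lowest score becomes 0 and the highest becomes MAX.
const byRange = (scores: number[]): number[] => {
  const lo = Math.min(...scores), hi = Math.max(...scores);
  return hi === lo ? scores : scores.map(x => MAX * (x - lo) / (hi - lo));
};

// (3) STLR-style: scale so the highest score becomes MAX.
const byMax = (scores: number[]): number[] => {
  const hi = Math.max(...scores);
  return hi === 0 ? scores : scores.map(x => MAX * x / hi);
};

// (4) Divide by the sample standard deviation times sqrt(2); with two
// candidates left, the preferred one ends up exactly 1 point ahead.
const byStdDev = (scores: number[]): number[] => {
  if (scores.length < 2) return scores;
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const sd = Math.sqrt(
    scores.reduce((a, x) => a + (x - mean) ** 2, 0) / (scores.length - 1)
  );
  return sd === 0 ? scores : scores.map(x => x / (sd * Math.SQRT2));
};
```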

Personally I’m not really a fan of any of these, because I think that voting methods that use sequential elimination to find winners tend to be too greedy and not very mathematically pure (all the methods I just suggested probably fail monotonicity) in comparison to methods that are able to find a more optimal way to choose a winner. Though if I had to choose one of those 4, it would probably be 2 (the one you just suggested) or 4.

This seems so much more straightforward than others. Certainly more explainable. I’d think I could explain this kind of normalization to most anyone over age 8 in a few seconds. (I’ll try explaining it to my 6 year old by making marks on a rubber band and stretching it along a ruler :slight_smile: )

My idea with this is that it is a fairly minor change to STAR. The problem with STAR is that its logic only works well when there are two clear front runners. This just sort of fills in the gap between the first stage and second stage, with a smooth progression. STAR eliminates, but it doesn’t do it sequentially; it does it all at once.

Can you explain why this one would fail monotonicity? Since STAR passes it, it seems like this would too. I’m also curious what you mean by greedy.

I learned ‘normalization’ as a bit of math in college, and there are different kinds, but the kind I most prefer is “L2 normalization”, which makes the length of the vector in N-space equal to 1. So, for a rating vote [A, B, C], find the length l = sqrt(A*A + B*B + C*C) and then cast the vote [A/l, B/l, C/l].

Another strategy I’ve heard that seems sensible I think of as ‘maximization’. Whatever the highest rated active choice is, make that equal to a 1.0 full vote and scale everything else proportionally. I don’t like this as much because I think it fails the ‘one person one vote’ ideal. The L2-norm gives everyone the same effective voting power.

Another normalization is the “L1 normalization”, where the sum of the absolute values of the vote is equal to 1.0. For vote [A, B, C], let x = abs(A) + abs(B) + abs(C) and cast the vote [A/x, B/x, C/x]. This is a different kind of equal voting power per person. The L2-norm can result in a sum of votes greater than 1.0, even though the ‘vote vector’ has a length of 1.0. This distinction fades away if you’re using this in a runoff procedure where there wind up being fewer choices by the end. My math intuition says L2 is better than L1, but I also bet it’s not a big difference, so if someone makes a convincing argument that one is ‘more fair’ I’d accept it.
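A quick sketch of those norms, plus the ‘maximization’ one, under the definitions above (hypothetical names):

```typescript
// L2: scale the vote vector to length 1.
function l2Normalize(vote: number[]): number[] {
  const len = Math.sqrt(vote.reduce((a, x) => a + x * x, 0));
  return len === 0 ? vote : vote.map(x => x / len);
}

// L1: scale so the absolute values of the vote sum to 1.
function l1Normalize(vote: number[]): number[] {
  const sum = vote.reduce((a, x) => a + Math.abs(x), 0);
  return sum === 0 ? vote : vote.map(x => x / sum);
}

// 'Maximization': make the highest-rated active choice a full 1.0 vote.
function maximize(vote: number[]): number[] {
  const hi = Math.max(...vote);
  return hi <= 0 ? vote : vote.map(x => x / hi);
}

// Example: l2Normalize([3, 4]) -> [0.6, 0.8] (length 1, but sums to 1.4),
// while l1Normalize([3, 4]) -> [3/7, 4/7] (sums to exactly 1).
```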


As for non-monotonicity, my best consolation is that the non-monotonic regions are smaller than in IRV. I think that by considering the whole vote at once (summing up all the ratings, or their normalized versions) instead of just a momentary leader, a ratings-runoff procedure can arrive at a better result than IRV. The KPY election-space diagrams show these smaller non-monotonic regions. https://bolson.org/voting/sim_one_seat/www/

In that example DV (IRNR) also returns d.
When I thought about DV, SuperSTAR came to mind, but I discarded it.
Reanalyzing it now, I think it’s among the best methods, mainly because it doesn’t fail monotonicity.

The reason I dismissed it was that it strongly favors this tactic (similar to IRV):
[10,1,0]
which becomes a kind of Approval-IRV:
[10,10,10,10,1,1,1,1,0,0,0,0]

If my vote is this:
A [10] B [10] C [10] D [1] E [1] F [1] G [0] H [0]
While A,B,C are in the race, my vote is as if it were like this:
A [10] B [10] C [10] D [0] E [0] F [0] G [0] H [0]
then, as soon as A, B, C are eliminated, it instantly becomes like this:
D [10] E [10] F [10] G [0] H [0]
this generates unpredictable changes in results (until A,B,C are eliminated).
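To make the jump concrete, a tiny sketch of that renormalization (assuming the min-max rescaling discussed above, on a 0-10 scale):

```typescript
// Stretch the remaining scores so they span the full 0..10 range.
const rescale = (scores: number[], max = 10): number[] => {
  const lo = Math.min(...scores), hi = Math.max(...scores);
  return hi === lo ? scores : scores.map(x => max * (x - lo) / (hi - lo));
};

// A..H while A, B, C are still in: already spans 0..10, so unchanged.
console.log(rescale([10, 10, 10, 1, 1, 1, 0, 0]));
// D..H the moment A, B, C are eliminated: the 1s jump to 10s.
console.log(rescale([1, 1, 1, 0, 0])); // -> [10, 10, 10, 0, 0]
```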

Using Yee diagram, voters don’t use that tactic, so the problems aren’t seen.

I chose the DV because a vote like this:
A [10] B [10] C [10] D [1] E [1] F [1] G [0] H [0]
would initially have 100 points distributed like this:
A [30] B [30] C [30] D [3] E [3] F [3] G [0] H [0]
mitigated the overhang.

Right, but what about just subtracting the lowest-rated active choice first? So if you have ratings of

A[0] B[12] C[25] D[62] E[100]

and A and E become inactive, i.e.:

B[12] C[25] D[62]

you subtract 12 from all of them, then multiply them all by 100/(62-12), giving

B[0] C[26] D[100]

Isn’t that ‘one person one vote’, in the sense that this is how we might expect that person to vote if A and E had never been on the ballot?

This seems like the most direct (and simple) extension of the logic of STAR, which could be said to (in the runoff) eliminate all candidates other than the top two, normalize all the ballots using the above logic, and then do another score comparison.

Cool. It surprises me that I haven’t seen it as a documented method, because it seems rather obvious to me. When I first saw STAR, I wondered why it wasn’t done this way, since it seems to patch a rather big hole in it.

I think if you want to normalize from
[12,62] = [min,max]
to
[0,100] = [MIN,MAX],
then this is the formula you need:

f(x) = MIN + (x - min) * (MAX - MIN) / (max - min)

I told you, I only discarded it because it could be used as an IRV or Approval-IRV, and I don’t like that, because it eliminates the advantages of good representation offered by the score range.
In SV or STAR, if you really want to favor your second choices, you have to give them points (other than 1, which is almost useless).

That seems like the exact same formula.

Original vote: A[9] B[6] C[5] D[3] F[0] G[1] H[1]
Subset {A,B,C}: A[9] B[2.25] C[0]

In other words, before and after normalization, B remains 25 percent of the way from C to A. So:

In a 0-9 system, subtract the lowest score from each, so

A[9] B[6] C[5]
becomes
A[4] B[1] C[0]

Then multiply all by 9 divided by the new highest score.

A[4] B[1] C[0]
becomes
A[9] B[2.25] C[0]

Likewise with this one:
Original vote: A[9] B[6] C[5] D[3] F[0] G[1] H[1]
Subset {B,C,D}: B[9] C[6] D[0]

C remains at 2/3 of the way from D to B.

Same thing as mine, right?
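A quick way to check both worked examples (hypothetical helper, 0-9 scale):

```typescript
// Subtract the lowest remaining score, then scale the highest up to 9.
const renorm = (scores: number[], max = 9): number[] => {
  const lo = Math.min(...scores), hi = Math.max(...scores);
  return scores.map(x => max * (x - lo) / (hi - lo));
};

console.log(renorm([9, 6, 5])); // subset {A,B,C} -> [9, 2.25, 0]
console.log(renorm([6, 5, 3])); // subset {B,C,D} -> [9, 6, 0]
```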

This could very easily be illustrated by laying a rubber band next to a ruler, marking it, then stretching each end so it still has an active mark at 0 and a 9 (or a 0 and a 5, or whatever). Pretty basic stuff.

I’m not sure what you mean. I am suggesting this as something that uses 5-star ballots, and is intended to use STAR as a starting point while adding greater strategy resistance. (*) It basically takes what STAR does to modify Score and carries it to the next logical step. With 3 candidates it is identical to STAR.

In a sense it is as simple to explain as STAR. While STAR has two “rounds” and this potentially has more, each round in this is identical. “maximize each ballot, count the scores” each time, eliminating a single candidate in between.

(Forgive me for wanting to avoid a lengthy discussion of what the definition of “good representation” is; that could be a topic of its own. :slight_smile: )

Obviously, if you don’t believe that STAR is an improvement upon Score, then SuperSTAR is not going to be an improvement on STAR.

(*) Strategy resistance, to me, is less about disincentivizing voters from being strategic, and more about reducing the partisan politics that arises from the incentive to strategically nominate.

This is Baldwin, right? So SuperSTAR is just Cardinal Baldwin, or is there a difference I am missing?

Yes. Only doing it on the last 2 preserves the monotonicity of STAR and STLR. One of these methods might still be monotonic if applied to the top 3. I would not know how to go about checking this rigorously, though.

That’s the next project I have after I finish my multi-winner ternary plots: reusing all that existing code for automatic criterion checkers. If a given method does not suffer from monotonicity violations in 10 million randomly generated elections, then it’s probably safe to assume that the method in question is indeed monotonic. (Though no amount of randomly generated tests is a substitute for a mathematical proof, so any criteria chart using them as evidence that certain methods are monotonic would need asterisks for the methods where that is the only evidence, since it’s possible that some methods simply have an extremely low probability of violating monotonicity.)

I think you’re right. As I said, it seemed like someone would have documented it. In Electowiki it doesn’t have its own page (it is part of the regular Baldwin page). Thanks.

The trick is getting those randomly generated elections to be likely to trigger a failure. I’d guess that if the number of ballots is small, failures are more likely, even if the ballots are truly random. With larger numbers of voters, the conditions that would cause a failure tend to get smoothed out by the randomness.
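A sketch of what such a checker might look like under those assumptions (all names hypothetical; plain Score stands in as the method under test since it is known monotonic, and you’d swap in Cardinal Baldwin or whatever method you want to probe):

```typescript
type Method = (ballots: number[][], numCandidates: number) => number;

// Plain Score as a stand-in method under test.
const scoreWinner: Method = (ballots, n) => {
  const totals = new Array(n).fill(0);
  for (const b of ballots) b.forEach((s, c) => (totals[c] += s));
  return totals.indexOf(Math.max(...totals));
};

// Generate many small random elections; for each, raise one voter's score
// for the current winner and check the winner never changes to someone else
// (a mono-raise failure).
function findMonotonicityFailure(method: Method, trials: number): boolean {
  const MAX = 5, CANDS = 4, VOTERS = 9; // small elections make failures likelier
  for (let t = 0; t < trials; t++) {
    const ballots = Array.from({ length: VOTERS }, () =>
      Array.from({ length: CANDS }, () => Math.floor(Math.random() * (MAX + 1)))
    );
    const winner = method(ballots, CANDS);
    for (let v = 0; v < VOTERS; v++) {
      if (ballots[v][winner] >= MAX) continue;
      ballots[v][winner] += 1;               // raise the winner on one ballot
      const newWinner = method(ballots, CANDS);
      ballots[v][winner] -= 1;               // undo for the next probe
      if (newWinner !== winner) return true; // raising the winner hurt them
    }
  }
  return false;
}

console.log(findMonotonicityFailure(scoreWinner, 100000)); // expect false
```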

Yes, the formula is the same; I made a mistake and thought that 25 would not become 26.

> I am suggesting this as something that uses 5 Star ballots, and is intended as using STAR as a starting point, and adding greater strategy resistance.

If I vote like this in SuperSTAR:
A,B,C,D,E,F,G,H,I,L,M
[5,5,5,5,1,1,1,1,0,0,0]
I know that my vote will fully support my favorites (A,B,C,D). If they are all eliminated, it will then support E,F,G,H at full strength.
In STAR instead, that vote would be like this (more honest):
[5,5,4,4,2,2,1,1,0,0,0]
because if the 2 winners are B,C, then in the second round the vote becomes B[10] C[0] which is excellent for me. This difference is not achieved by voting honestly in SuperSTAR.

Conclusion: in SuperSTAR it’s probable that voters will use only the values 5, 1, and 0, giving rise to an IRV-style vote (with 3 ranks).

That would be the same under SuperSTAR.

(I think we can call it Cardinal Baldwin, since that appears to be what the one I labeled “SuperStar” is actually called. I’ve renamed it in the Codepen)

Can you explain why, if you prefer B to C, you wouldn’t want to give B a higher rating than C in Cardinal Baldwin? What is the scenario where that wouldn’t be better than just giving 5, 1 and 0?

I gave an example in which there is a very strong incentive to exaggerate in STAR: when there are 3 candidates that seem to be in the lead and about equal (from the voter’s perspective, which could be a local election where there is very little polling), the voter should probably give their favorite of those three front runners a 5 even if there are other candidates they like more, because they want to be sure to help that one reach the final round. (In Cardinal Baldwin that isn’t so true, since their lower score will tend to get amplified in normalization as soon as the others are eliminated.)

Since you haven’t given me any idea how likely any of your list of candidates is to win, or whatever information the voter has that is influencing their strategy, I’m having a hard time visualizing why a voter, under Cardinal Baldwin, wouldn’t honestly rate them. I’m not saying there isn’t such a scenario, I just would like to hear a bit of a description of what it is.

In Cardinal Baldwin I create 3 groups of preferences, G1,G2,G3 with these values:
G1[5], G2[1], G3[0]
in this way I make sure that during the elimination of the worst,

  • G1 has the maximum advantage, increasing the probability that G2 and G3 are eliminated first.
  • if G1 loses, G2 will have the greatest advantage over G3.
  • if, in G1, not all the values are at 5, then the tactic is not as effective as if the values were all 5.

If in Cardinal Baldwin I vote like this, [5,5,4,4], I’m weakening my own instant-runoff-style tactic.
In STAR there is no such tactic, so if I vote like this, [5,5,4,4], it’s not a big problem.

With three candidates, would it work the same as STAR if any of the voters don’t use the full range? Or would you normalise anyway?

Plus, with normalisation (depending on how it’s done), you give voters more ways of distinguishing between candidates, in a way that makes voting more complex.

For example, in a 0 to 5 scale, there’s no middle. I can’t give 3 candidates 0, 2.5 and 5. But if I know my vote is being normalised, I can just score them 0, 2, 4 or 1, 3, 5 etc, and my vote will be converted to fractions. Unless with the normalisation you always round to the nearest integer.

Furthermore, I sometimes wonder what this normalisation is trying to achieve (in general, not just in this method). If the ultimate aim with the final two candidates is to see who wins the head-to-head, why not just do that throughout and have a Condorcet method?

This criticism can also be aimed at STAR, but I think the extra simplicity of STAR along with the results of Jameson’s simulations means I would give it a bit more slack.

Plus, if you want a hybrid of scores/ranks, rather than starting with scores and then gradually reducing their importance throughout the process (or suddenly, in one go, with STAR’s two-round system), you could keep it consistent and use a pairwise method that takes into account both scores and wins/losses when determining which candidate wins each head-to-head, such as what I described here. The tl;dr version: in each pairwise comparison, the score of each candidate is their total score from all the ballots, plus an extra max score for each voter that puts them ahead of the other. So a voter giving a 5 and a 4 (with a max of 5) would add 10 (5+5) and 4 to the respective candidates’ totals for that particular pairwise comparison.
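If I’ve read that right, a single head-to-head under that scheme would look something like this sketch (hypothetical names, 0-5 scale):

```typescript
const MAX_SCORE = 5;

// For candidates a and b, each ballot contributes its raw score for each,
// plus an extra MAX_SCORE for whichever one it rates higher.
function pairwiseTotals(ballots: number[][], a: number, b: number): [number, number] {
  let totalA = 0, totalB = 0;
  for (const ballot of ballots) {
    totalA += ballot[a];
    totalB += ballot[b];
    if (ballot[a] > ballot[b]) totalA += MAX_SCORE;
    else if (ballot[b] > ballot[a]) totalB += MAX_SCORE;
    // Equal ratings add no bonus to either side.
  }
  return [totalA, totalB]; // the higher total wins this head-to-head
}

// e.g. a single ballot rating a=5, b=4 contributes 10 (5+5) to a and 4 to b.
```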

You can see my code at the Codepen (I would consider it quite simple and elegant, about a dozen lines of code in the main function): https://codepen.io/karmatics/pen/eYJxXge I have renamed it Cardinal Baldwin there, but if there are differences, I’d be interested in hearing about them.

But no, it doesn’t normalize on the first round; it only normalizes after eliminating each candidate. So yes, you can put one candidate in the middle (e.g. 1,3,5), although you slightly weaken your influence in the first round if you do so.

Normalizing on the first round would feel a bit like second guessing the voter, while on subsequent rounds it is just saying “we’d expect you’d want to do this, if these now-eliminated candidates weren’t in the race to begin with.”

Although a big reason for not normalizing the first round is to make it identical to STAR for 3 candidates… I was trying to keep it as similar to STAR as possible, while addressing a sensitivity to irrelevant alternatives that arises when there are more than 3 candidates.

I’ve always been a fan of Condorcet logic, which to me is the most direct way of making sure that irrelevant alternatives / clones don’t screw everything up and force people into parties to eliminate them beforehand, Duverger style. Condorcet methods are unfortunately subject to cycles that have to be resolved, which makes them seem complicated and inelegant, and presumably is why they have gotten no momentum. This avoids that entirely by using the scores, but normalized, throughout.

So I guess it could be said to be “in the Condorcet spirit” while also introducing a bit of the “difference matters” logic of Score. That is, you can express that you like a candidate much more than another in an intuitive way, and the system takes it into account while not incentivizing exaggeration to the degree of Score, if at all.

I sure hope that in the future, we can all run those simulations on any method. I’d be very interested in how Cardinal Baldwin does. Has Jameson published his code?

In my mind this method is as simple as STAR. I consider it significantly more elegant, and elegance is… kinda like simplicity? :slight_smile: It seems to use a consistent methodology throughout, rather than two opposing methodologies, one after the other.

That said, I like STAR better than most any other method except Cardinal Baldwin.