Which came first, Bayesian Regret from Warren Smith, or Yee Diagrams from Ka-Ping Yee?
I know that Ka-Ping Yee’s were posted in 2006.
“Bayesian Regret” has always been a backwards and over-complicated measure. It’s much easier to measure and talk about ‘utility’ or ‘happiness’.
KPY diagrams were a brilliant innovation in display. Still my favorite way of showing non-monotonicity and some other weird distortions.
Yee diagrams are backwards and overcomplicated as well. They’re backwards in their use of “utility loss”, which is based on an “ideal point” and thus fails IIA, rather than surplus utility. They’re overcomplicated in their use of Euclidean distance to aggregate variables that might as well be aggregated by simple multiplication or even addition. Euclidean distance is a fine measure of the relative cost in time and fuel of traveling from a geographical location with coordinates (x,y) to one with coordinates (z,w). But that is due to a peculiarity of physical space, not a general rule applicable to any combination of variables we please to display spatially.
I feel that the agreement between all of these measures (Bayesian Regret, VSE, Yee, etc.) is very compelling, especially considering we have a shortage of peer-reviewed studies.
I was asking because I’ve just completed an update on the starvoting.us/accuracy page which added many citations, and quite a bit of the history of the field comparing voting methods. The idea is that whether someone is coming at the question from an ordinal, cardinal, political, or lay perspective, they will have the tools in one place to get educated and informed with non-biased information.
If anyone has the time or inclination to read that over, and if you have additional sources, especially of the peer-reviewed ordinal focused type, please let me know.
If you have feedback or see edits that are needed let me know here as well.
The page begins with the STAR results circled and captioned “You could be here!”, despite AV’s better performance under the pragmatic assumption of strategic voting.
Later, the myth that AV and SV are center-biased is repeated, with no mention of Warren Smith’s proof that AV’s supposed bias is due to the unreasonable assumption that voters use the mean candidate as their approval threshold, the minuscule size of SV’s supposed bias, the bad code that contributes to the appearance of bias, or the controversy over whether center-bias is even a bad thing.
The strategy-resistance section is clearly biased, not only in that it ignores the strategic nomination that reduces STAR to SV, but also because it subjects both STAR and SV to SV voting strategy. Obviously, SV strategy “backfires” in a STAR election. STAR strategy, however, does not. Every system is 100% vulnerable to its own optimal strategy, in the sense that using it 100% of the time by definition maximizes expected gain. The relevant question, if strategic nomination did not reduce STAR to SV, would be, does STAR under STAR voting strategy yield better results than SV under SV voting strategy?
On another note, Warren Smith has repeatedly stated that he doesn’t add STAR to the Bayesian Regret sims because the code is public so anyone could do it. If there is someone out there interested to do that work, like @parker_friedland, or… that would be very interesting.
VSE is just a rename of Social Utility Efficiency, which is from 1977, so definitely that came first.
(And that in turn led to other measurements like Condorcet Efficiency, Mean Squared Error, and Egalitarian-Utility Efficiency, and a bunch of peer-reviewed stuff, if you use those search terms or look through papers that cite them.)
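For anyone following along, here is a minimal sketch of how Social Utility Efficiency / VSE is usually computed, assuming the commonly cited definition: the winner's total utility minus that of an average (random) candidate, divided by the same gap for the utility-maximizing candidate. The function name and the toy utilities are hypothetical and are not taken from Merrill's, Smith's, or Quinn's code.

```python
def social_utility_efficiency(elections, winners):
    """Assumed definition: (achieved - average) / (best - average),
    summed over simulated elections.

    elections: list of elections; each election is a list of voters,
               and each voter is a list of utilities, one per candidate.
    winners:   index of the winning candidate in each election.
    """
    achieved = best = average = 0.0
    for voters, winner in zip(elections, winners):
        n_cands = len(voters[0])
        # Total (social) utility of each candidate in this election.
        totals = [sum(v[c] for v in voters) for c in range(n_cands)]
        achieved += totals[winner]
        best += max(totals)
        average += sum(totals) / n_cands
    return (achieved - average) / (best - average)


# Toy election: 3 voters, 3 candidates (made-up utilities).
toy = [[1.0, 0.0, 0.5],
       [0.0, 1.0, 0.6],
       [0.2, 0.1, 0.9]]

print(social_utility_efficiency([toy], [2]))  # electing the utility maximizer
print(social_utility_efficiency([toy], [0]))  # electing a below-average candidate
```

A method that always elects the utility-maximizing candidate scores 1, one that does no better than picking at random scores about 0, and negative values are possible.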
(This software was supposed to reproduce Weber and Merrill’s results and then add STAR and other modern methods to see how they do under the same test, but I’m not good at following through on things… It probably would have a similar outcome to Quinn’s “N-dimensional ideology” model, but still good to test under the exact same conditions.)
Thanks, that’s all super helpful.
That’s what the Weber and Merrill papers do.
(“Vote-for-half” is equivalent to Approval voting, with every voter following the strategy of voting for half of the candidates. “Best Vote-for-or-against-k” is essentially Balanced Approval Voting with each voter approving or disapproving about 1/3 of the candidates. This table is just calculated from the closed-form expressions he derives, and he also gives expressions for a few other related systems, like Balanced Plurality (Vote-for-or-against-one))
Was Yee the first to visualize voting methods using computer simulations?
Depends what you mean by “visualize” I guess. Does the above plot count? Merrill plots an example scenario, too:
That’s probably Merrill 1984, too? Since “Condorcet Efficiency” means “the likelihood that a voting system will not fail to elect the Condorcet winner”.
Or maybe Fishburn 1974 “Simple Voting Systems and Majority Rule” (which I haven’t read yet).
Not exactly the same as Burlington, but similar.
(I think Condorcet actually meant this set of ballots:
18: Pierre > Jean > Jacques > Paul
5: Pierre > Jean > Paul > Jacques
16: Paul > Jean > Jacques > Pierre
3: Paul > Jean > Pierre > Jacques
13: Jacques > Jean > Paul > Pierre
5: Jacques > Jean > Pierre > Paul
While the English translation’s Table 2b lists this set:
23: Peter > John > Paul > Jack
19: Paul > John > Peter > Jack
18: Jack > John > Peter > Paul
Regardless, it works out the same way, with John as Condorcet winner despite getting zero first-preference votes:
37 John > Peter
41 John > Paul
42 John > Jack
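The pairwise counts above can be checked with a short sketch. It uses the ballot set from the English translation's Table 2b as quoted in this thread; the helper names are hypothetical.

```python
# Ballots as (number of voters, ranking from most to least preferred).
ballots = [
    (23, ["Peter", "John", "Paul", "Jack"]),
    (19, ["Paul", "John", "Peter", "Jack"]),
    (18, ["Jack", "John", "Peter", "Paul"]),
]


def pairwise_tally(ballots):
    """Return {(a, b): number of voters ranking a above b}."""
    tally = {}
    for count, ranking in ballots:
        for i, a in enumerate(ranking):
            for b in ranking[i + 1:]:
                tally[(a, b)] = tally.get((a, b), 0) + count
    return tally


def condorcet_winner(ballots):
    """Return the candidate who beats every other head-to-head, or None."""
    tally = pairwise_tally(ballots)
    candidates = ballots[0][1]
    for c in candidates:
        if all(tally.get((c, o), 0) > tally.get((o, c), 0)
               for o in candidates if o != c):
            return c
    return None


def first_preferences(ballots):
    """Count first-place votes for every candidate."""
    counts = {c: 0 for c in ballots[0][1]}
    for count, ranking in ballots:
        counts[ranking[0]] += count
    return counts


tally = pairwise_tally(ballots)
print(tally[("John", "Peter")], tally[("John", "Paul")], tally[("John", "Jack")])
print(condorcet_winner(ballots))
print(first_preferences(ballots)["John"])
```

Swapping in the six-line French ballot set gives the same three pairwise totals, since the first and second preferences are distributed identically.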
Nanson says that Condorcet is describing IRV here, but now that I read the English translation, I’m not sure if it really is or not.
Nanson definitely saw this flaw in 1864, though:
Any idea who gets credit for being the first to apply Social Utility Efficiency/Bayesian Regret/VSE to comparing voting methods? I believe that was Warren Smith?
Warren Smith was far from the first person to use Social Utility Efficiency/VSE*, but he was the first to apply it systematically, varying election methods, voter utility models, and strategy models. This makes his work a clear step forward.
*I don’t think “Bayesian Regret” is a valid term for this concept, even though it’s the term Smith used. Prior to Smith, the term meant “the (utilitarian) regret for a specifically Bayesian kind of strategy”, not “(utilitarian) regret for any strategy” as Smith seems to have meant. I can’t tell if Smith thinks that “Bayesian” is appropriate because this concept is based on a probabilistic Von-Neumann-Morgenstern paradigm, and VNM is vaguely related to Bayes; or because Bayes himself was a utilitarian; or both. But in any case, this has nothing to do with the clear modern meaning of “Bayesian” as “involving conditional distributions or expectations”; there is no conditioning or Bayes theorem in Smith’s 2000 paper.
Thanks for taking the time to look it over. Yes, the page is on the STAR Voting website and the “You Could be Here” graphic shows STAR as beating the other systems listed, particularly compared to Plurality.
The intention of the page isn’t to be agnostic. It is to be unbiased. To me that means that any conclusions drawn are based on the research and data, and expert analysis.
This page is on the Accuracy criterion, which points strongly to STAR Voting. If it were the Simplicity page it would point to Approval. (Equal Vote pillars: Accuracy, Equality, Honesty, Simplicity, Expressiveness.)
Questions on bias are under the Equality criterion and so that isn’t the main focus of this article, though the sims are a source on that, and I think it’s fascinating. Of course it’s worth noting that I know a number of people who think a polarization/center squeeze bias is a good thing. (Green Party/RCV fans.) Likewise, a number of Approval fans I know think the opposite, a Center expansion bias, is good. As I explained in the Nerds for Humanity podcast, I prefer neither.
But that is good feedback and I can add a sentence on that.
At least, hopefully, we can all agree that an “Electability” bias is not a good thing.
Re your link, I hadn’t seen that and will absolutely add it. Especially since it actually makes my point, rather than refuting it. Approval voting with “oblivious random threshold” is classified by Smith as “unbiased,” but even he doesn’t argue that that’s realistic.
If you keep reading he lists “Approval as a Pro-Centrist voting method”:
“Approval voting – assuming now that the voters use the mean candidate-utility as their “approval threshold” – does not yield the [control]. As you can see, it obliterates the win regions of many candidates – no matter where the center of the Gaussian voter distribution, those candidates can never win. It plainly seems to favor centrists over extremists in the sense that the candidates with nonempty win-regions mostly are centrally located.”
The link makes the case that by this measure the best is clearly range + top 2, with a few other methods including approval plus top 2, yielding decent but fuzzier diagrams.
I’m going to take a stab at incorporating your feedback and also being more clear about the size of the bias (smaller in the case of score/approval compared to IRV.)
It needs to be pointed out this “center” is relative to the candidates, not the voters. They are not good effects, they are flaws that can be exploited through strategic nomination, etc.
For example, Borda’s “center expansion” causes the centrist candidate to lose, in this scenario:
We really need better words than “center” or “centrist”.
"We really need better words than “center” or “centrist”.
Right! “Center of public opinion,” “center of the field of candidates,” and “electable” all suffer from quantum entanglement. And it doesn’t help that the impacts of these concepts can overlap as well.
I did another pass at http://starvoting.us/accuracy incorporating the feedback and links from both of you, Tom and Psephomancy. Thanks.
Where are you quoting from?
My understanding is that IRV wasn’t invented until 150 years ago, or 1870 if I’m remembering correctly. This sounds like Condorcet is talking about “Majority Voting” to me, as is still called for in the bylaws of the Democratic Party of Oregon. This method is basically non-instant runoff. Much like a caucus. Voters can actually change their vote throughout the process, so it’s like IRV, but with a consensus component.
STV was invented at some point between 1819 and 1857, named “Hare’s method”, then applied to single-winner elections in 1871 by Ware, though he just called it “Hare’s method”:
and didn’t claim to be doing anything new, so it’s dubious whether he actually invented something new or just made a mistake, etc. All he did was say this line:
The advantages claimed for this method of voting may, accordingly, now be counted up as follows:
11. It is equally efficient whether one candidate is to be chosen, or a dozen.
Yes, in the English translation it does sound that way. Nanson is the one that said that Condorcet was talking about Ware’s method, but maybe he was mistaken, or maybe the English translation is bad (and we know they got the table wrong, so it’s plausible they got this wrong, too):
I previously asked for translation help here: https://www.reddit.com/r/translator/comments/c3iclz/french_english_voting_math_paper_from_1804/
That conclusion is not supported by the data presented. AV has higher VSE at 100% strategic voting (the red dot). If all strategy models were equally likely, averaging them would be appropriate, but there’s no basis for that assumption. Quinn regards 1-sided strategy as unlikely and admits it’s unclear what constitutes an honest approval vote. His honest SV vote is inconsistent with his honest AV (SV 0-1) vote, and neither has theoretical or empirical support; honest voters use a common scale, which is confirmed by the observed coincidence of use of intermediate scores and failure to use MAX. From an unbiased perspective, the results are mixed at best: AV wins with frontrunner-based min-maxing; STAR wins with inconsistent, nonsensical, semi-honest strategy models.
No, but he does argue equilibrium AV, which is also perfectly unbiased (why the scare quotes?), is realistic. Mean candidate threshold is not even a good zero-information strategy; e.g. it’s beaten by crude mixtures of itself and bullet voting.
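For readers unfamiliar with the terms, here is a minimal sketch of the two zero-information approval strategies being contrasted, assuming each voter holds a numeric utility for every candidate. The function names and the sample utilities are hypothetical; this only shows what the ballots look like and makes no claim about which strategy performs better.

```python
def mean_threshold_ballot(utilities):
    """Approve every candidate whose utility exceeds the voter's mean
    candidate utility (the "mean candidate threshold" assumption)."""
    threshold = sum(utilities) / len(utilities)
    return [1 if u > threshold else 0 for u in utilities]


def bullet_ballot(utilities):
    """Approve only the single favorite candidate."""
    favorite = max(range(len(utilities)), key=lambda c: utilities[c])
    return [1 if c == favorite else 0 for c in range(len(utilities))]


# One voter's made-up utilities for four candidates (mean is 0.4).
voter = [0.9, 0.6, 0.1, 0.0]
print(mean_threshold_ballot(voter))  # approves the first two candidates
print(bullet_ballot(voter))          # approves only the first candidate
```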
A distinction is warranted, I think, between the 1-sided strategic nomination that would affect Borda under reasonable assumptions and AV under unreasonable ones, and the strategic nomination that reduces STAR to SV, which does not depend on its being 1-sided.
If you average them, STAR would win. If you pick the most likely strategy for each (honest for STAR, strategic for Approval) then STAR would win.
If your point is that we should only care about the strategy bubble for Approval, okay. That makes sense according to CES’s own analysis. CES argues that bullet voting is not going to cause Approval to reduce to plurality (obviously), and nowhere do they make the case that voters will, or should, approve only those who honestly fall above their approval threshold.
For STAR, the VSE charts show that honest voting gets the best outcomes, and the 2nd chart shows that strategic voting backfires as often as it works, thus that it is not incentivized. For this reason I expect that for STAR, the blue dot in the 1st VSE chart will be the most accurate most of the time.
If your entire case is built on the fact that voters will all be strategic in Approval, I can assure you that that’s not a talking point that will go over well. “Quinn… admits it’s unclear what constitutes an honest approval vote.” The fact is that where people put their approval threshold will depend on the voter, and the election. It’s impossible to quantify and it’s almost sure to be non-standardized. For this reason Quinn simulates Approval under 2 sets of assumptions, as does Warren Smith in your /extremism link.
The takeaway is that even experts don’t know which set of assumptions will prove accurate in the real world, and it’s going to take thousands of real elections with polling on this question to answer it definitively. Thus, for Approval I do think that it’s realistic to take an average and consider the answer fuzzy.
I think there’s a good argument that Ranked Pairs, or 3-2-1, or STAR may “win” the VSE sims, but neither Quinn nor CES is claiming that Approval does.
I quoted there because that is the exact wording that Warren Smith used.
I do think that mean candidate threshold is the best assumption that can be made without polling data, but many (not all) voters WILL absolutely approve their lesser evils if they don’t think their favorite can win. That’s going to create an electability bias (separate from the center expansion bias we are debating, but likely with overlapping outcomes).
I disagree that “strategic nomination would be a good strategy in STAR” for 4 reasons:
@Tom, you have posted on here supporting STAR many times. What changed your mind? Would you be open to a phone call to talk through this?
The 2nd chart shows what happens when you use SV strategy in a STAR election, so it’s trash. STAR incentivizes STAR strategy.
Smith simulates AV under 3 sets of assumptions, the 2 you acknowledge and the 1 he considers reasonable. Support for Quinn’s Bullety Approval consists only of its being more accurate and strategy-resistant than his Ideal Approval (mean candidate threshold), which has no empirical or theoretical support whatever.
For approval we should pretend experts simulating multiple models implies they’re sufficiently equally realistic for an average to be warranted, but for STAR we should put our thinking caps on and judge honesty likelier than strategy and STAR 0-10 a better proxy than STAR 0-2? How unbiased!
Electability can affect the outcome in AV or STAR, but it’s not a bias. In either system, being perceived as electable sometimes helps and sometimes hurts. For example, you’re assuming the dark horse is the voter’s favorite. If instead it’s his least favorite (which is, of course, at least as likely), he might have approved both frontrunners if they weren’t frontrunners. More importantly, the perception of electability is not self-reinforcing in AV like it is in Plurality. In your scenario, the dark horse’s strong showing would prove his electability, setting him up to win the next election.
Score and STAR get the same results most of the time, so the odds the automatic runoff would make a difference are slim. And yet you advocate it. Enough said.
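To make the difference concrete, here is a minimal sketch of the two tallies on the same score ballots, with a made-up example in which the runoff changes the winner. The function names and ballots are hypothetical, and giving runoff ties to the higher-scoring finalist is an assumption of this sketch rather than a statement of any official rule.

```python
def score_totals(ballots):
    """Sum the scores each candidate received."""
    n_cands = len(ballots[0])
    return [sum(b[c] for b in ballots) for c in range(n_cands)]


def score_winner(ballots):
    """Score voting: the highest total wins."""
    totals = score_totals(ballots)
    return max(range(len(totals)), key=lambda c: totals[c])


def star_winner(ballots):
    """STAR: the top two by total score go to an automatic runoff,
    won by whichever finalist more voters scored higher."""
    totals = score_totals(ballots)
    first, second = sorted(range(len(totals)),
                           key=lambda c: totals[c], reverse=True)[:2]
    prefer_first = sum(1 for b in ballots if b[first] > b[second])
    prefer_second = sum(1 for b in ballots if b[second] > b[first])
    # Assumption: a tied runoff goes to the higher-scoring finalist.
    return second if prefer_second > prefer_first else first


# Candidates 0, 1, 2 on a 0-5 scale (made-up ballots).
example = [[5, 0, 0], [5, 0, 0],             # two voters strongly for 0
           [3, 4, 0], [3, 4, 0], [3, 4, 0]]  # three voters mildly prefer 1

print(score_totals(example))  # totals per candidate
print(score_winner(example))  # candidate 0 has the highest total
print(star_winner(example))   # candidate 1 wins the runoff 3 to 2
```

How often real or simulated electorates look like this example is exactly the empirical question under dispute; the sketch only shows that the two rules can diverge.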
How is that the much more likely scenario? It makes no sense. If they’re clones, the benefit of influencing which one wins the runoff is insignificant in comparison to the benefit of maximizing their probabilities of making the runoff.
Evidently you’ve never worked on a ticket campaign. 2-candidate tickets most certainly do not run 2 full-on campaigns at once. They don’t split funding, volunteers or resources. As for the cost in mind-space, if significant (which it’s not, due to endorsements and the party heuristic), it’s a double-edged sword: if it would deter cloning (which you haven’t shown), the cloning argument would be refuted; if it wouldn’t, STAR would be worse than SV, either costing more total mind-space or reducing the mind-space available to each candidate, yielding a less informed electorate. The latter seems likelier, for 2 reasons:
Mind-space is not the prize in STAR it is in Plurality. In Plurality, only the candidate with the highest prospective rating gets a vote, so a voter knowing a little about you is worth jack. There is no risk in occupying more of his mind-space; even if he learns to hate you, you’re in no worse a position than you were. STAR is very different; learning more about a candidate is as likely to reduce a voter’s score for him as increase it.
A clone’s mind-space doesn’t necessarily come disproportionately at the expense of his clone. It’s not as if voters have a designated amount of mind-space available to each candidate type. On the contrary, if I have time to research, say, 2 candidates, and I am satisfied after researching Clone A that I can trust him with the office but somehow do not trust his vouch for Clone B, the most plausible candidate I’ll research next is of course Clone B, as he’s the candidate my guy has just been cross-promoting. Absent Clone B, I might have warmed up to Clone A’s opponent.
I wasn’t aware of that. Is it possible you’re confusing me with someone else?