Jürgen’s Europeans: America’s coach prefers European players. Maybe he should.

Three years ago, the United States hired a German to coach its national soccer team. Jürgen Klinsmann was tasked to “advance” a squad that had not reached the semifinals of the World Cup since 1930 and had never won a cross-regional tournament. Klinsmann might yet succeed, but he won’t arrive at even his first World Cup in charge of the US without controversy. Last week, he set off a firestorm when he omitted Landon Donovan from America’s final World Cup roster.

A former captain, Donovan is the übermensch of US soccer. He has more goals and assists than anyone who has ever played for the national team, and the second-most appearances. He is also the hero of previous World Cups and the biggest star in America’s top league, MLS.

The move has been dissected at length in the blogosphere. Matt Lichtenstadter compares it to decisions Klinsmann made as Germany’s coach. Alexi Lalas chalks it up to personal preference and not talent. At Slate, Stefan Fatsis channels Freud, concluding that Klinsmann’s ego is to blame for the omission. The emerging narrative is that this roster was forged out of either idiosyncrasy or pettiness. Klinsmann himself has been vague about his rationale.

What if Landongate was driven less by personal taste than by precept? This piece considers that possibility, asking whether Klinsmann’s choices were guided by a consistent principle and whether that principle was sound. It does this through a straightforward geographical analysis.

The Klinsmann Principle?

In January, Klinsmann insisted that he wants US players to “play in the best clubs in the best leagues in the world” and that “obviously MLS is not there yet.” Is his roster consistent with this stated preference?

To explore this, I compared the final selections to a pool of players who might reasonably have made the US Men’s National Team (USMNT). The pool contains every player who has played in an international game in the past four years, according to the USNSTPA. There are 105 players in the pool and 23 on the final roster.

Over the last four years, the USMNT has been dominated by players who currently play in MLS. A full 68% of the pool currently plays in MLS, compared to 12% who play in a Top-4 European league (the top flight in England, Spain, Germany or Italy). Around 8% play elsewhere in Europe, 7% play in Mexico’s top league, and the rest play for lower-division teams or have no club at all.

The team actually going to Brazil has significantly fewer US-based players. Fewer than half of the roster currently plays in the US, while more than 36% of the team plays in a Top-4 European league. There is also a higher percentage of players from lower-ranked European leagues, and a slightly lower percentage of players from Mexico’s league. So while the clear majority of the wider USMNT pool plays in MLS, there are more European-based players on the final roster than MLS ones.

Donovan is one of 59 MLS players who helped the US qualify but did not make the team. Out of the 13 Top-4 players in the pool, more than half (7) made the final roster. Seen from this perspective, it is harder to argue that Jürgen Klinsmann hates Landon Donovan specifically. He actually seems to “hate” all MLS players!
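The "hates all MLS players" point can be made concrete with selection rates. Below is a small Python sketch that uses only counts stated above: 7 of 13 Top-4 players made the roster, 59 MLS players did not, and because fewer than half of the 23-man roster plays in the US, at most 11 MLS players can have made it. The helper name `selection_rate` is illustrative, and the MLS figure is an upper bound, not an exact rate.

```python
def selection_rate(selected: int, omitted: int) -> float:
    """Share of a player group that made the final roster."""
    return selected / (selected + omitted)


ROSTER_SIZE = 23

# Top-4 leagues: 13 players in the pool, 7 selected.
top4_rate = selection_rate(7, 13 - 7)

# Fewer than half of the 23 play in the US, so at most 11 MLS players made it.
# The rate rises with `selected`, so using 11 gives an upper bound for MLS.
max_mls_selected = (ROSTER_SIZE - 1) // 2
mls_rate_upper = selection_rate(max_mls_selected, 59)

print(f"Top-4 selection rate: {top4_rate:.0%}")              # 54%
print(f"MLS selection rate:   at most {mls_rate_upper:.0%}")  # 16%
```

Even under the most generous assumption, an MLS player in the pool was roughly a third as likely to be picked as a Top-4 player.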

Smarts or Superego?

Klinsmann’s choices are consistent with his stated preference, but can this preference be defended empirically? Are teams with more players from Top-4 leagues actually better?

To get at this, I examined FIFA data for all 32 World Cup teams. I attempted to predict each team’s points in FIFA’s ranking system using the percentage of its players who play in Top-4 leagues.
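A one-predictor model like this is an ordinary least squares line, and its R² is the share of variation in ranking points that the Top-4 percentage accounts for. The following is a minimal sketch of that calculation in plain Python, not the author's actual code; `fit_line` and the toy numbers are hypothetical stand-ins for the FIFA data.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope: ranking points per unit of Top-4 share
    a = my - b * mx                    # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - sse / sst         # R^2: share of variation explained


# Hypothetical toy data (NOT the FIFA data): one entry per team.
top4_share = [0.00, 0.10, 0.30, 0.60, 0.90]
fifa_points = [600, 750, 900, 1100, 1300]
a, b, r2 = fit_line(top4_share, fifa_points)
print(f"points = {a:.0f} + {b:.0f} * share, R^2 = {r2:.2f}")
```

An R² above 0.40 on the real data would correspond to the "more than 40%" figure reported below.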

A few caveats are necessary. For all teams, including the US, I compared provisional (30-man) rosters, not final rosters. Also, FIFA rankings have traditionally been poor indicators of World Cup success, but they are the best available continuous indicator of national team quality.

The results acquit Klinsmann a bit. Generally, teams with more players in the EPL, Serie A, La Liga and the Bundesliga also have more ranking points. The perennial soccer powers, the Germanys, Spains, Brazils and Argentinas of the world, all have more players from the best leagues. More than 40% of all variation in FIFA ranking points is explained by this single measure.

Of course, this analysis doesn't show that teams are good because they have players in better leagues, or that Klinsmann should have omitted Donovan. It does establish a relationship between the leagues that players play in and overall team quality. Klinsmann’s preference for European-based players seems to be more about geography than ego, and his geographical instincts are defensible.

The preceding is probably of little comfort to Landon Donovan or his fans. It also won’t do Klinsmann much good if his team doesn't perform next month. It should, at least, give his armchair psychiatrists some pause. PW

On Chivas and the Creative Class: Does the geography of MLS tell us about the geography of human capital?

Over at Atlantic Cities, Richard Florida has written up some of my data on sports leagues and economic structure. The average MLS fan lives in a city with a higher creative class share than the average fan of any other league, which is consistent with the story that America’s soccer boom is propelled by creatives.

As someone who studies the economy more closely than Chivas box scores, I am intrigued by how MLS might explain economic structure, just as economic structure seemingly explains MLS. Specifically: does having an MLS team predict higher human capital? The following story is preliminary, and it won’t ever end up in the NY Times Style Section, but here I am telling it anyway…

Soccer as a New Idea

In 2002, Florida and Gates famously proposed that openness to new ideas is associated with human capital and economic success. To measure openness they used demographic diversity (“tolerance”) variables, including the number of gay households and the number of artists. These are proxies for tolerance and, ultimately, for growth. It’s not that gays and artists make cities grow; rather, human capital flows, so the theory goes, to where new ideas and lifestyles are welcome.

If there is a link between tolerance and human capital, then we can imagine many ways to measure it. Maybe the presence of a soccer team in a city is one of them. Soccer is a distinctly foreign sport: it was developed and popularized abroad, and it still attracts derision in many corners of America. If openness to "new ideas" matters for human capital, then surely openness to soccer does too.

MLS does predict the Creative Class

To explore this hunch, I performed a simple study of major-league cities. I restricted the study to the 54 metros with either a major professional sports team or a NASCAR race. Roughly speaking, this ensured that every place under observation is big enough to support an MLS team.

I performed two simple linear regressions. In the first, I used the presence of an MLS team in 2013 to predict 2008 creative class levels. The results are significant and show that the presence of an MLS team is associated with a 4% higher creative class share on average. Only 15% of all variation in creative class shares is explained, but given the size of the league that is not surprising. Compared to the weighted averages previously published, these results provide stronger evidence that MLS is associated with human capital.

In the model, "MLS Dummy" equals 1 if a metro had an MLS team in 2013 and 0 otherwise.
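With a single 0/1 predictor, the OLS slope is exactly the difference between the two group means, so the MLS coefficient can be read as the gap in average creative class share between MLS and non-MLS metros. The Python sketch below checks this identity; the metro values are hypothetical, not the study's data, and `ols_slope` is an illustrative helper.

```python
def ols_slope(x, y):
    """Slope from a one-predictor ordinary least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def mean(values):
    return sum(values) / len(values)


# Hypothetical metros (NOT the study's data): 1 = has an MLS team, 0 = not.
mls_dummy = [1, 1, 0, 0, 0]
creative_share = [0.36, 0.34, 0.31, 0.30, 0.32]

slope = ols_slope(mls_dummy, creative_share)
gap = (mean([s for d, s in zip(mls_dummy, creative_share) if d == 1])
       - mean([s for d, s in zip(mls_dummy, creative_share) if d == 0]))
print(f"slope = {slope:.4f}, difference in means = {gap:.4f}")
```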

To see whether MLS adds any statistical power beyond Florida’s measure, I included the tolerance index in a second regression model. This raises the total variance explained to 21% and only marginally reduces the MLS effect. When tolerance is taken into account, the average MLS city has a 3.7% higher creative class share.
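Adding a control turns the model into a two-predictor regression; the MLS coefficient then measures the MLS gap holding tolerance fixed. Here is a plain-Python sketch that solves the two-variable normal equations on centered data. `fit_two` is a hypothetical helper, and the toy data are generated from a known rule (y = 1 + 2·d + 3·t) so the fit can be checked, not taken from the study.

```python
def fit_two(x1, x2, y):
    """OLS for y = a + b1*x1 + b2*x2; returns (a, b1, b2, r_squared)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(u * u for u in c1)
    s22 = sum(v * v for v in c2)
    s12 = sum(u * v for u, v in zip(c1, c2))
    s1y = sum(u * w for u, w in zip(c1, cy))
    s2y = sum(v * w for v, w in zip(c2, cy))
    det = s11 * s22 - s12 ** 2  # zero if the predictors are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    sse = sum((yi - (a + b1 * u + b2 * v)) ** 2 for yi, u, v in zip(y, x1, x2))
    sst = sum(w * w for w in cy)
    return a, b1, b2, 1 - sse / sst


# Toy data generated from y = 1 + 2*d + 3*t (hypothetical, for checking only).
mls = [0, 1, 0, 1, 0]
tolerance = [1, 1, 2, 3, 5]
y = [1 + 2 * d + 3 * t for d, t in zip(mls, tolerance)]
a, b_mls, b_tol, r2 = fit_two(mls, tolerance, y)
print(f"a={a:.2f}, b_mls={b_mls:.2f}, b_tol={b_tol:.2f}, R^2={r2:.2f}")
```

Comparing `b_mls` here with the slope from an MLS-only fit shows how much of the MLS effect the control absorbs; in the study it barely moved, from 4% to 3.7%.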

Interestingly, the correlation between tolerance and having an MLS team is only .132. In other words, these two measures are not measuring the same thing; MLS presence predicts human capital in its own right.
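Squaring a correlation gives the share of variance two measures have in common, so r = .132 means the MLS dummy and the tolerance index overlap by under 2%. The sketch below (plain Python, illustrative helper name `pearson`) shows the calculation; with a 0/1 variable, the Pearson coefficient is the point-biserial correlation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; with a 0/1 x this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = 0.132  # reported correlation between the MLS dummy and the tolerance index
print(f"shared variance: {r ** 2:.1%}")  # 1.7%
```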

A Caveat: MLS as an Indicator of Human Capital

Here we see evidence that MLS cities are significantly more creative. The next question is “Why?” Florida’s openness-to-ideas thesis suggests one mechanism through which MLS can explain creative class share: soccer is a cosmopolitan sport, and more open-minded places may embrace both it and human capital. These results are consistent with that story, but they do not (and cannot) show that MLS cities have higher human capital levels because they are more tolerant.

The presence of a team from a new league in a place might merely reflect its economic vitality, not its tolerance per se. When MLS owners establish franchises, they are almost surely interested in places that demand soccer, but they also want places with strong economies generally. Economic conditions, not tolerance, probably explain why MLS isn’t rushing into Detroit or Cleveland and why San Jose was a charter team.

I hope subsequent work can parse these possibilities. I doubly hope that we can be even more creative in how we measure regional openness. PW

 

Footnote on Robustness: In separate models, I tried to control for the percentage of immigrants and the percentage of people under forty. In the first case, the variable was dropped due to multicollinearity. In the second, the variable was not significant.

 

The RPides of March: Does RPI predict NCAA inclusion and seeding?

If the middle to end of March is "March Madness", then the beginning of the month could be called "Amateur Stats Madness". These days, the discussion in college basketball is less about performance on the court, or trumped-up controversies on and off it, and more about the RPI (Rating Percentage Index).

There is a strong sense that the RPI is a crucial metric that can predict tournament inclusion and seeding. But even if the RPI had no effect, journalists would have reason to act as if it did. Polls change weekly and the standings change two or three times a week, but the RPI moves every day there is a Division 1 game. That means Joe Lunardi gets to publish a new bracketology chart every day, and his on-camera pals get to do segment after segment about who is in, who is out, and who is on "the bubble". We shouldn't assume that the RPI is important simply because we hear a lot about it.
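Why does the RPI move every day? Because it depends not only on a team's own record but on its opponents' records and its opponents' opponents' records, so games a team never plays still shift its number. The sketch below uses the commonly published weights (25% winning percentage, 50% opponents' winning percentage, 25% opponents' opponents' winning percentage). It is a simplification, not the NCAA's exact formula: it ignores the home/road weighting used in men's basketball, and it assumes a team with no countable games has a winning percentage of 0. The function name `rpi` is illustrative.

```python
def rpi(games, team):
    """Simplified RPI: 0.25*WP + 0.50*OWP + 0.25*OOWP.

    games: list of (winner, loser) pairs. Home and road games count equally
    here, unlike the NCAA's men's basketball formula.
    """

    def win_pct(t, exclude=None):
        # Winning percentage of t, skipping any game that involves `exclude`.
        wins = losses = 0
        for w, l in games:
            if exclude is not None and exclude in (w, l):
                continue
            if w == t:
                wins += 1
            elif l == t:
                losses += 1
        return wins / (wins + losses) if wins + losses else 0.0

    def opponents(t):
        # One entry per game, so an opponent met twice counts twice.
        return [l if w == t else w for w, l in games if t in (w, l)]

    def owp(t):
        # Opponents' winning percentage, ignoring their games against t.
        opps = opponents(t)
        return sum(win_pct(o, exclude=t) for o in opps) / len(opps) if opps else 0.0

    opps = opponents(team)
    oowp = sum(owp(o) for o in opps) / len(opps) if opps else 0.0
    return 0.25 * win_pct(team) + 0.50 * owp(team) + 0.25 * oowp


# Team A never plays C or D, yet the C-D result moves A's RPI.
season = [("A", "B"), ("B", "C")]
print(rpi(season, "A"))                 # 0.75
print(rpi(season + [("C", "D")], "A"))  # 0.875
```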

To get a sense of how relevant the index is to what happens later in March, I looked at RPI in March 2013 as a predictor of at-large bids and of seeds.

Getting In

First, I compared the 2013 at-large field to the field that would have existed had RPI been a perfect predictor of who made it. This was an exercise in arithmetic and nothing more.
 

In 2013 there were 37 at-large teams in total. If none of the top 37 RPI teams had received automatic bids, and if RPI were a perfect predictor of getting a bid, then we would expect the team with the 37th-best RPI (Illinois last year) to get a bid, and the 38th RPI team (NC State) to get an NIT bid.

Of course, the cutoff isn’t actually 37, because plenty of automatic bids go to top RPI teams. Last year 11 teams (including our Lobos) had top-37 RPIs and got automatic bids. Assuming the number of at-large bids is fixed, this raises the cutoff by 11, to 48.

But last year that was STILL too low a cutoff, because two more automatic qualifiers (Ole Miss and Akron) had RPI ranks between 38 and 48. And even that was too low, because UConn (RPI 48) was ineligible for the tournament. Ultimately, all of the top 51 RPI teams would have made it in if RPI were a perfect predictor. So how did RPI do?
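The cutoff arithmetic above is a walk down the RPI list: every team that already holds an automatic bid, or is ineligible, is skipped without using an at-large slot, so each one pushes the cutoff one place deeper. A Python sketch follows. The post gives counts rather than exact ranks for the automatic qualifiers, so the ranks used below (1 to 11 for the top-37 auto bids, and 40 and 45 for Ole Miss and Akron) are hypothetical placeholders; only their counts and ranges matter for the result. `rpi_cutoff` is an illustrative name.

```python
def rpi_cutoff(at_large_bids, auto_bid_ranks, ineligible_ranks):
    """Deepest RPI rank needed to fill every at-large slot in pure RPI order.

    Teams with automatic bids or that are ineligible don't use an at-large
    slot, so each one in the way pushes the cutoff one place deeper.
    """
    skip = set(auto_bid_ranks) | set(ineligible_ranks)
    filled = rank = 0
    while filled < at_large_bids:
        rank += 1
        if rank not in skip:
            filled += 1
    return rank


# 2013 counts from the post; the auto-bid ranks are placeholder assumptions.
auto_top37 = list(range(1, 12))  # 11 automatic bids inside the top 37
auto_38_to_48 = [40, 45]         # Ole Miss and Akron (actual ranks not given)
ineligible = [48]                # UConn
print(rpi_cutoff(37, auto_top37 + auto_38_to_48, ineligible))  # 51
```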

It wasn’t perfect… but it almost was. 97% of at-large bids (36 of 37) went to teams with top-51 RPIs.

There was exactly one team with a top-51 RPI that was not invited to the tournament: Southern Miss (RPI 31). If this seems bizarre, you can read this writer try to explain it away. Personally, I don’t discount the role of schedule strength (net of RPI) and, ultimately, conference affiliation in explaining why a Conference USA school would be left out. Which team from outside the top 51 got an at-large bid? That would be Villanova (RPI 54), a blueblood from an elite conference.

In the end, one team’s very solid RPI didn’t help it make the tournament, while another made it with a slightly too-low rank. RPI could not have been the only thing that earned 'Nova a bid, but its RPI was very close to the adjusted cutoff.

Getting a Seed

Maybe RPI is just a filter for getting in, and all sorts of other factors decide seeding. Do teams with the best RPIs get the best overall ranks from the committee, and ultimately the best seeds? To look at this, I created a very simple regression model in which the natural log of a team’s overall NCAA rank is predicted by the natural log of its RPI rank. The results are quite supportive of the RPI.

Across tournament teams, a full 60% of the variation in (logged) 2013 NCAA rank can be explained by (logged) end-of-season RPI rank. On average, a 4-rank increase in a team’s RPI rank is associated with a 1-rank increase in tournament rank. (The results are significant at the .01 level.)
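In a log-log model like this one, the slope is an elasticity: a 1% rise in RPI rank goes with roughly a b% rise in predicted NCAA rank, and R² is measured on the logged scale. Here is a minimal plain-Python sketch of the fit. `fit_loglog` is an illustrative name, and the check data follow an exact power law (rank = 3·x^0.5) chosen for testing, not the 2013 ranks.

```python
from math import log


def fit_loglog(x, y):
    """OLS of ln(y) on ln(x); returns (intercept, elasticity, r_squared)."""
    lx = [log(v) for v in x]
    ly = [log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((v - (a + b * u)) ** 2 for u, v in zip(lx, ly))
    sst = sum((v - my) ** 2 for v in ly)
    return a, b, 1 - sse / sst


# Hypothetical ranks following an exact power law, for checking the fit.
rpi_rank = [1, 2, 4, 8, 16]
ncaa_rank = [3 * r ** 0.5 for r in rpi_rank]
a, b, r2 = fit_loglog(rpi_rank, ncaa_rank)
print(f"ln(NCAA rank) = {a:.3f} + {b:.3f} * ln(RPI rank), R^2 = {r2:.2f}")
```

Because the relationship is proportional, how many NCAA ranks a given change in RPI rank corresponds to depends on where on the scale a team starts.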

When I run the model for at-large teams only, it does even better: 65% of the variation in overall rank can be explained. This suggests that RPI is probably an important tool in seeding, alongside factors like minimizing conference matchups, limiting travel for top teams, and who knows what else. Below is a graph showing that tournament rank (logged) varies closely with RPI rank (also logged).

[Figure: tournament rank (logged) vs. RPI rank (logged). Patrick Adler, 2014. Note: logs are taken because the relationship is not perfectly linear.]

Together, these results validate the RPI as a barometer for the tournament. Last year, at least, RPI was a really good predictor of who made the tournament and who didn't. RPI seems so powerful that these obsessive exercises in bracket soothsaying are probably over the top.

These numbers do not demonstrate that RPI is a better predictor than the other indices out there, which are surely highly correlated with it. I'm sure ESPN could make a similar case for its own BPI. And of course, they do not demonstrate that RPI is a good indicator of team strength or a good forecast of success once the madness begins.

 

 

An early version of this post appeared on TheLoboLair.com. Thanks to everyone there for their comments.