MappingVermont bio photo

MappingVermont

Github

Playoff Roster Streaks by University

I was watching the NBA Finals last summer and thinking about Duke. So many great players went there - Grant Hill, Jayson Tatum, Kyrie Irving, Shane Battier, etc. Given how many great players they’ve had, how often has one of their players played in the NBA Finals? It got me thinking - what school holds the record for most consecutive (W)NBA Finals where they’re represented by one of their graduates?

I’m a big fan of sports trivia, and I don’t think I’ve ever heard this one. Let’s dive into the data and find out.

The data

Wikipedia generally has this data available in a standard table in the article about each Finals series. I was prepared to write all kinds of hacky beautifulsoup code to pull this down, but now pandas has an amazing HTML table parser- all I had to do was find the index of the right table in the page ✨

Here’s a sample of the data from the first WNBA Championship in 1997. Full data (and code) for this project is in this github repo.

Name College Year
Guyton, Wanda South Florida 1997
Harris, Fran Texas 1997
Jackson, Tammy Florida 1997
Moore, Yolanda Mississippi 1997
Perrot, Kim Southwestern Louisiana 1997
Swoopes, Sheryl Texas Tech 1997
Thompson, Tina USC 1997
Woosley, Tiffany Tennessee 1997

Doing the math

I can’t remember the last time that the data part of a project was straightforward! I was really excited to come up with an elegant approach for the analysis. Maybe I’d use python, make some classes, attach names and years to these University objects, then sort them all into a giant list 🤔

I did a little googling, and discovered that this is sometimes referred to as a gaps and islands problem. The goal is to identify the islands of data amid the gaps. I was surprised to hear the problem had a name, and even more surprised to see an example solution using postgres.

I imported the data and adapted the sample query to my dataset. In only 12 short lines, postgres solved my problem:

select league, college, min(year) start_year, max(year) end_year, count(*) as frequency
from (
  select distinct_league_college_year.*,
    row_number() over (partition BY league, college order BY year) as seqnum
  from (
    select distinct league, college, year
    from playoff_roster
  ) distinct_league_college_year
) add_seqnum
group by league, college, year - seqnum
order by 5 desc
limit 10;

Absolutely mindblowing! Not only is there a name for this weird problem, but postgres can solve it easily. It’s awesome baby!

And the winner is . . . 🥁

I ran the NBA stats first, and the top five are pretty interesting:

College Start Year End Year Streak
Texas 2011 2021 11
Arizona 2015 2024 10
Minnesota 1981 1989 9
Duke 2012 2020 9
UCLA 1982 1990 9

Texas is the winner on the men’s side, with 6 of the 11 year’s being represented by KD and / or Tristan Thompson. 11 straight years is a pretty good run! Let’s look at the women’s side:

College Start Year End Year Streak
Connecticut 2004 2024 21
USC 1997 2003 7
Florida 1997 2003 7
Tennessee 2016 2021 6
Duke 2015 2019 5

W-O-W. I mean, we knew UConn could take it - but 21 years straight? Unbelivable 🤩

UConn also had players go to the finals in 1997, 1999 and 2000. Not too bad. And the list of UConn alums in the WNBA finals? Sue Bird, Maya Moore, DT and Breanna Stewart to name a few.

Lessons learned

Thanks for playing! Until next time - remember you can count on wikipedia for good data, and postgres to make analysis quick and easy 🎉