Playoff Roster Streaks by University

I was watching the NBA Finals last summer and thinking about Duke. So many great players went there - Grant Hill, Jayson Tatum, Kyrie Irving, Shane Battier, etc. Given how many great players they’ve had, how often has one of their players played in the NBA Finals? It got me thinking - what school holds the record for most consecutive (W)NBA Finals where they’re represented by one of their graduates?

I’m a big fan of sports trivia, and I don’t think I’ve ever heard this one. Let’s dive into the data and find out.

The data

Wikipedia generally has this data available in a standard table in the article about each Finals series. I was prepared to write all kinds of hacky beautifulsoup code to pull this down, but now pandas has an amazing HTML table parser- all I had to do was find the index of the right table in the page ✨

Here’s a sample of the data from the first WNBA Championship in 1997. Full data (and code) for this project is in this github repo.

Name	College	Year
Guyton, Wanda	South Florida	1997
Harris, Fran	Texas	1997
Jackson, Tammy	Florida	1997
Moore, Yolanda	Mississippi	1997
Perrot, Kim	Southwestern Louisiana	1997
Swoopes, Sheryl	Texas Tech	1997
Thompson, Tina	USC	1997
Woosley, Tiffany	Tennessee	1997

Doing the math

I can’t remember the last time that the data part of a project was straightforward! I was really excited to come up with an elegant approach for the analysis. Maybe I’d use python, make some classes, attach names and years to these University objects, then sort them all into a giant list 🤔

I did a little googling, and discovered that this is sometimes referred to as a gaps and islands problem. The goal is to identify the islands of data amid the gaps. I was surprised to hear the problem had a name, and even more surprised to see an example solution using postgres.

I imported the data and adapted the sample query to my dataset. In only 12 short lines, postgres solved my problem:

select league, college, min(year) start_year, max(year) end_year, count(*) as frequency
from (
  select distinct_league_college_year.*,
    row_number() over (partition BY league, college order BY year) as seqnum
  from (
    select distinct league, college, year
    from playoff_roster
  ) distinct_league_college_year
) add_seqnum
group by league, college, year - seqnum
order by 5 desc
limit 10;

Absolutely mindblowing! Not only is there a name for this weird problem, but postgres can solve it easily. It’s awesome baby!

And the winner is . . . 🥁

I ran the NBA stats first, and the top five are pretty interesting:

College	Start Year	End Year	Streak
Texas	2011	2021	11
Arizona	2015	2024	10
Minnesota	1981	1989	9
Duke	2012	2020	9
UCLA	1982	1990	9

Texas is the winner on the men’s side, with 6 of the 11 year’s being represented by KD and / or Tristan Thompson. 11 straight years is a pretty good run! Let’s look at the women’s side:

College	Start Year	End Year	Streak
Connecticut	2004	2024	21
USC	1997	2003	7
Florida	1997	2003	7
Tennessee	2016	2021	6
Duke	2015	2019	5

W-O-W. I mean, we knew UConn could take it - but 21 years straight? Unbelivable 🤩

UConn also had players go to the finals in 1997, 1999 and 2000. Not too bad. And the list of UConn alums in the WNBA finals? Sue Bird, Maya Moore, DT and Breanna Stewart to name a few.

Lessons learned

Thanks for playing! Until next time - remember you can count on wikipedia for good data, and postgres to make analysis quick and easy 🎉

MappingVermont

Playoff Roster Streaks by University

The data

Doing the math

And the winner is . . . 🥁

Lessons learned

You might also enjoy (View all posts)