Binary file added .DS_Store
Binary file not shown.
Binary file added Ideas/.DS_Store
Binary file not shown.
Empty file added Ideas/.Rhistory
Empty file.
53 changes: 53 additions & 0 deletions Ideas/Idea 2 NBA Player Props Proposal 1.29.25 PM.md
@@ -0,0 +1,53 @@
# Luke Finkielstein Mini Project Idea Proposal 2:
# NBA Player Props Prediction

Research Question: Can we predict NBA player point totals with reasonable accuracy using historical performance, matchup data, and contextual factors?

Approach: Develop a regression model to predict individual player point totals in NBA games using historical player statistics and game context. The model will help identify player prop betting opportunities.

## Gathering Tractable Data

**Target:** Individual player point totals per game

**Key Features:**
- Player historical scoring average (season, last 10 games, vs. specific opponents)
- Opponent defensive rating (points allowed per 100 possessions)
- Game context (home/away, back-to-back games, rest days)
- Player usage rate and minutes played trends
- Opponent pace of play
- Player injury/availability status
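
A few of the features above (season average, recent form, rest days, back-to-backs) can be derived directly from a player's game log. The following is a minimal sketch, not the proposal's actual pipeline: the function name `build_features`, the dictionary keys, and the sample log are all hypothetical, and the numbers are made up for illustration. Each row uses only games played before the one being predicted, so the target does not leak into its own features.

```python
from datetime import date
from statistics import mean


def build_features(games, window=10):
    """Build one feature row per game from a single player's game log.

    `games` is a list of dicts with keys 'date' (datetime.date),
    'points' (int), and 'home' (bool), sorted by date (assumed layout).
    """
    rows = []
    for i, game in enumerate(games):
        prior = games[:i]  # strictly earlier games only, to avoid leakage
        if not prior:
            continue  # the first game has no history to build features from
        recent = prior[-window:]
        # Full days off between the previous game and this one
        rest_days = (game["date"] - prior[-1]["date"]).days - 1
        rows.append({
            "date": game["date"],
            "season_avg": mean(g["points"] for g in prior),
            "recent_avg": mean(g["points"] for g in recent),
            "rest_days": rest_days,
            "back_to_back": rest_days == 0,
            "home": game["home"],
            "target_points": game["points"],
        })
    return rows


# Hypothetical game log (made-up numbers, not real player data)
log = [
    {"date": date(2025, 1, 1), "points": 20, "home": True},
    {"date": date(2025, 1, 2), "points": 30, "home": False},
    {"date": date(2025, 1, 5), "points": 25, "home": True},
]
features = build_features(log, window=2)
```

Opponent-level features (defensive rating, pace) and injury status would need to be joined in from separate sources and are left out of this sketch.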

**Data sources:** ESPN, Basketball-reference.com, NBA.com, Kaggle datasets with historical player-game logs. Feasibility is high—player statistics are publicly available and well-documented.

## Retrieval & Preparation

Two approaches:
- Use existing player-game log datasets (faster, reduces timeline)
- Web scrape player statistics from ESPN/Basketball-reference (more control, more time-intensive)

## EDA & Insights

Analyze how player scoring varies by opponent strength, rest status, and usage rate. Identify key predictive features (recent form, matchup difficulty). Calculate correlations between candidate features and player point totals. Visualizations will include scatter plots of usage rate vs. scoring, distribution plots of scoring by opponent defense rating, and time-series analysis of individual player trends. Compare model performance against a simple baseline (e.g., using season average) to ensure meaningful predictive value.
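
The baseline comparison could look roughly like the sketch below. It assumes mean absolute error as the metric (the proposal does not name one) and uses made-up point totals; `model_preds` is a hypothetical stand-in for the output of the regression model, not a real result.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted point totals."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def season_average_baseline(points):
    """Predict each game as the mean of all earlier games.

    The first game has no history, so predictions start at the second game.
    """
    return [mean(points[:i]) for i in range(1, len(points))]


# Hypothetical point totals for one player (illustrative only)
points = [20, 30, 25, 15]
actual = points[1:]

baseline_preds = season_average_baseline(points)
baseline_mae = mean_absolute_error(actual, baseline_preds)

# Hypothetical model predictions for the same three games
model_preds = [27, 24, 19]
model_mae = mean_absolute_error(actual, model_preds)

print(f"baseline MAE: {baseline_mae:.2f}, model MAE: {model_mae:.2f}")
```

The model only adds value if its error is clearly lower than the baseline's on games it was not trained on.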

## Potential Limitations

- **Game-to-game variance:** Player performance is highly variable; some games are outliers due to hot/cold shooting.
- **Injury uncertainty:** Last-minute injuries or load management decisions affect playing time unpredictably.
- **Small sample sizes:** Some matchups may have limited historical data.
- **Model assumes consistency:** Player form and roles can change mid-season due to trades or coaching changes.

## Implications for Stakeholders

**Sports Bettors:** Help identify player prop bets with positive expected value.
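
As a rough illustration of what "positive expected value" means here, the sketch below computes the expected profit of a one-unit bet from a model's estimated win probability and the sportsbook's American odds. The function names are hypothetical, the probability and odds are made-up inputs, and pushes (a player landing exactly on the line) are ignored for simplicity.

```python
def american_to_decimal(odds):
    """Convert American odds (e.g. -110 or +150) to decimal odds.

    Decimal odds are the total payout per unit staked, stake included.
    """
    if odds > 0:
        return 1 + odds / 100
    return 1 + 100 / abs(odds)


def expected_value(win_prob, odds, stake=1.0):
    """Expected profit of a bet: win_prob * profit - (1 - win_prob) * stake."""
    profit_if_win = stake * (american_to_decimal(odds) - 1)
    return win_prob * profit_if_win - (1 - win_prob) * stake


# Hypothetical example: the model gives the over a 55% chance at -110 odds
ev = expected_value(0.55, -110)
print(f"expected profit per unit staked: {ev:.3f}")
```

A bet is only worth flagging when this value is positive, and even then the estimate is only as good as the model's probability.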

**Sportsbooks:** Understand what drives player prop lines and refine odds-setting.

**NBA Analysts:** Identify which factors most influence individual player performance.

## Responsible Deployment & Ethics

**Concerns:** Model could encourage problem gambling; predictions are probabilistic.

**Legal:** Gambling laws vary by state. This model is for analysis only, not financial advice.

**Mitigation:** Include gambling risk disclaimers; frame as educational/analytical exercise.
42 changes: 42 additions & 0 deletions Ideas/Idea 2 NBA Player Props Proposal 1.29.25 PM.md.html
@@ -0,0 +1,42 @@
<!DOCTYPE html><html><head><meta charset="utf-8"><title>NBA Player Props Prediction.md</title><style></style></head><body id="preview">
<h1 class="code-line" data-line-start=0 data-line-end=1 ><a id="Luke_Finkielstein_Mini_Project_Idea_Proposal_2_0"></a>Luke Finkielstein Mini Project Idea Proposal 2:</h1>
<h1 class="code-line" data-line-start=1 data-line-end=2 ><a id="NBA_Player_Props_Prediction_1"></a>NBA Player Props Prediction</h1>
<p class="has-line-data" data-line-start="3" data-line-end="4">Research Question: Can we predict NBA player point totals with reasonable accuracy using historical performance, matchup data, and contextual factors?</p>
<p class="has-line-data" data-line-start="5" data-line-end="6">Approach: Develop a regression model to predict individual player point totals in NBA games using historical player statistics and game context. The model will help identify player prop betting opportunities.</p>
<h2 class="code-line" data-line-start=7 data-line-end=8 ><a id="Gathering_Tractable_Data_7"></a>Gathering Tractable Data</h2>
<p class="has-line-data" data-line-start="9" data-line-end="10"><strong>Target:</strong> Individual player point totals per game</p>
<p class="has-line-data" data-line-start="11" data-line-end="12"><strong>Key Features:</strong></p>
<ul>
<li class="has-line-data" data-line-start="12" data-line-end="13">Player historical scoring average (season, last 10 games, vs. specific opponents)</li>
<li class="has-line-data" data-line-start="13" data-line-end="14">Opponent defensive rating (points allowed per 100 possessions)</li>
<li class="has-line-data" data-line-start="14" data-line-end="15">Game context (home/away, back-to-back games, rest days)</li>
<li class="has-line-data" data-line-start="15" data-line-end="16">Player usage rate and minutes played trends</li>
<li class="has-line-data" data-line-start="16" data-line-end="17">Opponent pace of play</li>
<li class="has-line-data" data-line-start="17" data-line-end="19">Player injury/availability status</li>
</ul>
<p class="has-line-data" data-line-start="19" data-line-end="20"><strong>Data sources:</strong> ESPN, <a href="http://Basketball-reference.com">Basketball-reference.com</a>, <a href="http://NBA.com">NBA.com</a>, Kaggle datasets with historical player-game logs. Feasibility is high—player statistics are publicly available and well-documented.</p>
<h2 class="code-line" data-line-start=21 data-line-end=22 ><a id="Retrieval__Preparation_21"></a>Retrieval &amp; Preparation</h2>
<p class="has-line-data" data-line-start="23" data-line-end="24">Two approaches:</p>
<ul>
<li class="has-line-data" data-line-start="24" data-line-end="25">Use existing player-game log datasets (faster, reduces timeline)</li>
<li class="has-line-data" data-line-start="25" data-line-end="27">Web scrape player statistics from ESPN/Basketball-reference (more control, more time-intensive)</li>
</ul>
<h2 class="code-line" data-line-start=27 data-line-end=28 ><a id="EDA__Insights_27"></a>EDA &amp; Insights</h2>
<p class="has-line-data" data-line-start="29" data-line-end="30">Analyze how player scoring varies by opponent strength, rest status, and usage rate. Identify key predictive features (recent form, matchup difficulty). Calculate correlations between candidate features and player point totals. Visualizations will include scatter plots of usage rate vs. scoring, distribution plots of scoring by opponent defense rating, and time-series analysis of individual player trends. Compare model performance against a simple baseline (e.g., using season average) to ensure meaningful predictive value.</p>
<h2 class="code-line" data-line-start=31 data-line-end=32 ><a id="Potential_Limitations_31"></a>Potential Limitations</h2>
<ul>
<li class="has-line-data" data-line-start="33" data-line-end="34"><strong>Game-to-game variance:</strong> Player performance is highly variable; some games are outliers due to hot/cold shooting.</li>
<li class="has-line-data" data-line-start="34" data-line-end="35"><strong>Injury uncertainty:</strong> Last-minute injuries or load management decisions affect playing time unpredictably.</li>
<li class="has-line-data" data-line-start="35" data-line-end="36"><strong>Small sample sizes:</strong> Some matchups may have limited historical data.</li>
<li class="has-line-data" data-line-start="36" data-line-end="38"><strong>Model assumes consistency:</strong> Player form and roles can change mid-season due to trades or coaching changes.</li>
</ul>
<h2 class="code-line" data-line-start=38 data-line-end=39 ><a id="Implications_for_Stakeholders_38"></a>Implications for Stakeholders</h2>
<p class="has-line-data" data-line-start="40" data-line-end="41"><strong>Sports Bettors:</strong> Help identify player prop bets with positive expected value.</p>
<p class="has-line-data" data-line-start="42" data-line-end="43"><strong>Sportsbooks:</strong> Understand what drives player prop lines and refine odds-setting.</p>
<p class="has-line-data" data-line-start="44" data-line-end="45"><strong>NBA Analysts:</strong> Identify which factors most influence individual player performance.</p>
<h2 class="code-line" data-line-start=46 data-line-end=47 ><a id="Responsible_Deployment__Ethics_46"></a>Responsible Deployment &amp; Ethics</h2>
<p class="has-line-data" data-line-start="48" data-line-end="49"><strong>Concerns:</strong> Model could encourage problem gambling; predictions are probabilistic.</p>
<p class="has-line-data" data-line-start="50" data-line-end="51"><strong>Legal:</strong> Gambling laws vary by state. This model is for analysis only, not financial advice.</p>
<p class="has-line-data" data-line-start="52" data-line-end="53"><strong>Mitigation:</strong> Include gambling risk disclaimers; frame as educational/analytical exercise.</p>

</body></html>
64 changes: 64 additions & 0 deletions Ideas/NBA_Game_Prediction_Proposal.md
@@ -0,0 +1,64 @@
# Luke Finkielstein Mini Project Idea Proposal:
# NBA Game & Stat Prediction


Research Question: How can we predict the outcome of NBA games with better than 60% accuracy?

Approach: Develop a classification model (probably logistic regression) to predict NBA game outcomes using historical game statistics and betting odds. The model will identify games with favorable edges to inform predictions.


## Gathering Tractable Data

**Target:** Game outcomes (win/loss)

**Key Features:**
- Team performance metrics (record, scoring, defense)
- Player availability/injury status
- Opponent strength ranking
- Game context (home/away, back-to-back games)
- Betting odds from sportsbooks (BetMGM, DraftKings, FanDuel, etc.)

**Data sources:** ESPN, NBA.com, Basketball-reference.com, and Kaggle all have pre-compiled datasets of real games going back 20+ years. Additional data can be scraped from these websites if necessary. Feasibility is high: game data and odds are publicly available.

## Retrieval & Preparation

Two viable approaches:

- Use existing public dataset (faster, reduces timeline overhead)
- Web scrape/API calls for game stats and odds (more control, more time-intensive)


## EDA & Insights

Analyze how outcomes vary with team strength, matchups, injuries, and game context. Identify predictive features (home-court advantage, efficiency metrics) by calculating correlations between candidate features and game outcomes to determine which carry the strongest predictive signal. Visualizations will include scatter plots of team efficiency metrics, heatmaps of feature correlations, and distribution plots comparing home vs. away performance. Compare model performance against a simple baseline (such as always predicting the team with the better record) to ensure the model adds meaningful value.
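
A simple baseline of this kind could be scored as in the sketch below. It assumes the baseline always picks the team with the better win percentage (ties go to the home team), and the four games are made up for illustration; the field names `home_win_pct`, `away_win_pct`, and `home_won` are hypothetical.

```python
def baseline_picks_home(game):
    """Return True if the baseline picks the home team.

    Assumed rule: pick the team with the better win percentage,
    with ties going to the home team.
    """
    return game["home_win_pct"] >= game["away_win_pct"]


def accuracy(predictions, outcomes):
    """Fraction of games where the prediction matched the outcome."""
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)


# Hypothetical games (illustrative values, not real results)
games = [
    {"home_win_pct": 0.60, "away_win_pct": 0.40, "home_won": True},
    {"home_win_pct": 0.45, "away_win_pct": 0.55, "home_won": True},
    {"home_win_pct": 0.50, "away_win_pct": 0.70, "home_won": False},
    {"home_win_pct": 0.65, "away_win_pct": 0.30, "home_won": True},
]

picks = [baseline_picks_home(g) for g in games]
outcomes = [g["home_won"] for g in games]
baseline_accuracy = accuracy(picks, outcomes)
print(f"baseline accuracy: {baseline_accuracy:.2f}")
```

The classification model's accuracy on the same held-out games is the number to compare against this one.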

## Potential Limitations

- **Unpredictable events:** Model cannot account for unexpected injuries, trades, coaching changes, or rest decisions made close to game time.
- **Probabilistic predictions:** Accuracy >50% does not guarantee profit; individual game predictions are probabilistic and subject to variance.
- **Data limitations:** Historical data may not fully capture changes in league dynamics, rule changes, or roster composition over 20+ years.
- **Sample size:** Model performance is limited by the number of games available for training and testing.
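
To make the point about accuracy and profit concrete: the win rate needed to break even depends on the odds. The sketch below computes it from American odds; the function name is hypothetical, and -110 is used as a commonly quoted price rather than a figure taken from this proposal's data.

```python
def break_even_win_rate(odds):
    """Win rate at which a bettor neither gains nor loses at the given American odds."""
    if odds > 0:
        # Underdog price: risk 100 to win `odds`
        return 100 / (odds + 100)
    # Favorite price: risk |odds| to win 100
    return abs(odds) / (abs(odds) + 100)


for price in (-110, 100, 150):
    print(f"{price:+d}: break-even at {break_even_win_rate(price):.1%}")
```

At -110, the break-even rate is 110/210, a little over 52%, so a model that is right slightly more than half the time can still lose money.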

## Implications for Stakeholders

**Sports Bettors/Fans:** Help make better-informed predictions of game winners, which could improve profitability.

**Sportsbooks:** Understand what drives betting patterns and refine odds-setting.

**NBA Teams:** Identify which factors most affect a team's ability to win, which is of interest to coaches, players, and owners.

## Responsible Deployment & Ethics

**Concerns:** Model could encourage problem gambling; predictions are probabilistic and not deterministic.

**Legal:** Gambling laws vary by state (sports betting is legal in PA, both online and in person). This model would be for analysis only, not financial advice.

**Mitigation:** Include gambling risk disclaimers; frame the project as a purely academic exercise.

## Timeline

- Weeks 1-2: Data collection/preparation
- Week 3: EDA and feature engineering
- Weeks 4-5: Model development and evaluation

**Deliverable:** Trained model with accuracy metrics and feature importance analysis.
48 changes: 48 additions & 0 deletions Ideas/NBA_Game_Prediction_Proposal.md.html
@@ -0,0 +1,48 @@
<!DOCTYPE html><html><head><meta charset="utf-8"><title>NBA Game Prediction Proposal.md</title><style></style></head><body id="preview">
<h1 class="code-line" data-line-start=0 data-line-end=1 ><a id="Luke_Finkielstein_Mini_Project_Idea_Proposal_0"></a>Luke Finkielstein Mini Project Idea Proposal:</h1>
<h1 class="code-line" data-line-start=1 data-line-end=2 ><a id="NBA_Game__Stat_Prediction_1"></a>NBA Game &amp; Stat Prediction</h1>
<p class="has-line-data" data-line-start="4" data-line-end="5">Research Question: How can we predict the outcome of NBA games with better than 60% accuracy?</p>
<p class="has-line-data" data-line-start="6" data-line-end="7">Approach: Develop a classification model (probably logistic regression) to predict NBA game outcomes using historical game statistics and betting odds. The model will identify games with favorable edges to inform predictions.</p>
<h2 class="code-line" data-line-start=9 data-line-end=10 ><a id="Gathering_Tractable_Data_9"></a>Gathering Tractable Data</h2>
<p class="has-line-data" data-line-start="11" data-line-end="12"><strong>Target:</strong> Game outcomes (win/loss)</p>
<p class="has-line-data" data-line-start="13" data-line-end="14"><strong>Key Features:</strong></p>
<ul>
<li class="has-line-data" data-line-start="14" data-line-end="15">Team performance metrics (record, scoring, defense)</li>
<li class="has-line-data" data-line-start="15" data-line-end="16">Player availability/injury status</li>
<li class="has-line-data" data-line-start="16" data-line-end="17">Opponent strength ranking</li>
<li class="has-line-data" data-line-start="17" data-line-end="18">Game context (home/away, back-to-back games)</li>
<li class="has-line-data" data-line-start="18" data-line-end="20">Betting odds from sportsbooks (BetMGM, DraftKings, FanDuel, etc.)</li>
</ul>
<p class="has-line-data" data-line-start="20" data-line-end="21"><strong>Data sources:</strong> ESPN, <a href="http://NBA.com">NBA.com</a>, <a href="http://Basketball-reference.com">Basketball-reference.com</a>, and Kaggle all have pre-compiled datasets of real games going back 20+ years. Additional data can be scraped from these websites if necessary. Feasibility is high: game data and odds are publicly available.</p>
<h2 class="code-line" data-line-start=22 data-line-end=23 ><a id="Retrieval__Preparation_22"></a>Retrieval &amp; Preparation</h2>
<p class="has-line-data" data-line-start="24" data-line-end="25">Two viable approaches:</p>
<ul>
<li class="has-line-data" data-line-start="26" data-line-end="27">Use existing public dataset (faster, reduces timeline overhead)</li>
<li class="has-line-data" data-line-start="27" data-line-end="28">Web scrape/API calls for game stats and odds (more control, more time-intensive)</li>
</ul>
<h2 class="code-line" data-line-start=30 data-line-end=31 ><a id="EDA__Insights_30"></a>EDA &amp; Insights</h2>
<p class="has-line-data" data-line-start="32" data-line-end="33">Analyze how outcomes vary with team strength, matchups, injuries, and game context. Identify predictive features (home-court advantage, efficiency metrics) by calculating correlations between candidate features and game outcomes to determine which carry the strongest predictive signal. Visualizations will include scatter plots of team efficiency metrics, heatmaps of feature correlations, and distribution plots comparing home vs. away performance. Compare model performance against a simple baseline (such as always predicting the team with the better record) to ensure the model adds meaningful value.</p>
<h2 class="code-line" data-line-start=34 data-line-end=35 ><a id="Potential_Limitations_34"></a>Potential Limitations</h2>
<ul>
<li class="has-line-data" data-line-start="36" data-line-end="37"><strong>Unpredictable events:</strong> Model cannot account for unexpected injuries, trades, coaching changes, or rest decisions made close to game time.</li>
<li class="has-line-data" data-line-start="37" data-line-end="38"><strong>Probabilistic predictions:</strong> Accuracy &gt;50% does not guarantee profit; individual game predictions are probabilistic and subject to variance.</li>
<li class="has-line-data" data-line-start="38" data-line-end="39"><strong>Data limitations:</strong> Historical data may not fully capture changes in league dynamics, rule changes, or roster composition over 20+ years.</li>
<li class="has-line-data" data-line-start="39" data-line-end="41"><strong>Sample size:</strong> Model performance is limited by the number of games available for training and testing.</li>
</ul>
<h2 class="code-line" data-line-start=41 data-line-end=42 ><a id="Implications_for_Stakeholders_41"></a>Implications for Stakeholders</h2>
<p class="has-line-data" data-line-start="43" data-line-end="44"><strong>Sports Bettors/Fans:</strong> Help make better-informed predictions of game winners, which could improve profitability.</p>
<p class="has-line-data" data-line-start="45" data-line-end="46"><strong>Sportsbooks:</strong> Understand what drives betting patterns and refine odds-setting.</p>
<p class="has-line-data" data-line-start="47" data-line-end="48"><strong>NBA Teams:</strong> Identify which factors most affect a team’s ability to win, which is of interest to coaches, players, and owners.</p>
<h2 class="code-line" data-line-start=49 data-line-end=50 ><a id="Responsible_Deployment__Ethics_49"></a>Responsible Deployment &amp; Ethics</h2>
<p class="has-line-data" data-line-start="51" data-line-end="52"><strong>Concerns:</strong> Model could encourage problem gambling; predictions are probabilistic and not deterministic.</p>
<p class="has-line-data" data-line-start="53" data-line-end="54"><strong>Legal:</strong> Gambling laws vary by state (sports betting is legal in PA, both online and in person). This model would be for analysis only, not financial advice.</p>
<p class="has-line-data" data-line-start="55" data-line-end="56"><strong>Mitigation:</strong> Include gambling risk disclaimers; frame the project as a purely academic exercise.</p>
<h2 class="code-line" data-line-start=57 data-line-end=58 ><a id="Timeline_57"></a>Timeline</h2>
<ul>
<li class="has-line-data" data-line-start="59" data-line-end="60">Weeks 1-2: Data collection/preparation</li>
<li class="has-line-data" data-line-start="60" data-line-end="61">Week 3: EDA and feature engineering</li>
<li class="has-line-data" data-line-start="61" data-line-end="63">Weeks 4-5: Model development and evaluation</li>
</ul>
<p class="has-line-data" data-line-start="63" data-line-end="64"><strong>Deliverable:</strong> Trained model with accuracy metrics and feature importance analysis.</p>

</body></html>
23 changes: 23 additions & 0 deletions presentations/DATA400TestPrez.Rmd
@@ -0,0 +1,23 @@
---
title: "Test Presentation Ninja"
subtitle: "⚔<br/>with xaringan"
author: "Luke Finkielstein"
institute: "RStudio, PBC"
date: "2016/12/12 (updated: `r Sys.Date()`)"
output:
  xaringan::moon_reader:
    lib_dir: libs
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---

background-image: url(https://upload.wikimedia.org/wikipedia/commons/b/be/Sharingan_triple.svg)

```{r setup, include=FALSE}
options(htmltools.dir.version = FALSE)
```

# This is a test presentation
