MBAI Graph Database
Welcome to the documentation for MBAI-GDB, the backend architecture for the Money Ball AI project.
TL;DR
MBAI-GDB is an advanced graph ingestion engine that transforms raw, tabular NBA play-by-play data into a high-fidelity Heterogeneous Temporal Graph stored in Neo4j.
Traditional sports analytics often relies on aggregated box scores stored in relational tables. MBAI-GDB instead models basketball as a network of interactions: it parses thousands of events per game (shots, assists, fouls, and substitutions) into distinct nodes and links them temporally via NEXT relationships.
Key Capabilities:
- Granular Traversal: Move seamlessly from a Season to a Game, down to a specific Period, LineUp, or individual Shot.
- Context-Aware Analytics: Analyze player performance not just in isolation, but in the context of specific lineups and opponents.
- ML-Ready: Includes a built-in to_pyg() pipeline to convert graph data directly into PyTorch Geometric tensors for Graph Neural Network (GNN) training.
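The internals of to_pyg() are not documented here. As a rough sketch of the general idea only, the snippet below remaps node ids to contiguous per-label indices and groups relationships into one [2, num_edges] index per (source label, relationship type, target label) triple, which is the edge layout PyTorch Geometric uses for heterogeneous graphs. The function name build_edge_indices and the input shapes are assumptions made for illustration, not the project's API.

```python
from collections import defaultdict


def build_edge_indices(nodes, rels):
    """Hypothetical helper, not the project's to_pyg().

    nodes: iterable of (label, node_id)
    rels:  iterable of ((src_label, src_id), rel_type, (dst_label, dst_id))
    """
    # label -> {original node id: contiguous index starting at 0}
    index = defaultdict(dict)
    for label, node_id in nodes:
        index[label].setdefault(node_id, len(index[label]))

    # (src_label, rel_type, dst_label) -> [source indices, target indices]
    edges = defaultdict(lambda: [[], []])
    for (src_label, src_id), rel_type, (dst_label, dst_id) in rels:
        key = (src_label, rel_type, dst_label)
        edges[key][0].append(index[src_label][src_id])
        edges[key][1].append(index[dst_label][dst_id])
    return dict(index), dict(edges)


nodes = [("Player", 201939), ("Player", 2544), ("LineUp", "a")]
rels = [
    (("Player", 201939), "MEMBER_OF", ("LineUp", "a")),
    (("Player", 2544), "MEMBER_OF", ("LineUp", "a")),
]
index, edges = build_edge_indices(nodes, rels)
```

Each value in edges can then be wrapped in a tensor and assigned as the edge_index of the matching edge type.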
🏀 NBA Games as Temporal Hierarchical Heterogeneous Graphs
The goal is to model an entire NBA Regular Season (approx. 1,230 games) with high fidelity. At the highest level, the graph organizes the NBA ecosystem into four primary static and semi-static node types: Season, Team, Arena, and Game.
This hierarchy transforms the flat “schedule” into a navigable structure. The SeasonManager ingests the schedule and stitches games together, creating a continuous timeline of events.
The Team, Player and LineUp Nodes
The graph differentiates between persistent entities (Teams, Players) and situational entities (LineUps).
- Team: Represents the franchise. It holds static properties like abbreviation and city, and connects physically to an Arena.
- Player: Represents the individual athlete.
- LineUp: A unique node representing a specific combination of 5 players.
Note: LineUp IDs are deterministically generated by sorting the player IDs, ensuring that any time the same 5 players share the court—regardless of the game or season—they map to the same LineUp node.
graph TB
classDef team fill:#ffdd00,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef player fill:#bae1ff,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef lineup fill:#ffb3ba,stroke:#333,stroke-width:2px,rx:5,ry:5;
T[Team]:::team
L(LineUp):::lineup
P1((Player)):::player
P2((Player)):::player
P3((Player)):::player
P4((Player)):::player
P5((Player)):::player
T -->|HAS_LINEUP| L
P1 & P2 --> L
P3 -->|MEMBER_OF| L
P4 & P5 --> L
linkStyle 0 stroke:#ffd700,stroke-width:2px;
linkStyle 1,2,3,4,5 stroke:#bae1ff,stroke-width:2px;
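The deterministic LineUp ID described in the note above can be sketched as follows. The source only states that the player IDs are sorted; joining them with a dash, and the function name lineup_id, are assumptions made for illustration.

```python
def lineup_id(player_ids):
    """Hypothetical helper: build an order-independent key for a 5-man unit."""
    ids = sorted({int(p) for p in player_ids})
    if len(ids) != 5:
        raise ValueError("a lineup needs exactly 5 distinct players")
    # Sorting first means any ordering of the same players yields the same key.
    return "-".join(str(p) for p in ids)


a = lineup_id([5, 3, 1, 4, 2])
b = lineup_id([1, 2, 3, 4, 5])
```

Because the key depends only on the set of players, using it as the identifying property of a LineUp node maps repeated appearances of the same unit to one node.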
The Game Node
The Game node acts as the central anchor. It enables context-aware traversal: by navigating outward from a game, we can immediately identify the physical location (Arena), the temporal context (Season), and the competing entities (Teams).
graph LR
classDef team fill:#ffdd00,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef arena fill:#bbf,stroke:#333,stroke-width:2px;
classDef season fill:#f3f,stroke:#333,stroke-width:2px;
classDef game fill:#f9f,stroke:#333,stroke-width:2px;
classDef period fill:#bbf,stroke:#333,stroke-width:2px;
S(Season):::season
G((Game)):::game
a[Arena]:::arena
ht[Team]:::team
at[Team]:::team
ht -- HOME_ARENA --> a
G -- AT --> a
G -.-> |IN_SEASON| S
at -- PLAYED_AWAY --> G
ht -- PLAYED_HOME --> G
linkStyle default stroke:#ff9900,stroke-width:2px;
linkStyle 2 stroke:#ff9900,stroke-width:2px,stroke-dasharray: 5 5;
linkStyle 3 stroke:red,stroke-width:2px;
linkStyle 4 stroke:green,stroke-width:2px;
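As a minimal sketch of how the relationships in the diagram above could be written to Neo4j, the snippet below holds a Cypher MERGE statement and a helper that builds its parameters. The relationship types come from the diagram; the id and name property keys, the row field names, and the function name are assumptions made for illustration.

```python
# Relationship types follow the diagram; property keys are assumed.
GAME_MERGE_QUERY = """
MERGE (s:Season {id: $season_id})
MERGE (a:Arena {name: $arena})
MERGE (ht:Team {id: $home_team_id})
MERGE (at:Team {id: $away_team_id})
MERGE (g:Game {id: $game_id})
MERGE (g)-[:IN_SEASON]->(s)
MERGE (g)-[:AT]->(a)
MERGE (ht)-[:HOME_ARENA]->(a)
MERGE (ht)-[:PLAYED_HOME]->(g)
MERGE (at)-[:PLAYED_AWAY]->(g)
"""


def game_merge_params(row):
    """Map one schedule row (a dict with assumed keys) to query parameters."""
    return {
        "season_id": row["season"],
        "arena": row["arena"],
        "home_team_id": row["home_team_id"],
        "away_team_id": row["away_team_id"],
        "game_id": row["game_id"],
    }


params = game_merge_params({
    "season": "2024-25",
    "arena": "Example Arena",
    "home_team_id": 1,
    "away_team_id": 2,
    "game_id": "G",
})
```

With the official neo4j Python driver, the pair could be passed to session.run(GAME_MERGE_QUERY, params). Using MERGE rather than CREATE keeps re-ingesting the same game from duplicating nodes or relationships.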
The :NEXT Chain
Games are not isolated events; they exist within a schedule. To facilitate trend analysis (e.g., “How does a team perform in the game immediately following a home loss?”), Game nodes are linked sequentially via the :NEXT relationship.
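Because Neo4j relationships can be traversed in either direction, the chain behaves like a doubly-linked list of events throughout the season. As the diagram shows, a game can carry more than one outgoing :NEXT link, one toward the following game of each participating team.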
graph LR
classDef team fill:#ffdd00,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef arena fill:#bbf,stroke:#333,stroke-width:2px;
classDef season fill:#f3f,stroke:#333,stroke-width:2px;
classDef game fill:#f9f,stroke:#333,stroke-width:2px;
classDef period fill:#bbf,stroke:#333,stroke-width:2px;
linkStyle default stroke:#ff9900,stroke-width:2px;
S(Season):::season
G((Game)):::game
G1((Game)):::game
G2((Game)):::game
ht[Team 1]:::team
at[Team 2]:::team
at1[Team 3]:::team
at2[Team 4]:::team
G -->|NEXT| G1 & G2
ht & at --> G
ht & at1 --> G1
at & at2 --> G2
G & G1 & G2 -.-> S
linkStyle 0,1 stroke:blue,stroke-width:3px;
linkStyle 2,4,6 stroke:green,stroke-width:2px;
linkStyle 3,5,7 stroke:red,stroke-width:2px;
linkStyle 8,9,10 stroke:#ff9900,stroke-width:2px,stroke-dasharray: 5 5;
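Reading the diagram above as one :NEXT chain per team, the links can be derived from a schedule as sketched below. The tuple layout of the schedule and the function name next_links are assumptions made for illustration; the project's SeasonManager may work differently.

```python
from collections import defaultdict


def next_links(schedule):
    """schedule: iterable of (game_id, date, home_team, away_team).

    Returns the sorted (game_id, following_game_id) pairs, one chain per team.
    """
    by_team = defaultdict(list)
    # ISO dates sort chronologically as strings; game_id breaks ties.
    for game_id, date, home, away in sorted(schedule, key=lambda g: (g[1], g[0])):
        by_team[home].append(game_id)
        by_team[away].append(game_id)

    links = set()
    for games in by_team.values():
        # Pair each game with the same team's following game.
        links.update(zip(games, games[1:]))
    return sorted(links)


schedule = [
    ("G", "2024-10-22", "Team 1", "Team 2"),
    ("G1", "2024-10-24", "Team 1", "Team 3"),
    ("G2", "2024-10-25", "Team 2", "Team 4"),
]
links = next_links(schedule)
```

For this three-game schedule the result is the two links drawn in the diagram: G to G1 (Team 1's next game) and G to G2 (Team 2's next game).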
The Period Nodes
To allow for precise clock calculations, the Game is subdivided into Period nodes. These represent distinct segments of RegularTime (Q1-Q4) and Overtime.
Similar to the game schedule, periods are linked via :NEXT. This creates a continuous time spine for the match, allowing linear traversal of the game clock from tip-off to the final buzzer.
Clock Precision: Every event in the graph carries a global_clock (cumulative seconds since game start) and a local_clock (seconds remaining in the period). Indexing these properties lets events within a specific time window be retrieved with a range lookup instead of a scan over all events.
graph LR
classDef game fill:#f9f,stroke:#333,stroke-width:2px;
classDef period fill:#bbf,stroke:#333,stroke-width:2px;
G((Game)):::game
P1((Q1)):::period
P2((Q2)):::period
P3((Q3)):::period
P4((Q4)):::period
P5((OT1)):::period
P1 & P2 & P3 & P4 & P5 -.->|IN_GAME| G
P1 -- NEXT --> P2
P2 -- NEXT --> P3
P3 -- NEXT --> P4
P4 -- NEXT --> P5
linkStyle default stroke:#ff9900,stroke-width:2px;
linkStyle 5,6,7,8 stroke:blue,stroke-width:3px;
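The relationship between the two clock properties can be sketched as follows, using the NBA's 12-minute regulation periods and 5-minute overtime periods. The function names and the 1-based period numbering (5 meaning OT1) are assumptions made for illustration.

```python
REGULAR_PERIOD_SECONDS = 12 * 60   # Q1-Q4
OVERTIME_PERIOD_SECONDS = 5 * 60   # OT1, OT2, ...


def period_length(period):
    """Length in seconds of a 1-based period number (5 = OT1)."""
    return REGULAR_PERIOD_SECONDS if period <= 4 else OVERTIME_PERIOD_SECONDS


def global_clock(period, local_clock):
    """Cumulative seconds since game start.

    local_clock is the number of seconds remaining in the given period.
    """
    length = period_length(period)
    if not 0 <= local_clock <= length:
        raise ValueError("local_clock is outside the period")
    elapsed_before = sum(period_length(p) for p in range(1, period))
    return elapsed_before + (length - local_clock)


tip_off = global_clock(1, 720)
end_of_regulation = global_clock(4, 0)
end_of_ot1 = global_clock(5, 0)
```

Storing the cumulative value on each event means a time-window query needs only a comparison on global_clock, without walking the Period chain.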
🔄 The Stint Engine
One of the project’s most advanced features is the automated reconstruction of on-court lineups.
LineUpStint and PlayerStint Nodes
The raw NBA data provides substitution events, but not the state of the court between them. MBAI-GDB fills this gap by calculating “Stints”:
- LineUpStints: The system calculates exactly when a specific 5-man unit enters and leaves the court.
- PlayerStints: Aggregates continuous playing time for individual players, linking them to every action that occurred during their shift.
Together, these nodes reconstruct the exact flow of substitutions: each PlayerStint is linked to the LineUpStint it belongs to via ON_COURT_WITH, and each LineUpStint is linked to the Period in which it occurs via IN_PERIOD.
graph LR
classDef team fill:#ffd700,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef lineup fill:#ffb3ba,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef player fill:#bae1ff,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef game fill:#f9f,stroke:#333,stroke-width:2px;
classDef period fill:#bbf,stroke:#333,stroke-width:2px,rx:50,ry:50;
classDef lineupstint fill:#fff,stroke:#ffb3ba,stroke-width:4px,rx:5,ry:5;
classDef playerstint fill:#fff,stroke:#bae1ff,stroke-width:4px,rx:5,ry:5;
subgraph Static [Static Context]
direction LR
T[Team]:::team
P((Player)):::player
L(LineUp):::lineup
end
subgraph Dynamic [In-Game Context]
direction LR
Q((Q1)):::period
G((Game)):::game
LS(LineUpStint):::lineupstint
PS(PlayerStint):::playerstint
end
T --> L
P --> L
L -.->|ON_COURT| LS
P -.->|ON_COURT| PS
PS -->|ON_COURT_WITH| LS
LS -->|IN_PERIOD| Q
Q --> G
linkStyle default stroke:#ff9900,stroke-width:2px;
linkStyle 0 stroke:#ffd700,stroke-width:2px;
linkStyle 1 stroke:#bae1ff,stroke-width:2px;
linkStyle 2 stroke:#ffb3ba,stroke-width:4px,stroke-dasharray: 5 5;
linkStyle 3 stroke:#bae1ff,stroke-width:4px,stroke-dasharray: 5 5;
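The gap between substitution events and on-court state can be closed by replaying the substitutions against the starting five. The snippet below is a minimal sketch under simplifying assumptions (one team, one period, clocks given as elapsed seconds); the function name and input format are hypothetical and do not describe the project's actual stint engine.

```python
def lineup_stints(starters, substitutions, period_end, period_start=0):
    """Replay substitutions to recover who was on court and when.

    starters:      the 5 player ids on court at period_start
    substitutions: list of (clock, player_out, player_in), clock in elapsed seconds
    Returns a list of (start, end, frozenset_of_player_ids).
    """
    on_court = set(starters)
    stints = []
    start = period_start
    # sorted() is stable, so simultaneous substitutions keep their input order.
    for clock, player_out, player_in in sorted(substitutions, key=lambda s: s[0]):
        if clock > start:
            # The current unit's stint ends when the clock moves to a new substitution.
            stints.append((start, clock, frozenset(on_court)))
            start = clock
        on_court.remove(player_out)  # raises KeyError if the data is inconsistent
        on_court.add(player_in)
    if period_end > start:
        stints.append((start, period_end, frozenset(on_court)))
    return stints


stints = lineup_stints(
    starters=[1, 2, 3, 4, 5],
    substitutions=[(210, 5, 6), (210, 4, 7), (375, 1, 8)],
    period_end=720,
)
```

Two substitutions at the same clock value produce a single new stint rather than a zero-length intermediate one, which keeps each LineUpStint tied to a unit that actually played together.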
The :NEXT and :ON_COURT_NEXT Chains
graph LR
classDef team fill:#ffd700,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef lineup fill:#ffb3ba,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef player fill:#bae1ff,stroke:#333,stroke-width:2px,rx:5,ry:5;
classDef game fill:#f9f,stroke:#333,stroke-width:2px;
classDef period fill:#bbf,stroke:#333,stroke-width:2px,rx:50,ry:50;
classDef lineupstint fill:#fff,stroke:#ffb3ba,stroke-width:4px,rx:5,ry:5;
classDef playerstint fill:#fff,stroke:#bae1ff,stroke-width:4px,rx:5,ry:5;
subgraph Static [Static Context]
direction LR
T[Team]:::team
L1[LineUp 1]:::lineup
L2[LineUp 2]:::lineup
L3[LineUp 3]:::lineup
end
subgraph Timeline [In-Game Context]
direction LR
subgraph Q1_group [" "]
direction LR
q1((Q1)):::period
LS1[Stint 1<br>12:00]:::lineupstint
LS2[Stint 2<br>08:30]:::lineupstint
LS3[Stint 3<br>05:45]:::lineupstint
end
subgraph Q2_group [" "]
direction LR
q2((Q2)):::period
LS4[Stint 4<br>12:00]:::lineupstint
LS5[Stint 5<br>05:45]:::lineupstint
end
end
T --> L1 & L2 & L3
L1 -.-> LS1 & LS5
L2 -.-> LS2 & LS4
L3 -.-> LS3
LS1 -.->|ON_COURT_NEXT| LS2
LS2 -.->|ON_COURT_NEXT| LS3
LS4 -.->|ON_COURT_NEXT| LS5
LS1 -->|NEXT| LS5
LS2 -->|NEXT| LS4
LS1 & LS2 & LS3 --> q1
LS4 & LS5 --> q2
q1 -->|NEXT| q2
%% --- Styling ---
linkStyle default stroke:#ff9900,stroke-width:2px;
linkStyle 0,1,2 stroke:#ffd700,stroke-width:2px;
linkStyle 3,4,5,6,7 stroke:#ffb3ba,stroke-width:4px,stroke-dasharray: 5 5;
linkStyle 8,9,10,11,12,18 stroke:blue,stroke-width:2px;
%% Subgraph transparency
style Q1_group fill:white,stroke:none
style Q2_group fill:white,stroke:none
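One reading of the diagram above is that :ON_COURT_NEXT connects consecutive stints within a period, while :NEXT connects a stint to the following stint of the same LineUp. Under that assumption, both chains can be derived in a single pass over one team's stints. The dictionary keys and the function name stint_chains are hypothetical.

```python
def stint_chains(stints):
    """stints: dicts with keys id, lineup, period, start (elapsed game seconds).

    Returns (on_court_next_pairs, next_pairs) for a single team.
    """
    on_court_next, same_lineup_next = [], []
    last_in_period = {}   # period -> id of the most recent stint in that period
    last_of_lineup = {}   # lineup -> id of that lineup's most recent stint
    for stint in sorted(stints, key=lambda s: s["start"]):
        previous = last_in_period.get(stint["period"])
        if previous is not None:
            on_court_next.append((previous, stint["id"]))
        last_in_period[stint["period"]] = stint["id"]

        previous = last_of_lineup.get(stint["lineup"])
        if previous is not None:
            same_lineup_next.append((previous, stint["id"]))
        last_of_lineup[stint["lineup"]] = stint["id"]
    return on_court_next, same_lineup_next


stints = [
    {"id": "LS1", "lineup": "L1", "period": 1, "start": 0},
    {"id": "LS2", "lineup": "L2", "period": 1, "start": 210},
    {"id": "LS3", "lineup": "L3", "period": 1, "start": 375},
    {"id": "LS4", "lineup": "L2", "period": 2, "start": 720},
    {"id": "LS5", "lineup": "L1", "period": 2, "start": 1095},
]
on_court_next, same_lineup_next = stint_chains(stints)
```

The sample data mirrors the diagram: stint start times of 12:00, 08:30 and 05:45 remaining in Q1 correspond to 0, 210 and 375 elapsed seconds.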
Technical Stack
This project is built using a robust Python-to-Neo4j pipeline:
| Component | Technology | Description |
|---|---|---|
| Database | Neo4j | Graph storage engine handling complex relationships. |
| Driver | Python | Custom singleton driver for thread-safe connections. |
| ETL | Pandas | Data cleaning and normalization before graph ingestion. |
| Source | NBA API | Fetches live boxscores, schedules, and play-by-play logs. |
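The project's custom driver is not shown here. As a minimal sketch of the singleton pattern the table refers to, the class below creates the underlying driver once, guarded by a lock, and hands the same wrapper to every caller. The class name Neo4jConnection and the injectable factory argument are assumptions made for illustration.

```python
import threading


class Neo4jConnection:
    """Hypothetical thread-safe singleton wrapper around a driver object."""

    _instance = None
    _lock = threading.Lock()

    def __init__(self, driver):
        self.driver = driver

    @classmethod
    def get(cls, factory):
        """Return the shared instance, calling factory() only on first use."""
        if cls._instance is None:            # fast path, no locking
            with cls._lock:
                if cls._instance is None:    # re-check while holding the lock
                    cls._instance = cls(factory())
        return cls._instance


# In real use the factory would build the official driver, for example:
#   from neo4j import GraphDatabase
#   factory = lambda: GraphDatabase.driver(uri, auth=(user, password))
calls = []


def fake_factory():
    calls.append(1)
    return object()   # stand-in for a real driver in this sketch


threads = [
    threading.Thread(target=Neo4jConnection.get, args=(fake_factory,))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Passing the factory in, rather than importing the driver inside the class, keeps the sketch runnable without a database and makes the connection easy to replace in tests.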
🕰️ Temporal Granularity
- Periods are linked sequentially via [:NEXT]. This time chain allows us to traverse the game from start to finish linearly.
- Every Period connects to the Game via [:IN_GAME].
- Labels like :RegularTime:Q1 or :OverTime allow easy filtering.
nodes linked to the Arena, the Team nodes and to the Season
using a heterogeneous graph, so we
Data is ingested from the NBA API and normalized into a rich taxonomy of Action nodes:
- Schedule hierarchy: Game nodes are linked to a Season node.
- Scoring: Shot (Made/Missed, 2PT/3PT), FreeThrow.
- Flow: Rebound, Turnover, JumpBall, Timeout.
- Regulation: Foul, Violation.
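The action taxonomy listed above can be expressed as a small lookup table. The structure below is only an illustrative sketch: the category and label names come from the list, while the dictionary layout and the function name action_category are assumptions.

```python
# Category -> Action node labels, as listed in the documentation.
ACTION_TAXONOMY = {
    "Scoring": ("Shot", "FreeThrow"),
    "Flow": ("Rebound", "Turnover", "JumpBall", "Timeout"),
    "Regulation": ("Foul", "Violation"),
}

# Reverse lookup: Action label -> category.
LABEL_TO_CATEGORY = {
    label: category
    for category, labels in ACTION_TAXONOMY.items()
    for label in labels
}


def action_category(label):
    """Return the taxonomy category for an Action label."""
    try:
        return LABEL_TO_CATEGORY[label]
    except KeyError:
        raise ValueError(f"unknown action label: {label!r}") from None


category = action_category("Rebound")
```

Rejecting unknown labels during normalization surfaces unmapped event types early instead of silently creating untyped nodes.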