MBAI Graph Database

Welcome to the documentation for MBAI-GDB, the backend architecture for the Money Ball AI project.


Table of contents
  1. TL;DR
  2. 🏀 NBA Games as Temporal Hierarchical Heterogeneous Graphs
    1. The Team, Player and LineUp Nodes
    2. The Game Node
      1. The :NEXT Chain
      2. The Period Nodes
    3. 🔄 The Stint Engine
      1. LineUpStint and PlayerStint Nodes
      2. The :NEXT and :ON_COURT_NEXT Chains
  3. Technical Stack
    1. 🕰️ Temporal Granularity

TL;DR

MBAI-GDB is an advanced graph ingestion engine that transforms raw, tabular NBA play-by-play data into a high-fidelity Heterogeneous Temporal Graph stored in Neo4j.

Traditional sports analytics often relies on aggregated box scores stored in relational tables. MBAI-GDB instead models basketball as a network of interactions. It parses the hundreds of events in each game (shots, assists, fouls, and substitutions) into distinct nodes and links them temporally via NEXT relationships.

Key Capabilities:

  • Granular Traversal: Move seamlessly from a Season to a Game, down to a specific Period, LineUp, or individual Shot.
  • Context-Aware Analytics: Analyze player performance not just in isolation, but in the context of specific lineups and opponents.
  • ML-Ready: Includes a built-in to_pyg() pipeline to convert graph data directly into PyTorch Geometric tensors for Graph Neural Network (GNN) training.

🏀 NBA Games as Temporal Hierarchical Heterogeneous Graphs

The goal is to model an entire NBA Regular Season (approx. 1,230 games) with high fidelity. At the highest level, the graph organizes the NBA ecosystem into four primary static and semi-static node types: Season, Team, Arena, and Game.

This hierarchy transforms the flat “schedule” into a navigable structure. The SeasonManager ingests the schedule and stitches games together, creating a continuous timeline of events.


The Team, Player and LineUp Nodes

The graph differentiates between persistent entities (Teams, Players) and situational entities (LineUps).

  • Team: Represents the franchise. It holds static properties like abbreviation and city, and connects to its home Arena.
  • Player: Represents the individual athlete.
  • LineUp: A unique node representing a specific combination of 5 players.

Note: LineUp IDs are deterministically generated by sorting the player IDs, ensuring that any time the same 5 players share the court—regardless of the game or season—they map to the same LineUp node.
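
The note above can be illustrated with a minimal sketch. The function name lineup_id and the dash-joined key format are assumptions made for this example; the source only states that the player IDs are sorted before the ID is generated.

```python
from typing import Iterable


def lineup_id(player_ids: Iterable[int]) -> str:
    """Build an order-independent key for a five-player unit (hypothetical format)."""
    unique_ids = sorted(set(player_ids))
    if len(unique_ids) != 5:
        raise ValueError("a lineup needs exactly 5 distinct players")
    # Sorting first means any ordering of the same five players yields the same key.
    return "-".join(str(pid) for pid in unique_ids)


# The same five players, listed in two different orders, map to one key.
print(lineup_id([30, 11, 23, 5, 7]) == lineup_id([5, 7, 11, 23, 30]))
```

Because the key depends only on the set of players, it can serve as a merge key, so repeated appearances of the same unit resolve to a single LineUp node.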

graph TB
    classDef team fill:#ffdd00,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef player fill:#bae1ff,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef lineup fill:#ffb3ba,stroke:#333,stroke-width:2px,rx:5,ry:5;

    T[Team]:::team    
    L(LineUp):::lineup
    
    P1((Player)):::player
    P2((Player)):::player
    P3((Player)):::player
    P4((Player)):::player
    P5((Player)):::player

    T -->|HAS_LINEUP| L
    P1 & P2 --> L
    P3 -->|MEMBER_OF| L
    P4 & P5 --> L

    linkStyle 0 stroke:#ffd700,stroke-width:2px;
    linkStyle 1,2,3,4,5 stroke:#bae1ff,stroke-width:2px;

The Game Node

The Game node acts as the central anchor. It enables context-aware traversal: by navigating outward from a game, we can immediately identify the physical location (Arena), the temporal context (Season), and the competing entities (Teams).

graph LR
    classDef team fill:#ffdd00,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef arena fill:#bbf,stroke:#333,stroke-width:2px;
    classDef season fill:#f3f,stroke:#333,stroke-width:2px;
    classDef game fill:#f9f,stroke:#333,stroke-width:2px;
    classDef period fill:#bbf,stroke:#333,stroke-width:2px;

    S(Season):::season
    G((Game)):::game
    a[Arena]:::arena
    ht[Team]:::team
    at[Team]:::team

    ht -- HOME_ARENA --> a
    G -- AT --> a 
    G -.-> |IN_SEASON| S
    at -- PLAYED_AWAY --> G
    ht -- PLAYED_HOME --> G

    linkStyle default stroke:#ff9900,stroke-width:2px;
    linkStyle 2 stroke:#ff9900,stroke-width:2px,stroke-dasharray: 5 5;
    linkStyle 3 stroke:red,stroke-width:2px;
    linkStyle 4 stroke:green,stroke-width:2px;

The :NEXT Chain

Games are not isolated events; they exist within a schedule. To facilitate trend analysis (e.g., “How does a team perform in the game immediately following a home loss?”), Game nodes are linked sequentially via the :NEXT relationship.

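Because Neo4j relationships can be traversed in either direction, this chain behaves like a doubly-linked list: a query can walk forward to later games or backward to earlier ones throughout the season.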

graph LR
    classDef team fill:#ffdd00,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef arena fill:#bbf,stroke:#333,stroke-width:2px;
    classDef season fill:#f3f,stroke:#333,stroke-width:2px;
    classDef game fill:#f9f,stroke:#333,stroke-width:2px;
    classDef period fill:#bbf,stroke:#333,stroke-width:2px;

    linkStyle default stroke:#ff9900,stroke-width:2px;

    S(Season):::season
    G((Game)):::game
    G1((Game)):::game
    G2((Game)):::game

    ht[Team 1]:::team
    at[Team 2]:::team
    at1[Team 3]:::team
    at2[Team 4]:::team

    G -->|NEXT| G1 & G2

    ht & at --> G
    ht & at1 --> G1
    at & at2 --> G2

    G & G1 & G2 -.-> S

    linkStyle 0,1 stroke:blue,stroke-width:3px;
    linkStyle 2,4,6 stroke:green,stroke-width:2px;
    linkStyle 3,5,7 stroke:red,stroke-width:2px;
    linkStyle 8,9,10 stroke:#ff9900,stroke-width:2px,stroke-dasharray: 5 5;
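
The diagram above shows one game pointing to two following games, which suggests one :NEXT chain per team. The sketch below works under that assumption; the Game tuple, the next_edges function, and the dates are illustrative and are not taken from SeasonManager.

```python
from collections import defaultdict
from typing import NamedTuple


class Game(NamedTuple):
    game_id: str
    date: str  # ISO date, so string order equals chronological order
    home: str
    away: str


def next_edges(schedule: list[Game]) -> set[tuple[str, str]]:
    """Return (game_id, next_game_id) pairs, assuming one chain per team."""
    by_team: dict[str, list[Game]] = defaultdict(list)
    for game in sorted(schedule, key=lambda g: g.date):
        by_team[game.home].append(game)
        by_team[game.away].append(game)

    edges: set[tuple[str, str]] = set()
    for games in by_team.values():
        # Link each game to the same team's following game.
        for current, following in zip(games, games[1:]):
            edges.add((current.game_id, following.game_id))
    return edges


# Same shape as the diagram: Team 1 and Team 2 meet in G, then play G1 and G2.
schedule = [
    Game("G", "2024-01-01", "Team 1", "Team 2"),
    Game("G1", "2024-01-03", "Team 1", "Team 3"),
    Game("G2", "2024-01-04", "Team 2", "Team 4"),
]
print(sorted(next_edges(schedule)))  # [('G', 'G1'), ('G', 'G2')]
```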

The Period Nodes

To allow for precise clock calculations, the Game is subdivided into Period nodes. These represent distinct segments of RegularTime (Q1-Q4) and OverTime (OT1, OT2, ...).

Similar to the game schedule, periods are linked via :NEXT. This creates a continuous time spine for the match, allowing linear traversal of the game clock from tip-off to the final buzzer.

Clock Precision: Every event in the graph is stamped with a global_clock (cumulative seconds since game start) and a local_clock (seconds remaining in the period), so events within a specific time window can be retrieved with a simple range filter on either clock.
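
The relationship between the two clocks can be sketched as follows. The sketch assumes 1-based period numbers with 5 and above meaning overtime, and uses the standard NBA period lengths (12-minute quarters, 5-minute overtimes); the function names are illustrative, not the project's API.

```python
REGULATION_PERIODS = 4
QUARTER_SECONDS = 12 * 60   # an NBA quarter lasts 12 minutes
OVERTIME_SECONDS = 5 * 60   # an NBA overtime period lasts 5 minutes


def period_length(period: int) -> int:
    """Length in seconds of a 1-based period (5 and above are overtimes)."""
    return QUARTER_SECONDS if period <= REGULATION_PERIODS else OVERTIME_SECONDS


def global_clock(period: int, local_clock: int) -> int:
    """Seconds elapsed since tip-off, given the seconds remaining in the period."""
    if period < 1:
        raise ValueError("period numbers start at 1")
    length = period_length(period)
    if not 0 <= local_clock <= length:
        raise ValueError("local_clock is outside the period")
    # Add up all completed periods, then the elapsed part of the current one.
    elapsed_before = sum(period_length(p) for p in range(1, period))
    return elapsed_before + (length - local_clock)


print(global_clock(1, 510))  # 08:30 left in Q1 -> 210 seconds played
print(global_clock(5, 300))  # start of OT1 -> 2880 seconds played
```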

graph LR
    classDef game fill:#f9f,stroke:#333,stroke-width:2px;
    classDef period fill:#bbf,stroke:#333,stroke-width:2px;

    G((Game)):::game
    P1((Q1)):::period
    P2((Q2)):::period
    P3((Q3)):::period
    P4((Q4)):::period
    P5((OT1)):::period
    
    P1 & P2 & P3 & P4 & P5 -.->|IN_GAME| G    
    P1 -- NEXT --> P2
    P2 -- NEXT --> P3
    P3 -- NEXT --> P4
    P4 -- NEXT --> P5

    linkStyle default stroke:#ff9900,stroke-width:2px;
    linkStyle 5,6,7,8 stroke:blue,stroke-width:3px;

🔄 The Stint Engine

One of the project’s most advanced features is the automated reconstruction of on-court lineups.

LineUpStint and PlayerStint Nodes

The raw NBA data provides substitution events, but not the state of the court between them. MBAI-GDB fills this gap by calculating “Stints”:

  • LineUpStints: The system calculates exactly when a specific 5-man unit enters and leaves the court.
  • PlayerStints: Aggregates continuous playing time for individual players, linking them to every action that occurred during their shift.

Together, these nodes reconstruct the exact flow of substitutions. A LineUpStint represents a specific 5-man unit on the court for a specific duration and is attached to its Period via IN_PERIOD, while each PlayerStint tracks one player's continuous presence and is linked to the lineup stint via ON_COURT_WITH.
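
One way to derive lineup stints from substitution events is sketched below. It is a simplified illustration for one team in one period, not the project's Stint Engine: the Substitution and LineUpStint classes are hypothetical, clocks are expressed as seconds elapsed in the period, and the starting five is assumed to be known.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    clock: int        # seconds elapsed in the period when the swap happens
    player_out: int
    player_in: int


@dataclass(frozen=True)
class LineUpStint:
    players: tuple    # sorted IDs of the five players on court
    start: int        # seconds elapsed when the unit came together
    end: int          # seconds elapsed when the unit was broken up


def lineup_stints(starters, substitutions, period_end):
    """Derive lineup stints for one team in one period (illustrative only)."""
    on_court = set(starters)
    stints = []
    start = 0
    for sub in sorted(substitutions, key=lambda s: s.clock):
        if sub.clock > start:
            # The unit changes here, so close the stint that was running.
            stints.append(LineUpStint(tuple(sorted(on_court)), start, sub.clock))
            start = sub.clock
        # Several swaps at the same clock reach this point without opening
        # extra stints, so they all apply to the same new unit.
        on_court.remove(sub.player_out)  # KeyError if the data is inconsistent
        on_court.add(sub.player_in)
    if period_end > start:
        stints.append(LineUpStint(tuple(sorted(on_court)), start, period_end))
    return stints


subs = [Substitution(210, 5, 6), Substitution(375, 1, 7), Substitution(375, 2, 8)]
for stint in lineup_stints([1, 2, 3, 4, 5], subs, period_end=720):
    print(stint)
```

With these inputs the period splits into three stints that start at 0, 210, and 375 seconds elapsed, which corresponds to 12:00, 08:30, and 05:45 on the game clock.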

graph LR
    classDef team fill:#ffd700,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef lineup fill:#ffb3ba,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef player fill:#bae1ff,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef game fill:#f9f,stroke:#333,stroke-width:2px;
    classDef period fill:#bbf,stroke:#333,stroke-width:2px,rx:50,ry:50;
    classDef lineupstint fill:#fff,stroke:#ffb3ba,stroke-width:4px,rx:5,ry:5;
    classDef playerstint fill:#fff,stroke:#bae1ff,stroke-width:4px,rx:5,ry:5;

    subgraph Static [Static Context]
        direction LR
        T[Team]:::team
        P((Player)):::player
        L(LineUp):::lineup
    end

    subgraph Dynamic [In-Game Context]
        direction LR
        Q((Q1)):::period
        G((Game)):::game
        LS(LineUpStint):::lineupstint    
        PS(PlayerStint):::playerstint
    end

    T --> L
    P --> L    
    L -.->|ON_COURT| LS
    P -.->|ON_COURT| PS
    PS -->|ON_COURT_WITH| LS
    LS -->|IN_PERIOD| Q
    Q --> G

    linkStyle default stroke:#ff9900,stroke-width:2px;
    linkStyle 0 stroke:#ffd700,stroke-width:2px;
    linkStyle 1 stroke:#bae1ff,stroke-width:2px;
    linkStyle 2 stroke:#ffb3ba,stroke-width:4px,stroke-dasharray: 5 5;
    linkStyle 3 stroke:#bae1ff,stroke-width:4px,stroke-dasharray: 5 5;

The :NEXT and :ON_COURT_NEXT Chains

graph LR
    classDef team fill:#ffd700,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef lineup fill:#ffb3ba,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef player fill:#bae1ff,stroke:#333,stroke-width:2px,rx:5,ry:5;
    classDef game fill:#f9f,stroke:#333,stroke-width:2px;
    classDef period fill:#bbf,stroke:#333,stroke-width:2px,rx:50,ry:50;
    classDef lineupstint fill:#fff,stroke:#ffb3ba,stroke-width:4px,rx:5,ry:5;
    classDef playerstint fill:#fff,stroke:#bae1ff,stroke-width:4px,rx:5,ry:5;

    subgraph Static [Static Context]
        direction LR
        T[Team]:::team
        L1[LineUp 1]:::lineup
        L2[LineUp 2]:::lineup
        L3[LineUp 3]:::lineup
    end

    subgraph Timeline [In-Game Context]
        direction LR
        
        subgraph Q1_group [" "]
            direction LR
            q1((Q1)):::period
            LS1[Stint 1<br>12:00]:::lineupstint
            LS2[Stint 2<br>08:30]:::lineupstint
            LS3[Stint 3<br>05:45]:::lineupstint
        end

        subgraph Q2_group [" "]
            direction LR
            q2((Q2)):::period
            LS4[Stint 4<br>12:00]:::lineupstint
            LS5[Stint 5<br>05:45]:::lineupstint
        end
    end
    
    T --> L1 & L2 & L3

    L1 -.-> LS1 & LS5
    L2 -.-> LS2 & LS4
    L3 -.-> LS3

    LS1 -.->|ON_COURT_NEXT| LS2
    LS2 -.->|ON_COURT_NEXT| LS3
    LS4 -.->|ON_COURT_NEXT| LS5

    LS1 -->|NEXT| LS5
    LS2 -->|NEXT| LS4

    LS1 & LS2 & LS3 --> q1
    LS4 & LS5 --> q2

    q1 -->|NEXT| q2

    %% --- Styling ---
    linkStyle default stroke:#ff9900,stroke-width:2px;

    linkStyle 0,1,2 stroke:#ffd700,stroke-width:2px;
    linkStyle 3,4,5,6,7 stroke:#ffb3ba,stroke-width:4px,stroke-dasharray: 5 5;
    linkStyle 8,9,10,11,12,18 stroke:blue,stroke-width:2px;

    %% Subgraph transparency
    style Q1_group fill:white,stroke:none
    style Q2_group fill:white,stroke:none
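
Read from the diagram above, :ON_COURT_NEXT appears to link each LineUpStint to the stint that follows it on the court within the same period, while :NEXT appears to link a LineUpStint to the next stint played by the same LineUp, even across periods. The sketch below encodes that reading; it is an interpretation of the diagram only, and the Stint tuple and chain_edges function are hypothetical names.

```python
from typing import NamedTuple


class Stint(NamedTuple):
    stint_id: str
    lineup: str
    period: int


def chain_edges(stints):
    """Build both chains from one team's stints, given in chronological order."""
    on_court_next = []      # consecutive stints inside the same period
    next_same_lineup = []   # consecutive stints of the same LineUp
    last_by_lineup = {}
    previous = None
    for stint in stints:
        if previous is not None and previous.period == stint.period:
            on_court_next.append((previous.stint_id, stint.stint_id))
        if stint.lineup in last_by_lineup:
            next_same_lineup.append((last_by_lineup[stint.lineup], stint.stint_id))
        last_by_lineup[stint.lineup] = stint.stint_id
        previous = stint
    return on_court_next, next_same_lineup


# The five stints from the diagram: three in Q1, two in Q2.
stints = [
    Stint("LS1", "LineUp 1", 1),
    Stint("LS2", "LineUp 2", 1),
    Stint("LS3", "LineUp 3", 1),
    Stint("LS4", "LineUp 2", 2),
    Stint("LS5", "LineUp 1", 2),
]
on_court_next, next_same_lineup = chain_edges(stints)
print(on_court_next)
print(sorted(next_same_lineup))
```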

Technical Stack

This project is built using a robust Python-to-Neo4j pipeline:

| Component | Technology | Description |
|-----------|------------|-------------|
| Database | Neo4j | Graph storage engine handling complex relationships. |
| Driver | Python | Custom singleton driver for thread-safe connections. |
| ETL | Pandas | Data cleaning and normalization before graph ingestion. |
| Source | NBA API | Fetches live boxscores, schedules, and play-by-play logs. |
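
The idea behind a thread-safe singleton driver can be sketched with double-checked locking. This is a generic pattern under stated assumptions, not the project's actual driver class: DriverSingleton is a hypothetical name, and the factory argument stands in for whatever creates the real connection object.

```python
import threading
from typing import Any, Callable, Optional


class DriverSingleton:
    """Process-wide holder for one shared driver object (hypothetical name)."""

    _instance: Optional[Any] = None
    _lock = threading.Lock()

    @classmethod
    def get(cls, factory: Callable[[], Any]) -> Any:
        # Fast path: skip the lock once the shared object exists.
        if cls._instance is None:
            with cls._lock:
                # Re-check inside the lock so only one thread runs the factory.
                if cls._instance is None:
                    cls._instance = factory()
        return cls._instance


created = []


def fake_factory():
    # Stand-in for a real connection; records how often it is called.
    created.append(1)
    return object()


threads = [threading.Thread(target=DriverSingleton.get, args=(fake_factory,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(created))  # 1
```

In a real deployment the factory could wrap the official Neo4j Python driver, for example a lambda that calls GraphDatabase.driver(uri, auth=(user, password)) with the project's own connection settings.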

🕰️ Temporal Granularity

  • Periods are linked sequentially via [:NEXT]. This time chain allows us to traverse the game from start to finish linearly.
  • Every Period connects to the Game via [:IN_GAME].
  • Periods carry labels such as :RegularTime:Q1 or :OverTime for easy filtering.
  • Schedule hierarchy: Game nodes are linked to a Season node, to the Arena, and to the Team nodes, forming a heterogeneous graph.

Data is ingested from the NBA API and normalized into a rich taxonomy of Action nodes:

  • Scoring: Shot (Made/Missed, 2PT/3PT), FreeThrow.
  • Flow: Rebound, Turnover, JumpBall, Timeout.
  • Regulation: Foul, Violation.