
Vision Transformers

Vision Transformers (ViTs) are a type of deep learning model designed for computer vision tasks. Introduced by researchers at Google in 2020, they adapt the transformer architecture, originally developed for natural language processing, to handle image data.


Each row represents a different phase of the golf swing in the video sequence for analysis.

| Frame | Keypoint Annotations | Club Path | Angles (Joints) | Shot Outcome |
|---|---|---|---|---|
| Frame 1 | Head (x1, y1), Shoulders (x2, y2), Hips (x3, y3) | Start position | Shoulder-hip (40°), Wrist angle (15°) | Setup position |
| Frame 10 | Head (x1, y1), Shoulders (x2, y2), Hips (x3, y3) | Mid-backswing | Shoulder-hip (70°), Wrist angle (30°) | Backswing phase |
| Frame 20 | Head (x1, y1), Shoulders (x2, y2), Hips (x3, y3) | Top-backswing | Shoulder-hip (90°), Wrist angle (45°) | Max backswing |
| Frame 30 | Head (x1, y1), Shoulders (x2, y2), Hips (x3, y3) | Downswing start | Shoulder-hip (80°), Wrist angle (30°) | Transition phase |
| Frame 40 | Head (x1, y1), Shoulders (x2, y2), Hips (x3, y3) | Impact | Shoulder-hip (50°), Wrist angle (15°) | Impact |
| Frame 50 | Head (x1, y1), Shoulders (x2, y2), Hips (x3, y3) | Follow-through | Shoulder-hip (30°), Wrist angle (5°) | Post-impact |
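
The page does not say how the joint angles are measured. One common approach, shown here only as a minimal sketch, is to take three keypoints and compute the angle at the middle one. The function name `joint_angle` and the choice of elbow, wrist, and clubhead as the three points for a wrist angle are assumptions made for illustration, and the sample coordinates are example pixel values.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at vertex b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2(|cross|, dot) stays numerically stable for near-parallel segments
    return math.degrees(math.atan2(abs(cross), dot))

# Example pixel coordinates for one frame (assumed: elbow, wrist, clubhead)
elbow, wrist, clubhead = (55, 95), (50, 120), (100, 250)
wrist_angle = joint_angle(elbow, wrist, clubhead)
print(f"Angle at the wrist: {wrist_angle:.1f} degrees")
```

The result is always between 0° and 180°, so a convention for the sign or direction of rotation would still have to be chosen separately.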


 
Full Sequence with Body Postures and Club Movement:

| Frame | Phase | Head | Shoulder (L/R) | Elbow (L/R) | Wrist (L/R) | Hip (L/R) | Knee (L/R) | Clubhead (C) | CSA (°) | Swing Metrics |
|---|---|---|---|---|---|---|---|---|---|---|
| Frame 1 | Setup | (25, 35) | (50, 70), (75, 70) | (55, 95), (70, 95) | (50, 120), (75, 120) | (60, 200), (85, 200) | (65, 250), (85, 250) | (100, 250) | 60° | Start of the swing |
| Frame 10 | Early Backswing | (25, 38) | (50, 75), (75, 75) | (60, 100), (70, 100) | (55, 115), (80, 115) | (60, 205), (85, 205) | (65, 252), (85, 252) | (110, 245) | 70° | Club moving upward |
| Frame 20 | Mid Backswing | (25, 42) | (50, 80), (75, 80) | (65, 105), (75, 105) | (60, 110), (85, 110) | (60, 210), (85, 210) | (65, 254), (85, 254) | (120, 230) | 90° | Maximum shoulder rotation |
| Frame 30 | Top of Backswing | (25, 44) | (48, 85), (78, 85) | (68, 110), (80, 110) | (65, 105), (90, 105) | (60, 212), (85, 212) | (65, 256), (85, 256) | (130, 225) | 120° | Club at max height |
| Frame 40 | Transition | (25, 43) | (48, 83), (78, 83) | (66, 108), (80, 108) | (64, 103), (90, 103) | (62, 212), (84, 212) | (65, 258), (84, 258) | (140, 215) | 110° | Start of downswing |
| Frame 50 | Downswing | (25, 42) | (48, 80), (78, 80) | (65, 105), (80, 105) | (63, 100), (89, 100) | (64, 210), (84, 210) | (65, 255), (84, 255) | (150, 200) | 80° | Club accelerating downward |
| Frame 60 | Impact | (25, 40) | (50, 75), (75, 75) | (63, 100), (78, 100) | (60, 95), (85, 95) | (65, 205), (85, 205) | (65, 250), (85, 250) | (155, 190) |  | Ball contact |
| Frame 70 | Early Follow-through | (25, 38) | (52, 70), (78, 70) | (65, 95), (80, 95) | (62, 90), (85, 90) | (65, 200), (85, 200) | (65, 248), (85, 248) | (160, 180) | 40° | Club rising post-impact |
| Frame 80 | Follow-through | (25, 36) | (55, 65), (80, 65) | (68, 90), (83, 90) | (65, 85), (88, 85) | (65, 195), (85, 195) | (65, 246), (85, 246) | (170, 160) | 80° | Full shoulder rotation |
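
As a rough illustration of how the clubhead column can be turned into a club path, the sketch below computes the displacement between consecutive sampled frames. The coordinates are copied from the Clubhead (C) values above; the helper name `club_path` and the tuple layout it returns are assumptions made for this example, not part of any described pipeline.

```python
import math

# Clubhead (x, y) per frame, copied from the Clubhead (C) values above
clubhead = {
    1: (100, 250), 10: (110, 245), 20: (120, 230), 30: (130, 225),
    40: (140, 215), 50: (150, 200), 60: (155, 190), 70: (160, 180),
    80: (170, 160),
}

def club_path(points):
    """Return (frame_from, frame_to, dx, dy, distance) for consecutive samples."""
    frames = sorted(points)
    steps = []
    for f0, f1 in zip(frames, frames[1:]):
        (x0, y0), (x1, y1) = points[f0], points[f1]
        dx, dy = x1 - x0, y1 - y0
        steps.append((f0, f1, dx, dy, math.hypot(dx, dy)))
    return steps

steps = club_path(clubhead)
total_length = sum(step[4] for step in steps)
for f0, f1, dx, dy, dist in steps:
    print(f"Frame {f0} -> {f1}: dx={dx}, dy={dy}, distance={dist:.1f}")
```

Because the frames are sampled ten apart, these displacements are coarse; a denser sampling would give a smoother path estimate.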


 
Breakdown of Vision Transformer Input:

  1. Patches as Keypoints:

    Each frame is broken down into key body and club position coordinates. These keypoints serve as the model's input tokens, playing the role that image patches play in a standard Vision Transformer.

    A transformer model will learn the relationships between body movements (head, shoulder, elbow, wrist, hips) and club movement to detect patterns and deviations from optimal form.

  2. Angles and Positions:

    Angular data, such as shoulder-hip rotation, club shaft angle, and knee flex, are key features for training the model.

    These numerical inputs will help the model understand the mechanics of the swing and how they change over time.

  3. Club Path and Joint Angles:

    The club path, meaning how the club moves throughout the swing, is represented by the changing position of the clubhead in each frame.

    Joint angles such as shoulder rotation and wrist flexion are essential for judging whether the player's swing is efficient.

  4. Time Steps and Sequence Data:

    Each frame corresponds to a step in the sequence, which allows the transformer to analyze the swing as a continuous movement rather than isolated instances.

    The relationships between consecutive frames will help the model understand transitions between swing phases.
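
The four points above can be sketched in a few lines: each frame's keypoints are flattened into one token, a time feature marks its position in the sequence, and self-attention relates every frame to every other frame. This is a minimal pure-Python sketch and not a trained model. The normalization constant `SCALE`, the helper names, the subset of keypoints used, and the use of identity query/key/value projections in place of learned weights are all assumptions made for illustration.

```python
import math

# Head, left/right shoulder, and clubhead for three frames of the sequence
frames = [
    [(25, 35), (50, 70), (75, 70), (100, 250)],   # Frame 1
    [(25, 38), (50, 75), (75, 75), (110, 245)],   # Frame 10
    [(25, 42), (50, 80), (75, 80), (120, 230)],   # Frame 20
]

SCALE = 256.0  # assumed image size used to normalize pixel coordinates

def frame_to_token(keypoints, step, num_steps):
    """Flatten keypoints to [x0, y0, x1, y1, ...] and append a time feature."""
    token = [value / SCALE for point in keypoints for value in point]
    token.append(step / max(num_steps - 1, 1))  # simple positional feature
    return token

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head attention with identity Q/K/V projections (no learned weights)."""
    dim = len(tokens[0])
    weights, outputs = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(row[j] * tokens[j][d] for j in range(len(tokens)))
                        for d in range(dim)])
    return weights, outputs

tokens = [frame_to_token(kp, i, len(frames)) for i, kp in enumerate(frames)]
weights, outputs = self_attention(tokens)
for i, row in enumerate(weights):
    print(f"Token {i} attends to frames with weights {[round(w, 3) for w in row]}")
```

In a real model the projections would be learned and several attention layers would be stacked, but the data flow from keypoints to tokens to frame-to-frame attention weights would follow the same outline.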


 

Possible Outcomes from Vision Transformers:

Swing Quality: The transformer can provide feedback on whether the swing follows an optimal path or deviates from it, for example through an over-the-top downswing or incorrect wrist angles.


Posture Analysis: The transformer will analyze the golfer's posture at each phase, providing feedback on incorrect spine angles, weight shifts, or knee flex that may affect shot accuracy.


Club Path Optimization: The model can assess whether the club's path is inside-out, outside-in, or straight, and suggest corrections to achieve more consistent shots.


Shot Prediction: Based on the swing, the model can predict the likely outcome of the shot (a slice, hook, or straight ball flight), helping golfers understand why certain issues (like a slice) occur.


This structured, frame-by-frame approach helps Vision Transformers treat each swing as a sequence of body positions and joint angles, providing precise feedback and improvement suggestions based on learned patterns. By training with such a dataset, the app can improve its ability to detect swing flaws, provide personalized advice, and optimize performance.








