TL;DR: I designed a machine learning model that is well-suited to learning the physics of the game Pong. I trained that model by showing it data from hundreds of thousands of sequential frames captured during normal gameplay. As a result, the model learned the deceptively complex rules and physics of the game. By feeding control inputs (for the paddles) into the trained model, you can play a game of Pong.

The neural network became Pong
This work is (obviously) not connected to Atari or the original Pong game in any way. I am using the term 'Pong' to describe a Pong-like table tennis video game.
How It Works
I wrote a simple Pong-style game using pygame. As each frame is displayed, it writes a text file containing information on the positions of the paddles, the position of the ball, and the state of the user inputs. This data, along with supplemental synthetic data that simulates a paddle miss (which is otherwise a rare event) that is generated by this script, is used to train an artificial neural network. The architecture is represented in the diagram below:
Whoah! All that to learn Pong? It may seem like overkill, but the physics are deceptively difficult to learn. I started out thinking I'd have this running in a few hours with a simple feedforward network, but it ended up taking months of spare time to get it working. The velocity inversion of the ball at bounces, the paddle hits and misses, and the issue of paddle movement and ball movement inappropriately influencing each other, for instance, was very, very hard for any model I tried to learn. Aside from basic feedforward architectures, I also tried LSTM/GRU layers, convolutional layers, and (as it felt, anyway) just about everything else.
What ended up finally working was a Transformer-based architecture with multiple isolated branches and output heads. In the next section I'll go into a deep dive of the model.
Ultimately, I would like to do another version of this project using images of the game screen as training data, and have the model predict the next image frame. That is actually the direction I started in for this project, but I soon realized that this goal was out of reach because I do not have a GPU. Some people are GPU poor, but I'm GPU broke. This model was trained on a machine with a pair of ten-year-old Xeon CPUs.
Explaining the Model Architecture
The model is defined in the training script.
The model takes in a set of 4 sequential time points containing ball and paddle coordinates and user inputs. I also included a number of engineered features (e.g. ball velocity, distance from each edge, etc.) to aid the model in learning.
train_x.append([paddle1_pos_1, paddle2_pos_1, ball_x_1, ball_y_1, paddle1_vel_1, paddle2_vel_1, paddle1_pos_2, paddle2_pos_2, ball_x_2, ball_y_2, paddle1_vel_2, paddle2_vel_2, paddle1_pos_3, paddle2_pos_3, ball_x_3, ball_y_3, paddle1_vel_3, paddle2_vel_3, paddle1_pos_4, paddle2_pos_4, ball_x_4, ball_y_4, paddle1_vel_4, paddle2_vel_4, delta_x_1, delta_y_1, dist_left_1, dist_right_1, dist_top_1, dist_bottom_1, coverage_p1_1, coverage_p2_1, delta_x_2, delta_y_2, dist_left_2, dist_right_2, dist_top_2, dist_bottom_2, coverage_p1_2, coverage_p2_2, delta_x_3, delta_y_3, dist_left_3, dist_right_3, dist_top_3, dist_bottom_3, coverage_p1_3, coverage_p2_3, delta_x_4, delta_y_4, dist_left_4, dist_right_4, dist_top_4, dist_bottom_4, coverage_p1_4, coverage_p2_4])
The goal is to learn the physics of ball movement, bounces at the edges of the screen, paddle misses (point scored) or bounces, how to handle user input to adjust paddle positions, and to keep everything within bounds of the screen — basically everything that makes up a game of Pong. This knowledge contained in the model is used to predict the next frame in the game, which then slides into the list of past frames as new predictions are made. So, initially a game is started with a seed of 4 time points of data, then the model does all the...
Read more »
Nick Bild

Sumit
Johanna Shi
alex.miller
Any code available? I'm not wanting to reproduce it myself, just to see how it works in a bit more detail.