I Cloned Pong With a Neural Network

Description

I designed a machine learning model that is well-suited to learning the physics of the game Pong. I trained that model by showing it data from hundreds of thousands of sequential frames captured during normal gameplay. As a result, the model learned the deceptively complex rules and physics of the game. By feeding control inputs (for the paddles) into the trained model, you can play a game of Pong.

Details

The neural network became Pong

This work is (obviously) not connected to Atari or the original Pong game in any way. I am using the term 'Pong' to describe a Pong-like table tennis video game.

How It Works

I wrote a simple Pong-style game using pygame. As each frame is displayed, the game writes a text file containing the positions of the paddles, the position of the ball, and the state of the user inputs. This data is combined with supplemental synthetic data, generated by a script, that simulates paddle misses (otherwise a rare event), and the combined set is used to train an artificial neural network. The architecture is represented in the diagram below:
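As an illustration of the per-frame logging step described above (not of the architecture itself), here is a minimal sketch. The field names, their order, and the comma-separated format are all assumptions made for this example; the project's own game script may log its data differently.

```python
import io

# Hypothetical field order; the real script's log format may differ.
FIELDS = ("paddle1_pos", "paddle2_pos", "ball_x", "ball_y",
          "paddle1_vel", "paddle2_vel")


def format_frame(state):
    """Serialize one frame's state dict as a comma-separated line."""
    return ",".join(str(state[name]) for name in FIELDS)


def log_frame(out, state):
    """Append one frame to an open text stream (a file in a real game loop)."""
    out.write(format_frame(state) + "\n")


def parse_frame(line):
    """Inverse of format_frame: turn a logged line back into a dict of floats."""
    values = [float(v) for v in line.strip().split(",")]
    return dict(zip(FIELDS, values))
```

In a pygame loop, log_frame would be called once per displayed frame with the current game state, and parse_frame would be used later when building the training set.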

Whoa! All that to learn Pong? It may seem like overkill, but the physics are deceptively difficult to learn. I started out thinking I'd have this running in a few hours with a simple feedforward network, but it ended up taking months of spare time to get it working. The velocity inversion of the ball at bounces, the paddle hits and misses, and the problem of paddle movement and ball movement inappropriately influencing each other, for instance, were all very hard for any model I tried to learn. Aside from basic feedforward architectures, I also tried LSTM/GRU layers, convolutional layers, and (or so it felt, anyway) just about everything else.
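To make the velocity inversion concrete, here is a sketch of the kind of hand-written rule the network has to infer purely from examples. The playfield height and ball size are hypothetical values chosen for illustration, and this is not code from the project's game. Note the discontinuity: the update is linear almost everywhere, then the sign of the velocity flips at a wall.

```python
HEIGHT = 400      # assumed playfield height in pixels
BALL_SIZE = 10    # assumed ball edge length in pixels


def step_ball_y(ball_y, vel_y):
    """Advance the ball one frame vertically, reflecting off the top and bottom walls."""
    new_y = ball_y + vel_y
    if new_y < 0:
        # Crossed the top wall: mirror the overshoot and invert the velocity.
        new_y = -new_y
        vel_y = -vel_y
    elif new_y > HEIGHT - BALL_SIZE:
        # Crossed the bottom wall: mirror around the lowest legal position.
        new_y = 2 * (HEIGHT - BALL_SIZE) - new_y
        vel_y = -vel_y
    return new_y, vel_y
```

A model that only ever sees mid-screen frames learns "add the velocity" and nothing else, which is one reason rare events such as bounces and misses need enough representation in the training data.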

What finally ended up working was a Transformer-based architecture with multiple isolated branches and output heads. In the next section I'll take a deep dive into the model.

Ultimately, I would like to do another version of this project using images of the game screen as training data, and have the model predict the next image frame. That is actually the direction I started in for this project, but I soon realized that this goal was out of reach because I do not have a GPU. Some people are GPU poor, but I'm GPU broke. This model was trained on a machine with a pair of ten-year-old Xeon CPUs.

Explaining the Model Architecture

The model is defined in the training script.

The model takes in a set of 4 sequential time points containing ball and paddle coordinates and user inputs. I also included a number of engineered features (e.g. ball velocity, distance from each edge, etc.) to aid the model in learning.

train_x.append([
    paddle1_pos_1, paddle2_pos_1, ball_x_1, ball_y_1, paddle1_vel_1, paddle2_vel_1,
    paddle1_pos_2, paddle2_pos_2, ball_x_2, ball_y_2, paddle1_vel_2, paddle2_vel_2,
    paddle1_pos_3, paddle2_pos_3, ball_x_3, ball_y_3, paddle1_vel_3, paddle2_vel_3,
    paddle1_pos_4, paddle2_pos_4, ball_x_4, ball_y_4, paddle1_vel_4, paddle2_vel_4,
    delta_x_1, delta_y_1, dist_left_1, dist_right_1, dist_top_1, dist_bottom_1, coverage_p1_1, coverage_p2_1,
    delta_x_2, delta_y_2, dist_left_2, dist_right_2, dist_top_2, dist_bottom_2, coverage_p1_2, coverage_p2_2,
    delta_x_3, delta_y_3, dist_left_3, dist_right_3, dist_top_3, dist_bottom_3, coverage_p1_3, coverage_p2_3,
    delta_x_4, delta_y_4, dist_left_4, dist_right_4, dist_top_4, dist_bottom_4, coverage_p1_4, coverage_p2_4,
])
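The eight engineered features per time point could be derived from the raw coordinates roughly as follows. This is a sketch under stated assumptions: the playfield size and paddle height are made-up values, and the definition of "coverage" (whether the ball is vertically within a paddle's span) is a guess from the feature name, not something the write-up specifies.

```python
WIDTH, HEIGHT = 600, 400   # assumed playfield size in pixels
PADDLE_HEIGHT = 60         # assumed paddle height in pixels


def engineered_features(prev, cur):
    """Return the eight per-time-point features for frame `cur`.

    `prev` and `cur` are dicts with ball_x, ball_y, paddle1_pos, paddle2_pos,
    where each paddle position is taken to be the paddle's top edge.
    """
    def covers(paddle_top):
        # Guess at "coverage": is the ball vertically within the paddle's span?
        in_span = paddle_top <= cur["ball_y"] <= paddle_top + PADDLE_HEIGHT
        return 1.0 if in_span else 0.0

    return [
        cur["ball_x"] - prev["ball_x"],   # delta_x (ball velocity in x)
        cur["ball_y"] - prev["ball_y"],   # delta_y (ball velocity in y)
        cur["ball_x"],                    # dist_left
        WIDTH - cur["ball_x"],            # dist_right
        cur["ball_y"],                    # dist_top
        HEIGHT - cur["ball_y"],           # dist_bottom
        covers(cur["paddle1_pos"]),       # coverage_p1
        covers(cur["paddle2_pos"]),       # coverage_p2
    ]
```

Features like these hand the model quantities it would otherwise have to work out on its own, such as how close the ball is to a wall where its velocity will invert.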

The goal is to learn the physics of ball movement, bounces at the edges of the screen, paddle misses (point scored) or bounces, how to handle user input to adjust paddle positions, and to keep everything within bounds of the screen — basically everything that makes up a game of Pong. This knowledge contained in the model is used to predict the next frame in the game, which then slides into the list of past frames as new predictions are made. So, initially a game is started with a seed of 4 time points of data, then the model does all the...
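The sliding-window prediction loop described above can be sketched like this. The predict_next callable is a hypothetical stand-in for the trained model, and representing a frame as a single value is a simplification for the example; only the window size of 4 comes from the text.

```python
from collections import deque

WINDOW = 4  # the model sees 4 sequential time points, per the text


def rollout(predict_next, seed_frames, inputs):
    """Generate frames autoregressively from a 4-frame seed.

    predict_next(window, user_input) -> next frame; a stand-in for the model.
    """
    if len(seed_frames) != WINDOW:
        raise ValueError("seed must contain exactly %d frames" % WINDOW)
    window = deque(seed_frames, maxlen=WINDOW)
    generated = []
    for user_input in inputs:
        frame = predict_next(list(window), user_input)
        window.append(frame)   # the oldest frame drops out automatically
        generated.append(frame)
    return generated
```

Because each prediction is fed back in as input, small errors can compound over a long game, so it matters that the model's outputs stay within the bounds of the screen.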

Discussions

GOAT INDUSTRIES wrote 08/24/2025 at 20:07

Any code available? I'm not wanting to reproduce it myself, just to see how it works in a bit more detail.

Nick Bild wrote 08/24/2025 at 20:09

Yes: https://github.com/nickbild/game_clone

GOAT INDUSTRIES wrote 08/24/2025 at 20:13

Perfect. Looks deceptively simple, but i fully appreciate these things can take months to get right. Excellent work !!

Bertrand Selva wrote 08/22/2025 at 06:27

Very impressive work !

Nick Bild wrote 08/22/2025 at 12:29

Thanks so much!
