Environment
We designed the environment in the Unity editor, one of the most widely used frameworks for game development. Because Unity supports agent training through the ML-Agents toolkit, we decided to build our own version of Parking Mania in Unity. The environment contains trees, walls marking the periphery of the parking area, other parking spots, the ground plane, road lights, and the smaller objects that make up the most important part of training: our agent. The car agent is built from a body, wheels, a center of mass, and front and back rays.
Approach
Reinforcement learning enables an AI agent to learn from its past experience, by trial and error, which actions to take. In our case, giving the agent a negative reward whenever it hits an obstacle eventually teaches it to avoid obstacles. Similarly, a positive reward after successful parking signals good behavior. Initially, the agent interacts with the environment using a random policy, but with well-defined rewards it gradually learns to park the car successfully without collisions.
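The reward scheme described above can be sketched as a small Python function. This is a minimal illustration, not our Unity implementation: the `StepOutcome` structure, the `compute_reward` name, the reward magnitudes of -1.0 and +1.0, and the choice to end the episode on both a collision and a successful park are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical reward magnitudes; the actual values used in training may differ.
COLLISION_PENALTY = -1.0
PARKING_REWARD = 1.0


@dataclass
class StepOutcome:
    """What happened to the car during one simulation step (hypothetical structure)."""
    collided: bool  # the car hit a wall, tree, or another obstacle
    parked: bool    # the car came to rest inside the target spot


def compute_reward(outcome: StepOutcome) -> tuple[float, bool]:
    """Return (reward, episode_done) for one step.

    A collision is penalized; a successful park is rewarded. Ending the
    episode in both cases is an assumption of this sketch. If both flags
    are set, the collision takes precedence. Any other step yields 0.0.
    """
    if outcome.collided:
        return COLLISION_PENALTY, True
    if outcome.parked:
        return PARKING_REWARD, True
    return 0.0, False


if __name__ == "__main__":
    print(compute_reward(StepOutcome(collided=True, parked=False)))
```

Over many episodes, the negative signal makes colliding trajectories less likely under the learned policy and the positive signal makes parking trajectories more likely.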
Algorithm Used
We chose Proximal Policy Optimization (PPO) because it suited our environment well. PPO strikes a balance between ease of implementation, sample complexity, and ease of tuning: at each step it computes an update that minimizes the cost function while keeping the deviation from the previous policy relatively small. Our agent was trained by interacting with the environment to collect a small batch of experiences and then using that batch to update its decision-making policy. Once the policy has been updated with this batch, the experiences are discarded and a new batch is collected with the updated policy.
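The collect, update, and discard cycle described above can be illustrated with a self-contained Python toy. It is a sketch under stated assumptions, not the ML-Agents trainer we used: a two-armed bandit stands in for the parking environment, the policy is a softmax over two logits, and the clip range, learning rate, batch size, and iteration counts are hypothetical. Each update ascends PPO's clipped surrogate objective, the batch mean of min(r·A, clip(r, 1-ε, 1+ε)·A), where r is the ratio of the new to the old probability of the sampled action and A is its advantage. Clipping removes any incentive to push r outside [1-ε, 1+ε], which is how the new policy is kept close to the previous one.

```python
import math
import random

# Toy stand-in for the parking task (assumption): action 1 ("park") pays 1.0,
# action 0 ("collide") pays 0.0. All hyperparameters below are hypothetical.
REWARDS = [0.0, 1.0]
CLIP_EPS = 0.2          # PPO clipping range epsilon
LR = 0.1                # gradient-ascent step size
BATCH_SIZE = 64
EPOCHS_PER_BATCH = 4
ITERATIONS = 50


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collect_batch(logits, rng):
    """Sample actions with the current policy and record their old probabilities."""
    probs = softmax(logits)
    actions = [0 if rng.random() < probs[0] else 1 for _ in range(BATCH_SIZE)]
    rewards = [REWARDS[a] for a in actions]
    baseline = sum(rewards) / len(rewards)       # simple mean baseline
    advantages = [r - baseline for r in rewards]
    old_probs = [probs[a] for a in actions]
    return actions, advantages, old_probs


def ppo_update(logits, actions, advantages, old_probs):
    """One gradient-ascent step on the clipped surrogate objective."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, adv, p_old in zip(actions, advantages, old_probs):
        ratio = probs[a] / p_old
        clipped = min(max(ratio, 1.0 - CLIP_EPS), 1.0 + CLIP_EPS)
        # When the clipped term is strictly smaller, the objective is flat in
        # the parameters, so this sample contributes no gradient.
        if ratio * adv <= clipped * adv:
            for k in range(len(logits)):
                dlogp = (1.0 if k == a else 0.0) - probs[k]  # d log pi(a) / d logit_k
                grad[k] += adv * ratio * dlogp
    return [logit + LR * g / len(actions) for logit, g in zip(logits, grad)]


def train(seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0]                          # uniform (random) initial policy
    for _ in range(ITERATIONS):
        batch = collect_batch(logits, rng)       # fresh on-policy experience
        for _ in range(EPOCHS_PER_BATCH):
            logits = ppo_update(logits, *batch)
        # The batch is discarded here; the next iteration collects new
        # experience with the updated policy.
    return softmax(logits)


if __name__ == "__main__":
    print(train())
```

Starting from a uniform, effectively random policy, `train()` shifts most of the probability onto the rewarded action. In this toy the clip rarely activates because four small steps per batch barely move the ratio; with larger learning rates or more epochs per batch, it is the mechanism that limits how far each update can move the policy.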