A reinforcement learning–based torque-split controller for a parallel hybrid electric vehicle.
This project explores the use of Reinforcement Learning (RL) to design intelligent driving modes for a hybrid electric vehicle. Using Proximal Policy Optimization (PPO), separate control policies were trained to learn effective torque-split behavior between an internal combustion engine and an electric motor.
The result is a simulation-based hybrid controller capable of exhibiting distinct Eco, Normal, and Sport driving behaviors, along with a supervisory controller that automatically selects between them based on driving conditions.
Objective:
Demonstrate that reinforcement learning can learn meaningful, mode-dependent hybrid powertrain control strategies without hard-coded rules.
Key Outcomes:
The project was implemented in MATLAB using the Reinforcement Learning Toolbox and a custom parallel hybrid vehicle model. All MATLAB scripts and models are available in the AI-controller/MATLAB directory of the GitHub repository.
The RL agent controls a single continuous action, the engine–motor torque split, while observing vehicle speed, acceleration demand, and battery state of charge (SOC).
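For concreteness, a minimal sketch of this interface using the Reinforcement Learning Toolbox is shown below. The spec names, the 0-to-1 split convention, and the hybridStepFcn/hybridResetFcn handles are illustrative assumptions, not the repository's actual definitions.

```matlab
% Hedged sketch: observation/action specs matching the description above.
obsInfo = rlNumericSpec([3 1]);            % [vehicle speed; accel demand; SOC]
obsInfo.Name = 'observations';
actInfo = rlNumericSpec([1 1], ...
    'LowerLimit', 0, 'UpperLimit', 1);     % split (assumed: 0 = motor only, 1 = engine only)
actInfo.Name = 'torqueSplit';

% Wrap the custom vehicle model (hypothetical function names) and create a
% PPO agent with the toolbox's default actor/critic networks.
env   = rlFunctionEnv(obsInfo, actInfo, @hybridStepFcn, @hybridResetFcn);
agent = rlPPOAgent(obsInfo, actInfo);
```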
Each driving mode is represented by a separately trained PPO agent sharing the same environment but using mode-specific reward functions.
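One plausible shape for these mode-specific rewards is sketched below; the weights, SOC target, and penalty magnitudes are illustrative assumptions, chosen only to mirror the cost terms described in the evaluation section (fuel usage, SOC behavior, feasibility penalties).

```matlab
% Hedged sketch: one shared reward form, mode-specific weights (assumed values).
function r = modeReward(fuelRate, soc, infeasible, mode)
    switch mode
        case "eco",    w = struct('fuel', 1.0, 'soc', 0.2);  % punish fuel hardest
        case "normal", w = struct('fuel', 0.6, 'soc', 0.6);  % balanced trade-off
        case "sport",  w = struct('fuel', 0.2, 'soc', 1.0);  % protect SOC for demand
        otherwise,     error("unknown mode");
    end
    socTarget = 0.6;                         % assumed reference SOC
    r = -w.fuel * fuelRate ...               % fuel-usage penalty
        - w.soc * (soc - socTarget)^2 ...    % SOC-deviation penalty
        - 10 * infeasible;                   % feasibility penalty (e.g. torque limits)
end
```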
Training used a two-stage strategy, which produced stable, smooth control policies with reasonable training time.
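The exact stages live in the repository scripts; as a hedged guess at the pattern, a staged run might look like the sketch below (an exploration-heavy first run, then fine-tuning with reduced entropy), reusing agent and env from the spec sketch above. Episode counts and entropy weights are assumed.

```matlab
% Hypothetical two-stage schedule; all numeric values are assumptions.
agent.AgentOptions.EntropyLossWeight = 0.02;   % stage 1: encourage exploration
stage1 = rlTrainingOptions('MaxEpisodes', 500, ...
    'StopTrainingCriteria', 'AverageReward', 'StopTrainingValue', -100);
train(agent, env, stage1);

agent.AgentOptions.EntropyLossWeight = 0.005;  % stage 2: fine-tune the same agent
stage2 = rlTrainingOptions('MaxEpisodes', 1000, ...
    'StopTrainingCriteria', 'AverageReward', 'StopTrainingValue', -50);
train(agent, env, stage2);
```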
The controller was trained and evaluated using a baseline drive cycle, then tested on additional urban and highway cycles to assess generalization.
Baseline drive cycle used for training and evaluation.
Distinct behaviors emerged purely from reward design.
PPO agents were evaluated using episode reward values and compared against fixed torque-split baselines (constant split). These rewards reflect the defined cost function (fuel usage, SOC behavior, and feasibility penalties), not direct real-world vehicle performance.
Based on reward values, the learned policies improved on the best fixed baselines in two of the three modes.
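A comparison of this kind could be reproduced with a loop like the following sketch; the split grid and variable names are assumptions, and env and agent come from the earlier spec sketch.

```matlab
% Hedged sketch: best constant torque split vs. the trained PPO policy.
bestFixed = -inf;
for s = 0:0.1:1                            % candidate constant splits
    obs = reset(env);
    R = 0;  done = false;
    while ~done
        [obs, r, done] = step(env, s);     % apply the same split every step
        R = R + r;
    end
    bestFixed = max(bestFixed, R);
end

exp = sim(env, agent);                     % one episode under the PPO policy
fprintf('best fixed split: %.1f   PPO agent: %.1f\n', ...
    bestFixed, sum(exp.Reward.Data));
```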
Generalization was evaluated across the Baseline, Urban, and Highway drive cycles. Across all cycles, consistent reward-driven behavior emerged: engine-dominant torque during acceleration and increased electric-motor usage during steady cruising, resulting in a gradual SOC decline.
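A sketch of such an evaluation loop is below; loadDriveCycle and buildHybridEnv are hypothetical helper names standing in for the repository's cycle-loading and environment-construction code.

```matlab
% Hedged sketch: run a trained agent on each cycle and log the final SOC.
cycles = ["baseline", "urban", "highway"];
for c = cycles
    env = buildHybridEnv(loadDriveCycle(c));   % hypothetical helpers
    exp = sim(env, ecoAgent);                  % repeat for the other mode agents
    soc = squeeze(exp.Observation.observations.Data(3, 1, :));  % SOC = 3rd obs
    fprintf('%s cycle: final SOC %.2f\n', c, soc(end));
end
```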
Torque split behavior and power distribution for Eco, Normal, and Sport modes on the baseline drive cycle.
Mode separation was clearly reflected in reward-aligned trends. Eco minimized fuel-related reward penalties at the expense of deeper SOC usage, Sport accepted higher fuel penalties to preserve SOC during high-demand events, and Normal produced the most balanced fuel–SOC trade-off.
A rule-based supervisory controller integrates the three PPO agents, selecting a mode based on current driving conditions. In simulation, this produced intuitive, human-like mode transitions without manual input.
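A minimal sketch of how such a rule-based selector might wrap the three agents is shown below; the thresholds and switching signals are illustrative assumptions, since the exact rules are not spelled out here.

```matlab
% Hedged sketch: choose an agent from simple thresholds, delegate the split.
function split = supervisoryStep(obs, ecoAgent, normalAgent, sportAgent)
    accelDemand = obs(2);  soc = obs(3);   % observation layout from the spec sketch
    if accelDemand > 0.7 || soc < 0.3
        agent = sportAgent;                % high demand or low SOC -> preserve SOC
    elseif accelDemand < 0.2
        agent = ecoAgent;                  % gentle driving -> minimize fuel
    else
        agent = normalAgent;
    end
    a = getAction(agent, {obs});           % cell-array I/O per the toolbox API
    split = a{1};
end
```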
Short urban and highway drive cycles used to test generalization beyond the training data.
Power distribution and SOC behavior across driving modes on the highway cycle.
Mode-dependent behavior under frequent stop-and-go conditions.