A reinforcement learning–based torque-split controller for a parallel hybrid electric vehicle.
This project explores the use of Reinforcement Learning (RL) to design intelligent driving modes for a hybrid electric vehicle. Using Proximal Policy Optimization (PPO), separate control policies were trained to learn effective torque-split behavior between an internal combustion engine and an electric motor.
The result is a simulation-based hybrid controller capable of exhibiting distinct Eco, Normal, and Sport driving behaviors, along with a supervisory controller that automatically selects between them based on driving conditions.
Objective:
Demonstrate that reinforcement learning can learn meaningful, mode-dependent hybrid powertrain control strategies without hard-coded rules.
Key Outcomes:
The project was implemented in MATLAB using the Reinforcement Learning Toolbox and a custom parallel hybrid vehicle model. All MATLAB scripts and models are available in the AI-controller/MATLAB directory of the GitHub repository.
The RL agent controls a single continuous action, the engine–motor torque split, while observing vehicle speed, acceleration demand, and battery state of charge (SOC).
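For concreteness, a minimal sketch of this interface using the Reinforcement Learning Toolbox is shown below. The spec names, the 0-to-1 split convention, and the hybridStepFcn/hybridResetFcn handles are illustrative assumptions, not the repository's actual definitions.

```matlab
% Hedged sketch: observation/action specs matching the description above.
obsInfo = rlNumericSpec([3 1]);            % [vehicle speed; accel demand; SOC]
obsInfo.Name = 'observations';
actInfo = rlNumericSpec([1 1], ...
    'LowerLimit', 0, 'UpperLimit', 1);     % split (assumed: 0 = motor only, 1 = engine only)
actInfo.Name = 'torqueSplit';

% Wrap the custom vehicle model (hypothetical function names) and create a
% PPO agent with the toolbox's default actor/critic networks.
env   = rlFunctionEnv(obsInfo, actInfo, @hybridStepFcn, @hybridResetFcn);
agent = rlPPOAgent(obsInfo, actInfo);
```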
Each driving mode is represented by a separately trained PPO agent sharing the same environment but using mode-specific reward functions.
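One plausible shape for these mode-specific rewards is sketched below; the weights, SOC target, and penalty magnitudes are illustrative assumptions, chosen only to mirror the cost terms described in the evaluation section (fuel usage, SOC behavior, feasibility penalties).

```matlab
% Hedged sketch: one shared reward form, mode-specific weights (assumed values).
function r = modeReward(fuelRate, soc, infeasible, mode)
    switch mode
        case "eco",    w = struct('fuel', 1.0, 'soc', 0.2);  % punish fuel hardest
        case "normal", w = struct('fuel', 0.6, 'soc', 0.6);  % balanced trade-off
        case "sport",  w = struct('fuel', 0.2, 'soc', 1.0);  % protect SOC for demand
        otherwise,     error("unknown mode");
    end
    socTarget = 0.6;                         % assumed reference SOC
    r = -w.fuel * fuelRate ...               % fuel-usage penalty
        - w.soc * (soc - socTarget)^2 ...    % SOC-deviation penalty
        - 10 * infeasible;                   % feasibility penalty (e.g. torque limits)
end
```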
Training used a two-stage strategy, which produced stable, smooth control policies with reasonable training time.
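The exact stages live in the repository scripts; as a hedged guess at the pattern, a staged run might look like the sketch below (an exploration-heavy first run, then fine-tuning with reduced entropy), reusing agent and env from the spec sketch above. Episode counts and entropy weights are assumed.

```matlab
% Hypothetical two-stage schedule; all numeric values are assumptions.
agent.AgentOptions.EntropyLossWeight = 0.02;   % stage 1: encourage exploration
stage1 = rlTrainingOptions('MaxEpisodes', 500, ...
    'StopTrainingCriteria', 'AverageReward', 'StopTrainingValue', -100);
train(agent, env, stage1);

agent.AgentOptions.EntropyLossWeight = 0.005;  % stage 2: fine-tune the same agent
stage2 = rlTrainingOptions('MaxEpisodes', 1000, ...
    'StopTrainingCriteria', 'AverageReward', 'StopTrainingValue', -50);
train(agent, env, stage2);
```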
The controller was trained and evaluated using a baseline drive cycle, then tested on additional urban and highway cycles to assess generalization.
Baseline drive cycle used for training and evaluation.
Distinct behaviors emerged purely from reward design.
PPO agents were evaluated using episode reward values and compared against fixed torque-split baselines (constant split). These rewards reflect the defined cost function (fuel usage, SOC behavior, and feasibility penalties), not direct real-world vehicle performance.
Based on reward values, the learned policies improved on the best fixed baselines in two of the three modes.
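A comparison of this kind could be reproduced with a loop like the following sketch; the split grid and variable names are assumptions, and env and agent come from the earlier spec sketch.

```matlab
% Hedged sketch: best constant torque split vs. the trained PPO policy.
bestFixed = -inf;
for s = 0:0.1:1                            % candidate constant splits
    obs = reset(env);
    R = 0;  done = false;
    while ~done
        [obs, r, done] = step(env, s);     % apply the same split every step
        R = R + r;
    end
    bestFixed = max(bestFixed, R);
end

exp = sim(env, agent);                     % one episode under the PPO policy
fprintf('best fixed split: %.1f   PPO agent: %.1f\n', ...
    bestFixed, sum(exp.Reward.Data));
```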
Generalization was evaluated across the Baseline, Urban, and Highway drive cycles. Across all cycles, consistent reward-driven behavior emerged: engine-dominant torque during acceleration and increased electric-motor usage during steady cruising, resulting in a gradual SOC decline.
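A sketch of such an evaluation loop is below; loadDriveCycle and buildHybridEnv are hypothetical helper names standing in for the repository's cycle-loading and environment-construction code.

```matlab
% Hedged sketch: run a trained agent on each cycle and log the final SOC.
cycles = ["baseline", "urban", "highway"];
for c = cycles
    env = buildHybridEnv(loadDriveCycle(c));   % hypothetical helpers
    exp = sim(env, ecoAgent);                  % repeat for the other mode agents
    soc = squeeze(exp.Observation.observations.Data(3, 1, :));  % SOC = 3rd obs
    fprintf('%s cycle: final SOC %.2f\n', c, soc(end));
end
```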
Torque split behavior and power distribution for Eco, Normal, and Sport modes on the baseline drive cycle.
Mode separation was clearly reflected in reward-aligned trends. Eco minimized fuel-related reward penalties at the expense of deeper SOC usage, Sport accepted higher fuel penalties to preserve SOC during high-demand events, and Normal produced the most balanced fuel–SOC trade-off.
A rule-based supervisory controller integrates the three PPO agents, selecting a mode based on current driving conditions. In simulation, this produced intuitive, human-like mode transitions without manual input.
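A minimal sketch of how such a rule-based selector might wrap the three agents is shown below; the thresholds and switching signals are illustrative assumptions, since the exact rules are not spelled out here.

```matlab
% Hedged sketch: choose an agent from simple thresholds, delegate the split.
function split = supervisoryStep(obs, ecoAgent, normalAgent, sportAgent)
    accelDemand = obs(2);  soc = obs(3);   % observation layout from the spec sketch
    if accelDemand > 0.7 || soc < 0.3
        agent = sportAgent;                % high demand or low SOC -> preserve SOC
    elseif accelDemand < 0.2
        agent = ecoAgent;                  % gentle driving -> minimize fuel
    else
        agent = normalAgent;
    end
    a = getAction(agent, {obs});           % cell-array I/O per the toolbox API
    split = a{1};
end
```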
Short urban and highway drive cycles used to test generalization beyond the training data.
Power distribution and SOC behavior across driving modes on the highway cycle.
Mode-dependent behavior under frequent stop-and-go conditions.