Troubleshooting KVM Reboots on Oracle ODA Due to OOM Kills and HugePages Misconfiguration
Recently, I encountered an unusual behavior in one of our Oracle Database Appliance (ODA) environments that I felt compelled to share—especially for those running WebLogic workloads on ODA. We observed unplanned reboots of the KVMs hosting WebLogic Servers, which initially appeared to be random and without clear cause.
Initial Observations
After extensive troubleshooting and generating ODA SOS reports on both nodes, we discovered that the OOM (Out of Memory) killer was terminating the KVM processes. This pointed us toward a memory management issue.
What puzzled us was that the servers still showed available memory, yet the OOM killer was stepping in and forcefully shutting down KVMs. That contradiction led us to dig deeper.
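For anyone who wants to confirm the same symptom without a full SOS report, here is a minimal sketch of how OOM kills can be pulled out of a kernel log. The function name `oom_events` is hypothetical, and the log path in the usage comment is an assumption; the exact wording of the kernel message varies slightly between kernel versions, so the pattern accepts both common forms.

```shell
# List OOM-killer events found in a plain-text kernel log.
# Assumption: the log is a file such as /var/log/messages, or the
# output of `dmesg -T` saved to a file.
oom_events() {
    logfile="$1"
    # Older kernels log "Out of memory: Kill process <pid> (<name>)",
    # newer ones log "Out of memory: Killed process <pid> (<name>)".
    grep -E 'Out of memory: Kill(ed)? process' "$logfile"
}

# Example usage (run as root; path is an assumption):
#   oom_events /var/log/messages
```

If the process name in parentheses on the matching lines is the hypervisor process backing a VM, the guest "reboot" was really the host killing it.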
ODA Environment Details
For context, here’s a snapshot of our environment:
1. ODA Model: X8-2 HA
2. Databases: 3 running on ACFS
3. Virtual Machines: 4 KVMs hosting WebLogic Services
Root Cause Analysis
After diving deeper, we discovered the issue was related to HugePages allocation. Our ODA environment had an excessively high number of HugePages configured—over 100,000—which consumed a significant portion of the server’s available memory.
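Our databases used only around 50–60 GB of memory, so the HugePages allocation was far beyond actual requirements. Memory reserved for HugePages is set aside from the normal page pool and cannot serve anything that does not explicitly request huge pages, so very little was left for other components such as the KVMs. The OOM killer stepped in once that remaining memory ran out, even though a large part of the HugePages pool was sitting unused.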
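To put a number on this, the sketch below reads the HugePages counters from /proc/meminfo and converts them to GiB. The function name `hugepages_summary` is hypothetical. Assuming the common 2 MiB HugePage size on x86_64, 100,000 pages works out to roughly 195 GiB reserved, regardless of whether any database is actually using it.

```shell
# Summarize how much memory is reserved for HugePages, how much of that
# reservation is unused, and what share of total RAM it represents.
# Reads /proc/meminfo by default; another file can be passed as $1.
hugepages_summary() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total   = $2 }
        /^HugePages_Free:/  { unused  = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        /^MemTotal:/        { mem_kb  = $2 }
        END {
            pct = (mem_kb > 0) ? 100 * total * size_kb / mem_kb : 0
            printf "reserved_gib=%.1f unused_gib=%.1f pct_of_ram=%.1f\n",
                   total * size_kb / 1024 / 1024,
                   unused * size_kb / 1024 / 1024,
                   pct
        }
    ' "$meminfo"
}

# Example usage:
#   hugepages_summary
```

A large `unused_gib` next to OOM kills in the kernel log is the pattern described above: memory that looks idle but is off limits to ordinary processes.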
The Solution
We decided to adjust the HugePages configuration to a more reasonable level. Here’s how we resolved the issue:
Check Current HugePages Configuration
# odacli list-osconfigurations
This command provided us with the current HugePages settings.
Modify HugePages Configuration
# odacli modify-osconfigurations --number-hugepages 50000
We reduced the HugePages count to 50,000—still sufficient for our database workloads but now allowing more free memory for KVMs.
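As a rough sanity check on a target like this, the following sketch estimates how many HugePages a given amount of SGA memory needs. The function name `hugepages_needed` and the 10% headroom default are assumptions made for illustration, not an Oracle sizing rule; the 2048 kB page size is the usual x86_64 default and should be checked against `Hugepagesize` in /proc/meminfo.

```shell
# Estimate the number of HugePages needed to back a given SGA total.
#   $1  total SGA across all databases, in GiB (whole number)
#   $2  HugePage size in kB (default 2048)
#   $3  headroom in percent (default 10, an illustrative assumption)
hugepages_needed() {
    sga_gib="$1"
    page_kb="${2:-2048}"
    headroom_pct="${3:-10}"
    sga_kb=$(( sga_gib * 1024 * 1024 ))
    with_headroom_kb=$(( sga_kb * (100 + headroom_pct) / 100 ))
    # Round up to a whole number of pages.
    echo $(( (with_headroom_kb + page_kb - 1) / page_kb ))
}

# Example usage:
#   hugepages_needed 60          # 60 GiB of SGA, 10% headroom
#   hugepages_needed 60 2048 0   # same, no headroom
```

By this estimate, 60 GiB of SGA with 10% headroom needs 33,792 pages, so a setting of 50,000 (about 97.7 GiB at 2 MiB per page) still leaves a comfortable margin.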
Reboot the Server
A reboot was necessary to apply the changes and bring the new configuration into effect.
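After the reboot, it is worth confirming that the kernel actually picked up the new value. The sketch below compares `HugePages_Total` in /proc/meminfo with the expected count; the function name `verify_hugepages` is hypothetical.

```shell
# Check that HugePages_Total matches the expected page count.
#   $1  expected number of HugePages
#   $2  meminfo file to read (default /proc/meminfo)
verify_hugepages() {
    expected="$1"
    meminfo="${2:-/proc/meminfo}"
    actual=$(awk '/^HugePages_Total:/ { print $2 }' "$meminfo")
    if [ "$actual" = "$expected" ]; then
        echo "OK: HugePages_Total=$actual"
    else
        echo "MISMATCH: expected $expected, found ${actual:-none}"
        return 1
    fi
}

# Example usage:
#   verify_hugepages 50000
```

On a two-node HA system, the check would need to be run on each node.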
Outcome
After implementing the above steps, memory utilization on our ODA servers significantly improved. Most importantly, we’ve had no further disruptions or unexpected reboots of the KVMs since the change.
Final Thoughts
If you’re facing unexplained KVM reboots in an ODA environment—especially one running WebLogic or other memory-intensive applications—it’s worth reviewing your HugePages settings. An oversized HugePages allocation can silently cripple available memory and lead to OOM-related instability.
I’ve been in Oracle consulting for over a decade, and I specialize in Oracle products including Databases, SOA, WebLogic, ODA, Exadata, and more. If you have similar issues or need help with your Oracle stack, feel free to reach out.
Contact: 00971-50-8718335