OC4J~OC4J_EM~default_island~1 crashing every 10 minutes

Our Enterprise Manager management server was suffering outages every 10-15 minutes.

Looking at ipm.log, we were seeing the following:
08/03/24 02:48:00 [4] Starting Process: OC4J~OC4J_EM~default_island~1 (999687094:0)
08/03/24 02:51:29 [4] Process Alive: OC4J~OC4J_EM~default_island~1 (999687094:16579)
08/03/24 03:07:09 [4] Process Crashed: OC4J~OC4J_EM~default_island~1 (999687094:16579) - Restarting
08/03/24 03:07:11 [4] Starting Process: OC4J~OC4J_EM~default_island~1 (999687095:0)
08/03/24 03:10:40 [4] Process Alive: OC4J~OC4J_EM~default_island~1 (999687095:21694)
08/03/24 03:12:30 [4] Process Crashed: OC4J~OC4J_EM~default_island~1 (999687095:21694) - Restarting
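
To see how long each process survived, the Alive/Crashed pairs in a log like this can be matched up by script. The following is only a rough sketch: the regular expression is inferred from the six lines above, the function name uptimes is made up for illustration, and the timestamp is assumed to be yy/mm/dd (the log itself does not say).

```python
import re
from datetime import datetime

# Pattern inferred from the sample ipm.log lines above; other event types
# or layouts in a real log are simply skipped.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\d+\] "
    r"(Starting Process|Process Alive|Process Crashed): "
    r"(\S+) \((\d+):(\d+)\)"
)


def uptimes(lines, fmt="%y/%m/%d %H:%M:%S"):
    """Return (process name, seconds alive) for each Alive -> Crashed pair.

    fmt is an assumption (yy/mm/dd); change it if the log uses another order.
    """
    alive = {}    # (name, uid, pid) -> time the process was reported alive
    result = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp = datetime.strptime(m.group(1), fmt)
        event = m.group(2)
        key = (m.group(3), m.group(4), m.group(5))
        if event == "Process Alive":
            alive[key] = stamp
        elif event == "Process Crashed" and key in alive:
            seconds = int((stamp - alive.pop(key)).total_seconds())
            result.append((key[0], seconds))
    return result
```

Fed the lines above, this gives the time between each "Process Alive" and the matching "Process Crashed" entry, which makes it easy to see whether the process is dying at a regular interval.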


I found that we had a backlog of uploaded files from the agents. Stopping the management server, deleting the files older than 1 day from $ORACLE_HOME/sysman/emd/recv, and restarting the management server resolved the problem.
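
For anyone wanting to script the clean-up, here is a minimal sketch of the "older than 1 day" step. It is not the exact procedure I ran: the function names stale_files and purge are made up for illustration, file age is judged by modification time, and it assumes the management server has already been stopped. It defaults to a dry run so you can check the list before anything is deleted.

```python
import time
from pathlib import Path


def stale_files(directory, max_age_days=1.0, now=None):
    """List regular files in directory last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge(directory, max_age_days=1.0, dry_run=True):
    """Delete stale files; with dry_run=True (the default) only report them."""
    victims = stale_files(directory, max_age_days)
    if not dry_run:
        for path in victims:
            path.unlink()
    return victims
```

Something like purge(os.path.expandvars("$ORACLE_HOME/sysman/emd/recv")) would list the candidates (after an import os); passing dry_run=False removes them. Only the top level of the directory is examined, so subdirectories are left alone.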

Presumably one of the files was corrupted or otherwise could not be processed by the OMS. Interestingly, the day the problems started coincided with the day we applied the latest Critical Patch to our production servers. In future we will shut down the agents and create a blackout before patching the database homes on the monitored machines.


All views on this blog are my own. All hints, tips and scripts should be run and tested on your development and test servers before attempting in production.

About this blog

I have been a DBA with over 10 years' experience in Oracle. This blog aims to note interesting bits and pieces that I come across on a day-to-day basis.