We had a problem with the Enterprise Manager management server having outages every 10-15 minutes.
Looking at ipm.log, we were seeing the following:
08/03/24 02:48:00 [4] Starting Process: OC4J~OC4J_EM~default_island~1 (999687094:0)
08/03/24 02:51:29 [4] Process Alive: OC4J~OC4J_EM~default_island~1 (999687094:16579)
08/03/24 03:07:09 [4] Process Crashed: OC4J~OC4J_EM~default_island~1 (999687094:16579) - Restarting
08/03/24 03:07:11 [4] Starting Process: OC4J~OC4J_EM~default_island~1 (999687095:0)
08/03/24 03:10:40 [4] Process Alive: OC4J~OC4J_EM~default_island~1 (999687095:21694)
08/03/24 03:12:30 [4] Process Crashed: OC4J~OC4J_EM~default_island~1 (999687095:21694) - Restarting
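To see how long each OC4J_EM process stayed up before crashing, the log lines above can be paired up by process id. The following is only a rough sketch: the function name `crash_intervals` is my own, and it assumes the ipm.log timestamps are in yy/mm/dd hh:mm:ss order and that every line has the same layout as the excerpt above.

```python
import re
from datetime import datetime

# Matches lines shaped like the ipm.log excerpt above (assumed layout).
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[\d+\] "
    r"(Starting Process|Process Alive|Process Crashed): "
    r"(\S+) \((\d+):(\d+)\)"
)

SAMPLE_LOG = """\
08/03/24 02:48:00 [4] Starting Process: OC4J~OC4J_EM~default_island~1 (999687094:0)
08/03/24 02:51:29 [4] Process Alive: OC4J~OC4J_EM~default_island~1 (999687094:16579)
08/03/24 03:07:09 [4] Process Crashed: OC4J~OC4J_EM~default_island~1 (999687094:16579) - Restarting
08/03/24 03:07:11 [4] Starting Process: OC4J~OC4J_EM~default_island~1 (999687095:0)
08/03/24 03:10:40 [4] Process Alive: OC4J~OC4J_EM~default_island~1 (999687095:21694)
08/03/24 03:12:30 [4] Process Crashed: OC4J~OC4J_EM~default_island~1 (999687095:21694) - Restarting
"""


def crash_intervals(lines):
    """Return (process name, pid, uptime) for each Alive -> Crashed pair."""
    alive_at = {}
    results = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that does not look like the excerpt
        stamp, event, name, uid, pid = match.groups()
        # Assumption: timestamps are yy/mm/dd, i.e. 08/03/24 is 24 March 2008.
        when = datetime.strptime(stamp, "%y/%m/%d %H:%M:%S")
        key = (name, uid, pid)
        if event == "Process Alive":
            alive_at[key] = when
        elif event == "Process Crashed" and key in alive_at:
            results.append((name, pid, when - alive_at.pop(key)))
    return results


if __name__ == "__main__":
    for name, pid, uptime in crash_intervals(SAMPLE_LOG.splitlines()):
        print(f"{name} pid {pid} was up for {uptime} before crashing")
```

For a real log, pass the lines of the file to `crash_intervals` in place of `SAMPLE_LOG.splitlines()`.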
I found that we had a backlog of uploaded files from the agents. Stopping the management server, deleting the files older than 1 day from $ORACLE_HOME/sysman/emd/recv, and then restarting the management server resolved the problem.
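The cleanup step described above could be scripted along these lines. This is a minimal sketch rather than the exact commands I ran: the function names are made up for illustration, it only looks at regular files directly inside the recv directory (subdirectories are left alone), and it defaults to a dry run so nothing is deleted until you have checked the list. Stop the management server before running it for real.

```python
import os
import time
from pathlib import Path


def find_stale_files(recv_dir, max_age_days=1.0, now=None):
    """List regular files in recv_dir last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    return sorted(
        p for p in Path(recv_dir).iterdir()
        if p.is_file() and p.stat().st_mtime < cutoff
    )


def purge_stale_files(recv_dir, max_age_days=1.0, dry_run=True):
    """Delete stale files; with dry_run=True only report what would go."""
    stale = find_stale_files(recv_dir, max_age_days)
    for path in stale:
        if not dry_run:
            path.unlink()
    return stale


if __name__ == "__main__":
    oracle_home = os.environ.get("ORACLE_HOME")
    recv = Path(oracle_home, "sysman", "emd", "recv") if oracle_home else None
    if recv is None or not recv.is_dir():
        print("recv directory not found; is ORACLE_HOME set for the OMS home?")
    else:
        # Dry run only: prints the candidates without deleting anything.
        for path in purge_stale_files(recv, max_age_days=1.0, dry_run=True):
            print("would delete", path)
```

Once the dry-run list looks right, calling `purge_stale_files(recv, dry_run=False)` removes the files. As always, try it on a test OMS first.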
There must have been a file that was corrupted or that the OMS could not process. Interestingly, the day the problems started coincided with the day we applied the latest Critical Patch to our production servers. For future reference, we will shut down the agents and create a blackout before starting to patch the database homes on the monitored machines.
OC4J~OC4J_EM~default_island~1 crashing every 10 minutes
Posted by
Notetaker
Wednesday, April 9, 2008
Labels: Enterprise Manager Grid Control , Oracle
All views on this blog are my own. All hints, tips and scripts should be run and tested on your development and test servers before attempting in production.