Tuesday, April 5, 2016

Shutdown Immediate Hang Situations

Shutdown Immediate Hang Situations

Many times Oracle Database Administrator's are faced with a situation where shutting down the instance with shutdown immediate seems to take infinite time and gives a impression that the session is hung. In most of the cases this is not a hang. Hang can be defined as a scenario when few sessions/processes are waiting on some action and in turn blocking other session/processes to complete their tasks. Now the original sessions could be waiting because of resource issues or Oracle Bugs.

Shutdown immediate can take long time to complete (appear to be hung) because of three reasons:

1. Uncommitted transactions are being rolled back.

2. SMON is cleaning temp segments or performing delayed block cleanouts.

3. Processes still continue to be connected to the database and do not terminate.

1. Uncommitted transactions are being rolled back:

This is the case when the message 'Waiting for smon to disable tx recovery' is posted in the alert log after we issue shutdown immediate.

There are two reasons for this:
- A large query was running at the time of shutdown immediate.
-A large transaction was running at the time of shutdown immediate.

For large queries:

SQL > select count(*) from v$session_longops where time_remaining>0;

If it returns a value > 0 then we can do a shutdown abort and then startup restrict and then again shutdown immediate.

For large transactions:

SQL > select sum(used_ublk) from v$transaction;

If it returns a large value then we have to wait for a long time for shutdowm to get completed.
If the large transaction is aborted and then shutdown is issued then we have to query v$fast_start_transactions and v$fast_start_server, we will not see anything in v$transaction at this time.

At this particular moment transaction recovery is going on and the count(*) will keep on decreasing:

SQL > select count(*) from v$fast_start_transaction;

Decreasing count will show that recovery is going on and when the recovery is completed the database will be shutdown.

But it is not desirable under some circumstances such as, when we have very short maintance window and we need to perform a shutdown immediate to do some work, in those cases we can use the following event and set in the init.ora file TEMPERORARLY To disable transaction recovery:

event="10513 trace name context forever, level 2"

and bounce the instance and issue shutdown immediate to get complete without transaction recovery.SMON will not do a transaction recovery untill this event is set in the init.ora file so it is necessary to remove this event whenever you get a chance to shutdown the database again, this time shutdown immediate can even take 3-5 hours(Just remove this event from pfile).

2. SMON is cleaning temp segments or performing delayed block cleanouts:

During a SHUTDOWN IMMEDIATE and SHUTDOWN NORMAL, SMON cleans up extents which are no longer needed and marking them as freed. It means that count from uet$ will decrease and count in fet$ will increase.

To verify that the temporary segments are decreasing have an active session available in SQL during the SHUTDOWN IMMEDIATE. Run query to ensure the database is not hanging, but is actually perform extent cleanup:

SQL> select count(block#) from fet$;
COUNT(BLOCK)
----------
115

SQL> select count(block#) from uet$;
COUNT(BLOCK)
----------
713
After some time, issue the query again and check the results:

SQL> select count(block#) from fet$;
COUNT(BLOCK)
----------
210

SQL > select count(block#) from uet$;
COUNT(BLOCK)
----------
512
If you do not have sufficient time to wait for this cleanup then you can set the following event and bounce the database and reissue shutdown immediate to skip this cleanup:

event="10061 trace name context forever, level 10"

It allows you to prevent SMON from cleaning up temporary segments. Again it is not recommended to set this event event forever. Whenever you have large downtime remove this event and allow SMON to do its work.

3. Processes still continue to be connected to the database and do not terminate:

After issuing shutdown immediate, If we see entries in alert log file as:

Tue Jan  8 12:00:27 2008
Active call for process 10071 user 'oracle' program 'oracle@server.domain.abc (J001)'
SHUTDOWN: waiting for active calls to complete.
Tue Jan  8 12:00:57 2008

SHUTDOWN: Active sessions prevent database close operation
It shows that there are some active calls at program 'oracle@server.domain.abc (J001)' which pmon is not able to clear up.This message is due to the fact that database is waiting for pmon to clean up processes, but pmon is unable to clean them. The client connections to the server are causing the shutdown immediate or normal to hang. Do the following in this case:

1. Before shutdown immediate, shutdown the listener:

$ lsnrctl stop

2. Now check if there are any connection present at the database as:

$ ps -eaf | grep LOCAL

It will give you the OSPIDs of the client connected to database.

3 Manually kill them as:

# Kill -9 <OSPID>

4. Issue shutdown immediate now.

Do not forget to bring up the listener after startup

In addition to this you can set 10046 event in the session used to shutdown the instance. This will help to tell the event on which session is waiting

SQL>alter session set events '10046 trace name context forever, level 12'

SQL>Shutdown immediate;
Look for the trace file in user_dump_dest location. Also look at the alert.log for any other messages. They might be helpful in case the shutdown is experiencing hang situation.

In order to check reason why shutdown immediate hangs

SQL>connect / as SYSDBA
SQL>Select * from x$ktuxe where ktuxecfl = 'DEAD';
This above query shows dead transactions that SMON is looking to rollback

Oracle 11g SQL*Plus called the “prelim” option. This option is very useful for running oradebug and other utilities that do not require a real connection to the database.
C:\>sqlplus –prelim
-or- in SQL you can set
SQL>Set _prelim on
SQL>connect / as sysdba

Now you are able to run oradebug commands to diagnose a hung database issue:
SQL> oradebug hanganalyze 3

Wait at least 2 minutes to give time to identify process state changes.

SQL>oradebug hanganalyze 3
Open a separate SQL session and immediately generate a system state dump.
SQL>alter session set events 'immediate trace name SYSTEMSTATE level 10';

When shutdown immediate taking longer time as compare to the normal time usually it is taking. You must perform following task before performing actual shutdown immediate.

-All active session.
-Temporary Tablespace Recover.
-Long Running Query in Database.
-Large Transaction.
-Progress of the Transaction that oracle is recovering.
-Parallel Transaction Recovery.

SQL> Select sid, serial#, username, status, schemaname, logon_time from v$session where status='ACTIVE' and username is not null;
If Active session is exist then, try to find out what is doing in the database by this session. Active session make shutdown slower
SQL> Select f.R "Recovered", u.nr "Need Recovered" from (select count(block#) R , 1 ch from sys.fet$ ) f,(selectcount(block#) NR, 1 ch from sys.uet$) u where f.ch=u.ch;

Check to see any long query is running into the database while you are trying to shutdown the database.
SQL> Select * from v$session_longops where time_remaining>0 order by username;

Check to ensure large transaction is not going on while you are trying to shutdown the database.
SQL>Select sum(used_ublk) from v$transaction;

Check the progress of the transaction that oracle is recovering.
SQL>Select * from v$fast_start_transactions;

Check to ensure that any parallel transaction recovery is going on before performing shutdown immediate.
SQL>Select * from v$fast_start_servers;

Finally if you do not understand the reason why the shutdown is hanging or taking longer time to shutdown then try to shutdown your database with ‘abort’ option and startup with ‘restrict’ option and try shutdown with ‘immediate’ option.
Check the alert.log, if you find any error related ‘Thread 1 cannot allocate new log, sequence’ then you need to enable your archival process. Your archival is disable due to any reason.
Process:
1. In command prompt set the oracle_sid first
C:\SET ORACLE_SID = ‘your db_name’

2. Now start the SQL*plus:
C:\sqlplus /nolog
SQL>connect sys/***@instance_name
SQL>Select instance_name from v$instance;

3. Try to checkpoint before shutdown abort
SQL>alter system checkpoint;
SQL> shutdown abort;

4. Start the database with ‘restrict’ option so that no other user is able to connect you in the mean time.
SQL>startup restrict;
SQL>select logins from v$instance;
RESTRICTED
SQL>shutdown immediate;

5. Mount the database and ensure archive process is enabling by using archive log list command. If it is disabling then enable it.
SQL>startup mount;
SQL> archive log list;  --if disable then enable it
SQL>Alter database archivelog;
SQL> Alter system archive log start;
Note: If your archivelog destination and format is already set no need to set again. After setting check with the ‘archive log list’ command archival is enable or not.
SQL> alter database open;
Now check if your database is still in restricted mode then remove the restriction.
SQL>select logins from v$instance;
SQL>alter system disable restricted session;
Note: Now try to generate archivelog with any command
SQL>alter system archivelog current;
SQL>alter system switch logfile;
Now try to check or perform normal shutdown and startup with the database.

Reference metalink ID's

Note 1076161.6: Shutdown immediate or shutdown Normal hangs. SMON disabling TX recovery
Note 375935.1: What to do and not to do when shutdown immediate hangs.
Note 428688.1: Shutdown immediate very slow to close database.
Note 164504.1: How to Check Why Shutdown Immediate Hangs

No comments:

Post a Comment