oracle cloud

oracle也推出了自己的cloud服务,详细的情况可以登录https://cloud.oracle.com 来查看。oracle也开始推出各类基础平台的云解决方案了,就关系型数据库而言,价格可以参考下面。

 

 

 

 

ORA-3136: WARNING Inbound Connection Timed Out

 

客户遭遇后台报错:"ORA-3136: WARNING Inbound Connection Timed Out",而且连续几十条,但前段的用户似乎没有异常的状态反应,对用户到没有造成什么影响。ORA-3136的成因相对复杂,可能有很多种可能,例如在MOS中有篇文档[ID 465043.1],详细的介绍了可能性:
 
—————————————————————————————————————-
The following are the most likely reasons for this error: 
 
1.Server gets a connection request from a malicious client which is not supposed to connect to the database.  In this case the error thrown would be the expected and desirable behavior. 
You can get the client address for which the error was thrown in the sqlnet.log file that is local to the database. 
 
2.The server receives a valid client connection request but the client takes a long time to authenticate more than the default 60 seconds. 
 
3.The DB server is heavily loaded due to which it cannot finish the client logon within the timeout specified
 
—————————————————————————————————————–
 
The "WARNING: inbound connection timed out (ORA-3136)" in the alert log indicates that the client was not able to complete
the authentication process within the period of time specified by the parameter SQLNET.INBOUND_CONNECT_TIMEOUT
 
同时,在sqlnet.log中可能会伴随着出现ORA-12170,TNS-12535信息,而且sqlnet.log中应该记录了"验证失败"的客户端信息,在udump中可能会有一些trace文件,trace文件的内容包含类似这样的信息:
"opiino: Attach failed! error=-1 ifvp=0"
 
自从10201开始,SQLNET.INBOUND_CONNECT_TIMEOUT参数被引入,缺省值为60秒,如果在60秒内连接数据库的client由于各种原因无法完成验证工作,在alert中就会出现ORA-3136的报错信息,同时此参数的引入也有意为防止DDOS的攻击。
 
—————————————————————————————————————————-
From 10.2.0.1 onwards the default setting for the parameter SQLNET.INBOUND_CONNECT_TIMEOUT is 60 seconds.
If the client is not able to authenticate within 60 seconds, the warning would appear in the alert log and the client connection will be terminated
—————————————————————————————————————————-
 
Note: This timeout restriction was introduced to combat Denial of Service (DoS) attack whereby malicious clients attempt to flood database servers with connect requests that consumes resources.
 
影响:通常这个报错不会影响db本身的稳定性,但由于client端会无法连接,要视对client端的影响而定,如果客户端相关人员并没有任何异常的反馈,那影响有限。
 
如何解决:
 
The default value of 60 seconds is good enough in most conditions for the database server to authenticate a client connection. If it is taking longer, then it's worth checking the following items before implementing the workaround:
 
1. Check whether local connection on the database server is successful & quick.
2. If local connections are quick ,then check for underlying network delay with the help of your network administrator.
3. Check whether your Database performance has degraded in anyway.
4. Check alert log for any critical errors for eg, ORA-600 or ORA-7445 and get them  resolved first. 
These critical errors might have triggered the slowness of the database server.
 
It is often necessary to increase the values for INBOUND CONNECT TIMEOUT at  both the listener and the database in order to resolve this issue.It is usually advisable to set the database (sqlnet.ora) value slightly higher than the listener (listener.ora).    The authentication process is more demanding for the database than the listener.
(建议将sqlnet.ora中的参数值设置比listener.ora中稍微大一些)
 
将SQLNET.INBOUND_CONNECT_TIMEOUT增大,可以参考下面的步骤:
 
To set these parameters to use values higher than the default of 60 seconds, follow these instructions and restart the listener. 
 
There is no need to restart Oracle: 
 
Edit the server side sqlnet.ora file and add this parameter:
 
SQLNET.INBOUND_CONNECT_TIMEOUT=<n>  Where <n> is the value in seconds.
 
E.g.:
 
SQLNET.INBOUND_CONNECT_TIMEOUT = 120
 
Edit the listener.ora file and add this parameter: 
 
INBOUND_CONNECT_TIMEOUT_<listenername> = <n>  Again, where <n> is the timeout value in seconds.  
 
For example if the listener name is LISTENER then use:
 
INBOUND_CONNECT_TIMEOUT_LISTENER = 110
 
 
更详细的信息可以参考mos中的文档[ID 465043.1]
 
–EOF–
 

ORA-08103: object no longer exists

“ORA-08103: object no longer exists”,一位客户那里遇到了这个问题,其实这个问题还是比较经典的。之前遇到过,不过与这次的情况有些区别。以前遇到的案例类似如下(note:[8103.1]):

Cause

ORA-8103 is caused by an invalid block type. The block header has an invalid block type or the block type inside the block is not expected; e.g. a data block (Type=6) was expected but the actual block information is not a data block (Type!=6).

ORA-8103 is also caused by an unexpected data_object_id where it is changing for the involved objects while the affected SQL statement is executed.

These two causes might be due to an expected behavior or other problems.

Tables are being dropped/truncated while a SQL statement for those tables is still in execution。
In other words the object has been deleted by another session since the operation began。
Look if dba_objects.data_object_id is changing for the affected object while queries are being executed.
data_object_id is changed by DDL statements like:
———————–
truncate table
alter index .. rebuild
alter table .. move
etc.
———————–

更确切的现象是,当数据库中的对象遭遇多条并行SQL语句执行,同时某条语句做了更改,例如:truncate等,其中某条查询的语句会报出”ORA-08103: object no longer exists”

而这次客户的问题与上述现象不符,因为环境相对安静,仅仅是一条涉及某张表A的select无法执行,全表扫面的select * from能够查询,最关键的是analyze table报了错。
analyze table报错,基本可以断定是block corruption的问题,而且很可能是在memory级别,例如[note 81301.1]提到的下面现象:

Block Corruption.

Hardware, IO subsystem or OS problems may cause block corruptions overwriting the Block Type in the block header causing the error ORA-8103.
The block is healthy but it is an “OLD/STALE” block. These kind of corruptions might be caused by LOST IO/LOST WRITE or a bug in external non-oracle tools that migrate file systems while the database is OPEN.   In that case the data_object_id for the affected object could have changed but the actual block does not reflect it.
Note that the block may also be temporarily corrupted in the buffer cache (SGA Memory).

如果是block corruption,可以按照如下的思路设置event来诊断。

1)ANALYZE TABLE A VALIDATE STRUCTURE; <===do not use ‘CASCADE’
2)ANALYZE INDEX xxxx VALIDATE STRUCTURE; <===analyze index on the table one by one
3)If you found which index or table is corrupted, please get trace on that index or table:
alter session set max_dump_file_size=unlimited;
alter session set db_file_multiblock_read_count=1;
alter session set events ‘immediate trace name trace_buffer_on level 1048576’;
alter session set events ‘10200 trace name context forever, level 1’;
alter session set events ‘8103 trace name ERRORSTACK level 3’;
ANALYZE TABLE A VALIDATE STRUCTURE;

然后在udump中拿到相应的trace文件,里面查找”corrupt”关键字,如下:

——————————————————————————————
BH (0000000ACBFB9F08) file#: 7 rdba: 0x01d2c622 (7/1230370) class: 1 ba: 0000000ACB7FE000
set: 66 blksize: 8192 bsi: 0 set-flg: 0 pwbcnt: 0
dbwrid: 5 obj: 52656 objn: 52656 tsn: 6 afn: 7
hash: [ee90b078,ee90b078] lru: [cbfbab38,cbfb93d8]
lru-flags: hot_buffer
obj-flags: object_ckpt_list
ckptq: [cbfba048,c4fd54f8] fileq: [cbfba058,c4fd5508] objq: [cbfba108,c4fd55b8]
use: [dbf29bf8,dbf29bf8] wait: [NULL]
st: XCURRENT md: SHR tch: 0
flags: buffer_dirty gotten_in_current_mode block_written_once
redo_since_read
LRBA: [0x0.0.0] HSCN: [0x0.1b7da2] HSUB: [1]
buffer tsn: 6 rdba: 0x01d2c622 (7/1230370)
scn: 0x0000.001b7da2 seq: 0x01 flg: 0x02 tail: 0x7da20601
frmt: 0x01 chkval: 0x0000 type: 0x2b=unknown
Hex dump of corrupt header 4 = CORRUPT

<===================================CORRUPT. type: 0x2b=unknown
——————————————————————————————-
之所以能够full scan的查询出正常结果,很可能这个corrupt仅仅是发生在SGA中,alter system flush buffer_cache,将buffer刷出去也许即可解决了。
这次的案例是客户reboot OS之后,查询就恢复正常了,这里强调的是,遇到类似的问题,可以借助set events来诊断。

alter session set max_dump_file_size=unlimited;
alter session set db_file_multiblock_read_count=1;
alter session set events ‘immediate trace name trace_buffer_on level 1048576’;
alter session set events ‘10200 trace name context forever, level 1’;
alter session set events ‘8103 trace name ERRORSTACK level 3’;

–EOF–

oracle rac remove service

SQL> select NAME,FAILOVER_METHOD,FAILOVER_TYPE from dba_services where name=’testservice’

NAME                                     FAILOVER_METHOD                                                  FAILOVER_TYPE
—————————————- ———————————————————————-
testservice                              BASIC                                                                                  SESSION

SQL> show parameter service

NAME                                 TYPE        VALUE
———————————— ———–        ——————————
service_names               string                 testservice_PRECONNECT, rac.world
SQL>

[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
———————————————       —————
ora….E1.lsnr application        ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application     ONLINE    ONLINE    node1
ora….E2.lsnr application       ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora.rac.db     application        ONLINE    ONLINE    node2
ora….c1.inst application        ONLINE    ONLINE    node1
ora….c2.inst application        ONLINE    ONLINE    node2
ora….vice.cs application       ONLINE    ONLINE    node1
ora….ac1.srv application       ONLINE    ONLINE    node1
//remove the service “testservice”
[oracle@node2 ~]$ srvctl remove service -d rac -s testservice
testservice PREF: rac1 AVAIL: rac2
Remove service testservice from the database rac? (y/[n]) y
PRKP-1024 : The service testservice is still running.

//看来需要先将service给停掉才可以

[oracle@node2 ~]$ srvctl stop service -d rac -s testservice
[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
————————————————————
ora….E1.lsnr application    ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora….E2.lsnr application    ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora.rac.db     application         ONLINE    ONLINE    node2
ora….c1.inst application         ONLINE    ONLINE    node1
ora….c2.inst application        ONLINE    ONLINE    node2
ora….vice.cs application       OFFLINE   OFFLINE
ora….ac1.srv application       OFFLINE   OFFLINE

//stop the service
[oracle@node2 ~]$ srvctl remove service -d rac -s testservice
testservice PREF: rac1 AVAIL: rac2
Remove service testservice from the database rac? (y/[n]) y

//之前与service相关的两个服务消失了
xxxx.cs
xxxx.srv

(你可以测试,当启动、关闭xxxx.srv服务时,可以关联xxxx.cs的状态)
[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
————————————————————
ora….E1.lsnr application       ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora….E2.lsnr application       ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora.rac.db     application        ONLINE    ONLINE    node2
ora….c1.inst application        ONLINE    ONLINE    node1
ora….c2.inst application        ONLINE    ONLINE    node2

//xxxx.srv的个数取决于在创建service时选择的instance个数,例如

//添加一个service,名字叫service2,rac1作为pre_list成员,rac2作为avail_list成员,TAF policy配置为PRECONNECT
[oracle@rac1 ~]$ srvctl add service -d rac -s service2 -r “rac1” -a “rac2” -P BASIC

如果是-r参数变成 -r “rac1,rac2″则会在创建两个xxx.srv结尾的resource

[oracle@node2 ~]$ srvctl add service -d rac -s service3 -r “rac1,rac2” -a “rac2” -P BASIC
[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
————————————————————
ora….E1.lsnr application        ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application     ONLINE    ONLINE    node1
ora….E2.lsnr application       ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora.rac.db     application        ONLINE    ONLINE    node2
ora….c1.inst application        ONLINE    ONLINE    node1
ora….c2.inst application        ONLINE    ONLINE    node2
ora….ice3.cs application       OFFLINE   OFFLINE
ora….ac1.srv application       OFFLINE   OFFLINE
ora….ac2.srv application       OFFLINE   OFFLINE

//启动这两个resource
[oracle@node2 ~]$ crs_start ora.rac.service3.rac1.srv
Attempting to start `ora.rac.service3.rac1.srv` on member `node1`
Start of `ora.rac.service3.rac1.srv` on member `node1` succeeded.
[oracle@node2 ~]$ crs_start ora.rac.service3.rac2.srv
Attempting to start `ora.rac.service3.rac2.srv` on member `node2`
Start of `ora.rac.service3.rac2.srv` on member `node2` succeeded.
[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
————————————————————
ora….E1.lsnr application       ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora….E2.lsnr application       ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora.rac.db     application        ONLINE    ONLINE    node2
ora….c1.inst application       ONLINE    ONLINE    node1
ora….c2.inst application      ONLINE    ONLINE    node2
ora….ice3.cs application       ONLINE    ONLINE    node1
ora….ac1.srv application      ONLINE    ONLINE    node1
ora….ac2.srv application      ONLINE    ONLINE    node2

//remove the service “service3”
先把service3停掉
[oracle@node2 ~]$ srvctl stop service -d rac -s service3
[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
————————————————————
ora….E1.lsnr application        ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora….E2.lsnr application        ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora.rac.db     application        ONLINE    ONLINE    node2
ora….c1.inst application        ONLINE    ONLINE    node1
ora….c2.inst application        ONLINE    ONLINE    node2
ora….ice3.cs application       OFFLINE   OFFLINE
ora….ac1.srv application      OFFLINE   OFFLINE
ora….ac2.srv application      OFFLINE   OFFLINE

删除service,删除service可能无法搞定全部的相关resource,下面的例子就是还剩下一个resource offline,没有删除掉
可以考虑crs_unregister来清除

[oracle@node2 ~]$ srvctl remove service -d rac -s service3 -f
[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
————————————————————
ora….E1.lsnr application        ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora….E2.lsnr application       ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora.rac.db     application        ONLINE    ONLINE    node2
ora….c1.inst application       ONLINE    ONLINE    node1
ora….c2.inst application       ONLINE    ONLINE    node2
ora….ac2.srv application      OFFLINE   OFFLINE
[oracle@node2 ~]$ crs_unregister ora.rac.service3.rac2.srv
[oracle@node2 ~]$ crs_stat -t
Name           Type           Target    State     Host
————————————————————
ora….E1.lsnr application    ONLINE    ONLINE    node1
ora.node1.gsd  application    ONLINE    ONLINE    node1
ora.node1.ons  application    ONLINE    ONLINE    node1
ora.node1.vip  application    ONLINE    ONLINE    node1
ora….E2.lsnr application    ONLINE    ONLINE    node2
ora.node2.gsd  application    ONLINE    ONLINE    node2
ora.node2.ons  application    ONLINE    ONLINE    node2
ora.node2.vip  application    ONLINE    ONLINE    node2
ora.rac.db     application    ONLINE    ONLINE    node2
ora….c1.inst application    ONLINE    ONLINE    node1
ora….c2.inst application    ONLINE    ONLINE    node2

–EOF–

 

 

opatch_bug_9772888

相信很多升级到oracle 10205这个版本的用户都碰见过比较noisy的信息提示:

"WARNING:Could not lower the asynch I/O limit to 160 for SQL direct I/O. It is set to -1"


可能不会在alert.log中出现,可能仅仅会出现在udump中,以trace文件的形式出现,不断的产生大小1k左右的文件。

其实影响比较小,在MOS上有一篇文档介绍了相关的bug信息。

Bug 9772888 – Needless "WARNING:Could not lower the asynch I/O limit to .. for SQL direct I/O It is set to -1" messages [ID 9772888.8]

这篇文章比较详细的记录了这个"噪音",并给出了相应的解决方案。
—————————————————————————————-
This issue is fixed in
12.1 (Future Release)
11.2.0.2 (Server Patch Set)
10.2.0.5.2 Patch Set Update
10.2.0.5 Patch 1 on Windows Platforms

—————————————————————————————


文档中的workaround也解释说,相关的patch也仅仅是帮助"not produce noise"..

————————————————————————————————–
Description
Trace files may be seen with warning of the form:
 WARNING:Could not lower the asynch I/O limit to 160 for SQL direct I/O.
 It is set to -1
 
This bug is specifically for the case of messages showing "It is set
to -1" in the message.
 
Workaround
  Ignore the message – it is spurious and of no value.
  The fix only hides the message so that it does not produce "noise"
——————————————————————–

如果是windows平台的server,可以参考最后一个解决方案:10.2.0.5 Patch 1 on Windows Platforms

多说一句,MOS上准备的patch似乎按照这样的逻辑分类:windows、非windows,所有windows相关的patch都是单独分类的。(有待求证)

"10.2.0.5.2 Patch Set Update"都是非windows平台的补丁。

后续开始opatch 相关补丁的过程,如果在windows平台上,还需要注意一些文件被占用的问题,如下。

"补丁程序13460968: ORACLE 10G 10.2.0.5 PATCH 14 BUG FOR WINDOWS (64-BIT AMD64 AND INTEL EM64"

是在没有购买"Software Extended Support"服务情况下,能够取得的10205最高的patch bundle了,而非windows平台的补丁是以PatchSetUpdate的形式出现的。

opatch的过程不算麻烦,按照readme使用opatch apply即可,但是在windows下面需要注意将oracle相关的进程关闭掉,即使关闭掉所有与oracle相关的服务,可能仍然会遇见文件占用问题。

——————————————————————————————————————————
Running prerequisite checks…
信息:Space Needed : 1091658872
信息:Prereq checkPatchApplicableOnCurrentPlatform Passed for patch : 13460968
信息:
Following files are active :
E:oracleproduct10.2.0db_1binoci.dll
信息:Prerequisite check "CheckActiveFilesAndExecutables" failed.
The details are:

Following files are active :
E:oracleproduct10.2.0db_1binoci.dll

严重:OUI-67074:在先决条件检查期间 ApplySession 失败: Prerequisite check "CheckActiveFilesAndExecutables" failed.
信息:系统无任何变化, OPatch 不会尝试还原系统
信息:Finishing ApplySession at Tue Dec 04 16:51:42 CST 2012
信息:Total time spent waiting for user-input is 0 seconds.  Finish at Tue Dec 04 16:51:42 CST 2012
信息:堆栈说明: oracle.opatch.PrereqFailedException: Prerequisite check "CheckActiveFilesAndExecutables" failed.
信息:堆栈跟踪: oracle.opatch.OPatchSessionHelper.runApplyPrereqs(OPatchSessionHelper.java:4416)
信息:堆栈跟踪: oracle.opatch.ApplySession.processLocal(ApplySession.java:3695)
信息:堆栈跟踪: oracle.opatch.ApplySession.process(ApplySession.java:5577)
信息:堆栈跟踪: oracle.opatch.OPatchSession.main(OPatchSession.java:1719)
信息:堆栈跟踪: oracle.opatch.OPatch.main(OPatch.java:630)
———————————————————————————————–

报错显示E:oracleproduct10.2.0db_1binoci.dll还是active、被占用的,在windows平台上,很可能是windows服务中某些关联的服务是在OS启动时,自启动的。
而这些服务与ORACLE_HOME中的某些文件占用是存在关联关系的。关闭掉Distribute Transaction Coordinator服务,问题得到解决,opatch apply一路顺利。

一般按照readme执行完毕opatch apply之后还有一些post script需要执行,来完成收尾的工作。

(1)
For each database instance running out of the ORACLE_HOME being patched, connect to the database using SQL*Plus as SYSDBA and run catcpu.sql as follows:

> cd %ORACLE_HOME%BUNDLEPatch14
> sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> STARTUP
SQL> @catcpu.sql
SQL> QUIT

(2)
If catcpu.sql reports any Invalid Objects, compile the invalid objects as follows.

(For large numbers of objects, this compilation step can take some time. If you are applying this patch through the Oracle Enterprise Manager console, you will be prompted to run this compilation script.)

> cd %ORACLE_HOME%rdbmsadmin
> sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> @utlprp.sql 0

To check for invalid objects, execute the following statement:

SQL> SELECT OBJECT_NAME FROM DBA_OBJECTS WHERE STATUS = 'INVALID';

整个opatch apply 13460968的过程基本结束,有个奇怪问题,几次10201打patchset到10205都遇到一个invalid的object无法处理,一个类型为type的对象,名称叫STR_SPLIT。

–EOF–