sundog315
===========================================================
Doubling bandwidth with Linux NIC bonding
===========================================================

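In an earlier post I described how to bond two NICs on Linux for high availability. That mode, however, only provides high availability: if one of the two NICs fails, the server stays reachable.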

With cooperation from the switch, can bonding also balance the load and double the bandwidth?

First, be clear about the bonding modes:

0 - Sets a round-robin policy for fault tolerance and load balancing. Transmissions are received and sent out sequentially on each bonded slave interface beginning with the first one available.

1 - Sets an active-backup policy for fault tolerance. Transmissions are received and sent out via the first available bonded slave interface. Another bonded slave interface is only used if the active bonded slave interface fails.

2 - Sets an XOR (exclusive-or) policy for fault tolerance and load balancing. Using this method, the interface matches up the incoming request's MAC address with the MAC address for one of the slave NICs. Once this link is established, transmissions are sent out sequentially beginning with the first available interface.

3 - Sets a broadcast policy for fault tolerance. All transmissions are sent on all slave interfaces.

4 - Sets an IEEE 802.3ad dynamic link aggregation policy. Creates aggregation groups that share the same speed and duplex settings. Transmits and receives on all slaves in the active aggregator. Requires a switch that is 802.3ad compliant.

5 - Sets a Transmit Load Balancing (TLB) policy for fault tolerance and load balancing. The outgoing traffic is distributed according to the current load on each slave interface. Incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed slave.

6 - Sets an Active Load Balancing (ALB) policy for fault tolerance and load balancing. Includes transmit and receive load balancing for IPV4 traffic. Receive load balancing is achieved through ARP negotiation.

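Note that mode 0 only rotates traffic across the NICs: only one NIC is effectively active at a time, so total bandwidth is still that of a single NIC.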

Here we enable 802.3ad on the switch and set mode=4 to double the bandwidth.

For the detailed test procedure, see:

http://sundog315.itpub.net/post/308/525631

Note that the mode used in that post needs to be changed:

vi /etc/modprobe.conf

alias bond0 bonding
options bond0 miimon=100 mode=4 lacp_rate=1
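After the bonding driver is reloaded, you can confirm that mode=4 and lacp_rate=1 actually took effect by reading the driver's status file. A minimal sketch, assuming the usual /proc/net/bonding/bond0 layout of "Label: value" lines (for example "Bonding Mode: IEEE 802.3ad Dynamic link aggregation" and "LACP rate: fast"); label wording can differ between driver versions, and the helper name bond_field is hypothetical.

```shell
# bond_field LABEL [FILE]
# Print the value of the first "LABEL: value" line in the bonding status file.
# Assumes the /proc/net/bonding/<bond> text layout; FILE defaults to bond0.
bond_field() {
  awk -v key="$1" '
    index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
  ' "${2:-/proc/net/bonding/bond0}"
}

# Example (on the server):
#   bond_field "Bonding Mode"   # should name the 802.3ad mode when mode=4 is active
#   bond_field "LACP rate"      # should print "fast" when lacp_rate=1
```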

Bandwidth is tested with iperf. Because mode 4 picks the slave NIC for traffic based on MAC addresses, several test machines are needed.

10.199.81.39 acts as the server, and 10.199.81.40 and 10.199.81.42 as clients. If both clients reach close to 1000 Mbits/sec at the same time, the bandwidth has doubled.

server:

# iperf -s -w 1M
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 2.00 MByte (WARNING: requested 1.00 MByte)
------------------------------------------------------------
[ 4] local 10.199.81.39 port 5001 connected with 10.199.81.40 port 9951
[ 5] local 10.199.81.39 port 5001 connected with 10.199.81.42 port 61211
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-100.0 sec 11.1 GBytes 949 Mbits/sec
[ 5] 0.0-100.0 sec 11.1 GBytes 949 Mbits/sec

Both clients reached 949 Mbits/sec at the same time, about 1.9 Gbits/sec in total: the bandwidth has doubled.
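To get the aggregate figure from a run like this without adding it up by eye, the per-connection rates can be summed. A minimal sketch, assuming iperf prints one summary line per connection in the "N Mbits/sec" form shown above (lines reported in Gbits/sec or Kbits/sec are ignored); sum_iperf_mbits is a hypothetical helper name.

```shell
# sum_iperf_mbits: read iperf report lines on stdin and print the sum of every
# value that is immediately followed by the unit "Mbits/sec".
sum_iperf_mbits() {
  awk '{
         for (i = 2; i <= NF; i++)
           if ($i == "Mbits/sec") total += $(i - 1)
       }
       END { print total + 0 }'
}

# Example: save the server output, then sum it:
#   iperf -s -w 1M | tee iperf.log    # stop with Ctrl-C after the clients finish
#   sum_iperf_mbits < iperf.log
```

For the two result lines above this gives 949 + 949 = 1898 Mbits/sec, roughly twice a single gigabit link.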

Posted by sundog315 on 2012.03.28 13:48 :: Category: ( Linux ) :: Views: (37) :: Permanent link
===========================================================
ORA-12161: TNS:internal error: partial data received
===========================================================

Over the weekend some servers were moved to Langfang.

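This morning we found that the OA test Oracle database could not be reached properly from Beijing. After connecting with PL/SQL Developer, querying a slightly larger table failed with: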

ORA-12161: TNS:internal error: partial data received

JDBC connections failed as well:

java.sql.SQLException: Io exception: Bad packet type

Metalink had no relevant articles and Google turned up nothing either, so once again it came down to guesswork.

Given this error and the server move over the weekend, the only thing that had changed was the network, so the problem was probably there.

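This reminded me that my colleague Li Long (lilong.itpub.net) had hit ORA-02068 in a similar situation and worked around it temporarily by changing the SDU. Would the same work for this error?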

I changed the connect descriptor to add an SDU:

oatest213=
(DESCRIPTION =
(sdu=1740)
(ADDRESS = (PROTOCOL = TCP)(HOST = 10.199.81.33)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = orcl )
)
)

The connection went back to normal.

Moreover, 1740 is the threshold: with SDU=1741 the error comes back.
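The 1740/1741 boundary was found by trial. To locate such a threshold again, a binary search needs only about log2(range) attempts. A minimal sketch, assuming a hypothetical probe command that writes the given SDU into the connect descriptor, runs the large query and exits non-zero on ORA-12161; it also assumes every SDU up to the threshold works and every value above it fails.

```shell
# largest_ok LO HI PROBE [ARGS...]
# Binary search for the largest integer N in [LO, HI] for which
# "PROBE ARGS... N" exits 0. LO must be known to succeed, and the probe is
# assumed to succeed for all values up to some threshold and fail above it.
largest_ok() {
  local lo=$1 hi=$2 mid
  shift 2
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi + 1) / 2 ))          # round up so the loop always progresses
    if "$@" "$mid" >/dev/null 2>&1; then
      lo=$mid                             # mid works: answer is mid or higher
    else
      hi=$((mid - 1))                     # mid fails: answer is below mid
    fi
  done
  echo "$lo"
}

# Example with a hypothetical probe script:
#   largest_ok 512 8192 ./probe_sdu.sh
```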


Posted by sundog315 on 2012.03.19 15:23 :: Category: ( Oracle ) :: Views: (80) :: Permanent link
===========================================================
SP2-0642: SQL*Plus internal error state 2130, context 0:0:0 Unsafe to proceed
===========================================================

Connecting to a RAC environment through EZCONNECT from a 10.2.0.1 client failed with:

SQL> conn system/xxxx@10.199.87.122:1521/gpp
SP2-0642: SQL*Plus internal error state 2130, context 0:0:0
Unsafe to proceed

This error is caused by Bug 8599395:

EZCONNECT Connections Error with SP2-0642: Sql*Plus Internal Error State 2130 [ID 855965.1]


The fix is simple: leave out the 1521 port number, or use an 11.2 or later client.

SQL> conn system/xxxx@10.199.87.119/gpp
Connected.

Bug is fixed from release 11.2 onwards.
Check for one off patches for your release / platform via Patch 6135152
Workaround, do not use the port number in the connection string. Port 1521 is the default.
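Since the workaround is just to leave out the default port, a connect string can be normalized before use. A minimal sketch, assuming the input is only the EZCONNECT part after the @ (host[:port][/service]) and that the listener really is on the default port 1521; the helper name is hypothetical.

```shell
# strip_default_port CONNECT: turn "host:1521/service" into "host/service".
# Other ports are left alone, because omitting them would change the target.
strip_default_port() {
  local s=$1 host rest
  host=${s%%/*}        # everything before the first "/"
  rest=${s#"$host"}    # "/service", or empty when there is no service part
  case $host in
    *:1521) host=${host%:1521} ;;
  esac
  printf '%s%s\n' "$host" "$rest"
}

# Example:
#   sqlplus "system/xxxx@$(strip_default_port 10.199.87.122:1521/gpp)"
```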


Posted by sundog315 on 2012.02.15 17:36 :: Category: ( Oracle ) :: Views: (183) :: Permanent link
===========================================================
OPatch failed with error code 135, Given 'ocmrf' file does not exists
===========================================================

Applying a patch with opatch failed with the following error:

2012-02-13 09:11:53: The apply patch output is
Oracle Interim Patch Installer version 11.2.0.1.9
Copyright (c) 2011, Oracle Corporation. All rights reserved.

Argument(s) Error... Given 'ocmrf' file does not exists.

Please check the arguments and try again.

OPatch failed with error code 135

I found the following note on support.oracle.com:

opatch auto failed with message OPatch failed with error code 135 while applying 11.2.0.2 Bundle Patch 3 [ID 1283954.1]

It essentially says that an OCM configuration file must be created. In this environment, however, ocm.rsp already existed and the error still occurred.

The cause turned out to be simple: ocm.rsp had the wrong owner. It must be owned by the oracle user:

chown oracle:oinstall ocm.rsp

After that, the patch applied successfully.
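Because the error claims the file does not exist even when it is there, checking the response file's owner before patching can save time. A minimal sketch, assuming the patch is applied as the oracle user and that parsing ls -ld output is acceptable (it avoids stat options that differ between Linux and HP-UX); the function names are hypothetical.

```shell
# file_owner FILE: print the owning user name (field 3 of "ls -ld").
file_owner() {
  ls -ld -- "$1" | awk '{ print $3 }'
}

# check_owner FILE USER: return 0 if FILE is owned by USER, 1 if owned by
# someone else (printing a chown hint), 2 if FILE does not exist.
check_owner() {
  if [ ! -e "$1" ]; then
    echo "$1: no such file" >&2
    return 2
  fi
  local owner
  owner=$(file_owner "$1")
  if [ "$owner" = "$2" ]; then
    return 0
  fi
  echo "$1 is owned by '$owner', expected '$2'; fix with: chown $2:oinstall $1" >&2
  return 1
}

# Example (use the ocm.rsp path you pass to opatch):
#   check_owner /path/to/ocm.rsp oracle
```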

Posted by sundog315 on 2012.02.13 14:49 :: Category: ( Oracle ) :: Views: (159) :: Permanent link
===========================================================
unable to get oracle owner for
===========================================================

When applying a PSU with opatch, it is easy to run into the following error:

2012-02-13 10:42:25: Command output:
> 数据库唯一名称: xxx
> 数据库名: xxx
> Oracle 主目录: /u01/app/oracle/product/11.2.0/dbhome_1
> Oracle 用户: oracle
> Spfile: +DATA/xxx/spfilegpp.ora
> 域:
> 启动选项: open
> 停止选项: immediate
> 数据库角色: PRIMARY
> 管理策略: AUTOMATIC
> 服务器池: xxx
> 数据库实例: xxx1,xxx2
> 磁盘组: DATA
> 装载点路径:
> 服务:
> 类型: RAC
> 数据库是管理员管理的
>End Command output
2012-02-13 10:42:25: output is
2012-02-13 10:42:25: Oracle home for database gpp is
2012-02-13 10:42:25: Oracle Home is configured with Database(s)-> gpp
2012-02-13 10:42:25: unable to get oracle owner for

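The main cause is that the LANG environment variable is set to Chinese, so the command output that opatch parses is localized (note the Chinese labels above) and opatch cannot reliably determine ORACLE_HOME from it (the "Oracle home for database gpp is" line is empty). This arguably counts as an opatch bug, but Oracle classifies it only as a PROBLEM.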

OPATCH AUTO Fails with "unable to get oracle owner for" in Multi-Byte Language Environment [ID 1325256.1]

export LANG=C

After setting this and reapplying the patch, the error no longer occurs. Noting it here so I don't forget again.
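To avoid changing the locale for the whole session, the C locale can be applied to a single command. A minimal sketch; it also sets LC_ALL=C, which the post did not, because a non-empty LC_ALL overrides LANG and would otherwise keep the localized output. The patch path in the example is a placeholder.

```shell
# with_c_locale CMD [ARGS...]: run one command under the C locale so that the
# utilities it calls print untranslated (English) messages.
with_c_locale() {
  LANG=C LC_ALL=C "$@"
}

# Example (placeholder patch directory):
#   with_c_locale opatch auto /path/to/unzipped/patch
```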

Posted by sundog315 on 2012.02.13 14:36 :: Category: ( Oracle ) :: Views: (143) :: Permanent link
===========================================================
Exporting directly to a compressed file with EXP, and importing directly from one with IMP
===========================================================

Before 10g, and even on 10g, many small, less critical systems still use EXP/IMP logical exports as their backup, or as a secondary backup.
The usual procedure is:
1. Export with exp
2. Compress with gzip
3. Decompress with gzip
4. Import with imp

This has two drawbacks:
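1. It uses a lot of disk space: free space must exceed the size of the uncompressed export plus the compressed file. With a scheduled daily export, the backup can easily fail for lack of space. And because disk usage swings so sharply with this approach, even monitoring tools cannot provide a useful trend analysis.
2. It wastes system resources: during the export the process mostly waits on I/O, and during compression it mostly waits on CPU, so overall utilization is low.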

Is there a way to export straight into a compressed file, and import straight from it?

EXP export:
$ mknod p p

$ gzip < p > test.dmp.gz & exp system/xxxx tables=TEST buffer=31457280 CONSISTENT=Y COMPRESS=N file=p
[3] 24532

Export: Release 10.2.0.5.0 - Production on Thu Jan 19 10:27:45 2012

Copyright (c) 1982, 2007, Oracle. All rights reserved.


Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Data Mining and Real Application Testing options
Export done in ZHS16GBK character set and AL16UTF16 NCHAR character set

About to export specified tables via Conventional Path ...
Current user changed to SYSTEM
. . exporting table TEST 1875063 rows exported
Export terminated successfully without warnings.
[1] Done gzip < p > test.dmp.gz
[2]- Done gzip < p > test.dmp.gz
[3]+ Done gzip < p > test.dmp.gz

$ rm -rf p

IMP import:
$ mknod p p

$ gunzip < test.dmp.gz > p & imp system/xxx file=p full=y buffer=31457280
[2] 24572

Import: Release 10.2.0.5.0 - Production on Thu Jan 19 10:29:16 2012

Copyright (c) 1982, 2007, Oracle. All rights reserved.


Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Data Mining and Real Application Testing options

Export file created by EXPORT:V10.02.01 via conventional path
import done in ZHS16GBK character set and AL16UTF16 NCHAR character set
. importing SYSTEM's objects into SYSTEM
. . importing table "TEST" 1875063 rows imported
Import terminated successfully without warnings.
[1] Done gzip < p > test.dmp.gz
[2]+ Done gunzip < test.dmp.gz > p
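The same trick can be wrapped in two small functions so that the FIFO is created, waited on and removed every time, which matters for a daily scheduled job. A minimal sketch, assuming a POSIX shell with mkfifo (equivalent to the mknod p p above) and that the wrapped command opens the FIFO whenever it succeeds; gz_write, gz_read and the pipe file names are hypothetical. If the command fails, the background gzip is killed so the script does not hang on a FIFO that was never opened.

```shell
# gz_write FIFO OUT.gz CMD [ARGS...]
# Create FIFO, compress everything CMD writes into it to OUT.gz, clean up.
# CMD must itself be told to write to FIFO (e.g. exp ... file=FIFO).
gz_write() {
  local fifo=$1 out=$2 gz_pid rc gz_rc
  shift 2
  rm -f "$fifo"
  mkfifo "$fifo" || return 1
  gzip -c < "$fifo" > "$out" &    # blocks until CMD opens the FIFO
  gz_pid=$!
  "$@"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    kill "$gz_pid" 2>/dev/null     # CMD may never have opened the FIFO
  fi
  wait "$gz_pid"
  gz_rc=$?
  rm -f "$fifo"
  [ "$rc" -eq 0 ] && [ "$gz_rc" -eq 0 ]
}

# gz_read IN.gz FIFO CMD [ARGS...]
# Decompress IN.gz into FIFO while CMD reads it (e.g. imp ... file=FIFO).
gz_read() {
  local in=$1 fifo=$2 gz_pid rc gz_rc
  shift 2
  [ -r "$in" ] || return 1         # otherwise CMD would wait forever for data
  rm -f "$fifo"
  mkfifo "$fifo" || return 1
  gzip -dc < "$in" > "$fifo" &     # blocks until CMD opens the FIFO
  gz_pid=$!
  "$@"
  rc=$?
  if [ "$rc" -ne 0 ]; then
    kill "$gz_pid" 2>/dev/null
  fi
  wait "$gz_pid"
  gz_rc=$?
  rm -f "$fifo"
  [ "$rc" -eq 0 ] && [ "$gz_rc" -eq 0 ]
}

# Usage with the commands from this post:
#   gz_write ./exp.pipe test.dmp.gz exp system/xxxx tables=TEST buffer=31457280 CONSISTENT=Y COMPRESS=N file=./exp.pipe
#   gz_read test.dmp.gz ./imp.pipe imp system/xxx file=./imp.pipe full=y buffer=31457280
```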

Posted by sundog315 on 2012.01.19 10:33 :: Category: ( Oracle ) :: Views: (213) :: Permanent link
===========================================================
A simple problem that took half an hour to track down; noting it here
===========================================================

Unable To Connect To ASM Due To SQL*Plus Shows “Connected To An Idle Instance.” [ID 1179825.1]

--------------------------------------------------------------------------------

Modified 30-SEP-2010 Type PROBLEM Status PUBLISHED

--------------------------------------------------------------------------------

Applies to:
Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Information in this document applies to any platform.

Symptoms

1) On a new ASM installation/configuration (Standalone or RAC) you are not able to connect to the ASM instance due to the SQL*Plus shows “Connected to an idle instance.”


[grid@dbaasm ~]$ sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 18 09:20:55 2010

Copyright (c) 1982, 2009, Oracle. All rights reserved.

Connected to an idle instance.

2) You confirmed the ASM installation/configuration successfully completed.

3) Also, the ASM instance is up and running:


[grid@dbaasm ~]$ ps -fea | grep asm_
grid 9226 1 0 Aug11 ? 00:00:00 asm_asmb_+ASM
grid 27656 1 0 May14 ? 00:02:41 asm_pmon_+ASM
grid 27658 1 0 May14 ? 00:00:00 asm_vktm_+ASM
grid 27662 1 0 May14 ? 00:00:01 asm_gen0_+ASM
grid 27664 1 0 May14 ? 00:00:13 asm_diag_+ASM
grid 27666 1 0 May14 ? 00:00:02 asm_psp0_+ASM
grid 27668 1 0 May14 ? 00:45:27 asm_dia0_+ASM
grid 27670 1 0 May14 ? 00:00:03 asm_mman_+ASM
grid 27672 1 0 May14 ? 00:00:05 asm_dbw0_+ASM
grid 27674 1 0 May14 ? 00:00:05 asm_lgwr_+ASM
grid 27676 1 0 May14 ? 00:00:03 asm_ckpt_+ASM
grid 27678 1 0 May14 ? 00:00:06 asm_smon_+ASM
grid 27680 1 0 May14 ? 00:00:07 asm_rbal_+ASM
grid 27682 1 0 May14 ? 00:12:19 asm_gmon_+ASM
grid 27684 1 0 May14 ? 00:00:06 asm_mmon_+ASM
grid 27686 1 0 May14 ? 00:00:50 asm_mmnl_+ASM
grid 28051 1 0 May18 ? 00:00:00 asm_vbg0_+ASM
grid 28306 1 0 May18 ? 00:05:32 asm_vdbg_+ASM
grid 28308 1 0 May18 ? 00:00:00 asm_vmb0_+ASM
grid 28310 1 0 May18 ? 00:00:00 asm_vbg1_+ASM
grid 28312 1 0 May18 ? 00:00:00 asm_vbg2_+ASM

4) You started the ASM instance with the same OS user used to install the ASM Oracle Home (10gR2 or 11gR1) or Grid Infrastructure Home (11gR2), so this is OK.

Cause
1) The environment variables are set as follow:


[grid@dbaasm ~]$ env | grep ORA
ORACLE_SID=+ASM
ORACLE_BASE=/u01/app/grid
ORACLE_HOME=/u01/app/grid/product/11.2.0/grid/

2) The problem is due to the ORACLE_HOME variable has an extra ‘/’ at the end of the full path:


ORACLE_HOME=/u01/app/grid/product/11.2.0/grid/ <(====


Solution
Remove the extra ‘/’ at the end of the full path, then you will be able to connect to the ASM instance:

[grid@dbaasm ~]$ export ORACLE_HOME=/u01/app/grid/product/11.2.0/grid
[grid@dbaasm ~]$ echo $ORACLE_HOME
/u01/app/grid/product/11.2.0/grid
[grid@dbaasm ~]$ sqlplus "/as sysasm"

SQL*Plus: Release 11.2.0.1.0 Production on Wed Aug 18 09:27:20 2010

Copyright (c) 1982, 2009, Oracle. All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - Production
With the Automatic Storage Management option


SQL> show parameter instance_name

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
instance_name string +ASM
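If ORACLE_HOME is set in a login script, the trailing slash can be guarded against once and for all. A minimal sketch; the helper name is hypothetical, and it would be called from the grid user's profile after ORACLE_HOME is set.

```shell
# strip_trailing_slashes PATH: print PATH without trailing "/" characters,
# leaving the root directory "/" unchanged.
strip_trailing_slashes() {
  local p=$1
  while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do
    p=${p%/}
  done
  printf '%s\n' "$p"
}

# Example, e.g. at the end of the grid user's ~/.bash_profile:
#   ORACLE_HOME=$(strip_trailing_slashes "$ORACLE_HOME"); export ORACLE_HOME
```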


Posted by sundog315 on 2011.12.29 11:35 :: Category: ( Oracle ) :: Views: (193) :: Permanent link
===========================================================
Loss of the VIP/public IP network takes down the instance
===========================================================
Oracle 10.2.0.1 HP-UX 11.31 ia64

alert.log:

Mon Dec 26 14:11:16 2011
Shutting down instance (abort)
License high water mark = 486
Instance terminated by USER, pid = 1675
syslog:
Dec 26 14:10:14 wandadb1 cmnetd[29338]: lan0 is down at the data link layer.
Dec 26 14:10:14 wandadb1 cmnetd[29338]: lan0 failed.
Dec 26 14:10:14 wandadb1 cmnetd[29338]: Subnet 10.0.4.0 down
Dec 26 14:10:36 wandadb1 cmnetd[29338]: lan0 is up at the data link layer.
Dec 26 14:10:36 wandadb1 cmnetd[29338]: lan0 recovered.
Dec 26 14:10:36 wandadb1 cmnetd[29338]: Subnet 10.0.4.0 up
Dec 26 14:10:42 wandadb1 cmnetd[29338]: 10.0.4.161 failed.
Dec 26 14:10:42 wandadb1 cmnetd[29338]: lan0 is down at the IP layer.
Dec 26 14:10:42 wandadb1 cmnetd[29338]: lan0 failed.
Dec 26 14:10:42 wandadb1 cmnetd[29338]: Subnet 10.0.4.0 down
Dec 26 14:10:56 wandadb1 cmnetd[29338]: lan0 is down at the data link layer.
Dec 26 14:11:58 wandadb1 cmnetd[29338]: lan0 is up at the data link layer.
Dec 26 14:11:58 wandadb1 cmnetd[29338]: lan0 is still down at the IP layer.
Dec 26 14:12:04 wandadb1 cmnetd[29338]: 10.0.4.161 recovered.
Dec 26 14:12:04 wandadb1 cmnetd[29338]: Subnet 10.0.4.0 up
Dec 26 14:12:04 wandadb1 cmnetd[29338]: lan0 is up at the IP layer.
Dec 26 14:12:04 wandadb1 cmnetd[29338]: lan0 recovered.

crsd.log
2011-12-26 14:11:08.785: [ CRSAPP][4060] CheckResource error for ora.wandadb1.vip error code = 1
2011-12-26 14:11:08.792: [ CRSRES][4060] In stateChanged, ora.wandadb1.vip target is ONLINE
2011-12-26 14:11:08.793: [ CRSRES][4060] ora.wandadb1.vip on wandadb1 went OFFLINE unexpectedly
2011-12-26 14:11:08.793: [ CRSRES][4060] StopResource: setting CLI values
2011-12-26 14:11:08.817: [ CRSRES][4060] Attempting to stop `ora.wandadb1.vip` on member `wandadb1`
2011-12-26 14:11:09.348: [ CRSRES][4060] Stop of `ora.wandadb1.vip` on member `wandadb1` succeeded.
2011-12-26 14:11:09.349: [ CRSRES][4060] ora.wandadb1.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
2011-12-26 14:11:09.357: [ CRSRES][4060] ora.wandadb1.vip failed on wandadb1 relocating.
2011-12-26 14:11:09.455: [ CRSRES][4060] StopResource: setting CLI values
2011-12-26 14:11:09.458: [ CRSRES][4060] Attempting to stop `ora.wandadb1.LISTENER_WANDADB1.lsnr` on member `wandadb1`
2011-12-26 14:11:09.680: [ OCRSRV][29]th_select_handler: Failed to retrieve procctx from ht. constr = [44541328] retval lht [-27] Signal CV.
2011-12-26 14:11:10.040: [ CRSRES][4060] Stop of `ora.wandadb1.LISTENER_WANDADB1.lsnr` on member `wandadb1` succeeded.
2011-12-26 14:11:10.041: [ CRSRES][4060] StopResource: setting CLI values
2011-12-26 14:11:10.047: [ CRSRES][4060] Attempting to stop `ora.ufsa8.ufsa81.inst` on member `wandadb1`
2011-12-26 14:11:24.934: [ CRSRES][4060] Stop of `ora.ufsa8.ufsa81.inst` on member `wandadb1` succeeded.

Should the Database Instance Be Brought Down after VIP service crashes? [ID 391454.1]
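The crsd.log above shows CRS stopping the listener and then the instance right after the VIP failed. To see which resources a given resource depends on in 10g Clusterware, you can look at the REQUIRED_RESOURCES attribute of its profile. A minimal sketch, assuming the NAME=value profile format printed by crs_stat -p; the helper name is hypothetical.

```shell
# required_resources: read a CRS resource profile (crs_stat -p output) on
# stdin and print each entry of REQUIRED_RESOURCES on its own line.
required_resources() {
  awk -F= '$1 == "REQUIRED_RESOURCES" {
             n = split($2, r, " ")
             for (i = 1; i <= n; i++) print r[i]
           }'
}

# Example, using resource names from the crsd.log above:
#   crs_stat -p ora.ufsa8.ufsa81.inst | required_resources
#   crs_stat -p ora.wandadb1.LISTENER_WANDADB1.lsnr | required_resources
```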
Posted by sundog315 on 2011.12.27 07:53 :: Category: ( Oracle ) :: Views: (175) :: Permanent link
===========================================================
Synchronizing 10g and 11g databases via database link / IMPDP fails
===========================================================

Source database 10.2.0.1, target database 11.2.0.3.

impdp system/"xxxx" network_link=wdyx_prod schemas=wd_web,wanda parallel=4 TABLE_EXISTS_ACTION=REPLACE directory=dumpdir logfile=wdyx_trans.log VERSION=10.2

Import: Release 11.2.0.3.0 - Production on Tue Dec 20 23:13:02 2011

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning option
ORA-39006: internal error
ORA-39113: Unable to determine database version
ORA-04052: error occurred when looking up remote object SYS.DBMS_UTILITY@WDYX_PROD
ORA-00604: error occurred at recursive SQL level 3
ORA-06544: PL/SQL: internal error, arguments: [55916], [], [], [], [], [], [], []
ORA-06553: PLS-801: internal error [55916]
ORA-02063: preceding 2 lines from WDYX_PROD

ORA-39097: Data Pump job encountered unexpected error -4052

Posted by sundog315 on 2011.12.20 16:25 :: Category: ( Oracle ) :: Views: (201) :: Permanent link
===========================================================
Follow-up to the previous post: the problem may be a newly reported Oracle bug
===========================================================
11.2.0.3 VIP/SCAN VIP is Not Pingable After Failover Leads to Connection Issue (Doc ID 1379498.1)
Posted by sundog315 on 2011.12.20 16:24 :: Category: ( Oracle ) :: Views: (169) :: Permanent link
===========================================================
A strange case: RAC public IP reachable, VIP not
===========================================================

A Linux 11gR2 RAC.

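The public IPs, VIPs and SCAN addresses are all on the 10.199.88.0 subnet. After the servers boot, everything looks normal from the servers themselves and from other hosts on 10.199.88.0.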

From outside 10.199.88.0, however, the RAC's public IPs are reachable but the VIPs and SCAN addresses are not.

Running the following on both RAC servers

/sbin/arping -s vip gateway

made the RAC look normal from every subnet.

The root cause is still to be investigated.
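When several addresses need re-announcing, the arping -s vip gateway step can be looped, with each node announcing only the addresses it currently hosts. A minimal sketch, assuming the iputils arping options -c (count), -I (interface) and -s (source address); the interface name and addresses in the example are placeholders, and DRY_RUN=1 only prints the commands.

```shell
# announce_ips IFACE GATEWAY IP...
# For each IP, send a few ARP requests to GATEWAY using that IP as the source,
# so the gateway refreshes its IP -> MAC entry. With DRY_RUN=1 the commands
# are printed instead of executed.
announce_ips() {
  local iface=$1 gw=$2 ip
  shift 2
  for ip in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo /sbin/arping -c 3 -I "$iface" -s "$ip" "$gw"
    else
      /sbin/arping -c 3 -I "$iface" -s "$ip" "$gw" || return 1
    fi
  done
}

# Example (run as root; eth0 and the addresses are placeholders):
#   announce_ips eth0 10.199.88.1 <vip> <scan_vip>
```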

Posted by sundog315 on 2011.12.15 06:21 :: Category: ( Oracle ) :: Views: (172) :: Permanent link
===========================================================
Configuring NIC bonding on a Linux server
===========================================================

Noting this down so I don't forget.

1. Create /etc/sysconfig/network-scripts/ifcfg-bond0

DEVICE=bond0
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
NETMASK=255.255.255.0
IPADDR=10.199.88.13
GATEWAY=10.199.88.1
USERCTL=no
IPV6INIT=no

2. Edit ifcfg-eth0 and ifcfg-eth1

DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
MASTER=bond0
SLAVE=yes
USERCTL=no
IPV6INIT=no


DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
TYPE=Ethernet
MASTER=bond0
SLAVE=yes
USERCTL=no
IPV6INIT=no

3. Edit /etc/modprobe.conf and add:

alias bond0 bonding
options bond0 miimon=100 mode=0

mode=0: load balancing (round-robin)

mode=1: HA (active-backup)

4. Edit /etc/rc.local and add:

ifenslave bond0 eth0 eth1


Reboot the server for the changes to take effect; it is also best to disable sendmail.
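After the reboot, the bonding driver's status file shows whether both NICs actually joined bond0. A minimal sketch, assuming the usual /proc/net/bonding/bond0 layout in which each "Slave Interface:" line is followed by that slave's "MII Status:" line; the helper name is hypothetical.

```shell
# bond_slaves [FILE]: print "<slave> <MII status>" for each slave listed in
# the bonding status file (default /proc/net/bonding/bond0).
bond_slaves() {
  awk -F': ' '
    $1 == "Slave Interface"           { slave = $2; next }
    $1 == "MII Status" && slave != "" { print slave, $2; slave = "" }
  ' "${1:-/proc/net/bonding/bond0}"
}

# Example after the reboot; every slave should report "up":
#   bond_slaves
```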


Posted by sundog315 on 2011.12.14 18:09 :: Category: ( Linux ) :: Views: (197) :: Permanent link
===========================================================
Session stuck waiting on the kksfbc child completion event
===========================================================

HP-UX 11.31

Oracle 10.2.0.1 RAC

One session's wait event stayed at "kksfbc child completion". A search suggests it may be a bug:

Bug 6795880; the related Doc ID is 6795880.8:

A session may go into an infinite spin just after a wait for 'kksfbc child completion'. The spin occurs with a stack including kksSearchChildList -> kkshgnc where kksSearchChildList loops forever.
This problem can also lead to internal error such as any of
ORA-600 [kksSearchChildList1], ORA-600 [kksSearchChildList2]
ORA-600 [kksSearchChildList3], ORA-600 [kkshgnc-nextchild]
Note:
Fixes for this bug in 10g and 11gR1 are disabled by default.
To enable this fix you must explicitly set the following parameter for instance startup:
"_cursor_features_enabled" = 10

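The parameter is normally stored in the spfile and picked up at the next instance startup. A minimal sketch that prints the statement for all RAC instances, assuming the database uses an spfile; underscore parameters should only be set as directed by Oracle Support, and the helper name is hypothetical.

```shell
# hidden_param_sql NAME VALUE: print an ALTER SYSTEM statement that stores an
# underscore parameter in the spfile for every instance (SID='*'). It takes
# effect only after the instances are restarted.
hidden_param_sql() {
  printf 'ALTER SYSTEM SET "%s"=%s SCOPE=SPFILE SID=%s;\n' "$1" "$2" "'*'"
}

# Example (as SYSDBA on one node, then restart all instances):
#   hidden_param_sql _cursor_features_enabled 10 | sqlplus -s "/ as sysdba"
```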
Posted by sundog315 on 2011.12.14 18:06 :: Category: ( Oracle ) :: Views: (225) :: Permanent link
===========================================================
遭遇Bug 4414666 OERI[KGHALO4] can occur on NUMA
===========================================================

A system with more than its share of troubles :)

HP-UX 11.31 IA64, Oracle 10.2.0.1, two-node RAC

The database cluster restarted by itself. The alert log showed ORA-00600 errors:

Mon Nov 14 15:43:53 2011
Errors in file /u01/app/oracle/admin/ufsa8/udump/ufsa81_ora_4531.trc:
ORA-00600: internal error code, arguments: [KGHALO4], [0xC0000000BD0CD060], [], [], [], [], [], []
Mon Nov 14 15:43:58 2011
Trace dumping is performing id=[cdmp_20111114154358]
Mon Nov 14 15:43:58 2011
Errors in file /u01/app/oracle/admin/ufsa8/udump/ufsa81_ora_4531.trc:
ORA-00600: internal error code, arguments: [600], [], [], [], [], [], [], []
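When an alert log contains many ORA-00600 entries, counting them by first argument (here KGHALO4 and 600) is a quick way to decide which ones to look up on support.oracle.com. A minimal sketch, assuming the "ORA-00600: internal error code, arguments: [...]" format shown above; the function name is hypothetical.

```shell
# ora600_args [ALERT_LOG...]: count ORA-00600 errors by first argument.
# Reads the named files, or stdin when no file is given.
ora600_args() {
  sed -n 's/.*ORA-00600: internal error code, arguments: \[\([^]]*\)].*/\1/p' "$@" |
    sort | uniq -c
}

# Example (placeholder path):
#   ora600_args /path/to/alert_<SID>.log
```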

support.oracle.com lists quite a few bugs related to this, so the exact cause could not be pinned down right away.

So I checked the trace files and found the following in ufsa81_ora_4531.trc:


Posted by sundog315 on 2011.11.14 16:29 :: Category: ( Oracle ) :: Views: (253) :: Permanent link
===========================================================
Puzzling HP-UX load averages
===========================================================

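The load average is a good measure of system load: it counts the length of the CPU run queue (running plus waiting processes) and is independent of CPU utilization. When the queue length stays above the number of CPUs for a long time, the CPUs are overloaded, and either the application needs tuning or more hardware is needed.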

Wikipedia describes load averages here:

http://en.wikipedia.org/wiki/Load_averages

According to that article, if load average / CPU_NUM exceeds 1, the processors cannot keep up with all requests in time.

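On HP-UX 11.31 IA64, however, what I observed did not match. On a 16-core server the load average was 1.5; by the standard calculation, 1.5/16, the server is lightly loaded with plenty of CPU to spare. Yet the CPU run queue was as long as 25, which means it was overloaded, and 25/16 is roughly 1.5. It looks as if the reported 1.5 had already been divided by CPU_NUM.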
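The comparison in the paragraph above is easy to reproduce. A minimal sketch of the per-CPU normalization described in the Wikipedia article; the run-queue length would come from, for example, the runq-sz column of sar -q, and the helper name is hypothetical.

```shell
# per_cpu_load LOAD NCPU: print LOAD / NCPU with two decimals.
# By the convention described above, values above 1 mean more runnable work
# than CPUs.
per_cpu_load() {
  awk -v l="$1" -v n="$2" 'BEGIN { printf "%.2f\n", l / n }'
}

# The numbers from this post (16 cores):
#   per_cpu_load 1.5 16   # reported load average -> 0.09
#   per_cpu_load 25 16    # observed run queue    -> 1.56
```

The 1.56 from the run queue is close to the 1.5 that HP-UX reported, which is why the reported value looks as if it had already been divided by the CPU count.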

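So I called HP's 800 support number. The HP engineer did not seem very clear on the concept of load average either and had to check the documentation. When he called back, his explanation matched Wikipedia's, but it still did not match what I observed.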

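Most people like to judge CPU load by CPU utilization, but utilization has a major limitation: once the CPUs are saturated it stays at 100%, so a 2x overload cannot be distinguished from a 20x overload.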

This needs more digging.


Posted by sundog315 on 2011.11.14 10:00 :: Category: ( Miscellaneous ) :: Views: (193) :: Permanent link