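Fixing WebLogic 10.3.0 hangs on AIX 6.1 with JDK 1.6
Last week I installed WebLogic 10.3.0 on AIX 6.1 and configured an HACMP cluster environment. Over the next few days the server kept hanging, and I ended up putting in a day of overtime because of it.
Symptoms:
After WebLogic starts, it hangs within 10 to 30 minutes, and neither the application nor the administration console can be reached. After a forced kill -9 pid the port is not released, and checking the port with the rmsock command shows: Wait for exiting processes to be cleaned up before removing the socket.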
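To check which socket is still holding the port, the PCB address can be taken from the netstat -Aan output and handed to rmsock. The following is only a minimal sketch: it assumes the usual AIX column layout (PCB address in column 1, local address in column 5), the listen port 7001 is an assumed example, and the helper name pcb_for_port is made up for illustration.

```shell
#!/bin/sh
# pcb_for_port PORT
# Reads `netstat -Aan` output on stdin and prints the PCB address (column 1)
# of every socket whose local address (column 5) ends in ".PORT".
# Assumed layout: PCB/ADDR Proto Recv-Q Send-Q Local-Address Foreign-Address (state)
pcb_for_port() {
    awk -v port="$1" '$5 ~ ("\\." port "$") { print $1 }'
}

# Usage on the AIX host (7001 is an assumed listen port):
#   netstat -Aan | pcb_for_port 7001
#   rmsock <address printed above> tcpcb
```

The filter anchors on the dot before the port number, so a socket on port 17001 is not mistaken for one on port 7001.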
Analysis and troubleshooting
1. Use ps -ef | grep java to find the WebLogic process, then run kill -3 pid every three minutes; this generates javacore files in the domain directory.
2. Analyze the WebLogic log, which contains the following:
<Aug 21, 2009 4:33:37 AM CDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "620" seconds working on the request
"weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
……
<Aug 21, 2009 4:34:37 AM CDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "680" seconds working on the request
"weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:
java.net.SocketOutputStream.socketWrite0(Native Method)
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)
……
3. Use IBM Thread and Monitor Dump Analyzer for Java to analyze the thread dumps just generated. The following two threads stand out:
3XMTHREADINFO "[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'" TID:0x39CBED00, j9thread_t:0x3751C83C, state:R, prio=5
3XMTHREADINFO1 (native thread ID:0xCE1DB, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE at java/net/PlainSocketImpl.socketClose0(Native Method)
4XESTACKTRACE at java/net/PlainSocketImpl.socketPreClose(PlainSocketImpl.java:706)
4XESTACKTRACE at java/net/PlainSocketImpl.close(PlainSocketImpl.java:540)
4XESTACKTRACE at java/net/SocksSocketImpl.close(SocksSocketImpl.java:1041)
4XESTACKTRACE at java/net/Socket.close(Socket.java:1343)
4XESTACKTRACE at weblogic/socket/SocketMuxer.closeSocket(SocketMuxer.java:475)
4XESTACKTRACE at weblogic/socket/SocketMuxer.cancelIo(SocketMuxer.java:813)
4XESTACKTRACE at weblogic/socket/SocketMuxer$TimerListenerImpl.timerExpired(SocketMuxer.java:1021(Compiled Code))
4XESTACKTRACE at weblogic/timers/internal/TimerImpl.run(TimerImpl.java:273(Compiled Code))
4XESTACKTRACE at weblogic/work/SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516(Compiled Code))
4XESTACKTRACE at weblogic/work/ExecuteThread.execute(ExecuteThread.java:201(Compiled Code))
4XESTACKTRACE at weblogic/work/ExecuteThread.run(ExecuteThread.java:173)
3XMTHREADINFO "ExecuteThread: '7' for queue: 'weblogic.socket.Muxer'" TID:0x35381D00, j9thread_t:0x35385864, state:R, prio=5
3XMTHREADINFO1 (native thread ID:0xB916F, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.poll(Native Method)
4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.processSockets(PosixSocketMuxer.java:102(Compiled Code))
4XESTACKTRACE at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)
4XESTACKTRACE at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)
4XESTACKTRACE at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)
4XESTACKTRACE at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)
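4. Only these two execute threads are in the running state: one is in close() and the other is in poll(). All other threads are blocked or waiting.
5. After searching Metalink and confirming with the staff on the 800 support line, this turned out to be a long-standing bug of WebLogic on the AIX JVM, one that has shown up in various versions since 8.1.4. The cause is an incompatibility between the low-level socket implementation of the IBM JVM and WebLogic, and the fix is to apply the patch CR370915_1030GA.jar.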
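The dump collection in step 1 and the search for running threads in step 4 can also be scripted. The sketch below is illustrative only: the function names take_thread_dumps and running_threads are made up, the default of three dumps is an assumption, and the javacore parsing relies only on the 3XMTHREADINFO line format shown above.

```shell
#!/bin/sh
# take_thread_dumps PID [COUNT] [INTERVAL_SECONDS]
# Sends kill -3 (SIGQUIT) to PID COUNT times, INTERVAL_SECONDS apart.
# Defaults (3 dumps, 180 seconds) are assumptions based on step 1.
take_thread_dumps() {
    pid=$1
    count=${2:-3}
    interval=${3:-180}
    i=1
    while [ "$i" -le "$count" ]; do
        # kill -0 sends no signal; it only checks that the process exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            echo "process '$pid' not found" >&2
            return 1
        fi
        kill -3 "$pid"
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}

# running_threads JAVACORE_FILE
# Prints the name of every thread whose 3XMTHREADINFO line reports state:R.
running_threads() {
    grep '^3XMTHREADINFO ' "$1" | grep 'state:R,' |
        sed 's/^3XMTHREADINFO *"\(.*\)" TID:.*/\1/'
}

# Usage (the pid and file name are placeholders):
#   take_thread_dumps <pid>
#   running_threads <javacore file in the domain directory>
```

Comparing the output of running_threads across several dumps taken a few minutes apart shows whether the same threads stay in close() and poll() the whole time.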
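Procedure
1. In the WebLogic startup script, find the CLASSPATH line.
2. Add the patch jar as the first entry of the CLASSPATH variable.
E.g.: CLASSPATH="${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}"
-->
CLASSPATH=/path/CR370915_1030GA.jar:"${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}"
3. The change above only affects this one domain. To make it apply to all domains, add the jar at the very front of WEBLOGIC_CLASSPATH= in the commEnv.sh file under the common/bin/ directory.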
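If the start script is edited more than once, it is easy to end up with the patch jar listed twice or no longer in first position. A small helper along these lines keeps the edit idempotent. It is a sketch only: the function name prepend_patch and the jar location in the usage comment are assumptions, and it hard-codes the UNIX separator ":" instead of ${CLASSPATHSEP}.

```shell
#!/bin/sh
# prepend_patch PATCH_JAR CURRENT_CLASSPATH
# Prints CURRENT_CLASSPATH with PATCH_JAR as its first entry.
# If PATCH_JAR is already first, the classpath is printed unchanged.
prepend_patch() {
    patch_jar=$1
    current=$2
    case "$current" in
        "$patch_jar"|"$patch_jar":*) printf '%s\n' "$current" ;;            # already first
        "")                          printf '%s\n' "$patch_jar" ;;          # empty classpath
        *)                           printf '%s\n' "$patch_jar:$current" ;; # normal case
    esac
}

# Hypothetical use in a domain start script (the jar location is an assumption):
#   PATCH_JAR=/opt/bea/patches/CR370915_1030GA.jar
#   [ -r "$PATCH_JAR" ] || echo "patch jar not readable: $PATCH_JAR" >&2
#   CLASSPATH=$(prepend_patch "$PATCH_JAR" "$CLASSPATH")
#   export CLASSPATH
```

The patch has to come first because the JVM loads the first matching class it finds on the classpath, so the patched classes must precede the ones shipped in weblogic.jar.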
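Summary
This bug is fairly common on platforms that combine WebLogic with the IBM JVM. If the log messages shown above appear, it is almost certain that the CR370915 patch is needed.
Update: the patch I have is only for WebLogic 10.3.0.0. For other versions, download it yourself with Smart Update.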
Patches for WLS 8.x can be found in My Oracle Support. Open the Patches & Updates tab. Search for patch ID 8173442 for the patches for WLS 8.1mp3, 8.1mp4, and 8.1mp5. Search for patch ID 8179792 for the patch for WLS 8.1mp6.
Patches for WLS 9.x and higher can be downloaded from Smart Update using these patch IDs and passcodes:
------------------------------------------
PATCH REPOSITORY INFORMATION
------------------------------------------
WLS Version | Patch ID | Passcode
------------+----------+----------
9.2         | T4DV     | 7C7PYV9B
9.2mp1      | HZHQ     | PTUYCCSI
9.2mp2      | WJD2     | GU1CW2AB
9.2mp3      | GNLT     | 8J9L6Q4Y
10.0        | PMAJ     | 9UQ69LLT
10.0mp1     | ITVL     | K8RBHQQ2
10.3        | 9YT5     | I1DB5QSV
If the production machine has no Internet access, you can:
- Download the patch using SmartUpdate on another machine with Internet access.
- Copy the files (for example E5W8.jar and WGQJ.jar) and patch-catalog.xml from your machine with Internet access to the offline machine. For example, say you have a test environment running on a Windows box. Your production environment is running on UNIX. You might copy the jar files from %BEA_HOME%\utils\bsu\cache-dir to $BEA_HOME/utils/bsu/cache-dir.
- When a machine connects to Smart Update, the catalog of patches is always updated automatically. Thus, when a patch is being copied to an offline machine, the patch-catalog.xml file must also be copied over.
- Run SmartUpdate in offline mode and apply patches and patch sets. This can be done using the SmartUpdate command-line interface (see http://download.oracle.com/docs/cd/E14759_01/doc.32/e14143/commands.htm#i1074489).
- This is the syntax for the command to install a patch:.
- You can apply the patch to the offline system manually by extracting the actual patch and adding it to the classpath on the offline system:
- Extract the actual patch jar file. If you downloaded the patch using SmartUpdate, it will be in the form <patch_id>.jar (for example: E5W8.jar). Inside this jar file is the actual patch jar file, which will be of the form CR326566_92mp3.jar. Extract the latter file for the following steps.
- Add the extracted jar file as the first element of the classpath of the Admin server as well as the managed servers in the domain.
- If you are starting servers using the WebLogic startup script, update the classpath in the startup script like this:
  set CLASSPATH=<PATCH_DIR>\jars\CR326566_92mp3.jar;%CLASSPATH% (Windows)
  CLASSPATH=<PATCH_DIR>/jars/CR326566_92mp3.jar:$CLASSPATH (UNIX)
  where PATCH_DIR is the directory on your local machine where you extracted/saved the patch file.
- Similarly, if you are starting servers using Node Manager, add the patch jar to the beginning of the Class Path argument in the Server Start tab for the server(s).
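For the manual route, the only fiddly part is finding the real patch jar inside the downloaded <patch_id>.jar. A minimal sketch, assuming a JDK jar tool is on the PATH and that the inner file follows the CR<number>_<version>.jar naming described above; the helper name inner_patch_entries is made up for illustration:

```shell
#!/bin/sh
# inner_patch_entries
# Reads a jar listing (one entry per line, as printed by `jar tf`) on stdin
# and prints the entries that look like the actual patch jar, that is
# CR<digits>_<version>.jar, either at the top level or in a subdirectory.
inner_patch_entries() {
    grep -E '(^|/)CR[0-9]+_[^/]*\.jar$'
}

# Usage (E5W8.jar is the example patch ID from the text above):
#   jar tf E5W8.jar | inner_patch_entries
#   jar xf E5W8.jar <entry printed above>
```

The extracted file is then the one to put at the front of the classpath, as described in the steps above.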
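I usually use the second approach, which is quick and convenient for a single patch. Smart Update can be installed on its own, but it asks you to choose which BEA home directory to apply patches to, and the patches available for download differ by version and platform. A Windows machine will of course not have an AIX build of BEA installed, but it is enough to create a directory yourself and copy a registry.xml into it.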


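Where can I get this patch, CR370915_1030GA.jar? This problem has caused me a lot of trouble.
If it is convenient, could you send it to my email? Many thanks!
Already sent to you.
Hello, I ran into this problem too. Could you send me a copy of the patch CR370915_1030GA.jar? Thanks!
Hello, could you send me this patch? Thanks. Please send it to my email, zhoulinling@sinobest.cn. Thank you.
Hello, I finally found the cause of this problem, thanks to your analysis. Could you send me a copy of the patch too? Thanks! My email is stevenyj@msn.com
@stevenyj Sent.
Hello, I ran into the same problem. Could you send me a copy of the patch as well? Thanks.
Email address: none1314@gmail.com
Guru Xiong, could you send me this patch package? I hit the same problem as you. Thanks!
Email address: qunfa7987@sina.com
Brother, could you send me the patch package too? I have the same problem. Thanks!
Email address: yangxiaofeng771@yeah.net
Hello, could you please send me a copy of the patch as well? Thank you very much!
Email address: yj_yuan@126.com
Hello, I ran into this problem too. Many thanks to the thread starter for the analysis. I really need the CR370915_1030GA.jar archive. Could the thread starter send me a copy? Thanks! Email address: nd1104@163.com
Patch sent. From now on please say "blogger"; this is not a forum.
Buddy, you are a lifesaver. Please send me a copy, many thanks! Email: godwhung@sohu.com
Blogger, I ran into this problem too. Could you send me a copy? Thank you very much. My environment is: 2003 server sp2 64ED,
xcqxmz@gmail.com. Thanks.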
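This problem should not occur on Windows. Didn't you see that everything here is about AIX? @kualer
Hello, could you please send me a copy of the patch as well? Thank you very much!
Email address: dongfei1983@vip.qq.com
Hello. Could you send me a copy? Thank you very much!
lgh990@21cn.com
@Anonymous Sent to you. Sorry for the delay.
Blogger, I ran into this problem too, on aix6.1+jdk1.6. Thanks!
fzulengbing@163.com
Blogger, my environment is weblogic9.2.2 aix 5 jdk 1.5
Could you send me one?
Email: cx4244@gmail.com
Thanks!
Does 10.3.1 still have this problem? Do you have to go through Metalink both to download patches and to look things up?
@miaomiao Yes, both require Metalink, but I have posted that information here, so you can download the patch directly with Smart Update.
@Anonymous I have not used that one either. Follow the method I described and download it from Smart Update.
My environment is weblogic10.3 jdk 1.6. Please send me one.
Email: jxb8901@gmail.com Thanks!
I happen to have exactly the same environment, Weblogic10.3.0, AIX6.1, JDK1.6, and got hit by this. Could someone kindly send me the patch? dlhunter2006@163.com
I am a victim too, please send me one! Email: 150304428@qq.com. Many thanks, blogger!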
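10.3.1 does not seem to need this patch.
Our environment is aix6.1+java6_64+weblogic 10.3.1
After applying this patch we got errors. The main error messages are:
the listen thread beacause of an error:java.lang.illegalaccesserror
server failed to bind to the configured admin port.the port may already be used by another process
server failed.reason:server failed to bind to any usable port.see preceeding log message for details.>
But as soon as the patch path is removed from CLASSPATH, the error no longer appears. My guess is that 10.3.1 already includes this patch.
However, our test system still hangs occasionally, so we plan to try 10.3.0 with this patch. @miaomiao
@wulolo
This bug was fixed in 10.3.1 and no longer occurs from that version on.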