Archive

Archive for August 2009

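Qixi, the Magpie Festival

August 26, 2009 hashei 2 comments

Reposted from Songshuhui (科学松鼠会), http://songshuhui.net/archives/18485.html, one of my favorite websites: short popular-science pieces written by science people themselves. No deadwood, no dullness, every piece a gem.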

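When you are old and grey and full of sleep,

And nodding by the fire, take down this book,

And slowly read, and dream of the soft look

Your eyes had once, and of their shadows deep;

How many loved your moments of glad grace,

And loved your beauty with love false or true,

But one man loved the pilgrim soul in you,

And loved the sorrows of your changing face;

And bending down beside the glowing bars,

Murmur, a little sadly, how Love fled

And paced upon the mountains overhead

And hid his face amid a crowd of stars.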

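This most beautiful of love poems sings of a thing called "love". Perhaps one day we will all, as the poet Yeats describes, sit on a balcony bathed in twilight and recall the infatuation, the obsession, and the endless entanglements of the past, and still not understand how any of it came to be, and then faded:

Why do we fall in love at first sight? Why do we gaze at each other in silence? Why can we not help ourselves? Why are we caught between love and sex? Why do we lose the ability to love? Why do we stop believing? Why do we keep searching all the same?

Put plainly, what on earth is love? What could love be? Who can give an answer? See this Qixi collection.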

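Mao Lihua, "Three Questions About Love" (《爱情三问》)

Shi Jun, "Some Flowery Talk for Qixi" (《七夕讲点花花事》)

Xiao Zhuang, "Juliet's Menstrual Cycle" (《朱丽叶的生理周期》)

Ling Chen, "Dao Lan" (《刀兰》)

Xiao Zhuang, "Not 10 Years, Not 50 Years, but 21 Years" (《不是10年,不是50年,是21年》)

Links to earlier related articles:

Mu Yao, "I Want Us to Be Together" (《我要我们在一起》)

Liang Jiaxin, "The Evolution of the Perfect Lover" (《完美爱人进化论》)

Riset, "The Way of the Kiss" (《吻之道,知其妙》)

Here is one more repost, a love-EQ test: http://nlp.cn/ceshi/NLPql/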

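Ten couples were recruited. Five are real couples, and the other five are random strangers playing along for fun. Each man says "I love you", and you have to tell whether he means it.

I actually got 3 of them right, which is not bad.

Your score: love EQ level 3

When you finished the test, psychologists around the world panicked; they are now searching for the secret behind a love EQ this low. Reliable analysis suggests you may be possessed by Xu Xian or Dong Yong. One thing is certain: if your understanding of love were copied to everyone else, the number of people requesting "Single Love Song" (《单身情歌》) at KTV each year would skyrocket.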

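Fixing WebLogic 10.3.0 Hangs on AIX 6.1 with JDK 1.6

August 25, 2009 hashei 30 comments

Last week I installed WebLogic 10.3.0 on AIX 6.1 and set up an HACMP cluster environment. Over the next few days the server kept hanging, and I ended up working an extra day because of it.

Symptoms:

After startup, WebLogic hangs within 10 to 30 minutes, and neither the application nor the administration console can be reached. After a forced kill -9 pid the port is not released, and checking the port with the rmsock command shows: Wait for exiting processes to be cleaned up before removing the socket.

Analysis and troubleshooting

1. Find the WebLogic process with ps -ef | grep java, then run kill -3 pid every three minutes to generate javacore files in the domain directory.

2. Analyze the WebLogic log, which contains the following: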

<Aug 21, 2009 4:33:37 AM CDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "620" seconds working on the request

"weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:

java.net.SocketOutputStream.socketWrite0(Native Method)

java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)

……

<Aug 21, 2009 4:34:37 AM CDT> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "680" seconds working on the request

"weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:

java.net.SocketOutputStream.socketWrite0(Native Method)

java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)

……

3. Analyze the thread dumps just generated with IBM Thread and Monitor Dump Analyzer for Java, which turns up the following two threads:

3XMTHREADINFO "[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'" TID:0x39CBED00, j9thread_t:0x3751C83C, state:R, prio=5

3XMTHREADINFO1 (native thread ID:0xCE1DB, native priority:0x5, native policy:UNKNOWN)

4XESTACKTRACE at java/net/PlainSocketImpl.socketClose0(Native Method)

4XESTACKTRACE at java/net/PlainSocketImpl.socketPreClose(PlainSocketImpl.java:706)

4XESTACKTRACE at java/net/PlainSocketImpl.close(PlainSocketImpl.java:540)

4XESTACKTRACE at java/net/SocksSocketImpl.close(SocksSocketImpl.java:1041)

4XESTACKTRACE at java/net/Socket.close(Socket.java:1343)

4XESTACKTRACE at weblogic/socket/SocketMuxer.closeSocket(SocketMuxer.java:475)

4XESTACKTRACE at weblogic/socket/SocketMuxer.cancelIo(SocketMuxer.java:813)

4XESTACKTRACE at weblogic/socket/SocketMuxer$TimerListenerImpl.timerExpired(SocketMuxer.java:1021(Compiled Code))

4XESTACKTRACE at weblogic/timers/internal/TimerImpl.run(TimerImpl.java:273(Compiled Code))

4XESTACKTRACE at weblogic/work/SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516(Compiled Code))

4XESTACKTRACE at weblogic/work/ExecuteThread.execute(ExecuteThread.java:201(Compiled Code))

4XESTACKTRACE at weblogic/work/ExecuteThread.run(ExecuteThread.java:173)

3XMTHREADINFO "ExecuteThread: '7' for queue: 'weblogic.socket.Muxer'" TID:0x35381D00, j9thread_t:0x35385864, state:R, prio=5

3XMTHREADINFO1 (native thread ID:0xB916F, native priority:0x5, native policy:UNKNOWN)

4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.poll(Native Method)

4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.processSockets(PosixSocketMuxer.java:102(Compiled Code))

4XESTACKTRACE at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)

4XESTACKTRACE at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)

4XESTACKTRACE at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)

4XESTACKTRACE at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)

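4. Only these two execute threads are in the running state: one is in close() and the other in poll(). All the others are blocked or waiting.

5. After searching Metalink and confirming with the 800 support line, this turned out to be a long-standing bug of WebLogic on the AIX JVM, which has appeared in various releases since 8.1.4. The cause is an incompatibility between the low-level socket implementation in IBM's JVM and WebLogic, and the fix is to apply the patch CR370915_1030GA.jar.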
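Step 4 above, checking which threads are actually running, can be done by hand in the analyzer, but a rough count is also easy to script. The following is a minimal sketch in Java, assuming the javacore thread headers look like the 3XMTHREADINFO lines quoted above, with plain ASCII quotes and a state: field. The class name JavacoreStates is made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JavacoreStates {
    // Matches thread header lines such as:
    //   3XMTHREADINFO "name" TID:0x..., j9thread_t:0x..., state:R, prio=5
    // The whitespace required after 3XMTHREADINFO skips the 3XMTHREADINFO1 lines.
    private static final Pattern HEADER =
        Pattern.compile("^3XMTHREADINFO\\s+\"(.*)\".*\\bstate:(\\w+)");

    /** Counts threads per state code (R in the excerpt; other codes are counted as they appear). */
    static Map<String, Integer> countByState(List<String> javacoreLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : javacoreLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Feeding it the lines of one javacore file gives a per-state tally. A dump in which only one or two threads are in state R, as in this case, stands out immediately.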

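Procedure

1. In the WebLogic startup script, find the CLASSPATH line.

2. Put the patch jar first in the CLASSPATH variable.
E.g.: CLASSPATH="${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}"
becomes
CLASSPATH=/path/CR370915_1030GA.jar:"${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}"

3. The change above only affects this one domain. To apply it to all domains, add the jar at the very front of WEBLOGIC_CLASSPATH= in the commEnv.sh file under the common/bin/ directory instead.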
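The patch jar has to come first because the class loader uses the first copy of a class it finds on the classpath. One way to check that the ordering took effect is to ask the JVM where a given class was loaded from. The sketch below uses only standard JDK calls. The class name WhereLoaded is made up, and which WebLogic classes the patch actually replaces is not stated in this post, so the class name to check is something you would have to supply yourself.

```java
import java.security.CodeSource;

final class WhereLoaded {
    /** Returns the jar or directory a class was loaded from, or a note if the JVM does not record one. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source: bootstrap or JDK class)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass a fully qualified class name; java.lang.String is only a placeholder default.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " <- " + locationOf(Class.forName(name)));
    }
}
```

Run with the same CLASSPATH as the server, a patched class should report the patch jar rather than the stock WebLogic jar.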

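Summary

This bug is fairly common on platforms that combine WebLogic with IBM's JVM. If the log messages shown above appear, it is almost certain that the CR370915 patch is needed.

Update: the patch I have here is only for WebLogic 10.3.0.0. For other versions you can download it yourself with Smart Update.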

Patches for WLS 8.x can be found in My Oracle Support. Open the Patches & Updates tab. Search for patch ID 8173442 for the patches for WLS 8.1mp3, 8.1mp4, and 8.1mp5. Search for patch ID 8179792 for the patch for WLS 8.1mp6.

Patches for WLS 9.x and higher can be downloaded from Smart Update using these patch IDs and passcodes:

------------------------------------------
PATCH REPOSITORY INFORMATION
------------------------------------------
WLS Version | Patch ID |  Passcode
------------+----------+----------------
9.2      |  T4DV    |  7C7PYV9B
9.2mp1   |  HZHQ    |  PTUYCCSI
9.2mp2   |  WJD2    |  GU1CW2AB
9.2mp3   |  GNLT    |  8J9L6Q4Y
10.0     |  PMAJ    |  9UQ69LLT
10.0mp1  |  ITVL    |  K8RBHQQ2
10.3     |  9YT5    |  I1DB5QSV

If the production machine has no Internet access, you can use one of the following two methods:

1. Using SmartUpdate in offline mode
===========================
You can apply the patch using SmartUpdate with the following steps:
  1. Download the patch using SmartUpdate on another machine with Internet access.
  2. Copy the files (for example E5W8.jar and WGQJ.jar) and patch-catalog.xml from your machine with Internet access to the offline machine. For example, say you have a test environment running on a Windows box. Your production environment is running on UNIX. You might copy the jar files from %BEA_HOME%\utils\bsu\cache-dir to $BEA_HOME/utils/bsu/cache-dir.
  3. When a machine connects to Smart Update, the catalog of patches is always updated automatically. Thus, when a patch is being copied to an offline machine, the patch-catalog.xml file must also be copied over.
  4. Run SmartUpdate in offline mode and apply patches and patch sets. This can be done using the SmartUpdate command-line interface (see http://download.oracle.com/docs/cd/E14759_01/doc.32/e14143/commands.htm#i1074489).
  5. This is the syntax for the command to install a patch:
./bsu.sh -prod_dir=<weblogic_home> -patchlist=<patchID> -verbose -install
For example,
./bsu.sh -prod_dir=/opt/bea/weblogic92 -patchlist=E5W8 -verbose -install
./bsu.sh -prod_dir=/opt/bea/weblogic92 -patchlist=WGQJ -verbose -install
2. Applying the patch to the classpath manually
============================
  1. You can apply the patch to the offline system manually by extracting the actual patch and adding it to the classpath on the offline system. Extract the actual patch jar file. If you downloaded the patch using SmartUpdate, it will be in the form <patch_id>.jar (for example: E5W8.jar). Inside this jar file is the actual patch jar file, which will be of the form CR326566_92mp3.jar. Extract the latter file for the following steps.
  2. Add the extracted jar file as the first element of the classpath of the Admin server as well as the managed servers in the domain.
  3. If you are starting servers using the WebLogic startup script, update the classpath in the startup script like this:
set CLASSPATH=<PATCH_DIR>\jars\CR326566_92mp3.jar;%CLASSPATH% (Windows)
CLASSPATH=<PATCH_DIR>/jars/CR326566_92mp3.jar:$CLASSPATH (UNIX)
where PATCH_DIR is the directory on your local machine where you extracted/saved the patch file.
  4. Similarly, if you are starting servers using Node Manager, add the patch jar to the beginning of the Class Path argument in the Server Start tab for the server(s).

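I usually use the second method, which is quick and convenient for a single patch. SmartUpdate can also be installed on its own, but it asks which BEA home directory to apply to, and the patches available for download differ by version and platform. A Windows machine will of course not have the AIX edition of BEA installed, but you only need to create a directory yourself and copy a register.xml into it.

Categories: weblogic, troubleshooting

Diagnosing a WebSphere Performance Problem

August 24, 2009 hashei no comments

A user once phoned to say that an application's response time shot up as soon as concurrency rose even slightly, and that the administration console had also become so slow that normal work was almost impossible.

On site I found the platform to be WebSphere 6.0.2.29 on Windows 2003 Enterprise Edition SP2. I started with SystemOut.log and found the following errors: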

= com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException Max connections reached 869

Exception = com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException

Source = Max connections reached

probeid = 869

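I also found "Caused by: java.io.IOException: Async IO operation failed, reason: RC: 10053 An established connection was aborted by the software in your host machine."

Clearly the connection pool could not hand a new connection to the request, and the wait exceeded the WaitTimeout. My first thought was that the database connection pool was too small, so I set it to an initial size of 50 and a maximum of 150, with idle connections released automatically after 120 seconds.

The change made no difference: in less than 10 minutes the application was slow again. Looking again at SystemOut.log and the FFDC errors, I found many I/O-related errors: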

com.ibm.ws.webcontainer.channel.WCCByteBufferInputStream 102

Exception = java.net.SocketTimeoutException

Source = com.ibm.ws.webcontainer.channel.WCCByteBufferInputStream

probeid = 102

stack Dump = java.net.SocketTimeoutException: Async operation timed out

java.io.IOException com.ibm.ws.webcontainer.servlet.RequestUtils.parsePostData 398

Exception = java.io.IOException

Source = com.ibm.ws.webcontainer.servlet.RequestUtils.parsePostData

probeid = 398

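Stack Dump = java.io.IOException: Async IO operation failed, reason: RC: 55 The specified network resource or device is no longer available. probeid = 1184

Only afterwards did I learn that the problem actually lay between the database and the middleware. At the time, though, there was no error saying the connection pool was too small, the administration console was nearly unusable as well, and the application uploads and validates many files during operation, so I suspected that a server I/O bottleneck was slowing the application down.

So I started the performance monitor on the server and added counters such as %Disk Time, %Disk Write, %Disk Read, Disk Queue Length, and Page Faults. %Disk Time averaged between 20 and 70 with momentary peaks above 90, while Disk Read was very small: the activity was almost all Disk Write.

I kept watching for a while and found that the application was still slow after disk activity dropped, so I turned to garbage collection to check for a memory leak.

Analysis with IBM Monitoring and Diagnostic Tools for Java™ - Garbage Collection and Memory Visualizer showed that the mean interval between collections was only 0.46 minutes, and that GC started when heap usage was only a little over 250 MB, although the heap was configured at 760 to 1260 MB. I concluded that heavy heap fragmentation was causing frequent GC and slowing the service. The tool also reported: The mean heap unusable due to fragmentation is estimated at 34.685%. I asked the application team, who said the objects they allocate are mostly short-lived, so I changed the GC policy to gencon (generational concurrent). The GC log afterwards showed longer intervals between collections, most of them in the young area, with global collections about 23 minutes apart.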
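The "mean interval between collections" figure is simply elapsed time divided by the number of collections, so 0.46 minutes means a collection roughly every 28 seconds. As a rough cross-check without the visualizer, the same number can be read from inside a running JVM. This is a minimal sketch, assuming a JVM that provides the java.lang.management API (Java 5 or later); on older JVMs the verbose GC log is the only source. The class name GcInterval is made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcInterval {
    /** Mean seconds between collections; NaN when no collection has been counted. */
    static double meanIntervalSeconds(long uptimeMillis, long collections) {
        if (collections <= 0) {
            return Double.NaN;
        }
        return uptimeMillis / 1000.0 / collections;
    }

    public static void main(String[] args) {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 if the collector does not report a count
            System.out.printf("%s: %d collections, %d ms total, mean interval %.1f s%n",
                gc.getName(), count, gc.getCollectionTime(),
                meanIntervalSeconds(uptime, count));
        }
    }
}
```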

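Sadly, all this tuning changed nothing. The real-time performance monitor in the console showed 40 to 60 JDBC connections busy, and at the time I could not tell whether that was normal, that is, whether the application really needed that many connections. Later the application developers examined the database and found many active connections that had been running for 5 to 10 minutes, all executing the same query. It turned out that the application was built on Hibernate, which wrapped the query in its own function, so the existing index could not be used (the application team had made a basic mistake when creating the index). After the index was rebuilt, the query completed quickly, the number of busy pooled connections dropped to 0 to 5, and the application returned to normal. So the root cause was slow database queries: the threads were all waiting for the database to respond, the 100 threads were quickly used up, and no new requests could be served. Because the connection pool had already been enlarged, no pool-related error was reported.

Looking back on this week of troubleshooting, I took a long detour. For lack of experience I did not think to take a thread dump. Had I done so, it should have been easy to see a large number of threads waiting on the database, which would have pointed straight at the database.

This also shows that the root cause may be trivial while the investigation is complicated, because middleware problems touch the host, the network, the database, the application, and more. The lesson is that when an application responds slowly, you should take a thread dump and look at what the threads are doing, rather than waiting for a hang, 100% CPU, or an OutOfMemory error before thinking of analyzing a javacore.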
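What a thread dump would have shown here is many threads sharing the same top stack frame, a socket read inside the JDBC driver. The idea can be sketched in-process with the standard Thread.getAllStackTraces() call. This is an illustration of the grouping only, not a replacement for kill -3 or a javacore, and the class name TopFrames is made up.

```java
import java.util.Map;
import java.util.TreeMap;

final class TopFrames {
    /** Groups live threads by their topmost stack frame and counts them. */
    static Map<String, Integer> countByTopFrame() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            StackTraceElement[] stack = e.getValue();
            if (stack.length == 0) {
                continue; // thread had no Java frames at snapshot time
            }
            String top = stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(top, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Many threads sharing one top frame (for example a socket read) is the pattern to look for.
        countByTopFrame().forEach((frame, n) -> System.out.println(n + "\t" + frame));
    }
}
```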

An Introduction to Server Hangs Caused by Application Deadlock

August 17, 2009 hashei no comments

So all the good stuff has been hiding on Metalink.

Problem Description

An inadvertent deadlock in the application code can cause a server to hang. For example, a situation in which thread1 is waiting for resource1 and is holding a lock on resource2, while thread2 needs resource2 and is holding the lock on resource1. Neither thread can progress.

Problem Troubleshooting

This Application Deadlock pattern should be used only after doing all the steps in the Generic Server Hang pattern. One indicator that this is an application deadlock problem is that thread dumps will show the threads are in the application methods. Several thread dumps taken a few seconds apart will show that the threads are not progressing. Troubleshooting this problem will involve reviewing the application code. There exists a thread analyzer tool at BEA dev2dev which has proven useful in analysis of the thread dumps.
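The thread1/thread2 situation described above is easy to reproduce, and the JVM can report it directly. The following is a minimal sketch, assuming a Java 6 or later JVM where ThreadMXBean.findDeadlockedThreads() is available. The class DeadlockDemo and its method names are made up for illustration, and the threads are daemons so the demo JVM can still exit.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockDemo {
    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startDeadlock() {
        final Object resource1 = new Object();
        final Object resource2 = new Object();
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        // thread1 holds resource2 and wants resource1; thread2 does the reverse.
        startDaemon("thread1", resource2, resource1, bothHoldFirstLock);
        startDaemon("thread2", resource1, resource2, bothHoldFirstLock);
    }

    private static void startDaemon(String name, Object held, Object wanted, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (held) {
                latch.countDown();
                try {
                    latch.await(); // wait until the other thread holds its first lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (wanted) { /* never reached */ }
            }
        }, name);
        t.setDaemon(true);
        t.start();
    }

    /** Polls the JVM for deadlocked threads and returns how many were found before the timeout. */
    static int detect(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads(); // null when no deadlock exists
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }
}
```

Calling startDeadlock() and then detect(5000) should report two deadlocked threads. In a thread dump the same pair shows up as two threads that never move, each waiting for a monitor the other holds.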


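Categories: weblogic, troubleshooting

How to Approach Server Hangs Caused by JDBC

August 16, 2009 hashei 2 comments

This one is also reposted from BEA's official documentation. After Oracle acquired BEA the original address was redirected to the Oracle website, so I am keeping a copy here as a backup.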

JDBC Causes Server Hang


Problem Description
A JDBC connection which is used by an application or by WebLogic Server itself will block one WebLogic Server execute thread for the complete duration of the calls that are made via this connection. The JVM will ensure that the CPU is given to runnable threads by its thread scheduling mechanism, while the thread that blocks on a SQL query needs to wait. However, the thread occupied by the JDBC call will be reserved and used for the application until the call returns from the SQL query.

Even a transaction timeout will not kill or timeout any action that is done by the resources that are enlisted in this transaction. The actions will run as long as they take, without interruption. A transaction timeout will set a flag on the transaction that will mark it as rollback only, so that any subsequent request to commit this transaction will fail with a TimedOutException or RollbackException. However, as mentioned above, the long running JDBC calls can lead to blocked WebLogic Server execute threads, which can finally lead to a hanging instance, if all threads are blocked and no execute thread remains available for handling incoming requests.

More recent WebLogic Server versions have a health check functionality that regularly checks if a thread does not react for a certain period of time (the default is 600 seconds). If this happens, an error message is printed to your log file similar to following:


####<Nov 6, 2004 1:42:30 PM EST> <Warning> <WebLogicServer> <mydomain> <myserver> <CoreHealthMonitor>
<kernel identity> <>
<000337> <ExecuteThread: '64' for queue: 'default' has been busy for "740" seconds working on the request "Scheduled Trigger",
which is more than the configured time (StuckThreadMaxTime) of "600" seconds.>


This does not interrupt the thread, as this is just a notification for the administrator. The only way a stuck thread becomes unstuck again is when the request it is handling finishes. In this case, you will find a message similar to following in your WebLogic Server’s log file:


####<Nov 7, 2004 4:17:34 PM EST> <Info> <WebLogicServer><mydomain> <myserver> <ExecuteThread: '66'
for queue: 'default'>
<kernel identity> <> <000339> <ExecuteThread: '66' for queue: 'default' has become "unstuck".>


The time interval for the health check functionality is configurable. Please check StuckThreadMaxTime property in the <Server> tag of your config.xml file: http://e-docs.bea.com/wls/docs81/config_xml/Server.html#StuckThreadMaxTime or the “Detecting stuck threads” section in the WebLogic Server administration console help: http://e-docs.bea.com/wls/docs81/perform/WLSTuning.html#stuckthread.
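When a log contains many of these stuck-thread messages, pulling out the thread number, queue, and busy time makes the trend easier to see. Below is a minimal sketch of such a parser, assuming the message text matches the examples in this document and uses plain ASCII quotes. The class name StuckThreadLog and the output format are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StuckThreadLog {
    // Matches the core of a 000337 message, for example:
    //   ExecuteThread: '64' for queue: 'default' has been busy for "740" seconds
    private static final Pattern BUSY = Pattern.compile(
        "ExecuteThread: '(\\d+)' for queue: '([^']+)' has been busy for \"(\\d+)\" seconds");

    /** Returns "queue#thread=seconds" for a stuck-thread message, or null for any other line. */
    static String parse(String logLine) {
        Matcher m = BUSY.matcher(logLine);
        return m.find() ? m.group(2) + "#" + m.group(1) + "=" + m.group(3) : null;
    }
}
```

The same thread reported with a growing number of seconds on successive lines is a request that has not finished; a later 000339 "unstuck" message for that thread means it eventually did.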


Problem Troubleshooting
Different programming techniques or JDBC connection pool configurations can lead to deadlocks or long running JDBC calls that lead to hanging WebLogic Server instances. General information about how to troubleshoot and analyze a hanging WebLogic Server instance is provided in Generic Server Hang Pattern.

This pattern addresses JDBC calls causing a server hang and other well known JDBC-related causes for common problems leading to hanging WebLogic Server instance.  Other Support Patterns referenced in this pattern are at the WebLogic Server Support Patterns Site.


Why does the problem occur?
The following are some different possible reasons that can cause JDBC calls to lead to a hanging WebLogic Server instance:


Synchronized DriverManager.getConnection()
Older JDBC application code sometimes uses DriverManager.getConnection() calls to retrieve a database connection using a certain driver. This technique is not recommended as it can cause deadlocks or at least relatively low performance for your connection requests. The reason behind this is, that all DriverManager calls are class-synchronized, meaning that one DriverManager call in one thread will block all other DriverManager calls in any other thread inside one WebLogic Server instance.

In addition to that, the constructor for a SQLException makes a DriverManager call, and most drivers have DriverManager.println() calls for logging, so any of these can block all other threads that issue a DriverManager call.

DriverManager.getConnection() can take a relatively long time until it returns with the physical connection created to the database. Even if no deadlock occurs, all other calls need to wait until that one thread gets its connection. This is not a best practice in a multi-threaded system like WebLogic Server.


This information is taken from http://forums.bea.com/bea//thread.jspa?forumID=2022&threadID=200063365&messageID=202311284&start=-1#202311284.
Also our documentation clearly states that DriverManager.getConnection() should not be used: http://e-docs.bea.com/wls/docs81/faq/jdbc.html#501044.

If you prefer to use JDBC connections in your JDBC code, you should use a WebLogic Server JDBC connection pool, define a DataSource for it, and get the connection from the DataSource. This will give you all advantages from a pool (resource sharing, connection reuse, connection refresh if a database was down, etc). It also will help you avoid the deadlocks that may happen with DriverManager calls. See detailed information on how to use JDBC connection pools, DataSources, and other JDBC objects in WebLogic Server at: http://e-docs.bea.com/wls/docs81/jdbc/intro.html#1036718 and http://e-docs.bea.com/wls/docs81/jdbc/programming.html#1054307.
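The recommended pattern, borrowing a connection from a pooled DataSource and always handing it back, can be sketched as follows. This assumes Java 7 or later for try-with-resources (older JVMs need an explicit try/finally). The class PooledQuery and the SqlWork interface are made up for illustration, and the JNDI name in the comment is hypothetical.

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

final class PooledQuery {
    /** Work to run against a borrowed connection. */
    interface SqlWork<T> {
        T apply(Connection connection) throws SQLException;
    }

    /**
     * Borrows a connection from the pool, runs the work, and always returns the
     * connection to the pool, even when the work throws.
     * In a server, the DataSource would typically come from a JNDI lookup such as
     * new InitialContext().lookup("jdbc/myDataSource") (hypothetical name).
     */
    static <T> T withConnection(DataSource pool, SqlWork<T> work) {
        try (Connection connection = pool.getConnection()) {
            return work.apply(connection);
        } catch (SQLException e) {
            throw new IllegalStateException("database call failed", e);
        }
    }
}
```

Because no DriverManager call is involved, threads asking for connections do not queue up behind the class-level lock described above; closing a pooled connection returns it to the pool rather than tearing down the physical connection.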

A typical thread blocked in a DriverManager.getConnection() call looks like:

"ExecuteThread-39" daemon prio=5 tid=0x401660 nid=0x33 waiting for monitor entry [0xd247f000..0xd247fc68]
  at java.sql.DriverManager.getConnection(DriverManager.java:188)
  at com.bla.updateDataInDatabase(MyClass.java:296)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:865)
  at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:120)
  at weblogic.servlet.internal.ServletContextImpl.invokeServlet(ServletContextImpl.java:945)
  at weblogic.servlet.internal.ServletContextImpl.invokeServlet(ServletContextImpl.java:909)
  at weblogic.servlet.internal.ServletContextManager.invokeServlet(ServletContextManager.java:269)
  at weblogic.socket.MuxableSocketHTTP.invokeServlet(MuxableSocketHTTP.java:392)
  at weblogic.socket.MuxableSocketHTTP.execute(MuxableSocketHTTP.java:274)
  at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:130)


Long Running SQL Queries
Long running SQL queries block execute threads for their duration and until they return their result to the calling application. This means that a WebLogic Server instance needs to be configured to be able to handle enough calls simultaneously as they are requested by the application load. Limiting factors here are the number of execute threads and the number of connections in the JDBC connection pools. A general rule of thumb is to set the number of connections in the pool equal to the number of execute threads to enable optimal resource utilization. If JTS is used, some more connections in the pools should be available because connections may be reserved for transactions that are actually not active.

A thread hanging in a long running SQL call will show a very similar stack in a thread dump as the one for a hanging database. Please compare the next section for details.

Hanging Database
Good database performance is key for the performance of an application that relies on this database. Consequently, a hanging database can block many or all available execute threads in a WebLogic Server instance and finally lead to a hanging server. To diagnose this, you should take 5 to 10 thread dumps from your hanging WebLogic Server instance and check your execute threads (in the default queue or your application thread queue) to see if they are currently in SQL calls and waiting for a result from the database. A typical stack trace for a thread that currently issues a sql query could look similar to following example:


"ExecuteThread: '4' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x8e93c8 nid=0x19 runnable [e137f000..e13819bc]
  at java.net.SocketInputStream.socketRead0(Native Method)
  at java.net.SocketInputStream.read(SocketInputStream.java:129)
  at oracle.net.ns.Packet.receive(Unknown Source)
  at oracle.net.ns.DataPacket.receive(Unknown Source)
  at oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)
  at oracle.net.ns.NetInputStream.read(Unknown Source)
  at oracle.net.ns.NetInputStream.read(Unknown Source)
  at oracle.net.ns.NetInputStream.read(Unknown Source)
  at oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:931)
  at oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:893)
  at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:375)
  at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1983)
  at oracle.jdbc.ttc7.TTC7Protocol.fetch(TTC7Protocol.java:1250)
  - locked <e8c68f00> (a oracle.jdbc.ttc7.TTC7Protocol)
  at oracle.jdbc.driver.OracleStatement.doExecuteQuery(OracleStatement.java:2529)
  at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2857)
  at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:608)
  - locked <e5cc44d0> (a oracle.jdbc.driver.OraclePreparedStatement)
  - locked <e8c544c8> (a oracle.jdbc.driver.OracleConnection)
  at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:536)
  - locked <e5cc44d0> (a oracle.jdbc.driver.OraclePreparedStatement)
  - locked <e8c544c8> (a oracle.jdbc.driver.OracleConnection)
  at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:80)
  at myPackage.query.getAnalysis(MyClass.java:94)
  at jsp_servlet._jsp._jspService(__jspService.java:242)
  at weblogic.servlet.jsp.JspBase.service(JspBase.java:33)
  at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run(ServletStubImpl.java:971)
  at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:402)
  at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:305)
  at weblogic.servlet.internal.RequestDispatcherImpl.include(RequestDispatcherImpl.java:607)
  at weblogic.servlet.internal.RequestDispatcherImpl.include(RequestDispatcherImpl.java:400)
  at weblogic.servlet.jsp.PageContextImpl.include(PageContextImpl.java:154)
  at jsp_servlet._jsp.__mf1924jq._jspService(__mf1924jq.java:563)
  at weblogic.servlet.jsp.JspBase.service(JspBase.java:33)
  at weblogic.servlet.internal.ServletStubImpl$ServletInvocationAction.run(ServletStubImpl.java:971)
  at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:402)
  at weblogic.servlet.internal.ServletStubImpl.invokeServlet(ServletStubImpl.java:305)
  at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:6350)
  at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:317)
  at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:118)
  at weblogic.servlet.internal.WebAppServletContext.invokeServlet(WebAppServletContext.java:3635)
  at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2585)
  at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
  at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)


The thread will be in running state. You should compare the threads in your different thread dumps in order to see if they receive the return from the SQL call in a timely manner or if they hang in this same call for a longer period of time. If the thread dumps seem to imply long response times from SQL calls, the corresponding database logs should be checked to see if problems in the database cause this slow performance or hang situation.


Slow Network
Communication between WebLogic Server and the database relies on a well-performing and reliable network in order to serve the requests in a timely manner. Slow network performance can therefore lead to hanging or blocking execute threads waiting for results of SQL queries. The related stack traces will look similar to example above in Hanging Database section. It is not possible to find the root cause of the hanging or slow SQL queries by solely analyzing the WebLogic Server thread dumps. These give the first hint that something is wrong with the performance of the SQL calls. The next step is to check if there is a database or network problem that causes poorly performing SQL calls.

Deadlock
Both an application level deadlock as well as a deadlock on the database level can lead to hanging threads. You should check your thread dumps to see if there is an application level deadlock. Information on how to do this is provided in Server Hang – Application Deadlock Pattern. A database deadlock can be detected either in the database log or by the SQL Exception that can be found in the WebLogic Server log file. An example for a related SQL Exception is:


java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource
  at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:170)
  at oracle.jdbc.oci8.OCIDBAccess.check_error(OCIDBAccess.java:1614)
  at oracle.jdbc.oci8.OCIDBAccess.executeFetch(OCIDBAccess.java:1225)
  at oracle.jdbc.oci8.OCIDBAccess.parseExecuteFetch(OCIDBAccess.java:1338)
  at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:1722)
  at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1647)
  at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2167)
  at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:404)


As it generally can take some time until a database detects a deadlock and resolves it by rolling back one or more transactions that cause the deadlock, one or more execute threads will be blocked until the rollback has finished.
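Application code that wants to treat a database deadlock differently from other failures (for logging, or to retry the transaction) can recognise it by the vendor error code rather than by parsing the message. This is a minimal sketch, assuming the Oracle driver reports ORA-00060 as vendor code 60 through SQLException.getErrorCode(). The class name OracleDeadlock is made up for illustration.

```java
import java.sql.SQLException;

final class OracleDeadlock {
    private static final int ORA_DEADLOCK = 60; // vendor code assumed for ORA-00060

    /** True if this exception, or any exception chained to it, carries the deadlock code. */
    static boolean isDeadlock(SQLException error) {
        for (SQLException e = error; e != null; e = e.getNextException()) {
            if (e.getErrorCode() == ORA_DEADLOCK) {
                return true;
            }
        }
        return false;
    }
}
```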

RefreshMinutes or TestFrequencySeconds
If you see recurring periods of low database performance, slow SQL calls, or connection peaks, the setting of the RefreshMinutes or TestFrequencySeconds configuration property in your JDBC connection pools could be the reason. This is described in detail in Investigating JDBC Problems Pattern. Unless you have a firewall between your WebLogic Server instance and your database, you should disable this functionality.

Pool Shrinking
Physical connections to a database are resources that should be opened once and kept open as long as possible, as a new connection request is a considerable resource overhead for the database, the operating system kernel, and the WebLogic Server. Consequently, pool shrinking should be disabled on production systems in order to keep this overhead at a minimum. If pool shrinking is enabled, idle pool connections will be closed and reopened once connection requests to the pool cannot be satisfied.

As these activities can take some time, the related application requests may take an unexpectedly long time which can lead users to assume that the system hangs. Information on how to optimize JDBC connection pool configurations is provided in Investigating JDBC Problems Pattern.


Analysis of a hanging WebLogic Server instance
General information on how to analyze a hanging WebLogic Server instance is provided in Generic Server Hang Pattern.

Most times it will be helpful to start with taking thread dumps from the hanging system in order to find out what is going on, e.g., what the different threads are doing and why they hang. Generally, thread dumps can be taken on production systems, however caution is necessary for very old versions of the JVM (<1.3.1_09), as they may crash during thread dumps. Also, if the WebLogic Server instance has a huge number of threads, the thread dump will take a while to complete, while the rest of the threads are blocked.

Please take more than one thread dump (5 to 10) with a delay of some seconds in between. This gives you the possibility to check the progress of the different threads. Also it will show if the system actually hangs (no progress at all) or if the throughput is extremely slow, which can seem to be a hanging system.
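The comparison between successive dumps can be illustrated in-process: take two snapshots some time apart and list the threads whose stacks did not change at all. This is a sketch of the idea only, using the standard Thread.getAllStackTraces() call rather than real thread dump files; the class name StalledThreads is made up. An unchanged stack is a hint, not proof, since idle pool threads waiting for work also look identical in every snapshot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class StalledThreads {
    /** Names of threads whose whole stack is identical in both snapshots. */
    static List<String> unchanged(Map<Thread, StackTraceElement[]> first,
                                  Map<Thread, StackTraceElement[]> second) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : first.entrySet()) {
            StackTraceElement[] later = second.get(e.getKey());
            if (later != null && later.length > 0 && Arrays.equals(e.getValue(), later)) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }

    /** Takes two in-process snapshots the given delay apart and compares them. */
    static List<String> sample(long delayMillis) {
        Map<Thread, StackTraceElement[]> first = Thread.getAllStackTraces();
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return unchanged(first, Thread.getAllStackTraces());
    }
}
```

Threads that appear in the result across several samples, with application or JDBC driver frames on top, are the ones worth reading in the real dumps.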

Information on how to take thread dumps is provided in “Generic Server Hang” support pattern or in our documentation: http://e-docs.bea.com/wls/docs81/cluster/trouble.html.

Also please check if the complete WebLogic Server instance hangs or if it is the application that hangs. “Generic Server Hang” support pattern also includes this information.

Analyzing the thread dumps can show if one of the reasons mentioned in the previous section Why does the problem occur? actually is responsible for your hanging instance. If for example all your threads are in a DriverManager method like getConnection() then you have identified the root cause and need to change your application to use a DataSource or Driver.connect() instead of DriverManager.getConnection().

A very useful tool, Samurai, can be used to analyze thread dumps and to monitor the progress of threads between different thread dumps. This can be downloaded from dev2dev at:  http://dev2dev.bea.com/resourcelibrary/utilitiestools/adminmgmt.jsp.

A whitepaper on analyzing thread dumps on dev2dev: http://dev2dev.bea.com/products/wlplatform81/articles/thread_dumps.jsp will also be helpful in going deeper into the thread dumps to find out more about the server hang.


Tips and Tricks to optimize your JDBC code and JDBC connection pool configuration
There are some best practices both in the development of JDBC code and also in the configuration practice of JDBC connection pools that can help to avoid common problems and optimize resource usage so that hanging server instances should not happen.

JDBC Programming
In order to optimize resource usage in WebLogic Server and conserve database resources, you should use JDBC connection pools for your application’s JDBC calls. Connections created and destroyed in your application code generate an unnecessary overhead which should be avoided. For generic documentation on JDBC programming, see: http://e-docs.bea.com/wls/docs81/jdbc/rmidriver.html#1028977. Also details on JDBC performance tuning are at: http://e-docs.bea.com/wls/docs81/jdbc/performance.html#1027791.

You can view comprehensive information on JDBC that will help to optimize your JDBC code and the utilization of your JDBC resources on dev2dev Java Database Connectivity page at: http://dev2dev.bea.com/technologies/jdbc/index.jsp.

JDBC Connection Pool Configuration
The Investigating JDBC Problems Pattern has recommendations on how to configure a connection pool for production environments. In order to avoid hangs or bad performance, these configuration tips should be considered.


Known Issues
You can periodically review the Release Notes for your version of WLS for more information on Known Issues or Resolved Issues in Service Packs and browse for JDBC server hang-related issues.  For your convenience, see the following:

Please note that changes have been made in WLS 8.1 SP3 to resolve CR134921, where for certain JDBC connections, the call to roll back a transaction was not being handled immediately because the driver had to wait for any currently-executing statement to return. 

Searching will also return Release Notes, as well as other Support Solutions and CR-related information as noted at Need Further Help?.  Contract customers who are logged in at
http://support.bea.com/ will also see a Browse portlet for both Solutions and Bug Central where latest available CRs can be browsed by Product version.


Need Further Help?
If you have followed the pattern, but still require additional help, you can:
  1. Query AskBEA at http://support.bea.com/ using “jdbc server hang”, as an example, to discover other published solutions.  Contract Support Customers: Ensure you are logged to access available CR-related information.
  2. Ask a more detailed question on one of BEA’s newsgroups at http://forums.bea.com

If this does not resolve your issue and you have a valid Support Contract, you can open a Support Case by logging in at: http://support.bea.com/ .


FEEDBACK

Please provide us input on whether or not this Support Diagnostic Pattern “JDBC Causes Server Hang” helped, any clarifications you needed, and any requests for new topics to Support Diagnostic Patterns.



DISCLAIMER NOTICE:

BEA Systems, Inc. provides the technical tips and patches on this Website for your use under the terms of BEA’s maintenance and support agreement with you. While you may use this information and code in connection with software you have licensed from BEA, BEA makes no warranty of any kind, express or implied, regarding the technical tips and patches.

Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.

Categories: Performance Tuning, Troubleshooting

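Diagnosing Application Server Hangs

August 15, 2009 hashei No comments

Foreword

This is actually a document from BEA's official site, published in the WebLogic 8.1 era. After Oracle acquired BEA, all the support articles were redirected to Oracle's home page = =, and Google's cached copies are gone as well. This copy comes from a foreign forum I stumbled on through Google. Although it was written for 8.1, its methods and its approach to problem solving are still valid today. I had intended to digest it and write my own article around real cases, but I have not run into a related problem lately, and I felt that doing so might break the integrity of the document. So I am posting the original text, both to leave a copy on the web and to let readers take what they need and draw their own conclusions.

From the content you will see that, besides this article, there were also EJB_RMI Server Hang, Application Dead Lock, and JDBC Causes Server Hang, but the only one that can still be found on that forum is JDBC Causes Server Hang. So if you started with WebLogic early and saved the other two articles, or have seen them somewhere online, please leave a comment. Many thanks.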

Generic Hang

Problem Description
A server hang is suspected when:

  • The server does not respond to new requests.
  • Requests time out.
  • Requests take longer and longer to process (may be on the way to a hang).
  • A server crash is not usually a symptom of a hung server but may follow.
Problem Troubleshooting
Please note that not all of the following items need to be done. Some issues can be solved by following only a few of the items.

Why does the problem occur?
A server can hang for a variety of reasons (refer to Potential Causes of Server Hang). Generally, a server hangs because of a lack of some resource, which prevents the server from servicing requests. For example, because of a problem (deadlock) or the volume of requests, there may be no execute threads available to do any work; all are busy with previous requests.

Potential Causes of Server Hang

Topic | Pattern Name | Link
RMI, RJVM responses – all threads tied up waiting for RJVM, RMI responses. | EJB_RMI Server Hang |
Application Deadlock – thread locks resource1 then waits for lock for resource2. Another thread locks resource2 and then waits for lock for resource1. | Application Deadlock Causes Server Hang |
Threads are all used up, none available for new work. | Thread Usage Server Hang | TBD
Garbage Collection taking too much time. | Garbage Collection Server Hang | TBD
JSP improper settings for servlet times, e.g. PageCheckSeconds. | JSP cause Server Hang | TBD
Long Running JDBC calls or JDBC deadlocks lead to a hang. | JDBC Causes Server Hang | JDBC Causes Server Hang
JVM hang during (code optimization), looks like server hang. | Server Hang in Code Optimization | TBD
JSP compilation causes server hang under heavy load. | JSP Compilation Server Hang | TBD
SUN JVM bugs, e.g. Light weight thread library. | Sun JVM Bugs that Cause Server Hangs | TBD

When a server is hanging, first ping the server using java weblogic.Admin t3://server:port PING. If the server can respond to the ping, it may be that the application is hanging and not the server itself.

Ensure that the server is actually hanging and not doing garbage collection. To verify, restart the server with -verbosegc turned on, and redirect stdout and stderr to one file. When the server stops responding, you can then determine whether it is doing garbage collection or is really hanging. If the garbage collection is taking too long (>10 seconds), the server may miss the heartbeats that servers use to keep each other informed of the topology of the cluster.
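As a rough illustration of the garbage collection check above, the sketch below scans redirected -verbosegc output for collections longer than 10 seconds. It assumes the Sun 1.4-style line format shown in the comment; other JVMs (IBM's on AIX, for instance) write a different format and would need a different pattern. The function name and the sample log are made up for the example.

```python
import re

# Assumed format (Sun 1.4-style -verbosegc output), for example:
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
GC_LINE = re.compile(r"\[(Full GC|GC)\s+\d+K->\d+K\(\d+K\),\s+([\d.]+)\s+secs\]")

def long_gc_pauses(log_text, threshold_secs=10.0):
    """Return (kind, seconds) for every collection longer than the threshold."""
    pauses = []
    for match in GC_LINE.finditer(log_text):
        kind, secs = match.group(1), float(match.group(2))
        if secs > threshold_secs:
            pauses.append((kind, secs))
    return pauses

# Hypothetical log excerpt, not taken from a real server.
sample_log = """\
[GC 325407K->83000K(776768K), 0.2300771 secs]
[Full GC 267628K->83769K(776768K), 12.8479984 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
"""

for kind, secs in long_gc_pauses(sample_log):
    print(f"{kind} took {secs:.1f}s, long enough to miss cluster heartbeats")
```

If this reports nothing while the server is unresponsive, garbage collection is probably not the cause and thread dumps are the next thing to collect.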

WebLogic Server uses the ‘default’ thread queue or a configured application specific thread queue to service client requests. Client requests will only be handled in the default queue if no application specific thread queue is defined.  Please see Tuning WebLogic Server Applications, Tuning the Default Execute Queue Threads, and Tuning WebLogic Server Performance Parameters for more information on defining application specific thread queues.

In release 8.1, a change was made to the thread architecture in WebLogic Server.  A specific kernel thread group for internal WebLogic tasks was created.  This was found to be necessary to avoid deadlocks that occurred in earlier releases when all threads in the ‘default’ thread queue were used and none were thus available for WebLogic internal tasks.

The threads in the ‘default’ queue or the application specific thread queue (if one has been configured) are the threads that should be examined in the event of a server hang. Here is an example of what Execute Thread ‘14’ from the ‘default’ queue looks like in a thread dump when it is waiting for work. The latest method called by this thread is Object.wait(), and the thread is in the state “waiting on monitor”.

"ExecuteThread: '14' for queue: 'default'" daemon prio=5 tid=0x8b0ab30 nid=0x1f4 waiting on monitor [0x96af000..0x96afdc4]
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:420)
at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:94)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:118)

Threads can be in one of several states.  Please see the table below for a description of the thread states.
The format of the thread dump varies with the vendor.  Check on the vendor’s website for information regarding the format.

Below is an example of threads that may be hanging. ExecuteThread ‘9’ is waiting to lock object <dde51520>; notice the “waiting to lock <dde51520>” line in its stack trace. ExecuteThread ‘6’ is waiting to lock the same object. The third thread, ExecuteThread ‘5’, has locked <dde51520> and is doing work. This example demonstrates why one thread dump is not enough. If the server is hanging and the locked object <dde51520> is the suspected cause, subsequent thread dumps will show whether that object was released and whether a new thread has locked it. If after several thread dumps the threads have not progressed and <dde51520> has still not been released, suspect a problem with the routine(s) in the ExecuteThread ‘5’ call stack, because the lock is not being released.

"ExecuteThread: '9' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0xf684c8 nid=0x13 waiting for monitor entry [cc2ff000..cc2ffc24]
at weblogic.cluster.MemberManager.done(MemberManager.java:306)
- waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
at weblogic.cluster.MulticastManager.execute(MulticastManager.java:399)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)

"ExecuteThread: '6' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x9df020 nid=0x10 waiting for monitor entry [cc5ff000..cc5ffc24]
at weblogic.cluster.MemberManager.getRemoteMembers(MemberManager.java:396)
- waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
at weblogic.cluster.ClusterService.getRemoteMembers(ClusterService.java:238)
at weblogic.servlet.internal.HttpServer.setServerList(HttpServer.java:388)
at weblogic.servlet.internal.HttpServer.clusterMembersChanged(HttpServer.java:418)
- locked <ddf32360> (a weblogic.servlet.internal.HttpServer)
at weblogic.cluster.MemberManager$2.execute(MemberManager.java:421)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)

"ExecuteThread: '5' for queue: 'weblogic.kernel.Default'" daemon prio=5 tid=0x9df020 nid=0x12 waiting for monitor entry [cc5ff000..cc5ffc24]
. . .
at weblogic.cluster.MemberManager.checkTimeouts(MemberManager.java:346)
- locked <dde51520> (a weblogic.cluster.MemberManager)
at weblogic.cluster.MulticastManager.trigger(MulticastManager.java:291)
at weblogic.time.common.internal.ScheduledTrigger.run(ScheduledTrigger.java:243)
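The manual reading described above (who holds <dde51520>, who is waiting for it) can be partly automated. The following is a minimal sketch, assuming Sun-style dump text in which each thread entry starts with a quoted name and lock lines read "- locked <addr>" or "- waiting to lock <addr>". The function name and the shortened sample dump are illustrative only.

```python
import re
from collections import defaultdict

THREAD_HEADER = re.compile(r'^"(?P<name>.+?)"\s')
LOCK_LINE = re.compile(r"-\s+(?P<verb>waiting to lock|locked)\s+<(?P<addr>[0-9a-fx]+)>")

def lock_report(dump_text):
    """Map each lock address to its owner thread and the threads waiting on it."""
    report = defaultdict(lambda: {"owner": None, "waiters": []})
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            continue
        lock = LOCK_LINE.search(line)
        if lock and current:
            entry = report[lock.group("addr")]
            if lock.group("verb") == "locked":
                entry["owner"] = current
            else:
                entry["waiters"].append(current)
    return dict(report)

# Shortened version of the three threads discussed above.
sample_dump = """\
"ExecuteThread: '9' for queue: 'weblogic.kernel.Default'" daemon prio=5 waiting for monitor entry
at weblogic.cluster.MemberManager.done(MemberManager.java:306)
- waiting to lock <dde51520> (a weblogic.cluster.MemberManager)

"ExecuteThread: '6' for queue: 'weblogic.kernel.Default'" daemon prio=5 waiting for monitor entry
at weblogic.cluster.MemberManager.getRemoteMembers(MemberManager.java:396)
- waiting to lock <dde51520> (a weblogic.cluster.MemberManager)
- locked <ddf32360> (a weblogic.servlet.internal.HttpServer)

"ExecuteThread: '5' for queue: 'weblogic.kernel.Default'" daemon prio=5 waiting for monitor entry
at weblogic.cluster.MemberManager.checkTimeouts(MemberManager.java:346)
- locked <dde51520> (a weblogic.cluster.MemberManager)
"""

for addr, entry in lock_report(sample_dump).items():
    if entry["waiters"]:
        print(f"<{addr}> held by {entry['owner']}, {len(entry['waiters'])} thread(s) waiting")
```

Running it over each dump in a series and comparing the owner of a contended lock shows whether the lock ever changes hands.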

Determine if the “default” ExecuteThread queue is overloaded. Use the console to determine if any of the ExecuteThreads in the ‘default’ queue are idle. If none are idle, then the application probably needs to be configured with a larger number of ExecuteThreads. This value can be changed through the console and is stored in the config.xml file.

If the Execute Queue has idle threads, it is possible that not enough socket reader threads are allocated. By default, a WebLogic Server instance creates three socket reader threads upon booting. If a cluster system utilizes more than three sockets during peak periods, increase the number of socket reader threads.

The number of socket reader threads should usually be small. However, configure one thread for each Weblogic Server that acts as a client of the server instance that is hanging.

If using a JDBC connection pool, ensure that the JDBC connections have been configured to be equivalent to the number of simultaneous requests, i.e., execute threads, for the pool.


The possibility exists that a problem with JDBC could produce deadlock. Check the version and service pack level of the server found in the beginning of the weblogic.log. Then check above the version and service pack lines for any temporary patches that have already been applied to the server classpath. The patches will tell what problems have already been addressed.


The way to take a thread dump is dependent on the operating system where the hung server instance is installed. Information about taking a thread dump on various operating systems can be found at http://e-docs.bea.com/wls/docs81/cluster/trouble.html#gc. Redirection of both standard error and standard out places the thread dump information in the proper context with server information and other messages and provides more useful logs.

Unix Systems (Solaris, HP, AIX)
Use kill -3 <weblogic process id> to create the necessary thread dumps to diagnose a problem. Ensure this is done several times on each server, spaced about 5 to 10 seconds apart, to help diagnose deadlocks. For this to work, nohup the process when starting the server (refer to Solutions S-12292 and S-15924).
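The repeated kill -3 can be scripted so the spacing between dumps stays consistent. This is only a sketch for Unix systems: the function name is invented, the PID in the usage comment is a placeholder, and the send and sleep parameters exist so the loop can be tried without signalling a real process.

```python
import os
import signal
import time

def request_thread_dumps(pid, count=3, interval_secs=5.0, send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (the signal behind kill -3) to pid `count` times.

    The JVM writes each dump to its own stdout, so nothing is returned here
    except the number of signals sent.
    """
    for i in range(count):
        send(pid, signal.SIGQUIT)
        if i < count - 1:
            sleep(interval_secs)  # space the dumps 5 to 10 seconds apart
    return count

# Example call, left commented out because 12345 is a placeholder PID:
# request_thread_dumps(12345, count=3, interval_secs=5.0)
```

The dumps still land wherever the server's standard output was redirected (nohup.out when the server was started with nohup).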

Windows, XP, NT
Each server requires <Ctrl>-<Break> to create the necessary thread dumps to diagnose a problem. Ensure this is done several times on each server, spaced about 5 to 10 seconds apart, to help diagnose deadlocks. On NT, in the command shell type CTRL-Break.

If you have installed WebLogic as a Windows service, you will not be able to see the messages from the JVM or WebLogic Server that are printed to standard out or standard error.  To view these messages, you must direct standard out and standard error to a file.  To do this, take the following steps:

  1. Create a backup copy of the WL_HOME\server\bin\installSvc.cmd master script.
  2. In a text editor, open the WL_HOME\server\bin\installSvc.cmd master script.
  3. In installSvc.cmd, the last command in the script invokes the beasvc utility.
  4. At the end of the beasvc command, append the command -log:"pathname"
    where pathname is a fully qualified path and filename of the file that you want to store the server’s standard out and standard error messages.
  5. The modified beasvc command will resemble the following command:
    "%WL_HOME%\server\bin\beasvc" -install
    -svcname:"%DOMAIN_NAME%_%SERVER_NAME%"
    -javahome:"%JAVA_HOME%" -execdir:"%USERDOMAIN_HOME%"
    -extrapath:"%WL_HOME%\server\bin" -password:"%WLS_PW%"
    -cmdline:%CMDLINE%
    -log:"d:\bea\user_projects\domains\myWLSdomain\myWLSserver-stdout.txt"
  6. If you started WebLogic with nohup, the log messages will show up in nohup.out.

Linux
The Linux operating system views threads differently than other operating systems. Each thread is seen by the operating system as a process. To take a thread dump on Linux, find the process id from which all the other processes were started. Use the commands:

  • To obtain the root PID, use:

    ps -efHl | grep 'java'

Use a grep argument that is a string that will be found in the process stack that matches the server startup command. The first PID reported will be the root process, assuming that the ps command has not been piped to another routine.

  • Use the weblogic.Admin command THREAD_DUMP

Another method of getting a thread dump is to use the THREAD_DUMP admin command. This method is independent of the OS on which the server instance is running.

java weblogic.Admin -url ManagedHost:8001 -username weblogic -password weblogic THREAD_DUMP

NOTE: This command cannot be used if unable to ping the server instance.

If the JVM in use is Sun’s, the thread dump goes to stdout. Sun has enhanced the thread dump format between JVM 1.3.1 and 1.4. To obtain Sun’s 1.4 style of thread dump add the following option to the java command line for starting the 1.3.1 JVM:

-XX:+JavaMonitorsInStackTrace


The most useful tool in analyzing a server hang is a set of thread dumps. A thread dump provides information on what each of the threads is doing at a particular moment in time. A set of thread dumps (usually 3 or more taken 5 to 10 seconds apart) can help analyze the change or lack of change in each thread’s state from one thread dump to another. A hung server thread dump would typically show little change in thread states from the first to the last dump.
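To illustrate the comparison across a set of dumps, the sketch below reports threads whose stack is identical in every dump. It assumes Sun-style dump text where each thread entry begins with a quoted name, and it skips idle execute threads by looking for ExecuteThread.waitForRequest, the frame shown earlier for a thread waiting for work. The function names and the com.example frame in the sample are hypothetical.

```python
import re

THREAD_HEADER = re.compile(r'^"(?P<name>.+?)"\s')

def stacks_by_thread(dump_text):
    """Return {thread name: tuple of stack lines} for one thread dump."""
    stacks, current = {}, None
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            stacks[current] = []
        elif line and current:
            stacks[current].append(line)
    return {name: tuple(frames) for name, frames in stacks.items()}

def unchanged_threads(dumps, idle_marker="ExecuteThread.waitForRequest"):
    """Names of non-idle threads whose stack is identical in every dump."""
    parsed = [stacks_by_thread(d) for d in dumps]
    common = set(parsed[0]).intersection(*parsed[1:])
    stuck = []
    for name in sorted(common):
        first = parsed[0][name]
        if any(idle_marker in frame for frame in first):
            continue  # idle threads never change and are not suspicious
        if all(p[name] == first for p in parsed):
            stuck.append(name)
    return stuck

dump_1 = """\
"ExecuteThread: '1' for queue: 'default'" daemon prio=5 runnable
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:103)

"ExecuteThread: '2' for queue: 'default'" daemon prio=5 waiting on monitor
at java.lang.Object.wait(Native Method)
at weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:94)

"ExecuteThread: '3' for queue: 'default'" daemon prio=5 runnable
at com.example.Report.build(Report.java:10)
"""
# Second dump: only thread '3' has moved to a different line.
dump_2 = dump_1.replace("Report.java:10", "Report.java:42")

print(unchanged_threads([dump_1, dump_2]))
```

A thread that appears in this list across three or more dumps is a candidate for closer inspection, though a long but legitimate operation can look the same.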

Threads can be in one of the following states:

Running or runnable thread | A runnable state means that the threads could be running or are running at that instant in time.
Suspended thread | Thread has been suspended by the JVM.
Thread waiting on a condition variable | Threads in a condition wait state can be thought of as waiting for an event to occur.
Thread waiting on a monitor lock | Monitors are used to manage access to code that should only be run by a single thread at a time.

More information on thread states can be found at http://java.sun.com/developer/onlineTraining/Programming/JDCBook/stack.html#states.
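For a quick overview of a large dump, the state can be tallied from each thread's header line. This sketch assumes the Sun-style header layout used in this document, where the state text sits between the key=value fields and the bracketed address range; other vendors lay out the header differently. The function names are illustrative.

```python
from collections import Counter

def thread_state(header_line):
    """Extract the state text between the key=value fields and '['."""
    after_name = header_line.rsplit('"', 1)[1]
    before_bracket = after_name.split("[", 1)[0]
    words = [w for w in before_bracket.split() if "=" not in w and w != "daemon"]
    return " ".join(words)

def state_counts(dump_text):
    """Count threads per state, using only lines that start a thread entry."""
    headers = [line for line in dump_text.splitlines() if line.startswith('"')]
    return Counter(thread_state(h) for h in headers)

sample_dump = """\
"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00036128 nid=75 waiting for monitor entry [0x1b12f000..0x1b12f530]
"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00035fc8 nid=74 runnable [0x1b1b0000..0x1b1b0530]
"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00035e68 nid=73 waiting for monitor entry [0x1b231000..0x1b231530]
"""

for state, count in state_counts(sample_dump).most_common():
    print(f"{count:3d}  {state}")
```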

There is also a thread analysis tool at http://dev2dev.bea.com/resourcelibrary/utilitiestools/adminmgmt.jsp.
Download the tool and read the instructions at the link.

What to Look at in the Thread Dump

All requests enter the WebLogic Server through the ListenThread. If the ListenThread is gone, no work can be received and therefore no work can be done. Verify that a ListenThread exists in the thread dump. The ListenThread should be in the socketAccept method. The following example shows what the Listen Thread looks like:

"ListenThread.Default" prio=10 tid=0x00037888 nid=93 lwp_id=6888343 runnable [0x1a81b000..0x1a81b530]
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:353)
- locked <0x26d9d490> (a java.net.PlainSocketImpl)
at java.net.ServerSocket.implAccept(ServerSocket.java:439)
at java.net.ServerSocket.accept(ServerSocket.java:410)
at weblogic.socket.WeblogicServerSocket.accept(WeblogicServerSocket.java:24)
at weblogic.t3.srvr.ListenThread.accept(ListenThread.java:713)
at weblogic.t3.srvr.ListenThread.run(ListenThread.java:290)

Socket Reader Threads accept the incoming request from the Listen Thread Queue and put it on the Execute Thread Queue. If there are no socket reader threads in the thread dump, then there is a bug somewhere that is causing the socket reader thread to vanish. There should always be at least 3 socket reader threads. One socket reader thread is usually in the poll function, while the other two are available to process requests. Below are Socket Reader threads from a sample thread dump.

"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00036128 nid=75 lwp_id=6888070 waiting for monitor entry [0x1b12f000..0x1b12f530]
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:92)
- waiting to lock <0x25c01198> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)

"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00035fc8 nid=74 lwp_id=6888067 runnable [0x1b1b0000..0x1b1b0530]
at weblogic.socket.PosixSocketMuxer.poll(Native Method)
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:99)
- locked <0x25c01198> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)

"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=10 tid=0x00035e68 nid=73 lwp_id=6888066 waiting for monitor entry [0x1b231000..0x1b231530]
at weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:92)
- waiting to lock <0x25c01198> (a java.lang.String)
at weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)
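The two checks just described, that a ListenThread exists and that at least three socket reader threads are present, can be sketched as a small scan of the thread names. It assumes the naming seen in the sample dumps above ("ListenThread..." and the 'weblogic.socket.Muxer' queue); the function name is invented for the example.

```python
import re

# A thread entry starts with its quoted name at the beginning of a line.
THREAD_NAME = re.compile(r'^"(?P<name>[^"]+)"', re.MULTILINE)

def listener_summary(dump_text):
    """Report whether a ListenThread exists and how many socket readers there are."""
    names = THREAD_NAME.findall(dump_text)
    return {
        "has_listen_thread": any(n.startswith("ListenThread") for n in names),
        "socket_readers": sum("weblogic.socket.Muxer" in n for n in names),
    }

# Headers only, shortened from the sample dumps above.
sample_dump = """\
"ListenThread.Default" prio=10 tid=0x00037888 nid=93 runnable
at java.net.PlainSocketImpl.socketAccept(Native Method)

"ExecuteThread: '2' for queue: 'weblogic.socket.Muxer'" daemon prio=10 waiting for monitor entry
"ExecuteThread: '1' for queue: 'weblogic.socket.Muxer'" daemon prio=10 runnable
"ExecuteThread: '0' for queue: 'weblogic.socket.Muxer'" daemon prio=10 waiting for monitor entry
"""

summary = listener_summary(sample_dump)
print(summary)
if not summary["has_listen_thread"] or summary["socket_readers"] < 3:
    print("Listen or socket reader threads are missing; no new work can reach the server")
```

This only confirms the threads exist; whether the ListenThread is actually sitting in socketAccept still has to be read from its stack.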

The ThreadPoolPercentSocketReaders attribute sets the maximum percentage of execute threads that are set to read messages from a java socket. The optimal value for this attribute is application-specific. The default value is 33, and the valid range is 1 to 99.

Allocating execute threads to act as socket reader threads increases the speed and the ability of the server to accept client requests. It is essential to balance the number of execute threads that are devoted to reading messages from a socket and those threads that perform the actual execution of tasks in the server.

In release 8.1, the socket reader threads no longer use “ExecuteThreads” in the default queue.  Instead they have their own thread group named.

Next Steps
The next steps require a further analysis of the thread dump. Look in the thread dump to see what each of the threads is doing at the time of the hang. This will help determine the next stage of the investigation. For example, if there are many threads involved in JSP compilation, refer to Potential Causes of Server Hang for further diagnosis and actions to test.
