<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>聚沙成塔-小哈的记事薄 &#187; Troubleshooting</title>
	<atom:link href="http://www.hashei.me/category/websphere%e7%b3%bb%e5%88%97/%e6%8e%92%e9%94%99/feed" rel="self" type="application/rss+xml" />
	<link>http://www.hashei.me</link>
	<description>A systems engineer's ramblings</description>
	<lastBuildDate>Tue, 10 Jan 2012 18:03:42 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		
	<item>
		<title>Another Article on Java Class Loaders</title>
		<link>http://www.hashei.me/2010/03/inside_java_classloader.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=inside_java_classloader</link>
		<comments>http://www.hashei.me/2010/03/inside_java_classloader.html#comments</comments>
		<pubDate>Wed, 03 Mar 2010 15:30:00 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[class]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[类加载]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2010/02/inside_java_classloader.html</guid>
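		<description><![CDATA[I have written two articles on Java class loading before: "WebSphere Class Loading Mechanism and Troubleshooting" and "More on WebSphere Class Loading and Troubleshooting". Today I came across "An In-Depth Look at Java Class Loaders" on IBM's site and am sharing it here, rehashing old material a little. It should also give me some direction the next time I run into problems.
The default behavior of the Java virtual machine is sufficient for most needs. However, if you need to interact with class loaders without really understanding how they work, it is easy to spend a lot of time debugging exceptions such as ClassNotFoundException and NoClassDefFoundError. This article describes Java class loaders in detail to help readers thoroughly understand this important concept in the Java language.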

  Copyright &#169; 2008 This feed is for personal, non-commercial use only
聚沙成塔-小哈的记事薄 by hashei 
If you enjoy it, subscribe at feed.hashei.com
Digital Fingerprint:
 10f920a9f2bae51c3c73c4f5fb50a949]]></description>
			<content:encoded><![CDATA[<p>之前写过两篇关于java类加载的文章，分别是：<a title="JAVA类加载原理和故障排查" href="http://www.hashei.me/2009/05/websphere-class-loader-troubshooting.html" target="_blank">《WebSphere的类加载机制和故障排查》</a>，《<a title="java类加载问题故障排查" href="http://www.hashei.me/2009/06/troubshoot-classloader-problems.html" target="_blank">再谈WebSphere的类加载和故障排查</a>》。今天在IBM网站上看到一篇《<a href="http://www.ibm.com/developerworks/cn/java/j-lo-classloader/index.html?ca=drs-cn-0301" target="_blank">深入探讨 Java 类加载器</a>》，分享出来炒炒冷饭。以后遇到问题的时候也能有点方向。</p>
<blockquote><p>Java 虚拟机默认的行为就已经足够满足大多数情况的需求了。不过如果遇到了需要与类加载器进行交互的情况，而对类加载器的机制又不是很了解的话，就很容易花大量的时间去调试 <code>ClassNotFoundException</code> 和 <code>NoClassDefFoundError</code> 等异常。本文将详细介绍 Java 的类加载器，帮助读者深刻理解 Java 语言中的这个重要概念。</p>
</blockquote>
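<p>To make the delegation model concrete, here is a minimal Java sketch (our own illustration, not code from the linked article). It walks a class loader's parent chain and checks whether a class name is visible to a given loader. A lookup that fails surfaces as <code>ClassNotFoundException</code>, whereas <code>NoClassDefFoundError</code> typically appears when the JVM needs a class while linking or running another class (one that compiled fine) and cannot find it. The class name <code>com.example.DoesNotExist</code> is hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: inspect the parent-delegation chain and class visibility.
final class ClassLoaderDemo {

    // Returns the loader chain from 'start' upward. The bootstrap loader is
    // represented by null in Java, so it is appended as a label at the end.
    static List<String> loaderChain(ClassLoader start) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap");
        return chain;
    }

    // True if 'loader' (which asks its parents first) can find the class.
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // false: do not run static initializers
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader app = ClassLoaderDemo.class.getClassLoader();
        System.out.println("Loader chain: " + loaderChain(app));
        // Core classes come from the bootstrap loader, which is reported as null.
        System.out.println("String loaded by: " + String.class.getClassLoader());
        System.out.println("String visible: " + isVisible("java.lang.String", app));
        System.out.println("Missing class visible: " + isVisible("com.example.DoesNotExist", app));
    }
}
```

<p>Application servers add their own loaders on top of this chain, so the same two questions (which loader, and which parent answered first) are the starting point for most class loading problems.</p>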
]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2010/03/inside_java_classloader.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Basic WebSphere Troubleshooting</title>
		<link>http://www.hashei.me/2009/09/basic_websphere_troubleshooting.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=basic_websphere_troubleshooting</link>
		<comments>http://www.hashei.me/2009/09/basic_websphere_troubleshooting.html#comments</comments>
		<pubDate>Fri, 25 Sep 2009 14:59:00 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[ihs]]></category>
		<category><![CDATA[troubleshooting]]></category>
		<category><![CDATA[端口冲突]]></category>
		<category><![CDATA[虚拟主机]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/09/basic_websphere_troubleshooting.html</guid>
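		<description><![CDATA[At work I keep running into problems that are either traceable or downright "spooky": WebSphere will not start again after being stopped, an application deployed on a cluster cannot be reached through IHS, an updated application rolls back after the server is restarted... You can of course call in a dedicated middleware administrator, but by the time they arrive half a day may have gone, while the problem itself might take only a few minutes to fix. If you know some basic troubleshooting and diagnostic techniques, you can solve these small problems yourself.
Below I describe several common, simple errors, which I hope will help people on site:
Application cannot be accessed
Below is a common topology in which IBM HTTP Server (IHS) forwards requests to a back-end AppCluster:
When an application cannot be accessed, the problem may lie in the HTTP server, in the application server, or, even more likely, in the database, so the first step is to narrow down where it occurs.
Here I assume the application URL through IHS is http://192.168.1.51/yingyong
The DMGR console is at http://192.168.1.51:9060/admin
The application server URLs are http://192.168.2.50:9080/yingyong and http://192.168.2.51:9080/yingyong
1. Server not found or 404 error
Open http://192.168.1.51 to check whether IHS is working. If the page does not display, try restarting "IBM HTTP SERVER V6.x" under Services. If the service fails to start, Services only tells you that it could not be started, or that it started and then stopped because of a fatal error. So go to the IBM\HTTPServer\bin directory and run apache -k start or httpd -k start; if that fails, it prints detailed information. The usual cause is a port that is already in use or a syntax error in httpd.conf under the conf directory (the message gives the line number).
If IHS works, try http://192.168.2.50:9080/yingyong and http://192.168.2.51:9080/yingyong directly. If they work, IHS is failing to forward the requests.
In the administrative console at http://192.168.1.51:9060/admin, go to Servers > Web servers, select the web server, then click "Generate Plug-in" and "Propagate Plug-in".
Many IHS forwarding failures happen because the web server was not selected as a target when the application was deployed, or because plug-in propagation failed, for example because of directory access controls. Find your application under Applications, click "Manage Modules" and confirm that it is correctly mapped to both the app server and the web server. First select the cluster and servers in the first list, then tick the modules, and be sure to click "Apply" rather than going straight to "OK".
Forwarding can fail for many reasons, but the quickest fix is to copy the file by hand: after the plug-in is generated, the console shows where the file was written; copy it to the location where propagation failed.
I have also seen a puzzling case where deployment and propagation were both correct, yet the application was still unreachable. In that case, look at the generated plugin-cfg.xml file:
&#60;UriGroup Name=&#34;default_host_server1_xzh-hasheiNode01_Cluster_URIs&#34;&#62; &#60;Uri AffinityCookie=&#34;JSESSIONID&#34; AffinityURLIdentifier=&#34;jsessionid&#34; Name=&#34;/snoop/*&#34;/&#62; &#60;Uri AffinityCookie=&#34;JSESSIONID&#34; AffinityURLIdentifier=&#34;jsessionid&#34; Name=&#34;/hello&#34;/&#62; &#60;Uri AffinityCookie=&#34;JSESSIONID&#34; AffinityURLIdentifier=&#34;jsessionid&#34; Name=&#34;/hitcount&#34;/&#62;
Check whether a line for your application's URL is present. If it is not, add it by hand, but remember to fix it again after the next time the plug-in is generated.
Finally, make sure the app server has actually started and has not exited with an error; this is covered in more detail below.
2. 500 [...]]]></description>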
			<content:encoded><![CDATA[<p style="text-indent: 24pt">工作中经常遇到这样那样的或有迹可循、或“灵异”的情况：WebSphere在某次停止后无法启动了，部署在集群上的应用无法通过IHS访问，应用更新后重启服务器发送回滚……出现问题当然都可以联系专门的中间件管理员来解决，但等管理员赶到现场，也许时间已过去半天，问题也许很简单，几分钟就能解决，所以如果你会一些基本的排查技巧和诊断方法，那么这些小问题就可以自己迎刃而解了。</p>
<p style="text-indent: 24pt">下面我就介绍几种常见的简单错误，希望对于现场人员能有所帮助：</p>
<h3>应用无法访问</h3>
<p style="text-indent: 24pt">下面是一张常见的由IBM HTTP SERVER（IHS）转发到后端AppCluster上的拓扑结构：</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/09/ndtopo.jpg"><img style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; margin-left: 0px; margin-right: 0px; border-right-width: 0px" height="166" alt="nd topo" src="http://hashei.me/wp-content/uploads/2009/09/ndtopo_thumb.jpg" width="255" align="left" border="0" /></a> </p>
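<p style="text-indent: 24pt">When an application cannot be accessed, the problem may lie in the HTTP server, in the application server, or, even more likely, in the database, so the first step is to narrow down where it occurs.</p>
<p style="text-indent: 24pt">Here I assume the application URL through IHS is http://192.168.1.51/yingyong</p>
<p style="text-indent: 24pt">The DMGR console is at http://192.168.1.51:9060/admin</p>
<p style="text-indent: 24pt">The application server URLs are http://192.168.2.50:9080/yingyong and http://192.168.2.51:9080/yingyong</p>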
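<h4>1. Server not found or 404 error</h4>
<p style="text-indent: 24pt">Open <a href="http://192.168.1.51">http://192.168.1.51</a> to check whether IHS is working. If the page does not display, try restarting "IBM HTTP SERVER V6.x" under Services. If the service fails to start, Services only tells you that it could not be started, or that it started and then stopped because of a fatal error. So go to the IBM\HTTPServer\bin directory and run apache -k start or httpd -k start; if that fails, it prints detailed information. The usual cause is a port that is already in use or a syntax error in httpd.conf under the conf directory (the message gives the line number).</p>
<p style="text-indent: 24pt">If IHS works, try http://192.168.2.50:9080/yingyong and http://192.168.2.51:9080/yingyong directly. If they work, IHS is failing to forward the requests.</p>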
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/09/ihs.jpg"><img style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; margin-left: 0px; margin-right: 0px; border-right-width: 0px" height="88" alt="IHS forwarding" src="http://hashei.me/wp-content/uploads/2009/09/ihs_thumb.jpg" width="247" align="left" border="0" /></a> </p>
<p style="text-indent: 24pt">In the administrative console at http://192.168.1.51:9060/admin, go to Servers > Web servers, select the corresponding web server, then click "Generate Plug-in" and "Propagate Plug-in".</p>
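<p style="text-indent: 24pt">Many IHS forwarding failures happen because the web server was not selected as a target when the application was deployed, or because plug-in propagation failed, for example because of directory access controls. Find your application under Applications, click "Manage Modules" and confirm that it is correctly mapped to both the app server and the web server. First select the cluster and servers in the first list, then tick the modules, and be sure to click "<strong>Apply</strong>" rather than going straight to "OK".</p>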
<p><a href="http://hashei.me/wp-content/uploads/2009/09/applicationdeployment.jpg"><img style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" height="276" alt="application deployment" src="http://hashei.me/wp-content/uploads/2009/09/applicationdeployment_thumb.jpg" width="504" border="0" /></a> </p>
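<p style="text-indent: 24pt">Forwarding can fail for many reasons, but the quickest fix is to copy the file by hand: after the plug-in is generated, the console shows where the file was written; copy it to the location where propagation failed.</p>
<p style="text-indent: 24pt">I have also seen a puzzling case where deployment and propagation were both correct, yet the application was still unreachable. In that case, look at the generated plugin-cfg.xml file:</p>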
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">&lt;UriGroup Name=&quot;default_host_server1_xzh-hasheiNode01_Cluster_URIs&quot;&gt;    <br />&#160;&#160;&#160;&#160;&#160; &lt;Uri AffinityCookie=&quot;JSESSIONID&quot; AffinityURLIdentifier=&quot;jsessionid&quot; Name=&quot;/snoop/*&quot;/&gt;     <br />&#160;&#160;&#160;&#160;&#160; &lt;Uri AffinityCookie=&quot;JSESSIONID&quot; AffinityURLIdentifier=&quot;jsessionid&quot; Name=&quot;/hello&quot;/&gt;     <br />&#160;&#160;&#160;&#160;&#160; &lt;Uri AffinityCookie=&quot;JSESSIONID&quot; AffinityURLIdentifier=&quot;jsessionid&quot; Name=&quot;/hitcount&quot;/&gt;     </div>
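<p style="text-indent: 24pt">Check whether a line for your application's URL is present. If it is not, add it by hand, but remember to fix it again after the next time the plug-in is generated.</p>
<p style="text-indent: 24pt">Finally, make sure the app server has actually started and has not exited with an error; this is covered in more detail below.</p>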
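<p>Looking for the missing line can be scripted. The Java sketch below is a hypothetical helper (not part of WebSphere) that parses <code>plugin-cfg.xml</code> with the JDK's DOM parser and reports whether any <code>Uri</code> element has the expected <code>Name</code>. The file path and the URI pattern <code>/yingyong/*</code> are assumptions for illustration; a present <code>Uri</code> line is necessary but not sufficient, since the group must also be routed to the right cluster.</p>

```java
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: does plugin-cfg.xml contain a Uri entry for our application?
final class PluginCfgCheck {

    static boolean hasUri(InputStream pluginCfg, String uriName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(pluginCfg);
            NodeList uris = doc.getElementsByTagName("Uri");
            for (int i = 0; i < uris.getLength(); i++) {
                Element uri = (Element) uris.item(i);
                if (uriName.equals(uri.getAttribute("Name"))) {
                    return true;
                }
            }
            return false;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse plugin-cfg.xml", e);
        }
    }

    // Usage (path and URI are examples): java PluginCfgCheck path/to/plugin-cfg.xml '/yingyong/*'
    public static void main(String[] args) throws Exception {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(hasUri(in, args[1])
                    ? "present"
                    : "MISSING: add it by hand or regenerate the plug-in");
        }
    }
}
```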
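<h4>2. 500 Internal Server Error</h4>
<p style="text-indent: 24pt">A 500 internal error has three possible causes: an error in the application itself, which is not the focus of this article; the AppServer or the application not having started properly; or a failed database connection.</p>
<p style="text-indent: 24pt"><strong>Whether the AppServer is running</strong> can be checked in the administrative console or by looking at the Java processes. The profiles\AppSrv01\logs\server1 directory contains a pid file whose content is the server's process ID. On Windows, in Task Manager choose View > Select Columns and tick PID (Process Identifier) to show it. On Unix/Linux, run ps -ef | grep PID or ps -ef | grep java to see that server's process and all Java processes. Note that a node with a deployment manager profile usually runs at least three Java processes, the deployment manager, the node agent and the application server, so tell them apart.</p>
<p style="text-indent: 24pt">If the server is not running, or you want to restart it, run startServer.sh server1 (startServer.bat server1 on Windows) in profiles\AppSrv01\bin and watch the startup until "Server server1 open for e-business" appears, which means it started successfully. If it fails, open SystemOut.log under logs, read the latest entries and look for errors.</p>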
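<p>The PID check can also be done from Java 9 or later with the standard <code>ProcessHandle</code> API. This is a minimal sketch assuming the pid file (for example <code>profiles\AppSrv01\logs\server1\server1.pid</code>) contains only the process number. It tells you whether that process exists, not whether startup finished, so still look for the "open for e-business" message.</p>

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: is the process recorded in a WebSphere pid file still alive? (Java 9+)
final class PidCheck {

    // Assumption: the pid file holds only the numeric process ID.
    static long parsePid(String pidFileContent) {
        return Long.parseLong(pidFileContent.trim());
    }

    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    // Usage (path is an example): java PidCheck profiles/AppSrv01/logs/server1/server1.pid
    public static void main(String[] args) throws IOException {
        String content = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.US_ASCII);
        long pid = parsePid(content);
        System.out.println("PID " + pid + (isAlive(pid) ? " is running" : " is not running"));
    }
}
```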
<p style="text-indent: 24pt">Startup failures almost always come down to a <strong>port conflict</strong> or <strong>insufficient permissions</strong>.</p>
<h5>Port conflicts</h5>
<p style="text-indent: 24pt">A port conflict appears in SystemOut.log as follows:</p>
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">TCPC0003E: TCP Channel TCP_2 initialization failed. The socket bind failed for host * and port 9081. The port may already be in use.</div>
<p style="text-indent: 24pt">Use netstat -an to see which ports are listening, then a tool such as TCPView or IceSword to find the process holding the port. On Linux/Unix, netstat -an | grep LISTEN (or grep the port number) shows it directly, and lsof -i :port, or rmsock on AIX, identifies the process using the port.</p>
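<p>Before changing any configuration, you can confirm from Java that a port really is taken, which reproduces the bind failure behind TCPC0003E. This is a hedged sketch: it binds on all interfaces, like host * in the message, and reports whether the bind succeeds. It cannot tell you which process owns the port, so netstat and lsof are still needed for that; port 9081 in the usage line is just the value from the log above.</p>

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch: can this TCP port be bound on all interfaces right now?
final class PortCheck {

    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(port)); // wildcard address, like host "*"
            return true;
        } catch (BindException e) {
            // Port already in use, or (on Unix, as non-root) a port below 1024.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("Unexpected error probing port " + port, e);
        }
    }

    // Usage: java PortCheck 9081
    public static void main(String[] args) {
        int port = Integer.parseInt(args[0]);
        System.out.println("Port " + port + (isPortFree(port) ? " is free" : " cannot be bound"));
    }
}
```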
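<p style="text-indent: 24pt">At this point you may suddenly remember some careless action that took one of WebSphere's ports. What now? If WebSphere is the one that has to give way, edit serverindex.xml in the profile_path\config\cells\cell_name\nodes\node_name directory:</p>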
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">&lt;specialEndpoints xmi:id=&quot;NamedEndPoint_1243228596786&quot; endPointName=&quot;WC_adminhost&quot;&gt;<br />&lt;endPoint xmi:id=&quot;EndPoint_1243228596786&quot; host=&quot;*&quot; port=&quot;9060&quot;/&gt;<br />&lt;/specialEndpoints&gt;<br />&lt;specialEndpoints xmi:id=&quot;NamedEndPoint_1243228596787&quot; endPointName=&quot;WC_defaulthost&quot;&gt;<br />……</div>
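<p style="text-indent: 24pt">See the port numbers? Note, however, that WC_adminhost, WC_defaulthost, WC_adminhost_secure and WC_defaulthost_secure are the commonly used administrative port, application port and their respective SSL ports. After changing any of them, you must also change the corresponding port in virtualhosts.xml under profile_path\config\cells\cell_name (adding an entry also works); otherwise don't blame me when you get a <strong>virtual host not defined</strong> error. (I have seen many cases where the application worked through IHS but failed when accessed directly on the port. The cause was a missing virtual host entry; adding the changed port to default_host under Virtual Hosts in the administrative console fixes it.)</p>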
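<p>Because a changed port must be mirrored in virtualhosts.xml, a small cross-check can help. The Java sketch below is hypothetical tooling: it reads the WC_* ports from serverindex.xml using the element names shown above (<code>specialEndpoints</code>, <code>endPoint</code>, <code>port</code>) and assumes that virtualhosts.xml lists host aliases as <code>aliases</code> elements with a <code>port</code> attribute. It ignores which virtual host each alias belongs to, so treat its output as a hint and verify both files by hand.</p>

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: which WC_* ports in serverindex.xml have no host alias in virtualhosts.xml?
final class VirtualHostPortCheck {

    private static Document parse(String xml) {
        try {
            // Not namespace-aware: prefixed names such as xmi:id are treated as plain names.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // port -> endPointName for WC_* endpoints (WC_adminhost, WC_defaulthost, ...).
    static Map<Integer, String> webContainerPorts(String serverIndexXml) {
        Map<Integer, String> ports = new TreeMap<>();
        NodeList named = parse(serverIndexXml).getElementsByTagName("specialEndpoints");
        for (int i = 0; i < named.getLength(); i++) {
            Element endpoint = (Element) named.item(i);
            String name = endpoint.getAttribute("endPointName");
            NodeList inner = endpoint.getElementsByTagName("endPoint");
            if (name.startsWith("WC_") && inner.getLength() > 0) {
                String port = ((Element) inner.item(0)).getAttribute("port");
                ports.put(Integer.valueOf(port), name);
            }
        }
        return ports;
    }

    // Assumed alias form: aliases elements carrying hostname and port attributes.
    static Set<String> aliasPorts(String virtualHostsXml) {
        Set<String> ports = new TreeSet<>();
        NodeList aliases = parse(virtualHostsXml).getElementsByTagName("aliases");
        for (int i = 0; i < aliases.getLength(); i++) {
            ports.add(((Element) aliases.item(i)).getAttribute("port"));
        }
        return ports;
    }

    // "NAME:port" for each WC_* endpoint whose port no alias covers.
    static Set<String> endpointsWithoutAlias(String serverIndexXml, String virtualHostsXml) {
        Set<String> aliased = aliasPorts(virtualHostsXml);
        Set<String> missing = new TreeSet<>();
        if (aliased.contains("*")) {
            return missing; // in this simplified check, a wildcard port covers everything
        }
        for (Map.Entry<Integer, String> e : webContainerPorts(serverIndexXml).entrySet()) {
            if (!aliased.contains(String.valueOf(e.getKey()))) {
                missing.add(e.getValue() + ":" + e.getKey());
            }
        }
        return missing;
    }

    // Usage (paths are examples): java VirtualHostPortCheck serverindex.xml virtualhosts.xml
    public static void main(String[] args) throws Exception {
        String serverIndex = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String virtualHosts = new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println("WC_* ports without a host alias: "
                + endpointsWithoutAlias(serverIndex, virtualHosts));
    }
}
```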
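<h5>Insufficient permissions</h5>
<p style="text-indent: 24pt">Permission problems usually occur on Unix/Linux. A common scenario: a dedicated user and group were created when WebSphere was installed, but permissions were managed loosely during development, developers had root access and started and stopped the server without su to the WAS user, and once root access was revoked the server would no longer start. The fix is to run chown -R user:group on the whole installation directory as root, where user and group are the WAS user and group.</p>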
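<p>To see which files a root-started server left behind before running chown, a Java sketch like the following lists everything under a directory whose owner differs from the WebSphere user. The install path and the user name <code>wasadmin</code> are placeholders, and <code>Files.getOwner</code> needs a file system that reports owners (POSIX systems do).</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: list files under a directory that are not owned by the expected user.
final class OwnershipCheck {

    static List<Path> notOwnedBy(Path root, String expectedOwner) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(p -> !ownerOf(p).equals(expectedOwner))
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String ownerOf(Path p) {
        try {
            // Do not follow symbolic links: report the owner of the link itself.
            return Files.getOwner(p, LinkOption.NOFOLLOW_LINKS).getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage (path and user are examples): java OwnershipCheck /opt/IBM/WebSphere/AppServer wasadmin
    public static void main(String[] args) {
        List<Path> wrong = notOwnedBy(Paths.get(args[0]), args[1]);
        wrong.forEach(p -> System.out.println(ownerOf(p) + "\t" + p));
        System.out.println(wrong.size() + " file(s) need chown");
    }
}
```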
<p style="text-indent: 24pt">Another case is a port changed to a number below 1024: on Unix/Linux the server then has to be started as root.</p>
<h5>Other cases</h5>
<p style="text-indent: 24pt">Also keep an eye on the file systems: I have seen access.log and dump files fill one up several times.</p>
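<p>A quick way to spot a file system filling up with access.log or dump files is to check the usable space where those files live. This sketch uses the standard <code>FileStore</code> API; the 10% threshold and the example paths are arbitrary assumptions.</p>

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: warn when the file system holding a directory is nearly full.
final class DiskSpaceCheck {

    static final double MIN_FREE_PERCENT = 10.0; // assumed threshold, tune for your site

    static double percentFree(long usableBytes, long totalBytes) {
        if (totalBytes <= 0) {
            throw new IllegalArgumentException("total size must be positive");
        }
        return 100.0 * usableBytes / totalBytes;
    }

    // Usage (paths are examples): java DiskSpaceCheck /opt/IBM/HTTPServer/logs /opt/IBM/WebSphere/AppServer/profiles
    public static void main(String[] args) throws IOException {
        for (String dir : args) {
            FileStore store = Files.getFileStore(Paths.get(dir));
            double free = percentFree(store.getUsableSpace(), store.getTotalSpace());
            System.out.printf("%s: %.1f%% free%s%n", dir, free,
                    free < MIN_FREE_PERCENT ? "  (nearly full: check logs and dumps)" : "");
        }
    }
}
```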
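<h4>Failed application updates</h4>
<p style="text-indent: 24pt">You update the application: upload the changed files straight into its directory, restart the application, and testing looks fine. But wait! Why does it revert to the old version after you restart the application server, or, in a cluster, restart the deployment manager?</p>
<p style="text-indent: 24pt">This is most likely the deployment manager's synchronization at work. Notice that your application also appears under profiles\AppSrv01\config\cells\cell_name\applications. If you open it, you will see that it does not hold the whole application, only important pieces such as web.xml and WEB-INF. So if a file you updated also exists under the config directory, you must update that copy as well. In a cluster, yet another copy is waiting for you under the config directory of profiles\Dmgr.</p>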
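<p>To catch this before a restart or synchronization silently reverts a change, you can compare the deployed files with their copies under config. The Java sketch below is a hypothetical helper: point it at two corresponding directories (for example the same module's WEB-INF under config\cells\cell_name\applications and under the deployed application directory, typically installedApps, which is an assumption about your layout) and it reports files in the config copy whose content differs or that are missing from the deployed copy.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: which files in the config copy differ from (or are missing in) the deployed copy?
final class ConfigDriftCheck {

    static List<String> differingFiles(Path configCopy, Path deployedCopy) {
        try (Stream<Path> files = Files.walk(configCopy)) {
            return files.filter(Files::isRegularFile)
                    .map(configCopy::relativize)
                    .filter(rel -> !sameContent(configCopy.resolve(rel), deployedCopy.resolve(rel)))
                    .map(Path::toString)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Byte-for-byte comparison; a missing counterpart counts as different.
    static boolean sameContent(Path a, Path b) {
        try {
            return Files.isRegularFile(b)
                    && Arrays.equals(Files.readAllBytes(a), Files.readAllBytes(b));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Usage: java ConfigDriftCheck configCopyDir deployedDir
    public static void main(String[] args) {
        differingFiles(Paths.get(args[0]), Paths.get(args[1]))
                .forEach(f -> System.out.println("differs: " + f));
    }
}
```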
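<h4>3. Make sure the database is healthy</h4>
<p style="text-indent: 24pt">This one is simple: if you can connect to the database with sqlplus and run a query, the database is fine.</p>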
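<p>If sqlplus is not installed on the application server host, the same "connect and run a query" check can be done over JDBC from that host, which also exercises the driver and network path the application uses. In this sketch the JDBC URL, user and password are placeholders, the Oracle driver jar must be on the classpath, and <code>SELECT 1 FROM DUAL</code> assumes an Oracle database.</p>

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Sketch: can we connect and run a trivial query? Mirrors the sqlplus check.
final class DbCheck {

    static boolean canQuery(String jdbcUrl, String user, String password) {
        DriverManager.setLoginTimeout(10); // seconds; avoid hanging on an unreachable host
        try (Connection c = DriverManager.getConnection(jdbcUrl, user, password);
             Statement s = c.createStatement();
             ResultSet rs = s.executeQuery("SELECT 1 FROM DUAL")) {
            return rs.next();
        } catch (SQLException e) {
            System.err.println("Database check failed: " + e.getMessage());
            return false;
        }
    }

    // Usage (all values are placeholders):
    // java -cp .:ojdbc.jar DbCheck jdbc:oracle:thin:@dbhost:1521:ORCL scott secret
    public static void main(String[] args) {
        System.out.println(canQuery(args[0], args[1], args[2]) ? "database OK" : "database check FAILED");
    }
}
```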
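<h4>4. Log files matter</h4>
<p style="text-indent: 24pt">Troubleshooting depends on the logs. I have seen quite a few projects in trial operation where log4j was extremely verbose, printing every SQL statement in full, so a 1 MB SystemOut.log filled up within ten-odd minutes and 10 archived SystemOut.log files covered only a few hours of logging (1 to 2 MB per file with 10 to 20 archives is a typical setting). By the time I arrived, the evidence was long gone. (In such cases a restart usually makes the problem go away for a while, but the root cause is never found.)</p>
<p style="text-indent: 24pt">So it is important to save the logs promptly: keep a copy of SystemOut.log and SystemErr.log from logs\server1, and note the time the failure occurred.</p>
<p style="text-indent: 24pt">Unlike WebLogic, WebSphere does not keep showing the running log in a console window. On Unix/Linux you can get the same effect with tail -f SystemOut.log; there is also a tail tool for Windows, which you run followed by the file name.</p>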
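<p>If no tail tool is at hand, a polling loop in plain Java does much the same job. This is a minimal sketch, not the Windows tool offered here: it remembers the byte offset it has read up to, prints anything appended since, and starts over from the top if the file shrinks (for example after SystemOut.log is rotated). The one-second poll interval and the platform-encoding assumption are arbitrary choices.</p>

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.Charset;

// Sketch of "tail -f": poll a log file and print whatever was appended.
final class LogTail {

    // Appends bytes from 'offset' to the current end of 'file' to 'out'; returns the new offset.
    static long readNewContent(String file, long offset, StringBuilder out) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            if (length < offset) {
                offset = 0; // file shrank (rotated or truncated): start again from the top
            }
            byte[] buf = new byte[(int) (length - offset)];
            raf.seek(offset);
            raf.readFully(buf);
            // Assumes the log uses the platform encoding; a multi-byte character split
            // across two polls may print garbled in this simple version.
            out.append(new String(buf, Charset.defaultCharset()));
            return length;
        }
    }

    // Usage (path is an example): java LogTail profiles/AppSrv01/logs/server1/SystemOut.log
    public static void main(String[] args) throws IOException, InterruptedException {
        long offset = new File(args[0]).length(); // like tail -f, skip what is already there
        while (true) {
            StringBuilder chunk = new StringBuilder();
            offset = readNewContent(args[0], offset, chunk);
            System.out.print(chunk);
            Thread.sleep(1000); // poll once per second
        }
    }
}
```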
<p><a href="http://hashei.me/wp-content/uploads/2009/09/tail.exe" target="_blank">tail tool</a></p>
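<h3>Closing remarks</h3>
<p style="text-indent: 24pt">That is all the simple troubleshooting I can think of for now. Developers run into these problems quite often, so they are well worth knowing.</p>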
]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/09/basic_websphere_troubleshooting.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fixing WebLogic 10.3.0 Hangs on AIX 6.1 with JDK 1.6</title>
		<link>http://www.hashei.me/2009/08/cr370915_in_weblogic10-3_and_jdk1-6.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=cr370915_in_weblogic10-3_and_jdk1-6</link>
		<comments>http://www.hashei.me/2009/08/cr370915_in_weblogic10-3_and_jdk1-6.html#comments</comments>
		<pubDate>Tue, 25 Aug 2009 05:14:57 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[weblogic]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[bug]]></category>
		<category><![CDATA[CR370915]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/08/cr370915_in_weblogic10-3_and_jdk1-6.html</guid>
		<description><![CDATA[Last week I installed WebLogic 10.3.0 on AIX 6.1 and configured an HACMP cluster, but over the following days I ran into hangs, and even worked a day of overtime because of them.
Symptoms:
10 to 30 minutes after startup, WebLogic hangs, and neither the applications nor the administration console can be reached. After a forced kill -9 pid, the port is not released; checking it with rmsock shows "Wait for exiting processes to be cleaned up before removing the socket".
Analysis and resolution
1. Use ps -ef | grep java to find the WebLogic process, and run kill -3 pid every three minutes to generate javacore files in the domain directory.
2. Analyzing the WebLogic log turned up the following:
&#60;Aug 21, 2009 4:33:37 AM CDT&#62; &#60;Error&#62; &#60;WebLogicServer&#62; &#60;BEA-000337&#62; &#60;[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "620" seconds working on the request
"weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de", which is more than the configured time [...]]]></description>
			<content:encoded><![CDATA[<p style="text-indent: 24pt;">上周在AIX6.1下安装weblogic10.3.0，并配置了hacmp集群环境，但是接下来的几天遇到了挂起问题，为此还加班了一天。</p>
<h4>现象描述：</h4>
<p style="text-indent: 24pt;">Weblogic启动后，10到30分钟就会hang住，应用和管理控制台都无法访问。强制kill -9 pid后端口无法释放，使用rmsock 命令查看端口显示Wait for exiting processes to be cleaned up before removing the socket。</p>
<h4>分析及处理过程</h4>
<p>1. 用ps –ef | grep java找到weblogic进程，每隔三分种执行kill -3 pid，在domain目录下生成javacore文件</p>
<p>2. 分析weblogic日志，发现如下内容</p>
<blockquote><p>&lt;Aug 21, 2009 4:33:37 AM CDT&gt; &lt;Error&gt; &lt;WebLogicServer&gt; &lt;BEA-000337&gt; &lt;[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)' <strong><span style="color: #ff0000;">has been busy for "620" seconds working on the request</span></strong></p>
<p>"weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:</p>
<p>java.net.SocketOutputStream.<strong><span style="color: #ff0000;">socketWrite0</span></strong>(Native Method)</p>
<p>java.net.SocketOutputStream.<strong><span style="color: #ff0000;">socketWrite</span></strong>(SocketOutputStream.java:103)</p>
<p>...</p></blockquote>
<blockquote><p>&lt;Aug 21, 2009 4:34:37 AM CDT&gt; &lt;Error&gt; &lt;WebLogicServer&gt; &lt;BEA-000337&gt; &lt;[STUCK] ExecuteThread: '1' for queue: 'weblogic.kernel.Default (self-tuning)' <strong><span style="color: #ff0000;">has been busy for "680" seconds working on the request</span></strong></p>
<p>"weblogic.work.SelfTuningWorkManagerImpl$WorkAdapterImpl@20de20de", which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:</p>
<p>java.net.SocketOutputStream.<strong><span style="color: #ff0000;">socketWrite0</span></strong>(Native Method)</p>
<p>java.net.SocketOutputStream.<strong><span style="color: #ff0000;">socketWrite</span></strong>(SocketOutputStream.java:103)</p>
<p>...</p></blockquote>
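<p>When the log is large, picking out the BEA-000337 entries by hand is tedious. The Java sketch below is a hypothetical helper, not a WebLogic tool: it applies a regular expression based on the message text shown above to list each stuck thread and how long it had been busy. It assumes the straight quotes WebLogic writes; if your log formats the message differently, adjust the pattern.</p>

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: extract stuck-thread reports (BEA-000337) from WebLogic log text.
final class StuckThreadScan {

    private static final Pattern STUCK = Pattern.compile(
            "\\[STUCK\\] ExecuteThread: '(\\d+)' for queue: '([^']+)' has been busy for \"(\\d+)\" seconds");

    // One "thread N busy S s (queue)" entry per match, in log order.
    static List<String> stuckThreads(String logText) {
        List<String> found = new ArrayList<>();
        Matcher m = STUCK.matcher(logText);
        while (m.find()) {
            found.add("thread " + m.group(1) + " busy " + m.group(3) + " s (" + m.group(2) + ")");
        }
        return found;
    }

    // Usage (path is an example): java StuckThreadScan servers/AdminServer/logs/AdminServer.log
    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])), Charset.defaultCharset());
        stuckThreads(text).forEach(System.out::println);
    }
}
```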
<p>3. Analyzing the thread dumps just generated with IBM Thread and Monitor Dump Analyzer for Java turned up these two threads:</p>
<blockquote><p>3XMTHREADINFO "[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'" TID:0x39CBED00, j9thread_t:0x3751C83C, state:R, prio=5</p>
<p>3XMTHREADINFO1 (native thread ID:0xCE1DB, native priority:0x5, native policy:UNKNOWN)</p>
<p>4XESTACKTRACE at <strong><span style="color: #ff0000;">java/net/PlainSocketImpl.socketClose0</span></strong>(Native Method)</p>
<p>4XESTACKTRACE at java/net/PlainSocketImpl.socketPreClose(PlainSocketImpl.java:706)</p>
<p>4XESTACKTRACE at java/net/PlainSocketImpl.close(PlainSocketImpl.java:540)</p>
<p>4XESTACKTRACE at java/net/SocksSocketImpl.close(SocksSocketImpl.java:1041)</p>
<p>4XESTACKTRACE at java/net/Socket.close(Socket.java:1343)</p>
<p>4XESTACKTRACE at weblogic/socket/SocketMuxer.closeSocket(SocketMuxer.java:475)</p>
<p>4XESTACKTRACE at weblogic/socket/SocketMuxer.cancelIo(SocketMuxer.java:813)</p>
<p>4XESTACKTRACE at weblogic/socket/SocketMuxer$TimerListenerImpl.timerExpired(SocketMuxer.java:1021(Compiled Code))</p>
<p>4XESTACKTRACE at weblogic/timers/internal/TimerImpl.run(TimerImpl.java:273(Compiled Code))</p>
<p>4XESTACKTRACE at weblogic/work/SelfTuningWorkManagerImpl$WorkAdapterImpl.run(SelfTuningWorkManagerImpl.java:516(Compiled Code))</p>
<p>4XESTACKTRACE at weblogic/work/ExecuteThread.execute(ExecuteThread.java:201(Compiled Code))</p>
<p>4XESTACKTRACE at weblogic/work/ExecuteThread.run(ExecuteThread.java:173)</p></blockquote>
<blockquote><p>3XMTHREADINFO "ExecuteThread: '7' for queue: 'weblogic.socket.Muxer'" TID:0x35381D00, j9thread_t:0x35385864, state:R, prio=5</p>
<p>3XMTHREADINFO1 (native thread ID:0xB916F, native priority:0x5, native policy:UNKNOWN)</p>
<p>4XESTACKTRACE at <strong><span style="color: #ff0000;">weblogic/socket/PosixSocketMuxer.poll</span></strong>(Native Method)</p>
<p>4XESTACKTRACE at weblogic/socket/PosixSocketMuxer.processSockets(PosixSocketMuxer.java:102(Compiled Code))</p>
<p>4XESTACKTRACE at weblogic/socket/SocketReaderRequest.run(SocketReaderRequest.java:29)</p>
<p>4XESTACKTRACE at weblogic/socket/SocketReaderRequest.execute(SocketReaderRequest.java:42)</p>
<p>4XESTACKTRACE at weblogic/kernel/ExecuteThread.execute(ExecuteThread.java:145)</p>
<p>4XESTACKTRACE at weblogic/kernel/ExecuteThread.run(ExecuteThread.java:117)</p></blockquote>
<p>4. Only these two execute threads were running, one in close() and one in poll(); all the others were blocked or waiting.</p>
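<p>This check can be repeated mechanically across several javacores. The sketch below is a rough parser that assumes the IBM javacore layout shown above: one <code>3XMTHREADINFO</code> line per thread with a quoted name and a <code>state:</code> field, where <code>R</code> means runnable. It lists the runnable threads so you can see whether the same two stay in <code>socketClose0</code> and <code>poll</code> from dump to dump; it is no substitute for the IBM analyzer.</p>

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: list runnable (state:R) threads from IBM javacore text.
final class JavacoreRunnable {

    // "3XMTHREADINFO" followed by whitespace excludes the 3XMTHREADINFO1/2 detail lines.
    private static final Pattern THREAD =
            Pattern.compile("^3XMTHREADINFO\\s+\"([^\"]*)\".*?\\bstate:(\\w+)", Pattern.MULTILINE);

    static List<String> runnableThreads(String javacore) {
        List<String> names = new ArrayList<>();
        Matcher m = THREAD.matcher(javacore);
        while (m.find()) {
            if (m.group(2).equals("R")) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    // Usage (file name is an example): java JavacoreRunnable javacore.20090821.043337.txt
    public static void main(String[] args) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(args[0])), Charset.defaultCharset());
        runnableThreads(text).forEach(System.out::println);
    }
}
```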
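<p>5. After searching MetaLink and confirming with BEA's 800 support line, this turned out to be a long-standing bug of WebLogic on AIX JVMs that has appeared in various versions since 8.1.4. It is caused by the way the IBM JVM's underlying socket implementation interacts with WebLogic, and is fixed by the patch <strong><span style="color: #ff0000;">CR370915_1030GA</span></strong>.jar.</p>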
<h4>Procedure</h4>
<p>1. In the WebLogic startup script, find the CLASSPATH line.</p>
<p>2. Add the patch jar as the <strong><span style="color: #ff0000;">first</span></strong> entry of the CLASSPATH variable.<br />
For example:<br />
CLASSPATH="${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}"<br />
becomes<br />
CLASSPATH=/path/to/CR370915_1030GA.jar:"${CLASSPATH}${CLASSPATHSEP}${MEDREC_WEBLOGIC_CLASSPATH}"</p>
<p>3. This change affects only this domain. To apply it to all domains, add the jar at the very front of WEBLOGIC_CLASSPATH= in the commEnv.sh file in the common/bin/ directory.</p>
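<p>The patch jar has to come first because the class loader searches the classpath in order and uses the first copy of a class it finds. To confirm the ordering took effect, the Java sketch below reports where a class was loaded from, using the standard <code>ProtectionDomain</code>/<code>CodeSource</code> API, and shows how to prepend a jar with the platform path separator. Run it with the same CLASSPATH the startup script builds; the class name in the usage line is only an example, so check the patch's readme for the classes it actually replaces.</p>

```java
import java.io.File;
import java.security.CodeSource;

// Sketch: report where a class was loaded from, to confirm classpath ordering took effect.
final class WhichJar {

    static String locationOf(String className, ClassLoader loader) {
        try {
            Class<?> c = Class.forName(className, false, loader);
            CodeSource source = c.getProtectionDomain().getCodeSource();
            return (source == null || source.getLocation() == null)
                    ? "bootstrap (no code source)"
                    : source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found";
        }
    }

    // Build a classpath with the patch jar first, using ':' on Unix and ';' on Windows.
    static String prepend(String patchJar, String classpath) {
        return (classpath == null || classpath.isEmpty())
                ? patchJar
                : patchJar + File.pathSeparator + classpath;
    }

    // Usage (class name is only an example): java -cp "$CLASSPATH:." WhichJar weblogic.socket.SocketMuxer
    public static void main(String[] args) {
        System.out.println(args[0] + " loaded from " + locationOf(args[0], WhichJar.class.getClassLoader()));
    }
}
```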
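<h4>Summary</h4>
<p style="text-indent: 24pt;">This bug is fairly common on platforms that combine WebLogic with the IBM JVM. If you see the log messages above, you can be fairly sure the CR370915 patch is needed.</p>
<p style="text-indent: 24pt;"><span style="color: #ff0000;">Update: the patch here is only for WebLogic 10.3.0.0; for other versions, download it yourself with Smart Update.</span></p>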
<p>Patches for WLS 8.x can be found in My Oracle Support. Open the Patches &amp; Updates tab. Search for patch ID 8173442 for the patches for WLS 8.1mp3, 8.1mp4, and 8.1mp5. Search for patch ID 8179792 for the patch for WLS 8.1mp6.</p>
<p>Patches for WLS 9.x and higher can be downloaded from Smart Update using these patch IDs and passcodes:</p>
<table border="1">
<caption>PATCH REPOSITORY INFORMATION</caption>
<tr><th>WLS Version</th><th>Patch ID</th><th>Passcode</th></tr>
<tr><td>9.2</td><td>T4DV</td><td>7C7PYV9B</td></tr>
<tr><td>9.2mp1</td><td>HZHQ</td><td>PTUYCCSI</td></tr>
<tr><td>9.2mp2</td><td>WJD2</td><td>GU1CW2AB</td></tr>
<tr><td>9.2mp3</td><td>GNLT</td><td>8J9L6Q4Y</td></tr>
<tr><td>10.0</td><td>PMAJ</td><td>9UQ69LLT</td></tr>
<tr><td>10.0mp1</td><td>ITVL</td><td>K8RBHQQ2</td></tr>
<tr><td>10.3</td><td>9YT5</td><td>I1DB5QSV</td></tr>
</table>
<p>If the production machine has no Internet access, there are two options:</p>
<h5>1. Using SmartUpdate in offline mode</h5>
<p>You can apply the patch using SmartUpdate with the following steps:</p>
<div id="_mcePaste">
<ol>
<li>Download the patch using SmartUpdate on another machine with Internet access.</li>
<li>Copy the files (for example E5W8.jar and WGQJ.jar) and patch-catalog.xml from your machine with Internet access to the offline machine. For example, say you have a test environment running on a Windows box. Your production environment is running on UNIX. You might copy the jar files from %BEA_HOME%\utils\bsu\cache-dir to $BEA_HOME/utils/bsu/cache-dir.</li>
<li><strong>When a machine connects to Smart Update, the catalog of patches is always updated automatically. Thus, when a patch is being copied to an offline machine, the patch-catalog.xml file must also be copied over.</strong></li>
<li>Run SmartUpdate in offline mode and apply patches and patch sets. This can be done using the SmartUpdate command-line interface (see http://download.oracle.com/docs/cd/E14759_01/doc.32/e14143/commands.htm#i1074489).</li>
<li>This is the syntax for the command to install a patch:</li>
</ol>
</div>
<div id="_mcePaste">./bsu.sh -prod_dir=&lt;weblogic_home&gt; -patchlist=&lt;patchID&gt; -verbose -install</div>
<div id="_mcePaste">For example,</div>
<div id="_mcePaste">./bsu.sh -prod_dir=/opt/bea/weblogic92 -patchlist=E5W8 -verbose -install</div>
<div id="_mcePaste">./bsu.sh -prod_dir=/opt/bea/weblogic92 -patchlist=WGQJ -verbose -install</div>
<h5>2. Applying the patch to the classpath manually</h5>
<p>You can apply the patch to the offline system manually by extracting the actual patch and adding it to the classpath on the offline system:</p>
<ol>
<li>Extract the actual patch jar file. If you downloaded the patch using SmartUpdate, it will be in the form &lt;patch_id&gt;.jar (for example: E5W8.jar). Inside this jar file is the actual patch jar file, which will be of the form CR326566_92mp3.jar. Extract the latter file for the following steps.</li>
<li>Add the extracted jar file as the first element of the classpath of the Admin server as well as the managed servers in the domain.</li>
<li>If you are starting servers using the WebLogic startup script, update the classpath in the startup script like this:<br />set CLASSPATH=&lt;PATCH_DIR&gt;\jars\CR326566_92mp3.jar;%CLASSPATH% (Windows)<br />CLASSPATH=&lt;PATCH_DIR&gt;/jars/CR326566_92mp3.jar:$CLASSPATH (UNIX)<br />where PATCH_DIR is the directory on your local machine where you extracted/saved the patch file.</li>
<li>Similarly, if you are starting servers using Node Manager, add the patch jar to the beginning of the Class Path argument in the Server Start tab for the server(s).</li>
</ol>
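<p>I usually use the second method: for a single patch it is quick and convenient. SmartUpdate can be installed on its own, but it asks which BEA home directory to apply patches to, and the patches you can download depend on the version and platform. A Windows machine will of course have no AIX BEA installation, but you can simply create a directory and copy a registry.xml into it.</p>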
]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/08/cr370915_in_weblogic10-3_and_jdk1-6.html/feed</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>Diagnosing a WebSphere Performance Problem</title>
		<link>http://www.hashei.me/2009/08/websphere-performance-troubshooting-1.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=websphere-performance-troubshooting-1</link>
		<comments>http://www.hashei.me/2009/08/websphere-performance-troubshooting-1.html#comments</comments>
		<pubDate>Mon, 24 Aug 2009 14:50:28 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[性能优化]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[websphere]]></category>
		<category><![CDATA[数据库]]></category>
		<category><![CDATA[索引]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/08/websphere-performance-troubshooting-1.html</guid>
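		<description><![CDATA[ A user once called to say that one of their applications slowed down dramatically as soon as concurrency rose a little, and that the administrative console was so slow that normal work was almost impossible.
On site I found the platform was WebSphere 6.0.2.29 on Windows Server 2003 Enterprise Edition SP2, so I first analyzed SystemOut.log and found the following errors: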
= com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException Max connections reached 869
Exception = com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException
Source = Max connections reached
probeid = 869
I also found "Caused by: java.io.IOException: Async IO operation failed, reason: RC: 10053 An established connection was aborted by the software in your host machine."

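 Clearly the connection pool could not allocate a new connection for the request, and the wait reached the WaitTimeout. My first thought was that the database connection pool was too small, so I set it to 50 initial and 150 maximum connections, with idle connections released automatically after 120 seconds.
 But the change brought no improvement: within 10 minutes the application was slow again. Looking at SystemOut.log and the FFDC logs again, I found many I/O-related errors: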
com.ibm.ws.webcontainer.channel.WCCByteBufferInputStream 102
Exception = java.net.SocketTimeoutException
Source = com.ibm.ws.webcontainer.channel.WCCByteBufferInputStream
probeid = 102
stack Dump = java.net.SocketTimeoutException: Async operation timed out
java.io.IOException com.ibm.ws.webcontainer.servlet.RequestUtils.parsePostData 398
Exception = java.io.IOException
Source = com.ibm.ws.webcontainer.servlet.RequestUtils.parsePostData
probeid = 398
Stack Dump = java.io.IOException: Async IO [...]]]></description>
			<content:encoded><![CDATA[<p style="text-indent: 24pt"> 一次接到用户电话，说某个应用在并发量稍大的情况下就会出现响应时间陡然增大，同时管理控制台的响应时间也很慢，几乎无法进行正常工作。</p>
<p style="text-indent: 24pt">赶到现场后，查看平台版本为Webshpere6.0.2.29，操作系统为Windows 2003企业版sp2，于是首先分析systemout.log，发现有如下报错：</p>
<blockquote><p>= com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException Max connections reached 869</p>
<p>Exception = com.ibm.websphere.ce.j2c.ConnectionWaitTimeoutException</p>
<p>Source = Max connections reached</p>
<p>probeid = 869</p>
<p>I also found "Caused by: java.io.IOException: Async IO operation failed, reason: RC: 10053 An established connection was aborted by the software in your host machine."</p>
</blockquote>
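<p style="text-indent: 24pt"> Clearly the connection pool could not allocate a new connection for the request, and the wait reached the WaitTimeout. My first thought was that the database connection pool was too small, so I set it to 50 initial and 150 maximum connections, with idle connections released automatically after 120 seconds.</p>
<p style="text-indent: 24pt"> But the change brought no improvement: within 10 minutes the application was slow again. Looking at SystemOut.log and the FFDC logs again, I found many I/O-related errors:</p>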
<blockquote><p>com.ibm.ws.webcontainer.channel.WCCByteBufferInputStream 102</p>
<p>Exception = java.net.SocketTimeoutException</p>
<p>Source = com.ibm.ws.webcontainer.channel.WCCByteBufferInputStream</p>
<p>probeid = 102</p>
<p>stack Dump = java.net.SocketTimeoutException: Async operation timed out</p>
<p>java.io.IOException com.ibm.ws.webcontainer.servlet.RequestUtils.parsePostData 398</p>
<p>Exception = java.io.IOException</p>
<p>Source = com.ibm.ws.webcontainer.servlet.RequestUtils.parsePostData</p>
<p>probeid = 398</p>
<p>Stack Dump = java.io.IOException: Async IO operation failed, reason: RC: 55 The specified network resource or device is no longer available.</p>
<p>probeid = 1184</p>
</blockquote>
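<p style="text-indent: 24pt"> Only afterwards did I learn that the problem actually lay between the database and the middleware. At the time, however, no more errors about the pool being too small were being reported, the administrative console was nearly unusable, and this application uploads and validates many files, so I suspected a server I/O bottleneck was slowing it down.</p>
<p style="text-indent: 24pt">So I started Performance Monitor on the server and added counters such as % Disk Time, % Disk Write Time, % Disk Read Time, Disk Queue Length and Page Faults. % Disk Time averaged between 20 and 70, with momentary peaks above 90, and Disk Read values were very small: almost all of the activity was writes.</p>
<p style="text-indent: 24pt"> After watching for a while, I found the application was still slow even after disk activity dropped, so I analyzed garbage collection to check for a memory leak.</p>
<p style="text-indent: 24pt"> Analysis with IBM Monitoring and Diagnostic Tools for Java™ - Garbage Collection and Memory Visualizer showed a mean interval between collections of only 0.46 minutes, and GC started when only about 250 MB of heap was in use, even though the heap was configured at 760 to 1260 MB. I concluded that heap fragmentation was causing frequent GCs and slowing the server's responses; the tool also reported "The mean heap unusable due to fragmentation is estimated at 34.685%". The application team said their objects were mostly short-lived, so I changed the GC policy to gencon (generational concurrent). The GC log then showed much longer intervals between collections, mostly of the young area (nursery), with global collections 23 minutes apart.</p>
<p style="text-indent: 24pt"> Unfortunately, none of this tuning improved the situation at all. The console's real-time performance monitoring showed 40 to 60 JDBC connections busy, and at that point I could not tell whether this was normal, that is, whether that many connections were really needed. Later the application developers examined the database and found many active connections that had been running for 5 to 10 minutes, each executing a query. The application was built on Hibernate, which wrapped the query in its own functions so that the existing index could not be used (the developers had also made a basic mistake when creating the index). After the index was rebuilt, the query finished quickly, the number of busy pool connections fell to 0 to 5, and the application returned to normal. In short, the queries took so long that the threads sat waiting for the database to return, the 100 threads were soon used up, and no new requests could be served. Because the connection pool had already been enlarged, no pool-exhaustion errors were reported.</p>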
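<p>The arithmetic shows why 100 threads ran out so quickly. By Little's law (requests in progress = arrival rate × time per request; this framing is our addition, not part of the original diagnosis), a pool of 100 threads whose requests each hold a thread for 5 minutes can complete at most 100 / 300 s ≈ 0.33 requests per second, and anything faster piles up. The sketch below just encodes that relationship; the 0.2-second post-fix query time is hypothetical.</p>

```java
// Sketch: Little's law (L = lambda * W) applied to a thread pool.
final class PoolMath {

    // Maximum sustainable throughput (requests/second) for a pool of 'threads'
    // when each request holds a thread for 'secondsPerRequest'.
    static double maxThroughput(int threads, double secondsPerRequest) {
        return threads / secondsPerRequest;
    }

    // Threads needed to sustain 'requestsPerSecond' at 'secondsPerRequest'.
    static double threadsNeeded(double requestsPerSecond, double secondsPerRequest) {
        return requestsPerSecond * secondsPerRequest;
    }

    public static void main(String[] args) {
        // Figures from the incident: 100 threads, queries lasting 5 to 10 minutes.
        System.out.printf("5-minute queries:   %.2f req/s max%n", maxThroughput(100, 300));
        System.out.printf("10-minute queries:  %.2f req/s max%n", maxThroughput(100, 600));
        // Hypothetical: if the re-indexed query took 0.2 s, the same pool could serve far more.
        System.out.printf("0.2-second queries: %.0f req/s max%n", maxThroughput(100, 0.2));
    }
}
```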
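<p style="text-indent: 24pt">Looking back on this week of troubleshooting, I took a long detour. For lack of experience I did not think of taking a thread dump; had I done so, it would probably have been easy to see large numbers of threads waiting for the database and pin the problem there.</p>
<p style="text-indent: 24pt">The lesson is that the root cause may be trivial while the path to it is complicated, because middleware problems involve the host, network, database, application and more. The takeaway: when an application responds slowly, take a thread dump and look at what the threads are doing, rather than waiting for a hang, 100% CPU or an OutOfMemoryError before thinking of analyzing a javacore.</p>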
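<p>As a rough in-process illustration of that advice, the sketch below uses the standard <code>ThreadMXBean</code> to take a thread dump of the current JVM and count the threads that have a given method anywhere on their stack, for example a socket read while waiting on the database. It only sees the JVM it runs in; for a running WebSphere server the usual route is still a javacore (kill -3 on Unix). The method in the usage line is an example, and the actual frame name depends on the JDK and the JDBC driver.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

// Sketch: count threads in this JVM that are currently inside a given method.
final class ThreadDumpCheck {

    static int countThreadsIn(String className, String methodName) {
        int count = 0;
        // dumpAllThreads(false, false): stack traces only, no lock details.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            for (StackTraceElement frame : info.getStackTrace()) {
                if (frame.getClassName().equals(className) && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Example: threads blocked reading from a socket (e.g. waiting for a slow query).
        System.out.println(countThreadsIn("java.net.SocketInputStream", "socketRead0")
                + " thread(s) in SocketInputStream.socketRead0");
    }
}
```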
<p style="text-indent: 24pt">
<hr /><h2>Related posts:</h2><ul><li><a href="http://www.hashei.me/2010/03/inside_java_classloader.html" rel="bookmark" title="Permanent Link: Java 类加载器的又一篇文章">Java 类加载器的又一篇文章</a></li><li><a href="http://www.hashei.me/2009/07/java-performance-tuning-resources.html" rel="bookmark" title="Permanent Link: Java性能优化参考资料">Java性能优化参考资料</a></li><li><a href="http://www.hashei.me/2010/02/tunning-websphere-application-server-was.html" rel="bookmark" title="Permanent Link: 软硬兼施 优化 WebSphere Application Server">软硬兼施 优化 WebSphere Application Server</a></li><li><a href="http://www.hashei.me/2010/05/linux-system-performance-monitoring.html" rel="bookmark" title="Permanent Link: Linux 性能监控">Linux 性能监控</a></li><li><a href="http://www.hashei.me/2009/04/websphere-introduce.html" rel="bookmark" title="Permanent Link: Websphere系列介绍">Websphere系列介绍</a></li></ul><hr /><small>  Copyright &copy; 2008 This feed is for personal, non-commercial use only<br />
<a href=www.hashei.com >聚沙成塔-小哈的记事薄</a> by hashei 
If you like this blog, you are welcome to subscribe: <a href=feed.hashei.com >feed.hashei.com</a><br />
Digital Fingerprint:
 10f920a9f2bae51c3c73c4f5fb50a949</small>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/08/websphere-performance-troubshooting-1.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Introduction to Server Hangs Caused by Application Deadlocks</title>
		<link>http://www.hashei.me/2009/08/serverhang_application_deadlock.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=serverhang_application_deadlock</link>
		<comments>http://www.hashei.me/2009/08/serverhang_application_deadlock.html#comments</comments>
		<pubDate>Mon, 17 Aug 2009 08:09:00 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[weblogic]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[deadlock]]></category>
		<category><![CDATA[hang]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/08/serverhang_application_deadlock.html</guid>
		<description><![CDATA[It turns out the good stuff has all been hiding on Metalink
Problem Description
An inadvertent deadlock in the application code can cause a server to hang. For example, thread1 is waiting for resource1 and holds a lock on resource2, whereas thread2 needs resource2 and holds the lock on resource1. Neither thread can progress.
Problem Troubleshooting
This Application Deadlock pattern should be used only [...]]]></description>
			<content:encoded><![CDATA[<h3>It turns out the good stuff has all been hiding on Metalink</h3>
<h3>Problem Description</h3>
<p>An inadvertent deadlock in the application code can cause a server to hang. For example, thread1 is waiting for resource1 and holds a lock on resource2, whereas thread2 needs resource2 and holds the lock on resource1. Neither thread can progress.</p>
<h3>Problem Troubleshooting</h3>
<p>This Application Deadlock pattern should be used only after doing all the steps in the <b><a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/GenericServerHangPattern.html">Generic Server Hang</a></b> pattern. One indicator that this is an application deadlock problem is that thread dumps will show the threads are in the application methods. Several thread dumps taken a few seconds apart will show that the threads are not progressing. Troubleshooting this problem will involve reviewing the application code. There exists a thread analyzer tool at BEA <a href="http://dev2dev.bea.com/products/wlplatform81/articles/thread_dumps.jsp">dev2dev</a> which has proven useful in analysis of the thread dumps.</p>
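<p>The "several thread dumps a few seconds apart" check can be sketched in Java, assuming you can run code inside the affected JVM (Java 6 or later). The class below is hypothetical. It takes two snapshots and returns the threads whose entire stack is identical in both. Treat the result as a list of candidates: idle threads parked in the same wait also match, while the threads that matter for this pattern are the ones stuck in application methods.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class StuckThreadCompare {
    // Takes two in-process thread dumps intervalMs apart and returns the names of
    // threads whose complete stack trace is identical in both, i.e. threads that
    // made no visible progress. Idle threads parked in the same wait also match,
    // so the result is a candidate list to read, not a verdict.
    public static String[] unchangedThreads(long intervalMs) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        ThreadInfo[] first = mx.dumpAllThreads(false, false);
        Thread.sleep(intervalMs);
        ThreadInfo[] second = mx.dumpAllThreads(false, false);

        String[] unchanged = new String[second.length];
        int count = 0;
        for (ThreadInfo later : second) {
            for (ThreadInfo earlier : first) {
                if (later == null || earlier == null) {
                    continue;                   // defensive: skip missing entries
                }
                boolean sameThread = earlier.getThreadId() == later.getThreadId();
                if (sameThread
                        && later.getStackTrace().length > 0
                        && Arrays.equals(earlier.getStackTrace(), later.getStackTrace())) {
                    unchanged[count++] = later.getThreadName();
                }
            }
        }
        return Arrays.copyOf(unchanged, count);
    }

    public static void main(String[] args) throws InterruptedException {
        // Compare two dumps taken a few seconds apart, as the pattern suggests.
        for (String name : unchangedThreads(3000)) {
            System.out.println("no progress in 3 s: " + name);
        }
    }
}
```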
<h4>Quick Links</h4>
<ul>
<li><a href="http://www.hashei.me/TEMP/non15BB.htm#Why_does_the_problem_occur">Why does the problem occur?</a></li>
<li><a href="http://www.hashei.me/TEMP/non15BB.htm#Known_WebLogic_Server_Issues">Known WebLogic Server Issues</a></li>
<li><a href="http://www.hashei.me/TEMP/non15BB.htm#External_Resources">External Resources</a></li>
</ul>
<p> <span id="more-698"></span><br />
<h3><a name="Why_does_the_problem_occur"></a>Why does the problem occur?</h3>
<p>Fundamentally, this problem happens because the design and implementation of the application have introduced the possibility of deadlocks. These problems may only show up under heavy load, so such applications often pass QA testing and then become problems in production.</p>
<p>Coding problems to look for:</p>
<ul>
<li>Unnecessary use of synchronized Java classes, e.g., using <code>Hashtable</code> (synchronized) instead of <code>HashMap</code> (unsynchronized) </li>
<li>Application has a synchronized method that contains synchronized object method calls. See example below.
<p>import java.util.Vector; // Vector is a synchronized Java class<br /><br />public class Employee {<br />&#160;&#160;&#160;&#160;Vector names = new Vector();<br /><br />&#160;&#160;&#160;&#160;Employee() {<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;Object object = new Object(); // a new local object: locking it protects nothing<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;synchronized (object) {<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;names.add(&quot;al&quot;);<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;names.add(&quot;Saganich&quot;);<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;}<br />&#160;&#160;&#160;&#160;}<br /><br />&#160;&#160;&#160;&#160;// synchronized method that calls another synchronized method (Vector.elementAt)<br />&#160;&#160;&#160;&#160;synchronized String getName(int index) {<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;String name = (String) names.elementAt(index);<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;return name;<br />&#160;&#160;&#160;&#160;}<br />}</p>
</li>
<li>Using synchronization around long running complex code. </li>
<li>Threads are waiting on resources that will never become available.</li>
</ul>
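<p>The last item covers threads waiting on resources that never become available. One common mitigation, which is not part of the original pattern, is to acquire locks with a timeout so that a thread gives up instead of blocking forever. Below is a minimal sketch using <code>java.util.concurrent.locks.ReentrantLock.tryLock(long, TimeUnit)</code>. <code>TimedLocking</code> and <code>runWithBothLocks</code> are hypothetical names. When the method returns <code>false</code>, the caller still has to decide whether to retry or report an error.</p>

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class TimedLocking {
    // Runs action while holding both locks. Instead of blocking forever, gives up
    // and returns false if either lock cannot be acquired within timeoutMs; any
    // lock already taken is released before returning.
    public static boolean runWithBothLocks(ReentrantLock first, ReentrantLock second,
                                           long timeoutMs, Runnable action)
            throws InterruptedException {
        if (!first.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false;
        }
        try {
            if (!second.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false;                  // first is released in the finally below
            }
            try {
                action.run();
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantLock a = new ReentrantLock();
        ReentrantLock b = new ReentrantLock();
        // prints "work done", then true
        System.out.println(runWithBothLocks(a, b, 100, () -> System.out.println("work done")));

        // Simulate a resource that never becomes available: another thread takes
        // b and exits without unlocking it.
        Thread holder = new Thread(b::lock);
        holder.start();
        holder.join();
        // prints false after about 100 ms
        System.out.println(runWithBothLocks(a, b, 100, () -> System.out.println("never runs")));
    }
}
```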
<h4>Application Design</h4>
<ul>
<li>The application uses up all of the configured threads. This can happen when an executing thread reaches a point where it must wait for work done by another thread to complete. If the method this thread is waiting to enter is long running, every thread eventually reaches that method, and after the application has run for a while, its threads are all lined up waiting for it. No new work can be accepted because all of the allocated threads are in use. See example below.
<p>import java&#8230;..;<br />import java&#8230;..;<br /><br />public class MyAppMethods {<br />&#160;&#160;&#160;&#160;public String getName(String name) {<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;String lastname = getLastName(name);<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;return lastname;<br />&#160;&#160;&#160;&#160;}<br /><br />&#160;&#160;&#160;&#160;// every caller queues on this object's monitor while one thread does the lookup<br />&#160;&#160;&#160;&#160;public synchronized String getLastName(String name) {<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;// do a DB lookup here: it takes a long time to get a last name<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;String lastname = lookUpLastNameInDatabase(name); // hypothetical slow DB call<br />&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;return lastname;<br />&#160;&#160;&#160;&#160;}<br />}</p>
<p>If the database is very slow, the server can appear to be hung, because the threads line up trying to get access to the database and all the available threads could eventually be used up.</p>
</li>
<li>The application running inside a WebLogic Server instance invokes a service on another WebLogic instance on a remote machine, and the service invoked on the remote machine makes a call back to the first server. This creates the opportunity for a deadlock on the first server, especially under heavy load. The first server has an execute thread that is tied up waiting for an inbound response, and that inbound response requires a thread from the same execute pool as the thread waiting to receive it.
<p>If the first server is faster than the remote server, all the threads in the execute pool will eventually be taken up by outbound requests, leaving fewer and fewer threads to process inbound responses. As the load grows, so does the number of outgoing requests that cannot complete because they are waiting for an inbound response. Below is an example of a thread waiting in the <code>waitForData</code> method of <code>ResponseImpl.java</code>.</p>
<p>&quot;ExecuteThread: '52' for queue: 'default'&quot; daemon prio=5 tid=0x4b3e40b0 nid=0x1170 waiting on monitor [0x4c74f000..0x4c74fdbc]<br />&#160;&#160;at java.lang.Object.wait(Native Method)<br />&#160;&#160;at weblogic.rjvm.ResponseImpl.waitForData(ResponseImpl.java:72)</p>
</li>
</ul>
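<p>Both designs above come down to a fixed number of threads that are all busy waiting for work that itself needs a thread from the same pool. The sketch below is a hedged, in-process model of the callback case. A <code>java.util.concurrent</code> fixed thread pool stands in for the WebLogic execute queue (this is not how WebLogic implements it), and <code>PoolStarvation</code> is a hypothetical name. Each "outbound request" waits for a "callback" task submitted to the same pool. When every thread is already occupied by a request, the callbacks cannot run and the requests time out.</p>

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PoolStarvation {
    // Submits `requests` outbound tasks to a pool of `poolSize` threads. Each one
    // waits (up to timeoutMs) for a "callback" task that must run on the SAME pool.
    // Returns true if any request gave up because no thread was free for its callback.
    public static boolean anyRequestStarved(int poolSize, int requests, long timeoutMs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        // Make the first requests occupy their threads before any callback is issued.
        CountDownLatch allStarted = new CountDownLatch(Math.min(poolSize, requests));
        try {
            Future[] outbound = new Future[requests];
            for (int i = 0; i < requests; i++) {
                outbound[i] = pool.submit(() -> {
                    allStarted.countDown();
                    allStarted.await(timeoutMs, TimeUnit.MILLISECONDS);
                    Future callback = pool.submit(() -> { });   // needs a free pool thread
                    try {
                        callback.get(timeoutMs, TimeUnit.MILLISECONDS);
                        return Boolean.FALSE;                      // callback ran in time
                    } catch (TimeoutException e) {
                        return Boolean.TRUE;                       // no thread was free
                    }
                });
            }
            boolean starved = false;
            for (Future f : outbound) {
                starved |= (Boolean) f.get();
            }
            return starved;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(anyRequestStarved(2, 2, 300));  // every thread waits on a callback
        System.out.println(anyRequestStarved(4, 2, 2000)); // spare threads run the callbacks
    }
}
```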
<p><a href="http://www.hashei.me/TEMP/non15BB.htm#TOP">Top of Page</a></p>
<h3><a name="Known_WebLogic_Server_Issues"></a>Known WebLogic Server Issues</h3>
<p>WebLogic Server cannot detect deadlocked threads. Some JVMs are able to do so. See <a href="http://www.hashei.me/TEMP/non15BB.htm#External_Resources">External Resources</a>. A thread analysis tool, as well as good information about thread dumps, is available on BEA <a href="http://dev2dev.bea.com/products/wlplatform81/articles/thread_dumps.jsp">dev2dev</a>.</p>
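<p>Since Java 5, the deadlock detection that some JVMs perform has been available through the standard <code>java.lang.management.ThreadMXBean</code> interface. The hedged sketch below uses a hypothetical class name. It reproduces the thread1/thread2 situation from the problem description, with two monitors taken in opposite order, and then asks the JVM to find the cycle. The threads are daemons only so that the demo JVM can exit. In a real server, deadlocked threads stay stuck until the server is restarted.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlock {
    private static final Object resource1 = new Object();
    private static final Object resource2 = new Object();

    // Reproduces the thread1/thread2 example: two threads take the same two
    // monitors in opposite order, then the JVM is asked to find the cycle.
    // Returns the number of deadlocked threads reported (2 for this setup).
    public static int provokeAndDetect() throws InterruptedException {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        Thread thread1 = new Thread(() -> {
            synchronized (resource2) {              // thread1 holds resource2 ...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (resource1) { }        // ... and waits for resource1
            }
        }, "thread1");
        Thread thread2 = new Thread(() -> {
            synchronized (resource1) {              // thread2 holds resource1 ...
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (resource2) { }        // ... and needs resource2
            }
        }, "thread2");
        thread1.setDaemon(true);                    // daemons, so this demo JVM can still exit
        thread2.setDaemon(true);
        thread1.start();
        thread2.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {   // poll for up to ~5 s
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + provokeAndDetect());
    }
}
```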
<p><a href="http://www.hashei.me/TEMP/non15BB.htm#TOP">Top of Page</a></p>
<h3><a name="External_Resources"></a>External Resources</h3>
<p>If you suspect a deadlock, check the JVM vendor's site to find out whether its thread dumps provide clues. If you are using JDK 1.3.1, you can add <code>-XX:+JavaMonitorsInStackTrace</code> to show locking information more explicitly in thread dumps. Below is an example of an HP JVM thread dump in which the threads' state is clearly marked as waiting on a monitor, and the monitor is identified.</p>
<p>&quot;msg 0-941667944865&quot; (TID:0x7b1a5ba0, sys_thread_t:0x2a4290,<br />&#160; state:<b>Waiting on Monitor</b>,<br />&#160; thread_t: t@108, stack_base:0x7a76e000, stack_size:0x20000,<br />&#160; pc: 0xc01ea178, monitor = <b>0x25334</b>) prio=8<br />&#160; MsgThread.rest(Compiled Code)<br />&#160; MsgThread.run(Compiled Code)<br />&quot;msg 3-941667944865&quot; (TID:0x7b1a5a80, sys_thread_t:0x263278,<br />&#160; state:<b>Waiting on Monitor</b>,<br />&#160; thread_t: t@105, stack_base:0x7a95d000, stack_size:0x20000,<br />&#160; pc: 0xc01ea178, monitor = <b>0x25334</b>) prio=8<br />&#160; MsgThread.rest(Compiled Code)<br />&#160; MsgThread.run(Compiled Code)<br />&quot;msg 5-941667944864&quot; (TID:0x7b1a5f08, sys_thread_t:0x2c42d8,<br />&#160; state:<b>Waiting on Monitor</b>,<br />&#160; thread_t: t@106, stack_base:0x7aa65000, stack_size:0x20000,<br />&#160; pc: 0xc01ea178, monitor = <b>0x25334</b>) prio=8<br />&#160; MsgThread.rest(Compiled Code)<br />&#160; MsgThread.run(Compiled Code)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/08/serverhang_application_deadlock.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Troubleshooting Approach for Server Hangs Caused by JDBC</title>
		<link>http://www.hashei.me/2009/08/jdbc_causes_server_hang.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=jdbc_causes_server_hang</link>
		<comments>http://www.hashei.me/2009/08/jdbc_causes_server_hang.html#comments</comments>
		<pubDate>Sun, 16 Aug 2009 03:58:36 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[性能优化]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[hang]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[jdbc]]></category>
		<category><![CDATA[weblogic]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/08/jdbc_causes_server_hang.html</guid>
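		<description><![CDATA[This post is also reproduced from official BEA documentation. After Oracle acquired BEA, the original page moved to Oracle's website, so I am keeping a copy here as a backup. Although it was written by BEA, the approach is useful for all middleware products and for application development.]]></description>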
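			<content:encoded><![CDATA[<p>This post is also reproduced from official BEA documentation. After Oracle acquired BEA, the original page moved to Oracle's website, so I am keeping a copy here as a backup.</p>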
<h4>JDBC Causes Server Hang</h4>
<p> <br />
<table style="width: 600px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><font color="#009900"><b><u>Problem Description</u></b></font><br />A JDBC connection which is used by an application or by WebLogic Server itself will block one WebLogic Server execute thread for the complete duration of the calls that are made via this connection. The JVM will ensure that the CPU is given to runnable threads by its thread scheduling mechanism, while the thread that blocks on a SQL query needs to wait. However, the thread occupied by the JDBC call will be reserved and used for the application until the call returns from the SQL query.</p>
<p>Even a transaction timeout will not kill or timeout any action that is done by the resources that are enlisted in this transaction. The actions will run as long as they take, without interruption. A transaction timeout will set a flag on the transaction that will mark it as rollback only, so that any subsequent request to commit this transaction will fail with a <font size="-1">TimedOutException</font> or <font size="-1">RollbackException</font>. However, as mentioned above, the long running JDBC calls can lead to blocked WebLogic Server execute threads, which can finally lead to a hanging instance, if all threads are blocked and no execute thread remains available for handling incoming requests.</p>
<p>More recent WebLogic Server versions have a health check that regularly checks whether a thread has been unresponsive for a certain period of time (the default is 600 seconds). If this happens, an error message similar to the following is printed to your log file:</td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td style="vertical-align: top; background-color: rgb(204,204,204)"><font size="-1">####&lt;Nov 6, 2004 1:42:30 PM EST&gt; &lt;Warning&gt; &lt;WebLogicServer&gt; &lt;mydomain&gt; &lt;myserver&gt; &lt;CoreHealthMonitor&gt;<br /> &lt;kernel identity&gt; &lt;&gt; <br />&lt;000337&gt; &lt;ExecuteThread: &#8216;64&#8242; for queue: &#8216;default&#8217; has been busy for &#8220;740&#8243; seconds working on the request &#8220;Scheduled Trigger&#8221;, <br />which is more than the configured time (StuckThreadMaxTime) of &#8220;600&#8243; seconds.&gt;</font><font size="-1"><br /></font></td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><a name="1"></a>This does not interrupt the thread, as it is just a notification for the administrator. The only way a stuck thread becomes unstuck again is when the request it is handling finishes. In that case, you will find a message similar to the following in your WebLogic Server&#8217;s log file:</td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td style="vertical-align: top; background-color: rgb(204,204,204)"><font size="-1">####&lt;Nov 7, 2004 4:17:34 PM EST&gt; &lt;Info&gt; &lt;WebLogicServer&gt;&lt;mydomain&gt; &lt;myserver&gt; &lt;ExecuteThread: &#8216;66&#8242;<br /> for queue: &#8216;default&#8217;&gt;<br />&lt;kernel identity&gt; &lt;&gt; &lt;000339&gt; &lt;ExecuteThread: &#8216;66&#8242; for queue: &#8216;default&#8217; has become &#8220;unstuck&#8221;.&gt;</font><font size="-1"><br /></font></td>
</tr>
</tbody>
</table>
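<p>The two mechanisms described above, the transaction timeout and the stuck-thread warning, only flag the problem; neither one stops the call that is running. The transaction-timeout half can be modeled in a few lines of Java. This is a hypothetical sketch, not the JTA or WebLogic API. <code>Tx</code> stands in for a transaction, <code>Thread.sleep</code> stands in for a long JDBC call, and <code>IllegalStateException</code> stands in for the <code>RollbackException</code> that a real commit would throw.</p>

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class TimeoutMarksRollbackOnly {
    // Hypothetical, much-simplified transaction: the timeout only sets this flag.
    public static class Tx {
        final AtomicBoolean rollbackOnly = new AtomicBoolean(false);

        public void commit() {
            if (rollbackOnly.get()) {
                throw new IllegalStateException("transaction timed out and was marked rollback-only");
            }
        }
    }

    // Runs a "JDBC call" lasting workMillis inside tx, whose timeout is timeoutMillis.
    // Returns how long the call actually ran, in milliseconds.
    public static long runInTx(Tx tx, long timeoutMillis, long workMillis) throws InterruptedException {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            // When the timeout fires it marks the tx rollback-only; it does not
            // interrupt or cancel the thread doing the work.
            timer.schedule(() -> tx.rollbackOnly.set(true), timeoutMillis, TimeUnit.MILLISECONDS);
            Thread.sleep(workMillis);            // the long-running call runs to completion
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            timer.shutdownNow();                 // discard the timer if it has not fired yet
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Tx tx = new Tx();
        long ran = runInTx(tx, 50, 300);         // timeout (50 ms) shorter than the work (300 ms)
        System.out.println("work ran for about " + ran + " ms");
        try {
            tx.commit();
        } catch (IllegalStateException e) {
            System.out.println("commit failed: " + e.getMessage());
        }
    }
}
```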
<p> <br />
<table style="width: 600px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top">The time interval for the health check functionality is configurable. Please check <font size="-1">StuckThreadMaxTime</font> property in the <span style="font-style: italic">&lt;Server&gt;</span> tag of your <font size="-1">config.xml</font> file: <a href="http://e-docs.bea.com/wls/docs81/config_xml/Server.html#StuckThreadMaxTime">http://e-docs.bea.com/wls/docs81/config_xml/Server.html#StuckThreadMaxTime</a> or the &#8220;Detecting stuck threads&#8221; section in the WebLogic Server administration console help: <a href="http://e-docs.bea.com/wls/docs81/perform/WLSTuning.html#stuckthread">http://e-docs.bea.com/wls/docs81/perform/WLSTuning.html#stuckthread</a>.</p>
<p><font size="-1"><a href="#TOP">Top of Page</a></font></p>
<p><font color="#009900"><u><b><a name="Problem_Troubleshooting"></a>Problem Troubleshooting</b></u></font><br />Different programming techniques or JDBC connection pool configurations can lead to deadlocks or long running JDBC calls that lead to hanging WebLogic Server instances. General information about how to troubleshoot and analyze a hanging WebLogic Server instance is provided in <a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/Generic_Server_Hang_Pattern.html">Generic Server Hang Pattern</a>.</p>
<p>This pattern addresses JDBC calls that cause a server hang, along with other well-known JDBC-related causes of hanging WebLogic Server instances.&nbsp; Other Support Patterns referenced in this pattern are at the <a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/wls_support_patterns.jsp">WebLogic Server Support Patterns Site</a>.</p>
<p><span style="font-weight: bold; text-decoration: underline">Quick Links</span><font color="#009900"><u><b><br /></b></u></font>
<ul>
<li><a href="#Why_does_the_problem_occur"><span style="color: rgb(0,0,238); text-decoration: underline">Why does the problem occur?</span></a>
<li><a href="#Analysis_of_a_hanging_WebLogic_Server"><span style="color: rgb(0,0,238); text-decoration: underline">Analysis of a hanging WebLogic Server instance</span></a>
<li><a href="#Tips_and_Tricks_to_optimize_your_JDBC"><span style="color: rgb(0,0,238); text-decoration: underline">Tips and Tricks to optimize your JDBC code and JDBC connection pool configuration</span></a> </li>
</ul>
<p><a name="Why_does_the_problem_occur"></a><span style="font-weight: bold; text-decoration: underline">Why does the problem occur?</span><br />The following are possible reasons why JDBC calls can lead to a hanging WebLogic Server instance:<br /> 
<ul>
<li>Use of <a href="#DriverManager.getConnection">DriverManager.getConnection()</a> in your JDBC code.
<li><a href="#Long_Running_SQL_Queries">SQL Queries</a> issued to the database take unexpectedly long time to return.
<li><a href="#Hanging_Database">Database</a> for which the JDBC connection pool is configured hangs and does not return from calls in a timely manner.
<li>A slow or overloaded <a href="#Slow_Network">network</a> causes database calls to slow down or hang.<br /> 
<li>A <a href="#Deadlock">deadlock</a> causes all execute threads to hang and wait forever.
<li><a href="#RefreshMinutes">RefreshMinutes or TestFrequencySeconds</a> property in the JDBC connection pool causes hang periods in WebLogic Server.
<li><a href="#Pool_Shrinking">JDBC connection pool shrinking</a> and re-creation of database connections causes long response times. </li>
</ul>
<p><font size="-1"><a href="#TOP">Top of Page</a></font></p>
<p><a name="DriverManager.getConnection"></a><span style="font-weight: bold">Synchronized DriverManager.getConnection()</span><br />Older JDBC application code sometimes uses <font size="-1">DriverManager.getConnection()</font> calls to retrieve a database connection using a certain driver. This technique is not recommended, because it can cause deadlocks or at least relatively poor performance for your connection requests. The reason is that all DriverManager calls are class-synchronized: one DriverManager call in one thread will block all other DriverManager calls in any other thread inside one WebLogic Server instance.</p>
<p>In addition to that, the constructor for a <font size="-1">SQLException</font> makes a DriverManager call, and most drivers have <font size="-1">DriverManager.println() </font>calls for logging, so any of these can block all other threads that issue a DriverManager call.</p>
<p><font size="-1">DriverManager.getConnection()</font> can take a relatively long time until it returns with the physical connection created to the database. Even if no deadlock occurs, all other calls need to wait until that one thread gets its connection. This is not a best practice in a multi-threaded system like WebLogic Server.</td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top">This information is taken from <a href="http://forums.bea.com/bea//thread.jspa?forumID=2022&amp;threadID=200063365&amp;messageID=202311284&amp;start=-1#202311284">http://forums.bea.com/bea//thread.jspa?forumID=2022&amp;threadID=200063365&amp;messageID=202311284&amp;start=-1#202311284</a>. </td>
</tr>
</tbody>
</table>
<table style="width: 600px; height: 252px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top">Also our documentation clearly states that <font size="-1">DriverManager.getConnection()</font> should not be used: <a href="http://e-docs.bea.com/wls/docs81/faq/jdbc.html#501044">http://e-docs.bea.com/wls/docs81/faq/jdbc.html#501044</a>.</p>
<p><a name="2"></a>If your code needs JDBC connections, you should use a WebLogic Server JDBC connection pool, define a DataSource for it, and get the connection from the DataSource. This gives you all the advantages of a pool (resource sharing, connection reuse, connection refresh if a database was down, etc.). It also helps you avoid the deadlocks that may happen with DriverManager calls. See detailed information on how to use JDBC connection pools, DataSources, and other JDBC objects in WebLogic Server at: <a href="http://e-docs.bea.com/wls/docs81/jdbc/intro.html#1036718">http://e-docs.bea.com/wls/docs81/jdbc/intro.html#1036718</a> and <a href="http://e-docs.bea.com/wls/docs81/jdbc/programming.html#1054307">http://e-docs.bea.com/wls/docs81/jdbc/programming.html#1054307</a>.</p>
<p>A typical thread blocked in a <font size="-1">DriverManager.getConnection()</font> call looks like:</td>
</tr>
</tbody>
</table>
<table style="width: 600px" cellspacing="2" cellpadding="2" width="611" border="1">
<tbody>
<tr>
<td style="vertical-align: top; background-color: rgb(204,204,204)" width="605"><font size="-1">&quot;ExecuteThread-39&quot; daemon prio=5 tid=0x401660 nid=0x33 waiting for monitor entry [0xd247f000..0xd247fc68]<br />&nbsp; at java.sql.DriverManager.getConnection(DriverManager.java:188)<br />&nbsp; at com.bla.updateDataInDatabase(MyClass.java:296)<br />&nbsp; at javax.servlet.http.HttpServlet.service(HttpServlet.java:865)<br />&nbsp; at weblogic.servlet.internal.ServletStubImpl.invokeServlet<br />(ServletStubImpl.java:120)<br />&nbsp; at weblogic.servlet.internal.ServletContextImpl.invokeServlet<br />(ServletContextImpl.java:945)<br />&nbsp; at weblogic.servlet.internal.ServletContextImpl.invokeServlet<br />(ServletContextImpl.java:909)<br />&nbsp; at weblogic.servlet.internal.ServletContextManager.invokeServlet<br />(ServletContextManager.java:269)<br />&nbsp; at weblogic.socket.MuxableSocketHTTP.invokeServlet(MuxableSocketHTTP.java:392)<br />&nbsp; at weblogic.socket.MuxableSocketHTTP.execute(MuxableSocketHTTP.java:274)<br />&nbsp; at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:130)</font></p>
</td>
</tr>
</tbody>
</table>
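<p>The effect of a class-synchronized call can be reproduced without a database. In the hedged sketch below, the hypothetical <code>slowClassLockedCall()</code> is declared <code>static synchronized</code>, so every caller locks the same <code>Class</code> object. This matches what the pattern describes for <code>DriverManager</code> in the JDK versions of that time. <code>Thread.sleep</code> stands in for creating a physical connection. Concurrent callers end up running one after another.</p>

```java
public class ClassLockSerialization {
    // Hypothetical stand-in for a class-synchronized factory method. Because it is
    // "static synchronized", every caller locks the same Class object, so only one
    // call can be inside it at a time, no matter how many threads are waiting.
    static synchronized void slowClassLockedCall(long millis) throws InterruptedException {
        Thread.sleep(millis);   // stands in for creating a physical DB connection
    }

    // Starts `callers` threads at once and returns the total wall-clock time in ms.
    public static long timeConcurrentCalls(int callers, long millisPerCall)
            throws InterruptedException {
        Thread[] threads = new Thread[callers];
        long start = System.nanoTime();
        for (int i = 0; i < callers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    slowClassLockedCall(millisPerCall);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        // 5 callers x 100 ms: they queue on the class lock, so expect roughly
        // 500 ms in total rather than the ~100 ms an unsynchronized call would take.
        System.out.println(timeConcurrentCalls(5, 100) + " ms");
    }
}
```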
<p><font size="-1"><a href="#TOP">Top of Page</a></font></p>
<table style="width: 600px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><a name="Long_Running_SQL_Queries"></a><span style="font-weight: bold">Long Running SQL Queries </span><br />Long running SQL queries block execute threads for their full duration, until they return their result to the calling application. This means that a WebLogic Server instance needs to be configured to handle as many simultaneous calls as the application load requires. The limiting factors here are the number of execute threads and the number of connections in the JDBC connection pools. A general rule of thumb is to set the number of connections in the pool equal to the number of execute threads to enable optimal resource utilization. If JTS is used, a few more connections should be available in the pools, because connections may be reserved for transactions that are not currently active.</p>
<p>A thread hanging in a long running SQL call will show a stack in a thread dump that is very similar to the one for a <a href="#Hanging_Database">hanging database</a>. See the next section for details.</p>
<p><a name="Hanging_Database"></a><span style="font-weight: bold">Hanging Database</span> <br />Good database performance is key to the performance of an application that relies on that database. Consequently, a hanging database can block many or all available execute threads in a WebLogic Server instance and finally lead to a hanging server. To diagnose this, take 5 to 10 thread dumps from your hanging WebLogic Server instance. Check your execute threads (in the default queue or your application thread queue) to see whether they are currently in SQL calls and waiting for a result from the database. A typical stack trace for a thread that is currently issuing a SQL query could look similar to the following example:</td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td style="vertical-align: top; background-color: rgb(204,204,204)" width="719"><font size="-1">&quot;ExecuteThread: '4' for queue: 'weblogic.kernel.Default'&quot; daemon prio=5 tid=0x8e93c8 nid=0x19 runnable [e137f000..e13819bc]<br />&nbsp; at java.net.SocketInputStream.socketRead0(Native Method)<br />&nbsp; at java.net.SocketInputStream.read(SocketInputStream.java:129)<br />&nbsp; at oracle.net.ns.Packet.receive(Unknown Source)<br />&nbsp; at oracle.net.ns.DataPacket.receive(Unknown Source)<br />&nbsp; at oracle.net.ns.NetInputStream.getNextPacket(Unknown Source)<br />&nbsp; at oracle.net.ns.NetInputStream.read(Unknown Source)<br />&nbsp; at oracle.net.ns.NetInputStream.read(Unknown Source)<br />&nbsp; at oracle.net.ns.NetInputStream.read(Unknown Source)<br />&nbsp; at oracle.jdbc.ttc7.MAREngine.unmarshalUB1(MAREngine.java:931)<br />&nbsp; at oracle.jdbc.ttc7.MAREngine.unmarshalSB1(MAREngine.java:893)<br />&nbsp; at oracle.jdbc.ttc7.Oall7.receive(Oall7.java:375)<br />&nbsp; at oracle.jdbc.ttc7.TTC7Protocol.doOall7(TTC7Protocol.java:1983)<br />&nbsp; at oracle.jdbc.ttc7.TTC7Protocol.fetch(TTC7Protocol.java:1250)<br />&nbsp; - locked &lt;e8c68f00&gt; (a oracle.jdbc.ttc7.TTC7Protocol)<br />&nbsp; at oracle.jdbc.driver.OracleStatement.doExecuteQuery(OracleStatement.java:2529)<br />&nbsp; at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout<br />(OracleStatement.java:2857)<br />&nbsp; at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:608)<br />&nbsp; - locked &lt;e5cc44d0&gt; (a oracle.jdbc.driver.OraclePreparedStatement)<br />&nbsp; - locked &lt;e8c544c8&gt; (a oracle.jdbc.driver.OracleConnection)<br />&nbsp; at oracle.jdbc.driver.OraclePreparedStatement.executeQuery<br />(OraclePreparedStatement.java:536)<br />&nbsp; - locked &lt;e5cc44d0&gt; (a oracle.jdbc.driver.OraclePreparedStatement)<br />&nbsp; - locked &lt;e8c544c8&gt; (a 
oracle.jdbc.driver.OracleConnection)<br />&nbsp; at weblogic.jdbc.wrapper.PreparedStatement.executeQuery(PreparedStatement.java:80)<br />&nbsp; at myPackage.query.getAnalysis(MyClass.java:94)<br />&nbsp; at jsp_servlet._jsp._jspService(__jspService.java:242)<br />&nbsp; at weblogic.servlet.jsp.JspBase.service(JspBase.java:33)<br />&nbsp; at weblogic.servlet.internal.ServletStubImpl$<br />ServletInvocationAction.run(ServletStubImpl.java:971)<br />&nbsp; at weblogic.servlet.internal.ServletStubImpl.invokeServlet<br />(ServletStubImpl.java:402)<br />&nbsp; at weblogic.servlet.internal.ServletStubImpl.invokeServlet<br />(ServletStubImpl.java:305)<br />&nbsp; at weblogic.servlet.internal.RequestDispatcherImpl.include<br />(RequestDispatcherImpl.java:607)<br />&nbsp; at weblogic.servlet.internal.RequestDispatcherImpl.include<br />(RequestDispatcherImpl.java:400)<br />&nbsp; at weblogic.servlet.jsp.PageContextImpl.include(PageContextImpl.java:154)<br />&nbsp; at jsp_servlet._jsp.__mf1924jq._jspService(__mf1924jq.java:563)<br />&nbsp; at weblogic.servlet.jsp.JspBase.service(JspBase.java:33)<br />&nbsp; at weblogic.servlet.internal.ServletStubImpl$<br />ServletInvocationAction.run(ServletStubImpl.java:971)<br />&nbsp; at weblogic.servlet.internal.ServletStubImpl.invokeServlet<br />(ServletStubImpl.java:402)<br />&nbsp; at weblogic.servlet.internal.ServletStubImpl.invokeServlet<br />(ServletStubImpl.java:305)<br />&nbsp; at weblogic.servlet.internal.WebAppServletContext$<br />ServletInvocationAction.run(WebAppServletContext.java:6350)<br />&nbsp; at weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:317)<br />&nbsp; at weblogic.security.service.SecurityManager.runAs(SecurityManager.java:118)<br />&nbsp; at weblogic.servlet.internal.WebAppServletContext.invokeServlet<br />(WebAppServletContext.java:3635)<br />&nbsp; at weblogic.servlet.internal.ServletRequestImpl.execute(ServletRequestImpl.java:2585)<br />&nbsp; at 
weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)<br />&nbsp; at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)</font></td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top">The thread will be in running state. You should compare the threads in your different thread dumps in order to see if they receive the return from the SQL call in a timely manner or if they hang in this same call for a longer period of time. If the thread dumps seem to imply long response times from SQL calls, the corresponding database logs should be checked to see if problems in the database cause this slow performance or hang situation.</p>
<p><font size="-1"><a href="#TOP">Top of Page</a></font></p>
<p><a name="Slow_Network"></a><span style="font-weight: bold">Slow Network</span> <br />Communication between WebLogic Server and the database relies on a well-performing and reliable network to serve requests in a timely manner. Slow network performance can therefore lead to hanging or blocked execute threads waiting for the results of SQL queries. The related stack traces will look similar to the example in the <a href="#Hanging_Database">Hanging Database</a> section. It is not possible to find the root cause of hanging or slow SQL queries by analyzing the WebLogic Server thread dumps alone; they only give the first hint that something is wrong with the performance of the SQL calls. The next step is to check whether a database or network problem is causing the poorly performing SQL calls.</p>
<p><a name="Deadlock"></a><span style="font-weight: bold">Deadlock</span> <br />Both an application-level deadlock and a database-level deadlock can lead to hanging threads. Check your thread dumps to see whether there is an application-level deadlock; how to do this is described in <a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/ServerHang_Application_Deadlock_Pattern.html">Server Hang &#8211; Application Deadlock Pattern</a>. A database deadlock can be detected either in the database log or through the SQLException recorded in the WebLogic Server log file. An example of a related SQLException is:</td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px" cellspacing="2" cellpadding="2" border="1">
<tbody>
<tr>
<td style="vertical-align: top; background-color: rgb(204,204,204)"><font size="-1">java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource<br />&nbsp; at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:170)<br />&nbsp; at oracle.jdbc.oci8.OCIDBAccess.check_error(OCIDBAccess.java:1614)<br />&nbsp; at oracle.jdbc.oci8.OCIDBAccess.executeFetch(OCIDBAccess.java:1225)<br />&nbsp; at oracle.jdbc.oci8.OCIDBAccess.parseExecuteFetch(OCIDBAccess.java:1338)<br />&nbsp; at oracle.jdbc.driver.OracleStatement.executeNonQuery(OracleStatement.java:1722)<br />&nbsp; at oracle.jdbc.driver.OracleStatement.doExecuteOther(OracleStatement.java:1647)<br />&nbsp; at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:2167)<br />&nbsp; at oracle.jdbc.driver.OraclePreparedStatement.executeUpdate(OraclePreparedStatement.java:404)</font></td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top">Because it can take some time for a database to detect a deadlock and resolve it by rolling back one or more of the transactions involved, one or more execute threads stay blocked until the rollback has finished.</p>
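<p>On the application side, a deadlock reported by the database can be caught and the transaction retried. The sketch below assumes an Oracle driver, which reports the ORA error number through SQLException.getErrorCode() (60 for ORA-00060); other databases use other codes. It rolls back explicitly, because Oracle resolves a deadlock by rolling back only the statement that detected it, not the whole transaction. DeadlockRetry, TxWork and the retry limit are hypothetical.</p>

```java
import java.sql.Connection;
import java.sql.SQLException;

class DeadlockRetry {
    /** Oracle reports ORA-00060 ("deadlock detected") with vendor code 60. */
    static final int ORA_DEADLOCK = 60;

    /** The transactional work to run (hypothetical interface). */
    interface TxWork {
        void run(Connection connection) throws SQLException;
    }

    /** True if this exception, or one chained via getNextException(), is an ORA-00060. */
    static boolean isDeadlock(SQLException e) {
        for (SQLException cur = e; cur != null; cur = cur.getNextException()) {
            if (cur.getErrorCode() == ORA_DEADLOCK) {
                return true;
            }
        }
        return false;
    }

    /** Runs work in one transaction, retrying after a deadlock; returns the attempt that committed. */
    static int runWithRetry(Connection connection, TxWork work, int maxAttempts) throws SQLException {
        connection.setAutoCommit(false);
        for (int attempt = 1; ; attempt++) {
            try {
                work.run(connection);
                connection.commit();
                return attempt;
            } catch (SQLException e) {
                connection.rollback(); // Oracle only rolled back the failing statement
                if (!isDeadlock(e) || attempt >= maxAttempts) {
                    throw e;
                }
            }
        }
    }
}
```

<p>A small retry limit is deliberate: a deadlock that keeps recurring usually points to an inconsistent lock order in the application, which retries only hide.</p>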
<p><a name="RefreshMinutes"></a><span style="font-weight: bold">RefreshMinutes or TestFrequencySeconds</span><br />If you see recurring periods of low database performance, slow SQL calls, or connection peaks, the <font size="-1">RefreshMinutes</font> or <font size="-1">TestFrequencySeconds</font> property of your JDBC connection pools could be the reason. This is described in detail in <a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/Investigating_JDBC_Problems_Pattern.html">Investigating JDBC Problems Pattern</a>. Unless there is a firewall between your WebLogic Server instance and your database, you should disable this functionality.</p>
<p><a name="Pool_Shrinking"></a><span style="font-weight: bold">Pool Shrinking</span> <br />Physical connections to a database are resources that should be opened once and kept open as long as possible, because each new connection request causes considerable overhead for the database, the operating system kernel, and WebLogic Server. Consequently, pool shrinking should be disabled on production systems to keep this overhead to a minimum. If pool shrinking is enabled, idle pool connections are closed and must be reopened when the remaining connections can no longer satisfy connection requests.</p>
<p>As these activities can take some time, the related application requests may take an unexpectedly long time which can lead users to assume that the system hangs. Information on how to optimize JDBC connection pool configurations is provided in <a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/Investigating_JDBC_Problems_Pattern.html">Investigating JDBC Problems Pattern</a>.</p>
<p><font size="-1"><a href="#TOP">Top of Page</a></font></p>
<p><span style="font-weight: bold; color: rgb(0,0,0); text-decoration: underline"><a name="Analysis_of_a_hanging_WebLogic_Server"></a>Analysis of a hanging WebLogic Server instance</span><br />General information on how to analyze a hanging WebLogic Server instance is provided in <a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/Generic_Server_Hang_Pattern.html">Generic Server Hang Pattern</a>. </p>
<p>It is usually helpful to start by taking thread dumps from the hanging system to find out what is going on, e.g., what the different threads are doing and why they hang. Thread dumps can generally be taken on production systems; however, be careful with very old JVM versions (&lt;1.3.1_09), as they may crash during a thread dump. Also, if the WebLogic Server instance has a very large number of threads, the thread dump takes a while to complete, and the other threads are blocked in the meantime.</p>
<p>Please take more than one thread dump (5 to 10), a few seconds apart. This lets you check the progress of the different threads. It also shows whether the system actually hangs (no progress at all) or whether the throughput is just extremely slow, which can look like a hang.</p>
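<p>The same progress check can be sketched in-process with the java.lang.management API, which needs Java 6 or later for dumpAllThreads (newer than the JDK 1.4 used with WLS 8.1): take several snapshots a few seconds apart and keep only the threads whose state and complete stack never changed. The class and method names are hypothetical. An idle thread that is simply waiting for work also stays unchanged, so the result is a list of candidates to inspect, not proof of a hang.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DumpProgress {
    /** A thread's position: its state plus its complete stack. */
    private static String signature(ThreadInfo info) {
        return info.getThreadState() + " " + Arrays.toString(info.getStackTrace());
    }

    /** Takes `dumps` snapshots `intervalMillis` apart; returns the names of threads that never moved. */
    static Set<String> threadsWithoutProgress(int dumps, long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Long, String> unchanged = new HashMap<>();
        Map<Long, String> names = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            unchanged.put(info.getThreadId(), signature(info));
            names.put(info.getThreadId(), info.getThreadName());
        }
        for (int i = 1; i < dumps; i++) {
            Thread.sleep(intervalMillis);
            Map<Long, String> current = new HashMap<>();
            for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
                current.put(info.getThreadId(), signature(info));
            }
            // Forget threads that moved (other state or stack) or exited since the first snapshot.
            unchanged.entrySet().removeIf(entry -> !entry.getValue().equals(current.get(entry.getKey())));
        }
        Set<String> result = new TreeSet<>();
        for (Long id : unchanged.keySet()) {
            result.add(names.get(id));
        }
        return result;
    }
}
```

<p>For example, threadsWithoutProgress(5, 5000) takes five snapshots five seconds apart, in line with the advice above.</p>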
<p>Information on how to take thread dumps is provided in the &#8220;Generic Server Hang&#8221; support pattern and in our documentation: <a href="http://e-docs.bea.com/wls/docs81/cluster/trouble.html">http://e-docs.bea.com/wls/docs81/cluster/trouble.html</a>.</p>
<p>Also check whether the complete WebLogic Server instance hangs or only the application; the &#8220;Generic Server Hang&#8221; support pattern covers this as well.</p>
<p><a name="3"></a>Analyzing the thread dumps can show whether one of the reasons mentioned in the previous section, <a href="#Why_does_the_problem_occur">Why does the problem occur?</a>, is actually responsible for your hanging instance. If, for example, all your threads are in a DriverManager method such as <font size="-1">getConnection()</font>, you have identified the root cause and need to change your application to use a DataSource or <font size="-1">Driver.connect()</font> instead of <font size="-1">DriverManager.getConnection()</font>.</p>
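<p>A minimal sketch of the DataSource approach, assuming Java 7 or later and a DataSource bound in JNDI under a name of your choice (the name in the comment is hypothetical): the DataSource is looked up once and reused, and every connection is handed back to the pool by close(). PooledDao and SqlWork are hypothetical names, not WebLogic APIs.</p>

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

class PooledDao {
    /** Work to run with a borrowed connection. */
    interface SqlWork<T> {
        T apply(Connection connection) throws SQLException;
    }

    private final DataSource dataSource; // looked up once, shared by all request threads

    PooledDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    /** Looks up a DataSource bound in JNDI, e.g. under a hypothetical name such as "jdbc/MyAppDS". */
    static PooledDao lookup(String jndiName) throws NamingException {
        InitialContext ctx = new InitialContext(); // inside the server, typically no extra environment needed
        try {
            return new PooledDao((DataSource) ctx.lookup(jndiName));
        } finally {
            ctx.close();
        }
    }

    /** Borrows a pooled connection, runs the work, and always returns the connection via close(). */
    <T> T withConnection(SqlWork<T> work) throws SQLException {
        try (Connection connection = dataSource.getConnection()) {
            return work.apply(connection);
        }
    }
}
```

<p>Inside the server, PooledDao.lookup("jdbc/MyAppDS") followed by dao.withConnection(c -> ...) takes the place of a DriverManager.getConnection() call; because the connection comes from the pool, no physical connection is opened or closed per request.</p>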
<p>Samurai, a very useful tool, can be used to analyze thread dumps and to monitor the progress of threads between different thread dumps. It can be downloaded from dev2dev at <a href="http://dev2dev.bea.com/resourcelibrary/utilitiestools/adminmgmt.jsp">http://dev2dev.bea.com/resourcelibrary/utilitiestools/adminmgmt.jsp</a>.</p>
<p>The dev2dev whitepaper on analyzing thread dumps, <a href="http://dev2dev.bea.com/products/wlplatform81/articles/thread_dumps.jsp">http://dev2dev.bea.com/products/wlplatform81/articles/thread_dumps.jsp</a>, is also helpful for digging deeper into thread dumps to learn more about the server hang.</p>
<p><font size="-1"><a href="#TOP">Top of Page</a></font></p>
<p><span style="font-weight: bold; color: rgb(0,0,0); text-decoration: underline"><a name="Tips_and_Tricks_to_optimize_your_JDBC"></a>Tips and Tricks to optimize your JDBC code and JDBC connection pool configuration</span><br />Some best practices, both in developing JDBC code and in configuring JDBC connection pools, help to avoid common problems and optimize resource usage, so that hanging server instances become much less likely.</p>
<p><span style="font-weight: bold"><a name="JDBC_Programming"></a>JDBC Programming </span><br />To optimize resource usage in WebLogic Server and conserve database resources, use JDBC connection pools for your application&#8217;s JDBC calls. Connections created and destroyed in your application code generate unnecessary overhead, which should be avoided. For general documentation on JDBC programming, see: <a href="http://e-docs.bea.com/wls/docs81/jdbc/rmidriver.html#1028977">http://e-docs.bea.com/wls/docs81/jdbc/rmidriver.html#1028977</a>. Details on JDBC performance tuning are at: <a href="http://e-docs.bea.com/wls/docs81/jdbc/performance.html#1027791">http://e-docs.bea.com/wls/docs81/jdbc/performance.html#1027791</a>.</p>
<p>Comprehensive information on JDBC that helps you optimize your JDBC code and the use of your JDBC resources is available on the dev2dev Java Database Connectivity page at: <a href="http://dev2dev.bea.com/technologies/jdbc/index.jsp">http://dev2dev.bea.com/technologies/jdbc/index.jsp</a>.</p>
<p><span style="font-weight: bold"><a name="JDBC_Connection_Pool_Configuration"></a>JDBC Connection Pool Configuration</span><br />The <a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/Investigating_JDBC_Problems_Pattern.html">Investigating JDBC Problems Pattern</a> has recommendations on how to configure a connection pool for production environments. In order to avoid hangs or bad performance, these configuration tips should be considered. </td>
</tr>
</tbody>
</table>
<p><font size="-1"><a href="#TOP">Top of Page</a></font></p>
<table style="width: 600px; text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><font color="#009900"><u><b><a name="Known_Issues"></a>Known Issues</b></u></font><br /><span style="color: rgb(0,0,0)">You can periodically review the Release Notes for your version of WLS for more information on Known Issues or Resolved Issues in Service Packs and browse for JDBC server hang-related issues.&nbsp; </span><span style="color: rgb(0,0,0)">For your convenience, see the following: <br /></span>
<ul>
<li><span style="color: rgb(0,0,0)"><a href="http://edocs/wls/docs81/notes/index.html">WLS 8.1 Release Notes</a></span>
<li><span style="color: rgb(0,0,0)"><a href="http://edocs/wls/docs70/notes/index.html">WLS 7.0 Release Notes</a></span>
<li><span style="color: rgb(0,0,0)"><a href="http://edocs/wls/docs61/notes/index.html">WLS 6.1 Release Notes</a></span> </li>
</ul>
<p><span style="color: rgb(0,0,0)">Please note that changes were made in <a href="http://e-docs.bea.com/wls/docs81/notes/resolved_sp03.html#1817208">WLS 8.1 SP3</a> to resolve CR134921: for certain JDBC connections, a call to roll back a transaction was not handled immediately because the driver had to wait for any currently executing statement to return.<br /><span style="color: rgb(0,0,0)"><br />Searching AskBEA also returns Release Notes, as well as other Support Solutions and CR-related information, as described in <a href="#Need_Further_Help?">Need Further Help?</a>. Contract customers who are logged in at </span><span style="font-size: 12pt; font-family: 'Times New Roman'"><a href="http://support.bea.com/">http://support.bea.com/</a> will also see a Browse portlet for both Solutions and Bug Central, where the latest available CRs can be browsed by product version.</span><br /></span></td>
</tr>
</tbody>
</table>
<p> <br />
<table style="width: 600px; color: rgb(0,0,0); text-align: left" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><span style="text-decoration: underline"><span style="font-weight: bold"><a name="Need_Further_Help?"></a>Need Further Help?</span></span><br /><span style="font-size: 12pt; font-family: 'Times New Roman'">If you have followed the pattern, but still require additional help, you can:<br /></span>
<ol>
<li><span style="font-size: 12pt; font-family: 'Times New Roman'">Query AskBEA at </span><span style="font-size: 12pt; font-family: 'Times New Roman'"><a href="http://support.bea.com/">http://support.bea.com/</a></span><span style="font-size: 12pt; font-family: 'Times New Roman'"> </span><span style="font-size: 12pt; font-family: 'Times New Roman'">using </span><span style="font-size: 12pt; font-family: 'Times New Roman'">&#8220;jdbc server hang&#8221;, for example, as a search term to discover other published solutions.&nbsp; </span><span style="font-size: 12pt; color: rgb(0,0,0); font-family: 'Times New Roman'">Contract Support Customers: make sure you are logged in to access available CR-related information.</span>
<li><span style="font-size: 12pt; font-family: 'Times New Roman'">Ask a more detailed question on one of BEA&#8217;s newsgroups at </span><a href="http://newsgroups.bea.com/">http://forums.bea.com<br /></a></li>
</ol>
<p>If this does not resolve your issue and you have a valid Support Contract, you can open a Support Case by logging in at <span style="font-size: 12pt; font-family: 'Times New Roman'"><a href="http://support.bea.com/">http://support.bea.com/</a></span>.</td>
</tr>
</tbody>
</table>
<p> <br />
<table width="600" border="2">
<tbody>
<tr>
<td>
<p><strong>FEEDBACK</strong></p>
<p><font color="#000000">Please provide us input on whether or not this Support Diagnostic Pattern <strong>&#8220;JDBC Causes Server Hang&#8221;</strong> helped, any clarifications you needed, and any requests for new topics to <a href="mailto:support.ke@bea.com?subject=Patterns%20Feedback:%20JDBC%20Causes%20Server%20Hang&amp;body=">Support Diagnostic Patterns</a>. <br /></font></p>
</td>
</tr>
</tbody>
</table>
<p> <br />
<table width="600" border="2"><!--DWLayoutTable--><br />
<tbody>
<tr>
<td height="78">
<p><strong>DISCLAIMER NOTICE:</strong></p>
<p>BEA Systems, Inc. provides the technical tips and patches on this Website for your use under the terms of BEA&#8217;s maintenance and support agreement with you. While you may use this information and code in connection with software you have licensed from BEA, BEA makes no warranty of any kind, express or implied, regarding the technical tips and patches.</p>
<p>Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.</p>
</td>
</tr>
</tbody>
</table>
<hr /><small>  Copyright &copy; 2008. This feed is for personal, non-commercial use only.<br />
<a href=www.hashei.com >聚沙成塔-小哈的记事薄</a> by hashei 
If you like it, feel free to subscribe: <a href=feed.hashei.com >feed.hashei.com</a><br />
Digital Fingerprint:
 10f920a9f2bae51c3c73c4f5fb50a949</small>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/08/jdbc_causes_server_hang.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How to Diagnose an Application Server Hang</title>
		<link>http://www.hashei.me/2009/08/java_generic_server_hang.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=java_generic_server_hang</link>
		<comments>http://www.hashei.me/2009/08/java_generic_server_hang.html#comments</comments>
		<pubDate>Sat, 15 Aug 2009 15:04:26 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[性能优化]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[hang]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[thread dump]]></category>
		<category><![CDATA[weblogic]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/08/java_generic_server_hang.html</guid>
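		<description><![CDATA[Preface
This is actually a document from BEA's official website, published in the WebLogic 8.1 days. After Oracle acquired BEA, all the support articles were redirected to Oracle's home page, and Google's cached copies are gone as well. I came across this copy by chance on a foreign forum through Google. Although it was written for 8.1, its approach to solving the problem is still valid. I had planned to digest it and write my own article around real cases, but I haven't run into a related problem lately, and I felt that rewriting it might break the integrity of the original, so I am publishing the original text: it keeps a copy online, and everyone can take from it what they need.
From the content you will notice that besides this one there were also the EJB_RMI Server Hang, Application Dead Lock and JDBC Causes Server Hang patterns, but the only one still available on that forum is JDBC Causes Server Hang. So if you worked with WebLogic early on and saved the other two articles, or have come across them online, please leave a comment. Many thanks!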
Generic Hang



Problem Description
A server hang is suspected when:

The server does not respond to new requests.
Requests time out.
Requests take longer and longer to process (may be on the way to a hang).
A server crash is not usually a symptom of a hung server but may [...]]]></description>
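			<content:encoded><![CDATA[<h4>Preface</h4>
<p style="text-indent: 24pt">This is actually a document from BEA's official website, published in the WebLogic 8.1 days. After Oracle acquired BEA, all the support articles were redirected to Oracle's home page, and Google's cached copies are gone as well. I came across this copy by chance on a foreign forum through Google. Although it was written for 8.1, its approach to solving the problem is still valid. I had planned to digest it and write my own article around real cases, but I haven't run into a related problem lately, and I felt that rewriting it might break the integrity of the original, so I am publishing the original text: it keeps a copy online, and everyone can take from it what they need.</p>
<p style="text-indent: 24pt">From the content you will notice that besides this one there were also the EJB_RMI Server Hang, Application Dead Lock and JDBC Causes Server Hang patterns, but the only one still available on that forum is JDBC Causes Server Hang. So if you worked with WebLogic early on and saved the other two articles, or have come across them online, please leave a comment. Many thanks!</p>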
<h4><span style="font-size: medium;"><span style="line-height: normal; -webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; ">Generic Hang</span></span></h4>
<table style="width: 600px; text-align: left;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><strong><span style="text-decoration: underline;">Problem Description</span></strong><br />
A server hang is suspected when:</p>
<ul>
<li>The server does not respond to new requests.</li>
<li>Requests time out.</li>
<li>Requests take longer and longer to process (may be on the way to a hang).</li>
<li>A server crash is not usually a symptom of a hung server but may follow.</li>
</ul>
</td>
</tr>
</tbody>
</table>
<table style="width: 600px; text-align: left;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><span style="text-decoration: underline;"><strong>Problem Troubleshooting</strong><strong><br />
</strong></span>Not all of the following steps need to be done; some issues can be solved by following only a few of them.<br /><br /><strong><span style="text-decoration: underline;">Quick Links:</span></strong></p>
<ul>
<li><span style="text-decoration: underline;"><a href="#Why_does_the_problem_occur?">Why does the problem occur?</a></span></li>
<li><span style="text-decoration: underline;"><a href="#Potential_Causes_of_Server_Hang">Potential Causes of Server Hang</a></span></li>
<li><span style="text-decoration: underline;"><a href="#Basic_Steps">Basic Steps</a></span></li>
<li><span style="text-decoration: underline;"><a href="#Known_WebLogic_Server_Issues">Known WebLogic Server Issues</a></span></li>
<li><span style="text-decoration: underline;"><a href="#Collecting_Thread_Dumps">Collecting Thread Dumps</a></span></li>
<li><span style="text-decoration: underline;"><a href="#Analysis_of_Thread_Dump">Analysis of a Thread Dump</a></span></li>
</ul>
<p><span style="text-decoration: underline;"><strong><a name="Why_does_the_problem_occur?"></a></strong><span style="text-decoration: underline;"><strong>Why does the problem occur?</strong></span><strong> </strong></span><br />
A server can hang for a variety of reasons (refer to <a href="#Potential_Causes_of_Server_Hang">Potential Causes of Server Hang</a>). Generally, a server hangs because it lacks some resource, which prevents it from servicing requests. For example, because of a problem such as a deadlock, or because of the volume of requests, there may be no execute threads available to do any work: all of them are busy with previous requests.</p>
<p><span><a href="#TOP">Top of Page</a></span></p>
<p><strong><span style="text-decoration: underline;"><a name="Potential_Causes_of_Server_Hang"></a>Potential Causes of Server Hang</span></strong></td>
</tr>
</tbody>
</table>
<table border="1" width="600">
<tbody>
<tr>
<td width="45%">
<div><strong>Topic</strong></div>
</td>
<td width="25%">
<div><strong>Pattern Name</strong></div>
</td>
<td width="30%">
<div><strong>Link</strong></div>
</td>
</tr>
<tr>
<td valign="top">RMI, RJVM responses – all threads tied up waiting for RJVM, RMI responses.</td>
<td>EJB_RMI Server Hang</td>
<td>
<div><a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/EJB_RMI_Server_Hang_Pattern.html">EJB_RMI Server Hang</a></div>
</td>
</tr>
<tr>
<td valign="top">Application Deadlock – one thread locks resource1 and then waits for the lock on resource2, while another thread locks resource2 and then waits for the lock on resource1.</td>
<td>Application Deadlock Causes Server Hang</td>
<td style="color: #ff0000">
<div><a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/ServerHang_Application_Deadlock_Pattern.html">Application Dead Lock</a></div>
</td>
</tr>
<tr>
<td valign="top">Threads are all used up, none available for new work.</td>
<td>Thread Usage Server Hang</td>
<td>TBD</td>
</tr>
<tr>
<td valign="top">Garbage Collection taking too much time.</td>
<td>Garbage Collection Server Hang</td>
<td>TBD</td>
</tr>
<tr>
<td valign="top">Improper JSP settings for check intervals, e.g. PageCheckSeconds.</td>
<td>JSP cause Server Hang</td>
<td>TBD</td>
</tr>
<tr>
<td valign="top">Long Running JDBC calls or JDBC deadlocks lead to a hang.</td>
<td><span style="color: #000000">JDBC Causes Server Hang</span></td>
<td style="text-align: center"><a href="http://support.bea.com/application_content/product_portlets/support_patterns/wls/JDBC_Causes_Server_Hang_Pattern.html">JDBC Causes Server Hang</a></td>
</tr>
<tr>
<td valign="top">The JVM hangs during code optimization, which looks like a server hang.</td>
<td>Server Hang in Code Optimization</td>
<td>TBD</td>
</tr>
<tr>
<td valign="top">JSP compilation causes server hang under heavy load.</td>
<td>JSP Compilation Server Hang</td>
<td>TBD</td>
</tr>
<tr>
<td valign="top">Sun JVM bugs, e.g. in the lightweight thread library.</td>
<td>Sun JVM Bugs that Cause Server Hangs</td>
<td>TBD</td>
</tr>
</tbody>
</table>
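<p>The application deadlock described in the table can also be detected from inside a running JVM. The sketch below uses ThreadMXBean.findDeadlockedThreads(), available since Java 6, which finds threads deadlocked on object monitors and on java.util.concurrent locks; the class name and the output format are hypothetical. In a thread dump, the same situation shows up as two threads that each hold one monitor while waiting to lock the other.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

class DeadlockDetector {
    /** Describes each deadlocked thread, or returns an empty array when there is no deadlock. */
    static String[] describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // monitors and ownable synchronizers; null if none
        if (ids == null) {
            return new String[0];
        }
        ThreadInfo[] infos = mx.getThreadInfo(ids, false, false);
        String[] lines = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            ThreadInfo info = infos[i];
            lines[i] = info == null
                    ? "thread " + ids[i] + " has exited"
                    : info.getThreadName() + " waits for " + info.getLockName()
                      + " held by " + info.getLockOwnerName();
        }
        return lines;
    }
}
```

<p>A monitoring task could call describeDeadlocks() periodically and log the result. Unlike a slow server, a deadlock never resolves itself, so any non-empty result is worth a thread dump and a look at the lock order in the code involved.</p>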
<table style="width: 600px; text-align: left;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><span><a href="#TOP">Top of Page</a></span></p>
<p><a name="Basic_Steps"></a><span style="text-decoration: underline;"><strong>Basic Steps</strong></span><br />
When a server is hanging, first ping it using <span style="font-family: 'Courier New', Courier, mono;">java weblogic.Admin t3://server:port PING</span>. If the server responds to the ping, it may be the application that is hanging rather than the server itself.</p>
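<p>A rough, complementary check from outside the server is to see whether its listen port still answers HTTP. The sketch below is not equivalent to weblogic.Admin PING: it assumes the server accepts plain HTTP on the given port, uses only java.net, and HttpProbe is a hypothetical name. A connection that is accepted but not answered within the timeout suggests that requests are queued but not being serviced.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class HttpProbe {
    /**
     * Returns true if the port answers a minimal HTTP request with at least one byte
     * within timeoutMillis. Refused connections, resets and timeouts all count as "no answer".
     */
    static boolean respondsWithin(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis); // bounded connect
            socket.setSoTimeout(timeoutMillis);                                // bounded read
            OutputStream out = socket.getOutputStream();
            out.write(("HEAD / HTTP/1.0\r\nHost: " + host + "\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            return in.read() != -1; // any response byte means the request was serviced
        } catch (IOException e) {   // includes SocketTimeoutException
            return false;
        }
    }
}
```

<p>For example, HttpProbe.respondsWithin("server", 7001, 5000) returns false both when nothing listens on the port and when the server accepts the connection but sends nothing back within five seconds; the timeouts on both connect and read keep the probe itself from hanging.</p>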
<p>Make sure that the server is actually hanging and not doing garbage collection. To verify this, restart the server with <span style="font-family: 'Courier New', Courier, mono;">-verbosegc</span> turned on, and redirect <span style="font-family: 'Courier New', Courier, mono;">stdout</span> and <span style="font-family: 'Courier New', Courier, mono;">stderr</span> to one file. When the server stops responding, the log shows whether it is doing garbage collection or really hanging. If garbage collection takes too long (&gt;10 seconds), the server may miss the heartbeats that servers use to keep each other informed of the cluster topology.</p>
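<p>Besides -verbosegc, the JVM's cumulative garbage collection time can be sampled in-process through GarbageCollectorMXBean (Java 5 or later). The sketch below is a minimal illustration with hypothetical names; it measures the total collection time within a window, not the length of individual pauses, so it complements rather than replaces the GC log.</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcWatch {
    /** Accumulated collection time in ms over all collectors; -1 ("undefined") values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** GC time accumulated while waiting windowMillis, summed over collectors (not single pauses). */
    static long gcMillisDuring(long windowMillis) throws InterruptedException {
        long before = totalGcMillis();
        Thread.sleep(windowMillis);
        return totalGcMillis() - before;
    }
}
```

<p>If gcMillisDuring(10000) returns a value close to 10000, the JVM spent most of the window collecting garbage, which from the outside looks very much like a hang.</p>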
<p>WebLogic Server uses the ‘default’ thread queue or a configured application-specific thread queue to service client requests. Client requests are handled in the default queue only if no application-specific thread queue is defined. See <a href="http://e-docs.bea.com/wls/docs81/perform/AppTuning.html#11052010">Tuning WebLogic Server Applications</a>, <a href="http://e-docs.bea.com/wls/docs81/perform/WLSTuning.html#1140013">Tuning the Default Execute Queue Threads</a>, and <a href="http://e-docs.bea.com/wls/docs81/perform/topten.html#1129089">Tuning WebLogic Server Performance Parameters</a> for more information on defining application-specific thread queues.</p>
<p>In release 8.1, the thread architecture of WebLogic Server was changed: a dedicated kernel thread group was created for internal WebLogic tasks. This was necessary to avoid deadlocks that occurred in earlier releases when all threads in the &#8216;default&#8217; thread queue were in use, leaving none available for internal WebLogic tasks.</p>
<p>The threads in the &#8216;default&#8217; queue, or in the application-specific thread queue if one has been configured, are the threads to examine in the event of a server hang. The following example shows what ExecuteThread &#8216;14&#8217; from the &#8216;default&#8217; queue looks like in a thread dump while it is waiting for work. The latest method called by this thread is <span style="font-family: 'Courier New', Courier, mono;">Object.wait()</span>, and the thread is in the state &#8220;waiting on monitor&#8221;.</td>
</tr>
</tbody>
</table>
<table style="width: 600px;" border="1" cellspacing="2" cellpadding="2" width="700">
<tbody>
<tr>
<td style="vertical-align: top; background-color: #cccccc" width="700"><span>&#8220;ExecuteThread: &#8216;14&#8242; for queue: &#8216;default&#8217;&#8221; daemon prio=5 tid=0&#215;8b0ab30 nid=0&#215;1f4 waiting on monitor [0x96af000..0x96afdc4]</span><br />
<span>at</span><br />
<span>java.lang.Object.wait(Native Method)</span><br />
<span>at</span><br />
<span>java.lang.Object.wait(Object.java:420)</span><br />
<span>at</span><br />
<span>weblogic.kernel.ExecuteThread.waitForRequest(ExecuteThread.java:94)</span><br />
<span>at</span><br />
<span>weblogic.kernel.ExecuteThread.run(ExecuteThread.java:118)</span></td>
</tr>
</tbody>
</table>
<table style="width: 600px; color: #ff0000; text-align: left;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><span style="color: #000000">Threads can be in one of several states. See the <a href="#Analysis_of_Thread_Dump">table</a> below for a description of the thread states.</span><br />
<span style="color: #000000">The format of the thread dump varies with the JVM vendor; check the vendor&#8217;s website for information about the format. </span></p>
<p><span style="color: #000000">Below is an example of threads that may be hanging. ExecuteThread &#8216;9&#8217; is waiting to lock object &lt;dde51520&gt;; notice the &#8220;waiting to lock &lt;dde51520&gt;&#8221; line in its stack trace. ExecuteThread &#8216;6&#8217; is also waiting to lock the same object &lt;dde51520&gt;. The third thread, ExecuteThread &#8216;5&#8217;, has locked object &lt;dde51520&gt; and is doing work. This example demonstrates why one thread dump is not enough. If the server is hanging and the locked object &lt;dde51520&gt; is suspected to be the cause, subsequent thread dumps show whether that object was released and another thread has locked it. If, after several thread dumps, the threads have not progressed and object &lt;dde51520&gt; has still not been released, you may suspect a problem with the routine(s) in the call stack of ExecuteThread &#8216;5&#8217;, because the lock is not being released.</span></td>
</tr>
</tbody>
</table>
<table style="width: 600px;" border="1" cellspacing="2" cellpadding="2" width="700">
<tbody>
<tr>
<td style="vertical-align: top; background-color: #cccccc" width="700"><span>&#8220;ExecuteThread: &#8216;9&#8242; for queue: &#8216;weblogic.kernel.Default&#8217;&#8221; daemon prio=5 tid=0xf684c8 nid=0&#215;13 waiting for monitor entry [cc2ff000..cc2ffc24]<br />
at weblogic.cluster.MemberManager.done(MemberManager.java:306)<br />
- waiting to lock &lt;dde51520&gt; (a weblogic.cluster.MemberManager)<br />
at weblogic.cluster.MulticastManager.execute(MulticastManager.java:399)<br />
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)<br />
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)</p>
<p>&#8220;ExecuteThread: &#8216;6&#8242; for queue: &#8216;weblogic.kernel.Default&#8217;&#8221; daemon prio=5 tid=0&#215;9df020 nid=0&#215;10 waiting for monitor entry [cc5ff000..cc5ffc24]<br />
at weblogic.cluster.MemberManager.getRemoteMembers(MemberManager.java:396)<br />
- waiting to lock &lt;dde51520&gt; (a weblogic.cluster.MemberManager)<br />
at weblogic.cluster.ClusterService.getRemoteMembers(ClusterService.java:238)<br />
at weblogic.servlet.internal.HttpServer.setServerList(HttpServer.java:388)<br />
at weblogic.servlet.internal.HttpServer.clusterMembersChanged(HttpServer.java:418)<br />
- locked &lt;ddf32360&gt; (a weblogic.servlet.internal.HttpServer)<br />
at weblogic.cluster.MemberManager$2.execute(MemberManager.java:421)<br />
at weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:197)<br />
at weblogic.kernel.ExecuteThread.run(ExecuteThread.java:170)</p>
<p>&#8220;ExecuteThread: &#8216;5&#8242; for queue: &#8216;weblogic.kernel.Default&#8217;&#8221; daemon prio=5 tid=0&#215;9df020 nid=0&#215;12 waiting for monitor entry [cc5ff000..cc5ffc24]<br />
. . .</p>
<p><span> at weblogic.cluster.MemberManager.checkTimeouts(MemberManager.java:346)<br />
- locked &lt;dde51520&gt; (a weblogic.cluster.MemberManager)<br />
at weblogic.cluster.MulticastManager.trigger(MulticastManager.java:291)<br />
at weblogic.time.common.internal.ScheduledTrigger.run(ScheduledTrigger.java:243 </span></p>
<p></span></td>
</tr>
</tbody>
</table>
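<p>Reading lock ownership out of a long thread dump by hand is error-prone, so the pattern above (several threads waiting to lock the same address while one thread has locked it) can be extracted with a small parser. This is a minimal sketch that assumes the HotSpot text format, with plain ASCII double quotes around thread names and the &#8220;- waiting to lock&#8221; and &#8220;- locked&#8221; lines shown above; other vendors' formats differ, and MonitorContention is a hypothetical name.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonitorContention {
    private static final Pattern THREAD_HEADER = Pattern.compile("^\"([^\"]+)\"");
    private static final Pattern WAITING = Pattern.compile("- waiting to lock <(\\w+)>");
    private static final Pattern LOCKED = Pattern.compile("- locked <(\\w+)>");

    /** One line per contended monitor: who holds it and who is waiting for it. */
    static List<String> report(String threadDump) {
        Map<String, String> owners = new LinkedHashMap<>();
        Map<String, List<String>> waiters = new LinkedHashMap<>();
        String currentThread = "unknown";
        for (String line : threadDump.split("\\R")) {
            Matcher m = THREAD_HEADER.matcher(line);
            if (m.find()) {                 // a new thread's section starts
                currentThread = m.group(1);
                continue;
            }
            m = WAITING.matcher(line);
            if (m.find()) {
                waiters.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(currentThread);
                continue;
            }
            m = LOCKED.matcher(line);
            if (m.find()) {
                owners.putIfAbsent(m.group(1), currentThread);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : waiters.entrySet()) {
            result.add("<" + e.getKey() + "> held by " + owners.getOrDefault(e.getKey(), "unknown")
                    + "; waiting: " + String.join(", ", e.getValue()));
        }
        return result;
    }
}
```

<p>Applied to the three threads above (with plain quotes), it reports ExecuteThread &#8216;5&#8217; as the holder of &lt;dde51520&gt;, with ExecuteThreads &#8216;9&#8217; and &#8216;6&#8217; waiting. Running it on each of several dumps shows whether the same holder keeps the lock.</p>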
<table style="width: 600px; text-align: left;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top">Determine if the&#8221;default&#8221; ExecuteThread queue is overloaded. Use the console to determine if any of the ExecuteThreads in the ‘default’ queue are idle. If none are idle, then the application probably needs to be configured with a larger number of ExecuteThreads. This value can be changed through the console and is in the <span style="font-family: 'Courier New', Courier, mono;">config.xml</span> file.</p>
<p>If the Execute Queue has idle threads, it is possible that not enough socket reader threads are allocated. By default, a WebLogic Server instance creates three socket reader threads upon booting. If a cluster system utilizes more than three sockets during peak periods, increase the number of socket reader threads.</p>
<p>The number of socket reader threads should usually be small. However, configure one thread for each WebLogic Server instance that acts as a client of the hanging server instance.</p>
<p>If you use a JDBC connection pool, make sure that the pool has as many connections as there are simultaneous requests (i.e., execute threads) that use it.</p>
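<p>The reasoning behind this sizing rule can be shown with a toy model: each request thread reserves one connection for the duration of its request, so whenever more threads are active than the pool has connections, the difference has to wait or fail. The sketch below models the pool as a java.util.concurrent.Semaphore; this is not how WebLogic implements its pools, and the names are hypothetical.</p>

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class PoolSizing {
    /** Returns how many of `threads` concurrent requests find a pool of `connections` empty. */
    static int threadsWithoutConnection(int threads, int connections) throws InterruptedException {
        Semaphore pool = new Semaphore(connections);       // one permit per pooled connection
        CountDownLatch attempted = new CountDownLatch(threads);
        CountDownLatch requestsDone = new CountDownLatch(1);
        AtomicInteger starved = new AtomicInteger();
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                boolean reserved = pool.tryAcquire();      // reserve without waiting
                if (!reserved) {
                    starved.incrementAndGet();
                }
                attempted.countDown();
                try {
                    requestsDone.await();                  // hold the connection while the request runs
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    if (reserved) {
                        pool.release();                    // connection returned to the pool
                    }
                }
            }).start();
        }
        attempted.await();        // every thread has tried to reserve once
        requestsDone.countDown(); // let all simulated requests finish
        return starved.get();
    }
}
```

<p>For example, with 15 execute threads and 10 connections, threadsWithoutConnection(15, 10) returns 5: five requests find the pool empty while the other ten hold their connections.</p>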
<p><span><a href="#TOP">Top of Page</a></span></p>
<p><a name="Known_WebLogic_Server_Issues"></a><span style="text-decoration: underline;"><strong>Known WebLogic Server Issues</strong></span><br />
A problem with JDBC could also produce a deadlock. Check the version and service pack level of the server, shown at the beginning of <span style="font-family: 'Courier New', Courier, mono;">weblogic.log</span>. Then look above the version and service pack lines for any temporary patches that have already been applied to the server classpath; these patches tell you which problems have already been addressed.</p>
<p><span><a href="#TOP">Top of Page</a></span></p>
<p><a name="Collecting_Thread_Dumps"></a><span style="text-decoration: underline;"><strong>Collecting Thread Dumps</strong></span><br />
How to take a thread dump depends on the operating system on which the hung server instance runs. Information about taking a thread dump on various operating systems can be found at <a href="http://e-docs.bea.com/wls/docs81/cluster/trouble.html#gc">http://e-docs.bea.com/wls/docs81/cluster/trouble.html#gc</a>. Redirecting both standard error and standard out to the same file places the thread dump information in the proper context with server information and other messages, which makes the logs more useful.</p>
<p><em><strong>Unix Systems (Solaris, HP, AIX)</strong></em><br />
Use <span style="font-family: 'Courier New', Courier, mono;">kill -3 &lt;weblogic process id&gt;</span> to create the thread dumps needed to diagnose a problem. Do this several times on each server, about 5 to 10 seconds apart, to help diagnose deadlocks. For this to work, start the server process with nohup (refer to Solutions <a href="http://support.bea.com/application?namespace=askbea&amp;origin=ask_bea_answer.jsp&amp;event=link.view_answer_page_clfydoc&amp;answerpage=solution&amp;page=wls/S-12292.htm">S-12292</a> and <a href="http://support.bea.com/application?namespace=askbea&amp;origin=ask_bea_answer.jsp&amp;event=link.view_answer_page_clfydoc&amp;answerpage=solution&amp;page=wls/S-15924.htm">S-15924</a>).</p>
<p><em><strong>Windows (XP, NT)</strong></em><br />
Press <span style="font-family: 'Courier New', Courier, mono;">&lt;Ctrl&gt;-&lt;Break&gt;</span> in each server&#8217;s command window to create the thread dumps needed to diagnose a problem. Do this several times on each server, about 5 to 10 seconds apart, to help diagnose deadlocks. On NT, type <span style="font-family: 'Courier New', Courier, mono;">CTRL-Break</span> in the command shell.</p>
<p>If you have installed WebLogic as a Windows service, you will not be able to see the messages from the JVM or WebLogic Server that are printed to standard out or standard error.  To view these messages, you must direct standard out and standard error to a file.  To do this, take the following steps:</p>
<ol>
<li>Create a backup copy of the <span style="font-family: 'Courier New', Courier, mono;">WL_HOME\server\bin\installSvc.cmd </span>master script.</li>
<li>In a text editor, open the <span style="font-family: 'Courier New', Courier, mono;">WL_HOME\server\bin\installSvc.cmd </span>master script.</li>
<li>In <span style="font-family: 'Courier New', Courier, mono;">installSvc.cmd</span>, locate the last command in the script, which invokes the <span style="font-family: 'Courier New', Courier, mono;">beasvc</span> utility.</li>
<li>At the end of the <span style="font-family: 'Courier New', Courier, mono;">beasvc</span> command, append the option <span style="font-family: 'Courier New', Courier, mono;">-log:"pathname"</span><br />
where pathname is the fully qualified path and file name of the file in which you want to store the server's standard out and standard error messages.</li>
<li>The modified <span style="font-family: 'Courier New', Courier, mono;">beasvc</span> command will resemble the following:<br />
<span>"%WL_HOME%\server\bin\beasvc" -install </span><br />
<span>-svcname:"%DOMAIN_NAME%_%SERVER_NAME%" </span><br />
<span>-javahome:"%JAVA_HOME%" -execdir:"%USERDOMAIN_HOME%" </span><br />
<span>-extrapath:"%WL_HOME%\server\bin" -password:"%WLS_PW%" </span><br />
<span>-cmdline:%CMDLINE% </span><br />
<span>-log:"d:\bea\user_projects\domains\myWLSdomain\myWLSserver-stdout.txt" </span></li>
<li>On Unix, if you started WebLogic with nohup, the log messages show up in <span style="font-family: 'Courier New', Courier, mono;">nohup.out</span>.</li>
</ol>
<p><em><strong>Linux</strong></em><br />
The Linux operating system views threads differently from other operating systems: each thread is seen by the operating system as a process. To take a thread dump on Linux, find the ID of the root process from which all the other processes were started. Use the following command:</p>
<ul>
<li>To obtain the root PID, use:<br />
<blockquote><p><span style="font-family: 'Courier New', Courier, mono;">ps -efHl | grep 'java'</span></p></blockquote>
</li>
</ul>
<p>For the grep argument, use a string that appears in the server startup command. The first PID reported is the root process, provided the ps output is not piped through anything other than grep. Send <span style="font-family: 'Courier New', Courier, mono;">kill -3</span> to that PID to produce the thread dump.</p>
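<p>The root-PID rule can also be applied programmatically. This Python sketch reads the output of ps -eo pid,ppid,args instead of the ps -efHl tree view shown above. You could capture that output, for example, with subprocess.run(["ps", "-eo", "pid,ppid,args"], capture_output=True, text=True).stdout. The sketch does not rely on output order. Instead, it picks processes whose command line matches the grep string but whose parent's command line does not. The names parse_ps and root_pids are hypothetical helpers.</p>

```python
from typing import Dict, List, Tuple


def parse_ps(output: str) -> Dict[int, Tuple[int, str]]:
    """Parse `ps -eo pid,ppid,args` output into {pid: (ppid, args)}."""
    procs: Dict[int, Tuple[int, str]] = {}
    for line in output.strip().splitlines()[1:]:  # skip the PID PPID COMMAND header
        pid, ppid, args = line.split(None, 2)
        procs[int(pid)] = (int(ppid), args)
    return procs


def root_pids(output: str, pattern: str) -> List[int]:
    """PIDs whose command line contains `pattern` but whose parent's does not.

    Under LinuxThreads every Java thread is listed as its own process, all
    descended from one root process; that root is the PID to send kill -3 to.
    """
    procs = parse_ps(output)
    matching = {pid for pid, (_, args) in procs.items() if pattern in args}
    return sorted(pid for pid in matching if procs[pid][0] not in matching)
```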
<ul>
<li>Use the weblogic.Admin command <span style="font-family: 'Courier New', Courier, mono;">THREAD_DUMP</span></li>
</ul>
<p>Another method of getting a thread dump is to use the <span style="font-family: 'Courier New', Courier, mono;">THREAD_DUMP</span> admin command. This method is independent of the OS on which the server instance is running.</p>
<blockquote><p><span style="font-family: 'Courier New', Courier, mono;">java weblogic.Admin -url ManagedHost:8001 -username weblogic -password weblogic THREAD_DUMP</span></p></blockquote>
<p><strong>NOTE:</strong> This command cannot be used if the server instance cannot be pinged.</p>
<p>If the JVM in use is Sun's, the thread dump goes to stdout. Sun enhanced the thread dump format in JVM 1.4. To obtain the 1.4-style thread dump from a 1.3.1 JVM, add the following option to the java command line that starts it:</p>
<blockquote><p><span style="font-family: 'Courier New', Courier, mono;">-XX:+JavaMonitorsInStackTrace</span></p></blockquote>
<p><span><a href="#TOP">Top of Page</a></span></p>
<p><a name="Analysis_of_Thread_Dump"></a><br />
The most useful tool in analyzing a server hang is a set of thread dumps. A thread dump shows what each thread is doing at a particular moment. A set of thread dumps (usually 3 or more, taken 5 to 10 seconds apart) shows how each thread's state changes, or fails to change, from one dump to the next. In a hung server, thread states typically change little from the first dump to the last.</p>
<p>Threads can be in one of the following states:</p>
<table style="width: 600px; height: 171px;" border="1">
<tbody>
<tr>
<td>Running or runnable thread</td>
<td>A runnable thread is either running or ready to run at that instant.</td>
</tr>
<tr>
<td>Suspended thread</td>
<td>Thread has been suspended by the JVM.</td>
</tr>
<tr>
<td>Thread waiting on a condition variable</td>
<td>Threads in a condition wait state can be thought of as waiting for an event to occur.</td>
</tr>
<tr>
<td>Thread waiting on a monitor lock</td>
<td>Monitors are used to manage access to code that should be run by only one thread at a time.</td>
</tr>
</tbody>
</table>
<p>More information on thread states can be found at <a href="http://java.sun.com/developer/onlineTraining/Programming/JDCBook/stack.html#states">http://java.sun.com/developer/onlineTraining/Programming/JDCBook/stack.html#states</a>.</p>
<p>There is also a thread analysis tool at <a href="http://dev2dev.bea.com/resourcelibrary/utilitiestools/adminmgmt.jsp">http://dev2dev.bea.com/resourcelibrary/utilitiestools/adminmgmt.jsp</a>. Download the tool and follow the instructions at that link.</p>
<p><strong>What to Look at in the Thread Dump</strong><br />
All requests enter WebLogic Server through the ListenThread. If the ListenThread is gone, no work can be received and therefore no work can be done. Verify that a ListenThread exists in the thread dump and that it is in the socketAccept method. The following example shows what the ListenThread looks like:</p></td>
</tr>
</tbody>
</table>
<table style="width: 600px;" border="1" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top; background-color: #cccccc"><span>&#8220;ListenThread.Default&#8221; prio=10 tid=0&#215;00037888 nid=93 lwp_id=6888343 runnable [0x 1a81b000..0x1a81b530]</span> <span>at java.net.PlainSocketImpl.socketAccept(Native Method)</span><br />
<span>at</span><br />
<span>java.net.PlainSocketImpl.accept(PlainSocketImpl.java:353)</span><br />
<span>- locked &lt;0&#215;26d9d490&gt; (a java.net.PlainSocketImpl)</span><br />
<span>at</span><br />
<span>java.net.ServerSocket.implAccept(ServerSocket.java:439)</span><br />
<span>at</span><br />
<span>java.net.ServerSocket.accept(ServerSocket.java:410)</span><br />
<span>at</span><br />
<span>weblogic.socket.WeblogicServerSocket.accept(WeblogicServerSocket.java:24)</span><br />
<span>at</span><br />
<span>weblogic.t3.srvr.ListenThread.accept(ListenThread.java:713)</span><br />
<span>at</span><br />
<span>weblogic.t3.srvr.ListenThread.run(ListenThread.java:290)</span></td>
</tr>
</tbody>
</table>
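<p>Here is a quick automated version of the ListenThread check, written as a Python sketch. It assumes a plain-text dump in which thread blocks are separated by blank lines, as in the example above. It only looks for a thread whose name starts with ListenThread and whose stack mentions socketAccept. The names split_threads and listen_thread_ok are hypothetical.</p>

```python
import re
from typing import List

THREAD_NAME = re.compile(r'"([^"]+)"')


def split_threads(dump: str) -> List[str]:
    """Split a thread dump into per-thread blocks separated by blank lines."""
    blocks = re.split(r"\n\s*\n", dump.strip())
    return [b.strip() for b in blocks if b.strip().startswith('"')]


def listen_thread_ok(dump: str) -> bool:
    """True if a ListenThread exists and its stack shows socketAccept."""
    for block in split_threads(dump):
        name = THREAD_NAME.match(block)
        if (name and name.group(1).startswith("ListenThread")
                and "socketAccept" in block):
            return True
    return False
```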
<table style="width: 600px; text-align: left;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top">Socket Reader Threads accept the incoming request from the Listen Thread Queue and put it on the Execute Thread Queue. If there are no socket reader threads in the thread dump, then there is a bug somewhere that is causing the socket reader thread to vanish. There should always be at least 3 socket reader threads. One socket reader thread is usually in the poll function, while the other two are available to process requests. Below are Socket Reader threads from a sample thread dump.</td>
</tr>
</tbody>
</table>
<table style="width: 600px;" border="1" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top; background-color: #cccccc"><span>&#8220;ExecuteThread: &#8216;2&#8242; for queue: &#8216;weblogic.socket.Muxer&#8217;&#8221; daemon prio=10 tid=0&#215;000</span> <span>36128 nid=75 lwp_id=6888070 waiting for monitor entry [0x1b12f000..0x1b12f530]</span><br />
<span>at</span><br />
<span>weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:92)</span><br />
<span>- waiting to lock &lt;0&#215;25c01198&gt; (a java.lang.String)</span><br />
<span>at</span><br />
<span>weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)</span><br />
<span>at</span><br />
<span>weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)</span><br />
<span>at</span><br />
<span>weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)</span></p>
<p><span>&#8220;ExecuteThread: &#8216;1&#8242; for queue: &#8216;weblogic.socket.Muxer&#8217;&#8221; daemon prio=10 tid=0&#215;000</span> <span>35fc8 nid=74 lwp_id=6888067 runnable [0x1b1b0000..0x1b1b0530]</span> <span>at weblogic.socket.PosixSocketMuxer.poll(Native Method)</span><br />
<span>at</span><br />
<span>weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:99)</span><br />
<span> &#8211; locked &lt;0&#215;25c01198&gt; (a java.lang.String)</span><br />
<span>at</span><br />
<span>weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)</span><br />
<span>at</span><br />
<span>weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)</span><br />
<span>at</span><br />
<span>weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)</span></p>
<p><span>&#8220;ExecuteThread: &#8216;0&#8242; for queue: &#8216;weblogic.socket.Muxer&#8217;&#8221; daemon prio=10 tid=0&#215;000</span> <span>35e68 nid=73 lwp_id=6888066 waiting for monitor entry [0x1b231000..0x1b231530]</span><br />
<span>at</span><br />
<span>weblogic.socket.PosixSocketMuxer.processSockets(PosixSocketMuxer.java:92)</span><br />
<span>- waiting to lock &lt;0&#215;25c01198&gt; (a java.lang.String)</span><br />
<span>at</span><br />
<span>weblogic.socket.SocketReaderRequest.execute(SocketReaderRequest.java:32)</span><br />
<span>at</span><br />
<span>weblogic.kernel.ExecuteThread.execute(ExecuteThread.java:178)</span><br />
<span>a</span><span>t</span><br />
<span>weblogic.kernel.ExecuteThread.run(ExecuteThread.java:151)</span></td>
</tr>
</tbody>
</table>
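<p>The socket reader check can be sketched the same way. This Python sketch assumes thread blocks separated by blank lines. It also assumes the 8.1 queue name weblogic.socket.Muxer, as seen in the sample above. The name muxer_report is hypothetical. The function counts the reader threads, flags a count below three, and tallies which monitors the readers hold or wait for. In the sample above, one reader holds &lt;0x25c01198&gt; inside poll while the other two wait for it. That matches the normal pattern described in the text and does not indicate a deadlock.</p>

```python
import re
from collections import Counter
from typing import Dict

LOCKED = re.compile(r"- locked <(0x[0-9a-f]+)>")
WAITING = re.compile(r"- waiting to lock <(0x[0-9a-f]+)>")


def muxer_report(dump: str, queue: str = "weblogic.socket.Muxer") -> Dict[str, object]:
    """Count socket reader threads and summarise the monitors they touch."""
    blocks = [b for b in re.split(r"\n\s*\n", dump.strip())
              if queue in b.split("\n", 1)[0]]          # queue name is in the header line
    holders = Counter(m for b in blocks for m in LOCKED.findall(b))
    waiters = Counter(m for b in blocks for m in WAITING.findall(b))
    return {"count": len(blocks), "enough": len(blocks) >= 3,
            "locked": dict(holders), "waiting": dict(waiters)}
```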
<table style="width: 600px; text-align: left;" border="0" cellspacing="2" cellpadding="2">
<tbody>
<tr>
<td style="vertical-align: top"><p>The <span style="font-family: 'Courier New', Courier, mono;">ThreadPoolPercentSocketReaders</span> attribute sets the maximum percentage of execute threads that are used to read messages from a Java socket. The optimal value for this attribute is application-specific. The default value is 33, and the valid range is 1 to 99.</p>
<p>Allocating execute threads to act as socket reader threads increases the speed and the ability of the server to accept client requests. It is essential to balance the number of execute threads that are devoted to reading messages from a socket and those threads that perform the actual execution of tasks in the server.</p>
<p>In release 8.1, the socket reader threads no longer use &#8220;ExecuteThreads&#8221; in the default queue. Instead, they have their own thread group; in the sample thread dump above, they run in the queue 'weblogic.socket.Muxer'.</p>
<p><strong>Next Steps</strong><br />
The next steps require further analysis of the thread dump. Look at what each of the threads is doing at the time of the hang; this determines the next stage of the investigation. For example, if many threads are involved in JSP compilation, refer to <a href="#Potential_Causes_of_Server_Hang">Potential Causes of Server Hang</a> for further diagnosis and actions to test.</p>
<p><span><a href="#TOP">Top of Page</a></span></p></td>
</tr>
</tbody>
</table>
<hr /><small>  Copyright &copy; 2008 This feed is for personal, non-commercial use only<br />
<a href=www.hashei.com >聚沙成塔-小哈的记事薄</a> by hashei 
If you like it, subscribe at <a href=feed.hashei.com >feed.hashei.com</a><br />
Digital Fingerprint:
 10f920a9f2bae51c3c73c4f5fb50a949</small>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/08/java_generic_server_hang.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analyzing Java Memory Leaks with IBM HeapAnalyzer and MDD4J</title>
		<link>http://www.hashei.me/2009/07/heapanalyzer-and-mod4j-introduction.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=heapanalyzer-and-mod4j-introduction</link>
		<comments>http://www.hashei.me/2009/07/heapanalyzer-and-mod4j-introduction.html#comments</comments>
		<pubDate>Sun, 05 Jul 2009 11:15:09 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[性能优化]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[HeapAnalyzer]]></category>
		<category><![CDATA[heapdump]]></category>
		<category><![CDATA[MOD4J]]></category>
		<category><![CDATA[内存优化]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/07/heapanalyzer-and-mod4j-introduction.html</guid>
		<description><![CDATA[
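Memory leaks are a fairly common application performance problem. Once a leak occurs, the system's available memory and performance decline steadily until it runs out of memory (OutOfMemory) and goes down completely, unable to respond to any request, so the damage is severe. At the same time, the huge number of objects in the Java heap and the complex relationships among them make memory leaks hard to detect and analyze, so supporting tools are essential.

The tools I use most are Memory Dump Diagnostic for Java (MDD4J) and IBM HeapAnalyzer. Both support heap dump files generated by almost every JDK version; check the support list in each tool's help before use.
First, IBM HeapAnalyzer. After downloading it, start by reading the readme, which explains in detail how to use HeapAnalyzer. For version 2.6, which I use (the latest is 3.8), you can start the tool and load a heap dump from the command line with &#60;Java path&#62;java -Xmx[heapsize] -jar ha26.jar &#60;heapdump file&#62;. For a large heap dump, set -Xmx to a large value (larger than the heap dump itself) to avoid running out of memory while loading. A very large heap dump from a 64-bit machine is practically impossible to analyze on a personal computer.
After opening the heap dump, I usually click “Tree View” under “Analysis”, which shows object allocation as a tree starting from the root nodes.

In the first row, the class java.lang.ref.Reference and its 76 children take up 67% of the used heap (31M/46M), while the class itself uses only 76 bytes. Double-clicking java.lang.ref.Reference shows the two child nodes it references. The 67% after one of them, java.lang.ref.Finalizer, tells us that the leak lies somewhere under its references.


From here you can expand the tree level by level, or right-click and choose “Locate a leak suspect” to let HeapAnalyzer find where a leak is likely. Leaks usually occur at classes with an “unusually large” number of references (child nodes): objects that are created, never released, and pile up by the hundreds or thousands are exactly what causes the OutOfMemory. The right-click option “Go to the largest drop subtrees” is built on the same principle; its description reads:
“Search for total size drop” will find a size drop between the total size of a parent and the biggest total size of child of the parent.
This works because, at a leak point, each child node takes up little memory, but their sheer number makes the parent's total size very large. The converse does not hold, however: not every point found this way is where a leak occurs; otherwise there would be nothing left for us to analyze.
For more details, see this PPT.
Memory Dump Diagnostic for Java (MDD4J) is IBM [...]]]></description>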
			<content:encoded><![CDATA[<blockquote>
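<p style="text-indent: 24pt">Memory leaks are a fairly common application performance problem. Once a leak occurs, the system's available memory and performance decline steadily until it runs out of memory (OutOfMemory) and goes down completely, unable to respond to any request, so the damage is severe. At the same time, the huge number of objects in the Java heap and the complex relationships among them make memory leaks hard to detect and analyze, so supporting tools are essential.</p>
</blockquote>
<p style="text-indent: 24pt">The tools I use most are Memory Dump Diagnostic for Java (MDD4J) and IBM HeapAnalyzer. Both support heap dump files generated by almost every JDK version; check the support list in each tool's help before use.</p>
<p style="text-indent: 24pt">First, IBM HeapAnalyzer. After you <a title="IBM HeapAnalyzer" href="http://www.alphaworks.ibm.com/tech/heapanalyzer" target="_blank">download</a> it, start by reading the readme, which explains in detail how to use HeapAnalyzer. For version 2.6, which I use (the latest is 3.8), you can start the tool and load a heap dump from the command line with &lt;Java path&gt;java -Xmx[heapsize] -jar ha26.jar &lt;heapdump file&gt;. For a large heap dump, set -Xmx to a large value (larger than the heap dump itself) to avoid running out of memory while loading. A very large heap dump from a 64-bit machine is practically impossible to analyze on a personal computer.</p>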
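<p style="text-indent: 24pt">Here is a small Python sketch of the rule of thumb above. It builds the HeapAnalyzer command line with an -Xmx value comfortably larger than the dump file. The 1.5x headroom factor and the 256 MB floor are my own assumptions, not values from the HeapAnalyzer readme. The name heapanalyzer_cmd is a hypothetical helper. The jar name ha26.jar comes from the version used here.</p>

```python
import os
from typing import List


def heapanalyzer_cmd(dump_path: str, java: str = "java", jar: str = "ha26.jar",
                     headroom: float = 1.5, floor_mb: int = 256) -> List[str]:
    """Build a HeapAnalyzer command line whose -Xmx exceeds the dump size.

    headroom and floor_mb are illustrative defaults, not documented values.
    """
    size_mb = os.path.getsize(dump_path) / (1024 * 1024)
    # With headroom >= 1 this is strictly larger than the dump size in MB.
    xmx_mb = max(floor_mb, int(size_mb * headroom) + 1)
    return [java, f"-Xmx{xmx_mb}m", "-jar", jar, dump_path]
```

<p style="text-indent: 24pt">You can pass the returned list directly to subprocess.run to start the tool.</p>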
<p style="text-indent: 24pt">After opening the heap dump, I usually click “Tree View” under “Analysis”, which shows object allocation as a tree starting from the root nodes.</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/heapanalysis1.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/heapanalysis1_thumb.jpg" border="0" alt="heapanalysis1" width="504" height="358" /></a></p>
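<p style="text-indent: 24pt">In the first row, the class java.lang.ref.Reference and its 76 children take up 67% of the used heap (31M/46M), while the class itself uses only 76 bytes. Double-clicking java.lang.ref.Reference shows the two child nodes it references. The 67% after one of them, java.lang.ref.Finalizer, tells us that the leak lies somewhere under its references.</p>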
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/heapanalysis2.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/heapanalysis2_thumb.jpg" border="0" alt="heapanalysis2" width="504" height="381" /></a></p>
<p><span id="more-508"></span></p>
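<p style="text-indent: 24pt">From here you can expand the tree level by level, or right-click and choose “Locate a leak suspect” to let HeapAnalyzer find where a leak is likely. Leaks usually occur at classes with an “unusually large” number of references (child nodes): objects that are created, never released, and pile up by the hundreds or thousands are exactly what causes the OutOfMemory. The right-click option “Go to the largest drop subtrees” is built on the same principle; its description reads:</p>
<blockquote><p>“Search for total size drop” will find a size drop between the total size of a parent and the biggest total size of child of the parent.</p></blockquote>
<p style="text-indent: 24pt">This works because, at a leak point, each child node takes up little memory, but their sheer number makes the parent's total size very large. The converse does not hold, however: not every point found this way is where a leak occurs; otherwise there would be nothing left for us to analyze.</p>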
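<p style="text-indent: 24pt">The “largest drop” idea quoted above can be illustrated on a toy tree. This is not HeapAnalyzer's actual code. It assumes that each node's total size is its own size plus the totals of its children, as in the Tree View. The names Node, total_sizes and largest_drop are hypothetical. A node whose total is much larger than its biggest child's total has its weight spread over many small children, which is the shape of a leaking collection.</p>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    name: str
    size: int                                   # the object's own size in bytes
    children: List["Node"] = field(default_factory=list)


def total_sizes(root: Node) -> Dict[int, int]:
    """Total size of every node (own size + all descendants), keyed by id()."""
    order: List[Node] = []
    stack = [root]
    while stack:                                # iterative pre-order walk
        node = stack.pop()
        order.append(node)
        stack.extend(node.children)
    totals: Dict[int, int] = {}
    for node in reversed(order):                # children are finished before parents
        totals[id(node)] = node.size + sum(totals[id(c)] for c in node.children)
    return totals


def largest_drop(root: Node) -> Tuple[Optional[Node], int]:
    """Node with the biggest gap between its total size and its largest child's."""
    totals = total_sizes(root)
    best: Optional[Node] = None
    best_drop = -1
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            drop = totals[id(node)] - max(totals[id(c)] for c in node.children)
            if drop > best_drop:
                best, best_drop = node, drop
            stack.extend(node.children)
    return best, best_drop
```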
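<p style="text-indent: 24pt">For more details, see this <a title="How to use IBM HeapAnalyzer" href="http://docs.google.com/fileview?id=F.242a8f41-b76f-47ba-8350-abeaff1d0d68" target="_blank">PPT</a>.</p>
<p style="text-indent: 24pt"><strong>Memory Dump Diagnostic for Java (MDD4J)</strong> is a tool in IBM Support Assistant (ISA) and is loaded from within ISA. It is used much like HeapAnalyzer, but it automatically lists “suspected leak points” for analysis. Its basis is that “the analysis algorithm looks for significant changes in object size between parent and child objects. Parent objects showing such significant changes are likely array-based container objects holding a large and growing number of child objects.”</p>
<p style="text-indent: 24pt">For specific usage, see the real case in the article <a title="Memory leak detection and analysis" href="http://www.ibm.com/developerworks/cn/websphere/library/techarticles/0608_poddar/0608_poddar.html" target="_blank">“Memory leak detection and analysis in WebSphere Application Server: Part 2: Tools and features for leak detection and analysis”</a>. (The version covered there is fairly old; versions 2 and 3, which you can download now, differ somewhat, but that does not get in the way.)</p>
<p style="text-indent: 24pt">Using heap dump tools is simple; the hard part is finding the real cause of the memory leak, which <strong>generally requires comparing several heap dump files</strong>.</p>
<blockquote><p>Comparative analysis examines two memory dumps taken while a leaking application is running (that is, while available Java heap memory is being lost). The dump triggered early in the run of the leaking application is called the baseline dump; the dump triggered after the leaking application has run for a while (allowing the leak to grow) is called the primary dump. If a memory leak has occurred, the primary dump may contain a large number of objects that occupy far more Java heap than in the baseline dump.</p>
<p>For better analysis results, space the primary dump's trigger point well apart in time from the baseline dump's, so that total heap usage grows substantially between the two.</p></blockquote>
<p style="text-indent: 24pt">If some object's count in the primary dump is far larger than in the baseline dump, that object is usually where the leak is. <strong>However, avoid taking a heap dump right after the application server starts, or objects that are normally allocated at startup will be mistaken for leak suspects.</strong> For example, if the server used to hit OOM after running for 3 days, you can shrink the heap so that the OOM happens sooner, and then, starting 4 hours after startup, take a manual heap dump every 4 hours until the OOM occurs. These steps may not be suitable for a production environment; consider setting up a separate test environment.</p>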
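<p style="text-indent: 24pt">The baseline/primary comparison comes down to ranking classes by how much their instance counts grow. Here is a minimal Python sketch. It assumes you have already exported per-class instance counts from each dump into a dict. That export step and the function name growth_suspects are assumptions, not an MDD4J API. The min_ratio value is an arbitrary starting threshold.</p>

```python
from typing import Dict, List, Tuple


def growth_suspects(baseline: Dict[str, int], primary: Dict[str, int],
                    min_ratio: float = 2.0, top: int = 10) -> List[Tuple[str, int, int]]:
    """Rank classes by instance-count growth from the baseline to the primary dump.

    Both dicts map class name -> instance count. A class is reported when its
    count grew at least `min_ratio` times; classes missing from the baseline
    count as growing from zero. Rows are (class, baseline count, primary count).
    """
    rows = []
    for cls, now in primary.items():
        before = baseline.get(cls, 0)
        if now > before and now >= before * min_ratio:
            rows.append((cls, before, now))
    rows.sort(key=lambda r: r[2] - r[1], reverse=True)   # biggest absolute growth first
    return rows[:top]
```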
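<p style="text-indent: 24pt">Analyzing GC logs, as described in my earlier posts, and analyzing heap dumps, as described here, are both offline methods. Their weakness is that they cannot find the cause of “poor performance” induced by code. As seen in <a title="Analyzing GC logs with HPjtune" href="http://www.hashei.me/2009/07/use-hpjtune-to-analysis-gc-log.html" target="_blank">“Analyzing GC Logs with HPjtune”</a>, system performance was very poor but no OOM occurred. The available heap shrinking after every full GC cannot simply be blamed on a memory leak, since the memory was eventually reclaimed; if you take a heap dump manually, the problematic objects may already have been collected, and you will not get the correct result. In such cases, use a tool such as JProfiler that attaches directly to the JVM for monitoring.</p>
<p style="text-indent: 24pt">Finally, here is how to generate a heap dump manually, so you don't have to Google it when the time comes.</p>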
<p style="text-indent: 24pt"><strong>On Linux/AIX</strong></p>
<p style="text-indent: 24pt">Use the kill -3 &lt;pid&gt; command to trigger a heap dump.</p>
<p style="text-indent: 24pt"><strong>On Windows</strong></p>
<blockquote><p>1. Find the name of the JVM object.</p>
<pre style="font-family: 'Courier New', Courier, monospace; font-size: 13px; background-color: #dadada; white-space: pre-wrap; word-wrap: break-word; padding: 5px;">&lt;wsadmin&gt; set objectName [$AdminControl queryNames WebSphere:type=JVM,process=&lt;<em>servername</em>&gt;,node=&lt;<em>nodename</em>&gt;,*]</pre>
<p>2. Invoke the generateHeapDump operation on the JVM MBean.</p>
<pre>&lt;wsadmin&gt; $AdminControl invoke $objectName generateHeapDump</pre>
</blockquote>
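<p style="text-indent: 24pt">If the method above does not produce a heap dump, apply the following settings.</p>
<blockquote><ol>
<li>Open the administrative console.</li>
<li>Go to “Servers” &gt; “Application Servers” &gt; server1 (or the name of the server whose heap you want to dump) &gt; “Process Definition” &gt; “Environment Entries”.</li>
<li>Click “New”.</li>
<li>In the “Name” field, enter IBM_HEAPDUMP (enabled by default). In the “Value” field, enter true.</li>
<li>Click “OK”.</li>
<li>Repeat steps 3 to 5, but set IBM_HEAPDUMP_OUTOFMEMORY to true.</li>
<li>By default, memory dumps are created in the ~/WebSphere/AppServer/ directory (for WebSphere Application Server V6.x, the default directory is ~/WebSphere/AppServer/profiles/default). To write heap dumps to another directory, go to “Environment Entries”, click “New”, set IBM_HEAPDUMPDIR to the appropriate directory (for example /heapdumps), and click “OK”.</li>
<li>Click “Save”, then click “Save” again on the next screen.</li>
<li>Go to “Servers” &gt; “Application Servers” &gt; server1 (or the name of the server whose heap you want to dump) &gt; “Process Definition” &gt; “Java Virtual Machine”.</li>
<li>Select “Verbose garbage collection”.</li>
<li>Click “Save”, then click “Save” again on the next screen.</li>
<li>Restart the server.</li>
<li>Open a command prompt and change to the /WebSphere/AppServer/bin directory.</li>
<li>Trigger a heap dump with the kill -3 XXXXX command, where XXXXX is the process ID.</li>
</ol></blockquote>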
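<p style="text-indent: 24pt">If WebSphere runs on HP-UX, you need to:</p>
<ul>
<li>Open the administrative console.</li>
<li>Go to “Servers” &gt; “Application Servers” &gt; server1 (or the name of the server whose heap you want to dump) &gt; “Process Definition” &gt; “Java Virtual Machine”.</li>
<li>In “Generic JVM arguments”, enter: -Xrunhprof:depth=0,heap=dump,format=a,thread=n,doe=n</li>
<li>By default, the memory dump is created in the ~/WebSphere/AppServer/ directory. To write the heap dump to another directory, add the HProf parameter file=/heapdumpdir/hprof.txt, where heapdumpdir is the appropriate directory and hprof.txt the appropriate file name. If several memory dumps are created, each one is appended to the same hprof.txt file.</li>
<li>Select “Verbose garbage collection”.</li>
<li>Restart the server.</li>
<li>Create a heap dump with the kill -3 XXXXX command, where XXXXX is the process ID.</li>
<li>Unless specified otherwise, the hprof dump is created in the ~/WebSphere/AppServer/ directory with a file name like java.hprof.txt.</li>
<li>Stop the application server, then move the hprof dump file. The hprof dump file is not complete until the application server has been shut down properly.</li>
<li>Note: check that every hprof dump contains both the HEAP DUMP BEGIN and HEAP DUMP END markers. If either marker is missing, the dump is incomplete and cannot be used for analysis.</li>
</ul>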
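<p style="text-indent: 24pt">The completeness check in the last step is easy to automate. Here is a minimal Python sketch. It assumes the ASCII (format=a) hprof output, in which each dump is bracketed by lines that start with HEAP DUMP BEGIN and HEAP DUMP END. The name hprof_complete is hypothetical.</p>

```python
def hprof_complete(path: str) -> bool:
    """True if the file has at least one dump and every HEAP DUMP BEGIN is closed."""
    open_dump = False
    seen = False
    with open(path, "r", errors="replace") as f:
        for line in f:
            if line.startswith("HEAP DUMP BEGIN"):
                if open_dump:          # a new dump began before the last one ended
                    return False
                open_dump, seen = True, True
            elif line.startswith("HEAP DUMP END"):
                if not open_dump:      # END without a matching BEGIN
                    return False
                open_dump = False
    return seen and not open_dump
```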
<hr /><h2>Related posts:</h2><ul><li><a href="http://www.hashei.me/2010/05/linux-system-performance-monitoring.html" rel="bookmark" title="Permanent Link: Linux Performance Monitoring">Linux Performance Monitoring</a></li><li><a href="http://www.hashei.me/2009/09/ibm_websphere_support_tips1.html" rel="bookmark" title="Permanent Link: IBM WebSphere Recent Supports">IBM WebSphere Recent Supports</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/07/heapanalyzer-and-mod4j-introduction.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Analyzing GC Logs with HPjtune (Diagnosing a Real Case)</title>
		<link>http://www.hashei.me/2009/07/use-hpjtune-to-analysis-gc-log.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=use-hpjtune-to-analysis-gc-log</link>
		<comments>http://www.hashei.me/2009/07/use-hpjtune-to-analysis-gc-log.html#comments</comments>
		<pubDate>Thu, 02 Jul 2009 08:48:14 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[性能优化]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[gc performance tuning]]></category>
		<category><![CDATA[gc日志分析]]></category>
		<category><![CDATA[内存优化]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/07/use-hpjtune-to-analysis-gc-log.html</guid>
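		<description><![CDATA[Last time I introduced two IBM tools for analyzing GC logs (GCMV and PMAT); this time I'll cover HP's HPjmeter. HPjmeter incorporates the functionality of the earlier HPjtune and can analyze garbage collection log files produced on HP machines. You can download the latest version, 4.0, free of charge from the Hewlett-Packard Java website, although you will be asked to fill in some information.
Below I analyze log files from a real production environment. After a new feature went live, this system slowed down, with every operation taking about 10 s. By comparing the GC data from before and after the change, I hope to improve the current performance at the JVM level.
On HP Unix servers (PA-RISC and Itanium platforms) running the HP JDK, the -Xloggc:filename or -Xverbosegc:file=filename option produces a log like the following:
&#60;GCH: vmrelease="1.4.2 1.4.2.10-060112-16:07-PA_RISC2.0 PA2.0 (aCC_AP)
……
&#60;GCH: mode=n &#62;
&#60;GCH: ncpu=8 &#62;
&#60;GCH: availswap=33554432 &#62;
&#60;GCH: usedswap=0 &#62;
……
&#60;GC: 2 4  9.625554 1 0 31 48539536 0 286392320 0 0 35782656 0 2409608 715849728 20971424 20971424 20971520 0.279391 0.279391 &#62;
&#60;GC: 2 4  10.879321 2 0 31 9797920 0 286392320 0 0 35782656 2409608 2742416 715849728 25165568 25165568 25165824 [...]]]></description>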
			<content:encoded><![CDATA[<p style="text-indent: 24pt">上次介绍了IBM的两款分析gc log的工具（GCMV和PMAT），这次讲讲HP推出的<strong>HPjmeter</strong>。HPjmeter集成了以前的HPjtune功能，可以分析在HP机器上产生的垃圾回收日志文件。你可以到<a href="http://www.hp.com/go/java">Hewlett-Packard Java website</a>免费下载最新的4.0版本，当然会让你填一些信息。</p>
<p style="text-indent: 24pt">接下来我将分析一个实际生产环境下的日志文件，这个生产系统在启用新的功能后应用访问速度变慢，每个操作都要耗时10s左右，通过对比前后不同的gc信息，希望能从JVM的层面来优化当前的性能。</p>
<p style="text-indent: 24pt">HP小机（Pa-Risc和安腾平台）使用HP的JDK后，使用-Xloggc:filename或者-Xverbosegc:file=filename参数会生成形如</p>
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">&lt;GCH: vmrelease="1.4.2 1.4.2.10-060112-16:07-PA_RISC2.0 PA2.0 (aCC_AP)<br />
……<br />
&lt;GCH: mode=n &gt;<br />
&lt;GCH: ncpu=8 &gt;<br />
&lt;GCH: availswap=33554432 &gt;<br />
&lt;GCH: usedswap=0 &gt;<br />
……<br />
&lt;GC: 2 4  9.625554 1 0 31 48539536 0 286392320 0 0 35782656 0 2409608 715849728 20971424 20971424 20971520 0.279391 0.279391 &gt;<br />
&lt;GC: 2 4  10.879321 2 0 31 9797920 0 286392320 0 0 35782656 2409608 2742416 715849728 25165568 25165568 25165824 0.307422 0.307422 &gt;</div>
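<p>Forget about analyzing this format by hand. PMAT can open it as the Xverbosegc/hpux file format, but its graphing features did not work for me, so I turned to HP's own tool, HPjmeter.</p>
<h4>Analysis</h4>
<p style="text-indent: 24pt">Once the log file is loaded into HPjmeter, the HPjtune window opens automatically and shows the Heap Usage After GC tab first. This is the normal situation in April (ignore the System GC entries for now; they are discussed later):</p>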
<p style="text-indent: 24pt"><span id="more-500"></span></p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/April-Heap-Usage-After-GC.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/April-Heap-Usage-After-GC_thumb.png" border="0" alt="April Heap Usage After GC" width="504" height="285" /></a></p>
<p style="text-indent: 24pt">Below is the slow situation in June:</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/June-Heap-Usage-After-GC.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/June-Heap-Usage-After-GC_thumb.png" border="0" alt="June Heap Usage After GC" width="504" height="285" /></a></p>
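<p style="text-indent: 24pt">The yellow dots representing Old full (with perm) collections have clearly increased. The header of the log file tells us that this system runs <strong>a 32-bit JDK 1.4.2 (a 64-bit version would state Java VM name = Java HotSpot(TM) 64-Bit Server VM)</strong>. Its default collection policy is the serial collector, so every collection in the Old generation is a stop-the-world full GC, and each full GC takes roughly 10 s to 12 s.</p>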
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/Duration-6.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/Duration-6_thumb.png" border="0" alt="Duration 6" width="504" height="285" /></a></p>
<p style="text-indent: 24pt">Switch to the &#8220;Summary&#8221; tab:</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/GC-Summary-4.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/GC-Summary-4_thumb.jpg" border="0" alt="GC Summary 4" width="504" height="304" /></a></p>
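<p style="text-indent: 24pt">In April, GC took 3.036% of total JVM run time, and full GC alone 0.993%, which is a healthy picture.</p>
<p style="text-indent: 24pt">By June the picture had changed dramatically: the figures were <span style="color: #ff0000;">10.791%</span> and <span style="color: #ff0000;">8.417%</span>, shown in red because they exceed the 5% warning threshold, and 79% of GC time was spent in full GC.</p>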
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/GC-Summary-6.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/GC-Summary-6_thumb.jpg" border="0" alt="GC Summary 6" width="504" height="304" /></a></p>
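<p style="text-indent: 24pt">The percentages on the Summary tab are pause time divided by elapsed JVM time. Here is a minimal Python sketch of that calculation. It assumes you already have each collection's pause length in seconds and know whether it was a full GC; parsing the HP log format is not shown. The name gc_overhead is hypothetical. The default threshold mirrors the 5% warning line mentioned above.</p>

```python
from typing import Iterable, Tuple


def gc_overhead(pauses: Iterable[Tuple[str, float]], elapsed: float,
                threshold: float = 5.0) -> dict:
    """Summarise GC cost from (kind, seconds) pauses over `elapsed` seconds.

    `kind` is "full" for full collections; anything else counts as a scavenge.
    Percentages are relative to total elapsed JVM time.
    """
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    pauses = list(pauses)
    gc = sum(s for _, s in pauses)
    full = sum(s for k, s in pauses if k == "full")
    gc_pct = 100.0 * gc / elapsed
    return {
        "gc_pct": gc_pct,                                # share of run time spent in GC
        "full_pct": 100.0 * full / elapsed,              # share of run time in full GC
        "full_share": 100.0 * full / gc if gc else 0.0,  # share of GC time that is full GC
        "over_threshold": gc_pct > threshold,
    }
```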
<p style="text-indent: 24pt">And that is averaged over a whole week, including weekends and idle evenings. Let's look at how it runs during peak office hours.</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/GC-Summary-6-4.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/GC-Summary-6-4_thumb.jpg" border="0" alt="GC Summary 6-4" width="504" height="227" /></a></p>
<p style="text-indent: 24pt">Wow: 61% of the time is spent in GC, so no wonder it is slow. Let's look at the corresponding Heap Usage After GC:</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/June-22-morning.png" target="_blank"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/June-22-morning_thumb.png" border="0" alt="June 22 morning" width="504" height="285" /></a></p>
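<p style="text-indent: 24pt">Apart from a few quick Scavenges in the young generation at the start, almost all collections are slow full GCs, and the heap in use after each collection does not shrink but keeps growing, which hints at a memory leak. However, heap usage does not keep climbing until an OutOfMemory; instead it rises and falls repeatedly, as in the chart below.</p>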
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/Jun22-Heap-Usage-After-GC.png" target="_blank"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/Jun22-Heap-Usage-After-GC_thumb.png" border="0" alt="Jun22 Heap Usage After GC" width="504" height="285" /></a></p>
<p style="text-indent: 24pt">The morning and afternoon business peaks consist entirely of full GCs and perform badly, whereas the normal April pattern looks like this:</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/April-23-Heap-Usage-After-GC.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/April-23-Heap-Usage-After-GC_thumb.png" border="0" alt="April 23 Heap Usage After GC" width="504" height="285" /></a></p>
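<p style="text-indent: 24pt">When Eden fills up, a Scavenge moves some objects into the Old generation, so heap usage rises until the Old generation can no longer satisfy an allocation; then a full GC occurs and heap usage drops back to a lower level. That is the normal pattern. In June, however, the JVM runs full GCs continuously without reclaiming much memory, yet no OutOfMemory occurs, so we can conclude that the original settings no longer meet the demands of the new functionality now in production.</p>
<p style="text-indent: 24pt">For example, parallel collection is not used, so the 8 CPUs are not exploited, and the low-pause CMS collector is not used either.</p>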
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/system-details-hpjtune.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/system-details-hpjtune_thumb.jpg" border="0" alt="system details hpjtune" width="504" height="237" /></a></p>
<p style="text-indent: 24pt">At the same time, the new system creates far more objects: from 500000 a day in April to 900000 (April on the left, June on the right).</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/April-Cumulative-Allocation.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/April-Cumulative-Allocation_thumb.png" border="0" alt="April Cumulative Allocation" width="244" height="139" /></a> <a href="http://hashei.me/wp-content/uploads/2009/07/June-Cumulative-Allocation.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/June-Cumulative-Allocation_thumb.png" border="0" alt="June Cumulative Allocation" width="244" height="139" /></a></p>
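<p style="text-indent: 24pt">As a result, more objects are promoted from the young generation to the Old generation after each collection. They are not actually long-lived objects; the young generation simply has no room for them at that moment.</p>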
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/April-Promoted-Bytes.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/April-Promoted-Bytes_thumb.png" border="0" alt="April Promoted Bytes" width="244" height="139" /></a> <a href="http://hashei.me/wp-content/uploads/2009/07/June-Promoted-Byte.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/June-Promoted-Byte_thumb.png" border="0" alt="June Promoted Byte" width="244" height="139" /></a></p>
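<p style="text-indent: 24pt">Moreover, a full GC moves every object in the Survivor spaces into the Old generation, which creates a vicious circle. (After each yellow full GC, the Survivor spaces hold zero objects.)</p>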
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/07/June22-Survivor-After.png"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/07/June22-Survivor-After_thumb.png" border="0" alt="June22 Survivor After" width="504" height="285" /></a></p>
<h4>优化操作</h4>
<p style="text-indent: 24pt"><strong>调整目标</strong>：尽可能的将短时间存活的对象在年轻代就能被丢弃掉，而不要转移到年老代中；采用并行回收方式增加效率；避免产生不必要的Full GC；或者采用响应时间短的垃圾回收方式。</p>
<p style="text-indent: 24pt"><strong>调整方法</strong>：增大年轻代大小，减小SurvivorRatio加大Survivor区（也就是From or To）；设置并行回收参数;设置初始堆和最大堆为同样值、设置初始PermSize为一个合理值，避免运行过程中增长；设置回收策略为CMS。</p>
<p style="text-indent: 24pt"><strong>参数设置一</strong>：-Xms1500m -Xmx1500m -Xmn800m -XX:SurvivorRatio=4 -XX:PermSize=160m  -XX:+UseParallelGC（-XX:ParallelGCThreads=8我觉得可以不用显示的声明，可以再上述参数设置后分析新的gc log，看一下System Details页面中ParallelGCThreads的数目再做定夺，1.4.2的JDK不能再Old区做并行回收，也是一个遗憾）</p>
<p style="text-indent: 24pt"><strong>参数设置二</strong>：-Xms1500m -Xmx1500m -Xmn800m -XX:PermSize=160m  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSFullGCsBeforeCompaction=5（或者最后一个参数设置为-XX:+UseCMSCompactAtFullCollection）</p>
<p style="text-indent: 24pt">For the meaning of these options, see <a title="Sun Hotspot JDK参数设置" href="http://www.hashei.me/2009/05/tuning-the-sun-hotspot-jdk.html" target="_blank">"Java performance tuning: Sun HotSpot JDK JVM parameter settings"</a>.</p>
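<p style="text-indent: 24pt">To make the effect of -Xmn and -XX:SurvivorRatio concrete, here is a small Java sketch of the arithmetic. It assumes HotSpot's definition of SurvivorRatio: the ratio of eden to one survivor space, with the young generation holding eden plus two equal survivor spaces (From and To). The class and method names are made up for illustration.</p>

```java
/** Hypothetical helper: splits a HotSpot young generation (-Xmn) into eden and survivor sizes. */
final class YoungGenSizing {
    // HotSpot semantics assumed: SurvivorRatio = eden size / size of ONE survivor space,
    // and the young generation = eden + two equal survivor spaces (From and To).
    static double survivorMb(double youngMb, int survivorRatio) {
        return youngMb / (survivorRatio + 2);
    }

    static double edenMb(double youngMb, int survivorRatio) {
        return survivorMb(youngMb, survivorRatio) * survivorRatio;
    }

    public static void main(String[] args) {
        double youngMb = 800; // -Xmn800m from parameter set 1
        for (int ratio : new int[] {8, 4}) {
            System.out.printf("SurvivorRatio=%d: eden=%.1f MB, each survivor=%.1f MB%n",
                    ratio, edenMb(youngMb, ratio), survivorMb(youngMb, ratio));
        }
    }
}
```

<p style="text-indent: 24pt">With -Xmn800m, going from SurvivorRatio=8 to SurvivorRatio=4 grows each survivor space from 80 MB to about 133 MB, while eden shrinks from 640 MB to about 533 MB. That is the trade-off the tuning approach above is aiming for.</p>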
<h4>Next steps</h4>
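<p style="text-indent: 24pt">After the change there is an observation period. If the results are poor, there are two options from a system-integration point of view. One is to switch to a 64-bit JDK so that a larger heap can be configured. Swapping the JDK under WebSphere is not as easy as under WebLogic, though, so if you have not bought 64-bit WebSphere you will have to give this up. The other is to enable WebSphere clustering, which requires the ND (Network Deployment) edition of WAS.</p>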
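<p style="text-indent: 24pt">From the application side, take one heapdump at 8:00 a.m. and another at 9:30 a.m. Then analyze why the full GCs cannot reclaim the memory, to confirm the problem is not caused by a bug in the program. Alternatively, start the JVM with -agentlib:hprof and use HPjmeter to trace the application's behavior, or use HPjmeter to monitor the running application directly. Both approaches carry a significant performance cost, so do them in a test environment.</p>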
<h4>Odds and ends</h4>
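<p style="text-indent: 24pt">Now about all those system GCs in the log. At first they gave me a real shock. After zooming in on the chart, though, I found that these explicitly triggered full GCs all happened after business hours. They were probably triggered by another application and should have little effect on daytime performance.</p>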
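<p style="text-indent: 24pt">Still, it bears repeating: calling System.gc() yourself hurts JVM performance. It triggers a stop-the-world collection that takes a long time and is not especially effective. You may think you know your program well enough to call System.gc() during idle periods without affecting users. However, as explained above, after a full GC everything in the Survivor spaces has been moved to the old generation for good. At some point the JVM will therefore have to run another, otherwise unnecessary, full GC because of it.</p>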
<p style="text-indent: 24pt">On the IBM JDK the option that disables explicit GC is "<strong>-Xdisableexplicitgc</strong>"; on the Sun JDK it is "<strong>-XX:+DisableExplicitGC</strong>". Note the difference.</p>
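<p style="text-indent: 24pt">Because the two JDKs spell the option differently, it can be handy to check what a running JVM was actually started with. The sketch below is a hypothetical helper, not part of any WebSphere API. It looks for either spelling in the JVM's input arguments via the standard <code>RuntimeMXBean</code>. That API needs Java 5 or later, so it will not run on the 1.4.2 JDK discussed above. Letting a later <code>-XX:-DisableExplicitGC</code> override an earlier "+" form is an assumption about HotSpot's option handling.</p>

```java
import java.lang.management.ManagementFactory;

/** Hypothetical helper: do the JVM's own options disable explicit System.gc() calls? */
final class ExplicitGcCheck {
    static boolean explicitGcDisabled(String[] jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-Xdisableexplicitgc")              // IBM JDK spelling
                    || arg.equals("-XX:+DisableExplicitGC")) { // Sun HotSpot spelling
                disabled = true;
            } else if (arg.equals("-XX:-DisableExplicitGC")) {
                disabled = false; // assumption: a later HotSpot "-" form overrides an earlier "+"
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        // getInputArguments() returns the options given to the JVM, not the program arguments.
        String[] jvmArgs = ManagementFactory.getRuntimeMXBean()
                .getInputArguments().toArray(new String[0]);
        System.out.println("Explicit System.gc() disabled: " + explicitGcDisabled(jvmArgs));
    }
}
```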
<hr /><h2>Related posts:</h2><ul><li><a href="http://www.hashei.me/2009/06/websphere-tools-preview.html" rel="bookmark" title="Permanent Link: 预告">Coming soon</a></li><li><a href="http://www.hashei.me/2009/12/critical_thinking.html" rel="bookmark" title="Permanent Link: 什么是真正的思考？">What is real thinking?</a></li><li><a href="http://www.hashei.me/2009/06/gc-performance-tuning-with-isa.html" rel="bookmark" title="Permanent Link: WebSphere troubshooting-用工具分析GC Log">WebSphere troubleshooting: analyzing GC logs with tools</a></li><li><a href="http://www.hashei.me/2009/07/heapanalyzer-and-mod4j-introduction.html" rel="bookmark" title="Permanent Link: 用IBM HeapAnalyzer和MOD4J分析Java内存泄漏">Analyzing Java memory leaks with IBM HeapAnalyzer and MOD4J</a></li><li><a href="http://www.hashei.me/2009/08/serverhang_application_deadlock.html" rel="bookmark" title="Permanent Link: 应用程序死锁导致服务器挂起的介绍">An introduction to server hangs caused by application deadlocks</a></li></ul><hr /><small>  Copyright &copy; 2008 This feed is for personal, non-commercial use only<br />
<a href=www.hashei.com >聚沙成塔-小哈的记事薄</a> by hashei 
If you enjoy it, subscribe at <a href=feed.hashei.com >feed.hashei.com</a><br />
Digital Fingerprint:
 10f920a9f2bae51c3c73c4f5fb50a949</small>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/07/use-hpjtune-to-analysis-gc-log.html/feed</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Fixing WAS failing to stop after security is enabled</title>
		<link>http://www.hashei.me/2009/06/solve-the-00000056-rolebasedauth-a-secj0305i-problem.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=solve-the-00000056-rolebasedauth-a-secj0305i-problem</link>
		<comments>http://www.hashei.me/2009/06/solve-the-00000056-rolebasedauth-a-secj0305i-problem.html#comments</comments>
		<pubDate>Sat, 20 Jun 2009 08:15:26 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[日志记录]]></category>
		<category><![CDATA[管理安全性]]></category>
		<category><![CDATA[类]]></category>
		<category><![CDATA[类加载]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/06/solve-the-00000056-rolebasedauth-a-secj0305i-problem.html</guid>
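		<description><![CDATA[Fixes the failure to stop the WAS server after administrative security is enabled, with "00000056 RoleBasedAuth A SECJ0305I: The role-based authorization check failed for admin-authz operation Server:stop:java.lang.Boolean:java.lang.Integer. The user UNAUTHENTICATED (unique ID: unauthenticated) was not granted any of the following required roles: administrator, operator." in SystemOut.log]]></description>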
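			<content:encoded><![CDATA[<p style="text-indent: 24pt">If you followed the previous post, <a title="WebSphere管理安全性" href="http://www.hashei.me/2009/06/was-console-security.html" target="_blank">Enabling WebSphere administrative security</a>, you may already have set a password on WAS to block unauthorized access. But when you then try to restart or stop WebSphere from Windows Services, it will not stop and reports "An internal error occurred". SystemOut.log contains:</p>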
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">00000056 RoleBasedAuth A SECJ0305I: The role-based authorization check failed for admin-authz operation Server:stop:java.lang.Boolean:java.lang.Integer. The user UNAUTHENTICATED (unique ID: unauthenticated) was not granted any of the following required roles: administrator, operator.</div>
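<p style="text-indent: 24pt">This happens because the WAS Windows service normally runs as "Local System" or "Administrator". Neither identity carries WebSphere administrative credentials, so the stop request arrives as UNAUTHENTICATED. When WAS is not registered as a service, or on Unix/Linux, stopping the server prompts you for a user name and password. Alternatively, you add "-username <em>user</em> -password <em>password</em>" to the command.</p>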
<p style="text-indent: 24pt">So the fix is simple: update the properties of the WAS service. In the bin directory under the WAS home, run the following command. Here <em>service_name</em> is the part of the Windows service name that follows "IBMWAS61Service - ":</p>
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">wasservice -add <em>service_name</em> -serverName server1 -profilePath E:\IBM\WebSphere\AppServer\profiles\AppSrv02 -stopArgs "-username <em>user</em> -password <em>password</em>"</div>
<p style="text-indent: 24pt">Starting the service does not need a user name and password, so there is no need to add -startArgs. You should see:</p>
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">Service already exists, updating parameters&#8230;</div>
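<p style="text-indent: 24pt">If you register or update services from a script, the main pitfall is that everything after -stopArgs must reach wasservice as a single argument. The hypothetical Java helper below only assembles the argument list for the command above. The service name, user and password in main are placeholders. Actually launching the command (for example through <code>ProcessBuilder</code> from the WAS bin directory) is left out.</p>

```java
import java.util.Arrays;

/** Hypothetical helper: assembles the "wasservice -add" call from the text as an argument array. */
final class WasServiceCommand {
    static String[] updateStopArgs(String serviceName, String serverName,
                                   String profilePath, String user, String password) {
        return new String[] {
            "wasservice", "-add", serviceName,   // serviceName: the part after "IBMWAS61Service - "
            "-serverName", serverName,
            "-profilePath", profilePath,
            // The quoted "-username ... -password ..." string must stay ONE argument,
            // so that the service passes both options on to the stop command.
            "-stopArgs", "-username " + user + " -password " + password
        };
    }

    public static void main(String[] args) {
        String[] cmd = updateStopArgs("myNode", "server1",
                "E:\\IBM\\WebSphere\\AppServer\\profiles\\AppSrv02", "wasadmin", "secret");
        // Printed rather than executed; all values above are placeholders.
        System.out.println(Arrays.toString(cmd));
    }
}
```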
<p><span id="more-437"></span></p>
<p style="text-indent: 24pt">P.S. Answers to a few questions from 香我宁:</p>
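<p style="text-indent: 24pt"><strong>Q</strong>: In the console, under Application servers &gt; serverxxx &gt; Process definition &gt; Java Virtual Machine, I checked "Verbose class loading" (same effect as -verbose:class). Why is there no class-loading output in SystemOut.log? Where should I look?</p>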
<p style="text-indent: 24pt"><strong>A</strong>: It goes to native_stderr.log.</p>
<p><strong>Q</strong>: I can't find "Change Log Detail Levels", and there is no Class Loader Viewer either. The trace specification is *=all=disabled. What does that mean?</p>
<p><strong>A</strong>: "Class Loader Viewer" is under "Troubleshooting", and "Change Log Detail Levels" is under "Logs and Trace"; the default is "*=info". About the other levels:</p>
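<blockquote><p>The default WebSphere Application Server configuration supports JUL levels of Level.INFO and higher. The application server treats all JUL levels from Level.CONFIG to Level.SEVERE as log event levels, intended for administrators.</p>
<p><strong>Table 3. Logging levels</strong></p>
<table>
<tr><th>Logging level</th></tr>
<tr><td><code>Level.SEVERE</code></td></tr>
<tr><td><code>Level.WARNING</code></td></tr>
<tr><td><code>Level.INFO</code></td></tr>
<tr><td><code>Level.CONFIG</code></td></tr>
</table>
<p>The application server treats the levels from Level.FINE to Level.FINEST as trace levels, used for events intended to help code authors debug.</p></blockquote>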
<p><a href="http://www.ibm.com/developerworks/cn/websphere/techjournal/0802_supauth/0802_supauth.html" target="_blank">The Support Authority: a developer's guide to WebSphere Application Server logging</a></p>
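<p style="text-indent: 24pt">The split between log levels and trace levels can be expressed with the integer values of <code>java.util.logging.Level</code>. This is only a sketch of the rule quoted above, not how WebSphere implements it internally. <code>WasLevelKind</code> and <code>kindOf</code> are made-up names.</p>

```java
import java.util.logging.Level;

/** Sketch of the quoted rule: CONFIG to SEVERE are log events, FINEST to FINE are trace. */
final class WasLevelKind {
    static String kindOf(Level level) {
        // JUL values: SEVERE=1000, WARNING=900, INFO=800, CONFIG=700, FINE=500, FINER=400, FINEST=300
        int v = level.intValue();
        if (v >= Level.CONFIG.intValue() && v <= Level.SEVERE.intValue()) {
            return "log";
        }
        if (v >= Level.FINEST.intValue() && v <= Level.FINE.intValue()) {
            return "trace";
        }
        return "other"; // Level.ALL and Level.OFF fall outside both ranges
    }

    public static void main(String[] args) {
        Level[] levels = {Level.SEVERE, Level.WARNING, Level.INFO, Level.CONFIG,
                          Level.FINE, Level.FINER, Level.FINEST};
        for (Level level : levels) {
            System.out.println(level.getName() + " -> " + kindOf(level));
        }
    }
}
```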
<p style="text-indent: 24pt">"*=all=disabled" turns off tracing for all loggers, but the exact effect depends on the WAS version:</p>
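<blockquote><p>WebSphere Application Server V6 and later use a single logging infrastructure, extended Java logging. This results in the following changes to how logging is configured in WebSphere Application Server:</p>
<ul>
<li>Loggers defined in Java logging are equivalent to the trace components introduced in earlier versions of WebSphere Application Server, and they are configured the same way. Both are called "components".</li>
<li>Both Java logging levels and WebSphere Application Server levels can be used. The following is the complete list of valid levels, in ascending order of severity:</li>
<li>Setting a component's logging and tracing level to all enables all logging for that component. Setting it to off disables all logging for that component.</li>
<li>A component can be configured with only one level. However, configuring a component to a given level lets it log at that level and at any more severe level.</li>
<li>Several levels have equivalent names: finest is equivalent to debug, finer to entryExit, fine to event, and severe to error.</li>
</ul>
<p>COMPONENT_NAME is the name of a component or group registered with the trace service logging infrastructure. WebSphere Application Server components are usually registered under fully qualified Java class names (for example com.ibm.servlet.engine.ServletEngine). You can also end a component name with an asterisk (*) wildcard to denote multiple classes or packages. A logging string whose component or group name ends with * applies to every component or group whose name starts with the given string. For example, the component name com.ibm.servlet.* applies to all components whose names start with com.ibm.servlet. When an asterisk (*) is used on its own in place of a component name, the level in the string applies to all components.</p>
<p>Here are some examples of using the asterisk (*) in a logging string. The asterisk does not need to be preceded by a period (.), and a period can appear anywhere in the logging string.</p>
<ul>
<li><tt>com.ibm.ejs.ras.*=all</tt> - enables tracing for all loggers whose names start with "com.ibm.ejs.ras.". A logger named exactly "com.ibm.ejs.ras" does not get tracing enabled.</li>
<li><tt>com.ibm.ejs.ras*=all</tt> - enables tracing for all loggers whose names start with "com.ibm.ejs.ras" (for example com.ibm.ejs.ras, com.ibm.ejs.raslogger and com.ibm.ejs.ras.ManagerAdmin).</li>
</ul>
<p>Notes:</p>
<ul>
<li>In WebSphere Application Server V5.1.1 and earlier, LEVEL could be set to "all=disabled" to disable tracing. Starting with V6.0, this syntax results in LEVEL=info: tracing is disabled, but logging is enabled.</li>
<li>In WebSphere Application Server V6 and later, "info" is the default level. If no entry for the specified component exists (no *=xxx is found), *=info is always implied. Any component that does not match the trace string has its level set to info.</li>
<li>If the logging string does not begin with an entry that sets the level for all components (using "*" in place of a component name), an entry that sets all components to the default level is added.</li>
<li>In V6 and later, STATE = enabled | disabled is not required. If it is used, it has the following effects:
<ul>
<li>"enabled" sets logging for the specified component to the specified level</li>
<li>"disabled" sets logging for the specified component to the level just before the specified level. The following examples show the effect of disabling a logging level:</li>
</ul>
</li>
</ul>
</blockquote>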
<p>For more detail, see <a title="跟踪和日志记录配置" href="http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/rtrb_enabletrc.html" target="_blank">Tracing and logging configuration</a>.</p>
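<p style="text-indent: 24pt">The wildcard rules quoted above boil down to a prefix match. Below is a minimal Java sketch of how the component part of a trace string could be matched against a logger name. It is written from the quoted documentation, not from WebSphere's code, and the class and method names are hypothetical.</p>

```java
/** Hypothetical matcher for the component part of a WAS trace string, e.g. "com.ibm.ejs.ras.*=all". */
final class TraceSpecMatcher {
    /** Returns the component part of one entry, e.g. "*" for "*=all=disabled". */
    static String componentOf(String entry) {
        int eq = entry.indexOf('=');
        return eq < 0 ? entry : entry.substring(0, eq);
    }

    /** "*" matches everything; a trailing "*" means "starts with"; anything else must match exactly. */
    static boolean matches(String componentPattern, String loggerName) {
        if (componentPattern.equals("*")) {
            return true;
        }
        if (componentPattern.endsWith("*")) {
            String prefix = componentPattern.substring(0, componentPattern.length() - 1);
            return loggerName.startsWith(prefix);
        }
        return componentPattern.equals(loggerName);
    }

    public static void main(String[] args) {
        String[] loggers = {"com.ibm.ejs.ras", "com.ibm.ejs.raslogger", "com.ibm.ejs.ras.ManagerAdmin"};
        for (String entry : new String[] {"com.ibm.ejs.ras.*=all", "com.ibm.ejs.ras*=all"}) {
            String pattern = componentOf(entry);
            for (String logger : loggers) {
                System.out.println(entry + " applies to " + logger + ": " + matches(pattern, logger));
            }
        }
    }
}
```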
<p>Finally, if you can, please change your name.</p>
<hr /><h2>Related posts:</h2><ul><li><a href="http://www.hashei.me/2009/08/serverhang_application_deadlock.html" rel="bookmark" title="Permanent Link: 应用程序死锁导致服务器挂起的介绍">An introduction to server hangs caused by application deadlocks</a></li><li><a href="http://www.hashei.me/2009/08/java-heap-fragmentation-with-ibm-jdk.html" rel="bookmark" title="Permanent Link: IBM JDK的Java堆空间的碎片问题">Java heap fragmentation on the IBM JDK</a></li><li><a href="http://www.hashei.me/2009/12/critical_thinking.html" rel="bookmark" title="Permanent Link: 什么是真正的思考？">What is real thinking?</a></li><li><a href="http://www.hashei.me/2009/12/ibm_support_newsletter_for_websphere_application_server_1219.html" rel="bookmark" title="Permanent Link: IBM WebSphere最新技术支持信息">Latest IBM WebSphere technical support news</a></li><li><a href="http://www.hashei.me/2009/06/gc-performance-tuning-with-isa.html" rel="bookmark" title="Permanent Link: WebSphere troubshooting-用工具分析GC Log">WebSphere troubleshooting: analyzing GC logs with tools</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/06/solve-the-00000056-rolebasedauth-a-secj0305i-problem.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More on WebSphere class loading and troubleshooting</title>
		<link>http://www.hashei.me/2009/06/troubshoot-classloader-problems.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=troubshoot-classloader-problems</link>
		<comments>http://www.hashei.me/2009/06/troubshoot-classloader-problems.html#comments</comments>
		<pubDate>Mon, 15 Jun 2009 15:14:31 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[类]]></category>
		<category><![CDATA[类加载]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/06/troubshoot-classloader-problems.html</guid>
		<description><![CDATA[Introduction
Last time, in "WebSphere class loading mechanism and troubleshooting", I summarized the concepts and principles of class loading. However, it was not very clear how to apply them, and there was no concrete troubleshooting method. So this time I collected the IBM developerWorks articles worth consulting, hoping they offer a way forward when you hit ClassCastException, ClassNotFoundException, NoClassDefFoundError or UnsatisfiedLinkError. [...]]]></description>
			<content:encoded><![CDATA[<h4>Introduction</h4>
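<p style="text-indent: 24pt">Last time, in <a title="WebSphere的类加载机制和故障排查" href="http://www.hashei.me/2009/05/websphere-class-loader-troubshooting.html" target="_blank">WebSphere class loading mechanism and troubleshooting</a>, I summarized the concepts and loading principles of class loaders. However, it was not very clear how to apply them, and there was no concrete troubleshooting method. So this time I went through IBM developerWorks again and collected the articles worth consulting, hoping they offer a way forward when you hit ClassCastException, ClassNotFoundException, NoClassDefFoundError or UnsatisfiedLinkError.</p>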
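<p>The article "<a title="IBM WebSphere 开发者技术期刊: 类路径冲突的鉴别" href="http://www.ibm.com/developerworks/cn/websphere/techjournal/0406_brown/0406_brown.html" target="_blank">IBM WebSphere Developer Technical Journal: Identifying classpath conflicts</a>" describes a "failure" that occurred while migrating an application. The quotation marks are there because WAS did not report ClassNotFoundException, NoClassDefFoundError or any similar error; the application simply did not behave as expected. The investigation concluded:</p>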
<blockquote><p>The result is <code>C:/Program Files/IBM/WebSphere Studio/Application Developer/v5.1.1/runtimes/base_v51/lib/jython.jar</code>, not the <code>WEB-INF/lib/jakarta-oro-2.0.7.jar</code> we expected.</p></blockquote>
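<p style="text-indent: 24pt">This is a textbook class-loading problem. With the default PARENT_FIRST delegation mode, the class was loaded first through the <em>application</em> class loader from <code>jython.jar</code>, which happens to contain a class with the same name. The correct <code>jakarta-oro-2.0.7.jar</code> in the Web module class loader was never used. Switching to PARENT_LAST delegation fixes it. The servlet in the article that reports class-loading paths is a useful reference. In practice, though, I would usually find the class-loading path another way: turn on IBM JVM verbose output with the <code>-verbose</code> command-line option, trace a specific class, or take a dump.</p>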
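<p style="text-indent: 24pt">To see what PARENT_LAST means in code, here is a minimal "child-first" class loader built on <code>URLClassLoader</code>. It is a sketch of the delegation idea only, not WebSphere's implementation. It looks in its own URLs first and falls back to the parent. It always leaves <code>java.*</code> classes to the parent, because the JVM refuses to define them anywhere else. <code>getClassLoadingLock</code> requires Java 7 or later.</p>

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Minimal PARENT_LAST ("child-first") delegation sketch; not WebSphere's implementation. */
final class ParentLastClassLoader extends URLClassLoader {
    ParentLastClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null && !name.startsWith("java.")) {
                try {
                    c = findClass(name);          // 1. this loader's own JARs/directories first
                } catch (ClassNotFoundException notHere) {
                    // not in our URLs; fall through to the parent
                }
            }
            if (c == null) {
                c = super.loadClass(name, false); // 2. normal parent-first lookup as the fallback
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```

<p style="text-indent: 24pt">For example, <code>new ParentLastClassLoader(new URL[] {jarUrl}, parent)</code> would prefer classes from a hypothetical <code>jarUrl</code> over same-named classes visible through <code>parent</code>. That is the behavior PARENT_LAST gives the JARs in a Web module's <code>WEB-INF/lib</code>.</p>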
<h4>Debugging methods</h4>
<p><a title="类装入问题解密，第 1 部分: 类装入和调试工具介绍" href="http://www.ibm.com/developerworks/cn/java/j-dclp1/index.html#resources" target="_blank">"Demystifying class loading problems, Part 1: An introduction to class loading and debugging tools"</a> describes these three methods:</p>
<blockquote><p><a name="2.1"></a></p>
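<p>You can turn on verbose output of the IBM JVM with the <code>-verbose</code> command-line option. Verbose output prints information to the console when certain events occur, for example when a class is loaded. For additional class-loading information, use verbose class output, which is enabled with the <code>-verbose:class</code> option.</p>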
<p><strong>Interpreting verbose output</strong><br />
Verbose output lists every JAR file that has been opened, including the full path to each JAR. Here is an example:</p>
<pre>...
[Opened D:\jre\lib\core.jar in 10 ms]
[Opened D:\jre\lib\graphics.jar in 10 ms]
...</pre>
<p>Every loaded class is listed, together with the JAR file or directory it was loaded from. For example:</p>
<pre>...
[Loaded java.lang.NoClassDefFoundError from D:\jre\lib\core.jar]
[Loaded java.lang.Class from D:\jre\lib\core.jar]
[Loaded java.lang.Object from D:\jre\lib\core.jar]
...</pre>
</blockquote>
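<p style="text-indent: 24pt">When the verbose output is long, a few lines of Java can extract the mapping from each class to its location, so you can look up a suspect class directly. The sketch below assumes the <code>[Loaded ... from ...]</code> line format shown above; newer JVMs print class loading differently. <code>VerboseClassParser</code> is a hypothetical name.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for "[Loaded <class> from <location>]" lines in -verbose:class output. */
final class VerboseClassParser {
    private static final Pattern LOADED = Pattern.compile("\\[Loaded (\\S+) from (.+)\\]");

    /** Maps each loaded class to the JAR or directory it came from; other lines are ignored. */
    static Map<String, String> parse(String log) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : log.split("\\R")) {          // \R matches any line break (Java 8+)
            Matcher m = LOADED.matcher(line.trim());
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "[Opened D:\\jre\\lib\\core.jar in 10 ms]\n"
                + "[Loaded java.lang.Class from D:\\jre\\lib\\core.jar]\n"
                + "[Loaded java.lang.Object from D:\\jre\\lib\\core.jar]\n";
        parse(sample).forEach((cls, from) -> System.out.println(cls + " from " + from));
    }
}
```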
<p><span id="more-430"></span></p>
<p style="text-indent: 24pt">You can also restrict the output to the loading of one particular class and filter out the irrelevant ones:</p>
<blockquote>
<p style="text-indent: 24pt"><code>-Dibm.cl.verbose=&lt;class name&gt;</code>. The class name can be given as a regular expression; for example, <code>Hello*</code> traces every class whose name starts with <code>Hello</code>.</p>
</blockquote>
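<p style="text-indent: 24pt">Another way to check a single class, close to the servlet idea in the "Identifying classpath conflicts" article, is to ask the class for its <code>CodeSource</code> location. You can also ask the class loader for every copy of the <code>.class</code> resource it can see. The sketch below uses only standard JDK calls. <code>ClassOrigin</code> is a made-up name. Inside WebSphere you would call <code>locationOf</code> from application code, for example a servlet, on the class you suspect.</p>

```java
import java.io.IOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Enumeration;

/** Reports where a class was loaded from; a simplified take on the article's servlet idea. */
final class ClassOrigin {
    /** JAR or directory the class was defined from, if the JVM records one. */
    static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            // Bootstrap classes such as java.lang.String normally have no CodeSource.
            return "bootstrap/unknown";
        }
        return src.getLocation().toString();
    }

    /** Prints every copy of the class file visible to the loader; more than one hints at a conflict. */
    static void printCandidates(ClassLoader loader, String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        Enumeration<URL> urls = loader.getResources(resource);
        while (urls.hasMoreElements()) {
            System.out.println("  visible copy: " + urls.nextElement());
        }
    }

    public static void main(String[] args) throws Exception {
        String name = args.length > 0 ? args[0] : "java.lang.String";
        Class<?> c = Class.forName(name);
        // getClassLoader() returns null for classes loaded by the bootstrap loader.
        System.out.println(name + " loaded by " + c.getClassLoader() + " from " + locationOf(c));
        printCandidates(ClassLoader.getSystemClassLoader(), name);
    }
}
```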
<p style="text-indent: 24pt">For more precision, use the trace facility to trace the Java code:</p>
<blockquote><p><a name="2.4"></a></p>
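<p>The IBM JVM has a built-in method trace facility. It lets you trace methods in any Java code, including the core system classes, without modifying the code. Because it can produce a great deal of data, you can control the trace level to capture only the information you need.</p></blockquote>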
<p style="text-indent: 24pt">The usual steps are:</p>
<blockquote>
<ol>
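<li>In the console navigation tree, click <strong>Troubleshooting &gt; Logs and Trace</strong>, then click <strong>server &gt; Diagnostic Trace</strong>.</li>
<li>Click <strong>Configuration</strong>.</li>
<li>Select the <strong>Enable Log</strong> check box to enable tracing; clear it to disable tracing.</li>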
</ol>
<p>…………</p>
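<p>To enter a <strong>trace string</strong> that sets the trace specification to the desired state:</p>
<ol>
<li>In the console navigation tree, click <strong>Troubleshooting &gt; Logs and Trace</strong>.</li>
<li>Select the server name.</li>
<li>Click <strong>Change Log Detail Levels</strong>.</li>
<li>If all components are enabled, you may want to turn that off and then enable specific components.</li>
<li>Click a component or group name. For more details, see <a href="http://publib.boulder.ibm.com/utrb_loglevel.html">Log level settings</a>. If the selected server is not running, you cannot view individual components graphically.</li>
<li>Enter the trace string in the trace string box.</li>
<li>Select <strong>Apply</strong>, then <strong>OK</strong>.</li>
</ol>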
</blockquote>
<p style="text-indent: 24pt">You can also go to Troubleshooting &gt; Class Loader Viewer to reach the <a href="http://publib.boulder.ibm.com/utrb_classload_topology.html">enterprise application topology page</a>.</p>
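<blockquote><p>On the Class Loader Viewer page, click Search to open the <a href="http://publib.boulder.ibm.com/utrb_classload_viewer_search.html">search page</a>, where you can search the class loaders for:</p>
<ul>
<li>a specific string</li>
<li>a specific .jar file</li>
<li>file names in a specific directory</li>
<li>file names loaded by a specific class loader</li>
</ul>
<p>The search is case-sensitive.</p></blockquote>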
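<p style="text-indent: 24pt">Tracing an application is a skilled job that could fill a chapter on its own. In the projects I have worked on, by the time a problem reached this level the developers took over the bug hunt. As a result I have little hands-on experience and no insights to share yet. If I get the chance, I will study it and write about it later.</p>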
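<p style="text-indent: 24pt">If tracing seems too complicated, you can simply generate a javacore file, which records class loader information. I recommend generating it with wsadmin rather than kill -3. Depending on how the JVM is configured, kill -3 may also produce a heapdump, and that is no fun.</p>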
<blockquote><p>Run wsadmin.bat to enter the wsadmin command line, then type:</p>
<pre>set jvm [$AdminControl completeObjectName type=JVM,process=server1,*]
$AdminControl invoke $jvm dumpThreads</pre>
<p>A javacore.TIMESTAMP.NUMBER.txt file is generated automatically in C:\WebSphere\AppServer, C:\WebSphere\AppServer\default\, or a directory you have specified.</p></blockquote>
<h4>Solving the problem</h4>
<p style="text-indent: 24pt">Once you know how to find the cause, you can start solving the problem: "<a title="类装入问题解密，第 2 部分: 基本的类装入异常" href="http://www.ibm.com/developerworks/cn/java/j-dclp3/j-dclp2.html" target="_blank">Demystifying class loading problems, Part 2: Basic class loading exceptions</a>"</p>
<blockquote><p><a name="1"></a></p>
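<p><code>ClassNotFoundException</code> is the most common type of class loading exception. It occurs during the loading phase. The Java specification describes <code>ClassNotFoundException</code> as follows:</p>
<p>Thrown when an application tries to load in a class through its string name using one of the following three methods, but no definition for the class with the specified name could be found:</p>
<ul>
<li>The <code>forName()</code> method in class <code>Class</code>.</li>
<li>The <code>findSystemClass()</code> method in class <code>ClassLoader</code>.</li>
<li>The <code>loadClass()</code> method in class <code>ClassLoader</code>.</li>
</ul>
<p>So if an attempt to load a class explicitly fails, <code>ClassNotFoundException</code> is thrown.</p>
<p>By throwing <code>ClassNotFoundException</code>, the class loader indicates that the bytecode needed to define the class is not present in the locations where it is looking. These exceptions are usually simple to fix. You can use the IBM verbose option to check the classpath and make sure it is set correctly. If the classpath is correct but you still see the error, the required class is not on the classpath. To fix this, <strong>move the class to a directory or JAR file listed on the classpath, or add the class's location to the classpath</strong>.</p></blockquote>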
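<p style="text-indent: 24pt">The explicit-loading case in the quote is easy to reproduce. In this sketch, <code>Class.forName</code> with a name that is not on the classpath throws <code>ClassNotFoundException</code>. <code>com.example.NotOnClasspath</code> is a made-up class name used only to trigger the failure.</p>

```java
/** Demonstrates explicit loading: Class.forName() on a missing name throws ClassNotFoundException. */
final class ExplicitLoadDemo {
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader); // false: load only, do not run static initializers
            return true;
        } catch (ClassNotFoundException e) {
            // The bytecode was not found anywhere this loader (or its parents) searches.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ExplicitLoadDemo.class.getClassLoader();
        for (String name : new String[] {"java.lang.String", "com.example.NotOnClasspath"}) {
            System.out.println(name + " loadable: " + canLoad(name, loader));
        }
    }
}
```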
<p style="text-indent: 24pt">For more solutions, see the original article. Treat parts 3 and 4 as further reading: <a title="类装入问题解密，第 3 部分: 处理更少见的类装入问题" href="http://www.ibm.com/developerworks/cn/java/j-dclp3/" target="_blank">"Demystifying class loading problems, Part 3: Tackling more unusual class loading problems"</a> and <a title="类装入问题解密，第 4 部分: 死锁和约束" href="http://www.ibm.com/developerworks/cn/java/j-dclp4/" target="_blank">"Demystifying class loading problems, Part 4: Deadlocks and constraints"</a>.</p>
<h4>Personal thoughts</h4>
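<p style="text-indent: 24pt">Class loading problems are not hard to solve, yet they are a headache, especially when several applications are deployed together. Problems that look alike often have different causes. A fix that worked in one place, such as moving a JAR to another directory, may be useless in another environment. Many of the "solved cases" online are like that, so I have collected the methods here: teaching how to fish rather than handing out fish. None of it is my own work; it is all taken from existing IBM developerWorks content. A little embarrassed, I can only marvel at how strong IBM is. Among the other big vendors, Oracle requires a Metalink account to see its most useful material. SAP requires you to buy the product before you can read the detailed documentation, and otherwise offers only bare-bones material. Microsoft's KB and webcasts are well done, but its localized content and genuinely valuable articles fall short. At a time when everyone wants to make money from services, IBM has not hoarded this material and has given us a place to learn and share. That may be one of the secrets of its decades of success.</p>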
<hr /><h2>Related posts:</h2><ul><li><a href="http://www.hashei.me/2010/03/inside_java_classloader.html" rel="bookmark" title="Permanent Link: Java 类加载器的又一篇文章">Another article on Java class loaders</a></li><li><a href="http://www.hashei.me/2009/05/code-optimization-for-gc.html" rel="bookmark" title="Permanent Link: JAVA性能优化&mdash;编写符合GC胃口的程序">Java performance tuning: writing GC-friendly programs</a></li><li><a href="http://www.hashei.me/2009/08/serverhang_application_deadlock.html" rel="bookmark" title="Permanent Link: 应用程序死锁导致服务器挂起的介绍">An introduction to server hangs caused by application deadlocks</a></li><li><a href="http://www.hashei.me/2009/12/ibm_support_newsletter_for_websphere_application_server_1219.html" rel="bookmark" title="Permanent Link: IBM WebSphere最新技术支持信息">Latest IBM WebSphere technical support news</a></li><li><a href="http://www.hashei.me/2009/09/ibm_websphere_support_tips1.html" rel="bookmark" title="Permanent Link: IBM WebSphere Recent Supports">IBM WebSphere Recent Supports</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/06/troubshoot-classloader-problems.html/feed</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>WebSphere troubleshooting: analyzing GC logs with tools</title>
		<link>http://www.hashei.me/2009/06/gc-performance-tuning-with-isa.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=gc-performance-tuning-with-isa</link>
		<comments>http://www.hashei.me/2009/06/gc-performance-tuning-with-isa.html#comments</comments>
		<pubDate>Tue, 09 Jun 2009 14:52:59 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[性能优化]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[gc performance tuning]]></category>
		<category><![CDATA[gc日志分析]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/06/gc-performance-tuning-with-isa.html</guid>
		<description><![CDATA[GC performance tuning requires analyzing GC logs. Earlier I described doing it "by hand", but that is not very visual. It does not give developers or technical leads who are new to the subject an intuitive picture, so this post introduces a few tools for analyzing GC logs. [...]]]></description>
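			<content:encoded><![CDATA[<p style="text-indent: 24pt">GC performance tuning requires analyzing GC logs. Earlier I described the "<a title="直接分析GC日志" href="http://www.hashei.me/2009/06/analyse-the-gc-logs.html" target="_blank">by hand</a>" approach, but it is not very visual. It does not give developers or technical leads who are new to the subject an intuitive picture, so this post introduces a few tools for analyzing GC logs.</p>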
<p style="text-indent: 24pt">First, <a title="下载IBM Support Assistant" href="http://www-01.ibm.com/software/support/isa/download.html" target="_blank">download IBM Support Assistant</a>. After that you can download the tools we need from Update &gt; Tools Add-ons (see <a title="使用 IBM Support Assistant 进行快速的问题诊断" href="http://www.ibm.com/developerworks/cn/websphere/library/techarticles/0710_pengfei/index.html" target="_blank">how to use ISA</a>). ISA integrates all the tools in one interface, saves you the trouble of setting startup parameters, and keeps the tools up to date. To analyze garbage collection logs I mainly use two tools. The first is "The IBM Monitoring and Diagnostic Tools for Java™ - Garbage Collection and Memory Visualizer". The second is "IBM Pattern Modeling and Analysis Tool for Java Garbage Collector (<strong>PMAT</strong>)". I will use real examples to show how to use them.</p>
<p style="text-indent: 24pt">Load native_stderr.log into Garbage Collection and Memory Visualizer. The first thing you see is:</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/06/gclog1.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/gclog1-thumb.jpg" border="0" alt="gclog1" width="244" height="144" /></a></p>
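<p style="text-indent: 24pt">This is a garbage collection chart covering 500 minutes, which gives a rough picture of a day. Overall, the blue Used heap (after collection) line moves within a "sideways channel" rather than a "rising channel" (anyone who trades stocks knows what a rising channel looks like). Accordingly, the Report tab says "The memory usage of the application does not indicate any obvious leaks."</p>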
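<p style="text-indent: 24pt">The "sideways versus rising channel" reading can be approximated numerically. Fit a straight line to the used-heap-after-collection values and look at its slope. This is my own rough heuristic for illustration, not the algorithm Garbage Collection and Memory Visualizer uses. The sample numbers in main are invented.</p>

```java
/** Rough leak heuristic (illustration only): least-squares slope of used heap after collection. */
final class HeapTrend {
    /** Slope in MB per minute of used-after-GC samples taken at the given minutes. */
    static double slopeMbPerMinute(double[] minutes, double[] usedAfterGcMb) {
        int n = minutes.length;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < n; i++) {
            sumX += minutes[i];
            sumY += usedAfterGcMb[i];
        }
        double meanX = sumX / n, meanY = sumY / n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (minutes[i] - meanX) * (usedAfterGcMb[i] - meanY);
            den += (minutes[i] - meanX) * (minutes[i] - meanX);
        }
        return den == 0 ? 0 : num / den; // a clearly positive slope looks like a "rising channel"
    }

    public static void main(String[] args) {
        double[] t = {0, 60, 120, 180, 240};          // minutes since start (invented sample)
        double[] sideways = {400, 420, 390, 410, 400}; // MB used after GC, no trend
        double[] rising = {400, 450, 500, 550, 600};   // MB used after GC, steady growth
        System.out.printf("sideways: %.3f MB/min, rising: %.3f MB/min%n",
                slopeMbPerMinute(t, sideways), slopeMbPerMinute(t, rising));
    }
}
```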
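<p style="text-indent: 24pt">The Summary in the Report deserves attention. It shows the number of collections, the policy in use (optthruput), the mean pause time of 248 ms and the mean interval of 3.37 minutes. It also shows the rate of garbage collection. Producing a lot of garbage is not necessarily bad; it can actually be a sign of high throughput.</p>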
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/06/summary.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/summary-thumb.jpg" border="0" alt="Summary" width="243" height="244" /></a></p>
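<p style="text-indent: 24pt">Now switch back to the Line plot view. You can drag to select a time range and zoom in, and choose the X-axis units under Axes on the right. The default is relative time in minutes, which suits applications whose problems always appear N hours after startup. If the performance problems occur at a fixed time every day, use absolute time instead.</p>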
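<p style="text-indent: 24pt">By default the Heap size and Used heap (after collection) curves are shown. You can tick other curves as needed under VGC Pause Data, VGC Data and VGC Heap Data. For example, if response times feel long, tick Intervals between garbage collection triggers and Pause time. Then see whether the first curve comes "too close" to the second.</p>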
<p><span id="more-415"></span></p>
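<p style="text-indent: 24pt">Tabbed data shows the collection data as a table; you can add columns by ticking the same items as above. Here every GC reason is af (allocation failure). Looking at Free heap before collection, we notice that when some of these afs occurred there was actually still plenty of free space in the heap.</p>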
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/06/tabbed-data.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/tabbed-data-thumb.jpg" border="0" alt="tabbed data" width="244" height="243" /></a> <a href="http://hashei.me/wp-content/uploads/2009/06/free-heap.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/free-heap-thumb.jpg" border="0" alt="free heap" width="244" height="223" /></a></p>
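<p style="text-indent: 24pt">When I overlaid the Compact times curve, it matched closely. The heap was too fragmented: even though plenty of space was free, there was no contiguous block for the new request, so a compaction occurred. The second chart, with Unusable heap due to fragmentation added, shows this even more clearly. Going further, use the collection id in the first column to look up the requested_bytes that triggered each compaction. Check whether they were large requests (say, over 64 KB), and combine that with the number of large objects among all requested_bytes. This tells you whether the LOA (large object area) size needs adjusting.</p>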
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/06/compact-time.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/compact-time-thumb.jpg" border="0" alt="compact time" width="244" height="206" /></a> <a href="http://hashei.me/wp-content/uploads/2009/06/compact-time2.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/compact-time2-thumb.jpg" border="0" alt="compact time2" width="244" height="231" /></a></p>
<p style="text-indent: 24pt">Some curves are greyed out and cannot be selected. The nursery and tenured curves belong to the gencon policy and do not exist under my current GC policy.</p>
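<p style="text-indent: 24pt">In short, by combining different curves, zoom levels and views, we can check whether GC follows the "best practices" I described before. The interval between collections should be much longer than the collections themselves. Total GC time should be under 5% of total run time. Compactions should make up only a small share of collections; if there are too many, consider setting -Xk and -Xp and tuning the LOA. The Tuning recommendation in the Report tab is worth consulting, but do not start tuning straight from its suggestions.</p>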
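<p style="text-indent: 24pt">These rules of thumb are easy to check from the summary averages. The sketch below is hypothetical. It treats the mean interval as the gap between collections, excluding the pause itself. The "10 times" and "under 10% compactions" thresholds are my own stand-ins for "much longer" and "a small share". Plugging in the 248 ms pause and 3.37-minute interval from the first example gives a GC overhead of roughly 0.12%.</p>

```java
/** Hypothetical check of the GC rules of thumb, fed with averages like those in the GCMV Summary. */
final class GcRulesOfThumb {
    /** Share of wall-clock time spent in GC; assumes the interval excludes the pause itself. */
    static double overhead(double avgPauseMs, double avgIntervalMs) {
        return avgPauseMs / (avgPauseMs + avgIntervalMs);
    }

    static boolean healthy(double avgPauseMs, double avgIntervalMs, int collections, int compactions) {
        boolean lowOverhead = overhead(avgPauseMs, avgIntervalMs) < 0.05; // GC under 5% of run time
        boolean spacedOut = avgIntervalMs > 10 * avgPauseMs;             // "much longer": 10x assumed
        boolean fewCompactions = compactions * 10 < collections;         // "small share": under 10% assumed
        return lowOverhead && spacedOut && fewCompactions;
    }

    public static void main(String[] args) {
        // Mean pause 248 ms and mean interval 3.37 minutes, from the first example's Summary.
        double pauseMs = 248, intervalMs = 3.37 * 60_000;
        System.out.printf("GC overhead: %.2f%%%n", 100 * overhead(pauseMs, intervalMs));
    }
}
```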
<p style="text-indent: 24pt">The next example is a problematic garbage collection log.</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/06/problem3.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/problem3-thumb.jpg" border="0" alt="problem3" width="244" height="226" /></a> The whole garbage collection chart, with an OutOfMemory caused by excessive GC partway through</p>
<blockquote>
<h4><a name="cms.oom"></a></h4>
<p>The concurrent collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small.</p></blockquote>
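<p style="text-indent: 24pt">The quoted rule is simple enough to write down. The sketch applies it to a single collection from the log below, which is only an approximation. The quote describes the Sun concurrent collector measured over time, while the IBM JDK in this example reports "excessive gc activity detected" through its own logic. The numbers in main come from gc id 1031 in that log. Reading intervalms as the gap since the previous collection is an assumption.</p>

```java
/** Sketch of the quoted rule: more than 98% of time in GC while less than 2% of the heap is recovered. */
final class GcOverheadLimit {
    static boolean wouldThrowOom(double gcTimeFraction, double heapRecoveredFraction) {
        return gcTimeFraction > 0.98 && heapRecoveredFraction < 0.02;
    }

    public static void main(String[] args) {
        // gc id 1031: total="3622.032" ms of collection, intervalms="11.837" (read as the gap before it),
        // and tenured free space going from 0 to 519040 of 1342177280 bytes.
        double gcFraction = 3622.032 / (3622.032 + 11.837);
        double recovered = 519040.0 / 1342177280.0;
        System.out.printf("GC time %.1f%%, heap recovered %.3f%%, rule triggered: %b%n",
                100 * gcFraction, 100 * recovered, wouldThrowOom(gcFraction, recovered));
    }
}
```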
<p style="text-indent: 24pt">Zooming in on the curves around the problem:</p>
<p style="text-indent: 24pt"><a href="http://hashei.me/wp-content/uploads/2009/06/problem.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/problem-thumb.jpg" border="0" alt="problem" width="244" height="210" /></a> <a href="http://hashei.me/wp-content/uploads/2009/06/problem2.jpg"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/06/problem2-thumb.jpg" border="0" alt="problem2" width="244" height="241" /></a></p>
<p style="text-indent: 24pt">Looking up the IDs in the raw log:</p>
<blockquote><pre>&lt;af type="tenured" id="1030" timestamp="Tue Nov 04 14:57:26 2008" intervalms="11.440"&gt;
&lt;minimum requested_bytes="216" /&gt;
&lt;time exclusiveaccessms="0.322" /&gt;
&lt;tenured freebytes="0" totalbytes="1342177280" percent="0" &gt;
&lt;soa freebytes="0" totalbytes="1342177280" percent="0" /&gt;
&lt;loa freebytes="0" totalbytes="0" percent="0" /&gt;
&lt;/tenured&gt;
&lt;gc type="global" id="1031" totalid="1031" intervalms="11.837"&gt;
&lt;compaction movecount="3301258" movebytes="196674784" reason=<strong>"low free space (less than 4%)"</strong> /&gt;
&lt;refs_cleared soft="0" weak="15" phantom="7" /&gt;
&lt;finalization objectsqueued="40" /&gt;
&lt;timesms mark="1753.108" sweep="10.302" compact="1858.390" total="3622.032" /&gt;
&lt;tenured freebytes="519040" totalbytes="1342177280" percent="0" &gt;
&lt;soa freebytes="519040" totalbytes="1342177280" percent="0" /&gt;
&lt;loa freebytes="0" totalbytes="0" percent="0" /&gt;
&lt;/tenured&gt;
&lt;/gc&gt;
&lt;warning details="excessive gc activity detected" /&gt;
&lt;tenured freebytes="518352" totalbytes="1342177280" percent="0" &gt;
&lt;soa freebytes="518352" totalbytes="1342177280" percent="0" /&gt;
&lt;loa freebytes="0" totalbytes="0" percent="0" /&gt;
&lt;/tenured&gt;
&lt;time totalms="3623.033" /&gt;
&lt;/af&gt;</pre>
<pre>&lt;af type="tenured" id="1031" timestamp="Tue Nov 04 14:57:30 2008" intervalms="3.715"&gt;
&lt;minimum requested_bytes="120" /&gt;
&lt;time exclusiveaccessms="0.232" /&gt;
&lt;tenured freebytes="0" totalbytes="1342177280" percent="0" &gt;
&lt;soa freebytes="0" totalbytes="1342177280" percent="0" /&gt;
&lt;loa freebytes="0" totalbytes="0" percent="0" /&gt;
&lt;/tenured&gt;
&lt;gc type="global" id="1032" totalid="1032" intervalms="4.385"&gt;
&lt;compaction movecount="3170275" movebytes="192337800" reason="<strong>compact to aid heap contraction</strong>" /&gt;
&lt;contraction type="tenured" amount="67108352" newsize="1275068928" timetaken="0.001" reason="<strong>excess free space following gc</strong>" /&gt;
&lt;refs_cleared soft="0" weak="0" phantom="0" /&gt;
&lt;finalization objectsqueued="9" /&gt;
&lt;timesms mark="245.373" sweep="12.272" compact="482.748" total="745.486" /&gt;
&lt;tenured freebytes="1022923992" totalbytes="1275068928" percent="80" &gt;
&lt;soa freebytes="1022923992" totalbytes="1275068928" percent="80" /&gt;
&lt;loa freebytes="0" totalbytes="0" percent="0" /&gt;
&lt;/tenured&gt;
&lt;/gc&gt;</pre></blockquote>
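<p style="text-indent: 24pt">When there are many af records, looking them up by ID gets tedious. The hypothetical scanner below works on a raw (not HTML-escaped) verbose GC log. For each af record it extracts the record ID, requested_bytes, the GC total time and the first reason, and it flags requests of 64 KB or more, as mentioned earlier. It uses regular expressions rather than an XML parser because native_stderr.log can contain non-XML lines. The attribute names are the ones visible in the excerpt above.</p>

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical scanner that summarizes "af" (allocation failure) records in an IBM verbose GC log. */
final class AfRecordScanner {
    private static final Pattern HEAD = Pattern.compile("^type=\"(\\w+)\" id=\"(\\d+)\"");
    private static final Pattern REQUESTED = Pattern.compile("requested_bytes=\"(\\d+)\"");
    private static final Pattern GC_TOTAL = Pattern.compile("<timesms [^>]*total=\"([\\d.]+)\"");
    private static final Pattern REASON = Pattern.compile("reason=\"([^\"]+)\"");
    private static final long LARGE_REQUEST = 64 * 1024; // the 64 KB example threshold from the text

    private static String first(Pattern p, String text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : "?";
    }

    /** One summary line per af record; a record cut off at the end of the log is still reported. */
    static List<String> summarize(String log) {
        List<String> out = new ArrayList<>();
        String[] chunks = log.split("<af ");
        for (int i = 1; i < chunks.length; i++) { // chunks[0] is whatever precedes the first record
            String chunk = chunks[i];
            Matcher head = HEAD.matcher(chunk);
            if (!head.find()) {
                continue;
            }
            String requested = first(REQUESTED, chunk);
            boolean large = !requested.equals("?") && Long.parseLong(requested) >= LARGE_REQUEST;
            out.add("af " + head.group(2) + " (" + head.group(1) + "): requested=" + requested
                    + (large ? " [large]" : "")
                    + ", gc total ms=" + first(GC_TOTAL, chunk)
                    + ", first reason=" + first(REASON, chunk));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a verbose GC log such as native_stderr.log.
        // ISO-8859-1 decodes any byte sequence, so stray non-UTF-8 output cannot break the read.
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        summarize(log).forEach(System.out::println);
    }
}
```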
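<p style="text-indent: 24pt">Collections were also fairly frequent and each request was small, so I switched the GC policy to gencon (<code>-Xgcpolicy:gencon</code>). After that no OutOfMemory occurred. Unfortunately I did not keep the GC log from after the change. Otherwise the right move would have been to load it with File &gt; Compare file and compare the two logs to confirm the result.</p>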
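<p style="text-indent: 24pt"><strong>Note:</strong> for the IBM JDK, simply ticking "Verbose garbage collection" is enough. For the HP or Sun JDK, it is better to write the GC log to a file of its own with the JVM's own option: on the Sun JDK that is -Xloggc:file_name, and -Xverbosegclog:file_name is the IBM JDK's equivalent. Otherwise the default output file also contains other messages, the tool cannot chart it, and you have to clean up the file by hand, which is a hassle.</p>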
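<p style="text-indent: 24pt">IBM Pattern Modeling and Analysis Tool for Java Garbage Collector (<strong>PMAT</strong>) is broadly similar. I use it mainly because its tuning suggestions complement those of the tool above; for example, it calculates suitable -Xk and -Xp values.</p>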
<p style="text-indent: 24pt">For GC logs produced on HP-UX, we can also <a title="用HPjtune分析GC日志" href="http://www.hashei.me/2009/07/use-hpjtune-to-analysis-gc-log.html" target="_blank">analyze them with HPjtune, which is part of HPjmeter</a>.</p>
<hr /><h2>Related posts:</h2><ul><li><a href="http://www.hashei.me/2009/07/java-performance-tuning-resources.html" rel="bookmark" title="Permanent Link: Java性能优化参考资料">Java性能优化参考资料</a></li><li><a href="http://www.hashei.me/2010/05/linux-system-performance-monitoring.html" rel="bookmark" title="Permanent Link: Linux 性能监控">Linux 性能监控</a></li><li><a href="http://www.hashei.me/2009/05/adjust-proper-pool-size.html" rel="bookmark" title="Permanent Link: 你需要多大的池？&mdash; WebSphere性能优化（一）">你需要多大的池？&mdash; WebSphere性能优化（一）</a></li><li><a href="http://www.hashei.me/2009/12/java_performance_slides.html" rel="bookmark" title="Permanent Link: 两个关于JAVA性能优化的PPT">两个关于JAVA性能优化的PPT</a></li><li><a href="http://www.hashei.me/2009/05/code-optimization-for-gc.html" rel="bookmark" title="Permanent Link: JAVA性能优化&mdash;编写符合GC胃口的程序">JAVA性能优化&mdash;编写符合GC胃口的程序</a></li></ul><hr /><small>  Copyright &copy; 2008 This feed is for personal, non-commercial use only<br />
<a href=www.hashei.com >聚沙成塔-小哈的记事薄</a> by hashei 
If you like it, feel free to subscribe: <a href=feed.hashei.com >feed.hashei.com</a><br />
Digital Fingerprint:
 10f920a9f2bae51c3c73c4f5fb50a949</small>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/06/gc-performance-tuning-with-isa.html/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>WebSphere Class Loading and Troubleshooting</title>
		<link>http://www.hashei.me/2009/05/websphere-class-loader-troubshooting.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=websphere-class-loader-troubshooting</link>
		<comments>http://www.hashei.me/2009/05/websphere-class-loader-troubshooting.html#comments</comments>
		<pubDate>Wed, 27 May 2009 07:59:51 +0000</pubDate>
		<dc:creator>hashei</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[jar包冲突]]></category>
		<category><![CDATA[类加载]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/05/websphere-class-loader-troubshooting.html</guid>
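		<description><![CDATA[This article summarizes what the IBM website and developerWorks say about WebSphere class loading, and offers approaches for resolving ClassCastException, ClassNotFoundException, NoClassDefFoundError and UnsatisfiedLinkError in applications.]]></description>
			<content:encoded><![CDATA[<p style="text-indent: 24pt">When deploying applications on WebSphere, errors such as ClassCastException, ClassNotFoundException, NoClassDefFoundError and UnsatisfiedLinkError come up often. These class-related errors seem to appear from nowhere (everything works in development, so why does it fail in production?) and vanish without a trace (create a fresh profile and deploy there, or move a JAR to another directory, and the problem is gone). To solve these puzzling problems, you first need to understand WebSphere's class loader mechanism.</p>
<p style="text-indent: 24pt">Most of what follows comes from <a href="http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/welcome_nd.html">http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/welcome_nd.html</a> and the IBM developerWorks article "How to Resolve JAR Conflicts in WebSphere". I have gathered the parts I found easiest to understand in one place, to make them easier to study and to speed up problem solving.</p>
<h4>Class loaders used and the order in which they are used</h4>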
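<blockquote><p>The runtime environment of WebSphere Application Server uses the following class loaders, in the following order, to find and load new classes for an application:</p>
<ol>
<li><strong>The bootstrap, extensions, and CLASSPATH class loaders created by the Java virtual machine.</strong> The bootstrap class loader uses the boot class path (typically classes in <tt>jre/lib</tt>) to find and load classes. The extensions class loader uses the system property java.ext.dirs (typically <tt>jre/lib/ext</tt>) to find and load classes. The CLASSPATH class loader uses the CLASSPATH environment variable to find and load classes. The CLASSPATH class loader loads the Java 2 Platform, Enterprise Edition (J2EE) application programming interfaces (APIs) provided by the WebSphere Application Server product in the <tt>j2ee.jar</tt> file. Because this class loader loads the J2EE APIs, you can add libraries that depend on the J2EE APIs to the class path system property to extend the server class path. However, the preferred way to extend the server class path is to <a href="http://publib.boulder.ibm.com/tcws_sharedlib_create.html">add a shared library</a>.</li>
<li><strong>A WebSphere extensions class loader.</strong> The WebSphere extensions class loader loads the WebSphere Application Server classes that are required at run time. The extensions class loader uses the ws.ext.dirs system property to determine the path used to load classes. Each directory in the ws.ext.dirs class path, and every Java archive (JAR) or ZIP file in these directories, is added to the class path used by this class loader. The WebSphere extensions class loader also loads resource provider classes into a server if an application module installed on the server refers to a resource associated with that provider and the provider specifies the directory name of the resource drivers.</li>
<li><strong>One or more application module class loaders that load the elements of enterprise applications running in the server.</strong> Application elements can be Web modules, enterprise bean (EJB) modules, resource adapter archives (RAR files), and dependency JAR files. Application class loaders follow J2EE class-loading rules to load classes and JAR files from an enterprise application. WebSphere Application Server lets you associate shared libraries with an application.</li>
<li><strong>Zero or more Web module class loaders.</strong> By default, Web module class loaders load the contents of the WEB-INF/classes and WEB-INF/lib directories. Web module class loaders are children of application class loaders. You can specify that the application class loader, rather than the Web module class loader, load the contents of a Web module.</li>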
</ol>
</blockquote>
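<p style="text-indent: 24pt">To see this hierarchy from inside a running application, you can walk the loader chain of any class with getClassLoader() and getParent(). A minimal sketch follows (LoaderChain is a hypothetical name); in the Java API, null stands for the bootstrap loader. Called from code deployed on WebSphere, it would list the server's own loader implementation classes, whose names vary by version.</p>

```java
public class LoaderChain {
    // Returns the loader chain of a class, from its defining loader up to the
    // bootstrap loader, which the Java API represents as null.
    static String describe(Class c) {
        StringBuilder sb = new StringBuilder();
        for (ClassLoader cl = c.getClassLoader(); cl != null; cl = cl.getParent()) {
            sb.append(cl.getClass().getName()).append(" -> ");
        }
        return sb.append("bootstrap").toString();
    }

    public static void main(String[] args) {
        // JDK core classes are defined by the bootstrap loader.
        System.out.println(describe(String.class));       // prints: bootstrap
        // Application classes show the full chain, ending at bootstrap.
        System.out.println(describe(LoaderChain.class));
    }
}
```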
<p><a name="N10072"></a><span id="more-345"></span></p>
<p><a href="http://hashei.me/wp-content/uploads/2009/05/class-loader.jpg" target="_blank"><img style="border-top-width: 0px; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" src="http://hashei.me/wp-content/uploads/2009/05/class-loader-thumb.jpg" border="0" alt="class loader" width="492" height="316" /></a></p>
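<blockquote><p>The following points about WebSphere's class loader hierarchy may help further clarify how classes are found and loaded:</p>
<ul>
<li>Each class loader finds and loads classes on the class path it defines itself.</li>
<li>A child class loader can delegate finding and loading a class to its parent class loader. A load request travels from a child class loader to its parent, but never from a parent to a child.</li>
<li>Once a class has been loaded successfully, the JVM caches it for the rest of its lifetime and associates it with the class loader that loaded it. A class is therefore identified by its name together with its loader, which means different class loaders can load classes with the same name.</li>
<li>If a loaded class depends on one or more other classes, those classes must be on the search path of that class's class loader or of one of its parent class loaders.</li>
<li>If neither a class loader nor any of its parent class loaders can find the required class, the system throws a ClassNotFoundException or a NoClassDefFoundError.</li>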
</ul>
</blockquote>
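<p style="text-indent: 24pt">The delegation rule in the second point can be observed with a tiny custom loader. In the sketch below (DelegationTrace and TracingLoader are hypothetical names), the loader has no class path of its own and only records when it is asked to search locally. Under the standard Java parent-first delegation, findClass runs only after every parent has failed, and when it fails too the caller gets ClassNotFoundException.</p>

```java
public class DelegationTrace {
    // A loader with no class path of its own. It records each request to search
    // locally, which under parent-first delegation only happens after the parents fail.
    static class TracingLoader extends ClassLoader {
        final StringBuilder log = new StringBuilder();

        TracingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class findClass(String name) throws ClassNotFoundException {
            log.append("findClass(").append(name).append(") ");
            // Nothing on the local path: fail, as a real loader would for a missing class.
            throw new ClassNotFoundException(name);
        }
    }

    public static void main(String[] args) {
        TracingLoader loader = new TracingLoader(DelegationTrace.class.getClassLoader());
        try {
            // Found by the parent chain, so findClass is never called.
            Class s = loader.loadClass("java.lang.String");
            System.out.println("loaded " + s.getName() + ", log=[" + loader.log + "]");
            // prints: loaded java.lang.String, log=[]

            // Hypothetical missing class: parents fail, then findClass, then the exception.
            loader.loadClass("com.example.DoesNotExist");
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException, log=[" + loader.log + "]");
            // prints: ClassNotFoundException, log=[findClass(com.example.DoesNotExist) ]
        }
    }
}
```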
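<h4>Class loader isolation policies</h4>
<blockquote><p>WebSphere provides some settings for class loaders, called class loader policies. A class loader policy is a class loader isolation policy: with these settings we can define how class loaders are isolated for WAS and for applications.</p>
<p>Each WAS server can be configured with its own application class loader policy, and each application in WAS can be configured with its own Web module class loader policy. The two policies are described below.</p>
<p><strong>1. Application server (WAS) setting: application class loader policy</strong></p>
<p>The application server offers two settings for the application class loader policy:</p>
<ul>
<li>Single: all applications on the application server use the same class loader. In this configuration, applications no longer have their own class loaders.</li>
<li>Multiple: each application on the application server uses its own class loader.</li>
</ul>
<p><strong>2. Application setting: Web module class loader policy</strong></p>
<p>An application offers two settings for the Web module class loader:</p>
<ul>
<li>Application: all utility JARs and Web modules in the application use the same class loader.</li>
<li>Module: each Web module in the application uses its own class loader. The application class loader still exists and loads the application's classes outside the Web modules, including all utility JARs.</li>
</ul>
<p>As the definitions above show, depending on the class loader policy, some class loaders in the hierarchy may not exist. For example, when the application server's application class loader policy is Single, per-application class loaders do not exist: all applications on the same application server share one class loader. This means classes are shared between applications, so different applications cannot contain different classes with the same name (only one of them would be loaded).</p></blockquote>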
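<h4>Class loader mode</h4>
<blockquote><p>A class loader has an important property: its delegation mode (sometimes also called the class loader mode). The delegation mode determines whether a class loader, when looking for a class, first searches its own class path or first searches its parent class loader's class path.</p>
<p>The delegation mode has two values:</p>
<ul>
<li>Parent_First: when loading a class, the class loader first tries to find and load it on the parent class loader's class path, before searching its own class path.</li>
<li>Parent_Last: when loading a class, the class loader first tries to find and load it on its own class path, and only if it is not found there does it try the parent class loader's class path.</li>
</ul>
<p>With delegation modes, we can configure more flexibly how classes are found and loaded within the class loader hierarchy. Table 1 lists the delegation mode of each class loader in WebSphere's class loader hierarchy, together with the lifetime of the classes in each class loader.</p>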
<p><a href="http://hashei.me/wp-content/uploads/2009/05/class-loader2.gif"><img style="border-right: 0px; border-top: 0px; border-left: 0px; border-bottom: 0px" src="http://hashei.me/wp-content/uploads/2009/05/class-loader2-thumb.gif" border="0" alt="class loader2" width="569" height="295" /></a></p></blockquote>
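<p style="text-indent: 24pt">Parent_Last can be sketched as a URLClassLoader that overrides loadClass to search its own URLs before delegating. This is a simplified illustration, not WebSphere's implementation: ParentLastDemo and ParentLastClassLoader are hypothetical names, and the sketch always delegates java.* classes to the parent because application loaders are not allowed to define them.</p>

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ParentLastDemo {
    // Sketch of Parent_Last delegation: search this loader's own URLs first and
    // fall back to the normal parent-first lookup only if the class is not found locally.
    static class ParentLastClassLoader extends URLClassLoader {
        ParentLastClassLoader(URL[] urls, ClassLoader parent) {
            super(urls, parent);
        }

        @Override
        protected Class loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class c = findLoadedClass(name);
                // Core JDK classes must always come from the parent chain.
                if (c == null && !name.startsWith("java.")) {
                    try {
                        c = findClass(name);            // 1. own class path first
                    } catch (ClassNotFoundException notLocal) {
                        // not on the local path; fall through to the parent
                    }
                }
                if (c == null) {
                    c = super.loadClass(name, false);   // 2. standard parent-first lookup
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // With no URLs of its own, every lookup ends up at the parent.
        ParentLastClassLoader loader =
                new ParentLastClassLoader(new URL[0], ParentLastDemo.class.getClassLoader());
        System.out.println(loader.loadClass("java.lang.String") == String.class); // prints: true
        System.out.println(loader.loadClass(ParentLastDemo.class.getName()) == ParentLastDemo.class);
    }
}
```

<p style="text-indent: 24pt">To make the difference visible, pass the URLs of the application's own JARs as the first constructor argument; a class present both there and on the parent's class path is then taken from the local JARs. That is the trade-off Parent_Last makes: the application's copy of a library wins, at the risk of a ClassCastException if objects of that class are also handed to code loaded by the parent.</p>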
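<h4>Solutions</h4>
<blockquote><p>Looking at the specific situations that cause JAR conflicts, there are three main cases:</p>
<ul>
<li>JAR conflicts between applications: several applications use different versions of a shared JAR, causing a version conflict.</li>
<li>JAR conflicts between Web modules of one application: within the same application, different Web modules use different versions of the same JAR, causing a version conflict.</li>
<li>JAR conflicts within a single Web module: within the same Web module of the same application, two versions of the same JAR are needed at the same time.</li>
</ul>
<p>For these three cases, this section discusses three ways to resolve JAR conflicts, along with the implementation steps and when each one applies:</p>
<ul>
<li>Shared libraries: mainly for JAR conflicts between applications.</li>
<li>Packaging the JAR into the Web module: mainly for JAR conflicts between Web modules of one application.</li>
<li>Running from the command line: mainly for JAR conflicts within a single Web module.</li>
</ul>
<p>For the detailed steps, see the article "How to Resolve JAR Conflicts in WebSphere".</p></blockquote>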
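<p style="text-indent: 24pt">For any of these conflicts, the first question is usually which JAR a class was actually loaded from. The sketch below (hypothetical class WhereFrom) answers it in two standard ways: the CodeSource of the loaded class, and getResource() for the .class file, which follows the same delegation. With Java 2 security enabled, getProtectionDomain() may require a permission, and bootstrap classes usually report no code source.</p>

```java
import java.net.URL;
import java.security.CodeSource;

public class WhereFrom {
    // Reports where a class was loaded from, as seen through the given loader.
    static String locate(String className, ClassLoader loader) {
        try {
            // initialize=false: look the class up without running static initializers.
            Class c = Class.forName(className, false, loader);
            CodeSource cs = c.getProtectionDomain().getCodeSource();
            URL where = (cs == null) ? null : cs.getLocation();
            ClassLoader definer = c.getClassLoader();
            return className
                    + " from " + (where == null ? "(no code source)" : where.toString())
                    + " via " + (definer == null ? "bootstrap" : definer.toString());
        } catch (ClassNotFoundException e) {
            return className + " not visible to " + loader;
        }
    }

    // Resource lookup uses the same delegation, so this URL shows which
    // .class file the loader would pick for the name.
    static URL resourceFor(String className, ClassLoader loader) {
        return loader.getResource(className.replace('.', '/') + ".class");
    }

    public static void main(String[] args) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        System.out.println(locate("java.lang.String", cl));
        System.out.println(locate("com.example.DoesNotExist", cl)); // hypothetical name
        System.out.println(resourceFor("java.lang.String", cl));
    }
}
```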
<h4>Further reading</h4>
<p>Class loaders</p>
<p><a href="http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/crun_classload.html">http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.base.doc/info/aes/ae/crun_classload.html</a></p>
<p>Class loading exceptions</p>
<p><a href="http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/rtrb_classload_viewer.html">http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/rtrb_classload_viewer.html</a></p>
<p>Class loader viewer settings (an important troubleshooting tool)</p>
<p><a href="http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/utrb_classload_viewer.html">http://publib.boulder.ibm.com/infocenter/wasinfo/v6r0/index.jsp?topic=/com.ibm.websphere.nd.doc/info/ae/ae/utrb_classload_viewer.html</a></p>
<hr /><h2>Related posts:</h2><ul><li><a href="http://www.hashei.me/2010/03/inside_java_classloader.html" rel="bookmark" title="Permanent Link: Another Article on Java Class Loaders">Another Article on Java Class Loaders</a></li><li><a href="http://www.hashei.me/2009/06/troubshoot-classloader-problems.html" rel="bookmark" title="Permanent Link: WebSphere Class Loading and Troubleshooting Revisited">WebSphere Class Loading and Troubleshooting Revisited</a></li><li><a href="http://www.hashei.me/2010/02/tunning-websphere-application-server-was.html" rel="bookmark" title="Permanent Link: Tuning WebSphere Application Server from Both the Hardware and Software Side">Tuning WebSphere Application Server from Both the Hardware and Software Side</a></li><li><a href="http://www.hashei.me/2009/04/websphere-introduce.html" rel="bookmark" title="Permanent Link: An Introduction to the WebSphere Family">An Introduction to the WebSphere Family</a></li><li><a href="http://www.hashei.me/2009/08/websphere-performance-troubshooting-1.html" rel="bookmark" title="Permanent Link: Diagnosing a WebSphere Performance Problem">Diagnosing a WebSphere Performance Problem</a></li></ul><hr /><small>  Copyright &copy; 2008 This feed is for personal, non-commercial use only<br />
<a href=www.hashei.com >聚沙成塔-小哈的记事薄</a> by hashei 
If you like it, feel free to subscribe: <a href=feed.hashei.com >feed.hashei.com</a><br />
Digital Fingerprint:
 10f920a9f2bae51c3c73c4f5fb50a949</small>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/05/websphere-class-loader-troubshooting.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>An Unresponsive WAS Server Caused by a System Parameter</title>
		<link>http://www.hashei.me/2009/04/was-too-many-open-files.html?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=was-too-many-open-files</link>
		<comments>http://www.hashei.me/2009/04/was-too-many-open-files.html#comments</comments>
		<pubDate>Thu, 30 Apr 2009 14:41:12 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Websphere系列]]></category>
		<category><![CDATA[排错]]></category>
		<category><![CDATA[rlimit]]></category>
		<category><![CDATA[too many open files]]></category>
		<category><![CDATA[troubleshooting]]></category>

		<guid isPermaLink="false">http://www.hashei.me/2009/04/was-too-many-open-files.html</guid>
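		<description><![CDATA[Recently, while a project was under load testing, the testers would start their scripts for an eight-hour run, and the next morning they always found that, although the scripts were still running normally, the server on one node had stopped responding. The logs showed no errors at all, and the last entry was always from the previous night. Since batch processing runs at night, I first suspected it was causing the hang, but the problem persisted after the batch programs were stopped (and it actually happened around the end of the working day), which ruled that possibility out.
Here is how I tracked it down:
Checking the SystemOut.log of app2, the server that had stopped serving requests, I found no new log output after 5:09 PM the previous day. The last entry was: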
[4/15/09 17:09:28:193 GMT+08:00] 00000953 IncidentStrea W com.ibm.ws.ffdc.IncidentStreamImpl write FFDC0013I: FFDC failed to write to incident stream file /was/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc/APP2_44ca44ca_09.04.15_17.09.28_0.txt, caught exception java.lang.NullPointerException
 
The SystemErr.log file contained nothing useful.
So I checked the logs generated at 17:09 in the ffdc directory. APP2_7b2c7b2c_09.04.15_17.09.27_0.txt contained Stack Dump = javax.imageio.IIOException: Can't create output stream!
APP2_2e542e54_09.04.15_17.09.27_1.txt contained Caused by: [SOAPException: faultCode=SOAP-ENV:Client; msg=Error opening socket: java.net.SocketException: Too many open files; targetException=java.lang.IllegalArgumentException: Error opening socket: java.net.SocketException: Too many open files]
 
So I ran ulimit -a to check the AIX limits:
APP2:/was/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc#ulimit -a
time(seconds) unlimited
file(blocks) 2097151
data(kbytes) 131072
stack(kbytes) 32768
memory(kbytes) 32768
coredump(blocks) [...]]]></description>
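			<content:encoded><![CDATA[<p style="text-indent: 24pt">Recently, while a project was under load testing, the testers would start their scripts for an eight-hour run, and the next morning they always found that, although the scripts were still running normally, the server on one node had stopped responding. The logs showed no errors at all, and the last entry was always from the previous night. Since batch processing runs at night, I first suspected it was causing the hang, but the problem persisted after the batch programs were stopped (and it actually happened around the end of the working day), which ruled that possibility out.</p>
<p style="text-indent: 24pt">Here is how I tracked it down:</p>
<p>Checking the SystemOut.log of app2, the server that had stopped serving requests, I found no new log output after 5:09 PM the previous day. The last entry was:</p>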
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">[4/15/09 17:09:28:193 GMT+08:00] 00000953 IncidentStrea W com.ibm.ws.ffdc.IncidentStreamImpl write FFDC0013I: FFDC failed to write to incident stream file /was/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc/APP2_44ca44ca_09.04.15_17.09.28_0.txt, caught exception java.lang.NullPointerException</div>
<p> </p>
<p>The SystemErr.log file contained nothing useful.</p>
<p>So I checked the logs generated at 17:09 in the ffdc directory. APP2_7b2c7b2c_09.04.15_17.09.27_0.txt contained Stack Dump = javax.imageio.IIOException: <strong>Can't create output stream!</strong></p>
<p>APP2_2e542e54_09.04.15_17.09.27_1.txt contained:</p>
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">Caused by: [SOAPException: faultCode=SOAP-ENV:Client; msg=Error opening socket: java.net.SocketException: Too many open files; targetException=java.lang.IllegalArgumentException: Error opening socket: java.net.SocketException: Too many open files]</div>
<p> </p>
<p>So I ran ulimit -a to check the AIX limits:</p>
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">APP2:/was/IBM/WebSphere/AppServer/profiles/AppSrv01/logs/ffdc#ulimit -a<br />
time(seconds) unlimited<br />
file(blocks) 2097151<br />
data(kbytes) 131072<br />
stack(kbytes) 32768<br />
memory(kbytes) 32768<br />
coredump(blocks) 2097151<br />
nofiles(descriptors) 2000</div>
<p> </p>
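<p>So the maximum number of files a single process could open was 2000.</p>
<p>I used ps -ef | grep java to find the PID of app2, then ran procfiles with that PID to see how many files the app2 process currently had open:</p>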
<div style="border-right: #000000 1px dashed; padding-right: 14px; border-top: #000000 1px dashed; padding-left: 14px; padding-bottom: 14px; border-left: #000000 1px dashed; padding-top: 14px; border-bottom: #000000 1px dashed; background-color: #ffffe0">APP2:/was/IBM/WebSphere/AppServer/profiles/AppSrv01/bin#procfiles 815304<br />
815304 : /was/IBM/WebSphere/AppServer/java/bin/java -Declipse.security -Dwas.status.sock<br />
<strong>Current rlimit: 2000 file descriptors</strong><br />
<strong>………………</strong><br />
1993: S_IFREG mode:0444 dev:10,15 ino:160745 uid:0 gid:0 rdev:0,0<br />
O_RDONLY size:15694<br />
1994: S_IFREG mode:0444 dev:10,15 ino:160745 uid:0 gid:0 rdev:0,0<br />
O_RDONLY size:15694<br />
1995: S_IFREG mode:0444 dev:10,15 ino:160745 uid:0 gid:0 rdev:0,0<br />
O_RDONLY size:15694<br />
1999: S_IFREG mode:0444 dev:10,15 ino:164300 uid:0 gid:0 rdev:0,0<br />
O_RDONLY size:1042</div>
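<p>The number of open files had reached 2000. Every new file or socket needs a file descriptor (hence the "Too many open files" SocketException above), so app2 could not even create a new SystemOut.log file and could no longer serve requests.</p>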
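<p>procfiles shows the descriptors from outside the process. From inside the JVM, HotSpot/OpenJDK on Unix exposes the same two numbers through com.sun.management.UnixOperatingSystemMXBean, which can be handy for alerting before the limit is hit. This is an assumption about the JVM: the IBM JDK shipped with WAS at the time may not implement that interface, in which case the sketch (hypothetical class FdUsage) simply reports that the counts are unavailable.</p>

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdUsage {
    // Returns {open, max} file descriptors for this JVM process,
    // or null if the platform MXBean does not expose them.
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    // Percentage of the per-process limit in use; 1999 of 2000 is 99.95%.
    static double percentUsed(long open, long max) {
        return max > 0 ? 100.0 * open / max : 0.0;
    }

    public static void main(String[] args) {
        long[] fds = descriptorCounts();
        if (fds == null) {
            System.out.println("Descriptor counts are not exposed by this JVM/OS");
        } else {
            System.out.printf("open=%d max=%d (%.1f%%)%n", fds[0], fds[1], percentUsed(fds[0], fds[1]));
        }
    }
}
```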
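<p>This shows that AIX's nofiles limit was not large enough for WebSphere. (On HP-UX, maxfiles has to be raised to 8192 when tuning system parameters before installation, but the AIX kernel is self-tuning, so IBM's installation requirements do not mention any parameter that must be changed by hand.) Especially during load testing, descriptors can pile up to the 2000 limit in a short time, and the same could happen in production. That said, it is rather strange that the other APP server in the cluster ran perfectly well.</p>
<p>The fix is to raise the nofiles limit in ulimit to 4096, restart the nodeagent, and then restart the application server from the console. You could also restart the server directly from the command line, but the nodeagent would still be running with the old limit. Servers started from the console are launched by the nodeagent, so the next time the server is restarted from the console, nofiles would again be capped at 2000.</p>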
<hr /><h2>Related posts:</h2><ul><li><a href="http://www.hashei.me/2009/06/cannot-open-google-docs.html" rel="bookmark" title="Permanent Link: Google Docs Will Not Open">Google Docs Will Not Open</a></li><li><a href="http://www.hashei.me/2009/08/cr370915_in_weblogic10-3_and_jdk1-6.html" rel="bookmark" title="Permanent Link: Fixing WebLogic 10.3.0 Hangs on AIX 6.1 with JDK 1.6">Fixing WebLogic 10.3.0 Hangs on AIX 6.1 with JDK 1.6</a></li><li><a href="http://www.hashei.me/2009/06/separating_static_content_from_dynamic_content.html" rel="bookmark" title="Permanent Link: Each to Its Own Job: Separating Static and Dynamic Content in WebSphere">Each to Its Own Job: Separating Static and Dynamic Content in WebSphere</a></li><li><a href="http://www.hashei.me/2009/05/websphere-topology-terminology.html" rel="bookmark" title="Permanent Link: Server, Node, Cell, Cluster: WebSphere Topology and Terminology (Part 1)">Server, Node, Cell, Cluster: WebSphere Topology and Terminology (Part 1)</a></li><li><a href="http://www.hashei.me/2009/07/san-guo-sha.html" rel="bookmark" title="Permanent Link: Three Days Apart">Three Days Apart</a></li></ul><hr /><small>  Copyright &copy; 2008 This feed is for personal, non-commercial use only<br />
<a href=www.hashei.com >聚沙成塔-小哈的记事薄</a> by hashei 
If you like it, feel free to subscribe: <a href=feed.hashei.com >feed.hashei.com</a><br />
Digital Fingerprint:
 10f920a9f2bae51c3c73c4f5fb50a949</small>]]></content:encoded>
			<wfw:commentRss>http://www.hashei.me/2009/04/was-too-many-open-files.html/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

