问题背景
- 系统业务场景:当前服务消费其他系统 MQ,并将数据进行落库;
- 系统出现异常报警,线上异常日志情况:
- 通过日志可以看出是
CreateConnectionThread 异常,造成了获取不到数据库连接。
CreateConnectionThread 源码中的三种异常情况
1 锁异常中断
try {
lock.lockInterruptibly();
} catch (InterruptedException e2) {
break;
}
复制代码
2 线程异常中断
try {
boolean emptyWait = true;
// do something... ...(省略代码)
} catch (InterruptedException e) {
lastCreateError = e;
lastErrorTimeMillis = System.currentTimeMillis();
if (!closing) {
LOG.error("create connection Thread Interrupted, url: " + jdbcUrl, e);
}
break;
} finally {
lock.unlock();
}
复制代码
3 获取物理连接异常中断
try {
connection = createPhysicalConnection();
setFailContinuous(false);
} catch (SQLException e) {
LOG.error("create connection SQLException, url: " + jdbcUrl + ", errorCode " + e.getErrorCode()
+ ", state " + e.getSQLState(), e);
errorCount++;
if (errorCount > connectionErrorRetryAttempts && timeBetweenConnectErrorMillis > 0) {
// do something... ...(省略代码)
try {
Thread.sleep(timeBetweenConnectErrorMillis);
} catch (InterruptedException interruptEx) {
break;
}
}
} catch (RuntimeException e) {
LOG.error("create connection RuntimeException", e);
setFailContinuous(true);
continue;
} catch (Error e) {
LOG.error("create connection Error", e);
setFailContinuous(true);
break;
}
复制代码
CreateConnectionThread 异常原因分析
- 结合线上报错日志和源码分析,很容易就定位到,线上异常原因是由于
CreateConnectionThread 线程异常中断引起的;
- 那 CreateConnectionThread 线程异常中断又是因为什么呢?
- 继续分析报错日志可以看到这样一行记录:
[INFO 2021-10-19 15:33:27.538][][][ClusterManager] c.j.j.c.c.GroupConsumer.doStop consumer xxx to xxx1614:xxx,READ,xxx,NONE is stopped.
复制代码
- 发现 MQ 莫名其妙调用了 doStop 方法,猜测问题大概率是出在了
GroupConsumer.doStop 上;
- 查看该方法的源码实现:
@Override
protected void doStop() {
super.doStop();
threadGroup.interrupt();
threadGroup = null;
// do something... ...(省略代码)
}
复制代码
- 看到这里基本也就破案了,MQ 的 GroupConsumer 调用 doStop 进行退出的时候,调用的是
ThreadGroup 的 interrupt() 方法,查看该方法的代码实现会发现,该方法会中断当前线程组里的全部线程;
/**
* Interrupts all threads in this thread group.
* <p>
* First, the <code>checkAccess</code> method of this thread group is
* called with no arguments; this may result in a security exception.
* <p>
* This method then calls the <code>interrupt</code> method on all the
* threads in this thread group and in all of its subgroups.
*
* @exception SecurityException if the current thread is not allowed
* to access this thread group or any of the threads in
* the thread group.
* @see java.lang.Thread#interrupt()
* @see java.lang.SecurityException
* @see java.lang.ThreadGroup#checkAccess()
* @since 1.2
*/
public final void interrupt() {
int ngroupsSnapshot;
ThreadGroup[] groupsSnapshot;
synchronized (this) {
checkAccess();
for (int i = 0 ; i < nthreads ; i++) {
threads[i].interrupt();
}
ngroupsSnapshot = ngroups;
if (groups != null) {
groupsSnapshot = Arrays.copyOf(groups, ngroupsSnapshot);
} else {
groupsSnapshot = null;
}
}
for (int i = 0 ; i < ngroupsSnapshot ; i++) {
groupsSnapshot[i].interrupt();
}
}
复制代码
- 注释里写的很清楚,该方法会
在当前线程组及其所有子线程组中调用 interrupt() 方法;
- 到这里可能有人还会很疑惑,“这跟 CreateConnectionThread 有什么关系?”,疑惑点无非就是 MQ 的线程和 Druid 的线程为什么会有关联,上面已经讲了,MQ 的 GroupConsumer 调用 doStop 进行退出的时候,调用的是 ThreadGroup 的 interrupt() 方法,现在只需要搞清楚 Druid 的 ThreadGroup 是如何定义的,就能知道两者的确切关系了;
- 首先来看一下 Druid 初始化的两种方式(具体的初始化细节,不详细讲了):
手动调用初始化,即代码中配置数据源时,在 init-method 属性直接调用 init 方法 init-method="init";
- 如果没有配置初始化,那当程序
第一次调用 getConnection 方法时也会自动触发 init 方法完成初始化逻辑。
- 而在 Druid 初始化时,CreateConnectionThread 的创建是调用的父类
Thread 的 start() 方法进行,所以通过源码可以看出 Druid 的 ThreadGroup 是根据当前调用线程来进行定义的;
private CreateConnectionThread createConnectionThread;
protected void createAndStartCreatorThread() {
if (createScheduler == null) {
String threadName = "Druid-ConnectionPool-Create-" + System.identityHashCode(this);
createConnectionThread = new CreateConnectionThread(threadName);
createConnectionThread.start();
return;
}
initedLatch.countDown();
}
复制代码
/* The group of this thread */
private ThreadGroup group;
public synchronized void start() {
/**
* This method is not invoked for the main method thread or "system"
* group threads created/set up by the VM. Any new functionality added
* to this method in the future may have to also be added to the VM.
*
* A zero status value corresponds to state "NEW".
*/
if (threadStatus != 0)
throw new IllegalThreadStateException();
/* Notify the group that this thread is about to be started
* so that it can be added to the group's list of threads
* and the group's unstarted count can be decremented. */
group.add(this);
boolean started = false;
try {
start0();
started = true;
} finally {
try {
if (!started) {
group.threadStartFailed(this);
}
} catch (Throwable ignore) {
/* do nothing. If start0 threw a Throwable then
it will be passed up the call stack */
}
}
}
复制代码
- 再来回顾一下业务场景,消费 MQ 并将数据进行落库;
- 结合 Druid 初始化的两种方式,查看系统 DruidDataSource 配置,发现配置中没有进行 init 操作,那么 CreateConnectionThread 就是由第一次调用 getConnection 时创建的,而创建线程不言而喻,因此
MQ 调用 ThreadGroup.interrupt() 方法时,CreateConnectionThread 会联动退出。
解决方案
- 升级新版本 MQ(
已解决问题版本);
- 在 DruidDataSource 配置文件加上
init 配置或者手动调用 getConnection,避免受到 MQ 线程的影响。
近期评论