Azkaban Notes

https://azkaban.github.io/

https://github.com/azkaban/azkaban

Azkaban: Open-Source Task Scheduler (Usage Guide)
https://www.jianshu.com/p/484564beda1d


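Related database tables

executors table
The table of executor servers; each id corresponds to one server.

execution_flows table: every time a flow is scheduled to run, a row with a new exec_id is written to this table.
status
30 means running
70 means failed
50 means success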
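
As a small aid when reading execution_flows rows, the status codes listed above can be kept in a lookup table. This is only a sketch: the names `FLOW_STATUS`, `describe_status`, and `COUNT_BY_STATUS_SQL` are made up for illustration, and only the three codes recorded in these notes are mapped.

```python
# Status codes for execution_flows.status, as recorded in these notes.
# Other codes are deliberately left unmapped rather than guessed.
FLOW_STATUS = {
    30: "running",
    50: "success",
    70: "failed",
}

# Query that counts executions per status code (run it with any MySQL client).
COUNT_BY_STATUS_SQL = (
    "SELECT status, COUNT(*) FROM execution_flows GROUP BY status"
)


def describe_status(code: int) -> str:
    """Translate a numeric status into a readable label."""
    return FLOW_STATUS.get(code, f"unknown ({code})")
```

Any code outside the three above is reported as unknown instead of being given a guessed name.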


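Troubleshooting an Azkaban incident

Symptoms

The Azkaban web UI could not upload a scheduled job's zip package to a project: the upload hung for a long time and then failed with an error. Deleting a project and modifying project properties failed in the same way.

Our Azkaban deployment consists of two Executor servers and one web server.

Investigation

I logged in to the web server and checked the logs. Uploading a job threw the exception below: the web server hit a lock wait timeout (Lock wait timeout exceeded) while writing to Azkaban's MySQL tables.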

2018/12/14 16:24:10.706 +0800 ERROR [JdbcProjectImpl] [Azkaban] Error initializing project id: 56 version: 7
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction Query: INSERT INTO project_versions (project_id, version, upload_time, uploader, file_type, file_name, md5, num_chunks, resource_id) values (?,?,?,?,?,?,?,?,?) Parameters: [56, 7, 1544775799795, azkaban, zip, sync-user-identity-1.0.0-SNAPSHOT-sync-user-identity.zip, [-27, 52, 115, 62, -67, 107, 55, 49, 95, -107, -41, 27, -81, 90, 22, 115], 0, null]
    at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
    at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
    at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:403)
    at azkaban.db.DatabaseTransOperator.update(DatabaseTransOperator.java:101)
    at azkaban.project.JdbcProjectImpl.addProjectToProjectVersions(JdbcProjectImpl.java:365)
    at azkaban.project.JdbcProjectImpl.lambda$uploadProjectFile$2(JdbcProjectImpl.java:267)
    at azkaban.db.DatabaseOperator.transaction(DatabaseOperator.java:95)
    at azkaban.project.JdbcProjectImpl.uploadProjectFile(JdbcProjectImpl.java:280)
    at azkaban.storage.DatabaseStorage.put(DatabaseStorage.java:58)
    at azkaban.storage.StorageManager.uploadProject(StorageManager.java:106)
    at azkaban.project.AzkabanProjectLoader.persistProject(AzkabanProjectLoader.java:197)
    at azkaban.project.AzkabanProjectLoader.uploadProject(AzkabanProjectLoader.java:114)
    at azkaban.project.ProjectManager.uploadProject(ProjectManager.java:506)
    at azkaban.webapp.servlet.ProjectManagerServlet.ajaxHandleUpload(ProjectManagerServlet.java:1738)
    at azkaban.webapp.servlet.ProjectManagerServlet.handleUpload(ProjectManagerServlet.java:1821)
    at azkaban.webapp.servlet.ProjectManagerServlet.handleMultiformPost(ProjectManagerServlet.java:201)
    at azkaban.webapp.servlet.LoginAbstractAzkabanServlet.doPost(LoginAbstractAzkabanServlet.java:311)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:688)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2018/12/14 16:24:10.722 +0800 INFO [ProjectManagerServlet] [Azkaban] Installation Failed.
azkaban.project.ProjectManagerException: Error initializing project id: 56 version: 7
    at azkaban.project.JdbcProjectImpl.addProjectToProjectVersions(JdbcProjectImpl.java:371)
    at azkaban.project.JdbcProjectImpl.lambda$uploadProjectFile$2(JdbcProjectImpl.java:267)
    at azkaban.db.DatabaseOperator.transaction(DatabaseOperator.java:95)
    at azkaban.project.JdbcProjectImpl.uploadProjectFile(JdbcProjectImpl.java:280)
    at azkaban.storage.DatabaseStorage.put(DatabaseStorage.java:58)
    at azkaban.storage.StorageManager.uploadProject(StorageManager.java:106)
    at azkaban.project.AzkabanProjectLoader.persistProject(AzkabanProjectLoader.java:197)
    at azkaban.project.AzkabanProjectLoader.uploadProject(AzkabanProjectLoader.java:114)
    at azkaban.project.ProjectManager.uploadProject(ProjectManager.java:506)
    at azkaban.webapp.servlet.ProjectManagerServlet.ajaxHandleUpload(ProjectManagerServlet.java:1738)
    at azkaban.webapp.servlet.ProjectManagerServlet.handleUpload(ProjectManagerServlet.java:1821)
    at azkaban.webapp.servlet.ProjectManagerServlet.handleMultiformPost(ProjectManagerServlet.java:201)
    at azkaban.webapp.servlet.LoginAbstractAzkabanServlet.doPost(LoginAbstractAzkabanServlet.java:311)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:688)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:401)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction Query: INSERT INTO project_versions (project_id, version, upload_time, uploader, file_type, file_name, md5, num_chunks, resource_id) values (?,?,?,?,?,?,?,?,?) Parameters: [56, 7, 1544775799795, azkaban, zip, sync-user-identity-1.0.0-SNAPSHOT-sync-user-identity.zip, [-27, 52, 115, 62, -67, 107, 55, 49, 95, -107, -41, 27, -81, 90, 22, 115], 0, null]
    at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
    at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
    at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:403)
    at azkaban.db.DatabaseTransOperator.update(DatabaseTransOperator.java:101)
    at azkaban.project.JdbcProjectImpl.addProjectToProjectVersions(JdbcProjectImpl.java:365)
    ... 27 more

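Modifying job properties coincided with the exception below, again Lock wait timeout exceeded. According to the stack trace, this one was raised while the web server was assigning an execution to an executor (UPDATE execution_flows).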

2018/12/14 20:24:32.577 +0800 ERROR [DatabaseOperator] [Azkaban] update failed
java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction Query: UPDATE execution_flows SET executor_id=? where exec_id=? Parameters: [2, 636386]
    at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
    at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
    at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:456)
    at azkaban.db.DatabaseOperator.update(DatabaseOperator.java:121)
    at azkaban.executor.AssignExecutorDao.assignExecutor(AssignExecutorDao.java:48)
    at azkaban.executor.JdbcExecutorLoader.assignExecutor(JdbcExecutorLoader.java:312)
    at azkaban.executor.ExecutorManager.dispatch(ExecutorManager.java:1501)
    at azkaban.executor.ExecutorManager.access$1500(ExecutorManager.java:78)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1871)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.handleDispatchExceptionCase(ExecutorManager.java:1959)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1878)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.processQueuedFlows(ExecutorManager.java:1851)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.run(ExecutorManager.java:1789)
2018/12/14 20:24:32.578 +0800 WARN [ExecutorManager] [Azkaban] Executor d-awsbj-uds-uds-azkaban-1536651303:12321 (id: 2) responded with exception for exec: 636386
azkaban.executor.ExecutorManagerException: Error updating executor id 2
    at azkaban.executor.AssignExecutorDao.assignExecutor(AssignExecutorDao.java:54)
    at azkaban.executor.JdbcExecutorLoader.assignExecutor(JdbcExecutorLoader.java:312)
    at azkaban.executor.ExecutorManager.dispatch(ExecutorManager.java:1501)
    at azkaban.executor.ExecutorManager.access$1500(ExecutorManager.java:78)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1871)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.handleDispatchExceptionCase(ExecutorManager.java:1959)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.selectExecutorAndDispatchFlow(ExecutorManager.java:1878)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.processQueuedFlows(ExecutorManager.java:1851)
    at azkaban.executor.ExecutorManager$QueueProcessorThread.run(ExecutorManager.java:1789)
Caused by: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction Query: UPDATE execution_flows SET executor_id=? where exec_id=? Parameters: [2, 636386]
    at org.apache.commons.dbutils.AbstractQueryRunner.rethrow(AbstractQueryRunner.java:363)
    at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:490)
    at org.apache.commons.dbutils.QueryRunner.update(QueryRunner.java:456)
    at azkaban.db.DatabaseOperator.update(DatabaseOperator.java:121)
    at azkaban.executor.AssignExecutorDao.assignExecutor(AssignExecutorDao.java:48)
    ... 8 more
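
To see which statements are affected without reading every stack trace, the log can be scanned for the timeout message. The sketch below assumes the log line format shown above; the function name `count_lock_timeouts` is hypothetical.

```python
import re
from collections import Counter

# Matches the SQLException line format seen in the logs above and captures
# the SQL text between "Query:" and "Parameters:".
LOCK_TIMEOUT = re.compile(
    r"Lock wait timeout exceeded; try restarting transaction "
    r"Query: (?P<query>.*?) Parameters:"
)


def count_lock_timeouts(lines):
    """Count lock-wait timeouts per SQL statement in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        # The "Caused by:" line repeats the same exception, so skip it
        # to avoid counting each failure twice.
        if line.startswith("Caused by:"):
            continue
        match = LOCK_TIMEOUT.search(line)
        if match:
            counts[match.group("query")] += 1
    return counts
```

Passing an open log file to the function works, since a file object iterates over its lines.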

Restarting the web server did not fix the problem.

Root cause and fix

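Next I connected to the Azkaban database and queried the information_schema.INNODB_TRX table, which showed many stuck transactions with no obvious cause.
Running KILL on their trx_mysql_thread_id values from the mysql command line did not seem to have any effect either.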
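
A rough sketch of that check: a query over information_schema.INNODB_TRX plus a helper that turns old transactions into KILL statements. The helper name `kill_statements` and the five-minute threshold are assumptions for illustration. In this incident KILL did not help, so treat it as a diagnostic aid only.

```python
from datetime import datetime, timedelta

# Lists open InnoDB transactions, oldest first.
OPEN_TRX_SQL = """
SELECT trx_mysql_thread_id, trx_state, trx_started, trx_query
FROM information_schema.INNODB_TRX
ORDER BY trx_started
"""


def kill_statements(rows, now, min_age=timedelta(minutes=5)):
    """Build KILL statements for transactions open for at least min_age.

    rows: (thread_id, state, started, query) tuples as returned by
    OPEN_TRX_SQL, with started as a datetime. min_age is an assumed cutoff.
    """
    statements = []
    for thread_id, _state, started, _query in rows:
        if now - started >= min_age:
            statements.append(f"KILL {int(thread_id)}")
    return statements
```

The function only builds the statements. Whether to run them is a separate decision, and they do nothing about an underlying cause such as a server that cannot write.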
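I asked a DBA to take a look. It turned out the transactions were stuck because the disk on the server hosting the Azkaban database was full, so writes could not complete.
The DBA did an emergency disk expansion and the problem cleared immediately.
The disk filled up for two reasons: the database server had only a 50G disk, and Azkaban writes the logs of every job into the execution_logs table, which grows very quickly. That table alone was taking up 40G.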
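
Table sizes like the one above can be read from information_schema.TABLES. The sketch below pairs such a query with a helper that expresses each table as a share of the disk. The name `disk_share` is hypothetical, and the `%s` placeholder assumes a driver with that parameter style (PyMySQL, for example).

```python
# Approximate on-disk size per table in one schema, largest first.
# Pass the schema name as the single query parameter.
TABLE_SIZE_SQL = """
SELECT table_name, data_length + index_length AS size_bytes
FROM information_schema.TABLES
WHERE table_schema = %s
ORDER BY size_bytes DESC
"""

GIB = 1024 ** 3


def disk_share(rows, disk_bytes):
    """Return (table_name, size_gib, fraction_of_disk) tuples, largest first.

    rows: (table_name, size_bytes) pairs as returned by TABLE_SIZE_SQL.
    Rows with a NULL size (views, for instance) are skipped.
    """
    sized = [(name, int(size)) for name, size in rows if size is not None]
    sized.sort(key=lambda item: item[1], reverse=True)
    return [(name, size / GIB, size / disk_bytes) for name, size in sized]
```

With the figures from this incident, a 40G execution_logs table on a 50G disk comes out at a fraction of 0.8, which leaves little room for everything else on the server.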


Created: 2018-08-31
Modified: 2018-12-15