RPC Timeout in Production

Hi Team,

We have a cluster of three master nodes, which also act as data nodes. We are seeing the errors below at random on our prod cluster. The cluster becomes unresponsive for a minute before it returns to normal. Can you please help?

{“level”:“error”,“message”:“Network error: Rpc timeout, passed: 16.100s, timeout: 15.000s, now: 10377212.806s, last_read_time_: 10377196.706s”,“name”:“SequelizeDatabaseError”,“namespace”:“prod_:server:req”,“original”:{“code”:“XX000”,“file”:“pg_yb_utils.c”,“length”:189,“line”:“463”,“name”:“error”,“routine”:“HandleYBStatusAtErrorLevel”,“severity”:“ERROR”,“sql”:

“level”:“error”,“message”:“Cannot destructure property ‘roles’ of ‘(intermediate value)’ as it is null.”,“namespace”:“prod_:server:req”,“reqId”:“8b38f9dd-366b-4f7f-a362-aafa305046e5”,“service”:“node-web-server”,“stack”:“TypeError: Cannot destructure property ‘roles’ of ‘(intermediate value)’ as it is null.\n at /var/www/tpp/nodeapi/2024-03-28_13-47-12/routes/association/myAssociations.js:38:13\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)”,“timestamp”:“2024-04-01T16:18:47.767Z”,“worker”:5}
{“level”:“error”,“message”:“Network error: Rpc timeout, passed: 16.076s, timeout: 15.000s, now: 10377212.802s, last_read_time_: 10377196.726s”,“name”:“SequelizeDatabaseError”,“namespace”:“prod_:server:req”,“original”:{“code”:“XX000”,“file”:“pg_yb_utils.c”,“length”:189,“line”:“463”,“name”:“error”,“routine”:“HandleYBStatusAtErrorLevel”,“severity”:“ERROR”,“sql”

{“level”:“error”,“message”:“Network error: Rpc timeout, passed: 16.092s, timeout: 15.000s, now: 10377212.802s, last_read_time_: 10377196.710s”,“name”:“SequelizeDatabaseError”,“namespace”:“prod_:server:req”,“original”:{“code”:“XX000”,“file”:“pg_yb_utils.c”,“length”:189,“line”:“463”,“name”:“error”,“routine”:“HandleYBStatusAtErrorLevel”,“severity”:“ERROR”,“sql”

{“level”:“error”,“message”:“Network error: Rpc timeout, passed: 16.100s, timeout: 15.000s, now: 10377212.806s, last_read_time_: 10377196.706s”,“name”:“SequelizeDatabaseError”,“namespace”:“prod_:server:req”,“original”:{“code”:“XX000”,“file”:“pg_yb_utils.c”,“length”:189,“line”:“463”,“name”:“error”,“routine”:“HandleYBStatusAtErrorLevel”,“severity”:“ERROR”,“sql”:“SELECT count(*) AS "count" FROM "notifications" AS "notification" WHERE "notification"."target_user" = ‘e9846b18-f4d6-4795-b38b-94e3e97e4f40’ AND "notification"."status" = ‘unread’;”},“parameters”:{},“parent”:{“code”:“XX000”,“file”:“pg_yb_utils.c”,“length”:189,“line”:“463”,“name”:“error”,“routine”:“HandleYBStatusAtErrorLevel”,“severity”:“ERROR”,“sql”:“SELECT count(*) AS "count" FROM "notifications" AS "notification" WHERE "notification"."target_user" = ‘e9846b18-f4d6-4795-b38b-94e3e97e4f40’ AND "notification"."status" = ‘unread’;”},“reqId”:“f38b4de1-33f2-44eb-81b1-160242c71c6c”,“service”:“node-web-server”,“sql”:“SELECT count(*) AS "count" FROM "notifications" AS "notification" WHERE "notification"."target_user" = ‘e9846b18-f4d6-4795-b38b-94e3e97e4f40’ AND "notification"."status" = ‘unread’;”,“stack”:“Error\n at Query.run (/var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/dialects/postgres/query.js:50:25)\n at /var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/sequelize.js:315:28\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async PostgresQueryInterface.rawSelect (/var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/dialects/abstract/query-interface.js:434:18)\n at async notification.aggregate (/var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/model.js:1277:19)\n at async notification.count (/var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/model.js:1306:20)\n at async Object.getUnreadNotificationCount (/var/www/tpp/nodeapi/2024-03-28_13-47-12/lib/notifications/notification.js:130:12)\n at async /var/www/tpp/nodeapi/2024-03-28_13-47-12/routes/notification/index.js:202:19”,“timestamp”:“2024-04-01T16:18:47.768Z”,“worker”:2}

{“level”:“info”,“message”:“Network error: Rpc timeout, passed: 16.100s, timeout: 15.000s, now: 10377212.806s, last_read_time_: 10377196.706s”,“namespace”:“prod_:server:failure”,“service”:“node-web-server”,“timestamp”:“2024-04-01T16:18:47.768Z”,“worker”:2}
{“level”:“error”,“message”:“Network error: Rpc timeout, passed: 17.696s, timeout: 15.000s, now: 10377214.694s, last_read_time_: 10377196.998s”,“name”:“SequelizeDatabaseError”,“namespace”:“prod_:server:req”,“original”:{“code”:“XX000”,“file”:“pg_yb_utils.c”,“length”:189,“line”:“463”,“name”:“error”,“routine”:“HandleYBStatusAtErrorLevel”,“severity”:“ERROR”,“sql”:“SELECT "group_member"."role_id", "group_member"."group_id", "group"."id" AS "group.id", "group"."organization_id" AS "group.organization_id", "group"."title" AS "group.title", "group"."description" AS "group.description", "group"."picture" AS "group.picture", "group->organization"."id" AS "group.organization.id", "group->organization"."title" AS "group.organization.title" FROM "group_members" AS "group_member" INNER JOIN "groups" AS "group" ON "group_member"."group_id" = "group"."id" AND "group"."is_active" = ‘active’ AND "group"."is_deleted" IS NULL INNER JOIN "organizations" AS "group->organization" ON "group"."organization_id" = "group->organization"."id" AND "group->organization"."is_active" = ‘active’ AND "group->organization"."is_deleted" IS NULL WHERE ("group_member"."deleted_at" IS NULL AND ("group_member"."deleted_at" IS NULL AND "group_member"."user_id" = ‘e9846b18-f4d6-4795-b38b-94e3e97e4f40’));”},“parameters”:{},“parent”:{“code”:“XX000”,“file”:“pg_yb_utils.c”,“length”:189,“line”:“463”,“name”:“error”,“routine”:“HandleYBStatusAtErrorLevel”,“severity”:“ERROR”,“sql”:“SELECT "group_member"."role_id", "group_member"."group_id", "group"."id" AS "group.id", "group"."organization_id" AS "group.organization_id", "group"."title" AS "group.title", "group"."description" AS "group.description", "group"."picture" AS "group.picture", "group->organization"."id" AS "group.organization.id", "group->organization"."title" AS "group.organization.title" FROM "group_members" AS "group_member" INNER JOIN "groups" AS "group" ON "group_member"."group_id" = "group"."id" AND "group"."is_active" = ‘active’ AND 
"group"."is_deleted" IS NULL INNER JOIN "organizations" AS "group->organization" ON "group"."organization_id" = "group->organization"."id" AND "group->organization"."is_active" = ‘active’ AND "group->organization"."is_deleted" IS NULL WHERE ("group_member"."deleted_at" IS NULL AND ("group_member"."deleted_at" IS NULL AND "group_member"."user_id" = ‘e9846b18-f4d6-4795-b38b-94e3e97e4f40’));”},“reqId”:“8b19de40-8e3d-4bf3-a05c-1adb0524933e”,“service”:“node-web-server”,“sql”:“SELECT "group_member"."role_id", "group_member"."group_id", "group"."id" AS "group.id", "group"."organization_id" AS "group.organization_id", "group"."title" AS "group.title", "group"."description" AS "group.description", "group"."picture" AS "group.picture", "group->organization"."id" AS "group.organization.id", "group->organization"."title" AS "group.organization.title" FROM "group_members" AS "group_member" INNER JOIN "groups" AS "group" ON "group_member"."group_id" = "group"."id" AND "group"."is_active" = ‘active’ AND "group"."is_deleted" IS NULL INNER JOIN "organizations" AS "group->organization" ON "group"."organization_id" = "group->organization"."id" AND "group->organization"."is_active" = ‘active’ AND "group->organization"."is_deleted" IS NULL WHERE ("group_member"."deleted_at" IS NULL AND ("group_member"."deleted_at" IS NULL AND "group_member"."user_id" = ‘e9846b18-f4d6-4795-b38b-94e3e97e4f40’));”,“stack”:“Error\n at Query.run (/var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/dialects/postgres/query.js:50:25)\n at /var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/sequelize.js:315:28\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async PostgresQueryInterface.select (/var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/dialects/abstract/query-interface.js:407:12)\n at async group_member.findAll (/var/www/tpp/nodeapi/2024-03-28_13-47-12/node_modules/sequelize/lib/model.js:1140:21)\n at async 
Object.getAllGroupForList (/var/www/tpp/nodeapi/2024-03-28_13-47-12/lib/association/group.js:2098:26)\n at async /var/www/tpp/nodeapi/2024-03-28_13-47-12/routes/association/myAssociations.js:38:37”,“timestamp”:“2024-04-01T16:18:47.770Z”,“worker”:6}
{“level”:“error”,“message”:“Cannot destructure property ‘roles’ of ‘(intermediate value)’ as it is null.”,“namespace”:“prod_:server:req”,“reqId”:“8b19de40-8e3d-4bf3-a05c-1adb0524933e”,“service”:“node-web-server”,“stack”:“TypeError: Cannot destructure property ‘roles’ of ‘(intermediate value)’ as it is null.\n at /var/www/tpp/nodeapi/2024-03-28_13-47-12/routes/association/myAssociations.js:38:13\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)”,“timestamp”:“2024-04-01T16:18:47.770Z”,“worker”:6}
{“level”:“error”,“message”:“Network error: Rpc timeout, passed: 16.120s, timeout: 15.000s, now: 10377212.802s, last_read_time_: 10377196.682s”,“name”:“SequelizeDatabaseError”,“namespace”:“prod_:server:req”,“original”:{“code”:“XX000”,“file”:“pg_yb_utils.c”,“length”:189,“line”:“463”,“name”:“error”,“routine”:“HandleYBStatusAtErrorLevel”,“severity”:“ERROR”,“sql”:"

Hi,

  1. Can you make the errors clearer?
  2. You have multiple layers mixed together in one place (ORM, driver, JSON formatting of the error).
  3. How are you running YugabyteDB?
  4. Did you set any custom configuration?
  5. What version are you using?
  6. What isolation level are you using?

So the whole node or cluster goes down? Not just a single connection, but all of them? Can you upload the logs from the time this was happening?
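
To illustrate what "clearer" could look like, here is a minimal sketch that reduces one of the logged entries to the database-level fields, dropping the ORM stack trace and the duplicated `parent` object. The helper name `summarizeDbError` and the choice of output fields are assumptions for illustration, not part of Sequelize or YugabyteDB; the sample values are copied from the logs above.

```javascript
// Hypothetical helper (not a Sequelize or YugabyteDB API): keep only the
// fields that matter for a database-side investigation.
function summarizeDbError(entry) {
  const original = entry.original || {};
  // Message format as it appears in the logs above:
  // "Network error: Rpc timeout, passed: 16.100s, timeout: 15.000s, ..."
  const match = /passed: ([\d.]+)s, timeout: ([\d.]+)s/.exec(entry.message || "");
  return {
    timestamp: entry.timestamp,
    reqId: entry.reqId,
    message: entry.message,
    pgCode: original.code,
    routine: original.routine,
    sql: entry.sql || original.sql,
    // Parsed durations; null when the message is not an RPC timeout.
    passedSeconds: match ? Number(match[1]) : null,
    timeoutSeconds: match ? Number(match[2]) : null,
  };
}

// Sample entry built from the values in the log lines above.
const sample = {
  level: "error",
  message:
    "Network error: Rpc timeout, passed: 16.100s, timeout: 15.000s, " +
    "now: 10377212.806s, last_read_time_: 10377196.706s",
  name: "SequelizeDatabaseError",
  original: { code: "XX000", routine: "HandleYBStatusAtErrorLevel" },
  reqId: "f38b4de1-33f2-44eb-81b1-160242c71c6c",
  sql:
    `SELECT count(*) AS "count" FROM "notifications" AS "notification" ` +
    `WHERE "notification"."target_user" = 'e9846b18-f4d6-4795-b38b-94e3e97e4f40' ` +
    `AND "notification"."status" = 'unread';`,
  timestamp: "2024-04-01T16:18:47.768Z",
};

console.log(summarizeDbError(sample));
```

One summary per failed request, together with the YugabyteDB server logs for the same time window, would make it easier to tell whether every connection hit the 15-second RPC timeout at the same moment.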