Replication slot fails with SSL handshake error

With client-to-node and node-to-node encryption enabled, I can't connect to a logical replication slot.
The self-signed CA and certificates do not include IP addresses, because YugabyteDB is running in K8s.
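
To make the certificate constraint concrete, here is a minimal Python sketch of the kind of check a TLS client applies to a peer's subject alternative names. It is a simplified model and not YugabyteDB's verification code: `endpoint_matches_sans` is a hypothetical helper, and the hostnames and the wildcard SAN are made-up examples.

```python
import ipaddress


def endpoint_matches_sans(endpoint, dns_sans, ip_sans=()):
    """Return True if `endpoint` is covered by the certificate's SAN entries.

    Simplified model: IP literals are compared only against IP SANs,
    names only against DNS SANs, with a single leftmost-label wildcard.
    """
    try:
        addr = ipaddress.ip_address(endpoint)
    except ValueError:
        addr = None

    if addr is not None:
        # An IP endpoint can only match an IP SAN, never a DNS SAN.
        return any(addr == ipaddress.ip_address(ip) for ip in ip_sans)

    host = endpoint.lower().rstrip(".")
    for san in dns_sans:
        san = san.lower().rstrip(".")
        if san == host:
            return True
        if san.startswith("*."):
            # "*.example.svc" covers exactly one extra leading label.
            _, _, rest = host.partition(".")
            if rest and rest == san[2:]:
                return True
    return False


# Hypothetical certificate: DNS names only, no IP entries.
sans = ["*.tserver.example.svc"]
print(endpoint_matches_sans("tserver-0.tserver.example.svc", sans))  # True
print(endpoint_matches_sans("10.204.212.153", sans))  # False
```

Under this model, any connection that is verified against a pod IP fails with such a certificate, while the same peer reached by its FQDN passes.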

Here are the flags for the tservers:
yb-tserver
--pgsql_proxy_bind_address=0.0.0.0:5433
--rpc_bind_addresses=0.0.0.0
--server_broadcast_addresses=$(SERVER_BROADCAST_ADDRESS) # this follows pattern similar to masters: b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001-tserver-0.b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001-tserver-6d6ea0-f5dc60-a2bae68fa4-rs-rs001.svc.calm-county.k8s
--fs_data_dirs=$(FS_DATA_DIRS)
--placement_region=$(PLACEMENT_REGION)
--tserver_master_addrs=$(TSERVER_MASTER_ADDRS)
--ysql_max_connections=$(YSQL_MAX_CONNECTIONS)
--memory_limit_hard_bytes=$(MEMORY_LIMIT_HARD_BYTES)
--use_memory_defaults_optimized_for_ysql=$(USE_MEMORY_DEFAULTS_OPTIMIZED_FOR_YSQL)
--logtostderr
--cdc_intent_retention_ms=$(CDC_INTENT_RETENTION_MS)
--cdc_wal_retention_time_secs=$(CDC_WAL_RETENTION_TIME_SECS)
--ysql_pg_conf_csv="log_connections=true,log_disconnections=true,log_statement='ddl',log_line_prefix='%t pid=%p user=%u db=%d host=%r',log_lock_waits=on,log_min_duration_statement=300000,log_min_error_statement=notice,log_min_messages=notice,log_temp_files=0,log_timezone='Europe/Moscow',password_encryption=scram-sha-256"
--ysql_hba_conf_csv='host all all 127.0.0.1/24 trust,"host all +ldap 0.0.0.0/0 ldap ldapurl='"$(LDAP_URL)"' ldapbasedn="'"$(LDAP_BASE_DN)"'" ldapbinddn="'"${LDAP_BIND_DN}"'" ldapbindpasswd="'"${LDAP_BIND_PASSWD}"'" ldapsearchattribute=uid",host all all 0.0.0.0/0 scram-sha-256,host all all ::0/0 scram-sha-256'
--use_node_to_node_encryption=true --certs_dir=$(TSERVER_CERTS_DIR)
--use_client_to_server_encryption=true --certs_for_client_dir=$(TSERVER_CERTS_FOR_CLIENT_DIR)
--use_node_hostname_for_local_tserver=true

And masters:
yb-master \
--fs_data_dirs=/data
--placement_region=vigilant-wood
--master_addresses=b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001-master-0.b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001-master.b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001.svc.calm-county.k8s,b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001-master-0.b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001-master.b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001.svc.vigilant-wood.k8s,b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001-master-0.b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001-master.b0e5-6d6ea0-f5dc60-a2bae68fa4-rs-rs001.svc.trusting-wind.k8s
--replication_factor=3
--memory_limit_hard_bytes=4096000000
--max_replication_slots=10000
--server_broadcast_addresses=$(SERVER_BROADCAST_ADDRESS)
--rpc_bind_addresses=0.0.0.0
--logtostderr
--use_node_to_node_encryption=true
--certs_dir=$(MASTER_CERTS_DIR)
--use_client_to_server_encryption=true
--certs_for_client_dir=$(MASTER_CERTS_FOR_CLIENT_DIR)

2025-12-05 19:11:22 MSK pid=491450 user=yugabyte db=master host=127.0.0.1(58790) LOG: starting logical decoding for slot "master_outboxer_msg_databus_1"
2025-12-05 19:11:22 MSK pid=491450 user=yugabyte db=master host=127.0.0.1(58790) DETAIL: Streaming transactions committing after 0/2, reading WAL from 0/1.
2025-12-05 19:11:22 MSK pid=491450 user=yugabyte db=master host=127.0.0.1(58790) STATEMENT: START_REPLICATION SLOT master_outboxer_msg_databus_1 LOGICAL 0/2(proto_version '1', publication_names 'outboxer_msg_databus_0');

I1205 16:11:22.253193   252 cdc_service.cc:4941] Received InitVirtualWALForCDC request: session_id: 15885 stream_id: "df882d7bac279c930442040bc7c53a01" table_id: "00004000000030008000000000004414"
I1205 16:11:22.253960   252 cdcsdk_virtual_wal.cc:195] VWAL [df882d7bac279c930442040bc7c53a01:15885]: Publication table list: [00004000000030008000000000004414]
I1205 16:11:22.256848    24 refined_stream.cc:231] SECURE[C] kHandshake { local: 10.204.212.153:33012 remote: 10.204.212.153:9100 }: Handshake failed: Network error (yb/rpc/secure_stream.cc:942): Handshake failed: Network error (yb/rpc/secure_stream.cc:1191): Endpoint does not match, address: 10.204.212.153, hostname: 10.204.212.153
W1205 16:11:22.256896   252 cdcsdk_virtual_wal.cc:273] VWAL [df882d7bac279c930442040bc7c53a01:15885]: Network error (yb/rpc/secure_stream.cc:942): Handshake failed: Network error (yb/rpc/secure_stream.cc:1191): Endpoint does not match, address: 10.204.212.153, hostname: 10.204.212.153
E1205 16:11:22.256904   252 cdcsdk_virtual_wal.cc:215] VWAL [df882d7bac279c930442040bc7c53a01:15885]: Network error (yb/rpc/secure_stream.cc:942): Error fetching tablet list & checkpoints for table_id: 00004000000030008000000000004414: Handshake failed: Network error (yb/rpc/secure_stream.cc:1191): Endpoint does not match, address: 10.204.212.153, hostname: 10.204.212.153
E1205 16:11:22.256910   252 cdc_service.cc:5034] Network error (yb/rpc/secure_stream.cc:942): VirtualWAL initialisation failed for stream_id: df882d7bac279c930442040bc7c53a01 & session_id: 15885: Handshake failed: Network error (yb/rpc/secure_stream.cc:1191): Endpoint does not match, address: 10.204.212.153, hostname: 10.204.212.153

2025-12-05 19:11:22 MSK pid=491450 user=yugabyte db=master host=127.0.0.1(58790) ERROR: VirtualWAL initialisation failed for stream_id: df882d7bac279c930442040bc7c53a01 & session_id: 15885: Handshake failed: Network error (yb/rpc/secure_stream.cc:1191): Endpoint does not match, address: 10.204.212.153, hostname: 10.204.212.153
2025-12-05 19:11:22 MSK pid=491450 user=yugabyte db=master host=127.0.0.1(58790) CONTEXT: Catalog Version Mismatch: A DDL occurred while processing this query. Try again.
2025-12-05 19:11:22 MSK pid=491450 user=yugabyte db=master host=127.0.0.1(58790) STATEMENT: START_REPLICATION SLOT master_outboxer_msg_databus_1 LOGICAL 0/2(proto_version '1', publication_names 'outboxer_msg_databus_0');

I1205 16:11:22.257980   126 cdc_service.cc:5106] DestroyVirtualWALForCDC: Received DestroyVirtualWALForCDC request: session_id: 15885
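
When triaging logs like the ones above, a small script can pull out every "Endpoint does not match" detail and flag the cases where the name being verified is itself an IP literal, which suggests the connection was dialed by IP rather than by hostname. This is only a sketch: the regular expression is derived from the log lines in this post, the helper names are hypothetical, and the sample line is abbreviated.

```python
import ipaddress
import re

# Matches the mismatch detail emitted in the handshake errors above.
MISMATCH_RE = re.compile(
    r"Endpoint does not match, address: (?P<address>[^,\s]+), "
    r"hostname: (?P<hostname>[^,\s]+)"
)


def is_ip_literal(value):
    """Return True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


def ip_dialed_mismatches(log_lines):
    """Return unique (address, hostname) pairs whose hostname is an IP literal."""
    seen = []
    for line in log_lines:
        match = MISMATCH_RE.search(line)
        if not match:
            continue
        pair = (match.group("address"), match.group("hostname"))
        if is_ip_literal(pair[1]) and pair not in seen:
            seen.append(pair)
    return seen


# Abbreviated copy of one tserver log line from this post.
sample = [
    "W1205 16:11:22.256896   252 cdcsdk_virtual_wal.cc:273] VWAL [...]: "
    "Handshake failed: Network error (yb/rpc/secure_stream.cc:1191): "
    "Endpoint does not match, address: 10.204.212.153, hostname: 10.204.212.153",
]
print(ip_dialed_mismatches(sample))
```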

ysqlsh and PGX connect and work just fine, and the cluster otherwise seems to behave normally.
ysqlsh (11.2-YB-2024.2.5.1-b0)

SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)

It is specifically replication slots that seem to misbehave:

$ ysqlsh "dbname=master replication=database"
ysqlsh (11.2-YB-2024.2.5.1-b0)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
Type "help" for help.

No entry for terminal type "xterm";
using dumb terminal settings.
master=# \c master
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, bits: 256, compression: off)
You are now connected to database "master" as user "yugabyte".
master=# START_REPLICATION SLOT master_outboxer_msg_databus_1 LOGICAL 0/2(proto_version '1', publication_names 'outboxer_msg_databus_0');
unexpected PQresultStatus: 8
master=#

I encountered a similar error when I tried to connect some time ago, but it was fixed by the use_node_hostname_for_local_tserver flag. It looks like something similar is needed for virtual WAL connectivity.

It looks like the PgClient takes its FQDN explicitly from the configuration, while the VirtualWAL uses local_address from the RPC context, which is inherently an IP address.
https://github.com/yugabyte/yugabyte-db/blob/d97cc640f78ae3aaa68d2d7921e768eb45e5ec78/src/yb/cdc/cdc_service.cc#L5235
https://github.com/yugabyte/yugabyte-db/blob/d97cc640f78ae3aaa68d2d7921e768eb45e5ec78/src/yb/yql/pggate/pg_client.cc#L513-L518
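
A toy Python model of the difference described above, for illustration only. It does not reproduce the linked YugabyteDB code: `name_to_verify`, `configured_fqdn`, and `rpc_local_address` are hypothetical names, and the FQDN is a made-up example.

```python
import ipaddress


def name_to_verify(rpc_local_address, configured_fqdn=None):
    """Toy model: return the name checked against the peer certificate.

    `configured_fqdn` stands in for a hostname taken from configuration;
    `rpc_local_address` stands in for the address taken from the RPC context.
    """
    return configured_fqdn if configured_fqdn else rpc_local_address


def is_ip_literal(value):
    """Return True if `value` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(value)
        return True
    except ValueError:
        return False


# Path that takes the FQDN from configuration (hypothetical hostname).
pg_client_name = name_to_verify("10.204.212.153", "tserver-0.tserver.example.svc")
# Path that only has the RPC context's local address.
virtual_wal_name = name_to_verify("10.204.212.153")

print(pg_client_name, is_ip_literal(pg_client_name))
print(virtual_wal_name, is_ip_literal(virtual_wal_name))
```

In this model only the second path ends up verifying an IP literal, which a certificate without IP SANs cannot satisfy.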

Do I understand correctly that this will be fixed once this issue is resolved: "Change RPC calls inside GetConsistentChanges RPC to local function calls" (Issue #20946, yugabyte/yugabyte-db)?

Hi @hispebarzu, I saw this today and will get back to you on this in the next 24 hours. We do have deployments with both encryption settings enabled, but I will reconfirm and see whether it is a configuration issue or a bug.

Thanks for reporting this issue, @hispebarzu. We are able to reproduce it internally and are working on a fix. I'll keep you posted.

Thanks. Is there any chance this will be backported to the 2024.2 LTS series?

Yes, @hispebarzu. We will backport this to the 2024.2 releases.