Hi,
I’m trying to learn how to use YB by integrating it into an existing project, replacing a pre-existing PostgreSQL database.
I’ve successfully deployed YB on an AWS test node using the Helm/Kubernetes tutorial on the YugabyteDB site, and I can connect to the yb-master UI and to the DB via the pgAdmin client.
I’m experiencing weird problems on a frequent basis:
pgAdmin disconnects from the DB server at an annoying rate
my WildFly AS often loses the connection during a transaction
I’ve tried to change DB timeouts (like the tcp_keepalives_idle parameter), but using the following command
./yb-ctl restart --tserver_flags ysql_pg_conf="tcp_keepalives_idle=1200"
raised "Cannot form yb-admin command without knowing master addresses" errors.
Is there any how-to or documentation I can follow to work through this issue?
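For context, in a Helm/Kubernetes deployment a setting like this would normally go through the chart’s per-server flag overrides rather than through yb-ctl (which manages local clusters). A hedged sketch of a values override, assuming the standard yugabyte chart’s gflags layout:

```yaml
# Sketch of a values.yaml override for the yugabyte Helm chart.
# The gflags.tserver key and the comma-separated ysql_pg_conf format
# are assumptions based on the chart's documented flag overrides.
gflags:
  tserver:
    ysql_pg_conf: "tcp_keepalives_idle=1200,tcp_keepalives_interval=30,tcp_keepalives_count=3"
```

Applied with something like `helm upgrade yb-demo yugabytedb/yugabyte -f values.yaml`, this would restart the tservers with the new PostgreSQL keepalive settings.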
Hi Dorian,
Yes, WildFly is JBoss.
I’m not using any “metal” hardware; I’m using an m5a.2xlarge instance on AWS, with the suggested specs as per the YugabyteDB documentation.
I’m using Kubernetes, so I’m trying to start the pods in a minimal configuration (1 master and 1 tserver) just to get a DB running.
Hi @dorian_yugabyte,
Thanks for your reply.
As far as I can see, the server is just fine: I see a CPU utilization of something like 25%, with very low usage on the volumes side.
In the postgresql…log I can see a lot of
I’ve tried to add resources by using a c5a.4xlarge instance instead of the 2xlarge, but as I stated before, the CPU was not stressed: something like 25% usage on the 2xlarge and about 18% on the 4xlarge at peak… I can try to split the load by starting more than one instance, leaving YB alone on one pod and the AS on another pod, to maximize node usage by the YB pod.
I’ve tried to set the transaction isolation level to SERIALIZABLE READ ONLY DEFERRABLE, but I cannot observe any improvement in the runs I did. It also creates an issue for my test architecture, because if the pod is destroyed I have to log in again to run that instruction.
Thanks for your reply Dorian.
I’m using Helm version 2.9.1,
which starts YB with the following commands (master first and tserver following):
command:
  - "/bin/bash"
  - "-c"
  - |
    if [ -f /home/yugabyte/tools/k8s_preflight.py ]; then
      PYTHONUNBUFFERED="true" /home/yugabyte/tools/k8s_preflight.py \
        --addr="$(HOSTNAME).yb-masters.$(NAMESPACE).svc.cluster.local" \
        --port="7100"
    fi && \
    if [ -f /home/yugabyte/tools/k8s_preflight.py ]; then
      PYTHONUNBUFFERED="true" /home/yugabyte/tools/k8s_preflight.py \
        --addr="$(HOSTNAME).yb-masters.$(NAMESPACE).svc.cluster.local:7100" \
        --port="7100"
    fi && \
    if [ -f /home/yugabyte/tools/k8s_preflight.py ]; then
      PYTHONUNBUFFERED="true" /home/yugabyte/tools/k8s_preflight.py \
        --addr="0.0.0.0" \
        --port="7000"
    fi && \
    exec /home/yugabyte/bin/yb-master \
      --fs_data_dirs=/mnt/disk0 \
      --master_addresses= \
      --replication_factor=0 \
      --enable_ysql=true \
      --metric_node_name=$(HOSTNAME) \
      --memory_limit_hard_bytes=1824522240 \
      --stderrthreshold=0 \
      --num_cpus=2 \
      --undefok=num_cpus,enable_ysql \
      --default_memory_limit_to_ram_ratio="0.85" \
      --rpc_bind_addresses=$(HOSTNAME).yb-masters.$(NAMESPACE).svc.cluster.local \
      --server_broadcast_addresses=$(HOSTNAME).yb-masters.$(NAMESPACE).svc.cluster.local:7100 \
      --webserver_interface=0.0.0.0
command:
  - "/bin/bash"
  - "-c"
  - |
    if [ -f /home/yugabyte/tools/k8s_preflight.py ]; then
      PYTHONUNBUFFERED="true" /home/yugabyte/tools/k8s_preflight.py \
        --addr="$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local" \
        --port="9100"
    fi && \
    if [ -f /home/yugabyte/tools/k8s_preflight.py ]; then
      PYTHONUNBUFFERED="true" /home/yugabyte/tools/k8s_preflight.py \
        --addr="$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local:9100" \
        --port="9100"
    fi && \
    if [ -f /home/yugabyte/tools/k8s_preflight.py ]; then
      PYTHONUNBUFFERED="true" /home/yugabyte/tools/k8s_preflight.py \
        --addr="0.0.0.0" \
        --port="9000"
    fi && \
    if [ -f /home/yugabyte/tools/k8s_preflight.py ]; then
      PYTHONUNBUFFERED="true" /home/yugabyte/tools/k8s_preflight.py \
        --addr="$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local" \
        --port="9042"
    fi && \
    if [ -f /home/yugabyte/tools/k8s_preflight.py ]; then
      PYTHONUNBUFFERED="true" /home/yugabyte/tools/k8s_preflight.py \
        --addr="0.0.0.0:5433" \
        --port="5433"
    fi && \
    exec /home/yugabyte/bin/yb-tserver \
      --fs_data_dirs=/mnt/disk0 \
      --tserver_master_addrs= \
      --metric_node_name=$(HOSTNAME) \
      --memory_limit_hard_bytes=3649044480 \
      --stderrthreshold=0 \
      --num_cpus=2 \
      --undefok=num_cpus,enable_ysql \
      --use_node_hostname_for_local_tserver=true \
      --rpc_bind_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local \
      --server_broadcast_addresses=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local:9100 \
      --webserver_interface=0.0.0.0 \
      --enable_ysql=true \
      --pgsql_proxy_bind_address=0.0.0.0:5433 \
      --cql_proxy_bind_address=$(HOSTNAME).yb-tservers.$(NAMESPACE).svc.cluster.local
As for now, I’m trying a simple workload: a REST service that receives requests and saves what it receives as input into several different tables. I have a little shell script that creates the requests and sends them to my REST service, deployed in that JBoss AS, which is connected to YB with a connection pool. I’ve tried to connect my AS to a different DB (a Postgres instance) without any problem. For my test I’ve just changed the address and port to Yugabyte’s.
Can you try any of our benchmarking tools (Benchmark YugabyteDB | YugabyteDB Docs)?
Otherwise we’d need a way to reproduce the benchmark and see what goes wrong and why.
Hi Dorian,
After some more testing with the application (instead of the benchmarking tools, which I haven’t had a chance to try yet), it seems that our issue concerns concurrent transactions. I’ve tried starting YugabyteDB with the isolation flag for YSQL (both SERIALIZABLE and REPEATABLE READ, to approximate a READ COMMITTED-type isolation level), but forcing SERIALIZABLE seems to worsen the issue, because we have a lot of transactions in parallel.
From my analysis, it seems that a transaction locks data for a larger time span than expected, and concurrent transactions find that data locked and receive an error instead of some “queuing” of their instructions.
I was expecting behaviour like “if data is uncommitted, a concurrent transaction may only enqueue what it wants to do on the same data; those instructions will be executed in order after the first transaction commits”, but it seems I cannot achieve this behaviour. What am I missing?
In our case you have to retry the transaction that was aborted, because we support optimistic transactions, as explained here: Explicit locking | YugabyteDB Docs. Of 2 conflicting transactions, 1 gets aborted and needs to be retried by the client.
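To make the retry requirement concrete, here is a minimal client-side sketch in shell; the function name, attempt count, and backoff policy are my own assumptions, and the wrapped command (e.g. a ysqlsh invocation running the whole transaction) is hypothetical:

```shell
#!/usr/bin/env bash
# Minimal sketch of client-side retry for an aborted optimistic transaction.
# retry_txn and its policy are illustrative assumptions, not a YugabyteDB API.
retry_txn() {
  local max_attempts=$1
  shift
  local attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1               # still failing after max_attempts: give up
    fi
    sleep "$attempt"         # simple linear backoff before the next try
    attempt=$((attempt + 1))
  done
  return 0                   # the wrapped command (transaction) succeeded
}

# Hypothetical usage: rerun the whole transaction from BEGIN on abort.
# retry_txn 3 ysqlsh -h yb-tservers -c "BEGIN; UPDATE ...; COMMIT;"
```

The same idea applies from JDBC: catch the SQLException whose SQLState is 40001 (serialization_failure) and rerun the whole transaction from the beginning, since the aborted attempt’s work is rolled back.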
We’re not doing batch operations, but it’s something like a batch: we have long transactions, with a lot of operations performed by our application within a single transaction. We have many threads running with a high degree of parallelism, so you can think of it as a number of batches running in parallel.
So, if I understand correctly, there’s no way to use YB in a context where we have long transactions lasting several seconds without having to manage transaction clashes by resubmitting the entire transaction? We’re using vanilla Postgres without any issue; maybe there’s some configuration that can be applied, or is this something unsupported by YB at the moment?
I am evaluating YB and I am facing the same issues described by @m3lkor: basically, we have several long-running transactions (several seconds, with hundreds of updates per transaction, so mini-batches if you want to call them that) distributed over several parallel threads. We are experiencing the same errors described in this thread due to write clashes (“Restart read required”, “Aborted transaction”, etc.), and if we have to roll back and retry every clashed transaction, it would introduce an unbearable overhead.
Please note that the same application works well with native PostgreSQL (and also with Oracle DB), but we are evaluating YB with horizontal scalability as the main driver (we are not much interested in geographic distribution for now).
I second @m3lkor’s question: is there a configuration in YB to support this kind of workload? Basically, let the DB manage this kind of parallel update, maybe by taking some locks, but avoid raising exceptions and forcing the client to retry the whole transaction.
No worries Dorian, I finally realized that @m3lkor is a colleague of mine and we are working on the same evaluation!
The error we are receiving is “Restart read required”.
We made some progress in the analysis, and it turned out that the issue appeared when we had some UPDATE statements that were not using indexes and so performed the updates via a full table scan.
We added the missing indexes, and now the failure rate is really low, almost absent I can say.
What do you think, could this be a valid root cause?
So let’s say I have an UPDATE statement with a WHERE condition that is not indexed; the resulting full table scan may “lock” records that are not of interest, and so some parallel queries may receive the “Restart read required” error?
Yes. In this case, since the update takes too long, it’s possible that the data underneath has changed (from compactions) since the transaction started.
This and other slow queries can be amplified a lot in distributed scenarios (for example, by the overhead of synchronous replication over multiple regions) compared to a single PostgreSQL/RDBMS with streaming replication.
In this case, you can either show us your tables with the most popular queries and we can decide on a better structure, or you can paste here any slow query along with an EXPLAIN ANALYZE and the schema, and we can help you structure it better.
In this example, the big update with no index is an anti-pattern and will bite you sooner or later, depending on the data size.
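To make the anti-pattern concrete, a hedged SQL sketch (the orders table, its columns, and the index name are hypothetical, not taken from this thread):

```sql
-- Hypothetical table: the update filters on an unindexed column,
-- so the executor has to scan every row to find the ones to change.
EXPLAIN ANALYZE UPDATE orders SET status = 'done' WHERE customer_id = 42;
-- expect a Seq Scan over the whole table: the anti-pattern under concurrency

CREATE INDEX idx_orders_customer ON orders (customer_id);

EXPLAIN ANALYZE UPDATE orders SET status = 'done' WHERE customer_id = 42;
-- expect an Index Scan using idx_orders_customer: only matching rows touched
```

With the index, the update only reads and writes the rows it actually changes, which shrinks both its duration and the set of rows that can conflict with parallel transactions.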