Insert of large JSON data for Cassandra driver very slow

Same setup as in the cassandra.OperationTimedOut thread, except running a single-node instance of YB.

Inserting JSON files larger than 10 MB takes over 40 seconds, and in some cases I hit the request_timeout of 60 seconds:

/home/ajung/content_plone_portal_2019-07-19-13-10-34/15/15812.json 10428526
38.368969202041626

/home/ajung/content_plone_portal_2019-07-19-13-10-34/15/15695.json 11362748
45.551910400390625

/home/ajung/content_plone_portal_2019-07-19-13-10-34/15/15493.json 16835357
60.0945680141449

The files basically contain something like 40-50 key-value pairs and a binary data field encoded as base64.

My benchmark here is ArangoDB, where I can import the same data very fast (under a second for these large JSON files) on the same hardware.

Any idea?

The JSONB type in YCQL is based on the Postgres data type of the same name. JSONB will actually sort the document before it is serialized, so this could account for some of the performance difference. Try sorting the attributes before INSERTing the JSON and see if that helps.
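In Python, that pre-sorting is a simple round-trip through the json module (a minimal sketch, assuming the document is already valid JSON):

import json

raw = '{"b": 1, "a": 2}'
doc = json.dumps(json.loads(raw), sort_keys=True)  # '{"a": 2, "b": 1}'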

That does not make a difference.

This is my code:

import os
import json
import time
from cassandra.cluster import Cluster, ExecutionProfile

# Create the cluster connection.
cluster = Cluster(['127.0.0.1'])

profile = ExecutionProfile()
profile.request_timeout = 60
cluster.add_execution_profile('standard', profile)
session = cluster.connect()

# Create the keyspace.
session.execute('CREATE KEYSPACE IF NOT EXISTS ybdemo;')
print("Created keyspace ybdemo")

# Create the table.
session.execute(
  """
  CREATE TABLE IF NOT EXISTS ybdemo.ugent (id int PRIMARY KEY,
                                           data jsonb);
  """)
print("Created table employee")


dirname = '/home/ajung/content_plone_portal_2019-07-19-13-10-34'

i = 0
for dirname, dirnames, filenames in os.walk(dirname):

    for filename in filenames:
        fn = os.path.join(dirname, filename)
        if not fn.endswith('.json'):
            continue
        with open(fn) as fp:
            data = fp.read()

        print(fn, len(data))
        i = i + 1

        # Strip single quotes so they cannot break the interpolated CQL string literal.
        data = data.replace("'", "")
        data = json.loads(data)
        data = json.dumps(data, sort_keys=True)
        try:
            ts = time.time()
            s = f"INSERT INTO ybdemo.ugent(id, data) VALUES ({i}, '{data}');"
            session.execute(s, execution_profile=profile)
        finally:
            print(time.time() - ts)

# Close the connection.
cluster.shutdown()

I can provide a sample JSON file for reproducing the issue or create a smaller testcase.

Shall I submit a bug report?

Thanks for sharing the code. I have a couple of additional observations / questions for you on this setup.

Observation/Clarification: Earlier in this thread you said single node, which implies RF=1, but in the earlier OperationTimedOut thread you referenced, you were running RF=3. RF=3 with Docker creates three nodes on one host, which is not ideal for performance testing.
Question: Are you testing clustered Arango or single node Arango in this example?
Question: Is the filesystem where Docker is running HDD or SSD?

The default tserver memory is 1 GB per node. With your large JSON inserts you may be running into stalling from the frequent flushes. You can exec into the yb-tserver-nX container, examine the logs in /mnt/disk0/yb-data/tserver/logs/yb-tserver.INFO, and grep for 'stalling'. Do you see any messages like that on any of the tserver containers?
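For example (assuming the default container names created by yb-docker-ctl):

 docker exec yb-tserver-n1 sh -c 'grep -i stalling /mnt/disk0/yb-data/tserver/logs/yb-tserver.INFO'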

Thanks.
–Alan

The numbers above are on a single node cluster, local SSD, decent CPU, 16GB RAM.

I don’t see any ‘stalling’ message in the logs.

ArangoDB (also based on RocksDB) writes the same documents to disk in less than a second.

Postgres also stores a 10 MB JSON document in under a second.

So are you running with RF=3 or RF=1? That wasn’t clear from your response. Can you attach the output of docker ps?

Thanks.
–Alan

Also can you paste the output of docker exec yb-tserver-n1 sh -c 'curl 127.0.0.1:9000/varz'

Thanks.
–Alan

 docker ps
CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS              PORTS                                                                                                                                                           NAMES
c02042cd06b0        yugabytedb/yugabyte:latest   "/home/yugabyte/bin/…"   22 seconds ago      Up 21 seconds       0.0.0.0:5433->5433/tcp, 7000/tcp, 0.0.0.0:6379->6379/tcp, 0.0.0.0:9000->9000/tcp, 7100/tcp, 9100/tcp, 10100/tcp, 11000/tcp, 0.0.0.0:9042->9042/tcp, 12000/tcp   yb-tserver-n1
1d084b6cf3bb        yugabytedb/yugabyte:latest   "/home/yugabyte/bin/…"   23 seconds ago      Up 22 seconds       5433/tcp, 6379/tcp, 7100/tcp, 9000/tcp, 9042/tcp, 9100/tcp, 10100/tcp, 11000/tcp, 12000/tcp, 0.0.0.0:7000->7000/tcp                                             yb-master-n1

Command-line Flags:

```
--flagfile=
--fromenv=
--tryfromenv=
--undefok=
--tab_completion_columns=80
--tab_completion_word=
--help=false
--helpfull=false
--helpmatch=
--helpon=
--helppackage=false
--helpshort=false
--helpxml=false
--version=false
--master_backup_svc_queue_length=50
--allow_insecure_connections=true
--dump_certificate_entries=false
--certs_dir=
--certs_for_client_dir=
--use_client_to_server_encryption=false
--use_node_to_node_encryption=false
--ts_backup_svc_num_threads=4
--ts_backup_svc_queue_length=50
--detect_duplicates_for_retryable_requests=true
--forward_redis_requests=true
--combine_batcher_errors=false
--redis_allow_reads_from_followers=false
--assert_local_tablet_server_selected=false
--assert_tablet_server_select_is_in_zone=
--client_suppress_created_logs=false
--yb_client_admin_operation_timeout_sec=60
--yb_client_num_reactors=16
--max_concurrent_master_lookups=500
--retry_failed_replica_ms=60000
--verify_all_replicas_alive=false
--update_permissions_cache_msecs=2000
--client_read_write_timeout_ms=60000
--yb_num_total_tablets=0
--assert_local_op=false
--force_lookup_cache_refresh_secs=0
--transaction_disable_heartbeat_in_tests=false
--transaction_disable_proactive_cleanup_in_tests=false
--transaction_heartbeat_usec=500000
--transaction_pool_cleanup_interval_ms=5000
--transaction_pool_reserve_factor=2
--yb_system_namespace_readonly=true
--yb_num_shards_per_tserver=2
--yql_max_value_size=67108864
--transaction_rpc_timeout_ms=5000
--use_private_ip=never
--consensus_rpc_timeout_ms=3000
--enable_remote_bootstrap=true
--fault_crash_on_leader_request_fraction=0
--max_wait_for_processresponse_before_closing_ms=5000
--consensus_inject_latency_ms_in_notifications=0
--consensus_max_batch_size_bytes=33554432
--follower_unavailable_considered_failed_sec=300
--propagate_safe_time=true
--consensus_log_scoped_watch_delay_append_threshold_ms=1000
--consensus_log_scoped_watch_delay_callback_threshold_ms=1000
--log_consider_all_ops_safe=false
--log_inject_append_latency_ms_max=0
--log_inject_latency=false
--log_inject_latency_ms_mean=100
--log_inject_latency_ms_stddev=100
--log_min_seconds_to_retain=300
--log_min_segments_to_retain=2
--global_log_cache_size_limit_mb=1024
--log_cache_size_limit_mb=128
--bytes_durable_wal_write_mb=1
--durable_wal_write=false
--interval_durable_wal_write_ms=1000
--log_async_preallocate_segments=true
--log_preallocate_segments=true
--log_segment_size_bytes=0
--log_segment_size_mb=64
--require_durable_wal_write=false
--after_stepdown_delay_election_multiplier=5
--do_not_start_election_test_only=false
--enable_leader_failure_detection=true
--enable_lease_revocation=true
--evict_failed_followers=true
--follower_fail_all_prepare=false
--follower_reject_update_consensus_requests=false
--follower_reject_update_consensus_requests_seconds=0
--ht_lease_duration_ms=2000
--inject_delay_leader_change_role_append_secs=0
--leader_failure_exp_backoff_max_delta_ms=20000
--leader_failure_max_missed_heartbeat_periods=6
--leader_lease_duration_ms=2000
--log_change_config_every_n=1
--min_leader_stepdown_retry_interval_ms=20000
--pause_update_majority_replicated=false
--pause_update_replica=false
--quick_leader_election_on_create=true
--raft_heartbeat_interval_ms=500
--return_error_on_change_config=0
--temporary_disable_preelections_timeout_ms=600000
--use_preelection=true
--inject_delay_commit_pre_voter_to_voter_secs=0
--retryable_request_range_time_limit_secs=30
--retryable_request_timeout_secs=300
--pause_write_apply_after_if=false
--ycql_consistent_transactional_paging=false
--test_tserver_timeout=false
--docdb_sort_weak_intents_in_tests=false
--aborted_intent_cleanup_max_batch_size=256
--aborted_intent_cleanup_ms=60000
--compaction_thread_pool_size=2
--db_block_size_bytes=32768
--db_filter_block_size_bytes=65536
--db_index_block_size_bytes=32768
--db_min_keys_per_index_block=100
--db_write_buffer_size=-1
--enable_ondisk_compression=true
--initial_seqno=1125899906842624
--max_nexts_to_avoid_seek=1
--memstore_size_mb=128
--num_reserved_small_compaction_threads=-1
--rocksdb_base_background_compactions=1
--rocksdb_compact_flush_rate_limit_bytes_per_sec=268435456
--rocksdb_compaction_size_threshold_bytes=2147483648
--rocksdb_disable_compactions=false
--rocksdb_level0_file_num_compaction_trigger=5
--rocksdb_level0_slowdown_writes_trigger=24
--rocksdb_level0_stop_writes_trigger=48
--rocksdb_max_background_compactions=1
--rocksdb_max_background_flushes=1
--rocksdb_max_file_size_for_compaction=0
--rocksdb_universal_compaction_min_merge_width=4
--rocksdb_universal_compaction_size_ratio=20
--trace_docdb_calls=false
--use_docdb_aware_bloom_filter=true
--use_multi_level_index=true
--transaction_allow_rerequest_status_in_tests=true
--emulate_redis_responses=true
--enable_data_block_fsync=true
--fs_wal_dirs=/mnt/disk0,/mnt/disk1
--instance_uuid_override=
--simulate_fs_create_failure=false
--num_cpus=0
--slowdown_master_async_rpc_tasks_by_ms=0
--unresponsive_ts_rpc_retry_limit=20
--unresponsive_ts_rpc_timeout_ms=3600000
--callhome_collection_level=medium
--callhome_enabled=true
--callhome_interval_secs=3600
--callhome_tag=
--callhome_url=https://diagnostics.yugabyte.com
--catalog_manager_check_ts_count_for_create_table=true
--catalog_manager_simulate_system_table_create_failure=false
--catalog_manager_wait_for_new_tablets_to_elect_leader=true
--cluster_uuid=
--hide_pg_catalog_table_creation_logs=false
--inject_latency_during_remote_bootstrap_secs=0
--master_auto_run_initdb=false
--master_enable_metrics_snapshotter=false
--master_failover_catchup_timeout_ms=30000
--master_tombstone_evicted_tablet_replicas=true
--master_ts_rpc_timeout_ms=30000
--max_create_tablets_per_ts=50
--metrics_snapshots_table_num_tablets=0
--replication_factor=3
--return_error_if_namespace_not_found=false
--simulate_slow_system_tablet_bootstrap_secs=0
--simulate_slow_table_create_secs=0
--tablet_creation_timeout_ms=30000
--transaction_table_num_tablets=0
--catalog_manager_bg_task_wait_ms=1000
--balancer_load_max_standard_deviation=2
--enable_load_balancing=true
--leader_balance_threshold=0
--leader_balance_unresponsive_timeout_ms=3000
--load_balancer_max_concurrent_adds=1
--load_balancer_max_concurrent_moves=1
--load_balancer_max_concurrent_removals=1
--load_balancer_max_concurrent_tablet_remote_bootstraps=2
--load_balancer_max_over_replicated_tablets=1
--load_balancer_num_idle_runs=5
--create_initial_sys_catalog_snapshot=false
--enable_ysql=false
--initial_sys_catalog_snapshot_path=
--use_initial_sys_catalog_snapshot=false
--master_consensus_svc_num_threads=10
--master_consensus_svc_queue_length=1000
--master_remote_bootstrap_svc_num_threads=10
--master_remote_bootstrap_svc_queue_length=50
--master_rpc_timeout_ms=1500
--master_svc_num_threads=10
--master_svc_queue_length=1000
--master_tserver_svc_num_threads=10
--master_tserver_svc_queue_length=1000
--create_cluster=false
--master_addresses=
--master_replication_factor=0
--master_leader_rpc_timeout_ms=500
--master_inject_latency_on_tablet_lookups_ms=0
--master_inject_latency_on_transactional_tablet_lookups_ms=0
--master_slow_get_registration_probability=0
--notify_peer_of_removal_from_cluster=true
--tasks_tracker_keep_time_multiplier=300
--tasks_tracker_num_tasks=100
--tserver_unresponsive_timeout_ms=60000
--memstore_arena_size_kb=1024
--aggressive_compaction_for_read_amp=false
--dump_dbimpl_info=false
--fault_crash_after_rocksdb_flush=0
--flush_rocksdb_on_shutdown=true
--use_compaction_thread_pool_for_compactions=true
--use_compaction_thread_pool_for_flushes=false
--rocksdb_nothing_in_memtable_to_flush_sleep_ms=10
--use_per_file_metadata_for_flushed_frontier=false
--TEST_allow_stop_writes=true
--cache_single_touch_ratio=0.20000000000000001
--allow_preempting_compactions=true
--rpc_acceptor_listen_backlog=128
--rpc_connection_timeout_ms=15000
--read_buffer_memory_limit=-5
--collect_end_to_end_traces=false
--print_trace_every=0
--rpc_dump_all_traces=false
--rpc_slow_query_threshold_ms=10000
--io_thread_pool_size=4
--outbound_rpc_memory_limit=0
--rpc_default_keepalive_time_ms=65000
--rpc_queue_limit=10000
--rpc_workers_limit=256
--socket_receive_buffer_size=0
--rpc_callback_max_cycles=100000000
--num_connections_to_server=8
--proxy_resolve_cache_ms=5000
--max_backoff_ms_exponent=16
--min_backoff_ms_exponent=7
--retryable_rpc_single_call_timeout_ms=2500
--rpcs_shutdown_extra_delay_ms=5000
--rpcs_shutdown_timeout_ms=15000
--backpressure_recovery_period_ms=600000
--enable_backpressure_mode_for_testing=false
--max_time_in_queue_ms=6000
--TEST_delay_connect_ms=0
--enable_rpc_keepalive=true
--rpc_max_message_size=267386880
--web_log_bytes=1048576
--time_source=
--use_hybrid_clock=true
--rpc_bind_addresses=yb-tserver-n1:9100
--rpc_server_allow_ephemeral_ports=false
--TEST_check_broadcast_address=true
--generic_svc_num_threads=10
--generic_svc_queue_length=50
--num_reactor_threads=4
--yb_test_name=
--master_discovery_timeout_ms=3600000
--metrics_log_interval_ms=0
--placement_cloud=cloud1
--placement_region=datacenter1
--placement_uuid=
--placement_zone=rack1
--server_broadcast_addresses=
--server_dump_info_format=json
--server_dump_info_path=
--memory_limit_termination_threshold_pct=200
--total_mem_watcher_interval_millis=1000
--webserver_max_post_length_bytes=1048576
--webserver_authentication_domain=
--webserver_certificate_file=
--webserver_doc_root=/home/yugabyte/www
--webserver_enable_doc_root=true
--webserver_interface=
--webserver_num_worker_threads=50
--webserver_password_file=
--webserver_port=9000
--enable_maintenance_manager=true
--maintenance_manager_history_size=8
--maintenance_manager_num_threads=1
--maintenance_manager_polling_interval_ms=250
--TEST_delay_execute_async_ms=0
--tablet_operation_memory_limit_mb=1024
--tablet_inject_latency_on_apply_write_txn_ms=0
--tablet_pause_apply_write_ops=false
--max_group_replicate_batch_size=16
--intents_flush_max_delay_ms=2000
--tablet_bloom_block_size=4096
--tablet_bloom_target_fp_rate=0.0099999997764825821
--tablet_do_compaction_cleanup_for_intents=true
--tablet_do_dup_key_checks=true
--tablet_rocksdb_ops_quiet_down_timeout_ms=60000
--tablet_verify_flushed_frontier_after_modifying=false
--fault_crash_during_log_replay=0
--force_recover_flushed_frontier=false
--skip_remove_old_recovery_dir=false
--skip_wal_rewrite=true
--enable_tablet_orphaned_block_deletion=true
--delay_init_tablet_peer_ms=0
--timestamp_history_retention_interval_sec=10
--transaction_check_interval_usec=500000
--transaction_max_missed_heartbeat_periods=10
--transaction_resend_applying_interval_usec=5000000
--transaction_delay_status_reply_usec_in_tests=0
--transaction_ignore_applying_probability_in_tests=0
--heartbeat_interval_ms=1000
--heartbeat_max_failures_before_backoff=3
--heartbeat_rpc_timeout_ms=15000
--tserver_disable_heartbeat_test_only=false
--metrics_snapshotter_interval_ms=30000
--metrics_snapshotter_table_metrics_whitelist=rocksdb_sst_read_micros_sum,rocksdb_sst_read_micros_count
--metrics_snapshotter_tserver_metrics_whitelist=handler_latency_yb_client_read_local_sum,handler_latency_yb_client_read_local_count
--metrics_snapshotter_ttl_ms=604800000
--tserver_metrics_snapshotter_yb_client_default_timeout_ms=5000
--bytes_remote_bootstrap_durable_write_mb=8
--committed_config_change_role_timeout_sec=30
--fault_crash_bootstrap_client_before_changing_role=0
--remote_boostrap_rate_limit_bytes_per_sec=0
--remote_bootstrap_begin_session_timeout_ms=3000
--remote_bootstrap_end_session_timeout_sec=15
--remote_bootstrap_max_chunk_size=1048576
--remote_bootstrap_rate_limit_bytes_per_sec=268435456
--remote_bootstrap_save_downloaded_metadata=false
--simulate_long_remote_bootstrap_sec=0
--fault_crash_leader_after_changing_role=0
--fault_crash_leader_before_changing_role=0
--fault_crash_on_handle_rb_fetch_data=0
--inject_latency_before_change_role_secs=0
--remote_bootstrap_change_role_timeout_ms=15000
--remote_bootstrap_idle_timeout_ms=180000
--remote_bootstrap_timeout_poll_period_ms=10000
--skip_change_role=false
--cql_proxy_bind_address=0.0.0.0:9042
--cql_proxy_webserver_port=12000
--enable_direct_local_tablet_server_call=true
--inbound_rpc_memory_limit=0
--pgsql_proxy_bind_address=yb-tserver-n1:5433
--redis_proxy_bind_address=0.0.0.0:6379
--redis_proxy_webserver_port=11000
--start_pgsql_proxy=true
--tablet_server_svc_num_threads=128
--tablet_server_svc_queue_length=5000
--ts_admin_svc_num_threads=10
--ts_admin_svc_queue_length=50
--ts_consensus_svc_num_threads=128
--ts_consensus_svc_queue_length=5000
--ts_remote_bootstrap_svc_num_threads=10
--ts_remote_bootstrap_svc_queue_length=50
--tserver_enable_metrics_snapshotter=false
--cql_proxy_broadcast_rpc_address=
--start_cql_proxy=true
--start_redis_proxy=true
--tserver_tcmalloc_max_total_thread_cache_bytes=268435456
--assert_reads_from_follower_rejected_because_of_staleness=false
--assert_reads_served_by_follower=false
--max_stale_read_bound_time_ms=0
--max_wait_for_safe_time_ms=5000
--parallelize_read_ops=true
--respond_write_failed_probability=0
--rpc_delete_tablet_fail=false
--scanner_batch_size_rows=100
--scanner_default_batch_size_bytes=65536
--scanner_inject_latency_on_each_batch_ms=0
--scanner_max_batch_size_bytes=8388608
--simulate_time_out_failures=false
--tserver_noop_read_write=false
--db_block_cache_num_shard_bits=4
--db_block_cache_size_bytes=-1
--db_block_cache_size_percentage=50
--fault_crash_after_blocks_deleted=0
--fault_crash_after_cmeta_deleted=0
--fault_crash_after_rb_files_fetched=0
--fault_crash_after_wal_deleted=0
--flush_background_task_interval_msec=0
--global_memstore_size_mb_max=2048
--global_memstore_size_percentage=10
--num_tablets_to_open_simultaneously=0
--pretend_memory_exceeded_enforce_flush=false
--read_pool_max_queue_size=128
--read_pool_max_threads=128
--sleep_after_tombstoning_tablet_secs=0
--tablet_start_warn_threshold_ms=500
--tserver_yb_client_default_timeout_ms=5000
--tserver_master_addrs=yb-master-n1:7100
--tserver_master_replication_factor=0
--use_icu_timezones=true
--use_libbacktrace=false
--trace_to_console=
--TEST_simulate_fs_without_fallocate=false
--never_fsync=false
--o_direct_block_alignment_bytes=4096
--o_direct_block_size_bytes=4096
--suicide_on_eio=true
--writable_file_use_fsync=false
--running_test=false
--dump_metrics_json=false
--enable_process_lifetime_heap_profiling=false
--heap_profile_path=/tmp/yb-tserver.1
--svc_queue_length_default=50
--fs_data_dirs=/mnt/disk0,/mnt/disk1
--stop_on_parent_termination=false
--hung_task_check_interval_ms=200
--fatal_details_path_prefix=
--log_filename=yb-tserver
--minicluster_daemon_id=
--ref_counted_debug_type_name_regex=
--default_memory_limit_to_ram_ratio=0.84999999999999998
--mem_tracker_log_stack_trace=false
--mem_tracker_logging=false
--mem_tracker_update_consumption_interval_us=2000000
--memory_limit_hard_bytes=1073741824
--memory_limit_soft_percentage=85
--memory_limit_warn_threshold_percentage=98
--tcmalloc_max_free_bytes_percentage=10
--arena_warn_threshold_bytes=268435456
--allocator_aligned_mode=false
--metric_node_name=DEFAULT_NODE_NAME
--metrics_retirement_age_ms=120000
--dns_num_resolver_threads=1
--rate_limiter_min_rate=1000
--rate_limiter_min_size=32768
--local_ip_for_outbound_sockets=
--socket_inject_short_recvs=false
--disable_clock_sync_error=true
--max_clock_skew_usec=50000
--max_clock_sync_error_usec=10000000
--lock_contention_trace_threshold_cycles=2000000
--taskstream_queue_max_size=100000
--taskstream_queue_max_wait_ms=1000
--enable_tracing=false
--version_file_json_path=
--max_message_length=266338304
--rpcz_max_cql_batch_dump_count=4096
--rpcz_max_cql_query_dump_size=4096
--cql_nodelist_refresh_interval_secs=300
--cql_rpc_memory_limit=0
--cql_service_queue_length=10000
--cql_rpc_keepalive_time_ms=120000
--cql_service_max_prepared_statement_size_bytes=134217728
--cql_ybclient_reactor_threads=24
--enable_uncovered_index_select=true
--allow_index_table_read_write=false
--use_cassandra_authentication=false
--pggate_ignore_tserver_shm=false
--pggate_master_addresses=
--pggate_proxy_bind_address=
--pggate_rpc_timeout_secs=60
--pggate_tserver_shm_fd=-1
--pggate_ybclient_reactor_threads=2
--pgsql_rpc_keepalive_time_ms=0
--ysql_backward_prefetch_scale_factor=0.0625
--ysql_beta_feature_extension=false
--ysql_beta_feature_function=false
--ysql_beta_feature_trigger=false
--ysql_beta_features=true
--ysql_prefetch_limit=1024
--ysql_session_max_batch_size=512
--pg_proxy_bind_address=
--pg_transactions_enabled=true
--pgsql_proxy_webserver_port=13000
--redis_keys_threshold=10000
--redis_passwords_separator=,
--use_hashed_redis_password=true
--yedis_enable_flush=true
--redis_connection_soft_limit_grace_period_sec=60
--redis_max_batch=500
--redis_max_concurrent_commands=1
--redis_max_queued_bytes=134217728
--redis_max_read_buffer_size=134217728
--rpcz_max_redis_query_dump_size=4096
--redis_rpc_block_size=1048576
--redis_rpc_memory_limit=0
--redis_svc_queue_length=5000
--redis_rpc_keepalive_time_ms=0
--enable_redis_auth=true
--redis_callbacks_threadpool_size=64
--redis_max_command_size=265289728
--redis_max_value_size=67108864
--redis_password_caching_duration_ms=5000
--redis_safe_batch=true
--redis_service_yb_client_timeout_millis=3000
--alsologtoemail=
--alsologtostderr=false
--colorlogtostderr=false
--drop_log_memory=true
--log_backtrace_at=
--log_dir=/mnt/disk0/yb-data/tserver/logs
--log_link=
--log_prefix=true
--logbuflevel=-1
--logbufsecs=30
--logemaillevel=999
--logmailer=/bin/mail
--logtostderr=false
--max_log_size=1800
--minloglevel=0
--stderrthreshold=3
--stop_logging_if_full_disk=false
--symbolize_stacktrace=true
--v=0
--vmodule=

version 1.3.1.0 build 16 revision f10e23d319a5e5df008467d19ebab95ce631a977 build_type RELEASE built at 31 Jul 2019 06:43:06 UTC
server uuid 1e471051fbf34fffbfe35cc31cfc77b9
```

Thanks for posting that information. I would recommend that you increase the amount of memory assigned by editing line 35 in yb-docker-ctl from:

   MEM_LIMIT_BYTES = 1024 * 1024 * 1024

to

   MEM_LIMIT_BYTES = 8 * 1024 * 1024 * 1024

This will also scale up the other tunables derived from the memory limit and should help improve performance.
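For the new limit to take effect, recreate the cluster afterwards, e.g. (assuming the standard yb-docker-ctl workflow):

 ./yb-docker-ctl destroy
 ./yb-docker-ctl create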
Hope this helps.
–Alan

No difference.

Note that the slowness is independent of the number of docs imported so far. I just created a new RF=1 cluster, and the slowness is immediately visible with the first three documents inserted. So the memory limit is unlikely to be the problem.

I created a sample JSON file:

{
"data": "some very long base64 encoded string...."
}

and the insertion time seems to be roughly linear in the size of the “data” string.

Cross-check on Mac with Docker and same setup: same slow performance.

The following code shows that inserting a simple JSON document with a variable payload takes extraordinarily long. Processing and storing 10 MB of data should not take 60 seconds… do you see a pattern?

import json
import time
from cassandra.cluster import Cluster, ExecutionProfile

cluster = Cluster(['127.0.0.1'])
profile = ExecutionProfile()
profile.request_timeout = 60
cluster.add_execution_profile('standard', profile)
session = cluster.connect()
session.execute('CREATE KEYSPACE IF NOT EXISTS ybdemo;')

session.execute(
  """
  CREATE TABLE IF NOT EXISTS ybdemo.ugent (id int PRIMARY KEY,
                                           data jsonb);
  """)

for i in range(100):
    # Payload grows geometrically, roughly 1.5x per iteration.
    data = dict(text='#' * int(1.5 ** i))
    data = json.dumps(data, sort_keys=True)
    try:
        s = f"INSERT INTO ybdemo.ugent(id, data) VALUES (0, '{data}');"
        ts = time.time()
        session.execute(s, execution_profile=profile)
    finally:
        print(len(data), time.time() - ts)
cluster.shutdown()
1489 0.0005118846893310547
2228 0.0005457401275634766
3337 0.0005476474761962891
4999 0.044568538665771484
7493 0.04389166831970215
11234 0.04388737678527832
16846 0.04402732849121094
25263 0.04415297508239746
37888 0.044014692306518555
56827 0.04466843605041504
85234 0.04521751403808594
127846 0.0510096549987793
191763 0.05900382995605469
287638 0.07605910301208496
431451 0.1303420066833496
647171 0.15634393692016602
970751 0.3462944030761719
1456121 0.810868501663208
2184176 1.6984179019927979
3276258 3.7382729053497314
4914381 8.33582878112793
7371566 18.58289861679077
11057344 41.855125188827515
16586010 60.03482794761658
Traceback (most recent call last):
  File "test3.py", line 25, in <module>
    session.execute(s, execution_profile=profile)
  File "cassandra/cluster.py", line 2141, in cassandra.cluster.Session.execute
  File "cassandra/cluster.py", line 4033, in cassandra.cluster.ResponseFuture.result
cassandra.OperationTimedOut: errors={'127.0.0.1': 'Client request timeout. See Session.execute[_async](timeout)'}, last_host=127.0.0.1

Thanks for sharing this. I have reproduced it in-house and will file a GitHub issue to have it looked at.

–Alan

Opened issue #2110: https://github.com/YugaByte/yugabyte-db/issues/2110 to track this. Feel free to subscribe/comment further.

Thanks.

–Alan

@zopyx

Please see Slow performance INSERT'ing large JSONB column in YCQL · Issue #2110 · yugabyte/yugabyte-db · GitHub. If you use the prepare/bind/execute model and reuse the prepared statement, you should see much better performance.
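For example, something along these lines with the Python driver (a minimal sketch; documents stands in for your JSON strings, and the table and execution profile are the ones from the earlier scripts):

prepared = session.prepare(
    "INSERT INTO ybdemo.ugent (id, data) VALUES (?, ?)")

for i, doc in enumerate(documents):
    # The statement is parsed once; only the bound values are sent per request.
    session.execute(prepared, (i, doc), execution_profile=profile)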

The performance with a prepared statement is clearly better.

4490914 0.09193658828735352
6791914 0.11825180053710938
10243426 0.24472475051879883
15420706 0.3097679615020752
23186626 0.4547543525695801
35291615 0.7077429294586182
53492990 1.0943055152893066

However, inserting the next record, about 7 MB in size, just blocks without any error or timeout. The logs are flooded with messages like this:

E0821 04:49:58.241580 18126 process_context.cc:180] SQL Error: Invalid Arguments. YQL value too long
INSERT INTO ybdemo.ugent(id, data) VALUES (?, ?)

There is likely a flag to increase the maximum size here (perhaps --yql_max_value_size, which appears in the varz output above). But this error must be propagated back to the client instead of only being logged server-side.