YugaByte DB Community Forum

Apache Spark real-time analytics with YugaByte


As part of our vision to integrate transactional and fast-data applications into a unified database platform, we are happy to announce that we now support using Apache Spark on top of YugaByte. Since YugaByte DB is API-compatible with Cassandra, you can use the Spark Cassandra Connector to quickly get started with Spark and YugaByte.


You can run our Spark-based sample app with:

java -jar yb-sample-apps.jar --workload CassandraSparkWordCount --nodes

It reads sentences from an input table (by default, it generates one named ybdemo_keyspace.lines), computes the frequency of each word, and writes the result to an output table (by default, ybdemo_keyspace.wordcounts).

See also the general instructions for how to run the YugaByte sample apps and how to connect with cqlsh to check the result.
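For readers new to Spark, the job is the classic word count: split each sentence into words, then sum a count per word. The plain-Java sketch below (no Spark, no database) illustrates only that logic. It is not the sample app's source, and the whitespace tokenization and case-sensitive matching are assumptions. WordCountSketch and countWords are hypothetical names. The flatMap and grouping steps correspond roughly to the flatMap and reduceByKey stages of a Spark job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCountSketch {
    // Splits every sentence on whitespace (case-sensitive, an assumption) and
    // counts how often each word occurs, sorted alphabetically via a TreeMap.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // "map" stage: one sentence becomes many words
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                // drop the single empty token produced by a blank line
                .filter(word -> !word.isEmpty())
                // "reduce" stage: group identical words and count them
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical input rows standing in for ybdemo_keyspace.lines
        List<String> lines = List.of(
                "the quick brown fox",
                "the lazy dog",
                "the dog sleeps");
        countWords(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

In the Spark version, the input rows would be read from the YugaByte table through the Spark Cassandra Connector and the per-word counts written back to the output table, with Spark distributing the grouping step across executors.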

You can also easily migrate your existing Spark and Cassandra applications to YugaByte. Just install and start YugaByte, then switch the spark.cassandra.connection.host setting in your Spark configuration to the YugaByte server IPs.
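The switch is a one-key change. As a minimal sketch, assuming your settings live in a spark-defaults.conf-style file of key/value lines (a format java.util.Properties can parse), the plain-Java helper below overrides spark.cassandra.connection.host with a comma-separated list of YugaByte node addresses. SwitchConnectionHost, pointAtYugaByte, and every IP address shown are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SwitchConnectionHost {
    // Key the Spark Cassandra Connector reads to find its contact points.
    static final String HOST_KEY = "spark.cassandra.connection.host";

    // Parses spark-defaults.conf-style text (Properties.load accepts "key value",
    // "key=value" and "key:value" lines), then points the connector at the
    // given YugaByte nodes, joined into one comma-separated value.
    static Properties pointAtYugaByte(String confText, String... yugabyteIps) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(confText));
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        props.setProperty(HOST_KEY, String.join(",", yugabyteIps));
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical existing settings that still point at a Cassandra node.
        String existing = "spark.master local[*]\n" + HOST_KEY + " 10.0.0.5\n";
        // Hypothetical YugaByte node IPs.
        Properties updated = pointAtYugaByte(existing, "10.0.1.1", "10.0.1.2", "10.0.1.3");
        System.out.println(HOST_KEY + " = " + updated.getProperty(HOST_KEY));
    }
}
```

Editing the file by hand, or passing the same key to spark-submit with --conf, achieves the same result; note that Properties does not preserve comments or line order if you write the updated settings back to disk.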
