Apache Spark real-time analytics with YugaByte

mihnea · December 9, 2017, 12:01am

As part of our vision to integrate transactional and fast-data applications into a unified database platform, we are happy to announce that we now support using Apache Spark on top of YugaByte. Since YugaByte DB is API-compatible with Cassandra, you can use the Spark Cassandra Connector to quickly get started with Spark and YugaByte.

You can run our Spark-based sample app with:

java -jar yb-sample-apps.jar --workload CassandraSparkWordCount --nodes 127.0.0.1:9042

The app reads from a table of sentences (by default, it generates an input table, ybdemo_keyspace.lines), computes the word frequencies, and writes the result to an output table (by default, ybdemo_keyspace.wordcounts).
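To make the computation concrete, here is a minimal plain-Java sketch of the word-frequency step, without Spark or a database. It is not the sample app's code: the class name WordCountSketch and the whitespace-based tokenization are assumptions for illustration, and the real app may split words differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical class, not part of yb-sample-apps.
class WordCountSketch {

    // Splits each line on whitespace (an assumed tokenization) and counts
    // how often each word occurs across all lines.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.trim().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(
                        word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Stand-in for the rows of the input table.
        List<String> lines = List.of("one two three", "two three", "three");
        // Each (word, count) pair corresponds to one row of the output table.
        countWords(lines).forEach(
                (word, count) -> System.out.println(word + " " + count));
    }
}
```

In the Spark version, the same split-and-count logic runs in parallel over the rows of the input table, and each (word, count) pair becomes a row in the output table.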

See also the general instructions for how to run the YugaByte sample apps and how to connect with cqlsh to check the result.

You can also easily migrate your existing Spark and Cassandra applications to YugaByte. Just install and start YugaByte, then switch the spark.cassandra.connection.host setting in your Spark configuration to the YugaByte server IPs.
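As a rough illustration of that switch, the sketch below builds the connector settings for a set of YugaByte servers and renders them as spark-submit --conf arguments. The class name, method names, and IP addresses are hypothetical. The port 9042 matches the one used in the sample command above, and passing several hosts as a comma-separated list is assumed to follow the Spark Cassandra Connector's usual convention.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of YugaByte or the Spark Cassandra Connector.
class YugaByteSparkSettings {

    // Builds the connector properties that point Spark at the given servers.
    static Map<String, String> connectorSettings(List<String> serverIps, int cqlPort) {
        Map<String, String> settings = new LinkedHashMap<>();
        // Multiple contact points are joined into one comma-separated value.
        settings.put("spark.cassandra.connection.host", String.join(",", serverIps));
        settings.put("spark.cassandra.connection.port", Integer.toString(cqlPort));
        return settings;
    }

    // Renders the settings as "--conf key=value" arguments for spark-submit.
    static String toSubmitArgs(Map<String, String> settings) {
        return settings.entrySet().stream()
                .map(e -> "--conf " + e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Example addresses only; replace with your YugaByte server IPs.
        List<String> ips = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        System.out.println(toSubmitArgs(connectorSettings(ips, 9042)));
    }
}
```

The same two properties can instead be set in spark-defaults.conf or on the SparkConf object in your application; the rest of an existing Spark and Cassandra job should not need to change.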
