A few weeks ago we announced support for running Apache Spark on top of YugaByte using the Apache Cassandra API.
Our fork ensures that Spark partitions map to the internal sharding of YugaByte tables, and that Spark queries are routed efficiently to the YugaByte node holding the relevant data (read more about YugaByte data sharding in our docs).
To give it a try, update the package configuration for your existing YugaByte-based application:
For Maven projects, add the following snippet to your pom.xml:
<dependency>
  <groupId>com.yugabyte.spark</groupId>
  <artifactId>spark-cassandra-connector_2.10</artifactId>
  <version>2.0.5-yb-1</version>
</dependency>
For sbt projects, add the following library dependency to your build configuration (build.sbt):
libraryDependencies += "com.yugabyte.spark" %% "spark-cassandra-connector" % "2.0.5-yb-1"
Start PySpark with:
$ pyspark --packages com.yugabyte.spark:spark-cassandra-connector_2.10:2.0.5-yb-1
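With PySpark running, you can read a YugaByte table through the connector's Cassandra data source. A minimal sketch, assuming a YugaByte node reachable on 127.0.0.1 and a hypothetical keyspace and table (test_keyspace.users) that you would replace with your own:

```python
from pyspark.sql import SparkSession

# Point the connector at a YugaByte node's Cassandra-compatible endpoint.
# Host and keyspace/table names below are placeholders for illustration.
spark = (SparkSession.builder
         .appName("yb-spark-example")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

# Load the table as a DataFrame; partitions follow YugaByte's sharding.
df = (spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(keyspace="test_keyspace", table="users")
      .load())

df.show()
```

Because the fork aligns Spark partitions with YugaByte's tablets, scans over this DataFrame are pushed to the nodes that own the data rather than funneled through a single coordinator.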