Spark-Scala-Gradle-Bootstrap: A Spark Project in Scala with Gradle

Mahendran
3 min read · Oct 23, 2023


TL;DR: Setting up a development environment for Spark, Scala, Gradle, Delta Lake, unit tests, and Scala code coverage can be time-consuming. Instead, you can use preconfigured projects like Mahendran’s GitHub repository.

Setting up a development environment for Apache Spark, Scala, Gradle, unit tests, and Scala code coverage can often be a time-consuming and daunting task. However, by embracing a preconfigured project and adopting best practices, you can take a significant leap forward in streamlining your development process. Here’s a brief description of how this approach can make a difference, illustrated with an example:

The Leap with a Preconfigured Project

1. Instant Readiness: Spark, Scala, Gradle, and testing tools are already set up.

2. Simplified Configuration: Gradle as the build tool, Checkstyle and Spotless for clean code, with dependencies including Kafka, AWS, and ScalaTest.

3. Built-in Testing: Comes with unit tests that can be run in the IDE and the dev environment.

4. Code Coverage Integration: Measure unit-test coverage with the Scala code coverage tool scoverage.

Prerequisites

Environment must be configured to have:

  1. Java 11
  2. Scala 2.12
  3. Apache Spark 3.4.1

Libraries Included

The project includes the following important library versions:

  • Java Version: 11
  • Spark Version: 3.4.1
  • Scala Version: 2.12
  • Delta Lake Version: 2.4.0
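To give a feel for what working with these library versions looks like, here is a minimal, hypothetical sketch of a Delta Lake round trip with Spark 3.4.1 and Delta 2.4.0 (the object name and the `/tmp/demo-delta` path are illustrative, not from the repository):

```scala
import org.apache.spark.sql.SparkSession

object DeltaRoundTrip {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("delta-round-trip")
      .master("local[*]")
      // Delta Lake requires these two settings on the session
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    import spark.implicits._

    // Write a small DataFrame in Delta format, then read it back
    Seq((1, "alpha"), (2, "beta")).toDF("id", "name")
      .write.format("delta").mode("overwrite").save("/tmp/demo-delta")

    spark.read.format("delta").load("/tmp/demo-delta").show()
    spark.stop()
  }
}
```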
Getting Started

  1. Visit the GitHub Repository: Go to Mahendran’s GitHub repository.
  2. Clone the Repository: Clone the repository to your local machine.
  3. Run Gradle Tasks: For instance, run ./gradlew build to build your project, ./gradlew check to run unit tests, and ./gradlew reportTestScoverage to generate code coverage reports.
  4. Focus on Your Application: With your development environment ready, you can concentrate on writing your Spark application, assured that your tools are well-integrated and tests are in place.

Build

To build the project, follow these steps:

  1. Confirm your Java version by running:
java -version

You should see a response similar to:

openjdk version "11.0.20" 2023-07-18 
OpenJDK Runtime Environment Homebrew (build 11.0.20+0)
OpenJDK 64-Bit Server VM Homebrew (build 11.0.20+0, mixed mode)

2. Build the project using Gradle:

./gradlew clean build

Testing

To run the tests, execute:

./gradlew check
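The tests that `check` runs are ScalaTest suites. A unit test in this style might look roughly like the following sketch, which spins up a local SparkSession so it runs without a cluster (the suite name and the word-count example are illustrative, not the repository's actual tests):

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class WordCountSuite extends AnyFunSuite {
  // A local master lets the test run on the dev machine, no cluster needed
  private lazy val spark = SparkSession.builder()
    .appName("word-count-test")
    .master("local[2]")
    .getOrCreate()

  test("counts words in a small RDD") {
    val counts = spark.sparkContext
      .parallelize(Seq("spark scala", "spark gradle"))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()

    assert(counts("spark") == 2)
    assert(counts("scala") == 1)
  }
}
```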

Run Test Coverage

For test coverage reports, run:

./gradlew reportTestScoverage

This will help you assess the quality of your code and identify areas that may need improvement.

Running the Project

Running Spark Submit Task

The Gradle sparkSubmit task is configured to run with the class dev.template.spark.RddCollect. You can execute it with:

./gradlew sparkSubmit

Note: Make sure a local Spark instance is up and running. You can access the Spark web UI at http://localhost:8080/ and the history server at http://localhost:18080/.
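For orientation, a class like dev.template.spark.RddCollect might look roughly like this minimal sketch (this is an assumption about its shape, not the repository's actual source):

```scala
package dev.template.spark

import org.apache.spark.sql.SparkSession

// Minimal entry point: parallelize a range into an RDD, collect it, print it.
object RddCollect {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-collect")
      .getOrCreate() // the master URL is supplied by spark-submit

    val numbers = spark.sparkContext.parallelize(1 to 10)
    numbers.collect().foreach(println)

    spark.stop()
  }
}
```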

Follow the Readme in the repository to run the examples.

With this Bootstrap project, you’re well-equipped to jumpstart your Spark project in Scala using Gradle. Happy coding and data processing!
