Spark-Scala-Gradle-Bootstrap: A Spark Project in Scala with Gradle
TL;DR: Setting up a development environment for Spark, Scala, Gradle, Delta Lake, unit tests, and Scala code coverage can be time-consuming. Instead, you can use preconfigured projects like Mahendran’s GitHub repository.
Setting up a development environment for Apache Spark, Scala, Gradle, unit tests, and Scala code coverage can often be a time-consuming and daunting task. However, by embracing a preconfigured project and adopting best practices, you can take a significant leap forward in streamlining your development process. Here’s a brief description of how this approach can make a difference, illustrated with an example:
The Leap with a Preconfigured Project
1. Instant Readiness: Spark, Scala, Gradle, and testing tools are already set up.
2. Simplified Configuration: Gradle as the build tool, Checkstyle and Spotless for clean code, and dependencies including Kafka, AWS, and ScalaTest.
3. Built-in Testing: Comes with unit tests that can be run in the IDE and in the dev environment.
4. Code Coverage Integration: Measure unit-test coverage with the Scala code coverage tool scoverage.
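The "built-in testing" point is easiest to see with a concrete example. The sketch below is hypothetical (the repository's own tests may be structured differently): it keeps the transformation as a pure Scala function, so it can be unit-tested in the IDE without a SparkSession, and the same logic can later be applied over an RDD or DataFrame.

```scala
object WordCount {
  // Pure transformation: testable without a running Spark cluster.
  // (Hypothetical helper, not the repository's actual code.)
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
}

// A ScalaTest suite would normally wrap this; shown here as plain assertions:
object WordCountCheck {
  def main(args: Array[String]): Unit = {
    val counts = WordCount.countWords(Seq("spark and scala", "spark and gradle"))
    assert(counts("spark") == 2)
    assert(counts("gradle") == 1)
  }
}
```

Tests written this way run under `./gradlew check` and are picked up by scoverage when generating coverage reports.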
Prerequisites
Your environment must be configured with:
- Java 11
- Scala 2.12
- Apache Spark 3.4.1
Libraries Included
The project includes the following important library versions:
- Java Version: 11
- Spark Version: 3.4.1
- Scala Version: 2.12
- Delta Lake Version: 2.4.0
- Visit the GitHub Repository: Go to Mahendran’s GitHub repository.
- Clone the Repository: Clone the repository to your local machine.
- Run Gradle Tasks: For instance, run ./gradlew build to build the project, ./gradlew check to run unit tests, and ./gradlew reportTestScoverage to generate code coverage reports.
- Focus on Your Application: With your development environment ready, you can concentrate on writing your Spark application, assured that your tools are well integrated and tests are in place.
Build
To build the project, follow these steps:
1. Confirm your Java version by running:
java -version
You should see a response similar to:
openjdk version "11.0.20" 2023-07-18
OpenJDK Runtime Environment Homebrew (build 11.0.20+0)
OpenJDK 64-Bit Server VM Homebrew (build 11.0.20+0, mixed mode)
2. Build the project using Gradle:
./gradlew clean build
Testing
To run the tests, execute:
./gradlew check
Run Test Coverage
For test coverage reports, run:
./gradlew reportTestScoverage
This will help you assess the quality of your code and identify areas that may need improvement.
Running the Project
Running Spark Submit Task
The Gradle sparkSubmit task is configured to run the class dev.template.spark.RddCollect. You can execute it with:
./gradlew sparkSubmit
Note: Make sure a local Spark instance is up and running. You can access the Spark web UI at http://localhost:8080/ and the history server at http://localhost:18080/.
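For orientation, a parallelize-and-collect job of the kind the sparkSubmit task runs might look like the sketch below. This is an assumption, not the repository's actual `RddCollect` source: the Spark calls are shown in comments, and the job's data is kept as a pure value so the example stays testable without a cluster.

```scala
object RddCollectSketch {
  // The data the job would distribute, kept pure so it is
  // testable without a SparkSession. (Hypothetical values.)
  def numbers: Seq[Int] = 1 to 5

  def main(args: Array[String]): Unit = {
    // In the actual job, the equivalent Spark code would be roughly:
    //   val spark = SparkSession.builder().appName("RddCollect").getOrCreate()
    //   val rdd   = spark.sparkContext.parallelize(numbers)
    //   rdd.collect().foreach(println)
    //   spark.stop()
    // Local equivalent of collect-and-print:
    numbers.foreach(println)
  }
}
```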
Follow the README in the repository to run the examples.
With this bootstrap project, you’re well-equipped to jumpstart your Spark project in Scala using Gradle. Happy coding and data processing!