Here’s the basic spark-submit command you might be using in production:

spark-submit \
  --packages org.apache.spark:spark-avro_2.11:2.4.0 \
  --class class-name \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --num-executors 2 \
  --executor-memory 2g \
  --executor-cores 2 \
  --files sample_conf.json

tl;dr: skip the reading and calculate the resources
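The right values for those resource flags can be derived from your cluster's size. Here is a minimal sketch of the usual rule-of-thumb calculation, using an assumed cluster of 10 nodes with 16 cores and 64 GB RAM each (all numbers are illustrative, not from the article):

```shell
# Assumed cluster: 10 nodes, 16 cores and 64 GB RAM per node.
nodes=10
cores_per_node=16
mem_per_node_gb=64

# Leave 1 core and 1 GB per node for the OS and Hadoop daemons.
usable_cores=$((cores_per_node - 1))
usable_mem_gb=$((mem_per_node_gb - 1))

# Rule of thumb: at most 5 cores per executor for good HDFS throughput.
executor_cores=5
executors_per_node=$((usable_cores / executor_cores))

# Reserve one executor slot cluster-wide for the YARN application master.
num_executors=$((nodes * executors_per_node - 1))

# Split a node's memory across its executors, then drop ~7% for overhead.
executor_mem_gb=$((usable_mem_gb / executors_per_node * 93 / 100))

echo "--num-executors $num_executors --executor-cores $executor_cores --executor-memory ${executor_mem_gb}g"
```

With these assumed specs the script prints `--num-executors 29 --executor-cores 5 --executor-memory 19g`; plug in your own node counts to size the flags above.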

Lifecycle of a Spark Application [Simple Overview]

Before we dive deep, here, in a nutshell, is the lifecycle of a Spark application: the driver program requests resources from the cluster manager, executors are launched on the worker nodes, the driver schedules tasks on those executors, and the results are collected back by the driver.


Configure the Spark Web UI in your local environment.


Chapter 1: Install Spark on your local machine

1. Configure spark-defaults.conf

1.1 Go to the apache-spark conf directory

cd /usr/local/Cellar/apache-spark/3.0.0/libexec/conf

1.2 Copy the spark-defaults.conf template

cp spark-defaults.conf.template spark-defaults.conf

1.3 Add these values

spark.master spark://localhost:7077
spark.eventLog.enabled true
spark.eventLog.dir file:///tmp/spark-events
spark.history.fs.logDirectory file:///tmp/spark-events
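One gotcha worth calling out: with spark.eventLog.enabled set to true, the event log directory must exist before you launch anything, or the application fails at startup. Create it once:

```shell
# spark.eventLog.dir / spark.history.fs.logDirectory must exist up front
mkdir -p /tmp/spark-events
```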

2. Configure spark-env.sh

2.1 Copy the spark-env.sh template

cp spark-env.sh.template spark-env.sh

2.2 Add the below values

# Options for the daemons used in the standalone deploy mode

3. Start the Spark daemons

cd /usr/local/Cellar/apache-spark/3.0.0/libexec/sbin
./start-all.sh               # starts the standalone master and worker
./start-history-server.sh    # serves the UI for completed applications

Apache Spark 3.0 is released. Get started today by downloading the simple WordCount program, running it with Maven in your IDE (IntelliJ IDEA), and begin your Spark learning journey.

Chapter 1: Spark 3.0 Get Started Locally


Apache Spark 3.0 is released. Get your hands dirty by running a simple WordCount program in under 5 minutes. The project is created with Maven, Scala, and Spark 3.0, with FunSpec as the test suite.


java 11 | brew cask install java11

scala 2.12 | brew install scala@2.12

apache-spark 3.0 | brew install apache-spark
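Before running the Spark version, it helps to see what WordCount actually computes. The same tokenize-group-count pipeline in plain shell (the input string here is just an illustration):

```shell
# Tokenize, group identical words, count each group - the essence of WordCount.
echo "hello spark hello world" | tr ' ' '\n' | sort | uniq -c | sort -rn
```

This prints `2 hello` first, followed by the single-occurrence words (tie order may vary by locale).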

Alternatively, create a new Maven-Scala-Spark project from…

Set up your Spark (3.0)-Scala-Maven project with Spark tests in IntelliJ IDEA


Setting up a local environment with Maven, Scala, and Spark 3.0 from scratch can take hours; this tutorial will get you through the setup quickly.


java 11

Scala 2.12

apache-spark 3.0

Alternatively, click here to download spark-poc and get started.

1. Install apache-spark 3.0, or update an existing install

brew install apache-spark     # fresh install
brew update && brew upgrade   # or update an existing installation

Write Avro records from Kafka to MinIO (local S3)

Github Repo



Run Kafka in Docker containers to produce data to Kafka and then write it to S3 (MinIO). All that in under 3 minutes.
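The repo's compose file wires up roughly these services. A minimal sketch for orientation only (image tags, ports, and environment values are assumptions; the project's actual docker-compose.yml may differ):

```yaml
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:5.4.0
    depends_on: [zookeeper]
    ports: ["9092:9092"]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  minio:
    image: minio/minio
    command: server /data
    ports: ["9000:9000"]
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
```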


brew install gradle
brew install minio/stable/mc

1. Clone the project

git clone

2. Build the project

cd kafka-ksql-to-sink

gradle clean build

3. Run docker compose

docker-compose up
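Once the containers are up, the MinIO client installed earlier can confirm that records are landing in storage. A sketch assuming MinIO's default endpoint and credentials (newer mc releases use `mc alias set`; older ones use `mc config host add`; the bucket name depends on your sink config):

```shell
# Register the local MinIO server under the alias "local"
# (minioadmin/minioadmin are MinIO's default credentials - an assumption here).
mc alias set local http://localhost:9000 minioadmin minioadmin

# List buckets to watch objects arrive from the sink.
mc ls local
```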


A Senior Engineer and Big Data enthusiast.
