Here’s the basic spark-submit command you might be using in production:

spark-submit \
  --packages org.apache.spark:spark-avro_2.11:2.4.0 \
  --class <class-name> \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4g \
  --num-executors 2 \
  --executor-memory 2g \
  --executor-cores 2 \
  --files sample_conf.json \
  <application-jar>

tl;dr: skip the reading and calculate the resources
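
If you'd rather do the math yourself, here is the standard back-of-the-envelope calculation, assuming (purely for illustration) a 6-node cluster with 16 cores and 64 GB of RAM per node:

1. Reserve 1 core and 1 GB per node for the OS and Hadoop daemons, leaving 15 cores and 63 GB per node.
2. Pick --executor-cores 5, a common rule of thumb for good HDFS throughput.
3. Executors per node = 15 / 5 = 3, so 6 x 3 = 18 executors in total; leave one slot for the YARN application master, giving --num-executors 17.
4. Memory per executor = 63 GB / 3 = 21 GB; subtract roughly 7% for off-heap overhead (spark.yarn.executor.memoryOverhead), giving --executor-memory 19g.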

Lifecycle of a Spark Application [Simple Overview]

Before we dive deep, here, in a nutshell, is the lifecycle of a Spark application:

1. spark-submit launches the driver program.
2. The driver asks the cluster manager (YARN, Mesos, Kubernetes, or Spark standalone) for executors.
3. The driver splits the job into stages and tasks and schedules them on the executors.
4. The executors run the tasks and report results back to the driver.
5. When the application finishes, the driver releases the executors and, if enabled, the event log is written for the history server.

Objective

Configure the Spark Web UI in your local environment.

Prerequisites

Chapter 1: Install Spark on your local machine

1. Configure spark-defaults.conf

1.1 Go to the apache-spark conf directory

cd /usr/local/Cellar/apache-spark/3.0.0/libexec/conf

1.2 Copy the spark-defaults.conf template

cp spark-defaults.conf.template spark-defaults.conf

1.3 Add these values

spark.master spark://localhost:7077
spark.eventLog.enabled true
spark.eventLog.dir file:///tmp/spark-events
spark.history.fs.logDirectory file:///tmp/spark-events
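
Note that the event log directory has to exist before anything starts writing to it, or event logging and the history server will fail on startup:

mkdir -p /tmp/spark-events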

2. Configure spark-env.sh

2.1 Copy the spark-env.sh.template

cp spark-env.sh.template spark-env.sh

2.2 Add the below values

# Options for the daemons used in the standalone deploy mode
SPARK_MASTER_HOST=localhost
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_LOCAL_IP=localhost

3. Start the spark daemons

cd /usr/local/Cellar/apache-spark/3.0.0/libexec/sbin
./start-all.sh
./start-history-server.sh
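
If the daemons came up cleanly, the master web UI should now be reachable at http://localhost:8080 (the port we set in spark-env.sh) and the history server at http://localhost:18080 (its default port). Run a short spark-shell session and the application should appear in the master UI while it runs, and in the history server once it exits.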

Apache Spark 3.0 is released. Get started today: download the simple WordCount Maven program into your IDE [IntelliJ IDEA], run it, and kick off your Spark learning.

Chapter 1: Spark 3.0 Get Started Locally

Objective

Apache Spark 3.0 is released. Get your hands dirty by running a simple WordCount program in under 5 minutes. The project is built with Maven and Scala on Spark 3.0, with FunSpec as the test suite.

Prerequisites

java 11 | brew cask install java11

scala 2.12 | brew install scala@2.12

apache-spark 3.0 | brew install apache-spark

Alternatively, click the image to Create a new Maven-Scala-Spark project from…
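
If you'd rather type it in than download it, the heart of the project is a WordCount along these lines. This is a minimal sketch; the object name and input path are illustrative, not the exact ones from the repo:

import org.apache.spark.sql.SparkSession

object WordCount {
  def main(args: Array[String]): Unit = {
    // local[*] so it runs without any cluster or daemons
    val spark = SparkSession.builder()
      .appName("WordCount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Point this at any text file on your machine
    val counts = spark.read.textFile("src/main/resources/input.txt")
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupByKey(identity)
      .count()

    counts.show()
    spark.stop()
  }
}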


Set up your Spark(3.0)-Scala-Maven project with Spark tests in IntelliJ IDEA

Objective

Set up your local environment with Maven, Scala, and Spark 3.0. Done by hand, the setup can take hours; this tutorial should get you through it easily.

Prerequisites

java 11

Scala 2.12

apache-spark 3.0

Alternatively, click here to download spark-poc and get started.

1. Install or update Apache Spark 3.0

brew update && brew upgrade apache-spark   # updates your spark
# or, for a fresh install
brew install apache-spark
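
Since the point of the setup is running Spark tests, here is a minimal sketch of what a FunSpec-based test can look like under ScalaTest 3.1.x; the class name, data, and assertion are illustrative:

import org.apache.spark.sql.SparkSession
import org.scalatest.funspec.AnyFunSpec

class WordCountSpec extends AnyFunSpec {
  describe("WordCount") {
    it("counts words in a small dataset") {
      // local[2] keeps the test self-contained and fast
      val spark = SparkSession.builder()
        .appName("WordCountSpec")
        .master("local[2]")
        .getOrCreate()
      import spark.implicits._

      val counts = Seq("a b", "b c").toDS()
        .flatMap(_.split("\\s+"))
        .groupByKey(identity)
        .count()
        .collect()
        .toMap

      assert(counts("b") == 2L)
      spark.stop()
    }
  }
}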


Write Avro records from Kafka to minio (local S3)

Github Repo

kafka-ksql-jdbc-sink

Objective

Run Kafka in Docker containers, produce data to Kafka, and then write it to S3 [minio]. All that in less than 3 minutes.

Prerequisites

brew install gradle
brew install minio/stable/mc

1. Clone the project

git clone https://github.com/mahen-github/kafka-ksql-jdbc-sink.git

2. Build the project

cd kafka-ksql-jdbc-sink

gradle clean build

3. Run docker compose

docker-compose up
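
Once the containers are up and the connector has run, you can confirm that Avro records are landing in minio with the mc client. The alias name, endpoint, and credentials below are placeholders; use the values from the project's docker-compose.yml:

mc alias set local http://localhost:9000 <access-key> <secret-key>
mc ls local/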

Mahendran

A Senior Engineer and Big Data enthusiast.
