Deploy Apache Spark 3.5.6 on a Single-Node Bare Metal Ubuntu 24.04 Server

This guide walks you through deploying Apache Spark 3.5.6 on a single-node Ubuntu 24.04 bare metal server using standalone mode. The setup includes:

  • OpenJDK 17
  • Spark Master and Worker
  • PostgreSQL-backed persistent metastore
  • Hive-compatible SQL queries
  • systemd services for lifecycle management
  • Spark Web UI and local warehouse directory

This is a robust setup for testing and development, and a solid starting point for extending into a production cluster.


  1. Install Java
sudo apt install -y openjdk-17-jdk
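
If you want to confirm the JDK is installed and on the PATH, a quick optional check (it should report OpenJDK 17):

java -version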

  2. Install PostgreSQL 16
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install -y postgresql-16
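
Before creating the metastore, you can optionally confirm the PostgreSQL 16 cluster is up and listening (pg_lsclusters ships with postgresql-common):

pg_lsclusters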

  3. Create Metastore Database and User
sudo runuser -l postgres -c 'createuser -P hive'
sudo runuser -l postgres -c 'createdb -O hive hive_metastore'
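
The -P flag prompts for a password; whatever you enter here must match javax.jdo.option.ConnectionPassword in hive-site.xml later (step 7 uses hive). As an optional check, verify the hive role can reach the new database:

psql -h localhost -U hive -d hive_metastore -c '\conninfo'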

  4. Download and Install Apache Spark 3.5.6
wget https://dlcdn.apache.org/spark/spark-3.5.6/spark-3.5.6-bin-hadoop3.tgz
tar -xzf spark-3.5.6-bin-hadoop3.tgz
sudo mv spark-3.5.6-bin-hadoop3 /opt/spark
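
As a quick sanity check that the archive unpacked correctly, you can print the Spark version from the new install location:

/opt/spark/bin/spark-submit --version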

  5. Create spark User
sudo useradd -m -s /bin/bash spark
sudo chown -R spark:spark /opt/spark

  6. Set Environment for Spark User

Edit .bashrc:

sudo -u spark vi /home/spark/.bashrc

Append:

export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
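
To confirm the new environment takes effect, you can open an interactive login shell as the spark user and check that Spark resolves:

sudo -iu spark
echo $SPARK_HOME
which spark-submit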

  7. Configure Hive Metastore in Spark
sudo -u spark vi /opt/spark/conf/hive-site.xml

Paste:

<configuration>
   <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:postgresql://localhost:5432/hive_metastore</value>
      <description>JDBC connect string for a JDBC metastore</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>org.postgresql.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>Username to use against metastore database</description>
   </property>
   <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive</value>
      <description>Password to use against metastore database</description>
   </property>
   <property>
      <name>datanucleus.schema.autoCreateTables</name>
      <value>true</value>
   </property>
   <property>
      <name>hive.metastore.schema.verification</name>
      <value>false</value>
   </property>
</configuration>
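
Since this file holds the metastore password in plain text, you may want to restrict it to the spark user:

sudo -u spark chmod 600 /opt/spark/conf/hive-site.xml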


  8. Add PostgreSQL JDBC Driver
wget https://jdbc.postgresql.org/download/postgresql-42.7.6.jar
sudo mv postgresql-42.7.6.jar /opt/spark/jars/
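
With hive-site.xml in place and the driver in /opt/spark/jars, you can optionally run a local smoke test before wiring up any services. The spark-sql CLI enables Hive support, connects to the PostgreSQL metastore, and creates the schema on first use (datanucleus.schema.autoCreateTables is enabled above); it should list at least the default database:

sudo -u spark /opt/spark/bin/spark-sql -e "SHOW DATABASES;"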

  9. Configure Spark Master Hostname
sudo -u spark vi /opt/spark/conf/spark-env.sh

Add:

export SPARK_MASTER_HOST=spark.maksonlee.com
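
The master binds to this hostname, so it must resolve on the server itself (and on any machine that will reach the Web UI). Substitute your own hostname throughout if you are not using spark.maksonlee.com. A quick resolution check:

getent hosts spark.maksonlee.com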

  10. Configure Spark Defaults
sudo -u spark vi /opt/spark/conf/spark-defaults.conf

Paste:

spark.sql.warehouse.dir          /opt/spark/warehouse
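
If the warehouse directory does not exist yet, create it so the spark user can write table data there:

sudo -u spark mkdir -p /opt/spark/warehouse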

  11. Create systemd Services
  • Spark Master
sudo vi /etc/systemd/system/spark-master.service
[Unit]
Description=Apache Spark Master
After=network.target

[Service]
Type=forking
User=spark
Group=spark
Environment=SPARK_HOME=/opt/spark
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
  • Spark Worker
sudo vi /etc/systemd/system/spark-worker.service
[Unit]
Description=Apache Spark Worker
After=network.target spark-master.service
Requires=spark-master.service

[Service]
Type=forking
User=spark
Group=spark
Environment=SPARK_HOME=/opt/spark
ExecStart=/opt/spark/sbin/start-worker.sh spark://spark.maksonlee.com:7077
ExecStop=/opt/spark/sbin/stop-worker.sh
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target

  12. Start Spark Services
sudo systemctl daemon-reload
sudo systemctl enable --now spark-master
sudo systemctl enable --now spark-worker
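
Both units should now show as active. To confirm, and to know where to look if they don't (Spark's own daemon logs land in /opt/spark/logs):

systemctl status spark-master spark-worker --no-pager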

  13. Access Spark Web UI

Visit:

http://spark.maksonlee.com:8080
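
The master UI should show one alive worker. If you want to confirm the UI port is listening before opening a browser, check from the server:

ss -ltn | grep 8080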

Summary

You’ve successfully deployed Apache Spark 3.5.6 on a single-node Ubuntu 24.04 system using:

  • Spark Standalone Master + Worker
  • systemd-managed services
  • PostgreSQL as a persistent catalog backend
  • Hive-compatible SQL without installing Hive
  • Web UI available at http://spark.maksonlee.com:8080
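
As a final end-to-end check, you can run a Hive-compatible query against the running cluster. The demo table below is just an example name; its definition lands in the PostgreSQL metastore and its data under /opt/spark/warehouse, so both survive service restarts:

# "demo" is only an example table name
sudo -u spark /opt/spark/bin/spark-sql --master spark://spark.maksonlee.com:7077 \
  -e "CREATE TABLE IF NOT EXISTS demo (id INT, name STRING); INSERT INTO demo VALUES (1, 'spark'); SELECT * FROM demo;"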
