This guide walks you through deploying Apache Spark 3.5.6 in standalone mode on a single-node Ubuntu 24.04 bare-metal server. The setup includes:
- OpenJDK 17
- Spark Master and Worker
- PostgreSQL-backed persistent metastore
- Hive-compatible SQL queries
- systemd services for lifecycle management
- Spark Web UI and local warehouse directory
This is a robust setup for testing, development, or extending into a production cluster.
- Install Java
sudo apt update
sudo apt install -y openjdk-17-jdk
- Install PostgreSQL 16
sudo apt install -y postgresql-common
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
sudo apt install -y postgresql-16
- Create Metastore Database and User
sudo runuser -l postgres -c 'createuser -P hive'
sudo runuser -l postgres -c 'createdb -O hive hive_metastore'
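The -P flag prompts for a password; remember the value you choose, since it goes into hive-site.xml later. You can verify the role and database with a test connection (this prompts for that same password):

```shell
# Connect to the metastore database as the hive role and print connection details
psql -h localhost -U hive -d hive_metastore -c '\conninfo'
```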
- Download and Install Apache Spark 3.5.6
wget https://dlcdn.apache.org/spark/spark-3.5.6/spark-3.5.6-bin-hadoop3.tgz
tar -xzf spark-3.5.6-bin-hadoop3.tgz
sudo mv spark-3.5.6-bin-hadoop3 /opt/spark
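Before going further, a quick version check confirms the archive unpacked correctly and Java is visible to Spark:

```shell
# Print the Spark build version; a failure here usually means a missing or wrong JDK
/opt/spark/bin/spark-submit --version
```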
- Create spark User
sudo useradd -m -s /bin/bash spark
sudo chown -R spark:spark /opt/spark
- Set Environment for Spark User
Edit .bashrc:
sudo -u spark vi /home/spark/.bashrc
Append:
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
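You can confirm the environment resolves for the spark user with a login shell (Ubuntu's default ~/.profile sources ~/.bashrc):

```shell
# Open a login shell as spark and print the resolved Spark paths
sudo -u spark bash -lc 'echo $SPARK_HOME && which spark-submit'
```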
- Configure Hive Metastore in Spark
sudo -u spark vi /opt/spark/conf/hive-site.xml
Paste the following (the ConnectionPassword value must match the password you set for the hive role):
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost:5432/hive_metastore</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>Password to use against metastore database</description>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>
- Add PostgreSQL JDBC Driver
wget https://jdbc.postgresql.org/download/postgresql-42.7.6.jar
sudo mv postgresql-42.7.6.jar /opt/spark/jars/
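With the driver and hive-site.xml in place, a quick smoke test confirms Spark can reach the metastore. On first use, DataNucleus auto-creates the schema tables in hive_metastore; the table name smoke_test below is just an example:

```shell
# Create and list a table through the Hive-compatible catalog
sudo -u spark /opt/spark/bin/spark-sql -e "CREATE TABLE IF NOT EXISTS smoke_test (id INT); SHOW TABLES;"

# Confirm the metastore schema tables landed in PostgreSQL
sudo runuser -l postgres -c 'psql -d hive_metastore -c "\dt"'
```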
- Configure Spark Master Hostname
sudo -u spark vi /opt/spark/conf/spark-env.sh
Add:
export SPARK_MASTER_HOST=spark.maksonlee.com
- Configure Spark Defaults
sudo -u spark vi /opt/spark/conf/spark-defaults.conf
Paste:
spark.sql.warehouse.dir /opt/spark/warehouse
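Spark creates the warehouse directory on first write, but creating it up front keeps ownership explicit:

```shell
# Pre-create the warehouse directory as the spark user
sudo -u spark mkdir -p /opt/spark/warehouse
```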
- Create systemd Services
- Spark Master
sudo vi /etc/systemd/system/spark-master.service
[Unit]
Description=Apache Spark Master
After=network.target
[Service]
Type=forking
User=spark
Group=spark
Environment=SPARK_HOME=/opt/spark
ExecStart=/opt/spark/sbin/start-master.sh
ExecStop=/opt/spark/sbin/stop-master.sh
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
- Spark Worker
sudo vi /etc/systemd/system/spark-worker.service
[Unit]
Description=Apache Spark Worker
After=network.target spark-master.service
Requires=spark-master.service
[Service]
Type=forking
User=spark
Group=spark
Environment=SPARK_HOME=/opt/spark
ExecStart=/opt/spark/sbin/start-worker.sh spark://spark.maksonlee.com:7077
ExecStop=/opt/spark/sbin/stop-worker.sh
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
- Start Spark Services
sudo systemctl daemon-reload
sudo systemctl enable --now spark-master
sudo systemctl enable --now spark-worker
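Once both units report active, you can verify the cluster end to end by submitting the bundled SparkPi example (the examples jar path below matches the Scala 2.12 build shipped with 3.5.6; adjust if your distribution differs):

```shell
# Confirm both services are running
systemctl status spark-master spark-worker --no-pager

# Submit the SparkPi example to the standalone master
sudo -u spark /opt/spark/bin/spark-submit \
  --master spark://spark.maksonlee.com:7077 \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/examples/jars/spark-examples_2.12-3.5.6.jar 100
```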
- Access Spark Web UI
Visit:
http://spark.maksonlee.com:8080
Summary
You’ve successfully deployed Apache Spark 3.5.6 on a single-node Ubuntu 24.04 system using:
- Spark Standalone Master + Worker
- systemd-managed services
- PostgreSQL as a persistent catalog backend
- Hive-compatible SQL without installing Hive
- Web UI available at http://spark.maksonlee.com:8080