Bash Scripting Basics
Shebang and safe defaults
The first line tells the OS which interpreter to use. The set options below prevent the most common categories of silent failure in shell scripts.
#!/usr/bin/env bash
set -euo pipefail
-e — exit immediately if any command returns non-zero. -u — treat unset variables as an error (prevents typos like $HOEM silently expanding to empty). -o pipefail — if any command in a pipeline fails, the whole pipeline fails (otherwise only the last command's exit code counts).
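A minimal sketch of what -o pipefail changes, using false | true as a stand-in for a real pipeline whose first stage fails. The variable names are illustrative only, and set -e is deliberately left off so the script survives long enough to report both results.

```bash
#!/usr/bin/env bash
# No set -e here on purpose: the script must survive the failing pipelines
# so it can report their exit status.

set +o pipefail
false | true
without_pipefail=$?   # only the last command's status counts

set -o pipefail
false | true
with_pipefail=$?      # the failing stage now decides the status

echo "without pipefail: $without_pipefail"
echo "with pipefail:    $with_pipefail"
```

With pipefail off the pipeline reports 0; with it on, the same pipeline reports 1.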
set -e caveats
set -e has several well-known cases where it does not exit on failure, which can surprise you:
set -e
# DOES exit on failure — normal command
cp missing.txt /tmp/
# Does NOT exit — commands after || are considered "handled"
grep "pattern" file.txt || echo "not found" # no exit even if grep fails
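# Does NOT exit when a failing [[ ]] test is a non-final part of an && or || list
# (a bare [[ -f missing.txt ]] on its own line DOES exit)
[[ -f missing.txt ]] && echo "exists" # test is false, but no exit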
# Does NOT exit in if conditions (by design — if checks the exit code)
if grep "pattern" file.txt; then
echo "found"
fi
# Pipeline failure DOES propagate (with pipefail)
cat file.txt | grep "pattern" | head -1 # if grep fails, pipeline fails
# Workaround: capture exit code without triggering -e
grep "pattern" file.txt && found=true || found=false
The most common trap: grep returns exit code 1 when it finds no matches. In a script with set -e, a grep that finds nothing will exit the script unless you add || true or use it in an if statement. Exit code 2 still signals a real error (such as a missing file), so a blanket || true can hide genuine failures.
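The two ways of handling a grep that finds nothing can be sketched as follows. The log file and its contents are hypothetical sample data created just for the example.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample log with no ERROR lines.
log=$(mktemp)
printf 'INFO started\nWARN slow response\n' > "$log"

# Option 1: a command used as an if condition never triggers set -e.
if grep -q "ERROR" "$log"; then
  has_errors=yes
else
  has_errors=no
fi

# Option 2: accept exit code 1 (no match) but still fail on 2 (real error).
error_count=$(grep -c "ERROR" "$log" || [[ $? -eq 1 ]])

rm -f "$log"
echo "has_errors=$has_errors error_count=$error_count"
```

The second form is stricter than || true: if the file were missing, grep would return 2, the [[ ]] test would fail, and set -e would stop the script.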
Variables
# Assignment — no spaces around =
name="alice"
count=42
path="/etc/nginx/nginx.conf"
# Reading a variable — always quote to prevent word splitting
echo "$name"
echo "User: $name, count: $count"
# Command substitution — capture output of a command
hostname=$(hostname -f)
today=$(date +%Y-%m-%d)
echo "Running on $hostname at $today"
# Arithmetic
total=$((count + 10))
echo "$total"
# Default value — use fallback if variable is unset or empty
log_dir="${LOG_DIR:-/var/log/myapp}"
user="${DEPLOY_USER:-deploy}"
${VAR:-default} is the most useful substitution: use the value of VAR if set and non-empty, otherwise use the default. It does not change VAR itself.
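As a rough illustration of how ${VAR:-default} behaves next to its close relatives, the sketch below compares it with the colon-less form and with := (which does assign). EMPTY and the result variable names are made up for the example.

```bash
#!/usr/bin/env bash
set -euo pipefail

unset LOG_DIR
EMPTY=""

from_default="${LOG_DIR:-/var/log/myapp}"   # LOG_DIR itself stays unset
still_unset="${LOG_DIR+set}"                # expands to "" because LOG_DIR is unset

colon_form="${EMPTY:-fallback}"   # empty counts as missing: "fallback"
plain_form="${EMPTY-fallback}"    # empty counts as set: ""

: "${LOG_DIR:=/var/log/myapp}"    # := also assigns the default to LOG_DIR

echo "from_default=$from_default still_unset='$still_unset'"
echo "colon_form=$colon_form plain_form='$plain_form'"
echo "LOG_DIR=$LOG_DIR"
```

The expansions with a default are safe under set -u, which is why they are a common way to read optional environment variables.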
Special variables
$0 # script name
$1 $2 # positional arguments (first arg, second arg)
$@ # all arguments as separate words; quote it as "$@" in loops
$# # number of arguments
$$ # PID of the current shell
$? # exit code of the last command (0 = success)
Conditionals
# Basic if / elif / else
if [[ "$1" == "start" ]]; then
echo "Starting service"
elif [[ "$1" == "stop" ]]; then
echo "Stopping service"
else
echo "Usage: $0 start|stop"
exit 1
fi
# File tests
if [[ -f "/etc/nginx/nginx.conf" ]]; then
echo "Config exists"
fi
if [[ ! -d "/var/log/myapp" ]]; then
mkdir -p /var/log/myapp
fi
# String tests
if [[ -z "$name" ]]; then echo "name is empty"; fi
if [[ -n "$name" ]]; then echo "name is set: $name"; fi
# Number comparison
if [[ $count -gt 10 ]]; then echo "more than 10"; fi
if [[ $count -eq 0 ]]; then echo "zero"; fi
Common test operators
# Files
-f FILE # exists and is a regular file
-d DIR # exists and is a directory
-r FILE # exists and is readable
-w FILE # exists and is writable
-x FILE # exists and is executable
-s FILE # exists and has size > 0
# Strings
-z STRING # string is empty (zero length)
-n STRING # string is non-empty
== # strings are equal
!= # strings are not equal
# Numbers (integer comparison)
-eq -ne -lt -le -gt -ge
Always use [[ ]] (double bracket) in bash scripts, not [ ] (single bracket). Double bracket is safer: no word splitting inside, supports && and ||, handles empty variables without errors.
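A small sketch of the empty-variable difference described above. The variable names are illustrative, and set -e is left off so both exit statuses can be captured.

```bash
#!/usr/bin/env bash
empty=""

# Single bracket: the unquoted empty variable vanishes, leaving [ = "x" ],
# which test rejects as malformed (exit status 2, message on stderr).
single_status=0
[ $empty = "x" ] 2>/dev/null || single_status=$?

# Double bracket: no word splitting, so the comparison is simply false (1).
double_status=0
[[ $empty == "x" ]] || double_status=$?

echo "single bracket status: $single_status"
echo "double bracket status: $double_status"
```

Status 2 from [ ] means the test itself was broken, not that the strings differed, which is easy to misread as an ordinary "false".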
Loops
# Loop over a list of items
for host in web01 web02 web03; do
echo "Checking $host"
ssh "$host" uptime
done
# Loop over files
for conf in /etc/nginx/conf.d/*.conf; do
echo "Validating $conf"
nginx -t -c "$conf"
done
# Loop over command output (use process substitution — avoid pipefail issues)
while IFS= read -r line; do
echo "Processing: $line"
done < <(grep "ERROR" /var/log/app.log)
# Loop with a counter
for i in $(seq 1 5); do
echo "Attempt $i"
done
# While loop
attempts=0
while [[ $attempts -lt 3 ]]; do
# try something
attempts=$((attempts + 1))
done
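Use while IFS= read -r line; do ...; done < <(command) when looping over command output: it handles lines with spaces correctly and runs the loop body in the current shell, so variables set inside the loop are still available afterwards. Be aware that the exit status of the command inside <( ) is discarded, so set -e and pipefail will not notice if it fails. Avoid for line in $(command), which splits on whitespace and breaks on filenames with spaces.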
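The difference between the two loop styles can be seen by counting iterations. list_files is a hypothetical stand-in for any command that prints one item per line.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical command whose output has two lines, one containing a space.
list_files() {
  printf '%s\n' "my report.txt" "notes.txt"
}

for_count=0
for item in $(list_files); do       # splits on every space and newline
  for_count=$((for_count + 1))
done

while_count=0
while IFS= read -r line; do         # one iteration per line
  while_count=$((while_count + 1))
done < <(list_files)

echo "for loop saw $for_count items"
echo "while loop saw $while_count lines"
```

The for loop sees three items because "my report.txt" is split in two; the while loop sees the two lines that were actually printed.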
Functions
#!/usr/bin/env bash
set -euo pipefail
# Define before use
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}
die() {
echo "ERROR: $*" >&2
exit 1
}
require_root() {
if [[ $EUID -ne 0 ]]; then
die "This script must be run as root"
fi
}
# Functions take positional args just like scripts
deploy_service() {
local service="$1" # local — variable is scoped to this function
local env="${2:-prod}"
log "Deploying $service to $env"
systemctl restart "$service"
}
# Call them
require_root
deploy_service nginx
deploy_service postfix staging
Use local for all variables inside functions. Without it, variables are global and will leak into the rest of the script. Use "$*" to join all arguments into a single string (as log and die do above); use "$@" to pass them on as separate, properly quoted words.
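To make the "$*" versus "$@" distinction concrete, here is a short sketch that forwards the same arguments both ways and counts what arrives. All three function names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

count_args() {
  echo "$#"
}

forward_joined() {
  count_args "$*"   # one argument: everything joined with spaces
}

forward_separate() {
  count_args "$@"   # original arguments, boundaries preserved
}

joined=$(forward_joined "my file.txt" other.txt)
separate=$(forward_separate "my file.txt" other.txt)

echo "\"\$*\" forwarded $joined argument(s)"
echo "\"\$@\" forwarded $separate argument(s)"
```

For log messages a single joined string is what you want; for wrapping another command, only "$@" keeps "my file.txt" as one argument.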
stdin / stdout / stderr
# stdout — normal output (file descriptor 1)
echo "This goes to stdout"
# stderr — error/status messages (file descriptor 2)
echo "This is an error" >&2
# Redirect stdout to a file
command > /tmp/output.txt
# Append stdout to a file
command >> /tmp/output.txt
# Redirect stderr to a file
command 2> /tmp/errors.txt
# Redirect both stdout and stderr to the same file
command > /tmp/all.txt 2>&1
# Or the bash shorthand (not available in POSIX sh):
command &> /tmp/all.txt
# Discard output (send to /dev/null)
command > /dev/null # discard stdout
command > /dev/null 2>&1 # discard both
# Pipe stdout to another command
command | grep "pattern"
# Pipe both stdout and stderr
command 2>&1 | grep "ERROR"
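The order of the redirections in the "both streams" examples above matters, because they are applied left to right. The sketch below shows what happens when they are swapped. emit is a hypothetical command that writes one line to each stream.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical command that writes one line to each stream.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

out_file=$(mktemp)

# Correct order: stdout goes to the file first, then stderr follows it there.
emit > "$out_file" 2>&1
both_streams=$(<"$out_file")

# Reversed order: stderr is pointed at the old stdout before stdout moves,
# so only stdout lands in the file and stderr escapes (captured here by $( )).
escaped=$(emit 2>&1 > "$out_file")
file_content=$(<"$out_file")

rm -f "$out_file"
echo "file after '> file 2>&1': $both_streams"
echo "file after '2>&1 > file': $file_content (escaped: $escaped)"
```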
Exit codes and $?
# Every command returns an exit code: 0 = success, non-zero = failure
grep "pattern" file.txt
echo "grep returned: $?" # 0 if found, 1 if not found, 2 if error
# Common exit code pattern — stop on failure with a message
systemctl start nginx || { echo "Failed to start nginx" >&2; exit 1; }
# Run a command and check result without set -e stopping the script
if ! systemctl is-active --quiet nginx; then
echo "nginx is not running" >&2
exit 1
fi
# Return a value from a function via exit code
is_port_open() {
nc -z -w2 "$1" "$2" > /dev/null 2>&1
# nc returns 0 if port is open, 1 if not
}
if is_port_open web01 80; then
echo "Port 80 is open"
fi
# Explicit exit codes in your script
exit 0 # success
exit 1 # general failure (most common)
exit 2 # misuse of shell builtins (usage error)
String operations
str="hello-world-2024"
# Length
echo "${#str}" # 16
# Substring (offset, length)
echo "${str:6:5}" # world
# Strip prefix (shortest match)
echo "${str#hello-}" # world-2024
# Strip prefix (longest match — greedy)
echo "${str##*-}" # 2024
# Strip suffix
echo "${str%-*}" # hello-world
# Replace first occurrence
echo "${str/world/WORLD}" # hello-WORLD-2024
# Replace all occurrences
echo "${str//l/L}" # heLLo-worLd-2024
# Uppercase / lowercase (bash 4+)
echo "${str^^}" # HELLO-WORLD-2024
echo "${str,,}" # hello-world-2024
Arrays
# Declare and populate
hosts=("web01" "web02" "db01")
fruits=("apple" "banana" "cherry")
# Access element
echo "${hosts[0]}" # web01
# All elements (preserve quoting)
echo "${hosts[@]}"
# Number of elements
echo "${#hosts[@]}" # 3
# Loop over array — always quote [@]
for host in "${hosts[@]}"; do
ssh "$host" uptime
done
# Append to array
hosts+=("db02")
# Slice (elements 1 and 2)
echo "${hosts[@]:1:2}"
Here-docs
Here-docs let you write multi-line strings inline. Useful for generating config files or sending multi-line input to commands.
# Write a multi-line file
cat > /etc/motd << 'EOF'
Welcome to production.
Unauthorised access is prohibited.
EOF
# Indented here-doc (strip leading tabs with <<-)
# Note: must use actual tab characters, not spaces
cat <<- EOF
This line has a leading tab stripped.
So does this one.
EOF
# Pass multi-line input to a command
ssh deploy@server bash << 'REMOTE'
cd /opt/myapp
git pull
systemctl restart myapp
REMOTE
# Here-string (single line, no file)
grep "root" <<< "$(cat /etc/passwd)"
Use << 'EOF' (quoted delimiter) to prevent variable expansion inside the here-doc. Use << EOF (unquoted) when you want variables like $hostname to expand.
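The effect of quoting the delimiter can be checked directly. This sketch captures the same here-doc body twice; the server_name line is just sample text.

```bash
#!/usr/bin/env bash
set -euo pipefail

hostname="web01"

# Unquoted delimiter: $hostname is expanded when the here-doc is read.
expanded=$(cat << EOF
server_name $hostname;
EOF
)

# Quoted delimiter: the text is passed through literally.
literal=$(cat << 'EOF'
server_name $hostname;
EOF
)

echo "unquoted: $expanded"
echo "quoted:   $literal"
```

The quoted form is the safer default for scripts sent to a remote shell, since nothing is expanded on the local side by accident.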
Common patterns
Require arguments
#!/usr/bin/env bash
set -euo pipefail
usage() {
echo "Usage: $0 <service> <environment>"
echo " service — systemd service name"
echo " environment — prod|staging|dev"
exit 1
}
[[ $# -lt 2 ]] && usage
SERVICE="$1"
ENV="$2"
Temporary files and cleanup
#!/usr/bin/env bash
set -euo pipefail
# Create a temp file and clean it up on exit (even if script fails)
tmpfile=$(mktemp /tmp/deploy.XXXXXX)
trap 'rm -f "$tmpfile"' EXIT
# Work with the temp file
ansible-playbook site.yml --check > "$tmpfile" 2>&1
grep "changed=" "$tmpfile"
Lock file — prevent concurrent runs
LOCKFILE=/var/run/myscript.lock
if ! mkdir "$LOCKFILE" 2>/dev/null; then
echo "Script already running (lockdir: $LOCKFILE)" >&2
exit 1
fi
trap 'rmdir "$LOCKFILE"' EXIT
# rest of script
Parse simple flags
dry_run=false
verbose=false
while [[ $# -gt 0 ]]; do
case "$1" in
-n|--dry-run) dry_run=true ;;
-v|--verbose) verbose=true ;;
-h|--help) usage ;;
*) echo "Unknown option: $1" >&2; usage ;;
esac
shift
done
Retry loop
wait_for_service() {
local host="$1" port="$2" retries=10 delay=3 i
for ((i = 1; i <= retries; i++)); do
if nc -z -w2 "$host" "$port" > /dev/null 2>&1; then
return 0
fi
echo "Waiting for $host:$port (attempt $i/$retries)..."
sleep "$delay"
done
echo "Timed out waiting for $host:$port" >&2
return 1
}
Debugging scripts
# Run with trace output — prints each command before executing
bash -x script.sh
# Or add to the top of the script
set -x
# Print only specific sections
set -x
some_complex_function
set +x # turn off trace
# Dry-run pattern — check what would happen
DRY_RUN="${DRY_RUN:-false}"
run() {
if [[ "$DRY_RUN" == "true" ]]; then
echo "[DRY RUN] $*"
else
"$@"
fi
}
run systemctl restart nginx
run rm -rf /tmp/old-deploy
# shellcheck — static analysis for bash scripts
# Install: dnf install ShellCheck / apt install shellcheck
shellcheck script.sh
shellcheck catches common mistakes like unquoted variables, deprecated syntax, and portability issues. Run it on any non-trivial script before deploying.
Security in shell scripts
Avoid these patterns:
- curl https://example.com/install.sh | bash (runs remote code without inspection)
- eval "$user_input" (executes arbitrary code from external input)
- Building commands with unvalidated external data (command injection)
# Bad: unquoted variable allows word splitting / globbing attacks
rm -rf $user_directory
# Good: quote everything
rm -rf "$user_directory"
# Bad: splicing user input into a string that the shell parses again
bash -c "find /var/log -name $search_term" # if $search_term is "x; rm -rf /"...
# Good: validate and restrict input first
if [[ "$search_term" =~ ^[a-zA-Z0-9._-]+$ ]]; then
find /var/log -name "$search_term"
else
echo "Invalid input" >&2
exit 1
fi
# Never use -x (trace) in scripts that handle passwords or tokens
# set -x will print the values to stderr, which CI logs capture; check before enabling in CI
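One way to avoid building commands from strings is to keep the command in an array, so external data can only ever be an argument. The sketch below contrasts that with eval, using a harmless echo as the smuggled command. The input value and variable names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical hostile input that tries to smuggle in a second command.
search_term='app.log; echo INJECTED'

# Each array element becomes exactly one argument; nothing is re-parsed as
# shell code, so the "; echo INJECTED" part stays inert data.
cmd=(printf 'pattern=[%s]\n' "$search_term")
safe_output=$("${cmd[@]}")

# For comparison, eval re-parses the string and runs the smuggled command.
unsafe_output=$(eval "printf 'pattern=[%s]\n' $search_term")

echo "array: $safe_output"
echo "eval:  $unsafe_output"
```

The array version prints the whole input inside the brackets, while the eval version prints a second line produced by the injected echo.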
cron environment
Cron runs with a minimal, stripped-down environment. Scripts that work perfectly in your shell often fail silently in cron because PATH is different and environment variables are missing.
# Cron's default PATH is roughly:
# /usr/bin:/bin (no /usr/local/bin, no ~/bin, no custom paths)
# Fix 1: Set PATH at the top of the crontab
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
# Fix 2: Use full paths in cron commands
0 2 * * * /usr/local/bin/backup.sh
# Fix 3: Source the user environment in the script
#!/usr/bin/env bash
source /etc/profile
source ~/.bash_profile
# Debug cron environment: capture what cron sees
* * * * * env > /tmp/cron-env.txt
Cron sets only a handful of variables itself (typically HOME, LOGNAME, SHELL, and a minimal PATH). DISPLAY, SSH_AUTH_SOCK, and everything exported by your shell profile are absent. If your script relies on any of these, set them explicitly or source the appropriate profile file. The env > /tmp/cron-env.txt trick lets you see exactly what environment cron provides on your specific system.
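To reproduce cron-style failures without waiting for the scheduler, you can approximate its environment with env -i. This is a sketch under the assumption that cron provides roughly HOME, LOGNAME, SHELL, and a short PATH; the exact set varies between cron implementations, so compare it with the captured /tmp/cron-env.txt on your system.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Approximate cron's environment: start from nothing (env -i) and add back
# only the few variables cron typically provides. Exact values vary by system.
cron_like_env=$(
  env -i \
    HOME="${HOME:-/}" \
    LOGNAME="$(id -un)" \
    SHELL=/bin/sh \
    PATH=/usr/bin:/bin \
    /bin/sh -c 'env'
)

printf '%s\n' "$cron_like_env"
```

Replace the final 'env' with the path to your own script to run it under the same stripped-down conditions.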
shellcheck: static analysis
shellcheck is the single most effective lint for shell scripts — it catches quoting errors, [-vs-[[ mistakes, useless cats, SC2086 word-splitting bugs, and hundreds of other subtle issues that bash -n will never find.
# Install
dnf install ShellCheck # RHEL / Fedora
apt install shellcheck # Debian / Ubuntu
brew install shellcheck # macOS
# Lint a single script
shellcheck script.sh
# Lint every *.sh tracked by git (ignore vendored / third-party)
git ls-files '*.sh' | xargs -r shellcheck
# Enforce a minimum severity and fail the build on any warning+
shellcheck -S warning -x script.sh
# Suppress a specific finding inline (with justification)
# shellcheck disable=SC2086 # intentional: we want word splitting here
rm -rf $paths_from_trusted_source
CI one-liner
# .gitlab-ci.yml — lint every shell script in the repo
shellcheck:
image: koalaman/shellcheck-alpine:stable
stage: lint
script:
- shellcheck -S warning $(git ls-files '*.sh' '*.bash')
Add #!/usr/bin/env bash as the first line of every script — shellcheck uses the shebang to pick the right dialect. For sh-only shops (busybox, POSIX), use shellcheck -s sh explicitly so bash-isms are flagged.
mktemp + trap: safe temp files
Long-running scripts inevitably create scratch files and directories. The safe pattern is always mktemp for a unique path, then an EXIT trap that cleans up however the script terminates: normal completion, an early exit 1, or a failing command under set -e.
#!/usr/bin/env bash
set -euo pipefail
# Create a private temp directory (user-only perms, unpredictable name)
tmp=$(mktemp -d -t deploy.XXXXXX)
# Clean up on any exit — normal, error, or interrupt.
# Using -rf is safe because $tmp was just created by mktemp.
trap 'rm -rf "$tmp"' EXIT
# Use the temp dir freely
ansible-playbook site.yml --check > "$tmp/check.log" 2>&1
grep -c changed= "$tmp/check.log"
tar -C "$tmp" -xzf /var/cache/artefacts/build.tgz
cp "$tmp/config.yaml" /etc/myapp/
Why this specific shape:
- mktemp -d creates a directory with 0700 perms, so other users can't read your scratch files; this matters when handling secrets or tokens.
- -t deploy.XXXXXX gives the path a recognisable prefix when debugging (/tmp/deploy.abc123) while the random suffix avoids collisions.
- In bash, the EXIT pseudo-signal fires for every termination path the shell can react to, including kill -TERM from systemd, so the temp dir does not leak on ordinary failures. Only an uncatchable SIGKILL or a power loss skips it.
- Quote "$tmp": a path containing spaces will destroy the wrong directory if you forget.
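You rarely need to list signals on the cleanup trap. In bash, EXIT alone is enough because a fatal INT, TERM, or HUP makes the shell exit, which in turn runs the EXIT trap. Writing trap 'rm -rf "$tmp"' EXIT INT TERM HUP is counterproductive: after a trapped signal the handler runs and the script then carries on without its temp dir. In shells that skip the EXIT trap on signals (dash, for example), keep the cleanup on EXIT and add a second handler that only exits, such as trap 'exit 1' INT TERM HUP.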
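The claim that the EXIT trap cleans up even on an early failure can be verified with a small experiment. In this sketch run_job is a hypothetical function whose body is a subshell, so its trap fires when the subshell ends and the caller can then check whether the directory survived.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical job. The body is a subshell (parentheses, not braces), so the
# EXIT trap fires when the subshell ends rather than when the script does.
run_job() (
  tmp=$(mktemp -d)
  trap 'rm -rf "$tmp"' EXIT
  printf '%s\n' "$tmp" > "$1"   # record the path so the caller can check it
  exit 1                        # simulate an early failure
)

record=$(mktemp)
status=0
run_job "$record" || status=$?
scratch=$(<"$record")
rm -f "$record"

if [[ -d "$scratch" ]]; then
  echo "leaked: $scratch"
else
  echo "cleaned up after exit status $status"
fi
```

The subshell is only there to make the result observable from the same script; in a real script the trap sits at the top level exactly as shown earlier.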
printf vs echo
echo is convenient but portability-hostile: the behaviour of echo -n, echo -e, and backslash escapes differs between bash, dash, BusyBox, and the POSIX spec. A script that works in your interactive bash shell can emit a literal "-e" when run under /bin/sh on Debian or Ubuntu (where sh is dash), for example from cron or a Dockerfile RUN on a Debian-based image.
# Portability quirks of echo
echo -n "hello" # bash, dash: no newline. Some other sh implementations print "-n hello".
echo -e "a\tb" # bash: a<TAB>b. dash: prints "-e a<TAB>b".
echo "$var" # fine everywhere — if $var doesn't start with "-".
# printf is defined by POSIX and behaves identically everywhere
printf 'hello' # no newline, no surprises
printf '%s\n' "$var" # always one value, one newline
printf '%-20s %s\n' "$user" "$id" # formatted columns
# Multi-line output — printf handles it without -e
printf 'Starting deploy\n host: %s\n env: %s\n' "$host" "$env"
Use printf '%s\n' "$x" instead of echo "$x" in any script that might run under a non-bash sh (cron on Debian, Alpine containers, initramfs scripts, Make recipes). Reserve echo for quick interactive one-liners where portability doesn't matter.
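The "if $var doesn't start with -" caveat is easy to demonstrate in bash itself. The value below is a made-up example of data that happens to look like an echo option.

```bash
#!/usr/bin/env bash
set -euo pipefail

var="-n"   # data that happens to look like an echo option

echo_output=$(echo "$var")             # bash treats -n as an option: prints nothing
printf_output=$(printf '%s\n' "$var")  # printf prints the value as data

echo "echo gave:   '$echo_output'"
echo "printf gave: '$printf_output'"
```

echo silently swallows the value, while printf '%s\n' prints it unchanged because the format string, not the data, decides how the output is interpreted.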