# How to Safely Unit Test Shell Scripts from LLMs
So, you just got a shiny new shell script from ChatGPT (or Copilot, or your favorite AI buddy). It looks legit. It even feels right. But then that creeping doubt sets in:
"Wait… is this thing safe to run on production?"
Welcome to the world of unit testing shell scripts generated by LLMs — where the stakes are high, `sudo` is dangerous, and one wrong `rm -rf` can ruin your whole day.
In this post, we'll walk through a battle-tested way to safely test and validate scripts that manage real services like PM2, Docker, Nginx, or anything that touches system state.
## The Problem With Trusting LLM Shell Scripts
Large Language Models like ChatGPT are awesome at generating quick shell scripts. But even the best LLM:
- Can make assumptions about your environment
- Might use the wrong binary name (like `pgrep -x PM2` instead of `pm2`)
- Can forget that `systemctl restart docker` isn't always a no-op
Even if the logic is 90% correct, that 10% can:
- Restart your services at the wrong time
- Write to incorrect log paths
- Break idempotency (runs that shouldn't change state do)
According to a recent study on AI-generated code, about 15% of LLM-generated shell scripts contain potentially dangerous commands when run in production environments.
## Strategy 1: Add a `--dry-run` Mode

Every LLM-generated script should support a `--dry-run` flag. This lets you preview what the script would do — without actually doing it.
Here's how you add it:
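One minimal way to sketch the pattern — the `run` wrapper and `DRY_RUN` variable here are illustrative names, not from any particular library:

```shell
#!/usr/bin/env bash
# Route every state-changing command through one wrapper.
DRY_RUN=false
for arg in "$@"; do
  [[ "$arg" == "--dry-run" ]] && DRY_RUN=true
done

run() {
  if [[ "$DRY_RUN" == true ]]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

DRY_RUN=true                      # as if the script were called with --dry-run
run systemctl restart nginx       # prints "[dry-run] systemctl restart nginx"
run rm -rf /var/log/old-builds    # prints the command, deletes nothing
```

Run the whole script once with `--dry-run` and diff the echoed commands against what you expected before running it for real.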
This pattern gives you traceable, reversible operations.
## Strategy 2: Mock External Commands

You don't want `docker restart` or `pm2 resurrect` running during testing. You can override them like this:
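A sketch of one way to do it, by prepending a throwaway directory of stubs to `PATH` (the stub contents and directory are illustrative):

```shell
#!/usr/bin/env bash
# Create a stub `docker` that only echoes, and put it first in PATH.
MOCK_DIR="$(mktemp -d)"

cat > "$MOCK_DIR/docker" <<'EOF'
#!/usr/bin/env bash
echo "[mock] docker $*"
EOF
chmod +x "$MOCK_DIR/docker"

export PATH="$MOCK_DIR:$PATH"

docker restart my-container   # prints "[mock] docker restart my-container"
```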
Now, any call to `docker` will echo a harmless line instead of nuking your containers. Symlink or stub out other dangerous binaries like `systemctl`, `pm2`, and `rm` as needed.
This technique is borrowed from Bash Automated Testing System (BATS), which uses mocking extensively.
## Strategy 3: Use `shellcheck`

LLMs sometimes mess up quoting, variables, or command usage. ShellCheck is your best friend.
Just run:
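For example, assuming `shellcheck` is installed (`apt install shellcheck` or `brew install shellcheck`; the script path here is just for illustration):

```shell
# Write a deliberately buggy script, then lint it.
cat > /tmp/llm-script.sh <<'EOF'
#!/usr/bin/env bash
target=$1
rm -rf $target   # unquoted: shellcheck flags this as SC2086
EOF

if command -v shellcheck >/dev/null; then
  shellcheck /tmp/llm-script.sh || true   # non-zero exit means it found issues
fi
```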
And it'll tell you:
- If variables are unquoted (`"$var"` vs `$var`)
- If commands are used incorrectly
- If your `if` conditions are malformed
It's like a linter, but for your shell’s sanity.
## Strategy 4: Use Functions, Not One Big Blob

Break your script into testable chunks:
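A sketch of the idea — the function names, the `pgrep` check, and the app name are illustrative:

```shell
#!/usr/bin/env bash
# Small, individually testable functions instead of one top-to-bottom blob.

is_pm2_running() {
  # How pm2 shows up in the process list varies by setup; adjust the match.
  pgrep -f pm2 > /dev/null 2>&1
}

restart_app() {
  pm2 restart "$1"
}

main() {
  if is_pm2_running; then
    restart_app "my-app"
  else
    echo "pm2 not running; skipping restart"
  fi
}

# Run main only when executed directly, so a test harness can source
# this file and call (or override) individual functions instead.
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
  main "$@"
fi
```

In a test, you `source` the script, redefine `is_pm2_running` or `restart_app` with harmless fakes, and assert on what `main` does.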
Now you can mock and call these functions directly in a test harness without running the whole script. This modular approach mirrors modern software testing principles.
## Strategy 5: Log Everything. Seriously.

Log every decision point. Why? Because "works on my machine" isn't helpful when the container didn't restart or PM2 silently failed.
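A minimal timestamped logging helper (the log path is an assumption; point it wherever your setup keeps logs):

```shell
#!/usr/bin/env bash
LOG_FILE="${LOG_FILE:-/tmp/deploy.log}"

log() {
  # Timestamped line to both stdout and the log file.
  echo "$(date '+%Y-%m-%d %H:%M:%S') $*" | tee -a "$LOG_FILE"
}

log "checking docker daemon"
if docker info > /dev/null 2>&1; then
  log "docker is up; proceeding"
else
  log "docker unreachable; skipping container restart"
fi
```

When something fails at 3 a.m., the log tells you which branch the script actually took.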
## Strategy 6: Test in a Sandbox

If you've got access to Docker or a VM, spin up a replica and try running the script in that environment. Better to break a fake server than your actual one.
Try:
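Something like this, assuming Docker is installed (the image tag and script name are illustrative):

```shell
# Mount the script read-only into a throwaway Ubuntu container and run it there.
docker run --rm \
  -v "$PWD/deploy.sh:/deploy.sh:ro" \
  ubuntu:22.04 \
  bash /deploy.sh --dry-run
```

The `--rm` flag throws the container away afterward, so whatever the script breaks stays broken only inside the sandbox.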
## Bonus: Tools You Might Love
- BATS: Bash unit testing framework
- shunit2: xUnit-style testing for POSIX shell
- assert.sh: dead-simple shell assertion helper
- shellspec: full-featured, RSpec-like shell test framework
## Final Thoughts: Don't Just Run It — Test It
It's tempting to copy-paste that LLM-generated shell script and run it. But in production environments — especially ones with critical services like PM2 and Nginx — the safer path is to test before trust.
Use dry-run flags. Mock your commands. Run scripts through `shellcheck`. Add logging. Test in Docker. Break things in safe places.
With these strategies, you can confidently validate AI-generated shell scripts and ensure they behave as expected before hitting your production servers.
Nife, a hybrid cloud platform, offers a seamless solution for deploying and managing applications across edge, cloud, and on-premise infrastructure. If you're validating shell scripts that deploy services via Docker, PM2, or Kubernetes, it's worth exploring how Nife can simplify and secure that pipeline.
Its containerized app deployment capabilities allow you to manage complex infrastructure with minimal configuration. Moreover, through features like OIKOS Deployments, you gain automation, rollback support, and a centralized view of distributed app lifecycles — all crucial for testing and observability.