Debugging "Failed to start" systemd Services

What You'll Learn

  • Why systemctl start fails with Failed to start and how to isolate the cause
  • Where to look in status and journalctl
  • The usual culprits: exit codes, the ExecStart path, permissions, dependencies, and start-limit

Quick Summary (the diagnostic order)

Most Failed to start errors can be pinned down with this flow. Work top to bottom.

  1. Read systemctl status for state and the Result line
  2. Read the raw failure log with journalctl -xeu
  3. Read the exit code (203/EXEC, 200/CHDIR, 217/USER have systemd-specific meaning)
  4. Check the ExecStart path, exec permission, user, and working directory
  5. Rule out dependencies, start-limit, and a missing daemon-reload after editing

Assumptions (target environment)

  • A systemd-based distro (Ubuntu / Debian / RHEL / CentOS / Fedora, etc.)
  • myapp.service is used as the example name; substitute your own
  • System services (root-managed) are the focus. For user services, add --user

Where Do I Start Looking?

Conclusion: Start with systemctl status <service>. The Active: state, the exit code on the Main PID line, and the last ~10 log lines usually point you at the cause. Whether Active: shows failed or activating (auto-restart) decides which path you take next.

$ systemctl status myapp
× myapp.service - My Application
     Loaded: loaded (/etc/systemd/system/myapp.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Fri 2026-06-05 10:00:01 UTC; 5s ago
   Main PID: 12345 (code=exited, status=203/EXEC)
        CPU: 4ms

Jun 05 10:00:01 host systemd[1]: myapp.service: Main process exited, code=exited, status=203/EXEC
Jun 05 10:00:01 host systemd[1]: myapp.service: Failed with result 'exit-code'.

What to read:

  • Loaded: — the unit path and enabled / disabled. If it shows not-found, the unit itself isn't being found.
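  • Active: failed means the start attempt ended in failure (the process exited, was killed, or never ran) and systemd is not retrying. activating (auto-restart) means the process exited and systemd is waiting to restart it per Restart=; if this keeps repeating, the unit is in a restart loop.
  • Result: exit-code (non-zero exit), timeout (start did not complete in time), signal (killed by a signal), or start-limit-hit (restarted too often).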
  • status=NNN/NAME — the exit code. As shown below, 2xx values carry systemd-specific meaning.

systemctl status shows only the last 10 journal lines and ellipsizes long lines to the terminal width (-l disables the truncation, -n N shows more lines). Read the full text with journalctl. The Result: value is your first branch point.
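If you want the same fields without parsing the status text, systemctl show prints them as KEY=VALUE lines. The sketch below is a minimal illustration, assuming bash: first_branch is a hypothetical helper name, and the hints it prints are this article's suggestions rather than anything systemd emits.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads KEY=VALUE lines from systemctl show on stdin
# and prints which branch of this article to follow. Property order does
# not matter, since each line is matched by key.
#   systemctl show myapp -p ActiveState -p SubState -p Result -p ExecMainStatus | first_branch
first_branch() {
  local key value active="" sub="" result="" status=""
  while IFS='=' read -r key value; do
    case "$key" in
      ActiveState)    active="$value" ;;
      SubState)       sub="$value" ;;
      Result)         result="$value" ;;
      ExecMainStatus) status="$value" ;;
    esac
  done
  # An auto-restart state takes priority: the unit has not given up yet.
  if [[ "$active" == activating && "$sub" == auto-restart ]]; then
    echo "restart loop: read journalctl for the exit reason"
    return 0
  fi
  case "$result" in
    exit-code)        echo "non-zero exit (status=$status): look up the exit code" ;;
    timeout)          echo "start timed out: check Type= and TimeoutStartSec=" ;;
    signal|core-dump) echo "killed by a signal: read journalctl" ;;
    start-limit-hit)  echo "rate-limited: find the earlier failure, then reset-failed" ;;
    success)          echo "no failure recorded (ActiveState=$active)" ;;
    *)                echo "result '$result': read journalctl -xeu" ;;
  esac
}
```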

How Do I Read the Failure Log with journalctl?

Conclusion: journalctl -xeu <service> is the key command. -u limits to the service, -e jumps to the end, -x adds systemd's explanatory hints. The app's own error (command not found, Permission denied, bind: address already in use) lands here.

# Read the tail, limited to the service (most common)
$ journalctl -xeu myapp

# Narrow to the last few minutes
$ journalctl -u myapp --since "5 min ago"

# Limit to the current boot
$ journalctl -b -u myapp

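For a systemd-origin code like status=203/EXEC, the status line gives only the code; the journal usually names the path systemd tried to execute and the underlying error (for example No such file or directory or Permission denied). Cross-check both.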

When the log is empty or stale:

  • Missing daemon-reload: you edited the unit but it wasn't applied (see below).
  • Clock skew: if --since behaves oddly, suspect the server time.
  • User services: use journalctl --user -u myapp. A plain journalctl -u myapp looks for a system unit and will not match a user service.
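Once you have the messages, a few strings recur often enough to triage by pattern. A minimal sketch, assuming bash; journal_hints is a hypothetical helper and its pattern list is illustrative only. The -o cat output mode prints just the message text, which keeps the matching simple.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a few well-known error strings in journal
# messages to a next step. The pattern list is illustrative, not complete.
#   journalctl -b -u myapp -o cat --no-pager | journal_hints | sort -u
journal_hints() {
  local line
  while IFS= read -r line; do
    case "$line" in
      *"Permission denied"*)         echo "permissions: check User= and file/directory modes" ;;
      *"address already in use"*)    echo "port conflict: another process holds the port" ;;
      *"No such file or directory"*) echo "missing path: check ExecStart=, WorkingDirectory=, config paths" ;;
      *"command not found"*)         echo "missing command: use absolute paths in scripts" ;;
    esac
  done
}
```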

What Do exit codes 203 / 200 / 217 Mean?

Conclusion: systemd assigns dedicated exit codes 200-243 to failures during start-up setup. The common ones are 203/EXEC (executable missing or not executable), 200/CHDIR (WorkingDirectory does not exist), and 217/USER (the User= account does not exist). They are distinct from the generic codes an application returns.

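Read status=NNN/NAME on the Main PID line. A 2xx value signals that systemd failed while setting up the process, before the application code ran, so the cause nearly always lies in the unit configuration or something it references (a path, a directory, an account).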

status      Name    Typical cause
203/EXEC    EXEC    Wrong ExecStart path / no exec permission / bad shebang
200/CHDIR   CHDIR   The WorkingDirectory= directory does not exist, or the run user cannot enter it
217/USER    USER    The user named in User= does not exist
1+          (app)   A generic error from the app itself; read the journal body

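The table can be encoded as a small lookup for scripts or runbooks. A sketch under the assumption that you pass the status=NNN value (with or without the /NAME suffix); explain_status is a hypothetical name, and it names only the three codes above, treating the rest of 200-243 generically.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a Main PID status value such as 203/EXEC.
# Only CHDIR, EXEC and USER are named; for the other systemd codes see the
# exit-code table in systemd.exec(5).
explain_status() {
  local code="${1%%/*}"   # accept "203" or "203/EXEC"
  if [[ ! "$code" =~ ^[0-9]+$ ]]; then
    echo "not a numeric status: $1"
    return 1
  fi
  case "$((10#$code))" in
    0)   echo "success" ;;
    200) echo "CHDIR: WorkingDirectory= missing or not accessible" ;;
    203) echo "EXEC: ExecStart= binary missing, not executable, or bad shebang" ;;
    217) echo "USER: the User= account does not exist" ;;
    *)
      # 10# forces base 10 so values like "08" are not read as octal.
      if (( 10#$code >= 200 && 10#$code <= 243 )); then
        echo "systemd setup failure ($code): see systemd.exec(5)"
      else
        echo "application exit code $code: read the journal"
      fi
      ;;
  esac
}
```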
# Isolating 203/EXEC: check the path and exec permission
$ systemctl cat myapp | grep ExecStart
ExecStart=/opt/myapp/bin/server --config /etc/myapp.conf

$ ls -l /opt/myapp/bin/server      # exists? has the x bit?
$ head -1 /opt/myapp/bin/server    # if a script, check the shebang

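Give ExecStart an absolute path. Since systemd 239, a bare name such as server is looked up only in a fixed, compiled-in list of standard directories (/usr/bin and similar), never in your shell's PATH, so a binary under /opt or in a directory you added to PATH is not found. That failure, the service equivalent of command not found, surfaces as 203/EXEC.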
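The checks above can be bundled into one pre-flight function. A minimal sketch, assuming bash; check_exec is a hypothetical name, you pass it the first word of ExecStart= by hand, and its -x test reflects the user running it (typically root), not the service's User=.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pre-flight the executable named first in ExecStart=,
# checking the usual 203/EXEC causes. Pass the path by hand, for example:
#   check_exec /opt/myapp/bin/server
# Note: -x is evaluated for the user running this check, not for User=.
check_exec() {
  local path="$1" first="" interp
  [[ "$path" == /* ]] || { echo "not absolute: $path"; return 1; }
  [[ -e "$path" ]]    || { echo "missing: $path"; return 1; }
  [[ -f "$path" ]]    || { echo "not a regular file: $path"; return 1; }
  [[ -x "$path" ]]    || { echo "not executable: $path"; return 1; }
  # Scripts: the interpreter on the shebang line must exist as well.
  if [[ "$(head -c 2 "$path")" == '#!' ]]; then
    IFS= read -r first < "$path" || true
    if [[ "$first" == *$'\r' ]]; then
      echo "CRLF line ending on shebang line: $path"
      return 1
    fi
    interp="${first:2}"
    interp="${interp#"${interp%%[![:space:]]*}"}"   # drop leading spaces
    interp="${interp%%[[:space:]]*}"                  # keep the first word
    [[ -x "$interp" ]] || { echo "bad shebang interpreter: $interp"; return 1; }
  fi
  echo "ok: $path"
}
```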

How Do I Check the Unit's Content and Syntax?

Conclusion: Don't rely on the single file you edited. systemctl cat <service> prints the unit file together with every drop-in (*.d/*.conf) that overrides it, each under a comment line naming its path. Validate syntax mechanically with systemd-analyze verify.

# Show the effective unit, including drop-ins
$ systemctl cat myapp

# Validate the unit's syntax and references
$ systemd-analyze verify /etc/systemd/system/myapp.service

systemd-analyze verify warns about unknown directives, unresolvable dependencies, and a missing ExecStart. No output means no syntax problem.

Common configuration mistakes:

  • Type= mismatch: with Type=forking, systemd treats start-up as complete only when the launched process exits, leaving a forked daemon behind. A program that stays in the foreground never exits, so start-up never completes and the unit fails with Result: timeout. If your process does not fork and daemonize, use Type=simple (the default).
  • Non-absolute ExecStart: as above, use an absolute path.
  • Missing environment variables: services do not read your interactive shell's .bashrc or .profile. Set variables explicitly with Environment= or EnvironmentFile=.

If you do use Type=forking, add a PIDFile= as well. Without it, systemd may track the wrong main process and report active while the real process has already died. When in doubt, start from Type=simple.
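For reference, a minimal [Service] fragment for each case. The ExecStart line reuses this article's example; APP_ENV, /etc/myapp/env, the --daemon flag, and /run/myapp.pid are hypothetical placeholders for whatever your program actually uses.

```ini
# Foreground process (most modern daemons): Type=simple, the default.
[Service]
Type=simple
ExecStart=/opt/myapp/bin/server --config /etc/myapp.conf
# Variables your interactive shell would have set must be declared here.
# (APP_ENV and /etc/myapp/env are hypothetical examples.)
Environment=APP_ENV=production
EnvironmentFile=/etc/myapp/env

# Alternative, only if the program really forks and its parent exits:
# [Service]
# Type=forking
# ExecStart=/opt/myapp/bin/server --daemon --config /etc/myapp.conf
# PIDFile=/run/myapp.pid
```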

Why Doesn't My Edit Take Effect?

Conclusion: systemd caches unit files in memory. If you don't run systemctl daemon-reload after editing, it starts with the old definition. This is the classic "I fixed it but get the same error" cause.

$ sudo vim /etc/systemd/system/myapp.service
$ sudo systemctl daemon-reload      # <- skip this and your edit is ignored
$ sudo systemctl restart myapp

When daemon-reload is missing, systemctl cat shows the edited content while the start-up behavior still uses the old definition — a confusing mismatch. Make edit -> daemon-reload -> restart a single habit.

Instead of editing the unit with vim directly, use systemctl edit myapp (creates a drop-in) or systemctl edit --full myapp (edits the whole unit). On save it runs the equivalent of daemon-reload automatically, structurally preventing the missed-reload mistake.
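To check whether systemd is still working from a stale copy of a unit, query its NeedDaemonReload property (systemctl status also prints a warning when a unit changed on disk). A minimal sketch, assuming bash; stale_units and the SYSTEMCTL override are hypothetical, the latter only so the logic can be tried without systemd.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report units whose on-disk files changed since
# systemd last loaded them (NeedDaemonReload=yes).
# SYSTEMCTL is a hypothetical override hook; it defaults to systemctl.
#   stale_units myapp nginx
stale_units() {
  local sc="${SYSTEMCTL:-systemctl}" unit
  for unit in "$@"; do
    if [[ "$("$sc" show -p NeedDaemonReload --value "$unit")" == yes ]]; then
      echo "$unit: loaded definition is stale; run systemctl daemon-reload"
    fi
  done
}
```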

Isolating Permissions, Dependencies, and Timeouts

Conclusion: When the app runs by hand but fails as a service, the usual causes are insufficient privileges for the run user, a dependency service not being up yet, or a start-up timeout. Check User= permissions, After=/Requires=, and TimeoutStartSec in order.

Permissions (works by hand, Permission denied as a service)

The service runs as the account named in User= (root if User= is unset), not as the user who typed systemctl start. If that differs from the user you tested with, access to files, ports, or sockets can fail with Permission denied.

# Reproduce manually as the service's run user
$ sudo -u myappuser /opt/myapp/bin/server --config /etc/myapp.conf

If the error reproduces, the cause is on the app/permission side. If not, suspect the unit configuration. For permission basics, see Fixing Permission denied.
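sudo -u still runs from your current directory with sudo's environment, not the unit's. An alternative that stays closer to how the service actually starts is a transient unit via systemd-run, which accepts unit properties with -p. A sketch, assuming root and this article's example values (myappuser, with /opt/myapp as a hypothetical working directory); repro_as_service and the RUN dry-run hook are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command in a transient service unit with the
# given User= and WorkingDirectory=, so it starts with a service
# environment rather than your login shell's. Run as root.
# RUN is a hypothetical dry-run hook (RUN=echo prints the command instead).
#   repro_as_service myappuser /opt/myapp /opt/myapp/bin/server --config /etc/myapp.conf
repro_as_service() {
  local user="$1" workdir="$2"
  shift 2
  ${RUN:-} systemd-run --pty \
    -p "User=${user}" \
    -p "WorkingDirectory=${workdir}" \
    "$@"
}
```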

Dependencies (a required service isn't up yet)

If the service needs a database or a working network but its start order isn't declared, it can start before them and die on a connection failure right after start.

[Unit]
After=network-online.target postgresql.service
Wants=network-online.target

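After= controls ordering only; Wants= (soft) or Requires= (hard) is what pulls the other unit in. Use them together, as the example does for network-online.target; postgresql.service above is only ordered, so it must be enabled on its own. Revisit this for "the target isn't there yet" failures.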
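When ordering is declared but the dependency reports active before it actually accepts connections, some setups add a readiness guard as an ExecStartPre= step. A minimal sketch, assuming bash (it relies on bash's /dev/tcp redirection); wait_for_port and the install path in the comment are hypothetical. The wait counts toward TimeoutStartSec=, so keep its timeout below that.

```bash
#!/usr/bin/env bash
# Hypothetical readiness guard, e.g. installed as /usr/local/bin/wait-for-port
# and referenced as:
#   ExecStartPre=/usr/local/bin/wait-for-port 127.0.0.1 5432 30
# As a standalone script, end the file with:  wait_for_port "$@"
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}"
  local deadline=$(( SECONDS + timeout ))
  while (( SECONDS < deadline )); do
    # The subshell opens and immediately closes a TCP connection.
    if ( : > "/dev/tcp/${host}/${port}" ) 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}
```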

Timeouts (Result: timeout)

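If status shows timeout, start-up did not complete within TimeoutStartSec= (90 seconds by default). What counts as complete depends on Type=: Type=simple is done as soon as the process is spawned, so timeouts mostly hit Type=forking and Type=notify units, and slow ExecStartPre= commands count toward the limit too. For services with heavy initialization, raise TimeoutStartSec= or use Type=notify to signal readiness explicitly.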
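A fragment showing both options; the 300-second value is illustrative. With Type=notify the process must send READY=1 itself, through sd_notify(3) in code or systemd-notify --ready from a shell wrapper. The wrapper case also needs NotifyAccess=all, because the notification comes from a child process rather than the main PID.

```ini
[Service]
# Option 1: allow a slower start (illustrative value; "infinity" disables the limit).
TimeoutStartSec=300

# Option 2: start-up completes only when the process reports READY=1.
# Type=notify
# NotifyAccess=all
```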

How Do I Handle the "start request repeated too quickly" Loop?

Conclusion: After a set number of failures in a short window (default StartLimitBurst=5 within StartLimitIntervalSec=10s), systemd suppresses further starts and reports start-limit-hit. Fix the root cause, then clear the counter with systemctl reset-failed.

myapp.service: Start request repeated too quickly.
myapp.service: Failed with result 'start-limit-hit'.

This message is a result, not the cause. The real reason is in the preceding failure logs. Steps:

# 1) Scroll back to the real failure reason
$ journalctl -xeu myapp

# 2) Fix the cause (ExecStart / permissions / dependencies, etc.)

# 3) Clear the failure counter, then start
$ sudo systemctl reset-failed myapp
$ sudo systemctl start myapp

reset-failed only clears the counter; it does not fix the cause. If you don't address the failure first, you re-enter the same loop and start-limit-hit returns. Keep the order.
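If a service legitimately needs more retries, the limits are ordinary settings. An illustrative fragment (the numbers are examples, not recommendations): StartLimitIntervalSec= and StartLimitBurst= belong in [Unit], while Restart= and RestartSec= belong in [Service]. Loosening the limit only delays start-limit-hit; it does not fix the failure.

```ini
[Unit]
# Allow at most 3 starts within 60 seconds (illustrative values).
StartLimitIntervalSec=60
StartLimitBurst=3

[Service]
Restart=on-failure
# Pause between restarts so a persistent failure does not exhaust the
# burst within seconds (the default RestartSec is 100ms).
RestartSec=5
```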

Diagnostic Checklist

Conclusion: Work top to bottom — status -> journalctl -> exit code -> unit content -> daemon-reload -> permissions/dependencies -> start-limit — and you'll pin down nearly any Failed to start.

Check each in order.

  • [ ] Read Active: / Result: / status=NNN from systemctl status myapp
  • [ ] Checked the app's own error in journalctl -xeu myapp
  • [ ] Identified the exit code (203/EXEC, 200/CHDIR, 217/USER are unit-config issues)
  • [ ] Confirmed the effective unit with systemctl cat myapp; ExecStart is an absolute path
  • [ ] Validated syntax with systemd-analyze verify
  • [ ] Ran systemctl daemon-reload after editing the unit
  • [ ] Type= matches the process behavior (whether it forks)
  • [ ] Reproduced as the service user (sudo -u <User>) to isolate permissions; checked After=/Wants= ordering and TimeoutStartSec=
  • [ ] Cleared start-limit-hit with reset-failed only after fixing the cause
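To run the checklist's read-only commands in one go, a small sketch that prints them for a given unit; diag_commands is a hypothetical name. Pipe its output to bash, or copy the lines one at a time. It deliberately leaves out daemon-reload, restart, and reset-failed, which change state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the read-only diagnostic commands for a unit,
# in checklist order. Unit names use a restricted character set, so they
# are interpolated unquoted.
#   diag_commands myapp | bash
diag_commands() {
  local u="${1:?usage: diag_commands <unit>}"
  printf '%s\n' \
    "systemctl status -l -n 20 --no-pager ${u}" \
    "journalctl -xeu ${u} --no-pager" \
    "systemctl show ${u} -p ActiveState -p Result -p ExecMainStatus -p NRestarts -p NeedDaemonReload" \
    "systemctl cat ${u}" \
    "systemd-analyze verify \"\$(systemctl show -p FragmentPath --value ${u})\""
}
```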
