We have an Azure Linux Virtual Machine that has suddenly become inaccessible.

Vipul Om 0 Reputation points
2026-06-17T09:46:26.8133333+00:00

Symptoms:

  • SSH connection to the VM times out.
  • HTTP/HTTPS services hosted on the VM are not accessible.
  • Azure Run Command hangs and does not complete.
  • Redeploy + Reapply has already been attempted but did not resolve the issue.
  • A snapshot of the OS disk has been taken before troubleshooting.

Network checks performed:

  • VM power state is "Running".
  • NSG inbound rules allow ports 22, 80, and 443.
  • Public IP is attached to the VM.
  • Connectivity from the internet still times out.

Serial Console / Boot Diagnostics:

  • The Serial Console shows a Linux kernel panic with: "Kernel panic - not syncing: Attempted to kill init!"

Azure Agent logs also repeatedly report:

"WALinuxAgent launched with command 'python3 -u /usr/sbin/waagent -run-exthandlers' failed with exception: [Errno 2] No such file or directory: 'python3'"

Business impact:

  • This VM hosts live production data and services.
  • The application is currently unavailable to users.
  • We need to recover the existing VM and data with minimal downtime and without data loss if possible.

We request assistance with:

  1. Determining the root cause of the kernel panic.
  2. Restoring access to the VM (SSH or console).
  3. Recovering the operating system and Azure VM Agent.
  4. Advising on the safest recovery procedure while preserving existing data.
Azure Virtual Machines
Azure Virtual Machines

An Azure service that is used to provision Windows and Linux virtual machines.

0 comments No comments

3 answers

Sort by: Most helpful
  1. Manish Deshpande 7,515 Reputation points Microsoft External Staff Moderator
    2026-06-29T11:28:23.9866667+00:00

    Hello @Vipul Om

    Thanks for the detailed write-up you've given us exactly what we need to diagnose this. Good news: the root cause is clear, and there's a structured recovery path that preserves your data.

    Root Cause: Two Related Issues

    The Serial Console output tells the whole story:

    1.Kernel panic - not syncing: Attempted to kill init! This is the primary cause of the outage. The init process (systemd/PID 1) crashed during boot, which is why SSH, HTTP, and Run Command all fail — the OS never fully starts. This is commonly triggered by a missing or corrupted initramfs, a failed kernel/package update, or corrupted system core libraries.

    2.WALinuxAgent `[Errno 2] No such file or directory: 'python3' This is a symptom, not a separate problem. Python3 being missing points to broader system file corruption, consistent with a bad package update or incomplete OS upgrade that left critical binaries in a broken state.

    Since Redeploy + Reapply didn't help (expected — those don't fix OS-level corruption), the right path is an offline disk repair.

    Recovery Plan — Start Here (Fastest Path)

    Step 1: Try ALAR Automated Repair First

    Microsoft's Azure Linux Auto Repair (ALAR) tool can automatically regenerate a missing initramfs and GRUB configuration in an unattended mode — no manual disk swap required. Run the following

    User's image

    This creates a temporary repair VM, fixes the initramfs/GRUB config on a copy of your OS disk, swaps the disk back, and cleans up the repair VM automatically. **Your data is not touched.

    Recover Azure Linux VM from kernel-related boot issues
    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/kernel-related-boot-issues

    Step 2: If Step 1 Doesn't Resolve It — ALAR Kernel Repair

    If the issue is specifically with a broken kernel (not just missing initramfs), use the kernel rollback option:

    User's image This replaces the broken kernel with the previously installed working version.

    Use Azure Linux Auto Repair (ALAR) to fix a Linux VM
    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/repair-linux-vm-using-alar

    Step 3: If ALAR Doesn't Fully Resolve It — Manual Repair VM**

    If the corruption goes deeper (missing core system libraries, which the python3 error hints at), you'll need to go in manually via a repair VM and chroot:

    1. Create a repair VM with the OS disk attached:
    az vm repair create -g <YourResourceGroup> -n <YourVMName> \
      --repair-username repairadm --repair-password 'Repair@1234!'
    
    1. SSH into the repair VM, enter the chroot environment, and investigate:

    User's image

    1. Reinstall missing packages if identified (inside chroot), regenerate initramfs, and update GRUB:
    # Regenerate initramfs (Ubuntu example)
    update-initramfs -u -k all
    update-grub
    # Regenerate initramfs (RHEL/CentOS example)
    dracut -f --regenerate-all
    grub2-mkconfig -o /boot/grub2/grub.cfg
    
    1. Restore the disk back to the original VM:
    az vm repair restore -g <YourResourceGroup> -n <YourVMName>
    

    Repair a Linux VM by using the Azure Virtual Machine repair commands
    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/repair-linux-vm-using-azure-virtual-machine-repair-commands

    Troubleshoot a Linux VM by attaching the OS disk to a recovery VM
    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/troubleshoot-recovery-disks-portal-linux

    After the VM Boots Fix WALinuxAgent

    Once the OS is back up, fix the python3/WALinuxAgent issue:

    User's image

    Troubleshoot the Azure Linux Agent
    https://learn.microsoft.com/en-us/troubleshoot/azure/virtual-machines/linux/linux-azure-guest-agent

    Last Resort: Restore from Your Snapshot

    You mentioned a snapshot was taken before troubleshooting. If all repair attempts fail, you can restore from that snapshot to get the VM back quickly, then investigate the root cause separately on a copy.

    How to restore Azure VM data in Azure portal
    https://learn.microsoft.com/en-us/azure/backup/backup-azure-arm-restore-vms

    Recommended Order of Actions

    User's image

    Start with Step 1 it handles the majority of kernel panic scenarios automatically and is the least invasive option.

    The data on the disk is safe throughout all these steps.

    Hope this gets you back online quickly!

    Thanks,
    Manish.

    Was this answer helpful?


  2. SUNOJ KUMAR YELURU 18,336 Reputation points MVP Volunteer Moderator
    2026-06-18T17:28:17.0766667+00:00

    Hello @Vipul Om,

    Thank you for reaching out Q&A forum.

    Step 1: Check the VM Power State

    In the Azure Portal → Virtual Machines → your VM → Overview, confirm the Status field.

    Status Meaning

    Running OS-level or network issue

    Stopped (deallocated) VM was stopped — just start it

    Failed Platform-level provisioning failure

    Step 2: Open Boot Diagnostics (Serial Console)

    Azure Portal → VM → Support + troubleshooting → Boot diagnostics → Serial log

    This is the fastest way to determine the root cause:

    What you see in the serial log Likely cause

    Kernel panic - not syncing OS/kernel corruption — needs rescue VM

    GRUB menu stuck Boot loader issue

    Give root password for maintenance fstab mount failure

    Started Session / login prompt OS booted fine → network/NSG issue

    cloud-init errors VM agent / provisioning failure

    Blank / no output Host-level issue or GPU/disk failure

    Step 3:

    Kernel Panic or filesystem errors → OS-level recovery needed

    Detach the OS disk and attach it to a Rescue VM as a data disk

    Run fsck, audit /etc/fstab, check /boot for missing initramfs

    See the detailed recovery steps in my previous response above

    VM appears booted but SSH/HTTP still times out → Network issue

    Check these in order:

    NSG rules — Does an inbound rule allow port 22 (SSH) or 80/443 from your source IP? (Not just "any")

    Azure Firewall / UDR — Is there a route table sending traffic to a firewall that may be blocking it?

    sshd service — Use Azure Run Command (if available) to check:

    bash

    systemctl status sshd

    Host-based firewall — Check iptables or firewalld rules inside the VM via Run Command

    VM was recently resized, updated, or a disk was attached → Configuration change issue

    Review Azure Activity Log (Portal → VM → Activity log) for any recent changes

    Check if a failed apt upgrade / yum update broke a kernel or removed Python 3


    If this answers your query, do click Accept Answer and Up-Vote for the same. And, if you have any further query do let us know.

    Was this answer helpful?

    0 comments No comments

  3. Léo 0 Reputation points
    2026-06-17T14:55:24.8433333+00:00

    Hello,

    The issue may come from a broken initramfs or python3 not installed (which can break initramfs)

    You can try to access the VM via Serial Console (bypasses NSG and SSH timeouts) then apply these steps:

    1. When VM is booting up, repeatedly press ESC or Shift to show up GRUB
    2. Select Recovery Mode if available, if not, select Previous Kernel Version
    3. Fix python3/initramfs installation :
      apt update apt install --reinstall python3 python3-minimal initramfs-tools
    4. Update initramfs: update-initramfs -u -f reboot

    If kernel panic still appears after trying to enter Recovery Mode/Previous Kernel in GRUB, the initramfs is fully broken. You should create a rescue VM since you made a snapshot of the disk.

    Was this answer helpful?

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.