FPGA Administration

Overview

This guide pertains to the AMD/Xilinx Alveo U55C FPGAs used on the NRP cluster.

After the 2026-05 hardware migration the FPGAs were consolidated onto fewer hosts. The cards previously distributed across node-1-1..4 and node-2-1..4 (with their JTAG cables routed to node-2-10) were moved to dense FPGA hosts. All JTAG cables are now plugged into the same host as the card itself — there is no longer a JTAG concentrator.

Current FPGA inventory spreadsheet: the live mapping (serial / iSerial / node / PCI BDF) is regenerated by the discovery tooling in the fpga-testing repo. The committed reference is at FPGA Inventory (Google Sheets).

Current FPGA hosts

Host	Cards	XRT version	Shell	Notes
`node-2-6.sdsc.optiputer.net`	2	2.15.225 (2023.1)	`xilinx_u55c_gen3x16_xdma_base_3`	also has one non-Alveo FT232 (USB-serial console)
`node-2-7.sdsc.optiputer.net`	2	2.15.225	`xilinx_u55c_gen3x16_xdma_base_3`
`node-2-8.sdsc.optiputer.net`	2	2.15.225	`xilinx_u55c_gen3x16_xdma_base_3`
`node-2-9.sdsc.optiputer.net`	1	2.16.204 (2023.2)	`xilinx_u55c_gen3x16_xdma_base_3`
`node-2-10.sdsc.optiputer.net`	1	2.16.204	`xilinx_u55c_gen3x16_xdma_base_3`	was the old JTAG concentrator; only 1 local card today
`node-2-11.sdsc.optiputer.net`	2	2.16.204	`xilinx_u55c_gen3x16_xdma_base_3`	flashed from custom shell to stock 2026-05-13
`k8s-stratix-10-02.sdsc.optiputer.net`	7	2.16.204	`xilinx_u55c_gen3x16_xdma_base_3`	joined to cluster on 2026-06-03
`prp-gpu-2.t2.ucsd.edu`	6	2.19.194 (2025.1)	`xilinx_u55c_gen3x16_xdma_base_3`	Ubuntu 24.04 / kernel 6.8 — must use XRT 2024.x+; older XRT will not build
`k8s-stratix-10-01.sdsc.optiputer.net`	—	—	—	OFFLINE as of writing — likely holds the 9 unaccounted-for cards from the inventory

Total: 23 paired cards online (2 + 2 + 2 + 1 + 1 + 2 + 7 + 6). The still-offline k8s-stratix-10-01 likely accounts for the remaining 9 of the original 32.

Services on every FPGA host

Four cluster-level components together make an FPGA host useful. The first two (XRT and xilinx-device-plugin-daemonset) are strictly required — without them the host doesn’t advertise cards and no FPGA-bearing pod can run on it. KubeVirt virt-handler is required if you want pods or VMs to access JTAG via xilinx.com/fpga_jtag; if your only workloads are .xclbin programming via the PCIe-side resource you can technically skip it. smarter-device-manager is optional — strictly speaking the Vivado / .xclbin / JTAG flow does not need it, but specific workflows like ESnet SmartNIC sn-cli (UART-only access to the satellite controller) and VFIO PCIe passthrough do require it. In practice we run all four on every FPGA host so users don’t have to remember per-host capability differences.

1. XRT (Xilinx Runtime) installed on the host OS — required

XRT is not containerised; it must be installed in the host’s userland. Two reasons:

It ships the xclmgmt and xocl kernel modules (built via DKMS at install time). Without these, the cards are present in lspci but no driver is bound: /sys/bus/pci/drivers/xclmgmt/ doesn’t exist, the FPGAs are invisible to anything that doesn’t poke raw PCIe, and the device plugin DaemonSet won’t find any cards to advertise.
It ships the xbutil/xbmgmt host-side userspace tools and the libraries the device plugin links against. The device plugin reads the cards’ XMC serials, status, and platform UUIDs by talking to XRT via the same shared libraries (libxrt_core/libxrt_coreutil); a containerised plugin can do that against the host’s /dev/xclmgmt* only because XRT laid down those device nodes.

Required minimum: a working /opt/xilinx/xrt/setup.sh, lsmod | grep -E "^xocl|^xclmgmt" showing both modules, and xbutil examine reporting Device Ready: Yes for each card. Version per host listed in the table above; see XRT installation below for which deb to use on which Ubuntu release.
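
The module half of that minimum can be checked mechanically. A minimal sketch, assuming only the standard lsmod column layout (module name in the first column); xrt_modules_loaded is a hypothetical helper name, not part of XRT, and the sample lsmod text is made up for the demo:

```shell
# Succeeds only if both XRT kernel modules appear in lsmod-style text on stdin.
# xrt_modules_loaded is a hypothetical helper name, not an XRT tool.
xrt_modules_loaded() {
  awk '$1 == "xocl" { u = 1 } $1 == "xclmgmt" { m = 1 } END { exit !(u && m) }'
}

# On a real host: lsmod | xrt_modules_loaded && echo "modules OK"
# Demo on made-up lsmod text (sizes and use counts are illustrative only):
sample='Module                  Size  Used by
xocl                 1000000  0
xclmgmt               500000  0'
if printf '%s\n' "$sample" | xrt_modules_loaded; then
  echo "modules OK"
else
  echo "modules missing"
fi
```

This matches the module name exactly, which is slightly stricter than the prefix grep quoted above. It says nothing about Device Ready; that still needs xbutil examine.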

2. `xilinx-device-plugin-daemonset` (namespace `kube-system`) — required

This DaemonSet is what registers the FPGAs themselves with kubelet. Without it, the host can have XRT and 7 happy cards and kubectl describe node still shows zero amd.com/xilinx_u55c_* resources, so no pod can ever schedule onto them. Specifically:

Runs the AMD k8s-device-plugin binary (public.ecr.aws/xilinx_dcg/k8s-device-plugin:1.1.0), which talks the k8s device-plugin gRPC protocol over the socket /var/lib/kubelet/device-plugins/xilinx_u55c_gen3x16_xdma_base_3-0-fpga.sock.
Reads card inventory by enumerating /sys/bus/pci/drivers/xclmgmt/ and walking each card’s XMC serial_num. (You can see this in the pod logs: Check SeialNums arry: [XFL1H4XIZQLE XFL1GHBRTQ42] … Sending 2 device(s) [0000:a1:00.1, 0000:21:00.1] to kubelet.)
Advertises one resource: amd.com/xilinx_u55c_gen3x16_xdma_base_3-0, with one count per ready card. When a pod requests N of it, the plugin allocates specific PCIe BDFs and tells kubelet to mount the matching /dev/xclmgmt* / /dev/dri/renderD* into the container.

The DaemonSet has a hardcoded nodeAffinity host list in addition to the nodeSelector: fpga=true — both have to permit the node or the plugin won’t run there. See Kubernetes integration for the two-step onboarding ritual.

3. `smarter-device-manager` (namespace `kube-system`) — optional, but install it anyway

This one is not needed for the standard FPGA flow — loading .xclbins through xbutil program --user, flashing via xilinx.com/fpga_jtag (KubeVirt), running Vivado, etc. all work without it. It’s needed for two specific workflows that share an FPGA host:

ESnet SmartNIC sn-cli — talks to the on-card satellite controller over UART (/dev/ttyUSBN), not raw USB. Without smarter-device-manager, the esnet Coder template (and deploy-esnet in the templates repo) can’t allocate smarter-devices/ttyUSB* and ESnet pods fail to schedule.
VFIO PCIe passthrough from regular pods (DPDK-style). Needs smarter-devices/vfio (/dev/vfio group device).

It also exposes smarter-devices/fuse, which is not FPGA-specific.

Because the cost of running the DaemonSet is tiny and we have ESnet users on these hosts, install it on every FPGA host. The fleet-wide install just means labelling: kubectl label node <fqdn> smarter-device-manager=enabled --overwrite.

The smarter-device-manager DaemonSet exposes specific /dev/... files as schedulable k8s resources via the same device-plugin gRPC protocol. For the FPGA side:

smarter-devices/ttyUSB0, ttyUSB1, ttyUSB5, ttyUSB10, ttyUSB11, ttyUSB15 — the FT4232H UART channels. Each U55C’s onboard FTDI exposes four UARTs as /dev/ttyUSBN. Lets pods talk to the cards’ satellite controllers over UART (used by ESnet SmartNIC’s sn-cli, by xsdb’s serial backend, by anything talking to the SC for serial console).
smarter-devices/vfio — the /dev/vfio group device. Required for any pod doing VFIO PCIe passthrough.

The configmap is in kube-system/smarter-device-manager (a single conf.yaml with devicematch: regexes). The FPGA regex is ^ttyUSB[0-15]*$. The character class [0-15] matches only the digits 0, 1 and 5 rather than the numbers 0 through 15, which is why only the names listed above are advertised. The class is buggy, but it is left that way intentionally today.
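
To see what that pattern accepts, the sketch below runs it over ttyUSB0 through ttyUSB15. It uses grep -E as a stand-in for whatever regex engine smarter-device-manager uses (an assumption; the character-class semantics shown here are common to both), and matching_ttyusb is a hypothetical helper name:

```shell
# Print the names among ttyUSB0..ttyUSB15 that ^ttyUSB[0-15]*$ accepts.
# [0-15] is a character class: the range 0-1 plus the literal 5.
# matching_ttyusb is a hypothetical helper; grep -E stands in for the
# regex engine used by smarter-device-manager (assumption).
matching_ttyusb() {
  n=0
  while [ "$n" -le 15 ]; do
    echo "ttyUSB$n"
    n=$((n + 1))
  done | LC_ALL=C grep -E '^ttyUSB[0-15]*$' | paste -sd' ' -
}

matching_ttyusb
# prints: ttyUSB0 ttyUSB1 ttyUSB5 ttyUSB10 ttyUSB11 ttyUSB15
```

The result is the same six names listed above: any suffix built only from the digits 0, 1 and 5 passes, and everything else (ttyUSB2, ttyUSB3, ttyUSB12, ...) is skipped.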

Note: xilinx.com/fpga_jtag (KubeVirt) provides raw USB at /dev/bus/usb/<bus>/<dev>, which is enough for any JTAG operation including reading the SC via libftdi. So a user who needs both JTAG TAP and SC UART can use the KubeVirt resource alone and bypass smarter-devices/ttyUSB* entirely — smarter-devices/ttyUSB* is only the right answer when the pod wants the kernel-cooked tty interface (e.g. picocom /dev/ttyUSB1) without raw-USB privileges.

4. KubeVirt `virt-handler` (namespace `kubevirt`) — for `xilinx.com/fpga_jtag` — required if you want JTAG access from pods/VMs

This one isn’t FPGA-specific (it’s KubeVirt’s normal node agent), but it’s the component that actually registers xilinx.com/fpga_jtag with kubelet, based on the cluster’s KubeVirt CR permittedHostDevices.usb config:

permittedHostDevices:
  usb:
  - resourceName: xilinx.com/fpga_jtag
    selectors:
    - vendor: "0403"
      product: "6011"

A pod that requests xilinx.com/fpga_jtag: 1 gets /dev/bus/usb/<bus>/<dev> for one of the FT4232H devices on the host — the raw-USB device file needed for JTAG operations (Vivado hw_server, OpenOCD, xbmgmt program). Without virt-handler, the resource is simply not advertised; without the permittedHostDevices.usb entry, the resource exists but matches no USB devices.

Adding a per-iSerial resource (rare; only for KubeVirt VMs that want to pin to a specific card). Regular Pods can’t pin to a specific FT4232H by iSerial — the generic xilinx.com/fpga_jtag resource is a pool keyed only on vendor:0403/product:6011. For most users that’s fine: the user-doc Example 3 shows how to read the allocated cable’s iSerial at runtime and pick the matching FPGA BDF in the pod’s code.

The exception is KubeVirt VMs, where the VM has to bind the USB device at boot — the runtime-pairing trick doesn’t apply because there’s no startup script that can “pick” between two attached USB devices. If a user files an nrp-help ticket asking for a VM with a specific card by serial, add a per-serial entry to the KubeVirt CR:

kubectl edit kubevirt -n kubevirt kubevirt

Append (or insert alongside the existing xilinx.com/fpga_jtag entry) under .spec.configuration.permittedHostDevices.usb:

- resourceName: xilinx.com/fpga_jtag_XFL1GHBRTQ42        # <- card's iSerial
  selectors:
  - vendor:  "0403"
    product: "6011"
    serial:  "XFL1GHBRTQ42"

After the edit, virt-handler picks it up automatically. Verify the new resource appears on the node that has that card:

HOST=node-2-7.sdsc.optiputer.net
kubectl get node "$HOST" -o jsonpath='{.status.allocatable}' | jq 'with_entries(select(.key | test("fpga_jtag")))'
# should now include "xilinx.com/fpga_jtag_XFL1GHBRTQ42": "1"

Tell the user to reference the per-serial resource in their VM’s spec.template.spec.domain.devices.hostDevices.deviceName. Don’t remove the generic xilinx.com/fpga_jtag entry — leaving it lets other pods/VMs still get unspecified-card allocations.

Summary: dependency for what

Component	Status	Without it you lose…
XRT (host)	required	`xclmgmt`/`xocl` modules; everything below depends on this
`xilinx-device-plugin-daemonset`	required	The `amd.com/xilinx_u55c_*` resource → no FPGA pods at all on the node
KubeVirt `virt-handler` + CR	required for JTAG access	`xilinx.com/fpga_jtag` → no JTAG TAP access from pods/VMs (Vivado `hw_server`/OpenOCD/`xbmgmt program`)
`smarter-device-manager`	optional (recommended; needed for ESnet `sn-cli` and VFIO)	`smarter-devices/ttyUSB*` → no UART-only access from pods; `smarter-devices/vfio` → no VFIO

Xilinx FlexLM license server (`xilinx-dev` namespace)

Vivado, Vitis, and the AMD/Xilinx IP cores users build with on the cluster are gated by FlexLM licenses. We run a single in-cluster lmgrd that all Vivado/Vitis pods point at via 2100@xilinxd.xilinx-dev. This section is what cluster admins need to keep that server running and the license current.

What’s deployed

Object	Namespace	Purpose
`Deployment/xilinxd`	`xilinx-dev`	One pod running `lmgrd -c /etc/xilinx/xilinx.lic -z` (FlexLM license daemon + `xilinxd` vendor daemon)
`Service/xilinxd` (ClusterIP)	`xilinx-dev`	Exposes ports 2100 (lmgrd), 27000 (vendor daemon), 6978 (alt vendor port)
`ConfigMap/xilinx-lic` (key `xilinx.lic`)	`xilinx-dev`	The actual FlexLM license file; mounted at `/etc/xilinx/xilinx.lic` in the pod
`Secret/regcred`	`xilinx-dev`	Pull secret for the private `gitlab-registry.nrp-nautilus.io/nrp/xilinxd` image

The DNS name xilinxd.xilinx-dev resolves (cluster-internal) to the service ClusterIP. So any pod in any namespace can use the standard FlexLM port-at-host form: 2100@xilinxd.xilinx-dev.

Why the MAC address is pinned

The deployment’s container command starts with ifconfig eth0 hw ether b6:e1:09:31:ba:0e. Do not change this. FlexLM licenses from AMD are tied to the host’s MAC (“hostid”), and the license file’s SERVER line is:

SERVER xilinxd b6e10931ba0e 2100

If the pod’s eth0 MAC doesn’t match b6e10931ba0e, lmgrd will refuse to serve and every Vivado client will report “Cannot find SERVER hostname in network database.” That’s also why the deployment carries securityContext.capabilities.add: ["NET_ADMIN"] — it needs CAP_NET_ADMIN to rewrite the eth0 MAC on each pod start.
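
The match between the pinned MAC and the SERVER line can be checked before rolling anything. A minimal sketch; mac_to_hostid and license_hostid are hypothetical helper names, and the two values are the ones quoted above:

```shell
# Hypothetical helpers for comparing the pinned MAC with the license hostid.

# Convert a colon-separated MAC (as printed by `ip link`) to FlexLM hostid
# form: lowercase hex, no separators.
mac_to_hostid() {
  printf '%s' "$1" | tr -d ':' | tr 'A-F' 'a-f'
}

# Print the hostid (third field) of the first SERVER line in license text
# read from stdin.
license_hostid() {
  awk '$1 == "SERVER" { print $3; exit }'
}

pinned_mac='b6:e1:09:31:ba:0e'
server_line='SERVER xilinxd b6e10931ba0e 2100'

if [ "$(mac_to_hostid "$pinned_mac")" = "$(printf '%s\n' "$server_line" | license_hostid)" ]; then
  echo "hostid matches"
else
  echo "hostid MISMATCH"
fi
```

Against the live cluster, the license text could be piped in from the configmap (kubectl -n xilinx-dev get configmap xilinx-lic -o jsonpath='{.data.xilinx\.lic}' | license_hostid) and the MAC read from the running pod's eth0.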

Concretely: if you ever rebuild the image, rescale, or move to a new node, the MAC override in the command is what keeps the license valid across reschedules. Don’t drop it; don’t replace it with a MAC=... env var unless you also adjust the entrypoint.

Updating the license (every ~90 days)

AMD’s evaluation/university-program licenses for the relevant IP cores expire in roughly 90-day cycles (the precise dates vary per feature; check the START=… and expiry dates in the current license). When lmgrd starts logging “license expired” or Vivado clients report Feature unavailable, follow this procedure:

Back up the current license file first so you have something to roll back to if the new one is malformed or has fewer features than the old one:
```
kubectl -n xilinx-dev get configmap xilinx-lic -o jsonpath='{.data.xilinx\.lic}' \
    > "$HOME/xilinx-lic-backup-$(date -u +%Y-%m-%d).lic"
```
Keep this on the admin workstation (or in your usual personal-backup location). The cluster has no canonical secret-backup pattern for this; a dated local copy is sufficient since rollback is just “re-apply the previous configmap.” This file is small (a few KB) — keep a couple of generations.
Get a new license from AMD’s website. Go to https://www.xilinx.com/getlicense (AMD Licensing Site). When asked for the host configuration, use:
- Host name: xilinxd
- Host ID type: Ethernet MAC
- Host ID: b6e10931ba0e (must match the pinned MAC above)
- Port: 2100
Download the .lic file AMD emails back. It must include the SERVER xilinxd b6e10931ba0e 2100 line and VENDOR xilinxd PORT=27000 (or USE_SERVER-style block).
Patch the configmap. From a workstation with cluster admin kubeconfig:
```
kubectl -n xilinx-dev create configmap xilinx-lic \
    --from-file=xilinx.lic=./new-xilinx.lic \
    -o yaml --dry-run=client \
  | kubectl apply -f -
```
(You can also edit it in place with kubectl -n xilinx-dev edit configmap xilinx-lic, but the from-file dance is less error-prone for a multi-line license body with \r\n line endings, which lmgrd is fussy about.)
Restart the daemon to pick up the new file. The pod mounts the configmap, but lmgrd reads the file once at startup; it doesn’t watch for changes:
```
kubectl -n xilinx-dev rollout restart deployment xilinxd
kubectl -n xilinx-dev rollout status  deployment xilinxd
```

Verify. From any pod in any namespace:

# quick TCP check
nc -vz xilinxd.xilinx-dev 2100

# full check inside a Vivado-enabled pod
export XILINXD_LICENSE_FILE=2100@xilinxd.xilinx-dev
source /tools/Xilinx/Vivado/2023.1/settings.sh
vlm     # Vivado License Manager — should list all available features and dates

And on the server side:

kubectl -n xilinx-dev logs deployment/xilinxd --tail=100
# look for "lmgrd tcp-port 2100" and "xilinxd: Server started on xilinxd"

Rotating the MAC (only if you really need to)

If the pinned MAC ever needs to change (e.g. AMD reissues against a different hostid), you must update both the SERVER line in the new license file and the ifconfig eth0 hw ether ... argument in the deployment’s container command. The two must carry the same twelve hex digits: lowercase with no separators on the SERVER line, colon-separated in the ifconfig argument. If they differ, lmgrd will not start. Roll the deployment, then re-verify with vlm from a client pod.

Troubleshooting

vlm shows “Cannot connect to license server system.” Service may not be resolving — kubectl -n xilinx-dev get svc xilinxd and nslookup xilinxd.xilinx-dev from a debug pod. If DNS is fine, check the lmgrd pod logs and nc -vz xilinxd.xilinx-dev 2100.
vlm connects but says “No such feature exists.” The license has been served but the specific feature (e.g. SDNET, v6_pcie, TPG) isn’t in it. Confirm with kubectl -n xilinx-dev get cm xilinx-lic -o yaml | grep INCREMENT. If the user’s feature is missing, request it from AMD University Program for the same hostid; don’t issue a brand-new server license.
lmgrd exits immediately with “Wrong hostid on SERVER line.” The MAC override was not applied. Confirm the container started with NET_ADMIN and check kubectl -n xilinx-dev exec deploy/xilinxd -- ip a show dev eth0 returns the pinned MAC.
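
For the “No such feature exists” case it helps to see feature names and expiry dates side by side, not just the raw INCREMENT lines. A minimal sketch, assuming the usual FlexLM field order (INCREMENT feature vendor version expiry count ...) with all of those fields on the first physical line; list_features is a hypothetical helper and the sample feature names are made up:

```shell
# list_features: hypothetical helper. Reads license text on stdin and prints
# "feature expiry" for every INCREMENT line, assuming the usual FlexLM order:
#   INCREMENT <feature> <vendor> <version> <expiry> <count> ...
list_features() {
  awk '$1 == "INCREMENT" { print $2, $5 }'
}

# Made-up sample; real feature names, versions and dates will differ.
sample='SERVER xilinxd b6e10931ba0e 2100
VENDOR xilinxd PORT=27000
INCREMENT example_feature_a xilinxd 2026.06 30-sep-2026 1 SIGN=0
INCREMENT example_feature_b xilinxd 2026.06 31-dec-2026 1 SIGN=0'

printf '%s\n' "$sample" | list_features
```

Run against the live configmap by piping kubectl -n xilinx-dev get configmap xilinx-lic -o jsonpath='{.data.xilinx\.lic}' into list_features. The same output also works for spotting features close to expiry during the periodic renewal.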

PCIe Availability

Every working Alveo card appears twice in lspci — once as the management physical function (PF, ending in .0) and once as the user PF (.1):

nautilus@node-2-7:~$ lspci -d 10ee: -nD
0000:21:00.0 Processing accelerators: Xilinx Corporation Device 505c
0000:21:00.1 Processing accelerators: Xilinx Corporation Device 505d
0000:a1:00.0 Processing accelerators: Xilinx Corporation Device 505c
0000:a1:00.1 Processing accelerators: Xilinx Corporation Device 505d

505c is the management function (driven by xclmgmt).
505d is the user function (driven by xocl).
A card stuck in golden / recovery shell still shows the same PCI IDs but xbutil examine reports 0 devices found because the XMC subdevice isn’t loaded. Reflash with xbmgmt program --base (see Flashing below) and cold-reboot.

A “phantom” PCIe device showing (rev ff) is a sign of stale kernel state, typically a card that was physically pulled while the driver was bound. A cold reboot (ipmitool power cycle) clears it.
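
The pairing and the (rev ff) check can be folded into one pass over the lspci output. A minimal sketch; summarize_alveo is a hypothetical helper, and the demo input is the node-2-7 listing shown above:

```shell
# summarize_alveo: hypothetical helper. Reads `lspci -d 10ee:` style text on
# stdin and counts management PFs (505c), user PFs (505d) and lines carrying
# "(rev ff)". Matching on the bare device ID works for both the named and the
# purely numeric lspci output formats.
summarize_alveo() {
  awk '
    /505c/        { mgmt++ }
    /505d/        { user++ }
    /\(rev ff\)/  { stale++ }
    END { printf "mgmt=%d user=%d rev_ff=%d\n", mgmt, user, stale }
  '
}

sample='0000:21:00.0 Processing accelerators: Xilinx Corporation Device 505c
0000:21:00.1 Processing accelerators: Xilinx Corporation Device 505d
0000:a1:00.0 Processing accelerators: Xilinx Corporation Device 505c
0000:a1:00.1 Processing accelerators: Xilinx Corporation Device 505d'

printf '%s\n' "$sample" | summarize_alveo
# prints: mgmt=2 user=2 rev_ff=0
```

On a healthy host mgmt and user are equal (one of each per card) and rev_ff is 0. On a host, feed it from lspci -d 10ee: -nD directly.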

USB JTAG (FTDI) Availability

Each U55C has an on-board FTDI FT4232H that exposes JTAG over USB-C. The FTDI’s USB iSerial is identical to the card’s XMC serial number — the same string from lsusb and from /sys/bus/pci/drivers/xclmgmt/<BDF>/xmc.*/serial_num. That equality is what verify_jtag_serials.sh in fpga-testing/ exploits to validate the cable-to-card mapping.

Quick checks from the host:

# count Alveo JTAG cables
lsusb | grep -c "Future Technology Devices International, Ltd FT4232H"

# Alveo card serial via XRT-loaded sysfs (requires the xclmgmt driver)
for d in /sys/bus/pci/drivers/xclmgmt/0000:*; do
  for sn in "$d"/xmc.*/serial_num; do
    [ -f "$sn" ] && echo "$(basename $d)  $(cat $sn)"
  done
done

Each card’s FT4232H exposes four UART channels as /dev/ttyUSBN nodes, so 2 cards → 8 ttyUSBs, 7 cards → 28, etc. The smarter-device-manager DaemonSet surfaces a subset of these to k8s as smarter-devices/ttyUSB* resources (see Kubernetes integration below).
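
The iSerial/XMC-serial equality described above boils down to a set comparison. The sketch below is not the verify_jtag_serials.sh script from fpga-testing; it is a hypothetical helper (compare_serials) showing the idea, given one file of USB iSerials and one file of XMC serials, one per line:

```shell
# compare_serials: hypothetical helper, NOT the fpga-testing script.
#   $1: file with one USB iSerial per line (from the FT4232H devices)
#   $2: file with one XMC serial per line (from the xclmgmt sysfs loop)
# Prints any serial present on one side only; succeeds if the sets are equal.
compare_serials() {
  cs_usb=$(mktemp); cs_pci=$(mktemp)
  sort -u "$1" > "$cs_usb"
  sort -u "$2" > "$cs_pci"
  comm -23 "$cs_usb" "$cs_pci" | sed 's/^/cable without card: /'
  comm -13 "$cs_usb" "$cs_pci" | sed 's/^/card without cable: /'
  cs_rc=0
  cmp -s "$cs_usb" "$cs_pci" || cs_rc=1
  rm -f "$cs_usb" "$cs_pci"
  return "$cs_rc"
}

# Demo with the two serials that appear in the device-plugin log line earlier.
demo_usb=$(mktemp); demo_pci=$(mktemp)
printf '%s\n' XFL1H4XIZQLE XFL1GHBRTQ42 > "$demo_usb"
printf '%s\n' XFL1GHBRTQ42 XFL1H4XIZQLE > "$demo_pci"
if compare_serials "$demo_usb" "$demo_pci"; then
  echo "cables and cards pair up"
fi
rm -f "$demo_usb" "$demo_pci"
```

A “cable without card” line usually means a JTAG cable is plugged into a different host than its card, or the card is not bound to xclmgmt; a “card without cable” line means the USB side is missing.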

Kubernetes integration

Two DaemonSets cover FPGA workloads:

1. `xilinx-device-plugin-daemonset` (namespace `kube-system`)

Source: AMD’s k8s-device-plugin. Advertises each programmed FPGA as a schedulable resource.

Property	Value
Node selector	`fpga=true`
Node affinity	Hardcoded list of hostnames under `kubernetes.io/hostname`
Resource added	`amd.com/xilinx_u55c_gen3x16_xdma_base_3-0`
JTAG resource	`xilinx.com/fpga_jtag` (one count per Alveo card; registered by KubeVirt `virt-handler`, not by this plugin)
Container image	`public.ecr.aws/xilinx_dcg/k8s-device-plugin:1.1.0`

To onboard a new FPGA host into the device plugin you must do BOTH:

Label the node:

kubectl label node <fqdn> fpga=true --overwrite

Add the FQDN to the DaemonSet’s nodeAffinity hostname list:
```
kubectl -n kube-system get ds xilinx-device-plugin-daemonset -o json \
  | jq '.spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].values |= (. + ["<fqdn>"] | unique)' \
  | kubectl apply -f -
```
The label alone is not enough — the DaemonSet has both nodeSelector: fpga=true and a requiredDuringScheduling node affinity on kubernetes.io/hostname. Forgetting step 2 leaves DESIRED one short of the labelled node count.
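
A quick way to catch a forgotten step 2 is to diff the labelled nodes against the affinity list. A minimal sketch; missing_from_affinity is a hypothetical helper, and the kubectl/jq commands in the comments are one plausible way to produce its two inputs (the jq path is the same one used in step 2):

```shell
# missing_from_affinity: hypothetical helper.
#   $1: file of node names labelled fpga=true, one per line
#   $2: file of hostnames in the DaemonSet's nodeAffinity list, one per line
# Prints every labelled node that is absent from the affinity list.
missing_from_affinity() {
  grep -vxFf "$2" "$1"
}

# Possible sources for the two files on a live cluster:
#   kubectl get nodes -l fpga=true -o name | sed 's|^node/||' > labelled.txt
#   kubectl -n kube-system get ds xilinx-device-plugin-daemonset -o json \
#     | jq -r '.spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].values[]' > affinity.txt

# Demo with two hosts from the inventory table; only the first is in the list.
labelled=$(mktemp); affinity=$(mktemp)
printf '%s\n' node-2-6.sdsc.optiputer.net node-2-7.sdsc.optiputer.net > "$labelled"
printf '%s\n' node-2-6.sdsc.optiputer.net > "$affinity"
missing_from_affinity "$labelled" "$affinity"
# prints: node-2-7.sdsc.optiputer.net
rm -f "$labelled" "$affinity"
```

Empty output means every labelled node is also permitted by the affinity rule.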

2. `smarter-device-manager` (DaemonSet in `kube-system`)

Property	Value
Node selector	`smarter-device-manager=enabled`
Resources	`smarter-devices/ttyUSB0`, `smarter-devices/ttyUSB1`, `smarter-devices/ttyUSBN`, `smarter-devices/vfio`, etc.

Each smarter-devices/ttyUSBN resource defaults to a multi-allocation count of 16 (the same /dev/ttyUSBN can be claimed by up to 16 concurrent pods). This makes the cards’ UART channels (four /dev/ttyUSBN nodes per FPGA) reachable from interactive FPGA-dev pods; the JTAG TAP itself is exposed separately through xilinx.com/fpga_jtag.

To onboard a new FPGA host:

kubectl label node <fqdn> smarter-device-manager=enabled --overwrite

There is also a second DaemonSet smarter-device-manager/smarter-device-manager (no node selector) that runs cluster-wide — leave it alone; it doesn’t expose FPGA-specific resources.

Pod spec example (JTAG + Alveo card)

resources:
  limits:
    amd.com/xilinx_u55c_gen3x16_xdma_base_3-0: 1   # one whole FPGA
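    smarter-devices/ttyUSB0: 1                     # SC UART channel (not JTAG; request xilinx.com/fpga_jtag for the TAP)
    smarter-devices/ttyUSB1: 1                     # only names matched by the configmap regex are advertised,
                                                   # so ttyUSB2 and ttyUSB3 cannot be requested today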

XRT installation

Three XRT versions are deployed today, depending on the host’s Ubuntu release:

OS	XRT package	Notes
Ubuntu 22.04 (most hosts)	`xrt_2.16.204_amd64.deb` (2023.2) or `2.15.225` (2023.1)	Userspace bins built against 20.04 — extra libs required (see below)
Ubuntu 24.04 (`prp-gpu-2`)	`xrt_202510.2.19.194_24.04-amd64-xrt.deb` (2025.1)	Required for kernel 6.8; 2023.x DKMS will not compile on this kernel

The 22.04 XRT 2023.2 deb declares dependencies on Ubuntu-20.04 versions of libboost-*1.71.0, libssl1.1, libprotobuf17. Two workarounds are in use:

sudo dpkg -i --force-depends /tmp/xrt_2.16.204_amd64.deb
sudo tar xzf /tmp/xrt-extra-libs.tgz -C /usr/lib/x86_64-linux-gnu/   # libboost_filesystem.so.1.71.0, libboost_program_options.so.1.71.0
sudo tar xzf /tmp/ssl11.tgz         -C /usr/lib/x86_64-linux-gnu/   # libssl.so.1.1, libcrypto.so.1.1 (needed for xbmgmt + xclbinutil)
sudo ldconfig
sudo modprobe xclmgmt && sudo modprobe xocl

ssl11.tgz and xrt-extra-libs.tgz are the artifacts shipped to each FPGA host under /tmp/.

After install, smoke-test:

source /opt/xilinx/xrt/setup.sh
xbutil examine          # all cards report Device Ready: Yes
xbmgmt examine          # mgmt side; shows installed shells

Flashing a card

Required when:

xbutil examine reports 0 devices found but lspci -d 10ee: shows the cards.
xbmgmt examine -d <BDF> --report platform shows xilinx_u55c_recovery (the cards are sitting in golden/rescue mode).
A node was rebooted after the U55C platform deb was removed (e.g. as a side-effect of apt --fix-broken install).

One-time platform deb installation

The U55C XDMA shell is not in any apt repo; it must be installed from the AMD-supplied .tar.gz artifact. The 4 debs inside are:

xilinx-cmc-u55_*.deb                    # Card Management Controller firmware
xilinx-sc-fw-u55_*.deb                  # Satellite Controller firmware
xilinx-u55c-gen3x16-xdma-base_*.deb     # the deployable shell + xsabin builder
xilinx-u55c-gen3x16-xdma-validate_*.deb # validate xclbin

Install all four with dpkg -i --force-depends. The *-base*.deb’s postinst runs create_xsabin.sh, which invokes xclbinutil and requires libcrypto.so.1.1 to be reachable (hence the ssl11.tgz step above). If postinst fails the package ends in pFR (purge-failed-reinstreq); see the troubleshooting section below.

Programming the shell

sudo /opt/xilinx/xrt/bin/xbmgmt program \
    --base \
    --device <BDF>.0 \
    --image  xilinx_u55c_gen3x16_xdma_base_3

Each card takes 30–45 min to flash. Multiple cards on the same host can be flashed in parallel — each xbmgmt process drives its own card.
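
Parallel flashing can be scripted as one background job per card. The sketch below is a dry run by default: flash_cards is a hypothetical wrapper that only echoes the command unless XBMGMT is set, and the flags are the ones from the command above. Whether your XRT version asks for interactive confirmation is not covered here; if it does, background jobs cannot answer the prompt, so run each flash in its own terminal instead.

```shell
# flash_cards: hypothetical wrapper around the xbmgmt command shown above.
# By default XBMGMT is unset, so each command is only echoed (dry run).
# To flash for real: XBMGMT="sudo /opt/xilinx/xrt/bin/xbmgmt" flash_cards <BDF>...
flash_cards() {
  fc_pids=""
  for bdf in "$@"; do
    # One background process per card; each drives its own device.
    ${XBMGMT:-echo xbmgmt} program --base --device "$bdf" \
        --image xilinx_u55c_gen3x16_xdma_base_3 &
    fc_pids="$fc_pids $!"
  done
  fc_rc=0
  for pid in $fc_pids; do
    wait "$pid" || fc_rc=1     # remember if any flash failed
  done
  return "$fc_rc"
}

# Dry run for the two management BDFs on node-2-7:
flash_cards 0000:21:00.0 0000:a1:00.0
```

The function returns non-zero if any of the per-card processes failed, so it can gate the cold reboot that follows.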

A cold reboot (chassis power cycle, not systemctl reboot) is required for the FPGA to load the freshly written flash partition:

sudo ipmitool power cycle

Verify post-reboot:

source /opt/xilinx/xrt/setup.sh && xbutil examine
# both cards: Device Ready: Yes

When a custom shell is intentional

node-2-11 previously ran a custom shell — xbutil examine returned 0 devices found by design, while the cards were still bound to xclmgmt. As of 2026-05-13 it was flashed to stock xilinx_u55c_gen3x16_xdma_base_3 so the device plugin can advertise it. If a future card needs a custom shell again, also remove it from the device plugin’s nodeAffinity list so the DaemonSet stops scheduling there (see device-plugin section above).

Recovering a user-bricked card

The user-facing FPGA docs explicitly tell users to coordinate before flashing, but in practice cards still occasionally end up in a bad state — a flash aborted halfway, a wrong image written, or a dpkg -i side-effect that nuked the platform deb. Symptoms cluster into three patterns:

Symptom on the host	Likely state
`lspci -d 10ee:` shows the card; `xbutil examine` says `0 devices found`	Card sitting in golden / recovery shell (`xilinx_u55c_recovery`). Subdevices not loaded; XMC missing.
`lspci -d 10ee:` shows `(rev ff)` for the card; nothing else	PCIe lost track of the device; happens when the FPGA was reflashed without a cold reboot or mid-flash crash.
`xbmgmt examine` lists the card but `--report platform` shows missing/wrong shell	Shell file present but not the one matching what’s flashed; common after a partial `dpkg -i` of the platform deb.

The recovery procedure is the same for all three — reflash the base shell from the host (you don’t need JTAG for this; PCIe is enough as long as the card enumerates), then cold-reboot.

Step 1 — drain and taint the node

HOST=node-2-7.sdsc.optiputer.net
kubectl cordon "$HOST"
kubectl taint nodes "$HOST" nautilus.io/issue=fpga-install:NoSchedule --overwrite
kubectl drain "$HOST" --ignore-daemonsets --delete-emptydir-data --force --grace-period=30

Step 2 — confirm the card’s BDF and current state

SSH to the host:

ssh nautilus@$HOST

# Cards Xilinx can see
lspci -d 10ee: -nD

# What XRT thinks of them
source /opt/xilinx/xrt/setup.sh
xbmgmt examine
xbmgmt examine --device 0000:21:00.0 --report platform     # repeat per BDF

If lspci shows (rev ff) for a card, a cold reboot is required first so that PCIe re-enumerates it before any flash will work: jump to Step 4, then come back to Step 3.

If the platform deb is in pFR (purge-failed-reinstreq) state, run the dpkg state recovery procedure to clean it up, then reinstall the base deb cleanly before flashing.

Step 3 — reflash the base shell

For each affected card:

sudo /opt/xilinx/xrt/bin/xbmgmt program \
    --base \
    --device 0000:21:00.0 \
    --image  xilinx_u55c_gen3x16_xdma_base_3

Each card: 30–45 min. Multiple cards on the same host can flash in parallel. xbmgmt will end with Cold reboot required.

Step 4 — cold-reboot the chassis

sudo ipmitool power cycle

A warm reboot is not enough — the flash partition is only loaded into the FPGA at chassis power-on. While the host is down, watch for kubelet to mark the node NotReady (~30s) and stay there until the box comes back. Total downtime is usually 5–10 min depending on POST and boot.

Step 5 — clear stale kubelet state if needed

If kubelet crash-loops after the boot with Static policy invalid state (CPU manager state left over from before the reboot), see Stale kubelet state after reboot.

Step 6 — verify and uncordon

# back to ops workstation
$HOME/fpga-testing/verify_fpga_fleet.sh        # whole-fleet check, or:

ssh nautilus@$HOST 'source /opt/xilinx/xrt/setup.sh && xbutil examine'
# every card: Device Ready: Yes, shell xilinx_u55c_gen3x16_xdma_base_3

kubectl taint nodes "$HOST" nautilus.io/issue-
kubectl uncordon "$HOST"

The xilinx-device-plugin pod on that node will re-detect the cards and start advertising the resource again within a minute or so. Confirm:

kubectl get node "$HOST" -o jsonpath='{.status.allocatable.amd\.com/xilinx_u55c_gen3x16_xdma_base_3-0}'

The value should match the number of cards lspci shows (count the .1 user functions, since each card appears twice).

When PCIe-side reflash isn’t enough — JTAG fallback

In rare cases (mid-flash power loss during the boot ROM region, not just the shell partition) PCIe can lose the card entirely — lspci -d 10ee: shows nothing for the slot and a cold reboot doesn’t recover it. The only fix is flashing through JTAG with Vivado hw_server + xsdb, attached to the on-card FT4232H. That’s outside the scope of this runbook; if you hit it, escalate to whoever maintains the AMD relationship — there’s a documented JTAG recovery path on AMD’s Alveo flashing guide.

Joining a new FPGA host to the cluster

End-to-end procedure:

1. Generate a join token (on controller)

sudo kubeadm token create

2. Run `nautilus-ansible` setup (from operator workstation)

cd nautilus-ansible
./run.sh setup <fqdn> <token>

The setup playbook installs containerd / kubelet / kubeadm, runs kubeadm join, configures local HAProxy LB, applies topology + netbox labels, and taints with nautilus.io/testing=true:NoSchedule until the node is verified.

Caveats observed during stratix-10-02 join (2026-06-03):

lvm role: if the inventory entry has lv_devices: /dev/sda but the host already runs root on /dev/md0 (RAID1 of sda/sdb), run with --skip-tags lvm. pvcreate would safely refuse to overwrite existing signatures, but the playbook halts before the kubernetes role.
apt state: if XRT was previously force-installed with broken deps, the os : remove packages task fails because ubuntu-server depends on multipath-tools (one of the packages it tries to remove). Run sudo apt --fix-broken install -y on the host first (this removes the broken XRT package — reinstall it after the join).
sudo timeout on hold kubernetes: intermittent. Re-running the playbook with -t kubernetes,node-labels,multus-removal,chrony,vis is idempotent and finishes the join.

3. Add the FPGA-specific labels

kubectl label node <fqdn> fpga=true --overwrite
kubectl label node <fqdn> smarter-device-manager=enabled --overwrite

4. Add the host to `xilinx-device-plugin-daemonset`’s nodeAffinity list

kubectl -n kube-system get ds xilinx-device-plugin-daemonset -o json \
  | jq '.spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].values |= (. + ["<fqdn>"] | unique)' \
  | kubectl apply -f -

5. Install XRT + platform deb

See the XRT installation and Flashing sections above.

6. Remove the testing taint

kubectl taint nodes <fqdn> nautilus.io/testing-

7. Verify

# all in one
fpga-testing/verify_fpga_fleet.sh

A passing host shows: Ready=True, netbox.io/site set, xdma=N, fpga_jtag=N, XRT version + xmc_paired=N matching the lspci card count.

Common admin operations

Stale kubelet state after reboot

After an ipmitool power cycle on a node using the static CPU manager, kubelet may crash-loop with Static policy invalid state, please drain node and remove policy state file. Fix:

sudo systemctl stop kubelet
sudo rm -f /var/lib/kubelet/cpu_manager_state /var/lib/kubelet/memory_manager_state
sudo systemctl start kubelet

Then watch for the node.kubernetes.io/unreachable taint to clear (typically within 30s of kubelet posting heartbeats again).

Stale allocatable counts on a node

If kubectl describe node shows amd.com/xilinx_u55c_gen3x16_xdma_base_3-0=3 but only 1 card is actually present, delete the local device-plugin pod to force re-detection:

POD=$(kubectl -n kube-system get pods --field-selector spec.nodeName=<fqdn> -o name | grep xilinx-device-plugin)
kubectl -n kube-system delete "$POD"

dpkg state recovery (U55C platform deb)

If the xilinx-u55c-gen3x16-xdma-base package is stuck in pFR (“very bad inconsistent state”), the prerm tries to remove files that don’t exist. Workaround: pre-create the file paths it expects, then purge and reinstall:

sudo mkdir -p /opt/xilinx/firmware/u55c/gen3x16-xdma/base/firmware
sudo touch /lib/firmware/xilinx/b7ac1abe1e3e1cb686d5a81232452676 \
           /opt/xilinx/firmware/u55c/gen3x16-xdma/base/{partition_metadata.json,partition.xsabin} \
           /opt/xilinx/firmware/u55c/gen3x16-xdma/base/firmware/{cmc-u55,ert-v30,sc-fw-u55}
sudo dpkg --remove --force-remove-reinstreq --force-all xilinx-u55c-gen3x16-xdma-base
sudo rm -rf /lib/firmware/xilinx/97088961feaeda9152a21d9dfd63ccef \
            /lib/firmware/xilinx/b7ac1abe1e3e1cb686d5a81232452676 \
            /opt/xilinx/firmware/u55c
sudo dpkg -i --force-depends /tmp/xilinx-u55c-gen3x16-xdma-base_*.deb

NEVER do `dpkg-deb -x <u55c-base.deb> /`

The U55C base deb’s tar archive contains a ./lib/ directory entry. On Ubuntu 22.04, /lib is a symlink to /usr/lib. Extracting the tar to / replaces the symlink with a real directory, which silently breaks every dlopen of PAM modules (/lib/security/pam_*.so no longer resolves to /usr/lib/x86_64-linux-gnu/security/). The visible symptom is sshd accepting the public key and then closing the connection (PAM unable to dlopen lines in /var/log/auth.log).

Recovery (from a privileged pod with host root mounted at /host, since SSH is broken):

mv /host/lib/firmware/xilinx /host/usr/lib/firmware/xilinx   # save extracted files
rmdir /host/lib/firmware /host/lib                            # drop the broken dir
ln -s usr/lib /host/lib                                       # restore the symlink

If you ever genuinely need to peek at the deb’s contents, do it in a scratch directory — never extract to /.
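
Whether a host has been hit by this can be checked without SSH. A minimal sketch; lib_symlink_ok is a hypothetical helper that takes the root to inspect ("/" on the host itself, "/host" from a privileged pod), and the demo builds two throwaway directory trees rather than touching the real filesystem:

```shell
# lib_symlink_ok: hypothetical helper. Succeeds if <root>/lib is a symlink
# pointing at usr/lib, as it should be on Ubuntu 22.04.
lib_symlink_ok() {
  root=${1%/}
  [ -L "$root/lib" ] && [ "$(readlink "$root/lib")" = "usr/lib" ]
}

# Demo in a scratch directory: one healthy layout, one broken the way the
# extraction described above breaks it (lib is a real directory).
demo=$(mktemp -d)
mkdir -p "$demo/good/usr/lib" "$demo/bad/usr/lib" "$demo/bad/lib"
ln -s usr/lib "$demo/good/lib"

lib_symlink_ok "$demo/good" && echo "good: symlink intact"
lib_symlink_ok "$demo/bad"  || echo "bad: lib is a real directory"
rm -rf "$demo"
```

From a privileged pod, lib_symlink_ok /host is the check to run before and after the recovery steps above.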

Examining FPGAs from the host

source /opt/xilinx/xrt/setup.sh
xbmgmt examine

Expected:

Device(s) Present
|BDF             ||Shell                            ||Logic UUID                            ||Device ID         ||Device Ready*  |
|----------------||---------------------------------||--------------------------------------||------------------||---------------|
|[0000:21:00.0]  ||xilinx_u55c_gen3x16_xdma_base_3  ||97088961-FEAE-DA91-52A2-1D9DFD63CCEF  ||mgmt(inst=128)    ||Yes            |
|[0000:a1:00.0]  ||xilinx_u55c_gen3x16_xdma_base_3  ||97088961-FEAE-DA91-52A2-1D9DFD63CCEF  ||mgmt(inst=129)    ||Yes            |
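
For scripting, the table above can be reduced to a list of cards that are not ready. A minimal sketch that assumes exactly the column layout shown (BDF in the first column, Device Ready in the fifth, double-pipe separators); not_ready_cards is a hypothetical helper, and it treats any value other than Yes as not ready:

```shell
# not_ready_cards: hypothetical helper. Reads `xbmgmt examine` table text on
# stdin (layout as shown above) and prints the BDF of every card whose
# Device Ready column is anything other than "Yes".
not_ready_cards() {
  awk -F'|' '
    $2 ~ /^\[[0-9a-fA-F:.]+\]/ {      # data rows start with a bracketed BDF
      bdf = $2; ready = $10           # "||" separators leave empty fields between columns
      gsub(/\[/, "", bdf); gsub(/\]/, "", bdf); gsub(/ /, "", bdf)
      gsub(/ /, "", ready)
      if (ready != "Yes") print bdf
    }'
}

sample='|BDF             ||Shell                            ||Logic UUID                            ||Device ID         ||Device Ready*  |
|----------------||---------------------------------||--------------------------------------||------------------||---------------|
|[0000:21:00.0]  ||xilinx_u55c_gen3x16_xdma_base_3  ||97088961-FEAE-DA91-52A2-1D9DFD63CCEF  ||mgmt(inst=128)    ||Yes            |
|[0000:a1:00.0]  ||xilinx_u55c_gen3x16_xdma_base_3  ||97088961-FEAE-DA91-52A2-1D9DFD63CCEF  ||mgmt(inst=129)    ||Yes            |'

if [ -z "$(printf '%s\n' "$sample" | not_ready_cards)" ]; then
  echo "all cards ready"
fi
```

On a host, pipe xbmgmt examine straight into not_ready_cards. If a future XRT release changes the table layout, the field numbers will need adjusting.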

Vivado is available on the admin instance of Coder in an FPGA Flashing template: Coder Dev. AMD’s flashing reference: AMD/Xilinx Flashing Guide.

Periodic health check

The fleet currently has no Prometheus exporter for FPGA-specific health, and the device plugin’s “card not advertised” failure mode is silent — you only notice when a user complains. Until a proper exporter is in place, the recommended pattern is to run fpga-testing/verify_fpga_fleet.sh (or its successor) periodically from an admin workstation or a cluster cron.

What it checks per host:

Node Ready=True
netbox.io/site label is set and non-DaemonSet pods on the node are all Running/Completed
HW visible: lspci Xilinx count, lsusb FT4232H count, /dev/ttyUSB* count
k8s allocatable: amd.com/xilinx_u55c_gen3x16_xdma_base_3-0, xilinx.com/fpga_jtag, smarter-devices ttyUSB resource kinds
XRT health: xbutil --version, lsmod for xocl/xclmgmt, count of cards bound to xclmgmt, XMC paired count

A passing host shows all five lines populated. Typical failures and what they mean:

Symptom in script output	Means…
`(1) joined: Ready=` (empty)	Node not in cluster, or kubelet not heartbeating. Check `journalctl -u kubelet`.
`(3) HW: lspci=0`	Card lost from PCIe — cold reboot first; investigate physical seating after.
`(3) HW: lspci=N lsusb_future=N` but `(4) xdma=` (empty)	Device plugin doesn’t have this host in its nodeAffinity list, or the `fpga=true` label is missing. See Kubernetes integration.
`(5) modules_loaded=0`	XRT installed but kernel modules aren’t loaded; `sudo modprobe xclmgmt xocl` on the host.
`(5) cards_bound=N`, `xmc_paired=0`	Cards are bound but in golden/recovery shell — reflash per Recovering a user-bricked card.
`(5) xrt_version=` (empty)	XRT not installed (or `setup.sh` path moved). See XRT installation.

A minimum-viable cron is just verify_fpga_fleet.sh > out.txt && grep -E "(empty|=0$)" out.txt && mail -s 'FPGA fleet drift' … on the admin workstation; running it nightly catches drift before users do. A proper Prometheus exporter (reading the same five signals and exposing them as gauges) is a worthwhile follow-up but doesn’t exist today.
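
The grep in that one-liner can be made a little more precise by looking at individual key=value fields. The sketch below assumes the script prints whitespace-separated key=value tokens on lines that begin with the (N) check number, as the fragments in the table above suggest; the exact output format is an assumption, and drift_tokens is a hypothetical helper:

```shell
# drift_tokens: hypothetical helper. Reads verify-script output on stdin and
# prints "<check> <token>" for every key=value token whose value is empty or
# exactly 0. Assumes whitespace-separated key=value tokens (format assumption).
drift_tokens() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[A-Za-z0-9_]+=$/ || $i ~ /^[A-Za-z0-9_]+=0$/)
        print $1, $i
  }'
}

# Sample lines assembled from the fragments in the table above (illustrative).
sample='(1) joined: Ready=True
(3) HW: lspci=2 lsusb_future=2
(4) xdma=
(5) modules_loaded=0 cards_bound=2 xmc_paired=2'

printf '%s\n' "$sample" | drift_tokens
# prints:
# (4) xdma=
# (5) modules_loaded=0
```

Non-empty output is the signal to send the drift mail; values such as lspci=10 are not flagged because only an exact 0 counts.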

This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019.