FPGA Administration
Overview
This guide pertains to the AMD/Xilinx Alveo U55C FPGAs used on the NRP cluster.
After the 2026-05 hardware migration the FPGAs were consolidated onto fewer hosts. The cards previously distributed across node-1-1..4 and node-2-1..4 (with their JTAG cables routed to node-2-10) were moved to dense FPGA hosts. All JTAG cables are now plugged into the same host as the card itself — there is no longer a JTAG concentrator.
Current FPGA inventory spreadsheet: the live mapping (serial / iSerial / node / PCI BDF) is regenerated by the discovery tooling in the fpga-testing repo. The committed reference is at FPGA Inventory (Google Sheets).
Current FPGA hosts
| Host | Cards | XRT version | Shell | Notes |
|---|---|---|---|---|
| node-2-6.sdsc.optiputer.net | 2 | 2.15.225 (2023.1) | xilinx_u55c_gen3x16_xdma_base_3 | also has one non-Alveo FT232 (USB-serial console) |
| node-2-7.sdsc.optiputer.net | 2 | 2.15.225 | xilinx_u55c_gen3x16_xdma_base_3 | |
| node-2-8.sdsc.optiputer.net | 2 | 2.15.225 | xilinx_u55c_gen3x16_xdma_base_3 | |
| node-2-9.sdsc.optiputer.net | 1 | 2.16.204 (2023.2) | xilinx_u55c_gen3x16_xdma_base_3 | |
| node-2-10.sdsc.optiputer.net | 1 | 2.16.204 | xilinx_u55c_gen3x16_xdma_base_3 | was the old JTAG concentrator; only 1 local card today |
| node-2-11.sdsc.optiputer.net | 2 | 2.16.204 | xilinx_u55c_gen3x16_xdma_base_3 | flashed from custom shell to stock 2026-05-13 |
| k8s-stratix-10-02.sdsc.optiputer.net | 7 | 2.16.204 | xilinx_u55c_gen3x16_xdma_base_3 | joined to cluster on 2026-06-03 |
| prp-gpu-2.t2.ucsd.edu | 6 | 2.19.194 (2025.1) | xilinx_u55c_gen3x16_xdma_base_3 | Ubuntu 24.04 / kernel 6.8; must use XRT 2024.x+, older XRT will not build |
| k8s-stratix-10-01.sdsc.optiputer.net | n/a | n/a | n/a | OFFLINE as of writing; likely holds the 9 unaccounted-for cards from the inventory |
Total: 23 paired cards online, plus the still-offline k8s-stratix-10-01 accounting for the remainder of the original 32.
Services on every FPGA host
Four cluster-level components together make an FPGA host useful. The first two (XRT and xilinx-device-plugin-daemonset) are strictly required — without them the host doesn’t advertise cards and no FPGA-bearing pod can run on it. KubeVirt virt-handler is required if you want pods or VMs to access JTAG via xilinx.com/fpga_jtag; if your only workloads are .xclbin programming via the PCIe-side resource you can technically skip it. smarter-device-manager is optional — strictly speaking the Vivado / .xclbin / JTAG flow does not need it, but specific workflows like ESnet SmartNIC sn-cli (UART-only access to the satellite controller) and VFIO PCIe passthrough do require it. In practice we run all four on every FPGA host so users don’t have to remember per-host capability differences.
1. XRT (Xilinx Runtime) installed on the host OS — required
XRT is not containerised; it must be installed in the host’s userland. Two reasons:
- It ships the `xclmgmt` and `xocl` kernel modules (built via DKMS at install time). Without these, the cards are present in `lspci` but no driver is bound: `/sys/bus/pci/drivers/xclmgmt/` doesn’t exist, the FPGAs are invisible to anything that doesn’t poke raw PCIe, and the device plugin DaemonSet won’t find any cards to advertise.
- It ships the `xbutil`/`xbmgmt` host-side userspace tools and the libraries the device plugin links against. The device plugin reads the cards’ XMC serials, status, and platform UUIDs by talking to XRT via the same shared libraries (`libxrt_core`/`libxrt_coreutil`); a containerised plugin can do that against the host’s `/dev/xclmgmt*` only because XRT laid down those device nodes.
Required minimum: a working /opt/xilinx/xrt/setup.sh, lsmod | grep -E "^xocl|^xclmgmt" showing both modules, and xbutil examine reporting Device Ready: Yes for each card. Version per host listed in the table above; see XRT installation below for which deb to use on which Ubuntu release.
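The kernel-module half of that minimum is easy to script. The following is a minimal sketch, not part of the fleet tooling: `xrt_modules_loaded` is a hypothetical helper name, and it only inspects `lsmod`-style text on stdin, so it says nothing about what `xbutil examine` reports.

```shell
# Hypothetical helper (not from fpga-testing): reads `lsmod` output on stdin
# and succeeds only if both XRT kernel modules appear in the first column.
xrt_modules_loaded() {
  awk '
    $1 == "xocl"    { user = 1 }
    $1 == "xclmgmt" { mgmt = 1 }
    END { exit !(user && mgmt) }
  '
}
```

On a host this would be used as `lsmod | xrt_modules_loaded && echo "XRT modules loaded"`.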
2. xilinx-device-plugin-daemonset (namespace kube-system) — required
This DaemonSet is what registers the FPGAs themselves with kubelet. Without it, the host can have XRT and 7 happy cards and kubectl describe node still shows zero amd.com/xilinx_u55c_* resources, so no pod can ever schedule onto them. Specifically:
- Runs the AMD k8s-device-plugin binary (`public.ecr.aws/xilinx_dcg/k8s-device-plugin:1.1.0`), which talks the k8s device-plugin gRPC protocol over the socket `/var/lib/kubelet/device-plugins/xilinx_u55c_gen3x16_xdma_base_3-0-fpga.sock`.
- Reads card inventory by enumerating `/sys/bus/pci/drivers/xclmgmt/` and walking each card’s XMC `serial_num`. (You can see this in the pod logs: `Check SeialNums arry: [XFL1H4XIZQLE XFL1GHBRTQ42] … Sending 2 device(s) [0000:a1:00.1, 0000:21:00.1] to kubelet`.)
- Advertises one resource: `amd.com/xilinx_u55c_gen3x16_xdma_base_3-0`, with one count per ready card. When a pod requests N of it, the plugin allocates specific PCIe BDFs and tells kubelet to mount the matching `/dev/xclmgmt*` / `/dev/dri/renderD*` into the container.
The DaemonSet has a hardcoded nodeAffinity host list in addition to the nodeSelector: fpga=true — both have to permit the node or the plugin won’t run there. See Kubernetes integration for the two-step onboarding ritual.
3. smarter-device-manager (namespace kube-system) — optional, but install it anyway
This one is not needed for the standard FPGA flow — loading .xclbins through xbutil program --user, flashing via xilinx.com/fpga_jtag (KubeVirt), running Vivado, etc. all work without it. It’s needed for two specific workflows that share an FPGA host:
- ESnet SmartNIC `sn-cli`: talks to the on-card satellite controller over UART (`/dev/ttyUSBN`), not raw USB. Without smarter-device-manager, the `esnet` Coder template (and `deploy-esnet` in the templates repo) can’t allocate `smarter-devices/ttyUSB*` and ESnet pods fail to schedule.
- VFIO PCIe passthrough from regular pods (DPDK-style). Needs `smarter-devices/vfio` (the `/dev/vfio` group device).
It also exposes smarter-devices/fuse, which is not FPGA-specific.
Because the cost of running the DaemonSet is tiny and we have ESnet users on these hosts, install it on every FPGA host. The fleet-wide install just means labelling: kubectl label node <fqdn> smarter-device-manager=enabled --overwrite.
The smarter-device-manager DaemonSet exposes specific /dev/... files as schedulable k8s resources via the same device-plugin gRPC protocol. For the FPGA side:
- `smarter-devices/ttyUSB0`, `ttyUSB1`, `ttyUSB5`, `ttyUSB10`, `ttyUSB11`, `ttyUSB15`: the FT4232H UART channels. Each U55C’s onboard FTDI exposes four UARTs as `/dev/ttyUSBN`. Lets pods talk to the cards’ satellite controllers over UART (used by ESnet SmartNIC’s `sn-cli`, by `xsdb`’s serial backend, and by anything talking to the SC for serial console).
- `smarter-devices/vfio`: the `/dev/vfio` group device. Required for any pod doing VFIO PCIe passthrough.
The configmap is in kube-system/smarter-device-manager (a single conf.yaml with devicematch: regexes). The FPGA regex is ^ttyUSB[0-15]*$. The character class [0-15] matches only the digits 0, 1, and 5 (not the numbers 0 through 15), which is why only the names listed above are advertised; the class is buggy but intentional today.
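To see which device names that regex accepts, it can be tried against candidate names. This is an illustration only: `grep -E` stands in for whatever regex engine smarter-device-manager uses (for this particular bracket expression the behaviour should be the same), and both function names are hypothetical.

```shell
# Succeeds if the given device name matches the FPGA devicematch regex.
# grep -E is a stand-in for the plugin's own regex engine (assumption).
matches_fpga_devicematch() {
  printf '%s\n' "$1" | grep -Eq '^ttyUSB[0-15]*$'
}

# Print the subset of the given names that the regex would accept.
advertised_names() {
  for name in "$@"; do
    if matches_fpga_devicematch "$name"; then
      printf '%s\n' "$name"
    fi
  done
}
```

For example, `advertised_names ttyUSB0 ttyUSB1 ttyUSB2 ttyUSB3 ttyUSB5` prints `ttyUSB0`, `ttyUSB1`, and `ttyUSB5`; `ttyUSB2` and `ttyUSB3` are rejected because 2 and 3 are not in the class.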
Note: xilinx.com/fpga_jtag (KubeVirt) provides raw USB at /dev/bus/usb/<bus>/<dev>, which is enough for any JTAG operation including reading the SC via libftdi. So a user who needs both JTAG TAP and SC UART can use the KubeVirt resource alone and bypass smarter-devices/ttyUSB* entirely — smarter-devices/ttyUSB* is only the right answer when the pod wants the kernel-cooked tty interface (e.g. picocom /dev/ttyUSB1) without raw-USB privileges.
4. KubeVirt virt-handler (namespace kubevirt) — for xilinx.com/fpga_jtag — required if you want JTAG access from pods/VMs
This one isn’t FPGA-specific (it’s KubeVirt’s normal node agent), but it’s the component that actually registers xilinx.com/fpga_jtag with kubelet, based on the cluster’s KubeVirt CR permittedHostDevices.usb config:

    permittedHostDevices:
      usb:
        - resourceName: xilinx.com/fpga_jtag
          selectors:
            - vendor: "0403"
              product: "6011"

A pod that requests xilinx.com/fpga_jtag: 1 gets /dev/bus/usb/<bus>/<dev> for one of the FT4232H devices on the host: the raw-USB device file needed for JTAG operations (Vivado hw_server, OpenOCD, xbmgmt program). Without virt-handler, the resource is simply not advertised; without the permittedHostDevices.usb entry, the resource exists but matches no USB devices.
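A quick way to see how many devices on a host the selector above could match is to count USB ID 0403:6011 in `lsusb` output. This is a sketch with a hypothetical helper name; it assumes the usual `lsusb` line layout (`Bus … Device …: ID vvvv:pppp …`) and does not consult KubeVirt at all.

```shell
# Hypothetical helper: reads `lsusb` output on stdin and prints how many
# devices carry the USB ID 0403:6011 used by the selector above.
count_fpga_jtag_candidates() {
  # grep -c prints the count but exits non-zero on zero matches; mask that.
  grep -c 'ID 0403:6011' || true
}
```

On a host: `lsusb | count_fpga_jtag_candidates`. A USB-serial adapter with a different product ID would not be counted, which is the behaviour one would expect for the non-Alveo FT232 noted on node-2-6 (assuming it does report a different ID).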
Adding a per-iSerial resource (rare; only for KubeVirt VMs that want to pin to a specific card). Regular Pods can’t pin to a specific FT4232H by iSerial — the generic xilinx.com/fpga_jtag resource is a pool keyed only on vendor:0403/product:6011. For most users that’s fine: the user-doc Example 3 shows how to read the allocated cable’s iSerial at runtime and pick the matching FPGA BDF in the pod’s code.
The exception is KubeVirt VMs, where the VM has to bind the USB device at boot — the runtime-pairing trick doesn’t apply because there’s no startup script that can “pick” between two attached USB devices. If a user files an nrp-help ticket asking for a VM with a specific card by serial, add a per-serial entry to the KubeVirt CR:

    kubectl edit kubevirt -n kubevirt kubevirt

Append (or insert alongside the existing xilinx.com/fpga_jtag entry) under .spec.configuration.permittedHostDevices.usb:

    - resourceName: xilinx.com/fpga_jtag_XFL1GHBRTQ42   # <- card's iSerial
      selectors:
        - vendor: "0403"
          product: "6011"
          serial: "XFL1GHBRTQ42"

After the edit, virt-handler picks it up automatically. Verify the new resource appears on the node that has that card:

    HOST=node-2-7.sdsc.optiputer.net
    kubectl get node "$HOST" -o jsonpath='{.status.allocatable}' | jq 'with_entries(select(.key | test("fpga_jtag")))'
    # should now include "xilinx.com/fpga_jtag_XFL1GHBRTQ42": "1"

Tell the user to reference the per-serial resource in their VM’s spec.template.spec.domain.devices.hostDevices.deviceName. Don’t remove the generic xilinx.com/fpga_jtag entry; leaving it lets other pods/VMs still get unspecified-card allocations.
Summary: dependency for what
| Component | Status | Without it you lose… |
|---|---|---|
| XRT (host) | required | xclmgmt/xocl modules; everything below depends on this |
| xilinx-device-plugin-daemonset | required | The amd.com/xilinx_u55c_* resource → no FPGA pods at all on the node |
| KubeVirt virt-handler + CR | required for JTAG access | xilinx.com/fpga_jtag → no JTAG TAP access from pods/VMs (Vivado hw_server/OpenOCD/xbmgmt program) |
| smarter-device-manager | optional (recommended; needed for ESnet sn-cli and VFIO) | smarter-devices/ttyUSB* → no UART-only access from pods; smarter-devices/vfio → no VFIO |
Xilinx FlexLM license server (xilinx-dev namespace)
Vivado, Vitis, and the AMD/Xilinx IP cores users build with on the cluster are gated by FlexLM licenses. We run a single in-cluster lmgrd that all Vivado/Vitis pods point at via 2100@xilinxd.xilinx-dev. This section is what cluster admins need to keep that server running and the license current.
What’s deployed
| Object | Namespace | Purpose |
|---|---|---|
| Deployment/xilinxd | xilinx-dev | One pod running lmgrd -c /etc/xilinx/xilinx.lic -z (FlexLM license daemon + xilinxd vendor daemon) |
| Service/xilinxd (ClusterIP) | xilinx-dev | Exposes ports 2100 (lmgrd), 27000 (vendor daemon), 6978 (alt vendor port) |
| ConfigMap/xilinx-lic (key xilinx.lic) | xilinx-dev | The actual FlexLM license file; mounted at /etc/xilinx/xilinx.lic in the pod |
| Secret/regcred | xilinx-dev | Pull secret for the private gitlab-registry.nrp-nautilus.io/nrp/xilinxd image |
The DNS name xilinxd.xilinx-dev resolves (cluster-internal) to the service ClusterIP. So any pod in any namespace can use the standard FlexLM port-at-host form: 2100@xilinxd.xilinx-dev.
Why the MAC address is pinned
The deployment’s container command starts with ifconfig eth0 hw ether b6:e1:09:31:ba:0e. Do not change this. FlexLM licenses from AMD are tied to the host’s MAC (“hostid”), and the license file’s SERVER line is:

    SERVER xilinxd b6e10931ba0e 2100

If the pod’s eth0 MAC doesn’t match b6e10931ba0e, lmgrd will refuse to serve and every Vivado client will report “Cannot find SERVER hostname in network database.” That’s also why the deployment carries securityContext.capabilities.add: ["NET_ADMIN"]: it needs CAP_NET_ADMIN to rewrite the eth0 MAC on each pod start.
Concretely: if you ever rebuild the image, rescale, or move to a new node, the MAC override in the command is what keeps the license valid across reschedules. Don’t drop it; don’t replace it with a MAC=... env var unless you also adjust the entrypoint.
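The MAC-to-hostid relationship can be checked mechanically before a license or deployment change goes out. The sketch below is an assumption-laden illustration, not existing tooling: the three function names are hypothetical, and it assumes the hostid is the third field of the first SERVER line, as in the line quoted above.

```shell
# Convert a colon- or dash-separated MAC into the lowercase, separator-free
# form used on the SERVER line (b6:e1:09:31:ba:0e -> b6e10931ba0e).
mac_to_hostid() {
  printf '%s\n' "$1" | tr -d ':-' | tr 'A-F' 'a-f'
}

# Read a license file on stdin and print the hostid of its first SERVER line.
server_hostid() {
  awk '$1 == "SERVER" { print tolower($3); exit }'
}

# Succeed only if the MAC in $1 matches the SERVER hostid of the license
# supplied on stdin.
mac_matches_license() {
  [ "$(mac_to_hostid "$1")" = "$(server_hostid)" ]
}
```

For example, the license text from the configmap can be piped into `mac_matches_license b6:e1:09:31:ba:0e`; a non-zero exit status means the pinned MAC and the SERVER line have drifted apart.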
Updating the license (every ~90 days)
AMD’s evaluation/university-program licenses for the relevant IP cores expire in roughly 90-day cycles (the precise dates vary per feature; check the START=… and expiry dates in the current license). When lmgrd starts logging “license expired” or Vivado clients report Feature unavailable, follow this procedure:
Back up the current license file first so you have something to roll back to if the new one is malformed or has fewer features than the old one:

    kubectl -n xilinx-dev get configmap xilinx-lic -o jsonpath='{.data.xilinx\.lic}' \
      > "$HOME/xilinx-lic-backup-$(date -u +%Y-%m-%d).lic"

Keep this on the admin workstation (or in your usual personal-backup location). The cluster has no canonical secret-backup pattern for this; a dated local copy is sufficient since rollback is just “re-apply the previous configmap.” This file is small (a few KB), so keep a couple of generations.
Get a new license from AMD’s website. Go to https://www.xilinx.com/getlicense (AMD Licensing Site). When asked for the host configuration, use:
- Host name: `xilinxd`
- Host ID type: Ethernet MAC
- Host ID: `b6e10931ba0e` (must match the pinned MAC above)
- Port: `2100`

Download the `.lic` file AMD emails back. It must include the `SERVER xilinxd b6e10931ba0e 2100` line and `VENDOR xilinxd PORT=27000` (or `USE_SERVER`-style block).
Patch the configmap. From a workstation with cluster admin kubeconfig:

    kubectl -n xilinx-dev create configmap xilinx-lic \
      --from-file=xilinx.lic=./new-xilinx.lic \
      -o yaml --dry-run=client \
      | kubectl apply -f -

(You can also edit it in place with kubectl -n xilinx-dev edit configmap xilinx-lic, but the from-file dance is less error-prone for a multi-line license body with \r\n line endings, which lmgrd is fussy about.)

Restart the daemon to pick up the new file. The pod mounts the configmap, but lmgrd reads the file once at startup; it doesn’t watch for changes:

    kubectl -n xilinx-dev rollout restart deployment xilinxd
    kubectl -n xilinx-dev rollout status deployment xilinxd

Verify. From any pod in any namespace:

    # quick TCP check
    nc -vz xilinxd.xilinx-dev 2100
    # full check inside a Vivado-enabled pod
    export XILINXD_LICENSE_FILE=2100@xilinxd.xilinx-dev
    source /tools/Xilinx/Vivado/2023.1/settings.sh
    vlm   # Vivado License Manager: should list all available features and dates

And on the server side:

    kubectl -n xilinx-dev logs deployment/xilinxd --tail=100
    # look for "lmgrd tcp-port 2100" and "xilinxd: Server started on xilinxd"
Rotating the MAC (only if you really need to)
If the pinned MAC ever needs to change — e.g. AMD reissues against a different hostid — you must update both the SERVER line in the new license file and the ifconfig eth0 hw ether ... argument in the deployment’s container command. They must match exactly (lowercase, no separators) or lmgrd will not start. Roll the deployment, then re-verify with vlm from a client pod.
Troubleshooting
- `vlm` shows “Cannot connect to license server system.” Service may not be resolving: check `kubectl -n xilinx-dev get svc xilinxd` and `nslookup xilinxd.xilinx-dev` from a debug pod. If DNS is fine, check the lmgrd pod logs and `nc -vz xilinxd.xilinx-dev 2100`.
- `vlm` connects but says “No such feature exists.” The license has been served but the specific feature (e.g. `SDNET`, `v6_pcie`, `TPG`) isn’t in it. Confirm with `kubectl -n xilinx-dev get cm xilinx-lic -o yaml | grep INCREMENT`. If the user’s feature is missing, request it from AMD University Program for the same hostid; don’t issue a brand-new server license.
- `lmgrd` exits immediately with “Wrong hostid on SERVER line.” The MAC override was not applied. Confirm the container started with NET_ADMIN and check that `kubectl -n xilinx-dev exec deploy/xilinxd -- ip a show dev eth0` returns the pinned MAC.
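For the missing-feature and expiry cases, a slightly more readable view than a raw `grep INCREMENT` is a list of feature names with their expiry fields. This is a rough sketch under a stated assumption: it expects the conventional FlexLM field order `INCREMENT <feature> <vendor> <version> <expiry> …` with those fields on the same line, which may not hold for every license file. The function name is hypothetical.

```shell
# Read a FlexLM license on stdin and print "<feature> <expiry>" for each
# INCREMENT line. Assumes feature is field 2 and expiry is field 5.
list_license_features() {
  awk '$1 == "INCREMENT" { print $2, $5 }'
}
```

It can be fed the backed-up license file or the configmap contents, and the output compared between the old and new license before restarting the daemon.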
PCIe Availability
Every working Alveo card appears twice in lspci — once as the management physical function (PF, ending in .0) and once as the user PF (.1):

    nautilus@node-2-7:~$ lspci -d 10ee: -nD
    0000:21:00.0 Processing accelerators: Xilinx Corporation Device 505c
    0000:21:00.1 Processing accelerators: Xilinx Corporation Device 505d
    0000:a1:00.0 Processing accelerators: Xilinx Corporation Device 505c
    0000:a1:00.1 Processing accelerators: Xilinx Corporation Device 505d

- `505c` is the management function (driven by `xclmgmt`).
- `505d` is the user function (driven by `xocl`).
- A card stuck in golden / recovery shell still shows the same PCI IDs but `xbutil examine` reports `0 devices found` because the XMC subdevice isn’t loaded. Reflash with `xbmgmt program --base` (see Flashing below) and cold-reboot.
A “phantom” PCIe device showing (rev ff) is a sign of stale kernel state — typically a card that was physically pulled while the driver was bound. A cold (ipmitool power cycle) reboot clears it.
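Because each card shows up as two functions, a card count taken from `lspci` has to look at only one of them. A minimal sketch (hypothetical helper name) that counts management functions, assuming the BDF is the first whitespace-separated field of each line:

```shell
# Read `lspci -d 10ee: -nD` output on stdin and print the number of cards,
# counting only management functions (BDF ending in .0).
count_alveo_cards() {
  awk '$1 ~ /\.0$/ { n++ } END { print n + 0 }'
}
```

On a host: `lspci -d 10ee: -nD | count_alveo_cards`. This is the number the allocatable resource counts are compared against elsewhere in this guide.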
USB JTAG (FTDI) Availability
Each U55C has an on-board FTDI FT4232H that exposes JTAG over USB-C. The FTDI’s USB iSerial is identical to the card’s XMC serial number — the same string from lsusb and from /sys/bus/pci/drivers/xclmgmt/<BDF>/xmc.*/serial_num. That equality is what verify_jtag_serials.sh in fpga-testing/ exploits to validate the cable-to-card mapping.
Quick checks from the host:

    # count Alveo JTAG cables
    lsusb | grep -c "Future Technology Devices International, Ltd FT4232H"

    # Alveo card serial via XRT-loaded sysfs (requires the xclmgmt driver)
    for d in /sys/bus/pci/drivers/xclmgmt/0000:*; do
      for sn in "$d"/xmc.*/serial_num; do
        [ -f "$sn" ] && echo "$(basename "$d") $(cat "$sn")"
      done
    done

Every FT4232H exposes four UART channels (/dev/ttyUSB0..3) per card, so 2 cards → 8 ttyUSBs, 7 cards → 28, etc. The smarter-device-manager DaemonSet surfaces these to k8s as smarter-devices/ttyUSB* resources (see below).
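The four-per-card arithmetic gives a cheap sanity check on a host. The helper below is a hypothetical sketch; it assumes every card contributes exactly four UART channels and that no other USB-serial adapter is attached, so a host with an extra adapter (such as the FT232 noted on node-2-6) would be expected to differ.

```shell
# Expected number of /dev/ttyUSB* nodes for a given card count, assuming
# four FT4232H UART channels per card and no other USB-serial adapters.
expected_ttyusb_count() {
  echo $(( $1 * 4 ))
}
```

For instance, `[ "$(ls /dev/ttyUSB* 2>/dev/null | wc -l)" -eq "$(expected_ttyusb_count 2)" ]` would hold on a healthy two-card host under those assumptions.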
Kubernetes integration
Two DaemonSets cover FPGA workloads:
1. xilinx-device-plugin-daemonset (namespace kube-system)
Source: AMD’s k8s-device-plugin. Advertises each programmed FPGA as a schedulable resource.
| Property | Value |
|---|---|
| Node selector | fpga=true |
| Node affinity | Hardcoded list of hostnames under kubernetes.io/hostname |
| Resource added | amd.com/xilinx_u55c_gen3x16_xdma_base_3-0 |
| JTAG resource | xilinx.com/fpga_jtag (one count per Alveo card; registered by KubeVirt virt-handler, not by this DaemonSet) |
| Container image | public.ecr.aws/xilinx_dcg/k8s-device-plugin:1.1.0 |
To onboard a new FPGA host into the device plugin you must do BOTH:
Label the node:

    kubectl label node <fqdn> fpga=true --overwrite

Add the FQDN to the DaemonSet’s nodeAffinity hostname list:

    kubectl -n kube-system get ds xilinx-device-plugin-daemonset -o json \
      | jq '.spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].values |= (. + ["<fqdn>"] | unique)' \
      | kubectl apply -f -

The label alone is not enough: the DaemonSet has both nodeSelector: fpga=true and a requiredDuringScheduling node affinity on kubernetes.io/hostname. Forgetting the nodeAffinity update leaves DESIRED one short of the labelled node count.
2. smarter-device-manager (DaemonSet in kube-system)
| Property | Value |
|---|---|
| Node selector | smarter-device-manager=enabled |
| Resources | smarter-devices/ttyUSB0, smarter-devices/ttyUSB1, smarter-devices/ttyUSBN, smarter-devices/vfio, etc. |
Each smarter-devices/ttyUSBN resource defaults to a multi-allocation count of 16 (the same /dev/ttyUSBN can be claimed by up to 16 concurrent pods). This makes the UART channels of each card’s FT4232H (/dev/ttyUSB0..3 per FPGA) reachable from interactive FPGA-dev pods.
To onboard a new FPGA host:

    kubectl label node <fqdn> smarter-device-manager=enabled --overwrite

There is also a second DaemonSet smarter-device-manager/smarter-device-manager (no node selector) that runs cluster-wide. Leave it alone; it doesn’t expose FPGA-specific resources.
Pod spec example (JTAG + Alveo card)

    resources:
      limits:
        amd.com/xilinx_u55c_gen3x16_xdma_base_3-0: 1   # one whole FPGA
        smarter-devices/ttyUSB0: 1                     # one JTAG UART channel
        smarter-devices/ttyUSB1: 1
        smarter-devices/ttyUSB2: 1                     # ttyUSB2/3 are schedulable only if the configmap regex advertises them
        smarter-devices/ttyUSB3: 1

XRT installation
Three flavours of XRT are deployed today depending on the host’s Ubuntu version:
| OS | XRT package | Notes |
|---|---|---|
| Ubuntu 22.04 (most hosts) | xrt_2.16.204_amd64.deb (2023.2) or 2.15.225 (2023.1) | Userspace bins built against 20.04 — extra libs required (see below) |
| Ubuntu 24.04 (prp-gpu-2) | xrt_202510.2.19.194_24.04-amd64-xrt.deb (2025.1) | Required for kernel 6.8; 2023.x DKMS will not compile on this kernel |
The 22.04 XRT 2023.2 deb declares dependencies on Ubuntu-20.04 versions of libboost-*1.71.0, libssl1.1, libprotobuf17. Two workarounds are in use:

    sudo dpkg -i --force-depends /tmp/xrt_2.16.204_amd64.deb
    sudo tar xzf /tmp/xrt-extra-libs.tgz -C /usr/lib/x86_64-linux-gnu/   # libboost_filesystem.so.1.71.0, libboost_program_options.so.1.71.0
    sudo tar xzf /tmp/ssl11.tgz -C /usr/lib/x86_64-linux-gnu/            # libssl.so.1.1, libcrypto.so.1.1 (needed for xbmgmt + xclbinutil)
    sudo ldconfig
    sudo modprobe xclmgmt && sudo modprobe xocl

ssl11.tgz and xrt-extra-libs.tgz are the artifacts shipped to each FPGA host under /tmp/.
After install, smoke-test:

    source /opt/xilinx/xrt/setup.sh
    xbutil examine    # all cards report Device Ready: Yes
    xbmgmt examine    # mgmt side; shows installed shells

Flashing a card
Required when:
- `xbutil examine` reports `0 devices found` but `lspci -d 10ee:` shows the cards.
- `xbmgmt examine -d <BDF> --report platform` shows `xilinx_u55c_recovery` (the cards are sitting in golden/rescue mode).
- A node was rebooted after the U55C platform deb was removed (e.g. as a side-effect of `apt --fix-broken install`).
One-time platform deb installation
The U55C XDMA shell is not in any apt repo; it must be installed from the AMD-supplied .tar.gz artifact. The 4 debs inside are:

    xilinx-cmc-u55_*.deb                        # Card Management Controller firmware
    xilinx-sc-fw-u55_*.deb                      # Satellite Controller firmware
    xilinx-u55c-gen3x16-xdma-base_*.deb         # the deployable shell + xsabin builder
    xilinx-u55c-gen3x16-xdma-validate_*.deb     # validate xclbin

Install all four with dpkg -i --force-depends. The *-base*.deb’s postinst runs create_xsabin.sh, which invokes xclbinutil and requires libcrypto.so.1.1 to be reachable (hence the ssl11.tgz step above). If postinst fails the package ends in pFR (purge-failed-reinstreq); see the troubleshooting section below.
Programming the shell

    sudo /opt/xilinx/xrt/bin/xbmgmt program \
      --base \
      --device <BDF>.0 \
      --image xilinx_u55c_gen3x16_xdma_base_3

Each card takes 30–45 min to flash. Multiple cards on the same host can be flashed in parallel; each xbmgmt process drives its own card.
A cold reboot (chassis power cycle, not systemctl reboot) is required for the FPGA to load the freshly written flash partition:

    sudo ipmitool power cycle

Verify post-reboot:

    source /opt/xilinx/xrt/setup.sh && xbutil examine
    # both cards: Device Ready: Yes

When a custom shell is intentional
node-2-11 previously ran a custom shell — xbutil examine returned 0 devices found by design, while the cards were still bound to xclmgmt. As of 2026-05-13 it was flashed to stock xilinx_u55c_gen3x16_xdma_base_3 so the device plugin can advertise it. If a future card needs a custom shell again, also remove it from the device plugin’s nodeAffinity list so the DaemonSet stops scheduling there (see device-plugin section above).
Recovering a user-bricked card
The user-facing FPGA docs explicitly tell users to coordinate before flashing, but in practice cards still occasionally end up in a bad state — a flash aborted halfway, a wrong image written, or a dpkg -i side-effect that nuked the platform deb. Symptoms cluster into three patterns:
| Symptom on the host | Likely state |
|---|---|
| lspci -d 10ee: shows the card; xbutil examine says 0 devices found | Card sitting in golden / recovery shell (xilinx_u55c_recovery). Subdevices not loaded; XMC missing. |
| lspci -d 10ee: shows (rev ff) for the card; nothing else | PCIe lost track of the device; happens when the FPGA was reflashed without a cold reboot, or after a mid-flash crash. |
| xbmgmt examine lists the card but --report platform shows missing/wrong shell | Shell file present but not the one matching what’s flashed; common after a partial dpkg -i of the platform deb. |
The recovery procedure is the same for all three — reflash the base shell from the host (you don’t need JTAG for this; PCIe is enough as long as the card enumerates), then cold-reboot.
Step 1 — drain and taint the node

    HOST=node-2-7.sdsc.optiputer.net
    kubectl cordon "$HOST"
    kubectl taint nodes "$HOST" nautilus.io/issue=fpga-install:NoSchedule --overwrite
    kubectl drain "$HOST" --ignore-daemonsets --delete-emptydir-data --force --grace-period=30

Step 2 — confirm the card’s BDF and current state
SSH to the host:

    ssh nautilus@$HOST

    # Cards Xilinx can see
    lspci -d 10ee: -nD

    # What XRT thinks of them
    source /opt/xilinx/xrt/setup.sh
    xbmgmt examine
    xbmgmt examine --device 0000:21:00.0 --report platform   # repeat per BDF

If lspci shows (rev ff) for a card, a cold reboot is required first so that PCIe re-enumerates it before any flash will work: jump to Step 4, then come back to Step 3.
If the platform deb is in pFR (purge-failed-reinstreq) state, run the dpkg state recovery procedure to clean it up, then reinstall the base deb cleanly before flashing.
Step 3 — reflash the base shell
For each affected card:

    sudo /opt/xilinx/xrt/bin/xbmgmt program \
      --base \
      --device 0000:21:00.0 \
      --image xilinx_u55c_gen3x16_xdma_base_3

Each card: 30–45 min. Multiple cards on the same host can flash in parallel. xbmgmt will end with Cold reboot required.
Step 4 — cold-reboot the chassis

    sudo ipmitool power cycle

A warm reboot is not enough: the flash partition is only loaded into the FPGA at chassis power-on. While the host is down, watch for kubelet to mark the node NotReady (~30s) and stay there until the box comes back. Total downtime is usually 5–10 min depending on POST and boot.
Step 5 — clear stale kubelet state if needed
If after the boot, kubelet crash-loops with Static policy invalid state (CPU manager state from before the reboot), see Stale kubelet state after reboot.
Step 6 — verify and uncordon

    # back to ops workstation
    $HOME/fpga-testing/verify_fpga_fleet.sh   # whole-fleet check, or:

    ssh nautilus@$HOST 'source /opt/xilinx/xrt/setup.sh && xbutil examine'
    # every card: Device Ready: Yes, shell xilinx_u55c_gen3x16_xdma_base_3

    kubectl taint nodes "$HOST" nautilus.io/issue-
    kubectl uncordon "$HOST"

The xilinx-device-plugin pod on that node will re-detect the cards and start advertising the resource again within a minute or so. Confirm:

    kubectl get node "$HOST" -o jsonpath='{.status.allocatable.amd\.com/xilinx_u55c_gen3x16_xdma_base_3-0}'

The result should match the lspci card count.
When PCIe-side reflash isn’t enough — JTAG fallback
In rare cases (mid-flash power loss during the boot ROM region, not just the shell partition) PCIe can lose the card entirely — lspci -d 10ee: shows nothing for the slot and a cold reboot doesn’t recover it. The only fix is flashing through JTAG with Vivado hw_server + xsdb, attached to the on-card FT4232H. That’s outside the scope of this runbook; if you hit it, escalate to whoever maintains the AMD relationship — there’s a documented JTAG recovery path on AMD’s Alveo flashing guide.
Joining a new FPGA host to the cluster
End-to-end procedure:
1. Generate a join token (on controller)

    sudo kubeadm token create

2. Run nautilus-ansible setup (from operator workstation)

    cd nautilus-ansible
    ./run.sh setup <fqdn> <token>

The setup playbook installs containerd / kubelet / kubeadm, runs kubeadm join, configures local HAProxy LB, applies topology + netbox labels, and taints with nautilus.io/testing=true:NoSchedule until the node is verified.
Caveats observed during stratix-10-02 join (2026-06-03):
- lvm role: if the inventory entry has `lv_devices: /dev/sda` but the host already runs root on `/dev/md0` (RAID1 of `sda`/`sdb`), run with `--skip-tags lvm`. `pvcreate` would safely refuse to overwrite existing signatures, but the playbook halts before the kubernetes role.
- apt state: if XRT was previously force-installed with broken deps, the `os : remove packages` task fails because `ubuntu-server` depends on `multipath-tools` (one of the packages it tries to remove). Run `sudo apt --fix-broken install -y` on the host first (this removes the broken XRT package; reinstall it after the join).
- sudo timeout on `hold kubernetes`: intermittent. Re-running the playbook with `-t kubernetes,node-labels,multus-removal,chrony,vis` is idempotent and finishes the join.
3. Add the FPGA-specific labels

    kubectl label node <fqdn> fpga=true --overwrite
    kubectl label node <fqdn> smarter-device-manager=enabled --overwrite

4. Add the host to xilinx-device-plugin-daemonset’s nodeAffinity list

    kubectl -n kube-system get ds xilinx-device-plugin-daemonset -o json \
      | jq '.spec.template.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[0].matchExpressions[0].values |= (. + ["<fqdn>"] | unique)' \
      | kubectl apply -f -

5. Install XRT + platform deb
See the XRT installation and Flashing sections above.
6. Remove the testing taint

    kubectl taint nodes <fqdn> nautilus.io/testing-

7. Verify

    # all in one
    fpga-testing/verify_fpga_fleet.sh

A passing host shows: Ready=True, netbox.io/site set, xdma=N, fpga_jtag=N, XRT version + xmc_paired=N matching the lspci card count.
Common admin operations
Stale kubelet state after reboot
After an ipmitool power cycle on a node using the static CPU manager, kubelet may crash-loop with Static policy invalid state, please drain node and remove policy state file. Fix:

    sudo systemctl stop kubelet
    sudo rm -f /var/lib/kubelet/cpu_manager_state /var/lib/kubelet/memory_manager_state
    sudo systemctl start kubelet

Then watch for the node.kubernetes.io/unreachable taint to clear (typically within 30s of kubelet posting heartbeats again).
Stale allocatable counts on a node
If kubectl describe node shows amd.com/xilinx_u55c_gen3x16_xdma_base_3-0=3 but only 1 card is actually present, delete the local device-plugin pod to force re-detection:

    POD=$(kubectl -n kube-system get pods --field-selector spec.nodeName=<fqdn> -o name | grep xilinx-device-plugin)
    kubectl -n kube-system delete "$POD"

dpkg state recovery (U55C platform deb)
If the xilinx-u55c-gen3x16-xdma-base package is stuck in pFR (“very bad inconsistent state”), the prerm tries to remove files that don’t exist. Workaround: pre-create the file paths it expects, then purge and reinstall:

    sudo mkdir -p /opt/xilinx/firmware/u55c/gen3x16-xdma/base/firmware
    sudo touch /lib/firmware/xilinx/b7ac1abe1e3e1cb686d5a81232452676 \
      /opt/xilinx/firmware/u55c/gen3x16-xdma/base/{partition_metadata.json,partition.xsabin} \
      /opt/xilinx/firmware/u55c/gen3x16-xdma/base/firmware/{cmc-u55,ert-v30,sc-fw-u55}
    sudo dpkg --remove --force-remove-reinstreq --force-all xilinx-u55c-gen3x16-xdma-base
    sudo rm -rf /lib/firmware/xilinx/97088961feaeda9152a21d9dfd63ccef \
      /lib/firmware/xilinx/b7ac1abe1e3e1cb686d5a81232452676 \
      /opt/xilinx/firmware/u55c
    sudo dpkg -i --force-depends /tmp/xilinx-u55c-gen3x16-xdma-base_*.deb

NEVER do dpkg-deb -x <u55c-base.deb> /
The U55C base deb’s tar archive contains a ./lib/ directory entry. On Ubuntu 22.04, /lib is a symlink to /usr/lib. Extracting the tar to / replaces the symlink with a real directory, which silently breaks every dlopen of PAM modules (/lib/security/pam_*.so no longer resolves to /usr/lib/x86_64-linux-gnu/security/). The visible symptom is sshd accepting the public key and then closing the connection (PAM unable to dlopen lines in /var/log/auth.log).
Recovery (from a privileged pod with host root mounted at /host, since SSH is broken):

    mv /host/lib/firmware/xilinx /host/usr/lib/firmware/xilinx   # save extracted files
    rmdir /host/lib/firmware /host/lib                            # drop the broken dir
    ln -s usr/lib /host/lib                                       # restore the symlink

If you ever genuinely need to peek at the deb’s contents, do it in a scratch directory; never extract to /.
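A cheap guard against this failure mode is to confirm that `/lib` is still a symlink before and after any manual package surgery. The helper below is a hypothetical sketch, not part of the existing tooling; it checks only the symlink itself, not what it points to.

```shell
# Succeed only if the given path (default /lib) is still a symlink.
lib_is_symlink() {
  [ -L "${1:-/lib}" ]
}
```

Usage on a host: `lib_is_symlink || echo "WARNING: /lib is no longer a symlink"`; from the recovery pod, pass `/host/lib` instead. For looking inside the deb, `dpkg-deb -c <u55c-base.deb>` lists the archive contents without extracting anything, and `dpkg-deb -x <u55c-base.deb> "$(mktemp -d)"` extracts into a throwaway directory.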
Examining FPGAs from the host

    source /opt/xilinx/xrt/setup.sh
    xbmgmt examine

Expected:

    Device(s) Present
    |BDF             ||Shell                            ||Logic UUID                            ||Device ID         ||Device Ready*  |
    |----------------||---------------------------------||--------------------------------------||------------------||---------------|
    |[0000:21:00.0]  ||xilinx_u55c_gen3x16_xdma_base_3  ||97088961-FEAE-DA91-52A2-1D9DFD63CCEF  ||mgmt(inst=128)    ||Yes            |
    |[0000:a1:00.0]  ||xilinx_u55c_gen3x16_xdma_base_3  ||97088961-FEAE-DA91-52A2-1D9DFD63CCEF  ||mgmt(inst=129)    ||Yes            |

Vivado is available on the admin instance of Coder in an FPGA Flashing template: Coder Dev. AMD’s flashing reference: AMD/Xilinx Flashing Guide.
Periodic health check
The fleet currently has no Prometheus exporter for FPGA-specific health, and the device plugin’s “card not advertised” failure mode is silent — you only notice when a user complains. Until a proper exporter is in place, the recommended pattern is to run fpga-testing/verify_fpga_fleet.sh (or its successor) periodically from an admin workstation or a cluster cron.
What it checks per host:
- Node `Ready=True`
- `netbox.io/site` label is set and non-DaemonSet pods on the node are all `Running`/`Completed`
- HW visible: lspci Xilinx count, lsusb FT4232H count, `/dev/ttyUSB*` count
- k8s allocatable: `amd.com/xilinx_u55c_gen3x16_xdma_base_3-0`, `xilinx.com/fpga_jtag`, smarter-devices ttyUSB resource kinds
- XRT health: `xbutil --version`, `lsmod` for `xocl`/`xclmgmt`, count of cards bound to `xclmgmt`, XMC paired count
A passing host shows all five lines populated. Typical failures and what they mean:
| Symptom in script output | Means… |
|---|---|
| (1) joined: Ready= (empty) | Node not in cluster, or kubelet not heartbeating. Check journalctl -u kubelet. |
| (3) HW: lspci=0 | Card lost from PCIe; cold reboot first, investigate physical seating after. |
| (3) HW: lspci=N lsusb_future=N but (4) xdma= (empty) | Device plugin doesn’t have this host in its nodeAffinity list, or the fpga=true label is missing. See Kubernetes integration. |
| (5) modules_loaded=0 | XRT installed but kernel modules aren’t loaded; sudo modprobe xclmgmt xocl on the host. |
| (5) cards_bound=N, xmc_paired=0 | Cards are bound but in golden/recovery shell; reflash per Recovering a user-bricked card. |
| (5) xrt_version= (empty) | XRT not installed (or setup.sh path moved). See XRT installation. |
A minimum-viable cron is just verify_fpga_fleet.sh > out.txt && grep -E "(empty|=0$)" out.txt && mail -s 'FPGA fleet drift' … on the admin workstation; running it nightly catches drift before users do. A more proper Prometheus exporter (reading the same five signals and exposing them as gauges) is a worthwhile follow-up but doesn’t exist today.
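The filtering half of that one-liner can be wrapped in a small function so the pattern lives in one place. This is a sketch only: `fleet_drift_lines` is a hypothetical name, and the line formats it is tested against are guesses based on the symptom table above, not the script's verified output. Note that `=0$` only catches a zero at the very end of a line, so a zero followed by more fields on the same line would slip through.

```shell
# Read verify_fpga_fleet.sh output on stdin and print the lines that look
# like drift, using the same pattern as the one-liner above. Always exits 0
# so the caller decides what an empty result means.
fleet_drift_lines() {
  grep -E '(empty|=0$)' || true
}
```

A nightly job could then test whether `verify_fpga_fleet.sh | fleet_drift_lines` produced any output and send mail only in that case.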
