Rook/Ceph Upgrades
Check versions and compatibility
# Check Kubernetes
kubectl version

# Check the Rook operator image
kubectl -n rook-system get deploy rook-ceph-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

# Check all CephClusters
kubectl get cephcluster -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,EXTERNAL:.spec.external.enable,IMAGE:.spec.cephVersion.image,HEALTH:.status.ceph.health'

Before upgrading, check the official docs for the exact Rook version you plan to install. Confirm that:
- Rook supports the installed Kubernetes version.
- Rook supports the current and target Ceph versions.
- The Ceph target version supports the host OS and kernel.
- The upgrade path does not require another version first.
Replace v1.16 in these links when targeting another Rook release:
- Rook v1.16 prerequisites and Kubernetes versions
- Rook v1.16 Ceph upgrade and supported Ceph versions
- Rook v1.16 operator upgrade
- Ceph OS recommendations
- Ceph Squid release and upgrade notes
If the installed Rook version does not support the target Ceph version, upgrade Rook first.
Production compatibility matrix
Use the exact target-version docs before each step. For the Nautilus production cluster (Kubernetes 1.33.8):
| Rook version | Kubernetes 1.33 supported | Ceph versions in official docs | Use in this plan |
|---|---|---|---|
| v1.16 | No; docs list Kubernetes v1.27-v1.32 | Reef and Squid | Current operator version only. Do not use for Tentacle. |
| v1.17 | Yes; docs list Kubernetes v1.28-v1.33 | Reef and Squid | Intermediate operator step only. |
| v1.18 | Yes; docs list Kubernetes v1.29-v1.34 | Reef, Squid, and Tentacle | First version line that supports Tentacle. |
| v1.19 | Yes; docs list Kubernetes v1.30-v1.35 | Squid and Tentacle | Preferred minimum line before Tentacle. |
| v1.20 | Yes; docs list Kubernetes v1.31-v1.36 | Squid and Tentacle | Latest checked line for Kubernetes 1.33.8; includes CSI migration changes. |
Do not attempt the Tentacle upgrade while the operator is still on v1.16.9. Upgrade the operator through each minor release first. If targeting v1.20, upgrade to at least v1.19.5 before v1.20, then follow the v1.20 CSI migration steps.
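The Kubernetes column of the matrix can be turned into a quick sanity check. The sketch below hard-codes the ranges from the table above; the function names rook_k8s_range and rook_supports_k8s are hypothetical, and the official prerequisites page for the target release remains the authority.

```shell
# Supported Kubernetes minor ranges, copied from the matrix above.
# Prints "MIN MAX" for a Rook release line, or nothing if the line is unknown.
rook_k8s_range() {
  case "$1" in
    v1.16) echo "27 32" ;;
    v1.17) echo "28 33" ;;
    v1.18) echo "29 34" ;;
    v1.19) echo "30 35" ;;
    v1.20) echo "31 36" ;;
  esac
}

# Succeeds if the Rook line (e.g. v1.17) lists the Kubernetes version
# (e.g. 1.33.8) as supported. Only the minor number is compared.
# Returns 2 for a Rook line that is not in the table.
rook_supports_k8s() (
  range=$(rook_k8s_range "$1")
  [ -n "$range" ] || exit 2
  minor=$(printf '%s\n' "$2" | cut -d. -f2)
  set -- $range            # $1 = lowest minor, $2 = highest minor
  [ "$minor" -ge "$1" ] && [ "$minor" -le "$2" ]
)

for line in v1.16 v1.17 v1.18 v1.19 v1.20; do
  if rook_supports_k8s "$line" 1.33.8; then
    echo "$line: Kubernetes 1.33 listed"
  else
    echo "$line: Kubernetes 1.33 not listed"
  fi
done
```

With the table data as given, only v1.16 reports that Kubernetes 1.33 is not listed.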
Official references used for this matrix:
- Rook v1.17 prerequisites and Ceph upgrade docs
- Rook v1.18 prerequisites and Ceph upgrade docs
- Rook v1.19 prerequisites and Ceph upgrade docs
- Rook v1.20 prerequisites, Ceph upgrade docs, and operator upgrade docs
Production upgrade order
For the Nautilus production cluster, use this order:
- Upgrade each local Ceph cluster to the approved Squid image, one cluster at a time.
- Upgrade the Rook operator one minor release at a time: v1.16.9 → v1.17 → v1.18 → v1.19.5+ → v1.20.x, or stop at the newest approved release that supports Kubernetes 1.33.8.
- After the operator is on a release that supports Tentacle, plan the Squid to Tentacle Ceph upgrade.
Do not use Rook v1.16 as the final operator target for Kubernetes 1.33.8. Check the exact target Rook docs before each operator step.
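As a rough illustration of the one-minor-at-a-time rule, the sketch below lists the release lines to cross between a current and a target operator version. The helper name operator_steps is hypothetical. It only enumerates minor lines and does not choose patch releases such as v1.19.5, which still come from the target-version docs.

```shell
# Hypothetical helper: print each Rook minor line to cross, in order,
# when moving from the current version to the target version.
operator_steps() (
  cur=$(printf '%s\n' "$1" | cut -d. -f2)     # v1.16.9 -> 16
  target=$(printf '%s\n' "$2" | cut -d. -f2)  # v1.20   -> 20
  while [ "$cur" -lt "$target" ]; do
    cur=$((cur + 1))
    echo "v1.$cur"
  done
)

operator_steps v1.16.9 v1.20
# prints v1.17, v1.18, v1.19, and v1.20 on separate lines
```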
Ceph upgrade: Reef to Squid
Pick one cluster
# Set NS to the namespace of the Ceph cluster you are upgrading
NS=rook-central
CEPH_CLUSTER="$NS"

Change NS to the cluster you want to upgrade. Upgrade only one local Ceph cluster at a time.
Check Ceph health
# Check overall Ceph status
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph -s

kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph status
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph health detail
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph versions
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph osd stat

Do not continue unless:
- Health is HEALTH_OK.
- All PGs are active+clean.
- No PGs or objects are misplaced, degraded, recovering, or backfilling.
- All OSDs are up and in.
Some clusters may have known HEALTH_WARN items, such as slow OSDs. Proceed with such a cluster only after confirming that every warning is understood and safe.
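The PG condition above can be checked mechanically. The sketch below parses one line of ceph pg stat output and succeeds only when every PG is exactly active+clean. The function name pgs_all_clean is hypothetical, and the parsing assumes the usual "N pgs: M active+clean; ..." layout, so verify it against the output of the Ceph release in use.

```shell
# Reads one line of `ceph pg stat` output on stdin and succeeds only if
# every PG is exactly active+clean (no other PG states are present).
pgs_all_clean() (
  read -r line || [ -n "$line" ] || exit 1
  # A leading space guarantees a separator before the first number.
  total=$(printf ' %s\n' "$line" | sed -n 's/.* \([0-9][0-9]*\) pgs:.*/\1/p')
  # Match "M active+clean" only when followed by ";" or ",", so that
  # states such as active+clean+scrubbing are not counted.
  clean=$(printf ' %s\n' "$line" | sed -n 's/.* \([0-9][0-9]*\) active+clean[;,].*/\1/p')
  [ -n "$total" ] && [ -n "$clean" ] && [ "$total" -eq "$clean" ]
)

# Example against a live cluster (not run here):
#   kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph pg stat | pgs_all_clean \
#     && echo "all PGs active+clean"
```

The check is deliberately strict: PGs that are scrubbing also fail it, which matches the rule that all PGs must be active+clean before continuing.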
1. Upgrade to the approved Reef patch
Use the exact approved production image tag. Date-suffixed tags are preferred in production. This example uses Reef 18.2.8:
REEF_IMAGE=quay.io/ceph/ceph:v18.2.8
kubectl -n "$NS" patch cephcluster "$CEPH_CLUSTER" --type merge \
  -p "{\"spec\":{\"cephVersion\":{\"image\":\"$REEF_IMAGE\"}}}"

# Wait until all daemons use the new version and health is HEALTH_OK
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph versions
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph status

# Update the toolbox image after the Ceph upgrade finishes
kubectl -n "$NS" set image deploy/rook-ceph-tools \
  rook-ceph-tools="$REEF_IMAGE"
kubectl -n "$NS" rollout status deploy/rook-ceph-tools

2. Upgrade Reef to Squid
Run the health checks again, then use the approved Squid image. Date-suffixed tags are preferred in production. This example uses Squid 19.2.4:
SQUID_IMAGE=quay.io/ceph/ceph:v19.2.4
kubectl -n "$NS" patch cephcluster "$CEPH_CLUSTER" --type merge \
  -p "{\"spec\":{\"cephVersion\":{\"image\":\"$SQUID_IMAGE\"}}}"

Mixed Reef and Squid versions are normal while the rollout is running.
# Watch the rollout
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph versions
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph status
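One way to tell when the rollout has finished is to count the distinct version strings in the "overall" section of ceph versions; more than one means the daemons are still mixed. The sketch below assumes the pretty-printed JSON layout with one entry per line. The function name count_overall_versions is hypothetical and the sample version strings are illustrative. A JSON parser such as jq would be more robust where it is available.

```shell
# Reads pretty-printed `ceph versions` JSON on stdin and prints how many
# distinct version strings appear in its "overall" section.
count_overall_versions() {
  awk '
    /"overall"[[:space:]]*:/     { inside = 1; next }
    inside && /}/                { inside = 0 }
    inside && /"ceph version /   { n++ }
    END                          { print n + 0 }
  '
}

# Illustrative sample taken mid-rollout: two versions are still present.
printf '%s\n' \
  '{' \
  '    "osd": {' \
  '        "ceph version 18.2.8 (abc123) reef (stable)": 4,' \
  '        "ceph version 19.2.4 (def456) squid (stable)": 2' \
  '    },' \
  '    "overall": {' \
  '        "ceph version 18.2.8 (abc123) reef (stable)": 7,' \
  '        "ceph version 19.2.4 (def456) squid (stable)": 2' \
  '    }' \
  '}' | count_overall_versions
# prints 2

# Against a live cluster (not run here):
#   kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph versions | count_overall_versions
```

A result of 1, together with HEALTH_OK, suggests that the Ceph upgrade for that cluster is complete.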
# Update the toolbox image after the Ceph upgrade finishes
kubectl -n "$NS" set image deploy/rook-ceph-tools \
  rook-ceph-tools="$SQUID_IMAGE"
kubectl -n "$NS" rollout status deploy/rook-ceph-tools

# Final verification
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph health
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph versions
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph pg stat
kubectl -n "$NS" exec deploy/rook-ceph-tools -- ceph osd stat

Repeat by changing NS to each locally managed cluster: rook, rook-central, rook-east, rook-haosu, rook-pacific, rook-south-east, rook-tide, and rook-ucsd.
rook-system is the shared operator namespace, not a Ceph cluster. Do not patch rook-fullerton; it is an external CephCluster.
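To avoid patching an external cluster by mistake, the list of local namespaces can be derived from the CephCluster listing rather than typed by hand. The sketch below filters the custom-columns table used earlier in this document (NAMESPACE, NAME, EXTERNAL, IMAGE, HEALTH) and drops rows whose EXTERNAL column is true. The function name local_ceph_namespaces is hypothetical and the sample rows are illustrative.

```shell
# Reads the custom-columns table on stdin and prints the namespace of every
# CephCluster that is not marked external. The header row is skipped.
local_ceph_namespaces() {
  awk 'NR > 1 && $3 != "true" { print $1 }'
}

# Illustrative sample rows; real output comes from kubectl.
printf '%s\n' \
  'NAMESPACE        NAME             EXTERNAL   IMAGE                       HEALTH' \
  'rook             rook             <none>     quay.io/ceph/ceph:v19.2.4   HEALTH_OK' \
  'rook-fullerton   rook-fullerton   true       <none>                      HEALTH_OK' \
  'rook-east        rook-east        <none>     quay.io/ceph/ceph:v19.2.4   HEALTH_OK' |
  local_ceph_namespaces
# prints rook and rook-east, one per line
```

Compare the result with the expected list of locally managed clusters before using it to drive any upgrade step.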
Ceph upgrade: Squid to Tentacle
Do this only after all local Ceph clusters are on Squid and the Rook operator has been upgraded to a release that supports Tentacle.
Use the official target-version Rook and Ceph docs and an approved Tentacle image. Avoid Ceph Tentacle 20.2.0; use an approved 20.2.2 or newer production image tag.
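The minimum-version rule can be enforced with a small guard before any patch command runs. The sketch below compares an image tag against a minimum MAJOR.MINOR.PATCH version. The function name ceph_tag_at_least and the variable TENTACLE_IMAGE are hypothetical, and the tag shown is illustrative rather than an approved image.

```shell
# Succeeds if the Ceph image tag is at least MIN (MAJOR.MINOR.PATCH).
# The repository prefix, a leading "v", and a date suffix after "-" are ignored.
# Returns 2 if either version does not have exactly three numeric parts.
ceph_tag_at_least() (
  tag=${1##*:}      # drop the registry/repository part, if present
  tag=${tag#v}      # drop the leading "v"
  tag=${tag%%-*}    # drop a date suffix such as -20250101
  min=$2
  IFS=.
  set -- $tag $min  # $1.$2.$3 = tag, $4.$5.$6 = minimum
  [ "$#" -eq 6 ] || exit 2
  [ "$1" -gt "$4" ] && exit 0
  [ "$1" -lt "$4" ] && exit 1
  [ "$2" -gt "$5" ] && exit 0
  [ "$2" -lt "$5" ] && exit 1
  [ "$3" -ge "$6" ]
)

TENTACLE_IMAGE=quay.io/ceph/ceph:v20.2.2   # illustrative tag only
if ceph_tag_at_least "$TENTACLE_IMAGE" 20.2.2; then
  echo "tag meets the 20.2.2 minimum"
else
  echo "tag is older than 20.2.2; do not use it"
fi
```

This guard checks only the version number. Whether a given image is approved for production is still a separate decision.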
Follow the same per-cluster pattern:
- Set NS to one local Ceph namespace.
- Verify HEALTH_OK, active+clean PGs, and all OSDs up and in.
- Patch only that CephCluster to the approved Tentacle image.
- Watch ceph status and ceph versions.
- Update that namespace’s toolbox image after the Ceph upgrade finishes.
- Repeat for the next local Ceph namespace.
Rook operator upgrade
Follow the official Rook upgrade guide for every minor version you cross. Use files from the exact target release, not master.
For Kubernetes 1.33.8, choose a target Rook release whose official prerequisites list Kubernetes 1.33 as supported. Newer Rook releases may also include extra upgrade steps, such as CSI migration; follow the target-version guide exactly. Rook v1.20 moves CSI management to the ceph-csi-operator, so do not treat it as a simple image-only upgrade.
Before changing the operator, confirm every local Ceph cluster is healthy:
kubectl get cephcluster -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,EXTERNAL:.spec.external.enable,IMAGE:.spec.cephVersion.image,HEALTH:.status.ceph.health'

# Use the approved target Rook version, for example v1.16.9
TARGET_ROOK_VERSION=vX.Y.Z
ROOK_OPERATOR_IMAGE="docker.io/rook/ceph:$TARGET_ROOK_VERSION"

git clone --single-branch --depth=1 --branch "$TARGET_ROOK_VERSION" \
  https://github.com/rook/rook.git

cd rook/deploy/examples

Apply the target-version CRDs and RBAC before changing the operator image.
- Apply crds.yaml from TARGET_ROOK_VERSION once. CRDs are cluster-wide.
- Use common.yaml for the primary Ceph namespace, rook.
- Use the target release’s common-second-cluster.yaml for each secondary Ceph namespace.
The operator still runs in rook-system. These files give that operator the target-version resources and permissions it needs in the Ceph cluster namespaces.
Do not generate ordinary secondary-cluster RBAC for rook-fullerton; it is an external CephCluster.
The link above is only an example pinned to v1.16.9. In the commands below, use the file from TARGET_ROOK_VERSION.
export ROOK_OPERATOR_NAMESPACE=rook-system
mkdir -p clusters

sed \
  -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g" \
  -e "s/\(.*\):.*# namespace:cluster/\1: rook # namespace:cluster/g" \
  common.yaml > clusters/rook.yaml

for NS in rook-central rook-east rook-haosu rook-pacific rook-south-east rook-tide rook-ucsd; do
  sed \
    -e "s/\(.*\):.*# namespace:operator/\1: $ROOK_OPERATOR_NAMESPACE # namespace:operator/g" \
    -e "s/\(.*\):.*# namespace:cluster/\1: $NS # namespace:cluster/g" \
    common-second-cluster.yaml > "clusters/$NS.yaml"
done
grep "namespace:" clusters/rook.yaml | head -5
grep "namespace:" clusters/rook-central.yaml | head -5
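The sed expressions above rely on the trailing "# namespace:operator" and "# namespace:cluster" comments in the Rook example manifests: only lines carrying one of those tags have their value rewritten. The sketch below wraps the same two expressions in a function and runs them on three illustrative lines. The function name retarget_namespaces is hypothetical and the sample input is not copied from a real manifest.

```shell
# Rewrites the value on lines tagged "# namespace:operator" ($1) or
# "# namespace:cluster" ($2). Reads a manifest on stdin, writes to stdout.
retarget_namespaces() {
  sed \
    -e "s/\(.*\):.*# namespace:operator/\1: $1 # namespace:operator/g" \
    -e "s/\(.*\):.*# namespace:cluster/\1: $2 # namespace:cluster/g"
}

# Illustrative input: two tagged lines and one untagged line.
printf '%s\n' \
  '  namespace: rook-ceph # namespace:operator' \
  '  namespace: rook-ceph-secondary # namespace:cluster' \
  '  name: rook-ceph-osd' |
  retarget_namespaces rook-system rook-east
# Output:
#   namespace: rook-system # namespace:operator
#   namespace: rook-east # namespace:cluster
#   name: rook-ceph-osd
```

Because the tags are preserved in the output, the generated files can be retargeted again later with the same expressions.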
kubectl diff -f crds.yaml -f clusters/
kubectl apply -f crds.yaml -f clusters/

Review the diff before applying. Do not add PodSecurityPolicy (PSP) resources; the PSP API was removed in Kubernetes v1.25.
Check for pinned CSI image variables before changing the operator image. kubectl set image only changes the operator container image.
kubectl -n rook-system get deploy rook-ceph-operator \
  -o jsonpath='{range .spec.template.spec.containers[0].env[*]}{.name}={.value}{"\n"}{end}' \
  | grep -i csi

If any CSI image is pinned to a version that is not compatible with the target Rook release, update the operator Deployment before or immediately after the operator image change.
kubectl -n rook-system set image deploy/rook-ceph-operator \
  rook-ceph-operator="$ROOK_OPERATOR_IMAGE"

kubectl -n rook-system rollout status deploy/rook-ceph-operator

Check operator access
The operator runs in rook-system, but it must manage CephClusters in other namespaces. Check that namespace-only mode is off:
kubectl -n rook-system get deploy rook-ceph-operator \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="ROOK_CURRENT_NAMESPACE_ONLY")].value}{"\n"}'

Expected:

false

Check that the operator service account can read CephClusters in every local namespace:

for NS in rook rook-central rook-east rook-haosu rook-pacific rook-south-east rook-tide rook-ucsd; do
  printf '%s: ' "$NS"
  kubectl auth can-i get cephclusters.ceph.rook.io \
    --as=system:serviceaccount:rook-system:rook-ceph-system \
    -n "$NS"
done

Every result should be yes. If any result is no, regenerate and apply that namespace’s target-version RBAC file: common.yaml for rook, common-second-cluster.yaml for every other namespace. Do not give cluster-admin to the operator; fix the missing namespace RBAC instead.

kubectl get cephcluster -A \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,EXTERNAL:.spec.external.enable,IMAGE:.spec.cephVersion.image,HEALTH:.status.ceph.health'

After the operator starts, it reconciles CephClusters one at a time. If one namespace is stuck, the others can wait behind it. Watch the operator logs if any cluster does not return to HEALTH_OK:
kubectl -n rook-system logs deploy/rook-ceph-operator -f | grep -E "ERROR|WARN|reconcile"