Scheduling Workloads in Nautilus
Scheduling
In Kubernetes, scheduling refers to the process of assigning pods to nodes in a cluster based on various factors such as resource requirements, node capacity, and other constraints. The Kubernetes scheduler is responsible for determining where and how to run pods within the cluster.
❗ While you can run jobs without any special node selectors, understanding this section will allow you to better optimize the placement of your workloads and significantly increase computational performance. You can request more performant CPUs, GPUs with more memory, faster network links, and even select nodes in a specific geographical region to optimize latency to your storage.
Prerequisites
This section builds on skills from both the Quickstart and the tutorial on Basic Kubernetes.
Learning Objectives
- You will learn how to query the cluster to view high-level node availability in real time.
- You will understand how node capabilities (e.g. GPU type, network speed, geography) are exposed through labels.
- You will learn how to target specific node features without requiring direct access to individual nodes.
- You will be able to enforce or prefer node types and resource requirements within a pod YAML file.
Explore the system
Let’s start by looking at what’s available in the system. You have already seen the list of all nodes:
```bash
kubectl get nodes
```
This is a very long list — and growing. While you can see basic node information, Nautilus intentionally limits direct access to detailed node configuration.
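If you only want a quick sense of scale, you can count the nodes with standard shell tools (a small sketch; `--no-headers` is a standard kubectl flag that suppresses the header row):
```bash
# Count the nodes currently visible in the cluster
kubectl get nodes --no-headers | wc -l
```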
⚠️ Note on permissions
Nautilus users have list-only access to nodes.
- ✅ kubectl get nodes
- ❌ kubectl get node <node_name>
- ❌ kubectl describe node
- ❌ kubectl get nodes -o yaml
All user-relevant scheduling information is exposed through node labels, which are safe to query and can be used directly in pod scheduling.
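If you want to see the raw labels themselves, kubectl can print them for every node (a quick sketch; the output is very wide, so paging it helps):
```bash
# Dump every label on every node; pipe through less -S to scroll horizontally
kubectl get nodes --show-labels | less -S
```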
Viewing node capabilities with labels
For example, you can see which nodes provide which GPU types:
```bash
kubectl get nodes -L nvidia.com/gpu.product
```
This shows a cluster-wide view of GPU availability without inspecting any individual node.
You can also view other commonly useful labels, such as network speed:
```bash
kubectl get node -l 'nautilus.io/network=100000'
```
Or combine selectors and label output:
```bash
kubectl get node -l 'nvidia.com/gpu.product!=NVIDIA-GeForce-GTX-1080' -L nvidia.com/gpu.product
```
❗ Many of these queries are also available through the Nautilus portal. You can visit the Resources page to view a live table of nodes and their features.
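Label selectors can also be comma-separated, which acts as a logical AND. As a sketch using only the labels shown above, this lists RTX 3090 nodes that also have 100 Gbps networking, with their zone shown as an extra column (it may return nothing if no node satisfies both constraints):
```bash
# Nodes matching ALL listed label constraints (comma = AND)
kubectl get nodes -l 'nvidia.com/gpu.product=NVIDIA-GeForce-RTX-3090,nautilus.io/network=100000' \
  -L nvidia.com/gpu.product,topology.kubernetes.io/zone
```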
Validating requirements
Before adding scheduling constraints to your pod yaml, it’s good practice to verify that the requested resources exist somewhere in the cluster.
For example, to check whether a specific GPU type is available:
```bash
kubectl get node -l 'nvidia.com/gpu.product=NVIDIA-GeForce-RTX-3090'
```
❓ Did you get any results?
Here we check for nodes with 100 Gbps networking:
```bash
kubectl get node -l 'nautilus.io/network=100000'
```
❓ Did you get any results?
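Any other label can be validated the same way. For example, the zone label used in the topology section later in this tutorial can be checked up front (a sketch; korea is one of the zone values used below):
```bash
# Verify that at least one node sits in the korea zone
kubectl get node -l 'topology.kubernetes.io/zone=korea'
```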
💡 Even though you cannot inspect nodes directly, the scheduler has full visibility and will match your pod requirements against all eligible nodes automatically.
Requirements in pods
You have already used resource requirements in pods. Here is a simple example:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: mypod
    image: rocker/cuda
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
      requests:
        memory: 100Mi
        cpu: 100m
    command: ["sh", "-c", "sleep infinity"]
```
The resource requests and limits are intentionally small, making it very likely that the pod will start.
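To try it, save the manifest to a file and create the pod; `-o wide` shows which node it was scheduled on (a sketch; the filename test-pod.yaml is just an example):
```bash
# Create the pod from the manifest above and see which node it landed on
kubectl apply -f test-pod.yaml
kubectl get pod test-pod -o wide
```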
Requesting a GPU
Now let’s add a GPU requirement.
❗ Note: You cannot request a fraction of a GPU. Requests and limits must match.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  containers:
  - name: mypod
    image: rocker/cuda
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]
```
Once the pod starts, log into it and verify the GPU:
```bash
kubectl exec test-gpupod -it -- /bin/bash
```
Inside the container, run:
```bash
nvidia-smi
```
❗ Remember to delete old pods when you are done.
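For example:
```bash
# Remove the test pod once you are finished with it
kubectl delete pod test-gpupod
```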
Requesting a specific GPU type
To require a specific GPU model, use nodeAffinity:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu.product
            operator: In
            values:
            - NVIDIA-GeForce-RTX-3090
  containers:
  - name: mypod
    image: rocker/cuda
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]
```
If the pod starts successfully, confirm the GPU type using nvidia-smi.
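You can run the check without an interactive shell; `nvidia-smi -L` prints one line per visible GPU (a sketch, assuming the pod is named test-gpupod as above):
```bash
# List the GPU(s) visible inside the running pod
kubectl exec test-gpupod -- nvidia-smi -L
```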
Preferences in pods
Sometimes you would prefer a resource but do not want to require it. This can be expressed using preferredDuringSchedulingIgnoredDuringExecution:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: nvidia.com/gpu.product
            operator: In
            values:
            - NVIDIA-GeForce-RTX-2080-Ti
            - Tesla-V100-SXM2-32GB
  containers:
  - name: mypod
    image: rocker/cuda
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]
```
Check where the pod landed and which GPU you received.
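A quick way to do both checks from the command line (a sketch; `-o wide` shows the node name, and the nvidia-smi query flags are standard):
```bash
# Which node did the scheduler pick?
kubectl get pod test-gpupod -o wide
# Which GPU model did we actually get?
kubectl exec test-gpupod -- nvidia-smi --query-gpu=name --format=csv,noheader
```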
Using geographical topology
Nautilus nodes are distributed globally. You can select nodes closer to your data or collaborators using topology labels.
View available regions and zones:
```bash
kubectl get nodes -L topology.kubernetes.io/zone,topology.kubernetes.io/region
```
Run a pod in Korea:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-geo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - korea
  containers:
  - name: mypod
    image: alpine
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
      requests:
        memory: 100Mi
        cpu: 100m
    command:
    - sh
    - -c
    - |
      apk add curl;
      curl ipinfo.io
```
Check the logs:
```bash
kubectl logs test-geo
```
Optional: Reserved nodes and taints
Some nodes are restricted and require explicit tolerations. You can view node taints at: https://nrp.ai/viz/resources/.
If a pod cannot be scheduled, inspect events:
```bash
kubectl get events --sort-by=.metadata.creationTimestamp
```
You may see a NoSchedule taint. To tolerate it:
```yaml
tolerations:
- key: nautilus.io/reservations
  operator: Equal
  value: "name"
  effect: NoSchedule
```
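This block goes under the pod spec, next to any affinity rules. A minimal sketch of the placement, assuming a reservation called "name" (replace it with the actual reservation value; the pod name test-reserved is just an example):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test-reserved          # example name
spec:
  tolerations:
  - key: nautilus.io/reservations
    operator: Equal
    value: "name"              # placeholder: use the actual reservation name
    effect: NoSchedule
  containers:
  - name: mypod
    image: alpine
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
      requests:
        memory: 100Mi
        cpu: 100m
    command: ["sh", "-c", "sleep infinity"]
```
Note that a toleration only allows the pod onto the tainted nodes; to make sure it actually lands there, combine it with a nodeAffinity rule like the ones shown earlier.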