
Scheduling Workloads in Nautilus

Scheduling

In Kubernetes, scheduling refers to the process of assigning pods to nodes in a cluster based on various factors such as resource requirements, node capacity, and other constraints. The Kubernetes scheduler is responsible for determining where and how to run pods within the cluster.

❗ While you can run jobs without any special node selectors, understanding this section will allow you to better optimize the placement of your workloads and significantly increase computational performance. You can request more performant CPUs, GPUs with more memory, faster network links, and even select nodes in a specific geographical region to optimize latency to your storage.

Prerequisites

This section builds on skills from both the Quickstart and the tutorial on Basic Kubernetes.

Learning Objectives

  1. You will learn how to query the cluster to view high-level node availability in real time.
  2. You will understand how node capabilities (e.g. GPU type, network speed, geography) are exposed through labels.
  3. You will learn how to target specific node features without requiring direct access to individual nodes.
  4. You will be able to enforce or prefer node types and resource requirements within a pod yaml file.

Explore the system

Let’s start by looking at what’s available in the system. You have already seen the list of all nodes:

kubectl get nodes

This is a very long list — and growing. While you can see basic node information, Nautilus intentionally limits direct access to detailed node configuration.
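If you just want a rough count of how many nodes are currently registered, you can pipe the plain listing through wc:

kubectl get nodes --no-headers | wc -l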

⚠️ Note on permissions

Nautilus users have list-only access to nodes.

  • ✅ kubectl get nodes
  • ❌ kubectl get node <node-name> (fetching a single node)
  • ❌ kubectl describe node
  • ❌ kubectl get nodes -o yaml

All user-relevant scheduling information is exposed through node labels, which are safe to query and can be used directly in pod scheduling.

Viewing node capabilities with labels

For example, you can see which nodes provide which GPU types:

kubectl get nodes -L nvidia.com/gpu.product

This shows a cluster-wide view of GPU availability without inspecting any individual node.
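If you prefer the raw view, kubectl can also print every label on every node, although the output is much wider and harder to scan than the -L form above:

kubectl get nodes --show-labels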

You can also filter on other commonly useful labels, such as network speed (the value is in Mbps, so 100000 corresponds to 100 Gbps):

kubectl get node -l 'nautilus.io/network=100000'

Or combine selectors and label output:

kubectl get node -l 'nvidia.com/gpu.product!=NVIDIA-GeForce-GTX-1080' -L nvidia.com/gpu.product

❗ Many of these queries are also available through the Nautilus portal. You can visit the Resources page to view a live table of nodes and their features.

Validating requirements

Before adding scheduling constraints to your pod yaml, it’s good practice to verify that the requested resources exist somewhere in the cluster.

For example, to check whether a specific GPU type is available:

kubectl get node -l 'nvidia.com/gpu.product=NVIDIA-GeForce-RTX-3090'

❓ Did you get any results?

Here we check for nodes with 100 Gbps networking:

kubectl get node -l 'nautilus.io/network=100000'

❓ Did you get any results?

💡 Even though you cannot inspect nodes directly, the scheduler has full visibility and will match your pod requirements against all eligible nodes automatically.

Requirements in pods

You have already used resource requirements in pods. Here is a simple example:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: mypod
    image: rocker/cuda
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
      requests:
        memory: 100Mi
        cpu: 100m
    command: ["sh", "-c", "sleep infinity"]

The resource requests and limits are intentionally small, making it very likely that the pod will start.
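Assuming you saved the manifest above as test-pod.yaml (any filename works), you can create the pod and then check which node the scheduler chose; the NODE column appears in the -o wide output:

kubectl apply -f test-pod.yaml
kubectl get pod test-pod -o wide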

Requesting a GPU

Now let’s add a GPU requirement.

❗ Note: You cannot request a fraction of a GPU. Requests and limits must match.

apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  containers:
  - name: mypod
    image: rocker/cuda
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]
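Assuming the manifest is saved as test-gpupod.yaml, apply it and watch until the pod reaches the Running state:

kubectl apply -f test-gpupod.yaml
kubectl get pod test-gpupod --watch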

Once the pod starts, log into it and verify the GPU:

kubectl exec test-gpupod -it -- /bin/bash

Inside the container, run:

nvidia-smi

❗ Remember to delete old pods when you are done.
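A single command removes the pod once you are finished with it:

kubectl delete pod test-gpupod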

Requesting a specific GPU type

To require a specific GPU model, use nodeAffinity:

apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nvidia.com/gpu.product
            operator: In
            values:
            - NVIDIA-GeForce-RTX-3090
  containers:
  - name: mypod
    image: rocker/cuda
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]

If the pod starts successfully, confirm the GPU type using nvidia-smi.
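You can run the check non-interactively as well; nvidia-smi -L prints one line per GPU with its model name:

kubectl exec test-gpupod -- nvidia-smi -L

For a single exact label match like this one, Kubernetes also accepts the shorter nodeSelector form under spec; a minimal equivalent sketch:

  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-GeForce-RTX-3090

nodeAffinity is more verbose, but it supports multiple values and operators such as NotIn and Exists, which nodeSelector cannot express.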

Preferences in pods

Sometimes you would prefer a resource but do not want to require it. This can be expressed using preferredDuringSchedulingIgnoredDuringExecution:

apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: nvidia.com/gpu.product
            operator: In
            values:
            - NVIDIA-GeForce-RTX-2080-Ti
            - Tesla-V100-SXM2-32GB
  containers:
  - name: mypod
    image: rocker/cuda
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]

Check where the pod landed and which GPU you received.
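One way to check, assuming the pod is still named test-gpupod as above: the -o wide output shows the node, and nvidia-smi reports the GPU model:

kubectl get pod test-gpupod -o wide
kubectl exec test-gpupod -- nvidia-smi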

Using geographical topology

Nautilus nodes are distributed globally. You can select nodes closer to your data or collaborators using topology labels.

View available regions and zones:

kubectl get nodes -L topology.kubernetes.io/zone,topology.kubernetes.io/region

Run a pod in Korea:

apiVersion: v1
kind: Pod
metadata:
  name: test-geo
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - korea
  containers:
  - name: mypod
    image: alpine
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
      requests:
        memory: 100Mi
        cpu: 100m
    command:
    - sh
    - -c
    - |
      apk add curl;
      curl ipinfo.io

Check the logs:

kubectl logs test-geo

Optional: Reserved nodes and taints

Some nodes are restricted and require explicit tolerations. You can view node taints at: https://nrp.ai/viz/resources/.

If a pod cannot be scheduled, inspect events:

kubectl get events --sort-by=.metadata.creationTimestamp
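To narrow the output to a single pod, you can filter events by the involved object, or simply describe the pod and read its Events section (replace test-gpupod with your pod's name):

kubectl get events --field-selector involvedObject.name=test-gpupod
kubectl describe pod test-gpupod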

You may see a NoSchedule taint preventing your pod from being scheduled. To tolerate it, add a tolerations block to the pod spec:

tolerations:
- key: nautilus.io/reservations
  operator: Equal
  value: "name"
  effect: NoSchedule
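To show where the block fits, here is a minimal sketch of a complete pod that tolerates a reservation taint; the pod name test-reserved and the reservation value "name" are placeholders, not a real reservation:

apiVersion: v1
kind: Pod
metadata:
  name: test-reserved
spec:
  tolerations:
  # allows (but does not force) scheduling onto nodes with this taint
  - key: nautilus.io/reservations
    operator: Equal
    value: "name"
    effect: NoSchedule
  containers:
  - name: mypod
    image: alpine
    command: ["sh", "-c", "sleep infinity"]
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
      requests:
        memory: 100Mi
        cpu: 100m

Note that a toleration only allows the pod onto tainted nodes; if you need to guarantee it lands on the reserved nodes, combine the toleration with a matching nodeAffinity as shown earlier.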
This work was supported in part by National Science Foundation (NSF) awards CNS-1730158, ACI-1540112, ACI-1541349, OAC-1826967, OAC-2112167, CNS-2100237, CNS-2120019.