Ptah: Orchestrating Secure Edge-AI & Post-Quantum Crypto

Project Description

The Ptah project is an innovative curriculum and reference implementation designed for the emerging field of secure edge-AI in space and terrestrial applications. Combining hardware diversity—RISC-V accelerators, Raspberry Pi clusters, and NVIDIA edge GPUs—with state-of-the-art cryptography (post-quantum lattice schemes and lightweight AEAD), Ptah demonstrates how to architect resilient, future-proof systems under stringent power, weight, and environmental constraints. Participants will learn to deploy containerized microservices across heterogeneous clusters, orchestrate workloads with K3s, instrument telemetry pipelines with PQC signatures, and perform real-time monitoring using Prometheus and Grafana. Over a 15-week course, students engage in hands-on labs, benchmarking, and system integration, culminating in a comprehensive final quiz covering cryptography, orchestration, hardware design, and performance evaluation.

🔐 Post-Quantum Cryptography (PQC)

Large-scale quantum computers running Shor's algorithm would break the RSA and elliptic-curve schemes that protect most systems today. To safeguard critical spacecraft and edge-computing nodes against that threat, we turn to Post-Quantum Cryptography (PQC). CRYSTALS-Dilithium and CRYSTALS-Kyber, standardized by NIST as ML-DSA (FIPS 204) and ML-KEM (FIPS 203), rest on module-lattice problems (Module-LWE and Module-SIS) for which no efficient classical or quantum algorithm is known.

In the harsh environment of space, where remote satellites and deep-space probes cannot be patched on the fly, PQC ensures that firmware updates remain authentic and unforgeable for decades. On terrestrial edge systems—drones, unmanned rovers, and IoT sensors—“harvest-now, decrypt-later” attacks become futile because every telemetry packet, command stream, and key exchange is secured against future quantum decryption.

Why Lattices? Lattice-based schemes provide reasonably compact keys (on the order of kilobytes) and fast operations without sacrificing security.
- Dilithium delivers digital signatures, so every software bundle, sensor reading, or inter-device handshake carries a quantum-resistant signature.
- Kyber is a key-encapsulation mechanism (KEM) that lets ground stations establish shared secrets with spacecraft or edge nodes in a way that remains confidential even under quantum attack.

By integrating PQC into our Ptah framework, we not only future-proof critical systems but do so with performance tuned for power-and-weight-constrained platforms. The result is a security foundation that remains unshakable in the quantum era—because in space, tomorrow’s threats demand today’s unbreakable cryptography.

🔒 Lightweight Cryptography

General-purpose ciphers such as AES perform well in data centers and on CPUs with hardware AES support, but software implementations can be costly on small, battery-powered edge nodes. Lightweight cryptography fills that gap by delivering strong security with a minimal footprint in CPU cycles, RAM, and power.

Lightweight AEAD vs. Block-Cipher AEAD Comparison

Feature	Lightweight AEAD (e.g., Ascon)	Block-cipher AEAD (e.g., AES-GCM)
Encryption + Authentication	Single sponge pass (atomic)	CTR encryption plus GHASH tag
Code Size	≈ 2 – 5 kB	≈ 10 – 20 kB
RAM Usage	≈ 200 – 500 bytes	≈ 1 – 2 kB
Throughput (cycles/byte)	2 – 5	10 – 15
Security Goal	Confidentiality & Authenticity	Confidentiality & Authenticity

ASCON Internals

Property	Value
Permutation Size	320 bits (5 × 64-bit lanes)
Rate	64 bits / 8 bytes per absorption/squeeze
Initialization Rounds	12
Intermediate Rounds	6
Finalization Rounds	12
Key Size	128 bits (160 bits in the Ascon-80pq variant)
Nonce Size	128 bits
Tag Size	128 bits
Performance (Cortex-M4)	≈ 1 MB/s

ASCON’s design is built around a sponge construction, where data and keys are absorbed into an internal state that is repeatedly permuted. This single-pass approach (absorb-permute-squeeze) gives both encryption and authentication in one go, cutting code size and RAM needs by up to 50% compared with AES-GCM on the same hardware.
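
The absorb-permute-squeeze flow can be illustrated with a toy duplex sponge. The sketch below is not Ascon: `toy_permute` is a hypothetical stand-in mixing function with no security value, and the IV, padding, and missing associated-data phase are simplifying assumptions. It only shows how a single pass over the data produces both the ciphertext and the tag, using the 64-bit rate and the 12/6/12 round counts from the table above.

```python
import hmac

MASK64 = (1 << 64) - 1
RATE = 8  # bytes absorbed/squeezed per step (64-bit rate)


def rotr(x, n):
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def toy_permute(s, rounds):
    """Hypothetical stand-in for the 320-bit permutation. NOT Ascon's, NOT secure."""
    for r in range(rounds):
        s[2] ^= 0xF0 - r  # toy round constant
        for i in range(5):  # invertible add-rotate-xor mixing across the 5 words
            s[i] = (s[i] + rotr(s[(i + 1) % 5], 7 * i + 1)) & MASK64
            s[(i + 2) % 5] ^= rotr(s[i], 13)


def _words(b):
    """Split 16 bytes into two 64-bit big-endian words."""
    return [int.from_bytes(b[:8], "big"), int.from_bytes(b[8:], "big")]


def _run(key, nonce, data, decrypt):
    if len(key) != 16 or len(nonce) != 16:
        raise ValueError("key and nonce must be 16 bytes")
    k, n = _words(key), _words(nonce)
    s = [0x0123456789ABCDEF, k[0], k[1], n[0], n[1]]  # arbitrary toy IV
    toy_permute(s, 12)  # initialization rounds
    s[3] ^= k[0]
    s[4] ^= k[1]

    chunks = [data[i:i + RATE] for i in range(0, len(data), RATE)]
    if not chunks or len(chunks[-1]) == RATE:
        chunks.append(b"")  # the final chunk is always shorter than RATE
    out = bytearray()
    for idx, chunk in enumerate(chunks):
        keystream = s[0].to_bytes(RATE, "big")  # squeeze the rate part
        mixed = bytes(a ^ b for a, b in zip(chunk, keystream))
        out += mixed
        plain = mixed if decrypt else chunk
        if len(plain) < RATE:  # pad the final chunk: 0x80 then zeros
            plain = plain + b"\x80" + bytes(RATE - len(plain) - 1)
        s[0] ^= int.from_bytes(plain, "big")  # absorb plaintext into the rate
        if idx < len(chunks) - 1:
            toy_permute(s, 6)  # intermediate rounds between chunks

    s[1] ^= k[0]
    s[2] ^= k[1]
    toy_permute(s, 12)  # finalization rounds
    tag = (s[3] ^ k[0]).to_bytes(8, "big") + (s[4] ^ k[1]).to_bytes(8, "big")
    return bytes(out), tag


def encrypt(key, nonce, plaintext):
    """Return (ciphertext, 16-byte tag)."""
    return _run(key, nonce, plaintext, decrypt=False)


def decrypt(key, nonce, ciphertext, tag):
    """Return the plaintext, or raise ValueError if the tag does not match."""
    plaintext, expected = _run(key, nonce, ciphertext, decrypt=True)
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return plaintext
```

Because encryption and tag generation share one state, the only working memory is the five 64-bit words plus the current chunk. For real deployments, use a vetted Ascon implementation rather than this sketch.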

Security Strength vs. Block Ciphers

Security Aspect	ASCON (128-bit key)	AES-128 (GCM)
Bit-security	≥ 128 bits	128 bits
Integrity Bound	2⁶⁴ forgery bound	2⁶⁴ forgery bound
Side-Channel Resistance	Simple permutation – easier to mask	Complex S-boxes – harder to mask

By choosing ASCON for Ptah’s edge modules, we ensure each micro-controller—or even a small FPGA slice—can authenticate and encrypt telemetry with minimal overhead, leaving headroom for sensor processing and control loops.

⚙️ Orchestration Frameworks

Managing a distributed Edge-AI/PQC cluster requires a lightweight yet powerful orchestrator. Below we compare three leading container orchestration platforms on footprint, feature set, and resource utilization—then dive deeper into how GPU scheduling and CPU allocation work in K3s for drones and UGVs.

Cluster Topology Diagram

Feature & Footprint Comparison

Framework	Binary Size	Memory Overhead¹	Supported APIs	Ideal Use Case
Docker Swarm	~200 MB	~150 MB	Core Swarm, Stacks	Simple clusters & rapid prototyping
K3s	~50 MB	~70 MB	Kubernetes v1.x (core)	Edge/IoT & power-constrained nodes
Kubernetes	~1 GB+	~1 GB+	Full k8s API	Enterprise datacenters

¹ Memory measured as RSS of control-plane components on a baseline Pi 4.

CPU & GPU Resource Allocation

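In K3s, each node's kubelet reports its CPU and memory capacity automatically, and GPUs are advertised as the extended resource nvidia.com/gpu once the NVIDIA device plugin is running on that node. Pods then request these resources in their specs. Below is an example of how a PQC service and an AI inference service would request resources: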

# PQC signature service (runs on any CPU node)
resources:
  requests:
    cpu: "0.5"
    memory: "256Mi"
  limits:
    cpu: "1"
    memory: "512Mi"

# AI inference service (runs on GPU-enabled node)
resources:
  limits:
    nvidia.com/gpu: 1
    memory: "1Gi"
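
To see how many such pods fit on a node, the quantities in the requests above can be converted to plain numbers. The following is a minimal sketch that handles only the common suffixes; the 4-CPU / 4Gi allocatable figures for a CM4 node are an assumption for illustration (real allocatable values are lower than raw capacity because of system reservations).

```python
from decimal import Decimal

_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,  # binary suffixes
    "k": 10**3, "M": 10**6, "G": 10**9,     # decimal suffixes
}


def parse_cpu(quantity):
    """Return a CPU quantity in millicores: '0.5' -> 500, '250m' -> 250."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def parse_memory(quantity):
    """Return a memory quantity in bytes: '256Mi' -> 268435456."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[:-len(suffix)]) * factor)
    return int(quantity)


def max_replicas(node_cpu, node_memory, request_cpu, request_memory):
    """How many pods with the given requests fit into the node's allocatable resources."""
    by_cpu = parse_cpu(node_cpu) // parse_cpu(request_cpu)
    by_memory = parse_memory(node_memory) // parse_memory(request_memory)
    return min(by_cpu, by_memory)


# Hypothetical CM4 node: 4 CPUs and 4Gi allocatable; PQC signer requests from above.
print(max_replicas("4", "4Gi", "0.5", "256Mi"))
```

The scheduler places pods based on requests, not limits, so the requests are what determine packing density on small nodes.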

Performance Estimates

Node Type	CPU Cores	Clock (GHz)	GPU Cores	Approx. Throughput
Raspberry Pi CM4	4	1.50	–	~200 Dilithium ops/sec
TRK1 (Rockchip RK3588)	8	2.40	–	~1 200 Dilithium ops/sec
Jetson Nano	4	1.43	128 (Maxwell)	• GPU: ~500 ASCON ops/sec • CPU: ~400 Dilithium ops/sec
Jetson Orin NX	6	2.20	1024 (Ampere)	• GPU: ~5 000 ASCON ops/sec • CPU: ~800 Dilithium ops/sec

Which to Choose?

Drone Swarms (ClusterHat): K3s on Pi Zero W can run ultra-light pods (cpu: "0.1") with Ascon AEAD for telemetry, preserving battery life.
UGV / Rover Platforms (TuringPi + Orin NX): K3s with GPU scheduling enables offloading neural nets to Orin NX while still hosting PQC services on CM4 nodes.
Hybrid Deployments: Leverage nodeSelector and affinity to ensure heavy workloads land on TRK1/Orin, and lightweight tasks run on Pi-class nodes.
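
The placement logic behind nodeSelector can be sketched in a few lines: a node is eligible only if it carries every label in the selector and has enough free GPUs. The node names, the `ptah/class` and `ptah/gpu` labels, and the GPU counts below are hypothetical; the real scheduler also considers CPU, memory, taints, and affinity rules.

```python
# Hypothetical cluster inventory: labels and free GPU count per node.
NODES = {
    "pi-zero-01": {"labels": {"ptah/class": "pi"}, "gpus": 0},
    "cm4-node-01": {"labels": {"ptah/class": "pi"}, "gpus": 0},
    "trk1-01": {"labels": {"ptah/class": "heavy"}, "gpus": 0},
    "orin-01": {"labels": {"ptah/class": "heavy", "ptah/gpu": "true"}, "gpus": 1},
}


def eligible_nodes(nodes, node_selector=None, gpus=0):
    """Return the sorted names of nodes that match every selector label and have enough GPUs."""
    selector = node_selector or {}
    return sorted(
        name
        for name, node in nodes.items()
        if all(node["labels"].get(key) == value for key, value in selector.items())
        and node["gpus"] >= gpus
    )


print(eligible_nodes(NODES, {"ptah/class": "heavy"}))          # heavy CPU workloads
print(eligible_nodes(NODES, {"ptah/class": "heavy"}, gpus=1))  # GPU inference
print(eligible_nodes(NODES, {"ptah/class": "pi"}))             # lightweight tasks
```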

By using K3s with fine-grained resource requests and node labels, you can orchestrate a heterogeneous cluster that maximizes both performance and power-efficiency—crucial attributes for computer architects designing next-generation edge-AI & space systems.

Hardware Architectures

TuringPi cluster of CM4/Jetson modules with built-in switch.
TRK1 (Rockchip RK3588, Arm Cortex-A76/A55) for NPU-accelerated compute.
Pi CM4 nodes for general compute.
Jetson Nano/Orin NX for GPU-accelerated AI inference.
ClusterHat 2.5 to simulate drone swarms on Pi Zero W nodes.

🛰️ Telemetry & GPS Integration

Robust, low-latency telemetry and precise positioning are critical for autonomous drones, rovers, and space systems. In Ptah, each node—whether a Pi Zero W, CM4, TRK1, or Orin NX—connects to a GNSS receiver (GPS+GLONASS+Beidou) via UART or USB. A dedicated telemetry pod under K3s executes this pipeline:

Acquisition: Multi-constellation fixes at 1–10 Hz (HDOP ≤ 3).
Parsing & Filtering: Normalize NMEA sentences, drop low-accuracy fixes, correct drift.
Cryptographic Protection:
- Dilithium signature (~2 ms on CM4, < 0.5 ms on Orin NX)
- Kyber KEM encapsulation (~1 ms on CM4)
Publication & QoS:
- MQTT (QoS 2) for exactly-once delivery
- HTTP/2 + TLS-PQC for sub-10 ms end-to-end latency
Self-Healing: K3s probes restart any failed pod within seconds.
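
The parsing and filtering step can be sketched as follows, assuming the receiver emits standard NMEA GGA sentences. The function names and the returned dictionary layout are illustrative assumptions, not Ptah's actual code; the HDOP threshold of 3 comes from the acquisition step above.

```python
def nmea_checksum(body):
    """XOR of every character between '$' and '*', as two uppercase hex digits."""
    value = 0
    for ch in body:
        value ^= ord(ch)
    return f"{value:02X}"


def _to_degrees(value, hemisphere):
    """Convert NMEA (d)ddmm.mmmm plus N/S/E/W into signed decimal degrees."""
    dot = value.index(".")
    degrees = int(value[:dot - 2])
    minutes = float(value[dot - 2:])
    sign = -1 if hemisphere in ("S", "W") else 1
    return sign * (degrees + minutes / 60)


def parse_gga(sentence, max_hdop=3.0):
    """Return a fix dict, or None if the sentence is invalid or too inaccurate."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    body, _, checksum = sentence[1:].partition("*")
    if nmea_checksum(body) != checksum.upper():
        return None  # corrupted on the serial link
    fields = body.split(",")
    if len(fields) < 10 or not fields[0].endswith("GGA"):
        return None
    if fields[6] in ("", "0"):
        return None  # fix quality 0 means no fix
    try:
        hdop = float(fields[8])
        if hdop > max_hdop:
            return None  # drop low-accuracy fixes
        return {
            "time_utc": fields[1],
            "lat": _to_degrees(fields[2], fields[3]),
            "lon": _to_degrees(fields[4], fields[5]),
            "satellites": int(fields[7]),
            "hdop": hdop,
        }
    except ValueError:
        return None  # malformed numeric field


body = "GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,"
print(parse_gga(f"${body}*{nmea_checksum(body)}"))
```

A fix that passes this filter would then be serialized and handed to the signing step; rejected sentences are simply skipped.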

Performance & Accuracy Metrics

Metric	Pi Zero W	Compute Module 4	TRK1 / Orin NX
GNSS Fix Rate (Hz)	1	5	10
Dilithium Sign Latency (ms)	8.0	2.1	<0.5
Kyber KEM Latency (ms)	6.3	1.2	0.3
End-to-End Delay (ms)	20.5	8.4	3.2
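
The table values translate into a rough per-node crypto budget. The sketch below assumes a worst case of one signature and one KEM encapsulation per fix (in practice a KEM exchange is usually done once per session), treats "<0.5" as 0.5 ms, and assumes the Dilithium2 / ML-DSA-44 parameter set with its 2420-byte signature; the source does not state which parameter set Ptah uses.

```python
# Fix rates and latencies copied from the table above ("<0.5" taken as 0.5 ms).
PROFILES = {
    "Pi Zero W": {"fix_hz": 1, "sign_ms": 8.0, "kem_ms": 6.3},
    "Compute Module 4": {"fix_hz": 5, "sign_ms": 2.1, "kem_ms": 1.2},
    "TRK1 / Orin NX": {"fix_hz": 10, "sign_ms": 0.5, "kem_ms": 0.3},
}

SIGNATURE_BYTES = 2420  # assumption: Dilithium2 / ML-DSA-44 signature size


def crypto_cpu_percent(profile):
    """Share of one core spent on signing plus KEM, in percent."""
    busy_ms_per_second = profile["fix_hz"] * (profile["sign_ms"] + profile["kem_ms"])
    return busy_ms_per_second / 1000 * 100


def signature_bytes_per_second(profile):
    """Bandwidth consumed by signatures alone, excluding the payload."""
    return profile["fix_hz"] * SIGNATURE_BYTES


for name, profile in PROFILES.items():
    print(
        f"{name}: {crypto_cpu_percent(profile):.2f}% of one core, "
        f"{signature_bytes_per_second(profile)} signature bytes/s"
    )
```

Under these assumptions the cryptography stays below 2 % of one core on every node class, while signature bandwidth grows linearly with the fix rate.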

Deployment Profiles

Drone Swarms (ClusterHAT): Pi Zero pods at < 10 % CPU, < 50 MiB RAM for secure telemetry.
UGVs & Rovers (TuringPi + CM4): Dual-pod setups at 5–10 Hz for redundant, signed location streams.
Space-Grade Emulation: Orin NX pods with Kalman filtering and PQC signing for deep-space comms.

📦 Pods & Container Deployment

In Ptah, every core function—post-quantum signing/encryption, telemetry acquisition, and monitoring—is packaged as a self-contained Docker image and deployed as a pod under K3s. This approach yields:

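Scalability: Set a replica count in your Helm chart (conventionally a replicaCount value mapped to the Deployment's spec.replicas) to scale a service from 1 to N pods (e.g., running multiple Dilithium signers in parallel).
Resilience & Self-Healing: Liveness probes restart unhealthy containers automatically, and readiness probes keep traffic away from pods that are not ready. For example, if a telemetry container's liveness probe fails after it loses its GPS connection, the kubelet restarts that container within seconds.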
Resource-Aware Scheduling:
- Use resources.requests and resources.limits to reserve CPU/RAM exactly—for example, 0.5 CPU and 256Mi for a PQC service on CM4.
- Leverage nodeSelector or affinity rules to pin GPU-intensive pods to Jetson Orin NX (requesting nvidia.com/gpu: 1), while lightweight ASCON pods run on Pi Zero nodes.
Sidecar & Init Containers:
- An init container can wait for hardware readiness (e.g., ensure the GPS serial port is available before starting the telemetry app).
- A sidecar can run a small heartbeat exporter, feeding health metrics to Prometheus without modifying the main application.
Rolling Updates & Canary Deployments:
- Set strategy.type: RollingUpdate on the Deployment so PQC libraries can be patched without downtime: K3s brings up new pods with the updated container image and gracefully retires old ones.
- Use maxSurge and maxUnavailable to control the pace of updates, crucial when running on mission-critical UGVs or drone networks.
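
The effect of maxSurge and maxUnavailable on a rollout can be worked out ahead of time. This sketch follows the documented Kubernetes rounding rules (percentages of maxSurge round up, percentages of maxUnavailable round down); the function names are illustrative.

```python
def _resolve(value, replicas, round_up):
    """Turn an absolute number or a percentage string like '25%' into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        percent = int(value[:-1])
        if round_up:
            return -(-replicas * percent // 100)  # ceiling division
        return replicas * percent // 100          # floor division
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return the pod-count envelope a rolling update stays within."""
    surge = _resolve(max_surge, replicas, round_up=True)
    unavailable = _resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be zero")
    return {
        "max_total_pods": replicas + surge,
        "min_available_pods": replicas - unavailable,
    }


# Four signer replicas with the default 25% / 25% settings.
print(rollout_bounds(4))
# Conservative profile for a mission-critical UGV: never drop below full capacity.
print(rollout_bounds(4, max_surge=1, max_unavailable=0))
```

On memory-constrained nodes the max_total_pods figure matters as much as availability, because the surge pods need free resources to start.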

Example Pod Spec

apiVersion: v1
kind: Pod
metadata:
  name: pqc-signer
  labels:
    app: pqc
spec:
  initContainers:
  - name: wait-for-gps
    image: busybox
    command: ["sh", "-c", "until test -e /dev/ttyUSB0; do sleep 1; done"]
    volumeMounts:
      - mountPath: /dev/ttyUSB0
        name: gps-device
  containers:
  - name: signer
    image: rscl/pqc-signer:latest
    resources:
      requests:
        cpu: "0.5"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
    volumeMounts:
      - mountPath: /dev/ttyUSB0
        name: gps-device
    livenessProbe:
      exec:
        command: ["pgrep", "signer"]
      initialDelaySeconds: 10
      periodSeconds: 30
  volumes:
    - name: gps-device
      hostPath:
        path: /dev/ttyUSB0
  nodeSelector:
    kubernetes.io/hostname: cm4-node-01

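This spec pins the signer pod to the node named cm4-node-01, waits for its GPS device, requests half a CPU core, and restarts the container if the signer process dies, which demonstrates several K3s pod orchestration features used in Ptah’s heterogeneous cluster. Depending on the container runtime, opening the device node from inside the container may also require a privileged securityContext or a device plugin.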

📈 Performance Monitoring

To maintain operational excellence across a heterogeneous Ptah cluster, we employ a best-in-class monitoring stack:

Metrics Collection (Prometheus):
- Node Exporter on each Linux node (CM4, TRK1, Jetsons, Pi Zeros) exposes CPU, memory, filesystem, and temperature metrics for Prometheus to scrape.
- cAdvisor or kubelet metrics expose container-level stats: CPU throttling, memory usage, network I/O.
- Custom PQC Exporter in each crypto pod emits counters (signatures and KEM operations, from which ops/sec are derived) and histograms (latency distribution).
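Storage & Retention:
- Prometheus TSDB stores high-resolution (1 s scrape) data for 24 h. Prometheus has no built-in down-sampling, so the 1 min resolution data kept for 30 days has to be produced separately, for example by recording rules shipped to a store with longer retention.
- Remote write to long-term storage (e.g., Thanos or Cortex) for 1 year of historical analysis.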
Visualization (Grafana):
- Dashboards for each hardware class:
  - CPU & Memory Utilization vs. Crypto Throughput (ops/sec)
  - Network Bandwidth & Packet Loss for telemetry streams
  - GPU Utilization and Temperature on Jetson modules
- Alert rules:
  - CPU >90 % for >1 min triggers High-Load alert
  - Signature latency >5 ms on CM4 triggers Performance-degradation alert
  - Missing telemetry heartbeat (>3 scrapes) triggers Pod-restart action

Sample PromQL Queries:

# CPU utilization (fraction of total, 0–1) on CM4 nodes
1 - avg by (instance) (rate(node_cpu_seconds_total{instance=~"cm4-.*",mode="idle"}[1m]))

# PQC ops per second
rate(pqc_signatures_total[30s])

# 95th-percentile telemetry packet latency across all pods
histogram_quantile(0.95, sum by (le) (rate(telemetry_latency_seconds_bucket[5m])))
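
To make the last query less opaque, here is a minimal sketch of the estimate that histogram_quantile performs on cumulative buckets: it finds the bucket containing the requested rank and interpolates linearly inside it. The bucket boundaries and counts are made-up sample data, and the function ignores edge cases that Prometheus handles (such as q outside the 0 to 1 range or malformed buckets).

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan


# Made-up telemetry latency buckets in seconds (le label, cumulative count).
sample = [(0.005, 10), (0.01, 60), (0.025, 95), (0.05, 100), (math.inf, 100)]
print(histogram_quantile(0.5, sample))
print(histogram_quantile(0.95, sample))
```

The interpolation means the reported percentile is only as precise as the bucket layout, so bucket boundaries for the PQC exporter should be chosen around the latencies that the alert rules care about.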

Scalability & Federation:
- Shard scraping across multiple Prometheus replicas for large swarms (>100 nodes).
- Use Prometheus Federation to centralize critical metrics (e.g., overall cluster health) while preserving local dashboards.

This comprehensive monitoring framework not only provides real-time visibility into resource usage and cryptographic performance but also enables automated alerting and long-term trend analysis—ensuring that Ptah deployments remain robust, performant, and mission-ready.

🎥 Video Resources

For deeper insights and demonstrations, explore our curated video playlist. Each link includes an overview of Ptah concepts, hands-on labs, and system walkthroughs:

Ptah Framework Overview: A comprehensive introduction to Secure Edge-AI & Quantum-Proof Crypto.
https://www.youtube.com/watch?v=131QKl_34bk
Post-Quantum Crypto Lab: Live demonstration of Dilithium & Kyber implementations.
https://www.youtube.com/watch?v=8pVTYIt6LCA
Lightweight Cryptography Deep Dive: ASCON sponge and AEAD usage on microcontrollers.
https://www.youtube.com/watch?v=0ujsGjr43ig&feature=youtu.be
Distributed Orchestration Demo: K3s deployment across heterogeneous nodes.
https://www.youtube.com/watch?v=QuFMe6n7Z-Y

15-Week Course Flow

Weeks 1–3 Project intro, PQC & edge fundamentals, hardware setup.
Weeks 4–6 Orchestration evaluation & K3s rollout.
Weeks 7–9 PQC services (Dilithium & Kyber) containerization & benchmarking.
Weeks 10–11 ASCON implementation, GPS telemetry pipeline, Helm charts.
Weeks 12–13 Monitoring stack (Prometheus/Grafana), metrics dashboards.
Week 14 Integration tests, drone/vehicle demos, optimization.
Week 15 Final report, live demo, quizzes on key topics.

Final Quiz

When you’re ready, dive into the comprehensive 40-question quiz covering every module. You’ll get instant feedback on each answer—all on one page.


🤝 Acknowledgments

NASA MINDS

U.S. Navy

AMD/Xilinx

NVIDIA

AFRL

RSCL

Special thanks to all our partners for hardware, funding, and expertise that made this course possible.
