From Power-On to Proof: What It Actually Takes to Build an OS from Scratch
Building an OS is not the same as configuring one. At the kernel level, you start with a processor and a hardware manual, with no runtime, no library, no framework beneath you. This is what that work actually involves.
Starting at zero
When a processor is powered on, it begins executing instructions from a fixed memory address called the reset vector. What happens at that address is entirely determined by whoever wrote the code there. On a system with an existing OS, a bootloader handles early startup and loads the kernel. When you are building the OS from scratch, you are also writing the bootloader. You are writing the code that must run before any other code can run.
This means there is no standard library. No runtime. No memory allocator, because the allocator is one of the things you are building. No debugging output, because the serial console driver has not been written yet. When something fails in this stage, you find out about it through a JTAG hardware debugger attached directly to the processor's debug port, reading register states in hexadecimal.
This is the daily engineering reality of bare-metal OS development. It is methodical, demanding work, and it is where RedKill's development begins.
The architectural choice: why microkernel
The most consequential design decision in OS development is the kernel architecture. A monolithic kernel runs core OS services in the most privileged CPU execution mode. This includes device drivers, file systems, network stacks, and memory management, all sharing the same privilege level and memory space. It is a proven design with strong performance characteristics, and it is the approach taken by the Linux kernel.
A microkernel keeps only the minimum in privileged mode: address space management, thread scheduling, and inter-process communication. Everything else runs in isolated user-space processes with no shared memory and no direct access to hardware. If a device driver has a bug, it cannot corrupt the kernel, because it has no write access to kernel memory. A compromised component cannot escalate its privileges.
The tradeoff is that all communication between components goes through the kernel's IPC mechanism, which must be designed for minimal latency. Modern microkernel designs have brought IPC latency to the range of hundreds of nanoseconds, which is fast enough for real-time applications.
The reason this architecture matters for safety-critical systems is verification. The trusted computing base of a microkernel can be small enough to formally verify completely. The seL4 microkernel, currently the most rigorously verified OS kernel in existence, has approximately 8,700 lines of C code in its trusted computing base. Formal verification of a monolithic kernel at the scale of Linux is not practically achievable with current tools. For DO-178C Level A or ISO 26262 ASIL-D certification, a microkernel architecture is the only realistic path.
Source: seL4.systems
Source: Klein et al., SOSP 2009
Hardware bring-up: the first real milestone
Once the architecture is decided, the first major engineering challenge is hardware bring-up: making the kernel run on actual silicon. This requires reading the processor's architecture reference manual at a register level, not at a documentation overview level. For ARM64, this means the ARM Architecture Reference Manual. For RISC-V, the privileged architecture specification. For the AJIT processor, the SPARC V8 ISA documentation and the processor technical reference manual produced by the IIT Bombay team that designed it.
Symmetric multiprocessing (SMP) is the milestone that validates the kernel's ability to manage multiple CPU cores correctly. When multiple cores are active, they each have their own cache but share physical memory. Ensuring that a write on one core is visible to another in the correct order requires explicit cache coherence management, memory barrier instructions, and careful handling of atomic operations. Bring-up failures in SMP typically produce intermittent bugs that only appear under specific multi-core access patterns and are almost impossible to reproduce or diagnose without hardware trace tools.
RedKill has completed SMP validation on both the AJIT processor and standard ARM64 cores. The AJIT processor is India's indigenous microprocessor, designed at IIT Bombay and fabricated at the government-owned Semiconductor Laboratory in Chandigarh. Running an indigenously designed RTOS on an indigenously designed processor is the starting point of a genuinely sovereign computing stack.
Source: IIT Bombay, AJIT Processor
Source: RedKill OS product page
Real-time scheduling: correctness is a requirement
In a general-purpose OS, the scheduler determines which process runs next and optimises for throughput and responsiveness. In a real-time OS, the scheduler has a harder job: it must guarantee that every task completes before its deadline, under the worst-case conditions, provably. Missing a deadline in a safety-critical system is not a performance issue. It is a functional failure.
Real-time scheduling theory provides formal tools for this. Rate-Monotonic Scheduling and Earliest Deadline First are two well-studied algorithms with mathematical schedulability conditions, meaning you can formally prove whether a given set of tasks will always meet their deadlines. But implementing these correctly in a kernel requires careful handling of priority inversion.
Priority inversion is the condition where a high-priority task is blocked, waiting for a resource held by a lower-priority task, which is itself preempted by a medium-priority task. The result is that the high-priority task effectively waits behind the medium-priority one, inverting the intended order. This was the cause of unexpected resets in the Mars Pathfinder mission in 1997, and it is a well-documented failure mode in real-time systems that must be addressed at the kernel architecture level through protocols like Priority Ceiling or Priority Inheritance.
Source: Mars Pathfinder priority inversion incident — widely documented in real-time systems literature; original account by Mike Jones, Microsoft Research
Formal verification: from testing to proof
Testing can show that a system works correctly in the cases you tested. Formal verification can prove that a system works correctly in every possible case, across every possible input and execution sequence. For safety-critical software, this distinction is significant. A formally verified kernel cannot have the class of bugs that formal verification rules out. A tested kernel can still have bugs in untested paths.
The seL4 microkernel project at NICTA, which began in 2006 and published its first machine-checked proof of functional correctness in 2009, used the Isabelle/HOL theorem prover to construct a proof that connects a high-level abstract specification to the actual C implementation. The authors have described it as the only formal proof of this scale to be maintained continuously as the system evolved, over nearly a decade of development. By the time comprehensive verification was published in 2014, the project had been running for eight years.
For RedKill, formal verification is the path to DO-178C Level A certification, the highest software assurance level in aviation. At Level A, software must be shown to have no errors that could contribute to catastrophic failure conditions. The microkernel architecture, with its small trusted computing base, is what makes this achievable.
Source: Klein et al., 'Comprehensive Formal Verification of an OS Microkernel,' ACM TOCS 2014
Source: seL4.systems, The seL4 Microkernel
Where RedKill is today
RedKill OS is India's first safety-critical, verification-ready, open-source RTOS. The kernel is built from scratch, with no inherited dependencies from Linux or any other existing kernel. SMP has been validated on the indigenous AJIT processor and on ARM64. The system is designed for ISO 26262 ASIL-D and DO-178C Level A certification readiness. RedKill is pre-incubated at SINE, IIT Bombay, and is under contract with India's National Supercomputing Mission to deliver customised OS software for advanced hardware platforms.
The work ahead includes formal verification proofs and the development of certification evidence packages. This is a multi-year engineering programme, which is the correct scale for the problem. Formal verification of a kernel is not a sprint. The seL4 team's experience, widely published in peer-reviewed research, shows that doing this well takes years of sustained, rigorous work.
What RedKill represents is the beginning of that work in India, on Indian hardware, by a team with the systems engineering background to see it through.