Native Language Operating System (NLOS)

shankarshshankar.sh wrote 06/24/2026 at 14:02 • 11 min read

Native Language Operating System (NLOS): A Modular, AI‑Assisted Architecture for Linguistic Purity and Universal Binary Compatibility

R&D Concept Paper, Version 1.0
Author: lazy_dude (Independent Enthusiast)
Date: June 24, 2026
Status: Conceptual Architecture, Under Active Development

Abstract

NLOS is a Tamil‑native operating system where the kernel API, binary format, and security policies are expressed entirely in Tamil. It converts any foreign executable into its native format by lifting to LLVM IR, mapping to a booked Tamil function set, and lowering directly, without lossy C++ decompilation. All applications run inside a layered sandbox with zero initial privilege, capability tokens, and mandatory retesting—guilty until proven innocent. The entire system is composed of swappable modules under 1000 lines, designed to be built by a human architect and an AI coding assistant like oh‑my‑pi (omp). This paper defines the complete architecture, justifies all technology choices, and outlines a realistic 10‑month path to a minimal prototype.

1. Introduction

Operating systems have always spoken English at their core. System calls are named open, read, write; binary formats begin with \x7FELF or MZ; error codes are ENOENT and EACCES. Even when user interfaces are translated into hundreds of languages, the machine’s native tongue remains English. This linguistic lock‑in prevents any community from building a computing environment that thinks natively in its own language, and it forces every piece of software to conform to an English‑centric interface.

At the same time, the software world is fractured by platform‑specific executable formats and APIs. A Windows .exe cannot run on Linux, a macOS .app cannot run on Android. Existing solutions—Wine, QEMU, Anbox—either emulate an entire operating system or translate at runtime with imperfect semantics. They never permanently convert the foreign program into a secure, native citizen of the host OS.

The Native Language Operating System (NLOS) solves both problems simultaneously. It defines a kernel Application Binary Interface (ABI) where every identifier, from the lowest system call to the magic number in the executable header, is written in a chosen natural language—Tamil for the reference implementation. It introduces a static translation pipeline that lifts foreign binaries to the LLVM intermediate representation, semantically maps every foreign OS call to a strictly curated set of Tamil functions, and lowers the result directly into a new Universal Binary Representation (UBR). Security is the central mechanism, not an add‑on: every translated application runs inside a multi‑layer sandbox, starts with zero privileges, and must prove its innocence through repeated automated testing before any capability is granted.

Crucially, NLOS is designed for construction by a human‑AI partnership. Every component is broken into modules of under 1000 lines, perfectly sized for generation by a modern AI coding agent like oh‑my‑pi (omp). A single individual, without manual coding, can guide the AI through architectural decisions, verification, and integration. The result is an operating system that is linguistically pure, universally compatible, and built by a new kind of engineer—the architect who wields AI as a construction tool.

2. Core Theme and Design Principles

The core theme of NLOS is linguistic sovereignty through functional abstraction. By moving the natural language boundary into the kernel ABI itself, and by providing an automatic translation mechanism for all foreign software, NLOS creates a computing environment where the machine genuinely speaks the user’s mother tongue. Five inviolable principles underpin the design:

  1. Linguistic Purity at the Kernel ABI – Every symbol visible at the system‑call boundary is written in the target natural language. The kernel does not translate; it thinks in that language.
  2. Universal Binary Translation via LLVM IR Lowering – Foreign executables are lifted to LLVM IR rather than decompiled to high‑level C++, and then lowered directly to the NLOS native format. This avoids the information loss that source‑level decompilation incurs.
  3. Booked/Cooked Function Set & Capability Security – Applications are restricted to a pre‑approved, immutable whitelist of kernel services. Every resource is accessed through capability tokens, enforcing least privilege by construction.
  4. Retest‑Then‑Trust – Converted applications start with zero permissions in a layered sandbox. They must pass multiple deterministic replay tests, comparing their behaviour to the original, before any capability is granted.
  5. Extreme Modularity for AI Generation – Every subsystem is a self‑contained module of under 1000 lines of code, making the entire OS auditable, maintainable, and directly generatable by AI coding assistants.

3. The Tamil‑Native Kernel Interface

NLOS adopts a microkernel architecture. The kernel itself provides only three primitive mechanisms, each named in Tamil:

| Tamil Identifier | Function (English) | Mechanism |
|---|---|---|
| செயல்முறை_உருவாக்கு | Process Create | Spawns a new protection domain |
| நினைவகம்_கேள் | Memory Request | Grants physical/virtual pages |
| செய்திமாற்றம்_அனுப்பு | IPC Send | Sends a typed message to a capability‑controlled endpoint |

All higher‑level services—file systems, network stacks, display servers—are implemented as user‑space processes that communicate exclusively through Tamil‑named IPC channels. The kernel can be built on top of a formally verified microkernel like seL4, which provides a trusted base while the Tamil API server in user space enforces language purity.

This minimal interface ensures that the booked/cooked function set remains small and auditable. Any application that attempts to invoke a function not in this list is automatically rejected.

4. Universal Binary Representation (UBR)

Every NLOS‑native executable is stored in the Universal Binary Representation. This format replaces ELF, PE, Mach‑O, and APK after translation.

4.1 UBR Layout

+----------+----------+---------+-------------+---------+--------------+
| மூலம்    | பதிப்பு   | கட்டளை  | செயல்முறை   | தரவு    | கையொப்பம்    |
| Magic    | Version  | Code    | Import Table| Data    | Signature    |
| 6 bytes  | 4 bytes  | variable| variable    | variable| 64 bytes     |
+----------+----------+---------+-------------+---------+--------------+

This format is intended to make unauthorized syscalls unreachable: every kernel function a binary needs must be declared in the import table, and the kernel refuses to resolve any symbol that has not been explicitly granted.
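
The layout and the import‑table rule above can be sketched as follows. This is a minimal illustration, not the UBR specification: the magic value, the 4‑byte length prefix on each variable section, the newline‑separated import table, and the function names `pack_ubr` and `load_ubr` are all assumptions, and a SHA‑512 digest stands in for the 64‑byte signature only because it has the right size.

```python
import hashlib
import struct

# Hypothetical 6-byte magic: two Tamil letters, 3 bytes each in UTF-8.
MAGIC = "உப".encode("utf-8")
VERSION = 1
TAG_SIZE = 64  # size of the signature field in the layout

# The three kernel primitives named in section 3.
PROCESS_CREATE = "செயல்முறை_உருவாக்கு"
MEMORY_REQUEST = "நினைவகம்_கேள்"
IPC_SEND = "செய்திமாற்றம்_அனுப்பு"
BOOKED = frozenset({PROCESS_CREATE, MEMORY_REQUEST, IPC_SEND})


def _section(payload: bytes) -> bytes:
    """Prefix a variable-length section with a 4-byte length (assumed)."""
    return struct.pack("<I", len(payload)) + payload


def pack_ubr(code: bytes, imports: list, data: bytes) -> bytes:
    """Serialize magic, version, code, import table, data, then the tag."""
    body = MAGIC + struct.pack("<I", VERSION)
    body += _section(code)
    body += _section("\n".join(imports).encode("utf-8"))
    body += _section(data)
    # Stand-in only: a digest detects corruption but is not a signature.
    return body + hashlib.sha512(body).digest()


def load_ubr(blob: bytes, granted: frozenset) -> dict:
    """Parse a UBR blob and refuse any import that is not booked and granted."""
    if len(blob) < len(MAGIC) + 4 + TAG_SIZE:
        raise ValueError("file too short")
    body, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    if hashlib.sha512(body).digest() != tag:
        raise ValueError("integrity tag mismatch")
    if body[:len(MAGIC)] != MAGIC:
        raise ValueError("bad magic")
    (version,) = struct.unpack_from("<I", body, len(MAGIC))
    offset = len(MAGIC) + 4
    sections = []
    for _ in range(3):  # code, import table, data
        if offset + 4 > len(body):
            raise ValueError("truncated section header")
        (size,) = struct.unpack_from("<I", body, offset)
        offset += 4
        if offset + size > len(body):
            raise ValueError("truncated section")
        sections.append(body[offset:offset + size])
        offset += size
    code, raw_imports, data = sections
    imports = raw_imports.decode("utf-8").split("\n") if raw_imports else []
    for name in imports:
        if name not in BOOKED or name not in granted:
            raise PermissionError(f"import not granted: {name}")
    return {"version": version, "code": code, "imports": imports, "data": data}
```

In this sketch the refusal happens at load time, so a binary that declares an ungranted function never starts. A real loader would verify an asymmetric signature instead of a digest.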

5. The Translation Engine (LLVM IR Direct Lowering)

The translation pipeline converts foreign executables into UBR without emulation or lossy C++ decompilation. It operates in five stages, each a modular component.

5.1 Pipeline Stages

  1. Format Detector (வடிவம்_கண்டறி) – Identifies the source binary (ELF, PE, Mach‑O, APK). If the binary uses dynamic code generation (JIT, self‑modifying code), it is routed to the WebAssembly Dynamic Execution Domain instead of the static pipeline.
  2. Lifter (உயர்த்தி) – Translates the raw machine code into LLVM IR. This stage builds on existing lifters (McSema, RetDec, or tools based on Ghidra’s SLEIGH processor specifications) to preserve low‑level semantics without attempting to recover high‑level abstractions.
  3. API Mapper (செயலி_வரைபடம்) – The core transformation. Using an extensible knowledge base of foreign OS semantics, every foreign system call or library call is replaced by a sequence of Tamil NLOS API calls. For example, a Linux write(1, buf, len) becomes a message to the NLOS terminal server via செய்திமாற்றம்_அனுப்பு. The knowledge base starts with a small set of Linux syscalls (the musl‑libc subset) and grows through automatic observation.
  4. UBR Backend (மூல_பின்புலம்) – An LLVM backend pass that lowers the modified IR directly into native machine code wrapped in the UBR format. It constructs the import table from the Tamil API calls present in the IR, ensuring the whitelist is automatically generated and never exceeds the booked/cooked set.
  5. Retest Harness (மீள்சோதனை) – The output UBR binary is executed under deterministic replay in the sandbox. Its system‑call trace is compared against a golden trace captured from the original binary running on its native OS. If divergence is detected, the API Mapper knowledge base is refined and the pipeline re‑runs. Only a binary whose trace matches the golden trace exactly passes.
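
The API Mapper stage and the import‑table derivation in the UBR Backend can be illustrated with a small sketch. It models calls as plain records rather than LLVM IR, and the names `ForeignCall`, `NlosCall`, `KNOWLEDGE_BASE`, and the `"terminal"` endpoint are hypothetical; only the `write(1, buf, len)` example and the Tamil IPC function come from the text.

```python
from dataclasses import dataclass

IPC_SEND = "செய்திமாற்றம்_அனுப்பு"  # IPC Send primitive from section 3


@dataclass(frozen=True)
class ForeignCall:
    os: str       # source platform, e.g. "linux"
    name: str     # foreign syscall or library call name
    args: tuple   # concrete arguments, kept simple for the sketch


@dataclass(frozen=True)
class NlosCall:
    function: str   # Tamil API function
    endpoint: str   # target user-space server (hypothetical naming)
    payload: tuple


def map_linux_write(call: ForeignCall) -> list:
    """Rewrite write(fd, buf, len) on stdout/stderr as a terminal message."""
    fd, buf, length = call.args
    if fd not in (1, 2):
        raise LookupError(f"no mapping for write to fd {fd}")
    return [NlosCall(IPC_SEND, "terminal", (buf[:length],))]


# Knowledge base: one rewrite rule per (platform, call) pair.
KNOWLEDGE_BASE = {("linux", "write"): map_linux_write}


def map_calls(calls: list) -> list:
    """Replace every foreign call; an unknown call stops the translation."""
    mapped = []
    for call in calls:
        rule = KNOWLEDGE_BASE.get((call.os, call.name))
        if rule is None:
            raise LookupError(f"no mapping for {call.os}:{call.name}")
        mapped.extend(rule(call))
    return mapped


def import_table(mapped: list) -> list:
    """Derive the import table from the Tamil functions actually used."""
    return sorted({call.function for call in mapped})
```

For example, `map_calls([ForeignCall("linux", "write", (1, b"hello\n", 6))])` yields one `NlosCall` addressed to the terminal endpoint, and `import_table` on that result lists only the IPC Send function. An unmapped call raises `LookupError`, which is the point at which the knowledge base would need refinement.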

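This pipeline sidesteps the problem of decompiling optimized binaries back to compilable C++, which is unreliable in practice, by staying at the LLVM IR level, where far less information has to be reconstructed. Ahead‑of‑time binary translation has a production precedent in Apple’s Rosetta 2, although Rosetta 2 is not documented as using LLVM IR.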

6. Modular Kernel Architecture

NLOS is built as a set of independent, swappable modules, each under 1000 lines of code. The kernel itself (based on seL4) is a microkernel complemented by user‑space servers.

| Module (Tamil Name) | Responsibility | Line Limit |
|---|---|---|
| தொடங்கி (Boot) | Multiboot header, early CPU init | 200 |
| செயலாக்கி (Scheduler) | Priority‑based preemptive scheduling | 500 |
| நினைவகம் (Memory) | Page allocation, virtual memory | 600 |
| செய்தி_வாயில் (IPC) | Capability‑based message passing | 700 |
| தமிழ்_வழங்கி (API Server) | Tamil syscall demux, validation, dispatch | 800 |
| கோப்பு_சேவையகம் (File Server) | Virtual filesystem, block cache | 900 |
| வலை_சேவையகம் (Network Server) | TCP/IP stack, socket emulation | 900 |

For high‑performance inter‑module communication, groups of related servers can share a single address space using shared‑memory ring buffers (similar to io_uring), but capability checks are never bypassed.
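
As a rough illustration of that rule, the sketch below models a bounded ring buffer in which every send is checked against a set of permitted capability tokens before a slot is written. It is a single‑threaded model in plain Python, not real shared memory, and the class name `CapabilityRing` and the string tokens are hypothetical.

```python
class CapabilityRing:
    """Fixed-size message ring; the capability check runs on every send."""

    def __init__(self, capacity: int, allowed_tokens):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0    # index of the next message to read
        self._tail = 0    # index of the next free slot
        self._count = 0   # number of queued messages
        self._allowed = frozenset(allowed_tokens)

    def send(self, token: str, message: bytes) -> bool:
        """Queue a message; returns False when the ring is full."""
        # The fast path still performs the capability check first.
        if token not in self._allowed:
            raise PermissionError("sender holds no capability for this ring")
        if self._count == len(self._slots):
            return False
        self._slots[self._tail] = message
        self._tail = (self._tail + 1) % len(self._slots)
        self._count += 1
        return True

    def receive(self):
        """Dequeue the oldest message, or return None when empty."""
        if self._count == 0:
            return None
        message = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return message
```

The design point is that sharing an address space removes the cost of a kernel round trip, not the authorization step: a sender without a valid token is rejected before any slot is touched.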

7. Execution Sandbox Layers

Every application, whether natively compiled or translated, runs inside a four‑layer sandbox. Each layer is a modular component.

+-------------------------------------------------+
| Application (UBR binary with capability token)   |
+-------------------------------------------------+
| Layer 3: Permission Gate (உரிம_வாயில்)          |
|   Validates token, grants scoped capabilities    |
+-------------------------------------------------+
| Layer 2: Sandbox (காப்பகம்)                     |
|   Private filesystem, resource quotas, no network|
+-------------------------------------------------+
| Layer 1: Syscall Interceptor (முறை_பிடி)        |
|   Filters every syscall against the import table |
+-------------------------------------------------+
| Layer 0: NLOS Kernel (Tamil API)                 |
+-------------------------------------------------+
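
One way to read the diagram is as a chain of checks that a request must pass on its way from the application down to the kernel. The sketch below assumes that order (Layer 3, then 2, then 1); the `App` and `Request` records, the `"network"` and `"terminal"` endpoint names, and the simple message quota are illustrative assumptions.

```python
from dataclasses import dataclass

IPC_SEND = "செய்திமாற்றம்_அனுப்பு"   # IPC Send primitive
MEMORY_REQUEST = "நினைவகம்_கேள்"      # Memory Request primitive


@dataclass(frozen=True)
class Request:
    function: str   # Tamil API function being invoked
    endpoint: str   # target server (hypothetical naming)


@dataclass
class App:
    import_table: frozenset   # functions declared in the UBR import table
    token_scope: frozenset    # functions granted by the capability token
    ipc_quota: int            # remaining messages allowed by the sandbox


def permission_gate(app: App, req: Request) -> bool:
    """Layer 3: the token must cover the requested function."""
    return req.function in app.token_scope


def sandbox(app: App, req: Request) -> bool:
    """Layer 2: no network endpoint, and the resource quota must remain."""
    return req.endpoint != "network" and app.ipc_quota > 0


def interceptor(app: App, req: Request) -> bool:
    """Layer 1: the function must appear in the import table."""
    return req.function in app.import_table


def dispatch(app: App, req: Request) -> str:
    """Run the request through layers 3, 2, 1 before reaching the kernel."""
    for layer, check in ((3, permission_gate), (2, sandbox), (1, interceptor)):
        if not check(app, req):
            return f"rejected at layer {layer}"
    app.ipc_quota -= 1  # charge the quota only for delivered requests
    return "delivered to kernel"
```

An application with an empty `token_scope` is stopped at Layer 3 on every request, which corresponds to the zero‑privilege starting state described in this section.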

Privilege Escalation Protocol:

  1. Initial State: The application has no capabilities beyond CPU and memory allocation.
  2. Retesting: The binary is executed with a recorded input dataset. All syscall traces are logged.
  3. Analysis: The Security & Quirk Mitigation Agent (SQMA) compares the trace against the expected golden trace.
  4. Grant: Upon passing, a signed capability token is issued, granting exactly the Tamil functions observed during the test, with time and resource limits. The token is bound to the binary’s cryptographic hash.
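
The four steps can be sketched end to end. The code below is a simplified model under stated assumptions: a trace is a list of `(function, arguments)` pairs, the token is a JSON claim set, and an HMAC with a secret key stands in for the signature (a real design would more likely use an asymmetric signature). All function names are hypothetical.

```python
import hashlib
import hmac
import json
import time


def first_divergence(golden: list, observed: list):
    """Return the index of the first mismatching trace entry, or None."""
    for index, (expected, actual) in enumerate(zip(golden, observed)):
        if expected != actual:
            return index
    if len(golden) != len(observed):
        return min(len(golden), len(observed))
    return None


def _tag(key: bytes, claims: dict) -> str:
    """Authenticate the claims with HMAC-SHA256 over canonical JSON."""
    payload = json.dumps(claims, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def retest_and_grant(key, binary, golden, observed, ttl_seconds, now=None):
    """Issue a token only when the observed trace matches the golden trace."""
    if first_divergence(golden, observed) is not None:
        return None  # remain at zero privilege
    now = time.time() if now is None else now
    claims = {
        # Bind the token to the exact binary that was tested.
        "binary_sha256": hashlib.sha256(binary).hexdigest(),
        # Grant exactly the functions seen during the test.
        "functions": sorted({function for function, _ in observed}),
        "expires": int(now) + ttl_seconds,
    }
    return {"claims": claims, "tag": _tag(key, claims)}


def verify_token(key, token, binary, now=None) -> bool:
    """Check the tag, the binary hash binding, and the expiry time."""
    now = time.time() if now is None else now
    claims = token["claims"]
    return (
        hmac.compare_digest(_tag(key, claims), token["tag"])
        and claims["binary_sha256"] == hashlib.sha256(binary).hexdigest()
        and now < claims["expires"]
    )
```

Because the claims include the binary hash and an expiry time, a token issued for one build does not verify for a modified binary or after the time limit, which matches the binding described in step 4. Resource limits are omitted from the sketch.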

8. WebAssembly Dynamic Execution Domain

Many modern applications rely on language runtimes that interpret or just‑in‑time compile code (JavaScript, Java, Python). NLOS does not reject these. Instead, it executes them inside a sandboxed WebAssembly (Wasm) runtime.

9. TamilSeed Boot Firmware

A custom boot firmware replaces legacy BIOS/UEFI to enforce linguistic purity from power‑on.

| Stage | Tamil Name | Function |
|---|---|---|
| 0 | தொடக்கம் | Masked ROM verifies cryptographic signature of the next stage |
| 1 | பாதுகாப்பு_சரி | Safety Monitor checks hardware integrity and presence of all cartridges |
| 2 | மொழி_உறுதி | Scans all kernel modules for valid Tamil magic bytes and identifier schemas |
| 3 | கைவழி | Jumps to the kernel entry symbol தொடங்கு() |

The firmware uses measured boot to build a tamper‑evident log, and it refuses to load any binary that does not begin with the UBR Tamil magic.
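
A tamper‑evident log of this kind is commonly built as a hash chain, where each new measurement is folded into a running register. The sketch below assumes that construction (similar in spirit to a TPM PCR extend operation) together with a hypothetical 6‑byte magic; neither detail is specified by the text.

```python
import hashlib

# Hypothetical 6-byte UBR magic: two Tamil letters, 3 bytes each in UTF-8.
MAGIC = "உப".encode("utf-8")


def extend(register: bytes, module: bytes) -> bytes:
    """Fold one module measurement into the running register."""
    measurement = hashlib.sha256(module).digest()
    return hashlib.sha256(register + measurement).digest()


def measured_boot(modules: list):
    """Measure modules in load order; refuse any without the magic prefix.

    `modules` is a list of (name, bytes) pairs. Returns the final register
    value and a log of (name, register-after-extend) entries.
    """
    register = bytes(32)  # all-zero starting value
    log = []
    for name, blob in modules:
        if not blob.startswith(MAGIC):
            raise ValueError(f"module {name} does not start with the UBR magic")
        register = extend(register, blob)
        log.append((name, register.hex()))
    return register, log
```

Since every register value depends on all earlier measurements, changing or reordering any module changes the final value, so a verifier that knows the expected value can detect tampering from the log alone.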

10. Multi‑Agent Software Architecture

Four dedicated agents, all user‑space processes communicating over Tamil IPC, provide runtime intelligence and security.

| Agent (Tamil) | English Role | Duty |
|---|---|---|
| நிர்வாகி | Manager | Maintains Ground State of capability grants and binary signatures; coordinates all other agents. |
| பாதுகாப்பு | Guardian | Continuously monitors IPC traffic for policy violations; applies anonymous vulnerability mitigations; triggers sandbox isolation on anomaly. |
| வள_ஒதுக்கீடு | Allocator | Profiles application behaviour via eBPF; adjusts CPU, memory, and I/O quotas dynamically. |
| சேமிப்பு | Archivist | Records complete syscall traces for audit, replay, and forensic analysis; maintains the golden traces for retesting. |

These agents are the only processes with elevated capabilities, and they themselves are subject to the same capability discipline.

11. Stability and Security Subsystem

Security is inherent in the architecture, not a bolt‑on.

12. Diagnostic Subsystem

Before any application is given a capability token, the system provides clear status using only Tamil.

13. Practical Feasibility

No component requires science fiction. The entire architecture can be prototyped using existing tools.

| Subsystem | Feasibility | Notes |
|---|---|---|
| Tamil kernel API | Buildable today | GCC/clang support Unicode identifiers; seL4 provides verified base |
| LLVM IR translation pipeline | Proven approach | McSema/RetDec lifters; custom LLVM backend; Rosetta 2 as a precedent for static translation |
| Booked/cooked function set | Trivial enforcement | seccomp‑bpf or custom capability system |
| Layered sandbox | Mature container tech | Linux namespaces or seL4 native domains |
| WebAssembly domain | Production runtimes | Wasmtime, Wasmer; standard WASI |
| Multi‑agent security | Incrementally deployable | Standard daemons over IPC |
| TamilSeed firmware | Modifiable open firmware | coreboot, ARM TF‑A |

A minimal single‑application demonstrator (busybox translated to UBR and running under sandbox) can be built by one person using omp within 10 months.

14. Technology Choices & Justification

| Technology / Choice | Why Chosen | Core Duty Fulfilled |
|---|---|---|
| Tamil as exclusive ABI language | Eliminates English lock‑in; cultural sovereignty | Linguistic purity from boot to app |
| Microkernel on seL4 | Formally verified; minimal TCB | Security foundation |
| LLVM IR direct lowering | Avoids lossy C++ decompilation; static translation has a precedent in Rosetta 2 | Universal binary translation |
| Booked/cooked function set | No ambient authority; each action explicitly allowed | Maximum security by construction |
| Capability‑based security | Tokens for every resource; no global IDs | Least privilege enforcement |
| Mandatory retesting cycles | Checks behavioural equivalence to the original on recorded inputs | Correctness evidence |
| Wasm dynamic domain | Embraces JIT languages without sacrificing sandbox purity | Modern application compatibility |
| Modular <1000 lines per module | Enables AI generation, auditing, and maintenance | Sustainable, democratic development |
| AI‑assisted construction (omp) | Code generation, debugging, and integration at scale | Feasibility for a single architect |

15. How NLOS Fills the Identified Gaps

During conceptual development, critical gaps were systematically identified and addressed.

| Gap | NLOS Solution |
|---|---|
| No OS with a completely non‑English kernel ABI | Tamil‑only API, boot firmware, and binary format |
| No universal binary execution without runtime emulation | Static translation via LLVM IR → UBR pipeline |
| No inherent application security in translation | Booked/cooked set, layered sandbox, retest‑then‑trust |
| No defence against translated malicious binaries | One‑request‑one‑function, capability tokens, time‑limited tokens |
| Lossy decompilation makes reliable translation impossible | LLVM IR direct lowering, no C++ emitter |
| Dynamic code (JIT, browsers) cannot run safely | Wasm execution domain with Tamil‑only WASI |
| No language‑pure security model | All agents and logs use Tamil identifiers |
| No modular, AI‑buildable OS design | Every component is a <1000‑line module |
| OS development requires deep coding expertise | Human is architect; AI (omp) writes the code |

16. Core Concept Implementation Checklist

17. Building NLOS with oh‑my‑pi (omp) – 10‑Month Plan Outline

NLOS is explicitly designed to be built by a human architect partnered with an AI coding agent. oh‑my‑pi (omp) — with its hash‑anchored edits, in‑process shell, LSP navigation, persistent Python/JS evaluation, subagent parallelism, and browser‑powered research — is the ideal tool.

How a Single Person + omp Builds NLOS (No Manual Coding Required):

10‑Month Roadmap (1 architect + omp):

| Phase | Weeks | Goal | omp Tools Used |
|---|---|---|---|
| 0 | 1–2 | Set up cross‑compiler, Tamil namespace, seL4 build environment | write, bash, eval |
| 1 | 3–6 | seL4 Tamil API server (3 primitives) running in QEMU | read, write, lsp, bash |
| 2 | 7–9 | UBR format specification, loader, and basic file server | write, eval, bash |
| 3 | 10–14 | Translation pipeline: ELF → LLVM IR → Tamil API mapping | web_search, browser, write, task |
| 4 | 15–18 | PIVOT: LLVM IR → UBR direct lowering (LLVM backend) | write, edit, bash, debug |
| 5 | 19–22 | Security layers: syscall interceptor, sandbox, capability gate | write, task, eval |
| 6 | 23–26 | Four‑agent security network (Manager, Guardian, Allocator, Archivist) | write, task, irc |
| 7 | 27–30 | WebAssembly dynamic execution domain (Wasmtime integration) | write, bash, edit |
| 8 | 31–34 | TamilSeed boot firmware (ARM/RISC‑V) | write, bash, debug |
| 9 | 35–40 | Integration: busybox → UBR → secure execution under full sandbox | ALL TOOLS |

At the end of Phase 9, the prototype will boot in QEMU, accept a translated Linux busybox, run it in the Tamil sandbox, and demonstrate the complete pipeline. The full production system is a 2–3‑year effort, but the core vision is proven in 10 months with one person and omp.

18. Conclusion and Next Steps

NLOS v1.0 presents a complete, gap‑free architecture for a language‑native operating system that translates all foreign software into a secure, native format. The kernel speaks Tamil, the binary format carries Tamil magic, and the security model enforces linguistic purity at the syscall level. The architecture corrects the critical flaw of prior designs by using LLVM IR direct lowering instead of impossible C++ decompilation. It embraces modern dynamic software through a Wasm sandbox, and it is built from the ground up as a collection of AI‑generatable modules.

Next steps:

All work is dedicated to the public domain (CC0).

19. Thesis Source

Not based on a formal academic thesis. This is independent R&D work.

Public Domain Dedication

The author dedicates this work to the public domain under Creative Commons Zero (CC0). You are free to copy, modify, distribute, and perform the work, even for commercial purposes, without asking permission.
