Native Language Operating System (NLOS)
A Modular, AI‑Assisted Architecture for Linguistic Purity and Universal Binary Compatibility
R&D Concept Paper, Version 1.0
Author: lazy_dude (Independent Enthusiast)
Date: June 24, 2026
Status: Conceptual Architecture, Under Active Development
Abstract
NLOS is a Tamil‑native operating system where the kernel API, binary format, and security policies are expressed entirely in Tamil. It converts any foreign executable into its native format by lifting to LLVM IR, mapping to a booked Tamil function set, and lowering directly, without lossy C++ decompilation. All applications run inside a layered sandbox with zero initial privilege, capability tokens, and mandatory retesting—guilty until proven innocent. The entire system is composed of swappable modules under 1000 lines, designed to be built by a human architect and an AI coding assistant like oh‑my‑pi (omp). This paper defines the complete architecture, justifies all technology choices, and outlines a realistic 10‑month path to a minimal prototype.
1. Introduction
Operating systems have always spoken English at their core. System calls are named open, read, write; binary formats begin with \x7FELF or MZ; error codes are ENOENT and EACCES. Even when user interfaces are translated into hundreds of languages, the machine’s native tongue remains English. This linguistic lock‑in prevents any community from building a computing environment that thinks natively in its own language, and it forces every piece of software to conform to an English‑centric interface.
At the same time, the software world is fractured by platform‑specific executable formats and APIs. A Windows .exe cannot run on Linux, a macOS .app cannot run on Android. Existing solutions—Wine, QEMU, Anbox—either emulate an entire operating system or translate at runtime with imperfect semantics. They never permanently convert the foreign program into a secure, native citizen of the host OS.
The Native Language Operating System (NLOS) solves both problems simultaneously. It defines a kernel Application Binary Interface (ABI) where every identifier, from the lowest system call to the magic number in the executable header, is written in a chosen natural language—Tamil for the reference implementation. It introduces a static translation pipeline that lifts foreign binaries to the LLVM intermediate representation, semantically maps every foreign OS call to a strictly curated set of Tamil functions, and lowers the result directly into a new Universal Binary Representation (UBR). Security is the central mechanism, not an add‑on: every translated application runs inside a multi‑layer sandbox, starts with zero privileges, and must prove its innocence through repeated automated testing before any capability is granted.
Crucially, NLOS is designed for construction by a human‑AI partnership. Every component is broken into modules of under 1000 lines, perfectly sized for generation by a modern AI coding agent like oh‑my‑pi (omp). A single individual, without manual coding, can guide the AI through architectural decisions, verification, and integration. The result is an operating system that is linguistically pure, universally compatible, and built by a new kind of engineer—the architect who wields AI as a construction tool.
2. Core Theme and Design Principles
The core theme of NLOS is linguistic sovereignty through functional abstraction. By moving the natural language boundary into the kernel ABI itself, and by providing an automatic translation mechanism for all foreign software, NLOS creates a computing environment where the machine genuinely speaks the user’s mother tongue. Five inviolable principles underpin the design:
- Linguistic Purity at the Kernel ABI – Every symbol visible at the system‑call boundary is written in the target natural language. The kernel does not translate; it thinks in that language.
- Universal Binary Translation via LLVM IR Lowering – Foreign executables are lifted to LLVM IR, not decompiled to high‑level C++, and then lowered directly to the NLOS native format. This avoids the information loss of source‑level decompilation.
- Booked/Cooked Function Set & Capability Security – Applications are restricted to a pre‑approved, immutable whitelist of kernel services. Every resource is accessed through capability tokens, enforcing least privilege by construction.
- Retest‑Then‑Trust – Converted applications start with zero permissions in a layered sandbox. They must pass multiple deterministic replay tests, comparing their behaviour to the original, before any capability is granted.
- Extreme Modularity for AI Generation – Every subsystem is a self‑contained module of under 1000 lines of code, making the entire OS auditable, maintainable, and directly generatable by AI coding assistants.
3. The Tamil‑Native Kernel Interface
NLOS adopts a microkernel architecture. The kernel itself provides only three primitive mechanisms, each named in Tamil:
| Tamil Identifier | Function (English) | Mechanism |
|---|---|---|
| செயல்முறை_உருவாக்கு | Process Create | Spawns a new protection domain |
| நினைவகம்_கேள் | Memory Request | Grants physical/virtual pages |
| செய்திமாற்றம்_அனுப்பு | IPC Send | Sends a typed message to a capability‑controlled endpoint |
All higher‑level services—file systems, network stacks, display servers—are implemented as user‑space processes that communicate exclusively through Tamil‑named IPC channels. The kernel can be built on top of a formally verified microkernel like seL4, which provides a trusted base while the Tamil API server in user space enforces language purity.
This minimal interface ensures that the booked/cooked function set remains small and auditable. Any application that attempts to invoke a function not in this list is automatically rejected.
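The rejection rule above can be sketched in a few lines. This is a minimal illustration, not the kernel's actual mechanism: the constant names and the `check_imports` and `admit` helpers are hypothetical, and the real check would run in the loader or API server rather than in Python.

```python
# Hypothetical constants for the three kernel primitives from the table above.
PROCESS_CREATE = "செயல்முறை_உருவாக்கு"   # Process Create
MEMORY_REQUEST = "நினைவகம்_கேள்"         # Memory Request
IPC_SEND = "செய்திமாற்றம்_அனுப்பு"        # IPC Send

# The booked set is frozen so it cannot be extended after start-up.
BOOKED_SET = frozenset({PROCESS_CREATE, MEMORY_REQUEST, IPC_SEND})


def check_imports(requested):
    """Return the sorted list of requested names that are not booked."""
    return sorted(set(requested) - BOOKED_SET)


def admit(requested):
    """Admit an application only if every requested function is booked."""
    return not check_imports(requested)
```

For instance, `admit([IPC_SEND, "open"])` returns `False`, and `check_imports` reports `"open"` as the offending name.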
4. Universal Binary Representation (UBR)
Every NLOS‑native executable is stored in the Universal Binary Representation. This format replaces ELF, PE, Mach‑O, and APK after translation.
4.1 UBR Layout
Fields appear in the file in the order listed:

| Field (Tamil) | Field (English) | Size |
|---|---|---|
| மூலம் | Magic | 6 bytes |
| பதிப்பு | Version | 4 bytes |
| கட்டளை | Code | variable |
| செயல்முறை | Import Table | variable |
| தரவு | Data | variable |
| கையொப்பம் | Signature | 64 bytes |
- மூலம் (Magic): The 6‑byte sequence 0xAE0B 0xAE2E 0xAE32, standing for the Tamil word மூலம் (“source”). This is a provisional value rather than the UTF‑8 encoding of the word, which occupies 15 bytes. Any binary without this magic is rejected.
- செயல்முறை (Import Table): Lists the hashed identifiers of the Tamil kernel functions required. The loader validates this list against the binary’s signed capability token; any function outside the allowed set causes immediate rejection.
- கையொப்பம் (Signature): A cryptographic signature binding the code and import table to a specific, time‑limited capability token issued after successful retesting.
This format makes it impossible for a binary to invoke an unauthorized syscall; the needed functions are declared in the import table, and the kernel simply refuses to resolve any symbol not explicitly granted.
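A minimal sketch of how a loader might build and validate this layout. Several details are assumptions, because the paper does not fix them: each variable-length section carries a 4-byte little-endian length prefix, import identifiers are SHA-256 hashes truncated to 8 bytes, and HMAC-SHA512 stands in for the 64-byte signature (a real design would presumably use a public-key scheme). The capability token is reduced to a plain set of granted names.

```python
import hashlib
import hmac
import struct

MAGIC = bytes.fromhex("AE0BAE2EAE32")  # the six bytes as listed in the paper
VERSION = 1
SIG_LEN = 64


def name_hash(name: str) -> bytes:
    """Hash a function name to 8 bytes (truncated SHA-256, an assumption)."""
    return hashlib.sha256(name.encode("utf-8")).digest()[:8]


def _section(payload: bytes) -> bytes:
    """Prefix a variable-length section with its 4-byte length (an assumption)."""
    return struct.pack("<I", len(payload)) + payload


def build_ubr(code: bytes, imports, data: bytes, key: bytes) -> bytes:
    """Assemble magic, version, code, import table, data and signature."""
    table = b"".join(name_hash(n) for n in imports)
    body = (MAGIC + struct.pack("<I", VERSION)
            + _section(code) + _section(table) + _section(data))
    return body + hmac.new(key, body, hashlib.sha512).digest()


def load_ubr(blob: bytes, key: bytes, granted) -> bytes:
    """Check magic, signature and imports; return the code section."""
    if blob[:6] != MAGIC:
        raise ValueError("bad magic")
    body, signature = blob[:-SIG_LEN], blob[-SIG_LEN:]
    expected = hmac.new(key, body, hashlib.sha512).digest()
    if not hmac.compare_digest(expected, signature):
        raise ValueError("bad signature")
    offset, sections = 10, []  # skip 6 bytes of magic and 4 bytes of version
    for _ in range(3):
        (length,) = struct.unpack_from("<I", body, offset)
        offset += 4
        sections.append(body[offset:offset + length])
        offset += length
    code, table, _data = sections
    allowed = {name_hash(n) for n in granted}
    for i in range(0, len(table), 8):
        if table[i:i + 8] not in allowed:
            raise PermissionError("import outside the granted set")
    return code
```

Checking the signature before parsing the sections means a tampered length prefix is rejected before it can steer the parser.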
5. The Translation Engine (LLVM IR Direct Lowering)
The translation pipeline converts foreign executables into UBR without emulation or lossy C++ decompilation. It operates in five stages, each a modular component.
5.1 Pipeline Stages
- Format Detector (வடிவம்_கண்டறி) – Identifies the source binary (ELF, PE, Mach‑O, APK). If the binary uses dynamic code generation (JIT, self‑modifying code), it is routed to the WebAssembly Dynamic Execution Domain instead of the static pipeline.
- Lifter (உயர்த்தி) – Translates the raw machine code into LLVM IR. This stage leverages proven tools (McSema, RetDec, Ghidra’s SLEIGH‑to‑LLVM) to preserve low‑level semantics faithfully without attempting to recover high‑level abstractions.
- API Mapper (செயலி_வரைபடம்) – The core transformation. Using an extensible knowledge base of foreign OS semantics, every foreign system call or library call is replaced by a sequence of Tamil NLOS API calls. For example, a Linux write(1, buf, len) becomes a message to the NLOS terminal server via செய்திமாற்றம்_அனுப்பு. The knowledge base starts with a small set of Linux syscalls (the musl‑libc subset) and grows through automatic observation.
- UBR Backend (மூல_பின்புலம்) – An LLVM backend pass that lowers the modified IR directly into native machine code wrapped in the UBR format. It constructs the import table from the Tamil API calls present in the IR, ensuring the whitelist is automatically generated and never exceeds the booked/cooked set.
- Retest Harness (மீள்சோதனை) – The output UBR binary is executed under deterministic replay in the sandbox. Its system‑call trace is compared against a golden trace captured from the original binary running on its native OS. If divergence is detected, the API Mapper knowledge base is refined and the pipeline re‑runs. Only a run whose trace matches the golden trace call for call passes.
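This pipeline sidesteps the difficulty of decompiling optimized binaries back to compilable C++: it stays at the LLVM IR level, where low‑level semantics are preserved, although lifting itself can still fail on indirect jumps or hand‑written assembly. The approach resembles ahead‑of‑time binary translators such as Apple’s Rosetta 2, though Rosetta 2 is not documented to use LLVM IR.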
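The API Mapper stage can be modelled on plain call records rather than real LLVM IR. The sketch below is illustrative only: the `Call` record, the `KNOWLEDGE_BASE` dictionary, the endpoint names, and the helper functions are all hypothetical, and a real mapper would rewrite call instructions inside an IR module.

```python
from dataclasses import dataclass

IPC_SEND = "செய்திமாற்றம்_அனுப்பு"  # Tamil IPC primitive from Section 3


@dataclass(frozen=True)
class Call:
    """One call site: a function name and its argument tuple."""
    name: str
    args: tuple


def map_write(call):
    """Rewrite write(fd, buf, len) as an IPC message to a server."""
    fd, buf, length = call.args
    # Hypothetical endpoint names: stdout and stderr go to the terminal server.
    endpoint = "terminal" if fd in (1, 2) else "file_server"
    return [Call(IPC_SEND, (endpoint, "write", fd, buf, length))]


# Hypothetical knowledge base: foreign call name -> rewrite rule.
KNOWLEDGE_BASE = {"write": map_write}


def map_calls(calls):
    """Replace every foreign call, failing loudly on unknown ones."""
    mapped = []
    for call in calls:
        rule = KNOWLEDGE_BASE.get(call.name)
        if rule is None:
            raise LookupError(f"no mapping for foreign call {call.name!r}")
        mapped.extend(rule(call))
    return mapped


def import_table(mapped):
    """Derive the import list from the Tamil calls that are actually present."""
    return sorted({call.name for call in mapped})
```

Because `import_table` is computed from the mapped calls themselves, the whitelist cannot list a function the translated program never uses.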
6. Modular Kernel Architecture
NLOS is built as a set of independent, swappable modules, each under 1000 lines of code. The kernel itself (based on seL4) is a microkernel complemented by user‑space servers.
| Module (Tamil Name) | Responsibility | Line Limit |
|---|---|---|
| தொடங்கி (Boot) | Multiboot header, early CPU init | 200 |
| செயலாக்கி (Scheduler) | Priority‑based preemptive scheduling | 500 |
| நினைவகம் (Memory) | Page allocation, virtual memory | 600 |
| செய்தி_வாயில் (IPC) | Capability‑based message passing | 700 |
| தமிழ்_வழங்கி (API Server) | Tamil syscall demux, validation, dispatch | 800 |
| கோப்பு_சேவையகம் (File Server) | Virtual filesystem, block cache | 900 |
| வலை_சேவையகம் (Network Server) | TCP/IP stack, socket emulation | 900 |
For high‑performance inter‑module communication, groups of related servers can share a single address space using shared‑memory ring buffers (similar to io_uring), but capability checks are never bypassed.
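The per-module line limits lend themselves to an automated check. The following sketch assumes module sources are available as strings keyed by hypothetical English labels. The limits are copied from the table above, and unlisted modules fall back to 999 lines, following the "under 1000" rule.

```python
# Limits from the module table; the English keys are illustrative labels.
LINE_LIMITS = {
    "boot": 200,
    "scheduler": 500,
    "memory": 600,
    "ipc": 700,
    "api_server": 800,
    "file_server": 900,
    "network_server": 900,
}

DEFAULT_LIMIT = 999  # "under 1000 lines" for any module not in the table


def count_lines(source: str) -> int:
    """Count physical lines, including blank ones and comments."""
    return len(source.splitlines())


def over_budget(sources, limits=LINE_LIMITS):
    """Return {module: (lines, limit)} for every module above its limit."""
    report = {}
    for module, source in sources.items():
        limit = limits.get(module, DEFAULT_LIMIT)
        lines = count_lines(source)
        if lines > limit:
            report[module] = (lines, limit)
    return report
```

Run as a pre-merge step, an empty report would mean every module still fits its budget.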
7. Execution Sandbox Layers
Every application, whether natively compiled or translated, runs inside a four‑layer sandbox. Each layer is a modular component.
| Layer | Component (Tamil Name) | Role |
|---|---|---|
| Top | Application | UBR binary with capability token |
| 3 | Permission Gate (உரிம_வாயில்) | Validates token, grants scoped capabilities |
| 2 | Sandbox (காப்பகம்) | Private filesystem, resource quotas, no network |
| 1 | Syscall Interceptor (முறை_பிடி) | Filters every syscall against the import table |
| 0 | NLOS Kernel | Tamil API |
Privilege Escalation Protocol:
- Initial State: The application has no capabilities beyond CPU and memory allocation.
- Retesting: The binary is executed with a recorded input dataset. All syscall traces are logged.
- Analysis: The Security & Quirk Mitigation Agent (SQMA) compares the trace against the expected golden trace.
- Grant: Upon passing, a signed capability token is issued, granting exactly the Tamil functions observed during the test, with time and resource limits. The token is bound to the binary’s cryptographic hash.
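The protocol above can be sketched as a trace comparison followed by token issuance. This is an assumption-laden illustration: traces are lists of `(function_name, args)` pairs, the token is a JSON payload authenticated with HMAC-SHA256 (a stand-in for whatever signature scheme NLOS adopts), resource limits are omitted, and all function names are hypothetical.

```python
import hashlib
import hmac
import json
import time


def traces_match(golden, observed):
    """Pass only if both traces contain the same calls in the same order."""
    return list(golden) == list(observed)


def issue_token(binary, observed, key, ttl_seconds, now):
    """Grant exactly the functions seen in the trace, bound to the binary hash."""
    claims = {
        "binary_sha256": hashlib.sha256(binary).hexdigest(),
        "functions": sorted({name for name, _args in observed}),
        "expires": now + ttl_seconds,
    }
    payload = json.dumps(claims, sort_keys=True, ensure_ascii=False).encode("utf-8")
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return payload, tag


def retest_and_grant(binary, golden, observed, key, ttl_seconds=3600, now=None):
    """Return (payload, tag) on a passing retest, otherwise None."""
    if not traces_match(golden, observed):
        return None
    current = time.time() if now is None else now
    return issue_token(binary, observed, key, ttl_seconds, current)


def token_allows(payload, tag, key, binary, function, now):
    """Check authenticity, binary binding, expiry and the function grant."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False
    claims = json.loads(payload)
    return (claims["binary_sha256"] == hashlib.sha256(binary).hexdigest()
            and now < claims["expires"]
            and function in claims["functions"])
```

A token issued this way becomes useless if the binary is modified, since the hash in the claims no longer matches.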
8. WebAssembly Dynamic Execution Domain
Many modern applications rely on runtimes that interpret or just‑in‑time compile code (JavaScript, Java, Python). NLOS does not reject these. Instead, it executes them inside a sandboxed WebAssembly (Wasm) runtime.
- The Wasm runtime exposes only the Tamil NLOS API, translated into a minimal WASI‑like interface. No raw system calls are possible.
- This preserves the integrity of the booked/cooked function set while allowing browsers, scripting languages, and other dynamic software to run securely.
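One way to picture the WASI‑like surface is as a fixed host-import table consulted at link time. The sketch below does not use a real Wasm runtime. The `nlos` module name and the `HOST_IMPORTS` table are hypothetical, with import names modelled on WASI's `fd_write` and `fd_read`.

```python
IPC_SEND = "செய்திமாற்றம்_அனுப்பு"  # Tamil IPC primitive from Section 3

# Hypothetical host-import table: (wasm module, import name) -> Tamil primitive.
HOST_IMPORTS = {
    ("nlos", "fd_write"): IPC_SEND,
    ("nlos", "fd_read"): IPC_SEND,
}


def resolve_imports(module_imports):
    """Bind every import to a Tamil primitive, or refuse to link the module."""
    resolved, rejected = {}, []
    for key in module_imports:
        if key in HOST_IMPORTS:
            resolved[key] = HOST_IMPORTS[key]
        else:
            rejected.append(key)
    if rejected:
        raise ImportError(f"imports outside the Tamil API: {rejected}")
    return resolved
```

A guest module that asks for anything beyond the table fails at link time, before any of its code runs.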
9. TamilSeed Boot Firmware
A custom boot firmware replaces legacy BIOS/UEFI to enforce linguistic purity from power‑on.
| Stage | Tamil Name | Function |
|---|---|---|
| 0 | தொடக்கம் | Masked ROM verifies cryptographic signature of the next stage |
| 1 | பாதுகாப்பு_சரி | Safety Monitor checks hardware integrity and presence of all cartridges |
| 2 | மொழி_உறுதி | Scans all kernel modules for valid Tamil magic bytes and identifier schemas |
| 3 | கைவழி | Jumps to the kernel entry symbol தொடங்கு() |
The firmware uses measured boot to build a tamper‑evident log, and it refuses to load any binary that does not begin with the UBR Tamil magic.
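A tamper‑evident boot log is commonly built as a hash chain, in the style of TPM register extension. The sketch below assumes SHA-256 and uses hypothetical function names, since the paper does not specify the measurement scheme.

```python
import hashlib


def extend(measurement: bytes, image: bytes) -> bytes:
    """new = SHA-256(old || SHA-256(image)), as in TPM-style extension."""
    return hashlib.sha256(measurement + hashlib.sha256(image).digest()).digest()


def measure_boot(stages):
    """Fold every (name, image) stage into one measurement and keep a log."""
    measurement = bytes(32)  # all-zero starting value
    log = []
    for name, image in stages:
        measurement = extend(measurement, image)
        log.append((name, hashlib.sha256(image).hexdigest()))
    return measurement, log


def replay(log):
    """Recompute the final measurement from the logged digests alone."""
    measurement = bytes(32)
    for _name, digest_hex in log:
        measurement = hashlib.sha256(measurement + bytes.fromhex(digest_hex)).digest()
    return measurement
```

A verifier that trusts the final measurement can replay the log and detect any edited, reordered, or dropped entry, because the recomputed value would no longer match.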
10. Multi‑Agent Software Architecture
Four dedicated agents, all user‑space processes communicating over Tamil IPC, provide runtime intelligence and security.
| Agent (Tamil) | English Role | Duty |
|---|---|---|
| நிர்வாகி | Manager | Maintains Ground State of capability grants and binary signatures; coordinates all other agents. |
| பாதுகாப்பு | Guardian | Continuously monitors IPC traffic for policy violations; applies anonymous vulnerability mitigations; triggers sandbox isolation on anomaly. |
| வள_ஒதுக்கீடு | Allocator | Profiles application behaviour via eBPF; adjusts CPU, memory, and I/O quotas dynamically. |
| சேமிப்பு | Archivist | Records complete syscall traces for audit, replay, and forensic analysis; maintains the golden traces for retesting. |
These agents are the only processes with elevated capabilities, and they themselves are subject to the same capability discipline.
11. Stability and Security Subsystem
Security is inherent in the architecture, not a bolt‑on.
- One‑Request‑One‑Function: The interceptor enforces that each user‑space call maps to exactly one Tamil kernel function; no multiplexing is allowed.
- Immutability of Booked/Cooked Set: The function whitelist is stored in a signed, read‑only memory region that even the kernel cannot modify at runtime.
- Capability Tokens: Every file, socket, and memory region is accessed via a cryptographically verified token, not a global descriptor.
- Fault Containment: Each application and server is a separate process; a crash in one cannot affect others. A faulty server is automatically restarted by the Manager agent.
- Ground State Snapshots: A versioned, encrypted snapshot of all capability grants is maintained. On detection of a security breach, the system can instantly revert to the last known‑good state.
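The Ground State mechanism can be sketched as a versioned store of capability grants with rollback. The `GroundState` class and its method names are hypothetical, and encryption of the snapshots is omitted here for brevity.

```python
import copy


class GroundState:
    """Versioned record of which functions each application may call."""

    def __init__(self):
        self._grants = {}
        self._snapshots = []

    def grant(self, app, functions):
        """Replace the grant for one application."""
        self._grants[app] = frozenset(functions)

    def allowed(self, app, function):
        """Applications with no recorded grant have zero privileges."""
        return function in self._grants.get(app, frozenset())

    def snapshot(self):
        """Store a copy of the current grants and return its version number."""
        self._snapshots.append(copy.deepcopy(self._grants))
        return len(self._snapshots) - 1

    def revert(self, version):
        """Restore the grants recorded under the given version."""
        self._grants = copy.deepcopy(self._snapshots[version])
```

Snapshots are copied on both save and restore, so later grants cannot silently alter a stored known‑good state.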
12. Diagnostic Subsystem
Before any application is given a capability token, the system provides clear status using only Tamil.
- Boot Console: Displays messages like இயங்குதளம் தொடங்கியது (OS started) or செயலி மொழிபெயர்ப்பு தோல்வி (Translation failed).
- Sandbox Logs: Every syscall attempt is logged with Tamil function names, e.g., முறை_மீறல்: கோப்பு_எழுது அனுமதி மறுக்கப்பட்டது (Violation: file write permission denied).
- Retest Report: A human‑readable summary shows pass/fail counts, permission requests, and risk scores for each application.
13. Practical Feasibility
No component requires science fiction. The entire architecture can be prototyped using existing tools.
| Subsystem | Feasibility | Notes |
|---|---|---|
| Tamil kernel API | Buildable today | GCC/clang support Unicode identifiers; seL4 provides verified base |
| LLVM IR translation pipeline | Proven approach | Rosetta 2; McSema/RetDec; custom LLVM backend |
| Booked/cooked function set | Trivial enforcement | seccomp‑bpf or custom capability system |
| Layered sandbox | Mature container tech | Linux namespaces or seL4 native domains |
| WebAssembly domain | Production runtimes | Wasmtime, Wasmer; standard WASI |
| Multi‑agent security | Incrementally deployable | Standard daemons over IPC |
| TamilSeed firmware | Modifiable open firmware | coreboot, ARM TF‑A |
A minimal single‑application demonstrator (busybox translated to UBR and running under sandbox) can be built by one person using omp within 10 months.
14. Technology Choices & Justification
| Technology / Choice | Why Chosen | Core Duty Fulfilled |
|---|---|---|
| Tamil as exclusive ABI language | Eliminates English lock‑in; cultural sovereignty | Linguistic purity from boot to app |
| Microkernel on seL4 | Formally verified; minimal TCB | Security foundation |
| LLVM IR direct lowering | Avoids lossy C++ decompilation; comparable in spirit to ahead‑of‑time translators such as Rosetta 2 | Universal binary translation |
| Booked/cooked function set | No ambient authority; each action explicitly allowed | Maximum security by construction |
| Capability‑based security | Tokens for every resource; no global IDs | Least privilege enforcement |
| Mandatory retesting cycles | Ensures behavioural equivalence to original | Correctness guarantee |
| Wasm dynamic domain | Embraces JIT languages without sacrificing sandbox purity | Modern application compatibility |
| Modular <1000 lines per module | Enables AI generation, auditing, and maintenance | Sustainable, democratic development |
| AI‑assisted construction (omp) | Code generation, debugging, and integration at scale | Feasibility for a single architect |
15. How NLOS Fills the Identified Gaps
During conceptual development, critical gaps were systematically identified and addressed.
| Gap | NLOS Solution |
|---|---|
| No OS with a completely non‑English kernel ABI | Tamil‑only API, boot firmware, and binary format |
| No universal binary execution without runtime emulation | Static translation via LLVM IR → UBR pipeline |
| No inherent application security in translation | Booked/cooked set, layered sandbox, retest‑then‑trust |
| No defence against translated malicious binaries | One‑request‑one‑function, capability tokens, time‑limited tokens |
| Lossy decompilation makes reliable translation impossible | LLVM IR direct lowering, no C++ emitter |
| Dynamic code (JIT, browsers) cannot run safely | Wasm execution domain with Tamil‑only WASI |
| No language‑pure security model | All agents and logs use Tamil identifiers |
| No modular, AI‑buildable OS design | Every component is a <1000‑line module |
| OS development requires deep coding expertise | Human is architect; AI (omp) writes the code |
16. Core Concept Implementation Checklist
- Tamil‑native kernel API with three primitives
- Universal Binary Representation with Tamil magic and import whitelist
- Translation pipeline: LLVM IR lifting → API mapping → UBR backend (no C++ emitter)
- Booked/cooked function set
- Layered sandbox (interceptor, sandbox, permission gate)
- Retest‑then‑trust capability escalation
- WebAssembly dynamic execution domain
- Multi‑agent security network (Manager, Guardian, Allocator, Archivist)
- TamilSeed boot firmware
- Modular microkernel (seL4‑based)
- Practical feasibility assessment
- All identified gaps explicitly filled
- AI‑assisted construction pathway defined
17. Building NLOS with oh‑my‑pi (omp) – 10‑Month Plan Outline
NLOS is explicitly designed to be built by a human architect partnered with an AI coding agent. oh‑my‑pi (omp) — with its hash‑anchored edits, in‑process shell, LSP navigation, persistent Python/JS evaluation, subagent parallelism, and browser‑powered research — is the ideal tool.
How a Single Person + omp Builds NLOS (No Manual Coding Required):
- omp generates the complete C/Rust/assembly module.
- omp compiles it, runs the test harness, and debugs failures.
- Subagents work on multiple modules in parallel.
- All code is kept within the 1000‑line limit, fitting perfectly in omp’s context window.
10‑Month Roadmap (1 architect + omp):
| Phase | Weeks | Goal | omp Tools Used |
|---|---|---|---|
| 0 | 1–2 | Set up cross‑compiler, Tamil namespace, seL4 build environment | write, bash, eval |
| 1 | 3–6 | seL4 Tamil API server (3 primitives) running in QEMU | read, write, lsp, bash |
| 2 | 7–9 | UBR format specification, loader, and basic file server | write, eval, bash |
| 3 | 10–14 | Translation pipeline: ELF → LLVM IR → Tamil API mapping | web_search, browser, write, task |
| 4 | 15–18 | PIVOT: LLVM IR → UBR direct lowering (LLVM backend) | write, edit, bash, debug |
| 5 | 19–22 | Security layers: syscall interceptor, sandbox, capability gate | write, task, eval |
| 6 | 23–26 | Four‑agent security network (Manager, Guardian, Allocator, Archivist) | write, task, irc |
| 7 | 27–30 | WebAssembly dynamic execution domain (Wasmtime integration) | write, bash, edit |
| 8 | 31–34 | TamilSeed boot firmware (ARM/RISC‑V) | write, bash, debug |
| 9 | 35–40 | Integration: busybox → UBR → secure execution under full sandbox | ALL TOOLS |
At the end of Phase 9, the prototype should boot in QEMU, accept a translated Linux busybox, run it in the Tamil sandbox, and demonstrate the complete pipeline. The full production system is a 2–3‑year effort, but the core vision would be demonstrated within 10 months by one person and omp.
18. Conclusion and Next Steps
NLOS v1.0 presents a complete, gap‑free architecture for a language‑native operating system that translates all foreign software into a secure, native format. The kernel speaks Tamil, the binary format carries Tamil magic, and the security model enforces linguistic purity at the syscall level. The architecture corrects the critical flaw of prior designs by using LLVM IR direct lowering instead of impossible C++ decompilation. It embraces modern dynamic software through a Wasm sandbox, and it is built from the ground up as a collection of AI‑generatable modules.
Next steps:
- Refine the LLVM UBR backend specification.
- Build a minimal seL4‑based Tamil API server.
- Translate a small set of Linux static binaries to prove the pipeline.
- Engage the open‑source and language‑technology communities.
All work is dedicated to the public domain (CC0).
19. Thesis Source
Not based on a formal academic thesis. This is independent R&D work.
Public Domain Dedication
The author dedicates this work to the public domain under Creative Commons Zero (CC0). You are free to copy, modify, distribute, and perform the work, even for commercial purposes, without asking permission.
shankar.sh