Friday, January 29, 2016

Initialization of Kernel Code

I was wondering what actually happens when you initialize a module, that is, when you type insmod module_name.ko.

Many code paths in the kernel run through function pointers. You will be tracing some source code and find no direct function call, yet execution still reaches your code and later returns to the same place. That is the magic of function pointers at work.

So the question is: what happens when I insert my module, and how does the kernel keep track of my custom functions (open, close, read, write, ioctl, etc.)?

The fact is that we mere mortals are not doing anything extraordinary when we write a module's operations functions. We are simply attaching our functions (read, write, ...) to a template of module operations.

This kind of framework is essential for the kernel to manage each module operation successfully.

#ifndef MODULE
 /**
  * module_init() - driver initialization entry point
 * @x: function to be run at kernel boot time or module insertion
  *
  * module_init() will either be called during do_initcalls() (if
  * builtin) or at module insertion time (if a module).  There can only
  * be one per module.
  */
 #define module_init(x)  __initcall(x);

As the comment says, module_init() is called either during do_initcalls() or at module insertion time. What does that mean? For built-in code (the #ifndef MODULE branch shown here), module_init() registers our init function in an array of function pointers, and do_initcalls() then calls the functions from this array one by one, level by level, in link order within each level. For a loadable module, the init function is instead called when the module is inserted.

 #define __initcall(fn) device_initcall(fn)

#define device_initcall(fn)             __define_initcall(fn, 6)

/* initcalls are now grouped by functionality into separate
 * subsections. Ordering inside the subsections is determined
 * by link order.
 * For backwards compatibility, initcall() puts the call in
 * the device init subsection.
 *
 * The `id' arg to __define_initcall() is needed so that multiple initcalls
 * can point at the same handler without causing duplicate-symbol build errors.
 */
 #define __define_initcall(fn, id) \
         static initcall_t __initcall_##fn##id __used \
         __attribute__((__section__(".initcall" #id ".init"))) = fn; \
         LTO_REFERENCE_INITCALL(__initcall_##fn##id)

The body of this macro is essentially one statement:

static initcall_t __initcall_##fn##id __used __attribute__((__section__(".initcall" #id ".init"))) = fn;

typedef int (*initcall_t)(void);

As you can see, initcall_t is a function pointer type.
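To make the idea concrete, here is a minimal user-space sketch of a table of initcall_t pointers being walked in order. It is only an illustration: the names alpha_init, beta_init, initcall_table and run_initcalls are made up for this example, and the table is filled by hand, whereas in the kernel the linker collects the pointers from the .initcall sections.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Same shape as the kernel's typedef: a pointer to an init function. */
typedef int (*initcall_t)(void);

int calls_made;   /* bookkeeping for this demo only */

/* Two hypothetical init functions, standing in for driver init code. */
int alpha_init(void) { calls_made++; puts("alpha_init"); return 0; }
int beta_init(void)  { calls_made++; puts("beta_init");  return 0; }

/* Hypothetical stand-in for the .initcall6.init section. In the kernel
 * the linker gathers these pointers; here we list them by hand. */
initcall_t initcall_table[] = { alpha_init, beta_init };

/* Walk the table in order and call every entry through its pointer.
 * Returns the number of initcalls that reported success (returned 0). */
int run_initcalls(void)
{
    size_t count = sizeof(initcall_table) / sizeof(initcall_table[0]);
    int ok = 0;

    for (size_t i = 0; i < count; i++) {
        assert(initcall_table[i] != NULL);
        if (initcall_table[i]() == 0)
            ok++;
    }
    return ok;
}
```

Calling run_initcalls() from main() runs alpha_init and then beta_init without either being named at the call site, which is exactly why tracing such code by reading for direct calls gets confusing.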

Who calls main(), and can we do more before calling main()?

When you write your application in C, your code isn't the only thing that gets programmed. Before your application can perform its first action, the C Runtime Environment startup code must configure the device to run code produced by a C compiler.


There are several things the C Runtime Environment startup code must do before your application's code can run.
  • Allocate space for a software stack and initialize the stack pointer
    On 8-bit devices that have a hardware based return address stack, the software stack is mostly used for parameter passing to and from functions. On 16- and 32-bit devices the software stack also stores the return address for each function call and interrupt.
  • Allocate space for a heap (if used)
    A heap is a block of RAM that has been set aside as a sort of scratchpad for your application. C has the ability to dynamically create variables at runtime. This is done in the heap.
  • Copy values from Flash into variables declared with initial values
    Variables declared with initial values (e.g. int x=10;) must have those initial values loaded into memory before the program can use them. The initial values are stored in flash program memory (so they will be available after the device is power cycled) and are copied into each RAM location allocated to an initialized variable for its storage.
  • Clear uninitialized RAM
    Any RAM (file register) not allocated to a specific purpose (variable storage, stack, heap, etc.) is cleared so that it will be in a known state.
  • Disable all interrupts
  • Call main(), where your application code starts.

The runtime environment setup code is automatically linked into your application. It usually comes from a file with a name like crt0.s (assembly source) or crt0.o (object code).

The runtime startup code can be modified if necessary. In fact, the source file provides hooks for "user initialization" where you can run code that must execute before the main application begins, such as initializing some external hardware immediately after power is applied. Details on runtime startup code modification will be covered in the compiler specific classes.

crt0.o is an object file with code that is prepended to object code supplied by the user to make an executable. It initializes variables and the stack, and starts the user's program, among other things.

The simplest C runtime code would be
.text    // Select .text section
 b main  // Branch to main() in C source

C runtime

  • crt1.o
    This object file defines the _start symbol. The manner in which this code handles program bootstrap is highly dependent on the particular C library implementation. Some systems use crt0.o while others may even specify crt2.o or higher. Ultimately, whatever gcc has encoded should correspond to the C library in use.
  • crti.o and crtn.o
    crti.o defines the _init and _fini function prologs for the .init and .fini sections, respectively. crtn.o defines the corresponding function epilogs. When the static linker eventually merges all .init and .fini sections of its input object files, the DT_INIT and DT_FINI tags in the dynamic section of its output object file will correspond to the addresses of the complete _init and _fini symbols, respectively.
    During run-time, _start sets up some way that the _init and _fini symbols will get invoked, e.g. via the __libc_csu_init and __libc_csu_fini symbols, respectively, of the C library.
  • crtbegin.o and crtend.o
    The details of the symbols and sections defined in these files vary among architectures. With the Ubuntu 12.04 AMD64 toolchain, these include legacy code that GCC used to find the constructors and destructors i.e. __do_global_dtors_aux and __do_global_ctors_aux.
Some more general information
Some definitions:
PIC - position independent code (-fPIC)
PIE - position independent executable (-fPIE -pie)
crt - C runtime

crt0.o crt1.o etc...
  Some systems use crt0.o, while some use crt1.o (and a few even use crt2.o
  or higher).  Most likely due to a transitionary phase that some targets
  went through.  The specific number is otherwise entirely arbitrary -- look
  at the internal gcc port code to figure out what your target expects.  All
  that matters is that whatever gcc has encoded, your C library better use
  the same name.

  This object is expected to contain the _start symbol which takes care of
  bootstrapping the initial execution of the program.  What exactly that
  entails is highly libc dependent and as such, the object is provided by
  the C library and cannot be mixed with other ones.

  On uClibc/glibc systems, this object initializes very early ABI requirements
  (like the stack or frame pointer), setting up the argc/argv/env values, and
  then passing pointers to the init/fini/main funcs to the internal libc main
  which in turn does more general bootstrapping before finally calling the real
  main function.


  glibc ports call this file 'start.S' while uClibc ports call this crt0.S or
  crt1.S (depending on what their gcc expects).

crti.o
  Defines the function prologs for the .init and .fini sections (with the _init
  and _fini symbols respectively).  This way they can be called directly.  These
  symbols also trigger the linker to generate DT_INIT/DT_FINI dynamic ELF tags.

  These are to support the old style constructor/destructor system where all
  .init/.fini sections get concatenated at link time.  Not to be confused with
  newer prioritized constructor/destructor .init_array/.fini_array sections and
  DT_INIT_ARRAY/DT_FINI_ARRAY ELF tags.

  glibc ports used to call this 'initfini.c', but now use 'crti.S'.  uClibc
  also uses 'crti.S'.

crtn.o
  Defines the function epilogs for the .init/.fini sections.  See crti.o.

  glibc ports used to call this 'initfini.c', but now use 'crtn.S'.  uClibc
  also uses 'crtn.S'.
For statically linked applications, the load process only requires the kernel to make the binary available at its fixed load address before initializing the Program Counter (PC) for the process with the address of the _start symbol. On the other hand, for dynamically linked applications, the kernel first transfers control to the dynamic linker. In turn, the dynamic linker loads the required shared object dependencies and performs any immediate relocations (by default, lazy relocations for function references are performed later on when the symbols are actually referenced). It then methodically runs the initialization code for the loaded shared objects before handing control over to the executable's _start.
Entering the executable's _start concludes the application's load process and control proceeds to the executable's C run-time code before eventually reaching main.
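
As for doing more before main() on a hosted GCC or Clang toolchain, you usually do not need to edit the startup files at all. The sketch below assumes the compiler's constructor attribute (a GCC/Clang extension); the names early_setup, hardware_ready and early_setup_done are hypothetical and exist only for this example.

```c
/* Hypothetical flag that our early hook sets before main() starts. */
int hardware_ready;

/* GCC/Clang extension: a pointer to this function is recorded in the
 * constructor table (typically .init_array on ELF targets), so the
 * runtime calls it before main(). */
__attribute__((constructor))
static void early_setup(void)
{
    hardware_ready = 1;
}

/* Call this from main() to confirm that the hook has already run. */
int early_setup_done(void)
{
    return hardware_ready == 1;
}
```

By the time main() gets control, early_setup_done() already returns 1, even though nothing in main() ever called early_setup().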

How to enter into Kernel Mode

The only way a user-space application can explicitly initiate a switch to kernel mode during normal operation is by making a system call such as open, read, or write.
Whenever a user application calls one of these system call APIs with appropriate parameters, a software interrupt/exception (SWI) is triggered.

More generally, a process ends up in kernel mode when it does one of the following:
  • Make a system call, i.e. explicitly request service from the kernel
  • Trap into the kernel because of either:
    • an error (segmentation violation, invalid instruction, etc.) - this is fatal,
    • or a page fault - accessing a mapped, but not resident, memory page.
A kernel code snippet is run on request of a user process. This code runs in ring 0 (with current privilege level -CPL- 0), which is the highest level of privilege in x86 architecture. All user processes run in ring 3 (CPL 3). So, to implement system call mechanism, what we need is 1) a way to call ring 0 code from ring 3 and 2) some kernel code to service the request.
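
From the user-space side, the request looks like an ordinary function call. A minimal sketch, assuming Linux with glibc, where the generic syscall() wrapper and the SYS_getpid number are available; raw_getpid is a name made up for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Ask the kernel for our PID through the generic syscall() wrapper
 * instead of the dedicated getpid() library function. The wrapper
 * loads the system call number and performs the ring 3 to ring 0
 * transition on our behalf. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

The value returned by raw_getpid() matches what getpid() reports; either way the answer comes from kernel code running in ring 0.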

It was found out that this software interrupt method was much slower on Pentium IV processors. To solve this issue, Linus implemented an alternative system call mechanism to take advantage of SYSENTER/SYSEXIT instructions provided by all Pentium II+ processors. Before going further with this new way of doing it, let's make ourselves more familiar with these instructions.

The SYSENTER instruction is part of the "Fast System Call" facility introduced on the Pentium® II processor. The SYSENTER instruction is optimized to provide the maximum performance for transitions to protection ring 0 (CPL = 0). The SYSENTER instruction sets the following registers according to values specified by the operating system in certain model-specific registers.
  • CS register set to the value of (SYSENTER_CS_MSR)
  • EIP register set to the value of (SYSENTER_EIP_MSR)
  • SS register set to the sum of (8 plus the value in SYSENTER_CS_MSR)
  • ESP register set to the value of (SYSENTER_ESP_MSR)
It looks like the processor is trying to help us. Let's also take a quick look at SYSEXIT:
The SYSEXIT instruction is part of the "Fast System Call" facility introduced on the Pentium® II processor. The SYSEXIT instruction is optimized to provide the maximum performance for transitions to protection ring 3 (CPL = 3) from protection ring 0 (CPL = 0). The SYSEXIT instruction sets the following registers according to values specified by the operating system in certain model-specific or general purpose registers.
  • CS register set to the sum of (16 plus the value in SYSENTER_CS_MSR)
  • EIP register set to the value contained in the EDX register
  • SS register set to the sum of (24 plus the value in SYSENTER_CS_MSR)
  • ESP register set to the value contained in the ECX register

Monday, November 30, 2015

What happens when you sleep

Once again a joy ride of a roller coaster that is the Linux kernel.

What happens when you sleep in the kernel? "You" here means process-context code that has given up the CPU, either voluntarily or because it made a blocking call.

/**
 * mutex_lock - acquire the mutex
 * @lock: the mutex to be acquired
 *
 * Lock the mutex exclusively for this task. If the mutex is not
 * available right now, it will sleep until it can get it.
 *
 * The mutex must later on be released by the same task that
 * acquired it. Recursive locking is not allowed. The task
 * may not exit without first unlocking the mutex. Also, kernel
 * memory where the mutex resides must not be freed with
 * the mutex still locked. The mutex must first be initialized
 * (or statically defined) before it can be locked. memset()-ing
 * the mutex to 0 is not allowed.
 *
 * ( The CONFIG_DEBUG_MUTEXES .config option turns on debugging
 *   checks that will enforce the restrictions and will also do
 *   deadlock debugging. )
 *
 * This function is similar to (but not equivalent to) down().
 */
void __sched mutex_lock(struct mutex *lock)
{
        might_sleep();  //---> Just for Debugging.
        /*
         * The locking fastpath is the 1->0 transition from
         * 'unlocked' into 'locked' state.
         */
        __mutex_fastpath_lock(&lock->count, __mutex_lock_slowpath);
        mutex_set_owner(lock);
}

EXPORT_SYMBOL(mutex_lock);
Ingo Molnar's function header sums it up. 



Sorting method in Linux Kernel

I came across this source code lying in the kernel source tree: a non-recursive sorting algorithm that does a pretty decent job in both the worst and average cases.
The source file is lib/sort.c.
Prototype:
void sort(void *base, size_t num, size_t size,
          int (*cmp_func)(const void *, const void *),
          void (*swap_func)(void *, void *, int size))
The comment in the source describes it:
This function does a heapsort on the given array. You may provide a swap_func function optimized to your element type.
Sorting time is O(n log n) both on average and worst-case. While qsort is about 20% faster on average, it suffers from exploitable O(n*n) worst-case behavior and extra memory requirements that make it less suitable for kernel use.
Heap sort is simple to implement, performs an O(n·lg(n)) in-place sort, but is not stable.
So our first task is to understand the heap data structure.
A heap is a kind of binary tree having this property: every parent value is greater than (or equal to) its children's.
The root always holds the max value.
If a parent value is less than one of its children, the value is sifted down repeatedly until a proper heap is formed.
Suppose we have the following, arbitrarily assigned, array:
[Figure: the arbitrarily assigned array]
The right half side (the blue marked side) is already arranged as the bottom row of a heap, because its items have no children: consider, for instance, the first 'blue' item, having value 24. Its index is 7, hence the heap properties for such item (h7 >= h15, h7 >= h16) are automatically satisfied, because there are no items with index 15 or 16 in the array. So the 'blue side' is our partial heap and we can proceed with heap construction by repeatedly sifting down the items of the 'red side'.
[Figure 3: sifting down the first 'red' (left side) item of the array]
The sift function would be implemented like this:
// The sift function:
// 'sifts down' the a[l] item until heap properties are restored
// ARGS:
//  a[] (INOUT) the array  (where a[l+1], a[l+2] .. a[size-1] is a partial heap)
//  size (IN) number of items of a
//  l (IN) index of the item inserted into the heap

void sift(int a[], int size, int l)
{
  int i,j, x;
  i = l;
  j = 2*i+1;
  x = a[i];
  while (j < size)
  {
    if ( j <size - 1)
      if ( a[j] < a[j+1])
        ++j;
    if (x >= a[j])
      break;
    a[i] = a[j];
    i = j;
    j = 2*i + 1; // sift
  }
  a[i] = x;
}
The creation of the heap then becomes:
// makes the heap using the R.W.Floyd's algorithm
// ARGS:
//  a[] (INOUT) the array wherein the heap is created
//  size (IN) the size of the array
 
void make_heap(int a[], int size)
{
  int l = size / 2;
  while (l)
  {
    --l;
    sift(a, size, l);
  }
} 
After all the operations are finished, we get a heap array whose first element is the maximum value of the unsorted array.
[Figures: 1. the unsorted array; 2. the heaped array, where 29 is the max value]
The maximum value (29) is now the first item of the array (or, in other words, is on the top of the heap).

We can remove such maximum value (storing it in a safe place) and rearrange the remaining items in order to regain a heap (to obtain another 'maximum', that is the successive value of the sorted sequence).

It turns out the safe place for storing the maximum value is the last item of the array. As a matter of fact, we swap the first and last item.
Now we have:

  • The maximum value in its own safe place (grayed, on the right side of the array)
  • A partial heap to fix (the blue items)
  • An item to sift down (the red one)
Repeating this step moves each successive maximum to the end of the array, so the result is sorted in ascending order. The heap sort then looks like this:
void heapsort(int a[], int size)
{
  int r = size;  // a[r] .. a[size-1] is the already sorted tail
  make_heap(a, size);
 
  while ( r > 1)
  {
    int tmp = a[0];
    --r;
    a[0] = a[r];
    a[r] = tmp;
    sift(a, r,0);
  }
}
NOTE: All credit for the diagrams and source code for heap sort goes to this article: http://www.codeproject.com/Tips/732196/Heap-Data-Structure-and-Heap-Sort
Now, coming back to our kernel code in sort.c:
void sort(void *base, size_t num, size_t size,
            int (*cmp_func)(const void *, const void *),
            void (*swap_func)(void *, void *, int size))
  {
          /* pre-scale counters for performance */
          int i = (num/2 - 1) * size, n = num * size, c, r;
  
          if (!swap_func) {
                  if (size == 4 && alignment_ok(base, 4))
                          swap_func = u32_swap;
                  else if (size == 8 && alignment_ok(base, 8))
                          swap_func = u64_swap;
                  else
                          swap_func = generic_swap;
          }
  
          /* heapify */
          for ( ; i >= 0; i -= size) {
                  for (r = i; r * 2 + size < n; r  = c) {
                          c = r * 2 + size;
                          if (c < n - size &&
                                          cmp_func(base + c, base + c + size) < 0)
                                  c += size;
                          if (cmp_func(base + r, base + c) >= 0)
                                  break;
                          swap_func(base + r, base + c, size);
                  }
          }
  
          /* sort */
          for (i = n - size; i > 0; i -= size) {
                  swap_func(base, base + i, size);
                  for (r = 0; r * 2 + size < i; r = c) {
                          c = r * 2 + size;
                          if (c < i - size &&
                                          cmp_func(base + c, base + c + size) < 0)
                                  c += size;
                          if (cmp_func(base + r, base + c) >= 0)
                                  break;
                          swap_func(base + r, base + c, size);
                  }
         }
 }
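
The kernel version can be hard to read because it works with byte offsets instead of element indices: the child of the element at offset r sits at offset r * 2 + size. Below is a user-space sketch of the same technique with the two sift loops folded into one helper. It is my own rewrite for illustration, not the kernel code; heap_sort_generic, sift_down, byte_swap and cmp_int are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Swap two elements of `size` bytes one byte at a time
 * (the same idea as a generic, type-agnostic swap). */
static void byte_swap(char *a, char *b, size_t size)
{
    while (size--) {
        char t = *a;
        *a++ = *b;
        *b++ = t;
    }
}

/* Sift the element at byte offset r down inside the first `end` bytes. */
static void sift_down(char *base, size_t r, size_t end, size_t size,
                      int (*cmp)(const void *, const void *))
{
    size_t c;

    while ((c = r * 2 + size) < end) {          /* left child exists */
        if (c + size < end && cmp(base + c, base + c + size) < 0)
            c += size;                          /* right child is larger */
        if (cmp(base + r, base + c) >= 0)
            break;                              /* heap property holds */
        byte_swap(base + r, base + c, size);
        r = c;
    }
}

/* In-place heapsort of `num` elements of `size` bytes each. */
void heap_sort_generic(void *array, size_t num, size_t size,
                       int (*cmp)(const void *, const void *))
{
    char *base = array;
    size_t n = num * size;

    assert(size > 0);
    if (num < 2)
        return;

    /* heapify: sift down every parent, starting with the last one */
    for (size_t i = (num / 2) * size; i > 0; ) {
        i -= size;
        sift_down(base, i, n, size, cmp);
    }

    /* sort: move the current maximum to the end, then repair the heap */
    for (size_t end = n - size; end > 0; end -= size) {
        byte_swap(base, base + end, size);
        sift_down(base, 0, end, size, cmp);
    }
}

/* Example comparison callback for int elements (ascending order). */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

Usage follows the same pattern as the kernel's sort(): pass the array, the element count, the element size and a comparison callback, for example heap_sort_generic(a, 5, sizeof(a[0]), cmp_int).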

Friday, October 23, 2015

The enigma of Spinlocks...

SMP and locking have always been tricky to handle in modern embedded systems. Little can be understood without delving into the source code of the implementations.

So take for instance:

spin_lock_irqsave(lock, flags)

spin_lock_irqsave saves the current interrupt state of the local CPU in flags, disables interrupts on that CPU, and then takes the spinlock. Interrupts must stay off while the lock is held, because an interrupt handler that tried to take the same lock on the same CPU would deadlock. The state is saved so that spin_unlock_irqrestore can put interrupts back exactly as they were, rather than blindly re-enabling them.

That's just English ;)... In the source, it is a macro wrapped in a do..while(0) block.

#define spin_lock_irqsave(lock, flags)                          \
do {                                                            \
         raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
} while (0)

/*
* Map the spin_lock functions to the raw variants for PREEMPT_RT=n
 */
 static inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
 {
         return &lock->rlock;
 }

#define raw_spin_lock_irqsave(lock, flags)                      \
do  \
{                                            \
                typecheck(unsigned long, flags);        \
                flags = _raw_spin_lock_irqsave(lock);   \
 } while (0)

_raw_spin_lock_irqsave has two variants: a UP (uniprocessor) one and an SMP one.

Defined as a preprocessor macro in:
              include/linux/spinlock_api_up.h, line 69
              include/linux/spinlock_api_smp.h, line 61

Defined as a function in:
kernel/locking/spinlock.c, line 147

We check the uniprocessor variant first.
As the comment there says:
/*
 * In the UP-nondebug case there's no real locking going on, so the
   * only thing we have to do is to keep the preempt counts and irq
   * flags straight, to suppress compiler warnings of unused lock
   * variables, and to add the proper checker annotations:  
*/

#define _raw_spin_lock_irqsave(lock, flags)     __LOCK_IRQSAVE(lock, flags)

#define __LOCK_IRQSAVE(lock, flags) \
  do { local_irq_save(flags); __LOCK(lock); } while (0)

#define local_irq_save(flags) ((flags) = 0)
OR
#define local_irq_save(flags) \
  do { raw_local_irq_save(flags); } while (0)

#define raw_local_irq_save(flags)                       \
         do {                                            \
                 typecheck(unsigned long, flags);        \
                 flags = arch_local_irq_save();          \
         } while (0)

From here it goes into architecture specific code.

#define arch_local_irq_save arch_local_irq_save
 static inline unsigned long arch_local_irq_save(void)
  {
          unsigned long flags, temp;
          asm volatile(
                  "       mrs     %0, cpsr        @ arch_local_irq_save\n"
                  "       orr     %1, %0, #128\n"
                  "       msr     cpsr_c, %1"
                  : "=r" (flags), "=r" (temp)
                  :
                  : "memory", "cc");
          return flags;
  }

What do these assembly instructions mean?
1st:    mrs    %0, cpsr        (%0 is flags)
                    - MRS{cond} Rd, cpsr ---> So basically we are copying the contents of cpsr into flags

So what does this CPSR do:

The Current Program Status Register is a 32-bit wide register used in the ARM architecture to record various pieces of information regarding the state of the program being executed by the processor and the state of the processor. This information is recorded by setting or clearing specific bits in the register.

ARM CPSR format

For our spinlock discussion, let's just talk about the I and F bits, which determine whether interrupts (such as requests for input/output) are enabled or disabled.

So when in our assembly code:

orr     %1, %0, #128 ---> We do a logical OR of flags and #128 -> 0x00000080 (low byte 10000000) and store the result in temp.

Bit 7 is the I bit, so if interrupts (IRQs) were enabled, setting this bit disables them.

Finally, msr cpsr_c, %1 copies temp into the control field of the CPSR, which is what actually disables the interrupts.
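
The bit arithmetic can be checked in plain C. This is only a sketch of the masking step, not something that touches a real CPSR; CPSR_I_BIT, cpsr_mask_irq and cpsr_irqs_enabled are names invented for this example.

```c
#include <stdint.h>

/* Bit 7 of the CPSR: IRQs are masked when it is set (same value as #128). */
#define CPSR_I_BIT 0x80u

/* Mirrors "orr %1, %0, #128": returns the CPSR value with IRQs masked. */
uint32_t cpsr_mask_irq(uint32_t cpsr)
{
    return cpsr | CPSR_I_BIT;
}

/* Returns 1 if IRQs are enabled in the given CPSR value, 0 otherwise. */
int cpsr_irqs_enabled(uint32_t cpsr)
{
    return (cpsr & CPSR_I_BIT) == 0;
}
```

For a CPSR value of 0x13 (IRQs enabled), cpsr_mask_irq() gives 0x93. Applying it again changes nothing, which is why the original value has to be saved in flags: the masked value alone cannot tell you whether interrupts were on before.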

That's all the interrupt part of spin_lock_irqsave(lock, flags) does: save the CPSR into flags, and disable the interrupts if they were enabled.

So when we do spin_unlock_irqrestore(lock, flags), it does the opposite and writes the saved state back.

A note on the do..while(0) wrappers: the body of do { ... } while (0) runs exactly once; the wrapper only makes a multi-statement macro behave like a single statement. What disappears in a uniprocessor, non-debug build is the lock itself: there is no other CPU to spin against, so spin_lock_irqsave boils down to disabling interrupts and preemption.
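
It is worth checking what a do { ... } while (0) wrapper really does in a macro: the body executes exactly once, and the wrapper lets a multi-statement macro be used like one statement. A small sketch, where FAKE_IRQ_SAVE, irq_saves and save_if are hypothetical names used only here:

```c
/* Counts how often the macro body ran (demo only). */
int irq_saves;

/* Hypothetical two-statement macro, wrapped the same way as the
 * kernel's locking macros. */
#define FAKE_IRQ_SAVE(flags)    \
do {                            \
        (flags) = 0x80;         \
        irq_saves++;            \
} while (0)

/* Without the wrapper, only the first statement would belong to the
 * if branch, and the else below would not even compile. */
int save_if(int condition)
{
    int flags = 0;

    if (condition)
        FAKE_IRQ_SAVE(flags);
    else
        flags = -1;
    return flags;
}
```

save_if(1) returns 0x80 and bumps irq_saves by one, so the body did run; save_if(0) returns -1 and leaves the counter alone.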

Now, lets talk about the SMP spin_lock case.
#ifdef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
#define _raw_spin_lock_irqsave(lock) __raw_spin_lock_irqsave(lock)

#endif

/*
* If lockdep is enabled then we use the non-preemption spin-ops
  * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
  * not re-enabled during lock-acquire (which the preempt-spin-ops do):
  */
 #if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)

 static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
 {
         unsigned long flags;

         local_irq_save(flags);
         preempt_disable();
         spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
         /*
          * On lockdep we dont want the hand-coded irq-enable of
          * do_raw_spin_lock_flags() code, because lockdep assumes
          * that interrupts are not re-enabled during lock-acquire:
          */
 #ifdef CONFIG_LOCKDEP
         LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
 #else
         do_raw_spin_lock_flags(lock, &flags);
 #endif
         return flags;
 }

#define local_irq_save(flags)                                   \
        do {                                                    \
                raw_local_irq_save(flags);                      \
        } while (0)

#define raw_local_irq_save(flags)                       \
         do {                                            \
                 typecheck(unsigned long, flags);        \
                 flags = arch_local_irq_save();          \
         } while (0)

Then again the architecture specific stuff.

 #define arch_local_irq_save arch_local_irq_save
 static inline unsigned long arch_local_irq_save(void)
  {
          unsigned long flags, temp;

          asm volatile(
                  "       mrs     %0, cpsr        @ arch_local_irq_save\n"
                  "       orr     %1, %0, #128\n"
                  "       msr     cpsr_c, %1"
                  : "=r" (flags), "=r" (temp)
                  :
                  : "memory", "cc");
          return flags;
  }

 #define preempt_disable() \
 do { \
         preempt_count_inc(); \
         barrier(); \
 } while (0)

#define preempt_count_inc() preempt_count_add(1)

#define preempt_count_add(val)  __preempt_count_add(val)

static __always_inline void __preempt_count_add(int val)
 {
         *preempt_count_ptr() += val;
 }

static inline void barrier(void)
 {
         asm volatile("" : : : "memory");
 }
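
The listings above stop before the actual spinning, which happens inside the architecture-specific lock code. As a rough user-space illustration of what "spinning" means, here is a test-and-set lock built on C11 atomics. It is a sketch under my own assumptions and not the kernel's algorithm; demo_spinlock_t and the demo_spin_* functions are made-up names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type, not the kernel's spinlock_t. */
typedef struct {
    atomic_flag locked;
} demo_spinlock_t;

/* Put the lock into the unlocked state. */
void demo_spin_init(demo_spinlock_t *l)
{
    atomic_flag_clear(&l->locked);
}

/* Try once: returns true if we took the lock, false if it was held. */
bool demo_spin_trylock(demo_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->locked,
                                              memory_order_acquire);
}

/* Busy-wait until the lock becomes free, then take it. */
void demo_spin_lock(demo_spinlock_t *l)
{
    while (!demo_spin_trylock(l))
        ;   /* spin: another thread holds the lock */
}

/* Release the lock so that a spinning thread can proceed. */
void demo_spin_unlock(demo_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The acquire/release orderings play the role that barrier() and the hardware memory barriers play in the kernel code: whatever is written while holding the lock is visible to the next holder.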

I hope you enjoyed the ride into the kernel source for understanding spinlocks.

LUCI LUA MVC

Recently I had the opportunity to work on OpenWRT for a Network Security Product. The hardest part for getting the product finished and delivered was customizing the UI.

The default UI of OpenWRT uses the LUCI Model-View-Controller framework. In a traditionally known language like PHP, Java, or JavaScript, working on MVC is always a pleasure for UI developers. I am by choice a kernel programmer, and even I would, once in a while, touch upon and write some UI components. However, working on Lua-based LUCI was a nightmare, not only for a non-professional UI guy like me, but even for the UI veterans in my team.

The primary reasons why this was the case with LUCI-LUA:
1. No documentation. (I mean, even the kernel has better documentation than LUCI-LUA.)
2. The code is not written in a generally accepted and widely used language (it's written in Lua).
3. No one else has written a memoir about their tryst with LUCI-LUA.

Since I recently finished adding a dozen custom changes to it, I decided to share them with readers who might stumble upon the same problems.

The first feature which is not supported by LUCI-LUA is that it is mostly hard-coded with 1 admin user "root". Inherently it does not have multi-user support. But there are hacks lying around to make LUCI-LUA pseudo multi-user.


The reason it is pseudo multi-user is because, even though a new user "Admin" logs into the UI, the underlying user in the linux system is still "root", because the http server which serves LUCI pages is itself running as root.

Suppose you need to add a custom page for a new feature. As an example we add a new Tab under System tab which is to add External Admins configured in a RADIUS Server.

In that case, take an existing template (like we all do while developing drivers: look at existing, similar code). Add your entry to the system.lua file, and a blank page with the necessary CSS and theme automatically pops up under your menu entry.

Then add your own Lua script, which handles all the logic and the UI.

Code Snippets:

File: system.lua
entry({"admin", "system", "externaladmin"}, cbi("admin_system/externaladmin"), _("External Admin"), 97)

File: externaladmin.lua.

adminform = SimpleForm("userform1", translate("External Admin"))
adminform.reset  = false
username = adminform:field(Value, "uname",translate("Username"))
serverip = adminform:field(Value, "serverip",translate("Server IP"))
key = adminform:field(Value, "key",translate("Key"), translate("Table1: External Admins maintained in External Radius Server. Table2: Radius Server with Server IPs"))

function adminform.handle(self, state, data)

if state == FORM_VALID then

local file = io.open("/tmp/myuser.log","w+")
local usernametxt = data.uname
local serveriptxt = data.serverip
local keytxt = data.key
local serverfile

if serveriptxt and keytxt and #serveriptxt > 0 and #keytxt > 0 then
command = "sh /root/radius_adt.sh "..serveriptxt.." "..keytxt
os.execute(command)
luci.http.redirect(luci.dispatcher.build_url("admin/system/externaladmin"))
end

if usernametxt and #usernametxt > 0 then
command = "sh /root/sysauth_adt.sh add external "..usernametxt.." NULL"
--os.execute(command)
output = luci.util.exec(command)
if output ~= "" then
adminform.errmessage=output
else
luci.http.redirect(luci.dispatcher.build_url("admin/system/externaladmin"))
end
end

file:close()

-- luci.http.redirect(luci.dispatcher.build_url("admin/system/externaladmin"))

end

return true
end

--[[ This section adds table to display External Admins ]]--

tbluser = adminform:section(Table, userdata)

tblcoluser = tbluser:option(DummyValue, "coluser", translate("User Name"))
tblcoldel = tbluser:option(Button, "deluser", translate("Delete"))

tblcoldel.render = function(self, section, scope)
if userdata[section].enabled then
self.title = translate("Delete")
self.inputstyle = "delete"
end
Button.render(self, section, scope)
end

tblcoldel.write = function(self, section)
local userdel = userdata[section].coluser
command = "sh /root/sysauth_adt.sh remove external "..userdel.." NULL"
--command = "deluser "..userdel
os.execute(command)
luci.http.redirect(luci.dispatcher.build_url("admin/system/externaladmin"))
end

The rest of the template remains the same. Much of my LUCI LUA experience has been constantly looking at existing code and adding debug prints to understand the flow.

If you want to add a startup-script UI element that lets you ENABLE/START/STOP/RESTART your script, then just write an init script.

#!/bin/sh /etc/rc.common
# script to Run DHCP snoop module using previous state
# Copyright (C) 2015 Nevis Networks
# Author: Gadre Nayan Anand Version: 1.0
# This is a init script to enable, start, stop, restart DHCP snoop Feature.
# For More details on commands involved read User Manual.

START=58
STOP=58

FILE1=/etc/config/dhcp_sis
FILE2=/etc/config/trusted_interfaces

start()
{
        local p_m
        local u_b
        while IFS="=" read -r key value;
        do
                case "$key" in
                        "preventive_mode") p_m=$value ;;
                        "use_bridge") u_b=$value;;
                esac
        done < "$FILE1"
        echo preventive_mode = $p_m use_bridge = $u_b

        echo "starting DHCP SNOOP IP SPOOF"
        /usr/sbin/insmod  /root/dhcp_snoop_IP_spoof.ko preventive_mode=$p_m use_bridge=$u_b

        while read -r key;
        do
                echo "a-$key" > /sys/kernel/dhcp/trusted_interfaces
        done < "$FILE2"
}

stop()
{
        echo "stopping DHCP SNOOP IP SPOOF"
        /usr/sbin/rmmod dhcp_snoop_IP_spoof
}



And the LUCI framework is really pathetically rigid, especially from the GUI perspective. CSS and other aspects are very hard to change... at least this is what I felt.