Friday, October 23, 2015

The enigma of Spinlocks...

The SMP and locking has always been a tricky affair to handle in latest embedded systems. Nothing can be understood unless one delves into source code of the implementations.

So take for instance:

spin_lock_irqsave(lock, flags)

spin_lock_irqsave is basically used to save the interrupt state before taking the spin lock, this is because spin lock disables the interrupt, when the lock is taken in interrupt context, and re-enables it when while unlocking. The interrupt state is saved so that it should reinstate the interrupts again.

Thats just english ;)... What it actually does is a do..while(0) loop.

#define spin_lock_irqsave(lock, flags)                          \
do 
{                                                            \
         raw_spin_lock_irqsave(spinlock_check(lock), flags);     \
} while (0)

/*
* Map the spin_lock functions to the raw variants for PREEMPT_RT=n
 */
 static inline raw_spinlock_t *spinlock_check(spinlock_t *lock)
 {
         return &lock->rlock;
 }

#define raw_spin_lock_irqsave(lock, flags)                      \
do  \
{                                            \
                typecheck(unsigned long, flags);        \
                flags = _raw_spin_lock_irqsave(lock);   \
 } while (0)

The _raw_spin_lock_irqsave has 2 Variants: 1st A UP one and 2nd a SMP.

Defined as a preprocessor macro in:
              include/linux/spinlock_api_up.h, line 69
              include/linux/spinlock_api_smp.h, line 61

Defined as a function in:
kernel/locking/spinlock.c, line 147

We check the Uni-processor Variant First:
As it is written: 
/*
 * In the UP-nondebug case there's no real locking going on, so the
   * only thing we have to do is to keep the preempt counts and irq
   * flags straight, to suppress compiler warnings of unused lock
   * variables, and to add the proper checker annotations:  
*/

#define _raw_spin_lock_irqsave(lock, flags)     __LOCK_IRQSAVE(lock, flags)

#define __LOCK_IRQSAVE(lock, flags)
do { local_irq_save(flags); __LOCK(lock); } while (0)

#define local_irq_save(flags) ((flags) = 0)
OR
#define local_irq_save(flags)
do {raw_local_irq_save(flags);} while (0)

#define raw_local_irq_save(flags)                       \
         do {                                            \
                 typecheck(unsigned long, flags);        \
                 flags = arch_local_irq_save();          \
         } while (0)

From here it goes into architecture specific code.

#define arch_local_irq_save arch_local_irq_save
 static inline unsigned long arch_local_irq_save(void)
  {
          unsigned long flags, temp;
          asm volatile(
                  "       mrs     %0, cpsr        @ arch_local_irq_save\n"
                  "       orr     %1, %0, #128\n"
                  "       msr     cpsr_c, %1"
                  : "=r" (flags), "=r" (temp)
                  :
                  : "memory", "cc");
          return flags;
  }

What do these Assembly instructions mean:
1st:    mrs    flags, cpsr        @ arch_local_irq_save\n"
                    - MRS{cond} Rd,  cpsr ---> So basically we are copying the contents of cpsr into flags

So what does this CPSR do:

The Current Program Status Register is a 32-bit wide register used in the ARM architecture to record various pieces of information regarding the state of the program being executed by the processor and the state of the processor. This information is recorded by setting or clearing specific bits in the register.

ARM CPSR format

For our spinlock discussion, lets just talk about the I and F bits which determine whether interrupts (such as requests for input/output) are enabled or disabled.

So when in our assembly code:

orr     %1, %0, #128\n" ---> We do Logical OR of flags and #128 -> 0x00000080 (last 2 nibbles 10000000).

So if the interrupts are enabled the ORing at the 7th bit will disable the interrupts.

And then we copy the value of the ORing into temp. Finally we copy temp into CPSR to disable the interrupts.

Thats all the spin_lock_irqsave(flags) does. Disable the interrupts if they are already enabled, also save the CPSR into flags.

So when we do spin_unlock_irqrestore(flags). It is going to do the opposite.

But Now all this code digging is going for a toss when I say that the do while which I showed is actually not going to execute because its while(0). BUMP !!!. Essentially the point of spinlocks to do nothing in a uni-processor environment.

Now, lets talk about the SMP spin_lock case.
#ifdef CONFIG_INLINE_SPIN_LOCK_IRQSAVE
#define _raw_spin_lock_irqsave(lock) __raw_spin_lock_irqsave(lock)

#endif

/*
* If lockdep is enabled then we use the non-preemption spin-ops
  * even on CONFIG_PREEMPT, because lockdep assumes that interrupts are
  * not re-enabled during lock-acquire (which the preempt-spin-ops do):
  */
 #if !defined(CONFIG_GENERIC_LOCKBREAK) || defined(CONFIG_DEBUG_LOCK_ALLOC)

 static inline unsigned long __raw_spin_lock_irqsave(raw_spinlock_t *lock)
 {
         unsigned long flags;

         local_irq_save(flags);
         preempt_disable();
         spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
         /*
          * On lockdep we dont want the hand-coded irq-enable of
          * do_raw_spin_lock_flags() code, because lockdep assumes
          * that interrupts are not re-enabled during lock-acquire:
          */
 #ifdef CONFIG_LOCKDEP
         LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);
 #else
         do_raw_spin_lock_flags(lock, &flags);
 #endif
         return flags;
 }

#define local_irq_save(flags)                                   \
        do {                                                    \
                raw_local_irq_save(flags);                      \
        } while (0)

#define raw_local_irq_save(flags)                       \
         do {                                            \
                 typecheck(unsigned long, flags);        \
                 flags = arch_local_irq_save();          \
         } while (0)

Then again the architecture specific stuff.

 #define arch_local_irq_save arch_local_irq_save
 static inline unsigned long arch_local_irq_save(void)
  {
          unsigned long flags, temp;

          asm volatile(
                  "       mrs     %0, cpsr        @ arch_local_irq_save\n"
                  "       orr     %1, %0, #128\n"
                  "       msr     cpsr_c, %1"
                  : "=r" (flags), "=r" (temp)
                  :
                  : "memory", "cc");
          return flags;
  }

 #define preempt_disable() \
 do { \
         preempt_count_inc(); \
         barrier(); \
 } while (0)

#define preempt_count_inc() preempt_count_add(1)

#define preempt_count_add(val)  __preempt_count_add(val)

static __always_inline void __preempt_count_add(int val)
 {
         *preempt_count_ptr() += val;
 }

static inline void barrier(void)
 {
         asm volatile("" : : : "memory");
 }

I hope you enjoyed the ride into the kernel source for understanding spinlocks.

LUCI LUA MVC

Recently I had the opportunity to work on OpenWRT for a Network Security Product. The hardest part for getting the product finished and delivered was customizing the UI.

The default UI of OpenWRT uses LUCI Model-View-Controller Framework. In a traditionally know language like php, Java, javascript, working on MVC is always a pleasure for UI developers. I am by choice a Kernel programmer, and even I would, for once in a while case, touch upon and write some UI components. However, working on LUA based LUCI was a night mare, not only for a non professional UI guy like me, but even for veterans in UI in my team.

The primary reasons why this was the case with LUCI-LUA.
1. No documentation. ( I mean even the kernel has better documentation than LUCI-LUA)
2. The code is not written in a generally accepted and used language (Its written in Lua Language).
3. No one else has written a memoir about there tryst with LUCI-LUA.

So I decided that since recently I just finished adding a dozen custom changes to it, I should share them with readers who might stumble upon it.

The first feature which is not supported by LUCI-LUA is that it is mostly hard-coded with 1 admin user "root". Inherently it does not have multi-user support. But there are hacks lying around to make LUCI-LUA pseudo multi-user.


The reason it is pseudo multi-user is because, even though a new user "Admin" logs into the UI, the underlying user in the linux system is still "root", because the http server which serves LUCI pages is itself running as root.

Suppose you need to add a custom page for a new feature. As an example we add a new Tab under System tab which is to add External Admins configured in a RADIUS Server.

In that case take an existing Template ( like we all do while developing drivers....look into an existing similar code). Add your entry into the System.lua file and then a blank page with necessary CSS and theme automatically pops up with your Menu entry.

Then add your own lua script which does handling of all the logic and UI handling.

Code Snippets:

FIle: system.lua
entry({"admin", "system", "externaladmin"}, cbi("admin_system/externaladmin"), _("External Admin"), 97)

File: externaladmin.lua.

adminform = SimpleForm("userform1", translate("External Admin"))
adminform.reset  = false
username = adminform:field(Value, "uname",translate("Username"))
serverip = adminform:field(Value, "serverip",translate("Server IP"))
key = adminform:field(Value, "key",translate("Key"), translate("Table1: External Admins maintained
in External Radius Server. Table2: Radius Server with Server IPs"))

function adminform.handle(self, state, data)

if state == FORM_VALID then

local file = io.open("/tmp/myuser.log","w+")
local usernametxt = data.uname
local serveriptxt = data.serverip
local keytxt = data.key
local serverfile

if serveriptxt and keytxt and #serveriptxt > 0 and #keytxt > 0 then
command = "sh /root/radius_adt.sh "..serveriptxt.." "..keytxt
os.execute(command)
luci.http.redirect(luci.dispatcher.build_url("admin/system/externaladmin"))
end

if usernametxt and #usernametxt > 0 then
command = "sh /root/sysauth_adt.sh add external "..usernametxt.." NULL"
--os.execute(command)
output = luci.util.exec(command)
if output ~= "" then
adminform.errmessage=output
else
luci.http.redirect(luci.dispatcher.build_url("admin/system/externaladmin"))
end
end

file:close()

-- luci.http.redirect(luci.dispatcher.build_url("admin/system/externaladmin"))

end

return true
end

--[[ This section adds table to display External Admins ]]--

tbluser = adminform:section(Table, userdata)

tblcoluser = tbluser:option(DummyValue, "coluser", translate("User Name"))
tblcoldel = tbluser:option(Button, "deluser", translate("Delete"))

tblcoldel.render = function(self, section, scope)
if userdata[section].enabled then
self.title = translate("Delete")
self.inputstyle = "delete"
end
Button.render(self, section, scope)
end

tblcoldel.write = function(self, section)
local userdel = userdata[section].coluser
command = "sh /root/sysauth_adt.sh remove external "..userdel.." NULL"
--command = "deluser "..userdel
os.execute(command)
luci.http.redirect(luci.dispatcher.build_url("admin/system/externaladmin"))
end

Rest of the template remains same. Much of my LUCI LUA experience has been constantly looking at existing code and putting debug prints to understand the flow.

If you want to add a Start up script UI element which allows you to DO a ENABLE/START/STOP/RESTART of your script, then just write a init script.

#!/bin/sh /etc/rc.common
# script to Run DHCP snoop module using previous state
# Copyright (C) 2015 Nevis Networks
# Author: Gadre Nayan Anand Version: 1.0
# This is a init script to enable, start, stop, restart DHCP snoop Feature.
# For More details on commands involved read User Manual.

START=58
STOP=58

FILE1=/etc/config/dhcp_sis
FILE2=/etc/config/trusted_interfaces

start()
{
        local p_m
        local u_b
        while IFS="=" read -r key value;
        do
                case "$key" in
                        "preventive_mode") p_m=$value ;;
                        "use_bridge") u_b=$value;;
                esac
        done < "$FILE1"
        echo preventive_mode = $p_m use_bridge = $u_b

        echo "starting DHCP SNOOP IP SPOOF"
        /usr/sbin/insmod  /root/dhcp_snoop_IP_spoof.ko preventive_mode=$p_m use_bridge=$u_b

while read -r key;
do
echo "a-$key" > /sys/kernel/dhcp/trusted_interfaces
done < "$FILE2"
}

stop()
{
        echo "stopping DHCP SNOOP IP SPOOF"
        /usr/sbin/rmmod dhcp_snoop_IP_spoof
}



And LUCI framework  is really pathetically rigid, especially with the GUI perspective. CSS and other aspects are very hard to change....atleast this is what I felt.

Sunday, October 11, 2015

Pulling up oneself by one's own bootstraps!!!

This has been explained by many on their respective blog posts around the world...so eh...why not another one!!

I went through some Wikipedia entries as well to understand how the world wide community explains this stuff...and i got some pretty new buzz words to write here...the Hen and the egg problem for example...explains initramfs quite straight forward manner.

So lets start and push your power button.

This probably causes the x86 microprocessor to reset, and hence loads it's program counter (PC)
with a fixed address near the top of the one-megabyte address space assignable in real mode.

Did I write real mode...Yes!! to all those undergraduates who have read that there are two or modes supported by the x86 processors Real, protected, etc....the only code that actually uses Real Mode 16 bit addressing scheme (No paging) etc...is the Boot-loader. Yes!! the boot-loader is the place in today's era that uses 16 bit addressable memory and instructions ( o_O again!!)...this is kept for backward compatibility.

A full power-on self-test (POST) is run (although you can stop POST from running evertime you boot -cold boot by pressing the 3 finger salute ctrl+alt+del,This saves the time otherwise used to detect and test all memory. ).  The POST checks, identifies, and initializes system devices such as the CPU, RAM, interrupt and DMA controllers and other parts of the chipset, video display card, keyboard, hard disk drive, optical disc drive and other basic hardware.

On a more detailed Note:

The BIOS does several things. This is its usual sequence:
Check the CMOS Setup for custom settings
Load the interrupt handlers and device drivers
Initialize registers and power management
Perform the power-on self-test (POST)
Display system settings
Determine which devices are bootable

Initiate the bootstrap sequence

Interrupt handlers are small pieces of software that act as translators between the hardware components and the operating system. For example, when you press a key on your keyboard, the signal is sent to the keyboard interrupt handler, which tells the CPU what it is and passes it on to the operating system. The device drivers are other pieces of software that identify the base hardware components such as keyboard, mouse, hard drive and floppy drive. Since the BIOS is constantly intercepting signals to and from the hardware, it is usually copied, or shadowed, into RAM to run faster.

Now the BIOS is just going to look at a bootlable device in the list of drives attached, be it Floppy or HD or CD or USB. Then jumps to the first partition at that drive and loads just 1 sector (512 bytes) at again a predetermined location. The 1st sector contains MBR (Master Boot Record ).

The MBR contains assembly code ( Not C code) since at that so very nascent stage of machine, the C run time environment it self is not set. Now the job of the MBR is to load the Bootloader (GRUB in Linux), bootmgr ( In windows).

The boot loader then loads the Kernel and initramfs. After the kernel is loaded, it unpacks the initramfs (initial RAM filesystem), which becomes the initial root filesystem. The kernel then executes /init as the first process. The early userspace starts.

The purpose of the initramfs is to bootstrap the system to the point where it can access the root filesystem (see FHS for details). This means that any modules that are required for devices like IDE, SCSI, SATA, USB/FW (if booting from an external drive) must be loadable from the initramfs if not built into the kernel; once the proper modules are loaded (either explicitly via a program or script, or implicitly via udev), the boot process continues. For this reason, the initramfs only needs to contain the modules necessary to access the root filesystem; it does not need to contain every module one would ever want to use. The majority of modules will be loaded later on by udev, during the init process.

At the final stage of early userspace, the real root is mounted, and then replaces the initial root filesystem. /sbin/init is executed, replacing the /init process. Arch uses systemd as the default init.

After the Display manager comes up then the Login screen pops up and you can then LOGIN!!!