Losing serial connectivity/lockups.

Pentala
Joined: 22 Oct 2011

I seem to be having an issue where the Livebox loses connectivity with my JeeNode and also the HAH board after a couple of days. Sometimes it's possible to ssh into the unit and see what is going on; at other times ssh will start to connect but then freeze before login.

Using xFx Viewer, I can see xap messages for the hub and the attached CurrentCost but not for the Controller or Jeenode...

I managed to get the following output -

 

# date
Tue Nov 22 19:48:41 UTC 2011
# ps
  PID USER       VSZ STAT COMMAND
    1 root      2256 S    init
    2 root         0 SW   [keventd]
    3 root         0 SWN  [ksoftirqd_CPU0]
    4 root         0 SW   [kswapd]
    5 root         0 SW   [bdflush]
    6 root         0 SW   [kupdated]
    7 root         0 SW   [mtdblockd]
    8 root         0 SW   [khubd]
   31 root         0 SWN  [jffs2_gcd_mtd2]
   97 root      2248 S    udhcpc -T 10 -i br0
  112 root      1668 S    dropbear -p 22
  125 root      2240 S    inetd
  131 root      1028 S    /usr/bin/xap-hub -i br0
  134 root     11536 R    /usr/bin/xap-pachube -i br0
  144 root      5000 S    /usr/bin/kloned
  145 root      5012 S    /usr/bin/kloned
  147 root      2608 S    /usr/bin/xap-currentcost -s /dev/ttyUSB0 -i br0
  162 root      5012 S    /usr/bin/kloned
  251 root      1748 S    dropbear -p 22
  252 root      2268 S    -ash
  262 root      2244 R    ps
#

There were no additional messages in /var/log/messages other than the normal startup messages.

The web GUI showed question marks in all the locations where values would have been expected. Then the box stopped responding to SSH and web access (I was going to try to restart the lost processes manually). Ping still worked, but xAP messages stopped as well.

The Pachube process is running and 'good' values can be seen in the XAP messages prior to the lockup, however, Pachube itself is reporting no data.

Are there any other logs I can look at to see what may be causing the problem here?

Many thanks,

Andy.

 

brett
Providence, United States
Joined: 9 Jan 2010
You are missing processes

Where has the xap-serial process gone?  Also, where is the xap-livebox process?  Wait, where is the plugboard process too?

You've got some sort of massive failure here .... Perhaps some sort of serial misconfiguration?

xap-serial is REQUIRED to be up to talk to the base JeeNode, which will probably be on /dev/ttyUSB1, given currentcost is on /dev/ttyUSB0.

Is this how you configured the jeenodeApplet.lua settings?
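
If it helps to spot this sort of thing quickly, here is a rough sketch of a check that reads `ps` output and prints any expected daemon that is not in it. The list of names is an assumption taken from the healthy post-reboot `ps` listing further down this thread, and `missing_procs` is just a made-up helper name, so adjust both to suit your own box.

```shell
#!/bin/sh
# Names we expect to see in `ps` (assumption: copied from the healthy
# listing in this thread; edit to match what your box normally runs).
EXPECTED="xap-hub xap-livebox xap-serial xap-currentcost xap-pachube plugboard.lua"

# missing_procs: read `ps` output on stdin, print one missing name per line.
missing_procs() {
    ps_out=$(cat)
    for name in $EXPECTED; do
        # -F matches the name as a fixed string, -q suppresses output
        printf '%s\n' "$ps_out" | grep -qF -- "$name" || echo "$name"
    done
}

# Usage on the box:  ps | missing_procs
```

Run against the first `ps` listing in this thread it should report xap-livebox, xap-serial and plugboard.lua as missing.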

Pentala
Joined: 22 Oct 2011
This is the box after a reboot with everything running as it should -

 

# ps
  PID USER       VSZ STAT COMMAND
    1 root      2256 S    init
    2 root         0 SW   [keventd]
    3 root         0 RWN  [ksoftirqd_CPU0]
    4 root         0 SW   [kswapd]
    5 root         0 SW   [bdflush]
    6 root         0 SW   [kupdated]
    7 root         0 SW   [mtdblockd]
    8 root         0 SW   [khubd]
   31 root         0 SWN  [jffs2_gcd_mtd2]
   97 root      2248 S    udhcpc -T 10 -i br0
  112 root      1668 S    dropbear -p 22
  125 root      2240 S    inetd
  131 root      1028 S    /usr/bin/xap-hub -i br0
  134 root      3356 S    /usr/bin/xap-pachube -i br0
  135 root      1068 S    /usr/bin/xap-livebox -s /dev/ttyS0 -i br0
  142 root      1044 S    /usr/bin/xap-serial -i br0
  144 root      5000 S    /usr/bin/kloned
  146 root      5012 S    /usr/bin/kloned
  147 root      2608 S    /usr/bin/xap-currentcost -s /dev/ttyUSB0 -i br0
  151 root      3824 S    lua /etc_ro_fs/plugboard/plugboard.lua
  161 root      5012 S    /usr/bin/kloned
  162 root      1740 S    dropbear -p 22
  163 root      2268 S    -ash
  165 root      2244 R    ps

and jeenodeApplet.lua
# cat jeenodeApplet.lua
--[[
    JeeNode to xAP Endpoint mapping
--]]

-- _DEBUG=9
module(...,package.seeall)

monitor = require("xap.jeenode").monitor
RoomNode = require("xap.roomnode").RoomNode
OutputNode = require("xap.outputnode").OutputNode

info={
   version="2.0", description="JeeNode"
}

local jeemon={
   port="/dev/ttyUSB1",
   baud=57600,
   stop=1,
   databits=8,
   parity="none",
   flow="none"
}

-- Keyed by NODE ID
local nodes = {
   [2] = RoomNode{base="dbzoo.livebox.jeenode:conservatory", endpoints={temp=1,lobat=1}, ttl=900},
}

function init()
   monitor(jeemon, nodes)
end
#
As you say, for some reason a load of the processes are failing, yet I can't see any logs to indicate why.

 

brett
Providence, United States
Joined: 9 Jan 2010
Logging

None of the C programs log.  If you run them on the command line they will report any fatal exit conditions.  Perhaps I might fix this and have them ALWAYS output to /var/log too, so if they fail fatally at least this will be logged.  I'll do that.
I've been running them 24/7 for months and have yet to have one fail, especially so many at once.

If the Lua (plugboard) process dies it does write to /var/log/xap-plugboard.log; however, you *must* check this file before rebooting, as it's in the memory filesystem and will not persist past a reboot.
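
Since that log lives in RAM, one way to avoid losing it is to copy it somewhere persistent before rebooting. A minimal sketch, assuming you have a writable location that survives a reboot; `save_log` and the destination directory are hypothetical names, not part of the HAH firmware.

```shell
#!/bin/sh
# save_log: copy a log file to another directory with a timestamp suffix.
#   $1 = log file to keep (e.g. /var/log/xap-plugboard.log)
#   $2 = destination directory (hypothetical; any persistent location)
# Prints the path of the saved copy on success.
save_log() {
    src=$1
    dest_dir=$2
    [ -f "$src" ] || { echo "no such log: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_dir/$(basename "$src").$stamp"
    cp "$src" "$dest" && echo "$dest"
}

# Example (destination path is a placeholder):
#   save_log /var/log/xap-plugboard.log /path/to/persistent/dir
```

The timestamp suffix keeps copies from successive failures apart, which is handy when the problem only shows up every couple of days.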

Brett

derek
Glasgow, United Kingdom
Joined: 26 Oct 2009
Check the Livebox

Like Brett, I've been running a few JeeNodes for quite a while now. I do a 'planned reboot' of my HAH every fortnight, but apart from that, I've not seen the sort of process chaos that you describe.

It might be interesting to see if your Livebox is happy when running in a 'minimal configuration' e.g. just the xap-livebox process, no Lua, no xap-serial. If this is good, it helps to rule out the possibility of a hardware glitch on the box.

One other thought ... the BaseNode firmware can run either in a mode where 'junk' packets of data are reported to the HAH or in a mode where they are not passed on. Since the strings of 'junk' data can be pretty long and give the xap-serial process a lot more work to do, it might be worth running with these suppressed (this is the mode that I use). 

Derek. 

brett
Providence, United States
Joined: 9 Jan 2010
HAH Central node

I've posted up a replacement for the RF12demo that you can run on the base HAHnode.  It operates in QUIET mode by default, and it won't accept A-Z as commands to change the node ID, which can mess with your configuration.

Flashing from the HAH would be like this:

/etc/init.d/xap stop xap-serial
stty -F /dev/ttyUSB0 hupcl
avrdude -v -c arduino -p m328p -P /dev/ttyUSB0 -b 57600 -Uflash:w:HAHcentral.hex
reboot
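
Before flashing it is worth confirming which port the base node is actually on, since in the `ps` listings earlier in this thread /dev/ttyUSB0 is the CurrentCost and jeenodeApplet.lua points at /dev/ttyUSB1. A small sketch that pulls the `port=` value out of the applet file; `jeenode_port` is a made-up helper name and it assumes the setting is written as `port="..."` on its own line, as in the file posted above.

```shell
#!/bin/sh
# jeenode_port: print the serial port configured in a jeenodeApplet.lua-style
# file, i.e. the quoted value from a line such as:  port="/dev/ttyUSB1",
#   $1 = path to the applet file
jeenode_port() {
    sed -n 's/^[[:space:]]*port[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Example: run from the directory holding the applet file
#   PORT=$(jeenode_port jeenodeApplet.lua)
#   echo "base node is configured on $PORT"
```

Whatever it prints is the device to use in the stty and avrdude lines for your own setup.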

Pentala
Joined: 22 Oct 2011
Thanks Brett, I'll re-flash the base node with the new code. I am already running the node in 'Quiet' mode although I did notice that on a couple of occasions it seemed to lose its Group ID and went back to the 'Noisy' mode.

I'll let you know how it goes.

 

Andy
