Tracking Luup Restarts/Reloads

Do you know how often Vera does a Luup Restart/Reload? You may not care too much but it can be an early warning of problems.

The fine EventWatcher plugin will tell you when Vera was last rebooted and the time of the last Luup Restart and can also log this along with your other events in a designated file and/or on a SysLog server. Sometimes, though, a dedicated log can be useful.

The following code, placed in Startup Lua, maintains a simple text log of each Luup restart/reload. Along with the date and time of the restart, each entry shows the time since the previous restart (uptime) and the state of the network connection (LAN + Internet). This last part can indicate if Vera did a Luup restart because it could not access the network (an issue I have had). To avoid an ever-growing file, the log is purged of entries older than 30 days (adjustable).

The log can be accessed from a browser with the following URL. Change to the IP address of your Vera (without the <>):

http://<veraip>/restarts.txt

Log entries look like this:

 [b]17 Mar 2015  10:27:51.891  Restart. Uptime: 0 Days 12 Hrs 15 Mins. Network: OK[/b]

The code to be added to the end of Startup Lua:

[code]-- Log each restart.
local function restartLog(message, keepdays)
local lfs = require “lfs”
local socket = require(“socket”)
local time = socket.gettime() or os.time()
local tms = string.format(“.%03d “,math.floor (1000 * (time % 1)))
local stamp = os.date(”%d %b %Y %T”,math.floor(time)) … tms
local baseFile = “/www/restarts”
local logFile = baseFile … “.txt”
local lastLog = lfs.attributes(logFile,“modification”) or time
– Log this restart
local uptime = time - lastLog
local uptimeStr = string.format("Uptime: %d Days %d Hrs %d Mins. “,math.floor(uptime / 86400),
math.floor((uptime % 86400) / 3600), math.floor((uptime % 3600) / 60))
local file = io.open(logFile, “a+”)
file:write(stamp … “Restart. " … uptimeStr … (message or “”) … “\n”)
file:close()
– Remove old entries
local prevDays = keepdays or 30 – Days to keep in addition to today.
local tmpFile = baseFile … “.tmp”
local mTxt={Jan=1,Feb=2,Mar=3,Apr=4,May=5,Jun=6,Jul=7,Aug=8,Sep=9,Oct=10,Nov=11,Dec=12}
local dt = {}
local today = time - (time % 86400)
local cut = today - ( prevDays * 86400)
local tmpF,tmpE = io.open(tmpFile,“w+”)
if tmpF ~= nil then
for line in io.lines(logFile) do
dStr,mStr,yStr = string.match(line,”^(%d+)%s(%a+)%s(%d+)%s”)
dt.month = mTxt[mStr]
dt.year = tonumber(yStr)
dt.day =tonumber(dStr)
date = os.time(dt)
if (date >= cut) then tmpF:write(line … “\n”) end
end
tmpF:close()
local retn,err = os.rename(tmpFile, logFile)
end
lfs.touch(logFile,time)
end

local netOK = “OK”
if os.execute(“ping -c 1 8.8.8.8”) ~= 0 then netOK = “Offline” end
restartLog("Network: " … netOK, 30) – Last argument is log history in days.
[/code]

Edit: 17/03/2015 23:08 Code updated to prevent error when first run.
Edit: 19/03/2015 14:48 Changed terminology.

Thanks for posting this. Not to hijack this thread, but one of the things I am battling right now is luup restarts. How might one modify this to log those as well or does it do that as well?

Luup restarts (aka Reloads) are what this code logs.

Sorry. I thought it was vera restarts. Perfect! Thank you very much!

Sorry. I thought it was vera restarts. Perfect! Thank you very much!

Well the terminology is not perfectly defined. In my world, Vera Restart == Luup Restart == Luup Reload. The big gun, usually achieved by cycling the power, I call a Reboot.

There are other possible intermediate restarts but they are not much used.

Hi Rex,

FYI. For me it failed on the lfs.attributes call until I manually created a restarts.txt file

Cheers Rene

[quote=“reneboer, post:6, topic:186464”]Hi Rex,

FYI. For me it failed on the lfs.attributes call until I manually created a restarts.txt file

Cheers Rene[/quote]

Hi Rene,

You are quite right. I see the error. I have changed the code in the first post to fix it. Thanks for the heads-up.

I have updated the code in the first post. The new version attempts to log the reason for the restart. This is somewhat experimental. I am extracting this information from Vera’s signal.log file.

The only signal I am fairly sure of is that of a manual restart (6). For others, I just show the signal number. If I can get any more information on other signals I will add appropriate text messages.

Correction: This only worked on UI5 so I have removed the update until I can figure-out a UI7 alternative.

The evidence of what caused an unexpected restart is usually in LuaUPnP.log - in the minute or so before it happened. Whenever I try to investigate one, I invariably find that the logs have rotated since the restart so the evidence is much harder to find.

One solution is to add the following code to the end of Startup Lua. This takes a snapshot of the last 180 (configurable) seconds of the log. The snapshot is placed in the file /www/logsnap1.txt so it can be easily viewed with a browser with the following URL. Change to the IP address of your Vera (without the <>):

http://<veraip>/logsnap1.txt

The code also keeps the previous 2 (configurable) snapshots and just renames them to …logsnap2.txt, …logsnap3.txt, etc. The files are not very big so keeping a few may help to see if there is a common pattern to apparently random restarts. To keep more snapshots, just change the number in local snapNum = to the total number you want to keep. You can also keep interesting snapshots either by copying them to your PC using WinSCP or by saving them from your browser using Save as …

Code to be added to the end of Startup Lua:

-- Keep snapshots of LuaUPnP.log local snapSecs = 180 -- Seconds of log to keep in snapshot. local snapNum = 3 -- Total number of snapshots to keep. local baseFile = "/www/logsnap" local snapFile = baseFile .. "1.txt" -- Rename the snapshot stack if snapNum > 1 then for i=snapNum-1,1,-1 do os.rename(baseFile .. tostring(i) .. ".txt", baseFile .. tostring(i+1) .. ".txt") end end -- Take snapshot of current log local dt = {} local cut = os.time() - snapSecs local snapF,snapE = io.open(snapFile,"w+") if snapF ~= nil then local foundcut = false for line in io.lines("/tmp/log/cmh/LuaUPnP.log") do if not foundcut then mStr,dStr,yStr,hhStr,mmStr,ssStr = string.match(line,"^%d%d.-(%d+)/(%d+)/(%d+)%s(%d+):(%d+):(%d+)") if mStr ~= nil then dt.month = tonumber(mStr) dt.year = tonumber(yStr) + 2000 dt.day = tonumber(dStr) dt.hour = tonumber(hhStr) dt.min = tonumber(mmStr) dt.sec = tonumber(ssStr) tstamp = os.time(dt) if (tstamp >= cut) then foundcut = true end end end if foundcut then snapF:write(line .. "\n") -- Following line stops snapshot at point of crash. if string.find(line,"LuaUPnP Terminated with Exit Code:") ~= nil then break end end end snapF:close() end

It does take a few seconds to read through a large log file and strip out the last few entries so it will slow-down your restart slightly. I have tested this on UI5 and UI7 (7.0.5) and it doesn’t seem to cause any issues.

Edit: 19/03/2015 13:32 Code updated to run slightly faster.
Edit: 26/03/2015 01:20 Code updated to stop snapshot at point of crash.
Edit: 26/03/2015 01:50 Added comments on saving snapshots.

Luup restarts (aka Reloads) are what this code logs.[/quote]

Hi Rex… this still confuses me just a bit. I use the System+Monitor plugin as well, and it shows 3 things: Luup restart, which is what you show, Vera restart, which is I assume a reboot, and CMH, which also looks like some type of reboot.

I find that things are ugly when there are Vera restarts, not reloads, which happen frequently (admittedly when i’m messing around with the system).

Just want to make sure I have this right. Since it traps data around a Luup restart, and that ‘also’ happens around a Vera restart(reboot), it would be helpful there as well?

I don’t know why InfoViewer shows three events. I would interpret CMH Reboot as a complete reboot of the hardware.

To avoid any further confusion I shall just change my terminology. My code logs Luup Restarts/Reloads. These happen on user request, due to a crash or following a hardware reboot. They will also occur automatically during the installation of a new plugin and, in UI7, after changing some (but not all) configuration parameters.

[quote=“RexBeckett, post:11, topic:186464”]I don’t know why InfoViewer shows three events. I would interpret CMH Reboot as a complete reboot of the hardware.

To avoid any further confusion I shall just change my terminology. My code logs Luup Restarts/Reloads. These happen on user request, due to a crash or following a hardware reboot.[/quote]

Awesome… thanks

Rex,

Do you run this all the time? e.g. is there minimal overhead, or run it just for a set period to help diagnose restarts?

[quote=“tomgru, post:13, topic:186464”]Rex,

Do you run this all the time? e.g. is there minimal overhead, or run it just for a set period to help diagnose restarts?[/quote]

I run it all the time. The only overhead is for a few seconds during a restart - the rest of the time it does nothing. The basic log function doesn’t take too long to run - less than a second in my tests. The log-file snapshot takes somewhat longer - particularly if the log file is large. As far as I can see, it has no deleterious effect on Vera performance and it is a great improvement on wading through big files.

If you are suffering a lot of restarts, set snapNum to a larger value so it will keep more snapshots. This will give you more chances to spot any commonality. A snapshot is usually only 100-200kB so you should be able to keep a few without killing Vera’s memory. I pull mine off using WinSCP if they look unusual and keep them for future reference.

i’m going to definitely try this.

Both System+Monitor and EventWatcher told me i had a restart today at 11:30 am. (since i’m at work and can’t SSH in)… i contacted tech support (who answered right away - nice!)… and he looked through my logs and said that he can’t find any signs of a restart then.

I had the same answer a couple of months ago. I guess i should be happy, but i don’t like thinking something else is going on.

Trying this… loads fine. but when I check uptime, I get the right time, but none of the uptime stats:

24 Mar 2015 16:52:28.407 Restart. Uptime: 0 Days 0 Hrs 0 Mins. Network: OK

Hey, while I have you, do you have a handy way to send yourself an alert when there is a restart?

The Vera Alerts does this by default … it also indicates a Reload vs a Reboot.

cool.

any thoughts on the uptime stats not showing?

@RTS

Would it possible for a future version of vera alerts to include the signal or exit code in the alert it sends for restarts? (I am on UI5). That would let me know whether it was something I likely need to worry about (ie not a signal 6 / exit code 250).

If easy I wonder if it might also have the option to include the cpu and memory usage from top and perhaps the last minute of the luaUpnP.log file? That would make it a lot easier to track down problems even while traveling. Thanks !

That’s stepping a little outside of the problem space Vera Alerts was designed for …

But one could write some LUA code that run’s at Vera Start … packages up the info … and uses Vera Alerts to deliver the content.