Crashing by polling user_data every 5 minutes

alejandro.liuly · March 23, 2020, 12:23pm

Hi, I have a shell script that measures my VeraEdge's response time every 5 minutes:

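#!/bin/sh
# Prints how long Luup takes to answer the probes below, as
# "resp.value <seconds>". Times come from /proc/uptime (1/100 s resolution).
time_then=$(awk '{print $1}' /proc/uptime)
base="http://localhost:3480/data_request?"
xml="&output_format=xml"
wget="busybox wget"

# Liveness check: Luup answers id=alive with "OK".
for ok in "id=alive"
do
  isOK=$($wget -q -O- "$base$ok")
  [ "$isOK" != "OK" ] && exit 1 # System is down!
done

# Heavy probe: fetch the full user_data (XML) and discard it.
for probe in "id=user_data$xml"
do
  $wget -q -O/dev/null "$base$probe"
done

time_now=$(awk '{print $1}' /proc/uptime)
# Pass the times with -v rather than splicing them into the awk program text.
awk -v t0="$time_then" -v t1="$time_now" 'BEGIN { print "resp.value", t1 - t0; exit }'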

When this script polls the system every 5 minutes, the VeraEdge crashes.

Has anybody else experienced this?

Any suggestions on how to fix this?

rigpapa · March 23, 2020, 1:22pm

Can you describe what “crashes” means? Does Luup reload? Does the entire box reboot? Does it just become unresponsive to anything? Is it just unresponsive at the UI but automations keep running? These are all important distinctions.

Requesting user_data in its entirety may require the Vera to build a very large result, particularly in XML. The user_data structure is very large, even on a small system. I have nearly 200 devices (ZWave and virtual/plugin) on my production Vera, and user_data in XML form for me is about 2MB (2048K) and takes a little over three seconds to generate and transfer. JSON format is about the same size, but takes less than 1/2 second to generate and return. The UI, for example, typically requests user_data once (in JSON) and then requests only the deltas (changes) since that initial result, which is far more efficient.
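If you want to check the XML/JSON difference on your own unit, a minimal sketch like the following times one full user_data download per format. It assumes Luup on localhost:3480 and BusyBox wget, as in the script above, and that output_format=json is accepted the same way as xml; the script and function names are illustrative only.

```sh
#!/bin/sh
# Compare how long Luup takes to build and return user_data per format.
# Assumes Luup on localhost:3480 and BusyBox wget (as in the original script).
# Each run downloads the full user_data once per format, so it is itself a
# heavy request: run it by hand, not on a schedule.
base="http://localhost:3480/data_request?id=user_data&output_format="

# Seconds since boot, 1/100 s resolution.
uptime_s() { awk '{print $1}' /proc/uptime; }

# elapsed T0 T1: print T1 - T0 with two decimals.
elapsed() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", b - a }'; }

# time_fetch URL: download URL, discard the body, print the seconds taken.
time_fetch() {
  start=$(uptime_s)
  busybox wget -q -O /dev/null "$1" || return 1
  elapsed "$start" "$(uptime_s)"
}

# One line per format named on the command line: "<format> <seconds>".
for fmt in "$@"; do
  secs=$(time_fetch "$base$fmt") || secs="failed"
  echo "$fmt $secs"
done
```

Usage: `sh compare_formats.sh xml json` prints one line per format with the elapsed seconds, or "failed" if the request did not complete.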

Frequent requests for all of user_data could conceivably cause spikes in memory usage leading to a crash. But perhaps more to the point, you’re probing the system’s responsiveness by doing one of the most responsiveness-killing tasks you can do.

Since 7.30, I have observed that lighttpd sometimes hangs returning large responses. Mostly I have noticed this when requesting the LuaUPnP.log file contents; I have not seen it in other contexts, but it could be happening here. Normally, the service restarts itself within a minute or so when it happens, so it self-heals and thus may be less noticeable to most users. Since you didn’t describe what “crash” means, this may or may not intersect the behavior you are seeing.

akbooer · March 23, 2020, 3:19pm

Yes, I have often found that repeated HTTP requests in a short period of time can crash Vera. However, many plugins (AltUI, VeraBridge) poll status frequently without problems.

If you just want to get a response time, you could use /data_request?id=alive, which simply responds with “OK”.
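A minimal sketch of such a lightweight probe, keeping the original script's "resp.value" output but timing only id=alive. It assumes BusyBox wget as in the original script; the script name and the helper functions (uptime_s, check_alive, fmt_resp, probe) are hypothetical.

```sh
#!/bin/sh
# Lightweight responsiveness probe: times only /data_request?id=alive,
# which Luup answers with "OK". Assumes BusyBox wget (as in the original).

# Seconds since boot, 1/100 s resolution.
uptime_s() { awk '{print $1}' /proc/uptime; }

# check_alive BODY: succeed only if the response body is exactly "OK".
check_alive() { [ "$1" = "OK" ]; }

# fmt_resp T0 T1: print "resp.value <T1 - T0>" with two decimals.
fmt_resp() { awk -v a="$1" -v b="$2" 'BEGIN { printf "resp.value %.2f\n", b - a }'; }

# probe BASE_URL: print the round-trip time, or fail if Luup is not answering.
probe() {
  t0=$(uptime_s)
  body=$(busybox wget -q -O- "$1/data_request?id=alive") || return 1
  t1=$(uptime_s)
  check_alive "$body" || return 1
  fmt_resp "$t0" "$t1"
}

# Only probe when a base URL is given on the command line.
if [ $# -gt 0 ]; then
  probe "$1" || exit 1 # System is down!
fi
```

Usage: `sh alive_probe.sh http://localhost:3480`. This measures only the HTTP round trip to Luup, not the cost of building user_data, so it is cheap enough to run every few minutes.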

alejandro.liuly · March 23, 2020, 10:01pm

Specifically, by “crash” I mean:

The UI is unresponsive.
Ping works, but ssh does not give a login prompt.
If I am already logged in via ssh when this happens, I can't start any more processes; I just get a “Segmentation fault”.

It does look as if it is running out of memory.
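One way to test the out-of-memory theory is to log free memory each time the probe runs and see whether it falls steadily before the box locks up. The sketch below is a minimal example assuming the VeraEdge kernel exposes the usual MemFree and Cached fields (in kB) in /proc/meminfo; the script name, the helper functions, and the log path are hypothetical.

```sh
#!/bin/sh
# Append one line of memory statistics per run, to line up with the probe.
# Reads /proc/meminfo, whose values are in kB.

# meminfo_kb FIELD [FILE]: print the kB value of FIELD (e.g. MemFree).
meminfo_kb() {
  awk -v f="$1:" '$1 == f { print $2; exit }' "${2:-/proc/meminfo}"
}

# log_line: "<date time> MemFree=<kB> Cached=<kB>"
log_line() {
  printf '%s MemFree=%s Cached=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
    "$(meminfo_kb MemFree)" "$(meminfo_kb Cached)"
}

# Append to the log file named on the command line, if any.
if [ $# -gt 0 ]; then
  log_line >> "$1"
fi
```

Usage: run it from whatever schedules the probe, e.g. `sh memlog.sh /tmp/memlog.txt`, and compare the MemFree column with the times the UI stops responding.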