HTTP connection left open

Lee · July 1, 2022, 7:26pm

We see periodic hangs of our plugin, which we are currently testing on several Ezlo Plus units: roughly one hang every 2-3 days per unit. Periodically, SSH also becomes unavailable. A reboot recovers the unit.

Inspecting one of these hangs while it was ongoing, I observed that our plugin was waiting for an HTTP connection it had initiated to close. It was still waiting after 130,000 seconds (approaching two days). There appear to be 32 bytes sitting unprocessed in a receive queue.

The server the Ezlo thinks it’s connected to has no open connections from the Ezlo (per netstat).

The server logs show no errors. All responses are 3447 bytes. The 99th-percentile response time is about 1 sec, with none over 2 sec.

On the Ezlo:

    # netstat -tpeW
    [...]
    tcp       32      0 [Ezlo IP redacted --Lee]:52950            [host redacted --Lee]:https      CLOSE_WAIT  4978/ha-luad
    [...]

I watched this for over an hour; this entry isn’t transient. I haven’t dug into the /proc/ entries or elsewhere enough to find a creation timestamp for the connection.
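
For anyone trying to spot the same symptom, here is a minimal sketch that filters netstat output down to CLOSE_WAIT sockets with unread bytes still in Recv-Q. The function name `stuck_close_wait` is hypothetical, and the sketch assumes `awk` is available on the unit and that the columns follow the layout of the netstat line above (protocol, Recv-Q, Send-Q, local address, foreign address, state).

```shell
# Print CLOSE_WAIT sockets that still hold unread bytes in Recv-Q.
# Reads netstat-style lines on stdin with the columns:
#   Proto Recv-Q Send-Q Local-Address Foreign-Address State [...]
# stuck_close_wait is a hypothetical helper name, not part of any Ezlo tooling.
stuck_close_wait() {
    awk '$1 ~ /^tcp/ && $6 == "CLOSE_WAIT" && $2 > 0 {
        print $4, $5, "recv_q=" $2
    }'
}

# Possible use on the device (numeric addresses keep the columns simple):
#   netstat -tn | stuck_close_wait
```

Run periodically, something like this might show whether a stuck entry appears before the plugin hangs, but it only reports the socket; it does not explain why the connection was never closed.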

Our http.request call looks like this:

    local request = {
        url = address,
        type = "POST",
        verbose = true,
        content_type = "application/json",
        data = data_json,
        content_length = data_length,
        fail_on_error = true,
        handler = constants.plugin_hub_script_path .. "utils/http_receive"
    }

    local success, connection_code = pcall(http.request, request)

99%+ of the time (maybe more nines!) everything works fine.

Is there anything we’re doing or not doing that can be leaving data behind in a buffer?
Does this represent an OS or plugin error, or is it in some way a feature?
We’ll start closing connections from our plugin after a reasonable time. It’s sort of gross if we retry after a successful send because connection closure got gummed up. Is there anything else we can try?

Thanks for your help!

Alvaro_Ochoa · July 1, 2022, 8:42pm

Hello @Lee ,

Since this issue is specific to your plugin, we will need to replicate the scenario and investigate the error further. This may take a couple of days, given how infrequently you say the issue occurs.

We will send you an email asking you for some additional details.

StefanyCantero · July 6, 2022, 12:55am

Hi @Lee ,

We sent you an email asking for some details. We’ll remain attentive to your response.

Lee · July 6, 2022, 8:48pm

To date, timing out connections by closing them seems to have solved my practical problem.

I really don’t see how anything in our plugin should be able to trigger this apparent problem at the OS level. I don’t currently have a reproduction available. So this is mostly by way of letting you know of a potential problem you may have and asking if there are best practices we should be looking out for.