MDB_MAP_FULL error writing table to storage

The code generating this error is basically:

local storage = require "storage"
storage.set_table(storage_key, datatable)

How would the plugin get access to the lmdb library? require "lmdb" doesn't work, and the storage API doesn't expose the lmdb environment or the path to the database.

Santiago,

Are you saying the Ezlo storage API exposes the LMDB of its implementation so we can replace it? Or proposing we hook into a Lua global variable to change it? Or that we bypass the Ezlo storage API?

Which Lua wrapper are you using? Is it shmul/lightningmdb on GitHub (Lightningdbm, a thin wrapper around the OpenLDAP Lightning Memory-Mapped Database, LMDB), or something else?

All the best,

Lee

@duran_duran Can you please put a little more flesh on the bones of your Dec 2021 reply? We have been unable to figure out how to apply your suggested change.

@duran_duran And a follow-up: it looks like we’re getting the MDB_MAP_FULL error after writing progressively smaller amounts of data, as if there’s some garbage accumulating.

Is there a way for us to enumerate the entire content of the database, either through Lua or an external tool?

We’re not seeking to set up an additional lmdb database, of course, but to enlarge the one backing your storage API. It might be good to know where that is.
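For what it's worth, if the stock LMDB command-line tools can be installed on (or the database file copied off) the hub, something like the following would let us inspect the environment. This is only a sketch: it assumes the store is a single-file LMDB environment, and the path is a guess on my part, not a documented location.

```shell
# Sketch only — assumes the LMDB command-line tools are available and that
# the store is a single-file environment (-n = MDB_NOSUBDIR).
# The path below is a guess at the backing file, not a documented location.
DB=/home/data/storage.bin

if command -v mdb_stat >/dev/null 2>&1; then
  mdb_stat -n -e "$DB"    # environment info: map size, pages in use, entry count
  mdb_dump -n -p "$DB"    # dump every key/value pair in printable form
else
  echo "LMDB tools not installed (e.g. apt install lmdb-utils)"
fi
```

mdb_stat's environment report would also answer the garbage-accumulation question, since it shows how many pages are actually in use versus the map size.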

@Leonardo_Soto Maybe you could take a look at this thread where we haven’t had any response in months?

MDB_MAP_FULL is playing a key role in my being unable to add more than 43 devices to my Ezlo Plus, as it sends the Z-Wave chip into a perpetual restart loop:

2022-07-01 07:20:52.745522 INFO : Zwave reset is completed

2022-07-01 07:20:52.748770 ERROR: Exception 'N4lmdb5ErrorE' was caught:MDB_MAP_FULL: Environment mapsize limit reached

terminate called after throwing an instance of 'HubPlatform::Services::TaskLoop::TaskException'

what(): TaskLoop: uncaught exceptions

Aborted (core dumped)

**** zwaved Restarted at Fri Jul 1 07:20:55 PDT 2022

2022-07-01 07:20:55 INFO : Logs folder was changed: //var/log/firmware

2022-07-01 07:20:55 INFO : addon.zwave: at-release/1978 [2022-05-05T11:14:51+0000]

2022-07-01 07:20:55 INFO : addon.zwave: Spread: connected to "4803" with private group "#addon.zwav#localhost"

Cannot set programming mode for ZW chip!

Z-Wave stick based on EFR32ZG14 is supported

Zwave hard reset is requested

2022-07-01 07:20:58.233546 INFO : zwaveModuleResetPinAssert: Assert reset on zwave chip

2022-07-01 07:20:58.234552 INFO : zwaveModuleResetPinRelease: Deassert reset on zwave chip

2022-07-01 07:20:58.735915 INFO : Zwave reset is completed

This is not even a 2000 sq ft home. How can anyone call this a serious system if it can't even operate a smaller-than-average house?

Hello Dan,

We’re sorry about this issue. Unfortunately, there might be an issue with your particular setup causing the storage environment to crash. Your Ezlo controller should be able to handle more than 43 devices. We’d like to replicate this and let you know what else we can find. Please open a ticket with us indicating your controller’s serial number and whether you give us permission to create support credentials.

Hi Lee,

We're sorry about the delay. Rest assured that we're looking into this closely. We have asked the developers to shed more light on why the storage API does not expose the lmdb environment for setting an appropriate map size. We're waiting on their input.

We apologize for the inconvenience.

Hello @Lee @Dan-n-Randy
Thank you for bringing up this topic. We reviewed the issue you described. We plan to create a new FW build with a fix for DB size by the end of the week. We can provide you with this early beta update if you want to try the fix.

Thanks for this heads-up. I’m juggling a number of issues and won’t be able to try out a beta. I’d be happy to review docs for API enhancements, and so on.

@Max @duran_duran Any update on this issue? I’ve looked through recent release announcements and not seen anything.

Hello @Lee
That fix was in the release candidate of our Linux FW. I’m checking that version’s current status and will return to you with the info.

Hello @Lee
The new FW version has been released to beta. We are planning to release this version live in one week.

Hello all,
Hello @Lee

Today we released a new version of Linux FW that contains a fix for the issue reported in this group.

@Max

Thanks for the update! We’re looking forward to more storage. Three-ish quick questions:

  • What do I actually have to do to get this update onto our controllers?
  • Was this a straightforward bug fix, or did something else change? What, if anything, do I need to do differently?
  • Part of our original request was for more insight into what this storage subsystem is doing, and how. I still don’t feel like I’ve got that. Can you say more here, or at Storage - Ezlo API Documentation , about how storage capacity is managed in the current implementation to address some of the questions raised upthread?

All the best,

Lee

Today the new version of FW has been pushed live. It might take approx. 24 hrs for the new version to roll out to all controllers. You can force an update by switching the controller off and on. I'd suggest first checking the current FW version on your controllers.
As for your next two questions, I’ll discuss them with our devs and come back with comments.

@Max Thanks. I confirmed they weren’t updated yet before asking. I’ll watch the normal process proceed over the next day before taking action.

Hello @Lee
Please, find below additional info related to the issue discussed in this thread.

Problem
We had a 1 MB limit on the amount of storage (both main and temporary). It caused issues when serving approx. 50 Z-Wave devices; for test_plugin devices, the limit is around 700 devices.
Also, the 1 MB limit applies to every plugin's storage, so all of that space may be allocated to a single plugin.
The reason for this behavior is a bug in the lmdb library: the default memory map size is documented as 10 MB, but in fact it is 1 MB.

Solution
The solution is to increase the map size so a controller can save more data. We applied the following logic:
Allocate 30% of the flash drive for main storage, but not less than 10 MB and not more than 1 GB.
Allocate 10% of the RAM for temporary storage, but not less than 10 MB and not more than 1 GB.
With this change, storage.bin and temp.storage.bin can grow up to these limits and are able to store all the needed data.
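For concreteness, the allocation rule above (a percentage of the device, clamped between 10 MB and 1 GB) can be sketched as plain shell arithmetic. The 4 GiB flash size below is an assumed example figure, not a known property of the controller:

```shell
# Hypothetical illustration of the sizing rule: 30% of flash,
# clamped to the [10 MiB, 1 GiB] range. The flash size is an assumption.
flash_bytes=$((4 * 1024 * 1024 * 1024))   # assume a 4 GiB flash part
min_size=$((10 * 1024 * 1024))            # 10 MiB floor
max_size=$((1024 * 1024 * 1024))          # 1 GiB ceiling

map_size=$((flash_bytes * 30 / 100))
[ "$map_size" -lt "$min_size" ] && map_size=$min_size
[ "$map_size" -gt "$max_size" ] && map_size=$max_size

echo "$map_size"   # 1073741824 — 30% of 4 GiB exceeds the cap, so the 1 GiB ceiling wins
```

On a very small flash part the same rule would instead clamp up to the 10 MiB floor, which is still ten times the old 1 MB limit.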

@Max Thanks for that update. We’ll do some destructive testing to understand what that means for us.

Another angle has come up. I’m happy to address it elsewhere if that’s better…

Only one of our four Ezlo Plus controllers has picked up the update to firmware 2.0.30 automatically; per ezlogic.mios.com, the other three are still on 2.0.29. I'd happily check by logging into the controller as well, but while I've looked at many files in /etc with various version numbers, none holds the firmware version. Maybe you could point me in the right direction, and also suggest what might be going on with the boxes that aren't updating. I really don't want to fix this symptomatically; I want to get at the root cause!

While we see an increase in capacity, it’s strangely short of what we would expect, and we are now getting a different error message.

We would really appreciate documentation and tools to form deeper insights about what’s going on.

After updating to your newest firmware, testing shows we're able to store 3.5-7x more items of data than before. The file /home/data/storage.bin, which I think is where the data is stored, is now about 3.5 MB.

We are primarily storing a single large Lua array that can now hold about 710 items before failing. The largest entries are around 900 bytes, based on inspection of storage.bin, so we can account for only about 650 KB of the 3.5 MB footprint of storage.bin.
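A quick back-of-the-envelope check of those figures (both inputs are our own estimates from inspecting storage.bin):

```shell
# Rough accounting: ~710 entries at ~900 bytes each (both figures estimated).
entries=710
bytes_per_entry=900
payload=$((entries * bytes_per_entry))
echo "$payload"    # 639000 — roughly 0.64 MB of the 3.5 MB file
```

That leaves most of the file unaccounted for, which is exactly why we'd like better visibility into what the storage layer is keeping.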

We no longer see MDB_MAP_FULL but instead:
Fatal error: DbLuaBinding: storage.setTable: fail to set a table value

When this happens, we remove the oldest event in the array and try saving again. The total size currently hovers around 710 entries.

Can you help us understand what’s going on?

P.S.
storage.bin is held open by 154 simultaneous file descriptors. I don't even know whether that's a smell, but I thought I'd mention it.
/tmp/log/firmware# lsof | grep /home/data/storage.bin$ | wc
154 1602 16786