Whilst I agree in part with your suggestion that the release should go through an additional stage of testing (release candidate), by the time a firmware upgrade is offered to the “normal user” it should have been fully tested in its FINAL form and demonstrated not to present issues to a very high percentage of, if not all, installations (well, one can hope). From the posts on this and the other related threads, this does not seem to have happened for this release. In addition, if there are known issues that may affect some users, these should be clearly flagged for review by the user before the user is able to initiate the firmware update process.
In my opinion, offering a firmware upgrade via the normal UI7 user interface (Settings - Firmware) clearly implies that the company believes the firmware is suitable for use by a normal user in a production environment, and that every known issue has been clearly flagged directly to the potential upgrader.
I was able to go to 11 days with the alpha (basically, Luup engine + 7.29) and never got there with the beta (new kernel + new Luup engine). The record was 4 days with the beta, but right now I’m more at 12 to 24 hours. That’s still better than 7.29 in terms of reliability and stability (especially regarding ZWave stability and performance), but it was better with a Frankenstein build (7.29 + alpha Luup engine) than with the beta. That’s a regression and I’d never have shipped it (and when I was in charge of shipping software projects, I never did, even under pressure). Shipping bad software is never a good thing to do. If you have reports that your software contains reproducible bugs, shipping it will only cause more work for your support department and frustration for your users. This does not change whether you do B2B or B2C; it will hurt your users anyway. Period.
That’s unfortunate, because, as @rafale77 and I already said, the alpha was stunning, almost fantastic, and we were enthusiastic about it. Then came this release, with all the low-level changes under the hood being neither fully tested nor disclosed: it’s normal we’re pretty annoyed, but that’s life. Now, I truly hope 7.31 will not take months and they can start fixing things that are still causing problems like these.
Just a word on the bright side: after my quasi-bricking event 2 days ago, as all/most others with VPs, which I solved through SSH’ing and restoring, the fact is that the thing is rock solid (hope I’m not speaking too soon ;)).
I’ve had no reboots, no weird messages, no dropouts, auto scenes are running as they should… If this stays like this, I won’t need any more updates and would simply spend my HA time making up new automation and adding devices, as it should be, and forget I have a controller with hardware and an OS.
Just wanted to shed some light on this thread and give Mios/Ezlo some cred.
My Vera Secure has been rock solid for 2 years since day one and I never had an issue with any FW upgrade.
Today I installed 7.30 without any hassle and it upgraded in under 8 minutes.
Even though it has some known issues, I say good job devs!
I have been trying to explain this. During the beta testing we found out that, as @dJOS mentioned, the Vera has a >50% chance of not coming back up after a reboot. It enters a recovery mode in which it does not acquire an IP address and just flashes its internet and service lights forever. That’s what we call the “Xmas light” mode. It appears random and is connected to the changes to the network monitor program, which now aggressively tries to connect to the cloud servers. The only workaround has been to kill that program at boot, but the consequence is that you can no longer reliably connect to the remote access mirror server which the mobile app uses. This problem affects all units >50% of the time. @rigpapa has investigated this even further than I have.
We reported this problem quite extensively. We also reported the fact that a manual reboot was required and that the upgrade wipes out the user data. All of these are critical issues needing to be addressed before release in my opinion.
You also can’t fully downgrade! I also reported this as the kernel does not get downgraded when you try to reinstall 7.29.
@therealdb, my record breaking uptime is on a unit with no cloud connections and with many accesses to mios blocked, no network monitor and also no kernel changes. It is on my emulator which runs a much newer kernel and OS than vera is running.
I just upgraded my test unit to 4833 and, humorously, it went as smooth as butter… That’s on an empty unit with no plugins or devices.
So… after running the upgrade I am finding out that:
There is a regression which was supposed to have been fixed with the 4800 but has come back: SFTP is not working. SCP does.
extroot is still not working as the kernel is blocking it.
The network monitor problem is still there, though I was lucky enough to have rebooted twice in a row without seeing the Xmas light problem.
I am basically seeing one more regression vs. the latest beta and no fixes since the beta.
edit:
Managed to mount the SSD using my ssd730.sh script. Installed ALTUI and am now deleting the rogue apps.
I know. I was able to get to my record with a Vera Edge, 50 zwave devices, plug-ins and cloud connections. That’s why this debacle is unfortunate, the alpha had a lot of potential.
The Xmas lighting is a mode which basically tells you the unit can not connect to the internet. This has been on the unit for a long time and we do not yet understand the issue well enough to fix it. If we did not make any release until the product is perfect we would not make any release at all.
The biggest factor causing issues on live that have not been seen on beta is that the upgrade method is different. You are right in saying that this is something that needs to change in the process.
We found an issue with the upgrade process on Vera Plus. If the unit has bad blocks at certain address ranges the process hangs. We’re working on fixing this and will patch the release as soon as possible (first part of next week). This issue that we found is also old, it’s been lurking for quite some time in this upgrade process. We do test this in our QA, however they didn’t have units with bad blocks in the proper address ranges in order to cause this behavior.
Also, there’s another issue with the upgrade process that created confusion among our users: some files need to be downloaded during the upgrade process (the 15min timer), which on slow networks takes more than 15 minutes.
I have 100/40 Mbps internet and still nothing happened after 15 mins. You guys need to make it a functional progress bar rather than a wild ass guess as to how long it’ll take.
Show the download progress and if for some reason auto reboot fails, show an “It’s now safe to reboot” message once the Firmware package has been checked and found to be complete.
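As a rough illustration of that suggestion, here is a minimal Python sketch of an updater that reports real download progress and only declares the package safe once it is complete and its checksum matches. Everything in it is hypothetical: the function name, the chunk source and the use of SHA-256 are assumptions for illustration, not how Vera's updater actually works.

```python
import hashlib
from typing import Callable, Iterable


def receive_firmware(chunks: Iterable[bytes], total_size: int, expected_sha256: str,
                     report: Callable[[str], None] = print) -> bool:
    """Consume firmware chunks, report real progress, then verify the package.

    All names are hypothetical; `chunks` stands in for whatever the
    download actually yields (for example, reads from an HTTP response).
    """
    digest = hashlib.sha256()
    received = 0
    for chunk in chunks:
        digest.update(chunk)
        received += len(chunk)
        # Progress is derived from bytes actually received, not from a timer.
        percent = 100 * received // total_size if total_size else 0
        report(f"Downloading firmware: {percent}%")
    if received != total_size:
        report("Download incomplete, do not reboot")
        return False
    if digest.hexdigest() != expected_sha256:
        report("Checksum mismatch, do not reboot")
        return False
    report("Package complete and verified. It's now safe to reboot")
    return True


if __name__ == "__main__":
    # Made-up 1000-byte "firmware" delivered in 256-byte chunks.
    data = b"x" * 1000
    pieces = [data[i:i + 256] for i in range(0, len(data), 256)]
    receive_firmware(pieces, len(data), hashlib.sha256(data).hexdigest())
```

The point of the design is that the "safe to reboot" message depends on the package being checked, so a slow or stalled download can never be mistaken for a finished one.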
I restored from backup which only partially worked and was then forced to do a manual factory reset (reset button pressed 3 times) and then the restore worked properly.
The problem here is that this is happening in the Boot process and causing a crash - it shouldn’t be an issue if there is no internet connection and TBH this should be the last thing Vera does during startup. Not early in the boot process as it currently appears to be.
Also worth noting: my Plus is using wired Ethernet, and WiFi and the DHCP server are disabled. I’ve actually had internet outages cause my Plus to crash and then boot into Xmas tree mode on 7.30. I’ve also intentionally rebooted my Plus manually (LUA command or via SSH) when my internet was just fine, and it booted into Xmas tree mode.
Thanks for bringing some light to the situation during the weekend
This is odd because on my network, with perfect connection to the internet, this has only happened with the beta 7.30 and never before in many years. The rate at which it occurs is also extremely high (for me 80% of the reboots) with again perfect connection to the internet monitored by pfsense with regular internet failover pings with dual wan. I really think something new is happening here with the network monitor. Also I am not advocating for release only when it is perfect. I am saying that these are critical defects which should prevent a release.
Since I had no problem upgrading mine (except for the xmas lighting mode), I can’t comment on the rest of your findings but I am glad you are looking into them.
For information; my four year old VeraPlus reports no bad blocks.
I haven’t tested the final upgrade.
This unit has been very good to me in general; I haven’t had many of the errors others reported on earlier upgrades either…
Bad blocks don’t get fixed. It is a hardware issue: the flash cells are no longer able to retain their charge. All the kernel can do is skip writing to that block. It will continue to report it as bad.
I am on day 21 without a reload on the beta version. At the next reload, I will upgrade the production unit… Previous record was 23 days.
No, they should still show as bad. I have a few units with bad sectors and they always show as bad in the test. The test and marking actually occur at every Vera boot-up. You can check it with the dmesg command.
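For anyone wanting to check their own unit, here is a minimal sketch of pulling the bad block entries out of saved `dmesg` output. It assumes the Linux NAND driver's usual `Bad eraseblock N at 0x...` message format; the exact wording on a given Vera firmware may differ, and the function name and sample log below are made up for illustration.

```python
import re
from typing import List, Tuple

# Assumed message format from the Linux NAND bad block scan, e.g.
# "Bad eraseblock 123 at 0x000000f60000". Adjust if your log differs.
BAD_BLOCK_RE = re.compile(r"Bad eraseblock (\d+) at (0x[0-9a-fA-F]+)")


def find_bad_blocks(dmesg_text: str) -> List[Tuple[int, int]]:
    """Return (block number, byte address) pairs found in dmesg output."""
    return [(int(block), int(address, 16))
            for block, address in BAD_BLOCK_RE.findall(dmesg_text)]


if __name__ == "__main__":
    # Made-up sample; on a real unit, save the dmesg output over SSH and read it in.
    sample = (
        "[    1.200000] Scanning device for bad blocks\n"
        "[    1.310000] Bad eraseblock 123 at 0x000000f60000\n"
        "[    1.420000] Bad eraseblock 456 at 0x000003900000\n"
    )
    for block, address in find_bad_blocks(sample):
        print(f"block {block} at {address:#x}")
```

Running it against the log from two different boots should list the same blocks each time, since a block marked bad stays marked.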