
Upgrade from 13.0 to 13.1 breaks the box

We recently attempted to upgrade from KACE SMA 13.0 to 13.1, and the upgrade completed successfully according to the upgrade webpage. However, upon restart, the box complained about missing files as soon as it attempted to initialize mysql. All subsequent services failed to initialize as well, and we ended up with a box that we can no longer access via the web interface. KACE support suggested downloading a 13.0 image from their website and restoring the database from our backups. That got the box running again, and we only lost 24 hours' worth of work, which was annoying but tolerable. We attempted to upgrade to 13.1 again with exactly the same result. KACE support has so far been less than helpful, sending pre-canned answers that we reply to, only to be asked the same questions over and over again. Has anyone else experienced issues upgrading to 13.1 from 13.0? We are running the appliance in a Server 2016 Hyper-V environment.


Answers (17)

Posted by: airwolf 11 months ago
Red Belt
8

SMA Principal Developer here. The dev team is following this issue closely, but we cannot reproduce it in-house and are unable to diagnose it without investigating an affected system pre- and post-upgrade. There are only 2 customers in this thread with the same upgrade issue/symptoms (described in the original post) so far, and both are in touch with Support to continue investigating. If you are affected and have not yet reached out to our Support team, we encourage you to do so. We'll need your cooperation to get to the root cause.

This appears to be an isolated issue, as hundreds of customers have upgraded to 13.1 without issue. Unfortunately, until we understand what is going on here we cannot offer any sort of solution.

Rest assured we are taking this issue very seriously, and we apologize for the inconvenience caused to our affected customers.

EDIT 5/11/23 @ 12:19PM CDT:

Root cause discovered with assistance from dstarrisom (thanks so much!). This issue is being tracked as K1-34089 and can be referenced when discussing with support. The issue is caused by the secure remote database access certificate (configurable under Control Panel -> Security Settings) being either expired or too weak for the upgraded versions of openssl and MySQL in SMA 13.1.74. Customers can preemptively regenerate these certificates on that page prior to upgrade (if you've bumped into this and reverted snapshots or restored from backups) by choosing the "Override Default Certificates" option under the "Enable Secure database access (SSL)" option on Control Panel -> Security Settings page, checking the box labeled "Reset to Default Certificate Files" and clicking "Save and Restart Services". This will regenerate the certificates prior to upgrade and everything should come up fine after your next upgrade attempt. If you are unable to or would rather not revert/restore, the issue can also be resolved post-upgrade by Support staff but requires backend access and is not something you can DIY.

Thank you to everyone for your patience and assistance in finding a swift solution for this issue. We will take steps to prevent this from occurring with future upgrades.
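
For anyone who wants to sanity-check a certificate against the two conditions named above (expired, or too weak), here is a minimal Python sketch. It is not an SMA tool. It assumes you already have a copy of the certificate and have read its end date with `openssl x509 -noout -enddate -in <file>` (which prints a `notAfter=...` line) and know its key size. The thread does not say where the certificate lives on the appliance or what key strength 13.1 actually requires, so the 2048-bit threshold and the function names below are illustrative assumptions only.

```python
import ssl
from datetime import datetime, timezone

# Assumption: 2048 bits is only an illustrative threshold. The thread does
# not state what SMA 13.1 (or its OpenSSL/MySQL builds) actually requires.
MIN_KEY_BITS = 2048


def parse_not_after(enddate_line: str) -> datetime:
    """Parse a line such as 'notAfter=May 11 17:19:00 2023 GMT'."""
    value = enddate_line.strip().split("=", 1)[1]
    seconds = ssl.cert_time_to_seconds(value)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def cert_problems(enddate_line, key_bits, now, min_key_bits=MIN_KEY_BITS):
    """Return a list of reasons the certificate may need regenerating."""
    problems = []
    if parse_not_after(enddate_line) <= now:
        problems.append("expired")
    if key_bits < min_key_bits:
        problems.append("weak key")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(cert_problems("notAfter=May  1 00:00:00 2023 GMT", 1024, now))
```

An empty list only means neither of these two checks tripped. Resetting to the default certificates on the Security Settings page, as described above, remains the actual fix.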


Comments:
  • Can we send you a backup of our DB? You could restore it to a 13.0 VM and then attempt an upgrade.
    Our Case number is 02097115 - ager01 11 months ago
    • Thank you, that may be an option. I'm discussing with the team now. - airwolf 11 months ago
      • I uploaded the DB backup - ager01 11 months ago
    • I've added a lengthy edit to my original post and want to make sure you see the details. Support should be reaching out to you as well. - airwolf 11 months ago
  • I have an 11:30 AM (eastern) call set up with support (case 02104841) and one of your engineers to GoToMeeting into the system with root credentials. Are you interested in joining? It's a VM and I take a snapshot of it powered off, so we can do all sorts of testing on it afterwards. We have already declared a maintenance window to facilitate this meeting. - dstarrisom 11 months ago
    • I'll be there with additional dev team members. - airwolf 11 months ago
  • I am happy to report that "Override Default Certificates" prior to upgrade resolved our upgrade issue and we are now running version 13.1. Please post this guidance on the download page to save us hours of downtime. Thanks! - ager01 11 months ago
    • I'm glad you were able to upgrade successfully! Support is working on publishing the details of this issue and workaround to the portal / knowledge base. - airwolf 11 months ago
    • worked for me as well - jjayko 11 months ago
Posted by: Channeler 11 months ago
Red Belt
2

Thanks for posting your findings team,

@dstarrisom
These screenshots point to the mysqld daemon being down/not running.

Without mysql many other services will not start.

We could start by checking the upgrade.log file to see how far it went; if it failed partway through, it might be a good idea to restore from backups and perform a mysqlcheck.

Contact support for this.

Also to everyone here, snapshots are not supported:
https://support.quest.com/kb/4368176/regarding-third-party-virtual-machine-backup-and-kace-appliances

See this post:
https://communities.vmware.com/t5/Virtual-Machine-Guest-OS-and-VM/Snapshots-of-servers-with-databases-how-stable-is-it/td-p/1051928

Snapshotting a VM while mysqld is writing transactions to the DB is a VIP ticket to filesystem and DB issues. Ideally you should power off the VM, take the snapshot, and then power the VM back on, but that rarely happens in real life, where businesses run 24/7 and downtime is limited.

Back to the first Link.

See this post for a related topic:

https://communities.vmware.com/t5/VMware-vSphere-Discussions/VM-snapshot-problems-with-databases-circumvent-by-shutting/td-p/2890115

You might want to share your case numbers here, since part of the support team is monitoring ITNinja.
(The outcome might be the same, an upgrade that failed/crashed the SMA, but the reason behind it might be different for some of you.)



Comments:
  • Hi Channeler,

    My snapshots were taken while the system was off (during a maintenance window for the upgrade).

    The problem has been trying to find a coordinated time for support to review my appliance while it's in the broken state. This session with support is now scheduled for late morning eastern time, tomorrow.

    My case number (02104841) is also in my first reply to this question from earlier today. I've also specifically asked support to review this question/posting. - dstarrisom 11 months ago
Posted by: Nico_K 11 months ago
Red Belt
1

This is really unusual, but support is the only one who can help.
Is this a physical or virtual appliance?
Restoring is usually the fastest way, so verify with support what happened.
It is helpful to open a tether before the update (since afterwards it seems not to be possible, according to what you wrote) and let

Posted by: dstarrisom 11 months ago
Yellow Belt
1

After the upgrade to 13.1, this flickers across the screen:


[screenshot not preserved]


Then the system boots to this:


[screenshot not preserved]

Posted by: ager01 11 months ago
Senior White Belt
1

Absolutely identical! The errors start appearing when mysql is getting initialized. The box retains its network connection and can be ssh'd into, but when I initially contacted support hoping they'd want to ssh and see what's happening they instead suggested starting with a new 13.0 image and DB restore. The restore worked as expected, but another update to 13.1 failed with same symptoms. What was your experience with Quest support?

Posted by: dstarrisom 11 months ago
Yellow Belt
1

Glad to hear it's not just us!  We're a VMware environment, so I'm guessing we can rule that out (not that I really thought it had anything to do with the problem).

Posted by: dstarrisom 11 months ago
Yellow Belt
1

In corresponding with support this morning, they are interested in me running a file system check.

Support has also reviewed this question/thread and been asked to consider that maybe there is a systematic issue.  Response: "As for now no defect has being identify with the upgrade."

Posted by: ager01 11 months ago
Senior White Belt
1
I am surprised they are not interested in ssh-ing and examining the logs
Posted by: dstarrisom 11 months ago
Yellow Belt
1

I have a late morning appointment tomorrow for a tech to do a remote meeting and execute a filesystem check via Putty using root credentials.  More to follow after that meeting and a subsequent update attempt.

Posted by: ager01 11 months ago
Senior White Belt
1

I created another SMA VM on a different machine and restored our database to it, and then attempted to upgrade to 13.1. Same result with files missing on reboot. It's definitely not a host machine issue. 

Posted by: ager01 11 months ago
Senior White Belt
0

this is a virtual appliance running in Server 2016 Hyper-V. No error messages during the upgrade, see the screenshot

[screenshot not preserved]

Posted by: CarstenBuscher 11 months ago
Purple Belt
0

We had the same problem at the weekend. (VMWare)

[Sun May 7 11:59:56 CEST 2023] [notice]     applying Infrastructure Upgrades...
[Sun May 7 11:59:50 CEST 2023] [notice] Starting software updates ...
[Sun May 7 11:59:50 CEST 2023] [notice] DB update completed.
[Sun May 7 11:59:50 CEST 2023] [notice]     restore_report_schedules done

After this nothing more happened, so I contacted Support. After looking at the appliance, they said we need to reinstall the appliance and restore the last backup. I hope the next attempt runs fine, but this time I'll take a snapshot of the VM first.
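
If you want to see quickly where an upgrade stopped, log lines in the format quoted above can be parsed to find the most recent entry. The Python sketch below is only an illustration: the names `LogEntry`, `parse_line` and `latest_entry` are made up for this example, the line format is assumed from the four lines in this post, and the time zone token (CEST here) is simply ignored on the assumption that every line uses the same zone.

```python
import re
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class LogEntry(NamedTuple):
    timestamp: datetime
    level: str
    message: str


# Matches lines such as:
# [Sun May 7 11:59:56 CEST 2023] [notice]     applying Infrastructure Upgrades...
LINE_RE = re.compile(
    r"^\[\w{3} (\w{3}) +(\d{1,2}) (\d{2}:\d{2}:\d{2}) \w+ (\d{4})\] \[(\w+)\]\s*(.*)$"
)


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the format."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    month, day, clock, year, level, message = match.groups()
    # The time zone token is dropped; all lines are assumed to share one zone.
    stamp = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return LogEntry(stamp, level, message)


def latest_entry(lines: Iterable[str]) -> Optional[LogEntry]:
    """Return the entry with the newest timestamp, whatever the line order."""
    entries = [entry for entry in map(parse_line, lines) if entry is not None]
    return max(entries, key=lambda entry: entry.timestamp, default=None)
```

Called on the four lines quoted in this post, `latest_entry` picks the "applying Infrastructure Upgrades..." line, which matches where the upgrade appeared to stall.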


Comments:
  • and this is why we snapshot before applying patches. If your virtualisation environment can snapshot, always snapshot before applying updates. Just remember to remove the snapshots afterwards, otherwise you will cause yourself other issues. - Norlag 11 months ago
    • We used to do that all the time, but for some reason we forgot this time. - CarstenBuscher 11 months ago
  • This issue is not related to the issue the other customers are having in this thread. If your upgrade is hanging in the middle like that, it's not something that can be diagnosed without inspecting the system while it's hung. - airwolf 11 months ago
    • I reported this and a support technician got in touch and restarted the VM. After that, we re-established the SMA and restored the backup, which caused the agents not to update the inventory.

      All in all, I have to say that the quality of the releases has dropped significantly in the last two years. Such problems did not exist before. - CarstenBuscher 11 months ago
      • Bugs exist, have always existed, and will always exist in all software. There has not been any sort of increase in agent communication issues or bugs, in general, between recent versions. In fact, we fixed a record breaking number of bugs in the 13.0 release.

        Please continue to work with support and feel free to have them pass along any feedback you'd like to the product management team. - airwolf 11 months ago
  • I restored the backup last night and everything seemed to be working fine again. However, over the course of the day it became apparent that the computers create the inventory locally, but this is not updated in the SMA. We uninstalled and reinstalled the agent on several computers as a test, but that didn't bring any improvement either. - CarstenBuscher 11 months ago
    • You'll have to work with support to diagnose that issue, as there are countless factors that come into play with agent inventory uploading and processing, including environmental concerns. - airwolf 11 months ago
Posted by: dstarrisom 11 months ago
Yellow Belt
0

I believe we have the same issue.  Case#02104841 

Our system is virtual, so I did a snapshot pre-upgrade and just kept rolling back whenever it'd fail.

Support was given tether access to the system.  They did something to the DB and asked us to try upgrading again.  Still broken.  Support gave up and asked us to start with a new OVF deployment and restore the backup.

I originally asked support if multiple customers were complaining about this and was told "no, it's just you."  I'll be replying with a link to this thread in my case suggesting they have a systemic issue.

Posted by: ager01 11 months ago
Senior White Belt
0
I enabled tether and support just emailed saying they can't find anything wrong and that I should back up the DB and attempt to upgrade again. This has got to be something in the upgrade scripts, since I am not the only one reporting this issue.
Posted by: dstarrisom 11 months ago
Yellow Belt
0

I cannot leave it broken for any length of time and coordinating with support could take hours.  After a failed attempt, I usually just roll back.  Might have to consider some additional coordination or just rolling out the 13.1 OVF and trying to reload the 13.0 backups.


Comments:
  • well, this will not work - Nico_K 11 months ago
    • To elaborate, backups must match the server version exactly for restore. So, you cannot restore 13.0 backups on a 13.1 OVF. - airwolf 11 months ago
Posted by: ager01 11 months ago
Senior White Belt
-1

yes, their support response time and level of expertise seem to have gone downhill

Posted by: jct134 11 months ago
Senior Purple Belt
-2

Per support, we updated to 13.0 to see if an issue with shellscript dependencies NOT pulling from replication shares would be fixed... (NOT)


Then we updated to 13.1 (had NO ISSUES with the update) shellscript dependencies issue still BROKEN (and support confirmed the test lab also has same issues)


However, after the update to 13.0 (and 13.1) if we try to email in a process starting ticket to our employment queue the process does NOT launch properly!!

And support confirmed that they are having the same issue on their end... it has been over a week... many many logs, tether enabled etc...


just FYI to anyone else that might trigger major processes through email...


Luckily the 1st ticket (parent ticket) is created, and we can start a manual process with the info from the parent ticket, but what a pain in the arse, considering all the custom work (and paid professional services work) that is now not 100% functional because of their crappy update(s)


Also if you have reports that are scheduled to e-mail the results (in html or txt format) into a ticket queue the .html or .txt file is somehow removed during the process ugh! and you get a ticket without the attachment ugh!

Hope they get this fixed soon!


J
