
VM Snapshot: Prevent vm snapshots being indefinitely stuck in Expunging state on deletion failure #4898

Merged
yadvr merged 2 commits into apache:4.15 from shapeblue:fix-vmsnap-deletion on Apr 12, 2021

Conversation

@Pearl1594

Contributor

Description

Fixes #4201

This PR addresses the issue of a VM snapshot being stuck indefinitely in Expunging state if deletion fails.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

Simulated an exception on the hypervisor side while deleting a VM snapshot and verified that the VM snapshot goes back to Ready state.
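A minimal state-machine sketch of the behavior described above, assuming a simplified lifecycle that keeps only the states involved in deletion. The point it illustrates: once a failed deletion has an outgoing transition, the snapshot can no longer stay stuck in Expunging. `SnapshotState`, `SnapshotEvent` and `SnapshotStateMachine` are hypothetical names, not CloudStack's actual `VMSnapshot` state machine; following the test notes, the failure transition here targets Ready.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical, simplified VM snapshot lifecycle (not CloudStack's VMSnapshot class).
enum SnapshotState { READY, EXPUNGING, REMOVED }

enum SnapshotEvent { EXPUNGE_REQUESTED, OPERATION_SUCCEEDED, OPERATION_FAILED }

final class SnapshotStateMachine {
    private final Map<SnapshotState, Map<SnapshotEvent, SnapshotState>> transitions =
            new EnumMap<>(SnapshotState.class);

    SnapshotStateMachine() {
        addTransition(SnapshotState.READY, SnapshotEvent.EXPUNGE_REQUESTED, SnapshotState.EXPUNGING);
        addTransition(SnapshotState.EXPUNGING, SnapshotEvent.OPERATION_SUCCEEDED, SnapshotState.REMOVED);
        // Without this entry a failed deletion has nowhere to go,
        // so the snapshot would remain in EXPUNGING indefinitely.
        addTransition(SnapshotState.EXPUNGING, SnapshotEvent.OPERATION_FAILED, SnapshotState.READY);
    }

    private void addTransition(SnapshotState from, SnapshotEvent event, SnapshotState to) {
        transitions.computeIfAbsent(from, k -> new EnumMap<>(SnapshotEvent.class)).put(event, to);
    }

    /** Returns the next state, or throws if the event is not allowed in the current state. */
    SnapshotState next(SnapshotState current, SnapshotEvent event) {
        Map<SnapshotEvent, SnapshotState> byEvent = transitions.get(current);
        SnapshotState to = (byEvent == null) ? null : byEvent.get(event);
        if (to == null) {
            throw new IllegalStateException("No transition from " + current + " on " + event);
        }
        return to;
    }
}
```

Keeping the transitions in a table makes a missing failure edge easy to spot: every state an operation can leave a snapshot in should have entries for both success and failure.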

@Pearl1594

Contributor Author

@blueorangutan package

Pearl1594 changed the title from "VM Snapshot: Prevent snapshot indefinitely being stuck in Expunging state on deletion failure" to "VM Snapshot: Prevent vm snapshots being indefinitely stuck in Expunging state on deletion failure" on Apr 7, 2021
@Pearl1594

Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 350

@Pearl1594

Contributor Author

@blueorangutan test

@blueorangutan

@Pearl1594 a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

yadvr added this to the 4.15.1.0 milestone on Apr 7, 2021

DaanHoogland left a comment

Contributor

one question

Comment thread on api/src/main/java/com/cloud/vm/snapshot/VMSnapshot.java (outdated):
} catch (OperationTimedoutException e) {
throw new CloudRuntimeException("Delete vm snapshot " + vmSnapshot.getName() + " of vm " + userVm.getInstanceName() + " failed due to " + e.getMessage());
} catch (AgentUnavailableException e) {
} catch (OperationTimedoutException | AgentUnavailableException e) {
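The diff above folds the separate `OperationTimedoutException` and `AgentUnavailableException` handlers into a single Java 7 multi-catch clause. A minimal sketch of that pattern, assuming simplified stand-in exception classes; `SnapshotDeleter`, `AgentCall` and `deleteVmSnapshot` are hypothetical names, not CloudStack APIs.

```java
// Simplified stand-ins for the exceptions named in the diff (hypothetical, not CloudStack's classes).
class OperationTimedoutException extends Exception {
    OperationTimedoutException(String msg) { super(msg); }
}

class AgentUnavailableException extends Exception {
    AgentUnavailableException(String msg) { super(msg); }
}

class CloudRuntimeException extends RuntimeException {
    CloudRuntimeException(String msg, Throwable cause) { super(msg, cause); }
}

final class SnapshotDeleter {
    /** Hypothetical hook standing in for the command sent to the hypervisor agent. */
    interface AgentCall {
        void send() throws OperationTimedoutException, AgentUnavailableException;
    }

    static void deleteVmSnapshot(String snapshotName, String vmName, AgentCall call) {
        try {
            call.send();
        } catch (OperationTimedoutException | AgentUnavailableException e) {
            // One multi-catch clause replaces two catch blocks with identical bodies.
            throw new CloudRuntimeException("Delete vm snapshot " + snapshotName + " of vm " + vmName
                    + " failed due to " + e.getMessage(), e);
        }
    }
}
```

The removed lines shown above pass only the message to `CloudRuntimeException`; the sketch also passes the caught exception as the cause, which keeps the original stack trace available to whoever handles the failure.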

Contributor

\o/

@blueorangutan

Trillian test result (tid-373)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35559 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4898-t373-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Smoke tests completed. 85 look OK, 2 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
--- | --- | --- | ---
ContextSuite context=TestKubernetesCluster>:teardown | Error | 78.16 | test_kubernetes_clusters.py
test_01_migrate_VM_and_root_volume | Error | 69.16 | test_vm_life_cycle.py
test_02_migrate_VM_with_two_data_disks | Error | 53.09 | test_vm_life_cycle.py

@Pearl1594

Contributor Author

@blueorangutan package

@blueorangutan

@Pearl1594 a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ centos7 ✔️ centos8 ✔️ debian. SL-JID 360

yadvr left a comment

Member

Didn't test, but LGTM.

@yadvr

yadvr commented Apr 9, 2021

Member

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

Pearl1594 marked this pull request as ready for review on April 9, 2021 07:01

harikrishna-patnala left a comment

Member

Tested with the PR changes. Observed VM snapshots in Error state upon expunge failure, and I could later delete the snapshot that was in Error state.
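The behavior observed in this review can be modeled with a small sketch: a failed expunge moves the snapshot to Error, and Error still accepts a new delete request, so the deletion can be retried. `VmSnapState`, `RetryableSnapshot` and the `hypervisorOk` flag are hypothetical; the sketch only simulates the outcome of the hypervisor call.

```java
// Hypothetical model of the reviewed behavior; not CloudStack code.
enum VmSnapState { READY, EXPUNGING, ERROR, REMOVED }

final class RetryableSnapshot {
    private VmSnapState state = VmSnapState.READY;

    VmSnapState state() {
        return state;
    }

    /**
     * Attempts a deletion. hypervisorOk stands in for the outcome of the
     * hypervisor-side delete, which this sketch does not actually perform.
     */
    void delete(boolean hypervisorOk) {
        // Both READY and ERROR snapshots may be deleted, so a failed attempt can be retried.
        if (state != VmSnapState.READY && state != VmSnapState.ERROR) {
            throw new IllegalStateException("Cannot delete a VM snapshot in state " + state);
        }
        // A real implementation would persist EXPUNGING before contacting the hypervisor.
        state = VmSnapState.EXPUNGING;
        // Always leave EXPUNGING: REMOVED on success, ERROR on failure.
        state = hypervisorOk ? VmSnapState.REMOVED : VmSnapState.ERROR;
    }
}
```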
LGTM

apache deleted a comment from blueorangutan on Apr 10, 2021
@yadvr

yadvr commented Apr 10, 2021

Member

@blueorangutan test centos7 vmware-67u3

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + vmware-67u3) has been kicked to run smoke tests

@blueorangutan

Trillian test result (tid-401)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35087 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr4898-t401-vmware-67u3.zip
Intermittent failure detected: /marvin/tests/smoke/test_kubernetes_clusters.py
Intermittent failure detected: /marvin/tests/smoke/test_vm_life_cycle.py
Smoke tests completed. 85 look OK, 2 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
--- | --- | --- | ---
test_01_deploy_kubernetes_cluster | Failure | 87.37 | test_kubernetes_clusters.py
test_02_invalid_upgrade_kubernetes_cluster | Failure | 53.55 | test_kubernetes_clusters.py
test_03_deploy_and_upgrade_kubernetes_cluster | Failure | 82.88 | test_kubernetes_clusters.py
test_04_deploy_and_scale_kubernetes_cluster | Failure | 53.26 | test_kubernetes_clusters.py
test_05_delete_kubernetes_cluster | Failure | 55.24 | test_kubernetes_clusters.py
test_07_deploy_kubernetes_ha_cluster | Failure | 56.31 | test_kubernetes_clusters.py
test_08_deploy_and_upgrade_kubernetes_ha_cluster | Failure | 68.55 | test_kubernetes_clusters.py
test_09_delete_kubernetes_ha_cluster | Failure | 56.27 | test_kubernetes_clusters.py
ContextSuite context=TestKubernetesCluster>:teardown | Error | 100.73 | test_kubernetes_clusters.py
ContextSuite context=TestVAppsVM>:setup | Error | 79.84 | test_vm_life_cycle.py

yadvr merged commit a64ad9d into apache:4.15 on Apr 12, 2021

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants