NPE when VM is planned to migrate to other host during dynamic scaling #3998

Merged
yadvr merged 1 commit into apache:master from shapeblue:VMScalingNPEwhilefindHostandMigrate on Jun 18, 2020
Conversation

@harikrishna-patnala

Member

Description

An NPE occurs during dynamic scaling of a VM when the management server (MS) tries to migrate the VM because the current host does not have enough capacity.

Repro Steps:
1. Create a VM on host1
2. Make host1 capacity full by deploying multiple VMs
3. Try Dynamic scaling on VM on host1
4. NPE occurs when MS tries to find host to migrate the VM and then scale.
2020-03-27 05:37:15,899 WARN [c.c.v.UserVmManagerImpl] (API-Job-Executor-6:ctx-f7c76e69 job-3021 ctx-d68f535a) (logid:3189e554) Received exception while scaling
java.lang.NullPointerException
    at com.cloud.deploy.DeploymentPlanningManagerImpl.planDeployment(DeploymentPlanningManagerImpl.java:255)
    at com.cloud.vm.VirtualMachineManagerImpl.findHostAndMigrate(VirtualMachineManagerImpl.java:3768)
    at com.cloud.vm.UserVmManagerImpl.upgradeRunningVirtualMachine(UserVmManagerImpl.java:1816)
    at com.cloud.vm.UserVmManagerImpl.upgradeVirtualMachine(UserVmManagerImpl.java:1702)
    at com.cloud.vm.UserVmManagerImpl.upgradeVirtualMachine(UserVmManagerImpl.java:1641)

Root cause: The VM profile is not initialized with the service offering before deployment planning.

Solution: Initialize the VM profile with the service offering, and make sure custom compute parameters are handled.
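
The root cause and fix can be illustrated with a minimal Java sketch. This is not the CloudStack code: `Offering`, `Profile`, and `PlannerSketch` are hypothetical stand-ins, and the detail keys `cpuNumber` and `memory` are assumed here purely for illustration. It shows a planner dereferencing a profile's offering, which throws when the profile was built without one, and one way custom compute parameters might be applied before planning.

```java
import java.util.Map;

// Hypothetical stand-ins for illustration; these are NOT CloudStack classes.
final class Offering {
    final int cpu;    // number of vCPUs
    final int ramMb;  // memory in MB

    Offering(int cpu, int ramMb) {
        this.cpu = cpu;
        this.ramMb = ramMb;
    }
}

final class Profile {
    final String vmName;
    final Offering offering; // null models the bug condition

    Profile(String vmName, Offering offering) {
        this.vmName = vmName;
        this.offering = offering;
    }
}

final class PlannerSketch {
    // Models a planner that reads the offering to size the capacity request.
    // Throws NullPointerException when the profile has no offering.
    static int requestedRamMb(Profile profile) {
        return profile.offering.ramMb;
    }

    // Overlays custom compute parameters (assumed keys) on a base offering.
    static Offering withCustomParameters(Offering base, Map<String, String> details) {
        int cpu = details.containsKey("cpuNumber")
                ? Integer.parseInt(details.get("cpuNumber")) : base.cpu;
        int ram = details.containsKey("memory")
                ? Integer.parseInt(details.get("memory")) : base.ramMb;
        return new Offering(cpu, ram);
    }

    public static void main(String[] args) {
        // Bug condition: profile built without an offering.
        try {
            requestedRamMb(new Profile("vm1", null));
        } catch (NullPointerException e) {
            System.out.println("NPE while planning: profile has no offering");
        }

        // Fix: build the profile with the offering, custom parameters applied.
        Offering effective = withCustomParameters(new Offering(1, 512), Map.of("memory", "2048"));
        System.out.println("Requested RAM (MB): " + requestedRamMb(new Profile("vm1", effective)));
    }
}
```

The point of the sketch is only the ordering: the offering (with any custom values resolved) is attached to the profile before the planner reads it.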

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

How Has This Been Tested?

1. Create a VM on host1
2. Make host1 capacity full by deploying multiple VMs
3. Try Dynamic scaling on VM on host1
4. Successfully scaled the VM after migration of VM to host2
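
The tested scenario can be modeled with a toy sketch, assuming memory is the only capacity dimension. `ToyHost`, `ToyVm`, and `ScaleFlowSketch` are hypothetical names, not CloudStack classes; the sketch only mirrors the order of operations in the steps above: check the current host, migrate if it lacks capacity, then scale.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy model for illustration only; not CloudStack code.
final class ToyHost {
    final String name;
    final int totalRamMb;
    int usedRamMb;

    ToyHost(String name, int totalRamMb, int usedRamMb) {
        this.name = name;
        this.totalRamMb = totalRamMb;
        this.usedRamMb = usedRamMb;
    }

    int freeRamMb() {
        return totalRamMb - usedRamMb;
    }
}

final class ToyVm {
    final String name;
    int ramMb;
    ToyHost host;

    ToyVm(String name, int ramMb, ToyHost host) {
        this.name = name;
        this.ramMb = ramMb;
        this.host = host;
    }
}

final class ScaleFlowSketch {
    // Picks the first other host that can hold the VM at its new size.
    static Optional<ToyHost> findHost(List<ToyHost> hosts, ToyHost exclude, int neededRamMb) {
        return hosts.stream()
                .filter(h -> h != exclude && h.freeRamMb() >= neededRamMb)
                .findFirst();
    }

    // Scales the VM in place if the current host has room; otherwise
    // migrates first and then scales. Returns false if no host fits.
    static boolean scale(ToyVm vm, int newRamMb, List<ToyHost> hosts) {
        int delta = newRamMb - vm.ramMb;
        if (vm.host.freeRamMb() < delta) {
            Optional<ToyHost> target = findHost(hosts, vm.host, newRamMb);
            if (!target.isPresent()) {
                return false;
            }
            vm.host.usedRamMb -= vm.ramMb;      // release capacity on the old host
            target.get().usedRamMb += vm.ramMb; // reserve current size on the new host
            vm.host = target.get();
        }
        vm.host.usedRamMb += delta;             // reserve the additional memory
        vm.ramMb = newRamMb;
        return true;
    }

    public static void main(String[] args) {
        ToyHost host1 = new ToyHost("host1", 4096, 4096); // full
        ToyHost host2 = new ToyHost("host2", 4096, 0);
        ToyVm vm = new ToyVm("vm1", 1024, host1);
        boolean ok = scale(vm, 2048, Arrays.asList(host1, host2));
        System.out.println("scaled=" + ok + " host=" + vm.host.name);
    }
}
```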

@harikrishna-patnala

Member Author

@blueorangutan package

@blueorangutan

@harikrishna-patnala a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✖centos7 ✖debian. JID-1105

… VM tries to migrate if current host does not have capacity.

Repro Steps:
1. Create a VM on host1
2. Make host1 capacity full by deploying multiple VMs
3. Try Dynamic scaling on VM on host1
4. NPE occurs when MS tries to find host to migrate the VM and then scale.

Root cause: VM profile is not initiated properly with serviceoffering before planning for deployment

Solution: Initiate VM profile with serviceoffering and also make sure custom compute parameters are handled

@harikrishna-patnala force-pushed the VMScalingNPEwhilefindHostandMigrate branch from 32b4340 to b3cf7b5 on March 30, 2020 14:20
@harikrishna-patnala

Member Author

@blueorangutan package

@blueorangutan

@harikrishna-patnala a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔debian. JID-1106

@borisstoyanov

Contributor

@blueorangutan package

@blueorangutan

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔debian. JID-1225

@yadvr yadvr added this to the 4.15.0.0 milestone Jun 4, 2020
@yadvr

yadvr commented Jun 11, 2020

Member

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✔debian. JID-1349

@yadvr

yadvr commented Jun 12, 2020

Member

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@yadvr yadvr self-requested a review June 12, 2020 04:55
@yadvr

yadvr commented Jun 13, 2020

Member

@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@yadvr

yadvr commented Jun 16, 2020

Member

@blueorangutan package

@blueorangutan

@rhtyd a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔centos7 ✖debian. JID-1384

@yadvr

yadvr commented Jun 16, 2020

Member

The Debian build failure was environment related; I'll kick off the tests.
@blueorangutan test

@blueorangutan

@rhtyd a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@yadvr yadvr left a comment

Member

LGTM did not test it

@yadvr yadvr requested a review from DaanHoogland June 16, 2020 06:11
@yadvr yadvr requested a review from borisstoyanov June 16, 2020 06:11
@yadvr

yadvr commented Jun 16, 2020

Member

After merging #3991, are you LGTM on this as well @borisstoyanov and @DaanHoogland ?

@DaanHoogland

Contributor

code looks good, but i'm not familiar with the use case / bug; +0.1

@harikrishna-patnala

harikrishna-patnala commented Jun 16, 2020

Member Author

@rhtyd @DaanHoogland This is not related to #3991. I came across this bug while testing that PR and fixed it here. This PR is still needed, since an NPE is not acceptable in any case.

@blueorangutan

Trillian test result (tid-1750)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 52130 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3998-t1750-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 82 look OK, 1 have error(s)
Only failed tests results shown below:

Test | Result | Time (s) | Test File
--- | --- | --- | ---
test_03_create_redundant_VPC_1tier_2VMs_2IPs_2PF_ACL_reboot_routers | Failure | 637.84 | test_vpc_redundant.py

@yadvr yadvr merged commit 0d4f67a into apache:master Jun 18, 2020
6 participants