Upgrade the global cluster

An ACP platform consists of a global cluster and one or more workload clusters. To move the platform to a new ACP Distribution Version, upgrade the global tier to the target Distribution Version first, and then upgrade the workload clusters to that same Distribution Version.

ACP 4.3 uses a CVO-based workflow for cluster upgrades. A typical global cluster upgrade includes artifact preparation, preflight checks, upgrade request, and status observation.

Before upgrading the global cluster to ACP 4.3, verify that every workload cluster is on a compatible Kubernetes version. For ACP 4.3, the compatible versions are 1.34, 1.33, 1.32, and 1.31. This prerequisite is separate from the broader third-party cluster management range.

This compatibility prerequisite applies whether or not the environment uses global DR. Global DR changes the procedure for upgrading the global tier, but it does not change the requirement that workload clusters remain within the compatible Kubernetes version range before the global tier is upgraded to the target Distribution Version.
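The version gate can be expressed as a small check. The following sketch is illustrative only, not part of ACP tooling: it assumes you have already collected each workload cluster's Kubernetes version, for example from kubectl version output, and the cluster names are placeholders.

```python
# Illustrative sketch (not ACP tooling): verify every workload cluster's
# Kubernetes minor version is in the range compatible with ACP 4.3.
COMPATIBLE_MINORS = {(1, 31), (1, 32), (1, 33), (1, 34)}

def is_compatible(version: str) -> bool:
    """Return True when a version such as 'v1.32.5' or '1.31.0' is compatible."""
    major, minor = version.lstrip("v").split(".")[:2]
    return (int(major), int(minor)) in COMPATIBLE_MINORS

# Hypothetical inventory of workload clusters and their reported versions.
clusters = {"workload-a": "v1.31.4", "workload-b": "v1.34.1", "workload-c": "v1.29.8"}
blockers = {name: v for name, v in clusters.items() if not is_compatible(v)}
print(blockers)  # {'workload-c': 'v1.29.8'} -- must be upgraded before the global tier
```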

Global cluster upgrades follow the validated upgrade.sh-based procedure documented on this page. You can request the global cluster upgrade from the Web Console, by updating ClusterVersionShadow.spec.desiredUpdate, or by using the AC CLI with --cluster=global. For the complete AC CLI workflow and output interpretation, see Upgrading Clusters. For full command and flag syntax, see AC CLI Administrator Command Reference.

If the environment uses global DR, follow the Global DR Procedure. Otherwise, follow the standard workflow below.

Standard Workflow

Prepare upgrade artifacts

Run bash upgrade.sh from the extracted core package directory.

upgrade.sh prepares the resources required by the CVO-based workflow, including:

| Type | Content | Purpose |
|---|---|---|
| Product images | product-image | Used to resolve the target version and image in ProductManifest and CVO. |
| CVO image | cluster-version-operator | Used to deploy or update the cluster version operator. |
| Plugin artifacts | plugins/*.tgz | Used by the upgrade plan when plugin artifacts are required. |

Registry behavior depends on how the environment is configured:

| Scenario | Behavior |
|---|---|
| --registry is specified | Use the provided registry directly. |
| --registry is not specified | Read the registry address from ProductBase.spec.registry.address. |
| Built-in platform registry | Rebuild the access address by using the global VIP. |
| External registry | Automatically set SKIP_SYNC_IMAGE=true and skip image synchronization. |
| Image upload required but credentials omitted | Read username and password from the cpaas-system/registry-admin Secret. |

Common parameters:

| Parameter | Purpose |
|---|---|
| --registry | Specify the target registry address. |
| --username / --password | Specify registry credentials. |
| --only-sync-image | Synchronize images and plugin artifacts only. |
| --skip-sync-image | Skip image and plugin synchronization. |
| --skip-check-artifacts | Skip artifact validation. |

Basic invocation:

bash upgrade.sh
WARNING
  • Do not continue to the next step until image and plugin synchronization is complete.
  • Use --only-sync-image only when you want artifact synchronization without further preparation.
  • Use --skip-sync-image only when the required images and plugin artifacts have already been uploaded.
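Combining these flags, typical invocations look like the following. The registry address and credentials are placeholders, not values taken from this document:

```shell
# Sync images and plugin artifacts only, against an explicit external registry
# (registry.example.com, <user>, and <pass> are placeholders)
bash upgrade.sh --registry registry.example.com --username <user> --password <pass> --only-sync-image

# Skip synchronization when the required artifacts were already uploaded in a previous run
bash upgrade.sh --skip-sync-image
```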

Run preflight checks

Run preflight before requesting the upgrade:

bash upgrade.sh --preflight

Preflight returns two parts:

| Output | Purpose |
|---|---|
| Summary | Shows the overall result, current version, desired version, and desired image. |
| Checks | Shows the result of each individual validation item. |

The default check set includes:

  • ResourcePatchUpgradeable
  • ClusterVersionUpgradeable
  • VersionUpgradePath
  • KubernetesVersionSupported
  • DockerRuntimeUnsupported
  • ClusterRunning
  • ClusterModuleStable
  • ControlPlaneStaticPodsPresent
  • CustomEtcdBackupCronJobsAbsent
  • CRIUpgradePodsAbsent
  • ModuleInfoStable
  • PlatformLicense

Handle preflight blocks when needed

If ResourcePatchUpgradeable fails with reason=UnexemptResourcePatches, inspect the blocking ResourcePatch and add the required exemption annotation:

kubectl -n cpaas-system get cvsh global \
  -o jsonpath='{range .status.preflight.checks[?(@.name=="ResourcePatchUpgradeable")]}{.state}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'

kubectl get resourcepatches <rp-name> -o yaml

The default annotation key is config.cpaas.io/exempt-for-ver.

kubectl annotate resourcepatches <rp-name> \
  config.cpaas.io/exempt-for-ver=4.3.0 \
  --overwrite
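To confirm the exemption took effect, you can read the annotation back with a jsonpath query (the bracket form escapes the dots inside the annotation key):

```shell
# Read back the exemption annotation; an empty result means it was not set.
kubectl get resourcepatches <rp-name> \
  -o jsonpath="{.metadata.annotations['config\.cpaas\.io/exempt-for-ver']}"
```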

If temporary troubleshooting requires specific checks to be disabled, configure the cpaas-system/cvo-config ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cvo-config
  namespace: cpaas-system
data:
  preflight: |
    disabled:
      - ResourcePatchUpgradeable
      - VersionUpgradePath

Request the upgrade

After the preparation phase completes, choose one of the following entry points:

  • Use the Web Console after the target version becomes available for the cluster.
  • Patch ClusterVersionShadow.spec.desiredUpdate directly when you need to operate the underlying CVO resource.
  • Use the AC CLI to request the upgrade for global explicitly.

If you use the Web Console, the request follows a two-step flow:

  1. In Step 1, review the RPCH list, then click Acknowledge to continue to Step 2.
  2. In Step 2, review Current Version and Target Version. The page does not display a plugin list or a warning panel at this stage. The target version is determined by the prepared upgrade artifacts and cannot be selected manually in the Web Console.
  3. Click Start Upgrade and confirm the action in the dialog.
  4. After confirmation, the page shows that the upgrade request has been submitted and the action enters an in-progress state.

kubectl example:

kubectl patch cvsh global -n cpaas-system --type merge -p '{
  "spec": {
    "desiredUpdate": {
      "version": "4.3.0"
    }
  }
}'

You can also edit the resource directly:

kubectl edit cvsh global -n cpaas-system

Minimum configuration:

spec:
  desiredUpdate:
    version: 4.3.0

AC CLI example:

# Request upgrade to the highest version currently published in availableUpdates
ac adm upgrade --cluster=global --to-latest

# Request upgrade to a specific target version
ac adm upgrade --cluster=global --to=4.3.0

# Show summary, preflight, and stage progress for the global cluster upgrade
ac adm upgrade status --cluster=global

Observe execution

Use the following command to inspect the overall status:

kubectl get cvsh -n cpaas-system

Important status fields:

| Field | Purpose |
|---|---|
| status.conditions | Overall status entry point. |
| status.preflight.observedAt | Time of the latest preflight run. |
| status.preflight.checks | Detailed result of each preflight item. |
| status.current | Current applied version and image. |
| status.desired | Target version and image being reconciled. |
| status.history | Upgrade history, newest first. |
| status.stages | Upgrade stages and per-stage execution state. |

Focus on these conditions first:

| Condition | Interpretation |
|---|---|
| PreflightReady | True means preflight passed. |
| Ready | True means the cluster has reached the desired version. |
| Reconciling | True means the upgrade is still running. |
| Stalled | True means the upgrade is blocked and requires intervention. |
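The precedence among these conditions can be sketched as a small helper. The condition type names come from the table above; the precedence order and the sample data are illustrative assumptions:

```python
# Sketch: derive a one-line verdict from .status.conditions, following the
# interpretation table above. Stalled is checked first because a blocked
# upgrade needs intervention regardless of the other conditions (assumed order).
def upgrade_verdict(conditions: list[dict]) -> str:
    state = {c["type"]: c["status"] == "True" for c in conditions}
    if state.get("Stalled"):
        return "blocked: intervention required"
    if state.get("Ready"):
        return "done: cluster is at the desired version"
    if state.get("Reconciling"):
        return "in progress"
    if not state.get("PreflightReady", True):
        return "preflight not passed"
    return "unknown"

sample = [
    {"type": "PreflightReady", "status": "True"},
    {"type": "Reconciling", "status": "True"},
    {"type": "Ready", "status": "False"},
    {"type": "Stalled", "status": "False"},
]
print(upgrade_verdict(sample))  # in progress
```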

Useful diagnostics:

kubectl -n cpaas-system get cvsh global \
  -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'

kubectl -n cpaas-system get cvsh global \
  -o jsonpath='{.status.preflight.observedAt}{"\n"}{range .status.preflight.checks[*]}{.name}{"\t"}{.policy}{"\t"}{.state}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'

kubectl -n cpaas-system get cvsh global \
  -o jsonpath='{range .status.history[*]}{.version}{"\t"}{.state}{"\t"}{.startedTime}{"\t"}{.completionTime}{"\n"}{end}'

(Conditional) Upgrade Service Mesh Essentials

If Service Mesh v1 is installed, refer to the Alauda Service Mesh Essentials Cluster Plugin documentation before upgrading the workload clusters.

Post-upgrade

Global DR Procedure

Use this procedure when the environment includes both a primary global cluster and a standby global cluster. The DR-specific steps below are in addition to the standard CVO workflow.

Verify the DR environment before upgrading

Follow your regular global DR inspection procedures to ensure that data in the standby global cluster is consistent with the primary global cluster. For background on the DR topology and synchronization workflow, see Global Cluster Disaster Recovery.

If inconsistencies are detected, contact technical support before proceeding.

On both global clusters, run the following command to ensure no Machine nodes are in a non-running state:

kubectl get machines.platform.tkestack.io

If any such nodes exist, resolve them before continuing.

Uninstall the etcd synchronization plugin from the standby global cluster

  1. Access the Web Console of the standby global cluster through its IP or VIP.
  2. Switch to Administrator view.
  3. Navigate to Marketplace > Cluster Plugins and select the global cluster.
  4. Find etcd Synchronizer and uninstall it.
  5. Wait for the uninstallation to complete before proceeding.

Prepare upgrade artifacts on both global clusters

Complete Prepare upgrade artifacts in the standard workflow on both the standby global cluster and the primary global cluster.

Use the same preparation mode on both clusters.

Upgrade the standby global cluster

If you will use the Web Console on the standby global cluster, verify that the standby cluster ProductBase includes the standby VIP in spec.alternativeURLs:

apiVersion: product.alauda.io/v1alpha2
kind: ProductBase
metadata:
  name: base
spec:
  alternativeURLs:
    - https://<standby-cluster-vip>
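If you prefer to make this change from the command line, a merge patch along the following lines can add the URL. The productbase resource name is inferred from the manifest above, and a merge patch replaces the entire alternativeURLs list, so include any entries that already exist:

```shell
# Assumes the resource is addressable as "productbase/base" (inferred from the
# manifest above). This replaces the whole alternativeURLs list.
kubectl patch productbase base --type merge \
  -p '{"spec":{"alternativeURLs":["https://<standby-cluster-vip>"]}}'
```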

After preparation completes, run the remaining steps from the standard workflow on the standby global cluster:

  1. Run preflight checks
  2. Request the upgrade
  3. Observe execution until the standby global cluster reaches the desired version

Upgrade the primary global cluster

After the standby global cluster has reached the desired version, run the remaining steps from the standard workflow on the primary global cluster:

  1. Run preflight checks
  2. Request the upgrade
  3. Observe execution until the primary global cluster reaches the desired version

Reinstall the etcd synchronization plugin and verify sync status

Before reinstalling the plugin, verify that port 2379 is forwarded correctly from both global-cluster VIPs to their control plane nodes when that forwarding mode is used. Port forwarding through a load balancer is not required if the standby global cluster can access the active global cluster directly.

To reinstall the plugin:

  1. Access the standby global cluster Web Console through its VIP and switch to Administrator view.
  2. Navigate to Marketplace > Cluster Plugins and select the global cluster.
  3. Find etcd Synchronizer, click Install, and configure the required parameters.

When you configure the plugin:

  • When port 2379 is not forwarded through a load balancer, set Active Global Cluster ETCD Endpoints correctly.
  • Use the default value of Data Check Interval.
  • Leave Print detail logs disabled unless you are troubleshooting.

Verify the sync Pod is running on the standby global cluster:

kubectl get po -n cpaas-system -l app=etcd-sync
etcd_sync_pod=$(kubectl get po -n cpaas-system -l app=etcd-sync -o jsonpath='{.items[0].metadata.name}')
kubectl logs -n cpaas-system "$etcd_sync_pod" | grep -i "Start Sync update"

Once Start Sync update appears, recreate one of the Pods to trigger synchronization of resources with ownerReference dependencies:

etcd_sync_pod=$(kubectl get po -n cpaas-system -l app=etcd-sync -o jsonpath='{.items[0].metadata.name}')
kubectl delete po -n cpaas-system "$etcd_sync_pod"

Check sync status:

mirror_svc=$(kubectl get svc -n cpaas-system etcd-sync-monitor -o jsonpath='{.spec.clusterIP}')
ipv6_regex="^[0-9a-fA-F:]+$"
if [[ $mirror_svc =~ $ipv6_regex ]]; then
  mirror_host="[$mirror_svc]"
else
  mirror_host="$mirror_svc"
fi
curl -g "http://${mirror_host}/check"

Output interpretation:

  • LOCAL ETCD missed keys: Keys exist in the primary global cluster but are missing from the standby. This often resolves after restarting one etcd-sync Pod.
  • LOCAL ETCD surplus keys: Keys exist in the standby global cluster but not in the primary. Review these with your operations team before deleting them.
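A small parser can summarize the /check response before you decide whether to restart the sync Pod or escalate. The exact response format is not specified on this page, so the line patterns below are assumptions based on the two key categories described above:

```python
# Sketch: count findings in the /check output. Assumes each finding appears
# on its own line containing "LOCAL ETCD missed keys" or
# "LOCAL ETCD surplus keys"; the real format may differ.
def classify_check(output: str) -> dict:
    counts = {"missed": 0, "surplus": 0}
    for line in output.splitlines():
        if "LOCAL ETCD missed keys" in line:
            counts["missed"] += 1   # often resolves after restarting etcd-sync
        elif "LOCAL ETCD surplus keys" in line:
            counts["surplus"] += 1  # review with operations before deleting
    return counts

sample = "LOCAL ETCD missed keys: /registry/secrets/cpaas-system/example\n"
print(classify_check(sample))  # {'missed': 1, 'surplus': 0}
```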