Overview

The Vault 1.15.x upgrade guide contains information on deprecations, important or breaking changes, and remediation recommendations for anyone upgrading from Vault 1.14. Please read carefully.

Consul service registration

As of version 1.15, the service_tags values supplied to Vault for Consul service registration are case-sensitive.

In previous versions of Vault, tags were converted to lowercase. This caused problems when, for example, tags contained Traefik rules that use case-sensitive method names such as Host().

If you previously supplied Consul service registration tags without regard to case, or relied on the lowercase tags created by Vault, this change may cause unexpected behavior.

Please audit your Consul storage stanza to ensure that you either:

Manually convert your service_tags to lowercase if required
Ensure that any system that relies on the tags is aware of the new case-preserving behavior
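One way to start that audit is to search your configuration files for service_tags assignments that contain uppercase letters, because those are the tags whose effective value changes under the new case-preserving behavior. The sketch below assumes HCL configuration with each service_tags assignment on a single line; the helper name find_mixed_case_tags, the sample file, and the Traefik router name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every service_tags assignment that contains an
# uppercase letter. Vault 1.14 and earlier lowercased these values; Vault
# 1.15 passes them to Consul with their case preserved.
find_mixed_case_tags() {
  # -n prefixes each match with its line number.
  grep -n -E '^[[:space:]]*service_tags[[:space:]]*=.*[[:upper:]]' "$@"
}

# Example input: a Traefik rule whose Host() matcher is case-sensitive.
sample=$(mktemp)
cat > "$sample" <<'EOF'
service_registration "consul" {
  address      = "127.0.0.1:8500"
  service_tags = "traefik.http.routers.vault.rule=Host(`vault.example.com`)"
}
EOF

find_mixed_case_tags "$sample"
```

Any line it prints is a tag that Vault 1.14 would have lowercased but Vault 1.15 sends unchanged. No output (and a non-zero exit status) means every matched tag is already lowercase.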

Rollback metrics

Vault no longer measures and reports the metrics vault.rollback.attempts.{MOUNTPOINT} and vault.route.rollback.{MOUNTPOINT} by default. The new default metrics are vault.rollback.attempts and vault.route.rollback, which do not contain the mount point in the metric name.

To continue measuring vault.rollback.attempts.{MOUNTPOINT} and vault.route.rollback.{MOUNTPOINT}, you must explicitly enable mount-specific metrics in the telemetry stanza of your Vault configuration with the add_mount_point_rollback_metrics option.
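As a minimal sketch, the setting is a boolean inside the telemetry stanza. The snippet below writes that stanza to a scratch file so it can be reviewed before you merge it into your real server configuration; the output path is a placeholder, and any other telemetry settings you already use would sit alongside it in the same stanza.

```shell
#!/bin/sh
# Write a telemetry stanza that restores the per-mount rollback metrics
# vault.rollback.attempts.{MOUNTPOINT} and vault.route.rollback.{MOUNTPOINT}.
# The output path is a placeholder; merge the setting into the telemetry
# stanza of your actual server configuration.
telemetry_file=${TMPDIR:-/tmp}/telemetry-rollback.hcl

cat > "$telemetry_file" <<'EOF'
telemetry {
  # Opt back in to metric names that include the mount point.
  add_mount_point_rollback_metrics = true
}
EOF
```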

Application of Sentinel Role Governing Policies (RGPs) via identity groups

As of versions 1.15.0, 1.14.4, and 1.13.8, the Sentinel RGPs derived from membership in identity groups apply only to entities in the same and child namespaces, relative to the identity group.

Also, group_policy_application_mode applies only to ACL policies. Vault Sentinel Role Governing Policies (RGPs) are not affected by the group policy application mode.

Known issues and workarounds

Transit Encryption with Cloud KMS managed keys causes a panic

Affected versions

1.13.1 up to and including 1.13.8
1.14.0 up to and including 1.14.4
1.15.0
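If you manage many clusters, a small check can tell you whether a given version falls in one of these ranges. This is a hypothetical helper, not part of Vault; it relies on `sort -V` (available in GNU coreutils) for dotted-version ordering and strips build metadata such as `+ent` before comparing.

```shell
#!/bin/sh
# Hypothetical helper: succeed when version $1 lies in the inclusive range
# [$2, $3]. Relies on `sort -V` for dotted-version ordering.
version_in_range() {
  # $1 >= $2 when $2 sorts first (or the two are equal).
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ] || return 1
  # $1 <= $3 when $1 sorts first (or the two are equal).
  [ "$(printf '%s\n' "$1" "$3" | sort -V | head -n 1)" = "$1" ]
}

# Affected versions for the Cloud KMS managed key panic, as listed above.
kms_panic_affected() {
  v=${1%%+*}   # strip build metadata such as "+ent"
  version_in_range "$v" 1.13.1 1.13.8 ||
    version_in_range "$v" 1.14.0 1.14.4 ||
    [ "$v" = 1.15.0 ]
}

kms_panic_affected 1.14.2 && echo "1.14.2 is affected"
```

The same version_in_range function can be reused for the other known issues by changing the ranges in a copy of kms_panic_affected.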

Issue

Vault panics when it receives a Transit encryption API call that is backed by a Cloud KMS managed key (Azure, GCP, AWS).

Note

The issue does not affect encryption and decryption with the following key types:

PKCS#11 managed keys
Transit native keys

Workaround

None at this time

Transit Sign API calls with managed keys fail

Affected versions

1.14.0 up to and including 1.14.4
1.15.0

Issue

Vault responds to Transit sign API calls with the following error when the request uses a managed key:

requested version for signing does not contain a private part

Note

The issue does not affect signing with the following key types:

Transit native keys

Workaround

None at this time

Panic in AWS auth method during IAM-based login

Affected versions

1.15.0

Issue

A panic can occur in the AWS auth method during IAM-based login when a client config does not exist.

Workaround

The panic can be avoided by writing an empty client config:

vault write -f auth/aws/config/client
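If you apply the workaround from automation, you may prefer to write the empty config only when none exists, so that an existing client configuration is never touched. The sketch below assumes the AWS auth method is mounted at `aws` unless another mount path is passed, and that `vault read` exits with a non-zero status when no client config is stored; ensure_aws_client_config is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: write an empty AWS client config only when none exists, so an
# existing configuration is left untouched. Assumes the AWS auth method is
# mounted at "aws" unless another mount path is passed, and that
# `vault read` exits non-zero when no client config is stored.
ensure_aws_client_config() {
  aws_mount=${1:-aws}
  if vault read "auth/${aws_mount}/config/client" >/dev/null 2>&1; then
    echo "auth/${aws_mount}/config/client already exists; leaving it as-is"
  else
    vault write -f "auth/${aws_mount}/config/client"
  fi
}

# Usage (requires VAULT_ADDR and a token allowed to read and write the path):
#   ensure_aws_client_config
#   ensure_aws_client_config my-aws-mount
```

Note that `vault read` also fails for reasons other than a missing config, such as a permission error; in that case the subsequent write is attempted and will report its own error.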

Collapsed navbar does not allow you to click inside the console or namespace picker

Affected versions

The UI issue affects Vault versions 1.14.0+ and 1.15.0+. A fix is expected for Vault 1.16.0.

Issue

The Vault UI currently uses a version of HDS that does not allow users to click within collapsed elements. In particular, the dev console and namespace picker become inaccessible in smaller viewports.

Workaround

Widen the browser window until the navbar is no longer collapsed. Once the full navbar is displayed, click the desired components.

File audit devices do not honor SIGHUP signal to reload

Affected versions

1.15.0

Issue

The new underlying event framework for auditing causes Vault to keep writing to the audit log files it already has open instead of reopening the file paths, even when you send SIGHUP after log rotation. The issue affects any Vault cluster with file audit devices enabled.

Not honoring the SIGHUP signal has two key consequences when moving or deleting audit files.

If you move or rename your audit log file locally, Vault continues to log data to the original file. For example, if you archive a file locally:

$ mv /var/log/vault/audit.log /var/log/vault/archive/audit.log.bak

Vault continues to write data to /var/log/vault/archive/audit.log.bak instead of logging audit entries to a newly created file at /var/log/vault/audit.log.

If you delete your audit log file, the OS unlinks the file from the directory structure, but Vault still has the file open. Vault continues to write data to the deleted file, which continues to consume disk space as it grows. When Vault is sealed or restarted, the OS deletes the previously unlinked file, and you will lose all data logged to the audit file after it was tagged for deletion.
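Both consequences come from ordinary file descriptor semantics: a process that opened a file once keeps writing to that file (the inode), not to whatever currently lives at the original path. The standalone demonstration below reproduces the rename case with a shell descriptor standing in for a long-lived audit device; it does not involve Vault, and the paths are temporary.

```shell
#!/bin/sh
# Demonstration only (no Vault involved): a process that keeps a file
# descriptor open follows the file across a rename, which is what the
# affected Vault versions do with file audit devices after SIGHUP.
dir=$(mktemp -d)
mkdir -p "$dir/archive"

exec 3>>"$dir/audit.log"      # open once, like a long-lived audit device
echo 'entry 1' >&3
mv "$dir/audit.log" "$dir/archive/audit.log.bak"   # "rotate" the log
echo 'entry 2' >&3            # written through the same open descriptor
exec 3>&-                     # close the descriptor

ls "$dir"                     # only "archive"; no new audit.log exists
cat "$dir/archive/audit.log.bak"   # entry 1 and entry 2 are both here
```

Replacing the `mv` with `rm` would leave descriptor 3 pointing at an unlinked file that keeps growing but no longer has a name, which is the deletion case described above. On Linux, `lsof +L1` lists open files whose link count has dropped to zero, which can help spot a Vault process still holding a deleted audit log.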

The issue with file audit devices not honoring the SIGHUP signal is fixed in the Vault 1.15.1 patch release.

Workaround

Set the VAULT_AUDIT_DISABLE_EVENTLOGGER environment variable to true to disable the new underlying event framework and restart Vault:

$ export VAULT_AUDIT_DISABLE_EVENTLOGGER=true

On startup, Vault reverts to the audit behavior used in 1.14.x.
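Running `export` in an interactive shell only affects a Vault process launched from that same shell. If Vault runs as a systemd service, one way to make the setting persist across restarts is a drop-in override. The sketch below assumes the unit is named `vault.service`; the helper name write_vault_dropin and the file name disable-eventlogger.conf are hypothetical.

```shell
#!/bin/sh
# Sketch: persist VAULT_AUDIT_DISABLE_EVENTLOGGER for a systemd-managed
# Vault by writing a drop-in file. Assumes the unit is named vault.service;
# the drop-in name disable-eventlogger.conf is arbitrary.
write_vault_dropin() {
  root=${1:-/etc/systemd/system}
  mkdir -p "$root/vault.service.d"
  cat > "$root/vault.service.d/disable-eventlogger.conf" <<'EOF'
[Service]
Environment=VAULT_AUDIT_DISABLE_EVENTLOGGER=true
EOF
}

# Typical use on the Vault host (as root), then restart to apply:
#   write_vault_dropin
#   systemctl daemon-reload
#   systemctl restart vault
```

Passing a different root directory lets you stage and review the file before installing it. Once you upgrade to a release containing the fix, remove the drop-in and restart Vault to return to the new event framework.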

Internal error when vault policy in namespace does not exist

If a user is a member of a group that gets a policy from a namespace other than the one they’re trying to log into, and that policy doesn’t exist, Vault returns an internal error. This impacts all auth methods.

Affected versions

1.13.8 and 1.13.9
1.14.4 and 1.14.5
1.15.0 and 1.15.1

A fix has been released in Vault 1.13.10, 1.14.6, and 1.15.2.

Workaround

During authentication, Vault derives inherited policies based on the groups an entity belongs to. Vault returns an internal error when attaching the derived policy to a token when:

the token belongs to a different namespace than the one handling authentication, and
the derived policy does not exist under the namespace.

You can resolve the error by adding the policy to the relevant namespace or deleting the group policy mapping that uses the derived policy.

As an example, consider the following userpass login failure. The error occurs because Vault expects a group policy to exist under the namespace, but the policy does not exist.

# Failed login
$ vault login -method=userpass username=user1 password=123
Error authenticating: Error making API request.

URL: PUT http://127.0.0.1:8200/v1/auth/userpass/login/user1
Code: 500. Errors:

* internal error

To confirm the problem is a missing policy, start by identifying the relevant entity and group IDs:

$ vault read -format=json identity/entity/name/user1 | \
  jq '{"entity_id": .data.id, "group_ids": .data.group_ids}'
{
  "entity_id": "420c82de-57c3-df2e-2ef6-0690073b1636",
  "group_ids": [
    "6cb152b7-955d-272b-4dcf-a2ed668ca1ea"
  ]
}

Use the group ID to fetch the relevant policies for the group under the ns1 namespace:

$ vault read -format=json -namespace=ns1 \
  identity/group/id/6cb152b7-955d-272b-4dcf-a2ed668ca1ea | \
  jq '.data.policies'
[
  "group_policy"
]

Now that we know Vault is looking for a policy called group_policy, we can check whether that policy exists under the ns1 namespace:

$ vault policy list -namespace=ns1
default

The only policy in the ns1 namespace is default, which confirms that the missing policy (group_policy) is causing the error.
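When a group references several policies, the comparison can be scripted. The sketch below takes the group's policy list and the namespace's policy list as newline-separated strings and prints the policies that are referenced but not defined. missing_policies is a hypothetical helper; the commented commands reuse the group ID and ns1 namespace from this example and assume jq is installed.

```shell
#!/bin/sh
# Hypothetical helper: print policies named in the first list ($1) that are
# absent from the second list ($2). Both arguments are newline-separated.
missing_policies() {
  mp_tmp=$(mktemp -d)
  # Normalise both lists: drop blank lines, sort, de-duplicate.
  printf '%s\n' "$1" | sed '/^$/d' | LC_ALL=C sort -u > "$mp_tmp/group"
  printf '%s\n' "$2" | sed '/^$/d' | LC_ALL=C sort -u > "$mp_tmp/namespace"
  # comm -23 keeps lines that appear only in the first file.
  LC_ALL=C comm -23 "$mp_tmp/group" "$mp_tmp/namespace"
  rm -rf "$mp_tmp"
}

# With a live Vault, the inputs could come from the commands shown above:
#   group=$(vault read -format=json -namespace=ns1 \
#     identity/group/id/6cb152b7-955d-272b-4dcf-a2ed668ca1ea | \
#     jq -r '.data.policies[]')
#   ns=$(vault policy list -namespace=ns1)
#   missing_policies "$group" "$ns"
```

For the example in this section, the helper would print group_policy, the same conclusion reached by reading the two lists manually.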

To fix the problem, we can either remove the missing policy from the 6cb152b7-955d-272b-4dcf-a2ed668ca1ea group or create the missing policy under the ns1 namespace.

To remove group_policy from group ID 6cb152b7-955d-272b-4dcf-a2ed668ca1ea, use the vault write command to set the applicable policies to just include default:

$ vault write                                             \
  -namespace=ns1                                          \
  identity/group/id/6cb152b7-955d-272b-4dcf-a2ed668ca1ea  \
  name="test"                                             \
  policies="default"

To create the missing policy, use vault policy write and define the appropriate capabilities:

$ vault policy write -namespace=ns1 group_policy - << EOF
    path "secret/data/*" {
        capabilities = ["create", "update"]
    }
EOF

Verify the fix by re-running the login command:

$ vault login -method=userpass username=user1 password=123

Vault is storing references to ephemeral sub-loggers leading to unbounded memory consumption

Affected versions

This memory consumption bug affects Vault Community and Enterprise versions:

1.13.7 - 1.13.9
1.14.3 - 1.14.5
1.15.0 - 1.15.1

The change that introduced this bug has been reverted as of Vault 1.13.10, 1.14.6, and 1.15.2.

Issue

Vault unexpectedly stores references to ephemeral sub-loggers, which prevents them from being cleaned up and leads to unbounded memory consumption for loggers. The bug was introduced by a change that addressed a previously known issue with sub-logger levels not being adjusted on reload. This impacts many areas of Vault, but primarily logins in Enterprise.

Workaround

There is no workaround.

Sublogger levels not adjusted on reload

Affected versions

This issue affects all Vault Community and Vault Enterprise versions.

Issue

Vault does not honor a modified log_level configuration for certain subsystem loggers on SIGHUP.

The issue is known to affect the resolver.watcher and replication.index.* subloggers.

After modifying the log_level and issuing a reload (SIGHUP), some loggers are updated to reflect the new configuration, while some subsystem logger levels remain unchanged.

For example, after starting a server with log_level: "trace" and changing it to log_level: "info", the following lines still appear after reload:

[TRACE] resolver.watcher: dr mode doesn't have failover support, returning
...
[DEBUG] replication.index.perf: saved checkpoint: num_dirty=5
[DEBUG] replication.index.local: saved checkpoint: num_dirty=0
[DEBUG] replication.index.periodic: starting WAL GC: from=2531280 to=2531280 last=2531536
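To check whether a reload left these subloggers at their old level, you can filter recent log output for them. The sketch below assumes the new log_level is "info", so any TRACE or DEBUG line from these subloggers written after the reload indicates the problem; stale_sublogger_lines is a hypothetical helper, and the journalctl example assumes a systemd unit named vault.service.

```shell
#!/bin/sh
# Hypothetical helper: read Vault log text on stdin and print lines from the
# resolver.watcher and replication.index.* subloggers logged at TRACE or
# DEBUG. Assumes the new log_level is "info", so any such line written
# after the reload means the sublogger kept its old level.
stale_sublogger_lines() {
  grep -E '\[(TRACE|DEBUG)] +(resolver\.watcher|replication\.index\.[a-z]+):'
}

# Example (assumes a systemd unit named vault.service):
#   journalctl -u vault.service --since today | stale_sublogger_lines
```

Restrict the input to log output written after the reload; lines from before the SIGHUP are expected to contain TRACE and DEBUG entries.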

Workaround

The workaround is to restart the Vault server.

Fatal error during expiration metrics gathering causing Vault crash

Affected versions

This issue affects Vault Community and Enterprise versions:

1.13.9
1.14.5
1.15.1

A fix has been issued in Vault 1.13.10, 1.14.6, and 1.15.2.

Issue

A recent change to Vault to improve state change speed (e.g. becoming active or standby) introduced a concurrency issue that can lead to a concurrent iteration and write on a map, causing a fatal error that crashes Vault. The error occurs when gathering lease and token metrics from the expiration manager. These metrics originate from the active node in an HA cluster, so if the original active node encounters this bug, a standby node takes over active duties and the cluster remains functional. The new active node is vulnerable to the same bug, but may not encounter it immediately.

There is no workaround.
