New features in Salt 3004 Silicon

27 minute read Updated: Aug 27, 2022

Salt 3004 Silicon didn’t follow the usual 4-month release cycle and was released seven months after the previous major version. I believe this slowdown is actually good, and I hope VMware won’t rush with the next version too. However, Salt 3004 ships with several new major components and internal changes that may (or may not) signal that something interesting is cooking under the hood.

Enjoy the reading (and check out the official announcement as well)!

New features in Salt 3004 Silicon: Pluggable transports, DeltaProxy, Loader refactoring, Vault Enterprise, VMware extensions, Transactional systems, Salt SSH, Memory leaks mitigations

Pluggable transports
DeltaProxy
Salt extension modules for VMware
Native minions
Internal changes
Memory leaks
Slack
Engines
Beacons
State system
Transactional Systems
New operating systems support
Windows improvements
Vault Enterprise namespaces
Salt SSH
Grains
Packages
Nifty tricks
Other notable changes

Pluggable transports

This is a work in progress, but I want to mention it anyway because it can lead to some exciting developments in future versions.

The first PR #60852 by Daniel Wozniak removes transport singletons. It was done as part of tech-debt removal and is related to the second pull request.

The second PR #60867 by Daniel Wozniak wasn’t merged into the Salt 3004 release and is still a work in progress (UPD: it was superseded by #61450). It splits the transport module namespace into channel and transport parts and introduces a couple of classes for channels and transports:

ReqChannel
PushChannel
PullChannel
AsyncReqChannel
AsyncPubChannel
AsyncPushChannel
AsyncPullChannel
ReqServerChannel
PubServerChannel
RequestClient
RequestServer
PublishServer
PublishClient

It also adds a new transport module that uses a centralized RabbitMQ broker server. The rationale behind the new developments is explained in the Pluggable Transports SEP that was accidentally merged without any community discussion. The SEP also mentions an abandoned HTTP Transport PR created after all these disastrous CVEs in 2020. Oh, and salt-syndic doesn’t look like a recommended solution to scale Salt (it was even discussed whether it makes sense to deprecate it).

I’m pretty sure this new transport is not related in any way to the fact that VMware (who acquired SaltStack in 2020) sells Tanzu RabbitMQ - an enterprise version of the open-source RabbitMQ broker. Personally, I’d very much prefer a really secure and well-audited lightweight built-in transport that allows running untrusted minions over the internet to a centralized broker with who knows how many potential vulnerabilities… However, the scalability and fault tolerance bits, plus the potential of new community-driven transport implementations, are really interesting for some use-cases. UPD: there is another PR that has additional details, see #61464.

DeltaProxy

DeltaProxy is a special kind of proxy minion that can control multiple devices per proxy process instead of a single device. Its development probably started somewhere in 2018, and the corresponding abstraction layer (MetaProxy) was released as part of Salt 2019.2.1. For some time, the DeltaProxy source code was proprietary. Then, in November 2019, SaltStack briefly considered open-sourcing it but ultimately postponed the decision. And finally, after some refactoring and stabilization efforts, it was open-sourced in Salt 3004 Silicon. To read this story in more detail, check out the MetaProxy section of my Salt Neon release notes.

Now to the feature itself. First, the documentation is non-existent. This fact can trigger me to write another rant, but I’m not in the right mood at the moment :) The only configuration examples I was able to find are located in #60177. Below is my (possibly incorrect) summary.

First, you need to have a node to run the salt-proxy process. It could be hosted on the same node as your salt-master, on any minion node, or a dedicated one. Second, to enable the feature, you need to add the following option to the /etc/salt/proxy configuration file on that node; otherwise, the default metaproxy module will be used (metaproxy: proxy):

master: SALT_MASTER_ADDRESS
metaproxy: deltaproxy

Then you need to define the pillar data for each node:

# pillar/top.sls
base:
  deltaproxy:  # must match the id passed to salt-proxy --proxyid
    - controlproxy
  device1:
    - device1
  device2:
    - device2

For the control proxy (DeltaProxy) node (where you run the salt-proxy process), you need to specify proxytype: deltaproxy and a list of proxied devices:

# pillar/controlproxy.sls
proxy:
  proxytype: deltaproxy
  ids:
    - device1
    - device2

And then, you need to add a pillar file for each proxied device. Since I do not have any real devices to test, I’m using the dummy proxy module:

# pillar/device1.sls
proxy:
  proxytype: dummy

# pillar/device2.sls
proxy:
  proxytype: dummy
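
With more than a couple of devices, writing these pillar files by hand gets tedious. Below is a minimal Python sketch that builds the same pillar structure as plain dictionaries. The function name build_deltaproxy_pillar is made up for illustration, it assumes that each pillar file is named after the minion id it is assigned to, and you would still need to serialize the result to YAML files yourself.

```python
import json


def build_deltaproxy_pillar(control_id, device_ids, device_proxytype="dummy"):
    """Return (top, pillars) for a control proxy and its proxied devices.

    Hypothetical helper: the pillars dict maps an sls name to its data.
    """
    pillars = {
        # Pillar for the node that runs the salt-proxy process
        control_id: {"proxy": {"proxytype": "deltaproxy", "ids": list(device_ids)}},
    }
    for device_id in device_ids:
        # One pillar file per proxied device
        pillars[device_id] = {"proxy": {"proxytype": device_proxytype}}
    # Assumption: every minion id gets the pillar file of the same name
    top = {"base": {name: [name] for name in pillars}}
    return top, pillars


if __name__ == "__main__":
    top, pillars = build_deltaproxy_pillar("deltaproxy", ["device1", "device2"])
    print(json.dumps(top, indent=2))
    print(json.dumps(pillars, indent=2))
```

The control proxy id passed to the function should be the same id you later pass to salt-proxy --proxyid; otherwise the top file will not match the proxy minion.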

The final step is to start the salt-proxy process and accept the keys sequentially (it is a known limitation):

salt-proxy --proxyid deltaproxy -l debug

salt-key -a deltaproxy
salt-key -a device1
salt-key -a device2

Now you should be able to ping the deltaproxy minion and its proxied dummy devices:

salt '*' test.ping

deltaproxy:
    True
device1:
    True
device2:
    True

salt '*' grains.item osfinger

deltaproxy:
    ----------
    osfinger:
        proxy-proxy
device1:
    ----------
    osfinger:
        proxy-proxy
device2:
    ----------
    osfinger:
        proxy-proxy

I’m not sure which proxy modules are safe to run through DeltaProxy, but the merged PR touches the following ones:

dummy
napalm
rest_sample

Also, it is not clear how many devices could be realistically controlled via a single DeltaProxy (Control Proxy) instance and the difference in consumed resources compared to the same number of regular salt-proxy processes. If you run DeltaProxy in production with real devices and are willing to share some stats, please drop me an email.

PRs #60090 and #60791 by Gareth J. Greenaway

Salt extension modules for VMware

The saltext.vmware collection of modules is not a part of the Salt 3004 release (but was announced around the same time). Instead, it is distributed as a separate Python library using the Salt Extensions mechanism.

The extensions rely on pyVmomi (the Python SDK for the VMware vSphere API to manage ESX, ESXi, and vCenter) and have the following modules:

ESXi grains
Proxy Minion interface module for managing ESXi hosts
Execution modules to manage:
- Datacenters
- Clusters, DRS, and HA
- Distributed vSwitch instances
- ESXi hosts
- NSX-T managers, IP address pools and blocks, licenses, segments, Tier 0/1 gateways, transport nodes, profiles, zones, uplink profiles
- VMs
- VMC DHCP profiles, Direct Connect, distributed firewall rules, DNS forwarders, NAT rules, networks, public IPs, SDDCs, security groups and rules, VPN stats
State modules to manage:
- Datacenters
- NSX-T managers, IP address pools and blocks, licenses, segments, Tier 0/1 gateways, transport nodes, profiles, zones, uplink profiles
- VMC security rules

For more details, see the Open Hour recording for September 30th on YouTube and read the introductory blog post. And check out the following howto: Salt SDDC Modules – Getting Started.

The modules are well documented and even have ADRs (yay!).

I also found a vRealize Automation module that is not a part of the main extension and is distributed through the _modules dir.

Native minions

Salt 3004 native minion packages are available for the following platforms:

Arista 32-bit/64-bit
Juniper (x86_64)
Solaris 10 Intel and Sparc
Solaris 11.4 Intel and Sparc
AIX v7.1 and v7.2

Instructions for installing the latest packages can be found at gitlab.com.

Internal changes

Loader

Allow the discovery of Salt extensions installed while Salt is running. Additionally, prevent loading utils from extensions which would be packed into __utils__. PR #60214 by Pedro Algarvio
Stop using pkg_resources for Salt’s entry points loading. PR #60868 by Pedro Algarvio
Restore support of loading generator-based entry points. PR #60175 by Pedro Algarvio
Refactor loader into submodules. PR #60595 by Daniel Wozniak
Simplify the LazyLoader. PR #60714 by Pedro Algarvio

Other changes

Drop Python 2 code from the entire codebase. PR #59934 by Daniel Wozniak
Add Python 3.10 requirements. PR #59953 by Pedro Algarvio
Handle signals and properly exit instead of raising exceptions. PRs #60972 and #61013 by Pedro Algarvio
Do more rigorous checking of __salt__ module docstrings on CI. PR #58539 by Pedro Algarvio
Consolidate __getstate__ and __setstate__ methods of the Process class, to ensure that any forked process on Windows (and all other platforms which support forking) will behave properly without having to implement their own getstate/setstate functions. PR #55793 by Pedro Algarvio
Do not break master_tops for minions with a version lower than 3003 (the compatibility alias will be deprecated in 3006). PR #60980 by Pablo Suárez Hernández
Deprecate salt.payload.Serial. PR #60954 by Daniel Wozniak
Redirect imports of salt.ext.six to six. PR #60967 by Pedro Algarvio
New allow_one_of() and require_one_of() utility decorators. PR #58742 by Mark Ferrell
Get rid of salt.utils.zeromq.ZMQDefaultLoop, because Salt no longer supports older versions of ZeroMQ. PR #60618 by Daniel Wozniak

Memory leaks

This is the long-awaited progress in mitigating memory leaks related to gitfs backends. It was done in a crude but very simple and practical way: instead of fighting memory leaks caused by third-party libraries, the Salt Master file server update thread restarts periodically to release held memory.

I do not believe that the restart interval is configurable (it is set to 300 seconds for now). However, if the gitfs_update_interval setting is higher than 300 seconds, it will be used as the update thread restart interval.
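
My reading of that rule can be expressed in a couple of lines. This is a sketch of the behaviour as I understand it, not code copied from Salt; the function and constant names are made up.

```python
DEFAULT_RESTART_INTERVAL = 300  # seconds, the hard-coded value mentioned above


def update_thread_restart_interval(gitfs_update_interval=None):
    """Pick how often the file server update thread gets restarted."""
    if gitfs_update_interval is None:
        return DEFAULT_RESTART_INTERVAL
    # A gitfs_update_interval longer than the default wins over it
    return max(DEFAULT_RESTART_INTERVAL, gitfs_update_interval)


if __name__ == "__main__":
    for value in (None, 60, 900):
        print(value, "->", update_thread_restart_interval(value))
```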

PR #60386 by Daniel Wozniak

Another master memory leak was mitigated in PR #60262 by Daniel Wozniak

Slack

Update the slack.post_message state, slack engine, slack execution module, and slack returner to stop passing the token as a query string parameter in web API requests, a usage that Slack has deprecated. Finally! PR #60165 by @xeacott
Add support for posting events to slack_webhook returner. PR #57480 by Nate Mellendorf

Engines

Engine processes got enhanced process titles. They could be helpful if your custom engine consumes too many resources and you want to spot it just by looking at the process list. To enable this feature, install the python3-setproctitle package, then add some engines to master or minion config files and restart the services:

# /etc/salt/{master,minion}.d/engines.conf
engines:
  - test:

This is how the new titles look in the process list:

ps ax | grep salt | grep test

  32647 ?        Sl     0:00 /usr/bin/python3 /usr/bin/salt-master salt.engines.Engine(salt.loaded.int.engines.test)
  33690 ?        Sl     0:00 /usr/bin/python3 /usr/bin/salt-minion KeepAlive MultiMinionProcessManager MinionProcessManager salt.engines.Engine(salt.loaded.int.engines.test)

And if you use multiple instances of the same engine, the process titles will use instance aliases instead:

# /etc/salt/{master,minion}.d/engines.conf
engines:
  - test_instance1:
      engine_module: test
  - test_instance2:
      engine_module: test

ps ax | grep salt | grep test

  34388 ?        Sl     0:00 /usr/bin/python3 /usr/bin/salt-minion KeepAlive MultiMinionProcessManager MinionProcessManager salt.engines.Engine(salt.loaded.int.engines.test-test_instance1)
  34389 ?        Sl     0:00 /usr/bin/python3 /usr/bin/salt-minion KeepAlive MultiMinionProcessManager MinionProcessManager salt.engines.Engine(salt.loaded.int.engines.test-test_instance2)
  34436 ?        Sl     0:00 /usr/bin/python3 /usr/bin/salt-master salt.engines.Engine(salt.loaded.int.engines.test-test_instance1)
  34437 ?        Sl     0:00 /usr/bin/python3 /usr/bin/salt-master salt.engines.Engine(salt.loaded.int.engines.test-test_instance2)

PR #60260 by Daniel Wozniak

Beacons

Handle beacon exceptions by logging and firing an event that includes the exception. PR #60619 by Gareth J. Greenaway
Refresh available beacons when the refresh_modules flag is passed as an argument to a state. PR #60542 by Gareth J. Greenaway
Make the % sign optional when configuring usage beacons (diskusage, memusage, sensehat, and swapusage). PR #60685 by Gareth J. Greenaway
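
The last item boils down to tolerant threshold parsing. Here is a hypothetical sketch of the idea (it is not the beacon code itself, and parse_usage_threshold is a name I made up):

```python
def parse_usage_threshold(value):
    """Accept 90, "90" or "90%" and return the threshold as a float."""
    text = str(value).strip()
    if text.endswith("%"):
        # The percent sign is optional, so drop it when present
        text = text[:-1]
    return float(text)


if __name__ == "__main__":
    for raw in ("90%", "90", 63.5):
        print(repr(raw), "->", parse_usage_threshold(raw))
```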

State system

Allow onfail requisite to be used with onchanges and other requisites in a single state. PR #59985 by @xeacott
Fix file.accumulated dependency handling, when a state_id dependency format is used instead of a function: state_id format. PR #60636 by Gareth J. Greenaway
Make the state aggregation system properly handle requisites. PR #60604 by Gareth J. Greenaway
Add ability to pass exclude kwarg to salt.state from orchestrations. PR #58062 by @vryzhenkin and Wayne Werner
Make sure to always check state_type while compiling require_in, even if the name being searched for already exists at top-level in a highstate, because two different ids can exist with the same name. PR #59943 by @vin01

New operating systems support

The changes listed below do not mean official support as described in the Supported Operating Systems document. Instead, they mean that someone made an improvement in Salt for a specific operating system or the OS was included in the official test suite.

Run tests on Debian 11 Bullseye. PR #60473 by Bryce Larson
Run tests on AlmaLinux 8. PR #60209 by Bryce Larson
Add a salt.utils.platform check to detect AArch64, the 64-bit extension of the ARM architecture. PR #59915 by Kirill Ponomarev
Run tests on CentOS Stream 8. PR #60141 by Bryce Larson
Recognize Rocky Linux 8 as RedHat in the os_family grain. PR #59682 by @StackKorora and PR #60427 by Kirill Ponomarev
Recognize Aliyun Linux as RedHat family. PR #59687 by @xuchunmei000
Add support for Mendel Linux to be detected as Debian. PR #59893 by Morgan Kesler
Astra Linux (AstraLinuxCE, AstraLinuxSE) is now considered a Debian family distro. PR #59353 by Anton Karmanov
Add ARM64 support for Ubuntu 20 test pipeline. PR #57997 by Kirill Ponomarev
Add Debian 11 on ARM64. PR #60901 by Bryce Larson
Run tests on Fedora-34 instead of Fedora-32. PR #60124 by Bryce Larson
pip-tools-compile now knows what FreeBSD is. PR #60138 by Pedro Algarvio

Transactional Systems

This feature adds support for transactional systems and openSUSE MicroOS in particular. MicroOS has a read-only root filesystem and the transactional-update tool that leverages snapper, zypper, btrfs and overlayfs to perform atomic updates. Salt 3004 ships with two new execution modules (transactional_update and rebootmgr) and a new executor module (transactional_update). The executor module wraps Salt module calls with transactions. Below is a rough summary of how the feature works:

It can be activated by adding module_executors: [transactional_update, direct_call] to the minion config file, or by using the command line argument salt-call --module-executors='[transactional_update, direct_call]' test.version
The list of functions that are wrapped by default: state.single, state.sls, state.apply, state.highstate (it can be controlled via delegated_functions or add_delegated_functions minion options)
These modules are also wrapped by default (the list can be controlled via delegated_modules or add_delegated_modules minion options)
You can also schedule a reboot if needed: salt-call --module-executors='[transactional_update]' state.sls stuff activate_transaction=True
It also adds three new grains (efi, efi-secure-boot, and transactional) and a new function (chroot.in_chroot)

I wonder if NixOS support could be added to this transactional framework (it seems like a good fit).

PR #58520 by Alberto Planas

UPDATE: the internal implementation is going to be redesigned in PR #61188 that was submitted after the release.

Windows improvements

Install anywhere

The default install location will be %ProgramFiles%\Salt Project\Salt for the binary data and %ProgramData%\Salt Project\Salt for the Root Directory (root_dir). A couple of switches control the installer behavior:

/install-dir allows the user to define the install location via the command line
/move-config moves config from C:\salt (if found) to %ProgramData%

And for the uninstaller:

/delete-install-dir deletes the installation directory that contains the config and pki directories. This applies to old method installations where the root directory and the installation directory are the same. The default is not to delete it.
/delete-root-dir deletes the root directory that contains the config and pki directories. Default is to not delete it.

For more details on how this feature is designed, read the SEP-31.

PRs #60267 and #60952 by Shane Lee

`file.patch`

I needed this feature a couple of times to self-patch Windows minions and had to implement my own workarounds for that using patch.exe and msys-2.0.dll from Git for Windows Portable. The good thing is that the feature is now built-in; the not-so-good thing is that the patch executable is not bundled with the Salt installer and needs to be delivered to a minion using a separate state. Anyway, the feature is quite helpful, and I’ll definitely try it.

PR #60399 by @xeacott

Surface the errors that occur when user account creation fails on a Windows box (e.g., when a password does not meet the password policy requirements). PR #59563 by @xeacott
Fix win_servermanager.install so it will reboot when restart=true is passed. PR #60111 by Shane Lee
Standardize on using the “Success and Failure” for all auditing policies (both normal and advanced ones). PR #60178 by Shane Lee
Do not ship unmaintained PythonWin IDE (that is installed with PyWin32) with Salt installer. PR #60754 by Shane Lee
Update Windows build deps & DLLs, use Python 3.8. PR #59870 by Shane Lee
Update the build scripts to use a standalone installer for Visual C++ Build Tools 2015 that is a part of VS Build Tools 2017. PR #60093 by Shane Lee

Vault Enterprise namespaces

Namespaces is a set of features within Hashicorp Vault Enterprise that allows Vault environments to support Secure Multi-tenancy (or SMT) within a single Vault infrastructure. Through namespaces, Vault administrators can support tenant isolation for teams and individuals as well as empower delegated administrators to manage their own tenant environment. API operations performed under a namespace can be done by providing the relative request path along with the namespace path using the X-Vault-Namespace header.

To enable the feature, add an optional namespace key to the vault master config section:

vault:
  # ...
  namespace: vault_enterprise_namespace
  # ...
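
To illustrate what the namespace setting changes on the wire, here is a small Python sketch that builds (but does not send) a Vault API request with the X-Vault-Namespace header. The address, token, path, and namespace values are placeholders, and build_vault_request is my own helper, not a part of the Salt vault module.

```python
import urllib.request


def build_vault_request(vault_addr, path, token, namespace=None):
    """Build a Vault API request object without sending it."""
    headers = {"X-Vault-Token": token}
    if namespace:
        # The namespace travels in a header; the path stays relative to it
        headers["X-Vault-Namespace"] = namespace
    url = "{}/v1/{}".format(vault_addr.rstrip("/"), path.lstrip("/"))
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    req = build_vault_request(
        "https://vault.example.com:8200",  # placeholder address
        "secret/data/salt",                # placeholder secret path
        "placeholder-token",
        namespace="vault_enterprise_namespace",
    )
    print(req.full_url)
    print(dict(req.header_items()))
```

Note that urllib normalizes the capitalization of header names when printing them; HTTP header names are case-insensitive, so this does not matter to the server.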

PR #58586 by Edmund Adderley

Salt SSH

Directory roster

This is a new type of roster called “directory roster”. The directory roster is a flat directory of files. Each file’s name is a minion id, and the contents of each file must yield the data structure expected within each roster entry after being rendered with the salt rendering system. It was introduced to help solve the following use-case:

We maintain our roster in a git repo. As our team grows and we add and remove systems from the roster, the number of merge conflicts in the flat roster file has increased significantly. Switching to this directory roster system has significantly decreased the headache of git merge semantics when multiple git users introduce different roster changes at the same time.

Configuration example:

# /etc/salt/master.d/roster.conf
roster: dir
roster_dir: config/roster.d
# roster_domain: example.com

# config/roster.d/minion-x:
host: minion-x.example.com
port: 22
sudo: true
user: ubuntu

# config/roster.d/minion-y:
host: minion-y.example.com
port: 22
sudo: true
user: gentoo

If you uncomment the roster_domain setting, you can omit the domain part in the individual roster files.
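
To make the idea more tangible, below is a rough Python sketch of how such a directory could be turned into a roster. It is not the actual roster module: the parse_entry function is a trivial stand-in for the Salt rendering system (it only understands flat key: value lines and keeps all values as strings), and the exact roster_domain handling is my assumption.

```python
import tempfile
from pathlib import Path


def parse_entry(text):
    """Stand-in for Salt's renderer: handles flat "key: value" lines only."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        entry[key.strip()] = value.strip()
    return entry


def load_dir_roster(roster_dir, roster_domain=None):
    """Return {minion_id: entry} with one entry per file in roster_dir."""
    roster = {}
    for path in sorted(Path(roster_dir).iterdir()):
        if not path.is_file():
            continue
        entry = parse_entry(path.read_text())
        # Assumption: a missing host falls back to the minion id (file name)
        host = entry.get("host", path.name)
        if roster_domain and "." not in host:
            # Assumption: short host names get the roster_domain appended
            host = "{}.{}".format(host, roster_domain)
        entry["host"] = host
        roster[path.name] = entry
    return roster


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "minion-x").write_text("host: minion-x\nuser: ubuntu\n")
        Path(tmp, "minion-y").write_text("user: gentoo\nsudo: true\n")
        print(load_dir_roster(tmp, roster_domain="example.com"))
```

Because every minion lives in its own file, two people adding different hosts never touch the same file, which is exactly what removes the merge conflicts described in the quote above.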

PR #60364 by @kojiromike and Gareth J. Greenaway

Heist minion presence events

Allows presence events to work with Heist-Salt minions. If you set the master configuration option detect_remote_minions to True it will try to detect connected minions over port 22 unless the port specified is changed with the configuration remote_minions_port.

Another feature with almost useless documentation. As of the time of this writing, Google search on docs.saltproject.io gives zero matches for the heist keyword. How is a Salt user supposed to learn what Heist is, how it differs from Salt SSH or enable_ssh_minions, and what is the use-case for this option?

Below is what I was able to understand after many hours of research.

There are two googleable repositories that are located on GitLab and have very cryptic descriptions and the motto of “making deployment and management of Salt easy”:

heist - “ephemeral software tunneling and delivery system”
heist-salt - “App-merge components for deploying salt with heist”

Using Heist

So, let’s install heist first:

pip3 install heist==5.0.0

Create the necessary folders and files:

mkdir -p /etc/heist/rosters
touch /etc/heist/heist.conf
cat << EOF > /etc/heist/rosters/roster.conf
minion1:
  # host: minion1
  host: 10.211.55.25
  username: vagrant
EOF

I struggled for 5 hours trying to make Heist work. Here is the list of problems I found:

I had to run pip3 install aiologger; otherwise pop_config crashed when I ran heist. Same as a year ago, the dependencies in the pop ecosystem are not pinned, and the installs are not repeatable.
I had to install the latest Heist from git by running pip uninstall heist && pip install git+https://gitlab.com/saltstack/pop/heist.git: the grains subcommand was removed in 5.0.0 (but not from the CLI args!), and the test subcommand mentioned in the README was not yet released
I had to downgrade pop-config (pip install pop-config==6.11); otherwise I got nothing (not even log messages with --log-level debug). Did I say already that pop-based installs are not reproducible?
I was unable to make the roster dir work automatically (got KeyError: 'ssh_scan_ports'), so I had to use the explicit -R /etc/heist/rosters/roster.conf argument
The cryptic ValueError: The roster scan did not return data when rendered error took the most time to solve. Using a Python debugger, I was unable to understand where exactly the CLI options are passed down to the heist.roster.init.read() and why they are empty. As it turned out the order of the arguments does matter: heist --log-level info -t minion1 -R /etc/heist/rosters/roster.conf test fails, and heist --log-level info test -t minion1 -R /etc/heist/rosters/roster.conf works. Good luck finding that via heist --help…

So, after so much wasted time, I was able to make the bare Heist work (yeah, the majority of the INFO messages should be logged as DEBUG ones):

heist --log-level info test -R /etc/heist/rosters/roster.conf

[INFO    ] Module /usr/local/lib/python3.8/dist-packages/pop_loop/loop/curio_loop returned virtual FALSE: No module named 'curio'
[INFO    ] Module /usr/local/lib/python3.8/dist-packages/pop_loop/loop/selector_win returned virtual FALSE: WindowsSelectorEventLoop only runs on windows
[INFO    ] Module /usr/local/lib/python3.8/dist-packages/pop_loop/loop/proactor returned virtual FALSE: WindowsProactorEventLoop only runs on windows
[INFO    ] Module /usr/local/lib/python3.8/dist-packages/pop_loop/loop/qt returned virtual FALSE: No module named 'qasync'
[INFO    ] Module /usr/local/lib/python3.8/dist-packages/pop_loop/loop/trio_loop returned virtual FALSE: No module named 'trio_asyncio'
[INFO    ] Module /usr/local/lib/python3.8/dist-packages/pop_loop/loop/uv_loop returned virtual FALSE: No module named 'uvloop'
[INFO    ] Picking default roster: flat
[INFO    ] This is a test heist manager. You have installed heist correctly. Install a heist manager to use full functionality

It looks like Heist is just a skeleton library with some functions. Here is what they can do:

Grab host lists from different rosters (clustershell, scan, flat with optional fernet encryption)
Connect to remote hosts using asyncssh Python library, send files back and forth, run commands (with optional sudo support)
Create reverse tcp tunnels (via SSH) from ports on the target system back to ports on the source system
Detect the target OS family and CPU architecture
Fetch binary artifacts (based on the target OS/arch), verify their signatures and unpack the artifacts
Manage remote services using systemd

In summary, Heist is a tool to deploy some binaries to remote systems through ssh and run them, with optional tcp tunneling back to the source system. To do something useful with it, you need to install a manager. I’m not sure why Heist is split into two tools (thus complicating the user experience), because I was unable to find any other addons that use it other than heist-salt.

Using Heist Salt

Now let’s install heist-salt as well:

pip3 install heist-salt==4.0.0

Below is what heist-salt is supposed to do:

Download a single-binary Salt minion package from https://repo.saltproject.io/salt/singlebin/
Deploy the Salt minion package to a remote system via SSH
Establish two tcp tunnels back to the source system - 44505 -> 4505, 44506 -> 4506 (you are supposed to run a Salt master on the source host)
Add a minion configuration file that connects it to localhost master address with ports 44505 and 44506 (i.e., via the SSH tunnel)
Set the remote minion grain minion_type: heist
Generate minion keys
Start the salt-minion service
Accept the keys on the master

Alternatively, it can bootstrap a Salt minion and connect it to an existing master (Heist will skip the grain and tunnels setup in this mode)

As you can guess, it didn’t work for me either. Setting aside the unreasonably chatty INFO logging, this is what I got when I ran heist --log-level info salt.minion -t minion1 -R /etc/heist/rosters/roster.conf:

AssertionError: version 3004rc1-1 is not valid. I was running this from a Salt 3004 RC1 master installed from Git, and it looks like the release candidates are not supported (although the RC1 single-binary is available in the repo). I’m not sure how I’m supposed to try a feature that needs 3004rc1 to work.
Simultaneously, I got the NameError: name 'open' is not defined exception deep in the standard Python logging library. It is an asyncio-induced error; go to bugs.python.org for more details.
Then I added an explicit version number (3003.3 didn’t work, so I had to use 3003): heist --log-level info salt.minion -t minion1 -R /etc/heist/rosters/roster.conf --artifact-version 3003. Heist proceeded a little bit further: it was able to download the Salt single binary, but the salt-call command exited with an error, and everything failed with the json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) exception.
When I ran the /var/tmp/heist_vagrant/6547/salt call --config-dir /var/tmp/heist_vagrant/6547 --local grains.items --out json command on the target host manually, it returned a perfectly valid JSON data structure.
After I ran chmod go+rwx /var/log/salt && rm -f /var/log/salt/minion on the target system, the Heist command progressed a bit more (it printed some grains), but then failed to create the /etc/systemd/system/salt-minion.service over SFTP due to permission issues.
I tried to add sudo: true to the roster, but it failed early with asyncssh.sftp.SFTPPermissionDenied: Permission denied trying to copy the Salt binary (a temp folder is created using sudo and is owned by root, but the sftp operation runs under a regular user). Apparently, there is no way to win here with a regular user account and sudo.

I enabled SSH root login, re-ran the command, and voila! The heist command generated and accepted a minion key and stayed in the foreground, keeping the tunnels alive. After 8+ hours, I was finally able to run a command on a Salt Heist minion:

salt minion1 grains.get minion_type

minion1:
    heist

Now let’s get back to my original questions:

1. How is a Salt user supposed to learn what Heist is?

Use these notes until better docs are available.
Ask in the #pop Slack channel (visit saltproject.io and search for “slack” for an invitation link).
Google and dig into the source code a lot.

2. How does Heist differ from Salt SSH or `enable_ssh_minions`?

It is more complex than Salt SSH because it requires you to run a Salt master and keep the heist command running on the master (not always, just when you need to run something on Heist minions).
Heist deploys an experimental single binary Salt minion package, and uses reverse TCP tunnels on top of SSH to connect via ZeroMQ to Salt master. Salt SSH just packs Salt python modules and your state tree into a tarball, uploads to a remote system using SSH, runs salt-call --local and sends the results back.
Heist is slower to deploy than Salt SSH, but once the tunnels are up, you can run subsequent commands much faster.
It needs a roster file like the enable_ssh_minions mode, but it can use all the features available for regular ZeroMQ-connected minions (Salt SSH is more limited).
It can be used to deploy regular Salt minions that connect to any master via ZMQ and do not need Heist and reverse tunnels running, with the exception that the minions use experimental single-binary packages. But you can do the same with Salt SSH by writing some states to bootstrap a minion. Or you can use Fabric or Ansible as well for this task.
There is also a Salt extension that provides the heist.deploy runner to deploy a Heist minion via salt-run

3. What is the use-case for these new `detect_remote_minions` and `remote_minions_port` master options?

This turned out to be a pretty obscure feature:

Some Salt master features (stalekey engine, manage runner, minions wheel module, master cache worker, AES key rotation) need to know which minions are connected to the master right now.
Because the default ZeroMQ protocol does not expose client IP addresses, Salt master has no direct ability to know which minion IDs are connected.
As a workaround, the master uses the ss Linux command output to find connected minions.
To distinguish minion tcp connections from any other connections, it filters them by remote address and local or remote port number.
To check if a specific remote address belongs to a minion, it compares it with the cached ipv4 and ipv6 grains.
And to check the port number, it matches connections against master ports (4505 and 4506).
Because remote Heist minions are connected via SSH tunnels that originate from localhost, the master checks established SSH connections in addition to standard master ports. To enable this behavior, you need to set detect_remote_minions: true.
And because SSH can use a non-standard port, you can account for that by setting remote_minions_port.
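
The logic described in the list above can be modeled in a few lines of Python. This is a simplified sketch of my understanding, not the actual Salt implementation: the connection tuples are assumed to be already parsed from the ss output, and the function and argument layout are made up for illustration.

```python
MASTER_PORTS = {4505, 4506}


def connected_minions(connections, grains_cache,
                      detect_remote_minions=False, remote_minions_port=22):
    """Return the ids of minions that appear to be connected.

    connections: iterable of (local_port, remote_addr, remote_port) tuples
    grains_cache: {minion_id: {"ipv4": [...], "ipv6": [...]}}
    """
    addresses = set()
    for local_port, remote_addr, remote_port in connections:
        if local_port in MASTER_PORTS:
            # Regular minion connected to one of the master ports
            addresses.add(remote_addr)
        elif detect_remote_minions and remote_port == remote_minions_port:
            # Heist minion: the master host holds an SSH connection to it
            addresses.add(remote_addr)
    found = set()
    for minion_id, grains in grains_cache.items():
        known = set(grains.get("ipv4", [])) | set(grains.get("ipv6", []))
        if known & addresses:
            found.add(minion_id)
    return found


if __name__ == "__main__":
    conns = [
        (4505, "10.0.0.11", 51234),  # regular ZeroMQ minion
        (40222, "10.0.0.12", 22),    # outgoing SSH session to a Heist minion
    ]
    cache = {
        "minion1": {"ipv4": ["127.0.0.1", "10.0.0.11"]},
        "heist1": {"ipv4": ["127.0.0.1", "10.0.0.12"]},
    }
    print(connected_minions(conns, cache))
    print(connected_minions(conns, cache, detect_remote_minions=True))
```

One consequence of this model is that any established SSH session from the master host to an address found in the cached grains counts as a connected minion, whether or not Heist created it.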

To summarize:

This is a quite clever trick to make some existing Salt master features work with minions that are deployed via Heist and connected via reverse SSH tunnels
You only need to enable the detect_remote_minions flag if you use Heist minions
You do not want to use Heist minions in the near future unless you like to test highly experimental software and are ready to dig into source code and submit bug reports and patches

PR #60633 by Megan Wilhite. A little bit more context can be found in the salt-heist repo.

Grains

Ignore the enable_fqdns_grains setting on AIX, Solaris, and Juniper (always use False). PR #60533 by David Murphy
Clear the cached network interface grain information upon init of minion or when saltutil.refresh_grains is requested. PR #60130 by @xeacott
Improve virtual* grain handling for LXC containers. Now the virtual grain is set to container and virtual_subtype to LXC even when Salt is running inside of another virtual machine. PR #60196 by Piter Punk
Rename manufacture grain to manufacturer for Solaris on SPARC. PR #60514 by Lukas Raska
Implement grains.uuid on Windows. PR #59928 by Piter Punk

Packages

Add rpm_vercmp Python library for version comparison. It is needed for Tiamat-based builds to avoid pulling a binary C library that does the same thing. PR #60815 by Megan Wilhite
Use apt CLI to manage repos (as an alternative to python-apt library). Again, it will be used in Tiamat-based builds. PR #60900 by Megan Wilhite
Handle various architecture formats (e.g., amd64) in the aptpkg module. PR #60986 by Megan Wilhite

Nifty tricks

Override command retcode based on output

This feature was inspired by Ansible’s failed_when directive:

Installer script:
  file.managed:
    - name: /tmp/installer.sh
    - mode: 755
    - contents: |
        #!/bin/sh
        # This is a contrived example of idempotent command that exits with 1 on a second run
        if [ ! -f /tmp/.installed ]; then
           touch /tmp/.installed
           echo "The thing was installed successfully"
           exit 0
        else
           echo "The thing is already installed"
           exit 1
        fi

# Note that the command will run each time you apply the state
# With success_stdout it just won't fail on subsequent runs
Run the installer:
  cmd.run:
    - name: /tmp/installer.sh
    - require:
        - file: Installer script
    - success_stdout:
        - The thing is already installed

It is supported in cmd.wait, cmd.wait_script, cmd.run and cmd.script states. You can specify a list of lines to match (using the success_stdout and success_stderr arguments), and if any of them is found in the command output, then the resulting retcode will be overridden with zero.

The example above is a bit contrived because you can achieve the same result with the unless directive that checks for /tmp/.installed and prevents the state from being run the second time. However, if a command does something less obvious (for example, interacts with a network service), then this feature could be useful.

PR #59841 by Gareth J. Greenaway and Loren Gordon
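The retcode override rule for success_stdout and success_stderr can be sketched in plain Python. This is a minimal illustration, not the code from the PR: effective_retcode is a hypothetical helper, and it assumes that a match means one of the listed strings equals a whole line of the output. The real matching rule in Salt may be stricter or looser.

```python
def effective_retcode(retcode, stdout, stderr="", success_stdout=(), success_stderr=()):
    """Return 0 if any listed string equals a line of the output, else the original retcode."""
    out_lines = stdout.splitlines()
    err_lines = stderr.splitlines()
    # Assumption: whole-line comparison against each configured string.
    if any(s in out_lines for s in success_stdout):
        return 0
    if any(s in err_lines for s in success_stderr):
        return 0
    return retcode


# Second run of the installer script: it exits with 1 but prints a known message.
print(effective_retcode(
    1,
    "The thing is already installed\n",
    success_stdout=["The thing is already installed"],
))
```

Note that the command still runs every time; only the reported result changes.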

File lookup functions

The slsutil.findup function was originally written to help state files locate a Jinja file to be imported. It will find the first path matching a filename or list of filenames in a specified directory or the nearest ancestor directory. It could be useful for formulas that typically contain a map.jinja that needs to be included by every state file.

New functions:

slsutil.findup: finds the first path matching a filename or list of filenames in a specified directory or the nearest ancestor directory. Returns the full path to the first file found.
slsutil.file_exists: returns True if a file exists in the state tree, False otherwise (uses cp.list_master internally).
slsutil.dir_exists: returns True if a directory exists in the state tree, False otherwise (uses cp.list_master_dirs internally).
slsutil.path_exists: returns True if a path exists in the state tree, False otherwise. The path could refer to a file or directory (uses both cp.list_master and cp.list_master_dirs).

Example:

{% from salt['slsutil.findup']('formulas/shared/nginx', 'map.jinja') import nginx with context %}

The following folders (relative to salt:// file tree) will be searched for the map.jinja file:

formulas/shared/nginx
formulas/shared
formulas
.

PR #60159 by @amendlik
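The ancestor search order used by slsutil.findup can be reproduced with a short Python sketch. It is not the module's implementation: candidate_dirs and findup_sketch are hypothetical helpers, and the set of existing files stands in for whatever the fileserver would report.

```python
import posixpath


def candidate_dirs(start):
    """List the start directory and its ancestors, ending with the tree root ('.')."""
    path = posixpath.normpath(start)
    dirs = []
    while path not in ("", ".", "/"):
        dirs.append(path)
        path = posixpath.dirname(path)
    dirs.append(".")
    return dirs


def findup_sketch(start, filenames, existing_files):
    """Return the first matching path, searching from start upwards."""
    if isinstance(filenames, str):
        filenames = [filenames]
    for directory in candidate_dirs(start):
        for name in filenames:
            candidate = name if directory == "." else posixpath.join(directory, name)
            if candidate in existing_files:
                return candidate
    raise LookupError("none of {} found above {}".format(filenames, start))


print(candidate_dirs("formulas/shared/nginx"))
# Hypothetical file tree: the nearest map.jinja wins over the one at the root.
files = {"formulas/shared/map.jinja", "map.jinja"}
print(findup_sketch("formulas/shared/nginx", "map.jinja", files))
```

The nearest match wins, so a formula can override a shared map.jinja simply by placing its own copy closer to the state file.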

Other notable changes

Netbox pillar enhancements - Virtual Machines, Interfaces, IP Addresses, Documentation. PR #59500 by Gary T. Giesen
Remove all Silicon deprecations. PR #60895 by Wayne Werner
Remove glance state module in favor of glance_image. PR #59784 by Megan Wilhite
Bump keystone deprecation warning to Phosphorus. PR #59813 by Gareth J. Greenaway
Drop support of Ubuntu 16.04. PR #59869 by Bryce Larson
Make relative Jinja includes work with Jinja 3.0. PR #60811 by Alberto Planas
Update AWS API so salt-cloud can create VMs with IPv6 addresses. PR #60804 by Bryce Larson
Many Zabbix inventory handling improvements. PR #60400 by Piter Punk
Fix salt-ssh extra-filerefs option handling. PRs #61014 and #60891 by Daniel Wozniak
Introduce a mechanism to figure out the actual Python version available inside the container when executing dockermod.call, in the same way as for salt-ssh. PR #60229 by Pablo Suárez Hernández
Update pcs module to support versions > 0.10. PR #60257 by @waynegemmell
Honor the --log-file CLI argument in salt-api. PR #59881 by Daniel Wozniak
Add poudriere -i -j jail_name option to list jail information for poudriere on FreeBSD. PR #59831 by Kirill Ponomarev
Allow GCE Salt Cloud to use previously created IP addresses. PR #60043 by @dawidpogorzelski and Gareth J. Greenaway
Reinstate ignore_cidr option in salt-cloud openstack driver. PR #59778 by Mark Hyde
Add nosync switch to lvm.lvcreate to disable initial raid synchronization. PR #59193 by Jerzy Drozdz
Update schedule state and module to report changes. PR #59997 by Gareth J. Greenaway
Remove the unnecessary nginx/ prefix from nginx.version return, so it can actually be used in functions like pkg.version_cmp. PR #57111 by @syphernl and @alexey-zhukovin
Handle volumes on stopped pools in virt.vm_info. PR #60133 by Cedric Bosdonnat
Use /dev/kvm to detect KVM. PR #60420 by Cedric Bosdonnat
Pass emulator when getting domain capabilities from libvirt. PR #60492 by Cedric Bosdonnat
Better handling of bad public keys from minions. PRs #60662 and #60688 by Daniel Wozniak
Add psutil as a dependency on all platforms. This prevents a minion from starting if an instance is already running. PR #60946 by Charles McMarrow
Make pillar_roots order deterministic. PR #59212 by @mkirkland4874
Multiple fixes for Ansible modules in Salt. PRs #60208 by Pedro Algarvio and #59746 by Pablo Suárez Hernández

You can find other changes and bugfixes in the official CHANGELOG.md and Release Notes.
