What's new in Salt 3002 Magnesium



This is an unofficial summary of new features in the Salt Magnesium release. The release is relatively modest in terms of new features and mostly focuses on bugfixes (which is actually good because we had enough regressions in the past releases! 🤞). If you want to know about other changes and deprecations, read the official release notes and the changelog. The packages are available at repo.saltproject.io.

And if you haven’t heard the shocking big news: VMware has acquired SaltStack on Oct 13, 2020 (Press release, Early announcement 1, Early announcement 2, The New Stack publication, Q/A video with Tom Hatch).

What's New in Salt 3002 Magnesium: Development, Notable bugfixes, Idem support, Cloud, FreeBSD, macOS Big Sur, Windows, Networking, GitFS, SaltSSH, Tiamat packages, Nifty tricks

Notable bugfixes

Usually, I do not cover bugfixes, but this time I’ll make an exception and highlight a couple of important ones (the choice of bugs to mention is totally subjective!). Unfortunately, the new Salt release policy means that these fixes won’t be backported to older supported versions 😞

ZMQ hang

It is a really nasty bug that manifested itself as a salt-call command hang. The garbage collector was involved, and the first few attempts (1, 2, 3) to find the root cause were unsuccessful. Finally, after three long months, the problem was solved!

Fixed in PR #58364 by Charles McMarrow

It looks like the multithreading/multiprocessing code in Salt is convoluted. Maybe that is the reason why it can’t be upgraded to Tornado 6 yet…

Loader race condition

Apparently, this multithreading bug had affected Salt for years, and the first attempts to fix it were made in 2015. You can take a look at issue #58369 and all the linked ones to see how it manifested itself and the troubles it caused. The initial fix was merged into the old develop branch and was supposed to be shipped in Salt Neon, but it wasn’t backported to the new master branch. The final fix was implemented by @dwoz in Salt Magnesium.

Fixed in PRs #56513 by Daniel Wozniak and #58630 by Pedro Algarvio

Msgpack buffer size

Another bug had been lingering for almost two years (at least since 2019) and resulted in an unresponsive Salt master when a minion payload exceeded a certain size limit (with specific msgpack versions).

Fixed in PR #58283 by @psyer

salt-ssh reverse DNS lookup

It looked more like a feature, but I’m really happy that it was finally removed because it affected my workflow quite often. The gist is that salt-ssh did a reverse DNS lookup when an IP address was specified, and then used the resulting DNS name to connect to a host (or failed horribly when there was no reverse DNS record). I also believe that this feature (now removed) had potential security implications.

Fixed in PR #58163 by Megan Wilhite

Development

Welcome Bot

The Stale Bot is dead; long live the Welcome Bot!


PR #58414 by Kirill Ponomarev

Pyupgrade

A pre-commit hook that uses pyupgrade to upgrade code syntax to Py3 incrementally. While the automated Py3 transition is really cool, the incremental part is debatable because it adds a lot of noise to each PR, and it is impossible to add it to .git-blame-ignore-revs. I would prefer a single giant commit like with Black.

PR #57936 by Pedro Algarvio

Rstcheck

Another cool pre-commit addition will check the documentation *.rst files using the rstcheck tool. It can validate both the rst syntax and any embedded code blocks.

PR #58675 by Pedro Algarvio

Pytest

There were tons of Pytest-related changes made by Pedro Algarvio, but the switch to Pytest hasn’t happened yet.

Idem support

Idem is a new standalone state/execution engine that is similar to Salt states. This topic is quite extensive, and I’m not going to cover it here (see references for more details on that). Instead, I’ll focus on Salt’s actual changes, provide a few examples, and add a couple of thoughts at the end.

There are three things the change brings:

  1. Sub state returns
  2. idem execution module
  3. idem state module

Sub state returns

Below is the current state data return format:

def succeed(name):
    return {
        'name': name,
        'result': True,
        'comment': 'Success!',
        'changes': {
            'old': 'Old thing',
            'new': 'New thing'
        }
    }

And here is the same state return with the new sub_state_run key:

def succeed(name):
    return {
        'name': 'Salt_Idem_State',
        'result': True,
        'comment': 'Success!',
        'changes': {
            'old': 'Old thing',
            'new': 'New thing'
        },
        'sub_state_run': [
            {
                'name': 'Idem_Sub_State',
                'result': True,
                'comment': 'Sub state success!',
                'changes': {
                    'testing': {
                        'old': 'Old sub thing',
                        'new': 'New sub thing'
                    }
                },
                'low': {
                    'name': 'Idem_Sub_State',
                    'state': 'test',
                    '__id__': 'test_state',
                    'fun': 'succeed_with_changes'
                }
            },
        ]
    }

These sub_state_run items will be formatted and printed alongside other Salt states, as long as they look like normal states with low and high data. This is useful for states that can return multiple state runs from an external engine. For example, it could be utilized by state modules that extend tools like Puppet, Chef, Ansible, and Idem. So far, the feature is only used by Idem state in Salt.
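To make the nesting concrete, here is a minimal sketch of how a consumer could flatten a return that contains sub_state_run into a plain list of state results. It is not Salt's actual implementation: the flatten_state_return helper is a hypothetical name, and the sample data is just the return shown above.

```python
def flatten_state_return(ret):
    """Return [parent, sub1, sub2, ...] as independent state results.

    Hypothetical helper for illustration; Salt's real handling of
    sub_state_run differs.
    """
    # Copy the parent without the nested list so it looks like a normal return.
    parent = {key: value for key, value in ret.items() if key != "sub_state_run"}
    flat = [parent]
    for sub in ret.get("sub_state_run", []):
        # Each sub state carries the same four standard keys as a normal state.
        flat.append(
            {
                "name": sub["name"],
                "result": sub["result"],
                "comment": sub["comment"],
                "changes": sub["changes"],
            }
        )
    return flat


example = {
    "name": "Salt_Idem_State",
    "result": True,
    "comment": "Success!",
    "changes": {"old": "Old thing", "new": "New thing"},
    "sub_state_run": [
        {
            "name": "Idem_Sub_State",
            "result": True,
            "comment": "Sub state success!",
            "changes": {"testing": {"old": "Old sub thing", "new": "New sub thing"}},
            "low": {
                "name": "Idem_Sub_State",
                "state": "test",
                "__id__": "test_state",
                "fun": "succeed_with_changes",
            },
        }
    ],
}

for item in flatten_state_return(example):
    print(item["name"], item["result"])
```

The parent and every sub state end up with the same four keys, which is roughly why they can be printed side by side.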

idem execution module

First, we need to install Idem on a minion:

% sudo salt minion1 pip.install idem

minion1:
    ----------
    pid:
        17444
    retcode:
        0
    stderr:
    stdout:
        Collecting idem
        ...
        Installing collected packages: aiofiles, aiologger, pop-config, dict-toolbox, pop, colored, toml, rend, acct, idem
        Successfully installed acct-2.3 aiofiles-0.4.0 aiologger-0.6.0 colored-1.4.2 dict-toolbox-1.12 idem-7.5 pop-14 pop-config-6.10 rend-4.2 toml-0.10.1

Now we can try running an Idem execution module:

% sudo salt minion1 idem.exec test.ping

minion1:
    True

Success! This is an Idem ping, not a Salt one, although it is proxied through Salt. The call chain looks like this:

[Salt CLI] -> [salt-master] -> [salt-minion] -> [Salt Idem module] -> [POP] -> [Idem] -> [Idem test.ping]

The same is possible on the salt-master:

% sudo pip3 install idem

Collecting idem
...
Installing collected packages: toml, dict-toolbox, aiofiles, aiologger, pop-config, pop, acct, colored, rend, idem
Successfully installed acct-2.3 aiofiles-0.4.0 aiologger-0.6.0 colored-1.4.2 dict-toolbox-1.12 idem-7.5 pop-14 pop-config-6.10 rend-4.2 toml-0.10.1
% sudo salt-run salt.cmd idem.exec test.ping

[INFO    ] Module /usr/local/lib/python3.8/dist-packages/pop/mods/pop/testing returned virtual FALSE: Async pop testing libs are not available
True
[INFO    ] Runner completed: 20201018154413570141

We got some logging garbage, but the ping has succeeded! The call chain looks different:

[Salt Runner CLI] -> [salt-master] -> [salt.cmd runner] -> [Salt Idem module] -> [POP] -> [Idem] -> [Idem test.ping]

I guess it would be more straightforward if an Idem runner existed (the command below is hypothetical):

% sudo salt-run idem.exec test.ping

Because there are various idem-* cloud drivers being developed, running them directly from the Salt master (through this hypothetical runner or maybe a Salt cloud driver) would be more convenient.

idem state module

Now let’s try to run some Idem states through Salt states. First, we need to write an Idem state (it looks exactly like a Salt one):

test_sub_state:
  test.succeed_with_changes:
    - name: Idem_Sub_State

Because Idem doesn’t have any fileserver components yet (it is pretty much like the salt-call --local command), the state file needs to reside on the target machine. However, we can distribute it using Salt! Let’s save this state into salt://magnesium/idem/idem_state.sls.

Next, we need to write a Salt state that distributes the file to a target machine and then calls the Idem state module:

# magnesium/salt_idem_state.sls
cache_idem_state_file:
  file.cached:
    - name: salt://magnesium/idem/idem_state.sls

idem_state:
  idem.state:
    - runtime: parallel
    - sls: idem_state
    # The dir where Idem will look for idem_state.sls:
    - source_dir: /var/cache/salt/minion/files/base/magnesium/idem

Let’s try to run it:

% sudo salt minion1 state.apply magnesium.salt_idem_state

minion1:
    ----------
    file_|-cache_idem_state_file_|-salt://magnesium/idem/idem_state.sls_|-cached:
        ----------
        __id__:
            cache_idem_state_file
        __run_num__:
            0
        __sls__:
            magnesium.salt_idem_state
        changes:
            ----------
            hash:
                ----------
                new:
                    0c35e83383dbc4310e9763f6c1c92af38d087769386cd44f6c130633be0f61d7
                old:
                    55a744653d314fbf9075d0e676fe03f97aaa5c0d287e50b7dcf0cb61fb7b38b3
        comment:
            File is cached to /var/cache/salt/minion/files/base/magnesium/idem/idem_state.sls
        duration:
            55.144 ms
        name:
            salt://magnesium/idem/idem_state.sls
        result:
            True
        start_time:
            23:29:42.799900
    idem_|-idem_state_|-idem_state_|-state:
        ----------
        __id__:
            idem_state
        __run_num__:
            1
        __sls__:
            magnesium.salt_idem_state
        changes:
            ----------
        comment:
            Ran 1 idem states
        duration:
            86.894 ms
        name:
            idem_state
        result:
            True
        start_time:
            23:29:42.866037
    test_|-idem_sub_state__|-Idem_Sub_State_|-succeed_with_changes:
        ----------
        __run_num__:
            3
        __sls__:
            magnesium.salt_idem_state
        __state_ran__:
            True
        changes:
            ----------
            testing:
                ----------
                new:
                    Something pretended to change
                old:
                    Unchanged
        comment:
            Success!
        duration:
            None
        name:
            Idem_Sub_State
        result:
            True
        start_time:
            None

The output looks weird (unlike the normal highstate output), but I guess it is ok for an experimental feature. I haven’t tried running any cloud drivers (idem-vagrant, idem-virtualbox, idem-aws, etc.), but I plan to do so when I have more free time. So far, I have spent about 6 hours experimenting with this Salt/Idem integration. The docs are quite sparse, and I encountered a couple of issues that I plan to report (the examples above are intentionally crafted to avoid them).

Overall, there are a couple of other gaps that aren’t addressed yet.

No reliable ways to install/upgrade

The recommended method to install Idem is to use pip. Unfortunately, the dependencies (and any transitive ones) aren’t pinned to exact package versions. This means installs aren’t repeatable, and something can break unexpectedly in a new environment.

It is harder to manage the upgrade process, as compared to apt/yum, for example.

Some components aren’t regularly uploaded to PyPI, so you have to install them directly from Git.
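One way to notice this kind of drift is to compare what pip actually installed against the versions you tested with. The sketch below is a hypothetical helper (find_drift and installed_version are not part of Idem or Salt), and the pinned versions are simply copied from the pip output shown earlier.

```python
from importlib import metadata

# Versions copied from the "Successfully installed" line above.
TESTED_VERSIONS = {
    "idem": "7.5",
    "pop": "14",
    "pop-config": "6.10",
    "acct": "2.3",
    "rend": "4.2",
}


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_drift(expected, lookup=installed_version):
    """Map package -> (expected, actual) for every package that differs."""
    drift = {}
    for package, wanted in expected.items():
        actual = lookup(package)
        if actual != wanted:
            drift[package] = (wanted, actual)
    return drift


if __name__ == "__main__":
    for package, (wanted, actual) in find_drift(TESTED_VERSIONS).items():
        print(f"{package}: expected {wanted}, found {actual}")
```

The lookup parameter exists only so the comparison can be exercised without touching the real environment. A constraints file passed to pip would be the more conventional way to keep the versions fixed in the first place.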

Atomized dependencies

The stack is split into many small packages: idem, pop, pop-config, acct, rend, takara, and the various idem-* components. This is good for developers, but new users can find it hard to understand how these components interact, how to troubleshoot problems, and where to file bugs. The documentation is focused on developers, is quite spotty, and is dispersed across multiple projects, which also doesn’t help.

No compelling reasons to switch

So far, POP and Idem are mostly marketed to developers. The overall messaging is that these components:

  1. Are a new programming paradigm and a dataflow language
  2. Are easy to compose/merge, support pipelining
  3. Have typed arguments, contracts, etc.
  4. Use asynchronous Python 3 code
  5. Are easy to write
  6. Are easy to teach/handoff to a new developer

There is much less information on why end-users should use these projects instead of (or in addition to) other tools:

Summary

References

PRs #58119 and #57993 by Akmod

Cloud

Linode APIv4

The Linode APIv4 entered the Early Access stage sometime in 2017, then moved to the General Release stage in May 2018. The SaltStack cloud driver transition was initiated about a year ago (I’ve seen a couple of Slack discussions during the summer of 2019), and the actual PR was merged in Sep 2020. The previous version (APIv3) is marked as deprecated, but I was unable to find a scheduled removal date.

To switch the driver to v4, use the api_version salt-cloud provider option:

my-linode-provider:
  driver: linode
  api_version: v4
  apikey: f4ZsmwtB1c7f85Jdu43RgXVDFlNjuJaeIYV8QMftTqKScEB2vSosFSr...
  password: F00barbaz

PRs #58093 and #58415 by Charlie Kenney

Azure

This change enables rudimentary support for Azure MSI-style authentication. It is similar to assigning roles via IAM profiles in Amazon EC2 and removes the need to put secure credentials into config files inside of VMs.

PR #58042 by @edlane

AWS

Other cloud changes

FreeBSD

FreeBSD is now officially supported!

macOS

The majority of macOS-related changes are bugfixes, but as a whole they mean that macOS Big Sur is now supported!

Windows

Networking

Delta Proxy supporting code

These changes prepare some ground for the closed source Delta Proxy feature (that is NOT going to be open-sourced yet because of the implementation complexity). They are mostly focused on passing configuration and context options, subproxy process management, and netmiko proxy module improvements. There is no point in trying to run this code because the proprietary bits are not there, and the minion will likely crash if you set the metaproxy: deltaproxy config option.

PR #58403 by Gareth J. Greenaway

Other networking changes

Performance

SaltSSH

Certificates

Files

Disks and volumes

MySQL

GitFS and Pillar

Per-remote git_pillar_base override

When you have multiple pillar git repositories, the majority of them may have a full set of branches (e.g., develop, qa, staging, and master), with a couple of exceptions that have fewer branches (e.g., just develop and master). In this case, you may want to always point the latter to a specific branch on every salt-master (while keeping the ability to override pillarenv). Now you can do so:

git_pillar_base: develop

ext_pillar:
  - git:
    - __env__ https://gitserver/git-pillar.git
    - __env__ https://gitserver/git-pillar2.git:
      - base: master

PR #57288 by Mathieu Parent

Update specific GitFS repos

A new remotes argument to specify which repositories to update:

salt-run fileserver.update backend=git remotes=myrepo,yourrepo

PR #55014 by Mathieu Parent

Bootstrap

There are a couple of improvements in salt-bootstrap.sh script:

Beta Tiamat packages

In addition to the traditional Salt packages, there are new beta Tiamat-based ones (generated nightly). Tiamat is a wrapper around PyInstaller that simplifies the process of building Python projects as a single frozen binary.

The packages are available through artifactory.saltstack.net and there is a README.md with instructions on how to install them on different operating systems.

Nifty tricks

Jinja profiling

This feature adds some fine-grained profiling capabilities to Jinja templates. It could help optimize pillar (think salt-master CPU usage!) and state render times, especially when you heavily use (or abuse) Jinja and various map files.

Let’s take a simple state file (that also uses a yaml map) to see how to profile it:

# magnesium/profile_example.sls
{% import_yaml 'magnesium/profile_map.yaml' as data %}

{% set local_data = {'counter': 0} %}
{% for i in range("0xB00"|int(base=16)) %}
  {% do local_data.update({'counter': i}) %}
{% endfor %}

always-changes-and-succeeds:
  test.succeed_with_changes:
    - comment: "Count: {{ local_data['counter'] }}"

# magnesium/profile_map.yaml
{% set data = {'counter': 0} %}
{% for i in range("0x700BAD"|int(base=16)) %}
  {% do data.update({'counter': i}) %}
{% endfor %}

data: {{ data }}

Here is how profiling-related log messages looked before Salt Magnesium:

% sudo salt-call state.apply magnesium.profile_example -l profile 2>&1 | grep PROFILE

[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'jinja' renderer: 6.4014732837677
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'yaml' renderer: 0.0005137920379638672

We just see the basic render time for each renderer stage, and that’s it. In Salt Magnesium, all import_* statements are also automatically profiled! Note the first log line that is new here and is missing in the previous output:

% sudo salt-call state.apply magnesium.profile_example -l profile 2>&1 | grep PROFILE

[PROFILE ] Time (in seconds) to render import_yaml 'magnesium/profile_map.yaml': 7.446487665176392
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'jinja' renderer: 7.470700025558472
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'yaml' renderer: 0.002416372299194336

This automatic profiling is enabled for the import_yaml, import_json, and import_text statements. Unfortunately, it doesn’t work with native Jinja imports (import and from ... import ...). Below is a modified state file that uses from ... import instead of import_yaml:

# magnesium/profile_example.sls
{% from 'magnesium/profile_map.yaml' import data %}

{% set local_data = {'counter': 0} %}
{% for i in range("0xB00"|int(base=16)) %}
  {% do local_data.update({'counter': i}) %}
{% endfor %}

always-changes-and-succeeds:
  test.succeed_with_changes:
    - comment: "Count: {{ local_data['counter'] }}"

Oops, we’re back to square one (just two default log lines):

% sudo salt-call state.apply magnesium.profile_example -l profile 2>&1 | grep PROFILE

[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'jinja' renderer: 6.4014732837677
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'yaml' renderer: 0.0005137920379638672

Fortunately, the new profile Jinja block statement allows you to dig even further and measure how much time a specific Jinja line or block takes. Let’s add it to the state file to measure the import statement as well as the loop that modifies the local_data dictionary:

# magnesium/profile_example.sls
{% profile as 'import data' %}
{% from 'magnesium/profile_map.yaml' import data %}
{% endprofile %}

{% profile as 'local data' %}
{% set local_data = {'counter': 0} %}
{% for i in range("0xB00"|int(base=16)) %}
  {% do local_data.update({'counter': i}) %}
{% endfor %}
{% endprofile %}

always-changes-and-succeeds:
  test.succeed_with_changes:
    - comment: "Count: {{ local_data['counter'] }}"

Now there are four logged messages! The first one shows how long the from ... import statement takes, and the second one measures the for loop block:

% sudo salt-call state.apply magnesium.profile_example -l profile 2>&1 | grep PROFILE

[PROFILE ] Time (in seconds) to render profile block 'import data': 6.11327338218689
[PROFILE ] Time (in seconds) to render profile block 'local data': 0.002644062042236328
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'jinja' renderer: 6.132274866104126
[PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'yaml' renderer: 0.0004177093505859375

You can add as many profile block statements as you want (and even nest them!) to find what part of your template takes most of the time.

PR #57850 by Justin Findlay and Brian Harring

Pro Tip: if you want to perform some profiling on a remote minion and do not want to leave your cozy salt-master console (running salt-call remotely or checking the logs can be cumbersome), try the following command:

% sudo salt minion1 cmd.run \
  'salt-call state.apply magnesium.profile_example -l profile 2>&1 | grep PROFILE'

minion1:
    [PROFILE ] Time (in seconds) to render profile block 'import data': 6.6108009815216064
    [PROFILE ] Time (in seconds) to render profile block 'local data': 0.002589702606201172
    [PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'jinja' renderer: 6.627198219299316
    [PROFILE ] Time (in seconds) to render '/var/cache/salt/minion/files/base/magnesium/profile_example.sls' using 'yaml' renderer: 0.00037360191345214844

In one shot, you can run a minion process (the salt-call command) with an elevated logging level AND return the resulting logs back to the master.
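Once the PROFILE lines are back on the master, sorting them by duration makes the slow spots obvious. Below is a minimal sketch of a parser for the log format shown above. The parse_profile_lines helper and its regular expression are my own assumptions based on these sample lines, not something shipped with Salt.

```python
import re

# Matches lines such as:
# [PROFILE ] Time (in seconds) to render profile block 'import data': 6.11
PROFILE_RE = re.compile(
    r"\[PROFILE\s*\] Time \(in seconds\) to render "
    r"(?P<what>.+): (?P<seconds>[0-9.eE+-]+)\s*$"
)


def parse_profile_lines(text):
    """Return (seconds, description) tuples, slowest first."""
    timings = []
    for line in text.splitlines():
        match = PROFILE_RE.search(line)
        if match:
            timings.append((float(match.group("seconds")), match.group("what")))
    return sorted(timings, reverse=True)


sample = """\
minion1:
    [PROFILE ] Time (in seconds) to render profile block 'local data': 0.002589702606201172
    [PROFILE ] Time (in seconds) to render profile block 'import data': 6.6108009815216064
"""

for seconds, what in parse_profile_lines(sample):
    print(f"{seconds:10.4f}s  {what}")
```

Lines that do not match (such as the minion ID header) are silently skipped, so the output of the salt command can be piped in as is.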

Cleanup of template context variables

This improvement fixes a couple of long-standing bugs with sls template context variables and also documents all of them:

If the name of the directory containing the sls is contained in sls name, tpldir will return an incorrect result:

| Variable | How formula is applied       | Old value   | Value with enable_slsvars_fixes |
|----------|------------------------------|-------------|---------------------------------|
| tplfile  | state.apply formula.formula  | formula.sls | formula/formula.sls             |

Blank tplfile value when loading top-level sls files:

| Variable | State file                 | Old value           | Value with enable_slsvars_fixes |
|----------|----------------------------|---------------------|---------------------------------|
| tplfile  | salt://example.sls         | (blank)             | example.sls                     |
| tplfile  | salt://formula/example.sls | formula/example.sls | formula/example.sls             |

Inclusion of init in sls* variables (which should be directories!) when sls path explicitly contains it:

| Variable     | How formula is applied   | Old value        | Value with enable_slsvars_fixes |
|--------------|--------------------------|------------------|---------------------------------|
| slspath      | state.apply formula      | formula          | formula                         |
| slspath      | state.apply formula.init | formula/init     | formula                         |
| sls_path     | state.apply formula      | formula          | formula                         |
| sls_path     | state.apply formula.init | formula_init     | formula                         |
| slsdotpath   | state.apply formula      | formula          | formula                         |
| slsdotpath   | state.apply formula.init | formula.init     | formula                         |
| slscolonpath | state.apply formula      | formula          | formula                         |
| slscolonpath | state.apply formula.init | formula:init     | formula                         |
| tplfile      | state.apply formula      | formula/init.sls | formula/init.sls                |
| tplfile      | state.apply formula.init | formula/init.sls | formula/init.sls                |
| tpldir       | state.apply formula      | formula          | formula                         |
| tpldir       | state.apply formula.init | formula          | formula                         |
| tpldot       | state.apply formula      | formula          | formula                         |
| tpldot       | state.apply formula.init | formula          | formula                         |

To enable the improved behavior, add the following option to the minion config (and possibly the master config too, if you use those variables in reactors/orchestrations) and restart the daemon:

features:
  enable_slsvars_fixes: true

The feature flag will go away in Salt Phosphorus (3005), and the fixed functionality will become the default.
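The fixed values in the tables above follow one simple rule: the sls* variables are the directory part of the template path, written with different separators. Here is a rough sketch of that rule. It is illustrative only and is not Salt's code: sls_vars_from_tplfile is a hypothetical name, and returning an empty string for a top-level sls file is my assumption, since the tables do not cover that case for these variables.

```python
def sls_vars_from_tplfile(tplfile):
    """Derive the path-like sls variables from a relative template path.

    Illustrative only; this is not how Salt is guaranteed to compute them.
    """
    # Directory part of the template path; "" for a top-level sls file
    # (an assumption, the tables above do not show this case).
    directory = tplfile.rsplit("/", 1)[0] if "/" in tplfile else ""
    return {
        "tplfile": tplfile,
        "slspath": directory,
        "sls_path": directory.replace("/", "_"),
        "slsdotpath": directory.replace("/", "."),
        "slscolonpath": directory.replace("/", ":"),
    }


# Both "state.apply formula" and "state.apply formula.init" render
# formula/init.sls, so they produce identical values.
for name, value in sls_vars_from_tplfile("formula/init.sls").items():
    print(f"{name}: {value}")
```

Because the values are derived from the rendered file rather than from the name passed to state.apply, an explicit .init suffix no longer leaks into them.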

PR #58238 by @mlasevich

<RANT INCOMING>

There were two issues with this changeset that I want to highlight.

1. The feature was broken out of the box

It had mandatory (since SEP 10) unit tests, passed multiple reviews, was merged, and released in Salt Magnesium Release Candidate. And yet:

It was quite easy to find serious crashes in previous Salt releases by briefly trying new features. I did that a couple of times when I wrote these unofficial release notes in the past. And it is still quite easy to find such bugs right now, despite SEP 10 and more than a year of improvements. I’m happy to help in testing, but it is scary that I’m able to find issues like this with so little effort. Sometimes I feel like I’m the only person who does that… It is very worrying that SaltStack has no QA processes that can catch such obvious defects.

To me, this tweet made some time ago by Michael DeHaan (Ansible creator) still rings somewhat true:

Here is another one:

2. It also contained yet another feature flag subsystem

I understand the intent behind that, and 100% agree with it. Backward-incompatible and other dangerous changes need to be hidden behind feature flags. However, this topic is quite complex, and SaltStack lacks robust processes around that. There are more than 100 boolean config options in Salt that were introduced over time, and I bet at least some of them are feature flags:

salt-repo % grep ': bool,' salt/config/__init__.py | wc -l
     117

They all are named differently, aren’t organized into namespaces, some of them do not have a defined deprecation timeline (i.e., are just there indefinitely), etc. Moreover, there are flags that only exist in code, aren’t mentioned in the docs, and even can’t be found by the above grep command. This is not sustainable, and there is no system in that…

And now we have this PR that sneaks in a generic feature flag subsystem implementation. The new features namespace can be set through config files and needs a minion restart. Okay, looks good.

But Salt has had a feature flag subsystem since 2017! Its namespace is named use_superseded (a terrible name, admittedly). And it is more powerful:

Maybe there were perfectly valid reasons to add the second implementation, but they weren’t communicated. All I have is just questions:

  1. Is it necessary to have both? Can the existing one be improved instead?
  2. What should be added to the Salt development docs about adding new feature toggles?
  3. What about the UX? How are end-users supposed to deal with deprecations? Heck, there is even a long-forgotten SEP that nobody wants to implement…

Oh, and the old module.run syntax (deprecated since 2017) has been “undeprecated”, and it looks like instead of actively and safely managing (and owning) the transition, SaltStack has decided to leave it in limbo…

Am I too spoiled by the excellent feature deprecation process in Django, and how reluctant (in a positive way) they are about adding new settings?

</END OF RANT>

Unless/onlyif result parsing

This feature extends the unless/onlyif module calling capability that was released in Salt Neon and further improved in Salt Sodium. It introduces the new get_return key, which is used to determine the value to parse for modules that return deep data structures. Internally, it relies on the traverse Jinja lookup filter.

Given the state:

test:
  test.nop:
    - name: foo
    - unless:
      - fun: test.arg
        kwarg:
          deep: False

The test.arg execution module call returns kwargs: {deep: False}. We’d like to evaluate that key for the onlyif or unless behavior, but that was not possible before. Any non-empty return is evaluated as True, and thus the state does not run.

Here is how the state looks with the new get_return keyword added:

test:
  test.nop:
    - name: foo
    - unless:
      - fun: test.arg
        kwarg:
          deep: False
        get_return: kwargs:deep

The False return of the module can now be detected and evaluated by the unless/onlyif requisites, and the state is run.
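The difference between the two evaluations can be sketched in a few lines of Python. The traverse function below is a simplified stand-in that I wrote for illustration (it handles only nested dictionaries and is not Salt's own traverse implementation), and the sample return contains just the key mentioned above.

```python
def traverse(data, key, default=None, delimiter=":"):
    """Walk nested dictionaries using a delimited key such as "kwargs:deep".

    Simplified stand-in for illustration; dictionaries only.
    """
    current = data
    for part in key.split(delimiter):
        if isinstance(current, dict) and part in current:
            current = current[part]
        else:
            # The path does not exist in the data structure.
            return default
    return current


# Shape of the return described above (only the relevant key is shown).
module_return = {"kwargs": {"deep": False}}

# Without get_return: the whole (non-empty) return is truthy.
print(bool(module_return))  # True

# With get_return: kwargs:deep, only the nested value is evaluated.
print(traverse(module_return, "kwargs:deep"))  # False
```

With unless, a truthy value skips the state, so the first evaluation prevents test.nop from running while the second one lets it run.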

PR #57504 by Christian McHugh

Another interesting twist is buried in the PR comment:

We have an initiative right now to start standardizing returns from execution modules. Though, it’s going to take some time to complete those efforts. We’re aiming to get some validation in place during MG that can start warning of non-standard returns.

I believe that initiative is being explored in PR #58508 by Dmitry Kuzmenko.

Other notable changes