Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Array of Hashes comparison result in a change at each run #327

Open
Clebam opened this issue Jun 21, 2024 · 3 comments
Open

Array of Hashes comparison result in a change at each run #327

Clebam opened this issue Jun 21, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Clebam
Copy link
Contributor

Clebam commented Jun 21, 2024

Describe the Bug

When setting a resource that have a parameter with an Array of Hash, it compares with every entry of the hash, even though they are optional. This results in a perma-change (each run sets the resource again and again)

Expected Behavior

The values in the Array of Hash should not be set at each run

Steps to Reproduce

Module : webadministration dsc
Resource : dsc_website (assuming you have already set a pool to host the website)

  dsc_website { 'machin.fr':
      dsc_ensure          => 'Present',
      dsc_name            => 'machin.fr',
      dsc_physicalpath    => 'd:/www/sites/machin.fr',
      dsc_applicationpool => 'machin.fr',
      dsc_bindinginfo      => [{
            hostname           => 'machin.fr',
            protocol           => 'http',
            port               => 80,
      }],
    }

The part that is interesting is dsc_bindinginfo
The run (each run in fact) results with this

Notice: /Stage[main]/Main/Node[default]/Dsc_website[machin.fr]/dsc_bindinginfo: dsc_bindinginfo changed [
  {
    'bindinginformation' => '*:80:machin.fr',
    'certificatestorename' => undef,
    'certificatesubject' => undef,
    'certificatethumbprint' => undef,
    'hostname' => 'machin.fr',
    'ipaddress' => '*',
    'port' => 80,
    'protocol' => 'http',
    'sslflags' => '0'
  }] to [
  {
    'hostname' => 'machin.fr',
    'protocol' => 'http',
    'port' => 80
  }] (corrective)
Notice: dsc_website[{:name=>"machin.fr", :dsc_name=>"machin.fr"}]: Updating: Finished in 1.32 seconds

Environment

  • Version 7.23
  • Platform Windows 2022

Additional Context

  1. There is an easy fix, but I do not find it acceptable. (It is using validation_mode: resource). It does resolve the issue but the execution time skyrockets from 0.1 sec to 1.5 sec. per website. Moreover, I think it concerns each parameter in every resources that has mof_is_embedded: true So having each resource taking a full second, you end up easily with 5 min runs when you handle many resources. So I hope there is an alternative

  2. So here is the type definition of dsc_bindinginfo

    dsc_bindinginfo: {
      type: "Optional[Array[Struct[{
              sslflags => Optional[Enum['0', '1', '2', '3']],
              certificatestorename => Optional[Enum['My', 'my', 'WebHosting', 'webhosting']],
              certificatethumbprint => Optional[String],
              hostname => Optional[String],
              certificatesubject => Optional[String],
              bindinginformation => Optional[String],
              port => Optional[Integer[0, 65535]],
              ipaddress => Optional[String],
              protocol => Enum['http', 'https', 'msmq.formatname', 'net.msmq', 'net.pipe', 'net.tcp']
            }]]]",
      desc: "Website's binding information in the form of an array of embedded instances of the DSC_WebBindingInformation CIM class.",

      mandatory_for_get: false,
      mandatory_for_set: false,
      mof_type: 'DSC_WebBindingInformation[]',
      mof_is_embedded: true,
    },

The sub parameters are almost all optional.
I did try to specifiy them all by the way, but I can't specify undef for certificatestorename :( (and this would be a poor workaround)

I this the handling of this type of resources is link to mof_is_embedded: true (with a dedicated handling for pscredential)

I tried to debug dsc_base_provider.rb to understand what is causing this issue, but could not figure it out yet. :/

@Clebam Clebam added the bug Something isn't working label Jun 21, 2024
@Clebam
Copy link
Contributor Author

Clebam commented Jun 24, 2024

Hi,

I spent some time trying to find a way to handle this. I would suggest a piece of code that is best effort

Code location : https://github.com/puppetlabs/ruby-pwsh/blob/main/lib/puppet/provider/dsc_base_provider/dsc_base_provider.rb#L391

# Special handling for CIM Instances
if data[type_key].is_a?(Enumerable)
  downcase_hash_keys!(data[type_key])
  munge_cim_instances!(data[type_key])
  if data[type_key].respond_to?('each') and name_hash[type_key].respond_to?('each')
    if data[type_key].length() == name_hash[type_key].length()
      name_hash_merged=name_hash[type_key].zip(data[type_key]).map do |hash1, hash2|
        hash1 ||= {}
        hash2.merge(hash1 || {})
      end
      name_hash_merged_sorted = recursively_sort(name_hash_merged)
      data_sorted = recursively_sort(data[type_key])
      if name_hash_merged_sorted == data_sorted
        name_hash[type_key] = name_hash_merged_sorted
      end
    end
  end
end

Current behavior (without this piece of code)

If the INPUT array of hash is not strictly the same as the one retrieved (each key, each value) => There is a change
This cause always a permanent change at each run.

Behavior with the piece of code

Case 1 : Ressources are different length

==> Run as current behavior

Case 2 : Ressources are same length but after merge, ressources are still different

==> Run as current behavior

Case 3 : Ressources are same length and are equal after merge

==> Replace current declaration (name_hash) with the merge. This ends up considering ressource are equal and no changes are done
Note : This relies on the fact that gathered data are retrieved in the same order as they are written. Otherwise you fall down to Case 2

This implies that [name: john, surname: doe] will overwrite [surname: doe, name: john]. But if the system reads in the order it was written, it will not redo this on subsequent runs.

In my opinion it is not perfect but still better than always rewriting the resource.
And if you fall in a case where you always fall on case 2, well, it is the current behavior anyway and you're doomed to use "validation_mode => resource"

@Clebam
Copy link
Contributor Author

Clebam commented Sep 20, 2024

Hi, I have spent a bit more time on this recently. I have sort of a "good enough" solution.
Will probably make a PR soon

# Special handling for CIM Instances
      if data[type_key].is_a?(Enumerable)
        downcase_hash_keys!(data[type_key])
        munge_cim_instances!(data[type_key])

        # Partially handle nested Arrays/Hashes
        # Should properly compare (ie. not try to make a permanent change on the ressource) the following format
        # - Arrays (my_object = ['foo', 'bar'])
        # - Nested Arrays (my_object = [['foo1', 'bar1'], ['foo2','bar2']])
        # - Array of Hash (my_object = [{:foo1 => 'bar1'}, {:foo2 => 'bar2'}])
        # Deeper nested object may not work and require "validation_mode => resource" to avoid permanent changes
        # This is a "good enough" handling of nested object, to avoid an increasing amount of computing the deeper the nest is.
        if data[type_key].class == name_hash[type_key].class && data[type_key].length() == name_hash[type_key].length()
          if name_hash[type_key].is_a?(Array)
            if name_hash[type_key][0].is_a?(Hash)
              merge = name_hash[type_key].zip(data[type_key]).map do |h1, h2|
                        h1 ||= {}
                        h2.merge(h1 || {})
                      end
            else
              merge = name_hash[type_key] | data[type_key]
            end
          elsif name_hash[type_key].is_a?(Hash)
            merge = name_hash[type_key].merge(data[type_key] || {})
          end

          merge_sorted = recursively_sort(merge)
          data_sorted = recursively_sort(data[type_key])
          if merge_sorted == data_sorted
            name_hash[type_key] = merge_sorted
          end
        end
      end

@Clebam
Copy link
Contributor Author

Clebam commented Sep 21, 2024

Upon further testing I found one use case where you can trigger a permanent change (though it seems a bit convoluted)

First :
Declare your array of hash

    dsc_bindinginfo:
      - hostname: "machin.fr"
        protocol: http
        port: 80
      - hostname: "machin.es"
        protocol: http
        port: 80
      - hostname: "machin.fr"
        protocol: http
        port: 8080
      - hostname: "machin.lt"
        protocol: http
        port: 8080

then run puppet. The resources is applied

Second run and following run : no permanent change

Update your declaration by changing the order (but not changing or addind anything)

    dsc_bindinginfo:
      - hostname: "machin.es"
        protocol: http
        port: 80
      - hostname: "machin.fr"
        protocol: http
        port: 80
      - hostname: "machin.fr"
        protocol: http
        port: 8080
      - hostname: "machin.lt"
        protocol: http
        port: 8080

This would trigger a permanent change because the comparison will always check the order it was written at first. But given order does not really matter, nothing is actually changed.

However, if you add an element or change any value, then the order will be written with the last one you declared.

I guess it is acceptable. (Given now we have always a permanent change)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant