Excel Dynamic Arrays (Avoid Adding At-Signs to Formulas) #3962

Merged 42 commits on Aug 12, 2024

Commits on Mar 27, 2024

  1. WIP Excel Adding At-Signs to Functions

    This has come up a number of times, most recently with issue PHPOffice#3901, and also issue PHPOffice#3659. It will certainly come up more often in days to come. Excel is changing formulas which PhpSpreadsheet has output as `=UNIQUE(A1:A19)`; Excel is processing the formula as if it were `=@UNIQUE(A1:A19)`. This behavior is explained, in part, by PHPOffice#3659 (comment). Excel does so in order to ensure that the function returns only a single value rather than an array of values, in case the spreadsheet is being processed (or possibly was created) by a less current version of Excel which cannot handle the array result.
    
    PhpSpreadsheet follows Excel to a certain extent; it defaults to returning a single calculated value when an array would be returned. Further, its support for outputting an array even when that default is overridden is incomplete. I am not prepared to do everything that Excel does for the array functions (details below), but this PR is a start in that direction. The default can be changed via:
    ```php
    use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
    Calculation::setArrayReturnType(Calculation::RETURN_ARRAY_AS_ARRAY);
    ```
    When that is done, `getCalculatedValue` will return an array (no code change necessary). However, Writer/Xlsx will now be updated to look at that value, and if an array is returned in that circumstance, will indicate in the Xml that the result is an array *and* will include a reference to the bounds of the array. This gets us close, although not completely there, to what Excel does, and may be good enough for now. Excel will still mess with the formula, but now it will treat it as `{=UNIQUE(A1:A19)}`. This means that the spreadsheet will now look correct; there will be superficial differences, but all cells will have the expected value.
    
    Technically, the major difference between what PhpSpreadsheet will output now, and what Excel does on its own, is that Excel supplies values in the xml for all the cells in the range. That would be difficult for PhpSpreadsheet to do; that could be a project for another day. Excel will treat the output from PhpSpreadsheet as "Array Formulas" (a.k.a. CSE (control shift enter) formulas because you need to use that combination of keys to manually enter them in older versions of Excel). Current versions of Excel will instead use "Dynamic Array Formulas". Dynamic Array Formulas can be changed by the user; Array Formulas need to be deleted and re-entered if you want to change them. I don't know what else might have to change to get Excel to treat PhpSpreadsheet formulas as Dynamic Array Formulas, and I will probably not even try to look now, saving it for a future date.
    
    Unit testing of this change uncovered a bug in Calculation::calculateCellValue. That routine saves off ArrayReturnType, and may change it, and is supposed to restore it. But it does not do the restore if the calculation throws an exception. It is changed to do so.
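
    The restore-on-exception fix described above is the usual `try`/`finally` pattern. The following is a minimal sketch only, assuming a hypothetical `Engine` class with a static `$returnType`; these names are illustrative and are not PhpSpreadsheet's actual code:

    ```php
    // Hypothetical stand-in for an engine with a static setting; not PhpSpreadsheet code.
    final class Engine
    {
        public static string $returnType = 'value';

        /** Switches the setting for one calculation and always restores it. */
        public static function calculate(callable $formula): mixed
        {
            $saved = self::$returnType;      // save off the caller's setting
            self::$returnType = 'array';     // temporary change for this call
            try {
                return $formula();
            } finally {
                self::$returnType = $saved;  // runs on normal return and on exception
            }
        }
    }

    try {
        Engine::calculate(function () {
            throw new RuntimeException('calculation failed');
        });
    } catch (RuntimeException $e) {
        // The exception still reaches the caller, but the setting is restored.
    }

    var_dump(Engine::$returnType); // string(5) "value"
    ```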
    oleibman committed Mar 27, 2024
    be25c44

Commits on Apr 17, 2024

  1. Xlsx Reader Use Dimensions from Functions With Array Results

    Thinking about PHPOffice#3958 - user wondered if unsupported formulas with array results could be handled better. I said that the answer was "no", but I think Xlsx Reader can make use of the dimensions of the result after all, so the answer is actually "sometimes". This is an initial attempt to do that. Implementing it revealed a bug in how Xlsx Reader handles array formula attributes, and that is now corrected. Likewise, Xlsx Writer did not indicate a value for the first cell in the array, and does now.
    oleibman committed Apr 17, 2024
    b53a4b7

Commits on Apr 19, 2024

  1. Sample Submitted by @jr212

    Address sample code submitted by @jr212 which was not working correctly.
    oleibman committed Apr 19, 2024
    61f24ca
  2. 74f312b

Commits on May 31, 2024

  1. c78888c

Commits on Jun 3, 2024

  1. 5e7ebf3

Commits on Jun 5, 2024

  1. b79cd20

Commits on Jun 6, 2024

  1. Populate Rest of Array Cells, UNIQUE Changes

    See issue PHPOffice#4062. When calculating an array formula, populate all the cells associated with the result. This is almost the same as Excel's behavior. As yet, there is no attempt to create a #SPILL error, so cells may be inappropriately overwritten. Also, if the array size shrinks (e.g. there are fewer unique values than before), no attempt is made to unpopulate the cells which were in range but are now outside the new dimensions. Spill and unpopulation are somewhat related, and will probably be handled at the same time, but their time has not yet come.
    
    UNIQUE, at least for rows, was treating all cell (calculated) values as strings. This is not the same behavior as Excel, which will preserve datatypes, and treat int 3 and string 3 as unique values. Excel will, however, treat int 3 and float 3.0 as non-unique. Within UNIQUE, private function uniqueByRow is changed to try to preserve the datatype when executing (it will probably treat 3.0 as int - I don't know how I can, or even if I should attempt to, do better - but neither an int nor a float should be treated as a string).
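
    The type rules described above (int 3 and string 3 stay distinct, int 3 and float 3.0 do not) can be illustrated with a small stand-alone sketch. The helper names `typedKey` and `uniquePreservingTypes` are hypothetical, and this is not the code of `uniqueByRow`; it only shows one way to build comparison keys that keep strings apart from numbers:

    ```php
    // Hypothetical helper: a comparison key that separates strings from
    // numbers but merges int 3 with float 3.0.
    function typedKey(mixed $value): string
    {
        if (is_int($value) || is_float($value)) {
            return 'n:' . (float) $value;
        }
        if (is_bool($value)) {
            return 'b:' . ($value ? '1' : '0');
        }

        return 's:' . (string) $value;
    }

    // Keeps the first occurrence of each distinct value, with its original type.
    function uniquePreservingTypes(array $values): array
    {
        $seen = [];
        $result = [];
        foreach ($values as $value) {
            $key = typedKey($value);
            if (!isset($seen[$key])) {
                $seen[$key] = true;
                $result[] = $value;
            }
        }

        return $result;
    }

    $unique = uniquePreservingTypes([3, '3', 3.0, 'a', 'a', 2.5]);
    // int 3 and string '3' both survive; float 3.0 duplicates int 3.
    var_dump($unique);
    ```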
    oleibman committed Jun 6, 2024
    08ba00b
  2. Formatting

    oleibman committed Jun 6, 2024
    856a00b
  3. Incorrect Case for Filename

    oleibman committed Jun 6, 2024
    1b21984
  4. More Formatting

    Frustrating morning.
    oleibman committed Jun 6, 2024
    47481c6
  5. Still More Formatting

    I think I should go back to bed.
    oleibman committed Jun 6, 2024
    6b5bf84
  6. Add TODO Note

    ArrayFunctions2Test - the calculations seem too complicated for PhpSpreadsheet. The debug log is 21,300 lines, so I don't know how far I will get with it.
    oleibman committed Jun 6, 2024
    3daac0a

Commits on Jun 10, 2024

  1. Excel Handle Array Functions as Dynamic Rather than CSE

    With a number of changes, PhpSpreadsheet can finally generate a spreadsheet which Excel will recognize as a Dynamic Array function rather than CSE. In particular, changes are needed to ContentTypes, workbook.xml.rels, cell definitions in the worksheet, and a new metadata.xml is added.
    oleibman committed Jun 10, 2024
    ef176f3
  2. 846fec7

Commits on Jun 14, 2024

  1. CONCATENATE Changes, and Csv/Html/Ods Support

    The CONCATENATE function has been treated as equivalent to CONCAT. This is not how it is treated in Excel; it is closer to (and probably identical to) the ampersand concatenate operator. The difference manifests itself when any of the arguments is an array (typically a cell range). Code is added to support this difference.
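
    To make the difference concrete, here is a rough stand-alone sketch of the two behaviors as I understand them: a CONCAT-style function flattens every argument into one string, while a CONCATENATE-style function combines arguments element by element, as the ampersand operator would. The function names are hypothetical, only one-dimensional lists are handled, and a missing element is treated as an empty string for simplicity, which is an assumption rather than Excel's behavior:

    ```php
    // CONCAT-like: flatten every argument and join everything into one string.
    function concatLike(mixed ...$args): string
    {
        $out = '';
        array_walk_recursive($args, function ($value) use (&$out): void {
            $out .= $value;
        });

        return $out;
    }

    // CONCATENATE-like (ampersand semantics): combine element by element,
    // repeating scalars; the result is an array when any argument is one.
    function concatenateLike(mixed ...$args): string|array
    {
        $length = 0;
        foreach ($args as $arg) {
            if (is_array($arg)) {
                $length = max($length, count($arg));
            }
        }
        if ($length === 0) {
            return implode('', $args); // only scalars: plain concatenation
        }
        $result = [];
        for ($i = 0; $i < $length; ++$i) {
            $piece = '';
            foreach ($args as $arg) {
                // Simplification: a missing element contributes an empty string.
                $piece .= is_array($arg) ? ($arg[$i] ?? '') : $arg;
            }
            $result[] = $piece;
        }

        return $result;
    }

    var_dump(concatLike(['a', 'b', 'c'], '-'));      // string(4) "abc-"
    var_dump(concatenateLike(['a', 'b', 'c'], '-')); // array of "a-", "b-", "c-"
    ```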
    
    Support for array results is added to Csv Writer, Html Writer, and Ods Reader and Writer. I have not figured out how to get it to work with Xls.
    oleibman committed Jun 14, 2024
    ad2194d
  2. Drop Some Dead Code

    oleibman committed Jun 14, 2024
    784e8a0

Commits on Jun 17, 2024

  1. Spills

    Implement SPILL for dynamic arrays. Calculating a dynamic array function will result in a SPILL error if it attempts to overlay a non-null cell which was not part of its previous calculation. Furthermore, it will set to null all cells which were part of its previous calculation but which are not part of the current one (i.e. one or both of the dimensions of the calculation is smaller than it had been); this should also apply for spills (whose result is reduced to 1*1).
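
    The rules above can be sketched independently of PhpSpreadsheet. In this simplified model, which is an assumption made for illustration, the worksheet is a plain array keyed by coordinate and `applySpill` is a hypothetical function, not part of the library:

    ```php
    /**
     * @param array<string, mixed> $grid     cell values keyed by coordinate
     * @param string               $anchor   coordinate of the formula cell
     * @param list<string>         $previous coordinates filled by the last calculation
     * @param array<string, mixed> $result   new values keyed by coordinate (anchor included)
     *
     * @return list<string> coordinates now occupied by the formula's result
     */
    function applySpill(array &$grid, string $anchor, array $previous, array $result): array
    {
        foreach ($result as $coordinate => $value) {
            $occupied = ($grid[$coordinate] ?? null) !== null;
            if ($coordinate !== $anchor && $occupied && !in_array($coordinate, $previous, true)) {
                // Blocked by a foreign non-null cell: collapse to a 1*1 error result.
                $result = [$anchor => '#SPILL!'];

                break;
            }
        }
        // Null out cells from the previous calculation that are no longer covered.
        foreach ($previous as $coordinate) {
            if (!array_key_exists($coordinate, $result)) {
                $grid[$coordinate] = null;
            }
        }
        foreach ($result as $coordinate => $value) {
            $grid[$coordinate] = $value;
        }

        return array_keys($result);
    }

    $grid = ['A1' => null, 'A2' => null, 'A3' => 'blocker'];
    $first = applySpill($grid, 'A1', ['A1'], ['A1' => 1, 'A2' => 2]);
    // The next result would overlay A3, which holds an unrelated value.
    $second = applySpill($grid, 'A1', $first, ['A1' => 1, 'A2' => 2, 'A3' => 3]);
    var_dump($grid); // A1 is '#SPILL!', A2 is null again, A3 is untouched
    ```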
    
    Excel will stop you from changing the value in any cell in a dynamic array except the formula cell itself. I have not built this particular aspect into PhpSpreadsheet.
    
    As usual, MS has taken some unusual steps here. If the result of a dynamic array calculation is #SPILL!, it will nevertheless be written to the xml as #VALUE!. Excel recognizes this situation by adding a new `vm` attribute to the cell, and expanding metadata.xml accordingly.
    
    A new optional parameter `$reduceArrays` is added to `toArray` and related functions. This will reduce a dynamic array to its first cell, which seems more useful than outputting it as an array (default).
    oleibman committed Jun 17, 2024
    d770012
  2. Eliminate Dead Code

    oleibman committed Jun 17, 2024
    8609b78

Commits on Jun 20, 2024

  1. Spill Operator

    Spill operator now works both as trailing `#` and ANCHORARRAY function. `#` is converted to ANCHORARRAY when writing. I do not think it is important to convert the other way when reading.
    
    Documentation updates have started, but are a work in progress.
    
    SINGLE function is implemented. I believe it works correctly when referring to a cell, but not when referring to a cell range. No attempt is yet made to convert leading `@` to and from SINGLE; I haven't figured out how to do so without interfering with `@` in structured references.
    
    ISREF has problems. At least one of its tests was wrong, and many of those that were right were so accidentally. The code is changed, quite kludgily, so that almost all the tests are now deliberately correct. One very complicated test is incorrect; for now, I will skip it, and will open an issue when this PR is merged.
    oleibman committed Jun 20, 2024
    2c9e2e2
  2. 0b471ef

Commits on Jun 21, 2024

  1. SINGLE Function, and Gnumeric

    SINGLE function can be used to return the first value from a dynamic array result, or, for a range, to return the value of the cell which matches the current row (#VALUE! error if there is no match). Excel allows you to specify an at-sign unary operator rather than SINGLE function; this PR does not permit that.
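
    The two uses of SINGLE can be pictured with a small stand-alone sketch. Both function names are hypothetical, and the range is modelled, as an assumption, as a one-column array keyed by row number:

    ```php
    /**
     * Implicit intersection for a one-column range.
     *
     * @param array<int, mixed> $columnRange values keyed by row number
     */
    function singleFromRange(array $columnRange, int $currentRow): mixed
    {
        return array_key_exists($currentRow, $columnRange)
            ? $columnRange[$currentRow]
            : '#VALUE!'; // no cell of the range lies in the current row
    }

    // First value of a (possibly nested) array result.
    function singleFromArrayResult(array $result): mixed
    {
        while (is_array($result)) {
            $result = reset($result);
        }

        return $result;
    }

    $range = [2 => 'b', 3 => 'c', 4 => 'd']; // e.g. values of rows 2 to 4
    var_dump(singleFromRange($range, 3));    // string(1) "c"
    var_dump(singleFromRange($range, 7));    // string(7) "#VALUE!"
    var_dump(singleFromArrayResult([[10, 20], [30, 40]])); // int(10)
    ```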
    
    Add support for reading CSE array functions for Gnumeric.
    
    Throw an exception if setValueExplicit Formula is invalid (not a string, or doesn't begin with an equal sign). This is equivalent to what happens when setValueExplicit Numeric specifies a non-numeric value.
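
    The validity check amounts to something like the following sketch. `requireFormula` is a hypothetical helper, and the exception class and message are assumptions; the library's own exception type may differ:

    ```php
    // Hypothetical helper: accepts only strings that begin with an equal sign.
    function requireFormula(mixed $value): string
    {
        if (!is_string($value) || !str_starts_with($value, '=')) {
            throw new InvalidArgumentException('Invalid value for a formula cell');
        }

        return $value;
    }

    foreach (['=SUM(A1:A3)', 'SUM(A1:A3)', 42] as $candidate) {
        try {
            requireFormula($candidate);
            echo "accepted\n";
        } catch (InvalidArgumentException $e) {
            echo "rejected\n";
        }
    }
    ```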
    
    Added a number of tests from PR PHPOffice#2787.
    oleibman committed Jun 21, 2024
    3a690a7

Commits on Jun 22, 2024

  1. Mostly Docs and Tests

    Also support for Xml format.
    oleibman committed Jun 22, 2024
    ef2b5b9
  2. Dead Code

    oleibman committed Jun 22, 2024
    fed7a32

Commits on Jun 23, 2024

  1. Minor Tweaks

    oleibman committed Jun 23, 2024
    2076a07

Commits on Jun 30, 2024

  1. Use CELL("width") As Another Unimplemented Array Function

    I might want to implement CHOOSECOLS after this PR is merged.
    oleibman committed Jun 30, 2024
    b051d49
  2. Resolve Merge Conflict

    oleibman committed Jun 30, 2024
    1253f35
  3. edf7f71

Commits on Jul 1, 2024

  1. 5295704

Commits on Jul 5, 2024

  1. 98ba95e

Commits on Jul 7, 2024

  1. b22d1f5

Commits on Jul 8, 2024

  1. Shot Myself in Foot

    PR PHPOffice#4073 adds parsing of formulas for `$cell->setValue()`, but the spill operator won't parse correctly, breaking some of this PR. Add code to avoid that problem.
    oleibman committed Jul 8, 2024
    5b18dcf

Commits on Jul 10, 2024

  1. Anchor Cell Without Spill

    Some examples submitted by @infojunkie have demonstrated that an anchor cell without a spill operator was not being handled consistently with Excel. This means that the calculated value of such a cell needs to be a scalar when using it as part of a formula without the spill operator, but as an array when using it with the spill operator or when just getting the cell's value. This is tricky. My solution seems awfully kludgey to me, but it does seem to work.
    
    I have another change that I will want to make in a day or so. When that change is pushed, I will take this back out of draft status, and will now aim for an install date of about August 6.
    
    This particular commit removes the changing of array return type during calculations. It's been this way for a very long time, but I don't understand why it should have been needed in the first place. It causes problems (you need to be sure to restore the original value even when, for example, you throw an exception during calculation). I am relieved that it caused a problem only for one test member. The TEXTSPLIT function really makes sense only when you are returning arrays as arrays. Its test now needs to set that value explicitly since the Calculation engine is no longer changing it under the covers. No other tests broke as a result of this change.
    
    One other test "broke" as a result of this commit. One of the tests for INDIRECT had been expecting a null value, and the test was commented with "Excel result is 0". The PhpSpreadsheet result is now 0 as well, so this would seem to be a bugfix rather than a breaking change.
    oleibman committed Jul 10, 2024
    fda4855
  2. Instance Variable for Array Return Type

    Till now we have used a static variable/getter/setter to decide what type of result should be returned when a formula is evaluated and an array is the result. This is messy; it would be much better to use an instance variable instead. We cannot eliminate `setArrayReturnType` and `getArrayReturnType` because that would be a BC break. I am considering whether they should be deprecated. In the meantime, I have added a new instance property `instanceArrayReturnType` with getter and setter methods. The property is initially null, and, if it remains so when needed, the static property will be used instead. However, if it is set, its value will be used.
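
    The fallback described above, a nullable instance property that defers to the static one while unset, might look roughly like this. The class and method names are hypothetical and chosen for illustration; they are not the accessors added by this commit:

    ```php
    // Hypothetical sketch of "instance value if set, otherwise static value".
    final class CalcSketch
    {
        private static string $staticReturnType = 'value'; // process-wide default
        private ?string $instanceReturnType = null;        // per-instance override

        public static function setStaticReturnType(string $type): void
        {
            self::$staticReturnType = $type;
        }

        public function setInstanceReturnType(?string $type): void
        {
            $this->instanceReturnType = $type;
        }

        public function effectiveReturnType(): string
        {
            // Null means "not set", so the static property is used instead.
            return $this->instanceReturnType ?? self::$staticReturnType;
        }
    }

    $a = new CalcSketch();
    $b = new CalcSketch();
    CalcSketch::setStaticReturnType('array');
    $b->setInstanceReturnType('value');

    var_dump($a->effectiveReturnType()); // string(5) "array"
    var_dump($b->effectiveReturnType()); // string(5) "value"
    ```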
    oleibman committed Jul 10, 2024
    b00dd47
  3. Valid Scrutinizer Messages

    oleibman committed Jul 10, 2024
    a405ca0
  4. More Scrutinizer

    oleibman committed Jul 10, 2024
    a084e7d

Commits on Jul 23, 2024

  1. 312cd5a

Commits on Jul 31, 2024

  1. 768dd75

Commits on Aug 2, 2024

  1. f3ae0bd
  2. Update Changelog and Docs Prior to Merge Next Week

    This will, I hope, be my last change prior to merge on August 7. PR is fully synced with master (except for this change), and, except for an emergency, I do not intend to merge anything else before this.
    oleibman committed Aug 2, 2024
    e5e6bde

Commits on Aug 10, 2024

  1. 823cb2d
  2. Update CHANGELOG.md

    oleibman committed Aug 10, 2024
    fdbf333