Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Allow to read data from a table located in a different sheet #3659

Merged

Conversation

kevin-verschaeve
Copy link
Contributor

@kevin-verschaeve kevin-verschaeve commented Jul 31, 2023

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests -> Not yet, waiting for validation to do them
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

Why this change is needed?

When a cell is referencing a "Table" (structured reference) that is located in a different sheet that the cell itself, the library currently throws an error "{SheetName}!{cellName} -> Table {tableName} for Structured Reference cannot be located".

Here is a simple example excel file with this error
bug_with_table_in_other_sheet.xlsx

This fix will keep the same behaviour, but will also try to read all tables from all sheet before throwing the error.

Please tell me if you think the fix is needed or if there is a better solution.

Fix #3635.

Thanks

@oleibman
Copy link
Collaborator

When I use your code to load the example spreadsheet you provided and save it as another spreadsheet, the result is not correct. Excel shows the formula in sheet Feul1 cell A1 as =Tableau1[@MyCol], with a spurious at sign, rather than =Tableau1[MyCol], and computes this as a Value error. The Xml does not contain this at sign, so I'm not sure why Excel is adding it. Do you get a spreadsheet that avoids this problem when you try load/save?

While running that test, I also got a deprecation warning on line 344 of StructuredReference - columnName is apparently null (not null-string) when the statement is executed. I have no idea whether this is related to the previous problem.

One of several ways to avoid the Phpstan complaint is to use getParentOrThrow rather than getParent in findTableFromOtherSheets.

Finally, please add a test case to show that your change is working as desired.

@kevin-verschaeve
Copy link
Contributor Author

Indeed, when saving the file with the library, a "@" sign is added an cause some trouble.
I have no idea why, and i'm afraid I don't have any case without this issue.

But I don't think it is related to my fix proposal, so I'll add a test case for the fix and I let you decide if you want to merge it or not

@oleibman
Copy link
Collaborator

oleibman commented Aug 2, 2023

Implicit intersection - see https://support.microsoft.com/en-gb/office/implicit-intersection-operator-ce3be07b-0101-4450-a24e-c1c999be2b34 for an explanation of the at sign. I'm not sure exactly what is in the original spreadsheet, but is missing or different in the copy, that allows Excel to read it without adding in the at sign. I do know one difference for sure - in the original, the formula for cell A1 specifies t="array" ref="A1:A4". When I add this to the xml for the copy, things are sort of okayish. Excel complains of corruption when opening up the file, but, if you accept its corrections, the result is okay (although the formula is now changed to {=Tableau1[MyCol]}. Changing AppVersion in app.xml, and changing lastEdited and rupBuild in workbook.xml do not change this behavior. I'm not sure what else might be involved. In some cases, setting preCalculateFormulas to false avoids problems; alas, this is not one of those cases.

Please go ahead and add the test case. I am not yet sure whether we should proceed with this change in light of the implicit intersection problem. If you think that the result is still reasonably usable for you, that is an argument for proceeding, but please do try to confirm that you are really okay with it.

@oleibman
Copy link
Collaborator

oleibman commented Aug 3, 2023

I think I've figured out what I want in the test case. In your sample spreadsheet in sheet Feul1 cell A1, use the formula =SUM(Tableau1[MyCol]). This returns a single value rather than an array, so Excel has no reason to add an at sign to that formula. This benefits from your code change just as your original does. I think PhpSpreadsheet will still have trouble calculating the result of the formula, but Excel will not, so load/save will produce a usable spreadsheet. We can then track PhpSpreadsheet's difficulties with your original formula, and with calculating the result of the new one, as separate issues.

@oleibman
Copy link
Collaborator

Test added, calculation corrected.

@kevin-verschaeve
Copy link
Contributor Author

awesome !

sorry for the time, I was facing another issue with my proposal and tried to fix it before writing the test and going back to you.

I see that with the test you have made some more fixes, and it seems to be what I was looking for, as it seems to fix my issue totally now.

Thank you very much

@kevin-verschaeve
Copy link
Contributor Author

another issue found with some files I'm trying to import is, on this line
I have a $token that is a StructuredReference, but it cannot be converted to string (to be added in the stack), so I get this error:
Object of class PhpOffice\\PhpSpreadsheet\\Calculation\\Engine\\Operands\\StructuredReference could not be converted to string.

Do you want that I open a new issue for that ? Because I'm not sure this is not related to our fix here.

My quick workaround is to add a __toString() method on StructuredReference, but I feel like it is not the right solution.

What do you think ?

Kevin Verschaeve and others added 2 commits August 16, 2023 10:34
Calculate correct result, at least for calculations with a single answer. For an array of answers, result will be similar to other functions which return an array (dynamic arrays not yet fully supported).
@oleibman
Copy link
Collaborator

Without more context, I have no idea whether your toString problem has anything to do with this fix. Feel free to open a new issue. I too have something that I will need to report as an issue once this one has been merged.

@kevin-verschaeve
Copy link
Contributor Author

Test file renamed.

I think we can merge this one if it's ok for you.

I'll open a new issue for the toString issue with a reproducer if I can figure out one

Thanks for the help

@oleibman
Copy link
Collaborator

Thank you for doing the rename. I think I have figured out my issue, which I feel is closely enough related to the original, that I will fix it with this PR. So expect at least one more change before merging. In case you were wondering about my issue, the test case currently tests for =SUM(Tableau1[MyCol]). The new code will also handle =SUM(Tableau1[]). But, so far, it will not handle =SUM(Tableau1), where the table is used like a defined name rather than a structured reference. Excel will handle that, and, as noted above, I think I can get PhpSpreadsheet to do so as well.

See Issue3569Test, which behaves like Excel.
Scrutinizer is correct here.
Worksheet points to Table, Table points back to Worksheet. Circular reference prevents garbage collection. Remove Table collection in Worksheet at destruct time to avoid this problem.
Avoiding memory leak triggered segfault in Phpunit 10. Research and fix later.
Absence of such a method seems to cause problems for untaken IF branches in Calculation.
@oleibman
Copy link
Collaborator

The toString fix is now part of this PR.

@oleibman oleibman merged commit 4971801 into PHPOffice:master Aug 19, 2023
12 checks passed
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Aug 20, 2023
Fix PHPOffice#3679. That issue was opened for a problem which was already solved by PR PHPOffice#3659, however there were additional problems with the spreadsheet.

The main problem is that Data Validations and Conditional Styles can each be supplied in the Xml in either "external" or "internal" formats. The code for each to handle "external" assumes that each is the only "external" item on the worksheet in the Xml, but some of the worksheets in the sample spreadsheet provide both as "external" on some sheets. The code to fix this is verified against the supplied sample, however no formal test has been added for it. The sample is much too large and complicated to be added to the test suite - it takes several minutes to read, and even longer to write (`setPreCalculateFormulas(false)` is highly recommended). I will leave this ticket open for a few days to see if I can hand-craft a suitable test case, but I am not hopeful.

A second problem is that something in the Xlsx Reader `$xmlSheetNS->sheetData->row` loop breaks the selected cell for the worksheet. This is easily fixed and verified by eye (and with the supplied sample), but, again, no explicit test case is added.

A third problem is that drawings which are part of the supplied sample use `srcRect` tags in the Xml to effectively produce a cropped version of the image. This tag has hitherto been ignored. It is now supported in Xlsx Reader, Xlsx Writer, and Worksheet/BaseDrawing object. This is again verified with the supplied sample; unlike the other parts, it was easy to add a new formal test case for this part of the fix.
oleibman added a commit that referenced this pull request Aug 27, 2023
* Fix Several Problems in a Very Complicated Spreadsheet

Fix #3679. That issue was opened for a problem which was already solved by PR #3659, however there were additional problems with the spreadsheet.

The main problem is that Data Validations and Conditional Styles can each be supplied in the Xml in either "external" or "internal" formats. The code for each to handle "external" assumes that each is the only "external" item on the worksheet in the Xml, but some of the worksheets in the sample spreadsheet provide both as "external" on some sheets. The code to fix this is verified against the supplied sample, however no formal test has been added for it. The sample is much too large and complicated to be added to the test suite - it takes several minutes to read, and even longer to write (`setPreCalculateFormulas(false)` is highly recommended). I will leave this ticket open for a few days to see if I can hand-craft a suitable test case, but I am not hopeful.

A second problem is that something in the Xlsx Reader `$xmlSheetNS->sheetData->row` loop breaks the selected cell for the worksheet. This is easily fixed and verified by eye (and with the supplied sample), but, again, no explicit test case is added.

A third problem is that drawings which are part of the supplied sample use `srcRect` tags in the Xml to effectively produce a cropped version of the image. This tag has hitherto been ignored. It is now supported in Xlsx Reader, Xlsx Writer, and Worksheet/BaseDrawing object. This is again verified with the supplied sample; unlike the other parts, it was easy to add a new formal test case for this part of the fix.

* Scrutinizer False Positives
oleibman added a commit to oleibman/PhpSpreadsheet that referenced this pull request Mar 27, 2024
This has come up a number of times, most recently with issue PHPOffice#3901, and also issue PHPOffice#3659. It will certainly come up more often in days to come. Excel is changing formulas which PhpSpreadsheet has output as `=UNIQUE(A1:A19)`; Excel is processing the formula as it were `=@unique(A1:A19)`. This behavior is explained, in part, by PHPOffice#3659 (comment). It is doing so in order to ensure that the function returns only a single value rather than an array of values, in case the spreadsheet is being processed (or possibly was created) by a less current version of Excel which cannot handle the array result.

PhpSpreadsheet follows Excel to a certain extent; it defaults to returning a single calculated value when an array would be returned. Further, its support for outputting an array even when that default is overridden is incomplete. I am not prepared to do everything that Excel does for the array functions (details below), but this PR is a start in that direction. If the default is changed via:
```php
use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
Calculation::setArrayReturnType(Calculation::RETURN_ARRAY_AS_ARRAY);
```
When that is done, `getCalculatedValue` will return an array (no code change necessary). However, Writer/Xlsx will now be updated to look at that value, and if an array is returned in that circumstance, will indicate in the Xml that the result is an array *and* will include a reference to the bounds of the array. This gets us close, although not completely there, to what Excel does, and may be good enough for now. Excel will still mess with the formula, but now it will treat it as `{=UNIQUE(A1:A19)}`. This means that the spreadsheet will now look correct; there will be superficial differences, but all cells will have the expected value.

Technically, the major difference between what PhpSpreadsheet will output now, and what Excel does on its own, is that Excel supplies values in the xml for all the cells in the range. That would be difficult for PhpSpreadsheet to do; that could be a project for another day. Excel will treat the output from PhpSpreadsheet as "Array Formulas" (a.k.a. CSE (control shift enter) formulas because you need to use that combination of keys to manually enter them in older versions of Excel). Current versions of Excel will instead use "Dynamic Array Formulas". Dynamic Array Formulas can be changed by the user; Array Formulas need to be deleted and re-entered if you want to change them. I don't know what else might have to change to get Excel to use the latter for PhpSpreadsheet formulas, and I will probably not even try to look now, saving it for a future date.

Unit testing of this change uncovered a bug in Calculation::calculateCellValue. That routine saves off ArrayReturnType, and may change it, and is supposed to restore it. But it does not do the restore if the calculation throws an exception. It is changed to do so.
@oleibman oleibman mentioned this pull request Jun 5, 2024
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
2 participants