Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Add 'expand_table' feature #1475

Open
wants to merge 12 commits into
base: master
Choose a base branch
from
Open

Conversation

lavigne958
Copy link
Collaborator

Add a new feature that allows a user to expand a cell range
into a table.

the expand will look for the right most cell with adjacent value.
the expand will look for the bottom most cell with adjacent value.
the expand will table down from top left celle range to bottom right
value.

closes #1414

Signed-off-by: Alexandre Lavigne [email protected]

@lavigne958 lavigne958 self-assigned this Jun 2, 2024
@lavigne958 lavigne958 requested a review from alifeee June 2, 2024 22:39
Add a new feature that allows a user to expand a cell range
into a table.

the expand will look for the right most cell with adjacent value.
the expand will look for the bottom most cell with adjacent value.
the expand will table down from top left celle range to bottom right
value.

closes #1414

Signed-off-by: Alexandre Lavigne <[email protected]>
Copy link
Collaborator

@alifeee alifeee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice self-contained change. good work here :)

I think we need more tests, as I foresee a lot of edge cases, and it would be good to catch those before the feature is in gspread.

For example, what is the expectation for expanding this table from the top left?

1 1 - 1
1 1 1 1
- - 1 -
1 1 1 1

This is not clear to me. Also the internal (private) utils expanding functions are unclear what they do.

As well, we should have a utils test and a worksheet test. Currently there is only utils

gspread/utils.py Outdated Show resolved Hide resolved
gspread/utils.py Outdated Show resolved Hide resolved
gspread/utils.py Outdated Show resolved Hide resolved
gspread/utils.py Outdated Show resolved Hide resolved
gspread/worksheet.py Outdated Show resolved Hide resolved
gspread/worksheet.py Outdated Show resolved Hide resolved
gspread/worksheet.py Outdated Show resolved Hide resolved
gspread/worksheet.py Outdated Show resolved Hide resolved
gspread/worksheet.py Outdated Show resolved Hide resolved
tests/utils_test.py Outdated Show resolved Hide resolved
@lavigne958
Copy link
Collaborator Author

lavigne958 commented Jun 13, 2024

this is a nice self-contained change. good work here :)

I think we need more tests, as I foresee a lot of edge cases, and it would be good to catch those before the feature is in gspread.

I thought about it and I thought:
we need to put limits to what we want to handle here:
Do we want to handle empty cells in the middle of the table ?
Do we want to handle cases when the given table is not square ?

For example, what is the expectation for expanding this table from the top left?

1	1	-	1
1	1	1	1
-	-	1	-
1	1	1	1

I would expect something like:

  • assuming the first row in the above example is data and part of the table, top left 1 is A1 coordinate:

| 1 | 1 |
| 1 | 1 |

  • assuming the first row are headers we don't take them:

| 1 | 1 |

This is not clear to me. Also the internal (private) utils expanding functions are unclear what they do.

As well, we should have a utils test and a worksheet test. Currently there is only utils

so far the method in Worksheet class does nothing particular, but I'll add test just in case, so we have a real test with data we get.

@alifeee
Copy link
Collaborator

alifeee commented Jun 18, 2024

I thought about it and I thought:
we need to put limits to what we want to handle here:
Do we want to handle empty cells in the middle of the table ?
Do we want to handle cases when the given table is not square ?

I think a reasonable implementation is "the same as what happens when you CTRL+RIGHT and CTRL+DOWN", i.e., to stop just before empty cells.

As for whether that is what other people will expect... that's another question... if people think it should work a certain way, they may think there is a bug... which is why very clear docstrings are good

assuming the first row are headers we don't take them:
| 1 | 1 |

wait, huh? why are we ignoring the headers? the function should just return a List[List]] with all items in the table starting from the called cell?

I will look back at the code now :)

Copy link
Collaborator

@alifeee alifeee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are doing good work on this change :)

I have left some comments. I am a little worried that a new feature will result in something we do not consider, causing errors. But I think we will be ok now.

thanks!!!

gspread/utils.py Outdated Show resolved Hide resolved
gspread/utils.py Outdated Show resolved Hide resolved
gspread/utils.py Outdated Show resolved Hide resolved
gspread/utils.py Outdated Show resolved Hide resolved
gspread/worksheet.py Outdated Show resolved Hide resolved
gspread/worksheet.py Outdated Show resolved Hide resolved
gspread/worksheet.py Outdated Show resolved Hide resolved
@alifeee alifeee added this to the 6.2.0 milestone Jun 29, 2024
@alifeee
Copy link
Collaborator

alifeee commented Jun 29, 2024

I have added some new tests. In particular, they are "what should happen" tests. They are:

1 - missing header item

        values = [
            ["A1", "", "C1", ""],
            ["A2", "B2", "C2", ""],
            ["A3", "B3", "C3", ""],
            ["", "", "", ""],
        ]
        expected_table = [
            ["A1"],
            ["A2"],
            ["A3"],
        ]

2 - missing initial cell

        values = [
            ["", "B1", "C1", ""],
            ["A2", "B2", "C2", ""],
            ["A3", "B3", "C3", ""],
            ["", "", "", ""],
        ]
        expected_table = [
            ["", "B1", "C1"],
            ["A2", "B2", "C2"],
            ["A3", "B3", "C3"],
        ]

3 - missing first column item

        values = [
            ["A1", "B1", "C1", ""],
            ["", "B2", "C2", ""],
            ["A3", "B3", "C3", ""],
            ["", "", "", ""],
        ]
        expected_table = [
            ["A1", "B1", "C1"],
            ["", "B2", "C2"],
            ["A3", "B3", "C3"],
        ]

4 - missing last column item

        values = [
            ["A1", "B1", "C1", ""],
            ["A2", "B2", "", ""],
            ["A3", "B3", "C3", ""],
            ["", "", "", ""],
        ]
        expected_table = [
            ["A1", "B1", "C1"],
        ]

Conc.

Are these what you expect? Personally I might want case 4 to end up in the entire table, as I reckon I am more likely to have a full first column, and empty gaps in the final column.

However, if I use the CTRL+RIGHT and CTRL+DOWN shortcuts in most spreadsheet software, then the tables above are the same. So I am happy with the behaviour above.

I am ready to merge this 👍. However, we should also add it to the list of examples and documentation somewhere.

Copy link
Collaborator

@alifeee alifeee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

include in examples docs. otherwise approve :)

@lavigne958
Copy link
Collaborator Author

thank you for the extra tests ! ⭐

I am a bit confused with the tests and what I see when I try it online using keyboard controls 😞

  1. in this case, I agree with you, when I run the same data set with CTRL+RIGHT the cell jumps to C1... which is not expected at all.... I believe here we should keep our behavior it makes more sense to me according to our documentation and what I would expect from the function
  2. I disagree, the first cell is empty, then the array stops here, and returns an empty array 🙄 I checked online with keyboard controls and for some reasons it jumps to celle B1 🤦 then when you go down it goes to B3 (as expected)... I don't know what choice is the best here 😞
  3. I completely agree with it, as expected, like the documentation says: we can't scan for empty cells in the middle of the table.
  4. I completely agree with it, though online when I try, it get to C1, nice, then I use CTRL+DOWN then it jumps to C3 🤦 I believe our example is more logic and provides the expected result.

so for 1., 3. and 4. I agree with your tests and the result is what I would expect.
what is left is case 2., do you think if we start on a blank cell we should start looking right/down for the next cell if it has a value or look right/down for the first non empty cell and start from here ? 🤔

@alifeee
Copy link
Collaborator

alifeee commented Jul 5, 2024

I have made this sheet

https://docs.google.com/spreadsheets/d/1huMwgagFCGTMVBf5mONk_W74-qHZUnpASHuVHnU6Hsk/edit?usp=sharing

image

I think maybe we do not "do the same as CTRL+arrow keys does", as this is confusing. I think we find another rule to follow (perhaps "the same as what xlwings does")

@lavigne958
Copy link
Collaborator Author

Hey thanks for the working sample ! I completely agree with you some behaviors are completely unexpected 😞

To me we should go for what feels logical to us, which seems to be what is described in the xlwing library:

  • going right until first empty cell is found
  • going then down until first empty cell is found.
  • ignore empty starting cell (like you mentioned)
  • table direction should be: down then right as mention in their documentation here: expand parameters

does that sounds good to you ? 🤔
after all we can chose the implementation we like as well, regardless of what the Spreadsheet UI in a web browser does 😆

gspread/utils.py Outdated
Comment on lines 1024 to 1029
# this row is smaller than the others, just keep looking
if col >= len(values[rows]):
continue

if values[rows][col] == "":
return rows
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I noticed we have rows smaller than the starting row sometimes.
I assumed we should keep looking for rows bellow, but I believe this is wrong.

we should look down for any empty value "" OR any row smaller than starting row right ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to answer questions by writing tests. not sure exactly what you mean

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I mean is currently we can have a response from the API that looks like this:

A B C
1 v1 v2 v3
2 v4 v5
3 v6 v7 v8

and currently in the code we:

  1. start looking right from A1, until C1 (latest value in the row)
  2. then we start to look down.
  3. when we check C2, the row 2 returned by the API is smaller than the starting row 1
  4. currently we just ignore it and keep looking to the row below
  5. I believe that's wrong, we should stop.

So the code if col >= len(values[rows]): is wrong we should not continue but instead consider it as an empty cell.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. With my suggestion (look for full top row & left column), this should not matter(?)

@alifeee
Copy link
Collaborator

alifeee commented Jul 9, 2024

I believe my preference is to "find a table" by "complete row 1 and column 1"

so, to look at these h columns

h h h h
h - - -
h - - -
h - - -

...and if they are empty, terminate the table there.

I have updated the tests to do this (please have a look at the tests to see if you agree). What do you think?

@lavigne958
Copy link
Collaborator Author

I believe my preference is to "find a table" by "complete row 1 and column 1"

so, to look at these h columns

h h h h
h - - -
h - - -
h - - -
...and if they are empty, terminate the table there.

I have updated the tests to do this (please have a look at the tests to see if you agree). What do you think?

I agree with this, the only detail would be: look down first then right.
to follow what xlwings does. it's completely arbitrary but it makes us ISO with the xlwings behavior.

@lavigne958
Copy link
Collaborator Author

I did quite a big rework on the iteration functions.

all tests now pass, though the tests already here needed to be adjusted:

  • we now look down first
  • then right
  • so the expected result changes.

it works as expected, like xlwings.

@alifeee
Copy link
Collaborator

alifeee commented Aug 3, 2024

why does xlwings use the left column and bottom row?

it makes more sense to me that one would desire the top row?

@lavigne958
Copy link
Collaborator Author

I know, that's what I did first, but we agreed on making it ISO with xlwings.

I don't mind changing it, I just want us to agree on one implementation.

@alifeee
Copy link
Collaborator

alifeee commented Aug 3, 2024

I don't think that's how xlwings works. I think xlwings uses the top row and left column.

See here for their implementation: https://github.com/xlwings/xlwings/blob/a159cc9cb7ef5f168ad0d902934404bce4a76386/xlwings/expansion.py#L31-L52

(I can't parse it with my mind fully but it looks to use the top row, not the bottom)

see also this comment: xlwings/xlwings#557 (comment)

@lavigne958
Copy link
Collaborator Author

I don't think that's how xlwings works. I think xlwings uses the top row and left column.

See here for their implementation: https://github.com/xlwings/xlwings/blob/a159cc9cb7ef5f168ad0d902934404bce4a76386/xlwings/expansion.py#L31-L52

(I can't parse it with my mind fully but it looks to use the top row, not the bottom)

see also this comment: xlwings/xlwings#557 (comment)

you're right !
same I can't parse the code with my mind, but I understand the same thing and it makes sense. to start from the given cell, go right until empty cell, from starting cell again, go down until empty cell.

this provides the right and down boundaries then the rest is just to extract the matrix from here.

I'll update the code, expect potentially a big code change/refactoring then !

@lavigne958
Copy link
Collaborator Author

lavigne958 commented Aug 19, 2024

Update

  • reworked the way we look for table, look right from starting cell, then look down from starting cell
  • updated the tests suite to reflect the behavior
    • empty starting cell will return an empty list []
    • empty cell on the way right or down will stop the search
    • empty cell in the middle of the matrix will be ignored
    • in the assert functions in the test, iterate over the expected result and expect the values to match
      • I notice tests passing when it should not because we iterate over the result and it's empty so we don't compare anything
  • updated the doc a bit
  • added extra checks on the size of the matrix and the starting cell coordinates in order to raise Exception if starting cell is outside the range of the given matrix.

I hope this covers any wrong inputs, the expected behavior from all (most?) users.

I'll rebase my branch before merge

@alifeee
Copy link
Collaborator

alifeee commented Aug 21, 2024

this looks all good to me!

only final thing is that I would expect a table like

1 2 3
a 0.342 0.145 0.166
b 0.982 0.576 0.600
c 0.340 0.285 0.585

to expand properly, whereas it seems you suggest with

empty starting cell will return an empty list []

that it would return as an empty list

@lavigne958
Copy link
Collaborator Author

Yes to me this function will return the table starting from the given coordinates. The starting cell is part of the resulting table. so if the starting cell is indeed empty then the table will be empty.

So far that seems to be the last key point where we can't find seems to be best. Should we just put it as an option like "ignore empty starting cell" ? Of true then we follow your choice of false we follow mine. This way the user decides the behavior.

What do you think?

@alifeee
Copy link
Collaborator

alifeee commented Aug 30, 2024

apologies for the delay

while I think a special case for a blank row 0 column 0 would be nice (e.g., for a labelled adjacency matrix), it is probably not worth the complication (e.g., if the row 0 column 0 is empty, but also row 1 column 0 and row 0 column 1 are empty, it should return [] not [[""]])

Thus, I think we return [] as you say. If people want their table to be parsed, they should fill the first cell with e.g., "-", or only call the function on the table contents, and use get_row and get_column (e.g.,) to get the headings.

However, there should be a test to validate this behaviour.

After that, I think this is ready to merge!

@lavigne958
Copy link
Collaborator Author

I added a new commit with a test to validate the top left cell empty scenario.
In case we find a top left empty cell we return an empty list [].
Is this the test you referred to in you example ? if not let me know I'll add extra tests if need be.

I added an extra test too in the basic test for find_table to check when we reach the end of the list and the end of the matrix. just in case.

@alifeee
Copy link
Collaborator

alifeee commented Sep 10, 2024

this is good with me!

with the amount of work spent on a new feature... let's put reference to it in all the relevant places in the documentation :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cell.expand('table') feature requests
2 participants