Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize ColumnNumberToName function performance, reduce about 50% memory usage and 50% time cost #1935

Merged
merged 7 commits into from
Jul 5, 2024

Conversation

zhayt
Copy link
Contributor

@zhayt zhayt commented Jul 2, 2024

PR Details

Optimizing memory usage by pre-allocation.

Description

  1. ColumnNumberToName Function:

    • The ColumnNumberToName function previously concatenated rows using the + operator, leading to significant memory allocation when called frequently.
    • The function now uses a pre-allocated slice to build the Excel column name. During construction, the converted number to letter is added to this slice. At the end, the slice of runes is converted to a string and returned.
  2. Sheet Pre-allocation:

    • Added an option to create a Sheet with pre-allocated rows.
    • The xlsxWorksheet structure now contains a field with a slice type. When calling the SetCell... function to add a row to the sheet, it in turn calls the prepareSheetXML function. This function adds an xlsxRow structure to the slice.
    • Previously, multiple calls to prepareSheetXML caused the slice to grow, allocating a lot of memory and performing unnecessary copying. The new approach pre-allocates the maximum number of rows, reducing memory allocations and copying overhead.

Related Issue

  • Resolves Issue #XXX

Motivation and Context

  • When working with a lot of data, some functions create many objects and trigger garbage collection (GC). The GC then loads the processor.
  • This optimization reduces memory allocations and the associated GC overhead, improving performance.

How Has This Been Tested

  • The changes have been tested in a development environment with datasets of varying sizes.
  • Benchmark tests were added to measure performance improvements in memory usage and processing time.
  • Existing unit tests were run to ensure no regressions were introduced.

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

Feel free to modify any part of this to better fit your needs!

zhayt added 3 commits July 2, 2024 11:40
Optimize memory allocation by pre-allocating enough space for the resulting string.
… creation sheet time

Optimize memory by using when SetCell is called many times, the slice grows and allocates a lot of memory each time it reaches its cap limit.
@xuri xuri added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jul 2, 2024
Copy link
Member

@xuri xuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR. I've left some comments.

lib.go Show resolved Hide resolved
sheet.go Outdated Show resolved Hide resolved
Copy link
Member

@xuri xuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we separate these two improvements into 2 PRs, and I will take a reconsider about the design of reallocate rows for worksheets.

@zhayt
Copy link
Contributor Author

zhayt commented Jul 5, 2024

Sure, I will separate these two changes into different PRs. Do I need to create two new PRs, or can I keep the changes related to the ColumnNumberToName function in this PR?

Thanks for reviewing!

@xuri
Copy link
Member

xuri commented Jul 5, 2024

You can keep the changes related to the ColumnNumberToName function in this PR. Thanks.

@zhayt
Copy link
Contributor Author

zhayt commented Jul 5, 2024

Done, I left in this PR only the changes related to the ColumnNumberToName function. Thanks.

Copy link

codecov bot commented Jul 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.30%. Comparing base (4e6457a) to head (d4e406e).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1935   +/-   ##
=======================================
  Coverage   99.30%   99.30%           
=======================================
  Files          32       32           
  Lines       24961    24965    +4     
=======================================
+ Hits        24787    24791    +4     
  Misses         93       93           
  Partials       81       81           
Flag Coverage Δ
unittests 99.30% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@xuri xuri changed the title 876 Optimize ColumnNumberToName function performance, reduce about 50% memory usage and 50% time cost Jul 5, 2024
@xuri xuri added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jul 5, 2024
Copy link
Member

@xuri xuri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for your contribution.

@xuri xuri merged commit b18b480 into qax-os:master Jul 5, 2024
34 checks passed
zhangyimingdatiancai pushed a commit to zhangyimingdatiancai/excelize that referenced this pull request Aug 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
Status: Performance
Development

Successfully merging this pull request may close these issues.

2 participants