finish hw04 #28

Open: 1 commit to be merged into base branch main



@luozhiya luozhiya commented Mar 8, 2022

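Teacher Peng, long time no see again. I have been extremely busy lately and only now found time to do the homework. Sorry for the delay.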


luozhiya commented Mar 8, 2022

Lecture 04 notes

Results

Before optimization

Initial energy: -13.414000
Final energy: -13.356842
Time elapsed: 1304 ms

After optimization

Initial energy: -13.414000
Final energy: -13.403915
Time elapsed: 81 ms

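Optimizations

SOA

Converted the data from AOS (array of structures) to SOA (structure of arrays). This gives up the object-oriented grouping of fields, but each field is now contiguous in memory, which makes it easier to load data in blocks.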
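The layout change described above can be sketched roughly as follows. The type names `StarAoS` and `StarsSoA`, the helper `to_soa`, and the particle count `N = 48` are assumptions made for illustration; the actual homework code may be organized differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical particle count; a multiple of 8 keeps later SIMD loops simple.
constexpr std::size_t N = 48;

// AoS: one object per star. Consecutive px values are 7 floats apart in memory.
struct StarAoS {
    float px, py, pz;
    float vx, vy, vz;
    float mass;
};

// SoA: one contiguous array per field, so 8 consecutive px values
// can be fetched with a single 256-bit load.
struct StarsSoA {
    std::array<float, N> px{}, py{}, pz{};
    std::array<float, N> vx{}, vy{}, vz{};
    std::array<float, N> mass{};
};

// Copy an AoS container (at most N elements) into the SoA layout.
inline StarsSoA to_soa(const std::vector<StarAoS>& stars) {
    assert(stars.size() <= N);
    StarsSoA out;
    for (std::size_t i = 0; i < stars.size(); ++i) {
        out.px[i] = stars[i].px;
        out.py[i] = stars[i].py;
        out.pz[i] = stars[i].pz;
        out.vx[i] = stars[i].vx;
        out.vy[i] = stars[i].vy;
        out.vz[i] = stars[i].vz;
        out.mass[i] = stars[i].mass;
    }
    return out;
}
```

With this layout, a loop over `px` touches only position data, so every byte pulled into the cache is used.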

Alignment

Data is aligned to 64 bytes, the cache line width. This is friendlier to the CPU cache and reduces the number of memory accesses.
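One way to request this alignment in C++ is `alignas`, sketched below. The struct name `AlignedStars`, the helper `is_aligned`, and `N = 48` are hypothetical, and the 64-byte cache line is an assumption that holds for most current x86 CPUs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t N = 48;           // hypothetical particle count
constexpr std::size_t kCacheLine = 64;  // assumed cache line size in bytes

// alignas(64) makes each array start on a cache line boundary, so a
// 32-byte AVX load at a multiple-of-8 index never straddles two cache lines.
struct AlignedStars {
    alignas(kCacheLine) float px[N];
    alignas(kCacheLine) float py[N];
    alignas(kCacheLine) float pz[N];
};

// Returns true if p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Aligned arrays also allow `_mm256_load_ps` in place of the unaligned `_mm256_loadu_ps`, though on recent CPUs the difference is often small.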

AVX256

Hand-written AVX intrinsics process 8 floats at a time.

  • _mm256_fmadd_ps is faster than writing the multiply and add separately
  • sqrt uses AVX's _mm256_sqrt_ps
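A minimal sketch of the two intrinsics named above, computing `sqrt(a * b + c)` on 8 floats per iteration. The function name `fma_sqrt` is hypothetical. The preprocessor guard and scalar fallback are additions so that the snippet still builds without AVX2/FMA flags; `__FMA__` is set by GCC and Clang, and the guard may need adjusting for MSVC.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Computes out[i] = sqrt(a[i] * b[i] + c[i]); n must be a multiple of 8.
inline void fma_sqrt(const float* a, const float* b, const float* c,
                     float* out, std::size_t n) {
    assert(n % 8 == 0);
#if defined(__AVX2__) && defined(__FMA__)
    for (std::size_t i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        // One fused instruction: va * vb + vc, rounded once.
        __m256 r = _mm256_fmadd_ps(va, vb, vc);
        // Eight square roots in one instruction.
        _mm256_storeu_ps(out + i, _mm256_sqrt_ps(r));
    }
#else
    // Scalar fallback when AVX2/FMA are not enabled at compile time.
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::sqrt(a[i] * b[i] + c[i]);
    }
#endif
}
```

Because the fused multiply-add rounds only once, results can differ in the last bits from a separate multiply and add.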

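Loop optimization

Move loop-invariant values out of the loop, or otherwise reduce how many times something is computed.

  • The product of G*dt and mass is a constant
  • Doing the reduce_sum from an AVX __m256 to a float in its own separate loop reduces how often it runs
  • With AVX, the loop can step by 8 per iteration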
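The three points above can be illustrated with a scaled dot product: the constant factor is computed once, the loop steps by 8, and the 8 lanes are summed only once after the loop. This is a sketch, not the homework's actual kernel; the function name `scaled_dot` and its parameters are hypothetical, and the scalar fallback is an addition for builds without AVX2/FMA.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__AVX2__) && defined(__FMA__)
#include <immintrin.h>
#endif

// Returns (G * dt) * sum(x[i] * y[i]); n must be a multiple of 8.
inline float scaled_dot(const float* x, const float* y, std::size_t n,
                        float G, float dt) {
    assert(n % 8 == 0);
    const float gdt = G * dt;  // loop-invariant: computed once, not per element
    float lanes[8] = {};
#if defined(__AVX2__) && defined(__FMA__)
    __m256 acc = _mm256_setzero_ps();
    for (std::size_t i = 0; i < n; i += 8) {  // step by 8 floats
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(x + i),
                              _mm256_loadu_ps(y + i), acc);
    }
    _mm256_storeu_ps(lanes, acc);
#else
    // Scalar fallback that keeps the same 8 partial sums.
    for (std::size_t i = 0; i < n; i += 8) {
        for (std::size_t k = 0; k < 8; ++k) {
            lanes[k] += x[i + k] * y[i + k];
        }
    }
#endif
    // Horizontal reduction happens once, after the main loop.
    float sum = 0.0f;
    for (std::size_t k = 0; k < 8; ++k) {
        sum += lanes[k];
    }
    return gdt * sum;
}
```

Keeping the accumulator in vector form until the end avoids a costly horizontal add on every iteration.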

Trading space for time

Intermediate results are stored in global variables, which reduces the amount of recomputation.
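As one possible reading of this, the sketch below caches `G * dt * mass[j]` in a global table that is filled once and then reused in every step. The names `g_gdt_mass`, `precompute_gdt_mass`, and `velocity_delta`, as well as the values of `N`, `G`, and `dt`, are all hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 48;  // hypothetical particle count
constexpr float G = 0.001f;    // hypothetical gravitational constant
constexpr float dt = 0.01f;    // hypothetical time step

// Global cache of G * dt * mass[j]; costs N floats of extra memory.
static std::array<float, N> g_gdt_mass{};

// Fill the cache once, before the simulation loop starts.
inline void precompute_gdt_mass(const std::array<float, N>& mass) {
    for (std::size_t j = 0; j < N; ++j) {
        g_gdt_mass[j] = G * dt * mass[j];
    }
}

// Velocity change along one axis caused by star j, given the coordinate
// difference d and the precomputed 1 / r^3. Reads the cached product
// instead of multiplying G, dt, and mass again.
inline float velocity_delta(std::size_t j, float d, float inv_r3) {
    assert(j < N);
    return d * g_gdt_mass[j] * inv_r3;
}
```

The table only needs refreshing if the masses or the time step change, which in a fixed-step simulation is never.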

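CMake configuration

MSVC and G++ each get their own settings. This exercise was only tested in Release, so the compile options are added to the Release configuration only.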

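if (CMAKE_COMPILER_IS_GNUCXX)
    # Quote the generator expression and separate options with semicolons;
    # unquoted spaces would split it into several malformed arguments.
    target_compile_options(main PRIVATE "$<$<CONFIG:Release>:-march=native;-funroll-loops;-O3>")
endif()
if (MSVC)
    target_compile_options(main PRIVATE "$<$<CONFIG:Release>:/arch:AVX2;/fp:fast>")
endif()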
