```python
def _get_valid_value(
    self,
    value: Union['torch.Tensor', np.ndarray, np.number, int, float],
) -> Union[int, float]:
    """Convert value to python built-in type.

    Args:
        value (torch.Tensor or np.ndarray or np.number or int or float):
            value of log.

    Returns:
        float or int: python built-in type value.
    """
    # Timing instrumentation added for this report, to measure the cost
    # of a single call.
    import time
    s = time.time()
    if isinstance(value, (np.ndarray, np.number)):
        assert value.size == 1
        value = value.item()
    elif isinstance(value, (int, float)):
        value = value
    else:
        # check whether value is torch.Tensor but don't want
        # to import torch in this file
        assert hasattr(value, 'numel') and value.numel() == 1
        value = value.item()
    print(f"get_valid_value use {time.time() - s}")
    return value  # type: ignore
```
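For reference, the branching logic above can be exercised standalone. The sketch below re-implements it outside the class (the free-function name `get_valid_value` is ours); numpy values and plain numbers cover the first two branches, and the torch.Tensor branch is reached through the same duck-typed `numel()`/`item()` check, so no torch import is needed:

```python
import numpy as np

def get_valid_value(value):
    """Standalone re-implementation of the conversion logic above."""
    if isinstance(value, (np.ndarray, np.number)):
        assert value.size == 1
        return value.item()
    if isinstance(value, (int, float)):
        return value
    # Duck-typed check for torch.Tensor without importing torch:
    # any object exposing numel() and item() is accepted here.
    assert hasattr(value, 'numel') and value.numel() == 1
    return value.item()

print(get_valid_value(np.float32(0.5)))  # numpy scalar -> 0.5
print(get_valid_value(np.array([3])))    # 1-element array -> 3
print(get_valid_value(7))                # built-in passes through -> 7
```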
Prerequisite
Environment
sys.platform: win32
Python: 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:24:38) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0: NVIDIA GeForce RTX 3070
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
NVCC: Cuda compilation tools, release 11.6, V11.6.55
MSVC: Microsoft (R) C/C++ Optimizing Compiler Version 19.38.33130 for x64
GCC: n/a
PyTorch: 1.13.1+cu116
PyTorch compiling details: PyTorch built with: C++ Version 199711; MSVC 192829337; Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications; Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815); OpenMP 2019; LAPACK is enabled (usually provided by MKL); CPU capability usage: AVX2; arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37; CuDNN 8.3.2 (built against CUDA 11.5); Magma 2.5.4; Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.6, CUDNN_VERSION=8.3.2, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -IC:/actions-runner/_work/pytorch/pytorch/builder/windows/mkl/include -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.13.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF
TorchVision: 0.14.1+cu116
OpenCV: 4.9.0
MMEngine: 0.10.3
Reproduces the problem - code sample
In mmengine's logging/message_hub.py, the _get_valid_value function has a performance problem when it is called by the after_train_iter method of running_info_hook and uses torch.Tensor.item() for type conversion. My tests show that the first call to this function takes a significant amount of time, while subsequent calls take essentially zero time; because after_train_iter is invoked frequently in the training loop, the overall time cost rises severely.

Steps to reproduce:
1. Run a training loop in which after_train_iter is called.
2. Record the time spent on the torch.Tensor.item() call inside the _get_valid_value function.

The timing output is as follows:
get_valid_value use 0.02899909019470215
get_valid_value use 0.0
get_valid_value use 0.0
(Unit: seconds)
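A cost profile like the one above, where only the first call is expensive and later calls are essentially free, typically points to one-time lazy initialization hidden behind the first `.item()` (for a CUDA tensor this plausibly includes the first device-to-host synchronization, though that is a hypothesis, not something the log proves). The measurement pattern itself can be reproduced without torch by timing a stand-in that pays a simulated one-time setup cost; `item_like` and the 0.03 s sleep below are ours, chosen to mirror the ~0.029 s first call:

```python
import time

_initialized = False

def item_like():
    """Stand-in for tensor.item(): pays a one-time setup cost on its first
    call, mimicking lazy initialization behind the real call."""
    global _initialized
    if not _initialized:
        time.sleep(0.03)  # simulated one-time cost (~the 0.029 s observed)
        _initialized = True
    return 1.0

elapsed = []
for _ in range(3):
    s = time.perf_counter()
    item_like()
    elapsed.append(time.perf_counter() - s)
    print(f"get_valid_value use {elapsed[-1]}")
```

Only the first iteration includes the setup cost, reproducing the "slow first call, near-zero afterwards" shape of the log.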
Expected behavior
I expect the torch.Tensor.item() call not to introduce such a significant delay on its first invocation.
Reproduces the problem - command or script
No comment
Reproduces the problem - error message
No comment
Additional information
To resolve this performance issue, I suggest considering a change to the parse_losses function in base_model so that it performs the type conversion up front, converting loss values, accuracies, and similar quantities into scalars, thereby avoiding the expensive torch.Tensor.item() calls in _get_valid_value. Here is an example of one possible solution: