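How do I adjust the default memcpy threshold in an x86_64 environment?

Background

The default glibc memcpy threshold, x86_non_temporal_threshold, has a significant effect on memory bandwidth. You can tune this threshold to suit your workload and obtain better memory copy performance. This section describes how to adjust the default memcpy threshold and what the change affects.

Procedure

The glibc community recommends the following configuration: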

export GLIBC_TUNABLES=glibc.cpu.x86_non_temporal_threshold=$(($(getconf LEVEL3_CACHE_SIZE) * 3 / 4))
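
The arithmetic in the command above can also be reproduced in C, which may help when the value needs to be computed inside a launcher program. The following is a minimal sketch, not part of glibc: the function names suggested_nt_threshold and print_export_line are hypothetical, and it assumes that sysconf(_SC_LEVEL3_CACHE_SIZE), a glibc extension, may report 0 or a negative value on systems where the L3 size is not exposed.

```c
#include <stdio.h>
#include <unistd.h>

/* Return 3/4 of the given L3 size in bytes, matching the shell arithmetic.
   Returns 0 when the size is unknown (zero or negative); the caller should
   then leave the tunable unset. Hypothetical helper for illustration. */
long suggested_nt_threshold(long l3_bytes)
{
    if (l3_bytes <= 0)
        return 0;
    return l3_bytes * 3 / 4;
}

/* Print an export line for the current machine, or report that the L3 size
   is unavailable. Returns 0 on success and -1 otherwise. */
int print_export_line(void)
{
    long l3 = sysconf(_SC_LEVEL3_CACHE_SIZE);   /* glibc extension */
    long threshold = suggested_nt_threshold(l3);

    if (threshold == 0) {
        fprintf(stderr, "L3 cache size not reported; keeping the glibc default\n");
        return -1;
    }
    printf("export GLIBC_TUNABLES=glibc.cpu.x86_non_temporal_threshold=%ld\n",
           threshold);
    return 0;
}
```

For example, with a 32 MiB L3 cache (33554432 bytes), suggested_nt_threshold returns 25165824, that is, 24 MiB.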

Overview of the memcpy algorithm

In glibc-2.34, memcpy and memmove share the same logic. The glibc source code briefly describes the algorithm:

sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S

/* memmove/memcpy/mempcpy is implemented as:
   1. Use overlapping load and store to avoid branch.
   2. Load all sources into registers and store them together to avoid
      possible address overlap between source and destination.
   3. If size is 8 * VEC_SIZE or less, load all sources into registers
      and store them together.
   4. If address of destination > address of source, backward copy
      4 * VEC_SIZE at a time with unaligned load and aligned store.
      Load the first 4 * VEC and last VEC before the loop and store
      them after the loop to support overlapping addresses.
   5. Otherwise, forward copy 4 * VEC_SIZE at a time with unaligned
      load and aligned store.  Load the last 4 * VEC and first VEC
      before the loop and store them after the loop to support
      overlapping addresses.
   6. If size >= __x86_shared_non_temporal_threshold and there is no
      overlap between destination and source, use non-temporal store
      instead of aligned store.  */
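
Items 1 and 2 of the comment above can be illustrated at a small scale. The following is a minimal sketch in C, not the glibc implementation: the function copy_8_to_16 is hypothetical and uses two 8-byte chunks in place of vector registers. It copies any size from 8 to 16 bytes without branching on the exact size, and it loads both chunks before storing either, so it remains correct when the source and destination overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, where 8 <= n <= 16, using two 8-byte chunks that may
   overlap each other in the middle. Both chunks are loaded before either
   is stored, so overlapping src/dst ranges are handled correctly.
   Hypothetical helper for illustration. */
static void copy_8_to_16(void *dst, const void *src, size_t n)
{
    uint64_t head, tail;

    /* Load phase: first 8 bytes and last 8 bytes of the source. */
    memcpy(&head, src, 8);
    memcpy(&tail, (const unsigned char *)src + n - 8, 8);

    /* Store phase: the two stores overlap whenever n < 16. */
    memcpy(dst, &head, 8);
    memcpy((unsigned char *)dst + n - 8, &tail, 8);
}
```

The fixed-size memcpy calls stand in for single 8-byte load and store instructions; compilers typically lower them that way, although this is not guaranteed.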

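As item 6 states, when the copy size reaches the __x86_shared_non_temporal_threshold threshold and the source and destination do not overlap, non-temporal stores are used instead of aligned stores. The non-temporal store is the movntdq instruction, which bypasses the CPU L3 cache and writes directly to memory. On a cache miss, it skips the cache read and cache write that an aligned store would perform, which makes it well suited to copying large blocks of memory.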
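
To show where the threshold takes effect, the selection among items 3 to 6 of the source comment can be sketched as a small decision function. This is an illustration based only on the comment text, not a transcription of the assembly: the names copy_path, ranges_overlap, and choose_path are hypothetical, and the order of the checks is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Copy strategies named after items 3 to 6 of the glibc comment. */
enum copy_path {
    PATH_REGISTERS,     /* item 3: size <= 8 * VEC_SIZE            */
    PATH_BACKWARD,      /* item 4: dst > src, copy backward        */
    PATH_FORWARD,       /* item 5: otherwise, copy forward         */
    PATH_NON_TEMPORAL   /* item 6: large, non-overlapping copy     */
};

/* True when [dst, dst + n) and [src, src + n) share at least one byte. */
static bool ranges_overlap(uintptr_t dst, uintptr_t src, size_t n)
{
    return dst < src ? src - dst < n : dst - src < n;
}

/* Pick a strategy for copying n bytes. vec_size and nt_threshold are
   parameters here so that the sketch does not depend on any CPU. */
static enum copy_path choose_path(uintptr_t dst, uintptr_t src, size_t n,
                                  size_t vec_size, size_t nt_threshold)
{
    if (n <= 8 * vec_size)
        return PATH_REGISTERS;
    if (n >= nt_threshold && !ranges_overlap(dst, src, n))
        return PATH_NON_TEMPORAL;
    if (dst > src)
        return PATH_BACKWARD;
    return PATH_FORWARD;
}
```

Under these assumptions, raising nt_threshold keeps more mid-sized copies on the cached forward or backward paths, while lowering it sends them to the non-temporal path sooner.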
