fix: use CAS to fix the time-ring concurrency issue #3528

Open
wants to merge 1 commit into
base: master

Jerryzhengtao

Please answer some questions before submitting your issue. Thanks!

Which version of XXL-JOB are you using?

2.4.1

Expected behavior

Actual behavior

Steps to reproduce the behavior

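Problem description

The time-ring Map has a concurrency problem because its remove and get calls are not coordinated with each other.

Assume thread A adds a jobId to the map while thread B reads jobIds. For simplicity, assume both operate on the same ringSecond.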

pushTimeRing operation sequence    remove operation sequence
get                                remove
add                                list.addAll

Now suppose A has fetched the list and it is not null.

B then calls remove and obtains the same non-null list, and B's read happens before A's add:

Step    Thread A    Thread B
1       get
2                   remove
3                   list.addAll
4       add

The jobId added by A is then lost: A appended the id to the list, but the map no longer holds a reference to that list, so the next time B calls remove it cannot reach it.
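The interleaving in the table can be replayed deterministically on a single thread. The sketch below is only an illustration: the class name `LostJobDemo`, the variable names, and the sample job ids are hypothetical, and a plain `ConcurrentHashMap` stands in for the time-ring Map.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: replays steps 1-4 from the table in order.
class LostJobDemo {

    // Returns every job id the consumer (thread B) ends up seeing.
    static List<Integer> replay() {
        Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>();
        int ringSecond = 5;
        ringData.put(ringSecond, new ArrayList<>(Arrays.asList(1, 2)));

        // Step 1 (thread A): get returns the existing, non-null list.
        List<Integer> listSeenByA = ringData.get(ringSecond);

        // Step 2 (thread B): remove detaches that same list from the map.
        List<Integer> listSeenByB = ringData.remove(ringSecond);

        // Step 3 (thread B): list.addAll copies the current contents.
        List<Integer> consumed = new ArrayList<>();
        consumed.addAll(listSeenByB);

        // Step 4 (thread A): add goes into a list the map no longer references.
        listSeenByA.add(3);

        // Next tick: thread B removes again and finds nothing for this slot.
        List<Integer> nextTick = ringData.remove(ringSecond);
        if (nextTick != null) {
            consumed.addAll(nextTick);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // prints [1, 2]; job 3 never reaches the consumer
    }
}
```

Making the list itself thread-safe would not change this outcome, because the add in step 4 succeeds; the list is simply unreachable afterwards.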

Once there are more than 10,000 second-level jobs, jobs start to go missing. The probability is very low, but it is not zero.

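Solution

1. Tried a thread-safe list; it does not help.
2. Tried the approach from issue #2892; it does not work either.
Both approaches lower the probability of loss: with approach 1, job loss appeared at the 20,000-job level,
and with approach 2 at the 40,000-job level. Neither addresses the root cause: data is added to a list that the map no longer references.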

Final solution: replace the Map with an AtomicReferenceArray and use its atomic CAS-style operations.

The key code is as follows:

private static void pushTimeRing(int ringSecond, int jobId) {
    // Fetch the list and clear the slot in one atomic step, so that remove
    // cannot obtain the same list and operate on it concurrently.
    List<Integer> ringItemData = ringArr.getAndSet(ringSecond, null);
    if (ringItemData == null) {
        ringItemData = new ArrayList<>();
    }
    ringItemData.add(jobId);
    // The slot at this index was cleared above, so a plain set is enough.
    ringArr.set(ringSecond, ringItemData);
}


... (some code omitted)

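// Consumer thread:
for (int i = 0; i < 2; i++) {
    // Atomically read and clear the slot, so that put cannot obtain a list
    // that is about to be consumed and both sides never share one list.
    List<Integer> tmpData = ringArr.getAndSet((nowSecond + 60 - i) % 60, null);
    if (tmpData != null) {
        ringItemData.addAll(tmpData);
    }
}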
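For readers who want to try the two snippets together, here is a minimal self-contained sketch. Only `ringArr`, `pushTimeRing`, `ringSecond`, `jobId`, `nowSecond`, `tmpData` and `ringItemData` come from the PR; the wrapper class `TimeRingSketch`, the `drain` method and the `RING_SIZE` constant are hypothetical names added for illustration. It assumes a single producer thread: with several producers, two of them could each install their own list in the same slot and one list would be overwritten.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical wrapper around the two snippets from the PR.
class TimeRingSketch {
    private static final int RING_SIZE = 60;

    // One slot per second; an empty slot holds null.
    private final AtomicReferenceArray<List<Integer>> ringArr =
            new AtomicReferenceArray<>(RING_SIZE);

    // Producer side (assumed to be called from one thread only).
    void pushTimeRing(int ringSecond, int jobId) {
        // Take exclusive ownership of the slot's list by swapping in null.
        List<Integer> ringItemData = ringArr.getAndSet(ringSecond, null);
        if (ringItemData == null) {
            ringItemData = new ArrayList<>();
        }
        ringItemData.add(jobId);
        // Publish the list again; the volatile write makes the add visible.
        ringArr.set(ringSecond, ringItemData);
    }

    // Consumer side: collects the current second and the one before it.
    List<Integer> drain(int nowSecond) {
        List<Integer> ringItemData = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            int slot = (nowSecond + RING_SIZE - i) % RING_SIZE;
            List<Integer> tmpData = ringArr.getAndSet(slot, null);
            if (tmpData != null) {
                ringItemData.addAll(tmpData);
            }
        }
        return ringItemData;
    }
}
```

For example, after `pushTimeRing(10, 1)`, `pushTimeRing(9, 2)` and `pushTimeRing(11, 3)`, calling `drain(10)` returns `[1, 2]` and a following `drain(11)` returns `[3]`, since slot 10 has already been emptied.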

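In testing, no job loss was observed, and the approach is efficient: with 100,000 jobs in a single second, the total put time is on the order of milliseconds.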

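PS:

  1. The test environment was a local IDE, 16 GB RAM, i7-12700.
  2. The goal was to check for concurrency problems between job production (put) and consumption (remove), so no real jobTrigger was executed (think of jobTrigger as only writing a log line).
  3. The tested data volumes were 5,000, 10,000, 20,000, 40,000 and 100,000 jobs.
  4. Because of point 2, put throughput and concurrency in the local test are higher than they would be in practice. Real-world numbers will certainly be lower, though not by much. I will follow up with additional tests for this.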
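A check along the lines described in these notes could look roughly like the sketch below. It is not the author's test code: the class `RingLossCheck`, the `run` method, and the choice to sweep all 60 slots instead of following the wall clock are assumptions made to keep the example short. As in note 2, triggering is replaced by a counter, and there is one producer thread and one consumer thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical harness: pushes jobCount ids and counts how many are consumed.
class RingLossCheck {
    static final int RING_SIZE = 60;

    static int run(int jobCount) {
        AtomicReferenceArray<List<Integer>> ringArr = new AtomicReferenceArray<>(RING_SIZE);
        AtomicBoolean producerDone = new AtomicBoolean(false);
        int[] consumed = {0}; // written only by the consumer, read after join()

        Thread producer = new Thread(() -> {
            for (int jobId = 0; jobId < jobCount; jobId++) {
                int ringSecond = jobId % RING_SIZE;
                List<Integer> list = ringArr.getAndSet(ringSecond, null);
                if (list == null) {
                    list = new ArrayList<>();
                }
                list.add(jobId);
                ringArr.set(ringSecond, list);
            }
            producerDone.set(true);
        });

        Thread consumer = new Thread(() -> {
            boolean lastPass = false;
            while (true) {
                for (int s = 0; s < RING_SIZE; s++) {
                    List<Integer> tmp = ringArr.getAndSet(s, null);
                    if (tmp != null) {
                        consumed[0] += tmp.size(); // stand-in for jobTrigger
                    }
                }
                if (lastPass) {
                    break;
                }
                // Once the producer is seen as done, do one final full sweep.
                lastPass = producerDone.get();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return consumed[0];
    }

    public static void main(String[] args) {
        for (int n : new int[] {5_000, 10_000, 20_000, 40_000, 100_000}) {
            System.out.println(n + " pushed, " + run(n) + " consumed");
        }
    }
}
```

Any difference between the pushed and consumed counts would indicate a lost job. The sketch says nothing about timing, which depends on the machine and on what jobTrigger really does.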

Other information
