Queue capacity problem in ProcessorListener of older Kubernetes Java client versions

Symptoms

The informer suddenly stopped receiving pod change events.
The log showed the following error:

INFO [2019/10/22 11:29:15.043][INFO][ReflectorRunnable:98] class io.kubernetes.client.models.V1Pod#Read timeout retry list and watch
INFO [2019/10/22 11:29:16.043][INFO][ReflectorRunnable:45] class io.kubernetes.client.models.V1Pod#Start listing and watching...
Exception in thread "Thread-10" java.lang.IllegalStateException: Queue full
	at java.util.AbstractQueue.add(AbstractQueue.java:98)
	at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:312)
	at io.kubernetes.client.informer.cache.ProcessorListener.add(ProcessorListener.java:75)
	at io.kubernetes.client.informer.cache.SharedProcessor.distribute(SharedProcessor.java:101)
	at io.kubernetes.client.informer.impl.DefaultSharedIndexInformer.handleDeltas(DefaultSharedIndexInformer.java:183)
	at io.kubernetes.client.informer.cache.DeltaFIFO.pop(DeltaFIFO.java:313)
	at io.kubernetes.client.informer.cache.Controller.processLoop(Controller.java:140)
	at io.kubernetes.client.informer.cache.Controller.run(Controller.java:107)
	at java.lang.Thread.run(Thread.java:748)

Investigation

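Tracing through the code shows that ProcessorListener creates its queue with a fixed default capacity of 1000. Each watched pod produces one notification, so when more than 1000 pods are being watched, a full relist (like the "retry list and watch" in the log above) can push more than 1000 notifications into the queue before the handler drains them, which triggers the exception above.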
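To see why a burst of more than 1000 notifications is fatal, the sketch below reproduces the failure with a plain ArrayBlockingQueue of the same capacity. It assumes the worst case where the consumer has not drained anything yet; the class name QueueFullDemo and the pod counts are made up for illustration, and the client library itself is not used.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueFullDemo {
    // Same value as DEFAULT_QUEUE_CAPACITY in the old ProcessorListener
    static final int CAPACITY = 1000;

    /** Pushes one notification per pod with add(); returns how many were accepted. */
    static int simulateRelist(int podCount) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(CAPACITY);
        int accepted = 0;
        try {
            for (int i = 0; i < podCount; i++) {
                // Like ProcessorListener.add: no consumer is draining the queue here
                queue.add("pod-" + i);
                accepted++;
            }
        } catch (IllegalStateException e) {
            // AbstractQueue.add reports "Queue full" once the capacity is exhausted
            System.out.println("Failed after " + accepted + " adds: " + e.getMessage());
        }
        return accepted;
    }

    public static void main(String[] args) {
        simulateRelist(1500); // hypothetical relist larger than the capacity: fails
        simulateRelist(800);  // fits without an exception
    }
}
```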

ProcessorListener

  private static final int DEFAULT_QUEUE_CAPACITY = 1000;

  public ProcessorListener(ResourceEventHandler<ApiType> handler, long resyncPeriod) {
    this.resyncPeriod = resyncPeriod;
    this.handler = handler;
    this.queue = new ArrayBlockingQueue<>(DEFAULT_QUEUE_CAPACITY);
......
  }

  public void add(Notification<ApiType> obj) {
    if (obj == null) {
      return;
    }
    this.queue.add(obj);
  }

Annoyingly, this constructor is the only place the class sets up its queue, which means there is no way to change this capacity.

AbstractQueue

    public boolean add(E e) {
        if (offer(e))
            return true;
        else
            throw new IllegalStateException("Queue full");
    }
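AbstractQueue.add is the throwing member of the BlockingQueue insertion family. For comparison, the sketch below shows how the alternatives behave on a full ArrayBlockingQueue: offer returns false, and offer with a timeout returns false after waiting. put would block until space frees up, so it is only described in a comment. The class and helper names are illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class InsertSemanticsDemo {
    /** Calls add() and returns "accepted" or the exception message. */
    static String tryAdd(BlockingQueue<Integer> queue, int value) {
        try {
            queue.add(value);
            return "accepted";
        } catch (IllegalStateException e) {
            return e.getMessage(); // "Queue full" for ArrayBlockingQueue
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1);
        System.out.println("add   -> " + tryAdd(queue, 1)); // fills the only slot
        System.out.println("add   -> " + tryAdd(queue, 2)); // throws internally, message returned

        // offer(): reports failure with a return value instead of an exception
        System.out.println("offer -> " + queue.offer(2));

        // offer(e, timeout, unit): waits up to the timeout, then gives up
        System.out.println("offer(timeout) -> " + queue.offer(2, 50, TimeUnit.MILLISECONDS));

        // put(): would block here until a consumer takes an element, so it is not called
    }
}
```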

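Because nothing catches the exception, it propagates out of the DefaultSharedIndexInformer controller thread and that thread terminates, so no further pod change events are ever received.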

DefaultSharedIndexInformer

  public DefaultSharedIndexInformer(
      Class<ApiType> apiTypeClass, ListerWatcher listerWatcher, long resyncPeriod) {
    this.resyncCheckPeriodMillis = resyncPeriod;
    this.defaultEventHandlerResyncPeriod = resyncPeriod;

    this.processor = new SharedProcessor<>();
    this.indexer = new Cache();

    DeltaFIFO<ApiType> fifo = new DeltaFIFO<ApiType>(Cache::metaNamespaceKeyFunc, this.indexer);

    this.controller =
        new Controller<ApiType, ApiListType>(
            apiTypeClass,
            fifo,
            listerWatcher,
            this::handleDeltas,
            processor::shouldResync,
            resyncCheckPeriodMillis);

    controllerThread = new Thread(controller::run);
  }
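The controller thread is a plain Thread wrapping controller::run, and the `Exception in thread "Thread-10"` line in the log shows the exception was not caught anywhere inside it. The sketch below mimics that shape with a hypothetical loop (not the real Controller.processLoop) that throws on one item: the exception escapes run(), the thread ends, and every later item is silently skipped.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class ThreadDeathDemo {
    /** Runs a loop thread that fails at index failAt; returns the items it processed. */
    static List<Integer> runUntilFailure(int failAt, int total) {
        List<Integer> processed = new CopyOnWriteArrayList<>();
        // Hypothetical stand-in for the controller loop: no try/catch around each item
        Thread controllerThread = new Thread(() -> {
            for (int i = 0; i < total; i++) {
                if (i == failAt) {
                    throw new IllegalStateException("Queue full");
                }
                processed.add(i);
            }
        });
        // Print a one-line message instead of the default stack trace
        controllerThread.setUncaughtExceptionHandler(
            (t, e) -> System.out.println("Exception in thread \"" + t.getName() + "\": " + e.getMessage()));
        controllerThread.start();
        try {
            controllerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        System.out.println("alive after failure: " + controllerThread.isAlive());
        return processed;
    }

    public static void main(String[] args) {
        // Items after the failing one are never processed
        System.out.println(runUntilFailure(3, 10));
    }
}
```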

The bug report and fix on GitHub:
https://github.com/kubernetes-client/java/issues/667
https://github.com/kubernetes-client/java/pull/669

The official fix replaces ArrayBlockingQueue with an unbounded LinkedBlockingQueue, and it has been merged into master.
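The effect of this change can be checked with the JDK alone: a LinkedBlockingQueue created with the no-arg constructor has a capacity of Integer.MAX_VALUE, so the same burst that overflows a 1000-slot ArrayBlockingQueue is accepted. The sketch below only compares the two queue types and does not use the client library; the burst size of 1500 is hypothetical. The trade-off is memory: if a handler falls permanently behind, the unbounded queue keeps growing instead of failing fast.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BoundedVsUnbounded {
    /** Adds n items with add(); returns true if all of them fit. */
    static boolean acceptsBurst(BlockingQueue<Integer> queue, int n) {
        try {
            for (int i = 0; i < n; i++) {
                queue.add(i);
            }
            return true;
        } catch (IllegalStateException e) { // "Queue full"
            return false;
        }
    }

    public static void main(String[] args) {
        int burst = 1500; // hypothetical relist size
        System.out.println("ArrayBlockingQueue(1000): "
            + acceptsBurst(new ArrayBlockingQueue<>(1000), burst));

        // No-arg constructor: capacity is Integer.MAX_VALUE
        LinkedBlockingQueue<Integer> unbounded = new LinkedBlockingQueue<>();
        System.out.println("LinkedBlockingQueue(): " + acceptsBurst(unbounded, burst));
        System.out.println("remaining capacity: " + unbounded.remainingCapacity());
    }
}
```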

Resolution

Upgrading the Kubernetes Java client to 6.0.1 resolved the problem.

# k8s 
