文章

startForegroundService() did not then call startForeground() Crash崩溃排查记录

首先看堆栈:

1
2
3
4
5
6
7
8
9
10
11
android.app.RemoteServiceException

Context.startForegroundService() did not then call Service.startForeground(): ServiceRecord{413a500 u0 com.xx.xx/com.xx.baseUi.service.BackgroundKeepService}

android.app.ActivityThread$H.handleMessage(ActivityThread.java:1796)
android.os.Handler.dispatchMessage(Handler.java:106)
android.os.Looper.loop(Looper.java:213)
android.app.ActivityThread.main(ActivityThread.java:6954)
java.lang.reflect.Method.invoke(Native Method)
com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
com.android.internal.os.ZygoteInit.main(ZygoteInit.java:909)

大概是说 startForegroundService 但是没有调用 Service.startForeground()。 回到业务代码里一看,BackgroundKeepService 的 onCreate 方法里确实已经调用了。这一下就很迷惑了…

思来想去都觉得这不会是AOSP的一个Bug吧… 有疑问,看源码!

在Framework的 ActiveService 里面有两个地方会抛出 Context.startForegroundService() did not then call Service.startForeground() 异常,分别是:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
   void serviceForegroundTimeout(ServiceRecord r) {
        ProcessRecord app;
        synchronized (mAm) {
            if (!r.fgRequired || !r.fgWaiting || r.destroying) {
                return;
            }

            app = r.app;
            if (app != null && app.isDebugging()) {
                // The app's being debugged; let it ride
                return;
            }

            if (DEBUG_BACKGROUND_CHECK) {
                Slog.i(TAG, "Service foreground-required timeout for " + r);
            }
            r.fgWaiting = false;
            stopServiceLocked(r, false);
        }

        if (app != null) {
            final String annotation = "Context.startForegroundService() did not then call "
                    + "Service.startForeground(): " + r;
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_ANR_MSG);
            SomeArgs args = SomeArgs.obtain();
            args.arg1 = app;
            args.arg2 = annotation;
            msg.obj = args;
            mAm.mHandler.sendMessageDelayed(msg,
                    mAm.mConstants.mServiceStartForegroundAnrDelayMs);
        }
    }

1
2
3
4
5
6
7
8
9
 void serviceForegroundCrash(ProcessRecord app, String serviceRecord,
            ComponentName service) {
        mAm.crashApplicationWithTypeWithExtras(
                app.uid, app.getPid(), app.info.packageName, app.userId,
                "Context.startForegroundService() did not then call " + "Service.startForeground(): "
                    + serviceRecord, false /*force*/,
                ForegroundServiceDidNotStartInTimeException.TYPE_ID,
                ForegroundServiceDidNotStartInTimeException.createExtrasForService(service));
    }

由于第一个抛出的是ANR不是Crash,不是我们的目标,那么就重点分析 serviceForegroundCrash 方法。定位到调用的地方 ActivityManagerService 中有一段这样的代码:

1
2
3
4
5
6
7
8
         case SERVICE_FOREGROUND_CRASH_MSG: {
                SomeArgs args = (SomeArgs) msg.obj;
                mServices.serviceForegroundCrash(
                        (ProcessRecord) args.arg1,
                        (String) args.arg2,
                        (ComponentName) args.arg3);
                args.recycle();
            } break;

Crash并且符合堆栈抛出的特征。查找调用的地方在 ActiveService 的 bringDownServiceLocked 方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
private void bringDownServiceLocked(ServiceRecord r, boolean enqueueOomAdj) {
           //....
            mAm.mAppOpsService.finishOperation(AppOpsManager.getToken(mAm.mAppOpsService),
                    AppOpsManager.OP_START_FOREGROUND, r.appInfo.uid, r.packageName, null);
            mAm.mHandler.removeMessages(
                    ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_MSG, r);
            if (r.app != null) {
                Message msg = mAm.mHandler.obtainMessage(
                        ActivityManagerService.SERVICE_FOREGROUND_CRASH_MSG);
                SomeArgs args = SomeArgs.obtain();
                args.arg1 = r.app;
                args.arg2 = r.toString();
                args.arg3 = r.getComponentName();

                msg.obj = args;
                mAm.mHandler.sendMessage(msg);
            }
        }

       //...
    }

继续分析,在 ActiveService 里共有9处调用这个方法的,排除掉应用启动、杀进程等场景的调用,其中 private String bringUpServiceLocked() 中两处调用,private final void bringDownServiceIfNeededLocked() 中一处调用。

OK,问题简化。进一步分析 private String bringUpServiceLocked() 中两处和进程启动和权限相关,那么可以确定不是这个问题。 问题再简化,针对 private final void bringDownServiceIfNeededLocked() 分析即可。 ActiveService 里共有三处调用的地方,分别是:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
  private void stopServiceLocked(ServiceRecord service, boolean enqueueOomAdj) {
        //...

        bringDownServiceIfNeededLocked(service, false, false, enqueueOomAdj); //这里
    }
    
        boolean stopServiceTokenLocked(ComponentName className, IBinder token,
            int startId) {
        if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "stopServiceToken: " + className
                + " " + token + " startId=" + startId);
        ServiceRecord r = findServiceLocked(className, token, UserHandle.getCallingUserId());
        if (r != null) {
            //...
            r.callStart = false;
            final long origId = Binder.clearCallingIdentity();
            bringDownServiceIfNeededLocked(r, false, false, false); //这里
            Binder.restoreCallingIdentity(origId);
            return true;
        }
        return false;
    }
    
     void removeConnectionLocked(ConnectionRecord c, ProcessRecord skipApp,
            ActivityServiceConnectionsHolder skipAct, boolean enqueueOomAdj) {
                //...
                bringDownServiceIfNeededLocked(s, true, hasAutoCreate, enqueueOomAdj); //这里
            }
        }
    }
    

最后一个是connection相关的,由于这个Service并没有Binder connection排除。再此分析下来竟然和stopService有关系。啊!这…

先来把代码里stopService的代码注掉,跑一遍Monkey,果然没事了。

那么究竟为什么stopService会导致这个Crash?让我们先追踪一下stopService的源码。先看 ActivityManagerService 的 stopService()

1
2
3
4
5
6
7
8
9
10
11
12
13
   @Override
    public int stopService(IApplicationThread caller, Intent service,
            String resolvedType, int userId) {
        enforceNotIsolatedCaller("stopService");
        // Refuse possible leaked file descriptors
        if (service != null && service.hasFileDescriptors() == true) {
            throw new IllegalArgumentException("File descriptors passed in Intent");
        }

        synchronized(this) {
            return mServices.stopServiceLocked(caller, service, resolvedType, userId);
        }
    }

再追踪 mServices.stopServiceLocked() 到 ActiveServices的 int stopServiceLocked(IApplicationThread caller, Intent service,String resolvedType, int userId) 方法:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
    int stopServiceLocked(IApplicationThread caller, Intent service,
            String resolvedType, int userId) {
        if (DEBUG_SERVICE) Slog.v(TAG_SERVICE, "stopService: " + service
                + " type=" + resolvedType);

        final ProcessRecord callerApp = mAm.getRecordForAppLOSP(caller);
        if (caller != null && callerApp == null) {
            throw new SecurityException(
                    "Unable to find app for caller " + caller
                    + " (pid=" + Binder.getCallingPid()
                    + ") when stopping service " + service);
        }

        // If this service is active, make sure it is stopped.
        ServiceLookupResult r = retrieveServiceLocked(service, null, resolvedType, null,
                Binder.getCallingPid(), Binder.getCallingUid(), userId, false, false, false, false);
        if (r != null) {
            if (r.record != null) {
                final long origId = Binder.clearCallingIdentity();
                try {
                    stopServiceLocked(r.record, false);//这里
                } finally {
                    Binder.restoreCallingIdentity(origId);
                }
                return 1;
            }
            return -1;
        }

        return 0;
    }

到这里,就可以确定和上面的 private void stopServiceLocked(ServiceRecord service, boolean enqueueOomAdj) 对上了。细看这里面的逻辑:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
        // Check to see if the service had been started as foreground, but being
        // brought down before actually showing a notification.  That is not allowed.
        if (r.fgRequired) {
            Slog.w(TAG_SERVICE, "Bringing down service while still waiting for start foreground: "
                    + r);
            r.fgRequired = false;
            r.fgWaiting = false;
            synchronized (mAm.mProcessStats.mLock) {
                ServiceState stracker = r.getTracker();
                if (stracker != null) {
                    stracker.setForeground(false, mAm.mProcessStats.getMemFactorLocked(),
                            SystemClock.uptimeMillis());
                }
            }
            mAm.mAppOpsService.finishOperation(AppOpsManager.getToken(mAm.mAppOpsService),
                    AppOpsManager.OP_START_FOREGROUND, r.appInfo.uid, r.packageName, null);
            mAm.mHandler.removeMessages(
                    ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_MSG, r);
            if (r.app != null) {
                Message msg = mAm.mHandler.obtainMessage(
                        ActivityManagerService.SERVICE_FOREGROUND_CRASH_MSG);
                SomeArgs args = SomeArgs.obtain();
                args.arg1 = r.app;
                args.arg2 = r.toString();
                args.arg3 = r.getComponentName();

                msg.obj = args;
                mAm.mHandler.sendMessage(msg);
            }
        }

如果r.fgRequired 是true,那么就会出现这个crash。 所以 r.fgRequired 是什么逻辑?在此之前,先看一下 Service 的 startForeground() 到底干了啥?老规矩,看代码。通过ActivityManagerService最终跳到我们的老朋友 ActiveService 中:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
    @GuardedBy("mAm")
    public void setServiceForegroundLocked(ComponentName className, IBinder token,
            int id, Notification notification, int flags, int foregroundServiceType) {
        final int userId = UserHandle.getCallingUserId();
        final long origId = Binder.clearCallingIdentity();
        try {
            ServiceRecord r = findServiceLocked(className, token, userId);
            if (r != null) {
                setServiceForegroundInnerLocked(r, id, notification, flags, foregroundServiceType);
            }
        } finally {
            Binder.restoreCallingIdentity(origId);
        }
    }

继续 setServiceForegroundInnerLocked:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
@GuardedBy("mAm")
    private void setServiceForegroundInnerLocked(final ServiceRecord r, int id,
            Notification notification, int flags, int foregroundServiceType) {
//...
      if (r.fgRequired) {
                if (DEBUG_SERVICE || DEBUG_BACKGROUND_CHECK) {
                    Slog.i(TAG, "Service called startForeground() as required: " + r);
                }
                r.fgRequired = false;
                r.fgWaiting = false;
                alreadyStartedOp = stopProcStatsOp = true;
                mAm.mHandler.removeMessages(
                        ActivityManagerService.SERVICE_FOREGROUND_TIMEOUT_MSG, r);
            }
//...
}

也就是 r.fgRequired = false; 这个代码没有执行,进而导致后面的Crash。可是为什么Service的onCreate()方法没有执行呢?这和Service的stop又有什么关系?难道说,我们的Servcie还没有执行onCreate方法就被stop了?思来想去也就只有这一个选项了!

由于Servcie调用onCreate方法是跨进程调用的,确实存在这种可能。写个代码验证下,在startForegroundService后立刻调用stopService,果然预期中的crash出现了:

1
2
3
4
5
6
7
8
9
10
11
12
android.app.RemoteServiceException$ForegroundServiceDidNotStartInTimeException: Context.startForegroundService() did not then call Service.startForeground(): ServiceRecord{1e9c4ea u0 com.example.test/.MyService}
    at android.app.ActivityThread.generateForegroundServiceDidNotStartInTimeException(ActivityThread.java:2042)
    at android.app.ActivityThread.throwRemoteServiceException(ActivityThread.java:2013)
    at android.app.ActivityThread.-$$Nest$mthrowRemoteServiceException(Unknown Source:0)
    at android.app.ActivityThread$H.handleMessage(ActivityThread.java:2282)
    at android.os.Handler.dispatchMessage(Handler.java:106)
    at android.os.Looper.loopOnce(Looper.java:201)
    at android.os.Looper.loop(Looper.java:288)
    at android.app.ActivityThread.main(ActivityThread.java:8049)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:548)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:942)

对应代码:

1
2
 startForegroundService(Intent(this@MainActivity,MyService::class.java))
 stopService(Intent(this@MainActivity,MyService::class.java))

为什么会出现先后调用这两个方法呢?原来App做了保活,应用退到后台前会启动一个前台Service,防止应用在后台被kill。这个前后台监听是通过Application的ActivityLifecycleCallback实现的。也就是说可见页面小于0,就会startForegroundService,大于0就会stopService。问题是应用换皮肤的方案就是通过重建应用完成的,那么在系统切换黑/白模式的时候,应用会重建,相当于经历了一次快速前后台切换,导致Crash。

那么如何解决呢?要通过Service的stopSelf()进行Service销毁。具体代码:

1
2
3
4
5
6
7
8
9
10
11
12
13
    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        when (intent?.action) {
            ACTION_UPDATE_INFO -> {
              
            }

            ACTION_STOP_SERVICE -> {
                stopSelf()
            }
        }

        return START_NOT_STICKY
    }

调用的地方:

1
2
3
val intent = Intent(this, BackgroundKeepService::class.java)
intent.action = BackgroundKeepService.ACTION_STOP_SERVICE
startService(intent)

至此,完成了Crash的分析、复现和验证。线上也没有相关Crash。

总结:这是AOSP对startForeground的一种限制,如果业务逻辑无法绕开短时间内启动暂停Service,上文提到的解决方案可以规避Crash问题。

转载自:https://www.jianshu.com/p/d974a7ff6cf8

本文由作者按照 CC BY 4.0 进行授权