BLE. discoverAllCharacteristics() hangs

Yes, it seems to be an intentional mechanism to preserve battery or prevent an unauthenticated device staying connected longer than it needed to carry out the authentication. Both the smart locks I am working on have this same mechanism. The other one the characteristic discovery is basically instant though.

Here’s a slight modification of my program. The lock disconnects the master device after 30 seconds:

app-connect-and-wait.ino
#define BLE_WIRING_DEBUG_ENABLED 1
#include "Particle.h"

SYSTEM_MODE(MANUAL);

SerialLogHandler logHandler(LOG_LEVEL_ALL);
BleScanResult targetScanResult;
bool found = false;
auto targetDeviceMAC = "EC:F3:FA:F0:CF:DD";

void scanResultCallback(const BleScanResult* scanResult, void *context) {
	Log.info("Scan result %s", scanResult->address.toString().c_str());
	if (scanResult->address.toString() == "EC:F3:FA:F0:CF:DD") {
		targetScanResult = *scanResult;  // Copy
		found = true;
		BLE.stopScanning();
	}
}

void setup() {
	// Allow time for serial monitor to connect
	delay(700);

	while (!found) {
		Log.info("Starting scan looking for %s", targetDeviceMAC);
		BLE.scan(scanResultCallback, NULL);
	}

	BlePeerDevice device;
	while (!BLE.connected()) {
		Log.info("Connecting");
		int connectedDuration = 0;
		device = BLE.connect(targetScanResult.address, false);
		if (!device.connected())
			continue;

		Log.info("Device connected");
		delay(500);
		connectedDuration += 500;

		Log.info("Discover services...");
		auto services = device.discoverAllServices();
		Log.info("%d services discovered:", services.size());
		for (auto &&s : services) {
			Log.info(s.UUID().toString().c_str());
		}
		if (!device.connected())  // Incase remote disconnects us
			continue;

		BleService mainService;
		bool gotMainService = device.getServiceByUUID(mainService, BleUuid("FFF0"));
		Log.info("Got service? %s. UUID = %s", gotMainService ? "True" : "False", mainService.UUID().toString().c_str());
		if (!BLE.connected() || !gotMainService)
			continue;

		while (BLE.connected()) {
			Log.info("Connected for approx %dms", connectedDuration);
			delay(500);
			connectedDuration += 500;
		}
		Log.info("No longer connected");

		Log.info("Exiting");
		return;
	}

}

void loop() {
	delay(1000);
}
app-connect-and-wait.ino Output
Serial monitor opened successfully:
0000000657 [sys.power] TRACE: Battery state CHARGED -> DISCONNECTED
0000000842 [app] INFO: Starting scan looking for EC:F3:FA:F0:CF:DD
0000000843 [hal.ble] TRACE: hal_ble_stack_init().
0000000843 [hal.ble] TRACE: hal_ble_set_callback_on_periph_link_events().
0000000843 [hal.ble] TRACE: hal_ble_gap_start_scan().
0000000844 [hal.ble] TRACE: | interval(ms)   window(ms)   timeout(ms) |
0000000844 [hal.ble] TRACE:   160*0.625        80*0.625      5000
0000000968 [app] INFO: Scan result 5F:07:F1:36:C8:C4
0000001057 [app] INFO: Scan result F7:66:FC:2F:9F:FE
0000001090 [app] INFO: Scan result EC:F3:FA:F0:CF:DD
0000001090 [hal.ble] TRACE: hal_ble_gap_stop_scan().
0000001091 [app] INFO: Connecting
0000001091 [hal.ble] TRACE: hal_ble_gap_connect().
0000001596 [hal.ble] TRACE: BLE GAP event: connected.
0000001596 [hal.ble] TRACE: BLE role: 2, connection handle: 0      
0000001597 [hal.ble] TRACE: | interval(ms)  latency  timeout(ms) | 
0000001597 [hal.ble] TRACE:   36*1.25          0       500*10      
0000001598 [hal.ble] TRACE: Attempts to exchange ATT_MTU if needed.
0000001598 [wiring.ble] TRACE: New peripheral is connected.        
0000001599 [app] INFO: Device connected
0000001642 [hal.ble] TRACE: BLE GATT Server event: exchange ATT MTU request: 83, desired: 247
0000001643 [hal.ble] TRACE: Effective ATT MTU: 83.
0000002099 [app] INFO: Discover services...
0000002099 [wiring.ble] TRACE: Start discovering services.
0000002100 [hal.ble] TRACE: hal_ble_gatt_client_discover_all_services().
0000002183 [hal.ble] TRACE: BLE GATTC event: services discovered.
0000002272 [hal.ble] TRACE: BLE GATTC event: services discovered.
0000002407 [hal.ble] TRACE: BLE GATTC event: services discovered.
0000002408 [app] INFO: 4 services discovered:
0000002408 [app] INFO: 1800
0000002409 [app] INFO: 1801
0000002409 [app] INFO: 180A
0000002409 [app] INFO: FFF0
0000002410 [app] INFO: Got service? True. UUID = FFF0
0000002410 [app] INFO: Connected for approx 500ms
0000002910 [app] INFO: Connected for approx 1000ms
0000003411 [app] INFO: Connected for approx 1500ms
0000003911 [app] INFO: Connected for approx 2000ms
0000004411 [app] INFO: Connected for approx 2500ms
0000004912 [app] INFO: Connected for approx 3000ms
0000005412 [app] INFO: Connected for approx 3500ms
0000005602 [hal.ble] TRACE: BLE GAP event: connection parameter update request.
0000005912 [app] INFO: Connected for approx 4000ms
0000005917 [hal.ble] TRACE: BLE GAP event: connection parameters updated.
0000005918 [hal.ble] TRACE: | interval(ms)  latency  timeout(ms) |
0000005918 [hal.ble] TRACE:   8*1.25          0       300*10
0000006413 [app] INFO: Connected for approx 4500ms
0000006913 [app] INFO: Connected for approx 5000ms
0000007413 [app] INFO: Connected for approx 5500ms
0000007913 [app] INFO: Connected for approx 6000ms
0000008414 [app] INFO: Connected for approx 6500ms
0000008914 [app] INFO: Connected for approx 7000ms
0000009415 [app] INFO: Connected for approx 7500ms
0000009915 [app] INFO: Connected for approx 8000ms
0000010415 [app] INFO: Connected for approx 8500ms
0000010915 [app] INFO: Connected for approx 9000ms
0000011416 [app] INFO: Connected for approx 9500ms
0000011916 [app] INFO: Connected for approx 10000ms
0000012416 [app] INFO: Connected for approx 10500ms
0000012917 [app] INFO: Connected for approx 11000ms
0000013417 [app] INFO: Connected for approx 11500ms
0000013917 [app] INFO: Connected for approx 12000ms
0000014418 [app] INFO: Connected for approx 12500ms
0000014918 [app] INFO: Connected for approx 13000ms
0000015418 [app] INFO: Connected for approx 13500ms
0000015919 [app] INFO: Connected for approx 14000ms
0000016419 [app] INFO: Connected for approx 14500ms
0000016919 [app] INFO: Connected for approx 15000ms
0000017420 [app] INFO: Connected for approx 15500ms
0000017920 [app] INFO: Connected for approx 16000ms
0000018420 [app] INFO: Connected for approx 16500ms
0000018921 [app] INFO: Connected for approx 17000ms
0000019421 [app] INFO: Connected for approx 17500ms
0000019921 [app] INFO: Connected for approx 18000ms
0000020422 [app] INFO: Connected for approx 18500ms
0000020922 [app] INFO: Connected for approx 19000ms
0000021422 [app] INFO: Connected for approx 19500ms
0000021923 [app] INFO: Connected for approx 20000ms
0000022423 [app] INFO: Connected for approx 20500ms
0000022923 [app] INFO: Connected for approx 21000ms
0000023424 [app] INFO: Connected for approx 21500ms
0000023924 [app] INFO: Connected for approx 22000ms
0000024424 [app] INFO: Connected for approx 22500ms
0000024925 [app] INFO: Connected for approx 23000ms
0000025425 [app] INFO: Connected for approx 23500ms
0000025925 [app] INFO: Connected for approx 24000ms
0000026426 [app] INFO: Connected for approx 24500ms
0000026926 [app] INFO: Connected for approx 25000ms
0000027426 [app] INFO: Connected for approx 25500ms
0000027927 [app] INFO: Connected for approx 26000ms
0000028427 [app] INFO: Connected for approx 26500ms
0000028927 [app] INFO: Connected for approx 27000ms
0000029428 [app] INFO: Connected for approx 27500ms
0000029928 [app] INFO: Connected for approx 28000ms
0000030428 [app] INFO: Connected for approx 28500ms
0000030929 [app] INFO: Connected for approx 29000ms
0000031429 [app] INFO: Connected for approx 29500ms
0000031929 [app] INFO: Connected for approx 30000ms
0000031972 [hal.ble] TRACE: BLE GAP event: disconnected.
0000031973 [wiring.ble] TRACE: Disconnected by remote device.
0000032430 [app] INFO: No longer connected
0000032430 [app] INFO: Exiting

I also just tested without discovering services either i.e. just connect and wait and the result is the same, disconnected by the remote after exactly 30s. My other lock disconnects after 8s in both situations, but I am able to discover the characteristics and as long as I do that and then complete the locks auth procedure within the 8s it will allow the Boron to stay connected.

Morning,

I’ve been reviewing the code again and I’m convinced it is a failure to give up the discovery semaphore as per my original few posts. Looking at the processSvcDiscEventFromThread method I see that if the service has a 128-bit UUID then the method actually returns SYSTEM_ERROR_NONE without releasing the semaphore (it wouldn’t do this for a shortened UUID). I can’t explain this behaviour, unless there is a second call in to processSvcDiscEventFromThread which gives up the semaphore due to the extended UUID?

I’m not near my setup right now, so I can’t test, but it would be interesting to see what happens if we comment out the return statement in that if so it continues to process the callbacks and gives the semaphore back in the case of a 128-bit UUID.

Mark

@delaneyb it appears you have one working device and one failing? If so could you share the service UUIDs of the two different devices? It might help to clarify if this is related to 32-bit / 128-bit UUIDs.

Thanks

Mark

Here’s the first model (No problems with discoverAllCharacteristics):

And the second model from a different manufacturer (discoverAllCharacteristics hangs):

Here is how they look in chrome://bluetooth-internals:

First model (No problems with discoverAllCharacteristics)

Second model from a different manufacturer (discoverAllCharacteristics hangs)

Another note, discoverAllCharacteristics() fails regardless of which service I make the mainService, I have tried all four on the lock that hangs. They all contain at least 1 characteristic but all of them hang until the lock disconnects the Boron.

Thanks @delaneyb - that is the exact opposite of what I expected, the first device has some services and characteristics that are 128-bit and the second device has only 32-bit services and characteristics. I definitely think this might have something to do with the problem - I’ll see if I can try out some code later.

Mark

@mrhornsby I know what you were wondering about. The code below

    if (readServiceUUID128IfNeeded()) {
        return SYSTEM_ERROR_NONE;
    }

in the processSvcDiscEventFromThread will initiate a procedure reading the 128-bits UUID and then processDataReadEventFromThread will be invoked on read the UUID successfully followed by calling processSvcDiscEventFromThread again inside processDataReadEventFromThread . So discovering characteristics with 128-bits UUID won’t hang the discovery procedure.

@delaneyb I’m assuming that the lock require a secure connection, otherwise, it disconnects from BLE central actively. Could you try using the LightBlue mobile app to connect to the lock and see if it can discover the characteristics and the connection is retained?

@ielec The lock’s SDK explains the steps to “authenticate” with it via an exchange using 1 write and 1 notify characteristic. If you do not complete these authentication steps (which obviously requires discovering the characteristics first), then the lock will disconnect the central after 30 seconds.

LightBlue has no trouble connecting to the lock and discovering it’s characteristics. After 30 seconds it is disconnected:

Screenshots



Just out of curiosity, why do you need to discover the characteristics? Presumably you must know the characteristic UUIDs so there’s no reason to discover them, just interact with them directly. You can do that immediately after connecting to the peer, no need to discover anything.

2 Likes

Unless they have mutating characteristic UUIDs and only the service UUID stays the same.
I’ve had some issues with a soft-device doing just that.

1 Like

For now, without discovering the characteristic user cannot construct a peer characteristic object manually to interact with the peer device. The basic requirements for a peer characteristic are the connection handle and the attribute handles, but both of them are not exposed to user application.

I was just wondering why it takes this long time (more than 30s) to perform the discovery procedure.

@rickkas7 @ielec It would indeed be very useful if I could just construct the characteristics directly, both the service and characteristic UUIDs are fixed and do not change on these smart locks. I always assumed there was some behind the scenes negotiation going on or something when characteristic discovery was performed.

Hi there,

I just got the lock that cause the discovery issue on the Central side. I’m using an Argon and running a minimum user application to connect to the lock followed by discovery its services and characteristics. It worked perfect. See the log below:

0000007444 [app] INFO: Target device found.
0000008444 [app] INFO: Connecting to target device.
0000008454 [wiring.ble] TRACE: New peripheral is connected.
0000008455 [wiring.ble] TRACE: Start discovering services.
0000008546 [hal.ble] INFO: Service discovered, handle: 1 - 9
0000008547 [hal.ble] INFO: Service discovered, handle: 12 - 15
0000008547 [hal.ble] INFO: Service discovered, handle: 16 - 30
0000008636 [hal.ble] INFO: Service discovered, handle: 31 - 41
0000008726 [wiring.ble] TRACE: Start discovering characteristics.
0000008816 [hal.ble] INFO: Characteristic discovered, handle: 3
0000008817 [hal.ble] INFO: Characteristic discovered, handle: 5
0000008817 [hal.ble] INFO: Characteristic discovered, handle: 7
0000008906 [hal.ble] INFO: Characteristic discovered, handle: 9
0000009086 [hal.ble] INFO: Characteristic discovered, handle: 14
0000009401 [hal.ble] INFO: Characteristic discovered, handle: 18
0000009401 [hal.ble] INFO: Characteristic discovered, handle: 20
0000009402 [hal.ble] INFO: Characteristic discovered, handle: 22
0000009491 [hal.ble] INFO: Characteristic discovered, handle: 24
0000009491 [hal.ble] INFO: Characteristic discovered, handle: 26
0000009492 [hal.ble] INFO: Characteristic discovered, handle: 28
0000009581 [hal.ble] INFO: Characteristic discovered, handle: 30
0000014756 [wiring.ble] TRACE: Disconnected by remote device.

As you can see, the discovery procedure is performed successfully within 7 seconds. I’m using the BLE.connect() API with the automatic parameter setting to true or omitted (default is true).

@delaneyb You can try the automatic discovery procedure so that it performs the procedure on connected. In this case, the BLE.connect() would block until all characteristics are discovered. And then you can retrieve the characteristic with UUID peer.getCharacteristicByUUID() followed by performing the authentication procedure for the lock.

Here is my test application:

#include "application.h"

SYSTEM_MODE(MANUAL);

Serial1LogHandler logHandler(115200, LOG_LEVEL_ALL);

BleScanResult targetScanResult;
bool found = false;
auto targetDevice = "EC:F3:F6:F1:D2:E4";
BlePeerDevice peer;

void scanResultCallback(const BleScanResult* scanResult, void *context) {
    Log.info(" -------- MAC: %s | RSSI: %ddBm --------", scanResult->address.toString().c_str(), scanResult->rssi);
    if (scanResult->address.toString() == targetDevice) {
        Log.info("Target device found.");
        targetScanResult = *scanResult;  // Copy
        found = true;
        BLE.stopScanning();
    }
}

void setup() {
    delay(1000);
    Log.info("Application started.");
}

void loop() {
    if (!BLE.connected()) {
        if (!found) {
            Log.info("Starting scan looking target device");
            BLE.scan(scanResultCallback, NULL);
            delay(1000);
        } else {
            found = false;
            Log.info("Connecting to target device.");
            peer = BLE.connect(targetScanResult.address);
            if (peer.connected()) {
                Log.info("BLE connected.");
            }
        }
    }
}

I also made some modifications in the BLE HAL to print more logs:

diff --git a/hal/src/nRF52840/ble_hal.cpp b/hal/src/nRF52840/ble_hal.cpp
index d0d4b6f09..92e24f98f 100644
--- a/hal/src/nRF52840/ble_hal.cpp
+++ b/hal/src/nRF52840/ble_hal.cpp
@@ -2899,6 +2899,7 @@ int BleObject::GattClient::processSvcDiscEventFromThread(const ble_evt_t* event)
         const ble_gattc_evt_prim_srvc_disc_rsp_t& primSvcDiscRsp = event->evt.gattc_evt.params.prim_srvc_disc_rsp;
         if (event->evt.gattc_evt.gatt_status == BLE_GATT_STATUS_SUCCESS) {
             for (uint8_t i = 0; i < primSvcDiscRsp.count; i++) {
+                LOG(INFO, "Service discovered, handle: %d - %d", primSvcDiscRsp.services[i].handle_range.start_handle, primSvcDiscRsp.services[i].handle_range.end_handle);
                 hal_ble_svc_t service = {};
                 service.version = BLE_API_VERSION;
                 service.size = sizeof(hal_ble_svc_t);
@@ -2961,6 +2962,7 @@ int BleObject::GattClient::processCharDiscEventFromThread(const ble_evt_t* event
             const ble_gattc_evt_char_disc_rsp_t& charDiscRsp = event->evt.gattc_evt.params.char_disc_rsp;
             if (event->evt.gattc_evt.gatt_status == BLE_GATT_STATUS_SUCCESS) {
                 for (uint8_t i = 0; i < charDiscRsp.count; i++) {
+                    LOG(INFO, "Characteristic discovered, handle: %d", charDiscRsp.chars[i].handle_value);
                     hal_ble_char_t characteristic = {};
                     characteristic.version = BLE_API_VERSION;
                     characteristic.size = sizeof(hal_ble_char_t);

If you’re using a Boron, you need to change the log level to ERROR in BLE HAL, otherwise, the log level will be overridden by build option.

Cheers!

Guohui

2 Likes

Hmm… The Log.info("BLE connected."); didn’t get hit though :thinking: The connection is disconnected by the lock before exiting BLE.connect(). Digging…

1 Like

Hi @delaneyb,

After digging deeper, the discovery procedure hangs because that the connection is disconnected by the lock, which results in the supervision timeout on the Argon. I set the connection supervision timeout to be 15s on the Argon and then connected to the lock. Once the discovery procedure hangs, I can use the LightBlue App to connect to the lock, which means that the lock is in the advertising state. Besides, the disconnected event is generated after 15s on the Argon.

I have no idea why the lock disconnects from the Argon during the discovery procedure, as I cannot get any information from the lock. Maybe you can collaborate with the lock vendor and see what’s going on on the lock side.

Cheers!
Guohui

Hi @ielec

Sorry for the radio silence, new app update in the works for the last few days.

I have been doing some digging yesterday and today as well and found some interesting things. I edited spark_wiring_ble.cpp:1289 (BleDiscoveryDelegator::discoverAllCharacteristics()) as follows to make sure the board only looks for characteristics on the service service FFF0 which I need:

for (auto& service : peer.impl()->services()) {
    if (service.UUID().toString() == "FFF0") {
        hal_ble_svc_t halService;
        halService.size = sizeof(hal_ble_svc_t);
        halService.start_handle = service.impl()->startHandle();
        halService.end_handle = service.impl()->endHandle();
        LOG(INFO, "spark_wiring_ble BleDiscoveryDelegator discovering characteristics for service %s, start_handle %d, end_handle %d", service.UUID().toString().c_str(), halService.start_handle, halService.end_handle);
        CHECK(hal_ble_gatt_client_discover_characteristics(peer.impl()->connHandle(), &halService, onCharacteristicsDiscovered, &peer, nullptr));
    } else {
        LOG(INFO, "Skipping discovery of characteristics for service %s", service.UUID().toString().c_str());
    }
}

After doing this, everything is working great, the call to discoverAllCharacteristics() returned a vector with just the characteristics on service FFF0 which I need, I successfully subscribed for notifications on FFF1, wrote to FFF2 and my onDataReceived callback was called.

I’m not sure I can use this fix once the Boron is out in the field though, will this have implications for when I want to flash updates OTA - like having to transmit the entire deviceOS over 2G/3G?

It seems to be discovery of characteristics on the 180A service specifically that causes problems…

1 Like

The spark_wiring_ble.cpp is compiled along with the user application. So it won't be affected if you OTA update the Device OS. But you should be careful about building your user application, which means that you should always compile the spark_wiring_ble.cpp locally with the modification you made. In another word, you cannot build your user application on Web IDE and flash it OTA, because the spark_wiring_ble.cpp being built on Web IDE don't introduce the modification.

I still think that we should figure out the reason why it causes the disconnection on the lock side. Either only discovering the characteristics under 0xFFF0 or discovering other characteristics repeatedly works for me, but not discovering all characteristics.

Hi,

I’ve just added a whole load of trace log messages in to the BLE code to try and figure out what is going on here. It appears that the code is getting stuck in an infinite loop in relation to the 128-bit uuid as I thought.

The function readCharacteristicUUID128IfNeeded() never returns false in relation to my particular device (Lego Technic Hub) and I assume also @delaneyb’s lock.

I’ll add some more debugging inside that method to see what data it is processing.

Mark

Hi @ielec,

I’m wondering about the call to sd_ble_gattc_read in readCharacteristicUUID128IfNeeded():

sd_ble_gattc_read(currDiscConnHandle_, characteristic.charHandles.decl_handle, 0) 

Shouldn’t the third argument be some kind of incrementing counter rather than zero?

Mark

The third argument is the offset of the attribute value to be read. For that case, we're reading the characteristics' 128-bits UUID, the offset should be zero. We don't intend to read the 128-bits UUID partially.