BLE. discoverAllCharacteristics() hangs

Hi all,

I’m trying to get my Argon speaking to a Lego BLE Hub. I can discover the device and call connect(), however I found this would hang and then cause an assertion error (10 red flashes). After investigating the automatic flag I tried setting it to false and then the call to connect() succeeded, but I can’t get the characteristics by calling getCharacteristicByUUID(). Trying to call discoverAllCharacteristics() directly also hangs. Here is what I got from the trace logs:

0000016440 [system] ERROR: Failed to load session data from persistent storage
0000053237 [app] INFO: Pre connect
0000053345 [wiring.ble] TRACE: New peripheral is connected.
0000053346 [wiring.ble] TRACE: Start discovering services.
0000053885 [wiring.ble] TRACE: Start discovering characteristics.

I’ll be looking through the device-os source code to see if I can spot the issue, but if anyone has any advice it would be much appreciated!

Just to add - this is on firmware 1.4.2.

Thanks

Mark

Hi,

I’m still looking in to this issue and now have a particle debugger so I managed to get trace logs out of the hal.ble logger:

Serial monitor opened successfully:
0002026836 [hal.ble] TRACE: hal_ble_gap_start_scan().
0002026837 [hal.ble] TRACE: | interval(ms)   window(ms)   timeout(ms) |
0002026838 [hal.ble] TRACE:   160*0.625        80*0.625      5000
0002031792 [hal.ble] TRACE: hal_ble_gap_start_scan().
0002031793 [hal.ble] TRACE: | interval(ms)   window(ms)   timeout(ms) |
0002031794 [hal.ble] TRACE:   160*0.625        80*0.625      5000
0002036749 [hal.ble] TRACE: hal_ble_gap_start_scan().
0002036750 [hal.ble] TRACE: | interval(ms)   window(ms)   timeout(ms) |
0002036751 [hal.ble] TRACE:   160*0.625        80*0.625      5000
0002041610 [ot] INFO: -MLE-----: Send Advertisement (ff02:0:0:0:0:0:0:1)
0002041706 [app] INFO: Pre connect
0002041707 [hal.ble] TRACE: hal_ble_gap_connect().
0002041836 [hal.ble] TRACE: BLE role: 2, connection handle: 0
0002041837 [hal.ble] TRACE: | interval(ms)  latency  timeout(ms) |
0002041838 [hal.ble] TRACE:   36*1.25          0       500*10
0002041839 [hal.ble] TRACE: Attempts to exchange ATT_MTU if needed.
0002041840 [wiring.ble] TRACE: New peripheral is connected.
0002041841 [app] INFO: Post connect
0002041842 [app] INFO: Getting services
0002041843 [wiring.ble] TRACE: Start discovering services.
0002041844 [hal.ble] TRACE: hal_ble_gatt_client_discover_all_services().
0002042376 [app] INFO: Found 3 services
0002042377 [wiring.ble] TRACE: Start discovering characteristics.
0002042378 [hal.ble] TRACE: hal_ble_gatt_client_discover_characteristics().
0002042641 [hal.ble] TRACE: Request to change ATT_MTU from 23 to 247
0002042642 [hal.ble] TRACE: sd_ble_gattc_exchange_mtu_request() failed: 17
0002042646 [hal.ble] TRACE: hal_ble_gatt_client_discover_characteristics().
0002043141 [hal.ble] TRACE: hal_ble_gatt_client_discover_characteristics().
0002073346 [hal.ble] PANIC: AssertionFailure false^CSerial connection closed.

I’ll be looking in to the source code again now I have more details, but if anyone has any advice or is seeing a similar issue then please let me know.

Thanks

Mark

Are you by any chance calling that from a BLE.scan() callback function?
I had a similar issue when doing that. Moving BLE.connect() and anything thereafter out of the callback and running that from loop() did solve the issue for me.

Can you post your code?
What are the characteristics UUIDs you are intending to use?

Hi @ScruffR,

Thanks for your reply. No, I’m using the non-callback version of scan. Here is my code:

const size_t SCAN_RESULT_MAX = 30;
BleScanResult scanResults[SCAN_RESULT_MAX];
BleUuid foundServiceUuids[SCAN_RESULT_MAX];
BleUuid serviceUuid("00001623-1212-efde-1623-785feabcd123");
BleUuid characteristicUuid("00001624-1212-efde-1623-785feabcd123");
const size_t CHAR_RESULT_MAX = 1;
BleCharacteristic characteristic[CHAR_RESULT_MAX];
BlePeerDevice peer;

SerialLogHandler logHandler(LOG_LEVEL_INFO, {
  { "app", LOG_LEVEL_INFO },
  { "wiring.ble", LOG_LEVEL_TRACE },
  { "hal.ble", LOG_LEVEL_TRACE }
});

void onDataReceived(const uint8_t* data, size_t len, const BlePeerDevice& peer, void* context) {
  Log.info("Received: %d", len);
}

void setup() {
}

void loop() {
  if (!BLE.connected()) {
    int count = BLE.scan(scanResults, SCAN_RESULT_MAX);

    for (int idx = 0; idx < count; idx++) {
      int count2 = scanResults[idx].advertisingData.serviceUUID(foundServiceUuids, SCAN_RESULT_MAX);
      
      for (int idx2 = 0; idx2 < count2; idx2++) {
        if (serviceUuid == foundServiceUuids[idx2]) {
          Log.info("Pre connect");

          peer = BLE.connect(scanResults[idx].address, false);
          
          Log.info("Post connect");
          
          if (peer.connected()) {
            const Vector<BleService> services = peer.discoverAllServices();

            Log.info("Found %d services", services.size());
            
            const ssize_t count = peer.discoverAllCharacteristics(characteristic, CHAR_RESULT_MAX);

            Log.info("Found %d characteristics", count);
          }
        }
      }
    }
  } else {
    delay(5000);
  }
}

I’m trying to connect to a lego Control+ hub. I can successfully connect to this characteristic using command line tools on a raspberry pi. (UUIDs are included in the code above).

The panic occurs either if I set automatic to true or if I try to discover characteristics manually (service discovery appears to complete successfully and returns good data).

Thanks for your help

Mark

Have you tried the vector version of discoverAllCharacteristics()?
Or have you tried to increase CHAR_RESULT_MAX?

In my code I also use the array version of discoverAllCharacteristics() but with a slightly oversized array (for “superstition” :blush:).

Thanks @ScruffR - I tried the vector version and with CHAR_RESULT_MAX set to 100 and neither solved the issue.

If it gives any further clues the device panics almost exactly 30 seconds after calling discoverAllCharacteristics().

Just a quick update - I’ve got further with my debugger and it appears that the Argon is panicking because it can’t acquire the BLE discovery semaphore (line 2721 in hal/src/nRF52840/ble_hal.cpp):

    CHECK_NRF_RETURN(ret, nrf_system_error(ret));
    isDiscovering_ = true;
    if (os_semaphore_take(discoverySemaphore_, BLE_OPERATION_TIMEOUT_MS, false)) {
        SPARK_ASSERT(false);
        return SYSTEM_ERROR_TIMEOUT;
    }
    return SYSTEM_ERROR_NONE;

I’ll continue trying to figure out why this call to os_semaphore_take is timing out.

Mark

1 Like

Hi,

So I’m not certain, but what appears to be happening is that the discoverySemaphore_ is never “given” after being taken. As the semaphore is initialised with a count of 1 then os_semaphore_take can only be called once before any further calls to os_semaphore_take will block.

Is there a call to os_semaphore_give missing in resetDiscoveryState? (I’m following the pattern that we take when setting discovering_ to true and give when setting discovering_ to false.)

Thanks

Mark

That’s an interesting finde. Maybe @avtolstoy can comment on that.

But it seems to be something specific to the targeted device since I cannot reproduce that behaviour with a mock-up device exposing the UUIDs you got there in your code.

Hi @ScruffR and @avtolstoy,

I added the os_semaphore_give call in to the resetDiscoveryState() method and my Argon no longer hangs, so it certainly seems like this might be a valid fix? I can’t see anywhere in the code where that discovery semaphore is released, so I’m almost certain something isn’t right …

Thanks

Mark

3 Likes

@ScruffR I can confirm this is specific to the targeted device. I have 2 Bluetooth smart locks from different manufacturers I have been able to operate using iOS, Android and Web Bluetooth API’s with no problems.The first model I have just successfully implemented discovery, authentication and unlocking from the Boron. The second one we have deployed in our smart locker network however, the Boron refuses to discover any characteristics on. I have tried every way of calling discoverAllCharacteristics and @mrhornsby s suggestion just makes the call return immediately saying it’s discovered 0 characteristics

Would I be able to express post one of the second smart locks to you or someone else to take a look at?

I have tried 1.5.0, 1.5.2 and now 2.0.0rc1, all of which behave the same.

1 Like

You could, but since I'm in Austria/Europe I'm not sure how practical that would be for you.

1 Like

What’s the best way to contact you for an address? I can’t tell if you work for Particle - I don’t want to place an unnecessary burden on someone who’s just a community member/enthusiast if you’re not able to put a patch into the next deviceOS or whatever’s necessary to get this sorted. I’ve also sent an email to guohui and submitted a complaint via the contact us forum on particles website.

@delaneyb

@ScruffR Is your guy if you want to get this sorted :grinning:

He doesn’t work for Particle but he should be on the payroll for the number of people he helps on here.

3 Likes

You can always shoot me a PM

@delaneyb Could you enable logging in your application and post the log here? If you were compiling your application against the Device OS source code locally, I would suggest you to also enable logging in interrupt service routine by uncommenting line35 in <path_to_device_os>/wiring/src/spark_wiring_logging.cpp, so that you can capture more details on BLE events.

The semaphore is given on services or characteristics discovered, or on connection disconnected if the discovery procedure is in progress. There is a 30s timeout taking the semaphore, if it timeouts, the device asserts failure with 10 of red blinks and the only way to exit the assertion condition is reseting the device itself. So we don't need to give the semaphore in the resetDiscoveryState(). If the semaphore is given successfully, we have no reason to give the semaphore again in resetDiscoveryState().

Kind regards
Guohui

4 Likes

That assertion failure after 30 seconds is exactly what I experienced while trying to discover characteristics of the same type of device (Lego Control+ hub). The connection is established and services are discovered, but not characteristics. I tried on the same device (Xenon) using CircuitPython and it worked. So it’s certainly a software issue.

1 Like

Hi @jaafar, in that case, enabling logging in interrupt service routine is necessary to target the issue.

1 Like
Serial monitor opened successfully:
0000000462 [sys.power] TRACE: Battery state CHARGED -> DISCONNECTED
0000000852 [app] INFO: Starting scan looking for EC:F3:FA:F0:CF:DD
0000000852 [hal.ble] TRACE: hal_ble_stack_init().
0000000853 [hal.ble] TRACE: hal_ble_set_callback_on_periph_link_events().
0000000853 [hal.ble] TRACE: hal_ble_gap_start_scan().
0000000854 [hal.ble] TRACE: | interval(ms)   window(ms)   timeout(ms) |
0000000854 [hal.ble] TRACE:   160*0.625        80*0.625      5000
0000000861 [app] INFO: Scan result 70:5C:11:0A:34:E8
0000000888 [app] INFO: Scan result 08:A7:40:8D:95:47
0000000897 [app] INFO: Scan result 54:23:26:9F:B1:62
0000000903 [app] INFO: Scan result 56:60:A2:55:E7:B8
0000000907 [app] INFO: Scan result D3:96:FC:51:B7:EA
0000001071 [app] INFO: Scan result 55:55:55:87:95:F4
0000001103 [app] INFO: Scan result 4F:8A:7F:3E:DB:22
0000001177 [app] INFO: Scan result F3:00:55:7E:19:35
0000001179 [app] INFO: Scan result EC:F3:FA:F0:CF:DD
0000001179 [hal.ble] TRACE: hal_ble_gap_stop_scan().
0000001180 [app] INFO: Connecting
0000001180 [hal.ble] TRACE: hal_ble_gap_connect().
0000006134 [hal.ble] ERROR: BLE GAP event: Connection timeout
0000006134 [app] INFO: Connecting
0000006134 [hal.ble] TRACE: hal_ble_gap_connect().
0000006174 [hal.ble] TRACE: BLE GAP event: connected.
0000006174 [hal.ble] TRACE: BLE role: 2, connection handle: 0
0000006175 [hal.ble] TRACE: | interval(ms)  latency  timeout(ms) |
0000006176 [hal.ble] TRACE:   36*1.25          0       500*10
0000006176 [hal.ble] TRACE: Attempts to exchange ATT_MTU if needed.
0000006177 [wiring.ble] TRACE: New peripheral is connected.
0000006177 [app] INFO: Device connected
0000006221 [hal.ble] TRACE: BLE GATT Server event: exchange ATT MTU request: 83, desired: 247
0000006221 [hal.ble] TRACE: Effective ATT MTU: 83.
0000006677 [app] INFO: Discover services...
0000006678 [wiring.ble] TRACE: Start discovering services.
0000006678 [hal.ble] TRACE: hal_ble_gatt_client_discover_all_services().
0000006761 [hal.ble] TRACE: BLE GATTC event: services discovered.
0000006851 [hal.ble] TRACE: BLE GATTC event: services discovered.
0000006941 [hal.ble] TRACE: BLE GATTC event: services discovered.
0000006941 [app] INFO: 4 services discovered:
0000006942 [app] INFO: 1800
0000006942 [app] INFO: 1801
0000006942 [app] INFO: 180A
0000006943 [app] INFO: FFF0
0000006943 [app] INFO: Got service? True. UUID = FFF0
0000007443 [app] INFO: Discover characteristics...
0000007444 [wiring.ble] TRACE: Start discovering characteristics.
0000007444 [hal.ble] TRACE: hal_ble_gatt_client_discover_characteristics().
0000007526 [hal.ble] TRACE: BLE GATTC event: characteristics discovered.
0000007616 [hal.ble] TRACE: BLE GATTC event: characteristics discovered.
0000007706 [hal.ble] TRACE: BLE GATTC event: descriptors discovered.
0000007707 [hal.ble] TRACE: hal_ble_gatt_client_discover_characteristics().
0000007796 [hal.ble] TRACE: BLE GATTC event: characteristics discovered.
0000007886 [hal.ble] TRACE: BLE GATTC event: characteristics discovered.
0000007976 [hal.ble] TRACE: BLE GATTC event: descriptors discovered.
0000007976 [hal.ble] TRACE: hal_ble_gatt_client_discover_characteristics().
0000008066 [hal.ble] TRACE: BLE GATTC event: characteristics discovered.
0000008156 [hal.ble] TRACE: BLE GATTC event: characteristics discovered.
0000008246 [hal.ble] TRACE: BLE GATTC event: characteristics discovered.
0000008337 [hal.ble] TRACE: BLE GATTC event: descriptors discovered.
0000008338 [hal.ble] TRACE: hal_ble_gatt_client_discover_characteristics().
0000013421 [hal.ble] TRACE: BLE GAP event: disconnected.
0000013421 [wiring.ble] TRACE: Disconnected by remote device.
0000013422 [app] INFO: Discovered 0 characteristics
0000013422 [app] INFO: Exiting
app.ino
#define BLE_WIRING_DEBUG_ENABLED 1
#include "Particle.h"

SYSTEM_MODE(MANUAL);

SerialLogHandler logHandler(LOG_LEVEL_ALL);
BleScanResult targetScanResult;
bool found = false;
auto targetDeviceMAC = "EC:F3:FA:F0:CF:DD";

void scanResultCallback(const BleScanResult* scanResult, void *context) {
	Log.info("Scan result %s", scanResult->address.toString().c_str());
	if (scanResult->address.toString() == "EC:F3:FA:F0:CF:DD") {
		targetScanResult = *scanResult;  // Copy
		found = true;
		BLE.stopScanning();
	}
}

void setup() {
	// Allow time for serial monitor to connect
	delay(700);

	while (!found) {
		Log.info("Starting scan looking for %s", targetDeviceMAC);
		BLE.scan(scanResultCallback, NULL);
	}

	BlePeerDevice device;
	while (!BLE.connected()) {
		Log.info("Connecting");
		device = BLE.connect(targetScanResult.address, false);
		if (!device.connected())
			continue;

		Log.info("Device connected");
		delay(500);
		
		Log.info("Discover services...");
		auto services = device.discoverAllServices();
		Log.info("%d services discovered:", services.size());
		for (auto &&s : services) {
			Log.info(s.UUID().toString().c_str());
		}
		if (!device.connected())  // Incase remote disconnects us
			continue;

		BleService mainService;
		bool gotMainService = device.getServiceByUUID(mainService, BleUuid("FFF0"));
		Log.info("Got service? %s. UUID = %s", gotMainService ? "True" : "False", mainService.UUID().toString().c_str());
		if (!BLE.connected() || !gotMainService)
			continue;
		delay(500);

		Log.info("Discover characteristics...");
		auto chars = device.discoverAllCharacteristics();
		Log.info("Discovered %d characteristics", chars.size());

		Log.info("Exiting");
		return;
	}

}

void loop() {
	delay(1000);
}

The Boron is always disconnected by the remote device before it can find any of the characteristics. All other devices I have tried this with manage to discover the characteristics on the smart lock before the remote disconnect timeout no problems.

Also, here is an nRF52840 dongle connecting and enumerating the same devices characteristics in nRF connect, so I assume that rules out a hardware issue with the Nordic chip:

Edit: Ran with debugger attached and updated output. Looks like hal_ble_gatt_client_discover_characteristics() is actually discovering characteristics, but continues looking when it should stop and return, and the lock then disconnects the Boron.

2 Likes

Will the Boron be disconnected by the remote device if you don’t initiate the procedure of discovery characteristics?