Photon DNS resolution with long hostnames

I just got my first Photons yesterday, so I’m still a bit of a n00b. I looked around for answers to this and didn’t find any, but I apologize if this is a known issue or I’m doing something wrong…

I am seeing what appears to be a problem with the DNS resolver. I started out trying to connect to the AWS API Gateway using the new-ish HTTPSClient library. It was inexplicably failing, so I started narrowing things down. First I found that TCPClient.connect() was not working. I sniffed the packets coming from the Photon and saw that it was sending SYN packets to 0.0.0.0:443, which was clearly wrong. So I reduced my code to just calling WiFi.resolve() and logging the result. Sure enough, when I try to resolve the IP address of my API Gateway hostname (randomstr.execute-api.us-east-1.amazonaws.com), WiFi.resolve() returns 0.0.0.0. The same name resolves fine with ‘dig’ and everywhere else I tried it.

I did some more experimentation. It was not enough to conclusively figure out what’s happening, but it appears that for hostnames with more than two dots (e.g. “www.two.example.com”), the Photon doesn’t even try to resolve them: I see no packets sent to the nameserver at all for those hostnames.

I’m not sure if it’s related, but I also saw a few packets like this, which seem odd:

20:03:21.798918 IP [photonip] > 8.8.8.8: ICMP [photonip] udp port 4097 unreachable, length 36

Any ideas here? Has anyone else seen this issue in the past?

I think I have seen this too but did not make the connection to the hostname having more than two dots! I can resolve “pool.ntp.org” but not “north-america.pool.ntp.org” on a Photon, for instance. I had suspected that hostname length was the problem but could not prove it; the more-than-two-dots theory fits the pattern I have seen perfectly!

Maybe @mdma can comment if the host name format used elsewhere in WICED could be causing this: I know in the DNS redirector for setup that the format is like { 0x03, “www”, 0x06, “google”, 0x03, “com”, 0x00}. Could that be used internally and be limiting the hostname resolution in wiced_hostname_lookup?

Parsing a char array hostname into that format would be problematic if they work “front-to-back” and stop after three parts to the hostname, for instance. Could you check in the private WICED sources for bugs in this area?

The public interface that is used (which you’ll find in our HAL) is wiced_hostname_lookup(). A search online leads me to the full sources here - https://github.com/asmcos/wiced-emw3165/blob/416fee12d45b4ac6291ea0ce6f4f08fdb80b37b9/WICED/network/wiced_tcpip_common.c#L265

This is a wrapper for dns_client_hostname_lookup()

https://github.com/asmcos/wiced-emw3165/blob/416fee12d45b4ac6291ea0ce6f4f08fdb80b37b9/libraries/protocols/DNS/dns.c#L86

Well, it definitely does parse into that format (it is the DNS packet format, btw); see lines 762-791. I don’t see the bug by inspection, but my feeling is it must be there or in other code that assumes only three segments:

uint8_t* dns_write_string(uint8_t* dest, const char* src)
{
    uint8_t* segmentLengthPointer;
    uint8_t segmentLength;

    while (*src != 0)
    {
        /* Remember where we need to store the segment length and reset the counter */
        segmentLengthPointer = dest++;
        segmentLength = 0;

        /* Copy bytes until '.' or end of string */
        while (*src != '.' && *src != 0)
        {
            *dest++ = (uint8_t) *src++;
            ++segmentLength;
        }

        /* Store the length of the segment */
        *segmentLengthPointer = segmentLength;

        /* Check if we stopped because of a '.', if so, skip it */
        if (*src == '.')
        {
            ++src;
        }
    }
    return dest;
}

Are there docs somewhere on how I can modify those core firmware files and run the resulting build on my device? It would make debugging this a lot easier. (Unless I’m reading incorrectly, it appears that Particle Dev still only builds local copies of your own project files, but not the base firmware and libraries).

Hi @sharding

Search the forum for “Photon local” for build instructions but it looks like the bug is most likely in the non-open-source WICED part that is not part of the Particle github. I believe you can sign a license at Broadcom to access this.

There is an off-by-one error in the WICED code, I think, but I have not found anything that would directly implicate the three-dots versus two-dots problem.

I don’t think the problem is in the dns_write_string() function, unless there’s something weird happening at a really low level on the real platform (which I think would cause all sorts of other issues). It’s not that complicated, and just to be sure, I pulled it out on its own and ran some tests against it. It seems to behave as designed (compiling it in llvm on OS X):

Testing www.google.com:
 [3] www [6] google [3] com
Testing www.foo.google.com:
 [3] www [3] foo [6] google [3] com
Testing google.com:
 [6] google [3] com
Testing really.stupid.crazy.long.utterly.too.frickin.long.hostname.example.com:
 [6] really [6] stupid [5] crazy [4] long [7] utterly [3] too [7] frickin [4] long [8] hostname [7] example [3] com
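
For anyone who wants to repeat that check, a standalone harness along these lines should produce output in the same shape. It is a sketch, not the poster's actual test program: dns_write_string() is copied from the listing earlier in the thread, while format_labels() and the 256-byte buffer are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy of the routine quoted earlier in the thread. It writes src as
 * length-prefixed labels and returns a pointer just past the last byte
 * written. It does not write the terminating zero-length label. */
uint8_t *dns_write_string(uint8_t *dest, const char *src)
{
    uint8_t *segmentLengthPointer;
    uint8_t segmentLength;

    while (*src != 0) {
        segmentLengthPointer = dest++;
        segmentLength = 0;
        while (*src != '.' && *src != 0) {
            *dest++ = (uint8_t) *src++;
            ++segmentLength;
        }
        *segmentLengthPointer = segmentLength;
        if (*src == '.') {
            ++src;
        }
    }
    return dest;
}

/* Hypothetical helper (not part of WICED): encodes hostname, then renders
 * each label as " [len] text" into out. Returns the encoded byte count. */
size_t format_labels(const char *hostname, char *out, size_t out_size)
{
    uint8_t encoded[256];
    size_t used = 0;

    assert(out_size > 0);
    /* The encoded form is at most one byte longer than the string. */
    assert(strlen(hostname) + 1 <= sizeof encoded);

    uint8_t *end = dns_write_string(encoded, hostname);
    out[0] = '\0';

    /* Walk the labels: one length byte followed by that many characters. */
    for (const uint8_t *p = encoded; p < end; p += 1 + p[0]) {
        int n = snprintf(out + used, out_size - used, " [%u] %.*s",
                         (unsigned) p[0], (int) p[0], (const char *) (p + 1));
        if (n < 0 || (size_t) n >= out_size - used) {
            break; /* output buffer too small; stop with what fits */
        }
        used += (size_t) n;
    }
    return (size_t) (end - encoded);
}
```

Calling format_labels("www.foo.google.com", buf, sizeof buf) and printing buf gives the second line of the output above, which supports the conclusion that the label encoding itself handles any number of segments.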

Within dns_client_hostname_lookup(), though, I think the problem has to occur prior to line 134 (where it calls wiced_udp_send()), because as far as I can tell the packet is never getting sent at all. Unless the packet being constructed is somehow breaking (or failing inside of) wiced_udp_send().

I think the packet is not being allocated with the correct size. The hostname function replaces each dot with a length count (plus one extra count byte for the first label), so the encoded name is nearly the same length as the string, but the packet allocation does not seem to account for the trailing 0x00 that is written into the packet after the name. This 0x00 is sent in the UDP DNS packet, so it is required.

Ah, ok, that could be. I haven’t looked into the memory allocation stuff yet.

Since this is third-party code, does that imply a long turnaround on a fix (i.e. should I be thinking about implementing my own simple resolver in the meantime)?

Particle can edit and build that source, but they cannot currently share it on their GitHub repo. They can and have shared diffs for their patches, so if you can sign the license and build the whole thing locally, you can fix it.

I think line 121 could be a problem:

/* Create IPv4 query packet */
if ( wiced_packet_create_udp( &socket, (uint16_t) ( sizeof(dns_message_header_t) + sizeof(dns_question_t) + hostname_length ), &packet, (uint8_t**) &iter.header, &available_space ) != WICED_SUCCESS )
{
    goto exit;
}

It sure looks to me like that should be hostname_length+1 for the trailing 0x00 byte after the name.
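
The size arithmetic can be checked independently of WICED. In the DNS wire format every label is preceded by one length byte and the name ends with a zero byte, so a name without a trailing dot occupies strlen + 2 bytes. The sketch below works that out; the function names are made up for this example, and the 12-byte header and 4-byte QTYPE/QCLASS sizes come from the DNS message format rather than from the WICED structs, whose sizes (and the exact meaning of hostname_length) are not visible in the snippet above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes needed for hostname in DNS wire format.
 * Each label costs its characters plus one length byte, and the name is
 * closed by a single zero byte. Dots are not transmitted. */
size_t dns_qname_wire_length(const char *hostname)
{
    assert(hostname != NULL);
    size_t len = strlen(hostname);

    /* A trailing dot only marks the name as absolute; it adds no label. */
    if (len > 0 && hostname[len - 1] == '.') {
        len--;
    }
    if (len == 0) {
        return 1; /* root name: just the terminating zero byte */
    }
    /* (len - dots) characters + (dots + 1) length bytes + 1 terminator */
    return len + 2;
}

/* Hypothetical helper: size of a single-question DNS query, assuming the
 * standard 12-byte header and 4 bytes for QTYPE + QCLASS. */
size_t dns_query_size(const char *hostname)
{
    return 12 + dns_qname_wire_length(hostname) + 4;
}
```

For "www.google.com" (14 characters) this gives a 16-byte name and a 32-byte query.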

If I’m looking at the right implementation of wiced_packet_create_udp(), it looks like that second parameter is never even used? I see three implementations of it in the WICED SDK, and none of them use that parameter.

The bug has been fixed in develop. The source of the error was str_to_ip() https://github.com/asmcos/wiced-emw3165/blob/master/WICED/internal/wiced_lib.c#L406

It doesn’t detect conversion errors for non-numeric parts, so any 4-part FQDN was resolved to the IP address 0.0.0.0. The fix is to check the return value from string_to_unsigned and return -1 from str_to_ip() on conversion error.

It’s quite ironic since string_to_unsigned boasts being better than atoi because of the error handling. :stuck_out_tongue_winking_eye:
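
The failure mode and the fix described above can be illustrated with a small standalone parser. This is a sketch and not the WICED source: parse_octet() and parse_dotted_quad() are hypothetical names, and the check_errors flag exists only so the two behaviours can be compared side by side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a string-to-number helper. Parses leading
 * decimal digits into *value. Returns the number of characters consumed,
 * or 0 if there are no digits or the value exceeds 255. */
size_t parse_octet(const char *s, uint8_t *value)
{
    unsigned v = 0;
    size_t n = 0;

    while (s[n] >= '0' && s[n] <= '9') {
        v = v * 10 + (unsigned) (s[n] - '0');
        if (v > 255) {
            return 0;
        }
        n++;
    }
    if (n == 0) {
        return 0;
    }
    *value = (uint8_t) v;
    return n;
}

/* Parses "a.b.c.d" into a 32-bit address. Returns 0 on success, -1 on error.
 * check_errors == 0 imitates the reported bug: a failed conversion is
 * ignored and the part counts as 0, so any name with four or more parts
 * "parses" successfully as 0.0.0.0.
 * check_errors != 0 applies the fix: a failed conversion returns -1. */
int parse_dotted_quad(const char *s, int check_errors, uint32_t *addr)
{
    assert(s != NULL && addr != NULL);
    uint32_t result = 0;

    for (int part = 0; part < 4; part++) {
        uint8_t octet = 0;
        size_t used = parse_octet(s, &octet);

        if (used == 0 && check_errors) {
            return -1; /* the fix: propagate the conversion error */
        }
        s += used;
        if (!check_errors) {
            /* Buggy path: skip whatever is left of this part. */
            while (*s != '\0' && *s != '.') {
                s++;
            }
        }
        result = (result << 8) | octet;
        if (part < 3) {
            if (*s != '.') {
                return -1; /* fewer than four parts */
            }
            s++;
        }
    }
    if (check_errors && *s != '\0') {
        return -1; /* extra parts or trailing characters */
    }
    *addr = result;
    return 0;
}
```

With check_errors set to 0, "www.two.example.com" is accepted as the address 0.0.0.0 while "pool.ntp.org" is rejected (and would go on to a real DNS lookup), which matches the behaviour reported earlier in the thread. With check_errors set to 1, both names are rejected and only genuine dotted-quad strings are accepted.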

Awesome! Thanks @mdma. Now I can stop obsessing over finding this bug and move onto enjoying my weekend :smile:

Hello to all.
I am using Particle Photon, firmware 0.4.7.
I have the same issue. I see that “WiFi.dnsServerIP()” returns “0.0.0.0”. I then tried switching from “dynamicIP” to “staticIP”, but “WiFi.dnsServerIP()” still always returns “0.0.0.0”.

    IPAddress myAddress(192,168,23,41);
    IPAddress submask(255,255,255,0);
    IPAddress gateway(192,168,23,1);
    IPAddress dns(192,168,23,10);
    WiFi.setStaticIP(myAddress, submask, gateway, dns);
    WiFi.useStaticIP();

    delay(3000);
    WiFi.on();
    while(!WiFi.ready()) WiFi.connect();
    Particle.process();
    delay(1000);

    Serial.println("IP CONFIG:");
    Serial.println("- localIP = " + String(WiFi.localIP()));
    Serial.println("- subnetMask = " + String(WiFi.subnetMask()));
    Serial.println("- gatewayIP = " + String(WiFi.gatewayIP()));
    Serial.println("- dnsServerIP = " + String(WiFi.dnsServerIP()));
    Serial.println("- dhcpServerIP = " + String(WiFi.dhcpServerIP()));

Output

IP CONFIG:
- localIP = 192.168.23.41
- subnetMask = 255.255.255.0
- gatewayIP = 192.168.23.1
- dnsServerIP = 0.0.0.0
- dhcpServerIP = 0.0.0.0

I do not understand why dnsServerIP is 0.0.0.0.

How can I use the above patch in my project?

Thanks.

EDIT:

To my great surprise, I discovered that the Photon does resolve the hostname: the code calls a server via TCPClient, and the issue was in the server.
But I still do not understand why WiFi.dnsServerIP() returns the wrong IP address.

DNS and DHCP server IP are used by the Photon, but not presently available for reporting. (So they work, but you can’t find out what the address is.)

We have this in our backlog.

May I humbly request that this be added to the reference documentation? I just wasted 30 minutes troubleshooting the DNS and DHCP calls before finding this old thread.

Sorry to hear of the struggles. The documentation is editable by everyone - feel free to add what you feel is missing to help anyone else in the same boat.