This is generally known as finite state machine (FSM) parsing. I use this technique to parse an XML weather stream. Once I find the part I am interested in, I use the C function strtok() to find the boundaries between fields; in my case the delimiter is the double-quote character.
OK, I am going to post this, but this is not the most beautiful code I have ever written--I have been meaning to go back and clean this up.
The serialEvent function consumes one byte from the client buffer and uses a series of boolean flags to track where it is in the stream, either tag or data--the title flag is for future use. When it finds something that starts with "<yweather:forecast ", it gathers a line of data for the strtok() parsing part. I use under 200 bytes this way. The things called ptr are really just indexes--not the best name.
const char startMatch[] = "<yweather:forecast ";
const char titleStart[] = "<title>";
const char titleEnd[] = {'<', '/','\0'};

void serialEvent() {
  char inChar = myTCP.read();
  if (tagFlag==false && dataFlag==false && inChar == startMatch[matchPtr]) {
    // first character of the tag matched, start tracking it
    tagFlag = true;
    dataFlag = false;
    titleFlag = false;
    matchPtr++;
  } else if (tagFlag==true && inChar == startMatch[matchPtr]) {
    matchPtr++;
    if (matchPtr == strlen(startMatch)) { //done with tag, start data
      clearStr(dataStr);
      dataPtr = 0;
      dataFlag = true;
      tagFlag = false;
      titleFlag = false;
      matchPtr = 0;
    }
  } else if (tagFlag == true) {
    // mismatch part way through the tag, start over
    matchPtr = 0;
    tagFlag = false;
    if (inChar == startMatch[matchPtr]) {
      tagFlag = true;
      dataFlag = false;
      matchPtr++;
    }
  } else if (dataFlag==true && ( (inChar==char(10)) || (inChar==char(13)) ) ) { // line-feed (10) or carriage-return (13)
    dataStr[dataPtr] = '\0'; //null term the string
    parseForecast(); // call the next parse step
  } else if (dataFlag == true) {
    dataStr[dataPtr] = inChar; // store data away
    if (dataPtr < MAX_DATA_STR_LEN-2) { // stop advancing when the buffer is nearly full
      dataPtr++;
    }
  }
}
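If it helps to see the same idea in a form you can test on a PC, here is a rough sketch of the tag matcher using a single state enum instead of the boolean flags. This is not the code from my sketch: ForecastScanner, feed(), START_MATCH and MAX_LINE are names made up for the example, the 200-byte buffer is an assumption, and unlike the code above it drops back to tag-seeking as soon as a line is complete.

```cpp
#include <string.h>
#include <stddef.h>

// Hypothetical names throughout; an illustration, not the original sketch.
const char START_MATCH[] = "<yweather:forecast ";
const size_t MAX_LINE = 200;  // assumed buffer size

enum ScanState { SEEK_TAG, IN_TAG, IN_DATA };

struct ForecastScanner {
    ScanState state = SEEK_TAG;
    size_t matchIdx = 0;        // characters of START_MATCH matched so far
    size_t dataIdx = 0;         // next free slot in line[]
    char line[MAX_LINE] = {0};  // collected attribute text
};

// Feed one byte. Returns true when a complete forecast line is in s.line.
bool feed(ForecastScanner &s, char c) {
    switch (s.state) {
    case SEEK_TAG:
    case IN_TAG:
        if (c == START_MATCH[s.matchIdx]) {
            s.matchIdx++;
            s.state = IN_TAG;
            if (s.matchIdx == strlen(START_MATCH)) {
                // Whole tag seen: switch to collecting data.
                s.matchIdx = 0;
                s.dataIdx = 0;
                s.state = IN_DATA;
            }
        } else {
            // Mismatch: restart, letting this byte begin a new match.
            s.matchIdx = (c == START_MATCH[0]) ? 1 : 0;
            s.state = s.matchIdx ? IN_TAG : SEEK_TAG;
        }
        return false;
    case IN_DATA:
        if (c == '\n' || c == '\r') {
            s.line[s.dataIdx] = '\0';  // terminate the collected line
            s.state = SEEK_TAG;
            return true;
        }
        if (s.dataIdx < MAX_LINE - 1) {
            s.line[s.dataIdx++] = c;   // extra bytes are dropped when full
        }
        return false;
    }
    return false;
}
```

You would call feed() once per byte read from the client and hand s.line to the strtok() step whenever it returns true.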
Here's a link to an older post I made about the parseForecast() part:
I know this is a bit unclear but I hope it gets the ideas flowing.
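For the strtok() step, here is a minimal sketch of splitting a gathered line on the double-quote character. It is only an illustration of the idea, not the actual parseForecast() code: splitQuoted() is a made-up helper name, and it assumes the line looks like name="value" pairs so that every second token is a value. One thing to watch: strtok() treats adjacent delimiters as one, so an empty value ("") would throw the odd/even alignment off.

```cpp
#include <string.h>

// Hypothetical helper: collects pointers to up to maxValues quoted values.
// strtok() modifies line in place; the pointers in values[] point into it.
int splitQuoted(char *line, char *values[], int maxValues) {
    int count = 0;     // values stored so far
    int tokenIdx = 0;  // position of the current token in the line
    for (char *tok = strtok(line, "\"");
         tok != NULL && count < maxValues;
         tok = strtok(NULL, "\"")) {
        // Even tokens are the name= parts, odd tokens are the quoted values.
        if (tokenIdx % 2 == 1) {
            values[count++] = tok;
        }
        tokenIdx++;
    }
    return count;
}
```

With a line such as day="Fri" low="50" high="70" this should leave the values Fri, 50 and 70 in the array, which can then be passed to atoi() or copied wherever they are needed.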