Photon: HTTP get request only getting part of HTML code

I want to scrape the HTML from this website using the Particle Photon:
http://www.espn.com/college-football/game?gameId=400945016

Here is the code that I am running:
#include <HttpClient.h>

 unsigned int nextTime = 0;    // Next time to contact the server
 HttpClient http;

 http_header_t headers[] = {
     { "Accept" , "*/*"},
     { NULL, NULL } // NOTE: Always terminate headers with NULL
 };

 http_request_t request;
 http_response_t response;

 void setup() {
     Serial.begin(9600);
 }

 void loop() {
     if (nextTime > millis()) {
         return;
     }

     Serial.println();
     Serial.println("Application>\tStart of Loop.");
     request.hostname = "www.espn.com";
     request.port = 80;
     request.path = "/college-football/game?gameId=400945016";

     // Get request
     http.get(request, response, headers);
     Serial.print("Application>\tResponse status: ");
     Serial.println(response.status);

     Serial.print("Application>\tHTTP Response Body: ");
     Serial.println(response.body);

     nextTime = millis() + 10000;
 }

If you go to the actual URL and look at the page source, you will see that it is pretty extensive. However, with the code that I am running, this is all I get:

Application>    Start of Loop.
Application>    Response status: 200
Application>    HTTP Response Body: 
	<!DOCTYPE html>
	<html class="no-icon-fonts">
	<head>
		<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta http-equiv="x-ua-compatible" content="IE=edge,chrome=1" />
<meta name="viewport" content="initial-scale=1.0, max

Any ideas as to why I am not getting the entire source code?

The truth of the matter is, I just need to grab the team scores from the web page. I was thinking the best way to do this was to read all of the HTML into a string and parse it to get the scores. Maybe there is a better way, such as scanning through the page and grabbing only certain parts? Thanks.
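One possible cause of the cut-off body is a fixed-size receive buffer somewhere in the HTTP library, in which case a large page can never arrive as a single string. A way around that is to scan the bytes as they come in and keep only the few characters after a known marker. The sketch below is plain C++ and illustrates the idea under assumptions: the class name `MarkerScanner` is made up for this example, and the marker `class="score">` is hypothetical, since the real markup of the ESPN page has not been checked here.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Scans a byte stream for a marker and captures the characters that follow it,
// up to (not including) a terminator character. Only a small sliding window and
// the captured text are kept in memory, so the full page never has to fit in RAM.
class MarkerScanner {
public:
    MarkerScanner(const char* marker, char terminator, size_t maxCapture)
        : marker_(marker), terminator_(terminator), maxCapture_(maxCapture),
          found_(false), done_(false) {}

    // Feed one chunk as it arrives. Returns true once the capture is complete.
    bool feed(const char* data, size_t len) {
        for (size_t i = 0; i < len && !done_; ++i) {
            const char c = data[i];
            if (!found_) {
                // Keep only the last marker_.size() characters seen so far.
                window_ += c;
                if (window_.size() > marker_.size()) window_.erase(0, 1);
                found_ = (window_ == marker_);
            } else if (c == terminator_ || captured_.size() >= maxCapture_) {
                done_ = true;            // stop at the terminator or the size cap
            } else {
                captured_ += c;
            }
        }
        return done_;
    }

    // Convenience overload for NUL-terminated chunks.
    bool feed(const char* text) { return feed(text, std::strlen(text)); }

    const std::string& captured() const { return captured_; }

private:
    std::string marker_;
    char terminator_;
    size_t maxCapture_;
    bool found_;
    bool done_;
    std::string window_;
    std::string captured_;
};
```

On the Photon, each chunk read from the TCP connection would be passed to `feed()` until it returns true. The marker may be split across two chunks; the sliding window handles that case.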

If you wrap your HTML in a block like this, it will display properly:

 ```HTML
 .. put your HTML here
 ```
*(These are grave accents, and they need to be on their **own** line without leading or trailing blanks.)*

BTW, when I capture the first response I get when navigating there, this is what I see:
```HTML
<html class="no-icon-fonts">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <meta http-equiv="x-ua-compatible" content="IE=edge,chrome=1">
    <meta name="viewport" content="initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
    <link rel="canonical" href="http://www.espn.com/college-football/game?gameId=400945016">
  </head>
</html>
```

After that, some extra redirections happen.
You can follow the flow in your browser's developer console and see how your code would need to "traverse" it to reach the final page.
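If any of those hops are ordinary HTTP redirects (a 3xx status with a `Location` header), the device has to read that header and issue a new request itself. The sketch below is plain C++ and parses a raw header block for the status code and redirect target. It assumes you have access to the raw response headers, which the code in the question does not show, and the names `RedirectInfo`, `parseRedirect` and `isRedirect` are made up for this example. Since the first response captured above is a normal page, some of the redirection may instead happen in JavaScript, in which case this approach would not help.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>

struct RedirectInfo {
    int status;            // HTTP status code, 0 if it could not be parsed
    std::string location;  // value of the Location header, empty if absent
};

// Lower-case a header name so the comparison is case-insensitive.
std::string toLower(std::string s) {
    for (char& ch : s) ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    return s;
}

// Strip leading and trailing whitespace, including the '\r' of a CRLF line end.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Reads the status line and headers; stops at the blank line before the body.
RedirectInfo parseRedirect(const std::string& raw) {
    RedirectInfo info{0, ""};
    size_t pos = 0;
    bool firstLine = true;
    while (pos < raw.size()) {
        size_t eol = raw.find('\n', pos);
        if (eol == std::string::npos) eol = raw.size();
        const std::string line = trim(raw.substr(pos, eol - pos));
        pos = eol + 1;
        if (firstLine) {
            // Status line looks like "HTTP/1.1 301 Moved Permanently".
            firstLine = false;
            const size_t sp = line.find(' ');
            if (sp != std::string::npos) info.status = std::atoi(line.c_str() + sp + 1);
            continue;
        }
        if (line.empty()) break;  // end of headers
        const size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        if (toLower(line.substr(0, colon)) == "location") {
            info.location = trim(line.substr(colon + 1));
        }
    }
    return info;
}

bool isRedirect(const RedirectInfo& r) {
    return r.status >= 300 && r.status < 400 && !r.location.empty();
}
```

A loop around the request would call `parseRedirect()` after each response and repeat with the new target while `isRedirect()` is true, with a cap on the number of hops to avoid redirect loops.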


Do you think it might have something to do with the number of characters that can be stored in a variable (response.body)?


Or the API, using webhooks?
http://www.espn.com/static/apis/devcenter/docs/scores.html#using-the-api


Thanks. That is what I am doing now, using an API from mysportsfeeds.com (ESPN hasn't given out public keys since 2014). However, I would need to create a specific webhook every time a different game is on (the URL would be something like: https://api.mysportsfeeds.com/v1.1/pull/nhl/{season-name}/scoreboard.{format}?fordate={for-date}).

Is there a way to make the webhook automatically fill in today’s date for “{for-date}”?
Or will I have to somehow create a new webhook and delete the old one (or just edit the existing webhook)?
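One option that avoids editing the webhook is to let the device compute the date and supply it with each request. The sketch below is plain C++ and builds the request path from a timestamp. It rests on assumptions: that `{for-date}` expects the form `YYYYMMDD`, that UTC is the right time zone for the feed, and the function names and the season name used in the usage note are made up for illustration.

```cpp
#include <ctime>
#include <string>

// Formats a timestamp as YYYYMMDD in UTC (assumed format for {for-date}).
std::string formatForDate(std::time_t when) {
    const std::tm* utc = std::gmtime(&when);
    if (utc == nullptr) return "";  // timestamp could not be converted
    char buf[9];                    // 8 digits plus the terminating NUL
    std::strftime(buf, sizeof(buf), "%Y%m%d", utc);
    return buf;
}

// Fills the URL template from the post with a season, a format and a date.
std::string buildScoreboardPath(const std::string& seasonName,
                                const std::string& format,
                                std::time_t when) {
    return "/v1.1/pull/nhl/" + seasonName + "/scoreboard." + format +
           "?fordate=" + formatForDate(when);
}
```

For example, `buildScoreboardPath("2017-2018-regular", "json", std::time(nullptr))` would produce the path for today's date. On the Photon, `Time.format(Time.now(), "%Y%m%d")` should give the same date string. As far as I recall, a Particle webhook URL can also contain `{{{PARTICLE_EVENT_VALUE}}}`, so the device could publish the date as the event data and a single webhook would cover every day.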

Question: are webhooks the only way of getting data from the API, or can I get it directly from the API? I would have to make a new webhook for every game that is going on that day, which is just not feasible.