Photon: HTTP get request only getting part of HTML code

SlinkyMation · November 11, 2017, 3:41pm

I want to scrape the HTML from this website using the Particle Photon:
http://www.espn.com/college-football/game?gameId=400945016

Here is the code that I am running:
```cpp
#include <HttpClient.h>

unsigned int nextTime = 0;    // Next time to contact the server
HttpClient http;

// Extra headers sent with every request
http_header_t headers[] = {
    { "Accept" , "*/*"},
    { NULL, NULL } // NOTE: Always terminate headers with NULL
};

http_request_t request;
http_response_t response;

void setup() {
    Serial.begin(9600);
}

void loop() {
    // Only contact the server once every 10 seconds
    if (nextTime > millis()) {
        return;
    }

    Serial.println();
    Serial.println("Application>\tStart of Loop.");
    request.hostname = "www.espn.com";
    request.port = 80;
    request.path = "/college-football/game?gameId=400945016";

    // Get request
    http.get(request, response, headers);
    Serial.print("Application>\tResponse status: ");
    Serial.println(response.status);

    Serial.print("Application>\tHTTP Response Body: ");
    Serial.println(response.body);

    nextTime = millis() + 10000;
}
```

If you go to the actual URL and look at the page source, you'll see that it is pretty extensive. However, with the code I am running, this is all I get:

```
Application>    Start of Loop.
Application>    Response status: 200
Application>    HTTP Response Body: 
	<!DOCTYPE html>
	<html class="no-icon-fonts">
	<head>
		<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
<meta http-equiv="x-ua-compatible" content="IE=edge,chrome=1" />
<meta name="viewport" content="initial-scale=1.0, max
```

Any ideas as to why I am not getting the entire source code?


ScruffR · November 11, 2017, 5:46pm

If you wrap your HTML in a block like this, it'll display properly:

    ```HTML
    .. put your HTML here
    ```

*(these are grave accents, and they need to live on their **own** lines without leading/trailing blanks)*

BTW, when I capture the first response I get when navigating there, this is what I see:
```HTML
<html class="no-icon-fonts">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8">
    <meta http-equiv="x-ua-compatible" content="IE=edge,chrome=1">
    <meta name="viewport" content="initial-scale=1.0, maximum-scale=1.0, user-scalable=no">
    <link rel="canonical" href="http://www.espn.com/college-football/game?gameId=400945016">
  </head>
</html>
```

After that, some additional redirections happen.
You can follow the flow in your browser's developer console to see how your code needs to “traverse” it to reach the final page.
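If those redirections turn out to be ordinary HTTP 3xx responses (the browser console shows this; some may instead be triggered by JavaScript or a meta tag, which a Photon won't execute), following them means reading the status code and the `Location` header from the raw response and then issuing a new GET to that address. Below is a minimal, portable C++ sketch of just the parsing step, assuming you already have the raw response text as a `std::string` (for example, accumulated from `TCPClient::read()`). The helper names are hypothetical and are not part of the HttpClient library.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the numeric status code from a raw response
// such as "HTTP/1.1 301 Moved Permanently\r\n...", or -1 if unparseable.
int parseStatusCode(const std::string& raw) {
    // Status line is "HTTP/1.x NNN Reason"; the code follows the first space.
    std::size_t space = raw.find(' ');
    if (space == std::string::npos || space + 4 > raw.size()) {
        return -1;
    }
    int code = 0;
    for (std::size_t i = space + 1; i < space + 4; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(raw[i]))) {
            return -1;
        }
        code = code * 10 + (raw[i] - '0');
    }
    return code;
}

// True for the status codes that carry a Location to follow.
bool isRedirect(int status) {
    return status == 301 || status == 302 || status == 303 ||
           status == 307 || status == 308;
}

// Hypothetical helper: returns the Location header value (header names are
// matched case-insensitively), or "" if it does not appear before the blank
// line that ends the headers.
std::string findLocation(const std::string& raw) {
    std::string headers = raw.substr(0, raw.find("\r\n\r\n"));  // npos keeps all
    std::size_t lineStart = 0;
    while (lineStart < headers.size()) {
        std::size_t lineEnd = headers.find("\r\n", lineStart);
        if (lineEnd == std::string::npos) lineEnd = headers.size();
        std::string line = headers.substr(lineStart, lineEnd - lineStart);
        std::size_t colon = line.find(':');
        if (colon != std::string::npos) {
            std::string name = line.substr(0, colon);
            for (char& c : name) {
                c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
            }
            if (name == "location") {
                std::size_t v = line.find_first_not_of(' ', colon + 1);
                return v == std::string::npos ? "" : line.substr(v);
            }
        }
        lineStart = lineEnd + 2;  // skip the "\r\n"
    }
    return "";
}
```

In a loop you would keep requesting `findLocation(...)` while `isRedirect(parseStatusCode(...))` holds, with a small hop limit to avoid cycles. If a redirect points to an `https://` URL, note that, as far as I know, the Photon's plain `TCPClient` does not do TLS, so that hop can't be followed this way.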

SlinkyMation · November 12, 2017, 1:55am

Do you think it may have something to do with the number of characters that can be stored in a variable (response.body)?

SlinkyMation · November 12, 2017, 3:01am

The truth of the matter is, I just need to grab the team scores from the web page. I was thinking the best way to do this was to read all of the HTML into a string and parse it to get the scores. Maybe there is a better way, such as scanning through the page and grabbing only the relevant part? Thanks.
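On the size question: as far as I know, the Particle HttpClient library copies the whole response, headers included, into a fixed buffer of about 1 KB, so a long page gets cut off regardless of how large the `String` could grow (check `HttpClient.h` in your copy to confirm). Scanning the response one character at a time as it arrives, and keeping only what follows a known marker, sidesteps both that limit and the Photon's limited RAM. A minimal, portable C++ sketch, assuming the score sits right after some fixed piece of markup; the class name and the `class="score">` marker used in the usage example are hypothetical, and you would need to find the real markup in the page source.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: feed it characters one at a time and it captures the
// text that follows the first occurrence of `marker`, stopping at `stopChar`
// or after `maxLen` characters. The full page is never stored.
class MarkerCapture {
public:
    MarkerCapture(const std::string& marker, char stopChar, std::size_t maxLen)
        : marker_(marker), stopChar_(stopChar), maxLen_(maxLen) {}

    // Returns true once the capture is complete.
    bool feed(char c) {
        if (done_) return true;
        if (found_) {
            if (c == stopChar_) {
                done_ = true;          // terminator reached, not stored
            } else {
                captured_ += c;
                if (captured_.size() >= maxLen_) done_ = true;
            }
            return done_;
        }
        // Sliding window holding the last marker_.size() characters seen.
        window_ += c;
        if (window_.size() > marker_.size()) window_.erase(0, 1);
        if (window_ == marker_) found_ = true;
        return done_;
    }

    bool found() const { return found_; }
    const std::string& captured() const { return captured_; }

private:
    std::string marker_;
    char stopChar_;
    std::size_t maxLen_;
    std::string window_;
    std::string captured_;
    bool found_ = false;
    bool done_ = false;
};
```

For example, `MarkerCapture cap("class=\"score\">", '<', 8);` fed with `<span class="score">24</span>` ends with `cap.captured()` equal to `"24"`. On the device you would call `feed()` with each byte from `TCPClient::read()` (checking `available()` first, since `read()` returns -1 when nothing is waiting) and close the connection once it returns true.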

Moors7 · November 12, 2017, 8:19am

Or the API, using webhooks?
http://www.espn.com/static/apis/devcenter/docs/scores.html#using-the-api

SlinkyMation · November 15, 2017, 11:18pm

Thanks. That is what I am doing now, using an API from mysportsfeeds.com (ESPN hasn't given out public keys since 2014). However, I will need to create a specific webhook every time a different game is on (the URL would be something like: https://api.mysportsfeeds.com/v1.1/pull/nhl/{season-name}/scoreboard.{format}?fordate={for-date}).

Is there a way to make the webhook automatically put in today’s date for “{for-date}”?
Or will I have to somehow create a new webhook and probably delete the old one (or just edit the webhook)?
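One approach that avoids a webhook per game: Particle webhooks can substitute template variables such as `{{PARTICLE_EVENT_VALUE}}` into the URL, so the device could publish today's date as the event data and a single webhook URL could end in `?fordate={{PARTICLE_EVENT_VALUE}}`. On the Photon, `Time.format(Time.now(), "%Y%m%d")` would produce that string (set `Time.zone()` first so "today" matches your local date). The portable sketch below shows only the date formatting and, for the direct-request case, the URL assembly. It assumes the API expects `{for-date}` as YYYYMMDD (please check the mysportsfeeds docs); `formatForDate` and `buildScoreboardUrl` are hypothetical helpers.

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Formats a calendar date as YYYYMMDD (assumed format for {for-date}).
std::string formatForDate(const std::tm& date) {
    char buf[9];  // 8 digits + terminating NUL
    std::size_t n = std::strftime(buf, sizeof(buf), "%Y%m%d", &date);
    return std::string(buf, n);  // n == 0 (empty) if formatting failed
}

// Hypothetical helper: fills the placeholders of the URL pattern quoted above.
// league, season and format are whatever strings the API expects.
std::string buildScoreboardUrl(const std::string& league, const std::string& season,
                               const std::string& format, const std::string& forDate) {
    return "https://api.mysportsfeeds.com/v1.1/pull/" + league + "/" + season +
           "/scoreboard." + format + "?fordate=" + forDate;
}
```

With the webhook route only the date string has to leave the device; the URL is assembled on Particle's side, so the same webhook works every day.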

SlinkyMation · January 29, 2018, 10:40pm

Question: are webhooks the only way of getting data from the API, or can I get it directly from the API? I would have to make a new webhook for every game that is going on that day, which just isn't possible.
