Core Browsing Website By Itself?

Hi,

I recently received a Spark Core. I have quite a bit of experience with Arduino, but very little web development knowledge. I’ve read multiple threads regarding “getting started” code, etc. and even read code from examples such as the Twitter Magnet (and tried to understand it as best I could). I’ve also used the Remote Spark webpages (they work great).
But… first of all…
Let’s say I want to create a simple website that I could monitor my Spark Core from… I’m still kind of lost as to the actual components that I need to put together (referring to web/code stuff, like .js and .html files). How do I get started on that?

And more importantly, how can I make my Spark Core, by itself, browse a website using CURL/REST API? As in, let’s say I want to connect to www.yahoo.com, navigate to a button, navigate to another button, read a news topic, and display it on an LCD connected to the Core. Is there any way to do this all from the Core? How about using CURL straight from the Core?

Thanks. I apologize, I’m really a noob regarding web-development stuff.

I smell an opportunity for an in-depth tutorial here. I’m in the midst of getting my hands on the web stuff too!

There is one good example here: https://community.spark.io/t/lcd-buses-weather-reporter/2050

Hi @helium,

You might also be interested in making GET and POST requests from the core (if you wanted to query yahoo for example) https://community.spark.io/t/making-a-get-or-post-request-from-the-core/2288

Thanks!
David

What you are basically requesting is a way to SPIDER or WEB CRAWL a webpage, and SCRAPE the data from it. This means you have to download all of the content and PARSE through the data looking for things, then move on to the next link or piece of data that’s important to you. This is a lot of work, but doable.

If you are after a particular set of data, it would be best to use a service that gives you the data in JSON format, directly from a GET or POST REQUEST.

You can look up these TERMS to get a good idea of what they all mean, and it will help you better understand what’s going on there.
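
To make the "download, PARSE, move on to the next link" idea a bit more concrete, here is a minimal sketch in plain C++ that pulls the link targets out of a chunk of HTML with simple string searching. The function name `extractLinks` is made up for illustration, and it assumes every link is written as a lowercase, double-quoted `href="..."`. Real pages are messier than that, which is a large part of why scraping is a lot of work.

```cpp
#include <string>
#include <vector>

// Collect the value of every href="..." attribute found in an HTML string.
// Deliberately naive: assumes lowercase "href" and double-quoted values.
std::vector<std::string> extractLinks(const std::string& html) {
    std::vector<std::string> links;
    const std::string marker = "href=\"";
    std::size_t pos = 0;
    while ((pos = html.find(marker, pos)) != std::string::npos) {
        std::size_t start = pos + marker.size();
        std::size_t end = html.find('"', start);
        if (end == std::string::npos) {
            break;  // unterminated attribute, stop scanning
        }
        links.push_back(html.substr(start, end - start));
        pos = end + 1;  // continue after the closing quote
    }
    return links;
}
```

A crawler would fetch each returned link in turn and repeat the search on the new page. On a memory-constrained device, holding whole pages in a string like this may not be practical, so treat it as a desktop illustration of the idea.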

CaPs FoR EmPhAsIS! :slight_smile:

Also, HTML parsing in general is pretty tricky. Python’s BeautifulSoup is very good at it: http://www.crummy.com/software/BeautifulSoup/

Thanks for all the replies. I’m going to spend some more time learning about those terms, examining the options you guys have introduced.
FYI, I’m a high school student, and am actually working on a grade-fetcher. Basically, the Spark Core logs in to an online grade database, browses to my grades, and displays certain grades on an LCD. Like this:
First, the Core would have to navigate to this site, which has a login box: https://campus.hallco.org/campus/portal/hall.jsp

Then login using my credentials, which would then display this page:

The Core would emulate a “click” on the “Schedule” button, which would then show this page:

Which are all of my classes. Clicking on the name of a class opens this page:

And from here I would fetch different grades/assignments. And do this for every class, and then display results on an LCD.

Now, I have a loose idea how to implement this, but what do you guys suggest as a good approach, software wise?

I would suggest that you make a GRADES twitter feed (to parents) with localized Lock Box control that stays unlocked when your grades are high, and locks when your grades are low... put your car keys in there, your good cologne and your spending money. xD

That's a lot of web scraping... I would first suggest learning how to log into your grades page automatically from a tool like http://www.hurl.it/. This would likely be a POST request with username and password as parameters. You can learn a lot about what types of requests the pages make by opening the Developer Tools window in Chrome (hit F12) and clicking the Network tab. When you log in, each request appears there with a Name, Method, Status, Type, etc. If you click on a link under Name, it will expand and show you all of the Request Headers, Form Data (where your username/password and their associated parameters would be), and Response Headers. Once you get the Spark Core logging into that grades page... check back for more info :smile:

Alright, I’ve made some progress. First of all, inspecting the forms on the login page, I found this:

So logging in is achieved through a POST, and I expect multiple input values are required to log in, including appName, portalUrl, username, password, url, lang, and useCSRFProtection.
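
One detail worth flagging with these fields: when they are sent as a form POST, each value normally has to be percent-encoded, otherwise characters such as `?`, `&` and `=` inside a value (portalUrl and url both contain them) get mistaken for field separators. Below is a minimal sketch of that encoding in plain C++. The helper names `urlEncode` and `buildFormBody` are hypothetical, and whether this particular portal insists on strict encoding is an assumption.

```cpp
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Percent-encode one value for an application/x-www-form-urlencoded body.
std::string urlEncode(const std::string& value) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : value) {
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            out += static_cast<char>(c);  // safe character, copy as-is
        } else if (c == ' ') {
            out += '+';                   // form encoding writes a space as '+'
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Join name/value pairs into one POST body of the form name=value&name=value.
std::string buildFormBody(
        const std::vector<std::pair<std::string, std::string>>& fields) {
    std::string body;
    for (const auto& field : fields) {
        if (!body.empty()) {
            body += '&';
        }
        body += urlEncode(field.first) + "=" + urlEncode(field.second);
    }
    return body;
}
```

Feeding the seven field names listed above (with their real values) into `buildFormBody` would give the string that goes after the blank line of the POST request.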
I’ve found these two websites about using CURL to log in to a website, and both employ PHP, so I tried my hand at logging in to my site via PHP. (By the way, I’m brand new to PHP as well :smile: )


So far, I’ve come up with this PHP code, which I tried running with this website
http://www.compileonline.com/execute_php_online.php

<?php
$appName = "hall";
$portalUrl = "portal/hall.jsp?&rID=0.4559474728524171";
$username = "USERNAME";
$password = "PASSWORD";
$url = "https://campus.hallco.org/campus/portal/main.xsl?lang=en";
$useCSRFProtection = "true";
$lang = "en";

$postdata = "appName=$appName&portalUrl=$portalUrl&username=$username&password=$password&url=$url&useCSRFProtection=$useCSRFProtection&lang=$lang";

$ch = curl_init();

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);

$result = curl_exec($ch);

echo $result;
curl_close($ch);
?>

This code returns a value of “1”
Hmm… so I expect it works? (By the way, what’s a good way to check if I’m really logged in?)
As I said, I’m brand new to PHP as well, but I understand this code.
Now… even though I’m not completely sure whether this is actually logging in, it’s not returning errors, so now… how could I port this to the Spark Core itself, so it could run these CURL commands? I found this thread: https://community.spark.io/t/making-a-get-or-post-request-from-the-core/2288/3
Which includes code such as:

client.connect(LIB_DOMAIN, 80);
client.println("POST /update HTTP/1.0");
client.println("Host: " LIB_DOMAIN);
client.print("Content-Length: ");
client.println(strlen(msg)+strlen(**PLACE TOKEN HERE**)+14);
client.println();
client.print("token=");
client.print(**PLACE TOKEN HERE**);
client.print("&status=");
client.println(msg);

But this is specific to tweeting… what kind of “client.print” commands relate to what my PHP code is doing?

And if I’m going in the completely wrong direction, don’t hesitate to let me know. :slight_smile:
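
On the question of how to tell whether the login really worked: the "1" on its own does not say much. Without `CURLOPT_RETURNTRANSFER` set, PHP's `curl_exec` returns `true` once the transfer completes, and `echo true` prints "1", so it only shows that the request went through. A more telling check is to look at the raw response itself. The sketch below, in plain C++, reads the status code and checks for a `Set-Cookie` header. Treating a redirect plus a session cookie as the sign of success is an assumption about how this portal behaves, and the function names are made up for illustration.

```cpp
#include <cstdlib>
#include <string>

// Return the numeric status code from a raw HTTP response, or -1 if the
// text does not start with a usable status line such as "HTTP/1.1 302 Found".
int parseStatusCode(const std::string& response) {
    if (response.compare(0, 5, "HTTP/") != 0) {
        return -1;
    }
    std::size_t space = response.find(' ');
    if (space == std::string::npos || space + 4 > response.size()) {
        return -1;
    }
    int code = std::atoi(response.substr(space + 1, 3).c_str());
    return code > 0 ? code : -1;
}

// True if the header section contains a Set-Cookie header.
// Matches the common capitalisation only; HTTP header names are
// case-insensitive, so a stricter version would lowercase first.
bool setsCookie(const std::string& response) {
    std::size_t headerEnd = response.find("\r\n\r\n");
    std::string headers = response.substr(0, headerEnd);
    return headers.find("\r\nSet-Cookie:") != std::string::npos;
}
```

Comparing the status code and cookie for a deliberately wrong password against those for the right one is a cheap way to see which values actually mean "logged in".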

And after I log in, this is what I see in the “Network” tab of Chrome’s Developer Tools:

I’m guessing the login credentials get handled by the “verify.jsp” file. This matches what I saw earlier (check the first picture of my previous post), where the login form was declared as:

<form method="POST" action="verify.jsp">

And now, inspecting the contents of “verify.jsp”, I find:

Which confirms this.
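
With the form's action known, the earlier question about which `client.print` calls correspond to the PHP cURL code can be sketched roughly. The function below builds the text of a form POST as one string in plain C++. `buildPostRequest` is a hypothetical name, and `body` is assumed to be an already URL-encoded `name=value&...` string. The sketch does not deal with TLS at all: the portal is served over HTTPS, and this thread does not establish whether the Core's `TCPClient` can speak it, so a plain connection on port 80 may simply be refused or redirected.

```cpp
#include <string>

// Assemble the text of an HTTP/1.0 form POST. On the Core, each piece would
// be sent with client.print()/client.println() instead of being appended to
// a string.
std::string buildPostRequest(const std::string& host,
                             const std::string& path,
                             const std::string& body) {
    std::string req;
    req += "POST " + path + " HTTP/1.0\r\n";
    req += "Host: " + host + "\r\n";
    req += "Content-Type: application/x-www-form-urlencoded\r\n";
    req += "Content-Length: " + std::to_string(body.size()) + "\r\n";
    req += "\r\n";  // blank line ends the headers
    req += body;    // form fields, already URL-encoded
    return req;
}
```

For this login, `host` would be campus.hallco.org, and `path` would presumably be /campus/portal/verify.jsp, which is the form's action resolved against the login page URL (my assumption, not something confirmed above). The Content-Length must match the body exactly, which is what the `strlen(...)+14` arithmetic in the tweeting example is doing by hand.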

Hmm…

I was wondering how this project turned out? I am looking to do something very similar and was hoping to hear that someone else had been successful. Does anyone know of any resources or tutorials that could help with figuring out how to use the core to automatically log in and scrape a webpage?