Topic: Getting data from outside sources

I have read all the posts about Flash and security on getting XML and HTML from other sites.  However I am still not totally clear how this applies to the Chumby.

1.  Are all requests for web pages and XML sent from "chumby.com"?
2.  Is getting html the same as getting XML?
3.  What if I have a decent widget I would like to post up to the public but it requires a local proxy, will Chumby consider hosting it?
4.  How does your Stock Quote widget work?  Does it make use of a local proxy or does Yahoo.com have a crossdomain.xml file?
5.  Are all the RSS feeds like Engadget and Google News done with a local proxy on Chumby?

Any other info would be helpful.  I need info on pulling web page data from other sites to parse out data and display on the widget.

Thanks,

Todd

Re: Getting data from outside sources

The requests don't come from chumby.com - they come from the IP of the chumby.  However, the security model uses chumby.com as the security domain, since that's where the widgets come from when they're running on the chumby.

Fetching HTML is no different from a security standpoint than fetching XML.

We're not set up to proxy arbitrary third-party data.  We have deals in place to proxy *some* services with some large companies, and we only do it when every party agrees to the change of the security model.

The current stock widget uses a third-party feed, which will be replaced sometime soon with an alternate provider with more up-to-date and complete information, but will be contractually restricted to only widgets we produce.

Re: Getting data from outside sources

OK - so basically, any widget that I create and want to allow others to use will have to use a proxy on my servers if it has to pull data from some outside source.
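For anyone following along, here is a minimal sketch of what such a developer-hosted proxy might look like. It is only an illustration under stated assumptions: the `/fetch?url=...` path, the `ALLOWED_HOSTS` set, and the host `feeds.example.com` are all made up for the example, and nothing here reflects how Chumby's own proxying works. Since the security domain is chumby.com, the proxy host would presumably also need its own crossdomain.xml before a published widget could read from it.

```python
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlparse, parse_qs
from urllib.request import urlopen

# Hypothetical allowlist: only hosts the widget author has agreed to relay.
ALLOWED_HOSTS = {"feeds.example.com"}

def upstream_url(request_path):
    """Return the upstream URL for a request like /fetch?url=..., or None if refused."""
    parsed = urlparse(request_path)
    if parsed.path != "/fetch":
        return None
    targets = parse_qs(parsed.query).get("url", [])
    if len(targets) != 1:
        return None
    target = urlparse(targets[0])
    if target.scheme not in ("http", "https"):
        return None
    if target.hostname not in ALLOWED_HOSTS:
        return None
    return targets[0]

class RelayHandler(BaseHTTPRequestHandler):
    """Relays GET requests for allowlisted targets; everything else gets a 403."""

    def do_GET(self):
        target = upstream_url(self.path)
        if target is None:
            self.send_error(403, "target not allowed")
            return
        with urlopen(target, timeout=10) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get("Content-Type", "application/octet-stream")
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Restricting the relay to a fixed allowlist, rather than fetching whatever URL it is handed, keeps it from turning into an open proxy. The handler could be served with `http.server.HTTPServer(("", 8080), RelayHandler).serve_forever()`, where the port is arbitrary.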

Basically I have to take the bandwidth hit and pay for it.

While I understand why you do this, it is a little limiting on being able to share widgets since I have to pay for everyone to use it, and I don't get any $$ to offset costs in any way.   This is bad for everyone at the end of the day.  Bad for Chumby, since if Chumby is VERY successful the bandwidth cost of popular widgets will make them hard to share.  Bad for the Chumby users since they won't get a lot of cool widgets (without some form of advertising on them).  Bad for Chumby growth and expansion since chumby will get fewer free shared widgets!

Too bad.  Won't stop me from making my widgets since I can embed ads in them and sell ad-free ones, but kinda sucks for other users, and myself since others won't be sharing anything that pulls from other sites!  I would imagine that most widgets of any use will need to pull data from other sites than chumby.com ;)

Todd

Re: Getting data from outside sources

This situation isn't limited, of course, to chumby - it's true of *any* Flash movie that's not running locally.

The Flash security model is indeed rather inconvenient sometimes, but the reasoning as to why it is the way it is is very sound from a security standpoint.

In many cases, we've been able to simply ask the various sites to add crossdomain files to their systems and they do.  The issue really is that folks don't know about them, and once you explain what they do and why they're useful, we've had pretty good success.
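For readers who have not seen one, a crossdomain file is a small XML policy document served from the root of the data site. The sketch below shows a typical policy and a simplified check of whether a requesting domain is admitted. The `allows` helper and its wildcard matching are illustrative assumptions, not Flash Player's exact matching rules.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# A policy a data provider might serve at /crossdomain.xml to admit chumby.com.
POLICY = """<?xml version="1.0"?>
<cross-domain-policy>
  <allow-access-from domain="chumby.com" />
  <allow-access-from domain="*.chumby.com" />
</cross-domain-policy>"""

def allows(policy_xml, requesting_domain):
    """Simplified check: does any allow-access-from pattern match the domain?"""
    root = ET.fromstring(policy_xml)
    patterns = [el.get("domain", "") for el in root.iter("allow-access-from")]
    domain = requesting_domain.lower()
    return any(fnmatchcase(domain, pattern.lower()) for pattern in patterns)

print(allows(POLICY, "www.chumby.com"))
print(allows(POLICY, "ads.example.org"))
```

A policy with `domain="*"` would admit every Flash movie from anywhere, which is why a site owner may prefer to list specific domains.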

Re: Getting data from outside sources

As you pointed out so well in another post, Chumby is not a PC and thus relies on these widgets for things.  No Chumby proxy will limit widget development and thus limit Chumby growth.  Just a thought there :)

It's your business not mine!  I'm just a customer.

Todd

Re: Getting data from outside sources

As a step towards the solution, can we put the crossdomain.xml on the chumby's own internal web server during next upgrade, please?

That would at least allow running a proxy or relay of sorts using scripts in the /psp directory. I know about the speed and capacity limitations, but it might still be sufficient for some interesting ideas.

In the same spirit, it would be useful if chumby's webserver could serve files/scripts from both /psp and flash directory areas. That would make for easier instructions on how to make things run (copy to flash, stick it in, [no copy/sshd step], hit X URL).

Re: Getting data from outside sources

limbo you're forgetting another key concept.  it's not chumby's fault that the crossdomain.xml file is required.  it's part of the standard flash model.  that model is there for a lot of good reasons - one of which is security, as duane has already stated. 

another one relates to the data owner's rights to their data.  it's fine to say that technically you can scrape data from someone's website.  it's another matter entirely to have the legal right to do so. 

many data owners publish data with the understanding that people will come to their website to view it, and possibly their own ads and revenue generators that are paying them for the collection and presentation of that data.  if you scrape the part you want and ignore the parts you're not interested in (ie their ads, etc) you may be undermining their ability to continue to deliver that data. 

some providers of course would have no objection to your scraping, and would probably be very willing to put a crossdomain.xml file on their server (and possibly even generate data in a more easily scraped fashion for your use, if you talk to them nicely and explain that it might bring them more business to their website).  others will insist that you NOT take any data at all - and if that is the case a proxy that "fakes" the crossdomain.xml file would not really be a very wise choice

just a thought...

Re: Getting data from outside sources

If we put a crossdomain.xml file and proxy on the internal server, it would expose your entire internal network to scraping and upload to external sites. That would make the chumby a security disaster, and it would be rightly banned from any business or sophisticated home environment.

We're simply not going to do it.

If you want to create such a system for your own chumby, you're absolutely welcome to do so.

Re: Getting data from outside sources

I'm confused.  The general advice here has been to get 3rd party sites to put crossdomain.xml's on their site, but yet putting a crossdomain.xml on a chumby is a "security disaster."  How can I, in good conscience, ask someone else to do something that I wouldn't do myself? 

In general, crossdomain.xml is not the answer.  The original motivation for crossdomain.xml was to allow large sites with multiple domains to avoid problems when asset x is on one server and asset y is on another.    The reason why Flash restricts access only to the server it came from is because, as a user, I don't want an ad banner going off and doing things I don't expect, like posting my personal information to a spam house, or messing with my gmail.   Crossdomain.xml has nothing to do with stopping evil screen scrapers, they can do that without Flash (and frankly with less headache).

The core problem here is that Chumby lacks the browser part of the internet model.  When I want to view the xkcd comic of the day, I don't have to ask xkcd for permission. I just do it, because I have a browser and I can type the URL.  Similarly for a Chumby channel, I should be able to make "http://xkcd.com" a channel.  It might look horrible at 320x240, but the point is that it doesn't break any security rules. 

How can a flash movie act like a browser?  What if a Chumby widget SWF was initialized with the URL of the site it's trying to display, not the extraneous chumby.com URL?  Maybe a Chumby widget is a larger thing than a Flash movie, an XML file, say, that specifies the movie to run, a site to get data from, and maybe later XSLT transforms and configuration information.

Re: Getting data from outside sources

What is being proposed in this thread is to put a crossdomain file on a local proxy to eliminate the security for *all* domains, not just the chumby itself.  I don't have any particular issue with adding a crossdomain to access HTTP services hosted on the chumby, although I would restrict it to services that don't supply information that should probably be kept secure - the web server in the chumby, for instance, reports some network information that could be useful to a malicious hacker, such as the ESSID of the access point currently configured (even if it doesn't beacon).

For instance, the HTTP daemon that presents the iPod has a crossdomain response.

To the general issue, however....

The important difference between a web page showing ads and a Flash movie is that a Flash movie is basically an application.

Imagine that every web page could include an untrusted application that would download and run on your machine, get access to your files and the rest of your network and upload the data.

I hope it's obvious to everyone that that would be bad.

The problem with universally bypassing crossdomain on the chumby device itself is that the drop in security would apply even to your own network - a widget would be able to scan through IPs behind your firewall, and upload any information it discovered to any external server.  No company with half a brain would allow a chumby to run on their networks, and I probably wouldn't even run it on my own home network.

We didn't come up with this security model, Adobe did, and their reasoning is pretty sound from a security standpoint.  Yes, it's inconvenient sometimes, but security often is, and Microsoft's abysmal track record has shown over and over again that convenience over security is bad policy.  Both Java applets and JavaScript in a web page have similar, but not identical, security models.

If we did what you're proposing, which is essentially a spoofing mechanism, what's to stop people from specifying the runtime domain as 192.168.x.x, 172.22.1.x, or 10.x.x.x and having free rein to wander around behind the firewall?
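To make the concern concrete, the sketch below shows the kind of check any scheme that lets a widget name its own runtime domain would need before honoring the request. The function name `is_internal_target` is made up for illustration. It only catches literal IP addresses, so a hostname that resolves to an internal address would slip past unless it were resolved and checked as well.

```python
import ipaddress

def is_internal_target(host):
    """True if host is a literal IP in a private, loopback, or link-local range."""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal (e.g. a hostname); it would need resolving first.
        return False
    return addr.is_private or addr.is_loopback or addr.is_link_local

for host in ["192.168.1.10", "172.22.1.5", "10.0.0.1", "8.8.8.8"]:
    print(host, is_internal_target(host))
```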

Incidentally, the crossdomain issue does not apply to images - a widget can show a comic from xkcd without security barriers.

Re: Getting data from outside sources

Thanks for the thoughtful response.   You are rightfully concerned about security and you've obviously thought a lot about this issue.  I agree that the Flash, Javascript, and Java applet restrictions are a pain to developers, but that they're there to prevent the applet from having access to anything but its own server.  I'm not in favor of undoing their security models, or even posting crossdomains everywhere. 

I am in favor of allowing a widget to access a URL as if it were its server.  This can be justified like so: say Chumby had a full-blown browser and a "widget" was really just a URL.  This browser would display whatever was at the URL including the Flash movies and javascript actions all within their sandboxes.  In effect, this would be no different than having a PC on the network as long as I selected the URLs to display. 

The insightful issue you raised is that someone may choose the URL to be an internal address like 10.0.x.x, and you're right, they will do it, sometimes with good intentions and sometimes not.  There are several ways to solve this, some technological, and some social, but it is solvable in a robust manner.  In the end, if I select the URLs to display, I should be OK.

As far as xkcd, the image URL changes.  To get the URL, I can fetch the xkcd atom feed, but that seems to fail when I publish the widget.
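As a rough illustration of the feed step described above, the sketch below pulls the image URL out of the newest entry of an Atom feed. The sample feed is invented for the example (it is not xkcd's actual feed), and it assumes the entry summary carries the image as escaped HTML. In a published widget the feed itself would still be a data load, so presumably it is the fetch rather than the parsing that runs into the crossdomain restriction.

```python
import re
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed content, used so the example runs without network access.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example comic</title>
  <entry>
    <title>Latest strip</title>
    <summary type="html">&lt;img src="http://imgs.example.com/comics/latest.png" alt="Latest strip" /&gt;</summary>
  </entry>
  <entry>
    <title>Older strip</title>
    <summary type="html">&lt;img src="http://imgs.example.com/comics/older.png" /&gt;</summary>
  </entry>
</feed>"""

def latest_image_url(feed_xml):
    """Return the img src from the first entry's summary, or None if absent."""
    root = ET.fromstring(feed_xml)
    entry = root.find(ATOM + "entry")
    if entry is None:
        return None
    summary = entry.findtext(ATOM + "summary", default="")
    match = re.search(r'<img[^>]*\bsrc="([^"]+)"', summary)
    return match.group(1) if match else None

print(latest_image_url(SAMPLE_FEED))
```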

Re: Getting data from outside sources

There is a mechanism in place to allow widgets to load from other domains besides chumby.com.  It's only partially implemented on the server, but you can play with it locally using this trick, because some of the required client support is in the current Control Panel release.

This doesn't completely solve the crossdomain issue, but *does* lift the particular restriction that the site must be open to chumby.com.

On the server, at some point, we're considering allowing people to provide a full URL to their widget rather than upload it to our server - the big issue, of course, is quality of service.  If some developer hosts a popular widget from a residential DSL, it would degrade the chumby experience, and we're the ones that get the support call about crappy performance.

As you've said, we've given this a lot of thought, but we're certainly open to possible solutions that preserve the security - hopefully the community will help us come up with something that works.

Re: Getting data from outside sources

What if Chumby Inc provided a proxy service at another domain (ex. chumbyproxy.com) with an open crossdomain file? This would protect chumby.com from needing an open crossdomain file.

In the same vein, what if widget developers could get a personal proxy at this domain by registering (ex. dev_jvc.chumbyproxy.com)?  This could help manage and monitor which widget/author is pulling what.

Re: Getting data from outside sources

Having Chumby Inc provide a proxying service would patch the problem and could provide interesting services like caching for the content, something that will become important when Chumby takes off.   Duane is right to be wary of having widgets hosted elsewhere, a single Chumby can use a lot of bandwidth.

So, going back to the idea of a widget being an XML file that points to the Flash file, the data URL could be fetched by Chumby.com and optionally cached according to another entry in the XML file.  This eliminates the problem of using an internal, 10.0.x.x, URL (well it shifts it to chumby.com, but I'm sure the network people there could figure it out).  Which gets back to Jvc's idea, the implementation of the proxy could use the data in the XML file to drive how it works.    Later, if a better mechanism was figured out, the XML file wouldn't have to change.
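To sketch what that might look like, here is one possible shape for such a descriptor and a small parser for it. Everything in it is hypothetical: the element names `widget`, `movie`, and `data`, the `cache-seconds` attribute, and the example.com URLs are invented for illustration and are not an existing Chumby format.

```python
import xml.etree.ElementTree as ET

# Hypothetical widget descriptor: which movie to run, where its data lives,
# and how long a proxy may cache that data.
DESCRIPTOR = """<widget>
  <movie href="http://widgets.example.com/comic.swf" />
  <data href="http://feeds.example.com/atom.xml" cache-seconds="3600" />
</widget>"""

def parse_descriptor(xml_text):
    """Extract the movie URL, data URL, and cache lifetime from a descriptor."""
    root = ET.fromstring(xml_text)
    movie = root.find("movie")
    data = root.find("data")
    return {
        "movie_url": movie.get("href"),
        "data_url": data.get("href"),
        # A missing cache-seconds attribute is treated as "do not cache".
        "cache_seconds": int(data.get("cache-seconds", "0")),
    }

print(parse_descriptor(DESCRIPTOR))
```

Because the proxy would read the data URL from the descriptor rather than from the running movie, it could vet that URL (for instance against internal address ranges) once, at submission time.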