Problem OFDb - IMDb (DE)
Posted: 2011-09-03 15:15:24
by gerol
Hello,
I have found that Internet Explorer displays the imdb.com site differently from Firefox, and even the source code is different.
Example: The source code in Firefox includes the words "rating rating-big". In the source code of Internet Explorer they are not included.
With the script OFDb - IMDb (DE), the command
Page.Text := GetPage(IMDbURL);
returns different HTML versions for the same movie. This seems to happen randomly.
Can I adjust this in any way?
Posted: 2011-09-04 15:05:52
by bad4u
Nope. At least I don't think so.
There are more differences in what IMDB delivers depending on where the user resides, e.g. French or Italian users would get different HTML back than German users, and I sometimes see (probably) random changes, like ' instead of ", or HTML-encoded and unencoded versions of the same character on one page. This makes screen scraping annoying to work with, but I guess that's what they want: to make people use their API.
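As an illustration of how a scraper could cope with the quote and entity differences described above, here is a minimal Python sketch (not AMC script code). The function name `normalize_markup` and the sample snippets are made up for this example, and the title ID in them is a placeholder. It is only meant for preparing text for pattern matching, not for producing valid HTML.

```python
import html
import re


def normalize_markup(page: str) -> str:
    """Reduce cosmetic differences between two deliveries of the same page."""
    # Decode entities such as &amp; so encoded and literal characters match.
    text = html.unescape(page)
    # Rewrite single-quoted attribute values as double-quoted ones.
    text = re.sub(r"=\s*'([^'\"]*)'", r'="\1"', text)
    # Collapse runs of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()


# Two made-up snippets that differ only in quoting, encoding and spacing.
variant_a = '<a href="/title/tt0000000/">Nochnoy dozor &amp; more</a>'
variant_b = "<a href='/title/tt0000000/'>Nochnoy  dozor & more</a>"

print(normalize_markup(variant_a) == normalize_markup(variant_b))
```

Running the page through such a step before searching means each extraction pattern only has to handle one spelling of the markup.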
There's more: at least the size of the images IMDB delivers seems to depend on the browser's user agent. They delivered smaller images for the AMC user agent (I haven't tested that for a while, so I don't know if it has changed in the meantime).
Your best bet is using Firefox with the User Agent Switcher add-on installed, so that you can test with AMC's user agent. Still, some of the problems above won't be solved.
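The same kind of test can be done outside a browser. The following Python sketch sends a request with an explicit User-Agent header using the standard library. The value of `AMC_USER_AGENT` is a placeholder, since the string AMC really sends is not given in this thread, and `build_request` and `fetch` are names chosen for this example.

```python
import urllib.request

# Placeholder only: replace with the user agent string AMC actually sends.
AMC_USER_AGENT = "ExampleAMCAgent/0.0"


def build_request(url: str, user_agent: str = AMC_USER_AGENT) -> urllib.request.Request:
    """Create a request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, user_agent: str = AMC_USER_AGENT, timeout: float = 30.0) -> bytes:
    """Download the raw bytes of a page using the given user agent."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read()
```

Calling `fetch` twice with different user agent strings and comparing the results would show whether the user agent alone explains a difference. As noted above, it probably will not explain all of them.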
Posted: 2011-09-04 16:13:22
by gerol
@bad4U
Thanks for the answer.
There is one thing I don't understand:
Why do I get with the same command once the 'Firefox-html-version' and once the 'Internet Explorer-html-version'?
I have tried it with about 150 movies:
1st experiment:
All movies have a rating.
2nd experiment:
I close and restart AMC.
I mark the same movies: no rating
I don't know how AMC gets the html-Code from imdb.com.
Is there no way to specify: This is a Firefox request, or: This is an IE request?
I've fixed the bug (works with both HTML versions):
Here is the new version 1.8.1
But (as you said already): it will be difficult to write and test scripts in the future.
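One way to make an extraction work with more than one HTML version is to try several patterns in order and take the first that matches. The Python sketch below shows the idea and is not the code of version 1.8.1. Only the class name "rating rating-big" comes from this thread; the markup around it, the second pattern and the sample pages are assumptions made for the example.

```python
import re
from typing import Optional

# Patterns are tried in order. Only the class name is taken from the thread;
# everything else here is hypothetical markup.
RATING_PATTERNS = [
    re.compile(r'class="rating rating-big"[^>]*>.*?(\d{1,2}[.,]\d)', re.S),
    re.compile(r"(\d{1,2}[.,]\d)\s*/\s*10"),
]


def extract_rating(page: str) -> Optional[str]:
    """Return the rating as text such as '7.3', or None if no pattern matches."""
    for pattern in RATING_PATTERNS:
        match = pattern.search(page)
        if match:
            # Use a dot as decimal separator regardless of the page's locale.
            return match.group(1).replace(",", ".")
    return None


# Made-up pages standing in for the two HTML versions.
version_a = '<span class="rating rating-big" title="x"><b>7.3</b></span>'
version_b = "<div>Rating: 7,3/10</div>"

print(extract_rating(version_a), extract_rating(version_b))
```

When IMDB changes its markup again, only a new pattern has to be added to the list.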
Posted: 2011-09-04 17:56:40
by bad4u
There most probably is different code for FF and IE on IMDB, but I doubt it's these two that you see when using just one browser (AMC in this case).
Usually the site uses user agent information to know what browser is loading data (afaik), but that doesn't change with just one browser, so I guess the changing code is a feature, not a bug, maybe to make screen scraping more difficult. It's not only AMC that gets data from their website, and I think officially they now only support access through their API, which is not free.
For free, personal applications they have offline versions of their database for download, but these don't make sense for AMC as they don't include the latest releases, I think. And updates of the db would require a huge download.
So I don't see a real solution there. Depending on what data you extract from there, maybe you could try http://akas.imdb.com/, which should give the same results for all users, but I don't know if it switches code, too. And it can have UTF8 characters in the text/listings that AMC doesn't support (e.g. try "Nochnoy dozor"), while www.imdb.com doesn't list these.
Posted: 2011-09-04 18:22:09
by antp
UTF8 is supported; it is the Unicode resulting from the UTF8 decoding that is not supported
(but not much of a problem, since it only concerns some alternative titles, I guess)
Posted: 2011-09-04 18:53:29
by bad4u
antp wrote: UTF8 is supported; it is the Unicode resulting from the UTF8 decoding that is not supported
I will never learn that, it always confuses me ^^
When using the akas. subdomain during tests I stumbled over these characters quite often, as they are already listed in the search results (where the script shows aka titles by default). I think there were problems with Danish and some other characters in aka titles too, but I don't remember exactly (and didn't try to convert them somehow).
Posted: 2011-09-05 09:06:44
by antp
All western-European languages should work fine, but there would be problems with eastern ones (Cyrillic/Greek/Romanian/Czech/etc.) as these use other character sets.
Though for people using these languages, it would be the opposite: on a Russian Windows the Cyrillic characters would probably be handled properly, but the ö à ê etc. of western Europe would be lost.
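The effect can be reproduced with a small Python sketch. It assumes that the text ends up in a single-byte Windows code page after UTF8 decoding, with cp1252 standing in for a western-European system and cp1251 for a Russian one. The function name `to_code_page` is made up for this example.

```python
def to_code_page(utf8_bytes: bytes, code_page: str) -> str:
    """Decode UTF-8 data, then force the text into one single-byte code page."""
    text = utf8_bytes.decode("utf-8")
    # Characters that do not exist in the code page are replaced by "?".
    return text.encode(code_page, errors="replace").decode(code_page)


cyrillic = "Ночной дозор".encode("utf-8")  # "Nochnoy dozor" in Cyrillic
western = "ö à ê".encode("utf-8")

for code_page in ("cp1252", "cp1251"):
    print(code_page, to_code_page(cyrillic, code_page), to_code_page(western, code_page))
```

The UTF8 decoding itself succeeds in both cases. The characters are lost only in the second step, when they have no equivalent in the target code page.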
Posted: 2011-09-05 13:20:02
by gerol
You're right:
I did some tests again:
I get different html versions for the same movie with the same browser.
It seems to work randomly.
I'm afraid we'll have to live with this.