Posted: 2007-09-06 22:07:36
by bad4u
Viridarium wrote: I have a problem with the update of the imdb-script. The import of the actors does not work as well as in the old version. Where can I get the old imdb-script? :??:
I have tested the last "official" version 3.19 and the forum release 3.20 from the link above with every "ActorsLayout" option setting, and both work fine.

So what do you mean by "do not work as well as in the old version"? Could you explain what does not work well for you?
Have you tried changing the IMDB script option "ActorsLayout" (on the right side of the scripting window)?

Posted: 2007-09-07 06:26:42
by Viridarium
bad4u wrote: Have you tried changing the IMDB script option "ActorsLayout" (on the right side of the scripting window) ?
Sorry, I had not tried changing "ActorsLayout". :/ Now it works fine, just like the old version of the script. Thank you. :)

Posted: 2007-09-21 13:06:52
by dwayne2005
bad4u, did you ever implement my Google search recommendations? I gave up waiting. The +maindetails bit doesn't work; you have to search for something from the main page. The +maindetails method has about a 10% error rate, whilst searching for a text identifier from the film's main page will give you an almost 100% success rate.

Posted: 2007-09-21 22:01:10
by bad4u
dwayne2005 wrote: The +maindetails method has about a 10% error rate, whilst an indication to the text identifier on the films main page will give you an almost 100% success rate.
Sorry, but as I said in an earlier post, I do NOT think Google is the best choice for batch mode - it is only really useful in manual mode and as a backup for the IMDB search function. In the same post I explained why I think so (try "Batman" or "Superman" with "comment on this title" and you will still get wrong results because of Google's own results ranking... this seems far away from 100% - and I did not test any other films yet, there will be lots more -.- )

I will probably upload a changed version in the next few days, but I do not have enough free time at the moment to find a better solution (or maybe someone else wants to draw 2 PCBs this weekend? :D) .. and the changes will not go into the "official" version until I find more time and better parameters for Google's search. But feel free to change the script yourself, it should be easy (only 3 or 4 lines have to be changed, I can have a look at it and tell you which ones)

Posted: 2007-09-22 05:09:02
by dwayne2005
Well I batch converted almost a thousand movies and I beg to differ, but anyway I would like to make the amendments to my own version of the script if you can help provide me the alterations.

I want to recommend AMC to other people with large collections, but the strange IMDB search results have discouraged me.

Posted: 2007-09-23 14:20:26
by bad4u
dwayne2005 wrote:Well I batch converted almost a thousand movies and I beg to differ, but anyway I would like to make the amendments to my own version of the script if you can help provide me the alterations.
I have uploaded the modified version of IMDB 3.19: http://www.bad4u.741.com/beta/IMDB_319_mod.ifs (copy the link into a new browser window)

I still do NOT recommend using this version and I will not add these modifications to the official versions. I added "Comment on this title" instead of "maindetails" to the search parameters, but as I said before, it does not work very well. As Google's rankings are affected by page impressions and other factors, you will still have problems with films like Batman (..begins), Superman (..returns) or Taxi (.., un encuentro). Another problem is that it seems impossible to make Google search for two exact phrases at the same time or to combine "allintext" and "allintitle".

I also changed the part of the script that reads the URL from the Google results. It should now fetch the correct address from the result list, even if the link points to an additional page like /maindetails or /combined. So you do not need to care about this if you want to try other settings in the script. Just edit the two lines at the bottom of the script beginning with:

AnalyzeGooglesResultsPage('http://www.google.com/search?num=20&.....
If you find a better solution please let me know. It might be added to next official version, if it solves the problems mentioned above.
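
For anyone experimenting with other search settings, the address clean-up described above could look roughly like the sketch below. It is not the code from IMDB_319_mod.ifs: the function name ExtractImdbTitleUrl is hypothetical, and it assumes the result link contains 'imdb.com/title/tt' followed by digits and that the script engine offers the usual Pos, Copy and Length routines.

```pascal
// Hypothetical helper, not taken from the IMDB script.
// Returns 'http://www.imdb.com/title/ttNNNNNNN/' for the first IMDb title
// link found in Html, or '' when there is none. Anything after the title
// number (such as /maindetails or /combined) is dropped.
function ExtractImdbTitleUrl(Html: string): string;
var
  p, q: Integer;
begin
  Result := '';
  p := Pos('imdb.com/title/tt', Html);
  if p = 0 then
    Exit;
  p := p + Length('imdb.com/title/');   // p now points at the 'tt' prefix
  q := p + 2;                           // first character after 'tt'
  // Advance q over the digits of the title number.
  while Pos(Copy(Html, q, 1), '0123456789') > 0 do
    q := q + 1;
  if q > p + 2 then                     // at least one digit was found
    Result := 'http://www.imdb.com/title/' + Copy(Html, p, q - p) + '/';
end;
```

Because the function keeps only the title number, it does not matter which sub-page Google happens to rank first for a film.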

Posted: 2007-09-24 06:49:59
by dtsr
antp wrote:You have an example of an IMDB page with unicode characters? As far as I know they only use iso-8859-1. And anyway AMC does not handle unicode characters :??:
I guess they use it sometimes in the 'Also Known As' section for foreign language films. For example, http://www.imdb.com/title/tt0095574/ or http://www.imdb.com/title/tt0374298/. Unfortunately, as you said, AMC doesn't support unicode, and these strings are already corrupted right at the GetPage output. So I do not see any good workaround, except perhaps fetching the HTML to a file with some external application instead of calling GetPage, and then doing some preprocessing externally or within the script.

Posted: 2007-09-24 08:47:18
by antp
What kind of external processing could be done? All of it could be done in the script itself, I think: the page is not in unicode, it is in iso-8859-1, which does not allow the Russian characters. These are stored as HTML entities (&# + number) which could be converted by the script "manually"... but converted to what? It would only work on a Russian system, as there is nothing to convert them to on a non-Russian system.

Posted: 2007-09-24 15:39:10
by dtsr
Hi Antoine,

Of course no universal solution can be made unless AMC itself provides native Unicode support. Sure, any type of preprocessing will currently work only for one particular locale and fail for others. And sure, it's not going to be a difficult task to convert the Russian &# representation into the win-1251 encoding, which is common for the Windows Russian locale. The only reason why I talk about external processing is that (I guess I may have done something wrong) the GetPage function doesn't return the &# representation for these Russian strings. I guess it already does some HTML decoding assuming iso-8859-1 (?) and returns a 1-byte char for each &# item. So I'm looking to use something like wget to store the page in a file to preserve the &# representation, and then convert it into win-1251, either within the script or externally, the same way the ConvertToASCII function is implemented.

Posted: 2007-09-24 16:46:01
by antp
GetPage does not do any decoding. That replacement of HTML entities is done by HTMLDecode.
GetPage only converts linebreaks to windows-style linebreaks, I think.

Posted: 2007-09-24 19:24:36
by bad4u
Played around with the IMDB pages mentioned above, but it seems it is not possible to find a (quick) solution, at least on western european systems, right ?

Posted: 2007-09-24 19:25:01
by dtsr
I didn't have much time to play with this yesterday, will double check and let you know. Basically, this is what I tried:

var
  Page: TStringList;
  Value: string;

begin
  // Fetch the raw page and save it to a file to inspect what GetPage returns.
  Page := TStringList.Create;
  Value := GetPage('http://www.imdb.com/title/tt0095574/');
  Page.Text := Value;
  Page.SaveToFile('1.html');
  Page.Free;
end.

Something like that...
1.html already contained the converted string.

Posted: 2007-09-24 19:50:27
by dtsr
Ooops, looks like I missed something yesterday. I just tried this example again and everything works as it is supposed to: no decoding within GetPage... So I'll probably just add something similar to ConvertToASCII, e.g. ConvertTo1251.
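
A minimal sketch of what such a ConvertTo1251 could look like. Both function names are hypothetical and nothing here comes from StringUtils1. It assumes single-byte (ANSI) strings as in AMC's script engine, decimal entities of the form &#NNNN;, and only the basic Russian alphabet (U+0410 to U+044F plus Ё and ё); hexadecimal entities and any other characters are left untouched.

```pascal
// Hypothetical helper: maps a Unicode code point to its Windows-1251 byte,
// or returns 0 when it is not a basic Russian letter.
function UnicodeTo1251(Code: Integer): Integer;
begin
  Result := 0;
  if (Code >= 1040) and (Code <= 1103) then   // U+0410 .. U+044F
    Result := Code - 848                       // lands on $C0 .. $FF
  else if Code = 1025 then                     // U+0401, capital IO
    Result := 168                              // $A8
  else if Code = 1105 then                     // U+0451, small io
    Result := 184;                             // $B8
end;

// Replaces decimal entities for Russian letters with win-1251 bytes and
// copies everything else unchanged.
function ConvertTo1251(S: string): string;
var
  i, j, d, Code, B: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(S) do
  begin
    B := 0;
    j := i;
    if Copy(S, i, 2) = '&#' then
    begin
      // Read the decimal digits that follow '&#'.
      j := i + 2;
      Code := 0;
      d := Pos(Copy(S, j, 1), '0123456789');
      while (d > 0) and (Code < 100000) do     // cap guards against overflow
      begin
        Code := Code * 10 + d - 1;
        j := j + 1;
        d := Pos(Copy(S, j, 1), '0123456789');
      end;
      // Only accept the entity if it is closed by ';'.
      if Copy(S, j, 1) = ';' then
        B := UnicodeTo1251(Code);
    end;
    if B > 0 then
    begin
      Result := Result + Chr(B);               // emit the win-1251 byte
      i := j + 1;                              // skip past the ';'
    end
    else
    begin
      Result := Result + Copy(S, i, 1);        // copy the character as is
      i := i + 1;
    end;
  end;
end;
```

With this sketch, ConvertTo1251('&#1052;&#1080;&#1088;') would yield the three bytes $CC $E8 $F0, which a Windows system with the Russian locale displays as the word "Мир". It has to run on the raw GetPage output, before HTMLDecode touches the entities.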

Posted: 2007-09-24 19:53:50
by dtsr
bad4u
I guess you are right, for Western European locales there is no way to display Cyrillic characters within non-unicode applications.

Posted: 2007-09-24 20:37:03
by antp
The only thing that could be done is to convert Russian IMDB texts to Russian-locale texts, for display on Russian Windows only. Not very hard to do, but still work to do :D

Posted: 2007-09-24 21:47:50
by bad4u
antp wrote:Only thing that could be done is convert russian IMDB texts to locale-russian texts for display on russian-Windows only.
But does it make sense to put this into standard IMDB script or StringUtils1 then ? Russian is not the only language with this problem, is it ?

@antp: What about ConvertToASCII and the modified StringUtils1.pas, should they be added to the "official" IMDB script or left as "experimental"? It's still a forum-only release (named IMDB 3.20), but it might become confusing, because 3.21 is a modification of 3.19 (the last official version) without the ConvertToASCII function. ;)

Posted: 2007-09-24 22:21:16
by antp
Well, this function to remove accents can be used by people from Russia, Asia and Eastern Europe. So it could be useful to many users, as these "European" characters may appear in many places on the site, not only in "akas" titles.
A function to correctly convert Russian characters would only be used by people who use a Russian charset. I do not know if many of them will use IMDB for movies with such titles, as IMDB is usually more complete for US & European movies than for these movies.

Posted: 2007-09-25 06:21:32
by bad4u
Ok, so I will add ConvertToASCII to the next "official" script update then.

Posted: 2007-09-26 08:39:09
by dwayne2005
Thanks bad4u for going to the effort for me anyway, I haven't tried it out yet but I just got hold of some films I'd like to try out. I notice you capitalised the `b' in the `beta' bit in the url, but I found it in the end. :grinking:

Posted: 2007-09-26 10:17:57
by bad4u
dwayne2005 wrote:I notice you capitalised the `b' in the `beta' bit in the url, but I found it in the end. :grinking:
Sorry, you're right. I corrected the URL in the previous posting for other users that might not find my mistake.