dwayne2005 wrote:HTTP/1.1 404 Not Found
HTTP/1.1 400 Bad Request
A rare occurrence, but it's due to Google finding the entry to another page, like /movieconnections.
That is not a problem with the corresponding page itself, especially not if it is the /maindetails page. It happens because some other parts of the script need the 'standard' URL of the film to find the correct data (e.g. when /combined is used). When Google returns a /movieconnections, /maindetails, /news etc. page as a result (there are many others available), the URL needs to be cleaned so that it points to the standard page.
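To illustrate the URL cleaning described above, here is a minimal sketch in Python (not the script's own language, and not the code used in the script). It assumes that film addresses have the form http://www.imdb.com/title/tt<digits>/ followed by an optional sub-page name; the function name clean_title_url is hypothetical.

```python
import re

# Assumption: a film URL is "<scheme>://<host>/title/tt<digits>" optionally
# followed by a sub-page such as /movieconnections, /maindetails or /news.
_TITLE_URL = re.compile(r"^(https?://[^/]+/title/tt\d+)(?:/.*)?$")


def clean_title_url(url):
    """Return the 'standard' film URL with any sub-page suffix removed.

    URLs that do not look like a film page are returned unchanged.
    """
    match = _TITLE_URL.match(url)
    if match is None:
        return url
    return match.group(1) + "/"
```

Matching on the film id rather than on a list of known suffixes means that sub-pages not mentioned here are handled as well, which seems safer given that many others are available.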
dwayne2005 wrote:Just getting rid of the -movieconnections -usercomments and adding maindetails and fixing up the page differences might make it perfect (does every film have a /maindetails and is it in sync with the other one, and why does the normal one miss the bit contained on the ordinary page using Google; is the normal page better, and what identifiable text exists on the non-/maindetails pages)?
Every film has a /maindetails page. I added the 'maindetails' keyword to the Google search function so that only the /maindetails pages are shown in the result list. When a film is picked, the /maindetails part is removed from the address again, so that the standard page is loaded. It seems to be working fine this way.
BUT... after testing for a while, I don't think Google is the perfect search engine for IMDB when using batch mode. Why? Because of its own search algorithms and its very particular way of sorting the results page. Try some films like "Superman" or "Batman": Google will always show more popular results like "Superman Returns" or "Batman Begins" in first position instead of the original films (while IMDB often delivers correct results). It doesn't seem to be possible to search for exact titles with Google, only for an exact match within a title.
I think the best possible solution for batch mode would be an IMDB AKA titles search (like http://www.imdb.com/find?q=Superman;s=tt;site=aka) to find ALL titles, even those used in other countries, and then search the page for the first exact match for the title via comparison. This should work fine in at least 99% of cases; the only problem would be identical names for different films (like 'Superman' from 1941 and 1996). Maybe I will modify the original IMDB script into a more precise 'batch mode only' version when I find time for it. This way it even finds German titles like "Hart auf Sendung" http://www.imdb.com/find?q=Hart+auf+Sen ... t;site=aka (= Pump Up the Volume), which is found neither with Google nor with any other search function on us.imdb.com.
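The "first exact match via comparison" idea could look roughly like the following Python sketch. It assumes the AKA results page has already been parsed into a list of (title, url) pairs in page order; that parsing step, the function names and the placeholder film ids are all hypothetical and not taken from the script.

```python
def normalize(title):
    """Lower-case a title and collapse whitespace so comparison is forgiving."""
    return " ".join(title.casefold().split())


def first_exact_match(query, candidates):
    """Return the URL of the first candidate whose whole title equals the query.

    candidates is a list of (title, url) pairs in the order they appear on
    the results page. Returns None when no title matches exactly.
    """
    wanted = normalize(query)
    for title, url in candidates:
        if normalize(title) == wanted:
            return url
    return None


# Placeholder data: the ids below are made up for illustration only.
sample = [
    ("Superman Returns", "http://www.imdb.com/title/tt0000001/"),
    ("Superman", "http://www.imdb.com/title/tt0000002/"),
    ("Superman", "http://www.imdb.com/title/tt0000003/"),
]
```

With this data, first_exact_match("superman", sample) skips "Superman Returns" and picks the first plain "Superman" entry. As noted above, two films with an identical name cannot be told apart this way, so the earlier entry on the page wins; comparing the year as well would be one way to narrow that down.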
So what about Google now? It's still useful when running the script in non-batch mode, and it's very nice to have some kind of 'backup solution' for the original IMDB search function in case IMDB changes the design of its result pages again. Google should still work then and link to the correct film details pages until the IMDB script has been updated...
@Antoine: The latest version of the script (3.14) can be downloaded from
http://www.bad4u.741.com/IMDB.ifs (copy the link into a new browser window). Maybe you could take a quick look at the code and decide whether you want the Google option in the original IMDB script. I only added the "procedure AnalyzeGooglesResultsPage" and the "GetOption('GoogleSearch')" at the beginning of the program (at the bottom of the script). I think it makes sense, especially as it is only selectable via an option and not enabled by default, but it should be your decision. Maybe you'd prefer it to be released as a separate script; otherwise it could be uploaded to the server.