dwayne2005 wrote: I was thinking, would it be possible to analyse Google's IMDB-specific results so that I can parse more fields (or combine them into one field), like year or director, to obtain automatic results more accurately? (I tried the old IMDB script a year ago for automated results, got a number of wacky films and had no idea what they were originally.)
With some changes to the script it should be possible. Maybe it could be implemented as an option; I'll have a closer look at it later.
IMDB script problem
Re: IMDB script problem
-
- Posts: 22
- Joined: 2007-06-29 04:49:09
Thanks. I'll give feedback on how accurate the two are when I get to import it over. I also forgot, there's IMDB's `power search':
http://www.imdb.com/list
Never used that before, but Google might be sufficient, easier to do, and more reliable than IMDB is at present.
Albetra Boy wrote: Here's what I did; it should work for you also.
Open the program, click on Tools, click on Scripts, highlight the IMDB script, and at the top of the page click the spot that says Editor.
Delete from #1 all the way to the bottom.
Paste the updated script in the #1 spot; it will fill in all the way to the bottom. Click Save. It should work fine for you now; it did for me.
Copy/pasting script contents is not a good idea, as it would not include the new option (options are not shown in the code in AMC's editor) and you also lose the version number (which you can use to check whether you have the latest script).
I missed changing a line of code in the IMDB script (v.3.10) that is important only when using batch mode. I just overlooked it.
Please download version 3.11 from http://www.bad4u.741.com/IMDB.ifs (copy the link into a new browser window, then the download will start immediately).
Sorry for the inconvenience, it was my fault.
dwayne2005 wrote: Never used that before, but Google might be sufficient, easier to do, and more reliable than IMDB is at present.
I added a "GoogleSearch" option to the script, so that users can choose between the IMDB website's search and Google's search on IMDB. I don't think Google's search is more accurate, but it might be useful for batch mode operation, where only the first result is imported. "GoogleSearch" can be used with batch mode and standard mode.
What could be very useful: this feature is the best workaround if IMDB changes the result page again, until a new script update is released.
Technical information: I did not like the idea of making major changes to the existing script's code, so I decided to add a new procedure 'AnalyzeGooglesResultsPage' instead of rewriting the main part of the existing code. Only the option and the new URL have been added to the beginning of the script; no further changes were necessary.
@antp: Please do not upload the script yet. It should work perfectly, but I want some extra time for testing.
Download the new version 3.12 from http://www.bad4u.741.com/IMDB.ifs (copy the link into a new browser window, then the download will start immediately).
-
- Posts: 22
- Joined: 2007-06-29 04:49:09
dwayne2005 wrote: Never used that before, but Google might be sufficient, easier to do, and more reliable than IMDB is at present.
bad4u wrote: I added a "GoogleSearch" option to the script, so that users can choose between IMDB website's search and Google's search on IMDB. I don't think Google's search is more accurate, but it might be useful for batch mode operation where only the first result is going to be imported. "GoogleSearch" can be used with batch mode and standard mode.
I'll be testing it out later once I can get my items exported and have sufficient time to run the whole thing through automation. Unfortunately, I suspect this may take a day or so with all the interruptions I expect.
Thanks very much for giving this a look into!
Edit: I'll be exporting the results from DVD Profiler and using BK Replace 'em to enclose that file's movie titles in exact quotation marks (if that works) and/or to mix another detail, like the director's name, into the title search criteria.
Edit 2: I did a brief test (first 50 titles) and found 1 mistake in both searches, due to a non-listing: IMDB changed `The Exorcist Collection' to a `Collection of Films for the Armed Forces' :| (would never have figured that one out!) while Google changed it to `Teenage Exorcist'. If I can work exact quotation marks into the .csv titles (or part of the titles, excluding articles such as ", La" for instance), Google will have the additional benefit of not linking to items which are not listed on IMDB, for instance music DVDs. It gives the user the additional benefit of exact quotes, e.g. "The Exorcist Collection" or "Exorcist Collection", to guarantee that a title won't get replaced by something weird and unexpected.
Edit 3:
{}p{}e"([^"]+)",
{}e"“\1”",
in BK Replace 'em will enclose first items in .csv files (the titles in my DVD Profiler output) in quotation marks, alternate quotations that can also be read by Google to access the function. Maybe I'm better off learning whatever scripting language is used in AMC to add quotation marks in there but I do love BK Replace 'em. Plus, this has the very handy additional benefit of marking those which do not update so I can deal with them manually.
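For anyone without BK Replace 'em, the same substitution can be sketched in Python. This is only an illustration: the helper name `quote_first_field` is made up here, and anchoring the pattern at the start of the line is an assumption about how the title column sits in the DVD Profiler export.

```python
import re

# Assumption: the title is the first, double-quoted field on each CSV line.
FIRST_FIELD = re.compile(r'^"([^"]+)",')


def quote_first_field(line: str) -> str:
    """Wrap the first quoted CSV field in curly quotes (hypothetical helper)."""
    # Lines that do not start with a quoted field are returned unchanged.
    return FIRST_FIELD.sub(r'"“\1”",', line, count=1)


if __name__ == "__main__":
    print(quote_first_field('"The Exorcist Collection",1973'))
```

Titles that fail to update keep their curly quotes, which is what makes them easy to spot afterwards.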
Edit 4:
You've got to keep this in it! Last year or the year before when I first tried the automated IMDB script, I got this downloaded for at least 3 films and I had no idea what they were originally:
http://imdb.com/title/tt0327428/
Now with my method of marking them in quotation marks to get exact results using Google, I'm probably getting 100% or near 100% success while still having the unfound items in their original titles still quoted, not `Another Demonstration of the Cliff-Guibert Fire Horse Reel, Showing a Young Girl Coming from an Office, Detaching Hose, Running with It 60 Feet, and Playing a Stream, All Inside of 30 Seconds'. This wouldn't have even been possible with ordinary IMDb search, since they don't -- to my awareness -- have a quotation function. Plus, I found IMDB permits "Dolce Vita, la" as "La Dolce Vita", so it all works out nicely.
dwayne2005 wrote: ... Unfortunately, I suspect this may take a day or so with all the interruptions I expect...
No need to hurry.
dwayne2005 wrote: ...Maybe I'm better off learning whatever scripting language is used in AMC ...
Always a good idea.
dwayne2005 wrote: ...Now with my method of marking them in quotation marks to get exact results using Google, I'm probably getting 100% or near 100% success ...
In the latest version (3.13) of the script you can set the 'GoogleSearch' option to value '2'; the script then puts quotation marks around the movie title when searching with Google. So everyone can use this easily.
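To illustrate what the quoted mode changes, here is a rough Python sketch of building such a search. The script's real query URL is not shown in this thread, so the function name, the Google URL and the `site:www.imdb.com/title` restriction are assumptions for illustration only.

```python
from urllib.parse import urlencode


def build_google_query(title: str, google_search: int) -> str:
    """Return a Google search URL restricted to IMDB title pages (sketch)."""
    # Assumption: value 2 means "search for the title as an exact phrase".
    phrase = f'"{title}"' if google_search == 2 else title
    terms = f"site:www.imdb.com/title {phrase}"
    return "http://www.google.com/search?" + urlencode({"q": terms})


if __name__ == "__main__":
    print(build_google_query("The Exorcist Collection", 2))
```

With the value 2 the title reaches Google as an exact phrase, so a title with no IMDB listing should return nothing instead of a loosely related film.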
-
- Posts: 22
- Joined: 2007-06-29 04:49:09
dwayne2005 wrote: ... Unfortunately, I suspect this may take a day or so with all the interruptions I expect...
bad4u wrote: No need to hurry
dwayne2005 wrote: ...Maybe I'm better off learning whatever scripting language is used in AMC ...
bad4u wrote: Always a good idea
I'm putting it on hold until I can learn a little about the language. I would like to get arithmetic averages instead of the main page weighted averages from IMDB. I have a long history of deliberately avoiding the weighted scores. I'm probably the only one. I am also keen to customize several other aspects.
dwayne2005 wrote: ...Now with my method of marking them in quotation marks to get exact results using Google, I'm probably getting 100% or near 100% success ...
bad4u wrote: On the latest version (3.13) of the script you can set 'GoogleSearch' option to value '2', then the script uses quotation marks for the movie title when searching with Google. So everyone can use this easily.
Many thanks.
-
- Posts: 22
- Joined: 2007-06-29 04:49:09
dwayne2005 wrote: I would like to get arithmetic averages instead of the main page weighted averages from IMDB.
bad4u wrote: Which arithmetic averages are you talking about? User ratings?
IMDB uses a formula for their ratings, so it's not precisely the way people vote. I find it unfair on lesser-seen films, and often quite misrepresentative. And I don't believe the aim -- to stop people from engineering the results -- justifies the consequence of misrepresenting films that I believe deserve every bit of their actual vote.
On the main page of Somewhere in Time it scores a 6.9 IMDB weighted average, but here, under the breakdown, you can see the arithmetic average is 7.8.
http://imdb.com/title/tt0081534/ratings
For very little-known films the difference can be as large as a real average close to 8 against a weighted average in the 4 range. I spent an hour trying to redo the rating values in your script, but failed miserably.
Edit: the other main change I would have liked to make was to include the full cast and crew in the `actors' field. I'd like to be able to search through various collaborators, such as the composer, to uncover the results for that particular name. I was also a bit curious about whether I would like to see Memorable Quotes as well. When I export to HTML, I like to keep the description field simple. So I move all the other details apart from tag and summary to the comment field (as your script allows). I jettison User Comments as I believe they're too biased, and the accumulated votes of hundreds of voters are worth more to me. The comments field becomes: Trivia / Awards / and hopefully Memorable Quotes; at least I'd like to give that a try if only I can figure it out.
dwayne2005 wrote: I spent an hour at trying to redo the rating values in your script, but failed miserably.
That's simple. Go to the "Rating" part of the script and change
Code:
Value := TextBetween(PageText, '<b>User Rating:</b>', '<br/>');
Value := TextBetween(Value, '<b>', '/');
to
Code:
Value := GetPage(MovieURL + '/ratings');
Value := TextBetween(Value, 'Arithmetic mean = ', '. ');
Now it loads the additional ratings page into 'Value' and gets the "Arithmetic mean" instead of IMDB's user rating.
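For readers who want to try the extraction step outside AMC, here is a small Python sketch. The `text_between` function is a stand-in written for this example, not AMC's actual TextBetween (in particular, returning an empty string when a delimiter is missing is an assumption), and the sample page text is invented apart from the 7.8 figure mentioned earlier in the thread.

```python
def text_between(text: str, before: str, after: str) -> str:
    """Return the text found between two delimiters, or '' if either is missing."""
    start = text.find(before)
    if start == -1:
        return ""
    start += len(before)
    end = text.find(after, start)
    if end == -1:
        return ""
    return text[start:end]


# Hypothetical fragment of a /ratings page; the real markup may differ.
sample_page = "<p>Arithmetic mean = 7.8. Median = 8</p>"
rating = text_between(sample_page, "Arithmetic mean = ", ". ")

if __name__ == "__main__":
    print(rating)
```

The closing delimiter is a full stop followed by a space, so the decimal point inside the number does not end the match early.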
-
- Posts: 22
- Joined: 2007-06-29 04:49:09
bad4u wrote: That's simple. Go to the "Rating" part of the script and change
Code:
Value := TextBetween(PageText, '<b>User Rating:</b>', '<br/>');
Value := TextBetween(Value, '<b>', '/');
to
Code:
Value := GetPage(MovieURL + '/ratings');
Value := TextBetween(Value, 'Arithmetic mean = ', '. ');
Now it loads the additional ratings page into 'Value' and gets the "Arithmetic mean" instead of IMDB's user ratings
It looked so much like that! I got the TextBetween bit right. I added the GetPage bit (copied and pasted from the trivia section, with /ratings added), but maybe that's a bit different and that's why it didn't work.
I'm happy; time to start batch downloading officially.
-
- Posts: 22
- Joined: 2007-06-29 04:49:09
I downloaded the 884 listed in my old DVD Profiler profile (I still have to add 87 manually); 79 failed to locate using the exact search criteria. I did encounter a few wayward titles, but the exact quotation search at least made it obvious what they were. The only exception was when I found a Crash of the Titans film, not knowing I had somehow downloaded a gay porn film into my DVD Profiler account. It must have had something to do with Clash of the Titans, but anyway...
It took a lot longer than I expected. I didn't try the bulk with IMDB regular search again, but from past experience it wouldn't have been workable for me, as it replaces titles with something perplexing. I did get `socket' and connection errors with the Google search; I'm not sure whether I would have experienced them with the IMDB search.
-
- Posts: 22
- Joined: 2007-06-29 04:49:09
HTTP/1.1 404 Not Found
HTTP/1.1 400 Bad Request
A rare occurrence, but it's due to Google finding the entry on another page, like /movieconnections.
But there is another page, also, with a URL of /maindetails [EDIT: sorry, don't know why I put /usercomments ] and it appears identical to the non-/maindetails page and seems to be giving the exact same error.
It's a minor hassle, but these little things interrupt the flow of the search in batch mode. Rather than excluding pages with -movieconnections etc., maybe use a single identifiable text (exact quotation) that takes you to the main page every time or returns nothing.
Eg. Try Wallace and Gromit: The Curse of the Were-Rabbit with Google set to 2 and batch mode off. Select the second entry. Replicating it on Google, I find the page has a /maindetails branch. I'm not sure why the non-maindetails page isn't picking up that quoted text. That wouldn't error on batch processing, but I'm sure I encountered the above error numerous times while batch downloading. Not sure which titles were affected.
Just getting rid of the -movieconnections -usercomments and adding maindetails and fixing up the page differences might make it perfect (does every film have a /maindetails and is it in sync with the other one, and why does the normal one miss the bit contained on the ordinary page using Google; is the normal page better, and what identifiable text exists on the non-/maindetails pages)?
http://tinyurl.com/247mep (of course, I meant site:imdb.com/title ... EDIT: of course, I meant site:www.imdb.com/title ... otherwise, it confuses with pro and other imdb.com/title in some instances http://tinyurl.com/3a9dta)
http://tinyurl.com/yt92hg
EDIT: The difference in results between /maindetails and normal pages appears to be the Google cache. The /maindetails page is cached 1 day before the other, allowing for a different user comment.
dwayne2005 wrote: HTTP/1.1 404 Not Found
HTTP/1.1 400 Bad Request
A rare occurence, but it's due Google finding the entry to another page, like /movieconnections.
That is not a problem with the corresponding page itself, especially not if it is the /maindetails page. It is simply that some other parts of the script need the 'standard' URL of the film to find the correct data (e.g. when /combined is used). When Google returns a /movieconnections, /maindetails, /news etc. page (there are many others) as a result, the URL needs to be cleaned to point to the standard page.
dwayne2005 wrote: Just getting rid of the -movieconnections -usercomments and adding maindetails and fixing up the page differences might make it perfect (does every film have a /maindetails and is it in sync with the other one, and why does the normal one miss the bit contained on the ordinary page using Google; is the normal page better, and what identifiable text exists on the non-/maindetails pages)?
Every film has a /maindetails page. I added the 'maindetails' keyword to Google's search function so that only the /maindetails pages are shown in the result list. When a film is picked, /maindetails is deleted from the address again, so that the standard page is loaded. It seems to be working fine this way.
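The URL cleaning described above could look roughly like this in Python. It is a sketch, not the code from the IMDB script: the function name and the regular expression are made up for this example, and it assumes every film URL has the form .../title/tt<digits>/.

```python
import re

# Assumption: IMDB film pages live under /title/tt<digits>/, with optional
# sub-pages such as /maindetails or /movieconnections after the ID.
TITLE_URL = re.compile(r"^(https?://(?:www\.)?imdb\.com/title/tt\d+)(?:/.*)?$")


def standard_title_url(url: str) -> str:
    """Strip any sub-page from an IMDB title URL (hypothetical helper)."""
    match = TITLE_URL.match(url)
    # URLs that are not title pages are returned unchanged.
    return match.group(1) + "/" if match else url


if __name__ == "__main__":
    print(standard_title_url("http://www.imdb.com/title/tt0081534/maindetails"))
```

Normalising every result this way means later steps that append /combined or /ratings always start from the same base address.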
BUT... after testing for a while, I don't think Google is the perfect search engine for IMDB when using batch mode. Why? Because of its own search algorithms and its very particular way of sorting the results page. Try some films like "Superman" or "Batman": Google will always show more popular results like "Superman Returns" or "Batman Begins" in first position instead of the original films (while IMDB often delivers correct results). It doesn't seem to be possible to search for exact titles with Google, only for an exact match within a title.
I think the best possible solution for batch mode would be an IMDB AKA titles search (like http://www.imdb.com/find?q=Superman;s=tt;site=aka) to find ALL titles, even those used in other countries, and then search the page for the first title that exactly matches, by comparison. This should work fine for at least 99%; the only problem would be identical names for different films (like 'Superman' from 1941 and 1996). Maybe I will modify the original IMDB script into a more precise 'batch mode only' version when I find time for it. This way it finds even German titles like "Hart auf Sendung" http://www.imdb.com/find?q=Hart+auf+Sen ... t;site=aka (= Pump up the volume), which is found neither with Google nor with any other search function on us.imdb.com.
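The "first exact match" idea can be sketched as below. This is plain Python rather than AMC script, the function name is invented, and the candidate list with its placeholder IDs is hypothetical; parsing the AKA results page into (title, URL) pairs is left out.

```python
from typing import Iterable, Optional, Tuple


def first_exact_match(wanted: str, candidates: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return the URL of the first candidate whose title equals the wanted title."""
    # Comparison ignores case and surrounding whitespace (an assumption).
    target = wanted.strip().casefold()
    for title, url in candidates:
        if title.strip().casefold() == target:
            return url
    return None


# Hypothetical results in the order a search page might list them.
results = [
    ("Superman Returns", "http://www.imdb.com/title/tt0000001/"),
    ("Superman", "http://www.imdb.com/title/tt0000002/"),
    ("Superman", "http://www.imdb.com/title/tt0000003/"),
]

if __name__ == "__main__":
    print(first_exact_match("superman", results))
```

As noted above, two films with identical names cannot be told apart this way; the sketch simply takes whichever appears first.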
So what about Google now? It's still useful when running the script in non-batch mode, and it's very nice to have some kind of 'backup solution' for the original IMDB search function in case IMDB changes the design of its result pages again. Google should still work then and link to the correct film details pages, until the IMDB script has been updated ...
@Antoine: The latest version of the script (3.14) can be downloaded from http://www.bad4u.741.com/IMDB.ifs (copy the link into a new browser window). Maybe you could have a short look at the code and decide whether you want the Google option in the original IMDB script. I only added the "procedure AnalyzeGooglesResultsPage" and the "GetOption('GoogleSearch')" at the beginning of the program (at the bottom of the script). I think it makes sense, especially as it is only selectable via an option and not set by default, but it should be your decision. Maybe you'd prefer it to be released as a separate script; otherwise it could be uploaded to the server.
Too stupid
I uploaded a previous, not fully completed file version (with the "http/1.1 404 Not Found" error when using batch mode with Google). Correct version uploaded now. Please delete the wrong script from your server asap.
I've got more than a dozen different versions from testing and changing in my script directory; it seems I should take a minute now and clean it up.
Sorry