Culturalia+IMDB (Batch) Script, Latest Version

Posted: 2003-09-23 17:03:04
by folgui
Hello!

New script to import several movies in batch mode, mixing culturalia+imdb.

It's a first release that works OK, I think. Try it and tell me about any bugs or
enhancements.

THE LATEST SCRIPT:

Code:

// SCRIPTING
// Culturalia+IMDB (Batch) v1.0

(***************************************************
 *  Script merged by Jose Miguel Folgueira, based  *
 *  on a similar script merged by Antoine Potten   *
 *                                                 *
 *  Movie importation script for:                  *
 *    Culturalia, http://www.culturalianet.com     *
 *                                                 *
 *  Original version made by David Arenillas       *
 *  New version made by Antoine Potten             *
 *  Contributors:                                  *
 *    Jose Miguel Folgueira                        *
 *    RedDwarf                                     *
 *    Hades666                                     *
 *    KaBeCi                                       *
 *    PolloPolea                                   *
 *    Moises Déniz                                 *
 *    Val                                          *
 *                                                 *
 *  Thanks to Culturalia's webmaster for his help  *
 *  and for providing more direct access to his    *
 *  database                                       *
 *                                                 *
 *  Movie importation script for:                  *
 *      IMDB (US), http://us.imdb.com              *
 *                                                 *
 *  (c) 2002-2004 Antoine Potten                   *
 *                          software@antp.be       *
 *                                                 *
 *  TransformTitle function from IMDB (US) Script  *
 *  (c) 2002-2004 Antoine Potten                   *
 *                          software@antp.be       *
 *                                                 *
 *  Contributors :                                 *
 *    Danny Falkov                                 *
 *    Kai Blankenhorn                              *
 *    lboregard                                    *
 *    Ork <ork@everydayangels.net>                 *
 *    Trekkie <Asimov@hotmail.com>                 *
 *    Youri Heijnen                                *
 *                                                 *
 *  For use with Ant Movie Catalog 3.4.x           *
 *  www.antp.be/software/moviecatalog              *
 *                                                 *
 *  This program is free software; you can         *
 *  redistribute it and/or modify it under the     *
 *  terms of the GNU General Public License as     *
 *  published by the Free Software Foundation;     *
 *  either version 2 of the License, or (at your   *
 *  option) any later version.                     *
 *                                                 *
 ***************************************************)

program Culturalia_IMDb_Batch;
const
  BaseURLCulturalia = 'http://www.culturalianet.com/bus/catalogo.php';
  UseLongestDescIMDB = False; // If set to False shortest description available will be imported, faster since taken from main page

  // Set the following constants to True to import a field from IMDB, or to False to skip it. By default, only the fields not available at Culturalia are set to True.
  ImportActors = False;
  ImportCategory = False;
  ImportComments = False;
  ImportCountry = False;
  ImportDescription = False;
  ImportDirector = False;
  ImportLength = True;
  ImportLanguage = False;
  ImportOriginalTitle = False;
  ImportTranslatedTitle = False;
  LeaveOriginalTitle = True; // If True, the Translated Title is imported but the Original Title field is left unchanged
  ImportPicture = False;
  ImportLargePicture = False; // If set to False small pic will be imported
  ImportRating = True;
  ImportURL = False;
  ImportYear = False;

  TitleMixedCase = False; // If True, every word of the title starts with an uppercase letter. If False, the title is lowercased except for its first word

  ExternalPictures = False;
    { True: Pictures will be stored as external files in the folder of the
            catalog
      False: Pictures will be stored inside the catalog (only for .amc files) }
  ManualPictureSelect = False;
    { True: If no Title Match found a picture selection window appears
      False: Revert to IMDB picture }

  // Which search type to use when neither the original title nor the translated title
  // has been filled in and the title is typed into the input box instead.
  // 1-Translated title, 2-Original title, 3-General
  defdonde='1';

var
  MovieName, Titulo: string;
  MovieURL: string;
  tmp: string;
  donde: string; 
  Articles: array of string;
  Index: Integer;

function FindLine(Pattern: string; List: TStringList; StartAt: Integer): Integer;
var
  i: Integer;
begin
  result := -1;
  if StartAt < 0 then
    StartAt := 0;
  for i := StartAt to List.Count-1 do
    if Pos(Pattern, List.GetString(i)) <> 0 then
    begin
      result := i;
      Break;
    end;
end;

procedure AnalyzePageIMDB(Address: string);
var
  Page: TStringList;
  LineNr: Integer;
  MovieURL: string;
begin
  Page := TStringList.Create;
  Page.Text := GetPage(Address);
  if pos('<title>IMDb', Page.Text) = 0 then
  begin
    AnalyzeMoviePageIMDB(Page)
  end
  else
  begin
    MovieURL := AddMoviesTitles(Page, '<b>Exact Matches</b>');
    if MovieURL = '' then
      MovieURL := AddMoviesTitles(Page, '<b>Partial Matches</b>');
    if MovieURL = '' then
      MovieURL := AddMoviesTitles(Page, '<b>Approximate Matches</b>');
    if MovieURL <> '' then
      AnalyzePageIMDB(MovieURL);
  end;
  Page.Free;
end;

function FindValue(BeginTag, EndTag: string; Page: TStringList; var LineNr: Integer; var Line: string): string;
var
  BeginPos, EndPos: Integer;
  Value: string;
begin
  Result := '';
  Value := '';
  BeginPos := Pos(BeginTag, Line);
  if BeginPos > 0 then
  begin
    BeginPos := BeginPos + Length(BeginTag);
    if BeginTag = EndTag then
    begin
      Delete(Line,1,BeginPos-1);
      BeginPos := 1;
    end;
    EndPos := pos(EndTag, Line);
    while ((EndPos = 0) and (LineNr < Page.Count-1 )) do
    begin
      Value := Value + copy(Line, BeginPos, Length(Line) - BeginPos);
      // Next Line
      BeginPos := 1;
      LineNr := LineNr + 1;
      Line := Page.GetString(LineNr);
      if Value = '' then
        Exit;
      EndPos := Pos(EndTag, Line);
    end;
    Value := Value + copy(Line, BeginPos, EndPos - BeginPos);
   end;
  Result := Value;
end;

procedure AnalyzeMoviePageIMDB(Page: TStringList);
var
  Line, Value, Value2, FullValue, OldOriginalTitle: string;
  LineNr, Desc, i: Integer;
  BeginPos, EndPos: Integer;
  OldTitleParts, AllTitles: TStringList;
  LongDescr: Boolean;
begin
  LongDescr := UseLongestDescIMDB;
  if (LongDescr) and (Pos('<a href="plotsummary">', Page.Text) = 0) then
    LongDescr := False;

  MovieURL := 'http://imdb.com/title/tt' + Copy(Page.Text, Pos('?pending&add=', Page.Text) + 17, 7);

  // URL
  if ImportURL then
    SetField(fieldURL, MovieURL);

  AllTitles := TStringList.Create;

  // Original Title & Year
  if (ImportOriginalTitle) or (ImportYear) then
  begin
    LineNr := FindLine('<title>', Page, 0);
    if LineNr > -1 then
    begin
      // Read the line only after confirming that FindLine found it
      Line := Page.GetString(LineNr);
      BeginPos := pos('<title>', Line);
      if BeginPos > 0 then
        BeginPos := BeginPos + 7;
      EndPos := pos('(', Line);
      if EndPos = 0 then
        EndPos := Length(Line);
      Value := copy(Line, BeginPos, EndPos - BeginPos - 1);
      HTMLDecode(Value);
      if ImportOriginalTitle then
        OldOriginalTitle := GetField(fieldOriginalTitle);
      if (ImportTranslatedTitle) and not (LeaveOriginalTitle) then
        SetField(fieldOriginalTitle, Value);
      BeginPos := pos('(', Line) + 1;
      if BeginPos > 1 then // only when an opening parenthesis was actually found
      begin
        EndPos := Pos('/I', Line);
        if EndPos < BeginPos then
          EndPos := pos(')', Line);
        Value := copy(Line, BeginPos, EndPos - BeginPos);
        if ImportYear then
          SetField(fieldYear, Value);
      end;
    end;
  end;

  // Translated Title
  if ImportTranslatedTitle then
  begin
    OldTitleParts := TStringList.Create;
    // Tokenize OldOriginalTitle while removing certain chars/common words ("the", "of")
    Value := AnsiUpperCase(OldOriginalTitle);
    Value := StringReplace(StringReplace(Value, ',', ' '), ':', ' ');
    Value := StringReplace(StringReplace(Value, '(', ' '), ')', ' ');
    Value := StringReplace(StringReplace(Value, 'OF', ' '), 'THE', ' ');
    repeat
      Value := StringReplace(Value, '  ', ' ');
    until Pos('  ', Value) = 0;
    Value := StringReplace(Trim(Value), ' ', ',');
    // Value now contains the original title (comma-separated) that was filled in before running the script
    Value2 := '';
    for i := 1 to Length(Value) do
    begin
      if Pos(',', Copy(Value, i, 1)) = 0 then
        Value2 := Value2 + Copy(Value, i, 1);
      if (Pos(',', Copy(Value, i, 1)) = 1) or (i = Length(Value)) then
      begin
        OldTitleParts.Add(Value2); // put each comma-separated value from Value into a separate string in TitleParts
        Value2 := '';
      end;
    end;
    for i := 0 to OldTitleParts.Count - 1 do
    // Begin comparing title parts (from the title originally filled in by moviedb owner) with
    // the 'true' Original Title (extracted from IMDb) to see if it's a foreign title and needs a Translated Title
    begin
      if Pos(OldTitleParts.GetString(i), AnsiUpperCase(GetField(fieldOriginalTitle))) <= 0 then
      begin // no match, must be a foreign title
        LineNr := FindLine('Also Known As', Page, 0);
        if LineNr > -1 then
        begin
          Line := Page.GetString(LineNr);
          if Pos('Also Known As', Line) > 0 then
          begin
            BeginPos := Pos('Also Known As', Line) + 26;
            Value := Copy(Line, BeginPos, Length(Line) - BeginPos - 4);
            Value := StringReplace(Value, '<br>', '/ ');
            HTMLDecode(Value);
            SetField(fieldTranslatedTitle, Trim(Value));
          end;
        end;
        Break;
      end;
    end;
    OldTitleParts.Free;
  end;

  // Rating
  if ImportRating then
  begin
    LineNr := FindLine('User Rating:', Page, 0);
    if LineNr > -1 then
    begin
      Line := Page.GetString(LineNr + 4);
      if Pos('/10', Line) > 0 then
      begin
        BeginPos := pos('<b>', Line) + 3;
        Value := IntToStr(Round(StrToInt(StrGet(Line, BeginPos), 0) + (StrToInt(StrGet(Line, BeginPos + 2), 0) / 10)));
        SetField(fieldRating, Value);
      end;
    end;
  end;

  // Language
  LineNr := FindLine('Language:', Page, 0);
  if LineNr > -1 then
  begin
    Line := Page.GetString(LineNr + 1);
    BeginPos := pos('/">', Line) + 3;
    EndPos := pos('</a>', Line);
    if EndPos = 0 then
      EndPos := Length(Line);
    Value := copy(Line, BeginPos, EndPos - BeginPos);
    if ImportLanguage then
      SetField(fieldLanguages, Value);
  end;

  if ImportPicture then
    GetMoviePicture(Value, Page, AllTitles);
  AllTitles.Free;

  // Director
  if ImportDirector then
  begin
    LineNr := FindLine('Directed by', Page, 0);
    if LineNr > -1 then
    begin
      FullValue := '';
      Line := Page.GetString(LineNr + 1);
      repeat
        BeginPos := pos('">', Line) + 2;
        EndPos := pos('</a>', Line);
        Value := copy(Line, BeginPos, EndPos - BeginPos);
        if (Value <> '(more)') and (Value <> '') then
        begin
          if FullValue <> '' then
            FullValue := FullValue + ', ';
          FullValue := FullValue + Value;
        end;
        Delete(Line, 1, EndPos);
      until Pos('</a>', Line) = 0;
      HTMLDecode(FullValue);
      SetField(fieldDirector, FullValue);
    end;
  end;

  // Actors
  if ImportActors then
  begin
    LineNr := FindLine('ast overview', Page, 0);
    if LineNr = -1 then
      LineNr := FindLine('redited cast', Page, 0);
    if LineNr > -1 then
    begin
      FullValue := '';
      Line := Page.GetString(LineNr);
      repeat
        BeginPos := Pos('<td valign="top">', Line);
        if BeginPos > 0 then
        begin
          Delete(Line, 1, BeginPos);
          Line := copy(Line, 25, Length(Line));
          BeginPos := pos('">', Line) + 2;
          EndPos := pos('</a>', Line);
          if EndPos = 0 then
            EndPos := Pos('</td>', Line);
          Value := copy(Line, BeginPos, EndPos - BeginPos);
          if (Value <> '(more)') and (Value <> '') then
          begin
            BeginPos := pos('.... </td><td valign="top">', Line);
            if BeginPos > 0 then
            begin
              EndPos := pos('</td></tr>', Line);
              BeginPos := BeginPos + 27;
              Value2 := copy(Line, BeginPos, EndPos - BeginPos);
              if Value2 <> '' then
              begin
                Value := Value + ' (as ' + Value2 + ')';
              end;
            end;
            if FullValue <> '' then
              FullValue := FullValue + ', ';
            FullValue := FullValue + Value;
          end;
          EndPos := Pos('</td></tr>', Line);
          Delete(Line, 1, EndPos);
        end else
        begin
          Line := '';
        end;
      until Line = '';
      HTMLDecode(FullValue);
      SetField(fieldActors, FullValue);
    end;
  end;

  // Country
  if ImportCountry then
  begin
    LineNr := FindLine('Country:', Page, 0);
    if LineNr > -1 then
    begin
      Line := Page.GetString(LineNr + 1);
      BeginPos := pos('/">', Line) + 3;
      EndPos := pos('</a>', Line);
      Value := copy(Line, BeginPos, EndPos - BeginPos);
      HTMLDecode(Value);
      SetField(fieldCountry, Value);
    end;
  end;

  // Category
  if ImportCategory then
  begin
    LineNr := FindLine('Genre:', Page, 0);
    if LineNr > -1 then
    begin
      Line := Page.GetString(LineNr + 1);
      BeginPos := pos('/">', Line) + 3;
      EndPos := pos('</a>', Line);
      Value := copy(Line, BeginPos, EndPos - BeginPos);
      HTMLDecode(Value);
      SetField(fieldCategory, Value);
    end;
  end;

  //Description
  if ImportDescription then
  begin
    LineNr := FindLine('Plot Summary:', Page, 0);
    if LineNr < 1 then
      LineNr := FindLine('Plot Outline:', Page, 0);
    if LineNr > -1 then
    begin
      Line := Page.GetString(LineNr);
      BeginPos := pos('</b>', Line) + 5;
      EndPos := pos('<a href="/rg/', Line);
      if EndPos < 1 then
      begin
        Line := Line + Page.GetString(LineNr+1);
        EndPos := pos('<a href="/rg/', Line);
        if EndPos < 1 then
          EndPos := pos('<br><br>', Line);
        if EndPos < 1 then
          EndPos := Length(Line);
      end;
      Value := copy(Line, BeginPos, EndPos - BeginPos);
      HTMLDecode(Value);
      HTMLRemoveTags(Value);
      if LongDescr then // already False when the page has no plot summary link
        SetField(fieldDescription, GetDescriptions(MovieURL + 'plotsummary'))
      else
        SetField(fieldDescription, Value);
    end;
  end;

  // Comments
  if ImportComments then
  begin
    LineNr := FindLine('<b>Summary:</b>', Page, 0);
    if LineNr > -1 then
    begin
      Value := '';
      repeat
        LineNr := LineNr + 1;
        Line := Page.GetString(LineNr);
        EndPos := Pos('</blockquote>', Line);
        if EndPos = 0 then
          EndPos := Length(Line)
        else
          EndPos := EndPos - 1;
        Value := Value + Copy(Line, 1, EndPos) + ' ';
      until Pos('</blockquote>', Line) > 0;
      HTMLDecode(Value);
      Value := StringReplace(Value, '<br>', #13#10);
      Value := StringReplace(Value, #13#10+' ', #13#10);
      SetField(fieldComments, Value);
    end;
  end;

  // Length
  if ImportLength then
  begin
    LineNr := FindLine('Runtime:', Page, 0);
    if LineNr > -1 then
    begin
      Line := Page.GetString(LineNr + 1);
      EndPos := pos(' min', Line);
      if EndPos = 0 then
        EndPos := pos('  /', Line);
      if EndPos = 0 then
        EndPos := Length(Line);
      if Pos(':', Line) < EndPos then
        BeginPos := Pos(':', Line) + 1
      else
        BeginPos := 1;
      Value := copy(Line, BeginPos, EndPos - BeginPos);
      SetField(fieldLength, Value);
    end;
  end;

  DisplayResults;
end; 

procedure GetMoviePicture(Language: string; Page, AllTitles: TStringList); 
var
  Line, Value, Value2, Aka, PictureAddress: string;
  AmazonPage: TStringList;
  FoundOnAmazon, PickTreeSelected, PictureAvailable: Boolean;
  TitleRef, ImgRef, NoImage: string;
  LineNr, BeginPos, EndPos, PickTreeCount, ParagraphIndex, Index, TitleLine, LastMatch: Integer;
begin
  FoundOnAmazon := False;
  PickTreeSelected := False; // tested below even when the selection window is never shown

  if ImportLargePicture then
  begin
    // Find Alternate Titles for Movies which are not in English
    Aka := '';
    if Language <> 'English' Then
    begin
      LineNr:= FindLine('Also Known As',Page,0);
      EndPos:=0;
      if LineNr > -1 then
      begin
        Line := Page.GetString(LineNr);
        repeat
          Aka:=FindValue('<br>','<br>',Page,LineNr,Line);
          if Aka <> '' then
          begin
            BeginPos:=1;
            EndPos:=Pos('(',Line);
            if EndPos = 0 then
              EndPos := Length(Aka);
            Value := copy(Aka, BeginPos, EndPos - BeginPos - 1);
            Value:=TransformTitle(Value); // TransformTitle is the function this script declares
            AllTitles.Add(Value);
          end;
        until (Pos('</td>', Line) > 0) or (Pos('Runtime', Line) > 0) or (Pos('MPAA', Line) > 0 ) or (Pos('Country', Line) > 0) or (Pos('Certification', Line) > 0);
      end;
    end;

    TitleRef:='dvd>';
    ImgRef:='dvd><img';
    NoImage:='/icons/dvd-no-image.gif';
    LineNr := FindLine('title="DVD available at Amazon.com"', Page, 0);
    if LineNr = -1 then
    begin
      LineNr := FindLine('title="VHS available at Amazon.com"', Page, 0);
      if LineNr > -1 then
      begin
        TitleRef:='video>';
        ImgRef:='video><img';
        NoImage:='/icons/video-no-image.gif';
      end;
    end;

    if LineNr > -1 then
    begin
      Line := Page.GetString(LineNr);
      if(TitleRef='dvd>') then
      begin
        EndPos := pos('title="DVD', Line);
        BeginPos := pos('title="VHS', Line);
        while (BeginPos > 0) and (BeginPos<EndPos) do
        begin
          Delete(Line, 1, BeginPos+1);
          BeginPos := pos('title="VHS', Line);
        end;
      end;
      BeginPos := Pos('href="', Line) + 5;
      Delete(Line, 1, BeginPos);
      EndPos := Pos('"', Line);
      Value := Copy(Line, 1, EndPos - 1);
      AmazonPage := TStringList.Create;
      AmazonPage.Text := GetPage('http://us.imdb.com' + Value);

      // Original Title (left empty when no title has been collected in AllTitles)
      Value2 := '';
      if AllTitles.Count > 0 then
        Value2 := TransformTitle(AllTitles.GetString(0));
  
      PickTreeClear;
      PickTreeCount := 0;
      PickTreeAdd('Available Titles for matching a picture to: ' + Value2, '');
  
      ParagraphIndex := 1;
      LineNr := 0;
      LastMatch := -1;
      TitleLine := -1;
      repeat
        LineNr := FindLine('<b>'+IntToStr(ParagraphIndex)+'.', AmazonPage, LineNr);
  
        if LineNr > -1 then
        begin
          TitleLine:=LineNr;
          Value:='';
          PictureAvailable:=False;
          repeat
            TitleLine:=TitleLine +1;
            Line:= AmazonPage.GetString(TitleLine);
            BeginPos:=0;
            if Pos(TitleRef,Line) > 0 then
            begin
              if Pos(ImgRef,Line) = 0 then
              begin
                for Index:=0 to AllTitles.Count -1 do
                begin
                  Value2:=AllTitles.GetString(Index);
                  BeginPos:=Pos(Value2,Line);
                  if BeginPos > 0 then
                    Break;
                end;
                // Match not found
                if BeginPos = 0 then
                begin
                  BeginPos:=Pos(TitleRef,Line)+Length(TitleRef);
                  EndPos:=Pos('</a>',Line);
                  Value:=Copy(Line,BeginPos,EndPos-BeginPos);
                end;
              end
              else
              begin
                PictureAvailable:=(Pos(NoImage,Line) = 0);
                PictureAddress:=IntToStr(TitleLine);
              end;
            end;
            if BeginPos > 0 then
              Break;
          until (Pos('</table>',Line ) > 0);

          // Try to Find a Title Match
          if Pos(Value2,Line) > 0 then
          begin
            // Compare Current Title to Original
            BeginPos := Pos(TitleRef, Line) + Length(TitleRef) -1;
            Delete(Line, 1, BeginPos);
            EndPos:= Pos('(',Line);
            if EndPos = 0 Then
              EndPos := Pos('</a>', Line);
            Value := Copy(Line, 1, EndPos - 1);
            Value:= Trim(Value);
            if Value = Value2 then
            begin
              if PictureAvailable then
                LastMatch:=LineNr;
                //Break
            end;
          end;
          if PictureAvailable then
          begin
            PickTreeAdd(Value,PictureAddress);
            PickTreeCount:=PickTreeCount+1;
          end;
        end;
        ParagraphIndex:=ParagraphIndex+1;
      until (LineNr = -1);
      LineNr:=LastMatch;
      if (LineNr = -1) then
      begin
        // Handle Amazon Page Redirection(s)
        LineNr:= FindLine('You clicked on this item',AmazonPage,0);
        if (LineNr = -1) then
          LineNr:=FindLine('Customers who bought',AmazonPage,0);
        // Display the Picture Selection Window
        if (LineNr = -1) and ManualPictureSelect and (PickTreeCount > 0) then
        begin
          PickTreeSelected:=PickTreeExec(PictureAddress);
          if PickTreeSelected then
            LineNr:=StrToInt(PictureAddress,0);
        end;
        if (LineNr > -1 ) then
        begin
          LineNr := FindLine('src="http://images.amazon.com/images/P/',AmazonPage, LineNr);
          if not PickTreeSelected then
            TitleLine:= FindLine('/exec/obidos/ASIN/',AmazonPage, 0);
          if (LineNr > TitleLine) then
            LineNr:=-1;
          if LineNr > -1 then
          begin
            Line := AmazonPage.GetString(LineNr);
            BeginPos := Pos('src="http://images.amazon.com/images/P/', Line) + 4;
            Delete(Line, 1, BeginPos);
            EndPos := Pos('"', Line);
            Value := Copy(Line, 1, EndPos - 1);
            Value := StringReplace(Value, 'TZZZZZZZ', 'LZZZZZZZ');
            Value := StringReplace(Value, 'THUMBZZZ', 'LZZZZZZZ');
            GetPicture(Value, ExternalPictures);
            FoundOnAmazon := True;
          end;
        end;
      end
      else
      begin
        LineNr := FindLine('http://images.amazon.com/images/P/', AmazonPage, LineNr);
        if (LineNr > -1) and (LineNr < TitleLine) then
        begin
          Line := AmazonPage.GetString(LineNr);
          BeginPos := Pos('src="', Line) + 4;
          Delete(Line, 1, BeginPos);
          EndPos := Pos('"', Line);
          Value := Copy(Line, 1, EndPos - 1);
          Value := StringReplace(Value, 'THUMBZZZ', 'LZZZZZZZ');
          GetPicture(Value, ExternalPictures);
          FoundOnAmazon := True;
        end;
      end;
      AmazonPage.Free;
    end;
  end; // if ImportLargePicture

  if not FoundOnAmazon then
  begin
    {  not found on Amazon, so taking what's available directly on IMDB.  }
    LineNr := FindLine('<img border="0" alt="cover"', Page, 0);
    if LineNr > -1 then
    begin
      Line := Page.GetString(LineNr);
      BeginPos := pos('src="', Line) + 4;
      Delete(Line, 1, BeginPos);
      EndPos := pos('"', Line);
      Value := copy(Line, 1, EndPos - 1);
      GetPicture(Value, ExternalPictures);
    end;
  end;
end;

function TransformTitle(Title: string): string;
var
  BeginPos, EndPos: Integer;
  Value: string;
  Words: array of string;
  Articles: array of string;
  Replace,Original: string;
  Index, CommaCount: Integer;
Begin
  // Original Title
  Result:=Title;

  Setarraylength(Words,11);
  Words[0]:=' In ';
  Words[1]:=' On ';
  Words[2]:=' Of ';
  Words[3]:=' As ';
  Words[4]:=' The ';
  Words[5]:=' At ';
  Words[6]:=' And A ';
  Words[7]:=' And ';
  Words[8]:=' An ';
  Words[9]:=' To ';
  Words[10]:=' For ';

  SetArrayLength(Articles,35);
  Articles[0]:=' The';
  Articles[1]:=' A';
  Articles[2]:=' An';
  Articles[3]:=' Le';
  Articles[4]:=' L''';
  Articles[5]:=' Les';
  Articles[6]:=' Der';
  Articles[7]:=' Das';
  Articles[8]:=' Die';
  Articles[9]:=' Des';
  Articles[10]:=' Dem';
  Articles[11]:=' Den';
  Articles[12]:=' Ein';
  Articles[13]:=' Eine';
  Articles[14]:=' Einen';
  Articles[15]:=' Einer';
  Articles[16]:=' Eines';
  Articles[17]:=' Einem';
  Articles[18]:=' Il';
  Articles[19]:=' Lo';
  Articles[20]:=' La';
  Articles[21]:=' I';
  Articles[22]:=' Gli';
  Articles[23]:=' Le';
  Articles[24]:=' Uno';
  Articles[25]:=' Una';
  Articles[26]:=' Un''';
  Articles[27]:=' O';
  Articles[28]:=' Os';
  Articles[29]:=' As';
  Articles[30]:=' El';
  Articles[31]:=' Los';
  Articles[32]:=' Las';
  Articles[33]:=' Unos';
  Articles[34]:=' Unas';

  // Count the Comma in The Title
  CommaCount := 0;
  EndPos := 0;
  Value := Title;
  repeat
     BeginPos := Pos(',', Value);
     if BeginPos > 0 then
     begin
       Delete(Value, 1, BeginPos);
       CommaCount := CommaCount + 1;
       EndPos := EndPos + BeginPos;
     end;
  until( Pos(',',Value) = 0);

  // Compare the Article to a list of known ones
  for Index := 0 to 34 do
  begin
    if Pos(Articles[Index], Value) <> 0 then
    begin
       CommaCount := 1;
       BeginPos := EndPos;
       Break;
    end;
  end;

  if (BeginPos > 0) and (CommaCount = 1) then
  begin
    Value := Copy(Title, BeginPos + 1, Length(Title));
    Value := Trim(Value);
    Result := Value + ' ' + Copy(Title, 1, BeginPos - 1);
  end;

  BeginPos := Pos(': ', Result);
  if BeginPos > 0 then
    Result := StringReplace(Result, ': ', ' - ');

  Result := AnsiMixedCase(Result, ' ');

  for Index := 0 to 10 do
  begin
    if Pos(Words[Index],Result) <> 0 then
    begin
      Original := Words[Index];
      Replace := AnsiLowerCase(Original);
      Result := StringReplace(Result, Original, Replace);
    end;
  end;

  Result := StringReplace(Result, ' - the ', ' - The ');
  Result := Trim(Result);
end;

function GetDescriptions(Address: string): string;
var
  Line, Value: string;
  LineNr: Integer;
  BeginPos, EndPos,Longest: Integer;
  Page: TStringList;
begin
  Result := '';
  Longest := 0;
  Page := TStringList.Create;
  Page.Text := GetPage(Address);
  LineNr := FindLine('<p class="plotpar">', Page, 0);
  while LineNr > -1 do
  begin
    Value := '';
    repeat
      Line := Page.GetString(LineNr);
      BeginPos := pos('"plotpar">', Line);
      if BeginPos > 0 then
        BeginPos := BeginPos + 10
      else
        BeginPos := 1;
      EndPos := pos('</p>', Line);
      if EndPos < 1 then
        EndPos := Length(Line) + 1;
      if Value <> '' then
        Value := Value + ' ';
      Value := Value + copy(Line, BeginPos, EndPos - BeginPos);
      LineNr := LineNr + 1;
    until (pos('</p>', Line) > 0) or (LineNr = Page.Count);
    HTMLDecode(Value);
    HTMLRemoveTags(Value);
    PickListAdd(Value);

    if Length(Value) > Longest then
    begin
      Result := Value;
      Longest := Length(Value);
    end;

    LineNr := FindLine('<p class="plotpar">', Page, LineNr);
  end;
  Page.Free;
end;

function AddMoviesTitles(Page: TStringList; Tag: string): string;
var
  Line: string;
  LineNr: Integer;
  StartPos: Integer;
begin
  Result := '';
  LineNr := FindLine(tag, Page, 0);
  if LineNr > -1 then
  begin
    Line := Page.GetString(LineNr);
    HTMLRemoveTags(Line);
    PickTreeAdd(Trim(Line), '');
    LineNr := LineNr + 5;
    Line := Page.GetString(LineNr);
    StartPos := pos('href="', Line) + 5;
    Delete(Line, 1, StartPos);
    Result := Copy(Line, 1, pos('">', Line) - 1);
  end;
end;

procedure AnalyzePageCulturalia(Address: string);
var
  Page: TStringList;
  LineNr: Integer;
  Code: string;
  TitleFound: Boolean;
begin
  TitleFound := False;
  Page := TStringList.Create;
  Page.Text := GetPage(Address);
  LineNr := 1;
  Page.Text := StringReplace(Page.Text, '<br>', #13#10);
  if Pos('No se ha encontrado ningún artículo por título', Page.Text) = 0 then
   begin
    TitleFound := True;
    Code := GetValueAfter(Page.GetString(LineNr), 'Codigo = ');     
    Address := (BaseURLCulturalia + '?catalogo=1&codigo=' + Code);
   end;

  if TitleFound then
    AnalyzeMoviePageCulturalia(Address);
  Page.Free;
end; 

procedure AnalyzeMoviePageCulturalia(Address: string);
var
  Page: TStringList;
  Comments: string;
  strTitle: string;
  strSinopsis: string;
  Line: string;
  LineNr: Integer;
  tmp: string;
begin
  Page := TStringList.Create;
  Page.Text := StringReplace(GetPage(Address), '<br><br>', #13#10);
  Page.Text := StringReplace(Page.Text, '<br>', #13#10);
  strTitle := GetValueAfter(Page.GetString(1), 'Titulo = ');
  if copy(strTitle, Length(strTitle), Length(strTitle)) = '.' then
  begin
    tmp := Copy(strTitle, 1, Length(strTitle) -1);
  end else
  begin
    tmp := strTitle;
  end;
  SetField(fieldTranslatedTitle, TransformTitle(tmp));
  tmp := GetValueAfter(Page.GetString(2), 'Titulo original = ');
  SetField(fieldOriginalTitle, TransformTitle(tmp));
  SetField(fieldYear, GetValueAfter(Page.GetString(3), 'Año = '));
  SetField(fieldCategory, GetValueAfter(Page.GetString(4), 'Genero = '));
  SetField(fieldCountry, GetValueAfter(Page.GetString(5), 'Nacion = '));
  SetField(fieldDirector, GetValueAfter(Page.GetString(6), 'Director = '));
  SetField(fieldActors, GetValueAfter(Page.GetString(7), 'Actores = '));
  SetField(fieldProducer, GetValueAfter(Page.GetString(8), 'Productor = '));
  Comments := 'Guión: ' + GetValueAfter(Page.GetString(9), 'Guion = ');
  Comments := Comments + #13#10 + 'Fotografía: ' + GetValueAfter(Page.GetString(10), 'Fotografia = ');
  Comments := Comments + #13#10 + 'Música: ' + GetValueAfter(Page.GetString(11), 'Musica = ');
  SetField(fieldComments, Comments);
  LineNr := FindLine('Sinopsis = ', Page, 0);
  Line := Page.GetString(LineNr);
  strSinopsis := GetValueAfter(Line, 'Sinopsis = ');
  LineNr := LineNr + 1;
  Line := Page.GetString(LineNr);
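  // Append continuation lines until the URL line (or the last line of the page) is reached
  while (Pos('URL = ', Line) = 0) and (LineNr < Page.Count - 1) do
  begin
    strSinopsis := strSinopsis + #13#10 + Line;
    LineNr := LineNr + 1;
    Line := Page.GetString(LineNr);
  end;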
  HTMLRemoveTags(strSinopsis);
  SetField(fieldDescription, StringReplace(StringReplace(strSinopsis, '“', '"'), '”', '"'));
  LineNr := FindLine('URL = ', Page, 0);
  if LineNr <> -1 then
    SetField(fieldURL, GetValueAfter(Page.GetString(LineNr), 'URL = '));
  LineNr := FindLine('Imagen = ', Page, 0);
  if LineNr <> -1 then
    GetPicture(GetValueAfter(Page.GetString(LineNr), 'Imagen = '), ExternalPictures);
  Page.Free;
end;

function GetValueAfter(Line, Identifier: string): string;
begin
  if Pos(Identifier, Line) = 1 then
    Result := Copy(Line, Length(Identifier)+1, Length(Line))
  else
    Result := '';
end;

begin
  SetArrayLength(Articles,11);
  Articles[0]:='Lo ';
  Articles[1]:='La ';
  Articles[2]:='Le ';
  Articles[3]:='Uno ';
  Articles[4]:='Una ';
  Articles[5]:='Un ';
  Articles[6]:='El ';
  Articles[7]:='Los ';
  Articles[8]:='Las ';
  Articles[9]:='Unos ';
  Articles[10]:='Unas ';

  if CheckVersion(3,4,0) then
   begin
     MovieName := GetField(fieldOriginalTitle);
     donde := '&donde=2'; 
     if MovieName = '' then
      begin
       MovieName := GetField (fieldTranslatedTitle);
       donde := '&donde=1'; 
      end;
     if MovieName = '' then
      begin
       Input('Importar de Culturalia', 'Introduce el Titulo de la Pelicula:', MovieName);
       donde := '&donde=' + defdonde;
      end;
     if MovieName <> '' then
       begin
        // Strip a leading Spanish article, if there is one
        for Index := 0 to 10 do
        begin
         // Only match at position 1, so words inside the title are left alone
         if Pos(Articles[Index], MovieName) = 1 then
          MovieName := copy(MovieName, length(Articles[Index]) + 1, length(MovieName));
        end;


      // Strip a trailing period from MovieName before searching
      tmp := MovieName;
      if Copy(tmp, Length(tmp), Length(tmp)) = '.' then
        MovieName := Copy(tmp, 1, Length(tmp) -1);
      AnalyzePageCulturalia(BaseURLCulturalia + '?catalogo=1&texto=' + UrlEncode(MovieName) + donde);
      AnalyzePageIMDB('http://us.imdb.com/Tsearch?title='+UrlEncode(GetField(fieldOriginalTitle)));
     end;
  end else
    ShowMessage('This script requires a newer version of Ant Movie Catalog (at least the version 3.4.0)');
end.
Enjoy!

Regards, folgui.

Posted: 2003-09-23 18:29:44
by antp
thanks :)

Posted: 2003-09-25 01:26:22
by KaBeCi
great job!!!! it works very well, i haven't seen any bugs yet. But i do see something we were talking about before: culturalia (and imdb too) doesn't give the correct result for some movies. With the single script you can choose the correct movie and it's all right, but with the batch script you get wrong info for your movies, and maybe you don't realize it if you run 100+ movies at once (or you have to check them one by one to see if they're ok).
the problem is the order of the results: culturalia seems to order them alphabetically, and imdb by "most popular", but the useful order for us would be "the movie with the fewest words that were not entered". well, this sounds too complicated, i know, so i'll explain it:
let's search for the movie "Ugly"
on imdb and culturalia, the first result is "Coyote Ugly". WTF, i wanted the info for the movie "Ugly", not "Coyote Ugly"
so what i'm saying is, suppose we search for "Ugly" and the script receives this list (culturalia):

Bar Coyote, El (Coyote Ugly), 2000
Hermanastra, La (Confessions of an Ugly Stepsister), 2002
Rabbit Ears: The Ugly Ducking. (Rabbit Ears: The Ugly Ducking), 1985
Ugly, The (The Ugly), 1997

it's obvious that the correct movie is "Ugly, The", so how do we automate the choice? we searched for "Ugly"; the result "Bar Coyote, El (Coyote Ugly), 2000" has 5 words that were not in the search, the result "Hermanastra, La (Confessions of an Ugly Stepsister), 2002" has 7 words that were not in the search... and the result "Ugly, The (The Ugly), 1997" has only 3 words that were not in the search, so that one is the correct result.
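
A minimal sketch of the counting rule described above, written in Python rather than the script's Pascal so it can be run on its own. The names `extra_words` and `best_match` are made up for illustration and are not part of the script. It assumes words are split on anything that is not a letter or digit and that the year counts as a word, which is what gives the 5 / 7 / 3 totals.

```python
import re

def extra_words(query, result):
    """Count the words in `result` that do not appear in `query` (case-insensitive)."""
    wanted = {w.lower() for w in re.findall(r"\w+", query)}
    return sum(1 for w in re.findall(r"\w+", result) if w.lower() not in wanted)

def best_match(query, results):
    """Return the result with the fewest extra words; ties keep the server's order."""
    return min(results, key=lambda r: extra_words(query, r))

results = [
    "Bar Coyote, El (Coyote Ugly), 2000",
    "Hermanastra, La (Confessions of an Ugly Stepsister), 2002",
    "Rabbit Ears: The Ugly Ducking. (Rabbit Ears: The Ugly Ducking), 1985",
    "Ugly, The (The Ugly), 1997",
]

for r in results:
    print(extra_words("Ugly", r), r)

print(best_match("Ugly", results))  # Ugly, The (The Ugly), 1997
```

Repeated words are counted every time they appear, so a long result that repeats its title in both languages is penalised twice. That may or may not be what you want in a real script.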

This is a common bug in ALL batch scripts, but it's worse with culturalia because it orders the results alphabetically, so if there is more than one, you most probably get the wrong one.

well, that's all. thanks. maybe i should have opened a new thread...

Posted: 2003-09-25 08:54:54
by antp
KaBeCi wrote: maybe i should have opened a new thread...
please, no, there are already enough threads about Culturalia :p

Posted: 2003-09-25 10:00:56
by folgui
Hello!

KaBeCi, this is the "main" thread for improving this script, so ATM (at the moment) no more threads.

I think it's fixed ATM! Made some changes.

On the other hand, we'll still have problems with some movies like "La Hermanastra", because culturalia returns 2 possibilities:

---
Codigo = 16850
Titulo = Hermanastra, La.
Titulo original = Confessions of an Ugly Stepsister
Año = 2002

Codigo = 12939
Titulo = Hermanastra, La.
Titulo original = The Stepsister
Año = 1997
---

As you can see, the Spanish title is the same for both. I think the only solution is to search with part of the original title, like "confessions of an ugly", or to check the year or original title after import to confirm it's ok. There are few movies like this example, so don't worry, be happy :lol:
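
The check suggested above could be sketched like this, in Python for illustration only (the script itself is Pascal and does not do this). `parse_records` and `pick` are hypothetical names. The sketch assumes the listing arrives as "Key = Value" lines with a blank line between movies, as in the example.

```python
def parse_records(text):
    """Split a "Key = Value" listing into one dict per movie (blank line = new movie)."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                records.append(current)
                current = {}
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            current[key] = value
    if current:
        records.append(current)
    return records

def pick(records, original_hint=None, year=None):
    """Filter by part of the original title and/or year; return a record only if it is unique."""
    matches = [
        r for r in records
        if (original_hint is None
            or original_hint.lower() in r.get("Titulo original", "").lower())
        and (year is None or r.get("Año") == str(year))
    ]
    return matches[0] if len(matches) == 1 else None

listing = """\
Codigo = 16850
Titulo = Hermanastra, La.
Titulo original = Confessions of an Ugly Stepsister
Año = 2002

Codigo = 12939
Titulo = Hermanastra, La.
Titulo original = The Stepsister
Año = 1997
"""

records = parse_records(listing)
print(pick(records))  # None: the Spanish title alone is ambiguous
print(pick(records, original_hint="confessions of an ugly")["Codigo"])  # 16850
print(pick(records, year=1997)["Codigo"])  # 12939
```

Returning nothing when more than one record matches is a deliberate choice here: in a batch run it is safer to skip the movie than to import the wrong one silently.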

Regards, folgui.

Posted: 2003-10-06 15:35:40
by KaBeCi
well... i've been away for a while, i'm back now, and i've used the script for a lot of movies with great results. But... when a movie is not found on culturalia, the script "hangs" (and deletes the original title), but that doesn't bother me.... what really bothers me is when it gets the wrong movie, because sometimes you don't realize it.
try searching for "Narc"
you'll get "La ruta de los narcoticos" on iMDB AND Culturalia. why? i don't know, but if i use the culturalia single movie script, "Narc" is the first movie, and the same with the iMDB single movie (large picture) script.
thanks

Posted: 2003-10-06 19:50:40
by Guest
If i search for "Narc" i get "Narc" info. Culturalia returns:

Codigo = 18721<br>Titulo = Narc.<br>Titulo original = Narc<br>Año = 2002
Codigo = 17468<br>Titulo = Narcos.<br>Titulo original = Narcos<br>Año = 1992
Codigo = 17728<br>Titulo = Ruta de los narcóticos, La.<br>Titulo original = La ruta de los narcóticos<br>Año = 1962

So it gets the info for the first movie, not "La Ruta de los narcóticos", and this is OK. I don't know why it gives you the 3rd movie. Do you have another example like "narc" to check?
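
For reference, a rough Python sketch of reading a response in the format shown above. Taking the first record is what the post describes. Preferring a title that equals the search text (ignoring case and the trailing ".") is only one possible safeguard, not what the script does. `parse_line` and `choose` are hypothetical names.

```python
def parse_line(line):
    """Turn 'Codigo = 1<br>Titulo = X.<br>...' into a dict."""
    record = {}
    for field in line.split("<br>"):
        key, sep, value = field.partition(" = ")
        if sep:
            record[key.strip()] = value.strip()
    return record

def choose(response, query):
    """Prefer a title equal to the query (case-insensitive), else fall back to the first record."""
    records = [parse_line(l) for l in response.splitlines() if l.strip()]
    wanted = query.strip().lower()
    for r in records:
        titles = (r.get("Titulo", "").rstrip("."), r.get("Titulo original", ""))
        if wanted in (t.lower() for t in titles):
            return r
    return records[0] if records else None

response = """\
Codigo = 18721<br>Titulo = Narc.<br>Titulo original = Narc<br>Año = 2002
Codigo = 17468<br>Titulo = Narcos.<br>Titulo original = Narcos<br>Año = 1992
Codigo = 17728<br>Titulo = Ruta de los narcóticos, La.<br>Titulo original = La ruta de los narcóticos<br>Año = 1962
"""

print(choose(response, "narc")["Codigo"])    # 18721
print(choose(response, "Narcos")["Codigo"])  # 17468
print(choose(response, "ruta")["Codigo"])    # 18721 (no exact match, so the first record)
```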

I'll review and work on the "not found at culturalia" problem.

Regards, folgui.

Posted: 2003-10-22 15:53:05
by KaBeCi
it's because it's "narc", not "Narc"; i wrote it wrong in my last post
but it's not a big problem anyway, forget about it

Posted: 2003-10-25 17:16:32
by folgui
Script modified.

Added an option to choose the letter case of titles. This option is a "const" at the beginning of the script: TitleMixedCase

By default it is "False", which means all letters are lowercase except the first one. With "True" we get mixed case, with the first letter of each word in uppercase.

Set it as you like.
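
To make the two settings concrete, here is a small Python sketch of the behaviour described above. It is not the script's own code: `format_title` is a made-up name, and splitting words on single spaces is an assumption.

```python
def format_title(title, title_mixed_case=False):
    """False: only the first letter is uppercase. True: first letter of every word is uppercase."""
    if title_mixed_case:
        return " ".join(word[:1].upper() + word[1:].lower() for word in title.split(" "))
    return title[:1].upper() + title[1:].lower()

print(format_title("LA RUTA DE LOS NARCÓTICOS"))        # La ruta de los narcóticos
print(format_title("LA RUTA DE LOS NARCÓTICOS", True))  # La Ruta De Los Narcóticos
```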

Regards, folgui

Posted: 2003-11-04 16:57:30
by KaBeCi
hi Folgui, thanks for your work on this script
just 2 things to know about:

When it doesn't find a movie, it hangs (and i have to press the stop button)

Coming back to something i told you before: search for the movie "Casino" and you'll see... you'll get the info for "Zafarrancho en el Casino" instead of "Casino".

thanks

Posted: 2003-11-04 18:21:03
by folgui
It gets "Casino" for me, whether i put it in the original or the translated title. Are you sure you are using the version posted here?

Anyway, i've copied my latest version into the first message again, so get it from there.

Yes, if it doesn't find a movie it returns to the scripts window and you have to press "Stop" and "Close". For one movie that's no problem, but for a batch import of several movies it's bad. I'll take a look at it.

Regards, folgui

Posted: 2003-11-04 23:02:37
by folgui
Fixed the problem (hang) when the script doesn't find a movie in Culturalia.

The latest script is always the one posted in the first message of this thread.

Regards, folgui

Posted: 2003-11-18 18:03:10
by KaBeCi
the script is working great!!! thank you!!!!!
sometimes i get the HTTP/1.1 500 internal server error, but i don't know if it's your script, imdb, culturalia or my PC
thanks folgui!!!!

if ExternalPictures is set to true, the picture imported from Culturalia is still always saved internally.
Old code (line 991)

Code: Select all

    GetPicture(GetValueAfter(Page.GetString(LineNr), 'Imagen = '), False);
Corrected Code (line 991)

Code: Select all

    GetPicture(GetValueAfter(Page.GetString(LineNr), 'Imagen = '), ExternalPictures);
i realized it when i saw how big my catalog had become: it doesn't fit on a floppy anymore, and zip/rar won't compress it much.
BTW, does anyone know a way to externalize all pictures??? (i'm doing it manually: saving each one, deleting it and then linking to the saved file)

Posted: 2003-11-18 18:08:20
by antp
KaBeCi wrote: BTW, does anyone know a way to externalize all pictures??? (i'm doing it manually: saving each one, deleting it and then linking to the saved file)
Save to XML; the program will ask whether it may make the pictures external.
Then you can save it back to AMC or keep it in XML.

Posted: 2003-11-19 13:11:03
by folgui
Thanks KaBeCi ;)

Script updated with the fix you described two messages above.

The same applies to the other culturalia scripts, so those are updated too.

Regards, folgui

Posted: 2003-11-20 05:05:53
by KaBeCi
thanks ANTP, you're the best... i knew there had to be a much easier way to do that. now my catalog is 200k and 70k zipped (before it was 3MB and 2.8MB zipped), and it fits fine on 1 floppy. :grinking: :grinking: :grinking:
is there any difference between having it in XML or AMC???

Posted: 2003-11-20 08:48:18
by antp
XML is a text format that uses tags, a little like HTML, which makes it easier for other people to import into other applications. You can view it with a text editor (Notepad) or an XML parser (Internet Explorer or Mozilla).
AMC is a binary format.
Uncompressed, the AMC file is a little smaller.
But if the file gets damaged, the XML file will be easier to repair ;)

Posted: 2004-01-08 20:05:28
by antp
After the IMDB changes (see viewtopic.php?t=1080 about that), I updated this script:
http://www.antp.be/temp/Culturalia+IMDB%20(batch).ifs

Posted: 2004-01-11 10:30:46
by folgui
Thank you very much antp! :grinking:

Posted: 2004-02-28 18:15:40
by folgui
Script updated.

The latest scripts included in version 3.4.2 of AMC (dated 09/01/2004) didn't include all the code of the version published here, so I updated it with that code and with Antoine's code modifications to support the changes made to IMDB a few months ago.

The code in the first message of this thread is the latest updated/working version of the script, so copy/paste/save it.

Regards, folgui