Problem with GET method and GZIP decoding

Dekert
Posts: 36
Joined: 2020-03-14 08:32:31


Post by Dekert »

Hi, I have noticed two problems with downloading web pages, and I also tried to find a solution.

1. Problem downloading pages from YouTube when using the GET method, for example:
data := GetPage('https://www.youtube.com/results?search_query=trailer');
I looked at the AMC source code and also did some tests with the example code from the Indy package.
It looks like the problem is the '*/*' Content-Type request header sent with the GET method, set by this line:
GetScriptWin.http.Request.ContentType := '*/*';
(in "function GetPage(const address, referer, cookies: string): string;", located in the getscript.pas file)

Perhaps simply removing that ContentType line would fix it. See the link below for reference:
https://stackoverflow.com/questions/566 ... t-requests

2. The other problem, unrelated to the one above, is gzip encoding support. A downloaded web page that is gzip-compressed is not automatically decoded.
You can easily recognize it: the downloaded data starts with the bytes 1F 8B 08.
To test it, you can request gzip encoding by adding the line below to the GetPage() function mentioned above.
GetScriptWin.http.Request.AcceptEncoding := 'gzip';
Then just try downloading the Google web page and check the data content:
data := GetPage('https://www.google.com');
Possibly, updating the Indy components to the latest version will be enough to fix that issue.
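The 1F 8B 08 signature from point 2 is the start of a gzip member header as defined in RFC 1952: 1F 8B are the fixed ID1/ID2 bytes and 08 is the compression method (deflate). Below is a minimal Free Pascal sketch of that check, assuming the response body is held as a raw byte string; LooksGzipped is a hypothetical helper name, not part of AMC or Indy. It only detects compressed data; it does not decode it.

```pascal
program GzipCheck;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: True when Data begins with a gzip member header
// (RFC 1952): ID1 = $1F, ID2 = $8B, CM = $08 (deflate).
function LooksGzipped(const Data: AnsiString): Boolean;
begin
  Result := (Length(Data) >= 3)
    and (Data[1] = #$1F)
    and (Data[2] = #$8B)
    and (Data[3] = #$08);
end;

var
  Compressed, Plain: AnsiString;
begin
  // First bytes of a gzip stream followed by filler, versus plain HTML
  Compressed := #$1F#$8B#$08#$00'rest-of-stream';
  Plain := '<!doctype html><html>';
  WriteLn('compressed: ', BoolToStr(LooksGzipped(Compressed), True));
  WriteLn('plain:      ', BoolToStr(LooksGzipped(Plain), True));
end.
```

A check like this is only a diagnostic; with a compressor attached to the HTTP component, the data should already arrive decoded.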

Unfortunately, I don't have Delphi to compile the AMC source files, so I used the free Lazarus/FPC compiler for partial testing only.
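Regarding point 1: Content-Type describes the body of a request, and a GET request normally has none. '*/*' is a wildcard media range, which is meaningful in an Accept header but does not identify an actual body type, so a server may reasonably treat it as suspicious. The following minimal Free Pascal sketch is not AMC or Indy code: BuildGetRequest is a hypothetical function that just assembles the raw text of a GET request and writes the Content-Type line only when a non-empty value is passed, mirroring the idea of leaving ContentType empty.

```pascal
program GetRequestHeaders;

{$mode objfpc}{$H+}

uses
  SysUtils;

const
  CRLF = #13#10;

// Hypothetical helper: builds the raw header block of an HTTP/1.1 GET request.
// Content-Type is written only when ContentType is non-empty; a GET request
// has no body, so normally there is nothing for that header to describe.
function BuildGetRequest(const Host, Path, ContentType: string): string;
begin
  Result := 'GET ' + Path + ' HTTP/1.1' + CRLF
          + 'Host: ' + Host + CRLF
          + 'Accept: */*' + CRLF;  // the wildcard belongs in Accept
  if ContentType <> '' then
    Result := Result + 'Content-Type: ' + ContentType + CRLF;
  Result := Result + CRLF;  // empty line terminates the header block
end;

begin
  // Old behaviour: the media range is sent as Content-Type
  Write(BuildGetRequest('www.youtube.com', '/results?search_query=trailer', '*/*'));
  WriteLn('----');
  // Fixed behaviour: empty value, so the header is omitted
  Write(BuildGetRequest('www.youtube.com', '/results?search_query=trailer', ''));
end.
```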
Dekert
Posts: 36
Joined: 2020-03-14 08:32:31

Re: Problem with GET method and GZIP decoding

Post by Dekert »

Antp, updating the Indy components turned out not to be necessary.
I made corrections to the getscript source file to fix the two problems mentioned earlier.

"getscript.pas" (http://www.mediafire.com/file/ljt8vf5ytq0temw) changes:

uses
...
  IdCompressorZLib;

type
  TGetScriptWin = class(TBaseDlg)
  ...
  private
    ...
    CompressorZLib: TIdCompressorZLib;

procedure TGetScriptWin.FormCreate(Sender: TObject);
var
  i: Integer;
begin
...
  //Init ZLib for deflate and gzip compressed content
  CompressorZLib := TIdCompressorZLib.Create;  

procedure TGetScriptWin.FormDestroy(Sender: TObject);
begin
...
  // ZLib Compressor
  FreeAndNil(CompressorZLib);

function GetPage(const address, referer, cookies: string): string;
var
  UseSSL: Boolean;
begin
...
  GetScriptWin.http.Compressor := GetScriptWin.CompressorZLib;
  GetScriptWin.http.Request.ContentType := '';

function PostPage(const address, params, content, referer: string; forceHTTP11: Boolean; forceEncodeParams: Boolean): string;
var
  UseSSL: Boolean;
begin
...
  GetScriptWin.http.Compressor := GetScriptWin.CompressorZLib;

function GetPicture(const extraIndex: Integer; const address, referer: string): Boolean;
var
  Stream: TMemoryStream;
  UseSSL: Boolean;
begin
...
  GetScriptWin.http.Compressor := GetScriptWin.CompressorZLib;
  GetScriptWin.http.Request.ContentType := '';

The HTTP request headers no longer contain the 'Content-Type': '*/*' value; in addition, they now advertise the supported encodings: "Accept-Encoding": "deflate, gzip, identity".

With the above changes, access to the YouTube and Amazon pages should be fixed.
A side effect of these mods is faster loading of data from the Internet :)
antp
Site Admin
Posts: 9629
Joined: 2002-05-30 10:13:07
Location: Brussels

Re: Problem with GET method and GZIP decoding

Post by antp »

Hi,
Thanks for the investigation and the details :)
I should take some time to make a new build then.
I have no idea when that will be, though, as I really do not have much time for AMC :/
I'll try to keep that in mind so I can do that and the update of MediaInfo (also requested a few times) in a not-so-distant future.

I'm not sure why the content-type was set to */*.
I assume there was a reason back then: either a different default value, or it was needed to fix something else.
I hope that setting it to an empty string won't have other side effects. Anyway, I'll release a beta version with the fixes before making it the "official" version.
Dekert
Posts: 36
Joined: 2020-03-14 08:32:31

Re: Problem with GET method and GZIP decoding

Post by Dekert »

A beta release makes sense. Thanks for the great work.
antp
Site Admin
Posts: 9629
Joined: 2002-05-30 10:13:07
Location: Brussels

Re: Problem with GET method and GZIP decoding

Post by antp »

Hi,
I finally took some time to build a new version with your suggested changes.
It seems to solve the problem for Youtube.
I was hoping to fix the "Bad Request" error on MovieMeter at the same time, but that seems to be a different problem.
Here is the new exe; for testing, copy it over the one in the folder of a regular installation:
http://update.antp.be/amc/beta/amc4230b.rar

Instead of forcing the content-type, I kept its old value as the default and added new 'GetPage4' and 'GetPicture3' calls with an extra parameter:
data := GetPage4('https://www.youtube.com/results?search_query=trailer', '', '', '');
(the content-type is the last parameter; the others are the cookies and referrer)
Dekert
Posts: 36
Joined: 2020-03-14 08:32:31

Re: Problem with GET method and GZIP decoding

Post by Dekert »

Thanks antp, I confirm that the new beta version fixes the YouTube problem with the GetPage4 call.

I don't know if I can help with MovieMeter, but may I see the new version of the getscript.pas file?
antp
Site Admin
Posts: 9629
Joined: 2002-05-30 10:13:07
Location: Brussels
Contact:

Re: Problem with GET method and GZIP decoding

Post by antp »
