Monday, February 17, 2025

Fetch Cross-domain Content Using a PHP Proxy

If you’ve ever tried to fetch a resource outside of your domain via an Ajax call, you probably got this error message:

XMLHttpRequest cannot load http://www.domain.com/path/filename. Origin null is not allowed by Access-Control-Allow-Origin.

This error was triggered by the Same-origin policy. It permits scripts running on pages originating from the same site to access each other’s data with no specific restrictions, but prevents scripts from accessing data that is served from a different domain. Luckily, there are ways round it, including:

  1. CORS (Cross-Origin Resource Sharing)
  2. JSONP (JSON with Padding)
  3. the postMessage() method
  4. local proxy

This tutorial describes how to set up a local proxy, as per item 4 above, to combine document fragments from another domain with your own web content.

How it Works

All of the workarounds above have their strengths and weaknesses, but I find that the proxy solution is best for serving up unadulterated HTML content from another domain, provided your web host supports some sort of server-side scripting such as PHP, .NET, or Python.

The idea is simple; you make a request to your own script to fetch a resource from another domain, and it returns it to your browser. Hence, the content comes from your own server – no more Access-Control-Allow-Origin error! Here’s a diagram to illustrate:

[Diagram: proxy_server]

Our loadFrame.html page will call our proxy.php script and then parse the response in order to display selected content from my robgravelle.com landing page in an iFrame. You can read more about filtering iFrame content here.

The finished page will look like this:

[Screenshot: robgravelle.com_in_iframe]

The Server Code

The proxy script executes on the server so that, as far as your browser knows, content is coming from it and nowhere else. I am using PHP because it makes fetching web content a snap and because it runs on my local WampServer with no setup whatsoever.

Accessing the URL Parameter

The address of the resource to fetch is passed to the proxy script as a GET parameter named url. Here’s how to read it with PHP:

<?php
$url = (isset($_GET['url'])) ? $_GET['url'] : false;
if(!$url) exit;

Delivering the Goodness

PHP offers several ways to fetch web content; I settled on file_get_contents(). I just set the Content type header, fetch the web page, and send it back to the browser:

header('Content-Type: text/html');
$string = file_get_contents($url);
echo $string;

In Case of Emergency

Let’s say, heaven forbid, that the URL was malformed or that the other server was down. The script should include some error handling for that. The file_get_contents() function is a little strange in that, rather than throw an exception, it returns false and raises an E_WARNING. A try/catch therefore won’t catch the failure. What worked best for me was to set my own error handler for E_WARNINGs. Here’s how that works:

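// Route any E_WARNING raised while fetching to our own handler.
set_error_handler("warning_handler", E_WARNING);

header('Content-Type: text/html');
$string = file_get_contents($url); // returns false on failure
echo $string;

restore_error_handler();

// Replaces the HTML content type with plain text and sends the warning message.
function warning_handler($errno, $errstr) {
  header('Content-Type: text/plain');
  echo $errstr;
}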

The Client Code

Back on the client-side, the loadFrame.html page contains a jQuery-powered script that performs the Ajax call on document load. The target domain is stored to a variable because we’ll be needing it later. Inside the success() handler, we obtain a reference to the iFrame’s contents:

var baseUrl = 'http://www.robgravelle.com/';
$(function() {
  $.ajax({
    url: "proxy.php?url=" + baseUrl + 'news/',
    success: function(data, status, jqXHR) {
      var iFrameContents = $('iframe').contents();
      //do stuff...
    }
  });
});
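
Because the target address travels inside another URL's query string, it may be worth encoding it before the call. Here is a minimal sketch of that idea. Note that buildProxyUrl() is a hypothetical helper that is not part of the original files, and the example.com address is made up for illustration:

```javascript
// Hypothetical helper: builds the request URL for the proxy script.
function buildProxyUrl(proxyPath, targetUrl) {
  // encodeURIComponent() escapes ':', '/', '?', '&' and '=' so the whole
  // target URL arrives as a single value of the "url" parameter.
  return proxyPath + '?url=' + encodeURIComponent(targetUrl);
}

var plain     = buildProxyUrl('proxy.php', 'http://www.robgravelle.com/news/');
var withQuery = buildProxyUrl('proxy.php', 'http://www.example.com/search?q=php&page=2');

console.log(plain);     // proxy.php?url=http%3A%2F%2Fwww.robgravelle.com%2Fnews%2F
console.log(withQuery); // proxy.php?url=http%3A%2F%2Fwww.example.com%2Fsearch%3Fq%3Dphp%26page%3D2
```

Without the encoding, the "&page=2" part of the second address would be read as a separate parameter of proxy.php rather than as part of the target URL. PHP decodes $_GET values automatically, so the server code should not need to change.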

Checking for the All Clear

Our Ajax call only contains a success() handler because our proxy doesn’t throw any HTTP errors. Instead, it returns a plaintext string with the error (warning) message. We can detect that by checking whether the response’s content type is ‘text/plain’. If so, we set the iFrame’s content to the data string, and give it a red font for good measure:

success: function(data, status, jqXHR ) {
  var iFrameContents = $('iframe').contents();
  if( jqXHR.getResponseHeader('content-type').indexOf('text/plain') >= 0 ) {
    iFrameContents.find('body')
      .css('background-color', 'white')
      // $('<p>') creates a new paragraph; $('P') would select existing ones
      .append( $('<p>').css('color', 'red').text(data) );
  }
  else {
    //do stuff...
  }
}
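
The content-type test can be pulled out into a small function, which also makes it easy to cope with a missing header. This is only a sketch; isProxyError() is a hypothetical name and not something jQuery provides:

```javascript
// Hypothetical helper: true when the proxy replied with its plain-text error.
function isProxyError(contentType) {
  // getResponseHeader() returns null when the header is absent,
  // so guard against that before calling string methods.
  return typeof contentType === 'string' &&
         contentType.toLowerCase().indexOf('text/plain') >= 0;
}

console.log(isProxyError('text/plain; charset=UTF-8')); // true
console.log(isProxyError('text/html'));                 // false
console.log(isProxyError(null));                        // false
```

Inside the success() handler it would be called as isProxyError(jqXHR.getResponseHeader('content-type')).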

Parsing the Target Document

I tried to put off converting the HTML into a DOM for as long as possible. That’s why you’ll see some RegEx parsing in the code. Conversion only happens once the “header_content_footer_wrapper” DIV is fetched, via the $.parseHTML() method. The second (context) argument is set to null, which tells jQuery to use its default document context. The third (keepScripts) argument is false, so any script elements passed in the HTML string are removed. Be aware that inline event attributes such as onerror are not stripped by $.parseHTML(). Here is the parsing code:

// Capture group 1 holds the tag's attributes, group 2 its inner HTML.
var head        = /<head([^>]*)>([\s\S]+)<\/head>/.exec(data),
    body        = /<body([^>]*)>([\s\S]+)<\/body>/.exec(data),
    classMatch  = body[1].match(/class=['"]([^'"]*)['"]/),
    bodyClasses = classMatch ? classMatch[1] : '',  // body may have no class
    tempDOM     = $('<output>').append( $.parseHTML(body[2], null, false) ),
    mainDiv     = tempDOM.find('#header_content_footer_wrapper');
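
To see what the regular expressions pull out, here is a self-contained sketch that runs without jQuery. The extractParts() function and the sample markup are hypothetical, made up for illustration, and the patterns are slightly more defensive than the ones above (they tolerate a missing head, body or class attribute):

```javascript
// Hypothetical helper: splits a raw HTML string into the pieces used later.
function extractParts(html) {
  var head = /<head([^>]*)>([\s\S]*?)<\/head>/i.exec(html),
      body = /<body([^>]*)>([\s\S]*)<\/body>/i.exec(html),
      // Look for a class attribute only if a body tag was found.
      cls  = body ? /class=['"]([^'"]*)['"]/i.exec(body[1]) : null;
  return {
    headHtml:    head ? head[2] : '',
    bodyHtml:    body ? body[2] : '',
    bodyClasses: cls  ? cls[1]  : ''
  };
}

// Made-up sample document.
var sample = '<html><head><title>News</title></head>' +
             '<body class="home wide"><div id="content">Hi</div></body></html>';
var parts = extractParts(sample);

console.log(parts.headHtml);    // <title>News</title>
console.log(parts.bodyClasses); // home wide
console.log(parts.bodyHtml);    // <div id="content">Hi</div>
```

Regular expressions are a blunt tool for HTML, so a sketch like this is only reasonable for pages whose structure you already know.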

Setting Custom Properties

A few images in the target page are set using JavaScript that was not included in our page. We don’t need it, as it’s just as easy to do from here:

mainDiv.find('#content').css("background-image", "url('/@/Storage/_files/68/file.jpg')");  
mainDiv.find('#main_heading img').attr('src', '/@/Storage/_files/62/file.gif');

Appending the Document Parts

The last line in the script populates the iFrame. In order, it:

  1. Adds the <BASE> tag with the href set to the baseUrl variable. Without it, all relative links will be broken!
  2. Appends the document HEAD content to the iFrame
  3. Inserts classes in the iFrame body tag
  4. Appends the content DIV to the iFrame body.

iFrameContents.find('head')
    .append( $('<base>').attr('href', baseUrl) )
    // head is the RegEx result array; index 2 is the HEAD's inner HTML
    .append(head[2])
    .next('body')
        .attr('class', bodyClasses)
        .append(mainDiv);
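
To get a feel for what the <BASE> tag does for relative links, the standard URL constructor resolves addresses in much the same way. This is a rough sketch only; resolveAgainstBase() is a hypothetical helper, and the image and stylesheet paths are invented for the example:

```javascript
var baseUrl = 'http://www.robgravelle.com/';

// Hypothetical helper: resolves a link roughly the way <base href> would.
function resolveAgainstBase(link, base) {
  return new URL(link, base).href;
}

// Relative links are resolved against the base...
console.log(resolveAgainstBase('news/', baseUrl));            // http://www.robgravelle.com/news/
console.log(resolveAgainstBase('/images/logo.gif', baseUrl)); // http://www.robgravelle.com/images/logo.gif
// ...while absolute links are left alone.
console.log(resolveAgainstBase('http://www.example.com/a.css', baseUrl)); // http://www.example.com/a.css
```

Without the base, those same relative links would resolve against the address of loadFrame.html on your own server, which is why they would break.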

I included the two files in a zip file for your enjoyment. Extract them to your server’s www root and bring up the loadFrame.html page in your browser using your server’s URL as in http://localhost:8080/Frames/loadFrame.html.

Conclusion

Using a local proxy is best for serving up HTML content from another domain when your web host supports some sort of server-side scripting. Just be careful because it also happens to be the riskiest of the cross-domain workarounds due to its importing of raw HTML – and possibly scripting – from other domains.
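
One way to reduce that risk is to restrict which addresses the proxy will fetch, especially since file_get_contents() also accepts local file paths. The sketch below shows the idea in JavaScript to match the client code. The isAllowedTarget() function and the ALLOWED_HOSTS list are hypothetical, and a real check would have to be repeated in proxy.php, because anything done in the browser can be bypassed:

```javascript
// Hypothetical allowlist; the host name is the one used in this tutorial.
var ALLOWED_HOSTS = ['www.robgravelle.com'];

// Hypothetical helper: accepts only http(s) URLs on an allowed host.
function isAllowedTarget(targetUrl, allowedHosts) {
  var parsed;
  try {
    parsed = new URL(targetUrl);
  } catch (e) {
    return false; // not a valid absolute URL
  }
  var isWeb = parsed.protocol === 'http:' || parsed.protocol === 'https:';
  return isWeb && allowedHosts.indexOf(parsed.hostname) !== -1;
}

console.log(isAllowedTarget('http://www.robgravelle.com/news/', ALLOWED_HOSTS)); // true
console.log(isAllowedTarget('http://www.example.com/', ALLOWED_HOSTS));          // false
console.log(isAllowedTarget('file:///etc/passwd', ALLOWED_HOSTS));               // false
```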

Rob Gravelle
Rob Gravelle resides in Ottawa, Canada, and has been an IT guru for over 20 years. In that time, Rob has built systems for intelligence-related organizations such as Canada Border Services and various commercial businesses. In his spare time, Rob has become an accomplished music artist with several CDs and digital releases to his credit.
