Tuesday, January 02, 2007

How do search engines' bots handle javascript?

This is a free translation of the www.seoweblog.ru article "Как индексаторы поисковых систем обрабатывают javascript?" ("How do search engine indexers handle javascript?").

We've just completed an experiment aimed at finding out how the indexers/bots of different search engines really handle HTML code that contains javascript, and javascript redirects in particular.

In our experiment we used a high-traffic site that ranks in google for some popular keywords. On the main page of this site we created links to (experimental) pages, each containing a different fragment of javascript. These fragments redirect clients' browsers to other (destination) pages created specially for this experiment. To be safe, the destination pages were kept truly secret and weren't linked from the main site in any way. This way we were sure that bots could reach the destination pages only via the experimental pages. All we needed to do after that was look in the raw server log to see which destination pages were actually crawled by search engines' bots.
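
The log check described above could be sketched roughly as follows. This is only an illustration, not the script used in the experiment: it assumes the server writes Apache "combined" format access logs, and the bot name pattern, the list of destination paths and the sample log lines are all made up for the example.

```javascript
// Hypothetical list of secret destination pages (an assumption, not from the experiment).
const DESTINATION_PAGES = ['/directory/1.html', '/directory/t.html', '/directory/x.html'];

// Assumed substrings that identify search engine bots in the User-Agent header.
const BOT_PATTERN = /Googlebot|Slurp|msnbot|Yandex/i;

// Captures the requested path and the user agent from a "combined" format log line.
const LINE_PATTERN = /"(?:GET|HEAD) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"/;

// Returns a Map: destination path -> Set of bot user agents that requested it.
function crawledDestinations(logLines) {
  const hits = new Map();
  for (const line of logLines) {
    const match = LINE_PATTERN.exec(line);
    if (!match) continue; // skip lines in an unexpected format
    const [, path, userAgent] = match;
    if (!DESTINATION_PAGES.includes(path) || !BOT_PATTERN.test(userAgent)) continue;
    if (!hits.has(path)) hits.set(path, new Set());
    hits.get(path).add(userAgent);
  }
  return hits;
}

// Made-up sample lines, for illustration only.
const sampleLog = [
  '192.0.2.10 - - [02/Jan/2007:10:00:00 +0000] "GET /directory/1.html HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
  '192.0.2.20 - - [02/Jan/2007:10:05:00 +0000] "GET /directory/t.html HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
];

const result = crawledDestinations(sampleLog);
console.log([...result.keys()]);
```

Because the destination pages are not linked from anywhere else, any bot request for them in the log has to have come through an experimental page.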

At the end of the experiment it was clear that Googlebot and other search engines' bots were able to correctly handle almost any variant of javascript redirect, i.e. the bots crawled the destination pages and those pages appeared in the search engines' indexes. Below are concrete examples that were correctly interpreted by bots:

The first example processed by the indexers is plain redirect code:

<script language="JavaScript">
  document.location.href = "http://www.site.com/directory/1.html";
</script>
The second one was a redirect executed by an encoded script:
<script language='JavaScript'>var str = 'wbs%21s%3Eepdvnfou%2Fsfgfssfs-u%3E%23%23-r%3C' +
'%0B%21%21%21%21%21%21%21%21%21%21%21%21%21%21' +
'epdvnfou%2Fmpdbujpo%3E%23iuuq%3B00xxx%2Fbetpgu.efwfmpqnfou' +
'%2Fdpn0uftukt03fod%2Fiunm%23%3C'; str = unescape(str); res = '';
for (var i = 0; i < str.length; i++){ res += String.fromCharCode(str.charCodeAt(i)-1); } eval(res);</script>
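
To see what such a script hides, the decoding step can be reproduced outside the page. The sketch below is an illustration only: the helper names `encodeShift` and `decodeShift` are invented here, and it uses `encodeURIComponent`/`decodeURIComponent` in place of the older `unescape`, which gives the same result for the plain ASCII input assumed here.

```javascript
// Hypothetical helper: shift every character code up by one, then percent-encode.
function encodeShift(source) {
  let shifted = '';
  for (let i = 0; i < source.length; i++) {
    shifted += String.fromCharCode(source.charCodeAt(i) + 1);
  }
  return encodeURIComponent(shifted);
}

// The reverse: percent-decode, then shift every character code down by one.
// This mirrors the loop in the page script, minus the final eval().
function decodeShift(encoded) {
  const str = decodeURIComponent(encoded);
  let res = '';
  for (let i = 0; i < str.length; i++) {
    res += String.fromCharCode(str.charCodeAt(i) - 1);
  }
  return res;
}

// A fragment of the encoded string from the example above.
const fragment = 'epdvnfou%2Fmpdbujpo%3E%23iuuq%3B00xxx';
console.log(decodeShift(fragment));
```

The decoded fragment is the start of an ordinary `document.location` assignment, so a bot only has to run the decoding loop to reach the redirect.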
In the third example the indexers had to process a part of the script placed in an iframe (and they did it correctly):
<iframe
  src="http://www.site.com/directory/f.html" width="100%"   height="100%" frameborder=0 hspace=0 vspace=0
  marginwidth=0 marginheight=0
  allowtransparency=true scrolling=no>
</iframe>
But there were exceptions. Below are two javascript examples that could be used to redirect client browsers and that search engines do not understand (i.e. seo safe).

On the first page the redirect was written so that it can be executed only by a client's browser or by a bot able to render HTML. The example code (slightly modified) is:
<table width="100%">
<tr>

<td id="first">aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>

<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>

<td>aassssssdddddffffgggghhhhjjjkklll</td>
</tr>
<tr>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>

<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td>aassssssdddddffffgggghhhhjjjkklll</td>

<td>aassssssdddddffffgggghhhhjjjkklll</td>
<td id="second">aassssssdddddffffgggghhhhjjjkklll</td>
</tr>
</table>
</div>

<script language="JavaScript">
  var D=document;
  // Sum the offsets of O and of its offsetParent chain to get its absolute position.
  function AbsPos(O, Parent){
    var X=0, Y=0, Next, D=document;
    Next=O; if (Parent==null) Parent=D;
    while (Next!=null && Next!==Parent){
      Y+=Next.offsetTop; X+=Next.offsetLeft; Next=Next.offsetParent;
    }
    return [X, Y];
  }
  var first = AbsPos(D.getElementById('first'));
  var second = AbsPos(D.getElementById('second'));
  // The two cells get different X offsets only once the table has been laid out.
  if (first[0] != second[0]) {
    document.location.href = "http:/"+"/www.site.com/directory/t.html";
  } else {
    document.write('whatever');
  }
</script>
The experiment has shown us that search engines' bots do not have rendering capability (which is understandable). This fact could be used by anyone who wants a redirect that is executed for live users but not noticed by (hidden from) search engines' bots.
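
Why the position check separates browsers from non-rendering bots can be illustrated without a browser. The sketch below is a simplified model, not a description of how any real bot works: the element objects are hand-made stand-ins with `offsetLeft`, `offsetTop` and `offsetParent` properties, the pixel values are invented, and it assumes that a client without a layout engine would report zero for every offset.

```javascript
// Same idea as AbsPos in the page script: sum offsets up the offsetParent chain.
function absPos(element) {
  let x = 0;
  let y = 0;
  for (let next = element; next != null; next = next.offsetParent) {
    x += next.offsetLeft;
    y += next.offsetTop;
  }
  return [x, y];
}

// The page redirects only when the two cells end up at different X positions.
function wouldRedirect(first, second) {
  return absPos(first)[0] !== absPos(second)[0];
}

// Invented layout values standing in for what a rendering browser would compute.
const table = { offsetLeft: 8, offsetTop: 8, offsetParent: null };
const renderedFirst = { offsetLeft: 0, offsetTop: 0, offsetParent: table };
const renderedSecond = { offsetLeft: 2430, offsetTop: 20, offsetParent: table };

// Assumed view of a client with no layout engine: every offset is zero.
const unrenderedFirst = { offsetLeft: 0, offsetTop: 0, offsetParent: null };
const unrenderedSecond = { offsetLeft: 0, offsetTop: 0, offsetParent: null };

console.log(wouldRedirect(renderedFirst, renderedSecond));     // true
console.log(wouldRedirect(unrenderedFirst, unrenderedSecond)); // false
```

Under these assumptions only the client that has actually laid out the table takes the redirect branch; the other falls through to `document.write`.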

In the second example the redirect is triggered by an "active window" event:
<script language="JavaScript">
  function f(){
    document.location.href = "http://www.site.com/directory/x.html";
  }
  window.onFocus = f();
</script>
Of course the bots didn't follow this redirect (and so didn't crawl or index the destination page), because, again, they don't have such capabilities.
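
One detail of the snippet above is worth spelling out: `window.onFocus = f();` calls `f` immediately and stores its return value, whereas registering a focus handler in a browser is written `window.onfocus = f;` (lowercase property name, no parentheses). The sketch below shows the difference using plain objects in place of `window`; `fakeWindow`, `otherWindow` and `visited` are names made up for this illustration.

```javascript
// Record "redirects" in an array instead of navigating anywhere.
const visited = [];
function f() {
  visited.push('http://www.site.com/directory/x.html');
}

// As written in the snippet: f runs right away and undefined is stored.
const fakeWindow = {};
fakeWindow.onFocus = f();
const callsAfterAssignment = visited.length;

// Handler form: the function itself is stored and nothing runs yet.
const otherWindow = {};
otherWindow.onfocus = f;
const callsAfterRegistration = visited.length;

// Simulate the browser firing the focus event later.
otherWindow.onfocus();

console.log(callsAfterAssignment, callsAfterRegistration, visited.length);
```

So whether a given client follows this redirect depends on which of the two forms is on the page and on whether the client ever fires a focus event.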

In the next special example:
<script language="JavaScript">
  function rnb() {
    http://www.site.com/directory/abc.html
  }
</script>
where the URL was simply inlined in javascript (without any redirect), we verified that the bots didn't follow the URL. This means that search engines' bots (Google and others) do indeed correctly "execute" javascript and see the result of its execution. But the subset of javascript they support is limited, e.g. they don't have rendering capability yet.

Our conclusions

Bots of the main search engines (Google in particular) do support some subset of javascript, i.e. in general they are able to distinguish between normal javascript (that is part of a dynamic html page) and sneaky redirects. But it is still possible to create a sneaky redirect that goes unnoticed by the search engines, e.g. you could exploit the difference between a real html browser and an se bot (the latter doesn't have rendering capability yet).

7 comments:

Matt Cutts said...

Interesting.

The Yacht Broker said...

Was rereading Xooglers and found
your blog, very interesting.

See you.

Anonymous said...

Мудак ты, блять.
Перед кем выслуживаешься? Расписываешь, переводишь... Полицай.

rathamahata said...

Dear anonymous, please write in English next time.

Anonymous said...

Google Translate deciphers the anonymous Russian comment as:

"You asshole, blyat.
Before vysluzhivaeshsya by whom? Signs, transfers ... Politsay."

Gregg said...

rathamahata, could you please provide a translation of the anonymous comment?

Zoey said...
This post has been removed by a blog administrator.