The last session I attended at the eMetrics Marketing Optimization Summit in Washington DC, before hitting the road and heading back north to Philly, was on the Acquisition Track and was entitled "Search from Now On". None other than my great friend Mike Grehan of Acronym Media was presenting. If you didn't know, Mike is writing his third book on search marketing, as well as a white paper about search engines and their new listening signals.
A Bit of Search History
Mike starts off by showing a slide with a quote from Vannevar Bush, then summarizes what was on the screen as "information can become lost and all over the place and it would be great to put it all together."
"As We May Think" is a piece in which Vannevar Bush asks whether, instead of making weapons of mass destruction, we couldn't create something great for mankind. Writing in 1945, Bush anticipated technologies such as the fax machine, the computer and the internet; his MEMEX is really the World Wide Web.
Bush argues that as humans we should turn our scientific efforts from increasing physical ability to making all previously collected human knowledge more accessible. Now take a look at Google today. Google's mission is to organize the world's information and make it universally accessible and useful.
In 1989 Sir Tim Berners-Lee actually created the World Wide Web (I originally wrote "internet" here by mistake; Mike said World Wide Web, not internet). "I just had to take the hypertext idea and connect it to the Transmission Control Protocol and Domain Name System ideas and ... tada ... the World Wide Web." In a space of 10 minutes he invented the World Wide Web.
Information Retrieval on the Web - Phase One
Data collection is carried out by a web crawler assigned to download web pages and parse their text into an index. With this method, though, it's difficult to tell which pages are the most authoritative documents in a corpus of millions purely by analyzing the similarity of text. Too many documents are relevant to the query, which creates what's known as the "abundance problem". Then there's the issue of it being way too easy to manipulate the results.
Take the following example: If a music student writes a paper on Beethoven's 5th Symphony, and a conductor does too, whose is more relevant? The conductor's, obviously, but by just looking at the text, there's no way for search engines to really tell.
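To make the Beethoven example concrete, here is a minimal sketch of text-only relevance scoring. It is only an illustration: the two "papers" are made-up strings, the function names are hypothetical, and plain word-count cosine similarity stands in for whatever the early engines actually used. The point is that both documents score the same, because nothing in the text says who wrote them.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def cosine_similarity(query, document):
    """Score a document against a query using raw word counts only."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    dot = sum(q[term] * d[term] for term in q)
    q_norm = math.sqrt(sum(count * count for count in q.values()))
    d_norm = math.sqrt(sum(count * count for count in d.values()))
    if q_norm == 0 or d_norm == 0:
        return 0.0
    return dot / (q_norm * d_norm)


# Hypothetical documents: the text alone carries no hint of authority.
student_paper = "beethoven fifth symphony analysis of the opening motif"
conductor_paper = "beethoven fifth symphony notes on the opening motif"

query = "beethoven fifth symphony"
student_score = cosine_similarity(query, student_paper)
conductor_score = cosine_similarity(query, conductor_paper)

print(f"student:   {student_score:.3f}")
print(f"conductor: {conductor_score:.3f}")
```

Both papers contain the query words the same number of times in documents of the same length, so a text-only ranker has no basis for putting the conductor first.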
Information Retrieval on the Web - Phase Two
In 1998 Jon Kleinberg did a search on "search engines" on AltaVista, and AltaVista didn't show up in the results. Then he went and looked at "Japanese automotive maker", and neither Toyota nor Honda appeared in those results. He wondered why, and found that in each case none of them had those words on their pages. This is when he realized how important the words were.
Network theory applied to link analysis provides a major new signal based on hubs and authorities. Google develops PageRank based primarily on citation analysis (a subset of network theory). Link anchor text provides context for latent semantic analysis. However, the ranking mechanism is biased towards web content creators and not end users. As with crawls that analyze content, it is also possible to artificially inflate link data, so it is easy to manipulate.
About 1993-1995 everything was based on links. Mike wasn't even looking at the words on the page, just the words in the link. It stopped being about quantity and became about quality. Link building is about getting great links from within your community. The strongest signals search engines looked at, up until recently, were based on text on a page and links pointing backwards and forwards.
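As a rough illustration of link analysis, the sketch below runs a simplified PageRank-style calculation on a made-up four-page web. It is not Google's actual implementation: the page names, the 0.85 damping factor and the iteration count are all assumptions chosen for the example. Each page splits its score evenly among the pages it links to, so a page with many inbound links ends up on top regardless of the words on it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated redistribution of scores.

    `links` maps each page to the list of pages it links to.
    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            # A page with no outbound links spreads its score over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to "popular_site".
toy_web = {
    "directory": ["popular_site", "blog"],
    "blog": ["popular_site"],
    "fan_page": ["popular_site"],
    "popular_site": ["directory"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:13s} {score:.3f}")
```

The same toy also hints at the manipulation problem: adding more throwaway pages that link to a target raises its score, which is exactly how link data gets inflated artificially.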
The Taxonomy of Search
Knowing the taxonomy of search is very important for doing keyword research and for understanding the intent behind the keywords being used.
Informational applies to the surfer who is really looking for factual information on the web.
Navigational is when a surfer really wants to reach a particular web site.
Transactional means that ultimately the surfer wants to do something on the web, through the web. Shopping is a good example. You really want to buy stuff.
Understanding the user's intent is what matters most when it comes down to it. For example, a bank looking at keywords thinks "Lend Money" is its most important keyword, since that's what it is in business to do. However, that's not how end users see it: they want to "Borrow Money".
The New Search Signals
Early information retrieval techniques were limited to two major signals: text and links. Both are very susceptible to the dubious intentions of content creators. As the web grows exponentially, more content is created than can be collected by a web crawler, and therefore too many relevant pages are outside the scope of the search engine crawlers.
Most searches are non-commercial; for example, a search on "History of Cookies" gives back no ads. But take a look at search since Google implemented Universal Search over a year and a half ago. There's "Vertical Creep", which drops the natural search results below the fold and makes being "in the top 10" not matter any more. "Ranking reports" don't really matter any more, especially if you are now below the fold!
Universal search has changed everything, and the "golden triangle of search" is changing. The minute images are put in the results, the eyes dart all over the place.
Social media is a new signal for search engines to learn relevance. Textbook SEO is eventually going to disappear, and crawling the web will become a backfill for the search engines. Text on an HTML page, linkage data and link anchor text, along with social media (tagging, bookmarking, rating, etc.), are now becoming the signals. However, a huge new signal is the use of the Google Toolbar, and now the recently introduced browser, Google Chrome.
Since the rise of social media, content creators, such as copywriters for corporate web sites and online publishers, are outweighed by at least a factor of five by user generated content such as blog posts, forum posts, rating reviews, etc. End users want a much richer experience at the search engine interface, more color more images, more choice.
Connecting end users with the content they're looking for may not be achievable the way Google and other search engines have attempted it, by creating the current signal database repository. Too much information is now beyond their reach. New relationships between content creators and search engines need to emerge to cater to the demand for the many different types of information that end users crave.
Is HTTP the right platform?
Mike wraps up the session by asking whether the current way we view all of this information is the "right way". He's said to me a few times in other conversations, "what we are doing now is like trying to shove a giant elephant through a tiny hole". Changes to the protocol, HTTP, are going to have to come, since it was built 20 years ago!
"As content becomes more diverse, more complex, bigger and more fragmented, getting it through HTTP and HTML may not be the right model anymore." - Andrew Tomkins, Vice President of Search Research at Yahoo!
Mike's new whitepaper is coming out soon, "New signals to Search Engines, Future Proofing Your Search Marketing Strategy".