
The End of Google: Distributed Search Engines

Amid rising concern about online privacy in light of the subpoenas issued to the major search engines, I take a look at alternatives that protect privacy in a world that runs roughshod over private citizens' right to anonymity.

How many of you can remember the times when the site to look for information on the net was NOT Google but some other obscure search engine? Infoseek, Webcrawler, Yahoo and many more competitors on the late 1990s net were searching for the holy grail of Internet search. And then one day someone asked me in a chat whether I had tried out Google yet. I hadn't, and when I went over to the google.com page I was won over instantly and have not used any other web search engine since.
The question buzzing in my head today was why. Why did I switch over to Google so instantly, like many other early adopters of the net?
First of all it was the interface. Back then, on slow internet connections, you had to click through graphics-heavy pages to get to the search box, only to be presented with an even uglier results page that loaded even more slowly because it was plastered with ads, banners and buttons. Navigating the results was shaky at best, and it took you longer to work out which result would work best for you than it took the server to actually find the results. Google was much better in all these respects: the same clean interface that you see today (it was even cleaner in the old days) and a results page that makes sense and lets you see and analyze your results fast.
As more and more information about Google surfaced, netizens agreed that it was an independent company that worked for the good of the net and respected the basic principles of privacy.
Times change, and today Google is one of the biggest technology companies, catering to its shareholders. Search results are infested with Google ads, and other things have cast a shady light on the netizens' darling. If you have a Google account and forget to log out, your searches are "personalized" and surely put into a database together with your name. Even if you do not have a Google account your searches end up in a database, very probably together with your IP address. That database is what the recent subpoena from the US government was aiming at. Google declined the request, unlike Microsoft and Yahoo, who fully or partially complied and handed over all the data big brother wanted, but it seems just a matter of time until some anti-terrorism law surfaces that requires Google to hand over the information. And looking at what Google is doing in China, complying with each and every rule against free speech, you would think that the once so activist-looking search giant is turning into an overpowered tentacle that abuses the trust placed in it.
In light of all this, a look at alternatives is in order. The problem, as with so many other problems facing the internet today, seems to be the central server philosophy. Having one central server (or a private cluster) just gives the people who would regulate the net to death an easy target, and once something becomes too big there are easy ways to curb its power: pull the plug or sue the hell out of the entity running the server. A look at the ever so popular Napster makes the point. Since we seem to be turning from a free, open internet into a privately controlled network with rules and without privacy, searching that old wild net starts to look like an offense against the government. I mean, who needs search if you get bombarded with tons of URLs in TV commercials, and if you look for something else on the net you must be an enemy of the state. Well, it's not that bad yet, but it is easy to see that we are heading in that direction.
Now the most interesting thing about the Napster phenomenon is that Napster was never extremely big compared with the millions of people using the net. Since the shutdown other technologies have taken over, and they seem to be much more efficient. I am talking, of course, about peer-to-peer networks. Distributing the workload and the traffic has made it possible for the "Napster" community of filesharers to grow ten- to a hundredfold, and what is shared is no longer only music but also movies, code and just about anything that is digital. The peer-to-peer concept works so well that the government wants to outlaw it altogether, which, despite some claims, is not possible: the peer-to-peer networks have shown a lot of resistance to such attempts, and the more they are threatened the more they do to protect themselves. The once so common tracker sites that ran on just one server have been made distributed as well, for example. The data is split up into tiny pieces, and controlling such a network seems to be extremely difficult.
Now I woke up this morning with the thought that someone must have started to build a distributed search engine. It's such a simple idea, and indeed I was greeted with some search results from Google itself.
Examining what turned up, I found that the one most often in the news was GRUB from LookSmart, Inc., a company that was once a popular search engine itself before the rise of Google. This distributed search engine project no longer seems to be operational.
Then there are some distributed search engines coming from the open source community. GPU is Windows-only and was originally just another peer-to-peer client built on the popular Gnutella P2P network; it has a plugin architecture, with one plugin supporting distributed search. Being Windows-only will not make it very successful with the geek community, who would be the early adopters such a project needs. Another seemingly successful one is Majestic 12, which even sports a user ranking, so the community looks active and alive. Yet there is no Mac version for me to try out; only Windows and Linux are supported at the moment.
The last one I want to mention is YaCy. It comes as a Java binary and runs flawlessly on OS X. Unfortunately there is no graphical user interface for starting it, only a shell command to launch the crawler. Yet once you have found the right command (a shortcut, "startYACY.command", is in the directory) everything runs smoothly, and after the startup phase you are presented with an interface in your browser for further configuring the distributed crawler. And so I am crawling away at the moment. The search (over your own crawl together with the accumulated crawls of the other users) is on the same local page that opened in your browser, so making a bookmark is in order.
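To make the idea of searching "your own crawl together with the crawls of the other users" a bit more concrete, here is a toy sketch in Python. It is not how YaCy works internally (I have not looked at its code); the `Peer` class, the `distributed_search` function and the example.org URLs are all made up for illustration. Each peer keeps a small word index of only the pages it crawled itself, and a query is simply sent to every peer and the answers are merged.

```python
from collections import defaultdict


class Peer:
    """A hypothetical peer that indexes only the pages it crawled itself."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # word -> set of URLs containing it

    def crawl(self, url, text):
        # "Crawling" here just means indexing text we were handed.
        for word in text.lower().split():
            self.index[word].add(url)

    def search(self, query):
        # Return the URLs that contain every word of the query.
        words = query.lower().split()
        if not words:
            return set()
        hits = set(self.index.get(words[0], set()))
        for word in words[1:]:
            hits &= self.index.get(word, set())
        return hits


def distributed_search(peers, query):
    """Ask every peer and merge the answers; no central server involved."""
    results = set()
    for peer in peers:
        results |= peer.search(query)
    return sorted(results)


# Two made-up peers with made-up pages (assumptions for the example).
alice = Peer("alice")
bob = Peer("bob")
alice.crawl("http://example.org/p2p", "peer to peer search engines")
bob.crawl("http://example.org/privacy", "search engines and privacy")
bob.crawl("http://example.org/napster", "napster and peer to peer music")

print(distributed_search([alice, bob], "search engines"))
print(distributed_search([alice, bob], "peer to peer"))
```

A real network would of course not ask every peer for every query; it would need some way to decide which peers hold which words, and it would talk over the network instead of calling a function. The point of the sketch is only that no single machine has to hold the whole index.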
Now I do actually like the concept, and the results, so far fewer than what you get with Google, are very accurate and usable. Mass adoption will not come until nice, shiny, likable interfaces are put on top, but it's a good step in the right direction.
Not everything is great yet, though, and again a look at current peer-to-peer networks shows us what the pitfalls might be. Even if you cancel out the central server mentality of the Google, Yahoo and Microsoft giants, there is still a weak point between you and the government that would so badly like to look at what you look at. This weak point is, of course, the ISP you connect through, and the big ISPs do not seem to hesitate much when it comes to recording your every move, and they hand this information over at even the smallest threat from big brother. The big rage right now is to encrypt each and every communication over peer-to-peer with a good enough key. There is still no workable solution for P2P encryption out there, but there are open source groups working on it, and in light of the current search engine fiasco you can bet that it will be introduced to the distributed search engines as well, once they become widespread, that is.
