Maria Hirlekar: Search Engines

How many times have you been left with only vague information on a subject? How many times have you benefited from the Internet bringing you up to date on a subject? And how many times have you ever thought of thanking search engines for putting help at your fingertips?

Search engines have become very much a part of our lives today. From safety pins to aircraft, you get every detail you need. But just think about it: there are billions of web pages available today, you just type your keyword, and voila, there you are with relevant information almost 99% of the time. But how does this happen in a fraction of a second? How is the search engine able to scan all those pages on the Internet and give you the relevant details? How is it able to discriminate?

The basic idea is that the search engine does not scan the whole web at the moment you type your keyword. It finds the pages and the words on them ahead of time, indexes those words, and then presents the matching pages for the user to go through.

Today the process is much simpler, but in days gone by it was not so easy. Internet programs like Gopher were used to retrieve files stored on servers. At that point in time, this was the most that users could get from a search service.

As time went by, technology took a massive leap, and today search engines hand you everything you may want on a silver platter. The working strategy is simple. What do you do at home or at the office when someone asks you for an object, let us say a pen that they have misplaced? The answer is simple: you, as the search engine, first have to search for the pen, and in the process you may find many pens. Next, you put all these pens in front of your friend so he can choose the one he wants. The same logic applies here.

Technically, to understand how search engines work, we use a term called web crawling. Here pieces of code called spiders scan through the web for words and build a list of them. As a first step, the spiders target heavily used servers or very popular pages, and from there they follow the links they find to other pages. Everyone on the Internet today knows Yahoo or MSN, so these are good examples of very popular pages.
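To make the crawling idea concrete, here is a minimal sketch in Python. It assumes a spider starts from one popular page and follows links outward. The FAKE_WEB dictionary, its example URLs and the crawl function are all hypothetical stand-ins. A real spider would fetch pages over the network instead of reading them from memory.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real spider would download each page over HTTP instead.
FAKE_WEB = {
    "http://popular.example/": (
        "search engines index pages",
        ["http://a.example/", "http://b.example/"],
    ),
    "http://a.example/": ("spiders follow links", ["http://b.example/"]),
    "http://b.example/": (
        "an index maps words to pages",
        ["http://popular.example/"],
    ),
}


def crawl(seed_url, web, max_pages=100):
    """Breadth-first crawl from a seed page; returns {url: text} for visited pages."""
    visited = {}
    queue = deque([seed_url])
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        # Skip pages already seen and links that lead nowhere.
        if url in visited or url not in web:
            continue
        text, links = web[url]
        visited[url] = text
        for link in links:
            if link not in visited:
                queue.append(link)
    return visited


pages = crawl("http://popular.example/", FAKE_WEB)
```

Starting from the popular page, the spider reaches the other two pages through links alone, and the visited check keeps it from going around in circles.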

Indexing of the words found on these pages then begins, and a keyword search is made available. The traditional approach would be to rely on the ISP's DNS server to translate the server name in each URL into an address, but if the search engine runs its own DNS, the time taken for web crawling is drastically reduced, with web pages right at its service. Information retrieval then becomes much simpler. The keywords found within these pages are stored along with their position in the content, so they can later be highlighted, making things convenient for the user.

When Google searches, it points out that English articles such as "a", "an" and "the" have been left out to optimize the search. However, not all search engines work that way. Some index only the first few lines or paragraphs of a page, and some check only the headings. Lycos follows this approach, while AltaVista indexes everything, from articles to every keyword.
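Leaving out articles can be sketched in a few lines of Python. The STOP_WORDS set and the index_terms function below are illustrative assumptions, not the list or method any particular search engine uses.

```python
# Illustrative list of articles to skip; real engines keep their own lists.
STOP_WORDS = {"a", "an", "the"}


def index_terms(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it on whitespace, and drop the stop words."""
    return [word for word in text.lower().split() if word not in stop_words]


terms = index_terms("The history of an engine")
# terms is ["history", "of", "engine"]
```

An engine that indexes everything, as described for AltaVista, would correspond to calling index_terms with an empty set of stop words.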

Meta tags are another useful concept that helps search engines. They allow the owner of a page to specify the keywords and context under which the page should be indexed. This proves highly helpful, since many words have more than one meaning. But, as they say, every coin has two sides, and there are always unscrupulous people prowling around. You must be wondering what this is all about. Well, to make my website the most wanted one, I could fill its meta tags with popular words that have no relevance to my website. The idea is to attract the maximum number of visitors and get my message across to them. This is totally unethical, but that is how life is, with advantages and disadvantages.

And as another saying goes, ‘Necessity is the mother of invention’. Since we know there are vicious elements around, we ask our web crawling process to correlate the page content with the meta tags, thus filtering out the unwanted and protecting the usability of our search engine.
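One simple way to picture this correlation check is to keep only the meta keywords that actually appear in the page text. The sketch below makes that assumption. The function names and the 0.5 threshold are hypothetical, and real search engines use far more elaborate checks.

```python
def relevant_keywords(meta_keywords, page_text):
    """Keep only the meta keywords that actually appear in the page text."""
    words = set(page_text.lower().split())
    return [keyword for keyword in meta_keywords if keyword.lower() in words]


def looks_like_keyword_stuffing(meta_keywords, page_text, min_ratio=0.5):
    """Flag a page when fewer than min_ratio of its meta keywords appear in its text."""
    if not meta_keywords:
        return False
    kept = relevant_keywords(meta_keywords, page_text)
    return len(kept) / len(meta_keywords) < min_ratio


text = "We sell pens and pencils"
honest = looks_like_keyword_stuffing(["pens", "pencils"], text)
stuffed = looks_like_keyword_stuffing(["pens", "cars", "lottery"], text)
```

Here honest comes out False because both keywords occur on the page, while stuffed comes out True because only one of the three keywords does.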

Once web crawling has started the task of gathering information, the next step is to index it and present it to the user. Since the whole idea of building a search engine is to give the user the maximum benefit, we need to stock the list with relevant information and not just store URLs as output. Today, many search engines highlight the content where our key phrase has been used, and some even keep a count of the number of times the keyword appears.
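Storing more than bare URLs can be sketched as an index that records, for each word, the pages it appears on and the positions where it appears. The build_index function and the two sample pages are assumptions made for illustration.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to {url: [positions]} so that counts and locations are kept."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(url, []).append(position)
    return index


sample_pages = {
    "page1": "pen and paper and pen",
    "page2": "a blue pen",
}
index = build_index(sample_pages)
# index["pen"] is {"page1": [0, 4], "page2": [2]}
```

The length of each position list gives the keyword count for that page, and the positions themselves tell the engine which words to highlight.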

Another factor to note here is that we also need to save storage space, and this can be achieved by encoding the stored data compactly, including details such as font size.

Indexing more often than not follows the principle of hashing, wherein every keyword is assigned a numerical key value. Hashing spreads these key values evenly, even though some letters of the alphabet begin far more words than others, and this helps to reduce the lookup time. The pointer in the hash table points to the actual data or output, thus increasing the usability.
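A toy hash table shows the principle. The word_hash function below is one common textbook choice (a polynomial hash with multiplier 31), used here as an assumption. It is not a claim about what any real search engine uses, and the HashIndex class is hypothetical.

```python
def word_hash(word, num_buckets):
    """Turn a word into a numerical key (a bucket number) with a polynomial hash."""
    value = 0
    for ch in word:
        value = (value * 31 + ord(ch)) % num_buckets
    return value


class HashIndex:
    """Tiny hash table: each bucket holds (word, data) pairs that share a key."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def put(self, word, data):
        bucket = self.buckets[word_hash(word, len(self.buckets))]
        for i, (key, _) in enumerate(bucket):
            if key == word:
                bucket[i] = (word, data)  # replace the existing entry
                return
        bucket.append((word, data))

    def get(self, word):
        bucket = self.buckets[word_hash(word, len(self.buckets))]
        for key, data in bucket:
            if key == word:
                return data
        return None


table = HashIndex()
table.put("pen", ["page1", "page2"])
table.put("paper", ["page1"])
```

Looking up a word only inspects the one bucket its key points to, which is why the lookup stays fast even as the number of words grows.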

Once the index has been built, the search itself is largely Boolean in nature, with most choices made using the operators AND, OR and NOT. Thus search engines are all set to make life easy for us, and to update and enlighten us.
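Boolean search maps neatly onto set operations. The sketch below assumes a very small index that maps each word to the set of pages containing it. The index contents and the three function names are made up for illustration.

```python
# Hypothetical index: word -> set of pages that contain the word.
word_index = {
    "pen": {"page1", "page2", "page3"},
    "blue": {"page2", "page3"},
    "paper": {"page1"},
}


def search_and(index, a, b):
    """Pages that contain both words."""
    return index.get(a, set()) & index.get(b, set())


def search_or(index, a, b):
    """Pages that contain at least one of the words."""
    return index.get(a, set()) | index.get(b, set())


def search_not(index, a, b):
    """Pages that contain the first word but not the second."""
    return index.get(a, set()) - index.get(b, set())


both = search_and(word_index, "pen", "blue")      # {"page2", "page3"}
either = search_or(word_index, "blue", "paper")   # {"page1", "page2", "page3"}
without = search_not(word_index, "pen", "blue")   # {"page1"}
```

A word that is missing from the index is treated as matching no pages, so an AND with it returns an empty result.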

Maria Hirlekar

Sunday, April 29, 2007