Hybrid Approach for Optimizing Search Engine Results

With the tremendous growth of the Internet in recent years, a huge amount of data has been added to the World Wide Web. Search engines must perform the complex task of sorting billions of pages and displaying only the most relevant ones for a submitted query. This volume of data makes it difficult to manage and present results from the end user's perspective, and it becomes a bottleneck for SEO engineers and webmasters. Promoting a website in search engine results has become an essential part of website development, so webmasters and search engine optimization engineers must actively learn the techniques and algorithms that drive visitors to their sites. For this purpose, the ordering of web pages in the result list becomes important.

The most relevant page should be placed at the top of the list and the least relevant at the bottom, according to the user's query. This requires ranking web pages so that they can be arranged dynamically according to user demand. Page ranking assigns a value (rank) to a web page among pages of a similar type to decide its importance. In this paper we present some algorithms used in page ranking along with their comparison. The proposed work aims to optimize the results of a search engine by displaying the most user-relevant pages at the top of the search result list. To this end we propose a hybrid of query recommendation, document clustering, and a genetic algorithm. The approach starts by pre-mining the query logs to find potential clusters of queries, and from these clusters we obtain the most popular queries.

The entries of every cluster are then mined again to obtain sequential patterns of the pages accessed by users. The outputs of both mining processes are combined to return relevant pages to users together with recommendations of popular historical queries. Document clustering and a genetic algorithm are then applied to the resulting output. Document clustering groups all similar pages together in one cluster (partition). The genetic algorithm is then applied to optimize the result and select the best pages, those with the highest score based on other features such as the number of keywords. Finally, a list of web pages is chosen from different regions of information, which are the result of the genetic algorithm. This gives an optimized list of web pages for the user's query in a short time.

PROPOSED WORK

The proposed model is a hybrid of query recommendation, document clustering, and a genetic algorithm. It includes a query recommendation system from prior work that learns from historical query logs. The system captures the user's information needs more precisely by clustering queries according to the similarity between two queries, which is based on the query keywords and the clicked URLs. The Generalized Sequential Patterns (GSP) algorithm is then used to generate the frequent sequential patterns of web pages visited by users in each cluster. The previously assigned rank scores of the web pages are then modified using the discovered sequential patterns to re-rank the search result list. This rank update improves the relevancy of web pages based on their access history.
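The keyword-and-click similarity between two queries can be illustrated with a minimal sketch. The source does not give the exact formula, so the Jaccard overlap, the weighting parameter `alpha`, and the record layout (`keywords`, `clicked_urls`) are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def query_similarity(q1, q2, alpha=0.5):
    """Weighted mix of keyword overlap and clicked-URL overlap.

    alpha is a hypothetical weight: 1.0 uses keywords only,
    0.0 uses clicked URLs only.
    """
    kw = jaccard(q1["keywords"], q2["keywords"])
    url = jaccard(q1["clicked_urls"], q2["clicked_urls"])
    return alpha * kw + (1 - alpha) * url


# Hypothetical log records for two queries.
q1 = {"keywords": ["cheap", "flights", "paris"],
      "clicked_urls": ["a.example/flights", "b.example/paris"]}
q2 = {"keywords": ["paris", "flights", "deals"],
      "clicked_urls": ["a.example/flights"]}

similarity = query_similarity(q1, q2)
```

Two queries that share few keywords can still score as similar if users clicked the same pages for both, which is the reason for mixing the two signals.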

Figure: Proposed hybrid model

After query clustering, the frequent sequential patterns of web pages visited by users in each cluster are generated with the Generalized Sequential Patterns algorithm. The final step is to re-rank the search result list by modifying the previously assigned rank scores of the web pages using the discovered sequential patterns. The rank update improves the relevancy of the web pages based on their access history. This method reduces the time a user spends looking for the required information in the search result list and surfaces more relevant web pages.
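The rank update can be sketched as follows. This is not a full GSP implementation: it only counts ordered page pairs per session, a simplified stand-in for the length-2 level of sequential pattern mining. The boost formula, `min_support`, and `weight` are assumptions, since the source does not specify how scores are modified.

```python
from collections import Counter


def frequent_pairs(sessions, min_support=2):
    """Count ordered page pairs (a visited before b), once per session."""
    counts = Counter()
    for session in sessions:
        seen = set()
        for i in range(len(session)):
            for j in range(i + 1, len(session)):
                seen.add((session[i], session[j]))
        counts.update(seen)
    return {pair: c for pair, c in counts.items() if c >= min_support}


def rerank(base_scores, patterns, n_sessions, weight=1.0):
    """Boost each page by the relative support of the patterns it appears in."""
    boost = Counter()
    for pair, support in patterns.items():
        for page in set(pair):
            boost[page] += support / n_sessions
    updated = {p: s + weight * boost[p] for p, s in base_scores.items()}
    # Highest updated score first; ties broken by page name.
    return sorted(updated, key=lambda p: (-updated[p], p))


# Hypothetical click sessions from one query cluster.
sessions = [["A", "B", "C"], ["A", "C"], ["B", "C"], ["D"]]
base_scores = {"A": 0.2, "B": 0.3, "C": 0.1, "D": 0.9}

patterns = frequent_pairs(sessions)
ranking = rerank(base_scores, patterns, n_sessions=len(sessions))
```

In this toy data, page C starts with the lowest score but appears in both frequent patterns, so the access history lifts it to the top of the list.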

When a user enters a query in the search engine interface, the query processor matches the query terms against the search engine's index repository and produces a list of matched documents. The result optimization system gathers user intentions from the query logs in reverse order. The query similarity module continuously analyses user browsing behavior, and the submitted queries and clicked URLs are stored in the logs. Its output is forwarded to the Query Clustering Tool, which creates potential groups of queries based on their similarities. The Pattern Generator module then discovers the sequential patterns of web pages in every cluster.
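The grouping performed by the Query Clustering Tool could look roughly like the sketch below. It uses a simple single-pass "leader" clustering, which is an assumption: the source does not name the clustering algorithm. The log row format, the equal weighting of keywords and clicks, and the `threshold` value are also hypothetical.

```python
from collections import defaultdict


def build_profiles(log_rows):
    """Turn (query_text, clicked_url) log rows into per-query profiles."""
    clicks = defaultdict(set)
    for query, url in log_rows:
        clicks[query].add(url)
    # Profile = (set of keywords, set of clicked URLs).
    return {q: (set(q.lower().split()), urls) for q, urls in clicks.items()}


def overlap(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_queries(profiles, threshold=0.4):
    """Assign each query to the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of queries; index 0 is the leader
    for query in sorted(profiles):
        keywords, urls = profiles[query]
        for cluster in clusters:
            leader_kw, leader_urls = profiles[cluster[0]]
            sim = 0.5 * overlap(keywords, leader_kw) + 0.5 * overlap(urls, leader_urls)
            if sim >= threshold:
                cluster.append(query)
                break
        else:
            clusters.append([query])
    return clusters


# Hypothetical query log rows.
rows = [
    ("python tutorial", "docs.example/py"),
    ("learn python", "docs.example/py"),
    ("learn python", "course.example/py"),
    ("cheap flights", "air.example/deals"),
]
clusters = cluster_queries(build_profiles(rows))
```

Leader clustering needs only one pass over the queries, which suits logs that grow continuously, but the result depends on processing order. Sorting the queries first keeps the output deterministic.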

The matched documents retrieved by the query processor are input to the Pattern Generator module. Sequential patterns improve the rank of pages that match the search context and the user's preferences. The improved ranked list is fed to the Intelligent Search Engine described in paper [5]. Its first step is page vectorization, in which the list from the sequential patterns is used to create a vector of characteristics for each page. In the second step, page clustering, the vectorized pages are grouped into clusters of similar pages. In the third step, optimization, a genetic algorithm is applied to the structure identified by the clustering and to the page scores, selecting the best set of pages from each cluster to obtain the most relevant result for the user's query.
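The third step can be illustrated with a small genetic algorithm that picks one page per cluster. This is a sketch under stated assumptions, not the method of paper [5]: the chromosome encoding (one page index per cluster), the fitness (sum of page scores), and all parameter values are hypothetical, and the clusters and scores are taken as already computed.

```python
import random


def fitness(choice, clusters):
    """Sum of the scores of the page chosen in each cluster."""
    return sum(clusters[i][idx][1] for i, idx in enumerate(choice))


def evolve(clusters, pop_size=20, generations=40, mutation_rate=0.2, seed=0):
    """Return one page URL per cluster, chosen by a simple elitist GA."""
    rng = random.Random(seed)
    population = [[rng.randrange(len(c)) for c in clusters]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        population.sort(key=lambda ch: fitness(ch, clusters), reverse=True)
        survivors = population[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = rng.sample(survivors, 2)
            # Uniform crossover: each gene comes from either parent.
            child = [rng.choice(genes) for genes in zip(mum, dad)]
            # Mutation: occasionally pick a random page in that cluster.
            for i in range(len(child)):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(len(clusters[i]))
            children.append(child)
        population = survivors + children
    best = max(population, key=lambda ch: fitness(ch, clusters))
    return [clusters[i][idx][0] for i, idx in enumerate(best)]


# Hypothetical clusters of (url, score); here the score is a keyword count.
clusters = [
    [("a.example/1", 3), ("a.example/2", 7), ("a.example/3", 5)],
    [("b.example/1", 4), ("b.example/2", 2)],
    [("c.example/1", 1), ("c.example/2", 6), ("c.example/3", 6)],
]
selected = evolve(clusters)
```

With a fitness this simple, taking the maximum of each cluster would give the same answer directly. A genetic search becomes useful once the fitness couples the choices across clusters, for example by penalizing near-duplicate pages.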

The hybrid approach proposed in this paper, combining query recommendation, document clustering, and a genetic algorithm, can help a search engine optimize the results it displays. It shows the most relevant web pages together with recommendations for the user's query, so the user does not have to search through the whole list of displayed pages. The time the user spends retrieving the needed information is reduced because the most relevant and useful information appears at the top, as the user requires.

In future work, query clustering and page clustering will be combined into a single phase, so that the time spent on the two clustering steps is minimized and the most relevant results can be provided in the least time.
