eSearch Engine

The need for search: reduced costs and increased revenues. eSearch is the solution for finding information: a high-powered, web-based search engine written in Java. If a group of everyday computer users wrote their Top 10 lists of "What I Hate About Computers", "Not finding the information I'm looking for" would probably be near the top of most lists.

eSearch 1.1 brings eSearch considerably more up to date with Expresso. It can index documents and store all the normalized documents in the back-end database through Expresso. Work has also been done to make the WeightingManager bug-free and to calculate the relevance ranking correctly. We look forward to having a simple interface so users can see how to incorporate search capabilities into Expresso projects.

Introduction

One of the most important applications of the Internet is search, which lets us find one document among the millions that reside on the Internet. The Internet probably would not be as important as it is today without tools such as Google, Yahoo, Altavista, Excite and many others that let us search it. These tools keep references to many Internet documents in their databases, in some cases holding a copy of the document itself, and when you ask (query) for information they present it to you in a friendly way. The field of computer science that covers this area of study is called Information Retrieval.

Search and information retrieval remain a core component of any enterprise IT strategy and budget. Despite a worldwide economic slowdown, the content management and search and retrieval market is forecast by IDC to grow to over $8.3 billion in 2006.(1) In its report “Worldwide Search and Retrieval Technologies Forecast, 2002-2006,” IDC opines, “[The search and retrieval market] will continue to grow at a faster rate than software as a whole due to the critical role that it plays in linking people with information, particularly within the enterprise.”(2)

(1) IDC, March 2002, Worldwide Content Management and Retrieval Software Forecast, 2002-2006; (2) IDC, June 2002.

Why do you need eSearch?

Usage statistics bear out the growing importance of search and retrieval. Knowledge workers spend 35 percent of their productive time searching for information online, while 40 percent of corporate users report they cannot find the information they need to do their jobs on their intranets.(3) The drag on enterprise efficiency and profitability is clear, with the volume of data inside the average corporation doubling every six to eight months.

Furthermore, IDC estimates that an enterprise employing 1,000 knowledge workers annually wastes the equivalent of $2.5 to $3.5 million in productive time as they search for nonexistent information, fail to find existing information, or recreate information that can’t be found. The opportunity cost to the enterprise is even greater, with the loss of potential additional revenue exceeding $15 million annually.(4)

Beyond the enterprise, the financial impact of poor information accessibility is more severe. Seventy percent of all online purchases start with search,(5) and 80 percent of online users will abandon a Web site if the search function doesn’t work well.(6) Business-to-consumer (B2C) and business-to-business (B2B) eBusiness sites are therefore extremely dependent on the quality of search results; prospective buyers who are presented with incorrect or incomplete information will quickly click away to a better-equipped competitor.

In summary, abundant empirical and anecdotal evidence exists to support the simple notion that fast access to superior information is a fundamental requirement for business success. Therefore advanced information retrieval solutions are critical to companies of all kinds: they boost eBusiness revenues, enhance user productivity, and increase operational efficiencies by dramatically improving the freshness, accuracy, and completeness of the information received.

(3) Working Council of CIOs, Business Wire, February 27, 2001; (4) Source: IDC Research “The Cost of Inefficient Search.”; (5) Source: Forrester Research; (6) Source: Jupiter Research.

eSearch Engine Architecture

Feature Summary

For businesses and organizations that depend upon rapid and reliable access to information, there is a better way - information retrieval with a search engine. It was no surprise, then, to learn from the developer community that a full-featured search engine was something that a lot of developers were interested in.

eSearch

eSearch Subproject: Tarantula

eSearch Features

While a basic search engine is available now, a group of community members is currently defining specifications for evolving the project into a full search engine - some of the ideas are very interesting, involving artificial intelligence capabilities for "smart" searching of both intranet and internet sites.

Indexer

Creates an index for later use by the search servlet.

  1. The index can be stored in any JDBC data store, and can use cached row sets to hold all or part of the index in memory (subject to size limitations).
  2. Can "crawl" webs of information by traversing the tree of links from an initial URL.
  3. Can read any type of URL: URL types include HTML, XML, Word documents, PDF files, etc.
  4. Each type of URL has a dynamically loadable object that knows how to extract attributes and associated values for that type of URL. In this way, new types of URL can be supported by adding new objects. For example, an object could be created that knows how to scan GIF images for certain attributes (size, number of colors, etc.).
  5. Attributes are items which identify a particular URL. For example, an HTML document may have a "Title" attribute, which is the string that appears between the <title> and </title> tags in the HTML file. Text files can have a "Content Keyword" attribute, i.e. key words/terms that appear in the body of the document.
  6. Attributes can have either one value per document or many (e.g. only one title, but many content keywords).
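The idea of one dynamically loadable object per URL type can be illustrated with a small sketch. This is not eSearch's actual API: the names AttributeExtractor, HtmlTitleExtractor and ExtractorRegistry are hypothetical, and keying the registry by a MIME-style content type is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical contract: one implementation per type of URL.
interface AttributeExtractor {
    // Maps attribute name to its values; single-valued attributes hold one entry.
    Map<String, List<String>> extract(String content);
}

// Hypothetical extractor for HTML that pulls the single-valued "Title" attribute.
class HtmlTitleExtractor implements AttributeExtractor {
    private static final Pattern TITLE = Pattern.compile(
        "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    @Override
    public Map<String, List<String>> extract(String content) {
        Map<String, List<String>> attributes = new HashMap<>();
        Matcher m = TITLE.matcher(content);
        if (m.find()) {
            List<String> values = new ArrayList<>();
            values.add(m.group(1).trim());
            attributes.put("Title", values);
        }
        return attributes;
    }
}

// Hypothetical registry: maps a content type to an extractor class name and
// instantiates it reflectively, so a new type needs no change to the indexer.
class ExtractorRegistry {
    private final Map<String, String> classNames = new HashMap<>();

    void register(String contentType, String className) {
        classNames.put(contentType, className);
    }

    AttributeExtractor forType(String contentType) throws ReflectiveOperationException {
        String className = classNames.get(contentType);
        if (className == null) {
            throw new ClassNotFoundException("No extractor registered for " + contentType);
        }
        return (AttributeExtractor) Class.forName(className)
            .getDeclaredConstructor()
            .newInstance();
    }
}

class ExtractorDemo {
    public static void main(String[] args) throws Exception {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("text/html", HtmlTitleExtractor.class.getName());
        AttributeExtractor extractor = registry.forType("text/html");
        System.out.println(extractor.extract("<html><title>eSearch Engine</title></html>"));
    }
}
```

Under these assumptions, supporting a new type such as GIF images would mean writing another AttributeExtractor implementation and registering its class name, without touching the indexing code.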

User Interface

One of the basic assumptions at this point is that the user interface will be servlet (or JSP) based, and customizable for different purposes.

Search Servlet

  1. Allows users to place a query and returns the resulting URLs.
  2. Tracks which users place which queries (history).
  3. Tracks which one of the proposed results is selected by the user (weighting).
  4. The servlet is isolated from the search logic, so the method used to produce the result list is not coupled to the user interface.
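A minimal sketch of how the front end might stay isolated from the search logic while still recording query history. The names SearchBackend and SearchFrontEnd are hypothetical and do not come from eSearch. The servlet API is deliberately left out so the example stays self-contained; a real servlet or JSP would delegate to something like handleQuery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical contract for the search logic; the front end sees only this.
interface SearchBackend {
    List<String> search(String query);
}

// Hypothetical front-end helper that records history and delegates the search.
class SearchFrontEnd {
    private final SearchBackend backend;
    // user -> queries placed by that user, in order (the "history")
    private final Map<String, List<String>> historyByUser = new HashMap<>();

    SearchFrontEnd(SearchBackend backend) {
        this.backend = backend;
    }

    List<String> handleQuery(String user, String query) {
        historyByUser.computeIfAbsent(user, u -> new ArrayList<>()).add(query);
        return backend.search(query);
    }

    List<String> historyFor(String user) {
        return Collections.unmodifiableList(
            historyByUser.getOrDefault(user, Collections.emptyList()));
    }
}

class FrontEndDemo {
    public static void main(String[] args) {
        // Stub backend for illustration; any implementation can be swapped in.
        SearchBackend stub = q -> Arrays.asList("http://example.com/" + q);
        SearchFrontEnd frontEnd = new SearchFrontEnd(stub);
        System.out.println(frontEnd.handleQuery("alice", "expresso"));
        System.out.println(frontEnd.historyFor("alice"));
    }
}
```

Because the front end depends only on the interface, the way results are produced can change without any change to the user interface code.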

Search Engine

The "engine" part of the search engine is decoupled from the user interface (i.e. it is non-visual), and can be accessed by any front end (e.g. another application, servlets, applets, etc.).

  1. Use the index prepared by the Indexer to fulfill query requests
  2. Provide the option of tracking who is making a query, for the purpose of "learning" about user preferences and interests so that search results can be customized/personalized to that user.
  3. Track which of the results the user chooses to view, and remember this for future searches (i.e. which result "won" for the given query).
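The three points above can be sketched as a small in-memory engine. This is an illustration under stated assumptions, not eSearch's implementation: the class ClickWeightedEngine is hypothetical, the in-memory map stands in for the JDBC-backed index built by the Indexer, queries are treated as a single term, and "remembering the winner" is reduced to counting selections per query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical engine: answers single-term queries from an index and
// promotes results that users selected for the same query before.
class ClickWeightedEngine {
    // term -> URLs containing it (stand-in for the index built by the Indexer)
    private final Map<String, Set<String>> index = new HashMap<>();
    // query -> (URL -> number of times a user chose that URL for the query)
    private final Map<String, Map<String, Integer>> selections = new HashMap<>();

    void addToIndex(String term, String url) {
        index.computeIfAbsent(normalize(term), t -> new LinkedHashSet<>()).add(url);
    }

    void recordSelection(String query, String url) {
        selections.computeIfAbsent(normalize(query), q -> new HashMap<>())
                  .merge(url, 1, Integer::sum);
    }

    List<String> search(String query) {
        String key = normalize(query);
        List<String> results = new ArrayList<>(index.getOrDefault(key, new LinkedHashSet<>()));
        Map<String, Integer> wins = selections.getOrDefault(key, new HashMap<>());
        // List.sort is stable: more frequently chosen URLs come first,
        // and ties keep the order in which URLs were indexed.
        results.sort(Comparator.comparingInt((String url) -> -wins.getOrDefault(url, 0)));
        return results;
    }

    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }
}

class EngineDemo {
    public static void main(String[] args) {
        ClickWeightedEngine engine = new ClickWeightedEngine();
        engine.addToIndex("expresso", "http://a.example/");
        engine.addToIndex("expresso", "http://b.example/");
        engine.recordSelection("expresso", "http://b.example/");
        System.out.println(engine.search("expresso"));
    }
}
```

Per-user personalization (point 2) could follow the same pattern by keying the selection counts on the user as well as the query; that is left out here to keep the sketch short.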

Administrative Interface

Allow the setup and configuration of the search engine to be modified. Allow tracking/history information to be viewed.

Also, are there any other "wish list" items for features? Please let us know.



Copyright © 2000-2003 Jcorporate Ltd. All rights reserved.

Last Modified: 10-Nov-2003