Let's search for treasure!

Search and retrieval software in portals is becoming more sophisticated, and more modular.

By Paul Montgomery

The search function of a major Internet site is a given these days. It is tucked away unobtrusively somewhere on the front page, a blank space waiting for an input. Behind that perfectly simple interface, however, is a Pandora's boxful of problems, pitfalls and potential errors that would force searchers to wade through pages and pages of irrelevant links to find the content they want. After all, even with the advent of Google, how many times do you find what you actually want on the first page of your query results?

In a corporate environment, as part of an enterprise portal, the little box looks much the same, but the complexities of setting up a useful database of what your organisation knows are far greater. Instead of having sites come to you for indexing, or being able to send spiders out over linked networks to index files in standard format, the corporate information manager has to deal with marooned islands of data in legacy file formats with outdated contents and non-existent metadata stores.

Ian Davies, managing director of Australian search and retrieval software developer Odyssey Development, said that public search engines like Google can afford to miss documents, whereas if portal search engines missed out on including any documents in their databases, the consequences would be quite different.

"It's all very well to have this high-faluting stuff in a portal, but search is where they [the users] go," he said.

The three main vendors in the search and retrieval sector of the portal industry are Odyssey, US-based Verity and British firm Autonomy. All three send a very similar message about their role in portal projects: search is an essential function, but never the only function, so they are quite comfortable working with other vendors to provide spare parts for a holistic solution.

"You build a portal out of bits and pieces. One of the key bits and pieces is search," said Mr Davies. "But we don't take the angle that we are a one stop portal shop. You have to build your own portal, and we're one good bit of it."

"We have the ability to provide one single point of access to provide any piece of information a company needs, across multiple repositories," said Stephen Cottrell, Verity's director for the Pacific rim. "Verity is key portal infrastructure. We're not a portal vendor - we don't pretend to be. We have the technology to complement a portal solution."


One text entry box may look much the same as another, but this is not to say that all three vendors' products are the same under the bonnet. They remain very competitive, especially in the original equipment manufacturer (OEM) channel where the battle to secure relationships with third parties to bundle search software with other applications is fierce.

"A lot of our competitors don't do elementary things like highlight the hits. It is quite rare to have your hits highlighted. I find that remarkable in the 21st century," said Mr Davies.

Mr Cottrell said Verity had over 200 OEMs, including Documentum and Adobe, although only two per cent of the company's revenue came from this channel.

"We've been around since 1988, we're a proven player in the marketplace," he said. "We have more OEMs than anyone else. We are seeing smaller companies being overly aggressive in their efforts to obtain OEM relationships. We're seeing a lot of companies leave the marketplace and get bought out."

Ian Black, director of communications at Autonomy, said it was "normal" for Autonomy's interface to be submerged into a wider portal shell, with agreements in place for the company's search application to be integrated with portal offerings from BEA, Brio, Business Objects, Corechange, Sybase and Vignette, amongst others.

"With a lot of Autonomy deployments, the user will deploy our Portal-In-A-Box, and they will then embed that functionality into their backend into interfaces that the users are already familiar with, like the SAP interface, or unstructured tools like Word or PowerPoint," he said.

One recent movement in the field has been the advent of indexing servers released by Microsoft and IBM, to support their own portal, groupware and document management applications.

"We don't see them as a competitive threat. Our customers are coming to us, saying that their software has limitations. Our business is being driven by our customers," he said.


Another consistent message from all three vendors is that many corporate and government users are asking for more than just the simple text-based search box. Most employees still search by typing key words into a standard Boolean text parser, but more advanced users with more targeted needs require a different system.
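As a rough illustration of what a Boolean key word search involves, the sketch below builds a small inverted index and answers AND, OR and NOT queries against it. It is not drawn from any of the three vendors' products; the function names, parameters and sample documents are all hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?\"'()")].add(doc_id)
    return index

def boolean_search(index, all_ids, must=(), any_of=(), must_not=()):
    """AND the `must` terms, OR the `any_of` terms, then drop `must_not` hits."""
    results = set(all_ids)
    for term in must:
        results &= index.get(term.lower(), set())
    if any_of:
        either = set()
        for term in any_of:
            either |= index.get(term.lower(), set())
        results &= either
    for term in must_not:
        results -= index.get(term.lower(), set())
    return sorted(results)

# Hypothetical sample documents, for illustration only.
docs = {
    "memo-1": "Quarterly sales report for the Pacific region",
    "memo-2": "Sales forecast and marketing plan",
    "memo-3": "Marketing budget report",
}
index = build_index(docs)

# Equivalent to the query: sales AND NOT forecast
hits = boolean_search(index, docs, must=["sales"], must_not=["forecast"])
```

A query like this only matches the exact words typed, which is why users looking for a concept rather than a specific document tend to want something more.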

"It depends on people's ideas on how to interact with the computer system," said Steve Gibson, Pacific rim technical director for Verity. "Some people don't know what they're looking for, and they can work within the browser interface. Others know what they are looking for, so they can use the system more directly.

"They have a different idea of what they are looking for. Some people are looking for a concept, an area of interest, not a specific document. They can see the documents in that area, and see them laid out in a logical, hierarchical fashion."

The content stores within the user organisation need to be classified in a hierarchical structure based on the subject matter of the documents, much like Roget's Thesaurus, in a system known as a taxonomy. This feature is so important that it is becoming the differentiator between the competing applications.
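To make the idea concrete, here is a minimal sketch of filing documents under hierarchical subject headings and then browsing by heading. The categories, key word rules and documents are invented for illustration, and real products use far richer classification than simple key word matching.

```python
# Hypothetical rules: a key word maps to a path in the subject hierarchy.
RULES = {
    "budget": ("Finance", "Budgets"),
    "invoice": ("Finance", "Invoices"),
    "vacancy": ("Human Resources", "Recruitment"),
}

def classify(text):
    """Return every category path whose key word appears in the text."""
    words = set(text.lower().split())
    return sorted(path for keyword, path in RULES.items() if keyword in words)

def documents_under(prefix, filed):
    """List documents filed at or below the given point in the hierarchy."""
    depth = len(prefix)
    return sorted(
        doc_id
        for doc_id, paths in filed.items()
        if any(path[:depth] == prefix for path in paths)
    )

# Hypothetical sample documents, for illustration only.
documents = {
    "d1": "Draft budget for next year",
    "d2": "Supplier invoice overdue",
    "d3": "New vacancy in sales team",
}
filed = {doc_id: classify(text) for doc_id, text in documents.items()}

finance_docs = documents_under(("Finance",), filed)
recruitment_docs = documents_under(("Human Resources", "Recruitment"), filed)
```

Browsing at the "Finance" level returns everything filed beneath it, which is the behaviour a user exploring an area of interest, rather than hunting for one document, would expect.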

"The fundamental difference of the Autonomy portal to traditional portals is that [other portals] are no more than a thin layer that links data across repositories, and the Autonomy approach goes far further than that. Our approach is to say, what is the meaning of the information in these repositories? What are the concepts, and the relationships between the concepts?" said Mr Black.

Verity recently licensed the Content Organiser product from LexisNexis for its K2 Enterprise application. It is a pre-built taxonomy which K2 users can overlay on their existing data repositories.

"It makes it very simple for customers to create taxonomies that suit their business," said Mr Gibson. "Librarians can just drag and drop it into their own system to help them with their classification."


Autonomy's various offerings are collectively termed IDOL, and the software modules are classified into "slots" and "racks", mimicking the server racks they are physically stored on. Mr Black said the "next big thing" in the next version of IDOL would be automation of the knowledge management function of finding like-minded colleagues with complementary expertise to collaborate with.

"People want greater intelligence and more automation. They have burnt their fingers on technology that promised automation but didn't say the set up time would be extortionate," he said.

"Anything that actually delivers on automation, and anything that does that in real time and decreases the bare minimum of manual effort the user has to employ, scores for us big time."

Another potential use for search is in business-to-business portals, or even internal portals for which many functions are outsourced beyond the firewall. Odyssey Development recently released a supporting module for its ISYS application suite to enable hosting at an application service provider (ASP), called ISYS:web.asp. The company has also signed a deal recently with EDS to provide services to Odyssey customers.

"A lot of people are deciding that ASP is how they want to structure their portals," said Mr Davies. "This increases our catchment area for people who can use our software, where ISYS:web is for people who didn't have anything in place, and wanted to get started as quickly as possible."

Another recent Odyssey addition is ISYS:rdu, or Replication and Distribution Utility, which is a mirroring backup solution for unstructured data.

"In an ideal world, you would have data in only one place, with infinite bandwidth so it was always fast and everyone could access it. In the real world it makes sense to replicate data from one place to another," said Mr Davies.
