iBoogie is a search site developed and owned by CyberTavern. The site uses Clusterizer, a clustering engine developed by CyberTavern, to give you a way to navigate through large numbers of search results. Clusterizer puts documents with similar content or related topics into the same group. Each group is assigned a label based on the content of its documents. You can easily see the main topics and focus on the ones that interest you, without being forced to scroll through a very long list of ranked documents.
iBoogie combines metasearch and clustering to gather search results from multiple sources and organize them into structured content. This is done dynamically, in real time, and the results are presented to the user as a hierarchy of topics (clusters) for browsing and exploring. The cluster labels identify the main topics of the search results, so there is no need to go through multiple pages to find what you are looking for.
Given today's state-of-the-art search technology and the availability of Internet search engines, searching is easy. A user types a query and usually gets back tens of thousands of search results. Most users look only at the first couple of result pages. If they do not see what they need, they run the search again with a reformulated query. After several different searches, the user has either found what he or she was looking for, or gives up, or goes to a different search engine and repeats the same process.
Most search engines provide the user with a lot of data, not information. Information is organized or structured data.
iBoogie gives the user information based on the results returned by the search engines.
The information you are looking for is most probably somewhere on the Internet; we just do not know how to ask the right question to get it.
To use the famous quote, the problem is really this one: "I know that I don't know what I don't know, but I don't know what that is."
The technology developed by CyberTavern and implemented in iBoogie tries to solve this problem. Using the information in clusters, users can navigate through the search results quickly and easily. They can also use clusters to reformulate the original query, modify the search request, and use it as a tool for navigating the Internet.
How Clusterizer works
Clustering is the process of grouping similar objects from a given set of inputs. In the context of document retrieval systems (text search engines), it puts documents with similar content or related topics into the same cluster (group).
Most Web search engines return search results as a long ordered (ranked) list of text records. Each record usually contains a title, a short description and a URL; we'll call this record a text snippet. Document clustering organizes search results into hierarchical groups of similar documents. Each cluster is assigned a label based on the content of the documents that belong to it. This gives the user an alternative way to navigate through the large number of returned text snippets. He or she can easily see the main topics in the returned snippets and focus on the ones that are of interest, without being forced to scroll through a very long list of ranked documents. A good example: a search for "Madonna" will create not only clusters about Madonna as an artist and singer, but also clusters about Madonna as a religious figure (the Virgin Mary) and clusters about the Catholic religion.
Real-time term extraction
Term extraction is the first step in the clustering process. Real-time text processing is done on all the snippets returned by the search engines. A partial parser and a shallow stemmer are invoked in this step. The terms that are most informative (descriptive), and therefore best suited to serve as labels for clusters, are identified by both linguistic and statistical methods. To correct the problem of term variations within the returned text snippets, the system uses term normalization algorithms. This generates meaningful, easy-to-understand cluster annotations (labels) and avoids redundant labels such as "Bill Clinton" and "Clinton Bill", or "games downloads" and "download games". Since speed is very important in the Web text search environment, some of the heuristic algorithms we use are optimized for speed at the expense of more extensive and precise linguistic processing.
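To illustrate the idea of term normalization, here is a minimal sketch in Python. It is not Clusterizer's actual algorithm, which is not described here in detail. It assumes a very crude suffix-stripping rule in place of a real shallow stemmer and treats a term as an unordered set of stemmed words; the function names shallow_stem, normalize_term and group_variants are hypothetical.

```python
from typing import Dict, List, Tuple


def shallow_stem(word: str) -> str:
    """Crude stand-in for a shallow stemmer: lowercase and strip a plural 's'."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        word = word[:-1]
    return word


def normalize_term(term: str) -> Tuple[str, ...]:
    """Reduce a term to a canonical key that ignores word order and plurals."""
    return tuple(sorted(shallow_stem(w) for w in term.split()))


def group_variants(terms: List[str]) -> Dict[Tuple[str, ...], List[str]]:
    """Group surface variants that share the same canonical key."""
    groups: Dict[Tuple[str, ...], List[str]] = {}
    for term in terms:
        groups.setdefault(normalize_term(term), []).append(term)
    return groups


# The variant pairs mentioned in the text collapse into two groups.
variants = group_variants(
    ["Bill Clinton", "Clinton Bill", "games downloads", "download games"]
)
```

With this sketch, each pair of variants maps to a single key, so only one label per pair needs to be shown. A production system would need a proper stemmer and a rule for choosing which surface form to display.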
The second step is the clustering process. The algorithms use a combination of linguistic clustering and statistical clustering. They generate a hierarchical clustering, as opposed to a simple "flat" grouping of similar documents. This is done in real time on the set of documents returned by the search, without any predefined grouping, pre-built knowledge base, or pre-processing of the document collections used by the search engines. In general, a returned document will have more than one descriptive term associated with it. The clustering algorithms allow the same document to be in multiple clusters. This reflects the fact that different people will usually group the same information differently.
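The two properties described above, a hierarchy of clusters and documents that may appear in more than one cluster, can be sketched as follows. This is an illustration only, not the Clusterizer algorithm: it assumes the label terms are already known and supplied by hand (in the real system they come from the term extraction step), and it uses plain substring matching. The names cluster_by_terms and build_hierarchy and the sample snippets are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_by_terms(snippets: List[str], labels: List[str]) -> Dict[str, List[int]]:
    """Map each label to the positions of the snippets that mention it.

    A snippet that mentions several labels is placed in several clusters.
    """
    clusters: Dict[str, List[int]] = defaultdict(list)
    for idx, text in enumerate(snippets):
        lowered = text.lower()
        for label in labels:
            if label in lowered:
                clusters[label].append(idx)
    return dict(clusters)


def build_hierarchy(snippets, top_labels, sub_labels):
    """Build a two-level hierarchy: top-level clusters, each with sub-clusters."""
    hierarchy = {}
    for label, members in cluster_by_terms(snippets, top_labels).items():
        member_texts = [snippets[i] for i in members]
        sub = cluster_by_terms(member_texts, sub_labels)
        hierarchy[label] = {
            "documents": members,
            # Translate positions within the cluster back to original indices.
            "children": {s: [members[j] for j in pos] for s, pos in sub.items()},
        }
    return hierarchy


# Made-up snippets echoing the "Madonna" example.
snippets = [
    "Madonna singer tour dates and album news",
    "Madonna and Child painting in a Catholic church",
    "Singer Madonna criticised by Catholic church over tour",
    "Catholic devotion to the Virgin Mary",
]
hierarchy = build_hierarchy(
    snippets,
    top_labels=["singer", "catholic"],
    sub_labels=["tour", "album", "painting"],
)
```

Here the third snippet ends up in both the "singer" and the "catholic" cluster, which mirrors the point that the same document can legitimately belong to several groups.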
Search Source Directory
iBoogie provides the means for the search sources to organize themselves. Users can add their specific library, scientific, government or educational search sources to our database. Afterwards, users can query these sources simultaneously instead of one at a time.
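Querying several sources simultaneously and merging what they return can be sketched as below. This is a hedged illustration, not iBoogie's implementation: the two source functions are hypothetical stubs that return fixed (title, URL) pairs instead of calling a real search service, and duplicates are detected by exact URL match only.

```python
from concurrent.futures import ThreadPoolExecutor


def library_source(query):
    """Hypothetical stub standing in for a library search source."""
    return [(f"{query} in the catalogue", "https://library.example/1")]


def government_source(query):
    """Hypothetical stub standing in for a government search source."""
    return [
        (f"{query} report", "https://gov.example/a"),
        (f"{query} in the catalogue", "https://library.example/1"),
    ]


def search_all(query, sources):
    """Query every source in parallel and merge results, dropping duplicate URLs."""
    names = list(sources)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # pool.map returns results in the same order as the input names.
        result_lists = list(pool.map(lambda name: sources[name](query), names))
    merged, seen = [], set()
    for name, results in zip(names, result_lists):
        for title, url in results:
            if url not in seen:
                seen.add(url)
                merged.append({"source": name, "title": title, "url": url})
    return merged


sources = {"library": library_source, "government": government_source}
merged = search_all("water quality", sources)
```

The merged list is what a clustering step would then receive as its input snippets. Threads are used here because waiting on remote sources is dominated by network latency; a real metasearch front end would also need timeouts and error handling for sources that fail to respond.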
Search sources on similar topics are grouped into categories, creating search engine communities. Users can create customized search tabs depending on their interests. This is the first step in creating a truly personalized search environment.