Clustering 193 MSN search results for "largest tank battle" took 93 milliseconds and 600 KB of RAM on a 1.6 GHz Pentium 4 with 512 MB of RAM. Clustering 486 search results from AllTheWeb for the same query took 266 milliseconds and 1.7 MB of RAM on the same PC.
That is almost twice as fast as Vivisimo's clustering engine, using about half as much memory, on the same computer configuration.
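As a rough check on these figures, the per-result cost of each run can be computed directly. This sketch only restates the reported numbers. The struct and function names are illustrative, and it assumes 1 MB = 1024 KB.

```cpp
#include <string>

// One benchmark run, using the figures reported above.
struct BenchmarkRun {
    std::string source;
    int results;      // number of search results clustered
    double millis;    // clustering time in milliseconds
    double memoryKB;  // memory used in kilobytes (assumes 1 MB = 1024 KB)
};

// Throughput: results clustered per second.
double resultsPerSecond(const BenchmarkRun& run) {
    return run.results / (run.millis / 1000.0);
}

// Average memory footprint per clustered result, in kilobytes.
double kbPerResult(const BenchmarkRun& run) {
    return run.memoryKB / run.results;
}

const BenchmarkRun kMsnRun{"MSN", 193, 93.0, 600.0};
const BenchmarkRun kAllTheWebRun{"AllTheWeb", 486, 266.0, 1.7 * 1024.0};
```

This works out to roughly 2,075 results per second and 3.1 KB per result for the MSN run, and about 1,827 results per second and 3.6 KB per result for the AllTheWeb run. With only two data points, this merely suggests that cost grows roughly in proportion to the number of results over this range.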
Clusterizer is written in C++ and is available for Windows 2000/XP/2003, Mac OS X, and Fedora (32-bit and 64-bit). It can be ported to other platforms if needed.
Clusterizer's API is available as a COM object, a JNI (Java) interface, or a C++ API, and can be called directly from ASP (VB), JSP, PHP, C++, or Java.
The clustering process can be customized. You can keep several saved configurations and choose one each time you cluster. Configurable options include:
- Minimum number of results per cluster, which affects the depth of the cluster tree.
- Bad clusters list: a list of cluster labels that are too common to be useful.
- Many more tweaks.
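The options above can be pictured as a filter over candidate clusters. The following is a minimal sketch, assuming candidate clusters are given as a map from label to result indices. `ClusterConfig`, `applyConfig`, and the field names are hypothetical and are not Clusterizer's actual API.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical configuration mirroring the options listed above.
struct ClusterConfig {
    std::size_t minResultsPerCluster = 2;  // smaller candidate clusters are dropped
    std::set<std::string> badClusters;     // labels too common to be informative
};

// Candidate clusters: label -> indices of the search results that share it.
using Candidates = std::map<std::string, std::vector<int>>;

// Keep only the candidates that satisfy the configuration.
Candidates applyConfig(const Candidates& candidates, const ClusterConfig& cfg) {
    Candidates kept;
    for (const auto& [label, members] : candidates) {
        if (members.size() < cfg.minResultsPerCluster) continue;  // too small
        if (cfg.badClusters.count(label) > 0) continue;          // blacklisted label
        kept.emplace(label, members);
    }
    return kept;
}

// Several saved configurations, selected by name at clustering time.
using SavedConfigs = std::map<std::string, ClusterConfig>;
```

Raising `minResultsPerCluster` prunes small clusters, so fewer sub-clusters survive at each level, which is one plausible way such a setting would make the cluster tree shallower.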
Clusterizer is language independent: the core of our clustering technology does not depend on any particular language, so it can work with text in any language. However, results in European languages are clustered best, because we have stop word lists and stemmers for them.
A stop word list is a list of commonly used words that carry little information for clustering, like "the", "if", and "we" in English. These lists differ from language to language. We have stop word lists for 12 languages:
English, French, German, Spanish, Italian, Russian, Portuguese, Danish, Dutch, Finnish and Swedish.
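Stop word removal can be sketched as a simple filter applied before clustering. This is a minimal illustration, assuming whitespace tokenization and ASCII-only lowercasing; `removeStopWords` is a hypothetical helper, and real text would also need punctuation handling and Unicode-aware case folding.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Split text on whitespace, lowercase each token (ASCII only),
// and drop tokens that appear in the stop word list.
std::vector<std::string> removeStopWords(const std::string& text,
                                         const std::set<std::string>& stopWords) {
    std::vector<std::string> kept;
    std::istringstream in(text);
    std::string word;
    while (in >> word) {
        std::transform(word.begin(), word.end(), word.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        if (stopWords.count(word) == 0) kept.push_back(word);
    }
    return kept;
}
```

For example, with the stop words "the", "if", "we", and "of", the title "The largest tank battle of WWII" reduces to "largest", "tank", "battle", "wwii".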
A stemmer is a small, language-specific algorithm that reduces words to their common root. For example, the English stemmer finds the root "eat" in words like "eating" and "eats". We have stemmers for 6 languages:
English, French, German, Spanish, Italian, Russian, Dutch.
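The idea of a stemmer can be shown with a deliberately naive English suffix stripper. This is only a toy sketch for illustration, not the stemmer used by Clusterizer; production stemmers (for example, the Porter or Snowball algorithms) use many more rules. `naiveStem` and the suffix list are hypothetical.

```cpp
#include <string>
#include <vector>

// True if `word` ends with `suffix`.
bool endsWith(const std::string& word, const std::string& suffix) {
    return word.size() >= suffix.size() &&
           word.compare(word.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Toy English stemmer: strip the first matching suffix,
// but only if at least three characters of stem remain.
std::string naiveStem(const std::string& word) {
    static const std::vector<std::string> suffixes = {"ing", "ed", "es", "s"};
    for (const auto& suffix : suffixes) {
        if (endsWith(word, suffix) && word.size() - suffix.size() >= 3) {
            return word.substr(0, word.size() - suffix.size());
        }
    }
    return word;  // no rule applied
}
```

With these rules, "eating", "eats", and "eat" all map to "eat", so results mentioning any of them can fall into the same cluster.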
Testing & Installation
You can test our clustering using the Add search source page. All you need to do is set up a search source with your results; the Add search source page will guide you through creating a template for it. Once you have finished, create a custom tab and see how your results are clustered.
We can also supply ASP or C++ example code, with source, that you can install on your machine to test our clustering.
Each search source is represented as an XML template, which is easy to create for each source you want to aggregate. Templates can be saved to disk and referenced by name, or passed along with the search query. This structure gives you complete control over how each search source is customized.
Our collection of search templates is growing and is available for free. See Add search source.
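The two ways of supplying a template (by name from disk, or inline with the query) can be sketched as a single lookup. This assumes, hypothetically, that saved templates live in one directory as `<name>.xml` files and that an inline template is recognized by a leading `<`; `resolveTemplate` is an illustrative name, and the actual template format is described on the Add search source page.

```cpp
#include <fstream>
#include <optional>
#include <sstream>
#include <string>

// Return the template XML, either passed inline or loaded from disk by name.
// Directory layout and ".xml" naming are assumptions for illustration.
std::optional<std::string> resolveTemplate(const std::string& nameOrXml,
                                           const std::string& templateDir) {
    // Treat anything that starts like XML as an inline template.
    if (!nameOrXml.empty() && nameOrXml.front() == '<') return nameOrXml;

    std::ifstream file(templateDir + "/" + nameOrXml + ".xml");
    if (!file) return std::nullopt;  // no saved template with this name

    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```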
iBoogie Meta-Search uses UTF-8 encoding and converts search sources seamlessly from other encodings. See Add search source.
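As an example of the kind of conversion involved, here is a minimal sketch for one common source encoding, ISO-8859-1 (Latin-1). It relies on the fact that every Latin-1 byte equals the Unicode code point of its character; other encodings need lookup tables or a conversion library. `latin1ToUtf8` is a hypothetical helper, not part of iBoogie Meta-Search.

```cpp
#include <string>

// Convert ISO-8859-1 (Latin-1) text to UTF-8.
// Bytes below 0x80 are plain ASCII and copy through unchanged;
// bytes 0x80..0xFF become a two-byte UTF-8 sequence.
std::string latin1ToUtf8(const std::string& latin1) {
    std::string utf8;
    utf8.reserve(latin1.size() * 2);
    for (unsigned char c : latin1) {
        if (c < 0x80) {
            utf8 += static_cast<char>(c);
        } else {
            utf8 += static_cast<char>(0xC0 | (c >> 6));    // lead byte: top 2 bits
            utf8 += static_cast<char>(0x80 | (c & 0x3F));  // continuation: low 6 bits
        }
    }
    return utf8;
}
```

For instance, the Latin-1 byte 0xE9 ("é") becomes the UTF-8 sequence 0xC3 0xA9.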
iBoogie Meta-Search is written in C++ and is available only for Windows 2000/XP/2003. It can be ported to other platforms if needed.