January 07 2008

Creating Vector Models from Text Documents

MC is a C++ program that builds vector-space models of text documents for use in
text mining applications. MC provides an efficient multi-threaded implementation
that can process very large document collections. For example, MC processed a
sample collection of about 114,000 documents in 1,189 seconds using only
17.5 MB of main memory (the experiment was run on a Sun Ultra10 workstation).
More details on MC and its use in a fast clustering algorithm are available in
this paper.
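To make "vector-space model" concrete, here is a minimal C++17 sketch that turns a few documents into sparse, unit-length TF-IDF vectors. It is not MC's implementation. The tokenizer, the tf × log(N/df) weighting, and the names tokenize, buildVectors and SparseVector are all hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <map>
#include <string>
#include <vector>

// A sparse document vector: term id -> weight (hypothetical type name).
using SparseVector = std::map<int, double>;

// Split text into lowercase alphanumeric tokens. This is a hypothetical
// tokenizer; MC's own tokenization and stopword handling are not described here.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char c : text) {
        if (std::isalnum(c)) {
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) tokens.push_back(current);
    return tokens;
}

// Build one unit-length TF-IDF vector per document (assumed weighting:
// raw term count times log(N / document frequency)). The term -> id
// mapping is written into `dictionary`.
std::vector<SparseVector> buildVectors(const std::vector<std::string>& docs,
                                       std::map<std::string, int>& dictionary) {
    std::vector<std::map<int, int>> counts(docs.size());
    std::map<int, int> docFreq;  // number of documents containing each term

    for (std::size_t d = 0; d < docs.size(); ++d) {
        for (const std::string& tok : tokenize(docs[d])) {
            auto it = dictionary.find(tok);
            int id;
            if (it == dictionary.end()) {
                id = static_cast<int>(dictionary.size());
                dictionary[tok] = id;
            } else {
                id = it->second;
            }
            // First occurrence of this term in document d: bump its document frequency.
            if (counts[d][id]++ == 0) ++docFreq[id];
        }
    }

    const double n = static_cast<double>(docs.size());
    std::vector<SparseVector> vectors(docs.size());
    for (std::size_t d = 0; d < docs.size(); ++d) {
        double norm2 = 0.0;
        for (const auto& [id, tf] : counts[d]) {
            double w = tf * std::log(n / docFreq[id]);
            if (w > 0.0) {  // terms present in every document get weight 0 and are dropped
                vectors[d][id] = w;
                norm2 += w * w;
            }
        }
        // Scale to unit Euclidean length; an empty vector is left as-is.
        const double norm = std::sqrt(norm2);
        for (auto& [id, w] : vectors[d]) w /= norm;
    }
    return vectors;
}
```

For example, calling buildVectors({"the cat sat", "the dog sat", "a cat"}, dict) gives three vectors over a five-term dictionary, and in the second document "dog" outweighs "the" because it appears in fewer documents. Unit-length vectors are a common choice when documents are later compared by cosine similarity, as in many text clustering methods. A tool built for collections of 100,000+ documents would also need to stream its input and prune the vocabulary, which this sketch does not attempt.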

Download

