9 Jan, 2008
KDnuggets.com (KD stands for Knowledge Discovery) is the leading source of information on Data Mining, Web Mining, Knowledge Discovery, and Decision Support Topics, including News, Software, Solutions, Companies, Jobs, Courses, Meetings, Publications, and more.
Go to KDnuggets.com
7 Jan, 2008
MC is a C++ program that creates vector-space models from
text documents for use in text mining applications. MC provides
an efficient multi-threaded implementation that can process very
large document collections. For example, MC took 1,189 seconds and used
only 17.5 MB of main memory to process a sample collection of
about 114,000 documents (the experiment was run on a Sun Ultra10
workstation). More details on MC and its use in a fast clustering
algorithm are available in
this paper.
Download
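
To illustrate what "creating a vector-space model from text documents" means, here is a minimal Python sketch. It is not MC's code and does not reflect MC's actual preprocessing, weighting, or multi-threading; the function names `tokenize` and `build_vector_space` are hypothetical, and plain term counts are assumed as the vector entries.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vector_space(docs):
    """Return (vocabulary, vectors) for a list of document strings.

    vocabulary is a sorted list of all distinct words; vectors holds one
    term-count vector per document, aligned with vocabulary.
    Assumption: raw term counts, with no stop-word removal or weighting.
    """
    counts = [Counter(tokenize(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors


docs = [
    "Data mining finds patterns in data.",
    "Text mining works on text documents.",
]
vocabulary, vectors = build_vector_space(docs)
for doc, vec in zip(docs, vectors):
    print(vec, doc)
```

Each document becomes a vector with one dimension per vocabulary word, which is the representation that clustering and other text mining algorithms then operate on. A real tool working on large collections would typically store these vectors in a sparse format, since most entries are zero.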