X11 X12 .... X1m
X21 X22 .... X2m
...
Xn1 Xn2 .... Xnm

where each row represents a vector of expression values for a gene; and (3) the number of clusters k. Note that you may use the data file to implicitly specify m and n if that turns out to be easier. The output of your program should be (1) the squared error distortion value d; (2) the k centroid vectors in the same format as the data file; and (3) the objects in each cluster in the following format:
cluster1: 1 3 5
cluster2: 2 6 7 10
...
clusterk: 4 8 12 9

where the numbers are the indices of objects; for example, the first cluster ("cluster1") has three objects, corresponding to the 1st, 3rd, and 5th rows of the data table. In each iteration, you should compute the squared error distortion as described on page 346 of the textbook and test whether it decreases. Use a threshold on the change in the squared error distortion to decide when the clustering is done.
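The iteration and stopping test described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference solution: it takes the squared error distortion to be the mean squared Euclidean distance from each object to its nearest centroid (check this against the definition on page 346 of the textbook), and the function names, the `threshold` and `max_iter` parameters, and the handling of empty clusters are all illustrative choices that do not come from the assignment.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(data, centroids):
    """For each row, return the 0-based index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(row, centroids[j]))
        for row in data
    ]


def distortion(data, centroids, labels):
    """Assumed definition: mean squared distance to the assigned centroid."""
    total = sum(squared_distance(row, centroids[c]) for row, c in zip(data, labels))
    return total / len(data)


def update(data, labels, centroids):
    """Move each centroid to the mean of its members."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [row for row, c in zip(data, labels) if c == j]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(list(old))
        else:
            new_centroids.append([sum(col) / len(members) for col in zip(*members)])
    return new_centroids


def kmeans(data, centroids, threshold=1e-6, max_iter=100):
    """Iterate until the distortion improves by less than `threshold`."""
    labels = assign(data, centroids)
    d = distortion(data, centroids, labels)
    for _ in range(max_iter):
        new_centroids = update(data, labels, centroids)
        new_labels = assign(data, new_centroids)
        new_d = distortion(data, new_centroids, new_labels)
        improvement = d - new_d
        centroids, labels, d = new_centroids, new_labels, new_d
        if improvement < threshold:
            break
    return d, centroids, labels


def format_clusters(labels, k):
    """Render clusters as 'clusterJ: i1 i2 ...' using 1-based row indices."""
    lines = []
    for j in range(k):
        members = [str(i + 1) for i, c in enumerate(labels) if c == j]
        lines.append("cluster%d: %s" % (j + 1, " ".join(members)))
    return lines


if __name__ == "__main__":
    # Hypothetical toy data, not the assignment's data set.
    toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    d, cents, labels = kmeans(toy, [[0.0, 0.0], [10.0, 10.0]])
    print(d)
    for row in cents:
        print(" ".join(str(x) for x in row))
    print("\n".join(format_clusters(labels, 2)))
```

To run a fixed number of iterations instead of stopping on the threshold, one option with this sketch is to pass a negative `threshold` (so the stopping test never fires) together with the desired `max_iter`.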
2.1 2.2 1.0 3.1 -1.5
0.5 1.0 0.5 -1.5 3
-0.5 0.5 1.2 2.2 -3

Run your code for 5 iterations. Report the squared error distortion value, the centroid vectors, and the clusters you obtain.
2.1 2.2 1.0 3.1 -1.5
1.5 2.3 0.9 3.5 -1.9
1.7 2.4 1.2 3.2 -2.3

Are you getting the same results as above? Why?
A simple test case is available here.
Please turn in a hard copy of your written answers in class. Please email your code, along with any other relevant information about it, to the grader, Xuehua Shen (xshen AT cs.uiuc.edu).