Commit bb04a165 authored by BRAMAS Berenger

Add a clustering lib to test in hmat

parent d96ccef1
License Information
===================
The Open Source Clustering Software consists of several packages, which have
different licenses.
* Cluster 3.0 is a GUI-based program for Windows, Mac OS X, Linux, and Unix.
It is based on Michael Eisen's Cluster/TreeView code. Cluster 3.0 is covered
by Michael Eisen's original license, available at
http://rana.lbl.gov/EisenSoftwareSource.htm. The command-line version of
Cluster 3.0 is also covered by this license.
* Pycluster is an extension module to the scripting language Python. It is
covered by the Python License (same license as Python itself).
* Algorithm::Cluster is an extension module to the scripting language Perl. It is
released under the Artistic License (same license as Perl itself).
* The routines in the C Clustering Library can also be used directly by calling
them from other C programs. In that case, the Python License applies.
In all cases, copyright notices must be retained in their original form.
Open Source Clustering Software
===============================
The Open Source Clustering Software consists of the most commonly used routines
for clustering analysis of gene expression data. The software packages below all
depend on the C Clustering Library, which is a library of routines for
hierarchical (pairwise single-, complete-, maximum-, and average-linkage)
clustering, k-means clustering, and Self-Organizing Maps on a 2D rectangular
grid. The C Clustering Library complies with the ANSI C standard.
Several packages are available as part of the Open Source Clustering Software:
* Cluster 3.0 is a GUI-based program for Windows, based on Michael Eisen's
Cluster/TreeView code. Cluster 3.0 was written for Microsoft Windows, and
subsequently ported to Mac OS X (Cocoa) and Unix/Linux. Cluster 3.0 can
also be used as a command line program.
* Pycluster (or Bio.Cluster if used as part of Biopython) is an extension
module to the scripting language Python.
* Algorithm::Cluster is an extension module to the scripting language Perl.
* The routines in the C Clustering Library can also be used directly by calling
them from other C programs.
INSTALLATION
============
See the INSTALL file in this directory.
VIEWING CLUSTERING RESULTS
==========================
We recommend using Java TreeView for visualizing clustering results.
Java TreeView is a Java version of Michael Eisen's Treeview program with
extended capabilities. In particular, it is possible to visualize k-means
clustering results in addition to hierarchical clustering results.
Java TreeView was written by Alok Saldanha at Stanford University; it can be
downloaded at http://jtreeview.sourceforge.net.
MANUAL
======
The routines in the C Clustering Library are described in the manual
(cluster.pdf). This manual also describes how to use the routines from Python
and from Perl. Cluster 3.0 has a separate manual (cluster3.pdf). Both of these
manuals can be found in the doc subdirectory. They can also be downloaded from
our website:
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/cluster.pdf;
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/cluster3.pdf.
LITERATURE
==========
M.J.L. de Hoon, S. Imoto, J. Nolan, and S. Miyano: "Open Source Clustering
Software", Bioinformatics 20(9): 1453-1454 (2004).
CONTACT
=======
Michiel de Hoon
University of Tokyo, Institute of Medical Science
Human Genome Center, Laboratory of DNA Information Analysis
Currently at
RIKEN Genomic Sciences Center
mdehoon 'AT' gsc.riken.jp
/******************************************************************************/
/* The C Clustering Library.
* Copyright (C) 2002 Michiel Jan Laurens de Hoon.
*
* This library was written at the Laboratory of DNA Information Analysis,
* Human Genome Center, Institute of Medical Science, University of Tokyo,
* 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
* Contact: mdehoon 'AT' gsc.riken.jp
*
* Permission to use, copy, modify, and distribute this software and its
* documentation with or without modifications and for any purpose and
* without fee is hereby granted, provided that any copyright notices
* appear in all copies and that both those copyright notices and this
* permission notice appear in supporting documentation, and that the
* names of the contributors or copyright holders not be used in
* advertising or publicity pertaining to distribution of the software
* without specific prior permission.
*
* THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
* WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
* WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
* CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
* OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
* OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
* OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
* OR PERFORMANCE OF THIS SOFTWARE.
*
*/
// @SCALFMM_PRIVATE
#ifdef WINDOWS
# include <windows.h>
#endif
#define CLUSTERVERSION "1.52a"
/* Chapter 2 */
double clusterdistance (int nrows, int ncolumns, double** data, int** mask,
double weight[], int n1, int n2, int index1[], int index2[], char dist,
char method, int transpose);
double** distancematrix (int ngenes, int ndata, double** data,
int** mask, double* weight, char dist, int transpose);
/* Chapter 3 */
int getclustercentroids(int nclusters, int nrows, int ncolumns,
double** data, int** mask, int clusterid[], double** cdata, int** cmask,
int transpose, char method);
void getclustermedoids(int nclusters, int nelements, double** distance,
int clusterid[], int centroids[], double errors[]);
void kcluster (int nclusters, int ngenes, int ndata, double** data,
int** mask, double weight[], int transpose, int npass, char method, char dist,
int clusterid[], double* error, int* ifound);
void kmedoids (int nclusters, int nelements, double** distance,
int npass, int clusterid[], double* error, int* ifound);
/* Chapter 4 */
typedef struct {int left; int right; double distance;} Node;
/*
* A Node struct describes a single node in a tree created by hierarchical
* clustering. The tree can be represented by an array of n Node structs,
* where n is the number of elements minus one. The integers left and right
* in each Node struct refer to the two elements or subnodes that are joined
* in this node. The original elements are numbered 0..nelements-1, and the
* nodes -1..-(nelements-1). For each node, distance contains the distance
* between the two subnodes that were joined.
*/
Node* treecluster (int nrows, int ncolumns, double** data, int** mask,
double weight[], int transpose, char dist, char method, double** distmatrix);
void cuttree (int nelements, Node* tree, int nclusters, int clusterid[]);
/* Chapter 5 */
void somcluster (int nrows, int ncolumns, double** data, int** mask,
const double weight[], int transpose, int nxnodes, int nynodes,
double inittau, int niter, char dist, double*** celldata,
int clusterid[][2]);
/* Chapter 6 */
int pca(int m, int n, double** u, double** v, double* w);
/* Utility routines, currently undocumented */
void sort(int n, const double data[], int index[]);
double mean(int n, double x[]);
double median (int n, double x[]);
double* calculate_weights(int nrows, int ncolumns, double** data, int** mask,
double weights[], int transpose, char dist, double cutoff, double exponent);
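The Node comment in the header above describes how a hierarchical clustering tree is encoded: non-negative ids are original elements and negative ids are nodes. The sketch below shows one way to walk such a tree and list the original elements in left-to-right order, which is the kind of ordering a permutation of matrix rows could be built from. It is an illustration only: `collectLeaves` and `leafOrder` are hypothetical helpers that are not part of the library, and the code assumes that node id -k is stored at `tree[k-1]` and that the last entry of the array is the root.

```cpp
#include <cassert>
#include <vector>

// Same layout as the Node struct declared in the header above.
struct Node { int left; int right; double distance; };

// Hypothetical helper: appends the original elements found under `id`
// to `order`, visiting the left subtree before the right one.
// Assumption: id >= 0 is an original element, id < 0 refers to tree[-id - 1].
void collectLeaves(const std::vector<Node>& tree, int id, std::vector<int>& order) {
    if (id >= 0) {
        order.push_back(id);
        return;
    }
    const int nodeIndex = -id - 1;
    assert(nodeIndex < static_cast<int>(tree.size()));
    collectLeaves(tree, tree[nodeIndex].left, order);
    collectLeaves(tree, tree[nodeIndex].right, order);
}

// Hypothetical helper: returns the left-to-right order of all elements.
// Assumption: the last node in the array is the root of the tree.
std::vector<int> leafOrder(const std::vector<Node>& tree) {
    std::vector<int> order;
    if (tree.empty()) {
        return order; // fewer than two elements: no node was created
    }
    collectLeaves(tree, -static_cast<int>(tree.size()), order);
    return order;
}
```

For a tree of four elements stored as `{ {2,0,0.1}, {3,-1,0.4}, {1,-2,0.9} }`, the root is node -3, and the traversal visits element 1, then node -2 (element 3, then node -1 with elements 2 and 0).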
......@@ -25,16 +25,16 @@ if(SCALFMM_ADDON_HMAT)
# Adding cpp files to project
add_library( scalfmmhmat STATIC ${source_lib_files} )
# Add blas library (even if it is set to off)
target_link_libraries( scalfmmhmat scalfmm)
# Adding the entire project dir as an include dir
INCLUDE_DIRECTORIES(
${CMAKE_BINARY_DIR}/Src
${CMAKE_SOURCE_DIR}/Src
${SCALFMM_INCLUDES}
)
${CMAKE_BINARY_DIR}/Src
${CMAKE_SOURCE_DIR}/Src
${SCALFMM_INCLUDES}
)
# Install lib
install( TARGETS scalfmmhmat ARCHIVE DESTINATION lib )
......@@ -45,6 +45,13 @@ if(SCALFMM_ADDON_HMAT)
file( GLOB hpp_in_dir Src/*.hpp Src/*.hpp)
INSTALL( FILES ${hpp_in_dir} DESTINATION include/ScalFmm/HMat )
# Add C Clustering Library
file( GLOB_RECURSE ccl_lib_files CClusteringLibrary/*.c )
add_library( cclusteringlib STATIC ${ccl_lib_files} )
INCLUDE_DIRECTORIES(CClusteringLibrary/)
target_link_libraries( cclusteringlib scalfmm)
install( TARGETS cclusteringlib ARCHIVE DESTINATION lib )
file( GLOB_RECURSE source_tests_files Tests/*.cpp )
INCLUDE_DIRECTORIES( ${CMAKE_BINARY_DIR}/Src )
......@@ -71,14 +78,13 @@ if(SCALFMM_ADDON_HMAT)
# Dependency are OK
if( compile_exec )
add_executable( ${execname} ${exec} )
# link to scalfmm and scalfmmhmat
# link to scalfmm and scalfmmhmat and cclusteringlib
target_link_libraries(
${execname}
${scalfmm_lib}
scalfmmhmat
# ${BLAS_LIBRARIES}
# ${LAPACK_LIBRARIES}
${SCALFMM_LIBRARIES}
cclusteringlib
${SCALFMM_LIBRARIES}
)
LIST(APPEND hmat_list_execs ${execname})
endif()
......
......@@ -287,6 +287,10 @@ public:
nbRhs, dim);
}
}
static int GetNbPartitionsForHeight(const int inHeight){
return FMath::pow2(inHeight-1);
}
};
......
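The helper added in the hunk above maps a tree height to a number of partitions. A minimal sketch of that arithmetic follows, assuming `FMath::pow2(n)` returns 2^n (the local `pow2` is a hypothetical stand-in, not the ScalFMM implementation) and assuming a height of 1 means a single unsplit partition.

```cpp
#include <cassert>

// Hypothetical stand-in for FMath::pow2, assumed to return 2^power.
inline int pow2(const int power) {
    assert(power >= 0 && power < 31); // keep the shift within int range
    return 1 << power;
}

// Mirrors the formula of GetNbPartitionsForHeight: each extra level
// of bisection doubles the number of partitions at the leaf level.
inline int nbPartitionsForHeight(const int inHeight) {
    return pow2(inHeight - 1);
}
```

Under these assumptions a height of 4, the default used by the test program in this commit, gives 8 partitions.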
// @SCALFMM_PRIVATE
#include "../Src/Clustering/FCCLTreeCluster.hpp"
#include "../Src/Utils/FMatrixIO.hpp"
#include "../Src/Containers/FStaticDiagonalBisection.hpp"
#include "../Src/Utils/FSvgRect.hpp"
#include "../Src/Viewers/FDenseBlockWrapper.hpp"
#include "../Src/Blocks/FDenseBlock.hpp"
#include "Utils/FParameters.hpp"
#include "Utils/FParameterNames.hpp"
#include <memory>
int main(int argc, char** argv){
static const FParameterNames SvgOutParam = {
{"-fout", "--out", "-out"} ,
"Svg output directory."
};
static const FParameterNames DimParam = {
{"-N", "-nb", "-dim"} ,
"Dim of the matrix."
};
FHelpDescribeAndExit(argc, argv,"Test the bisection.",SvgOutParam,DimParam,FParameterDefinitions::OctreeHeight);
const char* filename = FParameters::getStr(argc, argv, FParameterDefinitions::InputFile.options, "../Addons/HMat/Data/unitCube1000_ONE_OVER_R.bin");
const int height = FParameters::getValue(argc, argv, FParameterDefinitions::OctreeHeight.options, 4);
const char* outputdir = FParameters::getStr(argc, argv, SvgOutParam.options, "/tmp/");
int readNbRows = 0;
int readNbCols = 0;
double* values = nullptr;
FAssertLF(FMatrixIO::read(filename, &values, &readNbRows, &readNbCols));
FAssertLF(readNbRows == readNbCols);
const int dim = readNbRows;
FCCLTreeCluster<double> tcluster(dim, values, CCL::CCL_TM_MAXIMUM /*CCL::CCL_TM_AVG_LINKAGE*/);
std::unique_ptr<int[]> permutations(new int[dim]);
tcluster.fillPermutations(permutations.get());
const int nbPartitions = FMath::pow2(height-1);
std::unique_ptr<int[]> partitions(new int[nbPartitions]);
tcluster.getPartitions(height, nbPartitions, partitions.get());
{
typedef double FReal;
typedef FDenseBlock<FReal> LeafClass;
typedef FDenseBlock<FReal> CellClass;
typedef FStaticDiagonalBisection<FReal, LeafClass, CellClass> GridClass;
GridClass bissection(dim, height, partitions.get(), nbPartitions);
FSvgRect output(outputdir, "ccl.svg", dim);
bissection.forAllBlocksDescriptor([&](const FBlockDescriptor& info){
output.addRectWithLegend(info.col, info.row, info.nbCols, info.nbRows, info.level);
});
}
tcluster.saveToXml(outputdir, "ccl.xml");
tcluster.saveToDot(outputdir, "ccl.dot");
return 0;
}