diff --git a/README b/README index 32b21b305915371a1528833f2a191d29b7f4a733..8635d774e869428bf4313539dd85af73a6b3e492 100644 --- a/README +++ b/README @@ -2,68 +2,84 @@ MIT Language Modeling Toolkit ============================= -The MIT Language Modeling (MITLM) toolkit is a set of tools designed for the -efficient estimation of statistical n-gram language models involving iterative -parameter estimation. It achieves much of its efficiency through the use of a -compact vector representation of n-grams. Details of the data structure and -associated algorithms can be found in the following paper. +The MIT Language Modeling (MITLM) toolkit is a set of tools designed +for the efficient estimation of statistical n-gram language models +involving iterative parameter estimation. It achieves much of its +efficiency through the use of a compact vector representation of +n-grams. Details of the data structure and associated algorithms can +be found in the following paper. - * Bo-June (Paul) Hsu and James Glass. Iterative Language Model Estimation: - Efficient Data Structure & Algorithms. In Proc. Interspeech, 2008. + * Bo-June (Paul) Hsu and James Glass. Iterative Language Model + Estimation: Efficient Data Structure & Algorithms. In + Proc. Interspeech, 2008. Currently, MITLM supports the following features: * Smoothing: Modified Kneser-Ney, Kneser-Ney, maximum likelihood - * Interpolation: Linear interpolation, count merging, generalized linear - interpolation + * Interpolation: Linear interpolation, count merging, generalized + linear interpolation * Evaluation: Perplexity - * File formats: ARPA, binary, gzip, bz2 + * File formats: ARPA, binary, gzip, bz2 -MITLM is available for download under the MIT License. It has been built and -tested on 32-bit and 64-bit Intel CPUs running Debian Linux 4.0. It currently -requires the following: +MITLM is available for download under the MIT License. It has been +built and tested on 32-bit and 64-bit Intel CPUs running Debian Linux +7.0. It currently requires the following: - * ANSI C++/Fortran compiler (GCC 4.1.2+) - * Boost C++ Libraries (1.32+) + * ANSI C++/Fortran compiler (GCC 4.7.1+) + +For more information about MITLM, please visit: + + https://code.google.com/p/mitlm/ + +If you find any BUG and/or want to provide a patch, please file an +issue about it at: + + https://code.google.com/p/mitlm/issues/list =============== Acknowledgments =============== -The design and implementation of this toolkit benefited significantly from the -SRI Language Modeling Toolkit by Andreas Stolcke. The vector library is -partially derived from the Flexible Library for Efficient Numerical Solutions -by Michael Lehn. +The design and implementation of this toolkit benefited significantly +from the SRI Language Modeling Toolkit by Andreas Stolcke. + +The vector library is partially derived from the Flexible Library for +Efficient Numerical Solutions by Michael Lehn. Copyright (c) 2007, Michael Lehn All rights reserved. Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are met: - - 1) Redistributions of source code must retain the above copyright notice, - this list of conditions and the following disclaimer. - 2) Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - 3) Neither the name of the FLENS development group nor the names of its - contributors may be used to endorse or promote products derived from this - software without specific prior written permission. + modification, are permitted provided that the following conditions + are met: + + 1) Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + 2) Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in + the documentation and/or other materials provided with the + distribution. + 3) Neither the name of the FLENS development group nor the names of + its contributors may be used to endorse or promote products + derived from this software without specific prior written + permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR - CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, - EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, - PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR - PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF - LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING - NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS - SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -The project is supported in part by the T-Party Project, a joint research program between MIT CSAIL and Quanta Computer Inc. + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS + FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE + COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, + INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, + BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; + LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER + CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN + ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + POSSIBILITY OF SUCH DAMAGE. + +The project is supported in part by the T-Party Project, a joint +research program between MIT CSAIL and Quanta Computer Inc. Bo-June (Paul) Hsu Computer Science and Artificial Intelligence Laboratory