############################################################################ # # # The Sanskrit Heritage Platform # # # # Gérard Huet # # # ############################################################################ # README Copyright Gérard Huet 2018 # ############################################################################ This platform aims at computer treatment of Sanskrit, in view of diverse philological applications: - statistics on corpuses, stylistic analyses, vocabulary computations - establishment of concordance listings, morphology-driven search in corpus - diachronic simulation and dialect features correlation of texts - assistant for critical editions management - compilation of digitalized corpus in a fully tagged Sanskrit Library. It may also be used for interactive consulting of a Sanskrit Dictionary with grammatical knowledge, on the Web, on a computer workstation, or on a personal communicator device. Finally, it may be used as support for teaching the language. This project is the follow-up of two previous stages. Gérard Huet started in 1994 the compilation of a Sanskrit to French lexicon for indological use (Lexique sanskrit-français à l'usage de glossaire indianiste) as a personal project. By year 2017 it had grown to 10700 entries, and the scope had changed. Various standards of completeness and of consistency were enforced, and the algebraic structure of a bilingual dictionary provided with grammatical annotations was progressively established and enforced by computer tools. Starting from 2000, this project became a central object of research on computational linguistics within Gérard Huet's research at the Paris Center of the Institut National de Recherche en Informatique et en Automatique (INRIA). A systematic reverse engineering of the TeX source of the lexicon produced a structured lexicographic database appropriate for making various computations. The first major contribution was the release in 2002 of the Zen toolkit for lexical, phonological and morphological treatment of natural language, a Pidgin ML library of finite-state machine tools. Zen was presented at ESSLLI in Trento in August 2002, and at the same time released on Internet under the LGPL license. Zen, and its conceptual background the AuM model of transducers with reversible regular relations, are independent of Sanskrit, and thus a distinct Ocaml library was developed for Sanskrit specific processing. During the years 2005-2009, Zen was developed further as a vehicle for a general paradigm of relational programming using effective Eilenberg machines. This constituted the thesis research of Benoît Razet. From 2006 onwards, an international cooperation on Sanskrit Computational Linguistics was started between INRIA and the Sanskrit Studies Department at the University of Hyderabad, headed by Pr. Amba Kulkarni. The goal of this joint team is to develop inter-operable packages for the processing of Sanskrit corpus, as communicating Web services. In 2011-2012, Pawan Goyal spent a post-doctoral year working with Gérard Huet on the Sanskrit Platform. He contributed many new modules: a regression analysis facility, a hypertext version of the Monier-Williams Dictionary interfaced with the platform's grammatical tools, and a user-aid facility allowing an annotator to augment on the fly the generative lexicon for substantive stems missing from the Heritage Dictionary, but listed in Monier-Williams. In 2013, Pawan Goyal and Gérard Huet designed and implemented an innovative new interface, sharing efficiently the segmentation solutions of the platform reader, which may run by billions. This allowed a much more productive use of the platform, that may now accommodate long sentences. In 2017 A rehaul of the global architecture was achieved, separating the production of linguistic data resources issued from the generating lexicon and the platform Web services operations proper. The two packages are available at git repositories, namely git@gitlab.inria.fr/huet/Heritage_Resources.git git@gitlab.inria.fr/huet/Heritage_Platform.git In summer 2017 Idir Lankri spent his Master internship implementing a Corpus Manager prototype, which is used by Gérard Huet to link citations of the Heritage Dictionary to selected readings stored in the corpus. May 21st 2018

Gérard Huet
authored
Name | Last commit | Last update |
---|---|---|
DOC | ||
IMAGES | ||
JAVASCRIPT | ||
ML | ||
SETUP | ||
SITE | ||
.gitignore | ||
.merlin | ||
INSTALLATION | ||
README | ||
configure |