<!doctype html>
<html>
<head>
<meta charset="utf-8">

<title>The Sanskrit Heritage Engine Reference Manual</title>
<meta name="author" content="G&#233;rard Huet">
<meta property="dc:datecopyrighted" content="2018">
<meta property="dc:rightsholder" content="G&#233;rard Huet">
<meta name ="keywords" content="india,dictionary,indology,sanskrit,lexicography,linguistics,indo-european,dictionnaire,sanscrit,panini,indology,linguistics">
<meta name="description" content="This is an online manual to the Sanskrit
Heritage Engine tools.">

<link rel="shortcut icon" href="IMAGES/favicon.ico">
<link rel="stylesheet" type="text/css" href="DICO/style.css" media="screen">
</head>

<body class="pink_back"> <!-- Pale_rose --> 
<table class="body">
<tr><td>

<h1 class="title"> The Sanskrit Heritage Engine Reference Manual</h1>

<!--
<div class="center">
<img src="IMAGES/jaganyantra.png" alt="Jagannath in Lotus Yantra">
</div> -->

<h2 class="b2" id="history">About the Sanskrit Heritage Site</h2>

The Sanskrit Heritage website, at URL
<a href="http://sanskrit.inria.fr"><strong>sanskrit.inria.fr</strong></a>,
provides tools for the processing of the Sanskrit language.
<p>
This site has offered public access to various Web services and Sanskrit lexicons
since 2003. It provides dictionary search, declension/conjugation, stemming,
and segmentation/tagging/parsing of Sanskrit sentences.
The site started as a set of tools to exploit a digital version of the
Sanskrit Heritage Dictionary, which had been developed as a personal
independent project by G&eacute;rard Huet since 1996 as a
Sanskrit-French dictionary intended as a small encyclopedia of Indian culture.
These tools use the finite-state methods implemented in the ZEN
Objective Caml library to provide efficient lexicon representation,
morphology generation, and segmentation by sandhi recognition.
This technology was published in 2005 as
<a href="http://pauillac.inria.fr/~huet/PUBLIC/tagger.pdf">A Functional Toolkit
for Morphological and Phonological Processing, Application to a Sanskrit
Tagger</a>. A graphical interface, designed jointly with Pawan Goyal, has been
published recently as
<a href="http://jlm.ipipan.waw.pl/index.php/JLM/article/view/108/140">Design and
analysis of a lean interface for Sanskrit corpus annotation</a>.
<p>
Written on September 9th 2018, for Sanskrit Engine Version 3.09.

<h2 class="b2" id="tour">First approach to using the Sanskrit Heritage engine</h2>

The following scenario may be played remotely, if you are connected to
the Internet with a Web browser.
Visit URL
<a href="http://sanskrit.inria.fr" target="_blank">sanskrit.inria.fr</a> to
go to the standard Inria Sanskrit Heritage server.
The same scenario may be played locally, if your workstation
is equipped with its own HTTP server, and if you install the Sanskrit Heritage
Engine software. This is explained below in the section
<a href="#installation">How to install the Heritage Engine on your own server</a>.
<p>
What you are seeing on the entry page is a somewhat ancient-looking Web document
in the HTML style of the 90's. Don't be put off by the look-and-feel,
but rather thank Inria for supporting this effort without throwing advertisements
at you.
<p>

The page has a green bar at the bottom, which is the navigation control panel.
Just click on the Reader link and you reach the Sanskrit Reader Companion page.
You may now enter a Sanskrit sentence candidate in the input window.
You may choose for this purpose a variety of input conventions
in the corresponding menu.
Let us assume you choose Velthuis transliteration proposed as default,
and input
"praaptavyam artha.m labhate manu.syo devo'pi ta.m lafghayitu.m na zakta.h".
After you press the Read button, you will see your input displayed in
devanāgarī script, followed by a graphical display with colored
rectangles labeled by word forms. Notice how <i>manu.syo</i> (resp. <i>devo</i>)
became <i>manuṣyaḥ</i> (resp. <i>devaḥ</i>) 
by sandhi analysis. Similarly <i>lafghayitu.m</i> became <i>laṅghayitum</i>.
<p>

The display actually represents all possible decompositions of your sentence
into padas (word forms),
aligned on your input represented in the blue line above.
Blue rectangles are <i>subantas</i> (adjectives and substantives), red rectangles
are <i>tiṅantas</i> (finite verbal forms).
Indeclinable words (adverbs and particles) are purple,
pronouns are sky blue, vocative forms are green.
<p>

When you click on a rectangle, its morphology is displayed.
For instance, clicking on the red <i>labhate</i> reveals that it is a 3rd person
singular form of the present of root <i>labh</i> in the middle voice
(<i>ātmanepadī</i>) of present class (<i>gaṇa</i>) 1.
Furthermore, underlined <i>labh</i> is a link to the lexicon,
which you may visit to check its meaning.
<p>

Here there are two scenarios. Had the Lexicon Access field been set to
Heritage in the reader input page, you would be directed to the Sanskrit-French
Heritage dictionary. If it had been set to Monier-Williams, you would be
directed to the Sanskrit-English Monier-Williams dictionary.
By default, you will get Monier-Williams access if you enter the site through its
<a href="http://sanskrit.inria.fr/index.en.html" target="_blank">English</a>
entry URL. Furthermore, this choice is sticky, which means it is persistent
throughout a session, until an explicit choice of lexicon is set. 
<p>
In both cases
you obtain the definition of root <i>labh</i>, decorated with its present
class index (<i>gaṇa</i>), here 1, in red.
This index is itself a link to the conjugation
service, that reveals all the conjugated forms of <i>labh</i>, as well as all its
participial stems. These stems, e.g. past passive participle <i>labdha</i>, are
listed as gendered stems, the gender marks being links to the declension
service. This exhibits the generative nature of the lexicon: all the forms
obtainable from a root, either as finite conjugated forms, or as declined
first-level nominals (<i>kṛdanta</i>), are the building blocks of our analyser. 
<p>

Similarly, if you click on the blue <i>artham</i> form in the graphical
display, you will get its lemma as singular accusative of masculine stem
<i>artha</i>. This stem itself is a link leading to the corresponding lexicon
entry <i>artha</i>,
decorated by active gender marks. If you click on blue <i>prāptavyam</i>,
however,
you see a more complex morphological decomposition, informing you that
it is a form of the kṛdanta (primary nominal stem) <i>prāptavya</i>,
obtained by prefixing the preposition <i>pra-</i> to
the 3rd formation (in <i>-tavya</i>)
of the passive future participle (gerundive) <i>āptavya</i> of root <i>āp</i>.
Please note how the root is linked to its lexical access, from which the
stem <i>āptavya</i> and form <i>āptavyam</i> may be derived using the
conjugation cum declension tools.
<p>

Here we are lucky - the correct word analysis (<i>padapāṭha</i>) of the
sentence is obtainable as the sequence of all the words in the upper line
of the diagram. Some have no competitor, they are checked blue.
The remaining ambiguous segments have two marks, a green check sign
and a red cross sign.
In two clicks on the green upper signs you get the intended segmentation.
<p>

Now let us return to the reader window, and remove all the blanks in your input:
"praaptavyamartha.mlabhatemanu.syodevo'pita.mlafghayitu.mnazakta.h".
The segmenter now returns more solutions, 48 instead of 6, and you see
unexpected new forms appear, such as <i>prāptavyamartham</i>,
whose stem happens to be lexicalized as the name <i>Prāptavyamartha</i> of
the young boy from Pañcatantra, blamed by his father for having bought
a book containing just one poem, starting precisely with our sentence.
Other forms such as <i>ude</i> or <i>evaḥ</i>
are just noise due to sandhi ambiguity. But the
correct segments appear here too in prominent places, and in 3 clicks the
correct solution is easily attained. Note how clicking on blue segment
<i>manuṣyaḥ</i> determines unambiguously the next one <i>devaḥ</i>.
<p>
If you click on a selection by mistake, it is easy to backtrack by clicking
on the Undo button of the page. Other command links on the same
line (Filtered Solutions, All 48 Solutions) should be ignored at this stage,
and will be explained in section <a href="#parser2">Shallow parsing</a> below.
<p>
Please note that the default Velthuis transliteration is just an option.
You may input devanāgarī script like
यतः कृष्णस्ततोधर्मः यतोधर्मस्ततोजयः by selecting the appropriate slot in
the "Input convention menu". Try this now by cut and paste from this document.
Similarly you may use the IAST standard Indic romanisation in Unicode,
like <i>yataḥ kṛṣṇastatodharmaḥ yatodharmastatojayaḥ</i>.  

<h2 class="b2" id="grammar">Morphological tools</h2>

<h3 class="b3">Grammar</h3>
The Sanskrit Grammarian, accessed from link Grammar in the green control bar,
gives you declined forms of nouns and conjugated
forms of root verbs. It is the workhorse of morphological derivation.
For nouns (under Declension heading) you must provide the base stem,
and its intended gender.
For verbs (under Conjugation heading), 
you must provide the root and its present class. The resulting table of
inflected forms is displayed either in Roman with diacritics (IAST),
or in devanāgarī text, according to your choice in the
Output font buttons.
<p>

The Declension tool accepts 4 gender parameters:
Mas for masculine, Neu for Neuter, Fem for Feminine,
and a final All that is to be used for deictic personal pronouns,
and for numbers. 
<p>

The Conjugation tool accepts 12 Present class parameters: 1 to 10 are used
for the traditional quality (gaṇa). 11 is used for denominative verbs.
Finally 0 gives the secondary conjugations: causative, intensive, and
desiderative.
Please note that in the Roman output the first person appears first,
whereas in the devanāgarī output the third person appears first
(<i>prathama</i>), consistently with vyākaraṇa tradition.
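<p>
As an illustration only, the parameter conventions just described can be restated in a few lines of Python. The names "GENDERS" and "describe_class" are hypothetical and are not part of the Heritage software; the sketch merely encodes the values listed in this manual.

```python
# Hypothetical helper, not part of the Heritage Engine: it only restates
# the parameter conventions described in this manual.
GENDERS = ("Mas", "Neu", "Fem", "All")  # Declension gender parameters


def describe_class(present_class):
    """Describe a Conjugation 'Present class' parameter (0 to 11)."""
    if present_class == 0:
        return "secondary conjugations (causative, intensive, desiderative)"
    if 1 <= present_class <= 10:
        return f"gaṇa {present_class}"
    if present_class == 11:
        return "denominative"
    raise ValueError(f"unknown present class: {present_class}")
```

For instance, "describe_class(11)" returns "denominative", while any value outside 0 to 11 raises an error.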
<p>

Homonyms are addressed using homonymy indexes, as in <i>kara#1</i> and
<i>kara#2</i>. In case of doubt, access the tool from the intended entry in
the lexicon.
If you do not specify the index, the system will make an educated guess of
the intended homonym. For instance, if you ask for the conjugation of
root <i>mā</i> in class 1, the system will propose the forms of <i>mā_4</i>;
in class 2 or 3 it will propose <i>mā_1</i>. But if you intend <i>mā_3</i>
of class 3 you must address it explicitly as <i>maa#3</i>.
If you enter random stems and parameters, you will get arbitrary nonsense,
according to the principle "garbage-in garbage-out". Thus if you ask for the
declension of stem <i>blablabla</i> in the masculine you will get 
nonsensical forms such as ablative <i>blablablāt</i>.
But at least you are warned by the
system, that indicates its doubt by labeling the declension table as
<i>blablabla?</i> If you ask for its forms in the feminine,
you will get a Gender anomaly report.
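<p>
The "stem#index" notation for homonyms is simple to take apart. The following Python sketch shows one possible way to do so; the function name "split_homonym" is an assumption made for this example, and the sketch says nothing about how the engine itself guesses a missing index.

```python
def split_homonym(entry):
    """Split an input such as 'kara#2' into (stem, index).

    The index is None when no '#' is present, in which case the system
    described in this manual would make its own guess.
    """
    stem, sep, index = entry.partition("#")
    if not sep:
        return stem, None
    if not index.isdigit():
        raise ValueError(f"homonymy index must be a number: {entry!r}")
    return stem, int(index)
```

With this helper, "maa#3" yields the pair ("maa", 3), and a bare "labh" yields ("labh", None).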
<p>

This morphological engine is available from within the dictionary pages,
where the gender indications of nouns, and the present family indications
of roots, are active links which activate the Sanskrit Grammarian with the
right parameters. 

<h3 class="b3" id="stemmer">Stemmer</h3>
Conversely, an inflected form which is derivable from the dictionary entries
is retrievable, with its morphological taggings, from the Stemmer,
also accessible from the green control bar.
<p>

The user must specify the lexical category in which to search for the word.
Available categories are Noun, for nominal and adjectival forms,
Pron for pronominal forms, Verb for finite root forms, Part for participial
forms as primary derivatives from roots, Inde for indeclinable forms (adverbs,
particles, infinitive forms, root absolutives), Absya for absolutive forms
in <i>-ya</i> (usable with preverbs prefixing), Abstvaa for absolutive forms
of roots in <i>-tvaa</i>, Voca for vocative forms,
Iic for stems usable as left component of a compound, Ifc for right
components of compounds, Iiv for inchoative forms in <i></i> 
usable to form compound verbal forms with auxiliaries
(the <i>cvi</i> construction), Piic for participial stems.
<p>

For instance, forms usable only <i>in fine compositi</i> such as <i>kāraḥ</i>
are to be found in the Ifc bank. 
There is some redundancy between the Noun and the Part banks.
Thus a word form such as <i>gataḥ</i> may be found in Noun,
tagged as { nom. sg. m. }[gata], as well as in Part,
tagged as { nom. sg. m. }[gata { pp. }[gam]]. Such lemmatisations are
linked to the lexicon by stems (here <i>gata</i>) as well as by roots
(here <i>gam</i>).
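<p>
The bracketed lemma notation shown above is regular enough to be read mechanically. The sketch below is a hypothetical Python reader for that notation as it is printed in this manual; it is not the format of the XML resources, and the dictionary keys "tags", "stem" and "origin" are names chosen for this example.

```python
import re

# One lemma: "{ tag | tag ... }[stem optional-nested-lemma]"
_LEMMA = re.compile(r"\{\s*(?P<tags>[^{}]*?)\s*\}\[(?P<body>.*)\]\s*$", re.S)


def parse_lemma(text):
    """Parse a lemma such as '{ nom. sg. m. }[gata { pp. }[gam]]'."""
    match = _LEMMA.match(text.strip())
    if match is None:
        raise ValueError(f"not a lemma: {text!r}")
    # Alternative analyses inside the braces are separated by '|'.
    tags = [tag.strip() for tag in match.group("tags").split("|")]
    # The body is a stem, optionally followed by the lemma it derives from.
    stem, _, rest = match.group("body").strip().partition(" ")
    origin = parse_lemma(rest) if rest.strip() else None
    return {"tags": tags, "stem": stem, "origin": origin}
```

Applied to the Part analysis of <i>gataḥ</i>, the outer stem is "gata" and the nested origin has stem "gam" with the single tag "pp.".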
<p>

These linguistic resources are freely provided
in XML form under various transliteration schemes.
Please visit the <a href="xml.html">Sanskrit linguistic resources page</a>.

<h2 class="b2" id="dictionary">The Sanskrit Heritage Dictionary</h2>
<p>
The Sanskrit Heritage Dictionary is the latest edition of a Sanskrit
to French Dictionary
"Dictionnaire Français de l'H&eacute;ritage Sanskrit" compiled by
G&eacute;rard Huet since 1994. This dictionary is freely available
as a 945-page <a href="Heritage.pdf">book</a> in PDF format,
easily readable with Acrobat Reader, a free Adobe product.  
This dictionary is still under development, and is
automatically updated along with the site,
being now a computer-generated by-product of the lexical database 
of the platform.
<p>

This dictionary is the base for morphology generation used by the grammatical
tools. It may be used also as a small encyclopedia of Indian culture. 
The Sanskrit name that renders best our encyclopedic intention is 
<i>saṃskṛtibhāratīyakośa</i> -
Treasure of India according to Perfected tradition. 
Knowledge in this tradition is traditionally transmitted by lineages of teachers
(<i>paraṃparā</i>). Some of this knowledge is available to the West through
Indological literature, but often in desiccated form. Many sources were used
to compile this information, and inevitable mistakes and inconsistencies
occur, not to speak of glaring omissions. We pray the reader who knows
better to signal such shortcomings to us.
<p>
Perfected means Sanskrit or Sanskritized.
Thus usual names in vernacular [prak&#7771;ta] or p&#257;li are generally
given in their original Sanskrit form. Dravidian names are sometimes adapted
to Sanskrit as an approximate phonetic rendition, but our lexicon is too limited
to account for dravidian traditions, not to speak of tribal ones.
In any case, this modest dictionary ought not to be considered as a
scholarly erudite document, but rather as a simplified presentation
of Indian culture for the educated public. 
<p>
Entries in the dictionary are arranged by vocables, which may be verbs or nouns.
Verbs comprise verbal roots, but also their variations with prefix sequences of
preverb particles, and secondary stems for causatives, intensives and
desideratives.
Nouns comprise noun roots, primary noun derivatives from verbs, secondary noun
derivatives by suffixes from primary ones, and compounds.
The first two categories are individual entries
at toplevel, the others are sub-entries of a parent vocable, or sub-sub-entries.
Adjectives are just semantic roles of nominals. Pronouns and numbers are
subclasses of nouns. Indeclinable forms (adverbs) and tool particles such as
conjunctions complete the lexical categories.
Some idiomatic expressions and a few selected
citations are listed at the end of entries at any level.
<p>
The list of abbreviations, of the Heritage dictionary as well as the grammatical
engine, is available 
as a standalone <a href="abrevs.pdf"><strong>pdf document</strong></a>. 
<p>
Two index engines are provided.
The main <a href="DICO/index.html"><strong>index</strong></a>
requires exactly transliterated input, possibly an initial
prefix of an existing entry, possibly some inflected form of a declined noun
or a conjugated verb.
The <a href="DICO/index.html#easy"><strong>Sanskrit made easy</strong></a>
index requires a romanized input for a full word, without diacritics and
aspiration marks, for easy access to words like Siva, Vishnou, Panini,
Sankara, etc. 
<p>
The user who opts for
<a href="DICO/index.en.html"><strong>Monier-Williams access</strong></a>
will have the benefit of seeing
definitions in English if he does not know French, while having access to
the grammatical online tools in the same way. However proper names are not
properly glossed as hyper-linked entities. Furthermore, the index tool
is not as smart as the Heritage one, since you have to give the exact
stem of the entry. Thus e.g. <i>devanāgarī</i> must be entered in full,
while the initial prefix <i>devanāg</i> suffices for its disambiguation
by the Heritage index.
<p>
The Sanskrit Heritage dictionary is also available in an ebook format,
usable with the Babylon, Stardict or Goldendict software.
Please visit the <a href="goldendict.html">Golden Sanskrit Heritage</a> page.

<h2 class="b2" id="engine">The Sanskrit Engine </h2>

The Sanskrit Engine consists of a number of tools accessible online on the
Sanskrit Heritage site. These various tools are available through interfaces
easily reached from the green band at the bottom of your browser panel. 

<h3 class="b3" id="sandhi">Sandhi</h3>
The Sandhi Engine takes two phoneme streams (input as transliterated strings)
and gives as result their sandhi euphonic composition. There are two modes,
external for glueing together words in a sentence, as well as making
nominal compounds,
and internal, for appending of affixes to stems in morphological derivations.
We provide a deterministic answer, that is, a choice is made when optional
forms are admitted. This output does not preclude obtaining different
forms using an optional rule.
A fuller non-functional sandhi relation is used by the
segmenter, in order to recognize the optional variants in conformity
with Pāṇini. 
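<p>
To give a feel for what a deterministic external sandhi function looks like, here is a toy Python sketch. It is not the Sandhi Engine: it works on IAST strings rather than phoneme streams, covers only three rules (final <i>-aḥ</i> before a voiced consonant, final <i>-aḥ</i> before short <i>a</i>, and final <i>-m</i> before a consonant), and all names in it are assumptions made for this example.

```python
import unicodedata


def nfc(text):
    """Normalize to composed Unicode so each letter is one code point."""
    return unicodedata.normalize("NFC", text)


VOWELS = set(nfc("aāiīuūṛṝḷeo"))
# First letters of voiced consonants (gh, jh, dh... start with g, j, d...).
VOICED = set(nfc("gjḍdbṅñṇnmyrlvh"))


def join_pair(left, right):
    """Glue two padas with a tiny subset of external sandhi rules."""
    left, right = nfc(left), nfc(right)
    first = right[0]
    if left.endswith(nfc("aḥ")):
        if first == "a" and not right.startswith(("ai", "au")):
            return left[:-2] + "o'" + right[1:]   # devaḥ + api -> devo'pi
        if first in VOICED:
            return left[:-2] + "o " + right       # manuṣyaḥ + devaḥ
    if left.endswith("m") and first not in VOWELS:
        return left[:-1] + nfc("ṃ") + " " + right  # artham + labhate
    return left + " " + right                      # no rule applies


def sandhi_sentence(padas):
    """Fold join_pair over a padapāṭha, always joining at the last word."""
    text = nfc(padas[0])
    for right in padas[1:]:
        head, _, last = text.rpartition(" ")
        joined = join_pair(last, right)
        text = head + " " + joined if head else joined
    return text
```

Applied to the padapāṭha of the first example of this manual (<i>prāptavyam artham labhate manuṣyaḥ devaḥ api tam laṅghayitum na śaktaḥ</i>), these three rules are enough to rebuild the sandhied input. The segmenter has the harder task of inverting such rules, which is why one input may admit many decompositions.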

<h3 class="b3" id="reader">Reader </h3>
The Sanskrit Reader Companion allows the analysis of Sanskrit sentences.
We already saw an example of its use in the graphical Summary mode.
Let us now examine the nature of its parameters.
<p>

The parameter "Lexicon Access" chooses the look-up dictionary. This parameter
is persistent within a session. On the standard server it is set by default
to Sanskrit Heritage, but if you are an English speaker you may want to set it
to Monier-Williams, by accessing the
<a href="http://sanskrit.inria.fr/index.en.html">English</a> entry URL.
If you install the tools on your own server, you will set such default
parameters at configuration time.
<p>

You should be aware that the choice of the look-up dictionary is of no
consequence to the reader tools, since the morphology generation lexicon
is Sanskrit Heritage. Thus the forms of certain stems in Monier-Williams
may not be recognized (however, see
<a href="#user-aid">user-aid</a> below for their acquisition).
Conversely, the richer generation of participles allows the recognition
of many forms, whose stems are not lexicalized in Monier-Williams.
The covering of Heritage within Monier-Williams is indicated explicitly
since entries lexicalized in Heritage are rendered highlighted in yellow in 
the Monier-Williams pages.
<p>

The parameter "Cache" is for advanced use, explained below in
<a href="#user-aid">user-aid</a>.
<p>

The parameter "Text" is set by default to Sentence, and may be set to Word
if you want to recognize a single pada.
For instance, if you parse the following compound (taken from Pañcatantra):
"pravaran.rpamuku.tama.nimariicima~njariicayacarcitacara.nayugala.h"
in Sentence mode, you will be offered 32 solutions, but only 6 solutions
in Word mode. 
<p>

The next parameter "Format" is a toggle between reading sandhied text and reading
text which has already been analysed in words (padapāṭha).
Thus the sentence "si.mhovyaakara.nasyakarturaharatpraa.naanmune.hpaa.nine.h"
may be parsed in sandhied mode (yielding 58 potential solutions),
or may be presented in padapāṭha form as
"si.mha.h vyaakara.nasya kartu.h aharat praa.naat mune.h paa.nine.h"
(yielding only 24 solutions).
<p>

The parameter "Parser strength" is by default set at "Full". It may be
set to "Simple", meaning that no generation of participial stems and
privative compounds is effected, all stems must be lexicalized. Simple mode
segmentation should be reserved to small sentences explained to learners.
<p>

The "Input convention" parameter allows a number of formats. Transliteration
using ASCII characters is possible in 4 varieties: Velthuis, WX (University of
Hyderabad), KH (Kyoto-Harvard), and SLP1 (Sanskrit Library). These various
conventions are presented in a 
<a href="DOC/transliterations.pdf">synthetic document</a>.
Thus <i>vaiśeṣikaḥ</i> may be input as <i>vaize.sika.h</i> in the default Velthuis scheme,
as <i>vaizeSikaH</i> in the Kyoto-Harvard scheme, as <i>vESeRikaH</i> in the WX scheme, or as <i>vESezikaH</i> in the SLP1 scheme. 
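<p>
For readers who want to check their own input, the following Python sketch converts the Velthuis-style spellings used in the examples of this manual into IAST. The mapping table is inferred from those examples only (for instance <i>z</i> for <i>ś</i> and <i>f</i> for <i>ṅ</i>); it is an assumption of this sketch, not the engine's full transliteration table, which is given in the synthetic document cited above.

```python
# Digraphs and letters inferred from the examples in this manual.
VELTHUIS_TO_IAST = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".r": "ṛ", ".m": "ṃ", ".h": "ḥ",
    ".t": "ṭ", ".d": "ḍ", ".n": "ṇ", ".s": "ṣ",
    "~n": "ñ", "z": "ś", "f": "ṅ",
}


def velthuis_to_iast(text):
    """Convert left to right, trying two-character codes before one."""
    out = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in VELTHUIS_TO_IAST:
            out.append(VELTHUIS_TO_IAST[pair])
            i += 2
        else:
            # Single letters are either remapped (z, f) or copied as-is.
            out.append(VELTHUIS_TO_IAST.get(text[i], text[i]))
            i += 1
    return "".join(out)
```

Under these assumptions, "vaize.sika.h" is rendered as vaiśeṣikaḥ. Note that the conversion is purely graphical: "lafghayitu.m" comes out with a final anusvāra, and it is the sandhi analysis of the Reader that restores the form <i>laṅghayitum</i>.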
<p>

In addition, Unicode input
may be used, both for devanāgarī and for the IAST romanisation with diacritics,
the Indology standard. Thus one may input directly वैशेषिकः or vaiśeṣikaḥ.
<p>

The "Optional topic" parameter is used in Parser mode to indicate a contextual
topic usable as ellipsed agent. This is an experimental feature.
<p>

Finally, the "Mode" parameter offers several modes of operation of the Engine.
We saw the default Summary mode. Other modes are provided to display
all solutions sequentially. These modes are mostly deprecated, since they
produce enormous pages when there are many solutions. It is possible to access
these modes from the graphical Summary mode, when there remain only a
few solutions.

<h3 class="b3" id="parser1">Shallow parsing, a first approach</h3>

Let us call the Reader with a simple sentence such as:
<i>vana.mgatvaadhyaana.mkaroti</i> (in Velthuis).
The summary interface returns a page
showing you your input sentence in blue devanāgarī, then a line with a number
of green check marks, the first one being labeled Undo, then the graphical
display where segments may be selected or discarded, as explained above.
The third button is labeled "All 14 Solutions". It indicates that there is a
total of 14 segmentation solutions at this initial stage. Indeed, when you
select segments by clicking on their green check signs (resp. discard
them by clicking on their red cross signs) you see the count of solutions
decrease. Thus selecting segment <i>dhyānam</i> brings this count to only 2.

The only remaining choice is to select or reject the yellow <i>a</i>
segment indicating a possible privative compound. We remark <i>en passant</i>
that the only way to get the correct intended interpretation is to discard
this parasitic privative segment, which is entirely absorbed by sandhi
with the final <i>ā</i> of segment <i>gatvā</i>, and thus cannot be rejected
from other segment selections. This shows the necessity of the red cross signs.
<p>

From the state after selection of <i>dhyānam</i>, click on the green check
labeled "All 2 Solutions". You see the two potential solutions listed one
after the other, with no sharing of common parts. Each segment is lemmatized
with hyperlinks to the lexicon. Segments are separated by sandhi annotations.
In this linear interface, it is possible to select solutions by clicking
on the green check after the index of the solution. For instance,
let us select Solution 1. 
<p>

We are facing now another user interface called the Sanskrit Parser Assistant.
The selected solution is displayed in 3 columns. The 1st column, yellow, is
the <i>padapāṭha</i>. The 2nd column displays possible stemmings as a sequence of
morphological multitags. For instance, on the first row, <i>vanam</i>
is analysed as:
{ acc. sg. n. | nom. sg. n. }[vana], where stem <i>vana</i> is hyperlinked
to the lexicon. Each selection in the multitag is equipped with a selecting
button, preset to a default value. Here you may choose the case accusative
(pre-selected) or nominative of word form <i>vanam</i>. 
<p>

The right column attempts to represent the cases by semantic roles,
with occasional English gloss of verbal forms. This representation is an
approximation which is actually slightly misleading,
since it attempts to relate nominatives to an
evasive syntactic notion of Subject which is of little relevance to Sanskrit.
Actually nominative
forms denote just names of the "unexpressed" (<i>anabhihita</i>) semantic role.
Thus at best this representation is some approximation of syntax. 
<p>

Below the three columns you find a button labeled Submit. You may press it
to validate the morphological choices, and the resulting page gives you
a unique parse as a hypertext <i>padapāṭha</i> which you may save in user space.
<!-- Deprecated
Actually, if you install the software on your own workstation, by configuring it
in mode Station you will have in the page a Validate button
usable to record this solution in a non-regression suite. -->

<p>

Returning to the Parser Assistant page, you will notice some cryptic
notations attempting to assign penalties to morphological choices. This is
another way to make morphological selections, ranked by decreasing penalty.
Each selection is marked with a mouse sensitive heart symbol, which effects
its commitment. We shall return to this in the next section.

<h3 class="b3" id="parser2">Shallow parsing, advanced</h3>

Let us return to the original state of our interface interaction, after
entering <i>vana.mgatvaadhyaana.mkaroti</i>. We notice in the first menu line
a button labeled "Filtered Solutions". If you click on it, you see a listing
of solutions similar to what we saw in the last section, but now
solutions are listed according to some constraint satisfaction ranking.
The first one (labeled 2) is the intended one, followed by another one
(labeled 5), proposing <i>ādhyānam</i> in place of <i>dhyānam</i>.
You may select the one you prefer, and go directly to the Parser
Assistant page as seen above. You may also go back to the graphical summary
interface, allowing mutual interaction between the two modes of operation.
<p>

Let us illustrate this shallow parsing facility on a much-discussed
ambiguous sentence going back to Patañjali. 
Go back to the Reader interface, and enter
<i>zvetodhaavati</i>, using Summary Mode. You see a display of the
26 segmentation solutions. You are also offered a green check sign labeled
Filtered Solutions. Click on it.
You see one particular solution, labeled 14, formed with blue <i>śvetaḥ</i>
in the nominative, followed by red <i>dhāvati</i>, a verbal form in the present.
Actually form <i>dhāvati</i> is marked as ambiguous, since it may result from
root <i>dhāv_1</i> (running) or from root <i>dhāv_2</i> (cleaning).
<p>

Clicking on the green check sign brings you to the Sanskrit Parser Assistant
page. The interpretation "It runs" is pre-marked, favoring root <i>dhāv_1</i>.
Lower in the page, you see indeed that this interpretation incurs no
penalty. Clicking on the green heart sign, or equivalently on the preset Submit
button brings you to the fully disambiguated padapāṭha
"The white one is running".
The other segmentation has some penalty, explained with the "-Obj" indication,
marking the absence of the object to a transitive verb.
<p>

In this example, the machine has succeeded in focusing on a correct solution
automatically, among many interpretations.
If we come back to the initial selection, it indeed tells
"1 solution kept among 26",
but actually lists also 5 other plausible additional solutions. Indeed, among
them, Solution 23 gives another correct decomposition <i>śvā+itaḥ+dhāvati</i>
"The dog is running towards here". Here too, the tool analyses <i>dhāv_1</i>
as fitting the grammatical constraints. It has penalty 0 as well, but
was just disfavored over the first interpretation because it has 3 segments
rather than 2, exhibiting a "shortest length bias" heuristic. 
<p>

This shallow parser cannot be used on large input sentences, since its output
could become enormous to the point of choking the server. Thus we have its
access link "Filtered Solutions" appear only when the number of remaining
segmentation candidates is below a threshold set by default to 100.
This is in contrast with the situation with the graphical interface, which
is fast and robust. Thus entering the following verse from Kālidāsa, we obtain
very quickly a display factorizing an astronomical number
(24873394117017600) of solutions:
<i>yaa tapovize.saparizafkitasya sukumaaramprahara.nam mahendrasyapratyaadeza.h ruupagarvitaayaa.h zriya.h ala.mkaara.h svargasyasaana.h priyasakhyurvazii kuberabhavanaat pratinivartamaanaasamaapattid.r.s.tena kezinaadaanavenacitralekhaadvitiiyaa bandigraaha.mg.rhiitaa</i>.
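<p>
To see why a factorized display stays small while the number of solutions
explodes, one may think of the segments as arcs between positions of the
input: the solutions are the paths from the first to the last position, and
they can be counted without being enumerated. The sketch below makes strong
simplifying assumptions (segments are plain pairs of positions, sandhi is
ignored) and the names <i>count_solutions</i> and <i>FILTER_THRESHOLD</i>
are ours; only the default value 100 comes from the text above.

```python
def count_solutions(length, segments):
    """Count the segment paths leading from position 0 to position `length`.

    `segments` is a list of (start, end) pairs with end greater than start;
    two segments covering the same span count as distinct alternatives.
    """
    ways = [0] * (length + 1)
    ways[0] = 1
    for position in range(length):
        for start, end in segments:
            if start == position:
                ways[end] += ways[position]
    return ways[length]


# Three consecutive chunks with 2, 3 and 4 alternative analyses each.
lattice = [(0, 1)] * 2 + [(1, 2)] * 3 + [(2, 3)] * 4
total = count_solutions(3, lattice)
print(total)  # 2 * 3 * 4 = 24

FILTER_THRESHOLD = 100  # default value quoted in the text
# Reading "below" as a strict comparison is an assumption.
print(FILTER_THRESHOLD > total)
```

<p>
Nine arcs suffice here to represent 24 solutions; with many chunks the product
grows exponentially while the display grows only linearly.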

<h3 class="b3" id="parser3">Lexical categories</h3>

The main lexical categories exhibited so far are:<br>
* substantive/adjective forms (blue)<br>
* vocative forms (green)<br>
* finite verbal forms (red)<br>
* undeclinable forms such as adverbs, conjunctions, prepositions (mauve)<br>
* pronominal forms (light blue)<br>
* left part of compounds (yellow)<br>
<p>
Actually, complex compounds with n+1 components appear as a sequence of n
yellow segments denoting stems, followed by a blue nominal inflected form.
For instance, enter in the Reader the following input (Velthuis)
<i>pravaran.rpamuku.tama.nimariicima~njariicayacarcitacara.nayugala.h</i>
or प्रवरनृपमुकुटमणिमरीचिमञ्जरीचयचर्चितचरणयुगलः (Devanagari). The returned display
exhibits 32 solutions in Sentence mode. But if you select Word rather than
Sentence as Text mode parameter, you get only 6 solutions, which denote various
nominal compounds with many components. Actually, there are remaining
ambiguities concerning the bracketing of their constituents. Let us examine
a few typical situations. <p>

First of all, some of the constituents may constitute a <i>dvandva</i>
compound. For instance, consider <i>yakṣagandharvanāgāḥ</i>.
It is a <i>dvandva</i> compound with 3 components <i>yakṣa</i>,
<i>gandharva</i>, and <i>nāgāḥ</i>. The first two are bare stems, only
the third one bears declension (<i>vibhakti</i>). Note that actually we
distinguish two cases of the last segment: a blue one for nominative,
and a green one for vocative. This distinction between vocatives
and other cases is important, since vocatives are not really syntactic
components of a sentence, but rather separate interjections, part of the
communicative structure.
<p>

Let us now consider binary branching compounds. A three component display
A-B-C may actually represent the compounding structure (A-B)-C
(for instance <i>viśvarūpadarśanam</i>) or (less commonly) the structure A-(B-C)
(for instance <i>ubhayacakravartī</i>). Thus long compounds are represented
in ambiguous ways, since the mechanical reader does not know how to choose
between them on the sole basis of grammatical dependencies.
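<p>
The size of this ambiguity is easy to quantify if one only considers binary
branching: the flat display of a compound with n+1 components stands for as
many bracketings as the n-th Catalan number (2 for three components, 5 for
four, 4862 for ten). The following sketch enumerates them. The function
<i>bracketings</i> is a hypothetical helper written for this manual, it is
not part of the Reader, and it ignores n-ary <i>dvandva</i> groupings.

```python
def bracketings(parts):
    """Return all binary bracketings of a sequence of compound components."""
    if len(parts) == 1:
        return [parts[0]]
    results = []
    for split in range(1, len(parts)):
        for left in bracketings(parts[:split]):
            for right in bracketings(parts[split:]):
                results.append(f"({left}-{right})")
    return results


for reading in bracketings(["viśva", "rūpa", "darśanam"]):
    print(reading)

print(len(bracketings(list("ABCD"))))
```

<p>
The Reader returns the flat sequence of segments only; choosing among these
bracketings is left to the human reader.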
<p>

Now consider the compound stem <i>pitāmbara</i>. It may denote a determinative
compound (<i>tatpuruṣa</i>), meaning "yellow garment", of neuter gender
inherited from its component <i>ambara</i>. Or it may denote an exocentric
compound (<i>bahuvrīhi</i>), of adjectival meaning "who wears a yellow garment".
Thus, on input <i>pitāmbaram</i>, in mode Word, we have two solutions,
sharing the yellow initial component <i>pitā</i>. The first solution proposes
a blue neuter nominal segment <i>ambaram</i>, analysed as accusative or
nominative of stem <i>ambara</i>.
The second <i>ambaram</i> however is of a distinct cyan colour,
and is analysed as masculine accusative. This second solution is mandatorily
the exocentric compound "he who wears a yellow garment", typically an
epithet of Lord Viṣṇu. The cyan colour segment may not occur stand-alone,
it is mandatorily preceded by a yellow segment in order to form an
exocentric adjectival compound. But the first solution is ambiguous,
since it may be interpreted as a <i>tatpuruṣa</i> or as a <i>bahuvrīhi</i>.
This example ought to be thoroughly understood in order to learn how to
select the segments corresponding to the intended meaning.
<p>

There exists yet another variety of compound, the so-called <i>avyayībhāva</i>
"turned into undeclinable". Let us consider a typical example,
<i>nirmakṣikam</i> (without flies). Here this input is analysed as a sequence
of segments, first the preposition <i>nis</i>, colored lavender, and then the
stem <i>makṣikā</i>, turned into an invariable form <i>makṣikam</i>,
colored magenta. We remark that the segment <i>makṣikam</i> is not accepted
as stand-alone input. Please also note with this last example
that an unrecognized chunk of input yields a grey rectangle. 

<p>
Verbal compounds exist, such as the periphrastic perfect construction,
used for secondary conjugations and denominative verbs. It builds
a special stem in <i>-ām</i>, suffixed by a perfect form of
one of the auxiliaries <i>kṛ</i>, <i>as</i> and <i>bhū</i>.
Try for instance <i>āmantrayāṃcakre</i>. You see the periphrastic form
displayed as two segments, an orange <i>āmantrayām</i>, and the red
<i>cakre</i> of the perfect of root <i>kṛ</i>: "he/I summoned". The orange
and red segments are mutually linked: selecting one automatically selects
the other.
<p>
Another periphrastic construction is the inchoative "cvi" verbal compound.
Its left part is a special substantival stem in <i>ī</i> or <i>ū</i>,
and its right part a finite verb form of one of the auxiliaries,
like <i>kadarthīkaroti</i> or <i>mṛdūbhavati</i>.
It in turn gives rise to primary derivatives (<i>kṛdanta</i>) like
<i>khilībhūtaḥ</i>. Here too the left part is orange, and the right part is
either red for verbal forms or blue for participial forms. 

<p>

This concludes the main grammatical paradigms implemented by our machinery.
Some more exotic constructions may occasionally be met, like
the special construction of forms of <i>kāma</i> or <i>manas</i>,
preceded by a special infinitive verbal form in <i>-tu</i>. Try for instance
<i>vaktukāmaḥ</i> ("who wants to speak"). Note that two blue segments
<i>kāmaḥ</i> appear in the result. One is used as a stand-alone nominal form
(if you select the red imperative form <i>vaktu</i>), whereas the other one
is necessarily used together with the salmon-colored special infinitive segment
<i>vaktu</i>. Similarly for <i>draṣṭumanāḥ</i> ("inclined to see").
<p>
The user of our machinery may be occasionally puzzled by what may appear as
redundancies. For instance, consider the input <i>mānam</i>. Two blue
apparently identical segments labeled <i>mānam</i> occur. However,
closer inspection (by clicking on these blue rectangles) reveals that one
is a form of <i>māna_1</i> (past participle of root <i>man</i>), and the other
one is a form of <i>māna_2</i> ("measure"). Although the two segments have
the same color, being both <i>subanta</i> nominal forms, they do not obey
the same combinatorics, since a participle (<i>kṛdanta</i>) stem like
<i>māna_1</i> is liable to be prefixed by the preverb particles
(<i>upasarga</i>) allowed for root <i>man</i>.

<p>
Another interesting example is <i>virodhitayā</i>. The two blue segments look
alike, and they are both instrumental singular forms of the feminine stem
<i>virodhitā</i>. But one is the past participle of the causative of
verb <i>vi-rudh</i>, the other is an abstract <i>taddhitānta</i> noun,
obtained as <i>virodhi(n)-tā</i>. Distinguishing the two is essential, since
they don't have the same dependency, the first one being an adjective
requiring a substantive as its qualificand.
<p>

In order to understand the segmenting algorithm, one should study its control
automaton. Here is the <a href="IMAGES/lexer10.jpg">simplified automaton</a>,
explaining the main constructions.
Words (<i>pada</i>) are recognized by paths going down from the starting state
S, and ending in the accepting state Accept. The link going upward from Accept
to S allows a sentence to be recognized as a sequence of words, sandhi being
effected on the arcs of the diagram. Please note the cycle through state Iic,
allowing the recognition of arbitrary length (flattened) compounds.
When this diagram is mastered, consider the
<a href="IMAGES/lexer17.jpg">extended automaton</a>, which adds vocatives,
cvi verbal compounds, and privative compounds. The bank Nounc (respectively
Nounv) is the subset of nominal forms Noun starting with a consonant
(respectively a vowel). Privative compounds are obtained by prefixing them
with <i>a-</i> (respectively <i>an-</i>).
Finally, consider the <a href="IMAGES/lexer40.jpg">complete automaton</a>.
The state Krid corresponds to first-level nominal constructions from roots,
notably participial forms. These may be preceded by preverbs (Pvk).
The state Priv stands for one of the two forms of the privative prefix a/an.
One must imagine partitioning all banks whose state follows Priv into
forms starting with a consonant or a vowel, similarly to the preceding diagram.
Finally, the state Neg stands also for the privative prefix a/an, prefixing
a root absolutive in <i>-tvā</i>, like <i>akṛtvā</i> (having not done). 
<p>
The results returned by our graphical interface may be thought of as describing
all paths following this state diagram, except that preverbs are glued
to the root and participial forms following them. 
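<p>
For readers who prefer code to diagrams, here is a toy rendition of the
simplest part of this control structure: the path from S to Accept, the
upward link from Accept back to S, and the cycle through Iic. It is a sketch
under assumptions and does not reproduce any of the three diagrams. Sandhi is
ignored, the input is already a sequence of category names, and the names
Verb, Pron and Indecl are hypothetical labels chosen for the illustration
(only S, Accept, Iic and Noun appear in the text above).

```python
# Toy transition table: (state, category of the next segment) -> next state.
TRANSITIONS = {
    ("S", "Noun"): "Accept",
    ("S", "Pron"): "Accept",
    ("S", "Verb"): "Accept",
    ("S", "Indecl"): "Accept",
    ("S", "Iic"): "Iic",
    ("Iic", "Iic"): "Iic",      # cycle: compounds of arbitrary length
    ("Iic", "Noun"): "Accept",  # a compound ends with an inflected nominal form
}


def accepts(categories):
    """Tell whether a sequence of segment categories forms a sentence."""
    state = "S"
    words = 0
    for category in categories:
        target = TRANSITIONS.get((state, category))
        if target is None:
            return False
        if target == "Accept":
            words += 1
            state = "S"         # upward link from Accept back to S
        else:
            state = target
    return state == "S" and words > 0


print(accepts(["Iic", "Iic", "Noun", "Verb"]))
print(accepts(["Iic", "Verb"]))
```

<p>
The first sequence is accepted as a three-component compound followed by a
verb; the second is rejected, since a compound stem must be followed by
another stem or by a nominal form.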

<h3 class="b3" id="parser3">Deep parsing</h3>

The Sanskrit Heritage Engine may also be used as segmentation front-end
for the dependency parser designed by Prof. Amba Kulkarni at the University
of Hyderabad. 
<p>
This allows the production of dependency graphs labeled with semantic
relations, and ranked by decreasing satisfaction of dependency constraints.
We shall not explain this facility further, since it is still under development
and not yet publicly released.

<h3 class="b3" id="user-aid">The user aid facility for lexicon acquisition</h3>

Our generating lexicon is not an extensive dictionary of Sanskrit.
Occasionally you will encounter forms that are not recognized.
For instance, assume you enter <i>patamaṃ śṛṇoti</i> in the Reader.
The result is a two segment solution attempt, where  <i>śṛṇoti</i> is
a red recognized verbal form of root <i>śru</i>, but <i>patamam</i>
appears as a grey unrecognized form. Its segment is available for selection with
a red spade symbol. If you click on it, you get to a help page labeled
"Feedback for Unknown Chunks". This page comprises 3 zones.
The first zone allows you to correct your sentence. The second one allows
you to correct the faulty chunk of text, here <i>patamam</i>.
<p>

The third zone is actually available only if you install the Heritage Engine
in Station mode on your own workstation; it is not available on the public
server. It is a facility that helps you recognize forms of stems that are
not lexicalized. It proposes various hypothetical lemmatizations for
the unrecognized stems. Among those, the ones that are lexicalized in the
Monier-Williams dictionary are underlined, indicating a lexicon link that you
may consult to verify whether its meaning is appropriate. Each lemmatization
is marked with a selection button. For instance, if you choose acc. sg. n.
of stem <i>patama</i> and press Submit Morphology, you are brought back
to the Sanskrit Segmenter Summary, where now segment <i>patamam</i> is blue.
This way you may progressively augment the recognized forms or correct faulty
input.
<p>

Furthermore, your choice has added the stem <i>patama</i> to a local
cached lexicon on your platform. Thus, if later you encounter the sentence
<i>patamāḥ śrūyante</i>, the segment <i>patamāḥ</i> will be
recognized as a bona fide form of stem <i>patama</i>. 
<p>

It may also happen that a chunk of text is successfully analysed, but none
of the segmentation solutions corresponds to the intended one, because of
some incompleteness in the lexicon. In this case, it is possible to invoke
the user aid by clicking anywhere in the chunk itself on its blue rendition
above the colored rectangles. This will allow the user to fill in the
right segmentation, if it is a nominal form obtainable as the inflected form
of a nominal item lexicalized in the Monier-Williams dictionary.
<p>

This facility is an objective reason to install
the Sanskrit Heritage Engine on your own workstation.

<p>
The lexicon cache is reset by command "make empty_caches" in the installation
directory. 

<h3 class="b3" id="user-aid">Fine-grained input considerations</h3>

We already discussed above the parameters Format and Input conventions.
In the "Sandhied" format, blanks are necessary only when there is an
actual hiatus in the devanāgarī representation. For instance, in 
<i>vanād grāmam adyopetyaudana āśvapatenāpāci</i>, only the third
blank space is mandatory. The others may be removed. They are just
help for the segmenter, in indicating pada boundaries. Of course, if you remove
them, the number of potential solutions may increase, since the system
will attempt analyses not respecting these word boundaries. The third space
above
is mandatory, and actually gives rise to two distinct segmentations, one with
the form <i>odanaḥ</i>, the other with the form <i>odane</i>.
<p>
Note that in the system's rendering, the mandatory space is indicated by
an underscore symbol. Indeed, the user may use underscore to mark the
necessary pauses, and thus the above example may be entered without any
space as <i>vanādgrāmamadyopetyaudana_āśvapatenāpāci</i>.
On the other hand, a blank may be inserted between letters even though
the separate chunks are not in final sandhi, like after <i>vanād</i> above,
or in <i>vanaṃ gacchati</i>. Thus Sandhied format with optional blanks
is completely different from Unsandhied format, where each chunk of input must
be a pada in final sandhi form, like in:
<i>vanāt grāmam adya upetya odanaḥ āśvapatena apāci</i>.
When entering a digitized corpus in our machinery, one must understand well
this distinction, and possibly restore a consistent input.
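<p>
When preparing such input by program, the rule for the Sandhied format may be
sketched as follows: optional blanks are dropped, and each mandatory pause is
written as an underscore. The helper <i>join_sandhied</i> below is
hypothetical. It does not decide by itself where a hiatus occurs; the caller
must supply the positions of the mandatory pauses.

```python
def join_sandhied(chunks, hiatus_after):
    """Glue chunks together, writing '_' after each chunk index in hiatus_after."""
    pieces = []
    last = len(chunks) - 1
    for index, chunk in enumerate(chunks):
        pieces.append(chunk)
        if index in hiatus_after and index != last:
            pieces.append("_")
    return "".join(pieces)


chunks = ["vanād", "grāmam", "adyopetyaudana", "āśvapatenāpāci"]
# Only the third blank (after the chunk of index 2) marks a real hiatus.
print(join_sandhied(chunks, {2}))
```

<p>
Unsandhied input cannot be produced this way, since it requires every chunk
to be restored to its final sandhi form first.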
<p>
The nasalisation sign anusvāra is optional when it stands for a nasal, and
mandatory only before sibilants and <i>h</i>. Thus <i>sandhi</i> and
<i>saṃdhi</i> are equivalent. Similarly for visarga before a sibilant: one may
enter either <i>śunaḥśepa</i> or <i>śunaśśepa</i>. 
<p>
Sandhi of <i>n</i> before <i>l</i> (anunāsika) is noted in our adaptation of
Velthuis notation by a pair of tilde symbols, like in <i>vidvaal~~likhati</i>,
leading to candrabindu in devanāgarī, like: विद्वालँलिखति.
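<p>
The correspondence between the Velthuis-style examples of this manual and
their forms with diacritics can be illustrated by a small conversion table.
The sketch below is deliberately partial: it covers only the codes that can
be read off the examples quoted in this manual (for instance <i>z</i> for
<i>ś</i> and <i>f</i> for <i>ṅ</i>), it does not handle the double tilde
described above, and it is not the transduction code used by the site. The
names <i>VELTHUIS_TO_IAST</i> and <i>to_iast</i> are ours.

```python
import unicodedata

# Partial table, limited to the codes occurring in this manual's examples.
VELTHUIS_TO_IAST = {
    "aa": "ā", "ii": "ī", "uu": "ū",
    ".r": "ṛ", ".t": "ṭ", ".d": "ḍ", ".n": "ṇ", ".s": "ṣ",
    ".m": "ṃ", ".h": "ḥ", "~n": "ñ",
    "z": "ś", "f": "ṅ",
}


def to_iast(text):
    """Greedy left-to-right conversion, trying two-character codes first."""
    output = []
    index = 0
    while index != len(text):
        pair = text[index:index + 2]
        if pair in VELTHUIS_TO_IAST:
            output.append(VELTHUIS_TO_IAST[pair])
            index += 2
        else:
            char = text[index]
            output.append(VELTHUIS_TO_IAST.get(char, char))
            index += 1
    return unicodedata.normalize("NFC", "".join(output))


print(to_iast("d.r.s.tvaa tu paa.n.davaaniika.m"))
print(to_iast("aacaaryamupasafgamya"))
```

<p>
Two-character codes are tried before single letters, so that <i>.n</i> is
read as a retroflex nasal rather than as a dot followed by <i>n</i>.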
<p>
It is also possible to help the segmentation of compounds, by inserting a
hyphen at the stem boundaries. For instance, the long compound:
"pravaran.rpamuku.tama.nimariicima~njariicayacarcitacara.nayugala.h"
may be disambiguated to a certain extent as:
<i>pravara-n.rpa-muku.ta-ma.ni-mariici-ma~njarii-caya-carcita-cara.na-yugala.h</i>
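<p>
Since the hyphens are pure hints, removing them must give back the original
input. This is easy to check mechanically before submitting a hinted
compound. The two helpers below are hypothetical names introduced for this
check; they are not part of the Reader.

```python
def components(hinted):
    """Split a hyphen-hinted compound into its stems and final form."""
    return hinted.split("-")


def without_hints(hinted):
    """Recover the raw input by removing the hyphen hints."""
    return hinted.replace("-", "")


raw = "pravaran.rpamuku.tama.nimariicima~njariicayacarcitacara.nayugala.h"
hinted = ("pravara-n.rpa-muku.ta-ma.ni-mariici-ma~njarii-"
          "caya-carcita-cara.na-yugala.h")

print(len(components(hinted)))
print(without_hints(hinted) == raw)
```

<p>
The check only guards against typing errors in the hints; it says nothing
about whether the chosen stem boundaries are the intended ones.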
<p>
When initial short <i>a</i> is deleted by sandhi, it is possible to
indicate the situation with the <i>avagraha</i>
sign, noted by an apostrophe ' in
transliteration. Actually this notation is mandatory in certain situations
(after e and o) like <i>devo'pi</i>. Thus the Bhagavadgītā verse
नासतोविद्यतेभावोनाभावोविद्यतेसतः will only accommodate Śaṅkara's analysis 
<i>na asataḥ vidyate bhāvaḥ na abhāvaḥ vidyate sataḥ</i>, whereas
Madhva's interpretation (with <i>abhāvaḥ</i>) has to be made explicit as
नासतोविद्यतेऽभावोनाभावोविद्यतेसतः
<p>
Finally, the system does not currently support degemination of stems,
such as modern renditions of <i>tattva</i> as <i>tatva</i>
or <i>vārttā</i> as <i>vārtā</i>; only a few common stems such as 
<i>chatra</i>, <i>chātra</i> and <i>patra</i> are recognized. 

<h3 class="b3" id="zloka_input">Entering full verses (<i>śloka</i>).</h3>

It is possible to enter longer pieces of text than a single line.
Verses (<i>śloka</i>) may be entered as lists of lines ended with the vertical
bar | (<i>daṇḍa</i>), terminated by a line ended with two vertical
bars || (<i>pūrṇavirāma</i>). Thus, for instance:
<p>
d.r.s.tvaa tu paa.n.davaaniika.m vyuu.dha.m duryodhanastadaa |<br>
aacaaryamupasafgamya raajaa vacanamabraviit ||<br>
<p>
Please note that this notation is mandatory for such examples, where the
first verse should not be glued by sandhi to the second one. 
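<p>
When verses come from a file where the bars are missing or inconsistent, the
convention above can be restored by a few lines of code. The function
<i>format_shloka</i> below is a hypothetical helper, given as a sketch: it
assumes that each element of its argument is one line of a single verse, and
it simply ends every line but the last with one bar and the last with two.

```python
def format_shloka(lines):
    """Join verse lines, ending each with | and the last one with ||."""
    cleaned = [line.strip().rstrip("|").rstrip() for line in lines]
    cleaned = [line for line in cleaned if line]
    if not cleaned:
        raise ValueError("a verse needs at least one non-empty line")
    marked = [line + " |" for line in cleaned[:-1]]
    marked.append(cleaned[-1] + " ||")
    return "\n".join(marked)


verse = [
    "d.r.s.tvaa tu paa.n.davaaniika.m vyuu.dha.m duryodhanastadaa",
    "aacaaryamupasafgamya raajaa vacanamabraviit",
]
print(format_shloka(verse))
```

<p>
Bars already present at the end of a line are removed before the new ones are
added, so the function may be applied safely to partially marked text.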

<h2 class="b2" id="corpus">The Sanskrit Corpus (Experimental)</h2>
<p>
  This is a set of tools to browse and manage a corpus.  You can explore
  the corpus tree and possibly add and modify the analysis of
  a sentence.  There are three modes of use (if you install the platform
  in the <em>Station</em> mode):
  <ol>
    <li>Reader (available regardless of the installation platform):
      explore the corpus tree and display in read-only mode the analysis
      of a given sentence with the graphical interface of the segmenter
    <li>Annotator: add a new sentence and save the current state of the
      analysis at any time via the "Save" button in the graphical
      interface of the segmenter
    <li>Manager: add new branches to the corpus hierarchy
  </ol>
<p>
  When you place yourself in a certain corpus location (foo/bar for
  example) and decide to add a new sentence, you are directed to the
  Sanskrit Reader Companion (note the subtitle "Corpus annotator mode -
  foo/bar"), where you enter the sentence to be added to the corpus at the
  location where you clicked the "Add" button.
<p>
  Every time you want to switch from a mode to another you have to click
  on the "Corpus" link in the green control bar at the bottom.  If you
  simply want to go back quickly to the top of the corpus hierarchy
  preserving the current mode, you can click on the title of any page of
  the corpus browser.

<!-- Deprecated 
<h2 class="b2" id="regression">Regression analysis</h2>

If you install
the Sanskrit Heritage Engine on your own server in Station mode, you will
benefit of another facility, which allows you to do some regression analysis
of the tool across versions.
<p>
The regression suite tool is installed by executing "make install_regression",
which is executed at install time.
<p>
We do not further document this facility, which
is useful mostly for the developers of the system. -->

<h2 class="b2" id="software">Software and its documentation</h2>

A short documentation giving a general survey of the software components
is available as a text document README in the distribution directory.
<p>

The complete OCaml source of all modules of the Heritage Engine is available
in literate programming style as a pdf document
<a href="DOC/Heritage_platform.pdf">
<strong>Heritage_platform_documentation</strong></a>.
It may be considered as our vyākaraṇasūtrasaṃgraha. 

<h2 class="b2" id="installation">How to install the Heritage Engine on your own server</h2>

The Heritage Engine is distributed as a stand-alone software for
workstations running versions of UNIX such as Linux or Apple's MacOSX.
<p>
In order to install it, you must download two git repositories:<br>
<a href="https://gitlab.inria.fr/huet/Heritage_Resources.git">
<strong>https://gitlab.inria.fr/huet/Heritage_Resources.git</strong></a>
and <a href="https://gitlab.inria.fr/huet/Heritage_Platform.git">
<strong>https://gitlab.inria.fr/huet/Heritage_Platform.git</strong></a>. 
<p>

Your first installation may be tricky if you are not familiar with the
UNIX/Apache technology.
But once your config file is correct, it will be very easy to install
updates, as summarized in the document INSTALLATION in the top distribution
directory of the Platform.
<p>
Please report installation difficulties and share your experiences with these tools
to Gerard.Huet@inria.fr.

<p>
Authors of interesting feedback will be entered in the
<a href="gold.html">Heritage Hall of Fame</a>.

<p>
A useful supplement to this manual is our page of frequently asked questions
<a href="faq.html"><strong>Faq</strong></a>, also available from the "Help"
button on the site control bar. 

</td></tr>
</table> <!-- End of main contents -->

<table class="pad60">
<tr><td></td></tr></table>
<div class="enpied">
<table class="bandeau"><tr><td>
<a href="http://ocaml.org">
<img src="IMAGES/icon_ocaml.png" alt="Objective Caml" height="50"></a>
</td><td>
<table class="center">
<tr><td>
<a href="index.html"><b>Top</b></a> | 
<a href="DICO/index.en.html"><b>Index</b></a> | 
<a href="DICO/index.en.html#stemmer"><b>Stemmer</b></a> | 
<a href="DICO/grammar.en.html"><b>Grammar</b></a> | 
<a href="DICO/sandhi.en.html"><b>Sandhi</b></a> | 
<a href="DICO/reader.en.html"><b>Reader</b></a> | 
<a href="DICO/corpus.en.html"><b>Corpus</b></a> |
<a href="faq.en.html"><b>Help</b></a> | 
<a href="portal.en.html"><b>Portal</b></a>
</td></tr><tr><td>
&#169; G&#233;rard Huet 1994-2018</td></tr></table></td><td>
<a href="http://www.inria.fr/">
<img src="IMAGES/logo_inria.png" alt="Logo Inria" height="50"></a>
<br></td></tr></table></div>
</body>
</html>