... | ... | @@ -74,16 +74,16 @@ By defaults the segmentation for the selected images, both regions and lines, wi |
|
|
The `TextRegion` tag carries a list of coordinates of the form `x1,y1 x2,y2 ... xn,yn` that describes a polygon. The `Baseline` tag is optional in PageXML.
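As a minimal sketch of how such a coordinate string could be read, the snippet below turns a `points` value into a list of `(x, y)` tuples. The helper name `parse_points` is hypothetical and not part of any PageXML tooling; it assumes integer coordinates, as in the examples in this document.

```python
def parse_points(points: str) -> list[tuple[int, int]]:
    """Parse a 'x1,y1 x2,y2 ... xn,yn' string into a list of (x, y) tuples."""
    polygon = []
    # str.split() with no argument also ignores stray leading/trailing spaces.
    for pair in points.split():
        x, y = pair.split(",")
        polygon.append((int(x), int(y)))
    return polygon


polygon = parse_points("150,64 346,60 425,81 ")
print(polygon)
```

The same helper would work for a `Baseline` value, since it uses the same `x,y` pair syntax.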
|
|
|
The content can be stored at the `TextLine` level, or each `Word` can be stored separately, for example:
|
|
|
|
|
|
```xml<TextLine id="r2l1" custom="readingOrder {index:0;}">
|
|
|
> <TextLine id="r2l1" custom="readingOrder {index:0;}">
|
|
|
<Coords points="150,64 346,60 425,81 "/>
|
|
|
<Baseline points="155,55 180,55 206,55 231,55 257,55 283,55 308,55"/>
|
|
|
<TextEquiv>
|
|
|
<Unicode>ܡ ܗܘܡ ܐܘ ܥܒ</Unicode>
|
|
|
</TextEquiv>
|
|
|
</TextLine>```.
|
|
|
</TextLine>.
|
|
|
|
|
|
|
|
|
```xml<TextLine id="l1">
|
|
|
> <TextLine id="l1">
|
|
|
<Coords points="1550,422 1555,422"/>
|
|
|
<Word id="w122" language="Hebrew" primaryScript="Hebr - Hebrew" readingDirection="right-to-left">
|
|
|
<Coords points="926,424 926,426"/>
|
... | ... | @@ -96,7 +96,7 @@ the content ca be stored in `Textline` or each `Word` is separated for example |
|
|
<Unicode>הוט</Unicode>
|
|
|
</TextEquiv>
|
|
|
</Word>
|
|
|
<TextLine>```
|
|
|
</TextLine>
|
|
|
|
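To illustrate the two storage options shown above, here is a hedged sketch that reads the text of a line whether it sits directly on the `TextLine` or on its `Word` children. The function name `line_text` is hypothetical, the sample omits the XML namespace that a complete PageXML file normally declares, and joining words with a single space is an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace-free fragment modelled on the word-level example (assumption:
# a real file declares a PAGE namespace that must be added to the paths).
SAMPLE = """
<TextLine id="l1">
  <Coords points="1550,422 1555,422"/>
  <Word id="w122">
    <Coords points="926,424 926,426"/>
    <TextEquiv>
      <Unicode>הוט</Unicode>
    </TextEquiv>
  </Word>
</TextLine>
"""


def line_text(text_line: ET.Element) -> str:
    """Return the line-level text if present, else the joined word texts."""
    # "TextEquiv/Unicode" only matches a TextEquiv that is a direct child
    # of the line, so word-level TextEquiv elements are not picked up here.
    direct = text_line.findtext("TextEquiv/Unicode")
    if direct:
        return direct
    words = [w.findtext("TextEquiv/Unicode", default="")
             for w in text_line.findall("Word")]
    return " ".join(w for w in words if w)


print(line_text(ET.fromstring(SAMPLE)))
```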
|
|
Example of a full PageXML file:
|
|
|
```xml
|
... | ... | |