Update the `xmlns` and `xsi:schemaLocation` attributes of `<PcGts>` to a supported version, as described above.
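For reference, this is the root element when it declares the 2013-07-15 schema, which is the version used in the full example file in this section. Check that version against the supported versions described above. For a different version, the date would change in both the `xmlns` and `xsi:schemaLocation` URLs, assuming that version uses the same URL pattern.

```xml
<!-- Only the root element is shown; its children are elided. -->
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15/pagecontent.xsd">
  <!-- Metadata and Page elements go here -->
</PcGts>
```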

By default, the existing segmentation of the selected images (both regions and lines) is deleted. You can disable this behavior by unchecking 'Override existing segmentation.'. In that case the system tries to match lines and regions by their `id` attribute. The previous content of each matched line is stored in that line's history, and new lines or regions are created when no matching element exists.
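Here is a hypothetical illustration of this matching. Assume an image was previously imported with a line whose `id` is `r2l1` (the id used in the first `TextLine` example of this section), and that 'Override existing segmentation.' is unchecked. Re-importing the fragment below should then update `r2l1` and move its previous content into its history. The line `r2l2` is a made-up id with no existing match, so it would be created as a new line. The region, its coordinates, and the second line are invented for this example.

```xml
<!-- Hypothetical re-import; all coordinates are placeholders. -->
<TextRegion id="r2">
  <Coords points="140,40 440,40 440,140 140,140"/>
  <!-- id already exists: assumed to be matched, old content kept in the line's history -->
  <TextLine id="r2l1">
    <Coords points="150,64 346,60 425,81"/>
    <TextEquiv>
      <Unicode><!-- new transcription --></Unicode>
    </TextEquiv>
  </TextLine>
  <!-- id not found among existing lines: assumed to be created as a new line -->
  <TextLine id="r2l2">
    <Coords points="150,104 346,100 425,121"/>
    <TextEquiv>
      <Unicode><!-- transcription --></Unicode>
    </TextEquiv>
  </TextLine>
</TextRegion>
```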

Each `TextRegion` has a list of coordinates of the form `x1,y1 x2,y2 ... xn,yn`, stored in the `points` attribute of its `Coords` element, which describes a polygon. The `Baseline` element is optional in PageXML.
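The sketch below shows the polygon format. The hypothetical region is a rectangle given by its four corners as space-separated `x,y` pairs, and its single line omits the optional `Baseline`. The ids and coordinates are made up for illustration.

```xml
<!-- Hypothetical region: ids and coordinates are illustrative only -->
<TextRegion id="r1">
  <!-- Polygon vertices as "x,y" pairs separated by single spaces:
       (100,50) (900,50) (900,400) (100,400) -->
  <Coords points="100,50 900,50 900,400 100,400"/>
  <TextLine id="r1l1">
    <Coords points="110,60 890,60 890,120 110,120"/>
    <!-- No Baseline element here: it is optional -->
  </TextLine>
</TextRegion>
```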

Examples of PageXML content: the text can be stored directly in a `TextLine`, or each `Word` can be stored separately, for example:
```xml
<TextLine id="r2l1" custom="readingOrder {index:0;}">
  <Coords points="150,64 346,60 425,81 "/>
  <Baseline points="155,55 180,55 206,55 231,55 257,55 283,55 308,55"/>
  <TextEquiv>
    <Unicode>ܡ ܗܘܡ ܐܘ ܥܒ</Unicode>
  </TextEquiv>
</TextLine>
```
```xml
<TextLine id="l1">
  <Coords points="1550,422 1555,422"/>
  <Word id="w122" language="Hebrew" primaryScript="Hebr - Hebrew" readingDirection="right-to-left">
    <Coords points="926,424 926,426"/>
    <TextEquiv>
      <Unicode>ע"י</Unicode>
    </TextEquiv>
  </Word>
  <Word id="w45" language="Hebrew" primaryScript="Hebr - Hebrew" readingDirection="right-to-left">
    <Coords points="531,464 687,464 "/>
    <TextEquiv>
      <Unicode>הוט</Unicode>
    </TextEquiv>
  </Word>
</TextLine>
```
Example of a full PageXML file:
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15/pagecontent.xsd">