SMRL Metaparser

The Semantic Markup Rule Language (SMRL) is a specification language for marking semantically meaningful sections and relations in structured, human-readable documents, or for extracting them from such documents.

The SMRL Metaparser is an XSLT-based implementation of an SMRL processor that reads and writes several XML formats. Supported input formats are:

  • ISO 26300 text documents, better known as OpenDocument Format (ODF), supported by OpenOffice 2.0 and KOffice 1.5,
  • Microsoft Word 2003 Word Markup Language (WordML),
  • Microsoft Excel 2003 Spreadsheet Markup Language (SpreadsheetML),
  • and DocBook 4.2.

Supported output formats are SHORE XML, XHTML, XML Topic Maps (XTM), W3C RDF and Graph eXchange Language (GXL).

This configurable parser dramatically reduces the effort needed to create OpenSHORE parsers for rich text documents. Typically, you can create a new parser for a moderately complex document in 15 minutes. It is called a "metaparser" because it reads abstract, format-independent SMRL specification files and handles several input and output formats.
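
The idea of one format-independent rule applied to several input formats can be illustrated with a small sketch. This is not SMRL syntax and not the Metaparser's XSLT code; it is a toy Python example using only the standard library. The names HEADING_PATHS, extract_headings and to_xhtml, as well as the two sample documents, are hypothetical and exist only for illustration. The abstract rule is "collect every heading"; a per-format table says where a heading lives in DocBook and in ODF, and a single writer emits an XHTML list.

```python
import xml.etree.ElementTree as ET

# Real ODF namespace URIs; everything else in this sketch is hypothetical.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# Per-format binding of the abstract concept "heading" (an assumption made
# for this sketch, not taken from the SMRL specification).
HEADING_PATHS = {
    "docbook": ".//section/title",      # DocBook 4.2 elements have no namespace
    "odf": ".//{%s}h" % TEXT_NS,        # ODF headings are text:h elements
}


def extract_headings(xml_text, input_format):
    """Return the text of every heading, regardless of the input format."""
    root = ET.fromstring(xml_text)
    path = HEADING_PATHS[input_format]
    return ["".join(node.itertext()).strip() for node in root.iterfind(path)]


def to_xhtml(headings):
    """Serialize the extracted headings as a minimal XHTML list fragment."""
    ul = ET.Element("ul")
    for heading in headings:
        ET.SubElement(ul, "li").text = heading
    return ET.tostring(ul, encoding="unicode")


DOCBOOK_SAMPLE = (
    "<article>"
    "<section><title>Requirements</title><para>Some text.</para></section>"
    "<section><title>Design</title><para>More text.</para></section>"
    "</article>"
)

ODF_SAMPLE = (
    '<office:document-content xmlns:office="%s" xmlns:text="%s">'
    "<office:body><office:text>"
    '<text:h text:outline-level="1">Requirements</text:h>'
    "<text:p>Some text.</text:p>"
    '<text:h text:outline-level="1">Design</text:h>'
    "</office:text></office:body>"
    "</office:document-content>"
) % (OFFICE_NS, TEXT_NS)

if __name__ == "__main__":
    for fmt, sample in (("docbook", DOCBOOK_SAMPLE), ("odf", ODF_SAMPLE)):
        print(fmt, to_xhtml(extract_headings(sample, fmt)))
```

Both sample documents yield the same list of headings, so the output step does not need to know which input format was read. The actual Metaparser achieves this separation with XSLT stylesheets and SMRL specification files; see the tutorial for the real rule syntax.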


Read the SMRL tutorial at http://openshore.sourceforge.net/pdf/OpenSHORE-SMRL-Tutorial.pdf.



 
Last update: 28.03.2009. Project page: http://sourceforge.net/projects/openshore