Common use of Web-page cleaning Clause in Contracts

Web-page cleaning. Apart from a main textual content, a typical web page also contains certain "noise" including navigation links, advertisements, disclaimers, etc. (often called boilerplate) of only limited or no use for the purposes of training an MT system. Such irrelevant parts should be removed and only the main content should be kept in order to produce good-quality language resources. This is the most challenging task of the CNC and special attention will be paid to it in WP4.
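
To make the idea of boilerplate removal concrete, the following is a minimal sketch and not the method specified in the Grant Agreement. It assumes a simple heuristic: split the page into text blocks, then keep only blocks that are reasonably long and contain few words inside links. The names `BlockExtractor` and `clean_page`, the tag sets, and the thresholds `MIN_WORDS` and `MAX_LINK_DENSITY` are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Hypothetical thresholds, chosen for illustration only.
MIN_WORDS = 10           # shorter blocks are treated as boilerplate
MAX_LINK_DENSITY = 0.3   # maximum share of words allowed inside <a> tags

# Assumed tag sets; a real cleaner would likely need a longer list.
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "nav",
              "header", "footer", "aside", "section", "article"}
SKIP_TAGS = {"script", "style"}


class BlockExtractor(HTMLParser):
    """Split a page into text blocks and count the words inside links."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished (text, link_word_count) pairs
        self._words = []       # words of the block currently being built
        self._link_words = 0   # how many of those words sit inside <a>
        self._in_link = 0
        self._skip = 0

    def _flush(self):
        # Close the current block, if it holds any text.
        if self._words:
            self.blocks.append((" ".join(self._words), self._link_words))
        self._words = []
        self._link_words = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style content
        words = data.split()
        self._words.extend(words)
        if self._in_link:
            self._link_words += len(words)

    def close(self):
        super().close()
        self._flush()


def clean_page(html):
    """Return the text blocks that pass the length and link-density checks."""
    parser = BlockExtractor()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_words in parser.blocks:
        n_words = len(text.split())
        if n_words >= MIN_WORDS and link_words / n_words <= MAX_LINK_DENSITY:
            kept.append(text)
    return kept
```

Calling `clean_page` on a page's HTML returns a list of strings, one per retained block. Under these assumptions, navigation menus (mostly link text) and short footers or disclaimers are discarded, while longer running paragraphs are kept. Real pages are far messier than this heuristic can handle, which is consistent with the clause calling this the most challenging task.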

Appears in 2 contracts

Sources: Grant Agreement, Grant Agreement