Authors
Ralf Schenkel, Fabian Suchanek, Gjergji Kasneci
Publication date
2007
Journal
Datenbanksysteme in Business, Technologie und Web (BTW 2007) – 12. Fachtagung des GI-Fachbereichs "Datenbanken und Informationssysteme" (DBIS)
Publisher
Gesellschaft für Informatik e. V.
Description
The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extracts additional information from lists, and utilizes the invocations of templates with named parameters. We give examples of how such annotations can be exploited for high-precision queries.
Total citations
[Chart: citations per year, 2007–2023]
Scholar articles
R Schenkel, F Suchanek, G Kasneci - URL http://citeseerx.ist.psu.edu/viewdoc/summary