Monday, May 18, 2009

Glossary generators

OK, I have been rambling aloud, and in this vacuum, several times here about automatic glossary generation. My conversation with myself turned up two tools: TermExtractor and GlossExtractor. The first one is up and running at the time of writing. The second choked on capacity, or rather the lack of it. I ran TermExtractor on the Wikipedia entry "Solar cell". At first I was impressed, although I felt the extracted list was short. After closer examination, I am no longer impressed, but it does show what the future holds. The resulting glossary is missing huge numbers of words and expressions.

I uploaded the article in various formats, from PDF to .doc down to .txt. Strangely enough, the PDF downloaded from the browser after switching the Wikipedia page to print mode yielded the most words and expressions, but the result was still useless in real life. The same content copy-pasted into a word processor and exported to PDF yielded fewer words. The .doc, .htm and .txt versions yielded... nothing. Anyway, I won't blame anyone. In a few years' time it will be here, and in a few more years we'll have multilingual glossary generation. Who could ask for more?

