|Short description:||Extract important words in a text|
|Created:||2004-09-21 20:20:57 GMT|
|Last update:||2004-09-21 20:20:57 GMT|
|Owner:||Martin Ankerl|
TextAnalyzer - Automatically Extracts Characteristic Words
TextAnalyzer is a text analysis tool that finds the words that are characteristic of a given input file. It does not depend on any particular language, and it even seems to work well with HTML files.
This program is only a small prototype that shows that the technique seems to work. It’s public domain, so feel free to do whatever you like with it.
The previous example seems a bit useless, but there certainly are a lot of useful applications. Here are some ideas:
The currently implemented algorithm even works well with HTML files, to my own surprise. (Actually, I am surprised that it works at all…)
The main idea is quite simple: the algorithm assumes that important words are:
For example, the second condition ensures that words like ‘the’, ‘and’ etc. are not considered important.
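To make the idea concrete, here is a minimal Python sketch. It is not the TextAnalyzer code, and the two conditions it uses are assumptions made for illustration, since the exact conditions are not spelled out here: (1) a word has to occur at least `min_count` times, and (2) its occurrences have to be clustered rather than spread evenly through the text. The function name `characteristic_words`, its parameters, and the clustering measure (the variation of the gaps between successive occurrences) are all hypothetical choices.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev


def characteristic_words(text, min_count=3, top=10):
    """Return up to `top` (word, score) pairs, highest score first.

    ASSUMPTION: both conditions below are illustrative guesses,
    not the conditions used by the actual tool.
    """
    words = re.findall(r"\w+", text.lower())
    total = len(words)

    # Record every position at which each word occurs.
    positions = defaultdict(list)
    for index, word in enumerate(words):
        positions[word].append(index)

    scores = {}
    for word, pos in positions.items():
        # Assumed condition 1: the word must occur often enough.
        if len(pos) < min_count:
            continue
        # Gaps between successive occurrences, plus the wrap-around gap
        # from the last occurrence back to the first, so they sum to `total`.
        gaps = [b - a for a, b in zip(pos, pos[1:])]
        gaps.append(total - pos[-1] + pos[0])
        # Assumed condition 2: evenly spread words have nearly equal gaps
        # (score near 0); clustered words have very unequal gaps.
        scores[word] = pstdev(gaps) / mean(gaps)

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

Under these assumptions, a word like ‘the’ that turns up at fairly regular intervals gets a score close to zero, while a word that appears in a few dense bursts scores high. Because only word positions within the input are used, no dictionary or stop-word list is needed, which fits the language-independent approach described above.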