textanalyzer / 0.1
| Short description: | Extract important words in a text | |||||
|---|---|---|---|---|---|---|
| Category: | Library/textproc | |||||
| Status: | prototype | |||||
| Created: | 2004-09-21 20:20:57 GMT | |||||
| Last update: | 2004-09-21 20:20:57 GMT | |||||
| Owner: | Martin Ankerl | |||||
| Homepage: | http://martin.ankerl.com/ | |||||
| Download: | http://martin.ankerl.com/files/textanalyze.rb | |||||
| License: | PublicDomain | |||||
| Dependency: | | |||||
| Description: | TextAnalyzer - Automatically Extracts Characteristic Words
TextAnalyzer is a text analysis tool that finds the words that are characteristic of a given input file. It is language-independent, and even seems to work well with HTML files. This program is only a small prototype that shows that the technique seems to work. It’s public domain; feel free to do whatever you like with it.
Example
Other Uses
The previous example seems a bit useless, but there certainly are a lot of useful applications. Here are some ideas:
The currently implemented algorithm even works well with HTML files (to my own surprise; actually, I am surprised that it works at all…).
Algorithm
The main idea is quite simple: the algorithm assumes that important words are:
For example, the second condition ensures that words like ‘the’, ‘and’ etc. are not considered important. |
|||||