Expanding the semantic indexing dictionary

Dictionary preparation for semantic indexing

Semantic indexing is necessary when the types and sizes of data are diverse. Without semantic indexing, which lets a user access or search data through semantic knowledge, the user has to know the exact data values, or at least part of them (commonly called keywords), in order to query the database.

The EFIM models proposed here serve a general purpose: an unlimited number of heterogeneous sources can be combined through events. The identity of each source is built up from its events and, possibly, numeric identifiers that represent it. Humans, however, usually begin retrieving information from a small semantic seed, often a single word, so a semantic indexing and querying environment should also be supported for general-purpose semantic data processing.

A number of approaches have been used to support semantic indexing. The first, and the most convenient and popular, is to have the user create a semantic label that represents each piece of data; it is then the user's responsibility to assign a proper label that keeps the data unique. The second approach, familiar in current IT, uses an electronic dictionary that maps data to a word or a set of words serving as a semantic index. Our approach follows the second way, but expands the dictionary by merging WordNet and Wiktionary. WordNet is a well-known lexical database used by many researchers working on semantic information. Wiktionary, the free, fast-growing online dictionary that will be used in our application, is a sister project of Wikipedia and is built from free public contributions by numerous Internet users.
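The merge-and-index idea can be sketched in Python. This is a minimal illustration under stated assumptions, not the actual implementation: the toy `WORDNET` and `WIKTIONARY` tables, the item ids, and the helpers `merge_dictionaries` and `build_semantic_index` are hypothetical stand-ins for entries parsed from the real resources. Each dictionary maps a headword to a set of related words; merging takes the union per headword, and the index then maps every label and every related word to the data items it describes.

```python
from collections import defaultdict

# Hypothetical toy entries standing in for parsed WordNet and Wiktionary data:
# each maps a headword to a set of related words (synonyms, related terms).
WORDNET = {
    "car": {"automobile", "motorcar", "auto"},
    "photo": {"photograph", "picture"},
}
WIKTIONARY = {
    "car": {"automobile", "vehicle"},
    "selfie": {"photo", "self-portrait"},
}


def merge_dictionaries(*sources):
    """Union the related-word sets of all sources, headword by headword."""
    merged = defaultdict(set)
    for source in sources:
        for word, related in source.items():
            merged[word] |= related
    return dict(merged)


def build_semantic_index(items, dictionary):
    """Map each label and each of its related words to the ids of labelled items."""
    index = defaultdict(set)
    for item_id, labels in items.items():
        for label in labels:
            index[label].add(item_id)
            for related in dictionary.get(label, ()):
                index[related].add(item_id)
    return index


merged = merge_dictionaries(WORDNET, WIKTIONARY)
items = {"img-001": ["car"], "img-002": ["selfie"]}  # hypothetical data items
index = build_semantic_index(items, merged)
print(sorted(index["vehicle"]))  # ['img-001']
print(sorted(index["photo"]))    # ['img-002']
```

In this toy data, a query for "vehicle" finds `img-001` even though that item was only labelled "car", because only the Wiktionary side contributes the car/vehicle relation. That is the kind of coverage gain the merge is meant to provide.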

NLP (Natural Language Processing)

For automated text processing, NLP is a must-have tool. I have explored Link Grammar for several years, and OpenNLP is also an excellent starting point for NLP. I will describe some details of using these libraries, together with the extensions I have developed.

Unstructured information models

Unstructured information is information for which no fixed model can be specified in advance. Natural language is one such example: the rules governing word association multiply as the number of words and their relationships grows. Natural-language modeling can be divided into two classes: analysis of word-level associations and analysis of clause-level associations.
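To make the two classes concrete, here is a deliberately naive Python sketch. The clause splitter (commas, semicolons and a few connectives), the tiny stop-word list, and the choice of same-clause co-occurrence for word-level association and adjacent-clause links for clause-level association are all illustrative assumptions, not a model proposed here; real text would need a proper parser.

```python
import re
from collections import Counter
from itertools import combinations

# Naive, hypothetical clause splitter: breaks on commas, semicolons and a
# few connectives. Real clause detection needs a proper parser.
CLAUSE_BREAK = re.compile(r"[,;]|\b(?:and|but|because|while)\b")
STOP_WORDS = {"the", "a", "an"}  # tiny illustrative stop-word list


def clauses(sentence):
    """Split a sentence into lower-cased, non-empty clause strings."""
    parts = CLAUSE_BREAK.split(sentence.lower())
    return [p.strip() for p in parts if p.strip()]


def word_associations(sentence):
    """Word level: count unordered pairs of content words sharing a clause."""
    pairs = Counter()
    for clause in clauses(sentence):
        words = sorted(set(re.findall(r"[a-z]+", clause)) - STOP_WORDS)
        pairs.update(combinations(words, 2))
    return pairs


def clause_associations(sentence):
    """Clause level: link each clause to the clause that follows it."""
    cs = clauses(sentence)
    return list(zip(cs, cs[1:]))


s = "The dog barked because the cat ran, and the owner woke"
print(sorted(word_associations(s)))
# [('barked', 'dog'), ('cat', 'ran'), ('owner', 'woke')]
print(clause_associations(s))
# [('the dog barked', 'the cat ran'), ('the cat ran', 'the owner woke')]
```

Even in this small example the word-level pairs stay local to each clause, while the causal link between "the dog barked" and "the cat ran" only appears at the clause level, which is why both classes are needed.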

Evolving function models

Media-processing technology will keep advancing along with progress in IT (Information Technology). Static data models with a fixed schema cannot adapt to this evolution in practice. The event-based functional model offers a solution based on evolving, cascaded functional information-processing methods.
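A minimal Python sketch of an evolving cascade, assuming that each event type is bound to an ordered list of processing functions (stages). `EventCascade`, the `media_arrived` event, and the stages `detect_kind` and `tag_stub` are hypothetical names for illustration only. The point is that adding a new processing capability means appending a stage, not altering a fixed schema.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
Stage = Callable[[Record], Record]


class EventCascade:
    """Hypothetical registry binding each event type to an ordered list of stages."""

    def __init__(self) -> None:
        self._stages: Dict[str, List[Stage]] = {}

    def register(self, event_type: str, stage: Stage) -> None:
        # Appending a stage extends the cascade; no schema has to change.
        self._stages.setdefault(event_type, []).append(stage)

    def process(self, event_type: str, record: Record) -> Record:
        # Each stage receives the output of the previous one (a cascade).
        for stage in self._stages.get(event_type, []):
            record = stage(record)
        return record


def detect_kind(record: Record) -> Record:
    # Hypothetical first stage: derive the media kind from the file extension.
    return {**record, "kind": record["name"].rsplit(".", 1)[-1]}


def tag_stub(record: Record) -> Record:
    # Hypothetical later stage standing in for a newer technique (e.g. a tagger).
    return {**record, "labels": ["car"]}


cascade = EventCascade()
cascade.register("media_arrived", detect_kind)
before = cascade.process("media_arrived", {"name": "photo.jpg"})
print(before)  # {'name': 'photo.jpg', 'kind': 'jpg'}

# A new technique becomes available: append a stage, leave earlier ones untouched.
cascade.register("media_arrived", tag_stub)
after = cascade.process("media_arrived", {"name": "photo.jpg"})
print(after)   # {'name': 'photo.jpg', 'kind': 'jpg', 'labels': ['car']}
```

Records are plain dictionaries here, so a new stage can add fields without any migration; an event type with no registered stages simply passes its record through unchanged.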


To-do lists


  • 11/22/2006: Modify the nested-model insertion algorithms so that item relationships are automatically inserted into a flat relationship.
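The to-do item above does not say how the flat relationship is stored. As a hedged Python sketch, assume a nested model is a dict of dicts and the flat relationship is a list of (parent, child) rows; `flatten_relationships` and the sample `album` model are hypothetical. Inserting the nested model would then also insert these rows automatically.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical nested model: each item maps to a dict of its nested child items.
NestedModel = Dict[str, dict]


def flatten_relationships(
    model: NestedModel, parent: Optional[str] = None
) -> List[Tuple[str, str]]:
    """Walk a nested model depth-first and emit flat (parent, child) rows."""
    rows: List[Tuple[str, str]] = []
    for item, children in model.items():
        if parent is not None:
            rows.append((parent, item))
        rows.extend(flatten_relationships(children, item))
    return rows


model = {"album": {"photo1": {"face_a": {}}, "photo2": {}}}
print(flatten_relationships(model))
# [('album', 'photo1'), ('photo1', 'face_a'), ('album', 'photo2')]
```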