Abstract

This paper presents a new framework for multimedia electronic chronicling systems. The approach uses events as the driving force for heterogeneous information processing. Specifically, it first separates symbols from data and then places events between them to establish an explicit connection. In addition, it provides spatio-temporal-semantic relation networks that map high-level semantic user queries into low-level queries a machine can compute. The user interfaces are designed to be easy to use and interactive, supporting both the organization of information and search-based retrieval. The results reported in this paper show that effective chronicling and a high-quality user experience can be achieved by coupling multimedia analysis, tagging, and querying. We believe that our system can serve as a semantically accessible, annotated multimedia log. The approach taken here may provide a foundation for summarizing important events and for accessing events at the required level of granularity.

Introduction

The evolution of multimedia data, with its wealth of support for multimodal interactions, is encouraging people to develop ever more ambitious systems. However, integrating these media raises concerns about the complexity and heuristic nature of the integration. Moreover, the metadata or abstractions for each medium are diverging in terms of type, structure, and rules of association.

As data itself becomes a composite object as a consequence of the semantic enrichment of multimedia, it is becoming critical to develop new fundamental information storage systems that handle the time-serial nature and enormous size of multimedia data. Moreover, in virtually every area of data management, ensuring that applications interoperate meaningfully presents a formidable challenge. Data warehouses require the correct semantic merging of data from semantically diverse sources. As a consequence, it is becoming essential to integrate such multimedia data into organizational frameworks that emphasize semantic coherence.

A multimedia electronic chronicle, or eChronicle, is at the center of these newly emerging fields. Many variants of the eChronicle system \cite{mann:eyetap04, rhodes:justintime03, erickson:personal96, schultz:isl01, bett:meeting00, jaimes:memory04} have already appeared and are being used. Although much of this work has been directed toward the capture and organization of information within a restricted domain, little emphasis has been put on tools that give people a unified perspective for interoperation. The roles of eChronicle systems fall into three categories: (1) recording data using multiple sensors, (2) supporting rich tags for access and presentation of appropriate information, and (3) providing access to this data at multiple levels of granularity and abstraction. To develop and exemplify these components, several preliminary research efforts have been made in our group on event-based multimedia modeling methods \cite{singh:event04}, personal chronicling systems \cite{pilho:personal04}, and multimedia event tagging systems \cite{pilho:mets04}. Building on this work, this paper raises two fundamental issues in multimedia chronicling: (1) the event-centric data organizational model and (2) the spatio-temporal and semantic relationship network.

The rest of this paper is organized as follows. Chapter \ref{chp:existingapproaches} reviews related research to identify the roles of the system for users. It specifically addresses the challenging problems in multimedia and its analysis, and how to properly represent the relations between data. Chapter \ref{chp:eventcentricrelationships} introduces our approaches to data organization and relation representation. This idea is formalized into the eChronicle relation network presented in Chapter \ref{chp:ern}. Chapter \ref{chp:implementation} explains research efforts extending our previous work on event capturing, media processing, and user tagging support. This is followed by results in Chapter \ref{chp:experiments} and conclusions in Chapter \ref{chp:conclusion}.

Conventional and dominant information storage architectures can be classified into two categories: (1) structured information storage and (2) unstructured information storage. Relational database systems~\cite{Chen:1976:ERM} and IMS (Information Management Systems, also known as hierarchical database systems) have been popular for storing structured information, while documents have served for unstructured information. Recently, XML standards and related technologies have grown to serve as buffering solutions that connect the two kinds of systems regardless of their internal information processing architecture.

% From information integration, http://www.cs.ucsd.edu/users/goguen/projs/data.html
The XML language for semi-structured data is rapidly gaining acceptance and has been proposed as a solution for data integration problems, because it allows flexible coding and display of data by using metadata to describe the structure of data (e.g., a DTD or Schema). Although this is important, it is less useful than often thought, because it can only define the syntax of a class of documents. Moreover, even if an adequate semantics were available for each document class, this would still not support the integration of data that is represented in different ways, because it gives no way to translate among the different datasets. In addition to dealing with datasets that appear in computer-readable documents and databases, users may also want to compare the results of simulation packages with empirical datasets. Still other datasets arise in other areas of research and in industrial and commercial practice.

A promising approach to going beyond syntax is to use semantic metadata. But despite some optimistic projections about XML to the contrary, the representation of meaning, in anything like the sense that humans use that term, is far beyond current information technology. As explored in detail in fields such as Computer Supported Cooperative Work (CSCW), understanding the meaning of a document often requires a deep understanding of its social context, including how it was produced, how it is used, its role in organizational politics, its relation to other documents, its relation to other organizations, and much more, depending on the particular situation. Moreover, all of these contexts may be changing at a rapid rate, as may the documents themselves, and the context of the data is often both indeterminate and evolving. Another complication is that the same document may be used in multiple ways, some of which can be very different from others.

These complexities mean that it is unrealistic to expect any single semantics to adequately reflect the meaning of the documents of some class for every purpose. Most attempts to deal with these problems in the existing literature and practice are either ad hoc or else are what we may call ``high maintenance'' solutions, involving complex infrastructure such as commercial relational databases, high-volume data storage centers, and ontologies written in specialized languages to describe semantics. Solutions of the first kind are typically undocumented and cannot be reused, whereas solutions of the second kind require considerable effort from highly skilled computer professionals, which can be frustrating for application experts, due to the difficulty of discovering, communicating, formalizing, and especially updating all the necessary contextual information. For this reason, many application scientists prefer to avoid high maintenance solutions and do the data integration themselves in an ad hoc manner.

One approach is to provide metadata that integration engineers can use to generate mappings between a virtual master database and local databases, from which end-user queries can be answered. A second, more flexible, approach is to use a high-level programming language based on equational logic.

It is unreasonable to expect fully automatic tools for information integration; in particular, it is difficult to find correct schema matches, especially where there are n-to-m matches, semantic functions, conditions and/or diverse data models; it may not even be clear what correctness means in such situations.

Semantic models are needed for mappings of XML DTDs and XML Schemas, relational and object-oriented schemas, and even spreadsheets and structured files, all with integrity constraints. We have developed a theory of abstract schemas and abstract schema morphisms, which provides a semantics for n-to-m matches with semantic functions and/or conditions over diverse data models.

Ontologies, in the sense of formal semantic theories for datasets, are increasingly being proposed, and even used, to support the integration of information that is stored in heterogeneous formats, especially in connection with the world wide web, but also for other, less chaotic, forms of distributed database. In particular, ontologies have been proposed as a key to the success of the so-called ``semantic web''.

Formally speaking, an ontology is a theory over a logic. However, integrating datasets whose semantics are given by different ontologies requires that their ontologies be integrated first. This task is greatly complicated by the fact that many different languages are in use for expressing ontologies, including OWL, Ontologic, Flora, KIF, and RDF, each of which has its own logic. Therefore, to integrate ontologies, it may be necessary first to integrate the logics in which they are expressed. Moreover, dataset integration will also have to take account of the fact that the schemas describing structure are often expressed in different languages, reflecting different underlying data models, e.g., relational, object-oriented, spreadsheet, and formatted file.

We think there has been too little discussion of whether the above-mentioned storage architectures are appropriate for processing heterogeneous information in a unified way. Several variations on conventional storage systems have emerged that introduce more axes for handling information features such as time, location, and semantics~\cite{allen:temporal83, khatri:spatiotemporal04, koubarakis:spatiotemporal03, bohlen:spatio98, jiang:1998:wordnet}. From a processing point of view, however, these approaches tend to increase processing complexity because of their proprietary data architectures. Our research therefore asks whether a better information storage architecture can address problems in the following three categories:

\begin{enumerate} \item Duplicated data handling: As sensing and logging technologies for various kinds of raw data proliferate, duplicated data sets must be handled frequently; this has become a main issue in many industrial applications in data mining, information search, and semantic processing.

\item Dynamic data processing: Dynamic signals such as multimedia have always been an issue to handle because of their storage costs and the computing power they require. Information abstraction models with complex data processing algorithms are therefore typically introduced at the raw-data level, and their outputs are processed and sampled at whatever frequency suits the user's interests. Moreover, this step is usually separated from the storage system, and system variables such as domain-specific configuration constants are simply black-boxed.

\item Information representation: In most data processing, data encapsulation or data abstraction is necessary for the user's purposes. Conventionally, a new data structure is devised by pruning everything except the data sets of interest, in application-specific, proprietary ways. \end{enumerate}

This tangle of questions can be approached in the following ways. The first issue to visit is the correct representation of information identity. As multimedia becomes more prominent, one symbol is not enough to represent a data set, and vice versa. This topic has been under active discussion by the Semantic Web community\footnote{http://www.w3.org/2001/sw/} and ontology researchers\footnote{http://www.w3.org/TR/owl-features/}. But their approaches pay little attention to the information storage architecture, mostly suggesting various kinds of XML typing rules as the solution. We believe that information identity and its semantics are closely related to the information storage architecture. We therefore introduce a special object, called an \emph{e-node}, that will be treated in detail in Chapter~\ref{chp:eventcentricrelationships}.

The second topic we consider is devising a formal method for representing a system so that complete, complex multimedia processing can be stored and retrieved in a canonical way. Conventional data storage systems focus on capturing the input and output of the system. As shown in Eq.~\ref{eqn:systemfunction}, they usually store $\bar{x}$ and $\bar{y}$ in the storage system and separately log the activity of $f$. \begin{equation} \label{eqn:systemfunction} \bar{y}=f(\bar{x}) \end{equation} What we focus on in the present paper is representing and storing Eq.~\ref{eqn:systemfunction} in its entirety, together with its causal relations to other systems, in a unified way. This approach corresponds to a functional category in category theory~\cite{barr:category95} and will be introduced in Chapter~\ref{chp:categoryextensions}.
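As a rough illustration of this idea, the sketch below stores the function, its parameters, its inputs and outputs, and its causal links to other computations as one record instead of logging only $\bar{x}$ and $\bar{y}$. It is a minimal sketch only; the class and field names are hypothetical and are not part of our implementation.

\begin{verbatim}
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

@dataclass
class SystemRecord:
    """Stores y = f(x) as one unit: the function name, its parameters,
    its input/output data, and causal links to other records."""
    name: str                                    # e.g. "face_detector"
    params: Dict[str, Any]                       # configuration of f
    inputs: Dict[str, Any]                       # x
    outputs: Dict[str, Any] = field(default_factory=dict)   # y
    caused_by: List[str] = field(default_factory=list)      # ids of upstream records

    def record_id(self) -> str:
        # Content-based id, so identical computations map to the same node.
        payload = json.dumps({"n": self.name, "p": self.params, "i": self.inputs},
                             sort_keys=True, default=str)
        return hashlib.sha1(payload.encode()).hexdigest()[:12]

def run_and_store(name: str, f: Callable, params: Dict[str, Any],
                  inputs: Dict[str, Any], upstream: List[str],
                  store: Dict[str, "SystemRecord"]) -> "SystemRecord":
    """Executes f and stores the whole equation y = f(x), not just x and y."""
    rec = SystemRecord(name, params, inputs, caused_by=list(upstream))
    rec.outputs = f(**inputs, **params)
    store[rec.record_id()] = rec
    return rec
\end{verbatim}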

The third goal of our research is to merge the above two topics and compose a new information system. We first define a new data model in Chapter~\ref{chp:eventcentricrelationships} and then illustrate how the information network can be built on top of our system in Chapter~\ref{chp:categoryextensions}.

The aforementioned approaches were devised after examining the examples in Chapter~\ref{chp:existingapproaches}. The pros and cons of our approaches will be discussed at the end, followed by future research directions.

Related Work

In this chapter, the functionalities of the chronicling system will be articulated and its requirements formulated within the context of existing technologies developed in related research areas. A chronicle is, by definition, an extended account in prose or verse of historical events. It generally covers the entire recording of personal, organizational, or social events. As it becomes easier to record disparate activities in different situations using different types of sensors, data are no longer simple alphanumeric values but occur as various types of composite media, or multimedia. This evolution of multimedia data, with its wealth of support for multimodal interactions, is encouraging people to develop more ambitious systems. However, metadata for each media type are also diverging in terms of type, structure, and rules of association, so integrating these media raises concerns about the complexity and heuristic nature of the integration. For example, many of the approaches to encoding and decoding become proprietary to each specific application. Conventional electronic chronicling systems, however, provide a query environment that in most cases is limited to the analysis of alphanumeric values of data and metadata. The approaches employed by these systems are inadequate to capture and store heterogeneous, spatially, temporally, or semantically varying multimedia and its information.

Data integration is emerging as a major challenge in the early 21st century. The rise of inexpensive storage media, data warehousing, and especially the web has made available vast amounts of data. But it can be very difficult to find what you want and then combine it properly to get what you really need. Difficulties include the highly variable structure and quality of data and metadata: science labs and businesses often store data in spreadsheets, or even just formatted files, with little or no documentation of structure or meaning; moreover, some entries may be incomplete, corrupted, or inconsistent. If all documents had associated schemas (also called data models) that accurately described their structure, and if fully automatic schema integration were feasible, then some interesting problems could be solved at the syntactic level. But these assumptions are far from true, and format is only a small part of the difficulty.

One proposed solution is to connect items in e-documents to concepts in ontologies. However, this cannot capture real-world semantics, only logical relations between terms, such as that all humans are mammals; the actual meanings of ``human'' and ``mammal'' remain unformalized, as do potential exceptions to logical relations. Moreover, a given domain may have several ontologies, each in some ways incomplete and/or ambiguous, and possibly written in different ontology languages, which in turn may be based on different logical systems. In fact, the ontology approach to data integration may require not just schema and ontology integration, but also ontology language integration, and even ontology logic integration, such that semantics is respected throughout the entire ``integration chain'', from actual datasets or ``documents'', through schemas and ontologies, up to ontology logics. It is clear, however, that this is not sufficient to deal with low-quality data or absent metadata.

Our goal is to develop a general framework for storing multimedia information, and we believe that the challenges described in this chapter must be addressed along the way.

\section{Multimedia analysis and associated metadata repository}\label{sec:multimediaanalysis}

Storage, retrieval and processing of relevant media are key challenges to the success of a data model. As mobile devices and sensors become more pervasive, more and more aspects of an individual's life are being captured in richer detail. Several different ways have also been developed to tackle specific problems of the retrieval process: feature extraction methods for multimedia data, query languages, problem-specific similarity measures, and interactive user interfaces (cf. \cite{kraaij:trecvid05} for a collection of current approaches to multimedia retrieval and \cite{yoshitaka:mulitemdiadatabase99} for a survey of content-based multimedia retrieval). \begin{figure}[htb]

   \centering
   \includegraphics[width=5.3in]{../image/introduction_bounded.eps}
   \caption{Spatio-temporally or semantically connected heterogeneous events.}
   \label{fig:introduction_bounded}

\end{figure}

A common problem specific to multimedia analysis is that it often fails to achieve the synergistic effect of using multifarious signals. For instance, Figure~\ref{fig:introduction_bounded} shows the multimedia processing results for a group meeting \cite{ramesh:ems03}. At the time it was captured, one speaker introduced himself. To detect his ``introduction'' event, which is represented by four signal processing results, the system must overcome the gap between real-world events and data-level events. Without strict limitations on the environment, the detection cannot be made simply by balancing the individual analysis outputs \cite{bett:meeting00, wu:multimodalfusion04}. Even if a user succeeds in detecting the event, the trained parameters are typically useless in other environments. This is because the multifarious signals are correlated not only by signal strength and features, but also spatially, temporally, and semantically \cite{snoek:multimodalreview05, pfleger:multimodalfusion04}.

The real issue in this multimodal approach is to capture those correlated relations at the time of the events and to store and retrieve them. The organizational principles used for the storage of multimedia and associated metadata should therefore be carefully designed because of the time-serial nature and enormous size of the data involved. An immediate problem that confronts the construction of such organizational principles is that current metadata for audio, video, images, and other similar sources are more about the data than about their semantic content. Existing multimedia database systems have been limited to offline indexing of a single stream and to low-level feature-based indices rather than indexing that reflects a user's semantic criteria. This limitation has been recognized, and new ways of organizing, filtering, and searching for information are emerging within the context of these new repositories of unstructured and semistructured information \cite{mchugh:lore97, phelps:documents96, ahlberg:dynamicquery92}. However, much of this work has been directed toward capturing and organizing information in proprietary ways. Moreover, most of it has proceeded with little thought to developing a unifying mechanism for heterogeneous information.

This oversight will be addressed by examining time, space, and semantics as possible avenues for unifying this information, since these are becoming first-class concepts in any information system that aims to support its targeted applications effectively.

\begin{trivlist} \item[\textbf{Temporal Databases}] Temporal data and its related research areas have been studied for more than twenty years. James Allen was one of the foremost researchers in this field. His work \cite{allen:temporal83} provided the foundation for many succeeding temporal relation research efforts (cf. the cumulative bibliography \cite{bohlen:temporal95} and the survey \cite{wu:temporal98}). He defined thirteen possible interval-based temporal relationships, which are listed in Table~\ref{tab:spatiotemporalrelationships}.

Allen's thirteen relationships assume, for any interval $t$, that the lesser end-point is denoted by $t^-$ and the greater by $t^+$; $t_p$ denotes a time-point and $t_i$ a time-interval. These temporal relations can represent the continuity and discontinuity of events. For instance, a user may map a data-registration event to an infinitesimal time-point and map moving-object tracking events to a time-interval (a small illustration of these interval checks follows this list).

\item[\textbf{Spatial Databases}] Spatial concepts are also being integrated into database data models as geometrical elements \cite{rigaux:spatial01, guting:spatial94}. Table~\ref{tab:spatiotemporalrelationships} shows the spatial relations defined by the OpenGIS consortium, where $g_1$ and $g_2$ are spatial objects \cite{opengis:feature99}. These spatial relations are sufficient to represent both the geographical locations associated with data and the spatial relations between objects detected in multimedia data.

\item[\textbf{Spatio-Temporal Databases}] Because spatial and temporal concepts are primary aspects of information, research efforts are emerging that merge both concepts into one. Some researchers employ the tuple model, in which all spatio-temporal features are aligned as one feature vector in tables \cite{bohlen:spatio98, guting:realms93}, for computing speed and to leverage relational database systems. Others \cite{brodeur:uml00} store spatio-temporal relations in entity-relationship diagrams using the Unified Modeling Language (UML). This makes it easier to extend relationship sets and to represent hierarchical relations between data, but these gains come at the expense of computing speed and implementation complexity. \end{trivlist}
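As a concrete illustration of the interval checks referred to above (a minimal sketch; the function and variable names are ours and are not tied to any particular system), a few of Allen's relations can be written directly from the end-point definitions, representing an interval as a pair $(t^-, t^+)$:

\begin{verbatim}
from typing import Tuple

Interval = Tuple[float, float]   # (t_minus, t_plus), with t_minus < t_plus

def before(a: Interval, b: Interval) -> bool:
    """a entirely precedes b (Allen's 'before')."""
    return a[1] < b[0]

def meets(a: Interval, b: Interval) -> bool:
    """a ends exactly where b starts (Allen's 'meets')."""
    return a[1] == b[0]

def overlaps(a: Interval, b: Interval) -> bool:
    """a starts first and ends inside b (Allen's 'overlaps')."""
    return a[0] < b[0] < a[1] < b[1]

def during(a: Interval, b: Interval) -> bool:
    """a lies strictly inside b (Allen's 'during')."""
    return b[0] < a[0] and a[1] < b[1]

def equals(a: Interval, b: Interval) -> bool:
    return a == b

# The remaining relations (starts, finishes, and the inverses of the
# asymmetric relations above) are defined analogously, thirteen in total.

# Example: a speaker-introduction event occurring during a meeting interval.
meeting = (0.0, 3600.0)
introduction = (40.0, 95.0)
assert during(introduction, meeting)
\end{verbatim}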

Research progress in spatio-temporal areas is very promising. However, many researchers assume that the data sets are already given and consequently focus solely on representing particular aspects of the data: spatial and temporal objects and their relations. To ensure clarity about the origin of temporal and spatial information, variations in the data stream should also be detected and recorded as events. Moreover, spatio-temporal concepts alone are too incomplete to represent a fact as fully and richly as they can in conjunction with all related data. As a further extension, the semantics of data is handled in the next sub-section.

\section{Semantics of data}

It would seem a very good idea to take account of what is known about the nature of human concepts and cognition, so as to construct an information system that is as useful and comfortable for human users as possible. One result of the research done by George Lakoff and others~\cite{lakoff:2003:metaphors} is that many metaphors come in families, called \emph{image schemas}, that share a common pattern based on how humans live in the world. Some image schemas are grounded in the human body and are called \emph{basic image schemas}; they tend to yield the most persuasive metaphors. Blending theory~\cite{fauconnier:2003:think} also says that concepts come in clusters, called \emph{conceptual spaces}, consisting of elements and relation instances among them; note that this abstraction necessarily omits the qualitative, experiential aspects of what is represented. Whereas conceptual spaces are constructed on the fly for particular purposes, \emph{conceptual domains}, though structurally similar, are large, relatively stable configurations of related concepts and relations; conceptual spaces are constructed by selecting items from conceptual domains. \emph{Conceptual mappings} are partial functions from the item and relation instances of one conceptual space to those of another.

A mathematical definition of blending is given in~\cite{goguen:1996:imperative}, based on a modification of the category-theoretic notion of ``pushout'' that takes advantage of an ordering relation on morphisms with respect to their quality.

Data itself is thus becoming a composite object as a consequence of the semantic enrichment of multimedia, and it is correspondingly more challenging to represent the relationships between data and semantics clearly \cite{aloia:semantic98, budanitsky:semantic01}. Let us consider one specific example.

\begin{example} \textit{Select the meeting report submitted by Jimi Hendrix after the group meeting held on March 1st, 2005}. \end{example}

This query is in a much-abbreviated form from the viewpoint of information retrieval. It actually requires high-level semantic understanding because (1) the query does not specify the type of data but merely asks for the meeting report; (2) the name of the creator, \textit{Jimi Hendrix}, is specified, but it is left up to the system to understand that this is a person's name; and (3) there may be two people who have the same name, which may cause semantic confusion. These problems are exactly the same as those encountered in Web keyword searches, in which the same text can have different meanings.
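To make the required disambiguation explicit (a minimal sketch only; the field names below are hypothetical and do not describe our actual query interface), the example query might be decomposed into structured constraints roughly as follows:

\begin{verbatim}
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class StructuredQuery:
    """One possible decomposition of the natural-language query."""
    document_kind: str                      # inferred: a "meeting report"; the data type is unspecified
    creator_name: str                       # the system must recognize this as a person's name
    creator_disambiguation: Optional[str]   # e.g. an id, needed if two people share the name
    after_event: str                        # the anchoring event
    after_event_date: date                  # resolved from "March 1st, 2005"

q = StructuredQuery(
    document_kind="meeting report",
    creator_name="Jimi Hendrix",
    creator_disambiguation=None,            # unresolved: semantic confusion remains possible
    after_event="group meeting",
    after_event_date=date(2005, 3, 1),
)
\end{verbatim}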

Several approaches are under way to address such problems and to enable data to be shared semantically and reused across application, enterprise, and community boundaries. For instance, the Semantic Web provides a common framework through a collaborative effort led by the W3C, with participation from a large number of researchers and industrial partners. The Semantic Web is based on the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming. The main thrust of the W3C's work on semantically shareable data has been dictionary support such as the RDF Vocabulary Description Language. Based on the RDF Vocabulary Description Language, the Web Ontology Language (OWL) has been developed as a vocabulary extension of RDF; it is derived from the DAML+OIL Web ontology language.
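As an illustration of this triple-based description (a minimal sketch using the rdflib Python library; the example.org namespace and the property names are purely illustrative), the meeting report from the example query could be described as RDF statements:

\begin{verbatim}
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/chronicle/")   # illustrative namespace

g = Graph()
report = EX["report-2005-03-01-jhendrix"]

# Each statement is a (subject, predicate, object) triple named by URIs.
g.add((report, RDF.type, EX.MeetingReport))
g.add((report, EX.submittedBy, EX.JimiHendrix))
g.add((report, EX.follows, EX.GroupMeeting_2005_03_01))
g.add((report, EX.submissionDate, Literal("2005-03-02", datatype=XSD.date)))

print(g.serialize(format="turtle"))
\end{verbatim}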

Interestingly, a similar approach is found in the knowledge engineering area. The OMCSNet \cite{singh:commonsense03} Knowledge Base (KB) represents the common-sense fact that ``\textit{meeting people}'' requires ``\textit{social interaction}'' as an action. Its developers indexed the existing KB to be searchable and shareable via WordNet \cite{fellbaum:wordnet98}, which is the most popular semantic resource in computational linguistics. Technically, they first parse a sentence using Link Grammar \cite{lafferty:linkgrammar92} to identify each word grammatically as a noun, verb, adjective, adverb, and so on. Then they index each word with its sense number in WordNet (e.g., \textit{Do ``meeting.n\#1 people.n\#1'' require.v\#1 ``social.adj\#1 interaction.n\#1''}). For instance, ``\textit{meeting.n\#1}'' in WordNet means the first sense of the noun ``meeting''. They do this so that they can share their KB with other KB-related applications without semantic confusion.
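A minimal sketch of this kind of sense indexing, using the NLTK interface to WordNet (the helper function and its output format are ours, not OMCSNet's actual pipeline), follows:

\begin{verbatim}
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def tag_with_sense(word: str, pos: str, sense: int = 1) -> str:
    """Return a 'word.pos#n'-style tag plus the WordNet gloss for that sense."""
    synsets = wn.synsets(word, pos=pos)
    if len(synsets) < sense:
        raise ValueError(f"no sense #{sense} for {word!r}")
    chosen = synsets[sense - 1]
    return f"{word}.{pos}#{sense}: {chosen.definition()}"

# e.g. the first noun sense of "meeting", as in "meeting.n#1"
print(tag_with_sense("meeting", wn.NOUN, 1))
print(tag_with_sense("interaction", wn.NOUN, 1))
\end{verbatim}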

As the Semantic Web has shown with its use of ontologies, and as OMCSNet has demonstrated with WordNet, a common, shared dictionary is essential to avoid semantic confusion. Ontology-based approaches are good at extending the application's dictionary, but they have limited ability to be shared beyond the group of interest. WordNet-based approaches are generally good at representing facts because of WordNet's enriched definitions of commonly used and shared words, but they may lag in including new words because content must be updated frequently. In both cases, a user plays an important role in linking data to its correct semantic symbols. To accommodate users' ability to create such linkages, any information chronicling system should support rich tags for access and presentation of appropriate information, and the interface for creating such tags should be easy to use \cite{pilho:personal04}.

\section{Uncertainties in data and user queries} Implementation and usability should be given high priority if a system is to be practical. Foremost among such considerations is the recognition of the uncertainties that inevitably exist in data representation and in users' queries. Users will frequently be unable to express their interests at an appropriate level of accuracy. This inability arises partly from the uncertainty and imprecision inherent in language itself and partly from the limitations of human cognition. It makes it important for the system to have a way to represent uncertainties in data, a capability that must exist for all kinds of data relations. For instance, in temporal relationships, some work employs the concept of granularities in temporal models \cite{dyreson:temporal03, zhang:temporal03} to mask these uncertainties. Granularities used in this way may be relative or absolute, so that they can indicate data within some absolute value range or, alternatively, represent relative variations from a given data value.
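As a simple illustration of masking temporal uncertainty with an absolute granularity (a sketch only, not a particular system's API), a value recorded at day granularity can be matched against an exact event time by widening it to the corresponding interval:

\begin{verbatim}
from datetime import datetime, timedelta

def day_granule(d: datetime):
    """Widen a timestamp to the whole day it falls in (absolute granularity)."""
    start = d.replace(hour=0, minute=0, second=0, microsecond=0)
    return (start, start + timedelta(days=1))

def matches(event_time: datetime, coarse_time: datetime) -> bool:
    """An exact event time matches a day-granularity value if it falls in that day."""
    lo, hi = day_granule(coarse_time)
    return lo <= event_time < hi

# A precisely timestamped report still matches a query given only at day granularity.
print(matches(datetime(2005, 3, 1, 14, 30), datetime(2005, 3, 1)))   # True
\end{verbatim}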

\section{Logical approaches for hiding information modules} Goguen et al. presented the ``data integration chain'' in~\cite{goguen:2006:logic}. It is composed of a chain from data to schema to ontology to ontology language to ontology logic integration; their main ideas are abstract schemas, abstract schema species, and abstract schema morphisms. Their basic assumption is that the data, its schemas, and the ontologies formed over them are already decided. What is missing from their approach is the origin of the information, how it is captured, and how it is consumed.

\section{An example to investigate} We will illustrate several real-world scenarios that we frequently face in multimedia data processing. We first select a simple example, extend it to a more complex case, and see which solution is appropriate for these problems. The first example concerns person recognition: given someone's picture, imagine that we perform face detection and then face recognition.

\begin{figure}[htb] \centering \includegraphics[width=5.3in]{../image/example_conventional_approach_bounded.eps} \caption{A conventional multimedia processing example.} \label{fig:conventionalprocessing} \end{figure}

Figure~\ref{fig:conventionalprocessing} shows this computation flow: a face is first detected in a given image, producing several results in various data formats and types. Several parameters from the face detection results, together with prior configuration parameters, are transferred to the face recognizer. Imagine that we develop a system for this process and assume that the system always saves all data without losing any results. Most engineers would probably first save all data into a file and transfer results through it. If they use a database (e.g., a relational database), they will design two tables, one for the face detector and one for the face recognizer, and save the data there as shown in Figure~\ref{fig:exampledb}.

\begin{figure}[htb] \centering \includegraphics[width=2.5in]{../image/sample_db_bounded.eps} \caption{A sample relational database configuration.} \label{fig:exampledb} \end{figure}
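A minimal sketch of such a two-table design, using SQLite (the table and column names are illustrative and are not the exact schema of Figure~\ref{fig:exampledb}), follows:

\begin{verbatim}
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One table per processing module, linked by the detection id.
cur.executescript("""
CREATE TABLE face_detection (
    detection_id    INTEGER PRIMARY KEY,
    image_path      TEXT NOT NULL,
    bbox_x INTEGER, bbox_y INTEGER,
    bbox_w INTEGER, bbox_h INTEGER,
    confidence      REAL,
    detector_config TEXT               -- prior configuration parameters, serialized
);
CREATE TABLE face_recognition (
    recognition_id    INTEGER PRIMARY KEY,
    detection_id      INTEGER REFERENCES face_detection(detection_id),
    person_name       TEXT,
    similarity        REAL,
    recognizer_config TEXT
);
""")

cur.execute("INSERT INTO face_detection VALUES (1, 'img001.jpg', 40, 32, 96, 96, 0.93, '{}')")
cur.execute("INSERT INTO face_recognition VALUES (1, 1, 'Jimi Hendrix', 0.81, '{}')")
conn.commit()
\end{verbatim}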

Let us now make our example more complex. Imagine that we are applying the same algorithm to parent-child pairs. We assume that our face recognizer does not work well, since family members naturally look quite alike. So we adopt vocal features as an additional classifying rule to compute the similarity more accurately. To make it more realistic, we classify the face recognition training data for each person year by year, so that only the most recent year of face samples is used for each person's face training. This new rule is applied to the speech feature extraction algorithm in the same way.

Coming back to the system design, how should our engineer extend the system to add the aforementioned new features? Let us assume that we do not change the existing computing algorithms. Then Figure~\ref{fig:extendedexample} could be one possible solution (though, fairly, we could implement the system in other ways). It is not a rare case in data processing to add more feature extraction or abstraction tools to enhance performance.

\begin{figure}[htb] \centering \includegraphics[width=5.3in]{../image/extended_example_bounded.eps} \caption{An extended computing example.} \label{fig:extendedexample} \end{figure}

Figure~\ref{fig:extendedexample} shows the system now separated into two main parts: (1) the computing part on the left side and (2) the database part on the right side. The computing flow in Figure~\ref{fig:extendedexample} is already complex; its configurations are all separated, and its internal source code is also separated. What could be a unifying driving force for this kind of heterogeneous data processing and data manipulation? For this, let us examine the right side of Figure~\ref{fig:extendedexampleflow}, which emphasizes the computing flow only.

\begin{figure}[htb] \centering \includegraphics[width=5.3in]{../image/extended_example_flow_bounded.eps} \caption{A computing flow abstract of Figure~\ref{fig:extendedexample}.} \label{fig:extendedexampleflow} \end{figure}

One more thing to consider is whether we are losing significant amounts of information between separated computing processes. This information reduction happens frequently, especially in consecutive information processing: typically only the information necessary for the next operation is transferred to the next step. This can be successful in many concrete instances. However, one difficulty with this approach is that it decontextualizes experience, so an enormous amount of potentially relevant information can be omitted by attaching a formal predicate to a real entity.

In addition, to reduce the complexity of the system, what can practically be constrained in this process is the type of data transmitted between systems and the formal definition of how different systems may be connected. This is how we tackle the unification of heterogeneous media computing and its data exchange problem. In the view of category theory, which we will introduce in Chapter~\ref{chp:categoryextensions}, an arrow or a function (see Def.~\ref{def:function}) can be defined as an object of a category, and its elements, which may be heterogeneous composite data sets, can be morphologically limited to the new data model suggested in Chapter~\ref{chp:categoryextensions}.

Why Events?

Relational Schema vs. Functional Schema

The flow of information in the world can in many cases be regarded as a collection of information trees because of its temporal variation. However, sets of information may also be grouped together by common associated properties such as space, time, or any one specific property. Facts that do not change later, or at least remain stable for a considerable time, can be grouped by their relationships. Think of the time dimension as providing the vertical relationships and the latter grouping as providing the lateral relationships. Then the informational view of the world becomes a composition of lateral and vertical relationships.

Our approach accommodates these relationships by employing nested data models on top of relational backends. This is not a new approach in itself; for instance, many commercial XML databases are built on top of existing relational database engines. However, our approach provides more levels for accessing information at the required level of granularity. The way we address this is introduced in detail hereafter.

The concept of nested set models has been popular for decades. Among numerous references, Joe Celko's book, \emph{Trees and Hierarchies in SQL for Smarties}, has significantly influenced many database developers.
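A minimal sketch of the nested set encoding follows (the left/right numbering is Celko's; the event names and table layout are illustrative only). Each node stores a left and a right value, every descendant's values fall between its ancestor's, and so retrieving an event together with all of its sub-events, i.e. accessing it at a finer granularity, is a single range query rather than a recursive traversal.

\begin{verbatim}
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE event_tree (
    name TEXT PRIMARY KEY,
    lft  INTEGER NOT NULL,
    rgt  INTEGER NOT NULL
);
-- group_meeting contains introduction and discussion; discussion contains q_and_a.
INSERT INTO event_tree VALUES ('group_meeting', 1, 8);
INSERT INTO event_tree VALUES ('introduction',  2, 3);
INSERT INTO event_tree VALUES ('discussion',    4, 7);
INSERT INTO event_tree VALUES ('q_and_a',       5, 6);
""")

# All sub-events of 'group_meeting': one range query, no recursion needed.
cur.execute("""
SELECT child.name
FROM event_tree AS parent
JOIN event_tree AS child
  ON child.lft BETWEEN parent.lft AND parent.rgt
WHERE parent.name = 'group_meeting' AND child.name != parent.name
ORDER BY child.lft
""")
print([row[0] for row in cur.fetchall()])   # ['introduction', 'discussion', 'q_and_a']
\end{verbatim}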

Event Warehouse Design

Application Case Study

Increasing the Semantic Indexing Dictionary

Conclusions and Future Research Topics