Extracting Knowledge From Data
Ever wanted to do Data Mining?
Most companies have kept records over the years: production figures, financial data, time sheets, personnel profiles and the like. When such records can be accessed by computer, the collection is called a data warehouse.
These records may contain useful information that is not obvious from inspecting individual entries. What accumulates over time are indicators of trends, relationships between events and long-run differences. The question is how to get at this information, evaluate it and eventually use it.
STATEX is a tool that can help extract knowledge from a collection of data. The name suggests that the science behind the tool is inferential statistics. The application mates current analytical techniques, which are good at condensing reams of loose data, with a manager's domain knowledge, which is good at spotting what makes sense.
Let's look at a session between a manager and STATEX. The transactions take place via a graphical interface, appealing to intuition rather than precise definitions.
At the start, STATEX will ask for some identification to prepare a user file and will want to know where to find the data. Almost all numerical records can be described as a table of so many rows and so many columns. Suppose you want to describe the table as 2 groups and 2 variables. A diagram will show the result of this choice, so you can see the relationship between the groups and the variables, or what can be called the hidden structure of the data.
On the desktop appear a toolbar and a graphical surface with icons that represent the variables to work with. Labels and explanatory text can be added later, so that communication takes place in terms that are already familiar.
The next step is a pre-analysis to find formal distribution parameters, that is, the usual average and a measure of the variability of the data. If the system needs information, it will ask for it. This step serves as a prelude to a diagnosis of the selected variables, leading to suggestions on how to dig further into the data.
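How STATEX computes these parameters internally is not described here. As a rough illustration of what such a pre-analysis involves, the following minimal Python sketch computes the average and the sample standard deviation for each variable in a small table. The function name pre_analysis and the sample records are assumptions made for the example and are not part of STATEX.

```python
import statistics

def pre_analysis(table):
    """Return the mean and sample standard deviation of each variable.

    `table` maps a variable name to its list of numeric values.
    This is an illustrative helper, not a STATEX function.
    """
    summary = {}
    for name, values in table.items():
        summary[name] = {
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        }
    return summary

# Hypothetical records: two variables, five rows each
records = {
    "A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "B": [1.0, 3.0, 2.0, 5.0, 4.0],
}

summary = pre_analysis(records)
for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.2f}, stdev={stats['stdev']:.2f}")
# A: mean=6.00, stdev=3.16
# B: mean=3.00, stdev=1.58
```

The standard deviation is only one possible measure of variability; a range or an interquartile range would serve the same purpose in a diagnostic step.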
STATEX will present its recommendations as a list on a pop-up menu and will perform the required computations. The results will be displayed in graphical and textual form, with the outcome explained in natural language. For example, the system may say:
'There is a significant relationship between variable A and variable B. Variable B is predicted by variable A with significant certainty.'
The graphical representation of the association between the variables gives a feel for the strength and direction of the relation.
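A statement like the one quoted above could rest on many different tests, and the text does not say which one STATEX applies. As one possible illustration, the Python sketch below measures the strength and direction of a linear relationship with the Pearson correlation coefficient and judges its significance with a simple permutation test. The function names, the 0.05 threshold and the sample data are all assumptions made for the example.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient: sign gives direction, size gives strength."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, trials=2000, seed=0):
    """Estimate how often shuffled data give a correlation at least as strong."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

def describe(x, y, alpha=0.05):
    """Turn the numbers into a plain-language sentence."""
    r = pearson_r(x, y)
    p = permutation_p_value(x, y)
    direction = "positive" if r > 0 else "negative"
    if p < alpha:
        return (f"There is a significant {direction} relationship "
                f"between variable A and variable B (r = {r:.2f}).")
    return (f"No significant relationship between variable A "
            f"and variable B was found (r = {r:.2f}).")

# Hypothetical data: B rises roughly twice as fast as A
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

r = pearson_r(a, b)
p = permutation_p_value(a, b)
message = describe(a, b)
print(message)
```

A value of r close to +1 or -1 indicates a strong relationship and the sign gives its direction, which is the same information a scatter diagram conveys visually.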
This kind of confirmation of a hunch is often all that is required to pursue the search for new and useful information in what would otherwise be mere actuarial records. Results may also be copied and pasted into documents for illustration and distribution.