The Bibliomining Process
Bibliomining, a.k.a. data mining in libraries, is a small portion of
a larger process. This overall process is commonly known as Knowledge
Discovery in Databases (KDD). This page will outline the application
of KDD in the library environment.
- Identify Topic - The first step is determining the topic for the
bibliomining project. There are two types of bibliomining - predictive
and descriptive. Predictive bibliomining can either predict a future
event based upon the past and present state or predict a current event
that is difficult to measure based upon a smaller group or past measurements.
Descriptive bibliomining seeks to describe a current situation.
- Create Data Warehouse - The data sources that will help in this
topic area must be identified. This data is then extracted from the
appropriate systems and combined into a single data warehouse. In
addition, the data is cleaned and missing values may be dealt with.
This step can take up to 80% of the time of a bibliomining project;
however, the final results hinge upon successful completion of this
step. If a librarian finds that bibliomining is useful, then the time
should be taken to make a regularly updated data warehouse that extracts
data from the operational system, cleans it, and stores it on a regular
basis. This investment in time will make future bibliomining projects
much easier to complete.
- Refine Data - The appropriate variables for this particular bibliomining
process are then considered. New variables (such as ratios or classifications)
can be generated from the original variables. Variables may need to
be floored or capped to deal with extreme values. Any missing values
not dealt with in the data warehouse must be dealt with here.
- Explore Data - This is where the actual bibliomining takes place.
Based upon the desired outcome and type of data, different techniques
and reports are used to discover novel and actionable patterns. (Then
a miracle occurs...)
- Evaluate Results - Patterns that are discovered should make sense
to the librarians who normally work with the topic area. If the pattern
seems surprising and goes against "common sense",
it probably represents a flaw in the data. If this occurs,
one can look at individual records to see why that pattern exists.
If a predictive model was created, it can be applied to a holdout
sample to test the reliability of the model.
- Report and Implement - If predictive models were used, they can
be implemented. If needed, they can be implemented on a small sample
of the real-world data in order to track performance before a full-scale
implementation. Reports can be created and presented to the involved
staff members. Often this will generate more questions, which
take you back to the first step, Identify Topic.
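As a rough illustration of the Refine Data and Evaluate Results steps above, the sketch below fills a missing value, caps and floors an extreme value, derives a ratio variable, and sets aside a holdout sample. The records, field names, thresholds, the median fill, and the function names `refine` and `holdout_split` are all hypothetical choices made for this example. They are not part of the original process description, and a real project would choose them to suit its own data.

```python
import random
from statistics import median

# Hypothetical per-patron records, as they might come out of a data warehouse.
records = [
    {"patron": "A", "checkouts": 12, "visits": 6},
    {"patron": "B", "checkouts": 400, "visits": 8},   # extreme value
    {"patron": "C", "checkouts": None, "visits": 4},  # missing value
    {"patron": "D", "checkouts": 3, "visits": 3},
    {"patron": "E", "checkouts": 0, "visits": 5},
]

def refine(rows, floor=0, cap=50):
    """Fill missing checkouts, floor/cap extremes, and derive a ratio variable."""
    known = [r["checkouts"] for r in rows if r["checkouts"] is not None]
    fill = median(known)  # assumption: replace missing values with the median
    refined = []
    for r in rows:
        value = r["checkouts"] if r["checkouts"] is not None else fill
        value = max(floor, min(cap, value))  # floor and cap extreme values
        refined.append({**r,
                        "checkouts": value,
                        "checkouts_per_visit": value / r["visits"]})
    return refined

def holdout_split(rows, holdout_fraction=0.2, seed=0):
    """Shuffle the rows and set aside a holdout sample for testing a model."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

refined = refine(records)
train, holdout = holdout_split(refined)
```

A predictive model would be built on `train` only and then applied to `holdout` to check its reliability on records it has not seen.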
If you'd like to learn more about the KDD process, there are many
resources at KDNuggets.
This page last updated on 22-Jun-2002 by Scott Nicholson. Copyright 2002. All rights reserved.