The Bibliomining Process

Bibliomining, or data mining in libraries, is one part of a larger process commonly known as Knowledge Discovery in Databases (KDD). This page outlines how KDD applies in the library environment.

  1. Identify Topic - The first step is determining the topic for the bibliomining project. There are two types of bibliomining - predictive and descriptive. Predictive bibliomining can either predict a future event based upon the past and present state or predict a current event that is difficult to measure based upon a smaller group or past measurements. Descriptive bibliomining seeks to describe a current situation.
  2. Create Data Warehouse - The data sources that will help in this topic area must be identified. This data is then extracted from the appropriate systems and combined into a single data warehouse. In addition, the data is cleaned and missing values may be dealt with. This step can take up to 80% of the time of a bibliomining project; however, the final results hinge upon successful completion of this step. If a librarian finds that bibliomining is useful, then the time should be taken to build a data warehouse that regularly extracts data from the operational system, cleans it, and stores it. This investment in time will make future bibliomining projects much easier to complete.
  3. Refine Data - The appropriate variables for this particular bibliomining process are then considered. New variables (such as ratios or classifications) can be generated from the original variables. Variables may need to be floored or capped to deal with extreme values. Any missing values not dealt with in the data warehouse must be dealt with here.
  4. Explore Data - This is where the actual bibliomining takes place. Based upon the desired outcome and type of data, different techniques and reports are used to discover novel and actionable patterns. (Then a miracle occurs...)
  5. Evaluate Results - Patterns that are discovered should make sense to the librarians who normally work with the topic area. If a pattern seems surprising and goes against "common sense", it probably represents a flaw in the data. If this occurs, one can look at individual records to see why that pattern exists. If a predictive model was created, it can be applied to a holdout sample to test the reliability of the model.
  6. Report and Implement - If predictive models were used, they can be implemented. If needed, they can be implemented on a small sample of the real-world data in order to track performance before a full-scale implementation. Reports can be created and presented to the involved staff members. Many times, this will generate more questions, which takes you back to step 1.

If you'd like to learn more about the KDD process, there are many resources at KDNuggets.
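
As a rough illustration of the data-handling steps described above (dealing with missing values, flooring and capping extreme values, generating a ratio variable, and setting aside a holdout sample), here is a minimal Python sketch. The patron figures, the variable names, the cap of 100, the floor of 1, the use of the median for missing values, and the 30% holdout fraction are all assumptions made up for this example; they are not taken from any real library system or from a specific bibliomining project.

```python
import random
import statistics


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def cap_and_floor(values, low, high):
    """Clamp every value into the range [low, high] to tame extreme values."""
    return [min(max(v, low), high) for v in values]


def holdout_split(records, holdout_fraction=0.3, seed=0):
    """Shuffle a copy of the records and set aside a holdout sample."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Hypothetical yearly figures for ten patrons (invented for illustration).
checkouts = [12, 450, 0, 30, None, 25, 60, 5, 18, 40]
renewals = [3, 20, 0, 10, 1, 5, 12, 0, 6, 9]

# Refine Data: handle the missing value, then floor and cap the extremes.
checkouts = fill_missing(checkouts)
checkouts = cap_and_floor(checkouts, low=1, high=100)

# Refine Data: generate a new ratio variable from the original variables.
# The floor of 1 on checkouts also prevents division by zero here.
renewal_ratio = [r / c for r, c in zip(renewals, checkouts)]

# Evaluate Results: keep a holdout sample aside for testing a predictive model.
records = list(zip(checkouts, renewals, renewal_ratio))
train, holdout = holdout_split(records)
```

A model would be built on `train` only and then checked against `holdout`. The fixed seed simply makes the split repeatable; in practice the fraction and the capping thresholds would be chosen with the librarians who know the data.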

 


This page last updated on 22-Jun-2002 by Scott Nicholson. Copyright 2002. All rights reserved.