Trav's UVa Pages

1. Introduction

I forget what "TREC" stood for; probably "Trav's" something or other. The name was a pun on a song I had written a few years earlier ("Trek"). Regardless of the title, TREC formed a big part of my research.

2. Source Code

The TREC logic consisted of a bunch of PERL scripts. Most of the descriptions here are copied from the headers of the source files.

2.1 PERL Scripts

cmp_ranks.pl: Compares ranks, generates MSE and P vs R output files.
combine_graphs.pl: Combines multiple graphs onto one page (TEX).
compmap.pl: Compares our mappings (coll_map.txt) to Charlie Viles' mappings.
evaluate.pl: Reads rank comparisons for baseline_vs_estimate (one file) and baseline_versus_random (contains avg and std dev). It then calculates the statistical significance of the estimate's scores against the baseline.
get_fandw.pl: Generates global (gloss) vocab matrix from source F and W files.
graph.pl: Generates MSE points for plotting.
matrix.pl: Generates a tab-delimited merit matrix (queries versus collections) for the given merit file(s).
mits.pl: Provides an easy and fast interface for processing TREC data. This handles map files, merits, ranks, comparisons, summary matrices, evaluations versus random ranks, etc.
new_dir.pl: Creates a new test directory, creating all the needed subdirectories and making "stubs" for all the needed files. Also has options for creating links to files in an existing data directory.
postproc.pl: Performs post-processing logic.
problems.pl: Lists problematic query_ids and coll_ids (gotten from a diff on A.noord and B.noord).
reformat.pl: Makes diff-able format of ranks, and then gets n_star.
testmap.pl: Verifies counts of doc_ids in the mapping file, the source files, and a pre-generated frequency list. Run this after running buildmap.pl to make sure buildmap.pl worked correctly.
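
To make the rank comparison in cmp_ranks.pl a little more concrete, here is a minimal sketch in Python (not the original PERL). The headers above don't say exactly how the MSE was computed, so this assumes it was the mean squared difference in rank position between a baseline ranking and an estimated ranking of the same collections. The function names and the sample collection ids are made up for illustration.

```python
def rank_positions(ranking):
    """Map each collection id to its 1-based position in the ranking."""
    return {coll_id: pos for pos, coll_id in enumerate(ranking, start=1)}


def rank_mse(baseline, estimate):
    """Mean squared difference in rank position (assumed definition of MSE).

    Only collections that appear in both rankings are compared.
    """
    base_pos = rank_positions(baseline)
    est_pos = rank_positions(estimate)
    shared = [c for c in baseline if c in est_pos]
    if not shared:
        raise ValueError("rankings share no collections")
    return sum((base_pos[c] - est_pos[c]) ** 2 for c in shared) / len(shared)


# Hypothetical rankings for a single query: the estimate swaps the top two.
baseline = ["c1", "c2", "c3", "c4"]
estimate = ["c2", "c1", "c3", "c4"]
print(rank_mse(baseline, estimate))  # 0.5
```

An identical ranking scores 0, and the score grows as collections move further from their baseline positions.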

2.2 PERL Modules

These are the PERL modules used by the scripts. They are all in the subs subdirectory.

files.pm: Routines for manipulating files and directories.
gentable.pm: General-purpose table generator routines (for two-dimensional associative arrays).
graphs.pm: Routines for creating graphs out of MSEs, Recalls, and Precisions.
handler.pm: Handles MITS' menu choices (is responsible for interfacing with all the called routines).
ir_subs.pm: Miscellaneous commonly used subroutines for IR perl scripts.
maps.pm: Routines related to doc_id -> coll_id maps.
masks.pm: Routines related to masks.
merits.pm: Routines related to merits.
merit_est.pm: Calculates merit for estimates (query_id, coll_id), reads from F/W.txt, masks out certain collections from certain queries.
merit_ideal.pm: Gets ideal document similarities from smart files and generates merits (collection similarities) per specified thresholds.
merit_opt.pm: Reads in a qrels.all.txt file and generates a corresponding "optimal" merits file.
random.pm: Reads in a random weights file and then generates several random ranks from it (in memory only). It then compares the several random guesses to a baseline and calculates the average and standard deviation (per query) of the scores, which it then writes to file.
ranks.pm: Rank-related routines.
terms.pm: Routines for manipulating terms and term files (e.g., query ids, coll ids, etc.).
weights.pm: Handles writing and reading of weights files, as well as the generation of the actual weights.
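
The random-baseline idea behind random.pm (and the significance check in evaluate.pl) can be sketched roughly as follows, again in Python rather than the original PERL. This is a sketch under stated assumptions: the real module read its random weights from a file, whereas here the random ranks come from shuffling; the score is assumed to be a mean squared rank difference; and the z-score is only one plausible way to express "significance", since the headers don't name the test that was used. All names and data are hypothetical.

```python
import random
import statistics


def rank_mse(baseline, candidate):
    """Mean squared rank difference; both lists must hold the same ids."""
    pos = {c: i for i, c in enumerate(candidate)}
    return sum((i - pos[c]) ** 2 for i, c in enumerate(baseline)) / len(baseline)


def random_baseline_stats(baseline, n_trials=100, seed=0):
    """Score n_trials shuffled rankings against the baseline.

    Returns the average and sample standard deviation of the scores.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    scores = []
    for _ in range(n_trials):
        guess = list(baseline)
        rng.shuffle(guess)
        scores.append(rank_mse(baseline, guess))
    return statistics.mean(scores), statistics.stdev(scores)


def z_score(estimate_score, rand_mean, rand_std):
    """How many standard deviations the estimate sits from the random mean."""
    if rand_std == 0:
        raise ValueError("random scores have zero spread")
    return (estimate_score - rand_mean) / rand_std


# Hypothetical single query: ten collections, estimate swaps the top two.
baseline = [f"c{i}" for i in range(10)]
estimate = [baseline[1], baseline[0]] + baseline[2:]

rand_mean, rand_std = random_baseline_stats(baseline)
z = z_score(rank_mse(baseline, estimate), rand_mean, rand_std)
print(rand_mean, rand_std, z)
```

Because lower MSE is better here, a strongly negative z means the estimate ranks collections much closer to the baseline than random guessing does.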

2.3 Shell Scripts

go: This shell script helps make calling some of the graphing-related PERL scripts more convenient.
graph_inq: Another shell script which calls graphing-related PERL scripts.

3. Data Files

The output files, which were hosted in a shared directory, are no longer available; the symlinks I was using are dead. Here's all that's left of the data files:

callan.txt: This looks like a list of publications, in human-readable format.
mits.cfg: MITS configuration file.
ratios.txt: Ratios relating to queries and other stuff. I don't remember.
test.zip: Zip of test data.
