Until recently, archival storage tiers have consisted of tape-based devices with large storage capacity but limited I/O performance for data retrieval. However, the growing capacity and shrinking cost of disk-based devices mean that disk-based systems are now a realistic option for enterprise archival storage tiers. Given the increasingly diverse options for archival storage, robust benchmarking of candidate technologies is vital for reducing risk before deployment. This paper investigates benchmarks that use archival workloads developed from an analysis of historical file size distributions. These benchmarks provide more appropriate measurements of a system's performance as an archive than traditional approaches do. They also incorporate the variation observed in the historical data to provide "best" and "worst" case workloads. By considering not only the usual workload but also workloads at either end of the archival workload spectrum, our benchmarking is robust: it provides measures of performance across the envelope of typical archival workloads observed in the empirical data.