ParaScale Document Archiving Solutions

Companies that store documents online share a common need: a cost-effective, scalable file storage solution that can efficiently store a large number of documents and serve them to many users simultaneously.

The Challenge

The most significant challenge in document archiving is balancing a document's age against how often it is accessed: the hot data/cold data problem. As disk drives are added to a repository, new data is stored on the new drives. Consequently, all of the writing and most of the reading occur on these drives. This imbalance limits read/write bandwidth, because the older disks remain largely idle while the new disks become saturated. The same problem appears when each "drive" is a separate NAS appliance.
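The hot-spot effect can be illustrated with a toy model. This is a minimal sketch, not ParaScale code. The function names, file counts, drive capacity, and the assumption that only the newest files are active are all hypothetical. It compares a "fill the newest drive" layout, which mirrors the behavior described above, with an idealized layout that spreads files across every drive.

```python
from collections import Counter


def place_fill_newest(num_files: int, capacity: int) -> list[int]:
    """Write files in order to the newest drive, adding a drive when it fills.

    Returns placement[f] = index of the drive that holds file f.
    """
    placement = []
    used = []  # used[d] = number of files currently stored on drive d
    for _ in range(num_files):
        if not used or used[-1] == capacity:
            used.append(0)  # a new (empty) drive is added to the repository
        used[-1] += 1
        placement.append(len(used) - 1)
    return placement


def place_spread(num_files: int, num_drives: int) -> list[int]:
    """Idealized balanced layout: file f lives on drive f mod num_drives."""
    return [f % num_drives for f in range(num_files)]


def hot_io(placement: list[int], window: int) -> Counter:
    """I/O per drive, assuming only the `window` newest files are being accessed."""
    return Counter(placement[-window:])


if __name__ == "__main__":
    newest = place_fill_newest(500, capacity=100)  # 5 drives, filled one at a time
    spread = place_spread(500, num_drives=5)
    print(hot_io(newest, 50))  # Counter({4: 50}): only the newest drive is busy
    print(hot_io(spread, 50))  # 10 accesses on each of the 5 drives
```

With the "fill the newest drive" layout, all 50 active files sit on drive 4 and drives 0 through 3 sit idle, even though every drive holds the same number of files.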
Another challenge is the growth of data, both in the number of images and in their resolution. Even low-end point-and-shoot digital cameras now use sensors with over 6 million pixels, and even after JPEG compression their images can exceed 3 MB. Not surprisingly, more documents are now written to repositories than are viewed, and the size of these repositories grows relentlessly.

The ParaScale Solution

ParaScale provides an ideal solution for document archiving. First, ParaScale resolves the hot data/cold data problem (hot data is data in high demand; cold data is data that is accessed rarely) with network file migration, which redistributes files across disk drives to balance the load across servers. When a new storage node is added to a ParaScale Cloud Storage (PCS) deployment, the PCS software populates the new server with existing files, freeing space on the existing servers so that new files can be distributed evenly. File migration is 100% transparent to applications across all storage nodes. As nodes are added and the repository grows, the load on any individual disk declines, so performance improves over time.

Second, because file migration is automatic and all files are kept in a single namespace, the administrative burden is minimal. For a trained administrator, managing a 3-node deployment is no different from managing a 300-node deployment.

Third, a ParaScale deployment is cost-effective because it scales as you grow: storage capacity needs to be added only when it is required, so servers and disks can be purchased or replaced at the latest, lowest prices. If redundancy across servers is not needed for performance or for protection against server failure, administrators can rely on redundancy across disks within a server (e.g., RAID-5), further lowering disk costs.
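The file-migration step, in which a new node is populated with existing files while clients keep addressing files through one namespace, can be sketched with a toy model. This is a hypothetical illustration, not ParaScale's algorithm or API. The class name `ToyCluster`, the "least-loaded node" write policy, and the "file counts differ by at most one" balancing goal are assumptions made for the example.

```python
class ToyCluster:
    """Toy single-namespace cluster (hypothetical; not ParaScale's API)."""

    def __init__(self, node_names: list[str]):
        self.files = {n: [] for n in node_names}  # node -> paths stored on it
        self.namespace = {}                       # path -> node currently holding it

    def write(self, path: str) -> None:
        # Assumed policy: new files go to the node holding the fewest files.
        node = min(self.files, key=lambda n: len(self.files[n]))
        self.files[node].append(path)
        self.namespace[path] = node

    def lookup(self, path: str) -> str:
        # Clients resolve paths through the namespace, never a fixed drive,
        # so migrating a file does not change how it is addressed.
        return self.namespace[path]

    def add_node(self, name: str) -> int:
        """Add an empty node, then migrate files from the fullest node to the
        emptiest one until file counts differ by at most one.

        Returns the number of files migrated.
        """
        self.files[name] = []
        moves = 0
        while True:
            src = max(self.files, key=lambda n: len(self.files[n]))
            dst = min(self.files, key=lambda n: len(self.files[n]))
            if len(self.files[src]) - len(self.files[dst]) <= 1:
                return moves
            path = self.files[src].pop()
            self.files[dst].append(path)
            self.namespace[path] = dst  # update the namespace entry in place
            moves += 1


if __name__ == "__main__":
    cluster = ToyCluster(["node1", "node2", "node3"])
    for i in range(300):
        cluster.write(f"/archive/doc{i:04d}.jpg")  # 100 files per node
    print(cluster.add_node("node4"))  # 75 files migrated
    print({n: len(f) for n, f in cluster.files.items()})  # 75 files on each node
```

In this model, adding a fourth node drops every node from 100 to 75 files, which matches the claim that per-disk load falls as the cluster grows, while `lookup` still resolves every path.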

