US 20050076173 Method and apparatus for preconditioning data to be transferred on a switched underlay network
Data may be preconditioned to be transferred on a switched underlay network to alleviate the data access and transfer rate mismatch, so that large files may be effectively transferred on the network at optical networking speeds. A data meta-manager service may be provided on the network to interface a data source and/or data target to prepare a data file for transmission, such as by dividing a large file into multiple pieces and causing those pieces to be stored on multiple storage subsystems. The file may then be read from the multiple storage subsystems simultaneously and multiplexed onto scheduled resources on the network. This enables the high bandwidth transfer resource to be filled by a data transfer without requiring the storage subsystem to be augmented to output the data at the network transfer rate. The file may be de-multiplexed at the data target to one or more storage subsystems.
This application relates to communication networks and, more particularly, to a method and apparatus for preconditioning data to be transferred on a switched underlay network.
2. Description of the Related Art
Data communication networks may include various computers, servers, nodes, routers, switches, hubs, proxies, and other devices coupled to and configured to pass data to one another. These devices will be referred to herein as “network elements,” and may provide a variety of network resources such as communication links and bandwidths. Conventionally, data has been communicated through the data communication networks by passing protocol data units (or cells, frames, or segments) between the network elements by utilizing one or more types of network resources. A particular protocol data unit may be handled by multiple network elements and cross multiple communication links as it travels between its source and its destination over the network.
Conventional data networks are packet switched networks, in which data is transmitted in packet form, which allows the packets to be commingled with packets from other network subscribers. As the size of a data transfer increases, the ability of a packet network to handle the transfer decreases. For example, a traditional packet switched network, such as a TCP/IP based communication network, will tend to become overloaded and either incapable of handling large data transfers or inefficient at handling them. Thus, it is desirable, at least for large transfers, to obtain a dedicated path through the network to handle the transfer.
Grid networking is one emerging application in which it may be desirable to obtain switched network resources to handle transfers between network participants. Grid networking is a technology that may be used to build an overlay network, i.e. a computational Grid, on an existing network infrastructure. In a Grid network, which forms a virtual organization, Grid nodes are distributed widely and share computational resources such as disk storage, storage servers, shared memory, computer clusters, data mining, and visualization centers, although other resources may be available as well. One example of a Grid is the TeraGrid, in which Grid computing technology has been deployed to enable supercomputer clusters distributed in four distant locations in the United States to work collaboratively on computationally intense tasks, such as high-energy physics simulations and long-term global weather forecasting. Other potential uses for Grid computing include genomics, protein structure research, computational fluid dynamics, astronomy and astrophysics, Search for ExtraTerrestrial Intelligence (SETI), computational chemistry, “intelligent” drug design, electronic design automation, nuclear physics, and high-energy physics. Grid computing may be used for many other purposes as well, and this list is not intended to be inclusive of all possible uses.
Some of these applications are, or are expected to be, capable of producing an incredible amount of data that will need to be distributed to other Grid applications for analysis. For example, high energy physics experiments expected to begin in 2007 may produce data at a rate exceeding one petabyte per year (1 petabyte = 1000 terabytes = 10^15 bytes). This data will need to be sent to many different sites, such as research facilities and universities around the world, for analysis and storage.
One technology that is capable of handling these large data transfers is switched optical networking. Each transfer, which is typically several hundred gigabytes to several terabytes in size, uses a dedicated switched optical link. These links are typically provisioned to operate at 10 gigabits/second over each dedicated wavelength (lambda), and multiple lambdas can be multiplexed together to provide bandwidth sufficient to transfer these vast quantities of data.
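To give a sense of scale for the figures above, the following minimal sketch estimates how long a transfer would occupy the optical link. It assumes decimal units (1 terabyte = 10^12 bytes), ignores protocol overhead, and assumes the source can actually supply data at the link rate; the function name `transfer_seconds` is hypothetical and used only for illustration.

```python
def transfer_seconds(size_bytes: int, lambdas: int = 1,
                     gbps_per_lambda: float = 10.0) -> float:
    """Idealized time to move size_bytes over `lambdas` aggregated wavelengths."""
    bits = size_bytes * 8                       # payload in bits
    rate_bps = lambdas * gbps_per_lambda * 1e9  # aggregate link rate in bits/second
    return bits / rate_bps


TB = 10**12  # assumption: decimal terabyte

print(transfer_seconds(1 * TB))             # 800.0 seconds on one 10 Gbps lambda
print(transfer_seconds(1 * TB, lambdas=4))  # 200.0 seconds on four lambdas
```

These are lower bounds only: they hold when the link is kept full for the whole transfer.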
Conventionally, large data files have been stored on disk drives and other storage systems having a data output rate of up to about 10 Megabits per second (Mbps). Striping and other techniques may increase this rate to up to 100 Mbps, and large storage systems, such as the EMC Celerra Clustered Network Server™ storage system, may increase the data output rate to 1-4 gigabits per second (Gbps). While these storage systems may be scaled to store hundreds of terabytes of data, the data output rate from the storage system may be one or more orders of magnitude slower than the transfer rate of the switched underlay network, especially when several 10 Gbps lambdas are aggregated to handle the transfer.
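The mismatch can be made concrete with a short calculation. The sketch below, which is illustrative only, computes how many storage subsystems would have to be read in parallel to keep a link full, using the rates quoted above. The function name `subsystems_needed` is hypothetical, and the calculation assumes every subsystem sustains its quoted output rate.

```python
def subsystems_needed(link_mbps: int, subsystem_mbps: int) -> int:
    """Smallest number of parallel subsystems whose combined rate fills the link."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-link_mbps // subsystem_mbps)


LAMBDA_MBPS = 10_000  # one 10 Gbps wavelength

print(subsystems_needed(LAMBDA_MBPS, 100))        # 100 striped 100 Mbps systems
print(subsystems_needed(LAMBDA_MBPS, 1_000))      # 10 systems at 1 Gbps
print(subsystems_needed(4 * LAMBDA_MBPS, 4_000))  # 10 systems at 4 Gbps for four lambdas
```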
SUMMARY OF THE DISCLOSURE
As described in greater detail herein, a method and apparatus for preconditioning data to be transferred on a switched underlay network alleviates the data access and transfer rate mismatch so that large files may be effectively transferred on the network. According to one embodiment of the invention, one or more storage meta-managers are provided on the network to interface data source and data target resources to prepare the data files for transmission over the network. Specifically, when a large file is to be transferred over a fast connection, e.g. an optical channel, the meta-manager may precondition the file to be transferred by breaking it into multiple pieces and distributing those pieces between multiple storage subsystems, each of which has access to a network element with access to the switched underlay network resource. When the data has been distributed and is ready to be transferred, the storage resources begin reading and simultaneously provide the data to the network element. The network element multiplexes the data onto the optical channel or otherwise makes the data available to the switched data resource so that the data may be provided at a higher data rate. The file is then passed across the network over the switched underlay network resources and a similar process de-multiplexes the data on the data target end. In this manner a meta-manager may effect the transfer of large files across the network at high data rates using lower data rate storage systems. The data once transferred may optionally be collected and reconditioned into a single file for use by computation and other resources at the data target. The storage resources may be associated with the network element or may be independent of the network element and connected to the network element.
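The flow summarized above (divide, distribute, read in parallel, multiplex, de-multiplex, recondition) can be illustrated with a minimal in-memory sketch. This is not the disclosed implementation: in-memory lists stand in for storage subsystems, a thread pool stands in for simultaneous reads, and round-robin placement is only one possible distribution policy. All function names (`precondition`, `multiplex`, `demultiplex`, `recondition`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def precondition(data: bytes, num_subsystems: int, piece_size: int) -> List[List[bytes]]:
    """Divide data into pieces and deal them round-robin across subsystems."""
    subsystems: List[List[bytes]] = [[] for _ in range(num_subsystems)]
    for index, offset in enumerate(range(0, len(data), piece_size)):
        subsystems[index % num_subsystems].append(data[offset:offset + piece_size])
    return subsystems


def multiplex(subsystems: List[List[bytes]]) -> List[Tuple[int, bytes]]:
    """Read every subsystem concurrently, then interleave pieces onto one stream."""
    with ThreadPoolExecutor(max_workers=len(subsystems)) as pool:
        reads = list(pool.map(list, subsystems))  # stand-in for parallel disk reads
    stream: List[Tuple[int, bytes]] = []
    longest = max((len(pieces) for pieces in reads), default=0)
    for row in range(longest):
        for sub_id, pieces in enumerate(reads):
            if row < len(pieces):
                stream.append((sub_id, pieces[row]))  # tag each piece with its source
    return stream


def demultiplex(stream: List[Tuple[int, bytes]], num_targets: int) -> List[List[bytes]]:
    """Spread received pieces over one or more target-side subsystems."""
    targets: List[List[bytes]] = [[] for _ in range(num_targets)]
    for sub_id, piece in stream:
        targets[sub_id % num_targets].append(piece)
    return targets


def recondition(stream: List[Tuple[int, bytes]]) -> bytes:
    """Optionally collect the received pieces back into a single file."""
    return b"".join(piece for _, piece in stream)


original = bytes(range(256)) * 10
stores = precondition(original, num_subsystems=4, piece_size=100)
stream = multiplex(stores)
targets = demultiplex(stream, num_targets=2)
restored = recondition(stream)
```

Because pieces are dealt round-robin and interleaved in the same order, the multiplexed stream carries them in their original sequence, so `restored` equals `original` here. A different placement policy would need an explicit sequence number on each piece.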