SATA And Beyond


By Mathew Overington

Keeping up with the changing face - and pace - of storage formats is a full-time job.

Data storage requirements are growing daily, as compliance forces businesses to archive and keep accessible records dating back years. Email, financial records, reports, audits, presentations, sales sheets, and legal documentation all contribute to the headaches.

Once upon a time, installing a storage network for an enterprise was a fairly straightforward task: set up a fibre-channel network, install a few high-speed drives on it (including a RAID array, for redundancy), and rig up a tape-backup drive to archive content considered non-business-critical. The advent of technologies like Wide SCSI and, more recently, Serial ATA disks has conspired to lower the total cost of ownership and allow network administrators and CIOs to increase online data storage without footing massive bills.

As with any business purchase, ROI is a key consideration, followed closely by performance, reliability and scalability. There’s little point in having the most reliable data storage system in the world if it’s slow. Likewise, in case of an audit, the impact of non-compliance fines could easily dwarf the cost of installing high-speed, high-availability disks in the first place.

Administrators are now free to choose between high-speed fibre, SCSI, Serial ATA and tape to balance cost of storage with availability, and each has an impact on cost and performance. While Fibre Channel disk systems are hands-down the fastest around, they’re also the most expensive. High-demand, high-availability infrastructure is best off employing Fibre Channel, but smaller businesses with more modest requirements will be able to get adequate performance from a network-attached SCSI or even Serial ATA disk array over regular copper.


Huge capacity: Emerging technologies like Blu-Ray will play a role in offline enterprise data storage.

The Serial ATA specification has been a revelation for the IT industry. It allows for inexpensive disks to be employed in enterprise environments with full RAID support. Revisions to the first Serial ATA implementation enable drives to be hot-swappable and introduce Native Command Queuing (NCQ), which groups read and write requests by where the data is physically stored on the drive platters to optimise performance. Serial ATA was primarily designed for the desktop, but has found a comfortable niche in small businesses and less-demanding enterprise environments alike.
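To make the idea behind NCQ concrete, here is a minimal Python sketch of reordering queued requests so the heads sweep across the platters once instead of zigzagging. It is an illustration only: real NCQ runs in drive firmware and also accounts for rotational position, and the function names and block addresses here are invented for the example.

```python
def reorder_requests(pending, head_position):
    """Return pending block addresses in a single-sweep (elevator) order.

    Simplified assumption: block addresses map linearly to physical
    position, so sorting by address approximates sorting by location.
    """
    # Serve everything at or beyond the head first, moving outward...
    ahead = sorted(lba for lba in pending if lba >= head_position)
    # ...then sweep back through the requests that were behind the head.
    behind = sorted((lba for lba in pending if lba < head_position), reverse=True)
    return ahead + behind


def total_seek_distance(order, head_position):
    """Sum the head travel needed to serve requests in the given order."""
    distance = 0
    for lba in order:
        distance += abs(lba - head_position)
        head_position = lba
    return distance


# Hypothetical queue of requests, in arrival order, with the head at block 400.
queue = [900, 100, 850, 150, 500]
arrival_cost = total_seek_distance(queue, 400)
reordered_cost = total_seek_distance(reorder_requests(queue, 400), 400)
```

In this made-up queue, serving requests in arrival order costs 3100 blocks of head travel, while the reordered sweep costs 1300.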

Environmental Concerns
Performance, scalability and cost are moot points if your disk system is flaky and prone to falling over when stressed. There are few quicker ways to trash a disk – and with it your reputation as a network administrator or CIO – than through overheating.

Hard drives generate a lot of heat, and drawing it away from sensitive electronics is a serious consideration. Excessive heat can shorten the life of a drive and cause failure, so it’s crucial to ensure your drive enclosures offer adequate cooling – even on the hottest days. It pays to keep disk storage subsystems in an air-conditioned server room to minimise the strain on fans and choose controllers with monitoring software to alert you remotely in case one fails.

SCSI
Once the jewel in the crown of enterprise storage, the SCSI specification originally drawn up in 1980 has undergone a number of revisions to keep it backwards compatible while still advancing transfer speeds. The original specification offered an 8-bit “narrow” interface and a peak data transfer rate of 5MB per second. The parallel topology of wide SCSI allows up to 15 drives to be connected together, but they must all share the 320MB per second peak bandwidth offered by the Ultra320 bus. Enterprise-friendly features like hot swapping and tweaks to the SCSI standards have allowed engineers to continue developing faster drives to meet business storage needs. SCSI drives support command queuing too, in the form of Tagged Command Queuing (TCQ), but it’s commonly held that SCSI’s market is slowly shrinking, under attack from Serial ATA at the lower end of the storage game. Serial Attached SCSI (SAS) is on the horizon, though, and it looks to be the emerging technology of choice for high-demand enterprise storage.

Serial ATA
While Serial ATA was originally designed with desktop end users in mind, some slight changes brought about in revision two of the specification opened up the entire enterprise market. Serial ATA-II introduces a number of tweaks, including support for NCQ, a boosted peak bandwidth of 300MB per second (double the original), and the ability to hot swap drives (that is, to remove and replace a drive without powering down a system). These changes allow Serial ATA drives to go toe to toe with enterprise-class SCSI devices. Hitachi announced a 500GB Serial ATA drive pitched squarely at the enterprise customer at CES Las Vegas at the start of the year, and Seagate is close behind with a 400GB model.
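Serial ATA signalling rates are normally quoted in gigabits per second (1.5Gb for the first generation, 3Gb for the second), which can make the megabyte figures confusing. The short Python sketch below shows the conversion, assuming the 8b/10b line coding Serial ATA uses, where every 8 bits of data travel as 10 bits on the wire. The function name is just for illustration.

```python
def usable_megabytes_per_second(line_rate_gbps):
    """Convert a serial line rate in Gbit/s to usable MB/s under 8b/10b coding."""
    raw_bits_per_second = line_rate_gbps * 1_000_000_000
    # 8b/10b coding sends 10 bits on the wire for every 8 bits of data.
    data_bits_per_second = raw_bits_per_second * 8 / 10
    # 8 bits per byte, 1,000,000 bytes per megabyte.
    return data_bits_per_second / 8 / 1_000_000


first_generation = usable_megabytes_per_second(1.5)   # about 150 MB/s
second_generation = usable_megabytes_per_second(3.0)  # about 300 MB/s
```

Dividing the raw 3Gb line rate by eight without allowing for the coding overhead gives 375MB per second, which is why both 375MB and 300MB figures turn up for the second-generation interface. The lower number is the one that reflects usable bandwidth.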

As the name suggests, Serial ATA drives work serially, and rely on a dedicated controller port per physical disk. It’s impossible to daisy chain Serial ATA drives together on a single cable, so you have to ensure you have a controller port for every drive. Serial ATA drives are ideal for low-priority storage in the data centre.

The biggest benefit of Serial ATA drives is their relatively low cost to performance ratio. Cabling is straightforward, and drives that adhere to the Serial ATA-II specification offer NCQ and the ability to hot swap at a fraction of the cost of their SCSI counterparts. On the negative side, Serial ATA drives are primarily designed for desktop, and it’s impossible to find extremely high-speed models running at over 10,000RPM.

Tape
With the various breakthroughs in hard disk storage in recent years, it’s easy to overlook the preferred backup medium for several decades: tape. Though the cost benefits are slowly being eroded by alternate storage media, tape is still the cheapest way to store large quantities of data offline. One of the advantages of tape is in density: tapes can hold several terabytes of data, which helps make the medium perfect for storing off-site to safeguard against theft or fire damage.

Tape is ideal for backup, as it is inexpensive and highly reliable. Unfortunately, it is also relatively slow to write to, and can be extremely time-consuming to retrieve from, as the tape must be wound through to the exact point where each desired file is stored. Tape is an ideal medium for storing files that are accessed infrequently, as it’s possible to write data directly from a storage area network onto a tape for archiving. Many enterprise customers still rely on tape to archive directly from the SAN. At this stage, it looks like enterprise-class optical storage systems relying on Blu-Ray technology may exert some pressure on low-end tape systems, but don’t expect tape drives to vanish from the enterprise in the short to medium term.

Serial Attached SCSI
One major revolution on the horizon is Serial Attached SCSI (SAS). Just as the parallel ATA architecture was replaced on the desktop by Serial ATA, Serial Attached SCSI promises to bring similar benefits to SCSI drives and address the demands of the enterprise market. The specification offers a peak bandwidth of 300MB per second and a simplified connection interface (similar to the Serial ATA cabling) to help maximise airflow and keep components cool. Though SAS communicates point-to-point like Serial ATA, up to 128 drives can be connected simultaneously via cables of up to 8m in length. This is far superior to Serial ATA’s 1m range, and allows plenty of freedom for data centre architects to spec enormous disk arrays.

IDC analyst Dave Reinsel predicts widespread adoption of SAS and Serial ATA hard disks in enterprise workstations and servers, according to a whitepaper titled “Evolution in Hard Disk Drive Technology: SAS and Serial ATA”. Reinsel sees distinct benefits in terms of reduced cost and complexity in deploying SAS and Serial ATA drives through an enterprise, as both high-speed and high-volume disk systems run on the same interconnect standard. “The ability to mix SAS and Serial ATA HDDs provides storage system suppliers with the opportunity to more precisely match technology to user needs without the complexity and cost of translating data from one interconnect standard to another,” says Reinsel.

Expect to see Serial Attached SCSI drives taking off later this year to threaten Fibre Channel at the performance end of the enterprise storage market. The relatively high cost of SCSI will limit its impact on the already-established Serial ATA market, though, and the SAS drives are slated to become the high performance replacements for today’s Fibre Channel models.

iSCSI
Another technology making an impact on SANs is iSCSI, which is an approach to transmitting storage data over an IP network. It relies on IP packets to send data across geographically distant data centres, and offers a benefit over Fibre Channel over IP (FCIP), as it works over regular Ethernet connections. A number of major industry vendors including Cisco and IBM have rallied behind iSCSI, and it will be an area to watch closely if you operate a number of data centres in separate locations.

In The Real World
Of course, even though Serial ATA disks may look good on paper, the acid test comes in real-world deployment. A few high-profile, high-intensity data processing centres have successfully deployed Serial ATA into enterprise environments, but few are more impressive than the Genome Sequencing Center at Washington University in St Louis, USA. The centre crunches a lot of numbers and has complex network storage demands. The research facility is responsible for genomic sequencing, a task that places a significant demand on processor and storage space. The centre relies on small files to store data, and has had up to 400 million files in the system at once, which greatly complicates backup, as each has to be written individually to a backup medium.

Kelly Carpenter, senior technical manager at the research centre, presides over an impressive array of Fibre Channel and Serial ATA drives. Carpenter says that while the Fibre Channel SAN handles high-intensity storage, NAS is served by a 50TB BlueArc Titan 32, with a further 20TB of Fibre Channel, and 30TB of Serial ATA space.

“We’ve used the Serial ATA disks mostly as backup for the Fibre Channel side of the BlueArc,” says Carpenter, in reference to the BlueArc storage system employed by the centre.

“We generally are pretty bleeding-edge and we tested a whole lot of things that are really brand-new out-of-the-gate,” says Carpenter. The centre relies on NDMP to back up data from the high-performance Fibre Channel network across to less-expensive Serial ATA disks, and then again to move data from the Serial ATA drives onto tape for archiving. This allows Carpenter to balance data lifecycles with storage cost and cascade data as demand for it falls. Though the Serial ATA disks didn’t have a great reputation when Carpenter deployed them, he’s had no problems: “We haven’t hit it super-duper hard in production. We have done it and it works.” If Carpenter can successfully rely on Serial ATA disks to handle some of the storage duties in a high-performance network, other industries including rendering, animation, geographic information systems (GIS) and design can take home the same lessons.

Data Life-Cycles
One of the jobs of a network or storage administrator is to monitor the life cycle of all data in the enterprise environment. One key element in cascading data from high availability through to low-priority and even offline storage is the notion of timeliness. For example, while the CEO may be able to wait for two hours while you extract a three-year-old email from a tape backup, you’d better be able to retrieve a month-old one instantly. To that end, it’s imperative to have a detailed plan covering when data can be moved from a high-availability (Fibre Channel and SCSI) disk system through to a low-priority (Serial ATA) or offline (tape) state. With each step down in availability, you reduce the cost of storage, so it’s vital to migrate data from one medium to another as it ages and becomes less mission-critical.
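A cascade plan of this kind can be written down as a simple age-based rule. The Python sketch below is one hypothetical way to express it: the 90-day and one-year thresholds and the tier names are assumptions chosen for illustration, not recommendations, and a real policy would also weigh compliance rules and access patterns.

```python
from datetime import date, timedelta

# Hypothetical thresholds: (maximum age, storage tier), fastest tier first.
TIERS = [
    (timedelta(days=90), "fibre-channel"),   # high availability
    (timedelta(days=365), "serial-ata"),     # low priority, still online
]
OFFLINE_TIER = "tape"                        # everything older goes offline


def choose_tier(last_accessed, today):
    """Pick the storage tier for a file based on how long ago it was used."""
    age = today - last_accessed
    for limit, tier in TIERS:
        if age <= limit:
            return tier
    return OFFLINE_TIER


today = date(2006, 3, 1)
month_old = choose_tier(today - timedelta(days=30), today)        # fast disk
three_years_old = choose_tier(today - timedelta(days=3 * 365), today)  # tape
```

Keeping the thresholds in one table makes the policy easy to review and adjust as storage costs or retrieval expectations change.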

Fitting It All Together
The secret to designing and building a reliable, efficient data centre is in choosing the right product for each role. A well-designed data centre will employ a blend of high-speed Fibre Channel or SAS disks, Serial ATA RAID arrays and a backup medium like tape or optical disc. The key to optimising the system lies in working out when to cascade data from one medium – and one level of availability – to the next. It’s imperative to define policies to allow infrequently accessed content to be moved to low cost storage as needed, freeing up high-speed disks for more business-critical tasks, and reducing the cost of data warehousing.

Far from being merely restricted to desktop applications, Serial ATA has a serious role to play in the data centre. Current enterprise-class Serial ATA drives boast 1 million hours mean-time-to-failure (MTTF), which is the same level of reliability offered by SCSI drives a generation ago. Earlier generations of desktop hard disk averaged around 500,000 to 600,000 hours’ operation at around a 40% duty cycle. The current drives are more like yesteryear’s SCSI drives, but with a Serial ATA interface. 8MB caches are standard, and support for Native Command Queuing only strengthens Serial ATA’s enterprise credibility. What’s more, the fluid dynamic bearings (FDB) found in most high-end Serial ATA drives greatly reduce spindle vibration, heat and noise and contribute to long-term reliability. Serial ATA disks with high spindle speeds configured in striped arrays can provide speed approaching the mark set by 15,000RPM SCSI disks. And at the other end of the spectrum, low-speed (5400RPM) drives are available in massive capacities, ideal for bulk storage with little associated cost or risk. In an industry prone to change, the emergence of serial technologies like Serial ATA and SAS promises to help simplify the data centre by reducing cost and complexity and allowing managers to allocate resources on demand.
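MTTF figures are easier to plan around when converted into a yearly failure rate. The Python sketch below does that under a common simplifying assumption, a constant failure rate over the drive’s life, which real drives only approximate. The function names and the 100-drive array are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365


def annual_failure_rate(mttf_hours, duty_cycle=1.0):
    """Estimate the chance a drive fails within a year of operation.

    Assumes a constant failure rate (exponential model); duty_cycle is
    the fraction of the year the drive is actually running.
    """
    powered_hours = HOURS_PER_YEAR * duty_cycle
    return 1.0 - math.exp(-powered_hours / mttf_hours)


def expected_failures_per_year(drive_count, mttf_hours, duty_cycle=1.0):
    """Expected number of drive failures per year across an array."""
    return drive_count * annual_failure_rate(mttf_hours, duty_cycle)


# A 1 million hour MTTF drive running around the clock.
enterprise_rate = annual_failure_rate(1_000_000)
# A hypothetical 100-drive array built from the same drives.
array_failures = expected_failures_per_year(100, 1_000_000)
```

Under these assumptions a 1 million hour drive running 24 hours a day has a little under a 0.9% chance of failing in a given year, so a 100-drive array should expect roughly one failure a year, which is a good argument for RAID and hot-swap bays.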

Feeling Blue?

Optical storage is set for a huge shot in the arm in the next year as the battle for high-definition digital versatile disc (DVD) hots up. Here are the two major players.

In a war mirroring the VHS-Beta battle of the 80s, a couple of rival DVD formats – Blu-Ray and HD DVD – have fought hard for industry support over the last year. It now looks like Blu-Ray, championed by Sony, Dell, LG, Panasonic and Samsung, will gain widespread acceptance through 2006, if a bevy of products released at the January Consumer Electronics Show (CES) in Las Vegas is anything to go by.

While the focus of both HD DVD and Blu-Ray is to deliver high-definition video content, the technologies are equally well suited to enterprise storage.

DVD-RAM is currently relatively popular for archiving offline content, but the 25GB capacity of single-layer Blu-Ray discs is appealing. Though the discs are slated to be more expensive than conventional 4.3GB DVDs, their high capacity will allow network administrators to keep a few optical drives on a storage network for low-demand data. Consumer-level Blu-Ray hardware is already on sale in Japan, and a number of products will hit the market through 2006. Beefier, enterprise-oriented kit is set to follow late in the year to add another alternative to tape.

Fibre Overkill

Storage network plumbing is a serious issue for network designers, but it’s possible to balance performance and cost to come in under budget, with performance to spare.

Most Storage Area Networks (SANs) rely on Fibre Channel, but the high cost of fibre when compared with copper makes it inappropriate for many networks - or at least some subnetworks. Fibre Channel is hands-down the fastest network performer, with full duplex bandwidth topping out at 1200MB/sec over a 50km range. Unfortunately, fibre is extremely expensive, with a deployment costing twice as much as a comparable copper rollout.

The Fibre Channel Protocol (FCP) defines how to run SCSI commands over Fibre Channel, but it can also be extended to copper. FCP relies on a Fibre Channel Host Bus Adaptor (HBA) to replace the SCSI controller in each server and interface with the SAN fabric. From here, the SAN fabric is connected to disk arrays and backup systems, but it’s possible to reduce cost by deploying part of the network over copper instead of fibre.

Start out by mapping your SAN and working out where you could stand to lose a little performance. Realistically assess your expectations for performance and carefully consider load across low-demand servers and less time-critical assets like tape backup. In the case of a tape backup system, consider whether a fibre network would run faster than the disk system in the first place. Trim the fat, while maintaining an overhead to cover growth in storage and data access demands, and you have an elegant storage network design to provide (fibre) on-demand access to mission-critical files and resources, with near-line storage relying on copper.

Ground Control

While most thought is put into the disks and interconnects, it’s crucial to consider the role controller cards play in managing access to data.

Most current-generation desktops rely on Serial ATA (SATA) hard disks, which are inexpensive and easy to deploy and can be connected via on-board controllers. Unfortunately, motherboard-based controllers generally suffer from poor management capabilities, so it can often be worthwhile to opt for a third-party controller from a manufacturer like Adaptec to allow centralised network management of client machines.

Most high-density servers still rely on high-performance SCSI-based disk systems to maintain reliability and scalability to a number of concurrent connections. Choosing a dual-channel SCSI controller with support for RAID arrays will allow for redundancy and let backups to a remote backup site be scheduled during periods of low demand. Even better, as network software is designed to interface directly with the cards, an administrator is able to monitor performance remotely, irrespective of host operating system.

The larger the environment, the more important it becomes to standardise controllers, as doing so allows administrators to interface directly with the hardware and perform disk tasks remotely, significantly reducing the maintenance impact, even on a heterogeneous network.
