Metadata floes into innovation for Australian Antarctic Division

Australia’s Antarctic researchers may be half a world away, but their data management challenges aren’t. David Braue explains how metadata has helped the AAD turn mountains of scientific data into invaluable new information.

The need to better understand and manage data remains an ongoing issue for organisations of all sizes, but those serious about improving their data may find the best results come from employing metadata police and linking data quality to funding.

Such efforts have significantly improved participation in data management initiatives at the Australian Antarctic Division (AAD), which has spent the past few years standardising its use of metadata - descriptive data that provides context for the data sets being collected - to keep up with the flood of new information produced by its research teams.

Everything in its place

Location-related metadata, in particular, has proved invaluable in tying research trends to geographic locations. Because so much of the team's work is location-related, the value of that data can be compromised if data collectors fail to enter relevant metadata about the location where observations were made.

One AAD student researcher, for example, reported that a particular data set had been collected 'near that rock where Pat fell over' - hardly useful information for anybody else hoping to use the data later. "You'd be surprised how many data sets we get that don't even have the date recorded," says Dave Connell, scientific data coordinator with the AAD.

Recent field excursions have been backed by a range of mobile equipment designed to ease the creation of metadata. For example, researchers on a recent trip to Heard Island used handheld GPS units to collect data on local vegetation; each measurement they took was automatically marked with the exact location, date and time. And, because the data was already in a standard digital format, it could be broadcast daily via satellite for analysis by researchers back at the Antarctic base.

"When the researchers went out into the field, they could throw away their pen and pencil," Connell said. "Instead of writing down what plant species were present, it was just a matter of ticking them off of a list. This meant the data could be delivered much more quickly, since researchers didn't have to wait six to twelve months for the team to go home, process and hand over their data."

An Oracle-based data management system, established several years ago at the organisation's 11-year-old, Tasmania-based Australian Antarctic Data Centre (AADC), consolidates all data collected by AAD researchers and uses the associated metadata to extract relevant data sets on demand. The database, which runs on Sun Solaris servers, now holds more than 1800 metadata records representing 33 million data points and references to more than 600GB of flat-file information.

In many cases, not even the scientists who collect the data anticipate the uses that will later be found for it. For example, one team was able to use observations of seal and penguin colony locations to rework helicopter flight paths so that flights to and from Casey Station wouldn't disturb the local residents.

Another project saw GPS transmitters attached to all kinds of Heard Island fauna, with regular readings used to plot actual paths taken by the animals during the period of observation. Still another project used geographical metadata from a United States Antarctic Program database to monitor movement of sea ice along the Antarctic coast. Clean geographical metadata has also simplified the task of overlaying virtually any kind of data onto satellite maps and other photos.

"If you've standardised all your data, it allows for easy integration into databases," Connell explained. "This means that you can manipulate and extract data, and get data into Web services. You just never know what the data you collect may be used for; it could be reused for any purpose."

Line in the snow

AAD's scientific research and metadata collection may seem remote from the concerns of everyday business, but Connell's point is that the processes behind it are not. No matter what kind of data you collect, ensuring metadata is captured correctly and accurately at the point of generation means it will be much easier to find and use down the track - and that can save time, money, frustration and effort for the people who need the data for their own activities.

Pointing out cost savings is particularly effective in gaining executive support for metadata initiatives, Connell added. Clean and consistent metadata can reduce duplication of data, prevent corporate knowledge from being lost, and save companies from having to regenerate data because collection wasn't done properly the first time.

At $20,000 to $250,000 for a typical research expedition, "particularly in Antarctica, it's pretty expensive to collect data," said Connell. "At that kind of cost, you don't want to be collecting data and then have to go back and do it again."

Additional benefits typically supporting the decision to embrace metadata include clarification of obscure industry terms and acronyms; consistent formats of data; easier data recovery and backup; and simpler fulfilment of external reporting obligations.

Once executive support for metadata collection is achieved, the process tends to gain momentum: some organisations, Connell said, have as a matter of corporate policy simply refused to accept any data that doesn't have appropriate metadata.

Others have found the purse strings a more effective motivator: "Our most successful policy has been to link the collection of data and metadata to provision of funding," said Connell. "Tell researchers they will get no money if there's no metadata, and it's amazing how quickly the metadata starts rolling in."

Building a metadata culture

To reinforce the importance of metadata, Connell recommends the appointment of a formal metadata coordinator - "someone who can police the data and has the power to chase people for their data," he explained - to ensure data quality and work with users to build and reinforce a culture of clean, standards-based information.

Such a position has been essential at the AADC, where a full-time metadata officer stays busy managing the constant flood of new data: the 2005/2006 Antarctic research season included 99 chief scientific investigators working on 59 different projects.

In the general business sphere, document management systems are where most users will come face to face with metadata: prompted for extra information about the documents they store, some users have resented the additional effort such systems require.

Here, explaining the value of metadata - and its importance in later information retrieval - may encourage buy-in, as will efforts to automate most metadata collection and minimise the task employees face. Data generators may also want to resist the urge to collect every type of metadata they can think of: for example, the ISO 19115 standard on geospatial data representation includes around 430 possible metadata fields, but fewer than ten of them are considered mandatory. Individual profiles, structured according to the companion ISO 19139 standard, dictate which parts of the ISO 19115 vocabulary are relevant to a particular project.
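The profile idea - keeping only a small mandatory core out of hundreds of possible fields, and rejecting records that lack it - can be sketched as a simple validator. This is an illustrative example in the spirit of an ISO 19115 profile, not the standard itself: the field names below are simplified labels, not ISO element names.

```python
# Assumed minimal profile: just a handful of core fields are mandatory,
# echoing how ISO 19115 profiles mark most of the ~430 fields optional.
REQUIRED_FIELDS = {"title", "abstract", "date", "extent", "contact"}

def validate(record: dict) -> list:
    """Return the mandatory fields missing from a metadata record."""
    return sorted(REQUIRED_FIELDS - record.keys())

record = {
    "title": "Heard Island vegetation survey",
    "abstract": "Presence/absence of plant species at GPS-located sites",
    "date": "2006-01-15",
    "extent": {"lat": -53.1, "lon": 73.5},
}
print(validate(record))  # ['contact'] - incomplete, so it would be rejected
```

A record that fails validation can be bounced back to its creator immediately, rather than being discovered as unusable months later.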

Availability of numerous open-source metadata systems and metadata standards - including DIF (used by the AAD), ANZLIC, FGDC, Dublin Core and AGLS (gaining currency within government departments) - means that the real challenge to improving data quality lies not with the technology, but the way it's implemented and used.

"It doesn't need to be expensive or difficult," Connell said. "The thing to do is to make sure you use existing standards, and make the whole process easy and use-friendly so you minimise the administrative burden. Automating that can take a lot of the burden off your users."

The path to good metadata

1. Get your metadata: Collect existing data, or decide what data you are going to be collecting.

2. Standardise your metadata: Decide what standards will be followed to ensure that it can be reused by others. These might include units of measurement, meaning of acronyms, date formats and other semantics. Also decide what baseline data (time, date, location) is relevant. Metadata standards such as DIF, AGLS, ANZLIC and ISO 19115 and 19139 provide flexible guidelines for representation of the metadata itself.

3. Capitalise on your data: Good metadata allows data to be used in many ways - for example, to support Web services, enable data analysis, facilitate database integration, and allow later reuse for other purposes. Make sure data is available to everyone who could potentially use it, and encourage lateral thinking to find new uses for it.

4. Demonstrate the value to management: Executive buy-in is important when trying to build a metadata culture. Let them know that good metadata reduces data collection costs; improves internal and external reporting capabilities; strengthens governance around data collection and retention; makes it easier to back up and recover data; and ensures that corporate data will remain readable long into the future.

5. Put your budget where your mouth is: Appointing a person whose sole responsibility is the quality and integrity of corporate data (and metadata) is a great way to lend weight to metadata efforts - and to put some teeth behind your rhetoric about the importance of metadata. Another way that's proved popular: link the funding of new projects to a guarantee that metadata will be provided alongside any data generated.
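The "no metadata, no data" gate that runs through these steps can be sketched as a submission check: data is accepted only if its metadata carries the agreed baseline fields in the agreed formats. This is a hypothetical illustration - the field names, date convention and example values are assumptions, not the AADC's rules.

```python
from datetime import datetime

# Assumed baseline fields every submission must carry (step 2 above).
BASELINE = ("collected_on", "latitude", "longitude", "collector")

def accept_submission(metadata: dict) -> bool:
    """Reject any data set whose metadata lacks the baseline fields
    or uses a non-standard date format."""
    if any(key not in metadata for key in BASELINE):
        return False
    # Enforce one agreed date format (ISO 8601) rather than free text.
    try:
        datetime.fromisoformat(metadata["collected_on"])
    except (TypeError, ValueError):
        return False
    return True

good = {"collected_on": "2006-01-15", "latitude": -53.1,
        "longitude": 73.5, "collector": "D. Connell"}
bad = {"collected_on": "near that rock where Pat fell over"}
print(accept_submission(good), accept_submission(bad))  # True False
```

Tying a gate like this to project funding, as Connell suggests, is what gives the policy its teeth: the check is cheap to run, and the incomplete record is caught before the money moves.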
