Australia at the helm of language tech research

By Brendan O'Hanlon

A team of academics and marketplace watchdogs will spearhead Australian research into online fraud and breaches of the Corporations Act by Web sites.

Macquarie University and the University of Sydney, alongside the Australian Securities and Investments Commission (ASIC) and the Capital Markets Cooperative Research Centre (CMCRC), are in the preliminary stages of developing an Internet document classification system called Scamseek.

The project will focus specifically on identifying breaches of the Corporations Act on Web sites. Problems with corporate records and the extent of white-collar crime have come to light in the wake of the large corporate implosions. However, there is a tendency to gloss over the fact that scams continue to be run on a smaller scale, with the Internet giving perpetrators the perfect medium for their activities. The much-maligned 'Nigerian Scam' wooed many gullible investors, while Web pages that offer misleading investment tips, and that have been proven to be written by under-qualified individuals, continue to divest the unwary of cash.

As such, the project has been allocated a budget of $1 million and will employ innovative language technology to scrutinise Web sites whose content is deemed suspicious.

It is a formidable task; take time to think about the huge array of factors that influence firstly the linguistic meaning of a word, and, most importantly, the final interpretation of the word. Remember the film Snatch, in which the two main characters are cheated in a deal with Brad Pitt's fast-talking Gypsy character? Their inability to comprehend nuances in a language not too far removed from theirs made them dupes.

For ASIC itself, there was no one incident that inspired the development of a new system.

ASIC had a variety of tried and tested methods in place for flushing scam artists out of cyberspace. Keith Inman, the Director for Electronic Enforcement at ASIC, discussed the original measures. Called "First Days", they involved a team of employees using traditional search engines to scour the Web for any site that might contain potentially fraudulent material. A list of such pages was compiled and ranked, and those featuring the most severe, most deceptive content were singled out for inspection by ASIC. The intention was to shut these sites down as swiftly as possible, thereby preventing consumers from being taken in.

ASIC's in-house record system was used for entity matching. Details of an inspected site were checked against ASIC's records to ascertain whether any of the content originated from a 'previous offender'. Seeking a more efficient process, ASIC turned to computer programmes developed in house, which, according to Keith Inman, were combined into a cohesive series of scripts. Again, these were used to rank malignant Web sites so that ASIC could deal with them. However, this still did not meet the level of efficiency ASIC wanted, which led it to seek out partners and eventually initiate the research project for Scamseek.
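The entity-matching step described above can be pictured with a short sketch. This is only an illustration, not ASIC's actual scripts: the function names, the list of company suffixes and the sample names are all hypothetical, and the matching is a simple comparison of normalised names.

```python
import re

# Hypothetical set of company suffixes to ignore when comparing names.
SUFFIXES = {"pty", "ltd", "limited", "inc"}

def normalise(name):
    """Lower-case a name, strip punctuation and drop common company suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(w for w in cleaned.split() if w not in SUFFIXES)

def match_entities(site_names, known_offenders):
    """Return the known-offender records whose normalised name appears on the site."""
    known = {normalise(n): n for n in known_offenders}
    return [known[normalise(n)] for n in site_names if normalise(n) in known]

# Illustrative data only; neither name comes from the article.
hits = match_entities(
    ["Acme Investments Pty. Ltd.", "Honest Co"],
    ["ACME Investments Limited"],
)
print(hits)
```

A real system would also need fuzzy matching and further identifiers such as addresses or registration numbers, since a repeat offender is unlikely to reuse exactly the same name.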

Scamseek will be able to determine risks by scanning entities against public and private databases, while marking and compiling information on and from sites whose content is deemed unacceptable. The system will feature some of the most up-to-date research in document classification, provided by SMARTS (Security Markets Automated Training and Surveillance), including a 'Web spider' that uses accurate alert algorithms to identify 'abnormal' behaviour. Professor Jon Patrick, team leader for the CMCRC and the University of Sydney, said the system would be able to flag a suspicious Web site by identifying common linguistic features:

"Scams that are run through Web sites tend to use certain words, in certain ways, with certain characteristics, but they can be cleverly disguised as well," Professor Patrick said.
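To make the idea of "certain words, in certain ways" concrete, here is a minimal sketch of surface-level scoring. It is not the Scamseek method: the cue phrases, their weights and the function names are invented for illustration, and a scheme this simple is exactly the kind that cleverly disguised scams would evade.

```python
import re

# Hypothetical cue phrases and weights, chosen only for this example.
CUE_PHRASES = {
    "guaranteed returns": 3.0,
    "risk free": 3.0,
    "act now": 1.5,
    "secret": 1.0,
}

def scam_score(text):
    """Sum the weights of every cue-phrase occurrence in the text."""
    # Reduce the text to lower-case words separated by single spaces.
    normalised = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return sum(
        weight * len(re.findall(rf"\b{re.escape(phrase)}\b", normalised))
        for phrase, weight in CUE_PHRASES.items()
    )

def rank_pages(pages):
    """Order page identifiers from most to least suspicious."""
    return sorted(pages, key=lambda url: scam_score(pages[url]), reverse=True)

# Illustrative pages only.
pages = {
    "a": "Quarterly report for shareholders",
    "b": "Guaranteed returns, act now",
}
print(rank_pages(pages))
```

Ranking by score mirrors the earlier manual process of listing pages and singling out the worst for inspection, with the scoring automated.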

The key issue facing the project is employing new analytical methods to identify the meaning of words. Deciphering duplicitous, coded or obscure language on the Web requires new methods, hence the input of several disciplines and departments into the project. According to Professor Patrick, two academics from the Department of Linguistics at Macquarie University, Professor Christian Matthiessen and doctoral scholar Maria Couchman, would contribute "new theories on textual meanings to unravel the deep linguistic features that will enable us to detect scam proposals no matter what surface form of language they use."

Interest in the research extends well beyond the Australian community. The project has piqued the curiosity of the US, Holland, Canada and the UK. Professor Patrick cited the heightened interest in language technologies as the cause of the enthusiasm for the project and for Scamseek itself.

"Successful language technologies are the next generation in the computer revolution and we are proud to be part of such a significant step forward for research and Australian ingenuity," he said.
