TIDES (Translingual Information Detection, Extraction and Summarization) is an ambitious technology development effort funded by DARPA. It focuses on the automated processing and understanding of a variety of human language data. Its primary goal is to enable English speakers to find and interpret needed information quickly and effectively, regardless of language or medium.
To provide that overall capability, TIDES is intended to develop a suite of robust, powerful, broadly useful component capabilities; integrate those components effectively in technology demonstration systems; and experiment with the systems on real-world problems. These are all high-risk research activities.
The four component capabilities are:
- Detection – Find or discover needed information.
- Extraction – Pull out key facts.
- Summarization – Reduce the number of words that someone must read.
- Translation – Convert text from another language into English.
Detection, extraction, and summarization must work within a language (monolingually) and across languages (translingually) to aid people who speak only English.
In addition to creating effective technology, TIDES aims to develop methods for porting these capabilities rapidly and inexpensively to other languages, including languages having severely limited linguistic resources.
TIDES will integrate its component capabilities with one another and with other technologies to produce synergistic, effective, end-to-end technology demonstration systems able to address multiple real-world applications.
Investigative Data Warehouse
The FBI's Investigative Data Warehouse contains an "Open Source News Library" holding news gathered by the TIDES program. The information is collected from dozens of public websites all over the world, including Ha'aretz, Pravda, the Jordan Times, The People's Daily, and The Washington Post. The library uses the MiTAP (MITRE Text and Audio Processing) system.