{"63111":{"#nid":"63111","#data":{"type":"news","title":"Georgia Tech Assists in Identifying Files for United Kingdom Archive","body":[{"value":"\u003Cp\u003EResearchers at the Georgia Tech Research Institute (GTRI) are sharing results of advanced file-format recognition research with The National Archives of the United Kingdom.  The effort could enhance worldwide capability to manage the vast array of file formats created since the computer age began. \u003C\/p\u003E\n\u003Cp\u003EImproving archivists\u0027 ability to categorize and access hundreds of different computer file formats is critical in the digital age.  Increasingly, archives receive large quantities of government and other records in a wide variety of digital formats. \n\u003C\/p\u003E\n\u003Cp\u003E\u0022The ultimate problem we\u0027re addressing here is technical obsolescence,\u0022 said William Underwood, a principal research scientist leading the file-recognition effort for GTRI. \u0022As software programs have been superseded over the years, it\u2019s become critical to automate the enormous task of categorizing, verifying and viewing hundreds of past and present file formats.\u0022\n\u003C\/p\u003E\n\u003Cp\u003EOne major facilitator of that task is the PRONOM service, developed by The National Archives of the U.K.  This file-format registry, which can be utilized online by archivists and others worldwide, employs a database containing details of more than 750 different digital file formats. Those formats, in turn, are accessed by a file-format identification tool called DROID.\n\u003C\/p\u003E\n\u003Cp\u003EUnderwood explained that archivists face the task of distinguishing among data files in hundreds of different formats. 
At the most basic level, categorizing these data formats requires software tools that examine file extensions, the identifying characters such as \u0022doc\u0022 or \u0022pdf\u0022 found at the end of filenames.\n\u003C\/p\u003E\n\u003Cp\u003EYet a file extension -- an external identifier that is easily modified or deleted -- can be inaccurate.  More critical is the capability to correctly identify the distinctive internal signature that characterizes a file\u0027s format.\n\u003C\/p\u003E\n\u003Cp\u003EGTRI, in cooperation with the U.S. National Archives and Records Administration (NARA), is helping the United Kingdom expand the roster of internal signatures in the PRONOM database. GTRI has added more than 50 such signatures to PRONOM in recent months, increasing the number of signatures in the database by almost a quarter, with more additions expected next year. This work is being performed at the request of the National Archives Center for Advanced Systems and Technologies (NCAST), a NARA unit.\n\u003C\/p\u003E\n\u003Cp\u003ECurrently, about a third of PRONOM\u0027s 750 file formats have internal signatures. Increasing the number of internal signatures is important, Underwood said, because it helps the DROID tool identify files more accurately. In turn, increased accuracy enables digital archivists to better identify older, obsolete file formats and develop appropriate migration strategies and preservation tools.\n\u003C\/p\u003E\n\u003Cp\u003E\u0022We are grateful to NARA and the Georgia Tech Research Institute for the work they have recently undertaken on file-format research,\u0022 said David Thomas, director of technology at The National Archives of the U.K.  
\u0022The decision to share their work...has significantly improved the PRONOM database and will be of enormous benefit to the wider digital preservation community.\u0022 \n\u003C\/p\u003E\n\u003Cp\u003EThe technology contributed to The National Archives of the U.K. is derived from GTRI\u0027s research into Advanced Language Processing Technology Applied to Digital Records, a project sponsored by the U.S. Army Research Laboratory and by NCAST. This work applies computational linguistics technology to summarizing, accessing, reviewing and preserving electronic records of the Department of Defense, federal agencies and presidential administrations.\n\u003C\/p\u003E\n\u003Cp\u003E\u0022In PRONOM\/DROID, The National Archives of the U.K. has responded to an essential need for preserving and providing sustained access to valuable digital information,\u0022 said Kenneth Thibodeau, director of NCAST.  \u0022We are happy to be able to contribute to enhancing a tool that we use in NARA\u0027s Electronic Records Archives system. This helps us and also benefits anyone who needs to preserve digital assets.\u0022\n\u003C\/p\u003E\n\u003Cp\u003EThe first version of PRONOM was developed by The National Archives\u0027 Digital Preservation Department for internal use in March 2002 and was launched as a free online service to the public in February 2004. In 2007, The National Archives won the Digital Preservation Award for its development of the PRONOM and DROID tools.\n\u003C\/p\u003E\n\u003Cp\u003EIn 2011, PRONOM data will be released in a linked, open format. This move will make it easier for others to reuse the data and will provide a means to extend and develop the dataset. More information is available at \u003Ca href=\u0022http:\/\/labs.nationalarchives.gov.uk\/wordpress\/\u0022 title=\u0022http:\/\/labs.nationalarchives.gov.uk\/wordpress\/\u0022\u003Ehttp:\/\/labs.nationalarchives.gov.uk\/wordpress\/\u003C\/a\u003E. 
\n\u003C\/p\u003E\n\u003Cp\u003E\u0022The GTRI computational-linguistics team will certainly continue to contribute to PRONOM,\u0022 Underwood said.  \u0022We\u0027re eager to use our experience in language-processing technology to support the evolution of this internationally important file-format database.\u0022\n\u003C\/p\u003E\n\u003Cp\u003E\u003Cstrong\u003EResearch News \u0026amp; Publications Office\u003Cbr \/\u003E\nGeorgia Institute of Technology\u003Cbr \/\u003E\n75 Fifth Street, N.W., Suite 314\u003Cbr \/\u003E\nAtlanta, Georgia  30308  USA\n\u003C\/strong\u003E\u003C\/p\u003E\n\u003Cp\u003E\u003Cstrong\u003EMedia Relations Contacts\u003C\/strong\u003E: Kirk Englehardt (404-407-7280)(\u003Ca href=\u0022mailto:kirk.englehardt@gtri.gatech.edu\u0022\u003Ekirk.englehardt@gtri.gatech.edu\u003C\/a\u003E) or John Toon (404-894-6986)(\u003Ca href=\u0022mailto:jtoon@gatech.edu\u0022\u003Ejtoon@gatech.edu\u003C\/a\u003E).\n\u003C\/p\u003E\n\u003Cp\u003E\u003Cstrong\u003EWriter\u003C\/strong\u003E: Rick Robinson\n\u003C\/p\u003E","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":[{"value":"\u003Cp\u003EResearchers at the Georgia Tech Research Institute (GTRI) are sharing results of advanced file-format recognition research with The National Archives of the United Kingdom.  
The effort could enhance worldwide capability to manage the vast array of file formats.\u003C\/p\u003E","format":"limited_html"}],"field_summary_sentence":[{"value":"GTRI researchers are helping archivists identify digital files."}],"uid":"27303","created_gmt":"2010-12-09 01:00:00","changed_gmt":"2016-10-08 03:07:54","author":"John Toon","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2010-12-09T00:00:00-05:00","iso_date":"2010-12-09T00:00:00-05:00","tz":"America\/New_York"},"extras":[],"hg_media":{"63112":{"id":"63112","type":"image","title":"Archivists must classify file types","body":null,"created":"1449176649","gmt_created":"2015-12-03 21:04:09","changed":"1475894552","gmt_changed":"2016-10-08 02:42:32","alt":"Archivists must classify file types","file":{"fid":"191736","name":"tzn11658.jpg","image_path":"\/sites\/default\/files\/images\/tzn11658_0.jpg","image_full_path":"http:\/\/www.tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/tzn11658_0.jpg","mime":"image\/jpeg","size":29055,"path_740":"http:\/\/www.tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/tzn11658_0.jpg?itok=rF9hRHUt"}},"63113":{"id":"63113","type":"image","title":"3-D map of the United Kingdom","body":null,"created":"1449176649","gmt_created":"2015-12-03 21:04:09","changed":"1475894552","gmt_changed":"2016-10-08 02:42:32","alt":"3-D map of the United Kingdom","file":{"fid":"191737","name":"tqo11658.jpg","image_path":"\/sites\/default\/files\/images\/tqo11658_0.jpg","image_full_path":"http:\/\/www.tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/tqo11658_0.jpg","mime":"image\/jpeg","size":56381,"path_740":"http:\/\/www.tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/tqo11658_0.jpg?itok=FWNqhkNk"}}},"media_ids":["63112","63113"],"related_links":[{"url":"http:\/\/www.gtri.gatech.edu\/","title":"Georgia Tech Research Institute"}],"groups":[{"id":"1188","name":"Research 
Horizons"}],"categories":[{"id":"143","name":"Digital Media and Entertainment"},{"id":"147","name":"Military Technology"},{"id":"135","name":"Research"}],"keywords":[{"id":"6624","name":"archives"},{"id":"1446","name":"digital"},{"id":"11430","name":"file-format"},{"id":"6748","name":"recognition"}],"core_research_areas":[],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003E\u003Cstrong\u003EJohn Toon\u003C\/strong\u003E\u003Cbr \/\u003EResearch News \u0026amp; Publications Office\u003Cbr \/\u003E\u003Ca href=\u0022http:\/\/www.gatech.edu\/contact\/index.html?id=jt7\u0022\u003EContact John Toon\u003C\/a\u003E\u003Cbr \/\u003E\u003Cstrong\u003E404-894-6986\u003C\/strong\u003E\u003C\/p\u003E","format":"limited_html"}],"email":["jtoon@gatech.edu"],"slides":[],"orientation":[],"userdata":""}}}