{"638534":{"#nid":"638534","#data":{"type":"news","title":"New Toolchain Automatically Finds Database Management System Bugs","body":[{"value":"\u003Cp\u003EGeorgia Tech researchers have applied fuzzing techniques to find bugs in database management systems (DBMS). Their new toolchain, APOLLO, automatically detects, reports, and diagnoses a common class of DBMS bug, the performance regression.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EAPOLLO automates the generation of regression-triggering queries, simplifies the bug reporting process for users, and enables developers to quickly pinpoint the root cause of performance regressions.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe researchers discovered 10 previously unknown and unique performance regressions, reduced query size by 4.2 times, and identified branches related to the root cause.\u003C\/p\u003E\r\n\r\n\u003Cdiv\u003E\u0026quot;We believe that APOLLO will assist database system developers with the tedious process of testing these complex systems,\u0026quot; said School of Computer Science (SCS) Assistant Professor \u003Cstrong\u003EJoy Arulraj\u003C\/strong\u003E. \u0026quot;This will allow them to focus on more important problems in developing database systems.\u0026quot;\u003C\/div\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003EDBMS problems\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe complexity of DBMSs increases their potential for error. An upgrade to a DBMS can unexpectedly slow down certain queries, a problem known as a performance regression bug.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;A critical regression can reduce performance by orders of magnitude, in many cases converting an interactive query to an overnight execution,\u0026rdquo; said SCS Ph.D. 
student \u003Cstrong\u003EJinho Jung\u003C\/strong\u003E.\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETo address this issue, the researchers built a toolchain, a pipeline of distinct software development tools that are linked together in stages.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe team\u0026rsquo;s new toolchain has three components:\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ESQLFuzz\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESQLFuzz generates queries in structured query language (SQL), the language used to communicate with databases, to find performance regressions. It works by bombarding a system with many randomly generated inputs to trigger bugs, a technique known as fuzzing.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;During the fuzzing test, we noticed that validating performance regressions is challenging because the ground truth of the regression is unclear and may be heavily affected by the execution environment and lead to a lot of false-positive bugs,\u0026rdquo; Jung said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003ETo counter this, the researchers applied validation checks to reduce false positives.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ESQLMin\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003ESQLMin reduces the regression-triggering query to its essence, so developers do not have to work out by hand which part of the statement causes the regression. The researchers achieve this by using both bottom-up and top-down approaches.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe bottom-up strategy extracts one sub-query from the regression-triggering query and checks whether it still triggers the regression. If it does, SQLMin keeps the sub-query for further analysis. 
The top-down strategy removes as many expressions as possible.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u0026ldquo;This takes out as many elements of the statement as possible while ensuring that the reduced query still triggers the problem,\u0026rdquo; Jung said.\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Cstrong\u003ESQLDebug\u003C\/strong\u003E\u003C\/p\u003E\r\n\r\n\u003Cp\u003EOnce a regression report is filed, developers must diagnose its root cause. To simplify the diagnosis process, the researchers use two techniques to automatically identify the root cause.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EFirst, they use bisection, a binary search over the commit history in the code repository, to find the earliest commit that introduced the regression. Second, they leverage statistical debugging to identify the suspicious source lines within that commit that are likely responsible for the slowdown.\u003C\/p\u003E\r\n\r\n\u003Cp\u003EThe researchers introduced APOLLO at the \u003Ca href=\u0022https:\/\/vldb2020.org\/\u0022\u003EVery Large Data Bases\u003C\/a\u003E conference, held from Aug. 31 to Sept. 4. Jung wrote the paper, \u003Ca href=\u0022\/\/www.vldb.org\/pvldb\/vol13\/p57-jung.pdf\u0022\u003E\u003Cem\u003EAPOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems\u003C\/em\u003E\u003C\/a\u003E, with SCS postdoctoral researcher \u003Cstrong\u003EHong Hu\u003C\/strong\u003E, Arulraj, Associate Professor \u003Cstrong\u003ETaesoo Kim\u003C\/strong\u003E, and eBay\u0026rsquo;s \u003Cstrong\u003EWoonhak Kang\u003C\/strong\u003E.\u003C\/p\u003E\r\n","summary":null,"format":"limited_html"}],"field_subtitle":"","field_summary":"","field_summary_sentence":[{"value":"Georgia Tech researchers have applied fuzzing techniques to find bugs in database management systems (DBMS). 
"}],"uid":"34541","created_gmt":"2020-08-28 17:07:12","changed_gmt":"2020-08-28 17:18:49","author":"Tess Malone","boilerplate_text":"","field_publication":"","field_article_url":"","dateline":{"date":"2020-08-28T00:00:00-04:00","iso_date":"2020-08-28T00:00:00-04:00","tz":"America\/New_York"},"extras":[],"hg_media":{"638535":{"id":"638535","type":"image","title":"Apollo","body":null,"created":"1598634830","gmt_created":"2020-08-28 17:13:50","changed":"1598634830","gmt_changed":"2020-08-28 17:13:50","alt":"Apollo","file":{"fid":"242813","name":"apollo.png","image_path":"\/sites\/default\/files\/images\/apollo.png","image_full_path":"http:\/\/www.tlwarc.hg.gatech.edu\/\/sites\/default\/files\/images\/apollo.png","mime":"image\/png","size":246585,"path_740":"http:\/\/www.tlwarc.hg.gatech.edu\/sites\/default\/files\/styles\/740xx_scale\/public\/images\/apollo.png?itok=029TwXnO"}}},"media_ids":["638535"],"groups":[{"id":"47223","name":"College of Computing"},{"id":"50875","name":"School of Computer Science"}],"categories":[],"keywords":[],"core_research_areas":[{"id":"39431","name":"Data Engineering and Science"}],"news_room_topics":[],"event_categories":[],"invited_audience":[],"affiliations":[],"classification":[],"areas_of_expertise":[],"news_and_recent_appearances":[],"phone":[],"contact":[{"value":"\u003Cp\u003ETess Malone, Communications Officer\u003C\/p\u003E\r\n\r\n\u003Cp\u003E\u003Ca href=\u0022mailto:tess.malone@cc.gatech.edu\u0022\u003Etess.malone@cc.gatech.edu\u003C\/a\u003E\u003C\/p\u003E\r\n","format":"limited_html"}],"email":[],"slides":[],"orientation":[],"userdata":""}}}