Data Science

Encompassed under Data Science are:

  • Big Data Systems
  • Data Mining
  • Databases
  • Machine Learning
  • Natural Language Processing
  • Search
     


Data science is concerned with the study of data both analytically and predictively, addressing quantitative and qualitative characteristics and employing cutting-edge scientific methods. In this age of pervasive information flow, modern knowledge-based societies need tools for analyzing and sifting through data to extract and visualize relevant information in a timely manner. The data may be structured, such as that found in databases and knowledge bases; semi-structured, as in streams of financial data, video data, or climate data; or unstructured, as in streams of text from the web, a collection of articles from a newspaper, or large collections of radio shows. The Data Science group at GW brings together several strands of research in machine learning, databases, data mining, natural language processing, and computer vision to create a comprehensive program that addresses the challenge of handling "Big Data."

This research addresses: (1) advanced information retrieval topics, including math search and audio/image/video search; (2) data analytics, including the sampling and mining of web databases, online social networks, and search engines; and (3) the use of natural language processing techniques for advanced search and mining applications.
 

Mona Diab

Professor

Department: Computer Science
Phone: (202) 994-8109
Email: mtdiab@gwu.edu
Full Profile

Research Interest: Professor Diab conducts research in statistical Natural Language Processing (NLP), a rapidly growing field of research in artificial intelligence and computer science. NLP is inherently interdisciplinary, drawing on computer algorithms, software engineering, statistics, machine learning, linguistics, pragmatics, and information technology, among other fields. In NLP, we model language and its use, building both analytical and predictive models. Professor Mona Diab's NLP lab addresses problems in social media processing; builds robust enabling technologies, such as syntactic and semantic processing tools for written texts in different languages and information extraction tools for large data; and works on multilingual processing, machine translation, and computational sociolinguistic processing. Professor Diab has a special interest in Arabic NLP, with an emphasis on Arabic dialect processing, for which very few automated resources are available.


Claire Monteleoni

Associate Professor

Department: Computer Science
Phone: (202) 994-6569
Email: cmontel@gwu.edu
Full Profile 

Research Interest: Professor Claire Monteleoni's Machine Learning Group is concerned with developing principled methods (known as algorithms) to automatically detect patterns in data. In this era of "Big Data," the various forms of complexity inherent in real data sources increasingly pose challenges for machine learning algorithm design. The GW Machine Learning Group works on the design, analysis, and application of machine learning algorithms, motivated by problems in real data sources. Its topics include learning from data streams, learning from raw (unlabeled) data, learning from private data, and climate informatics, which aims to accelerate discovery in climate science with machine learning.
 



Tim Wood

Associate Professor

Department: Computer Science
Phone: (202) 994-1918
Email: timwood@gwu.edu
Full Profile

Research Interest: Professor Timothy Wood's research studies how cloud computing platforms can be built from massive data centers containing thousands of servers and storage devices. He seeks to improve the performance, reliability, and energy efficiency of these large distributed systems by adding automation and intelligence at the operating system and virtualization layers.
 


Abdou Youssef

Professor

Department: Computer Science
Phone: (202) 994-0388
Email: ayoussef@gwu.edu
Full Profile

Research Interest: Professor Abdou Youssef's research interests are search and retrieval, audio-visual data processing, pattern recognition, data error recovery, and theory and algorithms. He and his students developed a system for the federal government to recover from fax errors without retransmission. More recently, he created for NIST a new math-search engine for its Digital Library of Mathematical Functions (DLMF), intended for scientists, engineers, and all users of mathematics; this search engine is the first of its kind and is deployed online at http://dlmf.nist.gov/. Currently, he and his students are working on sentiment detection in documents that involve reviews of medical devices and procedures, as well as other types of reviews, and on developing sophisticated mathematical search techniques that enable knowledge discovery.