School of Computing

Department of Computer Science

CS5344: Big Data Analytics Technology



Main Text

         Mining of Massive Datasets. Anand Rajaraman and Jeffrey David Ullman. Cambridge University Press. 2011.

o   A version of this textbook is available for free at the webpage maintained by the authors. We will follow some of the topics in this text, but not necessarily the specific algorithms.


Reference Text

Many data mining textbooks also cover the algorithms that we will learn in this class.

         Data Mining: Practical Machine Learning Tools and Techniques. Ian H. Witten, Eibe Frank, Mark A. Hall. Morgan Kaufmann, 2011. 3rd Edition.

o   This text also has a webpage and comes with supporting tools. The open source software WEKA, which implements the algorithms presented in the book, is available for public use. An electronic version of this book is available from the NUS Library.


MapReduce/Hadoop Text

Many tutorials are available online. Here are some books that you may want to look at:

         Hadoop: The Definitive Guide. 3rd Edition. Tom White. Yahoo! Press.

o   This text covers Hadoop and MapReduce. You should be able to find a version online.

         Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer. Morgan & Claypool Publishers, 2010.

o   This text focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning. You can find an online version here. You should be able to find it in the NUS Library as well.