School of Computing
Department of Computer Science
CS5344: Big Data Analytics
Mining of Massive Datasets. Anand Rajaraman and Jeffrey David Ullman.
Cambridge University Press. 2011.
This textbook is available for free at the webpage maintained by
the authors. We will follow some of the topics in this text, but not
necessarily the specific algorithms.
Many data mining textbooks also discuss the algorithms that we will learn in class.
Data Mining: Practical Machine Learning Tools and Techniques. 3rd Edition. Ian H. Witten, Eibe Frank, Mark A. Hall. Morgan Kaufmann, 2011.
This text also has a webpage and comes with supporting tools: the open source
software WEKA, which supports the algorithms presented in the book, is available
for public use. An electronic version of this book is available in the NUS Library.
You can find many tutorials online. Here are some books that you may
want to look at:
Hadoop: The Definitive Guide. 3rd Edition. Tom White.
This text is about Hadoop and MapReduce.
You should be able to find an online version on the internet.
Data-Intensive Text Processing with MapReduce. Jimmy Lin and Chris Dyer.
Morgan & Claypool Publishers, 2010.
This text focuses on MapReduce algorithm design, with an
emphasis on text processing algorithms common in natural language processing,
information retrieval, and machine learning. An online version is available, and
you should also be able to find the book in the NUS Library.