44-517 Course Syllabus
Area
School of Computer Science and Information Systems
Course Title
44-517 Big Data
Course Credit
3 hours
Placement in Curriculum
This course is typically offered in the later years of an undergraduate degree or after the first semester of a graduate program.
Prerequisites
Undergraduate prerequisites: MATH 17230 or MATH 17316 with a grade of C or better and CSIS 44242 with a grade of C or better. Graduate prerequisites: CSIS 44542 with a grade of B or better, or concurrent enrollment in CSIS 44542, or consent of instructor.
Section Details
Spring 2021
Sec 01 - MWF 1-1:50pm CH 3300
Sec 02 - MWF 2-2:50pm CH 3300
Course Description
An introduction to the design of data-intensive, reliable, scalable, and maintainable systems. This may include concepts such as parallel programming, distributed computing, distributed file systems, MapReduce, regular expressions, and the ingesting and processing of data at rest and data in motion. Tools used may include Hadoop, HDFS, Pig, Hive, Spark, Storm, Kafka, Mahout, MLlib, etc.
Course Rationale
This course provides an overview of the design and implementation of big data solutions, covering common approaches to processing data at rest and data in motion.
Student Learning Outcomes
Competency | BS Data Science Program Outcome | Assessment |
---|---|---|
Managing Information | DSI students will access, generate, and reorganize information using contemporary technologies. | Selected assignment(s) |
Teamwork | DSI students will work as a team to design, implement, and deliver solutions to problems using best practices with contemporary technologies. | Selected assignment(s) |
Additional student learning outcomes include:
Materials
Recommended references
- Mining of Massive Datasets by Leskovec, Rajaraman, and Ullman, Third Edition, 2019.
- Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann, 2017.
- Big Data by Nathan Marz with James Warren, 2015.
- Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer, 2010.
- MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean and Sanjay Ghemawat, 2004.
- Cornell Java reference (online book) by Professor David Gries.
- MIT Software Construction Reading 25: Map, Filter, Reduce.
Required
Students must have access to the following at every course meeting:
- A bound notebook with pencil/pen for taking notes and submitting written content (e.g., pop quizzes)
- Their campus-assigned laptop, in working order, with all required software
- Free Git distributed version control system
- Free TortoiseGit for integrating Git with Windows File Explorer
- Free PuTTY for creating SSH public/private key pairs
- Free Bitbucket and GitHub educational accounts
- Free GitHub Education Pack (as needed)
- Free Chocolatey package manager for Windows
- Free Notepad++ text editor
- Free Visual Studio Code (VS Code) integrated development environment
- Free VS Code extension: Java Extension Pack
- Free Python programming language
- Free Java OpenJDK programming language and platform
- Free Scala programming language
- Free, open-source Apache ZooKeeper, Kafka, Spark, Beam, and other tools and libraries as directed by the instructor
- Access to our Proxmox Virtual Environment
- Access to free cloud accounts including Amazon Web Services (AWS), Google Cloud Platform (GCP), IBM Cloud Free, Microsoft Azure, and Oracle Cloud as required.
- Typing is a foundational skill for developing software systems. If additional practice would be helpful, try https://www.typing.com/.