$Id: plan.txt,v 1.3 2001/10/16 08:57:23 mika Exp mika $ CS 2 2001-2002 ============= I. The official opinion CS 2. Introduction to Data Structures and Algorithms. 9 units (2-4-3); second term. Prerequisite: CS 1 or equivalent. CS 2 is a challenging course in programming languages and computer science, emphasizing modes of algorithmic expression. The course will include such topics as performance analysis of algorithms; proofs of program correctness; recursive and higher-order procedures; data structures, including lists, trees, graphs, and arrays; objects and abstract data types. The course includes weekly laboratory exercises and written homework covering the lecture material and program design. Instructor: Staff. II. My opinion CS 2 is a class about basic computing structures. It is directed to all freshmen and perhaps some others. Because of the diversity of the students, a normal computer-science "Data Structures and Algorithms" course would be inappropriate. But we cannot simply assume that CS 2 is the last computer science course that the students will take, either. It must be comprehensive, yet make sense. Having read through what I have written and having discussed it with several people, I am struck by what appears to be an implicit goal of CS 1/2/3: we seem to be heading towards a sequence of classes that comprehensively covers all the basics of computer science. This is why I added parsing to CS 2, for instance. I think that comprehensively covering all that I have written below in CS 2 will be out of the question. Some things may come out in homework assignments, others may only get covered cursorily, yet others may have to be dropped altogether. CS 2 will cover fundamental data structures and algorithms, but that cannot be its only focus. We must also talk about "software engineering": things like object-oriented programming, separation of interface and implementation. Algorithms and data structures make sense without "software engineering," but the converse is not true (as any physicist can tell you). Hence, algorithms must come before software engineering. But how to make the leap from algorithms to software engineering? The only way that makes sense to me is to introduce software engineering as a way of designing programs so that they are easier to understand; i.e., so that they are easier to find (prove) correct. Therefore, the sequence must be: 1. Introduction to imperative programming. 2. Basic data types; ARRAYs; RECORDs; memory allocation; references. 3. Algorithms and data structures -- the basics not covered in CS 1. 3a. Review of the basic model of computing. Time and space complexity of algorithms. (Some of this depending on what gets covered in CS 1.) 4. Some basic correctness proofs -- Hoare triples, wp (weakest preconditions), loop invariants. 5. Modular programming -- interface vs. implementation. The following I am not so sure about the order of: 6. Memory management (i.e., deallocation), garbage collection. 7. "Real-time" programming (e.g., user-interface programming): concurrency; threads, mutexes/semaphores, deadlock. 8. Object-oriented program construction: data and code encapsulation and abstraction, polymorphism, inheritance. Multiple interfaces to a single implementation; the "friends" concept. 9. "Constrained data types": generics. 10. Introduction to parsing; regular expressions, BNFs? (not too much detail) Many of these things go hand in hand. For instance, it makes sense to present certain common data structures together with the modularity principles: here we can talk about abstract data types. I am not sure if there should be a division of labor between lectures and homeworks, or if the concepts should be strictly sequenced. Existing texts that I believe the class will have access to (see below) will cover sections 1., 2., 3., 3a., 5., 7., 8. Some of the texts are a bit impenetrable and will have to be complemented with extra information. Some of the topics are close to things from CS 1 and CS 20. These (e.g., parsing) will be first on the list of what should be dropped to make the class fit in a term. Some of the topics are clearly more issues of implementation: these topics may have interesting theoretical ramifications, but those would get us too far afield, for instance all the things about NFAs, DFAs, etc., that are related to parsing. If a topic is more about implementation, then it might be possible to handle it through the homeworks rather than having lectures about it. I still do not know whether recitation instructors (i.e., a third hour of lecture a week) will be in the budget. If they are, the language issues can be separated from the main lecture track and be put in the recitations. Textbooks ========= There is no book that I know of that covers 1-10. There is no good book that covers more than about half the topics. Right now, this is my plan: First, a good language-neutral book on algorithms. This should be a book that a student could use as a reference in later years. I am considering the following books for this role: 1. D. E. Knuth, The Art of Computer Programming, vols. 1-3, or vol. 1 and 3, or just vol. 1. Originally published in 1967. Advantages: A real encyclopedia of information. Very useful for students that end up in C.S. Also a pleasant book to read. Disadvantages: Potentially expensive (if all three volumes), idiosyncratic beyond belief, old-fashioned (MIX). 2. Aho, Hopcroft, and Ullman. The Design and Analysis of Computer Algorithms. Originally published in 1974. Advantages: Also an encyclopedia, albeit smaller than 1 (and much better organized). Also very useful, comprehensive, and authoritative. Most highly rated algorithms book. Most sources consider it to be much better than the more down-to-earth "Data Structures and Algorithms" by the same authors. Disadvantages: Old-fashioned but not nearly as much as Knuth (uses "pidgin ALGOL" for the examples). Less enjoyable to read (denser, more mathematical, more impersonal). Has a few sections on "hot topics" from the 70s that really are not as hot these days (e.g., FFTs and Schoenhage-Strassen multiplication). 3. Horowitz and Sahni. Data Structures in Pascal. Originally published in 1984. Advantages: Much easier going and more practical than 1. and 2. Decent algorithms cookbook. Disadvantages: Not so authoritative; uses Pascal, which may limit the expression of some of the algorithms. The rest of the books I would consider are out of print (e.g., all of Sedgewick's books that use a language other than C or C++). Books 1. and 2. are Great Classics of Computer Science. Currently, I am thinking 2. will be the choice, but if it turns out that so little of it gets used that it is all in Knuth's vol. 1, we can go with Knuth instead. (I do not think this will be the case.) Secondly, some resources on writing modular and correct programs. I would not make too much of this: the underlying principles are really quite simple and should not be clouded in jargon. Most of this could come through assignments and outside reading, e.g., case studies of existing well-designed code and a few tech reports I know about that study the issue (DECSRC published a ton of these). I would emphasize tangible ways of improving program modularity, cross-referencing it with Hoare triples and other such ideas (rather than the intangibles that people often get stuck in); perhaps I would bring in automatic program verifiers or checkers (e.g., SRC's Extended Static Checker, freely available in binary form for Windows and Digital Unix for checking Java and Modula-3 programs). Lastly, a programming language and associated documentation. I think Modula-3 will have to be the language of CS 2. The only other language I can think of with the flexibility to do both traditional imperative programming and object-oriented programming is C++, and that is out of the question considering the following: safety, syntax, complexity. One more book will hence be required: Harbison. Modula-3. I shall print the Modula-3 Report (available from SRC) and distribute it to the class as well. Most of Greg Nelson's book "Systems Programming With Modula-3" (which by the way addresses many issues in object-oriented programming as well) is also available from SRC.