A Primer for Computational Biology aims to provide life scientists and students the skills necessary for research in a data-rich world. The text covers accessing and using remote servers via the command-line and writing programs and pipelines for data analysis, and it provides useful vocabulary for interdisciplinary work. The book is broken into three parts:
- Introduction to Unix/Linux: The command-line is the “natural environment” of scientific computing, and this part covers a wide range of topics, including logging in, working with files and directories, installing programs and writing scripts, and the powerful “pipe” operator for file and data manipulation.
- Programming in Python: Python is both a premier language for learning and a common choice in scientific software development. This part covers the basic concepts in programming (data types, if-statements and loops, functions) via examples of DNA-sequence analysis. This part also covers more complex subjects in software development such as objects and classes, modules, and APIs.
- Programming in R: The R language specializes in statistical data analysis and is also quite useful for visualizing large datasets. This third part covers the basics of R as a programming language (data types, if-statements, functions, loops and when to use them) as well as techniques for large-scale, multi-test analyses. Other topics include S3 classes and data visualization with ggplot2.
Shawn T. O’Neil earned a BS in computer science from Northern Michigan University, and later an MS and PhD in the same subject from the University of Notre Dame. His past and current research focuses on bioinformatics. O’Neil has developed and taught several courses in computational biology at both Notre Dame and Oregon State University.
Table of Contents
Preface
Acknowledgements
Dedication
Part I: Introduction to Unix/Linux
Context
Logging In
The Command Line and Filesystem
Working with Files and Directories
Permissions and Executables
Installing (Bioinformatics) Software
Command Line BLAST
The Standard Streams
Sorting, First and Last Lines
Rows and Columns
Patterns (Regular Expressions)
Miscellanea
Part II: Programming in Python
Hello, World
Elementary Data Types
Collections and Looping: Lists and for
File Input and Output
Conditional Control Flow
Python Functions
Command Line Interfacing
Dictionaries
Bioinformatics Knick-knacks and Regular Expressions
Variables and Scope
Objects and Classes
Application Programming Interfaces, Modules, Packages, Syntactic Sugar
Algorithms and Data Structures
Part III: Programming in R
An Introduction
Variables and Data
Vectors
R Functions
Lists and Attributes
Data Frames
Character and Categorical Data
Split, Apply, Combine
Reshaping and Joining Data Frames
Procedural Programming
Objects and Classes in R
Plotting Data and ggplot2
Files
Index
About the Author