Machines that store and manipulate information under the control of a changeable program.
Two key elements:
Machines for manipulating information
Control by a changeable program
What is a computer program?
A detailed, step-by-step set of instructions telling a computer what to do.
If we change the program, the computer performs a different set of actions or a different task.
The machine stays the same, but the program changes!
Programs are executed, or carried out.
Software (programs) rule the hardware (the physical machine).
The process of creating this software is called programming.
Algorithm
One way to show that a particular problem can be solved is to actually design a solution.
This is done by developing an algorithm, a step-by-step process for achieving the desired result.
Hardware Basics
The central processing unit (CPU) is the “brain” of a computer.
The CPU carries out all the basic operations on the data.
Examples: simple arithmetic operations, testing to see if two numbers are equal.
Memory stores programs and data.
The CPU can only directly access information stored in main memory (RAM, or Random Access Memory).
Main memory is fast but volatile, i.e., when the power is interrupted, the contents of memory are lost.
Secondary memory provides more permanent storage: magnetic (hard drive, floppy disk), optical (CD, DVD), flash (SSD).
I/O
Input devices: Information is passed to the computer through keyboards, mice, etc.
Output devices: Processed information is presented to the user through the monitor, printer, etc.
Fetch-Execute Cycle
First, an instruction is retrieved (fetched) from memory.
The instruction is decoded to see what it represents.
The appropriate action is carried out.
Then the next instruction is fetched, decoded, and executed; the cycle repeats (a toy sketch follows).
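A minimal Julia sketch of the cycle (purely illustrative; the three opcodes and the run_program helper are made up here, not real hardware behavior):
function run_program(program)
    acc = 0                              # a CPU "register"
    mem = Dict{Int,Int}()                # toy main memory
    for (opcode, operand) in program     # fetch the next instruction
        if opcode == :load               # decode the instruction ...
            acc = operand                # ... and carry out the action
        elseif opcode == :add
            acc += operand
        elseif opcode == :store
            mem[operand] = acc
        end
    end
    return mem
end
run_program([(:load, 5), (:add, 3), (:store, 0)])   # Dict(0 => 8)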
Programming Languages
Natural language has ambiguity and imprecision problems when used to describe complex algorithms.
Programs are expressed in an unambiguous, precise way using programming languages.
Every structure in a programming language has a precise form, called its syntax.
Every structure in a programming language has a precise meaning, called its semantics.
Programmers will often refer to their program as computer code.
The process of writing an algorithm in a programming language is often called coding.
Low-level Languages
Computer hardware can only understand a very low-level language known as machine language.
Example: adding two numbers in ARM assembly
@ Load the first number at memory address 0x2021 into register R0
ldr r0, =0x2021
ldr r0, [r0]
@ Load the second number at memory address 0x2022 into register R1
ldr r1, =0x2022
ldr r1, [r1]
@ Add the numbers in R0 and R1 and store the result in R2
add r2, r0, r1
@ Store the result at memory address 0x2023
ldr r3, =0x2023
str r2, [r3]
In reality, these low-level instructions are represented in binary (1’s and 0’s).
High-level Languages
Designed to be used and understood by humans
c = a + b
This needs to be translated into the machine language that the computer can execute.
Compilers convert programs written in a high-level language (source code) into the target machine language.
Fortran, C/C++, Java, Go, Swift
Interpreters simulate a computer that understands a high-level language.
The source code is not translated into machine language all at once.
An interpreter analyzes and executes the source code instruction by instruction.
Interpreted languages offer a more flexible programming environment, since programs can be developed and run interactively, at the expense of slower execution than compiled languages.
Command-line shells, Lisp, R, Python, Julia
Dynamic Languages
A dynamic programming language is a high-level programming language that can change the behavior of a program at runtime, e.g., by adding new code (see the sketch after this list).
Also called a scripting language.
Most dynamic languages are interpreted languages, providing an interactive read-eval-print loop (REPL).
Traditional interpreters parse the source code into an intermediate representation (IR) such as an abstract syntax tree (AST) and execute predetermined routines.
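A minimal sketch of adding new code at runtime in Julia (the function name h is made up for illustration):
code = "h(x) = 3x"      # source code held in a plain string
eval(Meta.parse(code))  # parse the string into an AST, then evaluate it
h(7)                    # 21; h now behaves as if it were in the original source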
Just-in-time Compilation
Modern interpreters compile the parsed IR to native machine code at runtime. If the same part of the source code (e.g., a function) is executed again, the compiled native code is run. This technique is called just-in-time (JIT) compilation.
For subsequent uses (e.g., calling the function within a loop), the speedup is significant.
More and more interpreted languages are adopting JIT technology: R (version 3.4+), MATLAB (R2015b+), Python (PyPy), Julia, …
The distinction between compiled and interpreted languages is becoming blurred, thanks to improved computational capabilities of modern hardware and advanced compiler techniques. A small Julia sketch of the JIT effect follows.
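A minimal sketch of the JIT effect in Julia (the function f is made up for illustration):
f(x) = x^2 + 1
@time f(2.0)   # first call: timing includes JIT compilation of f for Float64
@time f(3.0)   # subsequent call: reuses the cached native code, much faster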
Julia
What’s Julia?
Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.
Project started in 2009; first public release in 2012.
Creators: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral Shah
First major release: v1.0 on Aug 8, 2018
Current stable release: v1.9.3 (August 24, 2023)
Julia aims to solve the notorious two-language problem: prototype code goes into a high-level language like R or Python, while production code goes into a low-level language like C or C++.
Julia aims to:
Walk like Python. Run like C.
Write high-level, abstract code that closely resembles mathematical formulas,
yet produce fast, low-level machine code that has traditionally only been generated by static languages.
Multiple dispatch is a feature of some programming languages in which a function or method can be dynamically dispatched based on the run time (dynamic) type or, in the more general case, some other attribute of more than one of its arguments.
Multiple dispatch lies at the core of Julia’s design. It allows built-in and user-defined functions to be overloaded for different combinations of argument types.
In Julia, methods belong to functions, called generic functions.
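For example, here is a hypothetical collide function (made up for illustration) whose method is chosen based on the types of both arguments:
collide(a::Int, b::Int) = "both integers"
collide(a::Float64, b::Number) = "float and number"
collide(a::Number, b::Number) = "fallback for any numbers"
collide(1, 2)     # "both integers"
collide(1.0, 2)   # "float and number": the most specific match wins
collide(1, 2.0)   # "fallback for any numbers"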
Let’s consider a simple “doubling” function:
g(x) = x + x
g (generic function with 1 method)
This definition is too broad, since some things, e.g., strings, cannot be added.
g(1.5)
3.0
g("hello world")
LoadError: MethodError: no method matching +(::String, ::String)
String concatenation is performed with * (See also: https://docs.julialang.org/en/v1/manual/strings/#man-concatenation).
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...)
   @ Base operators.jl:578
g(x::Float64) = x + x
g (generic function with 2 methods)
This definition is correct but too restrictive, since any Number can be added.
g(x::Number) = x + x
g (generic function with 3 methods)
This definition will automatically work on the entire Number type tree!
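For instance, g now also works for complex and rational numbers, and we can inspect part of the Number type tree (subtypes and supertype come from InteractiveUtils, which is loaded automatically in the REPL):
g(2 + 3im), g(3//4)   # (4 + 6im, 3//2)
subtypes(Number)      # Any[Complex, Real]
supertype(Float64)    # AbstractFloat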
A lot nicer than
function g(x)
    if isa(x, Number)
        return x + x
    else
        throw(ArgumentError("x should be a number"))
    end
end
methods(func) displays all methods defined for func.
methods(g)
# 3 methods for generic function g from Main:
g(x::Float64) in Main at In[101]:1
g(x::Number) in Main at In[102]:1
g(x) in Main at In[98]:1
When calling a function with multiple definitions, Julia dispatches to the method with the narrowest (most specific) matching signature.
The @which func(x) macro tells which method is being used for argument signature x.
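Continuing the g example:
@which g(2)     # g(x::Number) in Main; an Int64 is a Number but not a Float64
@which g(2.0)   # g(x::Float64) in Main; the narrowest matching signature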
Julia’s efficiency results from its ability to infer the types of all variables within a function and then call the LLVM compiler to generate optimized machine code at run time.
Consider the g (doubling) function defined earlier. This function will work on any type which has a method for +.
g(2), g(2.0)
(4, 4.0)
Step 1: Parse Julia code into an AST and lower it.
@code_lowered g(2)
CodeInfo(
1 ─ %1 = x + x
└── return %1
)
Step 2: Infer types according to the input types.
@code_warntype g(2)
MethodInstance for g(::Int64)
from g(x::Number) @ Main In[102]:1
Arguments
#self#::Core.Const(g)
x::Int64
Body::Int64
1 ─ %1 = (x + x)::Int64
└── return %1
@code_warntype g(2.0)
MethodInstance for g(::Float64)
from g(x::Float64) @ Main In[101]:1
Arguments
#self#::Core.Const(g)
x::Float64
Body::Float64
1 ─ %1 = (x + x)::Float64
└── return %1
Step 3: Compile the typed IR into machine-independent LLVM bitcode.
@code_llvm g(2.0)
; @ In[101]:1 within `g`
define double @julia_g_5094(double %0) #0 {
top:
; ┌ @ float.jl:408 within `+`
%1 = fadd double %0, %0
; └
ret double %1
}
We didn’t provide a type annotation. But different LLVM code gets generated depending on the argument type!
In R or Python, g(2) and g(2.0) would run the same code.
In Julia, g(2) and g(2.0) dispatch to optimized code for Int64 and Float64, respectively.
For an integer input x, the LLVM compiler is smart enough to know that x + x is simply a shift of x left by one bit, which is faster than addition.
Step 4: The lowest level is assembly code, which is machine-dependent.
@code_native g(2)
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 13, 0
.globl _julia_g_5110 ## -- Begin function julia_g_5110
.p2align 4, 0x90
_julia_g_5110: ## @julia_g_5110
; ┌ @ In[102]:1 within `g`
.cfi_startproc
## %bb.0: ## %top
; │┌ @ int.jl:87 within `+`
leaq (%rdi,%rdi), %rax
; │└
retq
.cfi_endproc
; └
## -- End function
.subsections_via_symbols
The first instruction, leaq, adds the content of the general-purpose 64-bit register RDI (a small memory inside the CPU) to itself and loads the result into another register, RAX. The addition here is integer arithmetic.
@code_native g(2.0)
.section __TEXT,__text,regular,pure_instructions
.build_version macos, 13, 0
.globl _julia_g_5114 ## -- Begin function julia_g_5114
.p2align 4, 0x90
_julia_g_5114: ## @julia_g_5114
; ┌ @ In[101]:1 within `g`
.cfi_startproc
## %bb.0: ## %top
; │┌ @ float.jl:408 within `+`
vaddsd %xmm0, %xmm0, %xmm0
; │└
retq
.cfi_endproc
; └
## -- End function
.subsections_via_symbols
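Here the vaddsd instruction adds the double-precision floating-point number in register XMM0 to itself and stores the result back in XMM0; the addition here is floating-point arithmetic.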