A Brief Introduction to Julia

Advanced Statistical Computing

Joong-Ho Won

Seoul National University

September 2023

A quick CS101

What is a computer?

Modern computers can be defined as

machines that store and manipulate information under the control of a changeable program.

Two key elements:

  • Machines for manipulating information
  • Control by a changeable program

What is a computer program?

  • A detailed, step-by-step set of instructions telling a computer what to do.

  • If we change the program, the computer performs a different set of actions or a different task.

  • The machine stays the same, but the program changes!

  • Programs are executed, or carried out.

  • Software (programs) rule the hardware (the physical machine).
  • The process of creating this software is called programming.

Algorithm

  • One way to show a particular problem can be solved is to actually design a solution.

  • This is done by developing an algorithm, a step-by-step process for achieving the desired result.
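As a concrete illustration, Euclid's algorithm for the greatest common divisor is a classic step-by-step process. A minimal sketch in Julia (the language introduced below; `mygcd` is a hypothetical name, not a built-in):

```julia
# Euclid's algorithm: repeatedly replace (a, b) with (b, a mod b)
# until the remainder is zero; the last nonzero value is the gcd.
function mygcd(a::Integer, b::Integer)
    while b != 0
        a, b = b, a % b
    end
    return abs(a)
end

mygcd(48, 36)  # 12
```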

Hardware Basics

  • The central processing unit (CPU) is the “brain” of a computer.
    • The CPU carries out all the basic operations on the data.
    • Examples: simple arithmetic operations, testing to see if two numbers are equal.
  • Memory stores programs and data.
    • CPU can only directly access information stored in main memory (RAM or Random Access Memory).
    • Main memory is fast, but volatile, i.e. when the power is interrupted, the contents of memory are lost.
    • Secondary memory provides more permanent storage: magnetic (hard drive, floppy), optical (CD, DVD), flash (SSD)
  • I/O
    • Input devices: Information is passed to the computer through keyboards, mice, etc.
    • Output devices: Processed information is presented to the user through the monitor, printer, etc.
  • Fetch-Execute Cycle
    • First instruction retrieved from memory
    • Decode the instruction to see what it represents
    • Appropriate action carried out.
    • Next instruction fetched, decoded, and executed.

Programming Languages

Natural language has ambiguity and imprecision problems when used to describe complex algorithms.

  • Programs are expressed in an unambiguous, precise way using programming languages.
  • Every structure in a programming language has a precise form, called its syntax.
  • Every structure in a programming language has a precise meaning, called its semantics.
  • Programmers often refer to their programs as computer code.
  • The process of writing an algorithm in a programming language is often called coding.

Low-level Languages

Computer hardware can only understand a very low level language known as machine language.

  • Example: add two numbers in ARM assembly
    @ Load the first number at memory address 0x2021 into register R0
    ldr r0, =0x2021
    ldr r0, [r0]

    @ Load the second number at memory address 0x2022 into register R1
    ldr r1, =0x2022
    ldr r1, [r1]

    @ Add the numbers in R0 and R1 and store the result in R2
    add r2, r0, r1

    @ Store the result at memory address 0x2023
    ldr r3, =0x2023
    str r2, [r3]

In reality, these low-level instructions are represented in binary (1’s and 0’s).

High-level Languages

Designed to be used and understood by humans

c = a + b
  • This needs to be translated into the machine language that the computer can execute.
  • Compilers convert programs written in a high-level language (source code) into the target machine language.
    • Fortran, C/C++, Java, Go, Swift
  • Interpreters simulate a computer that understands a high-level language.
  • The source code is not translated into machine language all at once.
  • An interpreter analyzes and executes the source code instruction by instruction.
  • Interpreted languages offer a more flexible programming environment, since programs can be developed and run interactively, at the expense of slower execution than compiled languages.
    • Commandline shells, Lisp, R, Python, Julia

Dynamic Languages

  • A dynamic programming language is a class of high-level programming languages that can change the behavior of the program at runtime, for example by adding new code.

    • Also called a scripting language
  • Most dynamic languages are interpreted languages, providing an interactive read-eval-print loop (REPL).

  • Traditional interpreters parse the source code into an intermediate representation (IR) such as an abstract syntax tree (AST) and execute predetermined routines.

Just-in-time Compilation

  • Modern interpreters compile the parsed IR to native machine code at runtime. If the same part of the source code (e.g., a function) is executed again, the compiled native code is reused. This technique is called just-in-time (JIT) compilation.
    • For subsequent uses (e.g., calling the function within a loop), the speedup is significant.
  • More and more interpreted languages are adopting JIT technology: R (version 3.4+), MATLAB (R2015b+), Python (PyPy), Julia, …
  • The distinction between compiled and interpreted languages is blurring, owing to the improved computational capabilities of modern hardware and advances in compiler techniques.
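The effect of JIT compilation is easy to observe in Julia: the first call to a function includes compilation time, while subsequent calls reuse the cached native code. A minimal sketch (timings are machine-dependent, so only the qualitative pattern matters):

```julia
f(x) = sum(abs2, x)       # sum of squares of the elements

x = collect(1.0:1000.0)
t1 = @elapsed f(x)        # first call: includes JIT compilation
t2 = @elapsed f(x)        # second call: runs already-compiled native code
# typically t2 is orders of magnitude smaller than t1
f(x)                      # 333833500.0
```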

Julia

What’s Julia?

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

  • Project started in 2009. First public release in 2012
    • Creators: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral Shah
  • First major release: v1.0 on Aug 8, 2018
  • Current stable release: v1.9.3 (August 24, 2023)
  • Aims to solve the notorious two-language problem: prototype code is written in high-level languages like R/Python, while production code is rewritten in a low-level language like C/C++.

  • Julia aims to:

Walks like Python. Runs like C.

  • Write high-level, abstract code that closely resembles mathematical formulas
    • yet produces fast, low-level machine code that has traditionally only been generated by static languages.

https://julialang.org/benchmarks/

Some basic Julia code

versioninfo()
Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
  Threads: 2 on 8 virtual cores
using Pkg
Pkg.activate(pwd())
Pkg.instantiate()
Pkg.status()
Status `~/Dropbox/class/M1399.000200/2023/M1300_000200-2023fall/lectures/02-juliaintro/Project.toml` (empty project)
using LinearAlgebra, Random
# an integer, same as int in R
y = 1
typeof(y) 
Int64
# a Float64 number, same as double in R
y = 1.0
typeof(y) 
Float64
# Greek letters:  `\pi<tab>`
π
π = 3.1415926535897...
typeof(π)
Irrational{:π}
# Greek letters:  `\theta<tab>`
θ = y + π
4.141592653589793
# emoji! `\:kissing_cat:<tab>`
😽 = 5.0
5.0
# `\alpha<tab>\hat<tab>`
α̂ = π
π = 3.1415926535897...
# vector of Float64 0s
x = zeros(5)
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
# vector of Int64 0s
x = zeros(Int, 5)
5-element Vector{Int64}:
 0
 0
 0
 0
 0
# matrix of Float64 0s
x = zeros(5, 3)
5×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
# matrix of Float64 1s
x = ones(5, 3)
5×3 Matrix{Float64}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
# define array without initialization
x = Matrix{Float64}(undef, 5, 3)
5×3 Matrix{Float64}:
 1.0e-323  5.0e-324  0.0
 5.0e-324  5.0e-324  0.0
 5.0e-324  0.0       0.0
 5.0e-324  0.0       0.0
 5.0e-324  0.0       2.70751e-314
# fill a matrix with 0s
fill!(x, 0)
5×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
x
5×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
# initialize an array to be constant 2.5
fill(2.5, (5, 3))
5×3 Matrix{Float64}:
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
# rational number
a = 3//5
3//5
typeof(a)
Rational{Int64}
b = 3//7
3//7
a + b
36//35
# uniform [0, 1) random numbers
x = rand(5, 3)
5×3 Matrix{Float64}:
 0.546293   0.12138    0.425512
 0.826908   0.419831   0.227693
 0.0632249  0.0312031  0.309937
 0.816833   0.296741   0.656396
 0.358824   0.257185   0.863171
# uniform random numbers (in Float16)
x = rand(Float16, 5, 3)
5×3 Matrix{Float16}:
 0.539   0.5933    0.1494
 0.2935  0.332     0.3433
 0.4653  0.009766  0.4824
 0.968   0.6455    0.6074
 0.438   0.667     0.953
# random numbers from {1,...,5}
x = rand(1:5, 5, 3)
5×3 Matrix{Int64}:
 3  5  2
 5  5  3
 2  3  1
 4  5  3
 4  2  3
# standard normal random numbers
x = randn(5, 3)
5×3 Matrix{Float64}:
 -0.498406    0.353918   -0.86567
 -0.273643    0.0195263   0.0394816
  0.392344   -1.4122      0.00153266
 -0.131863    0.0317344  -0.581209
  0.0795573   0.548513   -0.637268
# range
1:10
1:10
typeof(1:10)
UnitRange{Int64}
1:2:10
1:2:9
typeof(1:2:10)
StepRange{Int64, Int64}
# integers 1-10
x = collect(1:10)
10-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
# or equivalently
[1:10...]
10-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
# Float64 numbers 1-10
x = collect(1.0:10)
10-element Vector{Float64}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
# convert to a specific type
convert(Vector{Float64}, 1:10)
10-element Vector{Float64}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0

Matrices and vectors

Dimensions

x = randn(5, 3)
5×3 Matrix{Float64}:
  1.30281    1.26887     0.166915
 -1.06464    0.584751   -0.303207
  1.64798    0.0813088  -0.00574229
  0.685489   0.448753   -0.658642
  1.22546   -0.0591479  -1.21465
size(x)
(5, 3)
size(x, 1) # nrow() in R
5
size(x, 2) # ncol() in R
3
# total number of elements
length(x)
15

Indexing

# 5 × 5 matrix of random Normal(0, 1)
x = randn(5, 5)
5×5 Matrix{Float64}:
 -0.0128921  -0.404395   0.413536   1.85197   -0.516304
 -0.757414    0.36058   -1.0561     0.880256  -1.78939
 -1.76583     0.416184   0.577176  -0.068931  -0.903778
  2.19384    -1.13248    0.865923  -0.183744  -0.039366
  0.837032   -1.10723   -0.299019  -0.485402  -0.234505
# first column
x[:, 1]
5-element Vector{Float64}:
 -0.012892124457049585
 -0.7574135002536205
 -1.7658320314322546
  2.193843116801805
  0.8370324391285153
# first row
x[1, :]
5-element Vector{Float64}:
 -0.012892124457049585
 -0.4043948586638572
  0.4135355453571607
  1.8519652247599696
 -0.5163040102447345
# sub-array
x[1:2, 2:3]
2×2 Matrix{Float64}:
 -0.404395   0.413536
  0.36058   -1.0561
# getting a subset of a matrix creates a copy, but you can also create "views"
z = view(x, 1:2, 2:3)
2×2 view(::Matrix{Float64}, 1:2, 2:3) with eltype Float64:
 -0.404395   0.413536
  0.36058   -1.0561
# same as
@views z = x[1:2, 2:3]
2×2 view(::Matrix{Float64}, 1:2, 2:3) with eltype Float64:
 -0.404395   0.413536
  0.36058   -1.0561
# change in z (view) changes x as well
z[2, 2] = 0.0
x
5×5 Matrix{Float64}:
 -0.0128921  -0.404395   0.413536   1.85197   -0.516304
 -0.757414    0.36058    0.0        0.880256  -1.78939
 -1.76583     0.416184   0.577176  -0.068931  -0.903778
  2.19384    -1.13248    0.865923  -0.183744  -0.039366
  0.837032   -1.10723   -0.299019  -0.485402  -0.234505
# y points to same data as x
y = x
5×5 Matrix{Float64}:
 -0.0128921  -0.404395   0.413536   1.85197   -0.516304
 -0.757414    0.36058    0.0        0.880256  -1.78939
 -1.76583     0.416184   0.577176  -0.068931  -0.903778
  2.19384    -1.13248    0.865923  -0.183744  -0.039366
  0.837032   -1.10723   -0.299019  -0.485402  -0.234505
# x and y point to same data
pointer(x), pointer(y)
(Ptr{Float64} @0x000000010ffe2f40, Ptr{Float64} @0x000000010ffe2f40)
# changing y also changes x
y[:, 1] .= 0  # Dot broadcasting: "vectorization" in Julia. More below
x
5×5 Matrix{Float64}:
 0.0  -0.404395   0.413536   1.85197   -0.516304
 0.0   0.36058    0.0        0.880256  -1.78939
 0.0   0.416184   0.577176  -0.068931  -0.903778
 0.0  -1.13248    0.865923  -0.183744  -0.039366
 0.0  -1.10723   -0.299019  -0.485402  -0.234505
# create a new copy of data
z = copy(x)
5×5 Matrix{Float64}:
 0.0  -0.404395   0.413536   1.85197   -0.516304
 0.0   0.36058    0.0        0.880256  -1.78939
 0.0   0.416184   0.577176  -0.068931  -0.903778
 0.0  -1.13248    0.865923  -0.183744  -0.039366
 0.0  -1.10723   -0.299019  -0.485402  -0.234505
pointer(x), pointer(z)  # they should be different now
(Ptr{Float64} @0x000000010ffe2f40, Ptr{Float64} @0x000000010f777940)

Concatenate matrices

# 1-by-3 array
[1 2 3]
1×3 Matrix{Int64}:
 1  2  3
# 3-element vector
[1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3
# multiple assignment by tuple
x, y, z = randn(5, 3), randn(5, 2), randn(3, 5)
([-1.4999732352895327 1.773939277444214 0.6715225956302983; -1.3849842267185073 0.1629035112557726 0.3562080533137759; … ; -1.6065912870082713 0.30084783005095794 0.7706189876690998; 0.5121773065548649 -0.964687566685388 0.3658354011165104], [-0.4902326192447578 0.6738040576465752; 0.1070972886226452 -0.37660032519874775; … ; 0.9443998042244898 -0.1485566612569545; -0.8183848793856123 0.2236238055915957], [-1.2829660735274226 1.1957527423128549 … -0.11357622839848679 0.830600133971669; 1.7451448614828269 0.05526390872932643 … -0.7093502689866387 -0.21436808183399167; 0.30009810615719884 -1.1237165220676726 … 0.6093291761311854 -1.778493587414965])
[x y] # 5-by-5 matrix
5×5 Matrix{Float64}:
 -1.49997    1.77394   0.671523  -0.490233   0.673804
 -1.38498    0.162904  0.356208   0.107097  -0.3766
  0.391741  -1.1743    0.722154   0.55271    0.0150385
 -1.60659    0.300848  0.770619   0.9444    -0.148557
  0.512177  -0.964688  0.365835  -0.818385   0.223624
[x y; z] # 8-by-5 matrix
8×5 Matrix{Float64}:
 -1.49997    1.77394     0.671523  -0.490233   0.673804
 -1.38498    0.162904    0.356208   0.107097  -0.3766
  0.391741  -1.1743      0.722154   0.55271    0.0150385
 -1.60659    0.300848    0.770619   0.9444    -0.148557
  0.512177  -0.964688    0.365835  -0.818385   0.223624
 -1.28297    1.19575    -1.96595   -0.113576   0.8306
  1.74514    0.0552639  -1.68426   -0.70935   -0.214368
  0.300098  -1.12372    -0.270963   0.609329  -1.77849
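The bracket syntax above is shorthand for the hcat and vcat functions; a small sketch (with freshly constructed matrices for clarity):

```julia
x, y, z = ones(2, 2), zeros(2, 1), fill(5.0, 1, 3)

[x y] == hcat(x, y)         # horizontal concatenation: 2-by-3
[x y; z] == vcat([x y], z)  # vertical concatenation: 3-by-3
```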

Dot operation

In Julia, any function f(x) can be applied elementwise to an array X with the “dot call” syntax f.(X).

x = randn(5, 3)
5×3 Matrix{Float64}:
  0.644836    0.411596  -1.22524
  0.0114805   0.768921  -0.601454
  0.723059   -0.914186  -2.12972
  0.226376   -2.49521    0.222758
 -0.435674    0.109742  -1.4867
y = ones(5, 3)
5×3 Matrix{Float64}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
x .* y # same as x * y in R
5×3 Matrix{Float64}:
  0.644836    0.411596  -1.22524
  0.0114805   0.768921  -0.601454
  0.723059   -0.914186  -2.12972
  0.226376   -2.49521    0.222758
 -0.435674    0.109742  -1.4867
x .^ (-2) # same as x^(-2) in R
5×3 Matrix{Float64}:
    2.40493   5.90281    0.666123
 7587.2       1.69136    2.76437
    1.91273   1.19655    0.220473
   19.5136    0.160615  20.1526
    5.26838  83.0337     0.452433
sin.(x)  # same as sin(x) in R
5×3 Matrix{Float64}:
  0.601067    0.400072  -0.940889
  0.0114802   0.69536   -0.565842
  0.661681   -0.792066  -0.847828
  0.224448   -0.602302   0.220921
 -0.422021    0.109522  -0.996466
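Multiple dot calls in one expression fuse into a single loop with no intermediate arrays; the @. macro adds the dots to every call for you. A brief sketch:

```julia
x = [1.0, 2.0, 3.0]
y = [4.0, 5.0, 6.0]

z = sin.(x) .+ cos.(y) .^ 2   # one fused elementwise loop
w = @. sin(x) + cos(y)^2      # same computation; @. inserts all the dots
z == w                        # true
```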

Basic linear algebra

x = randn(5)
5-element Vector{Float64}:
 -0.6337948981039522
 -1.235717874501824
  0.43117311951580656
  1.6681744799753075
  0.05806648707453217
norm(x) # vector L2 norm
2.2137711511764175
# same as
sqrt(sum(abs2, x))
2.2137711511764175
y = randn(5) # another vector
# dot product
dot(x, y) # x' * y
2.9145416266177078
# same as
x'y
2.9145416266177078
x, y = randn(5, 3), randn(3, 2)
# matrix multiplication, same as %*% in R
x * y
5×2 Matrix{Float64}:
  0.380833   0.216444
 -0.287348  -0.349765
 -2.20059    0.091639
 -0.901147   0.295597
  1.15496    0.157905
x = randn(3, 3)
3×3 Matrix{Float64}:
  0.963502   0.447377   0.221476
 -1.90163    0.304961   1.04087
 -0.158023  -1.56978   -2.06578
# conjugate transpose
x'
3×3 adjoint(::Matrix{Float64}) with eltype Float64:
 0.963502  -1.90163   -0.158023
 0.447377   0.304961  -1.56978
 0.221476   1.04087   -2.06578
b = rand(3)
x'b # same as x' * b
3-element Vector{Float64}:
 -0.6518796533083008
 -0.9397178684749301
 -1.082560937691874
# trace
tr(x)
-0.7973130652999001
det(x)
-0.19191043271882907
rank(x)
3
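The backslash operator solves linear systems (the analogue of solve(A, b) in R); a small sketch with a hand-picked 2-by-2 system:

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]   # symmetric positive definite matrix
b = [3.0, 4.0]
v = A \ b                # solve A * v = b; a suitable factorization is chosen automatically
norm(A * v - b)          # residual, should be ≈ 0
```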

Sparse matrices

using SparseArrays

# 10-by-10 sparse matrix with sparsity 0.1
X = sprandn(10, 10, .1)
10×10 SparseMatrixCSC{Float64, Int64} with 8 stored entries:
  ⋅     ⋅         ⋅     ⋅         ⋅        …   ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅    0.166671   ⋅            ⋅     ⋅        ⋅         ⋅ 
  ⋅   -0.513767   ⋅     ⋅         ⋅            ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅            ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅   -0.480349   ⋅            ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅        …   ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅            ⋅   -1.22357   ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅            ⋅     ⋅       2.21861    ⋅ 
  ⋅     ⋅         ⋅     ⋅        0.728669      ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅            ⋅     ⋅       0.802902   ⋅ 
# convert to dense matrix; be cautious when dealing with big data
Xfull = convert(Matrix{Float64}, X)
10×10 Matrix{Float64}:
 0.0   0.0       0.0   0.0       0.0       …  0.0   0.0      0.0       0.0
 0.0   0.0       0.0   0.166671  0.0          0.0   0.0      0.0       0.0
 0.0  -0.513767  0.0   0.0       0.0          0.0   0.0      0.0       0.0
 0.0   0.0       0.0   0.0       0.0          0.0   0.0      0.0       0.0
 0.0   0.0       0.0  -0.480349  0.0          0.0   0.0      0.0       0.0
 0.0   0.0       0.0   0.0       0.0       …  0.0   0.0      0.0       0.0
 0.0   0.0       0.0   0.0       0.0          0.0  -1.22357  0.0       0.0
 0.0   0.0       0.0   0.0       0.0          0.0   0.0      2.21861   0.0
 0.0   0.0       0.0   0.0       0.728669     0.0   0.0      0.0       0.0
 0.0   0.0       0.0   0.0       0.0          0.0   0.0      0.802902  0.0
# convert a dense matrix to sparse matrix
sparse(Xfull)
10×10 SparseMatrixCSC{Float64, Int64} with 8 stored entries:
  ⋅     ⋅         ⋅     ⋅         ⋅        …   ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅    0.166671   ⋅            ⋅     ⋅        ⋅         ⋅ 
  ⋅   -0.513767   ⋅     ⋅         ⋅            ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅            ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅   -0.480349   ⋅            ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅        …   ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅            ⋅   -1.22357   ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅            ⋅     ⋅       2.21861    ⋅ 
  ⋅     ⋅         ⋅     ⋅        0.728669      ⋅     ⋅        ⋅         ⋅ 
  ⋅     ⋅         ⋅     ⋅         ⋅            ⋅     ⋅       0.802902   ⋅ 
# syntax for sparse linear algebra is the same as dense linear algebra
β = ones(10)
X * β
10-element Vector{Float64}:
  0.0
  0.16667062540536595
 -0.5137670762120491
  0.0
 -0.48034908102608775
  0.0
 -1.2235741749446545
  3.9995698640929875
  0.7286685000567706
  0.8029015444725927
# many functions apply to sparse matrices as well
sum(X)
3.480120201844925
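Sparse matrices can also be built directly from coordinate (row, column, value) triplets with sparse; a small sketch:

```julia
using SparseArrays

i = [1, 2, 4]              # row indices
j = [1, 3, 4]              # column indices
v = [10.0, 20.0, 30.0]     # nonzero values
S = sparse(i, j, v, 4, 4)  # 4-by-4 matrix with 3 stored entries
nnz(S)                     # number of stored entries: 3
```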

Control flow and loops

  • if-elseif-else-end
if condition1
    # do something
elseif condition2
    # do something
else
    # do something
end
  • for loop
for i in 1:10
    println(i)
end
  • Nested for loop:
for i in 1:10
    for j in 1:5
        println(i * j)
    end
end

Same as

for i in 1:10, j in 1:5
    println(i * j)
end
  • Exit loop:
for i in 1:10
    # do something
    if condition1
        break # skip remaining loop
    end
end
  • Exit iteration:
for i in 1:10
    # do something
    if condition1
        continue # skip to next iteration
    end
    # do something
end
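The constructs above can be combined in one runnable example (sum_evens is a hypothetical name for illustration):

```julia
# Sum the even numbers in 1:n, stopping once the sum exceeds the limit.
function sum_evens(n, limit)
    s = 0
    for i in 1:n
        if isodd(i)
            continue   # skip to the next iteration
        end
        s += i
        if s > limit
            break      # exit the loop entirely
        end
    end
    return s
end

sum_evens(100, 300)  # 306: stops after adding 34
```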

Functions

  • Function definition
function func(req1, req2; key1=dflt1, key2=dflt2)
    # do stuff
    return out1, out2, out3
end
  • Required arguments are separated by commas and matched by position.
  • Optional arguments need a default value in the signature.
  • The return statement is optional (as in R, the value of the last expression is returned).
  • Multiple outputs can be returned as a tuple, e.g., return out1, out2, out3.
  • In Julia, all arguments are passed by sharing (references to values, not copies), in contrast to R and MATLAB, which have pass-by-value (copy) semantics.
    • Implication: mutable function arguments can be modified inside the function.
  • By convention, function names ending with ! indicate that the function mutates at least one of its arguments, typically the first.
sort!(x) # vs sort(x)
  • Anonymous functions, e.g., x -> x^2, are commonly used with collection functions and in list comprehensions.
map(x -> x^2, y) # square each element in y
  • Functions can be nested:
function outerfunction()
    # do some outer stuff
    function innerfunction()
        # do inner stuff
        # can access prior outer definitions
    end
    # do more outer stuff
end
  • Functions can be vectorized using the “dot call” syntax:
function myfunc(x)
    return sin(x^2)
end

x = randn(5, 3)
myfunc.(x)
5×3 Matrix{Float64}:
 0.793824    0.435542    0.79834
 0.00213892  0.155892    0.22172
 0.0463857   0.732109    0.981782
 0.973942    0.23271     0.0561392
 0.709574    0.00167153  0.98742
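Nested functions capture variables from the enclosing scope, forming closures; a hypothetical sketch (make_counter is not a built-in):

```julia
# make_counter returns an inner function that increments its captured state.
function make_counter()
    n = 0
    function increment()
        n += 1     # mutates the captured variable n
        return n
    end
    return increment
end

counter = make_counter()
counter(), counter(), counter()  # (1, 2, 3)
```

Each call to make_counter creates an independent n, so separate counters do not interfere.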

Collection function

Like the apply family of functions in R.

Apply a function to each element of a collection:

map(f, coll) # or
map(coll) do elem
    # do stuff with elem
    # the value of the last expression is returned
end

Alternative “vectorization”:

map(x -> sin(x^2), x)   # same as above
5×3 Matrix{Float64}:
 0.793824    0.435542    0.79834
 0.00213892  0.155892    0.22172
 0.0463857   0.732109    0.981782
 0.973942    0.23271     0.0561392
 0.709574    0.00167153  0.98742
map(x) do elem   # long version of above
    elem = elem^2
    return sin(elem)
end
5×3 Matrix{Float64}:
 0.793824    0.435542    0.79834
 0.00213892  0.155892    0.22172
 0.0463857   0.732109    0.981782
 0.973942    0.23271     0.0561392
 0.709574    0.00167153  0.98742
# Mapreduce
mapreduce(x -> sin(x^2), +, x)   # mapreduce(mapper, reducer, data)
7.129189711913572
# same as
sum(x -> sin(x^2), x)
7.129189711913572

List comprehension

[sin(2i + j) for i in 1:5, j in 1:3] # similar to Python
5×3 Matrix{Float64}:
  0.14112   -0.756802  -0.958924
 -0.958924  -0.279415   0.656987
  0.656987   0.989358   0.412118
  0.412118  -0.544021  -0.99999
 -0.99999   -0.536573   0.420167
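Comprehensions can also include a filtering condition after the generator; a brief sketch:

```julia
# squares of the even numbers in 1:10 (filtered comprehensions return a vector)
[i^2 for i in 1:10 if iseven(i)]  # [4, 16, 36, 64, 100]
```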

Type system

  • Every variable in Julia has a type.

  • Everything is a subtype of the abstract type Any.

  • An abstract type defines a set of types.

  • Consider types in Julia that are a Number:

https://en.wikibooks.org/wiki/Introducing_Julia/Types

  • We can explore type hierarchy with typeof(), supertype(), and subtypes().
typeof(1.0), typeof(1)
(Float64, Int64)
supertype(Float64)
AbstractFloat
subtypes(AbstractFloat)
4-element Vector{Any}:
 BigFloat
 Float16
 Float32
 Float64
# Is Float64 a subtype of AbstractFloat?
Float64 <: AbstractFloat
true
# On a 64-bit machine, Int == Int64
Int == Int64
true
# convert to Float64
convert(Float64, 1)
1.0
# same as
Float64(1)
1.0
# Float32 vector
x = randn(Float32, 5)
5-element Vector{Float32}:
 -0.8045918
  0.7131479
 -0.36245242
 -0.5594625
 -0.19341426
# convert to Float64
convert(Array{Float64}, x)
5-element Vector{Float64}:
 -0.8045917749404907
  0.7131478786468506
 -0.3624524176120758
 -0.5594624876976013
 -0.19341425597667694
# same as
Float64.(x)
5-element Vector{Float64}:
 -0.8045917749404907
  0.7131478786468506
 -0.3624524176120758
 -0.5594624876976013
 -0.19341425597667694
# convert Float64 to Int64
convert(Int, 1.0)
1
convert(Int, 1.5) # should use round(1.5)
LoadError: InexactError: Int64(1.5)
round(Int, 1.5)
2
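Closely related to conversion is promotion: when arithmetic mixes types, Julia promotes the arguments to a common type before computing. A small sketch:

```julia
promote(1, 2.5)               # (1.0, 2.5): the Int is promoted to Float64
promote_type(Int64, Float64)  # Float64
1 + 2.5                       # 3.5, computed after promotion
```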

Multiple dispatch

  • Multiple dispatch is a feature of some programming languages in which a function or method can be dynamically dispatched based on the run time (dynamic) type or, in the more general case, some other attribute of more than one of its arguments.

  • Multiple dispatch lies at the core of Julia's design. It allows built-in and user-defined functions to be overloaded for different combinations of argument types.

  • In Julia, methods belong to functions, called generic functions.

  • Let’s consider a simple “doubling” function:
g(x) = x + x
g (generic function with 1 method)
  • This definition is too broad, since some things, e.g., strings, cannot be added.
g(1.5)
3.0
g("hello world")
LoadError: MethodError: no method matching +(::String, ::String)
String concatenation is performed with * (See also: https://docs.julialang.org/en/v1/manual/strings/#man-concatenation).

Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...)
   @ Base operators.jl:578
g(x::Float64) = x + x
g (generic function with 2 methods)
  • This definition is correct but too restrictive, since any Number can be added.
g(x::Number) = x + x
g (generic function with 3 methods)
  • This definition will automatically work on the entire type tree above!

  • A lot nicer than

function g(x)
    if isa(x, Number)
        return x + x
    else
        throw(ArgumentError("x should be a number"))
    end
end

methods(func) displays all methods defined for func.

methods(g)
# 3 methods for generic function g from Main:
  • g(x::Float64) in Main at In[101]:1
  • g(x::Number) in Main at In[102]:1
  • g(x) in Main at In[98]:1

When a function with multiple method definitions is called, Julia dispatches to the method whose signature most specifically matches the arguments.

  • The @which func(x) macro tells which method is dispatched for the given argument signature.
# an Int64 input
@which g(1)
g(x::Number) in Main at In[102]:1
@which g(1.0)
g(x::Float64) in Main at In[101]:1
# a Vector{Float64} input
@which g(randn(5))
g(x) in Main at In[98]:1
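Dispatch considers the types of all arguments, not just the first; a hypothetical sketch (h is an illustrative function, not from the notes):

```julia
# Different methods are selected by the combination of argument types.
h(a::Number, b::Number) = "number-number"
h(a::Number, b::String) = "number-string"
h(a::String, b::Any)    = "string-anything"

h(1, 2.0), h(1, "x"), h("x", [1, 2])
# ("number-number", "number-string", "string-anything")
```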

JIT

  • Julia’s efficiency results from its capability to infer the types of all variables within a function and then call the LLVM compiler to generate optimized machine code at run time.

Consider the g (doubling) function defined earlier. This function will work on any type which has a method for +.

g(2), g(2.0)
(4, 4.0)

Step 1: Parse Julia code into an AST and lower it to an intermediate form.

@code_lowered g(2)
CodeInfo(
1 ─ %1 = x + x
└──      return %1
)

Step 2: Type inference according to input type.

@code_warntype g(2)
MethodInstance for g(::Int64)
  from g(x::Number) @ Main In[102]:1
Arguments
  #self#::Core.Const(g)
  x::Int64
Body::Int64
1 ─ %1 = (x + x)::Int64
└──      return %1
@code_warntype g(2.0)
MethodInstance for g(::Float64)
  from g(x::Float64) @ Main In[101]:1
Arguments
  #self#::Core.Const(g)
  x::Float64
Body::Float64
1 ─ %1 = (x + x)::Float64
└──      return %1

Step 3: Compile into LLVM IR

@code_llvm g(2)
;  @ In[102]:1 within `g`
define i64 @julia_g_5071(i64 signext %0) #0 {
top:
; ┌ @ int.jl:87 within `+`
   %1 = shl i64 %0, 1
; └
  ret i64 %1
}
@code_llvm g(2.0)
;  @ In[101]:1 within `g`
define double @julia_g_5094(double %0) #0 {
top:
; ┌ @ float.jl:408 within `+`
   %1 = fadd double %0, %0
; └
  ret double %1
}

We didn’t provide a type annotation, yet different LLVM code is generated depending on the argument type!

  • In R or Python, g(2) and g(2.0) would run the same code.

  • In Julia, g(2) and g(2.0) dispatch to optimized code for Int64 and Float64, respectively.

For integer input x, the LLVM compiler is smart enough to know that x + x is simply shifting x left by 1 bit, which is faster than addition.

  • Step 4: The lowest level is assembly code, which is machine-dependent.
@code_native g(2)
    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 13, 0
    .globl  _julia_g_5110                   ## -- Begin function julia_g_5110
    .p2align    4, 0x90
_julia_g_5110:                          ## @julia_g_5110
; ┌ @ In[102]:1 within `g`
    .cfi_startproc
## %bb.0:                               ## %top
; │┌ @ int.jl:87 within `+`
    leaq    (%rdi,%rdi), %rax
; │└
    retq
    .cfi_endproc
; └
                                        ## -- End function
.subsections_via_symbols

The first instruction adds the content of the general-purpose 64-bit register RDI (a small memory inside the CPU) to itself and loads the result into register RAX. The addition here is integer arithmetic.

@code_native g(2.0)
    .section    __TEXT,__text,regular,pure_instructions
    .build_version macos, 13, 0
    .globl  _julia_g_5114                   ## -- Begin function julia_g_5114
    .p2align    4, 0x90
_julia_g_5114:                          ## @julia_g_5114
; ┌ @ In[101]:1 within `g`
    .cfi_startproc
## %bb.0:                               ## %top
; │┌ @ float.jl:408 within `+`
    vaddsd  %xmm0, %xmm0, %xmm0
; │└
    retq
    .cfi_endproc
; └
                                        ## -- End function
.subsections_via_symbols