diff --git a/lectures/need_for_speed.md b/lectures/need_for_speed.md index ef4f23ad..65ad6f2c 100644 --- a/lectures/need_for_speed.md +++ b/lectures/need_for_speed.md @@ -27,7 +27,7 @@ premature optimization is the root of all evil." -- Donald Knuth ## Overview -It's probably safe to say that Python is the most popular language for scientific computing. +Python is the most popular language for many aspects of scientific computing. This is due to @@ -74,29 +74,30 @@ Let's briefly review Python's scientific libraries. ### Why do we need them? -One reason we use scientific libraries is because they implement routines we want to use. +We need Python's scientific libraries for two reasons: -* numerical integration, interpolation, linear algebra, root finding, etc. +1. Python is small +2. Python is slow + +**Python is small** -For example, it's usually better to use an existing routine for root finding than to write a new one from scratch. +Core Python is small by design -- this helps with optimization, accessibility, and maintenance. -(For standard algorithms, efficiency is maximized if the community can -coordinate on a common set of implementations, written by experts and tuned by -users to be as fast and robust as possible!) +Scientific libraries provide the routines we don't want to -- and probably shouldn't -- write ourselves. -But this is not the only reason that we use Python's scientific libraries. +* numerical integration, interpolation, linear algebra, root finding, etc. -Another is that pure Python is not fast. +**Python is slow** -So we need libraries that are designed to accelerate execution of Python code. +Another reason we need the scientific libraries is that pure Python is relatively slow. -They do this using two strategies: +Scientific libraries accelerate execution using three main strategies: -1. using compilers that convert Python-like statements into fast machine code for individual threads of logic and -2. 
parallelizing tasks across multiple "workers" (e.g., CPUs, individual threads inside GPUs). +1. Vectorization: providing compiled machine code and interfaces that make this code accessible +2. JIT compilation: compilers that convert Python-like statements into fast machine code at runtime +3. Parallelization: shifting tasks across multiple threads / CPUs / GPUs / TPUs -We will discuss these ideas extensively in this and the remaining lectures from -this series. +We will discuss these ideas in depth below. ### Python's Scientific Ecosystem @@ -123,7 +124,7 @@ Here's how they fit together: * Pandas provides types and functions for manipulating data. * Numba provides a just-in-time compiler that plays well with NumPy and helps accelerate Python code. -We will discuss all of these libraries extensively in this lecture series. +We will discuss all of these libraries at length in this lecture series. ## Pure Python is slow @@ -189,15 +190,13 @@ a, b = ['foo'], ['bar'] a + b ``` -(We say that the operator `+` is *overloaded* --- its action depends on the -type of the objects on which it acts) -As a result, when executing `a + b`, Python must first check the type of the objects and then call the correct operation. +As a result, when executing `a + b`, Python must first check the type of the +objects and then call the correct operation. -This involves a nontrivial overhead. +This involves overhead. -If we repeatedly execute this expression in a tight loop, the nontrivial -overhead becomes a large overhead. +If we repeatedly execute this expression in a tight loop, the overhead becomes large. #### Static types @@ -243,38 +242,29 @@ To illustrate, let's consider the problem of summing some data --- say, a collec #### Summing with Compiled Code -In C or Fortran, these integers would typically be stored in an array, which -is a simple data structure for storing homogeneous data. 
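The runtime type check behind `a + b` can be sketched in a few lines of pure Python. The helper below is hypothetical -- it only imitates the interpreter's dispatch step, inspecting the operands' type and then selecting the matching `__add__` implementation:

```python
# A sketch of the dispatch step behind `a + b`: before any work is done,
# the interpreter inspects the type of the operands and looks up the
# matching __add__ implementation.  This lookup happens on every call.
def slow_add(a, b):
    add = type(a).__add__   # runtime type inspection and method lookup
    return add(a, b)

print(slow_add(1, 2))              # integer addition
print(slow_add('foo', 'bar'))      # string concatenation
print(slow_add(['foo'], ['bar']))  # list concatenation
```

In compiled languages this lookup is resolved once, before the program runs, which is one reason tight loops over typed data are so much faster.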
+In C or Fortran, an array of integers is stored in a single contiguous block of memory. -Such an array is stored in a single contiguous block of memory - -* In modern computers, memory addresses are allocated to each byte (one byte = 8 bits). * For example, a 64 bit integer is stored in 8 bytes of memory. * An array of $n$ such integers occupies $8n$ *consecutive* memory slots. -Moreover, the compiler is made aware of the data type by the programmer. - -* In this case 64 bit integers +Moreover, the data type is known at compile time. Hence, each successive data point can be accessed by shifting forward in memory space by a known and fixed amount. -* In this case 8 bytes #### Summing in Pure Python Python tries to replicate these ideas to some degree. -For example, in the standard Python implementation (CPython), list elements are placed in memory locations that are in a sense contiguous. +For example, in the standard Python implementation (CPython), list elements are +placed in memory locations that are in a sense contiguous. However, these list elements are more like pointers to data rather than actual data. Hence, there is still overhead involved in accessing the data values themselves. -This is a considerable drag on speed. - -In fact, it's generally true that memory traffic is a major culprit when it comes to slow execution. - +Such overhead is a major culprit when it comes to slow execution. ### Summary @@ -295,15 +285,11 @@ synonymous with parallelization. This task is best left to specialized compilers! -Certain Python libraries have outstanding capabilities for parallelizing scientific code -- we'll discuss this more as we go along. - - ## Accelerating Python -In this section we look at three related techniques for accelerating Python -code. +In this section we look at three related techniques for accelerating Python code. Here we'll focus on the fundamental ideas. @@ -325,10 +311,11 @@ Many economists usually refer to array programming as "vectorization." 
In computer science, this term has [a slightly different meaning](https://en.wikipedia.org/wiki/Automatic_vectorization). ``` -The key idea is to send array processing operations in batch to pre-compiled -and efficient native machine code. +The key idea is to send array processing operations in batch to pre-compiled and +efficient native machine code. -The machine code itself is typically compiled from carefully optimized C or Fortran. +The machine code itself is typically compiled from carefully optimized C or +Fortran. For example, when working in a high level language, the operation of inverting a large matrix can be subcontracted to efficient machine code that is pre-compiled @@ -346,6 +333,7 @@ The idea of vectorization dates back to MATLAB, which uses vectorization extensi ```{figure} /_static/lecture_specific/need_for_speed/matlab.png ``` +NumPy uses a similar model, inspired by MATLAB. ### Vectorization vs for pure Python loops @@ -423,19 +411,17 @@ can be run) has slowed dramatically in recent years. Chip designers and computer programmers have responded to the slowdown by seeking a different path to fast execution: parallelization. -Hardware makers have increased the number of cores (physical CPUs) embedded in each machine. +This involves -For programmers, the challenge has been to exploit these multiple CPUs by -running many processes in parallel (i.e., simultaneously). +1. increasing the number of CPUs embedded in each machine +1. connecting hardware accelerators such as GPUs and TPUs -This is particularly important in scientific programming, which requires handling - -* large amounts of data and -* CPU intensive simulations and other calculations. +For programmers, the challenge has been to exploit this hardware by +running many processes in parallel. Below we discuss parallelization for scientific computing, with a focus on -1. the best tools for parallelization in Python and +1. tools for parallelization in Python and 1. 
how these tools can be applied to quantitative economic problems. @@ -447,22 +433,18 @@ scientific computing and discuss their pros and cons. #### Multiprocessing -Multiprocessing means concurrent execution of multiple processes using more than one processor. - -In this context, a **process** is a chain of instructions (i.e., a program). +Multiprocessing means concurrent execution of multiple processes (independent programs) using more than one processor. Multiprocessing can be carried out on one machine with multiple CPUs or on a -collection of machines connected by a network. +cluster of machines connected by a network. -In the latter case, the collection of machines is usually called a -**cluster**. +With multiprocessing, *each process has its own memory space*, although the physical memory chip might be shared. -With multiprocessing, each process has its own memory space, although the -physical memory chip might be shared. #### Multithreading -Multithreading is similar to multiprocessing, except that, during execution, the threads all share the same memory space. +Multithreading is similar to multiprocessing, except that, during execution, the +threads all *share the same memory space*. Native Python struggles to implement multithreading due to some [legacy design features](https://wiki.python.org/moin/GlobalInterpreterLock). But this is not a restriction for scientific libraries like NumPy and Numba. Functions imported from these libraries and JIT-compiled code run in low level execution environments where Python's legacy restrictions don't apply. 
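A minimal sketch of the shared-memory point, using only the standard library's `threading` module (this is purely illustrative -- it is not how NumPy or Numba manage threads):

```python
import threading

# All threads live inside one process and share its memory space:
# every worker appends to the very same list object.
results = []

def worker(tag):
    results.append(tag)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

With multiprocessing, by contrast, each worker would operate on its own copy of `results`, and the results would have to be communicated back explicitly.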
+ #### Advantages and Disadvantages Multithreading is more lightweight because most system and memory resources diff --git a/lectures/numba.md b/lectures/numba.md index 6668e6c4..a39010da 100644 --- a/lectures/numba.md +++ b/lectures/numba.md @@ -48,11 +48,13 @@ import matplotlib.pyplot as plt In an {doc}`earlier lecture ` we discussed vectorization, which can improve execution speed by sending array processing operations in batch to efficient low-level code. -However, as {ref}`discussed previously `, traditional vectorization schemes, such as those found in MATLAB, Julia, and NumPy, have several weaknesses. +However, as {ref}`discussed in that lecture `, +traditional vectorization schemes, such as those found in MATLAB, Julia, and NumPy, have weaknesses. -For example, they can be highly memory-intensive and, for some algorithms, vectorization is ineffective or impossible. +* Highly memory-intensive for compound array operations +* Ineffective or impossible for some algorithms -One way around these problems is through [Numba](https://numba.pydata.org/), a +One way to circumvent these problems is by using [Numba](https://numba.pydata.org/), a **just in time (JIT) compiler** for Python that is oriented towards numerical work. Numba compiles functions to native machine code instructions during runtime. @@ -62,6 +64,14 @@ When it succeeds, Numba will be on par with machine code from low-level language In addition, Numba can do other useful tricks, such as {ref}`multithreading` or interfacing with GPUs (through `numba.cuda`). +Numba's JIT compiler is in many ways similar to the JIT compiler in Julia. + +The main difference is that it is less ambitious, attempting to compile a smaller subset of the language. + +Although this might sound like a deficiency, it is in some ways an advantage. + +Numba is lean, easy to use, and very good at what it does. + This lecture introduces the core ideas. (numba_link)= @@ -74,10 +84,10 @@ 
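The first weakness -- memory use in compound array operations -- can be seen in a small NumPy sketch: each intermediate term in the expression allocates a full-size temporary array (the in-place variant is shown only for contrast, not as a general recommendation):

```python
import numpy as np

n = 100_000
x = np.linspace(0, 1, n)

# The compound expression below allocates full-size temporaries:
# one array for x**2, another for x**2 + 1, then the final result.
y = np.cos(x**2 + 1)

# Working in place avoids some of those temporaries, at a cost in clarity.
z = x**2           # one temporary that we keep
z += 1             # updated in place, no fresh allocation
np.cos(z, out=z)   # result written into the same buffer

print(np.allclose(y, z))
```

The two approaches compute the same values; the difference is purely in how many full-size buffers are allocated along the way.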
(quad_map_eg)= ### An Example -Let's consider a problem that's difficult to vectorize: generating the -trajectory of a difference equation given an initial condition. +Let's consider a problem that's difficult to vectorize (i.e., hand off to array +processing operations). -We will take the difference equation to be the quadratic map +The problem involves generating a trajectory of the quadratic map $$ x_{t+1} = \alpha x_t (1 - x_t) $$ @@ -168,22 +178,23 @@ The basic idea is this: * Python is very flexible and hence we could call the function qm with many types. * e.g., `x0` could be a NumPy array or a list, `n` could be an integer or a float, etc. -* This makes it hard to *pre*-compile the function (i.e., compile before runtime). -* However, when we do actually call the function, say by running `qm(0.5, 10)`, - the types of `x0` and `n` become clear. +* This makes it very difficult to generate efficient machine code *ahead of time* (i.e., before runtime). +* However, when we do actually *call* the function, say by running `qm(0.5, 10)`, + the types of `x0` and `n` become clear. * Moreover, the types of *other variables* in `qm` *can be inferred once the input types are known*. -* So the strategy of Numba and other JIT compilers is to wait until this - moment, and then compile the function. +* So the strategy of Numba and other JIT compilers is to *wait until the function is called*, and then compile. -That's why it is called "just-in-time" compilation. +This is called "just-in-time" compilation. -Note that, if you make the call `qm(0.5, 10)` and then follow it with `qm(0.9, 20)`, compilation only takes place on the first call. +Note that, if you make the call `qm(0.5, 10)` and then follow it with `qm(0.9, +20)`, compilation only takes place on the first call. This is because compiled code is cached and reused as required. This is why, in the code above, `time3` is smaller than `time2`. 
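The wait-then-compile strategy can be caricatured in a few lines of pure Python. The wrapper below is a toy -- it mimics only the per-signature caching, not the machine-code generation that Numba actually performs:

```python
# Toy sketch of JIT dispatch: specialize on the first call with a given
# argument-type signature, then reuse the cached version thereafter.
cache = {}

def toy_jit(func):
    def wrapper(*args):
        signature = tuple(type(a) for a in args)
        if signature not in cache:
            # A real JIT compiler would generate machine code here.
            cache[signature] = func
        return cache[signature](*args)
    return wrapper

@toy_jit
def qm_step(x):
    return 4.0 * x * (1 - x)   # one step of the quadratic map, with α fixed at 4.0

qm_step(0.5)        # first float call triggers the "compilation" branch
qm_step(0.9)        # same type signature, so the cached version is reused
print(len(cache))   # one cached specialization
```

Calling `qm_step` with a new argument type would add a second entry to the cache, mirroring the way a JIT compiler keeps one compiled version per type signature.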
+ ## Decorator Notation In the code above we created a JIT compiled version of `qm` via the call @@ -226,30 +237,22 @@ with qe.Timer(precision=4): Numba also provides several arguments for decorators to accelerate computation and cache functions -- see [here](https://numba.readthedocs.io/en/stable/user/performance-tips.html). + ## Type Inference Successful type inference is a key part of JIT compilation. -As you can imagine, inferring types is easier for simple Python objects (e.g., simple scalar data types such as floats and integers). +As you can imagine, inferring types is easier for simple Python objects (e.g., +simple scalar data types such as floats and integers). Numba also plays well with NumPy arrays, which have well-defined types. In an ideal setting, Numba can infer all necessary type information. -This allows it to generate native machine code, without having to call the Python runtime environment. +This allows it to generate efficient native machine code, without having to call the Python runtime environment. When Numba cannot infer all type information, it will raise an error. -```{note} -In older versions of Numba, the `@jit` decorator would silently fall back -to "object mode" when it could not infer all types, which provided little or -no speed gain. Current versions of Numba use `nopython` mode by default, -meaning the compiler insists on full type inference and raises an error if -it fails. You will often see `@njit` used in other code, which is simply -an alias for `@jit(nopython=True)`. Since nopython mode is now the default, -`@jit` and `@njit` are equivalent. -``` - For example, in the (artificial) setting below, Numba is unable to determine the type of function `mean` when compiling the function `bootstrap` ```{code-cell} ipython3 @@ -297,9 +300,7 @@ Let's add some cautionary notes. As we've seen, Numba needs to infer type information on all variables to generate fast machine-level instructions. 
-For simple routines, Numba infers types very well. - -For larger ones, or for routines using external libraries, it can easily fail. +For large routines or those using external libraries, this process can easily fail. Hence, it's best to focus on speeding up small, time-critical snippets of code. @@ -333,32 +334,14 @@ function. When Numba compiles machine code for functions, it treats global variables as constants to ensure type stability. -### Caching Compiled Code - -By default, Numba recompiles functions each time a new Python session starts. - -To avoid this overhead, you can pass `cache=True` to the decorator: - -```{code-cell} ipython3 -@jit(cache=True) -def qm(x0, n): - x = np.empty(n+1) - x[0] = x0 - for t in range(n): - x[t+1] = α * x[t] * (1 - x[t]) - return x -``` - -This stores the compiled code on disk so that subsequent sessions can skip -the compilation step. (multithreading)= ## Multithreaded Loops in Numba -In addition to JIT compilation, Numba provides support for parallel computing on CPUs. +In addition to JIT compilation, Numba provides support for parallel computing on CPUs and GPUs. -The key tool for parallelization in Numba is the `prange` function, which tells -Numba to execute loop iterations in parallel across available CPU cores. +The key tool for parallelization on CPUs in Numba is the `prange` function, which tells +Numba to execute loop iterations in parallel across available cores. To illustrate, let's look first at a simple, single-threaded (i.e., non-parallelized) piece of code. @@ -418,27 +401,10 @@ Now let's suppose that we have a large population of households and we want to know what median wealth will be. This is not easy to solve with pencil and paper, so we will use simulation -instead. - -In particular, we will simulate a large number of households and then -calculate median wealth for this group. - -Suppose we are interested in the long-run average of this median over time. 
- -For the specification that we've chosen above, we can -calculate this by taking a one-period cross-sectional snapshot of median -wealth of the group at the end of a long simulation. - -Moreover, provided the simulation period is long enough, initial conditions don't matter. - -(This is due to [ergodicity](https://python.quantecon.org/finite_markov.html#id15).) - -So, in summary, we are going to simulate 50,000 households by - -1. arbitrarily setting initial wealth to 1 and -1. simulating forward in time for 1,000 periods. +instead: -Then we'll calculate median wealth at the end period. +1. Simulate a large number of households forward in time +2. Calculate median wealth Here's the code: @@ -492,6 +458,8 @@ with qe.Timer(): The speed-up is significant. +Notice that we parallelize across households rather than over time -- updates of +an individual household across time periods are inherently sequential. ## Exercises @@ -550,8 +518,9 @@ So we get a speed gain of 2 orders of magnitude by adding four characters. :label: speed_ex2 ``` -In the [Introduction to Quantitative Economics with Python](https://intro.quantecon.org/intro.html) lecture series you can -learn all about finite-state Markov chains. +In the [Introduction to Quantitative Economics with +Python](https://intro.quantecon.org/intro.html) lecture series you can learn all +about finite-state Markov chains. For now, let's just concentrate on simulating a very simple example of such a chain.
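As a warm-up, here is one way such a simulation might be written. The two-state transition matrix `P` below is made up for illustration; the exercise will specify its own chain:

```python
import numpy as np

# Hypothetical transition matrix: P[i, j] = probability of moving i -> j.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def simulate_chain(P, x0, ts_length, seed=1234):
    rng = np.random.default_rng(seed)
    x = np.empty(ts_length, dtype=np.int64)
    x[0] = x0
    for t in range(ts_length - 1):
        # Draw the next state using the row of P for the current state
        x[t+1] = rng.choice(2, p=P[x[t]])
    return x

x = simulate_chain(P, x0=0, ts_length=50_000)
print(x.mean())   # long-run fraction of time spent in state 1
```

For this particular `P`, the stationary probability of state 1 is $0.1 / (0.1 + 0.2) = 1/3$, so the printed fraction should be close to that value.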