The Chain Rule

[math] \newcommand{\ex}[1]{\item } \newcommand{\sx}{\item} \newcommand{\x}{\sx} \newcommand{\sxlab}[1]{} \newcommand{\xlab}{\sxlab} \newcommand{\prov}[1] {\quad #1} \newcommand{\provx}[1] {\quad \mbox{#1}} \newcommand{\intext}[1]{\quad \mbox{#1} \quad} \newcommand{\R}{\mathrm{\bf R}} \newcommand{\Q}{\mathrm{\bf Q}} \newcommand{\Z}{\mathrm{\bf Z}} \newcommand{\C}{\mathrm{\bf C}} \newcommand{\dt}{\textbf} \newcommand{\goesto}{\rightarrow} \newcommand{\ddxof}[1]{\frac{d #1}{d x}} \newcommand{\ddx}{\frac{d}{dx}} \newcommand{\ddt}{\frac{d}{dt}} \newcommand{\dydx}{\ddxof y} \newcommand{\nxder}[3]{\frac{d^{#1}{#2}}{d{#3}^{#1}}} \newcommand{\deriv}[2]{\frac{d^{#1}{#2}}{dx^{#1}}} \newcommand{\dist}{\mathrm{distance}} \newcommand{\arccot}{\mathrm{arccot\:}} \newcommand{\arccsc}{\mathrm{arccsc\:}} \newcommand{\arcsec}{\mathrm{arcsec\:}} \newcommand{\arctanh}{\mathrm{arctanh\:}} \newcommand{\arcsinh}{\mathrm{arcsinh\:}} \newcommand{\arccosh}{\mathrm{arccosh\:}} \newcommand{\sech}{\mathrm{sech\:}} \newcommand{\csch}{\mathrm{csch\:}} \newcommand{\conj}[1]{\overline{#1}} \newcommand{\mathds}{\mathbb} [/math]

The theorems in Section \secref{1.7} were concerned with finding the derivatives of functions that were constructed from other functions using the algebraic operations of addition, multiplication by a constant, multiplication, and division. In this section we shall derive a similar formula, called the Chain Rule, for the derivative of the composition [math]f(g)[/math] of a differentiable function [math]g[/math] with a differentiable function [math]f[/math]. Before giving the theorem, we remark that an alternative way of writing the definition of the derivative of a function [math]f[/math] is

[[math]] \begin{equation} f'(a) = \lim_{x \goesto a}\frac{f(x) - f(a)}{x - a} . \label{eq1.8.1} \end{equation} [[/math]]

The substitution [math]x = a + t[/math] transforms \ref{eq1.8.1} into the expression that we have heretofore used for the derivative. An equation equivalent to \ref{eq1.8.1} is

[[math]] \lim_{x \goesto a} \Bigl[ \frac{f(x)- f(a)}{x-a} - f'(a) \Bigr] = 0. [[/math]]

We next define a function [math]r[/math] (dependent on both [math]f[/math] and [math]a[/math]) by

[[math]] \begin{equation} r(x) = \left \{ \begin{array}{ll} \frac{f(x) - f(a)}{x - a} - f'(a), & \mbox{if}\;\;\; x \neq a,\\ 0, & \mbox{if}\;\;\; x = a. \end{array} \right. \label{eq1.8.2} \end{equation} [[/math]]

Note that the two functions [math]f[/math] and [math]r[/math] have the same domain. Furthermore, as a result of the preceding equation, we have

[[math]] \lim_{x \goesto a} r(x) = 0 = r(a), [[/math]]

i.e., the function [math]r[/math] is continuous at [math]a[/math]. From the definition of [math]r[/math], we obtain the equation

[[math]] \begin{equation} f(x) - f(a) = [f'(a) + r(x)] (x - a), \label{eq1.8.3} \end{equation} [[/math]]

which is true for every [math]x[/math] in the domain of [math]f[/math]. We now prove:
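
The decomposition \ref{eq1.8.3}, and the continuity of [math]r[/math] at [math]a[/math], can be checked numerically. The following is a minimal Python sketch using the hypothetical concrete choices [math]f(x) = x^2[/math] and [math]a = 3[/math], chosen only for illustration:

```python
import math

# Hypothetical concrete choices for illustration: f(x) = x^2 and a = 3,
# so that f'(a) = 2a = 6.
def f(x):
    return x * x

a = 3.0
fpa = 2 * a  # f'(a)

def r(x):
    # The function r of equation (1.8.2): the difference quotient minus f'(a).
    if x == a:
        return 0.0
    return (f(x) - f(a)) / (x - a) - fpa

# Equation (1.8.3): f(x) - f(a) = [f'(a) + r(x)](x - a), for every x.
for x in [-1.0, 0.0, 2.9, 3.0, 7.5]:
    assert math.isclose(f(x) - f(a), (fpa + r(x)) * (x - a), abs_tol=1e-9)

# Continuity of r at a: r(x) -> 0 = r(a) as x -> a.
assert abs(r(a + 1e-8)) < 1e-6
```

For this particular choice of [math]f[/math], one can check by hand that [math]r(x) = x - 3[/math] when [math]x \neq 3[/math], which makes both assertions transparent.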

Proposition (The Chain Rule)

If [math]f[/math] and [math]g[/math] are differentiable functions, then so is the composite function [math]f(g)[/math]. Moreover, [math][f(g)]' = f'(g)g'[/math].


Show Proof

Let [math]a[/math] be a number in the domain of [math]g[/math] such that [math]g(a)[/math] is in the domain of [math]f[/math]. By definition

[[math]] \begin{eqnarray*} [ f (g)]'(a) &=& \lim_{x \goesto a}\frac{(f(g))(x) - (f(g))(a)}{x - a}\\ &=& \lim_{x \goesto a}\frac{f(g(x)) - f(g(a))}{x - a}. \end{eqnarray*} [[/math]]
The intuitive idea behind the Chain Rule can be seen by writing

[[math]] \begin{eqnarray*} [f(g)]'(a) &=& \lim_{x \goesto a} \Bigl[ \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \frac{g(x) - g(a)}{x - a} \Bigr] \\ &=& \Bigl[ \lim_{x \goesto a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \Bigr] \Bigl[ \lim_{x \goesto a}\frac{g(x) - g(a)}{x - a} \Bigr]. \end{eqnarray*} [[/math]]
Setting [math]y = g(x)[/math] and [math]b = g(a)[/math] and noting that [math]y[/math] approaches [math]b[/math] as [math]x[/math] approaches [math]a[/math], we have

[[math]] \begin{eqnarray*} [f(g)]'(a) &=& \lim_{y \goesto b}\frac{ f(y) - f(b)}{y - b} \lim_{x \goesto a} \frac{ g(x) - g(a)}{x - a}\\ &=& f'(b)g'(a)\\ &=& f'(g(a))g'(a)\\ &=& (f'(g)g')(a), \end{eqnarray*} [[/math]]
which is the desired result. This argument fails to be a rigorous proof because there is no reason to suppose that [math]g(x) - g(a) \neq 0[/math] for all [math]x[/math] sufficiently close to [math]a[/math]. To overcome this difficulty, we use equation \ref{eq1.8.3}. With a typical element in the domain of [math]f[/math] denoted by [math]y[/math] instead of [math]x[/math], and with the derivative evaluated at [math]b[/math], equation \ref{eq1.8.3} implies that

[[math]] f(y) - f(b) = [f'(b) + r(y)](y - b). [[/math]]
Moreover, [math]\lim_{y \goesto b} r(y) = 0[/math]. Substituting [math]y = g(x)[/math] and [math]b = g(a)[/math], we get

[[math]] f(g(x)) - f(g(a)) = [f'(g(a)) + r(g(x))][g(x) - g(a)]. [[/math]]
Hence

[[math]] \frac{f (g(x) ) - f (g(a) )}{x - a} = [f'(g(a)) + r(g(x))] \frac{g(x) - g(a)}{x - a} . [[/math]]
We know that [math]\lim_{x \goesto a} \frac{g(x) - g(a)}{x - a} = g'(a)[/math]. In addition, since [math]g[/math] is differentiable at [math]a[/math], it is continuous there (differentiability implies continuity), and so [math]\lim_{x \goesto a} g(x) = g(a) = b[/math]. Since [math]\lim_{y \goesto b} r(y) = 0[/math], it follows that [math]|r(y)|[/math] can be made arbitrarily small by taking [math]y[/math] sufficiently close to [math]b[/math]. Because [math]\lim_{x \goesto a} g(x) = b[/math], we may therefore conclude that [math]\lim_{x \goesto a} r(g(x)) = 0[/math]. The basic limit theorem asserts that the limit of a sum or product is the sum or product, respectively, of the limits. Hence

[[math]] \begin{eqnarray*} [f(g)]'(a) &=& \lim_{x \goesto a}\frac{f(g(x) ) - f(g(a) )}{x - a}\\ &=& \Bigl[\lim_{x \goesto a} f'(g(a) ) + \lim_{x \goesto a} r (g(x) ) \Bigr] \lim_{x \goesto a}\frac{g(x) - g(a)}{x-a}\\ &=& [f'(g(a)) + 0]g'(a) = f'(g(a))g'(a)\\ &=& (f'(g)g')(a), \end{eqnarray*} [[/math]]
and the proof of the Chain Rule is complete.
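
The identity [math][f(g)]'(a) = f'(g(a))g'(a)[/math] can also be illustrated numerically by comparing it against a central-difference approximation of the composite's derivative. A minimal Python sketch, using the hypothetical pair [math]f(y) = 1/y[/math] and [math]g(x) = x^2 + 1[/math] (chosen only for illustration):

```python
# Hypothetical pair of differentiable functions, for illustration only.
def f(y):
    return 1.0 / y

def g(x):
    return x * x + 1.0

def fprime(y):
    return -1.0 / (y * y)

def gprime(x):
    return 2.0 * x

def numeric_derivative(F, a, h=1e-6):
    # Central-difference approximation to F'(a).
    return (F(a + h) - F(a - h)) / (2 * h)

# The Chain Rule: [f(g)]'(a) = f'(g(a)) g'(a).
for a in [-2.0, 0.5, 1.0, 3.0]:
    chain_value = fprime(g(a)) * gprime(a)
    direct_value = numeric_derivative(lambda x: f(g(x)), a)
    assert abs(chain_value - direct_value) < 1e-5
```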

Example

If [math]F(x) = (x^2 + 2)^3[/math], compute [math]F'(x)[/math]. One way to do this problem is to expand [math](x^2 + 2)^3[/math] and use the differentiation formulas developed in Section \secref{1.7}.

[[math]] \begin{eqnarray*} F(x) &=& (x^2 + 2)^3 = x^6 + 6x^4 + 12x^2 + 8, \\ F'(x) &=& 6x^5 + 24x^3 + 24x. \end{eqnarray*} [[/math]]

Another method uses the Chain Rule. Let [math]g[/math] and [math]f[/math] be the functions defined, respectively, by [math]g(x) = x^2 + 2[/math] and [math]f(y) = y^3[/math]. Then

[[math]] f(g(x)) = (x^2 + 2)^3 = F(x), [[/math]]

and, according to the Chain Rule,

[[math]] F'(x) = [f (g(x))]' = f'(g(x))g'(x) . [[/math]]

Since [math]g'(x) = 2x[/math] and [math]f'(y) = 3y^2[/math], we get [math]f'(g(x)) = 3(x^2 + 2)^2[/math] and

[[math]] \begin{eqnarray*} F'(x) &=& 3(x^2 + 2)^2(2x)\\ &=& 6x(x^4 + 4x^2 + 4), \end{eqnarray*} [[/math]]

which agrees with the alternative solution above.
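
The agreement of the two methods can be confirmed by evaluating both expressions for [math]F'(x)[/math] at a few sample points; a short Python sketch (the function names are illustrative only):

```python
# F'(x) computed two ways: from the expansion and from the Chain Rule.
def fprime_expanded(x):
    return 6 * x**5 + 24 * x**3 + 24 * x

def fprime_chain(x):
    return 3 * (x**2 + 2) ** 2 * (2 * x)

for x in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    assert abs(fprime_expanded(x) - fprime_chain(x)) < 1e-9
```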

Example

Find the derivative of the function [math](3x^7 + 2x)^{128}[/math]. In principle, we could expand by the binomial theorem, but with the Chain Rule at our disposal that would be absurd. Let [math]g(x) = 3x^7 + 2x[/math] and [math]f(y) = y^{128}[/math]. Then [math]g'(x) = 21x^6 + 2[/math] and [math]f'(y) = 128y^{127}[/math]. Setting [math]y = 3x^7 + 2x[/math], we get

[[math]] \begin{eqnarray*} ((3x^7 + 2x)^{128})' &=& [f(g(x))]' = f'(g(x))g'(x)\\ &=& 128{(3x^7 + 2x)^{127}}(21x^6 + 2). \end{eqnarray*} [[/math]]
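
Here expansion is out of the question, but the Chain Rule answer can still be spot-checked against a central-difference approximation. A minimal Python sketch (sample points are kept small so that the 128th power stays within floating-point range):

```python
def g(x):
    return 3 * x**7 + 2 * x

def gprime(x):
    return 21 * x**6 + 2

def Fprime(x):
    # 128 (3x^7 + 2x)^127 (21x^6 + 2), from the Chain Rule.
    return 128 * g(x) ** 127 * gprime(x)

def numeric(x, h=1e-7):
    # Central-difference approximation to the derivative of (3x^7 + 2x)^128.
    F = lambda t: g(t) ** 128
    return (F(x + h) - F(x - h)) / (2 * h)

for x in [0.2, 0.5]:
    assert abs(Fprime(x) - numeric(x)) / abs(Fprime(x)) < 1e-4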


The above two examples are instances of the following corollary of the Chain Rule: If [math]f[/math] is a differentiable function, then

[[math]] (f^n)' = nf^{n-1}f', \;\;\; \mbox{for any integer} \; n. [[/math]]

To prove it, let [math]F(y) = y^n[/math]. Then [math]F(f) = f^n[/math], and we know that [math]F'(y) = ny^{n-1}[/math]. Consequently, [math](f^n)' = [F(f)]' = F'(f) f' = nf^{n-1}f'[/math]. A significant generalization of this result is

Proposition

If [math]f[/math] is a positive differentiable function and [math]r[/math] is any rational number, then [math](f^r)' = rf^{r-1}f'[/math]. The requirement that [math]f[/math] be positive ensures that [math]f^r[/math] is defined: a nonpositive number cannot be raised to an arbitrary rational power. However, as we shall show later, the requirement that [math]r[/math] be a rational number is unnecessary; the result is actually true for any real number [math]r[/math].


Show Proof

Let [math]r = \frac{m}{n}[/math], where [math]m[/math] and [math]n[/math] are integers, and set [math]h = f^r = f^{m/n}[/math]. Then [math]h^n = (f^{m/n})^n = f^m[/math], which implies that [math](h^n)' = (f^m)'[/math]. Using the above formula for the derivative of an integral power of a function, we get

[[math]] nh^{n-1}h' = mf^{m-1}f'. [[/math]]
Solving for [math]h'[/math], we obtain

[[math]] \begin{eqnarray*} h' &=& \frac{m}{n} {h^{1 - n}{f^{m-1}} f'}\\ &=& \frac{m}{n}{(f^r)^{1- n}{f^{m -1}}f'}\\ &=& r{f^{r - rn + m - 1}} f'\\ &=& rf^{r -1} f'. \end{eqnarray*} [[/math]]


This completes the proof---almost. Note that we have tacitly assumed in the argument that [math]h[/math], the function whose derivative we are seeking, is differentiable. Is it? If it is, how do we know it? The answer to the first question is yes, but the answer to the second is not so easy. The problem can be reduced to a simpler one: If [math]n[/math] is a positive integer and [math]g[/math] is the function defined by [math]g(x) = x^{1/n}[/math], for [math]x \gt 0[/math], then [math]g[/math] is differentiable. If we know this fact, we are out of the difficulty, because the Chain Rule tells us that the composition of two differentiable functions is differentiable. Hence [math]g(f)[/math] is differentiable, and [math]g(f) = f^{1/n}[/math]. From this it follows that [math](f^{1/n})^m[/math] is differentiable, and [math](f^{1/n})^m = f^{m/n}[/math]. (When we express [math]r[/math] as a ratio [math]\frac{m}{n}[/math], we can certainly take [math]n[/math] to be positive.)

A proof that [math]x^{1/n}[/math] is differentiable for [math]x \gt 0[/math] is most easily given as an application of the Inverse Function Theorem. However, the intuitive reason is simple: If [math]y = x^{1/n}[/math] and [math]x \gt 0[/math], then [math]y^n = x[/math], and by interchanging [math]x[/math] and [math]y[/math] we obtain the equation [math]x^n = y[/math]. The latter equation defines a smooth curve whose slope at every point is given by the derivative [math]\frac{dy}{dx} = nx^{n-1}[/math]. Interchanging [math]x[/math] and [math]y[/math] amounts geometrically to a reflection about the line [math]y = x[/math]. We conclude that the original curve [math]y = x^{1/n}[/math], [math]x \gt 0[/math], has the same intrinsic shape and smoothness as the one defined by [math]y = x^n[/math], [math]x \gt 0[/math], and therefore must have a tangent line at every point. This means that [math]x^{1/n}[/math] is differentiable.

Example

If [math]y = x^{1/n}[/math], then

[[math]] \frac{dy}{dx} = \frac{1}{n} x^{(1/n)-1} = \frac{1}{nx^{1-1/n}}, \;\;\; x \gt 0. [[/math]]
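
This formula is easy to check against a central-difference approximation; a minimal Python sketch, using the hypothetical choice [math]n = 3[/math]:

```python
n = 3  # hypothetical choice of positive integer; any n works for x > 0

def g(x):
    return x ** (1.0 / n)

def gprime(x):
    # dy/dx = (1/n) x^(1/n - 1)
    return (1.0 / n) * x ** (1.0 / n - 1.0)

def numeric(x, h=1e-6):
    # Central-difference approximation to g'(x).
    return (g(x + h) - g(x - h)) / (2 * h)

for x in [0.5, 1.0, 8.0]:
    assert abs(gprime(x) - numeric(x)) < 1e-5
```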

Example


Find the derivative of the function [math]F(x) = (3x^2 + 5x + 1)^{5/3}[/math]. If we let [math]f(x) = 3x^2 + 5x + 1[/math], then the preceding proposition implies that

[[math]] \begin{eqnarray*} F'(x) &=& \frac{5}{3} f(x)^{2/3} f'(x) \\ &=& \frac{5}{3} (3x^2 + 5x + 1)^{2/3}(6x + 5). \end{eqnarray*} [[/math]]
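
A quick numerical spot-check of this answer, sketched in Python with a central-difference approximation (sample points are chosen where [math]f(x) \gt 0[/math], as the proposition requires):

```python
def f(x):
    return 3 * x**2 + 5 * x + 1

def Fprime(x):
    # (5/3) f(x)^(2/3) f'(x), valid where f(x) > 0.
    return (5.0 / 3.0) * f(x) ** (2.0 / 3.0) * (6 * x + 5)

def numeric(x, h=1e-6):
    # Central-difference approximation to the derivative of f(x)^(5/3).
    F = lambda t: f(t) ** (5.0 / 3.0)
    return (F(x + h) - F(x - h)) / (2 * h)

for x in [0.5, 1.0, 2.0]:
    assert abs(Fprime(x) - numeric(x)) / abs(Fprime(x)) < 1e-4
```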


With the [math]\frac{d}{dx}[/math] notation for the derivative, the Chain Rule can be written in a form that is impossible to forget. Let [math]f[/math] and [math]g[/math] be two differentiable functions. The formation of the composite function [math]f(g)[/math] is suggested by writing [math]u = g(x)[/math] and [math]y = f(u)[/math]. Thus [math]x[/math] is transformed by [math]g[/math] into [math]u[/math], and the resulting [math]u[/math] is then transformed by [math]f[/math] into [math]y = f(u) = f(g(x))[/math]. We have

[[math]] \begin{eqnarray*} \frac{du}{dx} &=& g'(x), \\ \frac{dy}{du} &=& f'(u), \\ \frac{dy}{dx} &=& [f(g(x))]'. \end{eqnarray*} [[/math]]

By the Chain Rule, [math][f(g(x))]' = f'(g(x))g'(x) = f'(u)g'(x)[/math], and so

[[math]] \begin{equation} \frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}. \label{eq1.8.4} \end{equation} [[/math]]

The idea that one can simply cancel out [math]du[/math] in \ref{eq1.8.4} is very appealing and accounts for the popularity of the notation. It is important to realize that the cancellation is valid because the Chain Rule is true, and not vice versa. Thus far, [math]du[/math] is simply a part of the notation for the derivative and means nothing by itself. Note also that \ref{eq1.8.4} is incomplete in the sense that it does not say explicitly at what points to evaluate the derivatives. We can add this information by writing

[[math]] \frac{dy}{dx} (a) = \frac{dy}{du} (u(a)) \frac{du}{dx}(a). [[/math]]

Example


If [math]w = z^2 + 2z + 3[/math] and [math]z = \frac{1}{x}[/math], find [math]\frac{dw}{dx}(2)[/math]. By the Chain Rule,

[[math]] \begin{eqnarray*} \frac{dw}{dx} &=& \frac{dw}{dz} \frac{dz}{dx} \\ &=& (2z + 2) \Bigl( - \frac{1}{x^2} \Bigr). \end{eqnarray*} [[/math]]

When [math]x = 2[/math], we have [math]z = \frac {1}{2}[/math]. Hence

[[math]] \frac{dw}{dx} (2) = (2 \cdot \frac{1}{2} + 2)( -\frac{1}{4}) = - \frac{3}{4}. [[/math]]
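
The arithmetic can be confirmed in a few lines of Python, both directly and against a central-difference approximation (function names are illustrative only):

```python
def w_of_x(x):
    # w as a function of x, after substituting z = 1/x.
    z = 1.0 / x
    return z * z + 2.0 * z + 3.0

def dwdx(x):
    # Chain Rule: dw/dx = (dw/dz)(dz/dx) = (2z + 2)(-1/x^2).
    z = 1.0 / x
    return (2.0 * z + 2.0) * (-1.0 / (x * x))

# The value computed in the example.
assert abs(dwdx(2.0) - (-0.75)) < 1e-12

# Agreement with a central-difference approximation.
h = 1e-6
numeric = (w_of_x(2.0 + h) - w_of_x(2.0 - h)) / (2 * h)
assert abs(dwdx(2.0) - numeric) < 1e-5
```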

Example


Two functions, which we shall define in the chapter on differential equations, are the hyperbolic sine and the hyperbolic cosine, denoted by [math]\sinh x[/math] and [math]\cosh x[/math], respectively. These functions are differentiable and have the interesting property that

[[math]] \begin{eqnarray*} \frac{d}{dx} \sinh x &=& \cosh x, \\ \frac{d}{dx} \cosh x &=& \sinh x. \end{eqnarray*} [[/math]]

Furthermore, [math]\sinh (0) = 0[/math] and [math]\cosh (0) = 1[/math]. Compute the derivatives at [math]x= 0[/math] of (a) [math](\cosh x)^2[/math], (b) the composite function [math]\sinh (\sinh x)[/math]. By the formula [math](f^n)' = nf^{n-1}f'[/math], we obtain for (a)

[[math]] \frac{d}{dx}(\cosh x)^2 = 2 \cosh x \frac{d}{dx} \cosh x = 2 \cosh x \sinh x, [[/math]]

and so

[[math]] \frac{d}{dx} {(\cosh x)^2}(0) = 2 \cosh 0 \sinh 0 = 0. [[/math]]

Part (b) requires the full force of the Chain Rule: Setting [math]u = \sinh x[/math], we obtain

[[math]] \begin{eqnarray*} \frac{d}{dx} \sinh u &=& \frac{d}{du} \sinh u \frac{du}{dx} \\ &=& \cosh u \cosh x, \end{eqnarray*} [[/math]]

or

[[math]] \frac{d}{dx} \sinh (\sinh x) = \cosh (\sinh x) \cosh x. [[/math]]

Hence

[[math]] \begin{eqnarray*} \frac{d}{dx} \sinh (\sinh x)(0) &=& \cosh (\sinh 0) \cosh 0 \\ &=& \cosh 0 \cosh 0 = 1. \end{eqnarray*} [[/math]]
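
Both answers can be spot-checked numerically using the standard library's `math.sinh` and `math.cosh`; a minimal Python sketch with central differences:

```python
import math

h = 1e-6

# (a) d/dx (cosh x)^2 at x = 0 should be 2 cosh(0) sinh(0) = 0.
Fa = lambda x: math.cosh(x) ** 2
numeric_a = (Fa(h) - Fa(-h)) / (2 * h)
assert abs(numeric_a - 0.0) < 1e-5

# (b) d/dx sinh(sinh x) at x = 0 should be cosh(sinh 0) cosh(0) = 1.
Fb = lambda x: math.sinh(math.sinh(x))
numeric_b = (Fb(h) - Fb(-h)) / (2 * h)
assert abs(numeric_b - 1.0) < 1e-5
```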

General references

Doyle, Peter G. (2008). "Crowell and Slesnick's Calculus with Analytic Geometry" (PDF). Retrieved Oct 29, 2024.