2.1. Data Manipulation — Dive into Deep Learning 0.1.0 documentation (2024)

In order to get anything done, we need some way to store and manipulatedata. Generally, there are two important things we need to do with data:(i) acquire them; and (ii) process them once they are inside thecomputer. There is no point in acquiring data without some way to storeit, so let us get our hands dirty first by playing with synthetic data.To start, we introduce the \(n\)-dimensional array (ndarray),MXNet’s primary tool for storing and transforming data. In MXNet,ndarray is a class and we call any instance “an ndarray”.

If you have worked with NumPy, the most widely-used scientific computingpackage in Python, then you will find this section familiar. That’s bydesign. We designed MXNet’s ndarray to be an extension to NumPy’sndarray with a few killer features. First, MXNet’s ndarraysupports asynchronous computation on CPU, GPU, and distributed cloudarchitectures, whereas NumPy only supports CPU computation. Second,MXNet’s ndarray supports automatic differentiation. These propertiesmake MXNet’s ndarray suitable for deep learning. Throughout thebook, when we say ndarray, we are referring to MXNet’s ndarrayunless otherwise stated.

2.1.1. Getting Started¶

In this section, we aim to get you up and running, equipping you withthe basic math and numerical computing tools that you will build on asyou progress through the book. Do not worry if you struggle to grok someof the mathematical concepts or library functions. The followingsections will revisit this material in the context of practical examplesand it will sink. On the other hand, if you already have some backgroundand want to go deeper into the mathematical content, just skip thissection.

To start, we import the api and mxnet-engine modules from DeepJava Library on maven. Here, the api module includes all high levelJava APIs that will be used for data processing, training and inference.The mxnet-engine includes the implementation of those high levelAPIs using Apache MXnet framework. Using the DJL automatic engine mode,the MXNet native libraries with basic operations and functionsimplemented in C++ will be downloaded automatically when DJL is firstused.

%load ../utils/djl-imports

An ndarray represents a (possibly multi-dimensional) array ofnumerical values. With one axis, an ndarray corresponds (in math) toa vector. With two axes, an ndarray corresponds to a matrix.Arrays with more than two axes do not have special mathematicalnames—we simply call them tensors.

To start, we can use arange to create a row vector x containingthe first \(12\) integers starting with \(0\), though they arecreated as floats by default. Each of the values in an ndarray iscalled an element of the ndarray. For instance, there are\(12\) elements in the ndarray x. Unless otherwisespecified, a new ndarray will be stored in main memory anddesignated for CPU-based computation.

NDManager manager = NDManager.newBaseManager();var x = manager.arange(12);x

ND: (12) gpu(0) int32[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Here we are using a`NDManager <https://javadoc.io/doc/ai.djl/api/latest/ai/djl/ndarray/NDManager.html>`__to create the ndarray x. NDManager implements theAutoClosableinterface and manages the life cycles of the ndarrays it created.This is needed to help manage native memory consumption that JavaGarbage Collector does not have control of. We usually wrap NDManagerwith try blocks so all ndarrays will be closed in time. To knowmore about memory management, read DJL’sdocumentation.

try(NDManager manager = NDManager.newBaseManager()){ NDArray x = manager.arange(12);}

We can access an ndarray’s shape (the length along each axis) byinspecting its shape property.

x.getShape()

(12)

If we just want to know the total number of elements in an ndarray,i.e., the product of all of the shape elements, we can inspect itssize property. Because we are dealing with a vector here, the singleelement of its shape is identical to its size.

x.size()

To change the shape of an ndarray without altering either the numberof elements or their values, we can invoke the reshape function. Forexample, we can transform our ndarray, x, from a row vector withshape (\(12\),) to a matrix with shape (\(3\), \(4\)). Thisnew ndarray contains the exact same values, but views them as amatrix organized as \(3\) rows and \(4\) columns. To reiterate,although the shape has changed, the elements in x have not. Notethat the size is unaltered by reshaping.

x = x.reshape(3, 4);x

ND: (3, 4) gpu(0) int32[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11],]

Reshaping by manually specifying every dimension is unnecessary. If ourtarget shape is a matrix with shape (height, width), then after we knowthe width, the height is given implicitly. Why should we have to performthe division ourselves? In the example above, to get a matrix with\(3\) rows, we specified both that it should have \(3\) rows and\(4\) columns. Fortunately, ndarray can automatically work outone dimension given the rest. We invoke this capability by placing-1 for the dimension that we would like ndarray to automaticallyinfer. In our case, instead of calling x.reshape(3, 4), we couldhave equivalently called x.reshape(-1, 4) or x.reshape(3, -1).

Passing create method with only Shape will grab a chunk ofmemory and hands us back a matrix without bothering to change the valueof any of its entries. This is remarkably efficient but we must becareful because the entries might take arbitrary values, including verybig ones!

manager.create(new Shape(3, 4))

ND: (3, 4) gpu(0) float32[[ 1.12103877e-44, 1.26116862e-44, 1.40129846e-44, 1.54142831e-44], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00],]

Typically, we will want our matrices initialized either with zeros,ones, some other constants, or numbers randomly sampled from a specificdistribution. We can create an ndarray representing a tensor withall elements set to \(0\) and a shape of (\(2\), \(3\),\(4\)) as follows:

manager.zeros(new Shape(2, 3, 4))

ND: (2, 3, 4) gpu(0) float32[[[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], ], [[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], ],]

2.1.2. Operations¶

This book is not about software engineering. Our interests are notlimited to simply reading and writing data from/to arrays. We want toperform mathematical operations on those arrays. Some of the simplestand most useful operations are the elementwise operations. These applya standard scalar operation to each element of an array. For functionsthat take two arrays as inputs, elementwise operations apply somestandard binary operator on each pair of corresponding elements from thetwo arrays. We can create an elementwise function from any function thatmaps from a scalar to a scalar.

In mathematical notation, we would denote such a unary scalar operator(taking one input) by the signature\(f: \mathbb{R} \rightarrow \mathbb{R}\). This just means that thefunction is mapping from any real number (\(\mathbb{R}\)) ontoanother. Likewise, we denote a binary scalar operator (taking two realinputs, and yielding one output) by the signature\(f: \mathbb{R}, \mathbb{R} \rightarrow \mathbb{R}\). Given any twovectors \(\mathbf{u}\) and \(\mathbf{v}\) of the same shape,and a binary operator \(f\), we can produce a vector\(\mathbf{c} = F(\mathbf{u},\mathbf{v})\) by setting\(c_i \gets f(u_i, v_i)\) for all \(i\), where \(c_i, u_i\),and \(v_i\) are the \(i^\mathrm{th}\) elements of vectors\(\mathbf{c}, \mathbf{u}\), and \(\mathbf{v}\). Here, weproduced the vector-valued\(F: \mathbb{R}^d, \mathbb{R}^d \rightarrow \mathbb{R}^d\) bylifting the scalar function to an elementwise vector operation.

In DJL, the common standard arithmetic operators (+, -, *,/, and **) have all been lifted to elementwise operations forany identically-shaped tensors of arbitrary shape. We can callelementwise operations on any two tensors of the same shape. In thefollowing example, we use commas to formulate a \(5\)-element tuple,where each element is the result of an elementwise operation. Note: youneed to use add, sub, mul, div, and pow as Java doesnot support overloading of these operators.

var x = manager.create(new float[]{1f, 2f, 4f, 8f});var y = manager.create(new float[]{2f, 2f, 2f, 2f});x.add(y);

ND: (4) gpu(0) float32[ 3., 4., 6., 10.]

x.sub(y);

ND: (4) gpu(0) float32[-1., 0., 2., 6.]

x.mul(y);

ND: (4) gpu(0) float32[ 2., 4., 8., 16.]

x.div(y);

ND: (4) gpu(0) float32[0.5, 1. , 2. , 4. ]

x.pow(y);

ND: (4) gpu(0) float32[ 1., 4., 16., 64.]

Many more operations can be applied elementwise, including unaryoperators like exponentiation.

x.exp()

ND: (4) gpu(0) float32[ 2.71828175e+00, 7.38905621e+00, 5.45981483e+01, 2.98095801e+03]

In addition to elementwise computations, we can also perform linearalgebra operations, including vector dot products and matrixmultiplication. We will explain the crucial bits of linear algebra (withno assumed prior knowledge) in Section 2.3.

We can also concatenate multiple ndarrays together, stackingthem end-to-end to form a larger ndarray. We just need to provide alist of ndarrays and tell the system along which axis toconcatenate. The example below shows what happens when we concatenatetwo matrices along rows (axis \(0\), the first element of the shape)vs. columns (axis \(1\), the second element of the shape). We cansee that the first output ndarray’s axis-\(0\) length(\(6\)) is the sum of the two input ndarrays’ axis-\(0\)lengths (\(3 + 3\)); while the second output ndarray’saxis-\(1\) length (\(8\)) is the sum of the two inputndarrays’ axis-\(1\) lengths (\(4 + 4\)).

x = manager.arange(12f).reshape(3, 4);y = manager.create(new float[]{2, 1, 4, 3, 1, 2, 3, 4, 4, 3, 2, 1}, new Shape(3, 4));x.concat(y) // default axis = 0

ND: (6, 4) gpu(0) float32[[ 0., 1., 2., 3.], [ 4., 5., 6., 7.], [ 8., 9., 10., 11.], [ 2., 1., 4., 3.], [ 1., 2., 3., 4.], [ 4., 3., 2., 1.],]

x.concat(y, 1)

ND: (3, 8) gpu(0) float32[[ 0., 1., 2., 3., 2., 1., 4., 3.], [ 4., 5., 6., 7., 1., 2., 3., 4.], [ 8., 9., 10., 11., 4., 3., 2., 1.],]

Sometimes, we want to construct a binary ndarray via logicalstatements. Take x.eq(y) as an example. For each position, if xand y are equal at that position, the corresponding entry in the newndarray takes a value of \(1\), meaning that the logicalstatement x.eq(y) is true at that position; otherwise that positiontakes \(0\).

x.eq(y)

ND: (3, 4) gpu(0) boolean[[false, true, false, true], [false, false, false, false], [false, false, false, false],]

Summing all the elements in the ndarray yields an ndarray withonly one element.

x.sum()

ND: () gpu(0) float3266.

For stylistic convenience, we can write x.sum() as np.sum(x).

2.1.3. Broadcasting Mechanism¶

In the above section, we saw how to perform elementwise operations ontwo ndarrays of the same shape. Under certain conditions, evenwhen shapes differ, we can still perform elementwise operations byinvoking the broadcasting mechanism. This mechanism works in thefollowing way: First, expand one or both arrays by copying elementsappropriately so that after this transformation, the two ndarrayshave the same shape. Second, carry out the elementwise operations on theresulting arrays.

In most cases, we broadcast along an axis where an array initially onlyhas length \(1\), such as in the following example:

var a = manager.arange(3f).reshape(3, 1);var b = manager.arange(2f).reshape(1, 2);a

ND: (3, 1) gpu(0) float32[[0.], [1.], [2.],]

ND: (1, 2) gpu(0) float32[[0., 1.],]

Since a and b are \(3\times1\) and \(1\times2\) matricesrespectively, their shapes do not match up if we want to add them. Webroadcast the entries of both matrices into a larger \(3\times2\)matrix as follows: for matrix a it replicates the columns and formatrix b it replicates the rows before adding up both elementwise.

a.add(b)

ND: (3, 2) gpu(0) float32[[0., 1.], [1., 2.], [2., 3.],]

2.1.4. Indexing and Slicing¶

DJL use the same syntax as Numpy in Python for indexing and slicing.Just as in any other Python array, elements in an ndarray can beaccessed by index. As in any Python array, the first element has index\(0\) and ranges are specified to include the first but before thelast element. As in standard Python lists, we can access elementsaccording to their relative position to the end of the list by usingnegative indices.

Thus, [-1] selects the last element and [1:3] selects the secondand the third elements as follows:

x.get(":-1");

ND: (2, 4) gpu(0) float32[[0., 1., 2., 3.], [4., 5., 6., 7.],]

x.get("1:3")

ND: (2, 4) gpu(0) float32[[ 4., 5., 6., 7.], [ 8., 9., 10., 11.],]

Beyond reading, we can also write elements of a matrix by specifyingindices.

x.set(new NDIndex("1, 2"), 9);x

ND: (3, 4) gpu(0) float32[[ 0., 1., 2., 3.], [ 4., 5., 9., 7.], [ 8., 9., 10., 11.],]

If we want to assign multiple elements the same value, we simply indexall of them and then assign them the value. For instance, [0:2, :]accesses the first and second rows, where : takes all the elementsalong axis \(1\) (column). While we discussed indexing for matrices,this obviously also works for vectors and for tensors of more than\(2\) dimensions.

x.set(new NDIndex("0:2, :"), 12);x

ND: (3, 4) gpu(0) float32[[12., 12., 12., 12.], [12., 12., 12., 12.], [ 8., 9., 10., 11.],]

2.1.5. Saving Memory¶

Running operations can cause new memory to be allocated to host results.For example, if we write y = x.add(y), we will dereference thendarray that y used to point to and instead point y at thenewly allocated memory.

This might be undesirable for two reasons. First, we do not want to runaround allocating memory unnecessarily all the time. In machinelearning, we might have hundreds of megabytes of parameters and updateall of them multiple times per second. Typically, we will want toperform these updates in place. Second, we might point at the sameparameters from multiple variables. If we do not update in place, otherreferences will still point to the old memory location, making itpossible for parts of our code to inadvertently reference staleparameters.

Fortunately, performing in-place operations in DJL is easy. We canassign the result of an operation to a previously allocated array usinginplace operators like addi, subi, muli, and divi.

var original = manager.zeros(y.getShape());var actual = original.addi(x);original == actual

true

2.1.6. Summary¶

DJL’s ndarray is an extension to NumPy’s ndarray with a fewkiller advantages that make it suitable for deep learning.
DJL’s ndarray provides a variety of functionalities includingbasic mathematics operations, broadcasting, indexing, slicing, memorysaving, and conversion to other Python objects.

2.1.7. Exercises¶

Run the code in this section. Change the conditional statementx.eq(y) in this section to x.lt(y) (less than) or x.gt(y)(greater than), and then see what kind of ndarray you can get.
Replace the two ndarrays that operate by element in thebroadcasting mechanism with other shapes, e.g., three dimensionaltensors. Is the result the same as expected?