I worked in image processing and remote sensing from the late 1970s through the early 1990s, both in image interpretation for geological applications and as an image processing algorithm developer. I’m sure that much of the technology I was familiar with is obsolete now, but the physics and mathematics are probably pretty much the same. These remarks will probably be of purely historical interest to those of you who work routinely with imagery today, but those who are exposed to imagery as a result of your other work or interests may find some of this commentary useful. Modern digital image manipulation relies heavily on tiling techniques that take advantage of hardware configurations, and on advanced compression algorithms and Fourier and power spectrum enhancements in frequency space. That’s all well out of my pay grade, but I am pretty familiar with the basics, and I hope you can find some of this useful. Most of the image geometry I talk about here is now handled automatically by your software.
A digital image is a rectangular array of numbers, produced by scanning a real-world scene or an analog image or photograph with a digital scanner. The image is converted to a long list of numbers, each called a picture element, or “pixel”. Physically, a digital image is just that long list of numbers, one per pixel, each signifying a brightness value (or gray level): the brighter the pixel, the higher the number. Logically, the pixels are organized as horizontal lines and vertical samples, with the numbering starting at the upper left corner. Typically they are organized as one line per computer record, with a group of records comprising an image file, but not necessarily. Your software needs to know how they are arranged so they can be unscrambled and reassembled into a recognizable picture that can be printed or displayed on a screen.
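Here is a minimal sketch of that unscrambling, written in Python with NumPy (which we certainly did not have back then); the file name and dimensions are made up for illustration and are not tied to any particular format:

```python
import numpy as np

# Dimensions are invented for illustration; a real header or the provider's
# documentation has to tell you how many lines and samples the file holds.
LINES, SAMPLES = 512, 640

# Pretend this is the scanner output: one brightness value (0-255) per pixel,
# written out as one long flat list of bytes, line after line.
np.random.randint(0, 256, size=LINES * SAMPLES, dtype=np.uint8).tofile("scene.raw")

# Reading it back: the file itself is just numbers, so the reader has to be told
# the word size (one byte here) and the line/sample counts to unscramble it.
flat = np.fromfile("scene.raw", dtype=np.uint8)
image = flat.reshape(LINES, SAMPLES)   # row 0 is the top line, column 0 the leftmost sample

print(image.shape)    # (512, 640)
print(image[0, 0])    # gray level of the upper-left pixel
```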
In the olden days, memory was expensive and the capabilities of scanners and display devices limited, so the space reserved for each pixel varied with the amount of information in the pixel. A bit-image pixel was either on or off, 0 or 1, black or white. It could be stored in a one-bit word, the smallest unit of computer memory. Not much visual information is transmitted this way, but you can display recognizable images. A byte image is stored in an eight-bit word, which allows 256 gray levels, from 00000000 to 11111111, or 0 to 255. So, for example, a brightness or “gray level” of 100 was stored in one byte as 01100100. Count from left to right: zero 128s + one 64 + one 32 + zero 16s + zero 8s + one 4 + zero 2s + zero 1s. So only 256 brightness levels can be encoded in byte data, but the human eye can only distinguish about sixty or so, so perfectly realistic images (like human faces or aerial photographs) can be stored as byte data, although some of the subtle visual textures can be lost. Most display devices, even today, can only display 256 gray levels, so if your image is “deeper” and requires larger words to store each pixel, the display software will still map it down to 0-255.
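To make the byte arithmetic concrete, here is a small sketch of the gray level 100 written out as bits, and one simple way (a plain linear rescale, just for illustration) that display software might map deeper data down to the 0-255 range:

```python
import numpy as np

# Gray level 100 stored in one 8-bit byte: 01100100 = 64 + 32 + 4.
print(format(100, "08b"))     # '01100100'
print(int("01100100", 2))     # 100

# A "deeper" image, say 16-bit integers, still has to be shown on a 0-255 display.
# One common approach is a linear rescale from the data's own min/max down to a byte.
deep = np.array([[0, 1000, 20000],
                 [32000, 5000, 250]], dtype=np.uint16)

lo, hi = deep.min(), deep.max()
display = ((deep - lo) / (hi - lo) * 255).astype(np.uint8)
print(display)
```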
Besides bit and byte images, we occasionally had to work with imagery with greater dynamic range, such as small integer (a two-byte word capable of over 32,000 gray levels), large integer (4 bytes, millions of gray levels), and real images, which stored fractional numbers in a four-byte word, some bits for the whole-number part and some for the decimals, plus one bit for the positive or negative sign.
Deeper data might require double-precision words, 8 bytes. Of course, the eye can’t distinguish this many levels of gray, but some sensors can record it. For example, we can see in bright sunlight or by starlight at night, a dynamic range of millions of gray levels, but we can only discriminate a few dozen of them. Sometimes data encoded as imagery cannot be fully grasped by the human visual system in its full detail. We called this gray-level depth “resolution”, which is not to be confused with the “spatial” or “angular” resolution that photographers and astronomers talk about when discussing the apparent sizes of objects. By processing image data, you can preferentially enhance certain gray levels at the expense of others, to simulate some of the darkroom techniques, like dodging, that film photographers use. You can process a negative to bring out detail in the overexposed nucleus of a galaxy, or process it to show fine structure in the faint outer spiral arms. The data may all be there on the negative, but the human eye can’t see enough gray levels to take it all in at once; in the darkroom you must over- or under-expose the print to see one at the expense of the other.
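A rough sketch of that idea (the cutoff values and the random stand-in frame are invented for illustration): a linear stretch spends the whole 0-255 display range on one slice of the data’s gray levels and throws away detail everywhere else, much like choosing an exposure for the print.

```python
import numpy as np

def stretch(image, low, high):
    """Linearly map gray levels in [low, high] onto 0-255;
    everything below low goes black, everything above high goes white."""
    out = (image.astype(np.float64) - low) / (high - low) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

# Hypothetical 16-bit galaxy frame: the bright nucleus runs up toward 30000 counts,
# while the faint outer arms live down around 200-800 counts.
frame = np.random.randint(0, 30000, size=(256, 256), dtype=np.uint16)

arms    = stretch(frame, 200, 800)       # brings out faint outer structure
nucleus = stretch(frame, 20000, 30000)   # brings out detail in the bright core
```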
All digital images are black and white, so how do we represent color? Fortunately, the human eye can see any color as a combination of red, green, and blue. So a color sensor will image a scene through R, G, and B filters, generating three black-and-white images. Each is then assigned to the R, G, or B color gun of a display device, and a reasonable facsimile of the original scene can be reproduced. These three colors are called “bands”, from bands of the spectrum, and although display devices and printers can only show three bands at a time, the imagery itself can have more. For example, the old Landsat Thematic Mapper had seven bands: R, G, B, and four in the infrared region of the spectrum, colors invisible to the human eye but useful in remote sensing because terrestrial geology and vegetation are much more colorful in the infrared than in the visual range. The analyst can assign to each color gun whichever band best enhances and displays whatever it is he is looking for.
This explains why vegetation in satellite imagery is often many subtle shades of red (instead of monotonous greens), and why astronomical imagery of the same nebula often looks so different from one source to the next. Astronomers often have many sensor bands, some in the IR, UV, radio, and so on, which they can assign to any color gun, depending on their research interests and aesthetic sensibilities.
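As a sketch of that band-to-gun assignment (the band numbers and the random stand-in data cube are hypothetical, not tied to any particular sensor’s numbering), a false-color composite simply stacks three chosen bands into the red, green, and blue channels of the display:

```python
import numpy as np

LINES, SAMPLES, BANDS = 512, 512, 7

# Stand-in for a 7-band scene (random numbers here; in practice each band
# is a black-and-white image read from the sensor's data file).
cube = np.random.randint(0, 256, size=(BANDS, LINES, SAMPLES), dtype=np.uint8)

# Classic false-color assignment: a near-infrared band drives the red gun,
# the red band drives green, and the green band drives blue --
# healthy vegetation then shows up in bright shades of red.
nir, red, green = cube[3], cube[2], cube[1]
composite = np.dstack([nir, red, green])   # shape (LINES, SAMPLES, 3), ready to display
```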
So an image data set is usually described by its word size (bit, byte, integer, long integer, real, or double precision), its dimensions (Y lines by X samples), and its number of bands. You also have to specify how the bands are arranged in the image file: band by band, line by line, or pixel by pixel, so that the software that reads it can unscramble it properly.
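Those three arrangements are often called band-sequential, band-interleaved-by-line, and band-interleaved-by-pixel. A short sketch (dimensions invented, and a zero-filled placeholder standing in for the actual file contents) of how the same flat stream of numbers is unscrambled differently depending on which layout it uses:

```python
import numpy as np

LINES, SAMPLES, BANDS = 480, 640, 3

# The same flat stream of pixel values (a placeholder here; normally read
# from the image file) can be laid out on disk in three common ways.
flat = np.zeros(LINES * SAMPLES * BANDS, dtype=np.uint8)

# Band by band ("band sequential"): all of band 1, then all of band 2, ...
bsq = flat.reshape(BANDS, LINES, SAMPLES)

# Line by line ("band interleaved by line"): line 1 of every band, then line 2, ...
bil = flat.reshape(LINES, BANDS, SAMPLES)

# Pixel by pixel ("band interleaved by pixel"): every band of pixel 1, then pixel 2, ...
bip = flat.reshape(LINES, SAMPLES, BANDS)

# Whichever layout the file uses, get it into one agreed order, say (band, line, sample),
# before processing; for the pixel-interleaved case that is a transpose.
bands_first = bip.transpose(2, 0, 1)
```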
Next time we’ll talk about enhancement techniques.