Sound is a series of pressure variations. In computers it starts out as a continuous waveform whose peaks and valleys deviate from the average pressure level.

The degree of deviation is the amplitude (measured in units of pressure or decibels) and is perceived as the volume of the sound.

The number of peaks in a period of time is the audio frequency (measured in cycles per second, aka hertz (Hz)) and is perceived as the pitch of the sound. The human ear can detect frequencies from roughly 20 to 20,000 Hz. The human voice usually falls between 300 and 3,000 Hz. A sound is typically a complicated mix of superimposed amplitudes and frequencies.

The number of waveform samples taken in a period of time is the sampling rate (measured in Hz) and is perceived as the quality of the recording. EGs: The sampling rate for the telephone company is 8 kHz, CDs (CDDAs) sample at 44.1 kHz and DVDs typically at 48 kHz, while professional recordings are made at 88.2 kHz, 96 kHz, and 192 kHz. DSD (Direct-Stream Digital) samples at 2.8224 MHz, but with only 1 bit per sample.
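EG: A minimal sketch of what "taking samples" means, assuming a pure sine tone as the input. The function name sample_sine and the 440 Hz test tone are illustrative choices, not anything prescribed by a standard.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s, amplitude=1.0):
    """Return samples of a sine wave, taken sample_rate_hz times per second."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]

# 10 ms of the same 440 Hz tone at telephone rate and at CD rate.
phone = sample_sine(440, 8_000, 0.01)
cd = sample_sine(440, 44_100, 0.01)
print(len(phone), len(cd))  # 80 441
```

The same 10 ms of sound is described by 80 numbers at 8 kHz and by 441 numbers at 44.1 kHz, which is why a higher sampling rate follows the waveform more closely.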

The number of bits used to represent the waveform is the sample size (measured in bits (b)) and is also perceived as the quality of the recording, since a greater sample size can capture a greater dynamic range (perceptible differences and extremes in sound). Voice cards are usually 4 or 8 bits, whereas sound cards are usually 8 bits or 16 bits or more.
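EG: A rough sketch of how sample size relates to dynamic range, assuming an ideal linear quantizer (real hardware delivers somewhat less). Each extra bit doubles the number of levels, which adds about 6 dB. The function name dynamic_range_db is an illustrative choice.

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: largest level over smallest step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (4, 8, 16, 24):
    levels = 2 ** bits
    print(f"{bits:2d} bits: {levels:>10,} levels, about {dynamic_range_db(bits):.1f} dB")
```

By this estimate an 8 bit sample covers about 48 dB and a 16 bit sample about 96 dB.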

The sampling rate times the sample size (times the number of channels, if there is more than one) is the transmission rate, aka bit rate (measured in b/s or kb/s). The transmission rate times the duration of the recording is the size of the uncompressed sound file (measured in b). EG: (60 seconds of recording) x (8 bit sample size) x (8,000 Hz) = 3,840,000 b = 480,000 B = roughly 0.5 MB.
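EG: The arithmetic above as a small sketch. The function names and the channels parameter are assumptions added for illustration (the formula in the text is the mono case), and the result is for uncompressed sound with no file header.

```python
def bit_rate(sample_rate_hz, sample_size_bits, channels=1):
    """Uncompressed transmission rate in bits per second."""
    return sample_rate_hz * sample_size_bits * channels

def file_size_bytes(duration_s, sample_rate_hz, sample_size_bits, channels=1):
    """Uncompressed size in bytes (8 bits per byte), ignoring any header."""
    return duration_s * bit_rate(sample_rate_hz, sample_size_bits, channels) // 8

# One minute of telephone-quality sound: 8 bit samples at 8,000 Hz, mono.
print(file_size_bytes(60, 8_000, 8))       # 480000

# One minute of CD-quality sound: 16 bit samples at 44,100 Hz, stereo.
print(bit_rate(44_100, 16, channels=2))    # 1411200
print(file_size_bytes(60, 44_100, 16, 2))  # 10584000
```

The CD figure of 1,411,200 b/s is the 1411.2 kb/s usually quoted for CD quality.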

There are several audio codecs, or amplitude conversion schemes, for encoding sound files, including linear PCM (Pulse Code Modulation), non-linear PCM, and ADPCM (Adaptive Differential PCM). EG: WAV files typically use linear PCM, which is uncompressed. Non-linear PCM spends more of its quantization levels on the easier-to-hear lower amplitudes. µ-law (mu-law) PCM is a non-linear PCM common in the US and Japan, whereas A-law PCM is common elsewhere. A common ADPCM scheme is OKI ADPCM.
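EG: A sketch of the idea behind µ-law, using the continuous companding curve with µ = 255. This is not the bit-exact segment encoding that telephone equipment uses; the function names are illustrative, and samples are assumed to be scaled to the range -1 to 1.

```python
import math

MU = 255  # the value used for µ-law telephony

def mu_law_compress(x, mu=MU):
    """Map a linear sample in [-1, 1] onto the companded scale in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

for x in (0.01, 0.1, 0.5, 1.0):
    y = mu_law_compress(x)
    print(f"linear {x:4.2f} -> companded {y:.3f} -> back to {mu_law_expand(y):.3f}")
```

A quiet sample at 1% of full scale lands at more than 20% of the companded scale, so once the companded value is rounded to 8 bits, quiet sounds get far finer steps than loud ones.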

Sound Files

Sound is encoded into digital format via a codec (coder-decoder) program. Encoding frequently also compresses the sound, either lossily or losslessly. A codec produces the essence (the digital sound itself) plus some metadata (synching, titles, length of file, etc.). The codec output is encapsulated/stored/wrapped in an audio file format. Some file formats store only one kind of codec while others accept several. Some "super" file formats (ASF, Ogg, and QuickTime) store a wide variety of video or audio codecs.

Some audio codecs:

  • AAC. Advanced Audio Coding. Lossy. Supposed to be an improvement on MP3.
  • Apple Lossless. Aka: ALE (Apple Lossless Encoder); ALAC (Apple Lossless Audio Codec). Stored in an MP4 container with an extension of .m4a. Compresses by 50%.
  • Ogg codecs. All open source.
    • Speex. Lossy. 8-32 Kb/s. Voice quality.
    • Vorbis. Lossy. 16-256 Kb/s. Compresses by 30-70%. Started in 1998 when it was feared that there was going to be a charge for the MP3 file format.
    • FLAC. Free Lossless Audio Codec. Compresses by 30-70% and yet it's lossless!
  • PCM. Pulse Code Modulation. First used by the telephone companies.
    • Linear PCM. Straight uncompressed analog to digital.
    • Non-Linear PCM.
      • DPCM (Differential (or Delta) Pulse Code Modulation) encodes the PCM values as differences between the current and the previous value. DPCM is about 25% smaller than regular PCM.
        • ADPCM (Adaptive DPCM). Standard G.726. ADPCM is a variant of DPCM that varies the size of the quantization step, to allow further reduction of the required bandwidth for a given signal-to-noise ratio. Compresses roughly 2:1.
    • Logarithmic PCM. Standard G.711. µ-law PCM is a non-linear PCM common in the US and Japan, whereas A-law PCM is common elsewhere. Compresses roughly 12:8.
  • TTA. True Audio. Lossless. Free.
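
EG: A minimal sketch of the difference coding behind DPCM, as described in the list above. It assumes integer samples and stores the raw differences without quantizing them, so it only shows why the values get smaller; real DPCM and ADPCM coders also quantize the differences, which is where the bit savings come from. The function names are illustrative.

```python
def dpcm_encode(samples):
    """Replace each sample by its difference from the previous sample."""
    previous = 0
    deltas = []
    for sample in samples:
        deltas.append(sample - previous)
        previous = sample
    return deltas

def dpcm_decode(deltas):
    """Rebuild the samples by keeping a running sum of the differences."""
    total = 0
    samples = []
    for delta in deltas:
        total += delta
        samples.append(total)
    return samples

pcm = [100, 104, 109, 111, 110, 106]
deltas = dpcm_encode(pcm)
print(deltas)               # [100, 4, 5, 2, -1, -4]
print(dpcm_decode(deltas))  # [100, 104, 109, 111, 110, 106]
```

After the first value, the differences are much smaller than the samples themselves, so they can be stored in fewer bits.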

Some audio file formats:

  • Recorded Sound
    • Lossless. No data loss. These formats usually hold uncompressed sound; lossless compression, where it is used, shrinks the sound by roughly 50%.
      • AIFF. Audio Interchange File Format. A sound file format created by Apple that is cross-platform.
      • AU. .au. AUdio File Format. A sound file format introduced by Sun Microsystems and common on Unix systems.
      • WAV. .wav. Waveform Audio Format. A sound file format created by Microsoft and IBM that is frequently used on Windows, esp. for OS events such as "beep". WAV can hold different codecs but frequently stores linear PCM, thus WAV tends to be high-quality but large. WAV files have metadata. MIME Type: audio/wav.
    • Lossy. Compresses by greater than 50% but with "unimportant" data loss.
      • MP3. .mp3. MPEG (Moving Picture Experts Group) Audio Layer 3. A part of the MPEG standard that can compress by about 90%. MP3 can turn a 20 MB file into 1.4 MB, i.e. the equivalent of taking two minutes of CD music and putting it onto a floppy disk. There are three coding schemes in MPEG to compress audio (layer 1, layer 2, and layer 3). Layer 3 compresses by removing redundant and perceptually irrelevant parts of the signal. MP3 compresses CD quality (1411.2 kb/s) to FM radio rates (112-128 kb/s, roughly a factor of 12). This is small enough for the Internet. MP3s can be played back using a player such as Microsoft's Windows Media Player or RealNetworks' RealPlayer, and can be made from CDs with a "ripper".
      • Ogg Vorbis.
      • VOX. .vox. VOice data. A file that contains raw sound data and no metadata.
    • Either.
      • ASF. .asf. Advanced Systems Format or Advanced Streaming Format. Stores a wide variety of audio or video codecs.
        • WMA. .wma. Windows Media Audio. Lossy or lossless. A WMA stream is almost always enclosed in an ASF container, and thus it can have a .asf extension; by convention an ASF file that holds only audio gets a .wma extension.
      • Matroska.
      • Ogg. The Ogg file format frequently holds Ogg codecs, hence "Ogg Vorbis".
      • QuickTime.
  • Generated Sound
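    • MIDI. .mid or .midi. Musical Instrument Digital Interface. Stores instructions for a synthesizer (which notes to play, when, and on which instrument) rather than recorded sound, so it tends to have a "tinny electronic" sound quality. MIME Type: audio/midi.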
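
EG: A sketch of linear PCM wrapped in a WAV file format, using the wave module from Python's standard library. The tone, the constant names, and the in-memory buffer (standing in for a .wav file on disk) are illustrative assumptions.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8_000  # Hz, telephone quality
SAMPLE_WIDTH = 2     # bytes per sample, i.e. a 16 bit sample size
CHANNELS = 1         # mono

# The essence: one second of a 440 Hz tone as signed 16 bit linear PCM.
frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)))
    for n in range(SAMPLE_RATE)
)

# The wrapper: the WAV header records the metadata needed to play the essence.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(CHANNELS)
    out.setsampwidth(SAMPLE_WIDTH)
    out.setframerate(SAMPLE_RATE)
    out.writeframes(frames)
wav_bytes = buffer.getvalue()

# Reading the file back recovers the metadata from the header.
with wave.open(io.BytesIO(wav_bytes), "rb") as src:
    params = (src.getnchannels(), src.getsampwidth() * 8, src.getframerate(), src.getnframes())
print(params)  # (1, 16, 8000, 8000)
```

The file is only slightly larger than the 16,000 bytes of raw samples: the rest is the header, which is the metadata that a raw format such as VOX leaves out.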

Web Sounds

Here are tips on how to put some sound in web pages. [I've hidden this info here instead of in my section on webs because so far most people make hideous sites when they add sound.]

Microsoft Internet Explorer accepts two non-W3C tags for automatically playing sounds in the background. Both must be enclosed by the HTML <head> tag:

<bgsound src="sound/SomeSound.mid"
         loop="2" />
<!-- src can only be wav, au, or mid. If LOOP is -1 or INFINITE, then it loops forever. -->
<embed src="sound/SomeSound.mid" />
<!-- Here are some notes about embed:
Only the attributes HEIGHT, SRC and WIDTH are always available.
HIDDEN="true"|"false" The default is "false".
LOOP="true"|"false"|"n" Supported only by NS.
PLAYCOUNT="n" Supported only by IE.
AUTOSTART="true"|"false" The IE default is "false" while the NS default is "true".
The default on the Mac is "false" for both browsers.
-->

Netscape accepts just the <embed> method. If you want to be both IE and NS compatible, use both <bgsound> and <embed> but enclose the <embed> tag in an <object> tag so IE ignores it and doesn't play the sound file twice.

<bgsound src="sound/SomeSound.mid" />
<embed src="sound/SomeSound.mid"
       loop="false" />

It is of course possible to have sounds play when certain events occur using script. EG:

<embed src="sound/SomeSound.mid" />
<input type="button"
       value="Play sound file" />

Sound Cards

A sound card is a physical card that is connected to your computer's motherboard. Make sure your sound card is compatible with the right standards.

  • Sound Blaster. This has been The Standard for years.
  • 3DAudio. Aka 3DA. 3DA is actually part of Microsoft's DirectX API suite for graphics and controls as well as sound. 3DA is extremely popular.
  • A3D by Aureal.

Voice cards are usually 4 or 8 bits, whereas sound cards are usually 8 or 16 bits or more.

At the least a sound card should have these ports.

  • An output port for your speakers/headphones.
  • An input port for a microphone.
  • An input port for some auxiliary audio device.

See also a guide for gaming hardware.

Audio Jacks

The most common audio jack for computers is a 3.5 mm = 1/8" TRS connector. If you look at the audio jack (aka headphone jack, stereo plug), you can see the three conducting parts: the tip, ring, and sleeve, hence "TRS".

Here are the most common audio jacks, listed by size:

  • 6.3 mm = 1/4". Mostly for big headphones and speakers for big stereos.
  • 3.5 mm = 1/8". Mostly for mid-sized headphones and speakers for computers.
  • 2.5 mm = 3/32". Mostly for little headphones for phones.

TS and TRRS connectors are less common. EG: iPhones use TRRS, but iPods use TRS.

Note that jacks of the same design are used for different things such as line in, line out, microphones, power, etc.



GeorgeHernandez.com. Some rights reserved.