Building a sound synthesizer from scratch in Go: Part 1

This is the first post in a series describing my progress building a sound synthesizer from scratch in Go.

I have always been fascinated by how computers can generate and process sound, turning something purely digital into something physical. The motivation behind this project is to get more hands-on experience with Golang and, at the same time, learn something about DSP (digital signal processing).

Source code corresponding to this blog post is on my GitHub repository.

Goals

My goal for this is to create a Golang application with the following features:

  • Oscillators for basic waveforms: sine, triangle, square
  • ADSR envelope (attack, decay, sustain, release) processing
  • Mixer for multiple signals and effects
  • Being able to play full chords
  • Audio effects, e.g. vibrato, tremolo, echo, distortion
  • TUI - text-based user interface
    • Display piano keys and map them to keyboard keys
    • Visualize controls for all parts of the synthesizer
    • Audio visualizations
  • Saving (and loading) presets to a file
  • Benchmark, identify, and optimize “hot spots”
  • Experiment with the macOS Core Audio driver

Configuration

Go

In the root go.mod file I define the minimum required version of Go and a single dependency, ebitengine/oto - more about Oto later.

module synthwave

go 1.25

require github.com/ebitengine/oto/v3 v3.4.0

Makefile

A simple Makefile that I borrowed from one of my other projects; it defines a couple of useful targets: managing Go modules, running tests and benchmarks, and running the app itself. By default, it displays a help message with the docs.

.DEFAULT_GOAL := help

mod: ## Run go mod tidy
	go mod tidy

update: ## Update go mod dependencies
	go get -u
	make mod

run: ## Run the application
	go run .

test: ## Run unit tests
	# -v (verbose)
	# ./... (look for tests in all directories)
	go test -v ./...

bench: ## Run benchmarks (only)
	# -v (verbose)
	# -bench . (run all found benchmarks)
	# -benchmem (show memory allocation stats)
	# -run ^$$ (run no unit tests - only benchmarks)
	# ./... (look for benchmarks in all directories)
	go test -v -bench . -benchmem -run ^$$ ./...

coverage: ## Generate and open test coverage report
	go test -v ./... \
		-coverpkg=./... \
		-covermode=atomic \
		-coverprofile=coverage.out \
		|| true
	go tool cover \
		-html=coverage.out \
		-o coverage.html
	rm coverage.out
	open coverage.html

help: ## Show this help message
	@HELP_WIDTH=10; \
	LINES=$$(grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST)); \
	echo "$$LINES" | awk -v width=$$HELP_WIDTH 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-*s\033[0m %s\n", width, $$1, $$2}'

ebitengine/oto - abstraction layer

Oto is a low-level library to play sound in Go, developed by the same people behind the Ebitengine game engine. It can be used as a part of Ebitengine or on its own - as I do here.

To play a sound, you need to interact with the hardware, and this is done through an operating system’s audio driver. As you can imagine, there are multiple operating systems and platforms (macOS, Windows, Linux, Android, WebAssembly, etc.) - each of them has its own audio driver.

macOS itself exposes several audio APIs: Core Audio, Audio Units, and Audio Toolbox.

Interacting with the audio driver is a complex task and usually requires writing low-level code in a platform-specific language such as C++ or Swift (for iOS or macOS). Moreover, a multiplatform application requires implementing that code separately for each platform you want to support.

This is where Oto comes into play - it abstracts away the platform-specific details so that you can focus on application logic instead. There is a single way of interacting with Oto, while the library takes care of the audio driver specifics.

---
title: "Oto abstraction layer: operating systems and audio drivers"
---
graph TD
    client["Client Application<br>(Go)"]
    oto["ebitengine/oto<br>(Go lib)"]
    subgraph os ["OS and Audio Drivers"]
        direction TB
        macOS["macOS"]
        linux["Linux"]
        windows["Windows"]
        android["Android"]
        other["..."]
    end
    client -- uses --> oto
    oto -- abstracts --> os

Generating sine wave

Mathematical representation

Now, that is where the real implementation (and complexity) starts.
First, let’s take a look at the mathematical representation of a sine wave:

$$ y(t) = A \sin(\omega t + \phi) $$ $$ \omega = 2 \pi f $$ $$ \begin{aligned} & A - \text{amplitude} \\ & \omega - \text{angular frequency} \\ & f - \text{frequency in Hz} \\ & \phi - \text{phase} \end{aligned} $$

In practice this means that:

  • $A$ - can be used to adjust the volume; the value ranges from 0.0 (mute) to 1.0 (full volume)
  • $\omega$ - is based on the frequency, and so can be used to change the frequency of the wave
  • $\phi$ - phase describes how far the wave is shifted from the origin

In trigonometry, calculations are performed in radians, the standard unit of plane angle in the SI system (International System of Units). The sine function is periodic and repeats itself after $360^\circ$, i.e. $2 \pi$ radians. $$\sin(0) = \sin(2 \pi) = \sin(4 \pi) = 0$$

Angular frequency $\omega$ is strictly related to frequency but is expressed in radians per second. A frequency of 1 Hz means there is 1 period per second. $$\omega = 2 \pi f$$ $$f = 1 \text{Hz}$$ $$\omega = 2 \pi \cdot 1 = 2 \pi$$ An angular frequency of $2 \pi$ radians per second means that the wave repeats itself every second (1 Hz).

Phase $\phi$ will come in very handy later to calculate the value of a single wave at a specific point within a period - between $0$ and $2 \pi$ radians.

The code

First, I declare a struct representing a sine wave oscillator for a single frequency:

// SinOscillator generates sinusoidal waveforms for audio synthesis.
// It maintains phase information to produce continuous sine waves at a specified frequency.
type SinOscillator struct {
	// amplitude is the amplitude of the oscillator's waveform, between 0 and 1
	amplitude float64
	// frequency is the oscillator's frequency in Hz
	frequency uint16
	// angularFrequency is the oscillator's angular frequency in radians per second
	angularFrequency float64
	// phase is the current, internal phase angle in radians
	phase float64
	// phaseStep is the phase increment per one sample, calculated as angular frequency / sample rate
	phaseStep float64
}

To create a new SinOscillator, I need a constructor function:

func newSinOscillator(amplitude float64, frequency uint16) *SinOscillator {
	if amplitude < 0 || amplitude > 1 {
		panic("amplitude must be between 0 and 1")
	}

	angFreq := angularFrequency(frequency)
	return &SinOscillator{
		amplitude:        amplitude,
		frequency:        frequency,
		angularFrequency: angFreq,
		phase:            0,
		phaseStep:        angFreq / float64(sampleRate),
	}
}

func angularFrequency(f uint16) float64 {
	return 2 * math.Pi * float64(f)
}

The constructor does the following:

  • it validates that the amplitude is between 0 and 1
  • it calculates the angular frequency as $2 \pi f$
  • it calculates phaseStep based on the audio driver sampleRate (44100 Hz)
  • it returns a pointer to a new SinOscillator instance

Sampling rate — can be thought of as the “resolution” of the audio signal.
In the digital world, audio cannot be represented as a continuous signal; instead it is a discrete sequence of samples. Using a value of 44.1 kHz means that the audio driver samples the signal 44100 times per second. 44.1 kHz is roughly twice the maximum frequency that a human can hear (~20 kHz), which per the Nyquist theorem is the minimum rate needed to capture that range, and it is pretty much the standard for audio processing.

Phase step — having this allows removing the time variable $t$ from the equation (literally):

  • angFreq := angularFrequency(frequency) — that many radians are there in one second to produce a signal with the given frequency
  • const sampleRate = 44100 — oscillator calculates the value of sine function 44100 times per second
  • phaseStep: angFreq / float64(sampleRate) - that many radians are there in one sample

Based on that, we can shift the phase $\phi$ by angFreq / sampleRate radians each time a single sample is calculated.

Getting the next value from the oscillator is pretty straightforward now:

func (s *SinOscillator) next() float64 {
	s.phase += s.phaseStep
	if s.phase >= 2*math.Pi {
		s.phase -= 2 * math.Pi
	}
	return s.amplitude * math.Sin(s.phase)
}
  1. Move the phase by phaseStep radians.
  2. If the phase is greater than $2 \pi$ radians, “move it back” by $2 \pi$ radians.
    The sine function is periodic, so $\sin(2 \pi + 1) = \sin(2 \pi + 1 - 2 \pi)$.
  3. Calculate the value of the sine function at the current phase, scaled by the amplitude.

Probably, we should first calculate the value and only then move the phase - I will fix that another time :)

Playing sound through oto

At this point, I am able to generate a sine wave of any frequency, but I am still in the “digital domain.” The next step is to somehow push the samples into the audio driver to hear the sound.

Configuring Oto to play the sound using streaming (based on the official docs):

const sampleRate = 44100 // 44.1 kHz
const bufferSizeSamples = 4096 // audio driver buffer size
const hardwareBufferSize = 50 * time.Millisecond // length of the operating system buffer
const channels = 1 // 1 - mono, 2 - stereo

func main() {
	oscillator := newSinOscillator(0.2, 440)

	ctxOptions := &oto.NewContextOptions{}
	ctxOptions.SampleRate = sampleRate
	ctxOptions.ChannelCount = channels
	ctxOptions.Format = oto.FormatSignedInt16LE
	ctxOptions.BufferSize = hardwareBufferSize

	otoCtx, readyChan, err := oto.NewContext(ctxOptions)
	if err != nil {
		panic("Creating oto context failed: " + err.Error())
	}
	// Wait for the hardware to be ready
	<-readyChan

	player := otoCtx.NewPlayer(oscillator)
	player.SetBufferSize(bufferSizeSamples)
	player.Play()

	for player.IsPlaying() {
		if err := otoCtx.Err(); err != nil {
			panic("oto error: " + err.Error())
		}
		time.Sleep(10 * time.Millisecond)
	}
}
  • bufferSizeSamples and hardwareBufferSize were adjusted by trial and error to avoid audio artifacts.
  • ctxOptions configures the basic audio options, e.g., the 44.1 kHz sampling rate.
  • oto.NewContext creates a new oto.Context instance; the code then waits on readyChan until it is closed, which means the context is ready.
  • otoCtx.NewPlayer creates a new oto.Player, initializes it with the oscillator instance, and starts playback.
  • the final loop prevents the program from exiting and surfaces any errors that occur during playback.

Audio format - bit depth

You might have noticed that I used ctxOptions.Format = oto.FormatSignedInt16LE in the configuration. Oto supports three different audio formats:

  • FormatUnsignedInt8 - 8-bit unsigned integers (0 to 255), the lowest precision, “retro” sounding
  • FormatSignedInt16LE - 16-bit signed integers, little-endian (-32768 to 32767), the standard, high-quality format
  • FormatFloat32LE - 32-bit floating-point numbers, little-endian (-1.0 to 1.0), “studio” quality format

Little-endian means that the least significant byte is stored (sent) first.

Since the output of the SinOscillator is a float64, I guess it would be easier to just convert it to float32 and use FormatFloat32LE format - I will fix that in the future too.

Bit depth together with the sampling rate acts like an audio resolution:

  • sampling rate - how many samples per second, how dense is the audio signal
  • bit depth - how many bits per single sample, how precise is the sample

The following function converts the value of a sample from float64 (-1 to 1) into int16 (scaled by math.MaxInt16, so -32767 to 32767):

func (s *SinOscillator) nextSignedInt16() int16 {
	return int16(math.Round(s.next() * math.MaxInt16))
}

Oto Player

To play the actual sound, the audio buffer must be populated with some samples. I do that by telling otoCtx.NewPlayer(oscillator) to use my instance of SinOscillator as its source.

This creates an oscillator for 440 Hz tone and volume of 0.2. 440 Hz is the standard pitch for “A4” (the note A above middle C), which gives the frequency some musical context.

func main() {
	oscillator := newSinOscillator(0.2, 440)
	...
	player := otoCtx.NewPlayer(oscillator)
}

That is only possible because the NewPlayer function expects an io.Reader (a standard Go interface) as a parameter.
See the docs: oto package - github.com/ebitengine/oto/v3 - Context.NewPlayer:

// The format of r is as follows:
//
//	[data]      = [sample 1] [sample 2] [sample 3] ...
//	[sample *]  = [channel 1] [channel 2] ...
//	[channel *] = [byte 1] [byte 2] ...
//
// Byte ordering is little endian.
func (c *Context) NewPlayer(r io.Reader) *Player {
	return &Player{
		player: c.context.mux.NewPlayer(r),
	}
}

Let’s also inspect io.Reader in the official Golang docs: https://pkg.go.dev/io#Reader — there is quite extensive documentation on how to implement it properly; here is a fragment:

// ...
// Read reads up to len(p) bytes into p. It returns the number of bytes
// read (0 <= n <= len(p)) and any error encountered. Even if Read
// returns n < len(p), it may use all of p as scratch space during the call.
// ...
// If len(p) == 0, Read should always return n == 0. It may return a
// non-nil error if some error condition is known, such as EOF.
// ...
// Implementations must not retain p.
type Reader interface {
	Read(p []byte) (n int, err error)
}

Requirements
In practice, this means that func (s *SinOscillator) Read(p []byte) (n int, err error) must:

  • use signed 16-bit integer format for representing samples
  • send the samples in little-endian
  • take p []byte buffer and populate it with samples
  • return n int with the number of bytes written to p

Implementation of io.Reader for s *SinOscillator:

func (s *SinOscillator) Read(p []byte) (n int, err error) {
	pLength := len(p)
	pIdx := 0

	// loop while there is room for a full 2-byte sample
	// (stops when pIdx+1 reaches pLength, e.g. pIdx=4, pIdx+1=5, pLength=5)
	for pIdx+1 < pLength {
		sample := s.nextSignedInt16()
		p[pIdx] = byte(sample)
		p[pIdx+1] = byte(sample >> 8)
		pIdx += 2
	}
	return pIdx, nil
}
  • pLength and pIdx track the length of the buffer and the current position while “traversing” it
  • s.nextSignedInt16() gets the next sample from the oscillator
  • the two assignments “split” the 16-bit integer into two bytes, storing the least significant byte first
  • pIdx advances by two each iteration; it ends up equal to the number of bytes written, so it can be returned as n

If we were to print the size of the buffer len(p), it should (though it need not) be equal to the bufferSizeSamples we declared earlier, so 4096.
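One subtlety worth checking: the loop condition pIdx+1 < pLength guarantees the method never writes half a sample, so an odd-length buffer leaves its last byte untouched. A standalone sketch of that behavior, with a trivial constant source standing in for the oscillator:

```go
package main

import "fmt"

// read fills p with 16-bit little-endian samples taken from next,
// mirroring the loop structure of the Read method above.
func read(p []byte, next func() int16) (n int) {
	pIdx := 0
	for pIdx+1 < len(p) {
		sample := next()
		p[pIdx] = byte(sample)        // least significant byte first
		p[pIdx+1] = byte(sample >> 8) // most significant byte second
		pIdx += 2
	}
	return pIdx
}

func main() {
	next := func() int16 { return 0x0102 } // constant stand-in sample

	even := make([]byte, 6)
	odd := make([]byte, 5)

	fmt.Println(read(even, next)) // 6: three full samples written
	fmt.Println(read(odd, next))  // 4: two full samples, last byte unused
}
```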

That is it! That code is enough to play a 440 Hz tone through the speakers of your computer.

Run the code yourself

You can grab the source code corresponding to this post from my GitHub repository. Running it on macOS is as simple as cloning the repo, installing Go and Make, and then using make:

git clone https://github.com/kamil-duda/synthwave.git && cd synthwave
brew install go
brew install make
make run

Summary

This is the end of my first blog post in this new series about building a sound synthesizer in Go.

I described my goals for this project. Together we learned about the mathematical basis of sound and sine waves: radians, angular frequency, and phase.

I showed how to generate sine waves programmatically and then how to use the Ebitengine Oto library to abstract the audio driver and the OS to play the sound. That involved some math, knowledge of Go’s io.Reader interface, and how data is stored in binary format.

I am looking forward to your comments and feedback!