My audio automatic gain control sw driver (AGC) notes

Contents

1 Intro
2 Signal flow
- 2.1 Two PCM-TDM systems
- 2.2 Scaling
3 Rationale for AGC
4 Specification
5 Implementation
6 Signed formats
- 6.1 s number format
7 The tin box
8 The code
9 AGC on spectrums for the scope
10 References

Blog note new 10Feb2024. Updated 18May2024. This note is in group Technology, sub-group My Beep-BRRR pages.

Intro

In the log-like and rather extensive blog note My Beep-BRRR notes I have tried to explain what Beep-BRRR (or Beep-BRRR-proto) is about. In short, at My Beep-BRRR notes (some log & movies) there are movies showing that the unit now is able to «hear» a recorded sound (a «beep») and trigger an output. This is connected via an audio cable to a clock and alarm unit, consisting of that clock with sound and light plus a separate bed shaker (which «BRRR»s). Nice for people with a hearing deficit, who might sleep better when they are not afraid of missing the door bell or the local fire alarm. Not «or» but rather «and»: it listens for like 8 sounds simultaneously. It’s also a nice pastime for retired me, with lots of difficult stuff to dig into. Then, when I moved to a different board without an integrated microphone, I discussed microphones etc. at My MEMS microphones notes etc. The new unit I have called Beep-BRRR-2, or even Beep-BRR2. But the project is not only a port onto that board. There also is a need to «hear» better through a closed door. Therefore I got stuck for some time with this automatic gain control (AGC) stuff. Now is perhaps the time to spell this out in a note. I have been coding so much now, I need to assemble the results to clear my head. At work we called it documentation, which I guess it is. ~~So far I haven’t published the code, but I might.~~

Signal flow

Fig.1 – The Beep-BRRR-2 signal flow (PDF)

Two PCM-TDM systems

I needed PCM/TDM for (1) the mic and (2) the headset.

The Beep-BRRR-proto version of this is at 219:[Signal flow]. The main difference is that I now am picking out the TDM bits from the microphone and outputting PCM frames to a headset amplifier with my own code. TDM and PCM are kind of the same thing, as [1] somewhat rewritten says: «One 32-bit PCM code from each channel (64-bits total) is called a TDM frame». In my case the mic delivers 24 bits and the headset DAC (with PLL) accepts 32, 24, 20 or 16 bits. See 243:[Forums] for the process I went through in several forums.

Since the internal user-controlled PLLs in the XMOS X1-X3 processors are not really suitable as sources for port timing, the XMOS board that Beep-BRRR-proto runs on uses an on-board PLL chip, external to the processor. (XMOS / XCore blogs are My XMOS pages). Now I wanted to avoid this (even if I bought an Si5351A based PLL breakout board from Adafruit to keep just in case).

The programmer in me wanted to find out if clean xC code, running alone on one (or rather two) of the 16 logical cores, could be time-wise precise enough to handle 16 kHz frames. (The resulting AGC-only xC source file could with little change become a C file by just renaming the file). I needed to generate the double frequency accurately enough for two edges per 1/1024 kHz, ie. at most one edge per 488.28.. ns. The internal divisor gives me 10 ns resolution of xC timers and I didn’t want to be almost at 16 kHz, but exactly. I let the pulsing be table driven, with a table that repeats for every 32 bits at exactly 31.250 µs. The final 0 there shows that I’m on 10 ns sharp. This worked very well. I used xC timerafter only, and not the advanced features of XCore ports, which together with an external PLL would however have taken me much further up in frequency.
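A minimal sketch of how such a table could be filled, in plain C. This is not the table from my code; it only assumes what is stated above: 10 ns timer ticks, and 32 bits (64 clock edges) that must take exactly 31.250 µs, ie. 3125 ticks. The names fill_edge_table, EDGES_PER_TABLE and TICKS_PER_TABLE are made up for the illustration. Each edge is placed at the nearest 10 ns tick to its ideal position, so the individual intervals become 480 or 490 ns while the sum stays exact.

```c
#include <stdint.h>

#define EDGES_PER_TABLE 64u    /* 32 bits x 2 clock edges per bit */
#define TICKS_PER_TABLE 3125u  /* 31.250 us expressed in 10 ns timer ticks */

/* Fill delta_ticks[] with the number of 10 ns ticks to wait before each
   edge, so that edge k lands at round(k * 3125 / 64) ticks from the start. */
static void fill_edge_table(uint8_t delta_ticks[EDGES_PER_TABLE])
{
    uint32_t prev = 0;
    for (uint32_t k = 1; k <= EDGES_PER_TABLE; k++) {
        /* Ideal position of edge k, rounded to the nearest tick */
        uint32_t pos = (k * TICKS_PER_TABLE + EDGES_PER_TABLE / 2) / EDGES_PER_TABLE;
        delta_ticks[k - 1] = (uint8_t)(pos - prev);
        prev = pos;
    }
}
```

In a timer-driven loop each table entry would be added to the time that the next timerafter waits for, and the table index wraps after 64 edges.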

I watched signals on the scope, but more trust was put in the XMOS xta timing analysis tool, part of the xTIMEcomposer 14.4.1 toolset (which I still use). I set up a #pragma for the desired 488 ns, and it gave me much margin.

Scaling

I now had two xC tasks on two logical cores, and then a third task collects and scales the samples. This also does buffering and the AGC. I will come back to this task, and especially the AGC algorithm in itself. The rest of Fig.1 is not described here.

Rationale for AGC

In Beep-BRRR-proto I struggled with getting the unit to hear and detect sounds through a closed door. Or rather, starting at the door bell, through a 6 m long corridor with sound traps into three connecting «open» rooms, the closed door and across the room onto the Beep-BRRR-proto unit. This was in a bedroom with a large bed with a duvet, so also that room has some attenuation of the sound level. However, Beep-BRRR-proto did hear through the door itself when I forced a recorded sound through a small bluetooth speaker just outside the door. In 219:[22] («THE AUDIBILITY OF SMOKE ALARMS IN RESIDENTIAL HOMES») I read that:

«If a door between two rooms is closed, the sound level is attenuated (reduced) by 10 dBA. If a home does not have a forced air heating/cooling system, the sound level is attenuated by an additional 6 dBA (P27). Closing a lightweight door attenuates a smoke alarm signal from one room to another room between 10 to 20 dBA.» (P40)

This should basically not be a problem provided the sound level onto the door is adequate. In addition, I assume that from the door bell to the unit we would have more attenuation. The same reference also shows a lot of spectral diagrams. I have wondered whether resonance of the veneer in the door would garble the spectrum so much that, at low signal levels, comparing with the recorded sounds would be more problematic.

In other words, maybe turning the volume up when it’s silent is a good idea? I have two parameters here:

When AGC: the wanted max level of the audio samples. Samples will be scaled up or attenuated down to this max level
Switching AGC on and off

Plus I needed to find out when to switch this on and off (more or less) automatically.

Specification

It shall be possible to switch AGC on and off at run-time
Input data are the analogue samples from the mic, at most int32_t wide (also called q1_31)
If stereo, same AGC on both channels
It shall be possible to set the max level of abs(sample) at run-time. Only «up to but not equal to» a positive limit and only «down to but not equal to» the negative of the same limit
This limit is absolute, no sample is allowed to pass these limits
Changing the gain is per bit, ie. about +6dB (doubling) or -6dB (halving) per step, ie. only by signed shift left or up as << or signed shift right or down as >>. Or unchanged, of course. No gain by multiplying by arbitrary value and dividing by arbitrary value (ie. no running fraction)
There is no requirement as to attack and decay timing
AGC should be changed as seldom as a certain algorithm manages (hysteresis, window)
The signal processing (like FFT) is the receiver of the AGC’ed samples. The samples for a certain FFT (window) shall not have gain changed during that sequence. AGC parameters are only allowed to change between DSP windows
The AGC is not allowed to take the gain up so much that the DSP overflows (*)

(*) At the moment I don’t know what the max level might be, only that Beep-BRRR-proto is based on an XMOS application note where the full range signed 32 bits q1_31 seems to be allowed, with no downscaling. But I do notice that the spectrums smear out for a single sine when the internal gain is increased above some level. Smearing out means overtones caused by distortion. It looks like that level is around q6_26 (signed 27 bits). I call this +18 dB, with 0 dB at signed 24 bits q9_23, +6 dB at q8_24, +12 dB at q7_25 and then 18 dB at q6_26, ie. 6 dB per doubling, which is about right. I will show below how the sign is represented with all the higher bits above the data bits.
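The «6 dB per doubling» figure can be checked with the standard decibel formula for amplitude ratios. The symbol k is introduced here only for the illustration and means the number of bit positions shifted up from q9_23:

```latex
G(k) = 20 \log_{10}\left(2^{k}\right) = k \cdot 20 \log_{10} 2 \approx 6.02\,k \ \text{dB},
\qquad G(3) \approx 18.06 \ \text{dB}
```

So the three doublings from q9_23 to q6_26 give about 18 dB.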

Implementation

Since I do time windows of 32 ms at 16 kHz sharp I would have 512 samples per spectrum. See Fig.1. This gives me a spectral resolution of 15.625 Hz/bin, up to 4000 Hz. I feed two real, single-component arrays of 512 samples into each FFT. The FFT treats them as one complex, two-component input, and the resulting complex spectrum is then split into two single-component real spectrums, one per 32 ms array. (Covered in note 219, search for dsp_fft_split_spectrum.)

This means that I need to find min and max values over 512 samples before I send that array for processing. In other words, the AGC causes a delay of 32 ms caused by the buffering needed.

It is possible to find the min and max values of those 512 analogue data by doing a standard min and max calculation. However, since the spec says scaling by the bit (point 6 of the specification), then doing NumLeadingZeros (abs (512_values)) will do. See 245:[XCore Exchange, point 5] about this.
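The leading-zeros idea above can be sketched like this in plain C. It is a sketch under assumptions and not my actual code: the function names are made up, and the target is taken to be the 24-bit signed level (q9_23 above), ie. at least 9 leading zeros in abs(sample). OR-ing all the magnitudes together gives a word with the same number of leading zeros as the largest magnitude, so no separate min and max search is needed.

```c
#include <stdint.h>

/* Count leading zero bits of a 32-bit value (32 when v is 0) */
static unsigned clz32(uint32_t v)
{
    unsigned n = 0;
    if (v == 0) {
        return 32;
    }
    while ((v & 0x80000000u) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Magnitude as unsigned, well defined also for INT32_MIN */
static uint32_t abs_u32(int32_t s)
{
    return (s < 0) ? (0u - (uint32_t)s) : (uint32_t)s;
}

/* Smallest number of leading zeros of abs(sample) over the whole buffer */
static unsigned min_leading_zeros(const int32_t samples[], unsigned len)
{
    uint32_t ored = 0;
    for (unsigned i = 0; i < len; i++) {
        ored |= abs_u32(samples[i]);
    }
    return clz32(ored);
}

/* Positive result: shift left that many bits. Negative: shift right. */
static int shifts_to_normal(const int32_t samples[], unsigned len,
                            unsigned target_leading_zeros)
{
    return (int)min_leading_zeros(samples, len) - (int)target_leading_zeros;
}
```

With a largest magnitude of 3000 there are 20 leading zeros, so with a target of 9 the buffer may be shifted up 11 bits. Because the measure is taken on abs(sample), the limit is exclusive on both the positive and the negative side.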

Signed formats

s number format

I here introduce something I will call the s number format. I’d be very surprised if this were not described a hundred years ago. I take its form from the Q (number format), which in its different forms seems to have different versions related to how the sign is described. Even whether the sign is described in the Q format at all.

The s number format does not say anything about whether the number contains a fixed point. Let’s forget it. It simply goes like this, for e.g. 32 bits signed 2’s complement data formats: negative values have 1 to (N-1) sign bits, all with value ‘1’. Positive numbers have the same number of sign bits, but they are all zeros.

Arguing against this is that it doesn’t introduce anything, or it’s even wrong, because no matter how you see it then BIT31 is all we need to tell the sign. That said, BIT31 alone tells zilch about the max value of the number format.

Therefore I (think I) need this so that I can understand all the ones that appear in a negative number that’s been divided. It’s also nice to know this, because it’s related to the number of leading zeroes of the absolute value of a signed number. More later.

s1_31 is the same as int32_t, one bit is sign and 31 bits are data
Divide the above by 16 or arithmetic shift right by four and we end up with
s5_27 which would have 5 sign bits (sign field) and 27 data bits
s31_1 would contain [-1,0,1] – not very useful I assume
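A small sketch of how the width of the sign field could be counted for a 32-bit value. The function name sign_field_bits is made up for this illustration. It counts the leading bits that equal BIT31, BIT31 included, and stops at N-1 = 31 so that at least one data bit always remains, as in the definition above.

```c
#include <stdint.h>

/* Number of leading bits equal to the sign bit, the sign bit included.
   The result is 1..31, so sX_Y has X = result and Y = 32 - result. */
static unsigned sign_field_bits(int32_t v)
{
    uint32_t u = (uint32_t)v;
    uint32_t sign = u >> 31;   /* 0 for positive, 1 for negative */
    unsigned n = 1;            /* BIT31 itself */
    for (int bit = 30; bit >= 1; bit--) {
        if (((u >> bit) & 1u) != sign) {
            break;             /* first data bit that differs from the sign */
        }
        n++;
    }
    return n;
}
```

A full-range int32_t value gives 1 (s1_31), the same value divided by 16 gives 5 (s5_27), and -1 gives 31 (s31_1).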

Fig.2 – Sign bit(s) of negative values

I wrote this paragraph before I came up with the s number format, so this is how I came to it. Fig.2 shows something I have always known. However, when drawing the figure I realised that this must have been hibernating knowledge. I even had to look up arithmetic shift on Wikipedia to be sure. Yes, shift operators in C and related languages do arithmetic shifts on signed values. (Strictly speaking the C standard leaves >> of a negative value implementation-defined, but an arithmetic shift is what compilers do in practice.) Shift down or right of a signed value with >> does not destroy the sign of the value, positive or negative (*). It does the same as div by 2, except that it rounds towards minus infinity where C’s / rounds towards zero: -1 >> 1 is -1, but -1 / 2 is 0. Doubling is a no-brainer for shift up or left with << (or mult by two) (what else could it do?), but my brain needed a brush up for the >>, where the leftmost bit (0 for positive and 1 for negative) is always filled in with the same value. No matter if the implementation is with a barrel shifter for one cycle or looping for N cycles. I think the MCS-51 processor of the eighties looped. Remember this when coding: value = value << numbitstoshift and value = value >> numbitstoshift, ie. the numbitstoshift is always on the right side of the expression. The compiler probably wouldn’t complain about erroneous coding if neither is const.
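One detail is worth checking on the compiler at hand: for odd negative values an arithmetic shift right and a division by two do not round the same way. A minimal sketch, with made-up function names, assuming that >> on a negative value is an arithmetic shift (implementation-defined in C, but the usual behaviour):

```c
#include <stdint.h>

/* Halve by shifting: rounds towards minus infinity.
   Assumes >> of a negative value is an arithmetic shift. */
static int32_t halve_by_shift(int32_t v)
{
    return v >> 1;
}

/* Halve by dividing: C's / truncates towards zero */
static int32_t halve_by_div(int32_t v)
{
    return v / 2;
}
```

Even values give the same result both ways (-8 becomes -4). For -7 the shift gives -4 and the division gives -3, and for -1 the shift gives -1 and the division gives 0.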

However, beware of the below, as also stated in the figure. I have also added this in a talk page on Wikipedia, see Two’s complement «double meaning» of MSBit shifted in.

The Q (number format) like q8.24 (1 sign bit, 7 integer bits and 24 fraction bits) contains integer info even after «shifting the sign in and down». The magic of two’s complement is that those very «sign bits», except for the leftmost, will indeed also have a say for the integer value: for a negative value, a zero in bit n means that 2exp(n) is subtracted, on top of the -1 that all ones represent. This goes for all Q formats, from q1_31 (int or signed) and «down» in formats. If the «sign» bits are 1 they represent that something is not present, that’s why they can have this «double meaning». Example:

0x80000000 (q8.24 -128.000000 signed -2147483648) (INT_MIN), divided by 256 or shifted right 8 bits is
0xFF800000 (q8.24 -0.500000 signed -8388608) with 9 «sign bits» as well as 1 sign bit + 8 value bits.
0xBFFFFFFF (q8.24 -64.000000 signed -1073741825) has first integer bit zero, meaning present: -(2exp(30))-1 = -1073741824-1
0xFFFFFFFF (signed -1) has 31 «sign bits» and 1 integer bit

(*) No matter how many bit positions the sign field takes (one for signed or int32_t or s1_31, nine for s9_23).
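The example values above can be checked with a few lines of C. This is only a sketch with made-up names. It assumes 24 fraction bits, which is what the printed values -128.0, -0.5 and -64.0 correspond to. The second function rebuilds a negative value the «double meaning» way: start at -1 (all ones) and subtract 2exp(n) for every bit n that is zero.

```c
#include <stdint.h>

#define FRACTION_BITS 24  /* assumption: 24 fraction bits */

/* Interpret a raw 32-bit signed value as a fixed point number */
static double q_to_double(int32_t raw)
{
    return (double)raw / (double)(1u << FRACTION_BITS);
}

/* Value of a negative two's complement bit pattern (BIT31 set):
   -1 for all ones, minus 2^n for each zero in bits 0..30 */
static int64_t negative_value_from_zero_bits(uint32_t bits)
{
    int64_t value = -1;
    for (unsigned n = 0; n < 31; n++) {
        if (((bits >> n) & 1u) == 0) {
            value -= (int64_t)1 << n;
        }
    }
    return value;
}
```

For 0xBFFFFFFF only bit 30 is zero, so the value is -1 - 1073741824 = -1073741825. For 0xFF800000 bits 0 to 22 are zero, which sums to -8388608.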

The tin box

Fig.3 – The microphone (top left) and boards, and chocolate box

I took the Beep-BRRR-proto from a very technical transparent plastic box to a tin box; the chocolate gone after 24 hours. Now it can anonymously reside on the bedside drawer and almost not be noticed. The microphone you will notice as the small transparent box on the top left of the right photo. The sound input hole in the mic is aligned with a larger hole in the tin. The ferroelectric RAM (FRAM) board may be spotted, the 16 logical core XMOS X2 processor and then the headset DAC.

The good thing is that the new version, with AGC, now detects sounds from a much lower volume than the prototype did. I have not retrofitted the prototype with the new AGC functionality, simply because I haven’t. It’s nice to have the proto as a reference – as it is. The tin box Beep-BRRR-2 now has started its functional life, and the proto is back on the lab bench, together with a second Beep-BRRR-2, which lives folded out, not in any tin box.

The code

The purpose of this is to avoid new gain values being set if not absolutely necessary, according to some criterion. It’s these criteria I try to discuss here.

When I had come to some point in developing the algorithm I could hear in the headset that on almost every 32 ms frame there was a new gain, whenever I tested with music. I thought that this was not strictly necessary. So I thought hysteresis was the solution and dived into coding it. Then I discovered that this wasn’t what I wanted. I will never allow overflow, but sticking to some earlier value of the scaling parameter until the new value had proven itself to some level of stability was what I was after.

Fig.4 – Overview of scaling to s9_23

(Fig.4 as PDF, press figure.) If the previous scaling was within the same window as the present value, then there was no reason to change. I hope both these code figures explain themselves, to avoid the thousand words.
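For readers without the figures, here is a hypothetical sketch of what a window rule of this kind could look like. It is not the code in Fig.5. The name next_stn and the constant WINDOW_BITS are assumptions made for the illustration. The rule is: lower the gain at once if the new buffer would overflow, keep the previous shift while the new requirement is within the window, and follow the new value only when the signal has become much quieter.

```c
#include <stdint.h>

#define WINDOW_BITS 2  /* hypothetical: unused head-room tolerated before the gain is raised */

/* current_stn: shift used for the previous buffer
   needed_stn:  largest shift that does not overflow for the new buffer
   Returns the shift to use for the new buffer. */
static int next_stn(int current_stn, int needed_stn)
{
    if (needed_stn < current_stn) {
        return needed_stn;   /* would overflow: lower the gain at once */
    }
    if (needed_stn - current_stn <= WINDOW_BITS) {
        return current_stn;  /* inside the window: leave the gain alone */
    }
    return needed_stn;       /* much quieter: raise the gain */
}
```

With a previous shift of 5, a new requirement of 4 gives 4, a requirement of 5, 6 or 7 keeps 5, and a requirement of 8 gives 8.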

Fig.5 – The window code (in xC)

(The code as PDF, press fig.5.) The above code is called by the xC task that receives the samples from the mono microphone at 16 kHz, ie. every 62.5 µs. The new sample is put into the input buffer, while at the same time the next sample to use or send off is taken from the output buffer, delayed by 512 samples, ie. indexed by the same index as the input buffer. All values thus sent off from the output buffer are shifted up or down by the stn (shifts to normal) number of bit positions. This is calculated in the above code whenever one buffer is full, for the same buffer when it has the role of an output buffer. Thus the samples sent off, per buffer, will never be scaled above the max limit.
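The buffering described above can be sketched in plain C as follows. This is an illustration under assumptions, not the code in the figure: the type agc_t and the function names are made up, the target is s9_23, and the window logic is left out so that stn is taken directly from the leading zeros of each full buffer.

```c
#include <stdint.h>

#define BUF_LEN          512  /* samples per 32 ms buffer at 16 kHz */
#define TARGET_SIGN_BITS 9    /* scale towards s9_23 */

typedef struct {
    int32_t  buf[2][BUF_LEN]; /* the two buffers swap roles when one is full */
    int      stn[2];          /* shifts to normal, one value per buffer */
    unsigned in;              /* which buffer is being filled */
    unsigned idx;             /* common index into both buffers */
} agc_t;

/* Largest shift that keeps every sample of the buffer inside s9_23 */
static int compute_stn(const int32_t s[], unsigned len)
{
    uint32_t ored = 0;
    for (unsigned i = 0; i < len; i++) {
        ored |= (s[i] < 0) ? (0u - (uint32_t)s[i]) : (uint32_t)s[i];
    }
    int lz = 0;
    while (lz < 32 && ((ored << lz) & 0x80000000u) == 0) {
        lz++;
    }
    return lz - TARGET_SIGN_BITS;
}

/* Shift up (written as a multiplication so negative samples are well
   defined in C) or shift down (assumes arithmetic shift right) */
static int32_t apply_stn(int32_t sample, int stn)
{
    if (stn >= 0) {
        return sample * ((int32_t)1 << stn);
    }
    return sample >> (-stn);
}

/* Called once per sample: stores the new sample and returns the sample
   from 512 samples ago, scaled with the stn of its own buffer */
static int32_t agc_process(agc_t *a, int32_t in_sample)
{
    unsigned out = a->in ^ 1u;
    int32_t out_sample = apply_stn(a->buf[out][a->idx], a->stn[out]);
    a->buf[a->in][a->idx] = in_sample;
    if (++a->idx == BUF_LEN) {
        a->stn[a->in] = compute_stn(a->buf[a->in], BUF_LEN);
        a->in = out;          /* the full buffer becomes the output buffer */
        a->idx = 0;
    }
    return out_sample;
}
```

Starting from a zeroed agc_t, the first 512 calls return 0, and from call 513 the first buffer comes out scaled by one common shift.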

This seems to work rather well, even if it still sounds rather weird in the headset. But the DSP will later compare 32 spectra of 32 ms each (1024 ms), and thus the only important thing is that the scaling does not change anywhere else than when going from one buffer to the next.

AGC on spectrums for the scope

I have now also implemented AGC for the spectrum output to the oscilloscope. Every 100 µs I output a spectral component value to one of the four channels on a DAC4 CLICK board from MikroElektronika, containing an MCP4728. 100 µs times 256 values takes 25.6 ms, in time for the next spectrum to arrive for the next 32 ms frame. I also have a digital output to trigger the scope with. This goes to 4000 Hz with 4000 / 256 = 15.625 Hz / bin.

AGC on that output, with a new non-overflow gain for every frame, is nicer than I thought. Usually I like scope gain to stay put, but not this time. It’s nice to see the noise from the mic, the noise from the fan of the scope and up to full volume. I use pos[0] for a 100% DAC output and pos[1] for a scaling indicator. 50% down means the scaling is a divide by two, 25% down a scaling by div 4 etc.
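A per-frame, non-overflow gain of this kind could be sketched as below. This is a hypothetical illustration and not my scope code: it assumes unsigned 32-bit spectral magnitudes and the 12-bit resolution of the MCP4728, and the function names are made up. The shift is chosen so that the largest of the 256 values just fits in 12 bits.

```c
#include <stdint.h>

#define NUM_BINS 256
#define DAC_BITS 12   /* the MCP4728 is a 12-bit DAC */

/* Positive result: shift right that many bits. Negative: shift left. */
static int scope_shift(const uint32_t bins[], unsigned len)
{
    uint32_t ored = 0;
    for (unsigned i = 0; i < len; i++) {
        ored |= bins[i];
    }
    int used = 0;     /* number of bits the largest value occupies */
    while (used < 32 && (ored >> used) != 0) {
        used++;
    }
    return used - DAC_BITS;
}

/* Scale one spectral value to a DAC code with the frame's shift */
static uint16_t scope_dac_code(uint32_t bin, int shift)
{
    return (uint16_t)((shift >= 0) ? (bin >> shift) : (bin << -shift));
}
```

A frame with a largest value of 100 is shifted up 5 bits (to 3200), and a frame with a largest value of 5000 is shifted down 1 bit (to 2500), so both use most of the 0 to 4095 range.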

The AGC code in this case has no window like the one described above. I have some preprocessor defines to switch between AGC and some (but which?) defined gain.

References

Wiki-refs: Arithmetic shift, Automatic gain control, Barrel shifter, Companding, Phase-locked loop PLL, ..

[1] Communication Systems, PCM-TDM System by Mandeep Kaur, (2016), see http://ecedunia.blogspot.com/2016/03/pcm-tdm-system.html

Øyvind Teig