My neural network models notes


Started 29Dec2025, updated 01Mar2026 (DNPU). Finished? This note is in groups Technology (grouped: TECH NOTES) and My XMOS pages, plus NOTES FROM THE VAULT.

Intro

What is actually under the hood of an embedded AI (Artificial intelligence, or KI «kunstig intelligens» in Norwegian) and ML (Machine learning) processor? What was (suggested) then, and what is now? And how might this «engine» be used? What is meant by «neural processor/engine» and «neural network»?

This note is not about the AI or ML methodologies and algorithms, or perhaps rather, procedures. At my level I guess I would learn more about this if I ever take to using TensorFlow. I am still editing this note to catch any previous statement that might contradict my present understanding! My use of terminology, or lack of correct or consistent terminology, would also reflect this. Please mail me or comment below if you find anything unbearably naïve.

I now have a board on my desk which uses about 500 mW, containing a processor that has at least one sub-unit meant to speed up embedded ML and AI. The idea behind this note is to try to learn something about where this processor sits in the broader picture. My starting point may be modelled as «square one». (It is the XCORE.AI evaluation kit. More later.)

More and more transistors

Fig.2 OC71, look-inside OC71, OC72 and BCY11 transistors

As a new teenager (1963) I built a small amplifier with transistors, using the plastic-encapsulated and metal-encapsulated OC71 and OC72. (No Wikipedia page!) (Update: searching through my boxes, it was the BCY11 that looked so nice. I have also kept an OC71 that I removed the black lacquer from, testing its light sensitivity. See photo.) They were both PNP audio transistors with grounded plus, made for sound up to 25 kHz. I had been very careful with the soldering iron, directing the heat away from the delicate transistors. Still my sound amplifier stubbornly did not play, even though the repairman Johansen, at the repair shop on the first floor of Arne Hansen (Hamar: here), for free had said it did work. He became one of my adult role models, always taking the time for this strange boy. I liked him. I gave up on these delicate transistors. Probably a year later I built an oscilloscope with tubes only (three EF184, see Radiobyggboken). In retrospect, with that scope I could have tested the failed amplifier. From the shelves containing category «621» books at the local library I had learnt that I could connect the individual triodes of an ECC83 back to back as a bistable flip-flop, so that they triggered each other, to store «a bit». I did solder one up, and I used a neon lamp to see the bit. I don't think I made a transistor version.

Later, at work, we started using the Intel 8048 processor, with thousands of transistors. Then Inmos Transputers, and later Texas Instruments TMS320 DSPs that collected and did calculations on combustion pressure values from huge engines on board ships, for the Autronica NK-100 and NK-200 products. Even more transistors. And after I retired I have worked almost solely with XMOS XCore processors. (More about this at NOTES FROM THE VAULT.)

The one unavoidable exception from XMOS after I retired is a small Arm-based Arduino board, the Feather M0 from Adafruit, which has a nice development system and plug-and-play libraries, but no task model, in either software or hardware. Exceptions: interrupts and on-chip units like counters/timers. (That said, using an Arm for ML is possible, see the book [7], which shows how to do this on an Arm Arduino.)

The other day I decided that it was time to jump to the next XCORE generation, so I purchased a 151:[XCORE.AI evaluation kit]. I could possibly use it for some mark 3 of Beep-BRRR, even without ML (Machine learning). Why should any such box need to learn about some sound amid all those different background noises, when I can just take the spectra and analyse them with my proprietary solution? More importantly though, what could I use the DSP and ML hardware for? I know what I can use the hardware tasks (HART) for. But the VPU block?

They say «neural processor», but how «neural» is it?

This note started as a paragraph in 226:[1986 LSI course at NTH], which by coincidence led me to Synaptics. I also read an article in Elektronikk 2025-12-12, «Googles KI beveger seg ut i kanten» («Google's AI is moving out to the edge»), where Synaptics is mentioned, also about their cooperation with Google Research. But then I discovered that I could not really tell the difference between a neural processor for AI and the multiplier-based architecture, also for AI (KI), but perhaps more for machine learning (ML).

My text through Duck.ai (gpt-4o mini)..
.. after I had told it that this was to be an informal text in a blog note, and that I did not want a scientific rewrite. It certainly changed its mind. First my original:

Popularly, if we start with the brain, it consists of neurons. At the end of their arms, the dendrites, which each neuron has some number of, there is a collecting point called the synapse. For a rather slow chemical synapse, when the voltage collected by the synapse reaches −55 mV it kicks the connected dendrite's synapse. It receives this pulse and sends it on to a neighbouring synapse. The much faster electrical synapse is more direct and does not wait for any raised level. When the pulses are received they may be more excited about ringing a bell or feel inhibited about such a thought. This is all biology, chemistry and electricity.

Then Duck.ai's version:

Popularly, if we start with the brain, it consists of neurons. At the end of these neurons' arms, we find dendrites, which each neuron has in various numbers. Here’s where it gets interesting: at the synapse—a kind of collecting point—signals are exchanged. In a slower chemical synapse, when the voltage collected reaches about -55 mV, it essentially gives the adjacent dendrite’s synapse a nudge. This dendrite then receives the pulse and sends it onward to the next synapse.

On the flip side, electrical synapses are the speedsters of the bunch. They don’t wait around for that voltage to climb; they transfer signals directly, making communication much quicker. When these signals arrive, they can spark excitement—think of it like the thrill of ringing a bell—or they can hold back feelings, leading to an inhibited response. It’s a fascinating blend of biology, chemistry, and electricity!

My comment: I will not filter any other paragraph through AI for a rewrite, neither here nor in any other note, without explicitly stating its use.
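The «collect inputs until a threshold is reached, then fire» behaviour of the chemical synapse described above can be toyed with in a few lines of Python. This is a minimal leaky integrate-and-fire sketch of my own; all the constants except the roughly −55 mV threshold are made up for illustration:

```python
# Toy leaky integrate-and-fire neuron, loosely matching the description
# above: input pulses push the membrane potential up, it leaks back
# toward rest, and crossing the (roughly -55 mV) threshold fires a
# spike and resets. Constants are illustrative, not physiological.
REST = -70.0       # resting potential, mV
THRESHOLD = -55.0  # firing threshold, mV
LEAK = 0.1         # fraction of the gap back to rest recovered per step

def simulate(pulses, v=REST):
    """Return a list of (post-step membrane potential, fired?) tuples."""
    trace = []
    for pulse in pulses:
        v += pulse               # excitatory (+) or inhibitory (-) input
        v += LEAK * (REST - v)   # leak back toward the resting level
        fired = v >= THRESHOLD
        if fired:
            v = REST             # reset after the spike
        trace.append((round(v, 2), fired))
    return trace

# Three weak pulses leak away; one strong pulse crosses the threshold.
print(simulate([3, 3, 3, 0, 20]))
```

The point of the sketch is only the shape of the mechanism: sub-threshold inputs decay away, and only a sufficiently strong (or rapid) input train makes the neuron fire.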

Scientists have started to make neurons grow onto the open ends of integrated circuits, so that they may control e.g. muscles, or read when a nerve tells a missing foot to engage a prosthesis. This is not what this note is about, because even when there are nerve cells and transistors on the same chip, it's either biology or transistors. Nothing in between.

Modelling

So, not even the neural processors have biological components. They only try to mirror, or model, the brain. Or maybe even model something else that suits the cause.

From Wikipedia, Model I quote:

«Models can be divided into physical models (e.g. a ship model or a fashion model) and abstract models (e.g. a set of mathematical equations describing the workings of the atmosphere for the purpose of weather forecasting). Abstract or conceptual models are central to philosophy of science.»

But then, is it true that all models of the neuron have been abstract or mathematical? What about a transistor circuit that is built as close to the workings of a synapse as possible? Or millions of software synapses that resemble, or model, how real neurons are connected in the brain? Does this mess things up, or do we simply have neural processors that might reflect both physical and abstract models? Remember that a model is never the real thing. But the real thing (me) probably models every (physical?) object, e.g. there is no locomotive, flower or planet in my head. But what about love?

Fig.1 But it is a stupid 120° left tilted ‘P’

Now, this sprang to my mind: René Magritte in 1929 painted a pipe with the text «Ceci n'est pas une pipe» («This is not a pipe») (Wikipedia: The Treachery of Images). Even so, Michel Foucault later said that this still was a statement about pipes – I think, adding to Magritte's intentions (same Wikipedia page).

What about (1): «neither these transistors, the hardware arithmetic 'neural' unit, the software nor the algorithm are neurons»? Or what about some 86 billion equally connected «nerve cells» (= neurons; quotation marks indicating 'modelled'), might they model me? I hope not. There is so much more in me that «sets up» the neuronal architecture of the grey cells, even beyond DNA sequences: transposons [1] (and Wiki-refs) and maybe even quantum physics [2].

What about (2): I can't help thinking that this note has been so difficult for me to write, and I have a feeling that I have ended up with a note (model) which only weakly models what I originally intended it to model. Recursively worse, I don't know what those original thoughts were. Only that that's what I was going to try to find out with this note.

Artificial neuron

Starting in the 226:[] note, an interesting matter that I discovered only in 2025 was that Synaptics was founded by Mead and Faggin in 1986 (Wiki-refs). I think the most original patent is the 1986 one called «Brain learning and recognition emulation circuitry and method of recognizing events» (by Faggin and Lynch), where they combined a standard MOS transistor and an EEPROM-type transistor to store a (learnt) analogue (trigger) value. Placing these in an n * m array made it possible to adapt, or learn, and then later trigger on the learnt values only [3]. No «processor» as we think of it today. The neural model is closer to an analogue computer, like the «Jenny» I worked with when I was a student (Notes from the vault – 0x04 – Δ tillegg til norsk ultralydhistorie).

By looking at patents quoted in [3], or similar ones, I found a patent from 1992 that is also interesting [4]. The whole idea is to build something more processor-like, with a number (n or n*m) of MOS transistors which were adapted to some pattern so as to be electrically trainable, so that the «winner-take-all» output fires only when inputs matching the adapted pattern are presented. (Don't ask me why this is not called «winner-takes-all».) With a «plurality of electrically-adaptable elements». The neural model forms an «electrically adaptable neural network with post-processing circuitry», which was the name of their patent, forming «a synaptic array fabricated on a semiconductor substrate comprising» this adaptable learning.
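In software, the «winner-take-all» idea boils down to scoring the input against each stored pattern and letting only the best match respond. This is my own sketch of the principle, not the patent's analogue circuitry; the patterns and the dot-product scoring are illustrative choices:

```python
# Software sketch (mine, not the patent's analogue circuitry) of the
# «winner-take-all» idea: each adapted element scores the input against
# its stored pattern, and only the best match gets to respond.
STORED = [
    (1, 0, 1, 0),  # adapted pattern 0
    (0, 1, 0, 1),  # adapted pattern 1
    (1, 1, 0, 0),  # adapted pattern 2
]

def winner(x):
    # Dot-product similarity stands in for the summed currents of the
    # electrically-adapted transistor array.
    scores = [sum(w * xi for w, xi in zip(p, x)) for p in STORED]
    return scores.index(max(scores))

print(winner((1, 1, 0, 0)))  # matches pattern 2 best
```

In the patent the «post-processing circuitry» does this comparison electrically; here it is just `max()` over the scores.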

I guess it's relevant to mention here that the history of the «artificial neuron» is described at Wikipedia (Wiki-refs). It's a good read.

Some history

A lot has happened between 1986 (and even earlier) and 2026.

I found these two YouTube presentations that run through some of it. See them and ring a bell every time you hear the word «model». Every time I watch them I recognise something I learnt from the previous watching.

Neural Networks Explained: From 1943 Origins to Deep Learning Revolution 🚀 | AI History & Evolution by The AI Guy, see [5]

The moment we stopped understanding AI [AlexNet] by Welch Labs, see [6].

[6] in particular goes through the higher modelling level of modern AI, AlexNet, with «layer after layer of compute blocks» (at 17:35). This paved the way for GPT.

[6] also lists the hardware that the different AI generations ran on. Zero analogue transistor «neurons»:

  • LeNet-5 (1998) – Pentium II CPU
  • AlexNet (2012) – Dual Nvidia GTX 580
  • GPT-3 (2020) – 10,000 Nvidia V100 GPUs
  • GPT-4 (2023) – 25,000 Nvidia A100 GPUs

Neural processing unit (NPU)

Synaptics are still around. Presently they are introducing a series of processors for AI, partly built over the same architectural concept of «neural network processors (NPU)» (their SR-series) or «multi-TOPS neural processing units (NPUs)» (their Astra™ SL-series) – in addition to the ubiquitous Arm, of course (a quad-core Arm Cortex-A73 CPU here). I can buy a dev kit (Astra Machina) for an ok price (here). But the power budget for it is, as I read it, up to some 27 W, since the board needs a 15 V @ 1.8 A USB-PD power adapter. But I haven't found any photos with a heat sink or fan. I think it comes with a power port for a fan. It's rather well documented in Astra™ Machina SL-Series Power Consumption for Typical Use Case Configurations, which shows a worst case of some 3400 mW for the SL1680. I assume that this would require a heat sink with thermal paste, and an additional fan if inside a closed cabinet.

I assume that in the NPU there are lots of small arithmetic processor units, connected much like nerves in the brain. Or the «connections» are software only. Or a combination. Transistors cost little, so there would be as many units as possible on the chip, to parallelise as much as possible. But eventually I guess they would need to be «reused» in another phase of the calculations. Reading Neural processing unit at Wikipedia I learn that «On consumer devices, the NPU is intended to be small, power-efficient, but reasonably fast when used to run small models. To do this they are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common metric is trillions of operations per second (TOPS), though this metric alone does not quantify which kind of operations are being performed.» Similarly, for Neural network (machine learning) I read that «In machine learning, a neural network or neural net (NN), also called artificial neural network (ANN), is a computational model inspired by the structure and functions of biological neural networks». Settling it all, quoting Artificial neuron: «An artificial neuron is a mathematical function conceived as a model of a biological neuron in a neural network. The artificial neuron is the elementary unit of an artificial neural network.» Referring to the McCulloch / Pitts 1943 paper [8], it says that they «proposed a mathematical model of the nervous system as a network of simple logical elements, later known as artificial neurons, or McCulloch-Pitts neurons. These neurons receive inputs, perform a weighted sum, and fire an output signal based on a threshold function. By connecting these units in various configurations, McCulloch and Pitts demonstrated that their model could perform all logical functions.» See my comment on the talk page here.
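The McCulloch-Pitts neuron quoted above (inputs, weighted sum, threshold) fits in a few lines. The weights and thresholds below are the usual textbook choices for the logic gates, not taken from any source cited here:

```python
# The McCulloch-Pitts neuron as quoted: a weighted sum of the inputs,
# then a hard threshold. The weights and thresholds below are standard
# textbook choices for logic gates, not from the cited sources.
def neuron(weights, threshold):
    def fire(*inputs):
        s = sum(w * x for w, x in zip(weights, inputs))
        return 1 if s >= threshold else 0
    return fire

AND = neuron([1, 1], 2)
OR = neuron([1, 1], 1)
NOT = neuron([-1], 0)

# «...could perform all logical functions»: these three gates are
# enough to build any of them.
print(AND(1, 1), OR(0, 1), NOT(1))  # → 1 1 0
```

Everything an NPU accelerates is, at bottom, enormous numbers of this weighted-sum-plus-nonlinearity step done in parallel.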

So it looks like an NPU does not (any more?) rely on networks of nodes of MOS transistors functioning like neurons, as described in the patents above. After all, a voltage level is harder to store in analogue silicon than in digital: charge vs. number. In this respect, both the Synaptics and the below XMOS processor architectures may be compared on an equal basis.

But then, the NPU is indeed drawn as a separate unit in the Synaptics SL1680 data sheet (here).

Update 01Mar2026. I discovered the Ara family of discrete NPUs (DNPU), like the NXP Ara20 (here). The data sheet says that it supports frameworks like TensorFlow, PyTorch and ONNX – but that's all they tell about it. Except (1) that it supports Convolutional Neural Networks (CNN), Transformer Models, Large Language Models (LLM), Multimodal Language Models (MMLM), Vision Language Models (VLM) and Vision Language Actions (VLA). I guess, trying to make the list as complete as possible? But I couldn't find more than a single box in the diagram. Nothing about the inside. Except (2): it is a 17 mm x 17 mm flip-chip ball grid array (FCBGA) and draws 6.5 W.

Kind of summary

In [9] Wold et al. write (in Norwegian, followed here by Google's almost-verbatim translation):

«Moderne KI bygges ikke som tradisjonell programvare, linje for linje, på måter mennesker kan inspisere og forstå. I stedet «gror» man store nevrale nettverk ved å trene dem på enorme mengder data fra internett. Sammenhengen mellom treningsdata og faktisk oppførsel er ofte uforutsigbar. Resultatet er systemer som kan være både imponerende nyttige og uventet farlige.»

«Modern AI is not built like traditional software, line by line, in ways that humans can inspect and understand. Instead, large neural networks are “grown” by training them on vast amounts of data from the internet. The relationship between training data and actual behaviour is often unpredictable. The result is systems that can be both impressively useful and unexpectedly dangerous.»

The crux here is that this «growing» does not consist of physical connections appearing and disappearing inside the chips. It could be hardware routing of messages between on-chip hardware units – but then we'd need a zillion of them. (The XCORE architecture has routing between hardware tasks (HART), but in diminutive numbers in this respect.) So this «growing» has to be software. And then I assume that this is about parameters used to pass data from one neuron to the other. A «connection» is present if the value to multiply with is 1.0, and it's «disconnected» if it is 0.0. And if this were binary, those two would be the only values used. If any value in a range is allowed, it's analogue. Or there is some non-linear transfer function, like level triggering or hysteresis. I think these neural models might use any of these. This software runs on more or less specialised hardware. Hence, for my desktop: «porting TensorFlow» of some sort, like «Lite», onto my favourite processor architecture.
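The «a connection is just a multiplier» point can be sketched in a few lines. This is my own illustration; the sigmoid transfer function and all the weight values are arbitrary choices, not from any framework:

```python
import math

# Sketch of the point above: a «connection» between two model neurons is
# just a multiplier. 1.0 passes the signal through, 0.0 is
# «disconnected», anything in between scales it, and a non-linear
# transfer function (a sigmoid here) sits on each output.
def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer(inputs, weight_rows):
    # One output per row of weights. Training adjusts these numbers;
    # that is the «growing»: no physical wires appear or disappear.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

hidden = layer([1.0, 0.5], [[1.0, 0.0],    # «connected» to input 0 only
                            [0.0, 1.0]])   # «connected» to input 1 only
print(layer(hidden, [[0.7, -0.7]]))
```

Setting a weight to 0.0 removes that «connection» without touching anything physical, which is exactly why the growing can live entirely in software running on generic multiply-accumulate hardware.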

Vector processing unit (VPU)

I guess that this is where XMOS (and others) enter the picture. What these processor architectures have in common is that they are both aimed at IoT applications.

A neural (physically connected) network is somewhat different from traditional machine learning and from the embedded processor I want to learn more about at the moment, the XMOS xcore.ai (here).

Maybe the major differences are how a unit that does the calculations is designed, how many multiples of it are connected, and how many layers there are? Plus embedded units like a crypto security extension and I/O units. In the end this would differentiate the possible usage: video streams with lip sync, or video streams with licence plates to detect. Or sound only, etc. Or plain AI like LLMs (Large language models).

For the xcore.ai, here is what the board I already have on my desk has to offer. The board is the 151:[XCORE.AI evaluation kit]. Mostly cut and paste from XCORE-AI: Key features:

  • 16 hardware threads (HART) split between 2 multi-threaded processor ‘tiles’. (The term «logical cores» has now been replaced with HART, from the RISC-V world. See XMOS xcore RISC-V)
  • DSP applications (scalar unit)
    • One 32-bit floating-point scalar pipeline
    • 256-bit VPU adds block floating point capabilities up to 12.8 GMACS of performance at 800MHz
    • Integrated complex arithmetic and FFT/iFFT support
  • AI / ML Performance (vector unit)
    • 256-bit VPU supports 32-bit, 16-bit, 8-bit and single-bit for binary vector operations / neural operations
    • Peak AI performance of 51.2 GMACC/s at 8-bit with a sustained performance of 40.96 GMACC/s at 800MHz
    • Homogeneous computing environment for tight integration of AI operators
    • For neural-network development, XMOS has ported TensorFlow Lite for MCUs
  • They call it a «crossover processor» (XCORE AI: technical), which I guess helps me placing it in the broader picture
  • Power: Active: 203 mW at 600 MHz @ 24MHz clock (typical). Standby: 5 mA (typical). Max. 800 MHz @ 32 MHz clock, then higher max. But package absolute max is 1W.  (XU316-1024-TQ128)
  • For the xcore.ai evaluation kit board there is a core current sense-amp, so that it may be measured with a voltmeter. A 1V reading on J2 is equivalent to 500 mA core current (450 mW at 0.9V) (here)
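To get a feel for why the VPU's 8-bit mode gives the quoted GMACC numbers, here is a rough sketch of the usual INT8 quantization trick. This is my illustration of the general principle, not XMOS's actual scheme: floats are scaled to small integers, all the multiply-accumulates run in integer arithmetic, and the scales are applied once at the end.

```python
# Rough sketch (my illustration, not XMOS's actual scheme) of the INT8
# trick behind the 8-bit GMACC figures: scale floats to small integers,
# do all multiply-accumulates in integer arithmetic, rescale once.
def quantize(values, scale):
    return [max(-128, min(127, round(v / scale))) for v in values]

weights = [0.473, -0.218, 0.951]
inputs = [0.152, 0.387, -0.096]
W_SCALE = X_SCALE = 0.01

qw = quantize(weights, W_SCALE)   # int8 weights
qx = quantize(inputs, X_SCALE)    # int8 activations

acc = sum(w * x for w, x in zip(qw, qx))   # integer MACs only
result = acc * (W_SCALE * X_SCALE)         # apply the scales once

exact = sum(w * x for w, x in zip(weights, inputs))
print(result, exact)  # close, but not identical: that is the trade-off
```

Eight-bit operands mean a 256-bit vector unit can process many of them per cycle, at the price of the small rounding error visible in the output.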

more text to come..

Not to mention

Not for hobbyist work, I guess. But Graphcore (spun out from XMOS in 2016), located in Bristol, UK, has a novel design called the Intelligence Processing Unit (IPU). Their C600 has a power consumption of 185 W running typical workloads (here). But if you want to have fun, study its extremely nice architecture. In many ways it looks to me like a branched and scaled-up XCORE. It has 1472 independent MIMD IPU-tiles (here)! And each may run 6 threads, scheduled by a HW scheduler on a round-robin basis (here). And in the middle sits the IPU-Exchange, a communication network that will run any communication, non-blocking.

Not to mention: a big flora of other AI-enabled IoT processors. Most of them I haven’t even heard about!

References

Wiki-refs :: AlexNet. A Logical Calculus of the Ideas Immanent in Nervous Activity. Artificial neuron: History. Multiple instruction, multiple data (MIMD). Neural network (machine learning). Neural processing unit. Synaptics. TensorFlow. The Emperor’s New Mind. Transposable element (transposon)

[1] Involvement of transposable elements in neurogenesis by RN Mustafin and EK Khusnutdinova. See https://pmc.ncbi.nlm.nih.gov/articles/PMC7893149/. The Abstract was readable, at least!

[2] The Emperor’s New Mind by Roger Penrose (1989). See Wikipedia The Emperor’s New Mind. Speculating about consciousness, but I like it!

[3] Brain learning and recognition emulation circuitry and method of recognizing events by Federico Faggin and Gary S. Lynch (1986). See https://patents.google.com/patent/US4802103A/

[4] Electrically adaptable neural network with post-processing circuitry by Allen, Anderson, Mead, Faggin, Platt and Wall (1992). See https://patents.google.com/patent/US5331215A/en

[5] Neural Networks Explained: From 1943 Origins to Deep Learning Revolution 🚀 | AI History & Evolution by The AI Guy. In 4.20 minutes YouTube: https://youtu.be/AA2ettRM6_Q?si=Yy0MtkpfT5vEVFWE

[6] The moment we stopped understanding AI [AlexNet] by Welch Labs. In 17:37 minutes on YouTube: https://youtu.be/UZDiGooFs54?si=NpMzyKojuhWBYgnx. At 13:38 he talks about the old neural nets and the perceptron, and why the old model became usable in 2012, here

[7] TinyML (Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers) by Pete Warden & Daniel Situnayake. O’Reilly (2019; 2nd release 2020). See https://www.oreilly.com/library/view/tinyml/9781492052036/, errata for this version. As of 29Dec2025 there is no newer version

[8] A Logical Calculus of the Ideas Immanent in Nervous Activity by McCulloch / Pitts (1943), see Wikipedia A Logical Calculus of the Ideas Immanent in Nervous Activity

[9] Superintelligent KI er en trussel («Superintelligent AI is a threat») by Kristian Wold, Trym Lindell, Laurence Habib and Edmund Henden. In Klassekampen, 12Jan2026, pp. 18–19. Read it here: https://klassekampen.no/artikkel/2026-01-12/superintelligent-ki-er-en-trussel/XDuf (with permission from Klassekampen)
