Contents
Started 07Jan2025, updated 10Mar2026 (microgpt. Vibe coding? to HW?. LLMs for Code Evaluation)
This note is in group Technology, sub-groups My Beep-BRRR pages and My XMOS pages.
Intro
I will let the screen clip (below) from my first encounter with the AI-powered Cursor text editor and my XC code more or less speak for itself. (Aside: the people who know about this seem to be impressed by the fact that Supermaven joins Cursor. Maybe some time I'll see the light, too). I zoomed in on some rather arbitrary code and, after some introductory rounds, asked it to make lib_xcore code out of it, and it flunked. To read the whole chat at full resolution, just click the figure. I could have shown the graphics as text only, but that would have removed the originality of it.
It was a comment in the Norwegian magazine Kode24 that made me aware of Cursor [5].
I started with some general chat remarks, then asked about the other end of a channel (it passed!). But then, at the arrow in fig.1, is where the problem showed up. I have commented that case thoroughly below fig.1.
The take-away for me is the following. The AI language model Claude 3.5 Sonnet by Anthropic [3] (ANTHROP\C) knows surprisingly much about XC. (Good XMOS docs. Probably some XC code out there. But it might have read my blog notes as well.. for free..) But when things start to get complicated, it flunks like a student of real-time systems who hasn't listened to the teacher when she tried to describe what XC is and what the newer lib_xcore is up to. Busy polling!
A polite apology
But I may be stretching it too far and not being fair since Claude apologised. Maybe even rightly:
Because Cursor lives up to what they say in [1] (06Jan2025) under Languages it works with, which are Python (Excellent), JavaScript (Excellent), TypeScript (Excellent), Java (Good), C++ (Good), Rust (Good), PHP (Moderate). It says nothing about C, and I would have to dream sweet dreams on a bright day to have expected to see XC (see the sub-groups above for stuff on XC). I am not blind.
In [2] I read that the top programming languages in 2024 were Python (1.0), Java (0.48), JavaScript (0.44), C++ (0.37) …, C (0.2), Rust (0.15). Nothing in either source about XC. But read my worries in the screen clip. XMOS is phasing it out.
But is it enough to apologise when the answer appeared rather authoritative, even stating that «Current XC code… Would become something like this in C» – rather well structured, like Popper's falsification statements? And I wonder: when would it have detected this itself, and given a correct answer instead?
Besides, in xCore Exchange forum point 1 (below) xhuw comments that «My experience with Claude/chatgpt/copilot is that it will struggle with writing xcore specific code as there isn’t enough xc/lib_xcore on the web for it to have been trained on. I saw you mention Claude apologising for being wrong in your blog. In my experience the AI will apologise for being wrong even when it was right, if you tell it that it was wrong. The customer is always right I suppose 😉»
I guess this apology then has to do with the trait (which I read somewhere) that AI presumably tries to respond according to what it believes I, as the reader, would like best.
The issue
XC: my code
My original XC receiving code looks like this. Observe that there is no busy polling here, not even polling. The code simply waits on the ch_logger channel. A chanend is hardware that the xcore handles directly. Also observe that XC does allow busy polling at application level, with a default select case. I have never used this.
void dsp_task_y (
        chanend ch_logger,
        // Other params
    )
{
    task_y_ctx_t ctx; // Context
    // Init etc.
    while (1) {
        [[ordered]] // Excludes [[combinable]]
        select {
            // Other channel or timer cases
            case ch_logger :> ctx.next_curve_logger_state : {
                // Handle the select case channel event
            } break;
            // No common code here
        }
    }
    // Unreachable
} // dsp_task_y
Also see 222:[XC compiler lives].
C: ChanSched
I have written several CSP-based runtime systems myself, the latest being ChanSched. It's in C and runs on the Atmel (now Microchip) AVR ATmega. The code (from New ALT ..) would go something like this:
void dsp_task_y (void) {
    Task_y_CP_a CP = (Task_y_CP_a)g_CP;
    PROCTOR_PREFIX()
    while (TRUE) {
        ALT();
        // Other channels or timers
        ALT_CHAN_IN (CHAN_CH_LOGGER, &CP->Data_ch_logger, sizeof (CP->Data_ch_logger));
        gALT_END();
        switch (g_ThisChannelId) {
            case CHAN_CH_LOGGER: {
                // Handle the ALT event
            } break;
        }
    }
}
There is no busy-poll here either. The task returns one level down, to the scheduler, if the other end of the channel is not waiting to send in the ALT_CHAN_IN macro. Or in the ALT macro, when the channel event has been processed. These returns are invisible yields. The only polling-like action that's done is the scheduler getting the next task from the ready queue. A channel event is placed on the ready queue by the ALT_CHAN_IN or CHAN_OUT macros when both ends of the synchronous channel have become ready (one to send and one to receive, in arbitrary order).
Also see my attempt at Towards a taxonomy(?) of CSP-based systems.
C: lib_xcore
Now back to XMOS and the lib_xcore library. The code below is based on the XMOS documentation [5] and a YouTube lecture 221:[6]. However, the documentation contains no example of a select on a channel, but the following should be ok according to xhuw on the xCore Exchange forum, point 1 (below):
DECLARE_JOB(dsp_task_y, (chanend_t));
void dsp_task_y (chanend_t ch_logger) {
    SELECT_RES_ORDERED(
        CASE_THEN(ch_logger, on_logger_input))
    {
        on_logger_input: {
            // Handle logger input event
            continue;
        }
    }
}
This code uses the xcore’s chanend HW, exactly like the XC code.
(I note that letting the CASE_THEN get a tag and then handling it based on that label later on is exactly what I do in ChanSched, years earlier)
C: Claude AI erroneous code
// First, channel declaration would change from:
// chanend ch_logger; // XC style
// to:
channel_t ch_logger; // lib_xcore style

// The select case would become:
uint8_t ready = 0;
while (!ready) {
    // Non-blocking check if data is available
    if (xcore_channel_can_input(ch_logger)) {
        uint32_t received_state;
        xcore_channel_in_word(ch_logger, &received_state);
        ctx.next_curve_logger_state = received_state;
        ready = 1;
        // Your original handler code here
    }
    // Optionally yield to other tasks
    xcore_yield();
}
It looks like being «non-blocking» is a virtue. On this point I am afraid that Claude might be mainstream, see Not so blocking after all.
Then it shows off by making me believe that xcore_channel_can_input exists. It would do busy polling with the outer while loop. There is no busy polling with xcore, just as there is no busy polling in CSP, XC or occam. That is, unless one for some strange reason wants to do it at application level, not at «scheduler level».
Then Claude sets up an «optional» xcore_yield (which also does not exist). I'm afraid yielding here won't be that optional, since the sender is allowed to do its sending at any time in the future.
But I am afraid that Claude got this idea from somewhere, meaning: most of what Claude has read looks like this!? I shiver at the thought.
With CSP-like channels nothing happens before both parties are ready. These days several languages have implemented channels: Go, but also Rust. But that's another story.
More
- Maybe I should try this with occam as well?
- I was asked by a guy in a zoom meeting where I mentioned this, whether I had tried ChatGPT (or just GPT-4o) instead of Claude. Another guy replied that I should expect both of them to be equally bad when faced with complex concurrent programming. I'll probably not do this. It might easily eat my life
- I probably need to tell it about [4], or even supply the correct answer (which might also easily eat my life). It's obvious that it hasn't seen enough XC and lib_xcore code. These guys in (2) are of the opinion that this might help other XC programmers from getting that lost in the same tool. But then I have also thought that this potentially could make an AI engine wrongly biased in «my direction»?
Aside: wc -l *.xc
macOS command, run in the source directory:
wc -l *.xc
248 DAC4_mcp4728.xc
858 FRAM_memory.xc
165 _Beep_BRRR_02.xc
3311 _Beep_BRRR_a_appl_micarray.xc
296 _Beep_BRRR_b_micdev_explorer.xc
3768 _Beep_BRRR_c_appl_explorer.xc
2928 _Beep_dsp_handling.xc
423 _Beep_dsp_handling_f1.xc
50 _print_macros.xc
464 _test_aux.xc
537 audio_agc.xc
348 audio_sine_table.xc
683 buffer_tasks.xc
610 button_press.xc
1586 correlation.xc
699 decoupling_tasks_implementation_f.xc
989 display_ssd1306.xc
21 error_handling.xc
232 group.xc
292 headset_pcm5100.xc
572 i2c_client_task.xc
524 logger_task.xc
101 main.xc
205 maths.xc
805 maths_fix.xc
1647 maths_fix_test.xc
150 maths_test.xc
302 mic_ics52000.xc
mics_in_headset_out files
885 mics_in_headset_out_a_appl_micarray.xc
222 mics_in_headset_out_b_micdev_explorer.xc
624 mics_in_headset_out_c_appl_explorer.xc
63 port_tile0_task.xc
389 power.xc
389 power_test.xc
252 pwm_softblinker.xc
126 timers_long.xc
25764 total
Huge line count?
You may have been shocked by my huge line count? The file in the graphics isn't the only one. See the line counts in the above «details» (even nested) fold. Yes, folding is the term.

From almost 20 years of programming at work I used proper folding editors like Origami and WinF (or F or WinF32 or winf32). With proper I mean hand, semantics-based folding, not auto, syntax-based – even if the latter also has some value. These editors hide (or: hid) everything in such a way that what you mostly see is a single screen page only. Visual complexity is low, meaning that I «fractally» grasp what's totally going on, kind of in every screen. I would know where I am, not only by line number, but where I am semantically. So high line counts hang with me also when I use Visual Studio Code (VSCode, VSC). That's why I have used the ASCII character graphics generator https://patorjk.com/software/taag by patorjk to compensate, to navigate more easily. Of course I break into separate files when «this is something else, why didn't I see that before» starts to clutter the code.

VSCode pretends to support folding, but it isn't a folding editor like I was used to. Read more about it at Wishes for a folding editor. That being said, VSCode does have the interesting block heading collapse thing. Study the top of my code screen clip at lines 2413, 2415, 2416 and 2426, which take up four lines.

To me the most important things are as high internal semantic cohesion as possible and as low run-time coupling as possible between the tasks (files). Read at High Cohesion and Low Coupling: the Office Mapping Factor. Also read my disagreement arguments about cyclomatic complexity at Cooperative scheduling in ANSI-C and process body software quality metrics.
So isn't there any limit? Even if files tend to get long, they should be as short «as possible». Every real-time task has a context struct. I don't want to export this in a header file. So all functions called from a task that take that struct as a parameter will be local in that file, and add up to its length. This is practical, but sometimes lazy. So if only parts of that struct are needed as a parameter, then the sub-struct'ed function goes out of the .xc file, with that sub-struct only as a param. One could say that one never needs the whole struct as a param. This is subject to experience, practicalities and taste. Sometimes I am too eager for the next job and fall short of picking out the sub-struct when I would have wanted it. This is not healthy for the line count.

Since I don't want «thousands» of files, but rather fewer and longer files, you may see that this particular project still has some 30-40 .xc files. I have one .c file and some 50-60 .h files. But cohesion and coupling are important to me. I don't want different functional areas mixed into task files. Like power handling, which started its life in a task file, but I eventually moved it out to separate power handling .xc and .h files. This I too often seem to do too late. But better late than never. (Power is re*re + im*im of a complex number frequency bin after the FFT. Magnitude is the square root of the power.)
{ One problem with very large files is that, for some compilers, a missing semi-colon or curly bracket might start an exceedingly long compile-and-write-error-messages period. For the compiler I presently use this could take minutes. Only very seldom does the compiler seem to max out on error messages, though. That being said, most errors are wisely handled, even missing semi-colons or curly brackets; }
Code generation and the future
Updates
After I wrote the initial text here, new angles have been popping up. At least for me. Newest on top:
microgpt
The below-mentioned Andrej Karpathy on 12Feb2026 published 200 lines of Python code that make up the complete code of his microgpt [13]. Very interesting, indeed.
Vibe coding? to HW?
In [12] XMOS writes about «HW vibe coding»(?) as something which «takes that same intent-driven principle and applies it to hardware design», and follows up with:
«With GenSoC, you can describe your system's behaviour in natural language or higher-level models and the XMOS toolchain generates a deterministic, real-time, reconfigurable SoC, ready to run immediately on an XCORE® GenSoC silicon platform. .. It's not just generating code — it's generating the platform the code runs on.»
There are no references in the blog post, so I am eager for more. Until I know more, my feeling is that the heading might be clickbait, especially since it's «Beyond vibe coding». Searching for GenSoC: https://www.xmos.com/search/?term=GenSoC certainly takes me further into the field, though. But «vibe»? Well..
All that being said, it sounds like good news!
LLMs for code evaluation
LLM = Large Language Model. From Wikipedia (Wiki-refs) (02Oct2025):
«A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation.
The largest and most capable LLMs are generative pre-trained transformers (GPTs) and provide the core capabilities of chatbots such as ChatGPT, Gemini and Claude. LLMs can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on.»
On 01Oct2025, from [11] I have this very interesting quote, as spoken by Mark Sherman, starting at 35:31 (with the help of YouTube's captions, but no guarantee of full correctness):
Users defer to perceived AI authority. «But in all these cases, the systems make mistakes and so again the conventional wisdom in AI in general, not just with this code model is AI, you use it to ask a question and then you need to have a human look at the answer and see whether it’s reasonable or not. «Are these lead pipes in the house?» «Is this the right veil amount to be set?» And so on. There’s all sorts of AI systems, all sorts of studies that show that you need the human in order to try to correct what’s going on here. Well, in the case of programming it says that people don’t seem to have the judgement in order to do that. This is what like Price Waterhouse Coopers where they had people going out in their engagements and they just could not catch it. Again from Facebook, one of the engineers, that they make mistakes and we’ve seen them already here in this talk, and it takes a really experienced developer to spot the issues and I’ll give you one subtle example in where you might not expect a programmer to get it right. The challenge is that programmers believe the AI systems over their own trust or their own expertise. So that first quote came actually from that same Stanford study I mentioned earlier where they had students and sometimes they had some professional developers build some of these things and use AI systems. They would get feedback from the evaluation saying this is insecure code and when given a choice the programmers would prefer what the AI system generated over what they believed to be secure code even though it was the wrong answer. Now in this particular study they had one additional result which again many people found interesting. I have not seen it replicated but it’s interesting enough to share. Turns out that the more experienced the developer the more they were willing to defer to the AI system.»
I found Sherman really interesting. It was worth the hour. He also mentions threads and deadlock detection.
LiveCodeBench Pro
In [10] Zihan Zheng et al. have compared large language models (LLMs) with experts' coding. They «introduce LiveCodeBench Pro, a benchmark composed of problems from Codeforces (Wiki-refs), ICPC, and IOI that are continuously updated to reduce the likelihood of data contamination» (from the abstract). Codeforces contains lots of high-quality problems. However, I don't see any problems related to real-time or concurrent architectures and paradigms, languages or libraries. I base this on these not being mentioned in [10], plus an internet search.
Their AI-generated summary goes like this:
«LLMs perform well on implementation-heavy competitive programming problems but struggle with nuanced algorithmic reasoning, as highlighted by LiveCodeBench Pro.»
They have used the Elo rating system (Wiki-refs) for pair-wise comparisons, also used by Anthropic, which is in scope here. With it, my score depends on the opponent's score: I get a higher «Elo rating» from beating a good opponent than from beating a mediocre one.
In other words: I’d hope my Beep-BRRR code (13k+ lines of my code) and the algorithms I invented there might not appear out of an LLM. But when I convert the code to lib_xcore, maybe AI could help some? Stay tuned.
XGH+AI
I enjoyed reading Pavel Samsonov's satirical(?) / polemical(?) blog note about his «eXtreme Go Horse» (XGH) method at [9]. A quote:
«If you learn by shipping, XGH+AI will let you ship the wrong thing much faster than any old way of working, and the shipped thing will be more wrong than anything you could have dreamed of. Therefore we will learn faster than ever. It just makes sense.»
A fresh take on trial and error problem solving.. But not exactly for the fainthearted. Or safety critical.
AI some times is of much help
I have moved my code to private GitHub repositories. See My Git/GitHub/GitLab notes. I did get help from a relative (thanks, Edvard!) to start this. But then, on my own, I did as he suggested: «try Duck.ai from DuckDuckGo». It seems to use GPT-4o mini. I have had lots of help from it. To be honest, I am impressed!
Hallucination?
I guess what I experienced here is close to hallucination as described on Wikipedia, since it looks like the AI just guessed from related stuff. (There already were hallucinations in a quote above, though). Or «confabulations» or «creative gap-filling» [8].
Vibe coding?
Or maybe the AI tools (or their owners, laughing all the way to the bank) are happy with me using them as «replacements for my brain» and not as «assistants» (Kush Brahmbhatt), in a wild west technique dubbed by Andrej Karpathy as vibe coding? Meaning I shouldn't have jumped off my exercise, but allowed Claude and me to vibrate our way to some functionality, disregarding how it's solved under the bonnet? Would or could I ever trust the result? The car starts and moves for sure, but looking under the hood it turns out I've got a rubber band car, because the AI must have been trained on too many interesting-enough articles like [7].
AlphaGeometry
In [6] I read that
«AlphaGeometry is a combination of components that include a specialized language model and a ‘neuro-symbolic’ system — one that does not train by learning from data like a neural network but has abstract reasoning coded in by humans. The team trained the language model to speak a formal mathematical language, which makes it possible to automatically check its output for logical rigour — and to weed out the ‘hallucinations’, the incoherent or false statements that AI chatbots are prone to making.
For AlphaGeometry2, the team made several improvements, including the integration of Google's state-of-the-art large language model, Gemini. The team also introduced the ability to reason by moving geometric objects around the plane — such as moving a point along a line to change the height of a triangle — and solving linear equations.»
Maybe with this kind of AI, code generation might become (even) more viable? (Thanks, Jeremy, for pointing this out!)
Forums
xCore Exchange forum
- lib_xcore select on case channel input example – started by me 08Jan2025. Answered by xhuw 09Jan2025
References
Wiki-refs: EN: Artificial_intelligence (Artificial intelligence = «AI»), Codeforces. Elo rating system. Large Language Model («LLM»). NO: Kunstig_intelligens (Kunstig intelligens = «KI»), Stor språkmodell («LLM»)
[1] Cursor AI: The AI-powered code editor changing the game, 26Aug2025, see https://daily.dev/blog/cursor-ai-everything-you-should-know-about-the-new-ai-code-editor-in-one-place
[2] The Top Programming Languages 2024: Typescript and Rust are among the rising stars, in IEEE Spectrum, by Stephen Cass, 22Aug2024, see https://spectrum.ieee.org/top-programming-languages-2024
[3] Claude 3.5 Sonnet by Anthropic, Jun. 2024, see https://www.anthropic.com/news/claude-3-5-sonnet
[4] XMOS XM-014363-PC document, see 221:[References]
[5] «Bli en 10x AI-utvikler med Cursor!» (Become a 10x AI developer with Cursor!), reader contribution in Kode24, 9Dec2024, by Kristofer Giltvedt Selbekk, head of discipline at Bekk, read at https://www.kode24.no/artikkel/bli-en-10x-ai-utvikler-med-cursor/82358651. (Slightly aside: I once wrote an article in Kode24 myself, «Slik styrer han akvariet sitt med XC» (How he controls his aquarium with XC). See about it in Kode24, or the article here)
[6] DeepMind AI crushes tough maths problems on par with top human solvers, in Nature 07Feb2025, by Davide Castelvecchi. See https://www.nature.com/articles/d41586-025-00406-7
[7] The Rubber Band Car Challenge by The Smallpeice Trust: ENGINEERING @HOME, see https://www.smallpeicetrust.org.uk/downloads/EaH-01-The-Rubber-Band-Car-Challenge.pdf
[8] Company apologizes after AI support agent invents policy that causes user uproar. Frustrated software developer believed AI-generated message came from human support rep. BENJ EDWARDS – 18. APR. 2025. See https://arstechnica.com/ai/2025/04/cursor-ai-support-bot-invents-fake-policy-and-triggers-user-uproar/
[9] The future of AI-driven development isn’t Agile. It’s XGH by Pavel Samsonov (11Jun2025). Read at spavel.medium.com/the-future-of-ai-driven-development-isnt-agile-it-s-xgh
[10] LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming? By Zihan Zheng and 18 more authors, all from American and Canadian universities and Sentient Foundation (Open AI) (13Jun2025). Read at https://huggingface.co/papers/2506.11928
[11] Using LLMs to Evaluate Code. Shane McGraw interviews Mark Sherman, Technical Director, CERT, Software Engineering Institute at Carnegie Mellon University, https://www.youtube.com/watch?v=E14F5csMEP4 (01Oct2025) (1 hour)
[12] Beyond vibe coding: how Generative System-on-Chip (GenSoC) takes generative design to the hardware level, by Hollie Drohan. Read at https://www.xmos.com/gensoc-beyond-vibe-coding (27Oct2025)
[13] microgpt by Andrej Karpathy, see https://karpathy.github.io/2026/02/12/microgpt/ (12Feb2026)

