Okay, this was far better than I remembered.
This movie might be one of my favorites of the last decade.
The scoring was beautiful, the direction was great, acting was good, and the story was worth it.
I completely forgot about it until @natalie reminded me of it because it had a switchboard operator.
This one's cheating, I want a pure AVX version.
The eternal float assembly conundrum:
Do I reorder my inputs and outputs to EXACTLY match haddps (because otherwise haddps is slower) or do I try to save instructions elsewhere?
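For anyone who hasn't stared at haddps: it only ever adds adjacent lanes, which is why the data layout has to match it. Below is a rough lane model of the 128-bit form in plain Python (not real SIMD; the `shuffle` helper is made up for illustration and only stands in for something like shufps).

```python
def haddps(dst, src):
    """Lane model of 128-bit haddps: sums adjacent pairs of each operand."""
    return [dst[0] + dst[1], dst[2] + dst[3],
            src[0] + src[1], src[2] + src[3]]

def shuffle(v, order):
    """Hypothetical stand-in for a shuffle instruction: pick lanes by index."""
    return [v[i] for i in order]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

# Layout already matches haddps: one instruction, no shuffles.
adjacent = haddps(a, b)

# Want a0+a2 and a1+a3 instead? The lanes must be reordered first,
# which costs an extra shuffle per operand before the horizontal add.
strided = haddps(shuffle(a, [0, 2, 1, 3]), shuffle(b, [0, 2, 1, 3]))

print(adjacent)  # [3.0, 7.0, 30.0, 70.0]
print(strided)   # [4.0, 6.0, 40.0, 60.0]
```

So the choice is between paying for those reorders around the instruction, or arranging the whole transform so the pairs are already adjacent.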
Update
Status: 23 instructions
Target: 21 or less
8-point FFT (thread)
Status: 24 instructions
Target: 21 or less
This reminds me of something else.
I was talking to jmspeex (one of the main Opus devs) about his LPCNet project (https://jmvalin.ca/demo/lpcnet_codec/). I asked him how much adapting it would need to work if someone were to tap into all the nerves necessary for speech.
He said almost none.
It doesn't sound great because it was optimized for low power machines, but if you throw enough GPGPU at it to generate a high fidelity model it would sound great.
A reminder: if you want to improve compression of your voice just pinch your nose.
The vocal tract is perfectly modeled with an LPC (it's just a tube) which is what a lot of speech codecs use. And lossless ones too!
That model completely fails when the tube has a random hole in it, but it's still somewhat compressible.
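To make the "it's just a tube" point concrete, here is a minimal sketch of LPC using the textbook autocorrelation method with the Levinson-Durbin recursion. This is not what LPCNet or any particular codec actually ships; the 2-pole "tube" and all the function names are made up for illustration.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum over n of x[n] * x[n - lag], for lag = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Find a[1..order] so that x[n] is predicted by sum_k a[k] * x[n-k].

    Returns (coefficients, residual_energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k                 # energy left after prediction
    return a[1:], err

def all_pole_impulse_response(coeffs, length):
    """Toy 'tube': y[n] = e[n] + sum_k coeffs[k-1] * y[n-k], e = unit impulse."""
    y = []
    for n in range(length):
        acc = 1.0 if n == 0 else 0.0
        for k, c in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += c * y[n - k]
        y.append(acc)
    return y

true_coeffs = [1.6, -0.8]                  # assumed stable 2-pole resonance
signal = all_pole_impulse_response(true_coeffs, 200)
r = autocorrelation(signal, 2)
estimated, residual = levinson_durbin(r, 2)
print(estimated, residual / r[0])
```

For a pure all-pole signal like this one the predictor recovers the filter almost exactly, so only the small residual needs coding. A hole in the tube adds zeros to the transfer function, which an all-pole predictor can only approximate.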
I made a mistake by always partitioning /boot as 200M on all my machines.
"200M is huge", I thought, "it could hold many kernels", "leaves plenty of space".
It doesn't hold many kernels, only about 3. And it especially doesn't hold many kernels if I don't upgrade fairly often and/or I get the timing wrong.
Next time I'll use 768M.
Fuck glfw too.
Both those libraries are beyond bad on Wayland. I'd take GTK any day. I'd even take well-done Qt over this.
The best editor I've seen so far is lite.
https://github.com/rxi/lite
But it uses SDL2 for management. Fuck SDL to hell. It doesn't pass bare usability on Wayland, and it's been YEARS.
I'm almost wanting to fork it and transplant my Wayland code on top.
Modern FFT optimizations are mostly all practically unusable unless you're writing for an ASIC or an FPGA.
They're all "we save a few operations, but to implement this on a computer you have to double your binary bloat for custom special-cased functions, ruining your instruction cache and doubling how many twiddles you need".
That, or they conveniently "forget" to take into account the pre/post bit reversal permutation needed. All the bits are there, they're just in the wrong order. To correct this you have to do a permute twice as expensive as the actual FFT because CPUs can't shuffle quickly to save their lives. No, vgatherps is usually slower than manual scalar loading.
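A small sketch of what "the wrong order" means, assuming a plain radix-2 decimation-in-frequency FFT (nothing like an optimized split-radix, and the function names are made up here). The butterflies produce every output bin, but bin k ends up at the bit-reversed index of k, so a separate permutation pass is needed afterwards.

```python
import cmath

def bit_reverse_indices(n):
    """Bit-reversed index table for a power-of-two length n."""
    bits = n.bit_length() - 1
    table = []
    for i in range(n):
        r, v = 0, i
        for _ in range(bits):
            r = (r << 1) | (v & 1)
            v >>= 1
        table.append(r)
    return table

def fft_dif(x):
    """Radix-2 DIF FFT. Output is left in bit-reversed order."""
    x = list(x)
    n = len(x)
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                a, b = x[start + k], x[start + k + span]
                x[start + k] = a + b
                x[start + k + span] = (a - b) * w
        span //= 2
    return x

def naive_dft(x):
    """O(n^2) reference DFT, natural output order."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

data = [complex(v) for v in (1, 2, 3, 4, 0, -1, -2, -3)]
scrambled = fft_dif(data)
table = bit_reverse_indices(len(data))
natural = [scrambled[table[k]] for k in range(len(data))]  # the extra permute

print(table)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

In Python the permute is a trivial gather. The complaint above is about doing that same scattered access with real SIMD loads and shuffles, where it is not cheap.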
Out of curiosity I took a look at a QFFT paper.
Compared to the regular computer split-radix IT'S SOOOOOOO CLEAN. Look at the 8-point! It's perfectly straightforward!
I want a quantum computer! With at least tens of thousands of real qubits, well connected and error corrected!
Pushed. Also gave a 3% speedup on x86 and made the ac3enc_fixed usable.
"under budget and ahead of schedule"
But the good thing was it didn't try to be just a GUI around vim. Nor an ncurses-based thing.
The search goes on. But gedit does pretty much all I want a text editor to do.
I decided to try out onivim2.
12 GIGABYTES and 1 hour later (after altering a few esy packages because Werror braindeath) I had it running.
No repeat key because libsdl sucks. No assembly syntax highlighting. Shit documentation.
This is by far the most meme editor I've seen. It makes vscode look like a mortgage consultant.
Codec researcher, physicist, x86/aarch64/wasm assembly and Vulkan expert.
A concept physically manifesting herself in a titanium-strengthened partially-organic shell.