#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present


http://bespin.org/~qz/search/?view=1&c=osdev2&y=22&m=1&d=18

Tuesday, 18 January 2022

02:02:00 <geist> gosh after all this work to set up data transfers on AHCI looks like nvme is going to be EZ TIME
02:03:00 <geist> it's like ahci gives you most of the tools to build something really efficient, and then just pulls back at the last minute and makes you grub around to figure out how to handle things well. especially NCQ
06:23:00 <Jari--> Guys what do you think, as a long-term business goal, is an Open Source or Proprietary Operating System better? To get money.
07:11:00 <kazinsal> neither, this is a shit thing to make a business out of
07:11:00 <kazinsal> unless you're doing something extremely specific and niche like high performance routing or storage and your sole focus is on that and you REALLY know what you're doing
07:11:00 <kazinsal> and I don't mean on paper know what you're doing and "run it through the optimizer" high performance
07:12:00 <kazinsal> I mean this particular microsubject is your actual honest to god getting paid for it field already
07:12:00 <kazinsal> you would need to be a subject matter expert in that particular field and then some
07:12:00 <kazinsal> AND be a good enough programmer with enough time to make something useful
07:12:00 <kazinsal> AND have the connections to build a working hardware product
07:13:00 <kazinsal> AND have the connections to market it successfully
07:13:00 <kazinsal> basically, don't even think about it. it's not going to happen.
07:53:00 <klange> I moved my old album of historic screenshots from Imgur to a GitHub gist and expanded it; I found a number of old screenshots in a laptop backup and have added new ones since the album was last updated long before 2.0
07:53:00 <klange> https://gist.github.com/klange/f427a551af5f2f8b3c9ef80687883fcf
07:53:00 <bslsk05> ​gist.github.com: 11 Years of ToaruOS · GitHub
07:57:00 <mlombard> klange, this is amazing
08:40:00 <geist> everything kazinsal said
08:40:00 <geist> it's exceptionally *exceptionally* rare to make money writing an OS nowadays
09:11:00 <kazinsal> geist: update on the forum thread I mentioned the other day; they wanted to hear more baffling x86/PC architecture stuff so I just wrote a giant post about interrupt routing with a big terrifying middle section about ACPI bytecode haha
09:12:00 <kazinsal> of course that's then followed with "thank god for PCI-SIG going 'no that's insane stop that' and giving us message signalled interrupts"
09:20:00 <sham1> Who thought that the ACPI bytecode was a good idea
09:24:00 <geist> i think there was a period of time when multi arch was a bigger deal (think intel trying to get people to switch to itanium and MSFT running on 3 or 4 arches) and no one had really thought that running arbitrary bytecode in kernels was going to be a security problem
09:24:00 <geist> it's certainly not the first time bytecode interpreted driver and low level logic was used
09:25:00 <sham1> One of the biggest annoyances is that one has to pretend to be Microsoft to do anything
09:26:00 <geist> yah functionally speaking ACPI is the MSFT driver model. it's designed such that it basically drives the whole setup of what driver goes where
09:27:00 <geist> hence why it's mandatory on ARM devices that windows runs on. they dont have a mechanism to convert device-tree/etc to their model
09:27:00 <geist> or at least it would be some hard coded acpi thing
09:27:00 <kazinsal> it first showed up around the time that high end workstations were shipping with Pentium Pros, PowerPC 604s, Alpha 21164s, or MIPS R4Ks so having some platform independent bytecode for figuring out how to handle interrupts and such for any arbitrary PCI card you slammed in there was handy
09:28:00 <kazinsal> I wasn't there at the time but I suspect that in practice other platforms had much less dodgy interrupt controller setups than Intel did
09:28:00 <sham1> Still, having to essentially lie to the bytecode "Hey, I'm actually Windows 2000" or whatever is weird
09:31:00 <geist> i wonder if cpus feel pain when they have to run one kernel or the other
09:31:00 <geist> like 'oh no not again'
09:32:00 <kazinsal> "ugh, they're loading up TSS/360 *again*?"
09:33:00 <klange> "this asshole keeps trying to run his garbage toy OS, please just kill me"
09:33:00 <klange> i'm sure that's what my thinkpad thinks all the time
09:33:00 <kazinsal> AS/400 machines waking up and lamenting that they've been pulled from spending time with their loved ones in the computer afterlife to go run some queries on a financial database from 1989
09:37:00 <kazinsal> ha, after looking it up, I hadn't realized that the pre-POWER AS/400s ran on a 48-bit ISA
09:41:00 <geist> hmm like 48 bit registers, or was that the virtual memory size?
09:44:00 <CompanionCube> kazinsal: i wonder what the feeling of migration to modern POWER would be in that context
09:46:00 <kazinsal> 48-bit virtual and physical address space, 32-bit machine word
09:46:00 <CompanionCube> do we actually know about the registers of the CISC ISA?
09:47:00 <kazinsal> for AS/400? looks like 16 general purpose registers, a condition register, and a program counter
09:47:00 <kazinsal> 48 bit registers, 32-bit adder
09:48:00 <CompanionCube> for pre-power 400 yeah
09:48:00 <kazinsal> s/adder/ALU/
09:48:00 <kazinsal> IEEE 754 double-float, doesn't say if there are registers for them or what
09:50:00 <CompanionCube> (System/38 has rather more comprehensive documentation iirc)
09:51:00 <kazinsal> yeah, I'm looking at an excerpt from a 1989 edition of the IBM Systems Journal that details some of it
09:52:00 <CompanionCube> did you ever read the page about how the memory tagging works on POWER?
09:52:00 <geist> IBM was always big on saying it has 96 bits of virtual
09:52:00 <geist> or 80 i mean
09:52:00 <geist> when AFAICT it's simply 64bit + 16 bits of address space id
09:53:00 <CompanionCube> geist: iirc the user ISA is defined as having 128-bit pointers.
09:53:00 <geist> (though that actually makes me wonder how many virtual bits are implemented in POWER/PPC?)
09:56:00 <geist> how would 128 bit pointers work without a register to hold them?
09:57:00 <kazinsal> looks like Power ISA 3.1 supports between 65 and 78 bits of virtual address space
09:57:00 <geist> yah but how does that work? is that simply ASID + virtual pointer?
09:58:00 <kazinsal> I think so
09:58:00 <kazinsal> it says "effective address space size is 2**64 bytes" on the same page
09:58:00 <geist> and even that i think some amount of the 64 is not implemented, as it generally is not
09:59:00 <CompanionCube> 'For 64-bit PowerPC processors, the virtual address resides in the rightmost 64 bits of a pointer while it was 48 bits in the S/38 and CISC AS/400.'
09:59:00 <geist> i think the power hashtable only has 56 bits or so for the virtual address match
12:15:00 <Killaship34> Hi
12:15:00 <Killaship34> Talking from school here, lmao
12:16:00 <GeDaMo> Hi Killaship34 :)
12:16:00 <Killaship34> Hi
12:16:00 <Killaship34> wait who are you
12:16:00 <GeDaMo> I'm GeDaMo :|
12:17:00 <Killaship34> Ok, hi GeDaMo
12:17:00 <Killaship34> :p
12:18:00 <Killaship34> I found out how to get a terminal from my home PC at school, so here I am, running an IRC client from the terminal
12:18:00 <Killaship34> I'm gonna go
12:18:00 <Killaship34> bye
12:19:00 <gog> bye
12:20:00 <Killaship34> Actually I'm gonna have this hanging around in the background
12:20:00 <gog> hi
12:20:00 <Killaship34> hi
12:20:00 <gog> u writing a kernel?
12:21:00 <Killaship34> Yeah
12:21:00 <Killaship34> I'm Killaship on github
12:21:00 <Killaship34> I have a real mode kernel, Proton
12:21:00 <Killaship34> and a multiboot1 kernel, Spectrum
12:21:00 <Killaship34> and some other projects with similar names
12:21:00 <gog> i have a thing, it's not exactly a kernel yet
12:21:00 <gog> it's called "sophia"
12:22:00 <gog> it's an absolute mess
12:22:00 <Killaship34> lol
12:22:00 <Killaship34> Cool
12:22:00 <klange> you can go a long way and still have an absolute mess
12:22:00 <Killaship34> Yup
12:22:00 <gog> i anticipate it's always going to be an absolute mess
12:22:00 <Killaship34> Good idea
12:22:00 <gog> that's ok, i am too and i'm pretty cool
12:22:00 <Killaship34> nice
12:24:00 <Killaship34> ;-; tfw you find a nice guy on irc but he's gone
12:24:00 <zid> gog: your mess brings all the boys to the yard, but they're like, wtf is this mess, you're like, I could show you, but I'd have to cringe
12:25:00 <gog> true facts
13:20:00 <Killashi134> And, I'm back
13:20:00 <Killashi134> hello
13:21:00 <gog> wb
13:34:00 <gog> bye
16:24:00 <Bitweasil> "<klange> you can go a long way and still have an absolute mess" <-- See Microsoft! ;)
16:24:00 <Bitweasil> Or the Linux kernel, or...
16:35:00 <heat> is 25-26% cache misses horrible? for a terminal emulator
16:36:00 <heat> yesterday i had "fun" profiling mine in linux with perf
16:37:00 <heat> it turns out the hotspots for the terminal are just loads from memory
16:38:00 <heat> which tells me the caching is all screwed up probably but I don't see how I'm supposed to make it go fast
16:39:00 <heat> when I keep a render thread the speed is similar but I don't do fancy delaying of the actual thread waking up
16:40:00 <heat> and I'm not sure how to go about it because I'm scared I'll introduce some sort of latency which is not what I want, obviously
16:41:00 <Bitweasil> That doesn't seem terribly odd to me, most things suck at cache-behavior.
16:42:00 <j`ey> heat: why not perf test another emulator?
16:42:00 <heat> just tried with konsole and it has slightly lower cache misses but it's obviously much faster
16:42:00 <heat> also, GPU acceleration
16:43:00 <heat> my problem is that my virtual terminal is *really slow*
16:43:00 <heat> it takes 40 seconds to readelf -a /boot/vmonyx (~4300 lines)
16:44:00 <heat> actually 5300 lines but still, in konsole it's instantaneous
16:45:00 <Bitweasil> I doubt your cache misses are the problem at that point.
16:46:00 <clever> heat: that reminds me, i had problems on nixos when i first started, where `ls -lh` would be abnormally slow
16:46:00 <heat> Bitweasil, if my instruction hotspots are an "and" with a memory operand and a "test" with a memory operand, it's not impossible
16:47:00 <clever> heat: i eventually figured out why, nixos was using an out-of-tree copy of gallium i think it was, which had been deprecated after merging it into another repo
16:47:00 <clever> so there was 2d accel bugs in there, that had been fixed ages ago
16:47:00 <heat> i don't have acceleration at all :)
16:48:00 <Bitweasil> Are you working within cache lines mostly, or romping all over memory and not even in L3?
16:48:00 <heat> idk
16:49:00 <heat> the only concrete measurements I did were yesterday and even those aren't really representative
16:51:00 <heat> my design is dead simple: the console is represented by a big array of struct console_cell, which are 16 bytes in size and have the codepoint, fg, bg, and flags (one of which is the dirty flag)
16:52:00 <heat> all commands, writing, scrolling are done directly on the console_cell array
16:52:00 <heat> after the vterm_write() finishes I go around and look through the entire array for dirty cells, which is probably what's screwing me
16:53:00 <heat> when flushing out cells it's just simple bitmap character drawing directly to the framebuffer
16:54:00 <clever> heat: oh, and another problem, even older, when i installed a new distro on a "new" laptop, i found gnome-terminal to be a major cpu hog, but xterm was amazingly fast, ever since then, ive been stuck on xterm :P
16:57:00 <Bitweasil> I've seen long running 'top' sessions in gnome-terminal start chewing up a ton of CPU as well.
16:57:00 <Bitweasil> I'll have to try xterm on those systems.
16:57:00 <Bitweasil> (start top, come back a month or two later, gnome-terminal is chewing... alarming amounts of the limited CPU)
16:57:00 <clever> Bitweasil: that laptop also had cpufreq bugs
16:57:00 <clever> `ls -lh` would chew enough cpu, that it would step to a higher freq
16:57:00 <clever> but then its idle, so it steps back down
16:58:00 <clever> and due to hw bugs, the system locks up solid for 0.5 seconds on every step
16:58:00 <clever> and that step back down, lines up perfectly with when i react to the ls output, and type another cmd
16:58:00 <clever> and the ps2 fifo overflows
16:58:00 <clever> so key-up events are lost, and auto-repeat just takes off!
16:59:00 <heat> holy shittttttttttttt linux devs are so smart
16:59:00 <heat> how did I not see this
17:00:00 <heat> their console cell array is an array of pointers
17:00:00 <heat> when they shuffle lines (e.g scrolling) they can just swap pointers
17:00:00 <clever> ?
17:01:00 <clever> ive run into scrolling performance issues on LK as well, and would be interested in options
17:01:00 <GeDaMo> heat: did you see those videos from Casey Muratori where he wrote a terminal emulator on Windows?
17:01:00 <heat> instead of doing a huge memmove of the whole cell array, as you would do if you just had a static one, they just move pointers
17:02:00 <heat> since you have like what, 25-40 lines you need to move 25-40 lines in the worst case when scrolling the whole system
17:02:00 <heat> which in our best case are just pointers
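
The pointer-swapping idea heat describes above can be sketched roughly as follows. This is a minimal illustration, not the Linux implementation: the 16-byte cell layout (codepoint, fg, bg, flags) comes from the log, while `struct console`, `console_init()` and `console_scroll_up()` are hypothetical names, and hosted `calloc()` stands in for whatever allocator a kernel would actually use.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 16-byte cell as described in the log: codepoint, fg, bg, flags. */
struct console_cell {
    uint32_t codepoint;
    uint32_t fg;
    uint32_t bg;
    uint32_t flags;
};

/* Hypothetical console: one pointer per line instead of one flat array. */
struct console {
    unsigned rows, cols;
    struct console_cell **lines; /* rows pointers, each to cols cells */
};

int console_init(struct console *c, unsigned rows, unsigned cols)
{
    c->rows = rows;
    c->cols = cols;
    c->lines = calloc(rows, sizeof *c->lines);
    if (!c->lines)
        return -1;
    for (unsigned r = 0; r < rows; r++) {
        c->lines[r] = calloc(cols, sizeof **c->lines);
        if (!c->lines[r]) {
            /* Undo the lines allocated so far. */
            while (r--)
                free(c->lines[r]);
            free(c->lines);
            c->lines = NULL;
            return -1;
        }
    }
    return 0;
}

/* Scroll up by one line: shift the line pointers, then recycle the old
 * top line as the new blank bottom line. Only rows pointers move, not
 * rows * cols cells. */
void console_scroll_up(struct console *c)
{
    if (c->rows == 0)
        return;
    struct console_cell *top = c->lines[0];
    memmove(&c->lines[0], &c->lines[1],
            (c->rows - 1) * sizeof *c->lines);
    memset(top, 0, c->cols * sizeof *top);
    c->lines[c->rows - 1] = top;
}
```

This only makes the text-buffer side of scrolling cheap; the cells still have to be repainted to the framebuffer afterwards.
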
17:02:00 <clever> heat: is each line of text its own image and ptr?
17:02:00 <heat> GeDaMo: yes, I didn't find it too useful, lots of snark and user-space only stuff
17:03:00 <GeDaMo> He basically got annoyed at MS for claiming that making their terminal faster would be extremely difficult
17:03:00 <heat> I don't have a GPU here, nor do I have a chance to do fancy asynchronous processing of writes; if I take too long to do something I'm just adding latency to a program
17:04:00 <heat> clever: each line of text is its own array of char32_t's
17:04:00 <clever> heat: ahh
17:04:00 <heat> in my case I have 4 uint32_t's
17:04:00 <clever> so you're manipulating the text content, and then re-rendering?
17:04:00 <heat> codepoint, fg color, bg color, flags
17:04:00 <heat> si
17:04:00 <clever> LK is very different, it has no way to store the text content
17:05:00 <clever> the print routines just directly draw to the image buffer, and then promptly forget the text
17:05:00 <heat> if you want to implement a vt100 emulator you need this
17:05:00 <j`ey> heat: i tried to give it a watch, but yeah, so much snark
17:05:00 <clever> scrolling is just a memmove() to shift the pre-drawn glyphs
17:05:00 <heat> memmove() on a framebuffer is bad
17:05:00 <clever> yeah
17:06:00 <clever> thats why i was asking, to find another solution
17:06:00 <clever> but it looks like what you found is for the text buffer, and doesnt solve re-rendering that to the gfx buffer
17:07:00 <heat> helps
17:07:00 <heat> if you don't read from video memory you're way better off
17:07:00 <clever> ah yeah, that also came up a while back on the rpi forums
17:07:00 <clever> somebody asked why reading /dev/fb0 is so much slower than writing
17:07:00 <clever> and the answer is write-combined and no-cache
17:08:00 <clever> but in the case of LK on the VPU, i can configure it so the video memory is cachable, and the GPU is coherent with the cache
17:09:00 <clever> but i would probably use DMA on uncached memory, if i wanted to optimize it more
17:09:00 <heat> i think with PCI you're also limited to a single read but you can queue a lot of writes
17:09:00 <heat> remember reading that somewhere
17:11:00 <clever> https://github.com/littlekernel/lk/blob/master/lib/gfxconsole/gfxconsole.c#L74-L92
17:11:00 <bslsk05> ​github.com: lk/gfxconsole.c at master · littlekernel/lk · GitHub
17:11:00 <clever> heat: the region of code i was referring to
17:13:00 <clever> and gfx_copyrect() is a potential way to fix things
17:13:00 <clever> https://github.com/littlekernel/lk/blob/master/lib/gfx/gfx.c#L545-L573
17:13:00 <bslsk05> ​github.com: lk/gfx.c at master · littlekernel/lk · GitHub
17:14:00 <clever> heat: when a bitmap is created, the copyrect function is filled in, with a variant suitable for the bpp, but i could swap that out with a gpu accelerated copy
18:18:00 <heat_> crap I can't get AHCI IRQs in real hardware
18:21:00 <heat_> yeah OK I think I'm not getting IRQs at all
18:21:00 <heat_> unless both my AHCI and RTC drivers are broken
18:32:00 <heat_> hmmm the local apic timer works though
18:32:00 <clever> heat_: check the irq controller tree?
18:32:00 <heat_> no clue what you're talking about
18:33:00 <heat_> this is my OS
18:33:00 <clever> doesnt x86 have a thing where some irq controllers are routed into an irq pin on another irq controller?
18:33:00 <clever> and if that chained pin is disabled, everything on that branch of the tree is dead
18:39:00 <heat_> oh wow I found the issue
18:39:00 <heat_> it turns out the IO APIC's redirection table can have garbage in it
18:39:00 <clever> what does the redirection table do?
18:41:00 <heat_> IRQs map globally to the IO APIC's redirection table, then the IO APIC redirects the IRQ to a given local APIC with the given delivery mode
18:41:00 <heat_> and asserts a given irq vector
18:42:00 <heat_> I noticed it because I tried to print an entry and it had a bogus destination CPU ID
18:42:00 <clever> ah
18:42:00 <heat_> my init code just read the entries, adjusted a few things and kept going
18:42:00 <heat_> If I set it to 0 explicitly everything is fine
18:43:00 <heat_> Default value: xxx1 xxxx xxxx xxxxh <-- actually yeah they're pretty explicit about that
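
The fix heat_ describes (writing the whole entry explicitly instead of read-modify-write on whatever the firmware left behind) might look something like the sketch below. The bit positions follow the standard I/O APIC redirection entry layout (vector in bits 0-7, polarity in bit 13, trigger mode in bit 15, mask in bit 16, destination in bits 56-63); the macro names and `ioapic_make_redir()` are hypothetical, and the actual register writes through IOREGSEL/IOWIN are left out so the block stays hardware-independent.

```c
#include <stdint.h>

/* Flag bits of a 64-bit I/O APIC redirection table entry.
 * Delivery mode (bits 8-10) and destination mode (bit 11) are left at 0,
 * meaning fixed delivery and physical destination. */
#define IOAPIC_REDIR_ACTIVE_LOW    (1ull << 13)
#define IOAPIC_REDIR_LEVEL_TRIGGER (1ull << 15)
#define IOAPIC_REDIR_MASKED        (1ull << 16)
#define IOAPIC_REDIR_DEST_SHIFT    56

/* Register indices (selected via IOREGSEL) of entry n's two dwords. */
#define IOAPIC_REDTBL_LO(n) (0x10u + 2u * (n))
#define IOAPIC_REDTBL_HI(n) (0x11u + 2u * (n))

/* Build a complete entry from scratch so no stale bits survive. */
uint64_t ioapic_make_redir(uint8_t vector, uint8_t dest_apic_id,
                           uint64_t flags)
{
    uint64_t entry = 0;

    entry |= vector;
    entry |= flags & (IOAPIC_REDIR_ACTIVE_LOW |
                      IOAPIC_REDIR_LEVEL_TRIGGER |
                      IOAPIC_REDIR_MASKED);
    entry |= (uint64_t)dest_apic_id << IOAPIC_REDIR_DEST_SHIFT;
    return entry;
}
```

A driver would then write the low and high halves of the result to `IOAPIC_REDTBL_LO(n)` and `IOAPIC_REDTBL_HI(n)`, typically keeping the entry masked until both halves are in place.
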
18:59:00 <joe9> I am trying to decode this instruction into further detail ff2425910e2000 . This is in x86-64
18:59:00 <joe9> It is causing a #GP(0)
18:59:00 <joe9> and I am trying to figure out why. http://okturing.com/src/12985/body If I replace the JMP with a CALL or JMP to a label, it works fine.
19:00:00 <joe9> So, something to do with segment selector or task gate setup, I think
19:02:00 <gog> task gate?
19:02:00 <gog> can't use those in long mode
19:04:00 <Oli> As a brief addendum: I love you all people delving on and helping others paradigms on close-to-metal computer science topics.
19:05:00 <Oli> tangent* sleepy me
19:06:00 <joe9> gog, but, I do not know if it is due to that. It should not be. but, just want to be sure.
19:06:00 <gog> trying to jump through a task gate in long mode will #GP(0), so it's a good starting point
19:06:00 <joe9> if I want to look into the instruction components mod-reg-r/m bits, other than doing the bits manually.
19:06:00 <joe9> is there a better way of going about it.
19:07:00 <gog> are you trying to use x86 hardware task switching?
19:07:00 <joe9> no, not knowingly.
19:07:00 <gog> that's the only purpose for task gates
19:07:00 <joe9> unless, I forgot to enable/disable some specific flag.
19:07:00 <gog> and it's deprecated in long mode
19:07:00 <gog> or rather, not supported
19:08:00 <gog> it's deprecated in protected mode and not recommended
19:11:00 <gog> ok hm examining this it doesn't seem like that's what you're trying to do though
19:12:00 <gog> what does your GDT look like?
19:13:00 <joe9> http://okturing.com/src/12988/body
19:13:00 <joe9> this is plan9 c, so the syntax will be off.
19:13:00 <joe9> from gcc.
19:14:00 <joe9> CS 0008 DS 0001 ES 0000 FS 0000 GS 0000
19:14:00 <joe9> these are the segment selectors when it GP'ed
19:14:00 <gog> DS 0001
19:14:00 <gog> that looks wrong right off the bad
19:14:00 <gog> bat*
19:14:00 <joe9> thanks, I was thinking that too.
19:15:00 <joe9> will dig through it.
19:16:00 <gog> should be 0010 if it's DPL0 and the second entry
19:18:00 <gog> disassembling the second snippet shows that it's doing an indirect jump referencing ds, hence the GP
19:19:00 <gog> what's strange is it doesn't give the selector index
19:20:00 <joe9> how did you disassemble it?
19:20:00 <gog> just put it through a web thing :p
19:20:00 <joe9> which web thing?
19:21:00 <GeDaMo> https://defuse.ca/online-x86-assembler.htm
19:21:00 <bslsk05> ​defuse.ca: Online x86 and x64 Intel Instruction Assembler
19:21:00 <gog> https://defuse.ca/online-x86-assembler.htm
19:21:00 <joe9> thanks.
19:21:00 <gog> 0: ff 24 25 91 0e 20 00 jmp QWORD PTR ds:0x200e91
19:21:00 <gog> it's trying to jump to the address of a function at ds:200e91
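
For joe9's earlier question about pulling the mod-reg-r/m bits apart without doing it by hand, a small helper along these lines would do. It is only a sketch with hypothetical names (`decode_modrm()`, `decode_sib()`, `read_le32()`), not a full decoder. For `ff 24 25 91 0e 20 00` it gives ModRM mod=0, reg=4, rm=4 (opcode FF /4 is a near indirect JMP, and rm=4 means a SIB byte follows) and SIB scale=0, index=4, base=5, which with mod=0 means no index, no base, and a 32-bit displacement of 0x200e91.

```c
#include <stdint.h>

struct modrm { unsigned mod, reg, rm; };
struct sib   { unsigned scale, index, base; };

/* ModRM byte: mod in bits 7-6, reg in bits 5-3, rm in bits 2-0. */
struct modrm decode_modrm(uint8_t b)
{
    struct modrm m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return m;
}

/* SIB byte: scale in bits 7-6, index in bits 5-3, base in bits 2-0. */
struct sib decode_sib(uint8_t b)
{
    struct sib s = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return s;
}

/* Read a little-endian 32-bit displacement or immediate. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

REX prefix bits, which extend reg, index and base to four bits each, are ignored here since this instruction has no prefix.
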
19:22:00 <joe9> thanks, that is exactly what I need.
19:22:00 <gog> :D
19:23:00 <joe9> thanks so much.
20:35:00 <joe9> gog, I am doing a dump of all registers when the GP(0) occurs and notice that DS=0001 when I have a JMP instruction there.
20:35:00 <joe9> but, DS=0, when I replace the JMP with a CALL.
20:35:00 <joe9> does that make sense to you?
20:36:00 <gog> weird
20:37:00 <gog> can i see the assembly code?
20:38:00 <gog> not the disassembled code, but the source
20:39:00 <gog> also do you set DS at any point?
20:41:00 <joe9> gog, this is plan9 assembler code. http://okturing.com/src/12991/body I am setting DS to 0 at boot.
20:42:00 <gog> DS needs to be 0x10
20:42:00 <gog> the index of your kernel data descriptor
20:42:00 <joe9> http://okturing.com/src/12990/body
20:42:00 <gog> well index << 3
20:43:00 <joe9> ok, will set it explicitly to 0x10 and try. Thanks.
20:43:00 <gog> has to be after you do lgdt
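
The "index << 3" arithmetic gog mentions can be written out as below. This is a small sketch with hypothetical helper names; the layout itself is the standard x86 selector format: descriptor index in bits 15-3, table indicator (0 = GDT, 1 = LDT) in bit 2, requested privilege level in bits 1-0.

```c
#include <stdint.h>

/* Compose a segment selector from its three fields. */
uint16_t make_selector(unsigned index, unsigned ti, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((ti & 1u) << 2) | (rpl & 3u));
}

/* Split a selector back into descriptor index and RPL. */
unsigned selector_index(uint16_t sel) { return sel >> 3; }
unsigned selector_rpl(uint16_t sel)   { return sel & 3u; }
```

With these, GDT index 1 at RPL 0 gives 0x08 (the CS value in the dump) and index 2 gives 0x10, while the DS value 0x0001 from the dump decodes as index 0, the null descriptor, with RPL 1.
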
20:57:00 <joe9> ok, thanks.
20:59:00 <geist> clever: oh yeah that gfxconsole stuff is all about max compatibility, not speed
21:00:00 <geist> though it does generally acknowledge the fact that you should generally write to the framebuffer and not read back
21:00:00 <clever> geist: and i think i could work within that framework, by adding a custom copyrect function, for any images rendered on the rpi gpu
21:00:00 <geist> though if the FB is in main ram that's not really true, you can generally munge it
21:00:00 <clever> yeah
21:00:00 <clever> but a dma accelerated copy, would leave the cpu more idle
21:01:00 <clever> and could probably saturate the bus better
21:04:00 <geist> sure
21:16:00 <clever> but i can also see some issues, copyrect and dma, the stride, it may not play well together...
21:16:00 <clever> worst case, i could see creating one control structure per row of pixels, and chaining them up
21:16:00 <clever> but if the rect spans the entire image as gfxconsole does, it can be far simpler
21:46:00 <heat> yo geist
21:46:00 <heat> do you know anything about irq balancing between CPUs?
21:47:00 <heat> now that I've looked at this code I might as well improve stuff and I haven't looked too much into balancing
21:48:00 <heat> although I remember reading vague comments about having every IRQ on the same CPU being faster because of caching or something
22:12:00 <clever> heat: for nvme, you do want per-core interrupts, because nvme supports per-core command rings
22:12:00 <clever> so basically, each core can talk to the nvme drive independently, without having to do any mutex'ing to share the drive
22:12:00 <clever> and it will irq back to the core that gave that specific command
22:26:00 <sham1> That's actually neat
22:27:00 <sham1> I suppose that's just one more reason why nvme is so good
22:27:00 <sham1> And performant
22:27:00 <heat> well a single nvme drive is a single controller afaik
22:29:00 <heat> there's also not much to lock in AHCI
22:29:00 <heat> each drive can operate semi independently
22:30:00 <clever> i think the reason nvme does that, is 2 fold
22:31:00 <clever> 1: the cpu cache, if a given process just requested data, sending the reply to the same core improves the chances of that process still being in the cache
22:31:00 <clever> 2: less need for the kernel to co-operate with its other cores, over who talks to the drive when
22:40:00 <clever> heat: xhci also gains other fun features from its use of many command rings, like exporting a usb device on an xhci controller to a virtual machine, with minimal translation involved
22:40:00 <clever> you can basically just map the command ring for a given usb device right into the guest, and cover it with a very thin xhci virtual device
22:41:00 <clever> though, i'm not sure what actually takes advantage of that....
23:02:00 <zid> clever: I like the idea of routing io irqs based on the cpu that is actually waiting on that io
23:02:00 <zid> idk if anybody does that
23:02:00 <zid> then just waking up into that process's read() straight out of the irq would be cool
23:03:00 <heat> it's pretty standard on high performance stuff
23:03:00 <heat> NICs do it a lot for example
23:04:00 <heat> the whole linux networking stack has a lot of optimisations for getting a packet to the CPU it belongs in directly
23:07:00 <heat> now I just need to find out why my PS2 driver isn't picking up the controller and I'll have a working real thing
23:40:00 <kazinsal> yeah MSI-X makes routing different queues to specific CPUs super easy
23:53:00 <geist> indeed. also good high performance nics have multiple queues as well, and usually let you assign a MSI-X vector per
23:54:00 <geist> the assumption being that you spread them across cpus
23:54:00 <geist> clever: i think that particular command ring thing in xhci is largely used for things like mass storage
23:54:00 <geist> you can basically build a separate transfer queue for some disk so it doesn't have to go through the usual high level machinery of the usb stack
23:54:00 <geist> the details are hazy
23:55:00 <geist> or that may be that xhci controller has mass storage offload and maybe that shows up as a separate queue
23:55:00 <geist> oh heat left. drat