channel logs for 2004 - 2010 are archived at http://tunes.org/~nef/logs/old/ ·· can't be searched
#osdev2 = #osdev @ Libera from 23may2021 to present
#osdev @ OPN/FreeNode from 3apr2001 to 23may2021
all other channels are on OPN/FreeNode from 2004 to present
http://bespin.org/~qz/search/?view=1&c=osdev2&y=24&m=7&d=14
00:07:00 <vai> yo :-)
01:21:00 <heat> is there a particularly TLB shootdown-intensive *realistic* workload?
01:25:00 <heat> basically i'm wondering how to strike the right tlb shootdown balance between IPIs and fine-grained TLB invalidation...
01:26:00 <heat> something really synthetic probably won't work if you can't easily see the TLB shootdown difference
01:46:00 <vin> heat: Maybe PTE access bit scanning to detect page activity. https://sjp38.github.io/post/damon/
01:46:00 <bslsk05> sjp38.github.io: DAMON: Data Access Monitor | hacklog
01:47:00 <heat> damon's a whole thing ;)
01:54:00 <vin> heat: applications used here for evaluation http://www.cs.yale.edu/homes/abhishek/kumar-asplos18.pdf might be of interest for you
01:55:00 <heat> thank you, i'll check it out :)
05:41:00 <kazinsal> https://faultlore.com/cargo-mommy/ for the rust users
05:41:00 <bslsk05> faultlore.com: cargo-mommy
07:30:00 <nikolar> rust users, smh
08:04:00 <zid> Imagine being a user
08:06:00 <klys_> something might happen
09:23:00 <adder> need some help with my linker?
09:24:00 <adder> https://bpa.st/2OMA
09:24:00 <bslsk05> bpa.st: View paste 2OMA
09:24:00 <adder> I have stack_top defined in my linker script, and I'm trying to use it from boot.S
09:24:00 <adder> I declared it as extern, so I'm not sure how it's not seeing it
09:25:00 <adder> and I've no idea what the error means
09:25:00 <adder> and yes, this is my third attempt, ditched limine
09:26:00 <adder> what relocation? why is it undefined? how am I making a PIE object?
09:31:00 <klys_> needs info about your compile and link commands
09:31:00 <klys_> also I cannot do this for lack of time.
09:36:00 <adder> this is the makefile https://bpa.st/NUFQ
09:36:00 <bslsk05> bpa.st: View paste NUFQ
10:04:00 <adder> nvm, fixed
13:11:00 <guideX> I'm having trouble separating my os code from the programs that run inside of it
13:12:00 <guideX> I guess my OS is missing a code interpreter, was just looking for others to sort of help me mentally grasp that, and see if I am thinking correctly about it
13:15:00 <heat> you're not
13:16:00 <heat> Traditionally (in 99% of cases), you just use whatever the CPU architecture gives you to separate the user code and kernel code
13:16:00 <heat> traditionally some sort of kernel mode and user mode, or in x86 ring 0 and ring 3
13:16:00 <guideX> ah ok
13:28:00 <guideX> so which mode it will run in is one thing, but what about the act of separating the code from the os? right now, there's no concept of code in the os vs program
13:28:00 <guideX> that's something I have to build out I guess right
13:28:00 <mjg> so what material on operating systems have you read so far
13:28:00 <heat> system calls
13:29:00 <zid> Files, is a good way..
13:30:00 <zid> see: hard drives, initrd
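For what it's worth, the practical shape of that separation on Linux/x86-64 looks like the sketch below: the "program" is a completely separate binary living in a file, and its only bridge into kernel code is the syscall instruction. The syscall numbers and register constraints are the Linux x86-64 ABI; build with something like gcc -nostdlib -static. This is only an illustration of the idea, not anyone's actual code.

/* Minimal standalone user program: no kernel code linked in, everything it
 * needs from the OS goes through the syscall instruction. */
static long sys_write(int fd, const void *buf, long len)
{
    long ret;
    asm volatile("syscall"
                 : "=a"(ret)
                 : "a"(1), "D"((long)fd), "S"(buf), "d"(len)  /* __NR_write == 1 */
                 : "rcx", "r11", "memory");                   /* syscall clobbers rcx/r11 */
    return ret;
}

void _start(void)
{
    sys_write(1, "hello from userspace\n", 21);
    asm volatile("syscall" :: "a"(60), "D"(0) : "rcx", "r11", "memory");  /* __NR_exit */
    __builtin_unreachable();
}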
13:50:00 <guideX> this is a cosmos c# operating system https://www.gocosmos.org/ https://imgur.com/a/O2ect3s I have been reading things on os's, but honestly it all began saturday morning, I have used cosmos before though
13:50:00 <bslsk05> www.gocosmos.org: COSMOS - COSMOS
13:50:00 <bslsk05> imgur.com: Imgur: The magic of the Internet
13:51:00 <heat> no idea how that works, sorry
13:52:00 <guideX> no problem, it's quite different, I am familiar with it though, I wrote a cli os with cosmos in the past
13:54:00 <guideX> my understanding is limited though, on how to separate concerns, https://pastebin.com/raw/i9xKrisi this is a program for instance, but it is built with the os itself
13:55:00 <guideX> I am so far not sure how to like, separate the code of the os from code of programs, and was just kind of wondering how other os's do it, or tips
13:57:00 <guideX> already I have a file system, networking, gui, and a program can have a window and stuff, but how do I put that code outside the os
13:57:00 <guideX> I can describe a program outside of the os, but the part where the code executes is harder
13:59:00 <guideX> is it like, I need to build a scripting language, and interpret the commands myself, or is my goal to try and use something
13:59:00 <guideX> or maybe neither of those things eh
14:34:00 <adder> I have a really, really, really stubborn pml4. I'm trying to get it page aligned via various means, from attributes to link time, but it remains at 0x111507?
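For the record, a minimal sketch of the usual way to pin a statically allocated top-level table to a page boundary in C with GCC/Clang (whether this matches adder's setup is a guess). Note the alignment also has to survive the link: the section the table lands in must itself be page-aligned in the linker script, e.g. via ALIGN(4K) on the output section.

#include <stdint.h>

#define PAGE_SIZE 4096

/* Force 4 KiB alignment so the physical address written to CR3 has its
 * low 12 bits clear. */
static uint64_t pml4[512] __attribute__((aligned(PAGE_SIZE)));

_Static_assert(sizeof(pml4) == PAGE_SIZE, "top-level table must be exactly one page");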
14:47:00 <kof673> guideX, you keep saying "os" where i think most people would say "kernel"
14:48:00 <kof673> linux is "just a kernel", bsd includes both; gnu/linux is an os (kernel + userland).
14:49:00 <kof673> you can use whatever terms you want, but understanding other people requires learning their definitions :D
14:55:00 <kof673> "Inconceivable!"
15:07:00 <guideX> kof673, I only started working on this thing on saturday, I'm a little ahead of my own knowledge and terminology
15:07:00 <guideX> it is already incredibly far along for how little time I have put into it
15:08:00 <guideX> I am using cosmos c# sdk, which abstracts things for me to some degree also
15:09:00 <guideX> there is a kernel project in my os vs the logic of the os
15:10:00 <guideX> the kernel handles things like, the file system, memory allocator, some drivers and things, and a whole lot more, the os project is about the os, built in features, and the things you see in the os
15:11:00 <guideX> there's also libc, .net corlib, and the cosmos bits
15:12:00 <guideX> I say os instead because, it's hard to describe it all in a few short words xD
15:14:00 <guideX> but I guess the thing I am having trouble mostly with, is those "built in programs" vs the ones that exist outside my os, it's a troubling concept so far for me, I'm not sure how to go about that in a logical way
15:16:00 <GeDaMo> Does your system have the concept of processes?
15:20:00 <guideX> GeDaMo, yeah, I built like an app container
15:21:00 <guideX> it is for built in apps, but I have been trying to figure out how to have external apps, that bit is kind of missing
15:21:00 <guideX> I have it capable of describing a window of an external app, but I'm not sure how to execute code from an external app yet
15:21:00 <guideX> I say external like it means something, basically just code that is not compiled with the os
15:23:00 <guideX> I've been trying to figure out; how do I put this program for example outside the built code of the os https://pastebin.com/raw/btNZTLfM
15:23:00 <guideX> that : Window is what defines what is a gui program in my os
15:25:00 <GeDaMo> You'll need some kind of executable format which you can compile to and load into a process
15:28:00 <guideX> GeDaMo, I'm kind of just looking for a logical way to go about it I guess, I can write the code to do it myself.. I would need something like an interpreter for the front end controls (like xml or whatever), something to interpret the code (a scripting language), and to stuff it all inside some kind of zip file, and when launched, it does all the things with the front end and executes the code
15:28:00 <guideX> is that sort of, how to do that in a paragraph?
15:29:00 <guideX> I basically need to interpret scripts I guess eh
15:29:00 <GeDaMo> If you build an interpreter into your system, you can just load text files
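To make GeDaMo's point concrete, here's a hosted C sketch of the core idea behind "external programs": the code lives in a file in some agreed-upon format, and a loader maps it into memory and transfers control. A real kernel does the same thing into a fresh address space using its own filesystem and memory APIs; the flat-binary "format" assumed here (entry point at offset 0, position-independent code, no W^X) is just a simplification for the example.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s flat-binary\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    struct stat st;
    if (fd < 0 || fstat(fd, &st) < 0)
        return 1;

    /* Map the file somewhere it can execute from; a more careful loader
     * would map read/write, copy and relocate, then flip to read+execute. */
    void *code = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_EXEC,
                      MAP_PRIVATE, fd, 0);
    if (code == MAP_FAILED)
        return 1;

    /* Treat offset 0 of the file as the entry point, as a flat-binary
     * format would define. */
    int (*entry)(void) = (int (*)(void))code;
    return entry();
}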
15:31:00 <dostoyevsky2> https://thasso.xyz/2024/07/13/setting-up-an-x86-cpu.html
15:31:00 <bslsk05> thasso.xyz: Setting up an x86 CPU in 64-bit mode
15:31:00 <heat> guideX, tip: don't use cosmos
15:32:00 <heat> cosmos isn't *really* a proper operating system
15:32:00 <heat> in using it you may just be confusing yourself further
15:33:00 <heat> adder, what did you try? are you sure you aren't looking at the wrong thing?
15:34:00 <kof673> https://0x0.st/s/JR6YKh_XCclMaYjOa3S0aA/XLwE.jpeg diagram of common levels of separation
15:52:00 <guideX> heat, just curious what do you find improper about it
15:53:00 <guideX> it does work on bare metal if that's a concern
15:53:00 <guideX> also you can download the cosmos source code and make changes to the base
15:53:00 <heat> it is (IMO) a glorified demo thing that's only popular because it's in C#, a language that really isn't suited for kernel development in any way shape or form
15:56:00 <guideX> actually, I don't think I'm using much cosmos, I am using it for debugging support, and the bootloader and console and certain things, I am mostly using .net native aot
15:57:00 <guideX> my other os is like 100% cosmos though
16:01:00 <guideX> ok, per your advice I removed all the cosmos bits, it still works fine xD
16:01:00 <guideX> I guess I was just using it for the cli portion, which I don't need it for that even
16:01:00 <guideX> this is entirely just, .net7/corlib, libc, and my os code
16:01:00 <heat> my advice is to stop using C# altogether :)
16:02:00 <heat> it is seriously not the language you want for low level development
16:02:00 <heat> C, C++, Rust - all fine choices
16:02:00 <heat> probably a few others i can't think of right now
16:03:00 <GeDaMo> asm! :P
16:03:00 <heat> /votekick GeDaMo
16:06:00 <dzwdz> any opinions on queue(3)?
16:08:00 <guideX> idk that is harder, the entire thing is c# xD
16:10:00 <guideX> I can write c++ too, but it is too late
16:15:00 <mjg> dzwdz: these are semi-shite macros, but ultimately they do work
16:16:00 <jimbzy> heat, C# == C++++?
16:16:00 <mjg> the real q though is if you should be linked listin' to begin with
16:16:00 <mjg> jimbzy: c# == ++c++;
16:16:00 <jimbzy> Ahhh
16:17:00 <jimbzy> That makes sense.
16:18:00 <dzwdz> i mean, linked lists are simple and i don't have any fancy needs
16:18:00 <dzwdz> and i was thinking about using some common abstraction for linked lists instead of reimplementing them everywhere
16:20:00 <dzwdz> i'm debating if i should do that
16:22:00 <dzwdz> queue(3) is kinda ugly but at least it's well known, and i think it's used in the bsd kernels too?
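For reference, this is roughly what using queue(3) looks like; sys/queue.h ships with the BSDs and glibc, and the task struct here is made up for the example. The lists are intrusive: the linkage lives inside your own struct, so there is no separate node allocation.

#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

struct task {
    int id;
    LIST_ENTRY(task) link;      /* embedded linkage */
};

LIST_HEAD(task_list, task);     /* declares struct task_list */

int main(void)
{
    struct task_list tasks;
    LIST_INIT(&tasks);

    for (int i = 0; i < 3; i++) {
        struct task *t = malloc(sizeof(*t));
        t->id = i;
        LIST_INSERT_HEAD(&tasks, t, link);
    }

    struct task *t;
    LIST_FOREACH(t, &tasks, link)
        printf("task %d\n", t->id);
    return 0;
}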
16:23:00 <dostoyevsky2> can one use floating point numbers in the linux kernel? I remember on openbsd the compiler has flags that forbid fp to make context switches cheaper
16:24:00 <heat> dostoyevsky2, generally no, but there are exceptions if you really need SIMD for instance
16:24:00 <heat> (surrounded by kernel_fpu_begin/end())
16:25:00 <dostoyevsky2> heat: ah, interesting
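Roughly what that convention looks like on the Linux/x86 side, as a sketch based on <asm/fpu/api.h>; the surrounding function is made up, and preemption/softirq details are glossed over.

#include <asm/fpu/api.h>   /* kernel_fpu_begin(), kernel_fpu_end() */
#include <linux/types.h>

static void sum_with_simd(const void *buf, size_t len)
{
    (void)buf;
    (void)len;

    /* FP/SIMD registers may only be touched inside this window, which
     * saves and restores the interrupted FPU state; don't sleep in here. */
    kernel_fpu_begin();
    /* ... SSE/AVX-using loop over buf/len would go here ... */
    kernel_fpu_end();
}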
16:25:00 <mjg> dude
16:25:00 <mjg> wtf
16:25:00 <mjg> fp being forbidden by default is kernels 101
16:26:00 <dostoyevsky2> unless it's a cuda kernel
16:28:00 <heat> i would bet 200 weimar republic papiermarks as to how fuckin windows probably does something different
16:29:00 <heat> damn i was wrong, they also have KeSaveExtendedProcessorState/Restore
16:29:00 <heat> guess i lost like 2 cents
16:36:00 <mjg> you also lost some social kredit with the gestapo
16:37:00 <heat> too early
16:37:00 <heat> no gestapo yet
16:37:00 <mjg> shit, also a month
16:38:00 <mjg> my apologies
16:42:00 <mjg> s/also/almost/
16:42:00 <mjg> wtf
16:43:00 <mjg> anyhow i'm in the market for temporary access to a real pentium 3
16:43:00 <heat> lol what
16:43:00 <mjg> i'm not surprised to not find any options :[
16:44:00 <mjg> there is magic code i am totally not going to share which i'm confident sucks terribly
16:44:00 <mjg> despite the author claiming it's fast (while ofc providing nothing to back it up)
16:44:00 <mjg> according to fog's instruction tables it is indeed bad
16:44:00 <mjg> the question is how much we talkin'
16:48:00 <heat> sometimes code size really does matter tho
16:48:00 <mjg> .. :D
16:48:00 <mjg> mofs
16:48:00 <mjg> not doing that shit would be less code
16:50:00 <mjg> look mon the code is totally geezered
16:50:00 <mjg> the question is hwat kind of stats we talkin' specifically
16:56:00 <kof673> model name: Pentium III (Coppermine) i got a system or 2
16:58:00 <mjg> can you give me ssh access? i only need to prod some userspace a little bit
16:59:00 <mjg> as an unpriv user
16:59:00 <mjg> is that linukkz by any chance?
17:00:00 <kof673> i don't know if i have access to router to portfwd :/
17:00:00 <kof673> yes, i can run knoppix 8.1 binaries easily lol live cd, if you can compile there
17:01:00 <kof673> didn't mean to tease ....ask geist :D
17:01:00 <mjg> how much ram you got there
17:01:00 <mjg> fuckery could be done with reverse ssh, but i don't remember how that's done
17:02:00 <kof673> this system is like 512M not gonna happen. the other system is a laptop with non-working AC...so you would get about 2 hours. 1G or 2G there
17:02:00 <kof673> 2 hours before the battery dies lol
17:02:00 <kof673> maybe more...4?
17:02:00 <mjg> and you can't recharge the sucker? :D
17:02:00 <mjg> is it dead for good after?
17:02:00 <kof673> yes, but not while it is inside, i was hoping to build some contraption
17:03:00 <mjg> well i can prep some test scriptzz
17:03:00 <kof673> i can do that :D
17:03:00 <kof673> just boot knoppix 8.1 in qemu and get it working there :D
17:03:00 <mjg> 8(
17:03:00 <mjg> aight
17:03:00 <kof673> unless you have another live cd/dvd/usb stick :D
17:03:00 <mjg> can you test if perf works though?
17:03:00 <kof673> if you tell me what to do
17:03:00 <kof673> or does it need custom kernel?
17:04:00 <mjg> hrm
17:04:00 <mjg> that's 2017 vintage
17:04:00 <mjg> so that's after the meltdown et al fiasco
17:04:00 <kof673> too new? lol
17:04:00 <heat> pretty sure those cpus don't have the bugs?
17:04:00 <mjg> ye it would be best to bench without "knowing" about the problems
17:04:00 <mjg> heat: dude
17:05:00 <mjg> the 32-bit kernels got a facelift
17:05:00 <mjg> full 4G space
17:05:00 <mjg> cause meltdown
17:05:00 <heat> what?
17:05:00 <heat> when did that happen?
17:05:00 <mjg> fresh after meltdown?
17:05:00 <mjg> kof673: yo mate can you boot the sucker and "lscpu"
17:05:00 <kof673> yeah, gimme 5 minutes or so
17:05:00 <heat> idk boss i don't follow the 32-bit kernel stuff
17:08:00 <kof673> actually that has knoppix 7.6.1 unless i burn a cd or dvd maybe. 1G ram , kernel 4.2.6 .... that one is Mobile Intel® Pentium® III Processor "pentium M" 1500 mhz
17:08:00 <kof673> its loading... :D
17:09:00 <kof673> this system...is pentium III coppermine, 930 "desktop" https://0x0.st/s/n0954Iym5WxGvf8efguQiQ/XL3b.txt lscpu knoppix 8.1 lol
17:09:00 <kof673> *930 MHz
17:10:00 <kof673> 8.1 has kernel 4.12.7 yes 2017
17:12:00 <heat> mjg, whatever you're talking about doesn't check out
17:13:00 <heat> just booted a new i386 alpine linux kernel and page tables are as usual, userspace mapped, kernel addresses at typical i386 places
17:14:00 <kof673> the pentium m.... is same lscpu but model 9, and a few more flags: fpu vme de pse tsc msr mce cx8 sep mtrr pge mca cmov clflush dts acpi mmx fxsr sse sse2 tm pbe bts est tm2 (no sse2 on the desktop)
17:15:00 <kof673> uname -a says 4.2.6 was built in dec 2015
17:16:00 <mjg> heat: it totes happened on freebsd, i did not verify on linux
17:16:00 <mjg> heat: i'm gonna check linux soon(tm)
17:16:00 <mjg> kof673: can you "perf top" in there
17:17:00 <mjg> i'm gonna need period-accurate gcc and whatnot for some tinkering, but that i'm gonna sort out on my end
17:17:00 <mjg> and by period accurate i mean about 2000
17:19:00 <kof673> linux-perf-4.12 is not installed, /usr/bin/perf fails. not installed... i can compile stuff...
17:20:00 <mjg> perf is compilable from the kernel source, but it has quite a few deps
17:20:00 <mjg> knoppix is probably not suited for that
17:20:00 <mjg> i can try to get a binary working to copy over there
17:21:00 <mjg> that said i'm gonna prod you some time next week
17:21:00 <mjg> thanks mate
17:21:00 <kof673> ok
17:21:00 <Matt|home> o\
17:24:00 <kof673> the 7.6.1 "perf top" doesn't even exist at all...so go with 8.1 :D
17:25:00 <kof673> unless you really need older kernel
17:25:00 <kof673> older knoppix than those should also all work AFAIK
17:43:00 <guideX> I think what I'll do is convert the c# to cil, then find a cil interpreter
17:44:00 <guideX> and then I'm off to using binaries
17:45:00 <guideX> and that is how to separate a program from a piece of code inside my os
18:07:00 <chiselfuse> how is the `catch syscall` implemented in gdb?
18:08:00 <chiselfuse> how does the process get stopped at a point where it executes a specified system call?
18:08:00 <heat> ptrace
18:09:00 <heat> see ptrace(2)'s PTRACE_SYSCALL
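In slightly more detail: a tracer like gdb or strace resumes the tracee with PTRACE_SYSCALL, so the kernel stops it at every syscall entry and exit, and the tracer then reads the registers to see which syscall it is. A stripped-down x86-64 sketch (error handling omitted; this prints at both the entry and the exit stop):

#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;

    pid_t pid = fork();
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);   /* let the parent trace us */
        execvp(argv[1], &argv[1]);
        return 127;
    }

    int status;
    waitpid(pid, &status, 0);                    /* initial stop after exec */
    while (!WIFEXITED(status)) {
        ptrace(PTRACE_SYSCALL, pid, NULL, NULL); /* run until next syscall stop */
        waitpid(pid, &status, 0);
        if (WIFSTOPPED(status)) {
            struct user_regs_struct regs;
            ptrace(PTRACE_GETREGS, pid, NULL, &regs);
            printf("syscall %lld\n", (long long)regs.orig_rax);  /* x86-64 only */
        }
    }
    return 0;
}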
21:01:00 <heat> geist, what was your solution wrt riscv page mappings requiring sfence.vma? my problem being on the kernel side
21:01:00 <geist> shoot em down
21:02:00 <heat> aww eww
21:02:00 <geist> the key is whether or not you can handle a stray page fault
21:02:00 <geist> so for kernel, on zircon, we cant, so i shoot down mappings. for user space, just sfence on the local cpu
21:03:00 <geist> but also sfence locally on page fault entry (or exit, but entry is easier) just to make sure if you dont find anything wrong it'll retry
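A sketch of that pattern, with made-up handler and helper names: do a cheap local sfence.vma at the top of the fault handler, so a spurious fault caused by skipping the cross-CPU shootdown on a map is resolved simply by retrying the faulting instruction.

/* Flush this hart's translations for one address, all ASIDs. */
static inline void local_sfence_vma(unsigned long vaddr)
{
    asm volatile("sfence.vma %0, zero" :: "r"(vaddr) : "memory");
}

void riscv_page_fault(unsigned long fault_addr)
{
    local_sfence_vma(fault_addr);

    /* ... normal fault handling: walk the VM structures, map, signal ... */

    /* If nothing looks wrong, just return: the fence above guarantees the
     * retried access sees the up-to-date page tables. */
}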
21:03:00 <heat> btw am i misreading this or do i need a global tlb invalidation for a paging structure change?
21:03:00 <geist> i still dont 100% understand why it's needed, almost like the cpu can store a 'negative' entry, but i dont understand it
21:04:00 <geist> basically you do to be 100% correct
21:04:00 <heat> wack :(
21:04:00 <geist> the verbiage of the spec says you must sfence any time you change the page tables
21:04:00 <geist> but i think the key is when adding an entry the worst case is it misses the addition
21:04:00 <geist> so you get a stray page fault, which if you sfence inside the PF handler and retry will continue
21:05:00 <geist> so i avoid doing a shootdown for all cores for user mappings
21:05:00 <heat> oh that's even worse, christ
21:05:00 <heat> i didn't notice the "adding"
21:05:00 <heat> i've been tightening up my x86 semantics there wrt page table removal and i'm figuring i should do the same for riscv
21:05:00 <geist> https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/zircon/kernel/arch/riscv64/mmu.cc#851
21:05:00 <bslsk05> fuchsia.googlesource.com: zircon/kernel/arch/riscv64/mmu.cc - fuchsia - Git at Google
21:06:00 <heat> but, seriously, global invalidation on page table adding :sob:
21:06:00 <geist> well, 'adding' is changing the space
21:06:00 <heat> yeah but other MMUs aren't this silly
21:06:00 <geist> yep, and this is precisely why i dont really want to try to unify the page table logic
21:06:00 <geist> because when it gets to the nitty gritty here the arches start to differ
21:07:00 <geist> esp when you throw ASIDs into the mix
21:07:00 <heat> in fact most x86's are de-facto "one invalidate flushes all of the walker cache"
21:07:00 <heat> actually this is pretty ok to handle, my tlb code is separate
21:08:00 <heat> e.g most of my unmap code is arch generic, with a bunch of arch-specific accessors and helpers. but the tlb invalidation code is probably going to be entirely separate
21:08:00 <geist> yah at the minimum you need to abstract it. probably something like 'handle_tlb_change(is_kernel, asid, addr)' and then some sort of spattering of 'flush_tlb(situation)' that each arch deals with
21:09:00 <geist> and depending on the arch it decides to take action or not on all those situations
21:09:00 <heat> arch-independent logic just does e.g tlb_remove_page, tlb_remove_pmd, tlb_remove_pud
21:09:00 <heat> for instance for arm64 i'm planning on just straight up shooting a tlbi and the tlb invalidation finish just being a dsb + isb
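One possible shape for that abstraction layer, sketched in C with made-up names: the generic unmap code records what it tore down, and a single arch hook at the end decides what that means (INVLPG plus an IPI shootdown on x86, broadcast TLBI followed by dsb ish/isb on arm64, local sfence.vma plus IPIs on riscv).

#include <stdbool.h>

struct mm;   /* address space; the details don't matter for the interface */

/* Accumulates the range and kind of invalidation a single unmap needs. */
struct tlb_gather {
    struct mm *mm;
    unsigned long start, end;
    bool freed_tables;    /* set when intermediate page tables were freed */
};

/* Called by the generic page-table walker as it clears entries. */
void tlb_remove_page(struct tlb_gather *tlb, unsigned long vaddr);
void tlb_remove_table(struct tlb_gather *tlb, unsigned long vaddr, int level);

/* Called once at the end of the operation; each arch implements this with
 * whatever mix of local invalidates, broadcast invalidates and IPIs it needs. */
void tlb_flush_finish(struct tlb_gather *tlb);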
21:10:00 <geist> suggestion: stop using x86 style names for layers
21:10:00 <geist> just use level 0, 1, 2, 3, and decide which order to name it
21:10:00 <geist> far simpler
21:10:00 <heat> haha
21:10:00 <heat> i LARP'd linux sorry
21:10:00 <geist> at the minimum it makes it easier to write recursive stuff and just use something like int layer
21:10:00 <heat> wdym recursive?
21:10:00 <geist> if you really want you can use something like enum level { PMD, PML4, etc }
21:10:00 <Ermine> 'while learning from its mistakes'
21:11:00 <geist> oh if you wanted to do something like 'traverse(pt, vaddr, level)' that recursively calls itself
21:11:00 <geist> with level - 1 (or + 1)
21:11:00 <heat> yeah i can't do that, i can't assume levels are equal in size or length or format
21:12:00 <geist> oh you wanna port to 68k?
21:12:00 <geist> i think linux does too, they just require cpus with non uniform levels to suck it up
21:12:00 <heat> or x86 PAE :)
21:12:00 <geist> sure, but that's still easy to at least quantify
21:13:00 <geist> `constexpr size_of_level(int level) { switch(level) .... }`
21:13:00 <geist> if you pull it off right it's pretty darn efficient
21:14:00 <geist> on arm64 that's all dynamic anyway if you choose anything but the default aspace size
21:14:00 <geist> so if you write it that way you just replace those with non constexpr functions
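In plain C (no constexpr) that per-level geometry helper might look like the sketch below, for a standard x86-64 4-level layout with level 0 as the root, counting up toward the leaves. On an arch where the geometry is chosen at boot (arm64 granule/VA-size options) these would read variables instead of returning constants. The names are illustrative.

#include <stdint.h>

#define PT_LEVELS 4

/* Number of VA bits translated below an entry at the given level. */
static inline unsigned int level_shift(int level)
{
    switch (level) {
    case 0: return 39;   /* PML4 entry covers 512 GiB */
    case 1: return 30;   /* 1 GiB */
    case 2: return 21;   /* 2 MiB */
    case 3: return 12;   /* 4 KiB leaf */
    default: return 0;
    }
}

static inline unsigned int level_index(uint64_t vaddr, int level)
{
    return (vaddr >> level_shift(level)) & 0x1ff;   /* 512 entries per table */
}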
21:15:00 <heat> https://gist.github.com/heatd/7e73261e8b44767110225e51a777e75a here's my unmap
21:15:00 <bslsk05> gist.github.com: unmap.c · GitHub
21:15:00 <heat> it's darn generic!
21:15:00 <heat> but very linuxy :)
21:16:00 <heat> tlbi_remove_* does the TLB magic for whatever arch, pte_* (et al) are all arch-dependent, set_*() is also arch-dependent
21:16:00 <geist> yah see you can just collapse a few of those trailing funcs into one that takes a level
21:17:00 <heat> yeah
21:17:00 <heat> i did notice my page table function codegen got larger
21:17:00 <geist> now the problem there that's annoying is defining the order of the levels, and annoyingly they're not the same between arches
21:17:00 <geist> iirc arm numbers them backwards, like the leaf nodes are always 0, independent of how deep the structure is
21:17:00 <geist> which i guess makes sense in a certain way
21:18:00 <geist> though to me it always makes sense in my mind to number level 0 as the root, and the leaf level is just wherever it happens to be
21:18:00 <geist> 2, 3, 4, 5? whatever you know
21:18:00 <heat> yeah, like x86
21:18:00 <heat> _64
21:18:00 <geist> only reason it sort of matters on ARM is that on page faults ESR_EL1 actually tells you at what level it failed
21:19:00 <geist> and the way they encode it is according to their naming convention
21:19:00 <geist> so if they say there was a tlb permission failure at level 0 it was always at the leaf node
21:19:00 <geist> or at least the deepest part. kinda makes sense
21:20:00 <geist> but anyway i still like the idea of counting up as you go down the tree, so that's my thing
21:20:00 <heat> yeah i prefer counting up too
21:21:00 <geist> x86 counts backwards too right? PML4, PML5?
21:21:00 <heat> yep
21:21:00 <heat> PML5 is the root
21:21:00 <geist> i guess the logic behind counting down is you can basically 'seed' your walk from the root with how deep it is for your configuration
21:22:00 <geist> `int get_tree_depth() = 5` then start your walk until you're at 0
21:22:00 <geist> which sort of makes sense from a logic point of view, keeps you from having to test at every level if you've reached the terminal depth
21:22:00 <geist> or at least the test is compare with 0
21:22:00 <heat> the easier solution to this problem is to say "stop counting nerds lol" and adopt whatever a linux guy half-drunk in 2004 said should be the page levels
21:22:00 <geist> tis probably why the hardware internally works that way
21:23:00 <geist> 2004 haha, goes a lot farther back than that bruh
21:23:00 <heat> i am aware
21:23:00 <heat> dunno what the first 4-level arch was
21:23:00 <geist> like 1994 or so when they ported to alpha and had to deal with 'shit how are we gonna shoehorn these non x86 page tables into x86'
21:23:00 <geist> 'oh i know, lets just add a bunch of macros and deal with it'
21:23:00 <geist> make it look like x86
21:24:00 <heat> i have to say they're not macros and the type hack they found is lovely and i need to use it more
21:24:00 <geist> yah though originally it was probably macros since that was older C
21:24:00 <heat> typedef struct { pteval_t pte; } pte_t; /* no implicit type conversion now bozos */
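Spelled out, the trick looks like this; Linux's real definitions live in the arch headers and go through pte_val()/__pte()-style accessors, reproduced here only for illustration.

#include <stdint.h>

typedef uint64_t pteval_t;

/* One-member wrapper structs: a pte can no longer be silently mixed up
 * with a pmd, a plain integer, or a physical address. */
typedef struct { pteval_t pte; } pte_t;
typedef struct { pteval_t pmd; } pmd_t;

#define __pte(x)   ((pte_t){ .pte = (x) })
#define pte_val(x) ((x).pte)

static inline int pte_present(pte_t pte)
{
    return pte_val(pte) & 1;   /* P bit on x86 */
}

/* pte_present(some_pmd) now fails to compile instead of silently
 * accepting the wrong table level. */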
21:24:00 <geist> i do remember they were abusing the crap out of inline functions that were not standard C. my early newos code tried to use a lot of their bits
21:24:00 <geist> yah or even opaque structs in C
21:25:00 <geist> `struct myshit; void frob(struct myshit *, int frob_func);`
21:26:00 <heat> personally not a fan of opaque structs
21:26:00 <heat> you can't declare them on the stack and that's mega lame
21:26:00 <geist> agreed, but i'm a big fan of hiding the contents of stuff from callers that dont need to know
21:26:00 <geist> one of the generally worst parts of C++
21:26:00 <geist> the general cheat there that sometimes works is to define some sort of sizeof(struct) for the caller
21:27:00 <geist> and then they can pass you a buffer of bytes for them to construct a thing on, but that's pretty lame
21:27:00 <heat> if you don't have the deetz where are the inlines coming from :(
21:27:00 <geist> `struct foo; #define FOO_SIZE 64; #define FOO_ALIGN 8` and then inside your .c file do a `static_assert(sizeof(struct foo) == FOO_SIZE)`
21:28:00 <geist> oh word. indeed
21:28:00 <geist> i use it for things where it's really an opaque thing that they shouldn't know about and dont need inlines
21:28:00 <geist> like a pointer to a driver or whatnot
21:28:00 <heat> a compromise i like: <frob_types.h> struct frob { /* ... */ }; <frob.h> static inline void frob_init(struct frob *) ...
21:28:00 <heat> i find it neatly reduces the header hell
21:28:00 <geist> yah
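A tiny illustration of that split, using the made-up frob names from above: one header carries only the layout, the other pulls it in and adds the inline API.

/* frob_types.h: just the layout, cheap to include anywhere a struct frob
 * is embedded or passed by pointer. */
struct frob {
    int refcount;
    void *buf;
};

/* frob.h: includes the types and adds the inline API; only code that
 * actually calls frob_init() pays for this header's dependencies. */
#include "frob_types.h"

static inline void frob_init(struct frob *f)
{
    f->refcount = 1;
    f->buf = 0;
}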
21:31:00 <heat> btw geist i don't see how the lazy non-leaf PTE adding thing is supposed to work?
21:31:00 <geist> lazy non leaf pte adding....
21:31:00 <geist> not sure i get what you're getting at
21:31:00 <heat> if they specifically ask for a global sfence.vma, i suspect it's not the same as doing sfence.vma <addr within page table>
21:32:00 <heat> at least per the very-formal-very-great riscv priv spec
21:32:00 <geist> well, the question is what is the worst case scenario
21:32:00 <geist> if you dont flush it on a clean add, what could possibly happen
21:32:00 <geist> it appears that worst case the cpu will get a tlb miss even if it's present
21:33:00 <geist> as if there's a pt walker cache that has cached the non present entry
21:33:00 <heat> could you get stuck in a page fault loop cuz no one understands what's happening?
21:33:00 <geist> right, that's why i also said here and in the comment that you should always flush the page upon entry to a PF
21:33:00 <geist> that way worst case if it appears like there's nothing to be done you just restart and it should work the second time
21:33:00 <heat> yeah but global sfence.vma != sfence.vma <addr> right?
21:33:00 <geist> that's right, but a PF is only for a local cpu
21:34:00 <geist> so you're only concerned about the vision of that one cpu at the time
21:34:00 <geist> so if you have 8 cores and you map a page on the first cpu, locally sfence and continue
21:34:00 <geist> now you have the chance that the 7 other cores will, as they touch that page, also take an extraneous fault (but probably not)
21:34:00 <geist> so if they do, they locally sfence and continue and will work the second time
21:35:00 <geist> so you're avoiding a global flush with the idea that 99% of the time the secondary cores wont trip over it
21:35:00 <heat> yeah that's not my point
21:35:00 <geist> oh for adding inner nodes? yah
21:35:00 <heat> point is: "If software modifies a non-leaf PTE, it should execute SFENCE.VMA with rs1=x0" this would imply that it's permissible for the implementation to *not* flush the walker cache on a sfence.vma rs1=addr no?
21:37:00 <geist> yah i dont flush when adding a new one (and only in that case) because it seems fine on real hardware: https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/zircon/kernel/arch/riscv64/mmu.cc#797
21:37:00 <bslsk05> fuchsia.googlesource.com: zircon/kernel/arch/riscv64/mmu.cc - fuchsia - Git at Google
21:37:00 <geist> but that may also not be correct
21:37:00 <geist> that MapPageTable routine is basically the main recurse and map routine
21:37:00 <heat> yeah linux seems to YOLO it too
21:38:00 <heat> either this should be amended in the spec or we're all fucked
21:39:00 <geist> so on removal we do do TLB flush but only at the end of the operation, before the page is returned to the PMM https://fuchsia.googlesource.com/fuchsia/+/refs/heads/main/zircon/kernel/arch/riscv64/mmu.cc#723
21:39:00 <bslsk05> fuchsia.googlesource.com: zircon/kernel/arch/riscv64/mmu.cc - fuchsia - Git at Google
21:39:00 <geist> same as arm64
21:40:00 <geist> note you should grab the newest version of the spec, not officially released, it's a lot clearer
21:40:00 <geist> someone has been cleaning it up
21:40:00 <geist> it might mention more bits about it
21:40:00 <geist> at the minimum it moved away from the default Latex look
21:41:00 <geist> there's also the newer, fancier flush mechanism though i dont think it changes the contract really
21:41:00 <heat> https://gist.github.com/heatd/0f41377789a81cd16dc44602f2c93890 my logic for x86 removal is actually Real Simple
21:41:00 <bslsk05> gist.github.com: x86_tlbi.c · GitHub
21:41:00 <heat> god bless x86
21:41:00 <geist> yah and of course ARM has something fairly similar as you're aware
21:42:00 <heat> wdym?
21:42:00 <geist> well, the whole page table walker cache maintenance thing
21:42:00 <geist> where you need to tell it to dump the inner nodes manually
21:42:00 <heat> yep
21:42:00 <geist> or always use the stronger version, in which case it acts like x86 all the time
21:43:00 <heat> tbf i don't know how intel cores are looking in this regard, they explicitly recommend that logic i used. amd also does, but they explicitly mention the old behavior of a single invlpg flushing the whole thing
21:43:00 <heat> and amd does have the EFER.TCE bit you can set, which actually opts-in
21:46:00 <geist> yah i think it just basically mentions that there is a page walker cache and you dont have to worry about it as long as you invlpg
21:47:00 <heat> https://reviews.freebsd.org/D45191 oooooooooooh
21:48:00 <geist> oh wow that's a very good point
21:48:00 <geist> never occurred to me, i had kinda written it off too because of the same PCID issue
21:49:00 <geist> most likely linux wont use it because they've probably invested in a bunch of 'avoid IPI storm' logic that probably scales better
21:50:00 <geist> i think there's even talk in the riscv manuals or one of the things i read that said they sort of explicitly didn't do the broadcast stuff because in the long run having software do it is more efficient, somewhat paradoxically
21:50:00 <geist> at least when you really scale up to 128, 256, etc cores. software knows best (ie, linux)
21:51:00 <geist> i've heard some folks grumble that on some of the ARM server cores the broadcast IPI stuff is *slow* because the hardware has some concurrency issues
21:51:00 <geist> and you're almost better off switching to a software solution
21:51:00 <heat> i kinda want to play around with it now, though i don't have a zen 3
21:51:00 <mjg> except these ipis are mostly self-induced on freebsd
21:51:00 <mjg> s/ipis/invalidations/
21:52:00 <geist> yah vs doing a simple broadcast ipi it's probably a win
21:52:00 <geist> vs doing a very sophisticated 'avoid flushing until you have to' software solution it's probably not
21:52:00 <mjg> the kernel makes extensive use of temporary mappings which it keeps whacking
21:52:00 <geist> right, exactly
21:52:00 <geist> or some sort of 'delay this on that cpu because it's idle, or running user space, or something'
21:52:00 <mjg> vast majority of that can be straight up eliminated
21:53:00 <mjg> most commonly seen ipis come from freeing pipe-backing buffers
21:53:00 <geist> what about plain thread stacks?
21:53:00 <mjg> they are cached
21:53:00 <geist> how so?
21:53:00 <geist> like recycled from previous mappings?
21:53:00 <geist> some sort of LRU?
21:53:00 <mjg> no lru or anything, just per-cpu caching of some number of stacks
21:54:00 <mjg> linux is also doing it, except as a total hack (caching up to 2 per cpu)
21:54:00 <geist> and when creating a thread it tries to grab one from the local cpu
21:54:00 <mjg> yes
21:54:00 <geist> well, it's kinda a lru, just distributed across the cpus
21:54:00 <geist> so i guess the worst case is you suddenly create and destroy a bunch of threads, so it builds up a list
21:54:00 <mjg> it's a static 2-sized array
21:55:00 <geist> but then i guess it can try to collect and free a bunch at a time, which would be a win from an IPI point of view
21:55:00 <mjg> it notoriously overflows
21:55:00 <geist> right
21:55:00 <mjg> i added some probes and ran a kernel build
21:55:00 <geist> still, better than nothing, but not by much
21:55:00 <geist> would just soak up some little stray thread creation
21:55:00 <mjg> well it is better than nothing but seriously lame af
21:55:00 <geist> but point is when it frees those i assume it does some IPI to everyone
21:56:00 <mjg> they have something to delay ipis in vmalloc/vfree (which is how stacks come to be), but eventually yes, you get hit
21:56:00 <heat> linux vmalloc does not broadcast ipis
21:56:00 <heat> *if it can*
21:56:00 <geist> i was always wondering if there was some sort of generation style counter thing where you dont free the pages to the PMM, but you unmap them but dont TLB broadcast
21:56:00 <geist> but roll the gen counter
21:56:00 <heat> you can totes free the pages
21:56:00 <mjg> here is a simple solution which takes care of everything: intermediate per-node cache
21:56:00 <geist> as cpus cycle through the kernel they bump their counter to match, tlb flush, then when they all have you return the pages
21:56:00 <heat> it's the kernel, you can do whatever you like
21:56:00 <mjg> there, after some warmup you will probably never vfree any of the pages
21:57:00 <heat> UAF deref? you're fucked anyway, might as well leak some data
21:57:00 <mjg> and the local array overflow will add to the per-node cache instead of vfreeing
21:57:00 <mjg> i'm rather negatively surprised they did not already do it
21:57:00 <heat> NEGATIVELY SURPRISED
21:57:00 <mjg> what
21:57:00 <heat> another one of your funny expressionz
21:57:00 <geist> any of those local caches always have a side effect though where you have to deal with an OOM situation
21:57:00 <heat> depessimize style
21:57:00 <geist> so they're not free necessarily
21:57:00 <mjg> ofc
21:58:00 <mjg> this is why i said per-node cache
21:58:00 <mjg> not bigger per-cpu caches
21:58:00 <geist> and more to the point folks may take extra convincing as a result
21:58:00 <mjg> that i doubt. note the total number of cached stacks may even be the same
21:58:00 <geist> our security folks in fuchsia would take longer to convince because of potential use-after-free situations of recycling stacks
21:58:00 <mjg> the difference is you don't go to vmalloc/vfree pair
21:58:00 <geist> easiest way is to just not tell them
21:58:00 <heat> geist, re the riscv thing: on rs1!=x0 "The fence also invalidates all address-translation cache entries that contain leaf page table entries corresponding to the virtual address in rs1, for all address spaces."
21:59:00 <mjg> linux memsets cached stacks fwiw
21:59:00 <geist> yah i guess that's true
21:59:00 <heat> so it is weird that they explicitly recommend a global fucking flush?
21:59:00 <geist> yep, that was exactly my surprise too
22:03:00 <geist> it's pretty bad. i assume they're starting off very conservatively and over time more extensions will appear that relax things
22:03:00 <geist> so for example the sinval instruction is now there which is a bit looser
22:04:00 <geist> it makes sense, it's always easier to loosen up things over time with new features or even just an extension flag that says 'this isn't as strict as it could be' kinda things
22:04:00 <heat> Znotmyfirstcpuunicourse
22:05:00 <heat> wording around this should probably be tightened up
22:06:00 <geist> note i only saw the weird stray PF stuff on a newer sifive core that's not generally available yet
22:06:00 <geist> i have meant to ask them precisely what's going on, even their manual doesn't really go into any more details
22:06:00 <geist> p470/p670. they're at least announced
22:06:00 <geist> proper superscalar + vector riscv core
22:07:00 <heat> didnt you also see it on the starfive you have?
22:07:00 <geist> you know i dunno, i forget
22:07:00 <geist> i may have been talking about running it on the newer cores, i was just being coy about it because i didn't want to mention the new cores
22:07:00 <geist> but now they're announced, etc
22:08:00 <heat> yeah
22:13:00 <heat> https://kib.kiev.ua/x86docs/Intel/WhitePapers/317080-002.pdf i found this earlier, great overview
22:17:00 <geist> oh that does look good
23:49:00 <heat> i kinda wanted a CoW filesystem tree right now
23:50:00 <heat> but i use REAL FILESYSTEMS so i'll have to just git clone the source code into a separate tree
23:53:00 <clever> heat: seen git worktrees?
23:54:00 <heat> nop
23:54:00 <heat> e
23:55:00 <clever> basically, `git worktree add ../path branch`
23:55:00 <clever> and it will checkout that branch, in a new dir
23:55:00 <clever> but both places, share the .git and object store
23:55:00 <clever> so you can act on 2 branches of a repo at once, without paying the cost of 2 .git's
23:56:00 <heat> oh interesting
23:56:00 <clever> [clever@amd-nixos:~/apps/nixpkgs-master4]$ cat .git
23:56:00 <clever> gitdir: /home/clever/apps/nixpkgs/.git/worktrees/nixpkgs-master4
23:56:00 <clever> internally, .git becomes a file, pointing to the master .git
23:57:00 <clever> `git worktree ls` will also show the state of each
23:57:00 <clever> s/ls/list/
23:57:00 <clever> i have 25 different worktrees on nixpkgs alone
23:58:00 <heat> what happens if you check out a branch, then make some changes?
23:59:00 <heat> can you easily merge those back?
23:59:00 <clever> a given branch can only be checked out in one place at a time
23:59:00 <clever> and the normal things like `git merge` can still work as usual
23:59:00 <heat> hmm guess i could fork a new branch
23:59:00 <heat> lets try that
23:59:00 <clever> there is also `git worktree add ../path -b newbranch oldbranch`