#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present

Thursday, 2 March 2023

00:11:00 <gorgonical> I'm getting a weird feeling I've somehow broken some cache functionality. A hashtable lookup that's never broken before is exhibiting now the same kind of cache coherence behavior from earlier, requiring a manual dcache flush to show up on the other core
00:12:00 <gorgonical> I don't like when cache stuff like this breaks because it's very confusing
00:48:00 <geist> yeah
00:48:00 <geist> you were not doing SMP before and now you are?
00:48:00 <geist> what changed?
00:51:00 <kazinsal> ha, yeah, that's something that's pretty universal. "hey this works great" *turns on another core* "what the fuck"
00:52:00 <klange> I continue to be rather afraid of how little I did in this regard.
00:52:00 <klange> What mysterious issues lurk in unseen corners?
00:53:00 <Mutabah> Meanwhile, I reaped the benefits from using rust recently. Turned on SMP, and after getting the bringup logic working - everything Just Worked
00:54:00 <moon-child> rust won't help you not screw up the cache
01:00:00 <gog> rust rust rust
01:03:00 <gorgonical> geist: I don't know if we ever got kitten with multiple cores working on this board, but we've got it working on a different one
01:04:00 <gorgonical> with multiple cores. that was how we learned about the morass that's inner/outer shareable and how both the tcr registers and the page tables are marked for it separately, etc
01:04:00 <gorgonical> I don't *think* I did anything to mess with that, but now I'm seriously paranoid lol
01:06:00 <Mutabah> moon-child: Well, not things like TLB shootdown sure... but other race conditions yep
01:06:00 <moon-child> kindaaa
01:07:00 <moon-child> it can at least attempt to draw attention to cases when something is shared
01:07:00 <moon-child> which you can argue is advantageous
01:07:00 <moon-child> but you can always manifest your own race conditions at a higher level
01:07:00 <gorgonical> I haven't changed any of the code to do with memory configuration and stuff, just checked. I did check and the TCR has shareability set to inner, which is the less inclusive choice
01:07:00 <gorgonical> I should check the page tables themselves now
01:08:00 <moon-child> not saying useless. But not panacea either
01:12:00 <geist> yah this may be where you have been missing some memory barriers and now you need them
01:12:00 <geist> for example updating a page table entry and not DSBing when done
01:15:00 <gorgonical> Is it possible that this is "IMPLEMENTATION DEFINED" behavior varying? I'm seeing with an alarmingly large amount of the arm cache/translation stuff that the exact behavior is impl. def.
01:15:00 <geist> not anything in v8
01:16:00 <gorgonical> I am currently chalking it up to the pine64 board being different from the rk3399. After all, the rk3399 actually has two clusters, meaning in theory there is a difference between inner/outer shareable. But I was under the impression that if PE 0 and 1 are in the same "shareability domain" and you mark pts as inner shareable you shouldn't have to manually manage cache
01:16:00 <geist> v8 standardized things pretty solidly. before that there were some exceptions
01:16:00 <geist> and no. the two clusters should absolutely be inner sharable
01:16:00 <gorgonical> then what is outer for?? the manual at least suggests clusters could be in different domains
01:16:00 <geist> otherwise it wouldn't work. all cores that expect to be in the same SMP domain *must* be inner sharable
01:16:00 <geist> outer is basically deprecated. idea is you could put cores that are off running different things
01:17:00 <geist> or say, a GPU or whatnot
01:17:00 <geist> but i think in practice nothing uses it
01:17:00 <gorgonical> oooh a gpu is not a bad idea for it
01:17:00 <gorgonical> but I see what you mean
01:18:00 <gorgonical> but basically it means that for all systems if you mark it as *either* shareability then it should work as you expect, without explicit cache management?
01:18:00 <geist> well, not entirely sure what you're asking there
01:19:00 <gorgonical> What I'm really getting at is that since inner is fully covered by outer and outer is deprecated, it really means you have no-share and inner/outer
01:19:00 <geist> well, and 'sys' which is the outer to outer
01:19:00 <geist> at least in things like DSB and whatnot
01:19:00 <geist> think of it this way: inner sharable is for synchronizing between cpus in an SMP system
01:20:00 <geist> the SYS domain is for synchronizing with things like devices
01:20:00 <geist> so if you just want to say insert a barrier so that another cpu sees it: dsb ish
01:20:00 <geist> but if you want it to do the whole thing it's 'dsb sy'
01:20:00 <gorgonical> Holy crap those acronyms just clicked
01:21:00 <geist> in linux for example a bunch of these are hidden behind 'mb()' macros
01:21:00 <gorgonical> Before they just looked like perl sigils to me but it's "data sync barrier inner-share"
01:21:00 <geist> like smp_mb vs mb. and in the case of smp_ it'll probably be ish
01:21:00 <geist> yep!
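
The mnemonics read as operation plus domain: `dsb` is data synchronization barrier, `ish` is inner shareable, and `sy` is the full system. A minimal C sketch of how a kernel might wrap the two variants follows. The names `barrier_smp`, `barrier_sys`, and `pte_write` are hypothetical (they are not Linux's or kitten's), and the fallback to a C11 fence on non-AArch64 hosts is there only so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper names, chosen for this sketch only. */
#if defined(__aarch64__)
/* dsb ish: wait for prior accesses to complete, inner shareable domain */
#define barrier_smp() __asm__ volatile("dsb ish" ::: "memory")
/* dsb sy: the same, but for the full system (devices included) */
#define barrier_sys() __asm__ volatile("dsb sy" ::: "memory")
#else
/* Host fallback so the sketch compiles elsewhere; not equivalent to dsb. */
#define barrier_smp() atomic_thread_fence(memory_order_seq_cst)
#define barrier_sys() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Store one page table entry, then issue the barrier geist mentions
 * ("updating a page table entry and not DSBing when done"). */
static inline void pte_write(volatile uint64_t *slot, uint64_t value)
{
    assert(slot != NULL);   /* sanity check for hosted builds */
    *slot = value;
    barrier_smp();
}
```

The split mirrors the conversation: the inner shareable form is enough when only other CPUs need to observe the write, and the system form is for when devices are involved too.
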
01:22:00 <gorgonical> Hmm. Okay maybe I'm forgetting some basic CS theory here but what's the point of setting them shareable if, to be sure it happens, I have to dsb ish after each malloc for a smp-shared resource?
01:23:00 <geist> because ARM is a weakly ordered architecture (vs x86)
01:23:00 <gorgonical> Like, is the setting in tcr_el1 and the pt entries just a hint to the caching engine about what to flush and stuff when I *do* run dsb ish?
01:23:00 <geist> without actual barriers at particular points, other cores are not guaranteed to see writes in any particular order
01:23:00 <geist> it's still cache coherent, but the order at which things happens is undefined
01:24:00 <gorgonical> Gross. I'm also shocked this wasn't somehow a problem before
01:24:00 <geist> so you use barriers like dmb/dsb/isb at various points or use instructions with built in barriers (lots of atomics do) so that it's not a problem
01:24:00 <gorgonical> I mean it's not shocking considering this is originally an x86 OS but we *did* port it
01:24:00 <geist> in general if you aren't doing a lot of volatile sharing, and you're probably using locks and atomics, it just works
01:24:00 <geist> since locks and atomics almost always have a barrier built in them already
01:25:00 <gorgonical> oh that's a good point
01:25:00 <gorgonical> what's the acronym for dmb then?
01:25:00 <moon-child> data memory barrier?
01:25:00 <geist> ie you grab the mutex (barrier), now you can fiddle with the object(s) all you want, and writes to the cpu can happen in any order, then you go to release the mutex (barrier), now all other cpus see everything occurred
01:26:00 <geist> (this is where acquire and release semantics come into play, where the barrier can operate in either direction or both
01:26:00 <geist> )
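
The grab, fiddle, release pattern rests on release and acquire ordering. Here is a minimal C11 sketch under stated assumptions: `struct htable`, `publish_table`, and `lookup_table` are hypothetical names, not kitten's code. The writer fills in the object with plain stores and then publishes the pointer with a release store. A reader that observes the pointer through an acquire load is guaranteed to observe the earlier plain stores as well.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared hash table. */
struct htable {
    int nbuckets;
};

/* Shared slot; static storage means it starts out as NULL. */
static _Atomic(struct htable *) shared_table;

/* Writer: initialize with plain stores, then publish with release. */
void publish_table(struct htable *t, int nbuckets)
{
    assert(t != NULL);
    t->nbuckets = nbuckets;   /* plain store, ordered by the release below */
    atomic_store_explicit(&shared_table, t, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above. */
struct htable *lookup_table(void)
{
    return atomic_load_explicit(&shared_table, memory_order_acquire);
}
```

On AArch64, compilers typically lower these two operations to `stlr` and `ldar`, so no explicit `dmb` or `dsb` shows up in the source.
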
01:26:00 <gorgonical> ah
01:26:00 <geist> and yeah DMB is a weaker and more subtle form of DSB
01:26:00 <geist> DMB just informs the cpu that when it gets around to writing out stuff it has to do it in this particular order (before + after)
01:26:00 <gorgonical> Then in theory if this stuff was working before I need to go snoop around my atomics/locks and make sure I didn't break something there. We use locks and such properly
01:27:00 <geist> but doesn't otherwise stop the world
01:27:00 <geist> DSB is more of a stop the world, make it get out into the cache subsystem
01:27:00 <geist> i think you'll also notice the cache flush routines have a DSB at the end right?
01:27:00 <gorgonical> yes
01:27:00 <geist> same with TLB sync routines on arm64
01:27:00 <geist> that DSB also acts as a 'make sure those pending cache flushes/tlb syncs happened'
01:27:00 <gorgonical> yeah it's transactional, right
01:28:00 <geist> since they are *also* weakly ordered
01:28:00 <gorgonical> It is more flexible but also more complex
01:28:00 <geist> think of both of those kinda operations as a special kinda bus write, essentially. queued up with all the other writes the cpu may be doing
01:29:00 <geist> yah makes you appreciate (or be grossed out) by all the efforts x86 cpus must go through to make everything appear in order
01:29:00 <geist> even if they aren't really
01:29:00 <gorgonical> seymour cray was right
01:29:00 <gorgonical> multicore is a mistake
01:29:00 <gorgonical> lol
01:29:00 <geist> x86 cpus are tracking a fairly substantial amount of state to keep everything looking in order from the outside
01:29:00 <moon-child> from what I understand from what david chisnall said, the effect of tso is basically just that you have to have a bigger write buffer
01:30:00 <moon-child> weak ordering def better though
01:30:00 <geist> right you have a large write buffer and strict dependencies between entries in it
01:30:00 <moon-child> mehh
01:30:00 <moon-child> need lots of fancy dependency tracking anyway
01:31:00 <moon-child> I actually kinda wonder what would happen if you moved more of that out of the uarch into the arch
01:31:00 <geist> true, though it's much simpler to just slam stuff out as the cache lines become available.
01:31:00 <geist> but it's a good question: what is the real cost of all of that
01:31:00 <moon-child> eg don't do memory disambiguation in hardware; instead, have alias tags on loads/stores
01:31:00 <geist> really the apple M1 cpus are probably the best way to test that theory, since they support both modes dynamically
01:32:00 <moon-child> maybe. Maybe they underestimate the effect, since you're still paying in area
01:32:00 <geist> they need it because of x86 emulation, on top of an otherwise weakly ordered machine
01:32:00 <geist> but in that case yeah they have to build the silicon to do it, but can turn it off for regular ARM code
01:32:00 <geist> so presumably there must be a win, since they have both situations at the same time
01:33:00 <geist> otherwise they could just choose to leave it on all the time since they've already got the silicon for it
01:34:00 <moon-child> yeah, but point is there might be an even bigger win if you could take that silicon and use it for something else
01:34:00 <gorgonical> so do the ldaxr/stxr instructions do something like dmb/dsb? They don't, right? This is just for synchronizing on the lock address itself
01:35:00 <geist> moon-child: oh 100%. they dont have that choice but most arm cores of course do, and thus dont have the overhead of silicon for it
01:35:00 <moon-child> there are some other big arm cores with a tso mode iirc
01:36:00 <geist> gorgonical: they do, the 'a' is the acquire part
01:36:00 <geist> otherwise it's just ldxr
01:37:00 <geist> and similarly stlxr vs stxr (the 'l' is for release)
01:37:00 <gorgonical> That's disturbing then
01:37:00 <geist> so using one or the other on either end gets you an acquire/release or both (seqcst)
01:38:00 <geist> is a pretty good cheat sheet once you grok it
01:38:00 <bslsk05> ​ Rust Atomics and Locks — Chapter 7. Understanding the Processor
01:39:00 <gorgonical> If you always acquire then you never need to release, right?
01:39:00 <geist> no not at all
01:39:00 <geist> acquire vs release are all about which 'direction' the barrier applies to
01:39:00 <geist> ie, before vs after memory transactions
01:39:00 <moon-child> prefer
01:39:00 <bslsk05> ​ C/C++11 mappings to processors
01:39:00 <geist> if you put both then you have created a full before & after barrier
01:40:00 <geist> otherwise an acquire or release barrier only works in one direction, which is fine for the nomenclature: you're acquiring access to the data vs releasing it
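
A spinlock is the usual place where the two directions meet: taking the lock is an acquire, dropping it is a release. This sketch uses C11 atomics. `struct spinlock` and the `spin_*` names are hypothetical, and it is not the Linux or kitten implementation, only an illustration of where each ordering goes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock: false = free, true = held. */
struct spinlock {
    atomic_bool held;
};

/* Try once. Acquire ordering keeps later accesses from being
 * reordered before the point where the lock is taken. */
static inline bool spin_trylock(struct spinlock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static inline void spin_lock(struct spinlock *l)
{
    while (!spin_trylock(l)) {
        /* spin until the holder releases */
    }
}

/* Release ordering keeps earlier accesses from being reordered
 * after the point where the lock is dropped. */
static inline void spin_unlock(struct spinlock *l)
{
    /* Debug check: unlocking a lock that is not held is a bug. */
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Dropping either half breaks the guarantee: acquire alone does not stop writes made inside the critical section from becoming visible after the lock already looks free.
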
01:42:00 <gorgonical> Then I must be misunderstanding something
01:43:00 <gorgonical> Or something is super borked
01:43:00 <gorgonical> I allocated this htable inside a spinlock, which has ldaxr for locking and stlr for unlocking the lock
01:43:00 <gorgonical> But if I don't explicitly flush_dcache_area on the *pointer* for the htable, the pointer shows up in core 1 as null.
01:44:00 <geist> did you enable the cache on core 1?
01:44:00 <gorgonical> But the malloc is inside the spinlocks
01:44:00 <geist> what kinda cores are these?
01:44:00 <gorgonical> These are the standard cores on the rk3399, so a72
01:44:00 <gorgonical> Cache should be enabled on core 1
01:45:00 <geist> double check that
01:45:00 <geist> (almost certainly is because otherwise your spinlocks would asplode)
01:45:00 <geist> are you positive your spinlock works properly?
01:45:00 <gorgonical> Positive in the sense that they have been working for years at this point and to port it about 18 months ago all we did was copy Linux's spinlock primitives for the arch port
01:46:00 <gorgonical> We did not re-engineer the asm for the spinlocks
01:46:00 <geist> when you initialize the second core did you invalidate its cache? (should be fine, but a good idea)
01:46:00 <geist> in case it has stale entries with garbage in it
01:46:00 <gorgonical> Yeah part of head.S has a big dcache flush I believe
01:46:00 <geist> flush or invalidate?
01:46:00 <geist> ie, clean vs invalidate vs clean + invalidate?
01:46:00 <geist> (to use arm's nomenclature which is very precise)
01:47:00 <gorgonical> bl __flush_dcache_all; ic iallu; dsb sy
01:47:00 <geist> on the secondary core?
01:47:00 <gorgonical> Yes
01:47:00 <geist> before the cache is enabled?
01:48:00 <geist> what precisely does that flush dcache routine do?
01:48:00 <gorgonical> Should be. We go from cpu_setup to enable_mmu
01:48:00 <gorgonical> Let me point you to it
01:48:00 <geist> okay
01:49:00 <gorgonical>
01:49:00 <bslsk05> ​ kitten/cache.S at master · HobbesOSR/kitten · GitHub
01:49:00 <geist> yeah that's probably fine. doubly so if you boot it from PSCI which i think guarantees that the cache is clean
01:49:00 <gorgonical> I wasn't responsible for this code so I don't know it well
01:50:00 <geist> hmm, yeah that code is not really a great idea, but it's a super hard problem to generically do
01:50:00 <geist> so for example that code uses a `dc cisw` at the bottom of it
01:50:00 <geist> so it's really clean+invalidate
01:50:00 <geist> which strictly speaking isnt a good idea here because what it does is cause the secondary cpu to write out any stale cache lines it may have
01:50:00 <geist> since the cache isn't enabled, and/or it was initialized with garbage it could hypothetically overwrite something
01:51:00 <geist> but it probably doesnt
01:51:00 <geist> usually the right idea is if you really want all your ducks in a row either dont do anything (because PSCI guarantees cpus come up with an empty cache) or do a pure *invalidate* over your cache
01:51:00 <geist> but *only* to the point of unification
01:51:00 <geist> so you end up needing like 8 variants of that
01:52:00 <geist> i kinda doubt any of this is a problem
01:53:00 <gorgonical> I have a suspicion that running in secure mode is contributing to the problem but I am not sure
01:58:00 <gorgonical> So we flush+invalidate, then load sctlr_el1 val which enables caching. Since it's the same kernel code the pts and control registers are the same, so they should all have the same sharing settings
02:01:00 <geist> yeah
02:01:00 <geist> must be something you dont know could be a problem, and thus aren't reporting
02:02:00 <geist> ie, everything you have told me about seems correct at least on the surface
02:28:00 <heat> hell
02:29:00 <heat> i saw i missed a RUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUST moment, so there it is
04:48:00 <gorgonical> The world is chaos and I still don't understand where these sync errors are from
05:25:00 <geist> the silly thing is it'll end up being something tiny
05:25:00 <geist> or worse, something you aren't doing
05:25:00 <zid> I have the solution to all your code woes
05:25:00 <zid>
05:43:00 <geist> got the visionfive 2 rv64 running
05:44:00 <geist> better performance than i expected. sits nicely alongside quad a53 boards i've used
05:44:00 <geist> which is exactly in line with the kinda cpu tech that's in it (quad sifive u74 cores @ 1.5GHz)
06:42:00 <gorgonical> Here's a thought. If I run core 0, which should be releasing the spinlock, in a loop waiting for core 1 to change some sentinel, if I run dsb sy each loop iteration will that sort of act as a total ordering flush for the system?
06:42:00 <gorgonical> Just for debugging purposes?
06:42:00 <gorgonical> Core 0 reads that the spinlock is released but core 1 doesn't and is stuck forever
07:10:00 <qxz2> what are your opinions of go? i know it's mostly meant for distributed/cloud apps.
07:10:00 <zid> only thing I really know is that the binaries are absolutely enormous and the ecosystem is so fragile that people just maintain all their own deps
07:11:00 <qxz2> i think they're statically linked
07:12:00 <qxz2> which probably explains the bloated binaries
07:12:00 <zid> some language feature like reflection or debug api means they have to include the source for every line and stuff in everything
07:12:00 <zid> there's still afaik an open bug for "yea, maybe hello world shouldn't be 10MB?"
07:12:00 <qxz2> hah
07:14:00 <zid> that former thing is also annoying to corps like nvidia who use it afaik
07:14:00 <zid> I can't remember what their solution was, source obfuscation or something
07:14:00 <zid> or hacking the binaries up
07:14:00 <qxz2> nvidia is known for using go? i didn't know that
07:15:00 <zid> no, they're known for making graphics cards, you'd think you'd have heard of them before :P
07:16:00 <geist> i suspect go is quite good at doing precisely what it sets out to do and isn't so great in other contexts
07:16:00 <geist> ie, is more domain specific than it probably wants to be
07:17:00 <qxz2> i'm very familiar with nvidia
07:17:00 <qxz2> what do they use go for?
07:17:00 <geist> but i can't say i have a tremendous amount of go experience
07:17:00 <geist> i was briefly swept up in the hype 10 years ago or so
07:18:00 <qxz2> the generics are idiosyncratic
07:19:00 <qxz2> it's an interesting reaction to typical OO langs
07:19:00 <qxz2> composition instead of inheritance
07:19:00 <qxz2> interface types
07:19:00 <geist> gorgonical: how are you configuring the page tables?
07:19:00 <geist> there's a shared/no shared bit
07:20:00 <zid> qxz2: idk, wouldn't surprise me if some random stuff like the debugger or experience etc was written in it, I was just talking to someone working on some go at nvidia listening to his rambling
07:22:00 <qxz2> i googled around and saw nvidia job ads looking for go experience, mostly for backend type dev, which makes sense
07:23:00 <geist> and/or web backend where i think it shines
07:23:00 <qxz2> distributed systems are go's niche
07:27:00 <qxz2> "Building automation for routine maintenance tasks for GPU farm"
07:27:00 <qxz2> fun
07:28:00 <geist> yah makes sense.
07:29:00 <qxz2> concurrency is built in. i think the runtime has its own scheduler.
07:32:00 <zid> neat, there's a tool that can unpack the nvidia driver packages, do the installation by hand with better configurability
07:32:00 <zid> and apply a driver patch to enable MSI
07:36:00 <gorgonical> geist: the shared/unshared bit refers to the two bits in the pte right?
07:36:00 <gorgonical> where 11 is ish and 10 is osh?
07:36:00 <gorgonical> or so
07:36:00 <geist> iirc there's a NS bit
07:36:00 <geist> no. it's another one
07:36:00 <gorgonical> That's the non-secure bit
07:36:00 <geist> yeah okay, maybe there's another one
07:37:00 <geist> basically it says 'dont bother doing cache coherency with this' i think. basically unused. lemme see
07:37:00 <qxz2> is this x86_64?
07:37:00 <immibis> ARM documentation tells me that cache coherency is system-specific, but that is from the point of view of the single-core CPU
07:37:00 <geist> ah yeah i guess it is bit 8
07:37:00 <qxz2> oh nm
07:37:00 <gorgonical> Because I'm losing my mind over here. I can't even get the spinlock to communicate even when I loop dsb sy
07:37:00 <immibis> DSB implies ARM I think
07:37:00 <geist>
07:37:00 <bslsk05> ​ lk/mmu.h at master · littlekernel/lk · GitHub
07:37:00 <gorgonical> qxz2: yeah arm64
07:38:00 <geist> anyway you def want to set inner sharable there on all the PTEs
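
For reference, in the ARMv8 stage 1 page and block descriptors the shareability field SH occupies bits [9:8]: 0b00 is non-shareable, 0b10 is outer shareable, 0b11 is inner shareable, and 0b01 is reserved. A small C sketch of setting and reading that field follows. The macro and function names are hypothetical and are not taken from lk or kitten.

```c
#include <assert.h>
#include <stdint.h>

/* SH field of a stage 1 page/block descriptor: bits [9:8]. */
#define PTE_SH_SHIFT 8
#define PTE_SH_MASK  (UINT64_C(3) << PTE_SH_SHIFT)

/* Encodings of the SH field (0b01 is reserved). */
enum pte_sh {
    PTE_SH_NON   = 0,   /* 0b00: non-shareable   */
    PTE_SH_OUTER = 2,   /* 0b10: outer shareable */
    PTE_SH_INNER = 3    /* 0b11: inner shareable */
};

/* Return pte with its SH field replaced by sh. */
static inline uint64_t pte_set_sh(uint64_t pte, enum pte_sh sh)
{
    assert(sh == PTE_SH_NON || sh == PTE_SH_OUTER || sh == PTE_SH_INNER);
    return (pte & ~PTE_SH_MASK) | ((uint64_t)sh << PTE_SH_SHIFT);
}

/* Extract the SH field from pte. */
static inline enum pte_sh pte_get_sh(uint64_t pte)
{
    return (enum pte_sh)((pte & PTE_SH_MASK) >> PTE_SH_SHIFT);
}
```

Dumping `pte_get_sh` for a few kernel mappings on each core is one cheap way to confirm that every entry really is marked inner shareable.
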
07:39:00 <gorgonical> Yeah those are set
07:39:00 <geist> yah using non sharable pages there is a feature that i dont think anything uses, and is possibly unimplemented in most cores
07:39:00 <geist> but still worth setting
07:40:00 <gorgonical> it is definitely implemented in the pine64
07:40:00 <gorgonical> lol
07:40:00 <geist> there's some verbiage on a lot of this inner/outer stuff that an implementation may actually implement something 'larger'
07:40:00 <gorgonical> That was how we discovered it was a feature in the first place
07:40:00 <geist> ah
07:40:00 <gorgonical> When we first ported kitten to arm64 fully
07:40:00 <geist> so just to be clear this is the very first time you've tried to enable SMP on ARM64?
07:40:00 <geist> whereas you have working SMP on x86?
07:41:00 <gorgonical> No, this is the first time I've tried smp on arm64 in secure mode on the rockpro64
07:41:00 <gorgonical> We have smp working in regular mode on qemu and the pine64
07:41:00 <geist> hmm
07:41:00 <gorgonical> yes exactly
07:41:00 <geist> can't say i think there's much of a difference there, except the whole secure bit in the page tables
07:41:00 <gorgonical> And these are basically the exact same problems that we solved the last time by fixing coherency issues
07:41:00 <geist> maybe that interacts with the cache subsystem in a weird way
07:42:00 <gorgonical> That's my only thought but I did look at the manual with some scrutiny and ctrl+f and didn't find any... clear indication that it might mess with it
07:42:00 <geist> but in general secure vs insecure mode should be hidden. the cpu even boots in secure mode, so it's possible that a random linux is actually running in secure mode, just because there's nothing that took it away
07:42:00 <geist> probably does a lot of the time anyway
07:42:00 <gorgonical> yes that's right. your bootloader actually needs to be sure and untick the box, but if you leave it on most stuff will 'just work'
07:42:00 <geist> and the secure/nonsecure bit only really kicks in when you have both modes active at the same time, otherwise they dont really *do* anything
07:42:00 <gorgonical> yes
07:43:00 <gorgonical> right
07:43:00 <gorgonical> so in theory I wouldn't expect the secure bit to have such dramatic impacts on the coherence subsystem
07:43:00 <gorgonical> Unless something horrifying like one of the cores running in secure mode and the other one not is happening
07:43:00 <gorgonical> WAit
07:43:00 <geist> yah there's some complicated verbiage about generating cache entries in one mode and then trying to use it in another, etc
07:43:00 <geist> oooh.
07:43:00 <gorgonical> Boy I hope that's not true lol
07:44:00 <geist> that might do it, because i think secure cache lines can't really be accessed from non secure cpus, etc
07:44:00 <gorgonical> trusted firmware and the psci interface sucks anyway man. It's always something
07:44:00 <geist> and then i honestly dont know what happens, up to and including UNDEFINED
07:44:00 <gorgonical> trusted firmware doesn't even allow secure world callers to issue psci calls
07:45:00 <geist> having fun building linux here on this riscv machine. it's slow! but it's cortex-a53 slow, so really that means it's performing as expected
07:45:00 <gorgonical> I'm jealous. I haven't had time to play with my riscv machines for a while and won't for more time
07:45:00 <gorgonical> dissertation research and stuff looms
07:45:00 <gorgonical> But I badly want to port kitten to the d2 riscv core I have
07:45:00 <geist> yah sadly nothing on the market yet supports virtualization mode, except qemu
07:46:00 <geist> looks like some of the new sifive performance cores, the P670 in particular, does
07:46:00 <geist> but dont know when that'll show up in hardware
07:46:00 <gorgonical> I am very pleased that sifive has so much traction. I was working somewhere we could afford the boards but I'm happy someone's actually doing it
07:46:00 <gorgonical> we had the unmatched
07:46:00 <geist> yah i have an unleashed around here somewhere
07:46:00 <geist> and an unmatched
07:47:00 <gorgonical> The d2 I got is this goofy clockwork pi devterm thing
07:47:00 <gorgonical> It comes with a thermal ink receipt printer
07:47:00 <gorgonical> ...
07:47:00 <geist> probably will get ahold of one of the horse creeks when they come out, but they're going to be P550 based, which is just before the cut to get virt, etc
07:48:00 <gorgonical> This nonsecure core theory can be tested, too. I haven't actually put the code in kitten to turn on the trustzone memory controller for the kernel image
07:48:00 <gorgonical> That would quickly tell me if secondary core is secure or not
07:51:00 <gorgonical> geist have you played with zig at all? It came to my attention recently and its constant presence on HN makes me wonder about it
08:00:00 <geist> hmm, zig the language no. someone mentioned it at work as something to look at
08:00:00 <geist> and they seemed to be pretty impressed
08:13:00 <gorgonical> I haven't done much work in it but it seemed very simple and even pleasing. In the way that c++ is impressive but because of its complexity, zig felt impressive because of what it seems to be able to do while still looking like c to me
08:13:00 <gorgonical> I wrote a little stub kernel in D's betterC subset and I remember wishing "why isn't this just a whole language"
08:15:00 <moon-child> isn't it a whole language?
08:15:00 <moon-child> also: betterc is mostly a marketing thing
08:16:00 <zid> am I allowed to pronounce that like "beturk"
08:16:00 <gorgonical> nobody can stop you, zid
08:16:00 <zid> To make something turkish
08:16:00 <gorgonical> moon-child: how do you mean about it being marketing?
08:16:00 <gorgonical> that it has neat interop or something like that?
08:16:00 <moon-child> there's not much point in using it over 'regular d'
08:17:00 <gorgonical> I mean unless you're trying to do something like a simple kernel
08:17:00 <zid> using regular D also lets you make crappy memes about sex
08:19:00 <moon-child> gorgonical: no
08:21:00 <moon-child> omg
08:21:00 <moon-child> 'use lldb', they said
08:21:00 <moon-child> 'it's great', they said
08:21:00 <moon-child> I tried it, and it span up to 100% cpu doing who knows what
08:21:00 <zid> First they came for the lldb users but I did not speak out, for I was sane
08:22:00 <moon-child> never doing that again
08:22:00 * moon-child remains annoyed gdb doesn't work on mac
08:22:00 <gorgonical> I've been using gnu global and ggtags recently and I quite like it
09:09:00 <gorgonical> Good job forgetting for nearly an hour that ioremap_nocache exists and fumbling around like a dingus with head.S
09:09:00 <gorgonical> Probably time to sleep for the night
09:46:00 <geist> hmm i really should give zig a shot though
09:47:00 <geist> that's at least two people that have recommended it for all the reasons i'm interested in
12:09:00 <epony> is it supported by GCC ?
12:15:00 <mjg> geist: zig?
12:15:00 <mjg> geist: quite frankly i would just spend time on rust man :)
12:15:00 <mjg> geist: why the zig
12:17:00 <davros1> geist have you tried any other C/C++ alternatives cough rust cough
12:17:00 <davros1> I must admit although 100% capable rust is a PITA for low level (IMO).
12:17:00 <mjg> excuse me sir, can i talk to you about a fast and memory-safe alternative to whatever you are using right now?
12:18:00 <davros1> Look what I did there with a disclaimer before I get onto its virtues
12:20:00 <davros1> Seems there are some other options appearing that offer better C++ interoperability
12:20:00 <davros1> I can't afford another language switch though
12:21:00 <pitust> isnt zig the language where the compiler is a minefield?
12:21:00 <mjg> is it?
12:21:00 <mjg> i think that's the one which compiles to c
12:21:00 <davros1> They have their own backends
12:21:00 <mjg> can't be arsed to check
12:21:00 <davros1> It's not compile-to-C, they do LLVM and their own codegen
12:22:00 <davros1> The aspect in which the compiler may be a minefield perhaps is it lets you run code at compile time
12:22:00 <davros1> To be fair that's probably less of a minefield than an accidentally discovered Turing complete language in the type system
12:22:00 <pitust> afaik the zig codegen itself can be buggy
12:23:00 <davros1> I must admit I thought they were biting off more than they could chew even trying that
12:23:00 <davros1> Rust has stuck to LLVM
12:23:00 <pitust> even with llvm
12:23:00 <davros1> I even wish rust could compile to C
12:23:00 <davros1> Heh ok
12:23:00 <pitust> you can compile llvm to c with llvm-cbe
12:23:00 <davros1> Well maybe if they weren't trying to write their own codegen they could fix those bugs
12:23:00 <davros1> Oh I thought that had all bitrotted (last time I tried)
12:24:00 <davros1> Rust -> C would have made getting onto some platforms a little easier
12:24:00 <pitust> this is a thing:
12:24:00 <bslsk05> ​JuliaHubOSS/llvm-cbe - resurrected LLVM "C Backend", with improvements (127 forks/678 stargazers/NOASSERTION)
12:24:00 <davros1> Ok nice thanks
12:24:00 <pitust> it works
14:04:00 <heat> mjg, hello sir can i talk to u for a second about a fast and unsafe alternative to whatever ur using rn
14:04:00 <heat> its called ansi c (anzi see)
14:04:00 <zid> heat fix my shader
14:05:00 <heat> whats the problem
14:05:00 <zid> my uniform is being optimized out
14:05:00 <zid> uniform sampler2D tex; outcol = texture(tex, frag_uv);
14:05:00 <zid> getAttribLocation(p, "tex") is returning -1
14:06:00 <mrvn> p is undefined
14:07:00 <heat> out of my Area Of Expertise(tm)
14:07:00 <mrvn> is "uniform" an attribute like static?
14:07:00 <zid> why did you ask then :P
14:08:00 <heat> now anyone else can help you
14:08:00 <heat> win for everyone
14:09:00 <mrvn> why oh why does /proc/pid/fd/ have a link count of 2? Should be 2+<num fds> :(
14:12:00 <sham1> OpenGL should be able to tell you why the operation fails
14:17:00 <sham1> I just don't remember the API off the top of my head
14:19:00 <tepperson> this is a dump from qemu with gdb, what might cause this behavior? I'm seeing instructions executed that don't exist.
14:19:00 <bslsk05> ​ weird long mode behavior? -
14:22:00 <heat> tepperson, info registers pls
14:24:00 <tepperson> info registers here ->
14:24:00 <bslsk05> ​ registers for long mode -
14:24:00 <zid> they're not aligned
14:24:00 <zid> 0x000000000010007e in enter_long ()
14:24:00 <zid> add [rax], al is 0x00 afaik
14:24:00 <zid> 0x0000000000100078 <+14>: mov %ax,%gs
14:24:00 <zid> 0x000000000010007b <+17>: mov $0x0,%eax
14:25:00 <zid> you've executed the 0x0 in the mov eax
14:26:00 <tepperson> 0x10007b seems to execute fine
14:26:00 <heat> you're not in long mode
14:26:00 <zid> yea mismatched cpu mode is 99% the likely reason
14:26:00 <heat> unless gdb is batshit crazy atm
14:26:00 <heat> "efer 0x0"
14:27:00 <zid> idk which mode it disassembled it in though without the machine code
14:27:00 <heat> i wanted a qemu info registers btw
14:28:00 <tepperson> for efer, mov ecx, 0xc0000080
14:28:00 <tepperson> rdmsr
14:28:00 <tepperson> or ecx, 0x100
14:28:00 <tepperson> wrmsr ?
14:28:00 <heat> that's wrong yeah
14:29:00 <heat> "select register to operate on, rdmsr, or (register to operate on), 0x100, wrmsr"
14:29:00 <heat> you're lucky it doesn't crash
14:29:00 <heat> you're pretty much just copying EFER to 0xc0000180
14:29:00 <zid> mine is.. mov ecx, 0xc0000080; rdmsr; or eax, 1 | 1<<11 | 1<<8; wrmsr
14:30:00 <zid> cus you know, it's eax that has the value, ecx that has the *address*
14:30:00 <heat> mov $IA32_EFER, %ecx
14:30:00 <heat> rdmsr
14:30:00 <heat> or $(IA32_EFER_LME | IA32_EFER_NXE), %eax
14:30:00 <heat> xorl %edx, %edx
14:30:00 <heat> wrmsr
14:30:00 <heat> defines 4 life
14:30:00 <zid> I have comments in, irl
14:31:00 <zid> don't need a define for a oneshot, comment is fine
14:31:00 <zid> 1 is sysenter, 11 is NX, 8 is long
14:31:00 <heat> yeah mine isn't a oneshot, i later enable syscall
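The register roles in the exchange above can be sketched in user space. This is only a model: `FakeCpu` and both helper functions are hypothetical names made up for illustration, and the MSR file is a plain map rather than real hardware. The constants are the architectural ones (EFER is MSR 0xC0000080; bit 0 is SCE, which enables SYSCALL/SYSRET, bit 8 is LME, bit 11 is NXE). ECX selects the MSR, EDX:EAX carries the value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Architectural constants.
constexpr std::uint32_t IA32_EFER = 0xC0000080;  // MSR address, goes in ECX
constexpr std::uint64_t EFER_SCE  = 1ull << 0;   // SYSCALL/SYSRET enable
constexpr std::uint64_t EFER_LME  = 1ull << 8;   // long mode enable
constexpr std::uint64_t EFER_NXE  = 1ull << 11;  // no-execute enable

// Hypothetical stand-in for the CPU so the sketch runs in user space.
struct FakeCpu {
    std::map<std::uint32_t, std::uint64_t> msr;
    std::uint32_t eax = 0, ecx = 0, edx = 0;

    // rdmsr: EDX:EAX <- MSR[ECX]
    void rdmsr() {
        const std::uint64_t value = msr[ecx];
        eax = static_cast<std::uint32_t>(value);
        edx = static_cast<std::uint32_t>(value >> 32);
    }
    // wrmsr: MSR[ECX] <- EDX:EAX
    void wrmsr() {
        msr[ecx] = (static_cast<std::uint64_t>(edx) << 32) | eax;
    }
};

// Read-modify-write on the value register (EAX), as zid and heat describe.
inline void enable_long_mode_bits(FakeCpu& cpu) {
    cpu.ecx = IA32_EFER;
    cpu.rdmsr();
    cpu.eax |= static_cast<std::uint32_t>(EFER_LME | EFER_NXE);
    assert(cpu.ecx == IA32_EFER);  // the address register must stay untouched
    cpu.wrmsr();
}

// The sequence from the question: the OR lands on ECX, so the unchanged
// EFER value is written to MSR 0xC0000180 instead.
inline void buggy_enable(FakeCpu& cpu) {
    cpu.ecx = IA32_EFER;
    cpu.rdmsr();
    cpu.ecx |= 0x100;
    cpu.wrmsr();
}
```

Running `buggy_enable` on a `FakeCpu` leaves the EFER entry as it was and creates an entry at 0xC0000180, which is the "copying EFER to 0xc0000180" effect heat points out.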
14:31:00 <gog> why aren't you clearing eax
14:32:00 <heat> idk gog
14:32:00 <zid> as in mov instead of or?
14:32:00 <gog> wait
14:32:00 <gog> nvm
14:32:00 <gog> i
14:32:00 <gog> i am v stup
14:32:00 <heat> i don't know why i'm even clearing the top part in the first place
14:32:00 <zid> sick
14:32:00 <zid> heat loves bimbos btw
14:32:00 <heat> isn't EFER guaranteed to be 32-bits
14:32:00 <heat> what
14:32:00 <zid> wait no, not bimbos, footballers
14:32:00 <gog> :<
14:32:00 <gog> i am not a footballer
14:32:00 <zid> I always get those confused, they both spend a lot of time doing their hair and drive jeeps
14:32:00 <heat>
14:32:00 <bslsk05> ​ Bimbo | Bimbo
14:33:00 <heat> luv me some bimbo
14:33:00 <gog> i do not drive a jeep
14:33:00 <zid> you're pre-bimbo
14:33:00 <zid> early access bimbo
14:33:00 <zid> a real bimbo is married in exchange for the black credit card
14:33:00 <gog> dang
14:33:00 <zid> so she can afford the jeep and barrels of spray tan
14:34:00 <zid> gog do you want to fix my shader
14:34:00 <gog> i don't know how to do that
14:34:00 <zid> same
14:34:00 <gog> i'm way out of my depth in third dimension
14:34:00 <gog> get it
14:34:00 <zid> but I know how to write the code that doesn't work at least
14:34:00 <heat> hehe
14:34:00 <gog> 'cause i'm so shallow
14:34:00 <heat> hilarious
14:34:00 <zid> I'm not even using a third dimension :(
14:48:00 <mrvn> gdb has problems stepping through the switch from 32bit to 64bit
14:54:00 <mrvn> Do you know that feeling after you cut some hot peppers that your eye itches?
14:54:00 <mrvn> .oO(Don't scratch, don't scratch!)
14:54:00 <sham1> Well gdb isn't exactly designed to cope with going from 32-bit mode to 64-bit mode
14:54:00 <sham1> mrvn: wash your damn hands and eyes
14:57:00 <zid> I feexed it
14:57:00 <mjg> eeey
14:57:00 <mrvn> sham1: you have to wash pretty thoroughly to get all the hot off your hand
14:57:00 <mjg> how you doin peeps todey
14:58:00 <sham1> I don't see the problem. You want to get the hot out of your hands after all
14:58:00 <mrvn> mjg: annoyed at DHL. Package tracking shows my packge in Bremen sind Friday.
14:58:00 <mrvn> s/sind/since/
14:58:00 <mjg> :]
14:59:00 * mrvn wonders if "Bremen GVZ" is actually the processing center in China and the package is now on some ship.
15:01:00 <mrvn> As in: "It's going to Bremen GVZ" instead of "It arrived at Bremen GVZ"
15:01:00 <tepperson_> for intel assembly, how do i define n instances of a 32-bit data?
15:02:00 <mrvn> 4 db?
15:02:00 <mrvn> dw
15:02:00 <zid> that's an assembler directive
15:02:00 <zid> so ask your assembler's manual
15:02:00 <gog> dd
15:02:00 <mrvn> tepperson_: do you want data or zeroes?
15:02:00 <tepperson_> zeros in this case
15:03:00 <mrvn> then just pad n*4 byte
15:03:00 <mrvn> Does intel have .zero?
15:05:00 <gog> you can use it in GAS
15:05:00 <gog> iirc
15:05:00 <gog> idk about nasm
15:06:00 <mrvn> also try: .bss my_data,4*1354
15:07:00 <mrvn> .oO(or try assembling with gcc so you have GNU syntax everywhere)
15:08:00 <gog> don't assemble
15:09:00 <zid> for nasm it'd be times 32 dd 0 I think
15:10:00 <zid> gl.NEAREST_MIPMAP_LINEAR (default value)
15:10:00 <zid> Who did this and why do they hate fun
15:24:00 <kof123> "seymour cray was right" "multicore is a mistake" no comment, could use a comic .oO( Chicken Attack! (Legend of Zelda LTTP SNES) )
15:28:00 <kof123> cow tools also can work here
15:40:00 <tepperson_> does this look like a valid pml4 table? .align 64
15:40:00 <tepperson_> PAGE_TABLE_BOOT2:
15:40:00 <tepperson_> .fill 510, 8, 0
15:40:00 <tepperson_> .quad 0x200083
15:40:00 <tepperson_> .quad 0x83
15:42:00 <gog> no
15:42:00 <zid> no
15:42:00 <gog> if your goal is to make 2MiB mappings you need two more levels
15:42:00 <gog> you need a PDPT and a PD
15:42:00 <zid> what is the 0x80 flag anyway
15:42:00 <gog> that's page size
15:42:00 <zid> ah!
15:42:00 <zid> 512GB pages :o
15:43:00 <gog> o:
15:43:00 <gog> do you have 512 gogglebytes of memory
15:43:00 <zid> if you .quad some_other_table | 3 you can do 1GB pages in some_other_table with | 0x83
15:43:00 <zid> then if you do .quad some_other_other_table in some_other_table you can do 2M pages with | 0x83
15:44:00 <gog> yes
15:44:00 <gog> you can do that
15:44:00 <zid> This is why people generally just.. do it at runtime
15:44:00 <zid> easier to mov [rsi+(8*510)], pdpt
15:44:00 <zid> than fuck around with trying to eyeball a bunch of arrays and stuff
15:45:00 <gog> yeah
15:45:00 <gog> and if you have an allocator or preallocate some pages for tables you can do arbitrary mappings
15:45:00 <gog> which makes your life very easy
15:46:00 <tepperson_> obviously my kernel needs 3 exabytes of ram to run correctly
15:46:00 <zid> I have my tables fully built in my ELF loader and pass them along to a small stub that does "set EFER, load value into pml4"
15:46:00 <gog> you don't need 3 exabytes
15:46:00 <zid> mine needs 1024GB
15:47:00 <gog> you only need three tables
15:47:00 <gog> 4KiB each
15:47:00 <gog> PML4[0] = PDPT, PDPT[0] = PD, PD[0] = page entry
15:47:00 <gog> for a 2MiB page
15:47:00 <tepperson_> i was kidding about the ram usage
15:48:00 <gog> o
15:48:00 <zid> gog is a very srs person
15:48:00 <zid> pls no tease
15:48:00 <gog> yes
15:48:00 <gog> i am srs bsns all day every day
15:48:00 <gog> be nice to me i have anxiety
15:49:00 <tepperson_> i dont write hello world apps anymore. i only write i am groot apps
15:49:00 <gog> same
15:53:00 <zid> I have anxiety about webgl's default
15:53:00 <zid> texturing mode
15:58:00 <nikolar> /me pets gog
15:59:00 * gog prr
15:59:00 <nikolar> What
15:59:00 <bnchs> what! >:(
15:59:00 * gog pet bnchs
15:59:00 <heat> mjg, rust
15:59:00 * bnchs purrs
15:59:00 <mjg> heat: RUST
15:59:00 <mjg> show some respect and upcase
16:00:00 <heat> show some respect and scream
16:00:00 <heat> hehehehe
16:00:00 <gog> hehehehehe
16:01:00 <heat> you are comedy
16:01:00 <gog> i'm hilarious
16:01:00 <mjg> you know what's the funniest
16:01:00 <heat> ricky gervais? more like
16:01:00 <heat> ricky gogervais
16:01:00 * bnchs derustifies mjg
16:01:00 <mjg> project/company names being puns on rust
16:01:00 <gog> wow i must be really funny
16:01:00 <mjg> like OXIDE
16:01:00 <gorgonical> turns out the secondary core isn't starting in secure mode, because life is suffering and nothing is simple
16:01:00 <mjg> gog: you are not as funny as solaris diaspora
16:02:00 <gog> who
16:02:00 <heat> mjg, you know what's worse?
16:02:00 <mjg> OXIDE
16:02:00 <mjg> an actual company using rust
16:02:00 <gog> o
16:02:00 <heat> BSD people that have bsd in their nickname/email
16:02:00 <mjg> i mean RUST
16:02:00 <mjg> dude there is a not-bsd guy who literally has a nickname ending in bsd
16:02:00 <bnchs> rust has a really bad syntax
16:03:00 <zid> OXIDE is a bad language and also a bad guess in wordle
16:03:00 <zid> double whammy
16:03:00 <mjg> bnchs: that is a widespread opinion
16:03:00 <mjg> which i do share :)
16:03:00 <mrvn> What CPUs support 512GB pages? Even 1GB is optional.
16:03:00 <bnchs> like i have to type more symbols in rust than C
16:03:00 <gog> mrvn: none of them :P
16:03:00 <gog> maybe some hypothetical risc-v machine
16:04:00 <heat> mjg, there's this guy in linux kernel dev that named himself after you
16:04:00 <mrvn> What you can do is a fractal mapping. Then you can map a lot of memory with a single 4k page.
16:04:00 <heat> and this "linus" guy that named himself after linux
16:04:00 <bnchs> heat: is he the guy who shills microsoft pluton?
16:05:00 <mjg> heat: did you know original name for the kernel was FREAKS or something like that
16:05:00 <heat> mjg, yep
16:05:00 <heat> bnchs, mjg does not shill for pluton
16:08:00 <FireFly> linux torvalds
16:09:00 <gog> he made linux os
16:11:00 <mjg> it's unix, i know this
16:11:00 <gog> unix is really good
16:11:00 <gog> i know it
16:12:00 <heat> gog, how much do you like fork() and how little do you not like CreateProcess
16:12:00 <mjg> nah dawg fuckitix is the feature
16:12:00 <gog> creating processes is cringe no matter how you do it
16:12:00 <gog> i will never
16:12:00 <mjg> sounds like templeos is for you then
16:12:00 <heat> do you only create threads?
16:12:00 <mjg> no processes
16:12:00 <mjg> i think
16:12:00 <gog> no i got rid of that code
16:12:00 <gog> single process forever
16:12:00 <mjg> you basically print 'lol' in a tight loop?
16:13:00 <gog> yes
16:13:00 <mjg> aight i would sign off on that kernel
16:13:00 <heat> depends
16:13:00 <mjg> i would accept LoL
16:13:00 <gog> it uses PrintLolFactory to do it
16:13:00 <heat> mjg would only sign off on it if it was OPTIMAL
16:13:00 <mjg> right
16:13:00 <heat> PESSIMAL CODE = PESSIMAL
16:13:00 <mjg> no printf("%s\n", "lol");
16:13:00 <mjg> or similar
16:13:00 <heat> i hope your locking primitives are on point
16:13:00 <mjg> in fact i would probably only be satisfied if you hacked the cpu
16:14:00 <mjg> and patched the microcode to do it the fastest way it can
16:14:00 <mjg> which is probably inaccessible through regular code
16:14:00 <mjg> you could even possibly block SMIs
16:14:00 <mjg> for more performance
16:14:00 <mjg> so ye, ultimately NACK, try harder
16:15:00 <heat> not-ok mjg@
16:15:00 <mjg> what he said
16:16:00 <kof123> i watched that temple vid linked the other day...the funny part was ba ha ha ha ha ha ha
16:16:00 <kof123> i never used it because 64-bit only
16:16:00 <mjg> you are somehow stuck with a 32-bit cpu?
16:16:00 <mjg> *today*?
16:17:00 <kof123> no, just saying why i never used it
16:17:00 <heat> he only enjoys Portable(tm) Software(tm)
16:19:00 <bnchs> mjg: what about a CPU that does operations on 64-bit integers, but has a 32-bit address bus
16:20:00 <kof123> i was hoping his typing "notes" he would show some move like: "The Oberon System has an unconventional visual text user interface (TUI)" and plan 9 IIRC
16:20:00 <mjg> bnchs: on your laptop?
16:20:00 <mjg> i got a product on a 32 bit arm *today*, the pain is real
16:22:00 <gorgonical> bnchs: in my head your handle is "bean cheese"
16:22:00 <heat> gorgonical: in my head your handle is "gorgonical"
16:23:00 <kof123> it is true i would never go for his U8 U16 U32 U64
16:34:00 <kof123> i have worse ideas than he does i think, so i show a bit of respect
16:38:00 <mrvn> Poll: What are "and *plant shaped C4 charges* on the four pipelines". Explosives shaped like plants so they aren't so obvious, right? Thank you google translate.
16:39:00 <gorgonical> Sounds like you mean you are placing shaped charges to achieve a more targeted effect
16:40:00 <gorgonical> the phrase is garden path-y though
16:40:00 <mrvn> gorgonical: obviously. "plant shaped charges" not "plant-shaped charges". But google doesn't get that.
16:41:00 <gorgonical> Ooh I see what you're saying
16:41:00 <mrvn> They reported plant-shaped charges on the NEWs on the german TV channel.
16:42:00 <mrvn> quality TV, I say. The highest quality.
16:49:00 <kof123> (*plant_p) (charges_ptr *)
16:50:00 <zid> it's not gorgonical?
16:53:00 <tepperson_> on what page/pages do i find the pml4 table information?
16:54:00 <heat> have you tried looking for it
16:56:00 <zid> That's pretty mean, making them look it up themselves heat
16:56:00 <zid> you should type it out for them
16:58:00 <zid> oh hey latest sdm has pml5, I was using an older version before
16:58:00 <zid> HLAT paging too, whatever that is
16:58:00 <heat> is pml5 even implemented in silicon yet?
16:59:00 <zid> not sure, saw the whitepaper for it yeeears ago though
16:59:00 <heat> oh yes, ice lake apparently
17:00:00 <zid> anyway, figure 4-11
17:00:00 <zid> smh size bit is Rsvd in pml4 still, no 512GB pages yet
17:01:00 <heat> ohno.jpeg
17:01:00 <heat> that would actually only work if large pages were remotely useful in Intel
17:02:00 <heat> versus the TLB shittery that was going on
17:03:00 <zid> Imagine having a good TLB
17:04:00 <zid> it manages to have good L1, with way more addresses and shit in it
17:04:00 <zid> instead we get "idk, 40 bytes is the best I can do, and the lookups are bad"
17:06:00 <heat> i wonder how much does 5-level paging actually cost
17:06:00 <heat> (in perf, not monies hehe)
17:08:00 <zid> I am assuming the same perf delta as 2M pages gains you over 4k, ignoring TLB misses
17:08:00 <zid> 1 extra lookup step
17:08:00 <zid> but it doesn't really cost any space
17:09:00 <zid> Unless you're making gallions of pml4es for some reason
17:11:00 <mrvn> heat: if you have the same mappings as with pml4 then you only use 2 entries and they are easily cached.
17:11:00 <mrvn> the extra lookup step would always be in cache for kernel and have one miss per ASID change for user space.
17:12:00 <mrvn> sorry, no, one miss per CR3 reload each.
17:15:00 <mrvn> Is there a debug register counting page walk cache misses? How often does the pml4 miss?
17:16:00 <mrvn> My opinion is: You shouldn't use 5-level paging unless you need it. It surely can't be faster.
17:20:00 <gog> that's a lot of vm space
17:29:00 <heat> sharp observation
17:36:00 <gog> hehehehe
17:44:00 <tepperson_> does this look more correct than my last attempt at paging data?
17:44:00 <bslsk05> ​ paging structures? -
17:49:00 <gog> do you want your pages mapped there to be writeable?
17:49:00 <gog> otherwise i think it's right
17:50:00 <zid> might be useful before you started, but now you're finished
17:50:00 <bslsk05> ​ <no title>
17:51:00 <tepperson_> not sure what that link actually does, any explanation for it?
17:52:00 <zid> put an address in
17:53:00 <zid> 0xB00B0000 -> pml4[0]->pdpt[2]->pd[384]->pt[176] = blah | PT_PRESENT;
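The lookup zid quotes can be reproduced with a few shifts and masks. A minimal sketch, assuming 4-level paging with 4 KiB pages (nine index bits per level, twelve offset bits); `PageIndices` and `split_va` are names made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Indices into each paging level for one virtual address.
struct PageIndices {
    unsigned pml4;    // bits 47..39
    unsigned pdpt;    // bits 38..30
    unsigned pd;      // bits 29..21
    unsigned pt;      // bits 20..12
    unsigned offset;  // bits 11..0
};

// Split a virtual address into its four table indices and page offset.
constexpr PageIndices split_va(std::uint64_t va) {
    return PageIndices{
        static_cast<unsigned>((va >> 39) & 0x1FF),
        static_cast<unsigned>((va >> 30) & 0x1FF),
        static_cast<unsigned>((va >> 21) & 0x1FF),
        static_cast<unsigned>((va >> 12) & 0x1FF),
        static_cast<unsigned>(va & 0xFFF)};
}

// The address from the log: pml4[0] -> pdpt[2] -> pd[384] -> pt[176].
static_assert(split_va(0xB00B0000ull).pdpt == 2, "pdpt index");
static_assert(split_va(0xB00B0000ull).pd == 384, "pd index");
static_assert(split_va(0xB00B0000ull).pt == 176, "pt index");
```

For a 2 MiB mapping the walk stops at the PD entry, so the `pt` index and `offset` together form a 21-bit offset into the large page.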
18:00:00 <tepperson_> hmm, as soon as i load cr0, i cannot access any memory at all
18:01:00 <tepperson_> i mean cr3
18:01:00 <tepperson_> wait nevermind, it is cr0 (enable paging)
18:05:00 <zid> you immediately triple fault?
18:10:00 <tepperson_> after a few instructions i triple fault, no idt setup
18:10:00 <zid> paging works then
18:10:00 <tepperson_> gdb complains of unable to access memory
18:10:00 <zid> what's the faulting instruction?
18:11:00 <zid> If you're in qemu you can also -d int to get info about the page fault in its monitor
18:11:00 <tepperson_> ah interesting
18:12:00 <zid> you can also info tlb to see what translations are currently set up, you'll need to -no-reboot -no-shutdown to be able to do that one
18:16:00 <tepperson_> info tlb in qemu shows nothing after the fault
18:16:00 <tepperson_>
18:16:00 <bslsk05> ​ broken paging? -
18:17:00 <zid> x /1i 0x104044
18:18:00 <tepperson_> address 0x104044 is out of bounds
18:18:00 <zid> xp then
18:18:00 <zid> now that paging is enabled
18:18:00 <heat> 0x101000 <PAGE_TABLE_PML4_BOOT>: 0x00102000 <-- wrong
18:19:00 <heat> 0x102000 <PAGE_TABLE_PDP_BOOT>: 0x00103001 <-- also wrong (technically right but in practice probably not what you want at this stage)
18:19:00 <heat> 0x103000 <PAGE_TABLE_DIRECTORY_BOOT>: 0x00000081 <-- see above
18:20:00 <tepperson_> what would be right? my goal is to map the first 2 megabytes of ram
18:20:00 <heat> exercise left to the reader
18:21:00 <gog> oh yeah
18:21:00 <gog> yeah it's wrong
18:21:00 <gog> sorry
18:21:00 <gog> i'm not well today
18:21:00 <heat> i could tell you exactly what's wrong but that's unhelpful
18:21:00 <gog> i'll tell you exactly what's wrong for $500
18:21:00 <zid> If 0x1000 is your pml4, pml4[0] = 0x2003, then 0x2000 is your pdpt
18:21:00 <heat> i'm cheaper than gog, i'll do 50
18:22:00 <zid> pdpt[0] = 0x3003, making 0x3000 your PD
18:22:00 <gog> zid is doing it for free already
18:22:00 <gog> we're ruined
18:22:00 <heat> sucker
18:22:00 <zid> PD[0] = 131;
18:22:00 <zid> That'll be $499.99
18:22:00 <zid> if it's correct
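The three entries zid lists can be written out as code to check the arithmetic. This is a sketch under assumptions, not the boot code under discussion: `BootTables` and `map_first_2mib` are hypothetical names, and the physical addresses are passed in as plain numbers because a user-space program cannot know where the tables will really live.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t PTE_PRESENT  = 1ull << 0;
constexpr std::uint64_t PTE_WRITABLE = 1ull << 1;
constexpr std::uint64_t PTE_PS       = 1ull << 7;  // at PD level: 2 MiB page

// One 4 KiB paging structure holding 512 eight-byte entries.
struct alignas(4096) Table {
    std::uint64_t entry[512] = {};
};

struct BootTables {
    Table pml4, pdpt, pd;
};

// Identity-map the first 2 MiB: PML4[0] -> PDPT, PDPT[0] -> PD,
// PD[0] = large page at physical address 0.
inline void map_first_2mib(BootTables& t,
                           std::uint64_t pdpt_phys,
                           std::uint64_t pd_phys) {
    // Table addresses must be 4 KiB aligned so the low bits are free for flags.
    assert((pdpt_phys & 0xFFF) == 0 && (pd_phys & 0xFFF) == 0);
    t.pml4.entry[0] = pdpt_phys | PTE_PRESENT | PTE_WRITABLE;
    t.pdpt.entry[0] = pd_phys | PTE_PRESENT | PTE_WRITABLE;
    t.pd.entry[0]   = 0x0 | PTE_PRESENT | PTE_WRITABLE | PTE_PS;
}
```

With the PDPT at 0x2000 and the PD at 0x3000 this yields 0x2003, 0x3003 and 0x83, and 0x83 is the 131 from the log. The earlier dump had 0x00102000 and 0x00000081, i.e. the present and writable bits were missing from those entries.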
18:25:00 <tepperson_> (gdb) x 0x101000
18:25:00 <tepperson_> 0x101000 <PAGE_TABLE_PML4_BOOT>: 0x00102003
18:25:00 <tepperson_> (gdb) x 0x102000
18:25:00 <tepperson_> 0x102000 <PAGE_TABLE_PDP_BOOT>: 0x00103003
18:25:00 <tepperson_> (gdb) x 0x103000
18:25:00 <tepperson_> 0x103000 <PAGE_TABLE_DIRECTORY_BOOT>: 0x00000083
18:25:00 <tepperson_> no change
18:26:00 <zid> I take it the instruction before xx44 is the mov cr0, eax?
18:26:00 <tepperson_> it does make info tlb in qemu do stuff now
18:26:00 <zid> oh, screenshot/paste?
18:31:00 <tepperson_> gdb appears to be interpreting my 32-bit code as 64-bits, trying to figure out how to make it stop
18:32:00 <zid> just dump the bytes then
18:32:00 <zid> as long as it ends up on a paste not a screenshot
18:32:00 <zid> I ain't typing it back in
18:33:00 <zid> (or use objdump)
18:33:00 <heat> use qemu
18:33:00 <heat> gdb will always interpret it as whatever the elf bitness is
18:33:00 <heat> qemu will interpret it based on the mode you're on
18:35:00 <tepperson_> it appears to be only mapping one 4kb section instead of a 2mb section
18:35:00 <Ermine> last time I tried it with "OS from 0 to 1" gdb disassembled 16 bit code as 32 bit
18:35:00 <Ermine> s/ it/ gdb/
18:36:00 <heat> i bet 10 usd dollar on how you're missing a certain cr4 flag
18:37:00 <tepperson_> bit PSE on cr4?
18:38:00 <heat> most certainly
18:38:00 <zid> So I get to keep my $499.99?
18:39:00 <tepperson_> bit pse didnt change it
18:39:00 <heat> didn't change what
18:40:00 <zid> did you remember to or the correct reg this time? :P
18:40:00 <mjg> burp
18:40:00 <zid> heat: I fixed my opengl btw
18:41:00 <bslsk05> ​ <no title>
18:41:00 <mjg> and another geezer does not think atomic ops are expensive
18:41:00 <mjg> fml
18:41:00 <heat> zid, sweeeeeeeeeeeeeeeeeeeeeet
18:41:00 <heat> mjg, link pls
18:41:00 <mjg> NAH
18:42:00 <heat> yes
18:42:00 <mjg> no slut shaming
18:42:00 <mjg> i'll link if he doubles down
18:43:00 <heat> linus would never say that shit
18:43:00 <heat> praise be the linux, creator of linus
18:44:00 <zid> now the test rig is complete, I need to actually figure out how to make perlin noise I guess
18:44:00 <heat> is that a wizard or what
18:44:00 <mjg> heat: he indeed would not
18:46:00 <mjg> heat: hey heat, wanna patch intel pcm tools?
18:46:00 <mjg> it prints process pids in hex instead of dec
18:46:00 <mjg> std::cerr << "Program " << sysCmd << " launched with PID: " << child_pid << "\n";
18:46:00 <mjg> someone with c++ 101 needs to patch it
18:46:00 <sham1> Wait, atomic ops aren't expensive? But they affect every other pipeline, no? Because it has to be, well, atomic
18:46:00 <zid> impossible mjg
18:47:00 <zid> That's 40MB of machine code, it's iostream
18:47:00 <mjg> sham1: on amd64 they flush the store buffer on all uarchs
18:47:00 <mjg> sham1: it is *turbo* slow
18:47:00 <sham1> Oh, I read your comment in the opposite manner
18:48:00 <sham1> I was all about to get argumentative and tell you that you're wrong
18:49:00 <heat> mjg, what's intel pcm tools
18:49:00 <mjg>
18:49:00 <bslsk05> ​intel/pcm - Intel® Performance Counter Monitor (Intel® PCM) (384 forks/2022 stargazers/BSD-3-Clause)
18:50:00 <heat> std::cerr << "Program " << sysCmd << " launched with PID: " << std::dec << child_pid << "\n";
18:50:00 <heat> if that works tell me and I'll open a PR
18:50:00 <sham1> I've never liked the use of bit shifts for C++'s iostreams
18:51:00 <mjg> aight, give me 5
18:51:00 <heat> sham1, it's a verbose% speedrun
18:51:00 <zid> heat what is the portobalkanssub
18:52:00 <heat>
18:52:00 <bslsk05> ​ PORTUGALCYKABLYAT
18:53:00 <Ermine> you're welcome
18:55:00 <mjg> heat: that works
18:56:00 <heat> nice
18:56:00 <mjg> heat: but there is more problems
18:56:00 <mjg> std::cerr << "Process " << child_pid << " was terminated with status " << WTERMSIG(res) << "\n";
18:56:00 <mjg> add this
18:56:00 <mjg> there is missing newline for that sucker
18:56:00 <mjg> consider it a freebie
18:56:00 <heat> does anyone use that fucking tool?
18:56:00 * mjg
18:57:00 <heat> you are the one user of that tool, ever
18:57:00 <mjg> hm
18:57:00 <mjg> are you sure the std::dec thing is correct?
18:57:00 <heat> yes
18:57:00 <heat> why?
18:57:00 <mjg> Program sleep launched with PID: b429
18:57:00 <mjg> Process b429 was terminated with status 2
18:58:00 <mjg> if i use that std::dec patch *both* change to dec
18:58:00 <mjg> even though only one statement is patched
18:58:00 <mjg> sounds like it flips the default?
18:58:00 <heat> yes
18:58:00 <heat> iostream does this globally
18:58:00 <heat> it's clinically insane
18:58:00 <mjg> wtf
18:58:00 <mjg> that's retarded
18:58:00 <heat> the format options are all global
18:58:00 <mjg> well
18:58:00 <zid> and you can't query it either
18:58:00 <zid> so you just have to pray nobody shits all over you
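The behaviour mjg runs into can be reproduced in a few lines. Strictly speaking, the base flag is state on each stream object (here `std::cerr`) rather than process-wide, and it can be read back with `flags()`, which makes a save-and-restore guard possible. The sketch below is illustrative only; `FlagsGuard`, `sticky_demo` and `guarded_demo` are made-up names and not part of the pcm code base.

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// RAII helper: remembers a stream's format flags and restores them on scope exit.
class FlagsGuard {
public:
    explicit FlagsGuard(std::ios_base& stream)
        : stream_(stream), saved_(stream.flags()) {}
    ~FlagsGuard() { stream_.flags(saved_); }
    FlagsGuard(const FlagsGuard&) = delete;
    FlagsGuard& operator=(const FlagsGuard&) = delete;

private:
    std::ios_base& stream_;
    std::ios_base::fmtflags saved_;
};

// std::hex stays set on the stream, so the later insertion prints in hex too.
inline std::string sticky_demo() {
    std::ostringstream out;
    out << std::hex << 255 << ' ';
    out << 46121;  // still hexadecimal
    return out.str();
}

// With the guard, the hex setting ends with the inner scope.
inline std::string guarded_demo() {
    std::ostringstream out;
    {
        FlagsGuard guard(out);
        out << std::hex << 255 << ' ';
    }
    out << 46121;  // back to decimal
    return out.str();
}
```

`sticky_demo()` gives "ff b429", the same PID rendering as in the output quoted above, while `guarded_demo()` gives "ff 46121".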
18:59:00 <mjg> then point out please that there are std::cerr uses all of which want to print dec
18:59:00 <mjg> in that func
18:59:00 <mjg> and add that newline
18:59:00 <zid> I'd just submit a patch that changes them all to printfs
18:59:00 <zid> it'll compile and run faster
19:00:00 <Ermine> heat: graphs suggest that living in Portugal is fun
19:01:00 <heat> Ermine, you'll feel right at home
19:01:00 <heat> <mjg> then point out please that there are std::cerr uses all of which want to print dec <-- wdym?
19:01:00 <mjg> std::cerr << "Program " << sysCmd << " launched with PID: " << child_pid << "\n";
19:02:00 <mjg> std::cerr << "Program exited with status " << WEXITSTATUS(res) << "\n";
19:02:00 <mjg> std::cerr << "Process " << child_pid << " was terminated with status " << WTERMSIG(res) << "\n";
19:02:00 <mjg> these fucking guys all want to print decimal
19:02:00 <heat> yes
19:02:00 <Ermine> heat: looking forward!
19:02:00 <heat> also, can you check if I didn't accidentally break any other thingy that wanted to print hex?
19:03:00 <heat> Just Iostream Things
19:03:00 <mjg> look ok here
19:03:00 <mjg> aha!
19:04:00 <mjg> you need to use 'dec'
19:04:00 <mjg> src/pcm-core.cpp: cout << "Time elapsed: " << dec << fixed << AfterTime-BeforeTime << " ms\n";
19:04:00 <heat> what
19:04:00 <heat> dec is exactly the same as std::dec
19:04:00 <mjg> aight
19:04:00 <heat> it's just that they "using namespace std;" that file
19:05:00 <mjg> i assumed it would not be fucking with global, but you are right
19:05:00 <mjg> even in the same file they use that or std::dec
19:05:00 <mjg> fuckin'
19:05:00 * mjg pets printf
19:05:00 <mjg> alright, just remember that \n and we are set here
19:05:00 <mjg> kthx
19:06:00 <zid> std::cout << std::hex << std::setfill('0') << std::setw(8) << x << std::dec << std::endl;
19:06:00 <zid> is the prefered method.
19:06:00 <heat> haha
19:06:00 <heat> aka %08x
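The two spellings can be compared side by side. A small sketch with hypothetical helper names, using a string stream and `std::snprintf` so the results can be inspected:

```cpp
#include <cassert>
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>

// iostream spelling: hex base, zero fill, field width 8.
inline std::string hex8_iostream(unsigned value) {
    std::ostringstream out;
    out << std::hex << std::setfill('0') << std::setw(8) << value;
    return out.str();
}

// printf spelling of the same format.
inline std::string hex8_printf(unsigned value) {
    char buffer[16];
    std::snprintf(buffer, sizeof buffer, "%08x", value);
    return std::string(buffer);
}
```

One difference worth remembering: `std::setw` applies only to the next insertion, while `std::hex` and `std::setfill` persist on the stream, which is why zid's one-liner ends with `std::dec`.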
19:07:00 <zid>
19:07:00 <bslsk05> ​ C++ FQA Lite: Input/output via <iostream> and <cstdio>
19:08:00 <tepperson_> ok i now have the first 2 megabytes of ram mapped with my page tables (with a 2mb page directory entry). jmp 0x104080 actually jumps to 0x407e, what might cause that?
19:08:00 <sham1> Just use std::printf like a sane person
19:08:00 <zid> user error?
19:09:00 <zid> either the assembler, or one of the debugger tools got told/asked the wrong thing somewhere
19:10:00 <mjg> lol > or some reason, it doesn't throw an exception (it's not really bad, because what's really bad is C++ exceptions).
19:14:00 <gog> %p
19:14:00 <gog> %pp
19:14:00 <Ermine> gog: may I pet you
19:14:00 <gog> you may
19:14:00 * Ermine pets gog
19:14:00 * gog prr
19:16:00 <mjg> heat: where the pull request at mofo
19:16:00 <heat>
19:16:00 <bslsk05> ​ Print PIDs and exit status in decimal by heatd · Pull Request #522 · intel/pcm · GitHub
19:17:00 <tepperson_> is my processor in long mode here?
19:17:00 <bslsk05> ​ (gdb) i r rax 0x10405b 1065051 rbx 0x10000 -
19:17:00 <heat> no
19:17:00 <mjg> heat: maybe ship them with sample bad output:
19:17:00 <bslsk05> ​ <no title>
19:17:00 <zid> you want figure 4-1
19:18:00 <heat> actually yes you may be in long mode
19:19:00 <zid> he is
19:19:00 <zid> PG and LM are set
19:19:00 <heat> turns out PSE is a don't-care bit in 64-bit
19:19:00 <heat> PG and LM are set, but I know jack shit about his GDT
19:20:00 <tepperson_> do i need to set LMA in the EFER register?
19:20:00 <heat> have you tried reading docs
19:20:00 <heat> it could help
19:20:00 <tepperson_> i cant find efer anywhere in the doc i am looking at
19:20:00 <heat> how?
19:21:00 <tepperson_> ctrl f, "efer", whole words, phrase not found
19:22:00 <heat> "whole words"
19:22:00 <heat> there's your issue
19:22:00 <heat> try IA32_EFER whole words
19:23:00 <tepperson_> ah i see now. it looks like i told it to do long mode, but the IA32_EFER.LMA isn't turning on to indicate long mode active
19:34:00 <zid> oh right, needs to do the gdt swap to get out of *compat* mode
19:34:00 <zid> which is what he's in
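The distinction zid draws can be written as a small decision function. A sketch with made-up names (`CpuMode`, `lma_expected`, `classify`); the bit positions are architectural. Software sets EFER.LME, the CPU sets EFER.LMA once paging is enabled with PAE on, and after that the L bit of the code segment descriptor decides between compatibility mode and 64-bit mode, which is why a far jump through a GDT entry with L=1 is still needed.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t CR0_PG   = 1ull << 31;  // paging enable
constexpr std::uint64_t CR4_PAE  = 1ull << 5;   // physical address extension
constexpr std::uint64_t EFER_LME = 1ull << 8;   // long mode enable (set by software)
constexpr std::uint64_t EFER_LMA = 1ull << 10;  // long mode active (set by the CPU)

enum class CpuMode { Legacy, Compatibility, Long64 };

// Conditions under which the CPU reports EFER.LMA = 1.
inline bool lma_expected(std::uint64_t cr0, std::uint64_t cr4, std::uint64_t efer) {
    return (cr0 & CR0_PG) != 0 && (cr4 & CR4_PAE) != 0 && (efer & EFER_LME) != 0;
}

// cs_l is the L bit of the descriptor currently loaded in CS.
inline CpuMode classify(std::uint64_t efer, bool cs_l) {
    if ((efer & EFER_LMA) == 0) {
        return CpuMode::Legacy;
    }
    return cs_l ? CpuMode::Long64 : CpuMode::Compatibility;
}
```

Under this reading, a register dump showing PG and LME set but code still decoded as 32-bit corresponds to `CpuMode::Compatibility`.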
19:36:00 <heat> qemu's monitor makes it painfully obvious as all the register names and widths change
19:54:00 <tepperson_> ok i figured out how to get the qemu monitor and gdb at the same time, cooking bacon now
20:10:00 <davros1> Anyone here code for retro consoles
20:11:00 <davros1> (I dont have a question, I'm just curious if there's overlap of interest in this community)
20:11:00 <davros1> Machines where you could hold the hardware in your head and you code by hitting the metal
20:12:00 <davros1> I do miss that (perhaps doing something on web assembly is making me want to 'purge' this way)
20:13:00 <clever> davros1: i recently wrote a basic framework for c on gba, to help a friend out
20:13:00 <davros1> Ok nice, never used one of them
20:14:00 <clever>
20:14:00 <clever> same, ive never owned a gba, but the .elf works in an emulator
20:14:00 <clever> it can draw to the framebuffer, wait for vsync irq, and poll all inputs on every frame
20:14:00 <clever> all thats missing is audio support, and an actual game, lol
20:18:00 <davros1> Wondering about using generative AI and downscaling to retro machines
20:18:00 <davros1> Wont look as crisp as pixelart but you could make a low-effort game look more interesting that way
20:24:00 <GeDaMo>
20:24:00 <bslsk05> ​ Wizard mouse in laboratory, pixel art : dalle2
20:25:00 <davros1> Hah yeah I hadn't looked for actual pixelart finetunes and so on
20:25:00 <GeDaMo> Have you seen the DALL-E prompt book?
20:26:00 <davros1> No. I've been experimenting with stablediffusion . I like the ability to run locally.
20:26:00 <GeDaMo>
20:38:00 <GeDaMo> This site allows you to search images generated by Stable Diffusion
20:38:00 <bslsk05> ​ Lexica - pixel art
20:49:00 <davros1> What I'm seeing for pixel art there is quite different to the tailored palette work of classic pixel art.. but the Dalle things above were closer.
20:49:00 <geist> will be interesting to see when the first game that somehow integrates this sort of image generation in the game itself
20:49:00 <davros1> Still it doesn't matter. I think its still a good "quality : effort" tradeoff.. doesn't have to be perfect
20:49:00 <geist> can't think of how that'd work, but you could imagine some sort of clever puzzle thing where part of the puzzle is generated art
20:50:00 <geist> or machine generated art treated as a level, or whatnot
20:50:00 <davros1> Heh yeah Geist - stable diffusion on the highest end GPU can spit out new images in a couple of seconds . Imagine a scroll speed tuned such that you're always moving in to newly generated things
20:51:00 <geist> yah. and of course that'll get more optimized over time, etc
20:51:00 <davros1> But even without going that far.. being able to make random mazes or whatever , and just visually enhance them with "Img2img" would be great
20:51:00 <geist> yah, probably some sort of indie game tries it first, with it being the main trick of the game, and then slowly becomes more standard
20:53:00 <davros1> I think for AAA hand tuned art will win in 3d. Generative will produce a rough dreamlike style that will suit indies (quality:budget)
20:53:00 <davros1> Indies and modders
20:53:00 <geist> yah
20:54:00 <geist> i was planning on piddling with stable diffusion maybe this weekend. my older 1080ti should still be able to crunch numbers fairly well
20:54:00 <davros1> Things like Minecraft/roblox ..
20:55:00 <GeDaMo> It might be possible to make smaller models trained for specific games / styles
20:55:00 <davros1> Yeah that should do 10-15 seconds per image.. better than hugging face certainly
20:55:00 <geist> yah, toms hardware has some benchmarks
20:55:00 <geist> iirc the 1080ti is somewhere like a 3060 iirc
20:55:00 <GeDaMo> Facebook / Meta just released a paper on their new LLM which showed a smaller model trained with more data can match GPT-3
20:55:00 <geist> assuming it doesn't need specific hardware that the new rtxes have
20:56:00 <davros1> Its worth grabbing a 3000 series card or even 4000 if you can
20:56:00 <geist> yah i've been putting it off until i can nicely find a strong replacement for it
20:56:00 <geist> but this may be something that pushes me to
20:57:00 <davros1> I'm fine with splashing out on a big GPU but electricity is the limiting factor really
20:57:00 <davros1> UK electric bills are high
20:57:00 <davros1> If it wasn't for that I'd be running 2
20:57:00 <geist> yah i get ya there
20:58:00 <davros1> SD img2img has also encouraged me to get back to amateur art
20:58:00 <davros1> Doing my own doodles and AI enhancing
20:58:00 <davros1> AI can't do 3d game ready yet, hence the interest in retro styles
20:58:00 <davros1> And hence the interest in retro hardware again hah
20:59:00 <bnchs> AI isn't accurate yet lol
20:59:00 <gog> it's not very intelligent
20:59:00 <gog> like me
20:59:00 <davros1> Neither are people in many ways,although I know what you're saying
20:59:00 <bnchs> yeah, most of the times you catch it saying dumb shit
20:59:00 <davros1> chatGPT bullshits alot, like a human ..
21:00:00 <gog> yes
21:00:00 <bnchs> especially when you tell it to explain something
21:00:00 <gog> chatGatekeep, chatGaslight, chatGPT
21:00:00 <davros1> But the art generators - if you use 'img2img' you can sketch, say what it's supposed to be, and it'll detail for you. This is pretty good quality/control/productivity balance. man+machine
21:01:00 <tepperson_> i asked chatgpt to make a dummy kernel driver and it used structs that don't exist in any kernel
21:01:00 <davros1> Yeah I wont use it for code
21:01:00 <gog> it doesn't really have a capacity for abstract reasoning
21:02:00 <gog> such a thing can't really code
21:02:00 <davros1> Script kiddies are getting excited by it though..
21:03:00 <bnchs> yeah
21:04:00 <bnchs> can't even make a working kernel driver
21:07:00 <mats1> its not good but its not useless
21:07:00 <mats1> just needs some care and sanitisation
21:08:00 <mats1> like cvs where there's six self checkout stations and one guy to tend to them / greet guests
21:08:00 <sbalmos> those six checkout stations require 6 truckloads of receipt printer paper
21:08:00 <mats1> there's great business value in eating the low hanging fruit jerbs where retards are googling and including left pad libraries
21:10:00 <mats1>
21:10:00 <bslsk05> ​ke0z/VulChatGPT - Use IDA PRO HexRays decompiler with OpenAI(ChatGPT) to find possible vulnerabilities in binaries (17 forks/203 stargazers)
21:50:00 <heat> geist, do you have any perf numbers on 5-level paging vs 4-level paging?
21:50:00 <geist> i do not. on x86?
21:50:00 <heat> it sounds like something you'd have
21:50:00 <heat> yeah
21:50:00 <heat> also cc mjg
21:50:00 <geist> no i haven't actually fiddled with any machines with 5 level enabled
21:51:00 <geist> hypothetically with page table caching and aggressive data caching there's probably not *too* much of a hit
21:51:00 <geist> possibly the bigger hit is simply that it allows the system to be even more scattered
21:51:00 <mjg> i don't, but i would expect intel to have a hard-to-find paper with it
21:51:00 <mjg> alternatively lkml with submission for la5
21:52:00 <geist> FWIW riscv has thesame thing now too, but surprisingly linux mainline seems to only currently support sv39
21:54:00 <gorgonical> geist: isn't that because no hardware really supports anything higher than sv39?
21:54:00 <heat> i think linux should enable PML5 by default?
21:54:00 <heat> huh you sure it only supports 39?
21:54:00 <heat> i think they usually lock that stuff behind a CONFIG_
21:54:00 <geist> yah, except qemu and probably stuff in dev
21:54:00 <gorgonical> I can't help but wonder if the kernel support is just lagging along then
21:54:00 <geist> or it's sitting in a branch yet and hasn't been merged
21:54:00 <geist> heat: yeah it seems to only have the SV39 config option
21:55:00 <geist> obviously someone will add it, just don't see it yet
21:55:00 <gog> hi
21:55:00 <heat>
21:55:00 <heat> at the very least they have layouts for 48 and 57
21:55:00 <geist> yah makes sense
21:55:00 <geist> may as well design it even if it's not a build option yet
21:56:00 <netbsduser> on chatgpt: it has declined significantly, in november it was able to give me competent advice and spot real issues, now it can barely match a typical stackoverflow answer and it only spots superficial issues. i was really disappointed to see how much it declined
21:58:00 <sham1> As an OpenAI language model, I cannot be expected to actually be useful at this time.
21:58:00 <heat> geist, yeah you're right, seems to assume 39 only
21:58:00 <heat> very weird IMO
21:58:00 <tepperson_> i thought i would try bochs, but on ubuntu it can't even load grub2
21:58:00 <heat> 48 sounds like the no-brainer (because of x86, etc)
21:59:00 <heat> i guess they wanted to support smaller devices?
21:59:00 <heat> *shrug*
22:00:00 <heat> does android still do 39-bits on arm64 too?
22:00:00 <geist> well, yeah i think all the existing hardware except for qemu only supports up to sv39. the next batch will probably pick up sv48, but it does seem that 39 is the general sweet spot for current non datacenter hardware
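The 39/48/57 figures discussed above fall out of simple arithmetic, assuming 4 KiB pages (12 offset bits) and 512-entry tables (9 index bits per level), which is the case for RISC-V Sv39/Sv48/Sv57 and for x86-64 4- and 5-level paging. A minimal sketch in C; the helper names `va_bits` and `va_span_bytes` are made up for illustration:

```c
#include <stdint.h>

/* Virtual address width for a radix page table with 4 KiB pages and
 * 512-entry tables: 12 page-offset bits plus 9 index bits per level.
 * (Hypothetical helper, not taken from any kernel.) */
static unsigned va_bits(unsigned levels)
{
    return 12u + 9u * levels;
}

/* Size in bytes of the virtual address space covered by that many levels. */
static uint64_t va_span_bytes(unsigned levels)
{
    return (uint64_t)1 << va_bits(levels);
}
```

Three levels give 39 bits (512 GiB), four give 48 bits (256 TiB), and five give 57 bits (128 PiB).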
22:01:00 <heat> i have not messed with support multiple va sizes yet
22:01:00 <heat> supporting*
22:03:00 <geist> so an annoying thing i just discovered about my vf2 board
22:03:00 <geist> it seems to not support that many ASID bits
22:03:00 <geist> possibly 0, which is valid. linux dmesg shows that it basically decides it can't do ASID because not enough bits
22:04:00 <geist> and the logic in linux to determine that is (num_asids >= 2*NR_CPUS)
22:04:00 <geist> since NR_CPUS is 8 in this particular kernel, that means it must have declared less than 4 bits of ASID support
22:04:00 <geist> (probably 0)
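The reasoning above can be written down as a small check. This is only a sketch of the condition as geist quotes it (num_asids >= 2*NR_CPUS); the exact comparison in the kernel source may differ, and `asid_allocator_usable` is a hypothetical name:

```c
#include <stdbool.h>

/* Hypothetical helper mirroring the condition as quoted in the chat:
 * ASIDs are only used when there are at least two per possible CPU. */
static bool asid_allocator_usable(unsigned asid_bits, unsigned nr_cpus)
{
    unsigned long num_asids = 1ul << asid_bits; /* 2^asid_bits distinct ASIDs */
    return num_asids >= 2ul * nr_cpus;
}
```

With NR_CPUS = 8 the threshold is 16 ASIDs, so 4 bits would pass and anything from 0 to 3 bits fails, which matches the "less than 4 bits" conclusion.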
22:39:00 <mjg> yo
22:40:00 <mjg> what, if any, cpus today do *software* tlb?
22:40:00 <mjg> not mips
22:40:00 <mjg> something which is not slow
22:40:00 <mjg> :]
22:41:00 <\Test_User> > not slow
22:41:00 <\Test_User> > run code for every (uncached) memory access to translate pages
22:41:00 <\Test_User> seems to conflict a bit to me
22:41:00 <heat> mjg, IA64 you FUCKEN BITCH
22:55:00 <heat> geist, i'm starting to think that riscv really is digging its own grave with i n f i n i t e e x t e n s i o n s
22:55:00 <heat> in this case, infinite optional crap
23:03:00 <geist> there appears to be an effort to standardize via the riscv profiles stuff
23:03:00 <geist> ie RVA20, RVA22, etc
23:04:00 <geist> would be somewhat akin to armv8.x in that you define a series of increasing profiles that include previous ones and add to, with a list of mandatory and optional bits
23:04:00 <chibill> Honestly I start writing an OS, get to where I can print to the screen (So basically nothing) and then just sort of can't decide how to proceed because my whole drive to write an OS is to just do it for fun so I have no real goal. :(
23:05:00 <geist> well, you can make interrupts work. that's fun
23:05:00 <heat> for fun != real goal
23:05:00 <heat> everyone here does this for fun
23:05:00 <heat> no one's getting rich off this
23:05:00 <heat> erm
23:05:00 <heat> for fun = real goal
23:06:00 <chibill> Like even for fun I have no end goal of where I want to end up xD
23:06:00 <geist> right i find it fun to just climb up the tech tree
23:06:00 <geist> ie, start adding more and more features akin to a Real OS
23:06:00 <geist> that's fine. treat it as a journey
23:06:00 <geist> a series of features you add
23:07:00 <heat> chibill, what features do you like from your system (linux, etc)?
23:07:00 <heat> *that do not involve graphics*
23:07:00 <heat> think small, and use that as an objective
23:10:00 <heat> like "oh I really like this bash thing to screw around", so you work towards that
23:10:00 <heat> or an http server, etc
23:11:00 <chibill> Hm. Just realized I can set a goal of at least having as much stuff as XV6 (So a basic terminal system and file system access of some sort.) Biggest challenge for me is I am working in Rust. I feel like I am going to run into issues when I need access to things in multiple places. Might start over in C.
23:11:00 <mjg> oh noez
23:11:00 <mjg> someone had a kernel written in rust
23:11:00 <mjg> you could probably use it as a reference when stuck
23:12:00 <heat> note that xv6 sucks
23:12:00 <heat> (IMO)
23:12:00 <chibill> mjg: You mean the RustOS blog thing?
23:12:00 <heat> xv6 is a stupid simple example of a unix-ish kernel but man the code is not really high quality
23:13:00 <heat> much like the original UNIX I guess
23:13:00 <mjg> chibill: no, made by a local
23:13:00 <geist> but that's a goal, do a better job than xv6
23:13:00 <mjg> Mutabah: where ya kernel at
23:13:00 * Mutabah is away (Sleep)
23:13:00 <heat> kernel is sleep
23:13:00 <mjg> dafaq that mirc shieeet
23:13:00 <chibill> heat: I agree, had to do some deep inside work on it for a College class. (Adding new syscalls, file systems blocks, a way to show the memory layout and things like that.)
23:14:00 <bnchs> ding dong ding dong wake up
23:14:00 <mjg> chibill:
23:14:00 <bslsk05> ​thepowersgang/rust_os - An OS kernel written in rust. Non POSIX (43 forks/627 stargazers/NOASSERTION)
23:16:00 <heat> one day (maybe today hrmm) I should try to write a better xv6
23:17:00 <heat> except in x86 because screw whatever old crap xv6 was targeting
23:17:00 <heat> i don't want a workflow based around simh lol
23:17:00 <heat> s/xv6/v6/ in that second statement
23:18:00 <mjg> while(xchg(&lk->locked, 1) != 0)
23:18:00 <mjg> // The xchg is atomic.
23:18:00 <mjg> ;
23:18:00 <mjg> fuck me
23:18:00 <heat> hahahaha
23:18:00 <heat> do you always open locking functions when you see a new OS
23:20:00 <mjg> // Release the lock, equivalent to lk->locked = 0.
23:20:00 <mjg> // This code can't use a C assignment, since it might
23:20:00 <mjg> // not be atomic. A real OS would use C atomics here.
23:20:00 <mjg> asm volatile("movl $0, %0" : "+m" (lk->locked) : );
23:20:00 <mjg> fucking
23:20:00 <heat> what's really irking me is that they use __sync_synchronize but then never bother to use any of the other ones
23:20:00 <mjg> heat: it is my litmus test for quality
23:20:00 <heat> __sync_synchronize is way overkill
23:20:00 <heat> mjg, what do my locking primitives say
23:20:00 <mjg> what does it expand to? a full barrier?
23:20:00 <heat> yes, mfence
23:20:00 <mjg> heat: i already told you why they suck
23:20:00 <mjg> wow, that's stupid
23:20:00 <mjg> xchg already provides a full fence
23:21:00 <heat> mjg, yes, just wondering what they say about my OS
23:21:00 <heat> "passable but mildly incompetent" is probably a solid conclusion
23:21:00 <mjg> that it outperforms openbsd at best
23:21:00 <heat> and net
23:21:00 <mjg> this is the real crime: asm volatile("movl $0, %0" : "+m" (lk->locked) : );
23:21:00 <mjg> with the comment above it
23:22:00 <heat> why?
23:22:00 <mjg> stackoverflow-level disinformation
23:22:00 <heat> wait, why does it do a memory fence before the store
23:22:00 <heat> in fact, you don't need a memory fence here at all
23:23:00 <mjg> you don't
23:23:00 <mjg> on that cpu
23:23:00 <mjg> it is common to not know that bit though
23:23:00 <mjg> you *do* need it on everything else
23:24:00 <heat> do you need a full one tho?
23:24:00 <heat> or just a store memory barrier?
23:25:00 <mjg> "release" it is called
23:25:00 <heat> yes so I think it's just a store memory barrier
23:26:00 <heat> "Creates an inter-thread happens-before constraint to acquire (or stronger) semantic loads that read from this release store. Can prevent sinking of code to after the operation."
23:26:00 <mjg> it guarantees or *loads* and *stores* earlier in program order are finished
23:26:00 <mjg> at the same time does not prevent ops *past* the fence from leaking up
23:27:00 <heat> fun fact: which in x86 linux store fences literally just expands to something like (addl $0, -8(%esp))
23:27:00 <mjg> s/or//
23:27:00 <mjg> i think you mean a full fence
23:27:00 <mjg> i don't remember if the 32 sucker has the same trick
23:28:00 <mjg> it may be it needs it for release
23:28:00 <heat> actually I mean it for all
23:28:00 <bslsk05> ​ barrier.h - arch/x86/include/asm/barrier.h - Linux source code (v6.2.1) - Bootlin
23:28:00 <heat> although TIL there's an alternative there for the fence instructions
23:29:00 <mjg> that's not smp_*
23:30:00 <mjg> time to head off, happy to flame tomorrow
23:31:00 <heat> it seems that smp_rmb and smp_wmb are just compiler barriers
23:31:00 <heat> huh
23:32:00 <heat> hmm do you really need that lock there?
23:32:00 <mjg> they are for amd64
23:32:00 <mjg> i don't know about i386
23:32:00 <mjg> for a smp_mb you do
23:33:00 <heat> why don't you for a mov $0, (lock) then?
23:33:00 <mjg> see the previous remark about ops leaking up
23:33:00 <heat> for RELEASE semantics at least
23:33:00 <mjg> there is no lock mov
23:33:00 <mjg> or at least i have not seen one :]
23:34:00 <mjg> burp really need to go dawg
23:34:00 <heat> ah so mov works differently here, got it
23:34:00 <mjg> mov is literally just store this shit over there
23:34:00 <mjg> so does not matter what other cpus are doing in the area
23:34:00 <mjg> in contrast something like 'addl' could lose existing state
23:35:00 <mjg> which is why for proper smp use it gets the lock prefix
23:35:00 <mjg> and lock addl 0 is a no-op in terms of content of the target area
23:35:00 <mjg> while still resulting in all synchro normally associated with a locked op
23:35:00 * mjg off
23:36:00 * heat nods
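The acquire/release discussion above maps fairly directly onto C11 atomics. A minimal sketch, assuming a C11 compiler with <stdatomic.h>; this is not xv6's actual code, and the names `struct spinlock`, `spin_acquire` and `spin_release` are illustrative only:

```c
#include <stdatomic.h>

/* Illustrative spinlock, not the xv6 definition. */
struct spinlock {
    atomic_uint locked; /* 0 = free, 1 = held */
};

static void spin_acquire(struct spinlock *lk)
{
    /* Atomic exchange with acquire ordering: later loads and stores in
     * the critical section cannot be moved above it. On x86 this is an
     * xchg, which already acts as a full fence, so no extra barrier. */
    while (atomic_exchange_explicit(&lk->locked, 1u, memory_order_acquire) != 0)
        ; /* spin until the previous value was 0 */
}

static void spin_release(struct spinlock *lk)
{
    /* Release store: earlier loads and stores are complete before the
     * lock reads as free. On x86 this is a plain mov; weaker
     * architectures get whatever barrier they need from the compiler. */
    atomic_store_explicit(&lk->locked, 0u, memory_order_release);
}
```

Letting the compiler pick the instruction for each memory order avoids both the hand-written asm store and the full `__sync_synchronize` barrier criticised above.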
23:42:00 <chibill> <heat> "except in x86 because screw..." <- lol XV6 was targeting x86, they stopped that and rewrote it to target risc-v now.
23:45:00 <heat> yes, yes they were
23:45:00 <heat> my plans were more to write a simple early unix clone
23:45:00 <heat> kinda started mixing up unix and xv6 when writing those sentences
23:50:00 <kof123> " cvs where there's six self checkout stations and one guy to tend to them / greet guests" <squints eyes, not familiar enough with cvs to tell if double-joke about cvs checkouts>
23:51:00 <epony> sixty-six
23:52:00 <epony> mirrors
23:52:00 <epony> of 1 main CVS server
23:52:00 <epony> and 1 backup CVS server
23:52:00 <epony> figure it out, beats github 100% of the time since 1991
23:55:00 <geist> i was gonna say but but cvs has some nice properties... but then i can't think of them
23:57:00 <chibill> CVS = Customer Values Separated
23:57:00 <epony> text format of the metadata / repo files
23:58:00 <epony> and known "operation" / generic support everywhere
23:59:00 <epony> simple enough to be implemented as a yak-shaving task
23:59:00 <epony> and suitable for coherent development teams
23:59:00 <epony> existed in the 90ies