#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

http://bespin.org/~qz/search/?view=1&c=osdev2&y=22&m=5&d=3

Tuesday, 3 May 2022

02:57:00 <klys> geist, what's the latest on your threadripper instability
02:59:00 * klys was out hiking this evening
08:02:00 <Ali_A> Just wondering how do I test if the processor successfully got into protected mode? I did load a gdt, enabled PE in cr0, and did a far jump, is there any way to test if this was successful or something bad happened (like an entry in gdt was wrong or something)
08:02:00 <kazinsal> if your code is executing after the far jump, it worked
08:03:00 <kazinsal> if the GDT entry is invalid then it'll most likely triple fault
08:03:00 <Ali_A> the problem is I can not be sure how to test if I can execute the code after that, (since after getting into 32 bit mode, I can not use bios
08:03:00 <Ali_A> so I can not print to the screen
08:04:00 <Mutabah> You can use the serial port
08:04:00 <Mutabah> or, if VGA mode/emulation is present, you can just draw directly to 0xB8000
08:06:00 <Ali_A> I will try and see if I can draw directly to the screen
08:06:00 <Ali_A> thanks!
08:07:00 <kazinsal> yeah blatting a few characters to the top left of the screen is usually my first test for something like that
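A rough illustration of what "blatting a few characters" to 0xB8000 involves, assuming the usual 80x25 VGA colour text mode: each screen cell is two bytes, the character code followed by an attribute byte (foreground colour in the low nibble, background in the high nibble). The Python below only computes the bytes and the addresses they would go to. The function names are made up for this sketch, and the actual store would be done by the 32-bit code running after the far jump.

```python
VGA_TEXT_BASE = 0xB8000   # start of colour text-mode memory
COLUMNS = 80              # assumption: standard 80x25 text mode

def vga_cells(text, attribute=0x0F):
    """Return the bytes for `text`: one (character, attribute) pair per cell."""
    out = bytearray()
    for ch in text.encode("ascii"):
        out.append(ch)         # first byte of the cell: character code
        out.append(attribute)  # second byte: 0x0F is white on black
    return bytes(out)

def cell_address(row, col):
    """Address of the cell at (row, col); every cell occupies 2 bytes."""
    return VGA_TEXT_BASE + 2 * (row * COLUMNS + col)

print(vga_cells("OK").hex())    # bytes to store starting at the top-left cell
print(hex(cell_address(0, 0)))  # top-left corner
print(hex(cell_address(1, 0)))  # first cell of the second row
```

If those bytes show up as characters in the top-left corner after the jump, the 32-bit code is running.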
08:18:00 <kingoffrance> bochs and qemu have 0xE9 can just "out" a byte to there
08:20:00 <Mutabah> Oh, does qemu have E9 now?
08:21:00 <Mutabah> It didn't last time I looked for it (... granted that was AAAGES ago)
08:22:00 <kingoffrance> qemu "-debugcon dev" option no idea if that is ancient (option name/syntax changed) or what "defaults" are etc. :)
08:22:00 <kingoffrance> i wonder why no one wires it so you can "read" from somewhere :D
08:22:00 <kingoffrance> i mean, seems a simple patch to code if you wanted to
08:23:00 <kazinsal> yeah, it's been there a while but it's an optional isa device
08:23:00 <Mutabah> ah.
08:23:00 <Mutabah> Eh, serial port is only a small amount of extra effort
08:23:00 <Mutabah> with the advantage of working on real hardware
08:23:00 <kazinsal> yeah
08:24:00 <kazinsal> a quick serial driver isn't much code and it's something that'll work on any platform
08:24:00 <kazinsal> and any hypervisor
08:32:00 <Ali_A> okay so it does work, after a far jump
08:32:00 <Ali_A> which I assume means I can execute 32-bit code
08:34:00 <Ali_A> however, I noticed when I try to load the SS segment with mov ax, 0x16 ; data segment offset in the gdt is 0x16 followed by `mov ss, ax` gdb somehow crashes or something
08:34:00 <kazinsal> selectors are aligned to 8 byte offsets -- 0x16 is not divisible by 8
08:36:00 <Ali_A> kazinsal, u r a genius, thanks!
08:36:00 <Ali_A> that was just meant to be 0x10 (16....)
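To spell out why 0x16 fails where 0x10 works: an x86 segment selector is not a plain byte offset. Bits 0-1 hold the requested privilege level (RPL), bit 2 selects the table (0 = GDT, 1 = LDT), and bits 3-15 hold the descriptor index. A small sketch of that layout follows; the helper names are invented for illustration.

```python
def decode_selector(selector):
    """Split a 16-bit selector into (index, table_indicator, rpl)."""
    index = selector >> 3         # which descriptor in the table
    table = (selector >> 2) & 1   # 0 = GDT, 1 = LDT
    rpl = selector & 3            # requested privilege level
    return index, table, rpl

def make_selector(index, table=0, rpl=0):
    """Build a selector for descriptor `index` (GDT, ring 0 by default)."""
    return (index << 3) | (table << 2) | rpl

print(decode_selector(0x16))   # index 2, but in the LDT and with RPL 2
print(decode_selector(0x10))   # index 2 in the GDT with RPL 0
print(hex(make_selector(2)))   # selector for the third GDT entry
```

So 0x16 asks for entry 2 of the LDT at RPL 2, which is not what a flat ring-0 data segment load wants, while 0x10 is entry 2 of the GDT.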
08:37:00 <kazinsal> 👍
09:08:00 <mrvn> there are 10 kinds of people
09:08:00 <clever> those that understand binary, and those that dont
09:08:00 <Ali_A> mrvn 0b10 kind of people
09:32:00 <Griwes> and those who didn't know this joke was in ternary
13:27:00 <sbalmos> been spending an interesting past few days reading some of the redox 0.7 code. just wish there was better architectural design documentation (I know, same can be said of all hobby OSs)
16:33:00 <mrvn> you need to invest in some documentation driven design :)
17:59:00 <Griwes> I'm in a love/hate relationship with the osdev cycle between "things you wrote work so well you make much faster progress than you expected" and "the progress you've made reveals extremely fundamental bugs in the core of the OS"
17:59:00 <Griwes> For the past few sessions I've been at the former, now I'm at the latter
18:00:00 * Griwes shakes fists at how sysret leaves the segment registers in a hugely messy state requiring irq handling to adapt
18:01:00 <Griwes> (it also turns out that I still have some bugs in my avl tree, oopsie)
19:32:00 <geist> klys: re: ryzen instability. I did the first step: move the machine to a place where i can work on it, unplug the 10gbe (but leave it in). ran memtest for a few hours
19:32:00 <geist> then booted it and let it run. ran `watch -n1 'dmesg | tail -40'` to see if something showed up on the log just before it crashed
19:32:00 <geist> nope. lasted about a day
19:32:00 <geist> so next thing i'll do is start pulling out cards
19:33:00 <geist> i am suspecting the off brand mega cheap vid card
19:33:00 <geist> that i had to install because of the 10gbe card being pci-e x4 which used up the x16 slot that the old vid card was in
19:36:00 <clever> geist: ive started to notice the effects of your c++ support, vc4-elf-c++filt takes up a large chunk of my build times now!
19:36:00 <clever> while generating lk.elf.debug.lst
19:36:00 <klange> https://github.com/kuroko-lang/kuroko/releases/tag/v1.2.5
19:37:00 <bslsk05> github.com: Release Kuroko v1.2.5 · kuroko-lang/kuroko · GitHub
19:37:00 <clever> it seems to be in the all: target, and the only way to not, is to specify just the .elf as a target?
19:37:00 <klange> i finally got around to implementing steps in slices in kuroko, alongside switching over to slice objects and matching python3 on dropping __getslice__, et al. in favor of passing slice objects to __getitem__.
19:40:00 <geist> clever: really? like how long, seconds?
19:41:00 <clever> geist: 7 seconds
19:43:00 <geist> ah
19:43:00 <geist> well, suppose you can add a switch to turn it off (or on)
19:43:00 <clever> `make PROJECT=bootcode-fast-ntsc build-bootcode-fast-ntsc/lk.bin` works around it, but now i have to specify the project twice
19:43:00 * geist nods
19:49:00 <geist> well like i said it'd be easy to remove it from the all, or make it a separate target that is then optionally included (or not) based on a switch
19:49:00 <geist> iirc you're using a pretty old bulldozer cpu right?
19:51:00 <clever> yeah, fx-8350
19:51:00 <clever> but i'm also not using any c++ code currently, so the c++filt is pointless on my builds
19:52:00 <clever> oh, what if you just scanned the list of sources, and auto-disabled it?
19:52:00 <geist> do you use the .lst files or whatnot?
19:52:00 <clever> plus a flag to force it off anyways
19:52:00 <clever> i do use the .lst files any time i need to debug a fault
19:52:00 <geist> i see
19:52:00 <clever> and .debug.lst sometimes
19:52:00 <geist> well anyway you're a smart person. go disable it
19:53:00 <geist> i'm surprised it's substantially slow, usually that part is a blip compared to the cost of the disassembly in the first place
19:53:00 <geist> but... dunno
19:53:00 <geist> i do have very fast machines here so i tend to not see it
19:53:00 <clever> yeah, i can always just edit engine.mk or build.mk directly
19:53:00 <geist> does it only show up on VC sources?
19:53:00 <geist> possible your toolchain was built -O0 or something?
19:54:00 <geist> also surprised it runs slower with C++ symbols present, vs just the need for it to scan the file in the first place
19:54:00 <geist> seems that the piping and the scanning would be the slow part, and thus proportional to the size of the input
19:55:00 <clever> PROJECT=qemu-virt-arm64-test rebuilds in 2 seconds, from touching arch.c
19:55:00 <clever> so its probably a flaw in the vc4 gcc
19:56:00 <geist> yah might want to double check it's not compiled with -O0. i've had that problem once before
19:56:00 <geist> ran for like 6 months on a project at work before discovering that whoever built the toolchain did it -O0 -g
19:57:00 <zid> modern -g doesn't actually slow down binaries does it?
19:57:00 <j`ey> do you even need a specific vc4 c++filt?
19:57:00 <j`ey> oh, to read the object files you do
19:57:00 <j`ey> (maybe?)
19:57:00 <geist> probably not. actually really -C i think on binutils is all you actually need for most of this
19:58:00 <geist> i think the notion of always piping output of objdump through c++filt as a separate step is just old habit of mine
19:58:00 <zid> we have -Og now though
19:58:00 <zid> which does what you 'want' when you think of -g slowing down binaries
19:58:00 <geist> primarily because i dont think -C always existed and i had some theory that piping allows for a little bit of parallelism
19:59:00 <clever> i could probably also cheat, and use the host c++filt instead of the vc4 cross compiler c++filt
19:59:00 <clever> they cant differ that much?
19:59:00 <geist> j`ey: probably not, i just included that for a complete description
19:59:00 <geist> probably not at all
19:59:00 <geist> or just dont include it. add a switch to the build system to turn it off or something and push up a patch
19:59:00 <clever> yeah
19:59:00 <zid> what is c++filt, anyway
19:59:00 <geist> again though, if it's substantially slower than an arm build either a) your toolchain is not compiled properly, you should look at that or b) you're building something substantially different
20:00:00 <geist> like a 2GB image file or something
20:00:00 <j`ey> zid: demangler
20:00:00 <geist> zid: basically whatever you pipe through it it looks for c++ mangled names and replaces in line
20:00:00 <zid> ah
20:00:00 <geist> so it's nice to take the output of a disassembly, or symbol file, etc to demangle stuff
20:00:00 <zid> The upgraded version is called IDA, it looks at the machine code and spits out C
20:00:00 <geist> the LK build system basically generates a full suite of secondary files for this after linking, and runs all of them through c++filt
20:01:00 <geist> i was sad when we turned it off for zircon, but the much larger build there was really starting to take a substantial amount of time to disassemble/demangle
20:01:00 <geist> and basically zero people cared about the files but me, which IMO is frankly an issue
20:01:00 <geist> but i can't force folks to look at disassembly
20:01:00 <clever> lol
20:02:00 <zid> I'd troll C++ more but I'm having trouble concentrating with how hot these noodles are
20:02:00 <geist> i can only teach them the virtues of following along
20:02:00 <geist> zid: i'm aware of your trolling. i could sense you as a shark, circling around the conversation, trying to find the right spot to strike
20:02:00 <mrvn> clever: c++ name mangling isn't standardized (or at least prior to modules it isn't).
20:02:00 <zid> I'm doing Lamaze breathing
20:03:00 <geist> mrvn: yeh but for a given version of gcc using a given c++abi version it probably is at least the same across arches
20:03:00 <geist> ie, no arch specific parts to it
20:04:00 <mrvn> I wouldn't bet on it. And for vc4 you don't have gcc output.
20:04:00 <geist> and it is at least standardized *enough* that it's not been a problem recently. there were a few ABI breaking changes in the past but i think that's mostly gone
20:04:00 <geist> well, it can't change much or you'd have actual name resolution linking issues
20:04:00 <mrvn> worst case you are probably left with few not demangled parts.
20:04:00 <clever> mrvn: https://github.com/itszor/gcc-vc4
20:04:00 <bslsk05> itszor/gcc-vc4 - Port of GCC to the VideoCore4 processor. (6 forks/4 stargazers/NOASSERTION)
20:05:00 <mrvn> clever: is that what the firmware uses?
20:05:00 <geist> anyway you can generally tell if something is -O0 by just looking at the disassembly of it
20:05:00 <clever> mrvn: nope
20:05:00 <mrvn> clever: see
20:05:00 <geist> if it seems to be extraneously moving things to the stack and back that's a sure sign
20:05:00 <clever> the official firmware uses a metaware compiler from synopsys, that is behind NDA
20:05:00 <clever> but i only plan to use c++filt on code produced by the open gcc fork
20:05:00 <clever> the official firmware has symbols stripped anyways, so it wouldnt help
20:09:00 <geist> clever: side note when you compile are you using -j switches?
20:09:00 <geist> ie, make -j4 or whatnot? it should parallelize these things
20:09:00 <clever> i forget about that often
20:09:00 <clever> and 99% of the time, only 1 file has changed, so there is little benefit
20:09:00 <mrvn> geist: My issue is that ABIs for different archs can mangle differently. Within an ABI it has to be standard or as you say linking fails.
20:09:00 <clever> i only remember they exist when i change a MODULES +=, and it takes a while
20:10:00 <mrvn> geist: so using the host c++filt might give different result than the cross c++filt
20:10:00 <geist> mrvn: well, you might be right, but i have seen no proof of this
20:10:00 <geist> if it were different per arch it seems like each arch's ABI guide would have a section for it
20:10:00 <geist> maybe it does though, i'll ask folks at work today
20:11:00 <geist> i suppose it could for things like AVX512 registers or whatnot: ie, specialized types based on arch
20:11:00 <mrvn> geist: can't say that I have an example
20:11:00 <mrvn> some windows and unix formats could differ
20:11:00 <geist> *thats* absolutely true
20:11:00 <geist> i should have been more clear: within the same ABI family (sysv, etc) it shouldn't change
20:12:00 <geist> but anyway, moot point. i wouldn't recommend doing it anyway
20:13:00 <mrvn> c++filt probably has the rules for all the known formats in it. It doesn't see if the binary was elf or pe or whatever and still has to work,
20:13:00 <geist> yah. and anyway like i said objdump and whatnot has a -C flag for it now, which runs the filtering inline
20:13:00 <geist> would be interesting to have clever time that
20:13:00 <geist> an external c++filt vs adding -C
20:13:00 <clever> *looks*
20:13:00 <zid> does -C just pipe it through popen to c++filt though?
20:14:00 <clever> [nix-shell:~/apps/rpi/lk-overlay]$ time vc4-elf-objdump -dC build-bootcode-fast-ntsc/lk.elf > /dev/null
20:14:00 <clever> real 0m0.093s
20:14:00 <geist> dunno, presumably internally since it's built out of the same thing
20:14:00 <clever> its virtualy the same runtime as without -C
20:14:00 <clever> real 0m0.092s
20:14:00 <zid> If it changes the timing what do you even conclude there
20:14:00 <clever> down into the noise
20:14:00 <mrvn> can't see how writing and reading back the name could save time really.
20:14:00 <geist> interesting, double check that it's actually demangling things
20:14:00 <clever> let me pop a .cpp file into the build...
20:15:00 <mrvn> demangling should be linear time
20:15:00 <zid> not unless your name lookups are O(1)
20:15:00 <geist> mrvn: yeah that's my thought, mostly linear based on the input size too
20:15:00 <geist> which is why it feels highly odd that it'd take like 7 seconds
20:15:00 <mrvn> zid: what name lookups?
20:15:00 <geist> clever: time the pipe too
20:15:00 <geist> double verify it's actually taking 7 seconds
20:16:00 <zid> it's doing a string lookup on input tokens to output tokens isn't it
20:16:00 <geist> or you're not misreading one of them. also try the debug version of it. that usually takes substantially longer
20:16:00 <zid> I'd expect log n anyway though which will basically be O(1)
20:16:00 <zid> for less than a few hundred thousand symbols
20:16:00 <mrvn> zid: my guess would be reading char by char till it finds something that could be a mangled name. Then it demangles and if it works it outputs the demangled string, otherwise the original.
20:16:00 <geist> alas i gotta go. the meetings are starting. will be occupied for most of the rest of the afternoon
20:17:00 <geist> MEETINGS ARE THE BEST
20:17:00 <geist> (been watching the show Severance lately, it's *fantastic*)
20:17:00 <zid> oh yea could be doing that, I don't know enough about how reverseable the mangling is to know if it can do that
20:17:00 <mrvn> zid: O(input size). Can't be faster than touching every char once.
20:17:00 <zid> or if it has to LUT them
20:17:00 <clever> 800044da: ff 9f ea ff bl 800044ae <test::foo(int)>
20:17:00 <clever> that is in the default lk.elf.debug.lst
20:18:00 <clever> [nix-shell:~/apps/rpi/lk-overlay]$ time vc4-elf-objdump -dC build-bootcode-fast-ntsc/lk.elf | grep test::
20:18:00 <clever> 800044ae <test::foo(int)>:
20:18:00 <clever> and its also in the -dC output
20:18:00 <clever> 800044da: ff 9f ea ff bl 800044ae <test::foo(int)>
20:18:00 <mrvn> what's the mangled name?
20:18:00 <clever> real 0m0.101s
20:18:00 <geist> cool, so now time it piped
20:18:00 <clever> 800044da: ff 9f ea ff bl 800044ae <_ZN4test3fooEi>
20:18:00 <clever> without -C, it turns into this
20:19:00 <mrvn> $ echo foo bar _ZN4test3fooEi baz | c++filt
20:19:00 <mrvn> foo bar test::foo(int) baz
20:19:00 <geist> yah at some point i actually grokked the format. basically _ZN is i think the return part, then each thing after that is i think a length, name, and code to modify it
20:19:00 <clever> [nix-shell:~/apps/rpi/lk-overlay]$ time vc4-elf-objdump -d build-bootcode-fast-ntsc/lk.elf | vc4-elf-c++filt | grep test | grep foo
20:19:00 <clever> 800044ae <test::foo(int)>:
20:19:00 <clever> geist: ok, so at least with this, its still fast...
20:19:00 <clever> 800044da: ff 9f ea ff bl 800044ae <test::foo(int)>
20:19:00 <clever> real 0m0.098s
20:19:00 <zid> yea the mangle format actually looks fairly simple for gcc at least
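For the one symbol seen in this log, the gcc (Itanium C++ ABI) mangling can be unpicked by hand: `_Z` marks a mangled name, `N ... E` brackets a nested name made of length-prefixed components, and the trailing letters encode the parameter types (`i` is `int`). The sketch below handles only that narrow shape and a handful of builtin type codes; anything else is passed through untouched, which is also roughly the scanning behaviour mrvn guesses at above. It is an illustration, not a reimplementation of c++filt.

```python
import re

# A few builtin type codes from the Itanium C++ ABI; far from complete.
BUILTIN_TYPES = {
    "v": "void", "b": "bool", "c": "char", "i": "int",
    "j": "unsigned int", "l": "long", "m": "unsigned long",
    "f": "float", "d": "double",
}

def demangle_nested(symbol):
    """Demangle _ZN<len><name>...E<builtin codes>; return input if unsupported."""
    if not symbol.startswith("_ZN"):
        return symbol
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])
        name = symbol[end:end + length]
        if len(name) != length:          # truncated component
            return symbol
        parts.append(name)
        pos = end + length
    if not parts or pos >= len(symbol) or symbol[pos] != "E":
        return symbol
    codes = symbol[pos + 1:]
    if any(code not in BUILTIN_TYPES for code in codes):
        return symbol
    qualified = "::".join(parts)
    if not codes:                        # no parameter list: a variable
        return qualified
    params = [] if codes == "v" else [BUILTIN_TYPES[c] for c in codes]
    return qualified + "(" + ", ".join(params) + ")"

def filter_line(line):
    """Replace every _Z... token in a line, leaving the rest as it was."""
    return re.sub(r"_Z\w+", lambda m: demangle_nested(m.group(0)), line)

print(filter_line("foo bar _ZN4test3fooEi baz"))
print(filter_line("800044da: ff 9f ea ff bl 800044ae <_ZN4test3fooEi>"))
```

Each input character is visited a bounded number of times, which matches the expectation in the channel that demangling cost should scale linearly with input size.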
20:19:00 <geist> yah that's why i'm suspecting your initial hypothesis is off
20:20:00 <clever> i checked top multiple times, and c++filt was at the top of the charts
20:20:00 <mrvn> you could just strace it to see if it forks c++filt
20:20:00 <clever> let me shove a time into your makefiles...
20:20:00 <geist> right, add a echo date or whatnot before and after
20:20:00 <geist> also it runs a lot of things through c++filt, it might not be the disassembly
20:20:00 <geist> there's a symbol table dump, etc
20:20:00 <geist> maybe one of the other things is really slow
20:21:00 <clever> yeah
20:21:00 <geist> https://github.com/littlekernel/lk/blob/master/make/build.mk#L39 etc
20:21:00 <bslsk05> github.com: lk/build.mk at master · littlekernel/lk · GitHub
20:21:00 <geist> and the rules below
20:21:00 <geist> though the .debug.lst should be the largest by far
20:21:00 <geist> make sure you benchmark *that*
20:21:00 <clever> 42 $(info generating listing: $@)
20:21:00 <clever> 43 time $(OBJDUMP) $(ARCH_OBJDUMP_FLAGS) -S $< | $(CPPFILT) > $@
20:21:00 <geist> which generates the full debug listing files
20:21:00 <clever> real 0m4.450s
20:21:00 <geist> yes that's the -S one
20:22:00 <geist> that's the slow one, rerun your tests with that (as i think i keep saying)
20:22:00 <clever> time vc4-elf-objdump -S build-bootcode-fast-ntsc/lk.elf | vc4-elf-c++filt > build-bootcode-fast-ntsc/lk.elf.debug.lst
20:22:00 <clever> oh
20:22:00 <clever> i just realized something, *checking*
20:22:00 <clever> [nix-shell:~/apps/rpi/lk-overlay]$ time vc4-elf-objdump -S build-bootcode-fast-ntsc/lk.elf | vc4-elf-c++filt > /dev/null
20:23:00 <clever> dev null, shaves 4 seconds off it....
20:23:00 <clever> because i have zfs set to lz4 everything on that filesystem
20:23:00 <geist> and it's a huge file
20:23:00 <geist> that should make any overhead of c++filt even *less* important
20:23:00 <geist> anyway, alas gotta go. figure it out!
20:23:00 <clever> yeah
20:23:00 <geist> JUST DO IT as the great Shia LaBeouf once said
20:24:00 <clever> its clearly not c++filt now that i check that
20:24:00 <clever> its the disk io
20:24:00 <geist> (but this does point out that i can probably switch these to -C with no real ill effect)
20:24:00 <zid> https://preview.redd.it/ixp71ug0j8x81.png?width=512&auto=webp&s=0605bc9d406ef0b4c4553cca4fdca486812715ee
20:24:00 <zid> replace 'task manager' with 'time ./'
20:24:00 <geist> zid: hahaha
20:24:00 <clever> geist: also, this answers a second puzzling thing
20:25:00 <clever> i havent updated my lk reference in a while
20:25:00 <clever> so i was slightly puzzled as to how your recent c++ work got into my worktree, lol
20:25:00 <clever> it didnt!
20:26:00 <geist> nah there's been C++ code in LK for a long time. like 10 years
20:26:00 <geist> just not used that much
20:27:00 <geist> but doing more drivers/subsystems in it, but will remain C as the abi between modules
20:27:00 <clever> yeah
20:27:00 <clever> so this is code that ive been running since the day i started using LK
20:27:00 <clever> and its nothing you changed recently
20:27:00 <zid> Programmers hate this one weird trick to avoid C++ ABI issues: gcc -x c
20:27:00 <clever> my fs is just being wonky
20:31:00 <clever> [nix-shell:~/apps/rpi/lk-overlay]$ ls -lhs build-bootcode-fast-ntsc/lk.elf.debug.lst
20:31:00 <clever> 2.4M -rw-r--r-- 1 clever users 1.1M May 3 17:28 build-bootcode-fast-ntsc/lk.elf.debug.lst
20:31:00 <clever> oh god, gang blocks!
20:31:00 <clever> its my fragmentation coming back to bite me!
21:05:00 <geist> between meetings: i just had a thought
21:05:00 <geist> if the writing out of the file is very expensive because of zfs and lz4 then the last process in the pipe chain is charged the cost
21:05:00 <geist> ie, foo | bar > baz
21:05:00 <geist> bar gets all the kernel time accounted for it since it's left writing to the FD
21:06:00 <geist> hence why c++filt maybe seems to be expensive
21:07:00 <clever> i think its not lz4, because turning that off didnt help
21:07:00 <clever> i think its the severe fragmentation
21:07:00 <clever> and the cpu cost, to just find free blocks
21:07:00 <geist> word.
21:07:00 <geist> `filefrag -v` is a great tool for this too
21:08:00 <geist> though i dunno if zfs is wired through for this
21:08:00 <clever> zfs isnt compatible with filefrag
21:08:00 <clever> you have to use the zdb cmd instead
21:08:00 <clever> filefrag assumes your fs is backed by a single block device
21:09:00 <clever> same reason LK isnt able to mount zfs so easily
21:11:00 <geist> well, works fine with btrfs
21:11:00 <geist> but btrfs goes through one level of translation, so the addresses filefrag returns are logical
21:11:00 <geist> (i guess)
21:11:00 <clever> for zfs, every block is a tuple of: the vdev#, the block#, the hash of the block, and more!
21:11:00 <mrvn> clever: how little ram do you have that writing c++filt output to disk flushes the contents?
21:12:00 <clever> mrvn: 32gigs
21:12:00 <geist> there might be some sort of encoding to the block addresses it returns that's non obvious, but i think btrfs has a intermediate translation where the FS operates in logical address mode and theres a layer that allocates rather large chunks out of the underlying physical devices
21:12:00 <geist> nice thing is it can move those physical slices around without modifying the higher level FS data structures
21:12:00 <geist> and/or duplicate (raid1, etc)
21:13:00 <geist> the large chunks are usually in order of 1GB or so, so you dont need a very expensive translation table
21:13:00 <clever> this is a typical file under zdb: https://gist.github.com/cleverca22/ed74fbeda71c8e7339c49a72f26e8918
21:13:00 <bslsk05> gist.github.com: gist:ed74fbeda71c8e7339c49a72f26e8918 · GitHub
21:14:00 * klys arrives from work
21:14:00 <clever> in this case, each block is 128kb, and it is stored as 2 blocks (the L0's), and then there is a single L1 containing pointers to the L0's
21:14:00 * geist nods
21:14:00 <mrvn> Does c++filt have a check for /dev/null like tar does and skip outputing anything?
21:14:00 <clever> and i'm not sure how that totals up right, because it claims to be 469kb on-disk
21:15:00 <geist> mrvn: i dont think so, i think you can basically pipe a raw binary through it and it'll dutifully translate whatever it sees
21:15:00 <clever> [nix-shell:~/apps/rpi/lk-overlay]$ time vc4-elf-objdump -S build-bootcode-fast-ntsc/lk.elf | vc4-elf-c++filt > /dev/shm/lk.elf.debug.lst
21:15:00 <clever> real 0m0.162s
21:15:00 <geist> oh i see what you mean, on the output
21:15:00 <geist> probably not
21:15:00 <clever> mrvn: and when writing to a tmpfs, its instant
21:15:00 <geist> well, good to know at least
21:15:00 <clever> so its not c++filt being cheaty, its the fs being slow
21:15:00 <mrvn> clever: don't tell me c
21:15:00 <mrvn> c++filt does a fsync
21:15:00 <geist> also yet another reason i dont use ZFS
21:16:00 <geist> sounds like you need to defrag your disk, and/or get a bigger disk
21:16:00 <clever> aha, and i see part of the problem
21:16:00 <clever> 0 L0 DVA[0]=<0:3b0ff38000:67000> DVA[1]=<0:2fc4647000:4000> [L0 ZFS plain file] fletcher4 lz4 unencrypted LE gang
21:16:00 <clever> the word "gang" there, means it failed to find a free space chunk that was 128kb
21:17:00 <clever> so it had to instead use up 3+ chunks of free space, 1 to hold a list of chunks, and then 2 (or more) chunks of actual data
21:17:00 <geist> geez. is your disk totally full?
21:17:00 <clever> 4.8gig free
21:17:00 <geist> or just totally fragmented? seems like zfs would have some sort of online defragmentation
21:17:00 <geist> out of?
21:17:00 <mrvn> WTF? c++filt doesn't close any FDs or free any memory. it just calls exit_group(0)
21:17:00 <geist> mrvn: haha
21:17:00 <clever> https://gist.github.com/cleverca22/a10ae8fac93d1f55ca1dac09923a3360
21:17:00 <bslsk05> gist.github.com: gist:a10ae8fac93d1f55ca1dac09923a3360 · GitHub
21:17:00 <mrvn> clever: 4.8g free out of how many TB?
21:17:00 <zid> destructors are annoying :P
21:17:00 <clever> geist: this is a histogram of the size of free space chunks
21:18:00 <kazinsal> "fragmentation 91%"
21:18:00 <clever> so i have 2909 blocks, that are 32kb in length each
21:18:00 <clever> and then 25k holes, that are 16kb long each
21:18:00 <geist> yah i assume those are in powers of 2, so looks like most of your free blocks are 4K or 8K
21:18:00 <clever> and down and down
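The free-space histogram and the "gang" remark tie together: if no single free segment is at least as large as the block being written (128kb here), the allocation has to be split. A minimal sketch of that reasoning follows, assuming the histogram buckets segments by the largest power of two not exceeding their size, as geist guesses. The segment sizes are invented for illustration and are not clever's real data.

```python
from collections import Counter

def free_space_histogram(segment_sizes):
    """Count free segments per power-of-two bucket (bucket k covers 2**k .. 2**(k+1)-1)."""
    return Counter(size.bit_length() - 1 for size in segment_sizes)

def fits_contiguously(histogram, block_size):
    """Conservative check: is there a bucket whose every segment can hold block_size?"""
    needed = (block_size - 1).bit_length()   # smallest k with 2**k >= block_size
    return any(count > 0 and bucket >= needed
               for bucket, count in histogram.items())

KIB = 1024
# Hypothetical badly fragmented pool: nothing larger than 32 KiB is free.
fragmented = [32 * KIB] * 3 + [16 * KIB] * 5 + [4 * KIB] * 10
hist = free_space_histogram(fragmented)
print(sorted(hist.items()))
print(fits_contiguously(hist, 128 * KIB))    # no bucket large enough

# After growing the pool there are some large holes again.
grown = free_space_histogram(fragmented + [256 * KIB * KIB])
print(fits_contiguously(grown, 128 * KIB))
```

With only 32kb and smaller holes, every 128kb block needs several pieces plus a header listing them, which is the extra work clever describes.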
21:18:00 <mrvn> Filesystems get really really really slow approaching 100% full
21:18:00 <geist> and yeah that super sucks. you better fix it
21:18:00 <clever> exactly
21:18:00 <zid> I ran out of inodes once that was fun
21:18:00 <geist> yah that's what i was asking about. 4.8GB out of what?
21:18:00 <clever> and because zfs is immutable, you have no real option to defrag
21:19:00 <zid> I filled a small disk with lots of small files
21:19:00 <zid> and it just went "Nope, too full", at half capacity
21:19:00 <clever> 320gig total size
21:19:00 <clever> your only defrag choice is to basically move the data elsewhere, delete, then move it back
21:19:00 <geist> well, sounds like time to move it off to another disk, recreate, and add back
21:19:00 <mrvn> soo <2% space. That really doesn't work with zfs
21:19:00 <clever> exactly
21:19:00 <clever> mrvn: there is also a special slop-space thing, one min
21:19:00 <zid> you can actually do a bunch of tricks on zfs though
21:20:00 <zid> One of my friends is a fs nerd and he does all sorts of things to it
21:20:00 <clever> [root@amd-nixos:~]# cat /sys/module/zfs/parameters/spa_slop_shift
21:20:00 <clever> 5
21:20:00 <clever> this reserves a certain amount of space for internal usage
21:20:00 <clever> so zfs doesnt deadlock when you "run out", much like ext4 has a reserve
21:20:00 <mrvn> At some point zfs becomes real careful with the remaining free space because as a COW FS if it ever runs out it's dead.
21:20:00 <geist> sometimes moving lots of stuff to tar files, compressing the shit out of it to free up space might help it get some better runs on free space
21:20:00 <clever> if i pop a 15 into that file, i suddenly have 15gig free
21:21:00 <clever> so i have more free space than df claims, just because zfs is forcing me to not be that low
21:21:00 <clever> i forget the exact math behind slop space
21:21:00 <mrvn> clever: yeah, but for such a small disk you want quite a bit reserve
21:21:00 <clever> yep
21:21:00 <geist> the fact that it says 'shift' implies it's some power of something
21:21:00 <clever> yeah
21:21:00 <mrvn> full space >> slop shift?
21:22:00 <clever> smaller numbers mean more is reserved
21:22:00 <geist> mrvn: oooh yeah thats probably what it is
21:22:00 <clever> to the source!
21:22:00 <clever> https://github.com/openzfs/zfs/blob/master/module/zfs/spa_misc.c
21:22:00 <bslsk05> github.com: zfs/spa_misc.c at master · openzfs/zfs · GitHub
21:22:00 <geist> so smaller numbers would be larger and larger, yeah
21:22:00 <clever> https://github.com/openzfs/zfs/blob/master/module/zfs/spa_misc.c#L347-L356
21:22:00 <bslsk05> github.com: zfs/spa_misc.c at master · openzfs/zfs · GitHub
21:22:00 <clever> this comment explains it exactly
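The guess in the channel, "full space >> slop shift", can be checked against the numbers mentioned here. The sketch below assumes the reserve is simply the pool size shifted right by `spa_slop_shift`; the real OpenZFS code may also clamp the result to minimum and maximum values, which this ignores. `slop_space` is a made-up helper name.

```python
GIB = 2**30
MIB = 2**20

def slop_space(pool_size, slop_shift=5):
    """Reserved bytes under the simple model: pool_size / 2**slop_shift."""
    return pool_size >> slop_shift

pool = 320 * GIB                      # pool size quoted in the log
print(slop_space(pool, 5) / GIB)      # reserve in GiB with the shift of 5 shown above
print(slop_space(pool, 15) / MIB)     # reserve in MiB with a shift of 15
```

Under this model a shift of 5 holds back 1/32 of the pool, about 10 gig of 320, and a shift of 15 holds back almost nothing. That lines up with free space going from roughly 4.8 gig to roughly 15 gig when clever writes 15 into the parameter, and with smaller shift values reserving more.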
21:22:00 <mrvn> Anyway, go and buy a harddisk.
21:23:00 <geist> yah
21:23:00 <clever> [root@amd-nixos:~]# fdisk -l /dev/nvme0n1
21:23:00 <clever> Disk /dev/nvme0n1: 476.94 GiB, 512110190592 bytes, 1000215216 sectors
21:23:00 <clever> mrvn: i'm on a 470gig nvme drive
21:23:00 <mrvn> It's 20E well spent to double your disk space.
21:23:00 <Ermine> Did you see modern devices running m68k?
21:23:00 <kazinsal> this is why I don't use zfs
21:23:00 <geist> ah 320GB i thought you were using some old spinny disk
21:24:00 <geist> since that was a standard size for a while
21:24:00 <clever> geist: 320gig partition on a 470gig nvme
21:24:00 <mrvn> kazinsal: nothing to do with zfs. try ext4 or btrfs or any other. They all go exponential when reaching 100% full
21:24:00 <geist> well, that's good. if you had a spinny disk this fragmented, it'd be a shit show on reading
21:24:00 <clever> with a 64gig swap partition for chrome to burn a hole in the nvme, lol
21:24:00 <geist> OTOH you would have noticed it much faster
21:24:00 <geist> mrvn: yes except COW fses will probably fragment the free space faster, on the average
21:25:00 <geist> but yes, running any fs that low is a bad idea
21:25:00 <mrvn> geist: depends on the FS design
21:25:00 <clever> let me double-check things...
21:25:00 <geist> indeed.
21:25:00 <clever> yep, there is a 94gig hole between zfs and swap
21:25:00 <clever> so i could just expand zfs by another 94gig on the spot
21:25:00 <kazinsal> I can honestly say I've never been (tangentially) involved in a conversation about ZFS that didn't involve a pile of esoteric troubleshooting and/or consulting the source code
21:25:00 <clever> lets do it!
21:25:00 <geist> if your root is not ZFS you can switch to a /swap file and set it a little smaller/resize it
21:25:00 <geist> then reclaim that space too
21:26:00 <geist> kazinsal: haha
21:26:00 <geist> clever: back yer shit up first
21:26:00 <geist> alwayyyyyys do that
21:26:00 <clever> na!
21:26:00 * geist shrugs and goes back to meetings
21:26:00 <mrvn> geist: zfs allocates bigger chunks and uses them for data or metadata so they don't get interleaved and you can defrag too
21:26:00 <clever> ive done this once before, in the middle of a screen sharing session :P
21:27:00 <geist> because the bear didn't attack you before doesn't mean it's a good idea to sleep in the bear cave
21:27:00 <mrvn> Is swap on compressed zfs stable now?
21:27:00 <geist> yah in general i've moved to /swap files, as have a lot of distros. much nicer to not have to dork with the partition table in a fairly static way
21:27:00 <kazinsal> introducing the Leopards Eating Peoples Faces File System
21:27:00 <klange> _I didn't think the leopards would eat _my_ files/faces!_
21:28:00 <geist> hah on a related note i noticed in one of the recent netbsds it actually mentions LFS
21:28:00 <clever> geist: i did it the scary way, i just deleted the partition, then remade it! https://gist.github.com/cleverca22/4ffa587af2bfe0dc283f9ff4afa44368
21:28:00 <bslsk05> gist.github.com: gist:4ffa587af2bfe0dc283f9ff4afa44368 · GitHub
21:28:00 <geist> like 'LFS got some stability improvements' in netbsd 9 i think
21:28:00 <geist> like. wow someone uses LFS?
21:28:00 <clever> and the device node is just magically bigger, and still contains an fs
21:28:00 <mrvn> geist: linux filesystem hierarchy standard or large file support?
21:28:00 <geist> mrvn: oh silly. log based file system
21:28:00 <kazinsal> log-structured file system
21:29:00 <geist> the *old* one, from BSDs, back in the 80s
21:29:00 <mrvn> .oO(how is that still unstable?)
21:29:00 <geist> interesting idea, didn't go anywhere, has serious downsides, but one can argue that lots of the modern stuff is based on the idea
21:29:00 <clever> gist updated
21:29:00 <clever> i now have 3 holes, that are 2^28 bytes long
21:29:00 <geist> or at least it was potentially a source of ideas
21:29:00 <clever> 256mb each
21:29:00 <geist> though i hear DEC had some sort of log based fs at some point. Spiralog i think?
21:29:00 <mrvn> clever: have you defraged the fs lately?
21:30:00 <clever> mrvn: you cant really, zfs is immutable
21:30:00 <clever> your only option is to move+delete, then copy it back
21:30:00 <mrvn> clever: zfs has a defrag
21:30:00 <clever> what is the cmd called?
21:31:00 <mrvn> zdb something something
21:31:00 <clever> sounds like an offline operation
21:31:00 <geist> so take it offline and defrag it
21:32:00 <clever> oh, yeah, now i remember why i wasnt expanding it the last ~100gig
21:32:00 <clever> i had intentionally ran a blkdiscard on that 100gig partition, to force the nvme to have more free blocks internally
21:32:00 <clever> so its wear leveling had more room to flex
21:32:00 <geist> yah makes sense, but you can accomplish the same thing by just not using up the last of your zfs and making sure it trims things
21:32:00 <geist> OTOH, given your presumed nature, you'll probably now just run it down to the last bit
21:33:00 <clever> at the time, zfs didnt support trim
21:33:00 <geist> side note: i noticed that the `nvme list` command will show you the internal concept of how much the drive thinks it's in use
21:33:00 <geist> the 'namespace usage' column appears to track with recent trims
21:33:00 <clever> Node SN Model Namespace Usage Format FW Rev
21:33:00 <clever> /dev/nvme0n1 BTPY652506Q0512F INTEL SSDPEKKW512G7 1 512.11 GB / 512.11 GB 512 B + 0 B PSF109C
21:33:00 <clever> pretty useless in my case
21:34:00 <geist> yes. that means you have *zero* trimming going on
21:34:00 <clever> but i remember running a blkdiscard on a 100gig partition in the past
21:34:00 <clever> to create a 100gig hole in the device
21:34:00 <clever> its possible the firmware doesnt support things?
21:34:00 <geist> that is interesting, indeed
21:34:00 <mrvn> clever: I did it a few years ago and it's fully online. It just goes through the zfs data and copies data and metadata around that's fragmenting
21:35:00 <clever> /dev/nvme0n1 S3EUNB0J506630H Samsung SSD 960 EVO 500GB 1 498.80 GB / 500.11 GB 512 B + 0 B 2B7QCXE7
21:35:00 <clever> on my laptop, it reports this instead
21:35:00 <geist> and note you're also running it right to the edge
21:35:00 <geist> what fs are you using there?
21:35:00 <clever> zfs on both desktop and laptop
21:36:00 <geist> i think i'm starting to see a common pattern here
21:36:00 <geist> (zfs aint trimming, yo)
21:36:00 <clever> the laptop is also zfs ontop of luks
21:36:00 <clever> so i would need to get luks to also trim
21:36:00 <heat> TRIM is disabled on certain ssd's im pretty sure
21:36:00 <mrvn> zpool can trim
21:36:00 <geist> yah but not on those SSDs
21:36:00 <kazinsal> mmm, nested block device abstractions
21:36:00 <geist> i have actually i think that exact model
21:36:00 <clever> kazinsal: and lvm too!
21:36:00 <mrvn> zpool-trim — Initiate immediate TRIM operations for all free space in a ZFS storage pool
21:36:00 <geist> lsblk -D should show you if it is supported
21:37:00 <heat> hmm, maybe not trim. there was a common command that was disabled on a bunch of ssds
21:37:00 <clever> reports 512 byte block size for the nvme on both machines, but 0 for the lvm nodes that zfs sits ontop of
21:37:00 <heat> oh wait, yeah
21:37:00 <geist> theres your problem clever
21:37:00 <heat> queued trim, that's what it is
21:37:00 <clever> zfs ontop of lvm ontop of luks ontop of nvme
21:38:00 <geist> need to figure out how to let LVM punch that through
21:38:00 <geist> ah it's luks for sure
21:38:00 <mrvn> or just not use lvm
21:38:00 <geist> but iirc there's a mechanism to tell luks to allow punch through discards
21:38:00 <geist> though you hypothetically lose a bit of security that way
21:38:00 <kazinsal> does zfs not do encryption?
21:38:00 <geist> but it's an opt in, since by default you just fill the drive with garbage
21:38:00 <mrvn> totally, can't trim a luks or everyone can see where you have unused space.
21:39:00 <geist> so even your 100GB wasn't doing anything because you did it on top of a luks that wasn't letting you punch it through
21:39:00 <geist> but like i said there's a flag or whatnot to allow it, if you're willing to accept the punch through
21:39:00 <clever> kazinsal: zfs encryption came around after i installed this laptop
21:39:00 <clever> geist: that 100gig hole was on a non-luks system
21:40:00 <geist> fine, anyway
21:40:00 <mrvn> clever: I forgot what exactly it was but zfs encryption has faults in the design.
21:40:00 <clever> it was a 100gig bare partition, that i directly ran blkdiscard on, and then deleted
21:40:00 <clever> so that range of the nvme was just not mapped to any blocks
21:40:00 <geist> okay, anyway, for your laptop i'd personally punch discards through
21:40:00 <kazinsal> kitchen sink systems tend to end up with design faults
21:40:00 <clever> mrvn: yeah, i trust luks more than zfs
21:40:00 <clever> geist: yeah, checking the man pages for how
21:41:00 <mrvn> zfs encryption has the problem of being glued on after the fact
21:41:00 <geist> i think you add 'discard' to the crypttab
21:41:00 <geist> i see it here at least
21:41:00 <clever> but i dont think i'm using a crypttab
21:41:00 <clever> *checks*
21:42:00 <clever> https://github.com/NixOS/nixpkgs/blob/master/nixos/modules/system/boot/luksroot.nix#L146-L147
21:42:00 <bslsk05> github.com: nixpkgs/luksroot.nix at master · NixOS/nixpkgs · GitHub
21:42:00 <clever> aha, its just plain cryptsetup luksOpen, and there is an allowDiscards option right there
21:43:00 <kazinsal> ha. now we're doing a "consult the source code" for the whole fucking operating system
21:43:00 <geist> yah and lsblk -D should tell you upon reboot if it stuck
21:43:00 <geist> they're using that nix thing which is <shrug>
21:43:00 <clever> kazinsal: and even the source for a specific install! https://github.com/cleverca22/nixos-configs/blob/master/system76.nix#L25-L27
21:43:00 <bslsk05> github.com: nixos-configs/system76.nix at master · cleverca22/nixos-configs · GitHub
21:43:00 <clever> kazinsal: nixos lets you use source to define how the entire machine is configured
21:43:00 <clever> so i just have to add allowDiscards=true; to line 26 and rebuild
21:43:00 <heat> kazinsal, the linux way
21:44:00 <heat> open source isn't broken, you just need to check the source code
21:44:00 <geist> heh, it's now the linux way huh?
21:44:00 <geist> sheesh
21:44:00 <heat> because everyone needs to know how to program
21:44:00 <heat> and use terminals
21:44:00 <CompanionCube> zdb is not a defrag
21:44:00 <heat> GUIs are for noobs
21:44:00 <CompanionCube> zdb is dumpe2fs
21:45:00 <geist> i was just mostly thinking how generally polished linux distros have been compared to what existed at the time
21:45:00 <geist> ie, installing slackware linux in 1995 was downright ez compared to a BSD
21:45:00 <geist> and that trend has generally continued
21:46:00 <heat> right, but bsd is bsd
21:46:00 <heat> you don't need to check windows's source code, it just works
21:46:00 <heat> (tm)
21:46:00 <clever> geist: and once i flipped on allowDiscards and rebooted, i see discard support clean thru lvm to the block dev zfs uses
21:47:00 <clever> so lvm just passes trim on automatically, and luks was the only problem
21:47:00 <kazinsal> I think the idea of "infrastructure as code" has started causing people to slide back towards that older era of things not working out of the box
21:48:00 <geist> and the intel ssd probably just doesn't report it right
21:48:00 <clever> started a `zpool trim tank`, and i can see the usage in `nvme list` ticking down
21:48:00 <kazinsal> when you're declaratively describing your environment at every level in a manner that is then used to "compile" that to a working system there's so many different aspects that can be huge pain points
21:48:00 <kazinsal> you wouldn't use terraform to put together a desktop environment
21:48:00 <clever> kazinsal: i do!
21:48:00 <heat> well, that's a problem with linux
21:48:00 <clever> its called nix, not terraform, but same idea applies
21:49:00 <heat> so much choice that 75% of the combinations end up broken
21:49:00 <kazinsal> and you're the only one having issues with what should be an extremely solved problem
21:49:00 <clever> kazinsal: what issues?
21:49:00 <heat> also instead of a great desktop environment you end up with 5 crap ones because "erm, muh choice"
21:49:00 <kazinsal> the past hour and a half of janitoring your filesystems
21:50:00 <clever> kazinsal: thats not because of nixos, thats because ive got data-hoarding problems :P
21:50:00 <clever> and have been killing the drive with 0% free for months
21:50:00 <clever> i'm doing the same thing to a gentoo system :P
21:50:00 <clever> Filesystem Size Used Avail Use% Mounted on
21:50:00 <clever> /dev/sda1 73G 61G 13G 84% /
21:50:00 <clever> /dev/sdb1 3.7T 3.7T 4.9G 100% /media/videos/4tb
21:51:00 <clever> the free space has just vanished due to rounding errors, lol
21:53:00 <mrvn> data4 21.8T 43.8M 21.7T - - 0% 0% 1.00x ONLINE -
21:53:00 <mrvn> rpool 1.80T 1.44T 365G - - 15% 80% 1.00x ONLINE -
21:53:00 <mrvn> clever: and that is my desktop
21:53:00 <clever> NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
21:53:00 <mrvn> data hoarding, lol
21:53:00 <clever> amd 414G 305G 109G - - 70% 73% 1.12x ONLINE -
21:53:00 <clever> my desktop, after expanding the partition to fill out the rest of the drive
21:53:00 <clever> Data Units Read: 235,638,176 [120 TB]
21:53:00 <clever> Data Units Written: 685,035,746 [350 TB]
21:53:00 <clever> and what smartctl reports
21:54:00 <clever> Percentage Used: 71%
21:54:00 <clever> ive read elsewhere, that this is a percentage of the lifetime
21:55:00 <mrvn> clever: all those smart values and lifetime estimates are pretty much fiction. According to specs my m2.key has an expected lifetime of a few hours under load.
21:55:00 <bauen1> i have a question for cross compiling to arm-none-eabi, libm (math.h, ...) isn't defined as freestanding, but it only seems to reference __errno, so how bad of an idea is it to just link with libm in a freestanding env ?
21:56:00 <mrvn> bauen1: just check the license
21:56:00 <heat> geist, btw your printf tests are pretty cool
21:56:00 <heat> really comprehensive
21:57:00 <geist> yah was thinking of putting those in the unit tests too by sprintfing to a buffer, etc
21:57:00 <mrvn> Do they check bit correct float, double and long double scanf/printf?
21:57:00 <heat> no, kernel tests
21:57:00 <heat> ahh wait these come from lk?
21:58:00 <heat> i was looking at the fuchsia ones xD
21:58:00 <heat> they seem pretty decently unit-testy
21:58:00 <bauen1> mrvn: good thing I don't care about that, so I guess there aren't any other hidden surprises apart from __errno
21:59:00 <heat> bauen1, which libm?
21:59:00 <mrvn> bauen1: are you sure it only links __errno? You might get more symbols when something actually uses some functions
22:00:00 <bauen1> heat: mrvn: libm from newlib I _think_ but yes, I will probably push to get it replaced with a header that just does a bunch of `#define atan2 __builtin_atan2` or something like that
22:00:00 <mrvn> bauen1: you can just link any lib in freestanding as long as the ABI, hard/soft float, red-zone, ... matches.
22:00:00 <heat> bauen1, -fbuiltin does that by default
22:00:00 <mrvn> bauen1: I don't think arm has a lot of builtin trig functions
22:01:00 <heat> -fbuiltin even lets you optimise a sin + cos to a sincos
22:01:00 <mrvn> Does aarch64 have trig functions in the fpu?
22:01:00 <bauen1> mrvn: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html makes it sound like gcc provides builtins for all of those math functions listed there
22:01:00 <geist> no
22:01:00 <bslsk05> gcc.gnu.org: Other Builtins (Using the GNU Compiler Collection (GCC))
22:02:00 <mrvn> bauen1: hmm, are they in libgcc then?
22:02:00 <heat> builtins in gcc may just call the library function
22:02:00 <bauen1> ah
22:03:00 <heat> __builtin_sin() is just a way to tell the compiler that you want the compiler optimised version, if it exists
22:03:00 <mrvn> The builtin might just be so gcc can assume a function called "cos" is the cos function and optimize it
22:03:00 <heat> if you do -fbuiltin, __builtin_sin() is implicit wrt sin()
22:03:00 <mrvn> e.g. cos(0) == 1
22:04:00 <heat> ^^this is also why every libc and libm needs to be compiled with -fno-builtin, it will realise you're calculating the sin() and optimise it to a sin call - boom, stack blew up
22:04:00 <klange> it will absolutely not realize you are calculating sin, but for a lot of other stuff sure
22:04:00 <mrvn> heat: really? it's that smart? I've only had that happen for memcpy/strcpy so far.
22:05:00 <heat> idk
22:05:00 <heat> but i've seen a sincos implementation recurse onto itself by accident in #llvm
22:05:00 <heat> (that's a pretty simple example tho)
22:05:00 <mrvn> I really would hope that gcc would not optimize a function named memcpy to call memcpy
22:05:00 <heat> sin probably won't, but that's just an example
22:05:00 <bauen1> heat: do you have the documentation where it says that e.g. __builtin_cos could just call cos ?
22:05:00 <klange> mrvn: unfortunately, the optimizer has no idea of the name of the function it's optimizing, it seems
22:06:00 <klange> bauen1: there is no documentation, but I can tell you very plainly that it absolutely will just do that
22:06:00 <mrvn> klange: so make it push "builtin=off" when recursing into a builtin function
22:07:00 <heat> https://godbolt.org/z/G9z5j9sfa
22:07:00 <bslsk05> godbolt.org: Compiler Explorer
22:07:00 <heat> __builtin_<standard C library function>() is pretty redundant if you're compiling normally though
22:07:00 <bauen1> thanks, i guess i will just continue to (ab)use the cross compiled newlibs libm
22:08:00 <mrvn> __builtin_abs() makes sense
22:09:00 <mrvn> fabs even
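As a rough illustration of the point about `__builtin_abs()` and `fabs`: these are GCC/Clang extensions that the compiler can usually expand inline, so nothing from libm is needed. This is only a sketch; the helper names `abs_diff` and `magnitude` are made up for the example. Other builtins such as `__builtin_sin()` may still be lowered to a plain call to `sin`, as discussed above.

```c
/* GCC/Clang only: __builtin_abs and __builtin_fabs are compiler builtins.
 * abs_diff and magnitude are hypothetical helper names for illustration. */

/* Absolute difference of two ints (overflow of a - b is not handled). */
static int abs_diff(int a, int b)
{
    return __builtin_abs(a - b);
}

/* Absolute value of a double. */
static double magnitude(double x)
{
    return __builtin_fabs(x);
}
```

For example, `abs_diff(3, 10)` gives 7 and `magnitude(-2.5)` gives 2.5.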
22:09:00 <Ali_A> on intel's manual, VOL3 section 9.9 (switching modes) it says I need to provide IDT in order to switch to protected mode from real mode
22:09:00 <Ali_A> I need to do the following, load IDT using LIDT instruction, execute LTR to load TASK segment and execute STI, however, I only loaded GDT and enabled the cr0.PE followed by far jmp and it did switch to 32-bit mode and I verified that, by compiling 32-bit C code and it did run it, so what were those 3 steps for? or did I misunderstand the steps
22:09:00 <Ali_A> from manual?
22:09:00 <mrvn> https://godbolt.org/z/zsacdYhMv
22:10:00 <bslsk05> godbolt.org: Compiler Explorer
22:10:00 <heat> Ali_A, that's not true
22:10:00 <heat> you don't need an IDT to switch to protected mode
22:10:00 <mrvn> you only need an IDT if you want to do anything interesting
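Since the GDT is the one structure that is required for the switch, here is a minimal sketch of how an 8-byte segment descriptor is packed. The function name `gdt_entry` is hypothetical; the bit layout follows the usual legacy descriptor format (limit and base split across the entry, access byte at bits 40-47, flags nibble at bits 52-55).

```c
#include <stdint.h>

/* Pack one 8-byte GDT descriptor (gdt_entry is a hypothetical helper).
 * limit is a 20-bit value; flags is the upper nibble (G, D/B, L, AVL). */
static uint64_t gdt_entry(uint32_t base, uint32_t limit,
                          uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFF);              /* limit bits 15:0  */
    d |= (uint64_t)(base & 0xFFFFFF) << 16;       /* base bits 23:0   */
    d |= (uint64_t)access << 40;                  /* access byte      */
    d |= (uint64_t)((limit >> 16) & 0xF) << 48;   /* limit bits 19:16 */
    d |= (uint64_t)(flags & 0xF) << 52;           /* flags nibble     */
    d |= (uint64_t)((base >> 24) & 0xFF) << 56;   /* base bits 31:24  */
    return d;
}
```

With this layout, a flat 32-bit code segment is `gdt_entry(0, 0xFFFFF, 0x9A, 0xC)`, which gives 0x00CF9A000000FFFF, and the matching data segment uses access byte 0x92.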
22:11:00 <heat> mrvn, https://godbolt.org/z/PGzG8qh85
22:11:00 <bslsk05> godbolt.org: Compiler Explorer
22:11:00 <mrvn> heat: -fno-builtin
22:11:00 <heat> https://godbolt.org/z/MaWEPModh
22:12:00 <bslsk05> godbolt.org: Compiler Explorer
22:12:00 <heat> -fno-builtin is stupid don't use it unless you must
22:12:00 <heat> usually you don't need to
22:12:00 <mrvn> If you don't use -fno-builtin then all the __builtin_* are implicit
22:13:00 <heat> yes
22:13:00 <Ali_A> Uploaded file: https://uploads.kiwiirc.com/files/52801932bf962537851e2c4bcfc5712a/image.png
22:13:00 <heat> "if you're compiling normally" <-- that's normally
22:13:00 <Ali_A> heat
22:13:00 <heat> Ali_A, well, that's a lie. you only need a GDT and paging structures if you're enabling paging (bet you're not right now)
22:14:00 <kazinsal> "to support reliable operation of the processor" is the key phrase there
22:14:00 <kazinsal> I would not call "any interrupt causes an immediate triple fault due to no IDT" to be reliable operation
22:15:00 <mrvn> kazinsal: works 100% reliable. Just don't turn on interrupts or fault
22:15:00 <Ali_A> No, it is okay I will attempt to enable paging today, but I just wanted to be sure, that I read the manual right and I was not missing something.
22:15:00 <heat> tip: don't
22:15:00 <psykose> simply run zero code, and then so it will be perfectly run
22:15:00 <kazinsal> no operation is more reliable than disabling interrupts and NMIs and then halting
22:15:00 <heat> paging is totally non-trivial
22:15:00 <mrvn> Ali_A: as soon as you want to do something interesting you will need the IDT. But you can set that up in 32bit code.
22:15:00 <heat> in fact, it's hard
22:16:00 <kazinsal> paging is math, and math is hard, let's go shopping
22:16:00 <mrvn> kazinsal: can't disable NMIs. :)
22:16:00 <heat> do not rush paging, just take your time in 32-bit mode
22:16:00 <Ali_A> I have to enable at least 4 level paging to get to 64-bit mode so it is a must for me '=D
22:16:00 <heat> well, you've got your hands full then
22:16:00 <mrvn> Ali_A: you can map 2MB pages or even 1GiB pages if your CPU supports that. Much fewer levels.
22:17:00 <mrvn> Ali_A: Most people just map the first 2GB of memory to 0 and -2GB.
22:17:00 <mrvn> or even just 1
22:18:00 <Ali_A> I was expecting it to be something as simple as getting into 32-bit mode (turns out that was not simple at all I wasted 6 hours to get to work) + I read in the manual that to switch to 64-bit mode, u have to have at least 4 level paging (not sure what advantage I will get from 4-level paging or 5 level paging but it is just a step required by the
22:18:00 <Ali_A> processor)
22:18:00 <kazinsal> mrvn: if your machine has just booted then it's in legacy mode and you're using an XT PIC and can disable external NMI routing on it!
22:18:00 <kazinsal> now, I don't know what happens if a cosmic bitflip occurs while the processor is in a HALT state in a manner that causes it to resume from HALT state...
22:18:00 <mrvn> kazinsal: oh, I'm never in that mode, that's pre UEFI
22:19:00 <mrvn> Ali_A: 5 level page tables are for servers with tons and tons and tons of memory.
22:19:00 <heat> Ali_A: 4+ level paging is the only paging you have in 64-bit mode
22:20:00 <heat> the easier 2-level 32-bit paging won't work
22:20:00 <mrvn> 1GB pages only needs 2 levels, 2MB pages needs 3 levels.
22:20:00 <Ali_A> mrvn I don't really understand what u mentioned (I will need to read more theory about paging, because I just read the chapters from the intel's manual and it didn't say a lot about the structure, I just know I have to load specific data structure in specific format and so on, will probably read AMD manual about paging as well to see if I can
22:20:00 <Ali_A> understand)
22:21:00 <heat> yes, it's hard
22:21:00 <mrvn> Ali_A: in the 2nd level page table there is a bit that says the address it points to is a 3rd level page table or a 1GB physical page. Same for the 3rd level table but with 2MB pages.
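The "bit that says it's a big page" can be sketched like this for the 2MB case. This assumes the x86-64 entry layout (present = bit 0, writable = bit 1, page size = bit 7, physical address in bits 51:21 for a 2MB page); the names `pde_2m` and `fill_pd_identity` are made up for the example, and counting levels from the top as mrvn does, the page directory is the 3rd level.

```c
#include <stdint.h>

#define PTE_PRESENT  (1ULL << 0)
#define PTE_WRITABLE (1ULL << 1)
#define PTE_PS       (1ULL << 7)   /* entry maps a large page, not a table */

/* Build a page-directory entry that maps one 2 MiB page.
 * phys is a PHYSICAL address and should be 2 MiB aligned. */
static uint64_t pde_2m(uint64_t phys)
{
    return (phys & 0x000FFFFFFFE00000ULL)
         | PTE_PRESENT | PTE_WRITABLE | PTE_PS;
}

/* Fill one page directory so it identity-maps the first 1 GiB
 * using 512 entries of 2 MiB each. */
static void fill_pd_identity(uint64_t pd[512])
{
    for (uint64_t i = 0; i < 512; i++)
        pd[i] = pde_2m(i << 21);
}
```

The levels above it (PML4 and PDPT entries) would still have to point at this table by physical address; that part is left out of the sketch.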
22:21:00 <Ali_A> yeah I read that 5level paging allows u to address a lot larger address space something like 4 zetabyte or something (I did the calculation, just don't remember the number)
22:21:00 <heat> you probably should take a quick dip in 32-bit mode
22:22:00 <heat> you can safely-ish learn paging from there without the confusion of raw assembly
22:22:00 <heat> a lot of it is trial and error, really
22:23:00 <mrvn> Ali_A: when you read the paging stuff draw it out on paper. It's really confusing in words but as diagrams it's much easier to learn.
22:23:00 <heat> paging is one of those concepts that are completely alien to you unless you've done it before
22:23:00 <mrvn> Ali_A: and keep in mind: it's just a (radix) tree and you look up an address.
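To make the radix tree lookup concrete, here is a small sketch of how a 48-bit virtual address splits into four 9-bit table indices plus a 12-bit page offset under 4-level paging. The struct and function names are invented for the example; the field names follow the usual PML4/PDPT/PD/PT terminology.

```c
#include <stdint.h>

/* Indices used at each level of a 4-level page table walk. */
struct va_parts {
    unsigned pml4;    /* bits 47:39 */
    unsigned pdpt;    /* bits 38:30 */
    unsigned pd;      /* bits 29:21 */
    unsigned pt;      /* bits 20:12 */
    unsigned offset;  /* bits 11:0, byte offset inside a 4 KiB page */
};

/* Split a virtual address into its per-level indices. */
static struct va_parts split_va(uint64_t va)
{
    struct va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}
```

For the "-2GB" address 0xFFFFFFFF80000000 mentioned earlier, this yields PML4 index 511 and PDPT index 510, with the remaining fields zero.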
22:24:00 <Ali_A> heat what do u mean by safely learn it in 32-bit ? oh, do u mean like enable level 2 paging before trying to enable level 4? makes sense
22:24:00 <heat> like play around in 32-bit x86 C
22:24:00 <heat> get your basic printf going, do whatever, then do paging
22:24:00 <heat> easier to debug if you've got a printf for instance
22:25:00 <Ali_A> well, I implemented a hacky printf through VGA just by writing to memory location 0xb8000
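A "hacky print through 0xb8000" along those lines might look like the sketch below. It assumes the standard 80-column VGA text mode, where each cell is 16 bits (character in the low byte, attribute in the high byte). The function name `vga_puts_at` is hypothetical, and taking the buffer as a parameter is a choice made here so the same code can be tried against an ordinary array on a host machine.

```c
#include <stddef.h>
#include <stdint.h>

#define VGA_COLS 80

/* Write a NUL-terminated string into a VGA text buffer starting at
 * (row, col). No bounds checking or scrolling is done in this sketch. */
static void vga_puts_at(volatile uint16_t *buf, size_t row, size_t col,
                        const char *s, uint8_t attr)
{
    size_t i = row * VGA_COLS + col;
    while (*s)
        buf[i++] = (uint16_t)(((uint16_t)attr << 8) | (uint8_t)*s++);
}
```

In a kernel running in protected mode with VGA text mode active, the call would be something like `vga_puts_at((volatile uint16_t *)0xB8000, 0, 0, "PM", 0x0F);` for white-on-black text in the top left corner.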
22:25:00 <heat> x86_64 paging is just 2-level paging with extra steps (and levels :P)
22:25:00 <heat> i mean like an actual printf, with %x and everything
22:26:00 <heat> for instance, you could build a function that dumps your page tables
22:27:00 <heat> of course, you can try to sniff around with qemu's info tlb and info mem and 'x' if you so desire
22:27:00 <Ali_A> make sense, thanks! will definitely try this before attempting the paging thing (I am surprised people here called it hard, because here people call many of the hard stuff easy)
22:28:00 <heat> this is just my take, of course
22:29:00 <heat> big tip: *EVERYTHING* in page table land uses physical addresses
22:29:00 <heat> this is a common pitfall for newbies
22:31:00 <zid> page tables are easy to do, hard to conceptualize for the first time
22:32:00 <heat> hard to debug too
22:32:00 <zid> It's effectively a sparse 9bit trie
22:32:00 <zid> with interesting tricks like loops
22:32:00 <zid> (recursive paging)
22:34:00 <heat> i think recursive paging is really hard in practice because of tlb shootdowns and whatnot
22:34:00 <zid> good job nobody needs tlb shootdowns
22:35:00 <heat> well, not shootdowns, just TLB invalidation
22:36:00 <zid> howso? if you unmap/restrict a page, use that addr in invlpg gg
22:39:00 <heat> yeah but you get extra pages mapped because of the recursive mapping
22:39:00 <heat> if you need to change the paging structures, invlpg you go
22:40:00 <zid> 'you get extra pages mapped' ?
22:40:00 <heat> yeah, as part of the recursive mappig
22:40:00 <heat> mapping*
22:41:00 <zid> yea what do you mean
22:41:00 <heat> your page tables also get mapped
22:41:00 <heat> if you remove a page table, you need to invlpg that recursive mapping as well
22:41:00 <heat> otherwise, boom
22:41:00 <zid> right
22:41:00 <zid> I was talking about invlpg'ing the page tables
22:41:00 <zid> as the extra step
22:42:00 <zid> because you're always invlpging the mapping you're unmapping (I hope)
22:42:00 <heat> not always but sure
22:42:00 * heat looks at the A bit first