Search logs:

channel logs for 2004 - 2010 are archived at http://tunes.org/~nef/logs/old/ (can't be searched)

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present


http://bespin.org/~qz/search/?view=1&c=osdev&y=20&m=8&d=6

Thursday, 6 August 2020

00:36:30 <doug16k> j`ey, clearing bss is part of preparing to run compiled code. I wouldn't prepare to run compiled code in compiled code
00:36:50 <doug16k> compiler could be making hidden assumptions
00:40:23 <doug16k> compiled code usually thinks it "knows" that bss is clear and any hidden variables it defined behind your back are all there and initialized, and all the constructors have been run, and it can emit as many constructors as it pleases
00:41:35 <doug16k> the constructors also assume bss has been cleared
00:42:52 <yelhamer> Hello, I'm interested in info sec, I tried to do reverse engineering and binary exploitation in the past but low-level concepts often stood in my way (mainly how shared libraries work, how memory virtualization works, how processes and threads work, ...), I found ostep.org and it seems to cover some topics that I have interest in (and others which I
00:42:53 <yelhamer> know nothing about). My question is: is ostep.org a good book to go by? and does it explain the same concepts as a "regular" book that's targeted towards people wanting to learn operating systems? and should I spend time learning everything that's in this book or should I just learn the stuff that I (might) need?
00:44:03 <yelhamer> (here's the book in pdf form (not segregated as in the site): https://raw.githubusercontent.com/mthipparthi/operating-systems-three-easy-pieces/master/book.pdf the detailed content table starts at page 10)
00:45:25 <yelhamer> do you recommend another book?
00:45:51 <yelhamer> or should I just google stuff (and go down mazes) as I stumble onto them
00:46:02 <doug16k> contents looks great
00:46:07 <doug16k> advanced book
00:47:03 <doug16k> hard to believe it can cover that much in so few pages
00:47:19 <yelhamer> doug16k oh that encourages me to read it them
00:47:21 <yelhamer> *then
00:52:06 <ronsor> I should read that book
00:53:07 <yelhamer> ronsor ah I'm going to read it then :D
00:58:25 <doug16k> it's kind of recipe style. I prefer reference style
00:59:02 <doug16k> reference just tells you how it is without pushing you in a particular direction
01:02:03 <yelhamer> doug16k uhm does it still explain the concepts it treats?
01:02:11 <yelhamer> (is it still worth the read?)
01:05:31 <doug16k> I like the topics it covers though
01:05:52 <doug16k> what kills people is first, their allocator, then, paging
01:06:02 <geist> looks pretty decent
01:06:11 <geist> it seems to be using xv6 which is an alright learning thing
01:06:35 <geist> it'll teach you some fairly bad abstraction concepts, and it's tied very much to posixy stuff, but it's better than nothing
01:06:40 <geist> and there are far worse examples
01:07:08 <yelhamer> geist can you suggest a better alternative?
01:08:33 <geist> not really, so that's why it's not a bad option
01:08:36 <yelhamer> doug16k I see, I guess I'll try to read up (externally) on the topics it covers
01:08:42 <yelhamer> geist I see
01:09:25 <yelhamer> doug16k "what kills people is first, their allocator, then, paging" uh I don't understand this
01:09:30 <yelhamer> : p
01:09:41 <doug16k> when you start out, there is nothing to call
01:09:56 <doug16k> strcpy isn't even there
01:10:06 <doug16k> you have to build all the infrastructure
01:11:00 <doug16k> but you usually can't go straight to the giant fancy one. have to have a simplistic early one to bootstrap yourself, and transition to nice one
01:11:37 <doug16k> stuff like that is what bogs you down when getting started
01:12:26 <yelhamer> doug16k I see
01:12:59 <doug16k> for example, what if your allocator design needs some dynamically sized allocation to initialize itself. now the memory allocator needs a memory allocator :D
01:13:08 <doug16k> so you do a simplistic base one for that to leverage
01:15:16 <yelhamer> aah I see, so the first piece of software that runs on the machine has to set-up what is to come (and ultimately prepare for loading the full-on OS), and also be efficient at it
01:16:32 <doug16k> it's neat how much you can cheat early on though. you can get far with statically allocated arrays
01:17:10 <doug16k> if you don't mind setting hard limits on counts of things
01:17:17 <ronsor> of course, if you use a language like zig you can create an allocator using a static allocated array
01:17:25 <ronsor> and send all your allocators there until you get a better one
01:17:28 <ronsor> *allocations
01:18:39 <doug16k> I do that for my asan allocator
01:18:50 <yelhamer> uhm what's the difference between static/dynamic memory at that level? isn't everything to the initial software that runs... just the same? like everything is memory addresses and it can access everything
01:18:57 <doug16k> I made asan work no matter what. there is no "too early"
01:20:27 <ronsor> yelhamer: dynamic memory just refers to memory that's allocated at runtime
01:21:06 <yelhamer> ronsor ah i see
01:21:16 <doug16k> yelhamer, you can declare an array that sets a fixed size area for those items. on the other hand, if you have a dynamic allocator that can keep track of free and used regions, and provide pointers to allocations, then you can make the limit of those items unlimited
01:22:23 <doug16k> you just slap this in C outside a function foo_handle_info foo_handles[MAX_FOO_HANDLES];
01:23:26 <doug16k> if you do dynamic, then it becomes a pointer but you can't use that until you have started everything up and you can allocate a block and reallocate it larger if necessary, etc
01:24:14 <yelhamer> yep I see
01:26:28 <doug16k> it's like a pilot checklist where it is turning on fuel pumps and electrical systems etc, and it's 20 steps before the engines are mentioned. at first nothing works :D
01:29:00 <doug16k> it could be so rough at your very first instruction that the stack is undefined, you can't even use push or whatever until you set that thing up :D
01:30:07 <doug16k> sometimes important parts of the cpu aren't even on at the first instruction
01:30:42 <yelhamer> oh so the push instruction for example is just defined as (mov this to address in esp register then increase esp by ptr_size)
01:30:57 <yelhamer> dang idk why for some reason I didn't think of esp as a register
01:31:03 <ronsor> it's a good thing we have firmware for platform initialization
01:31:03 <yelhamer> *as just a register
01:31:20 <doug16k> push means first *subtract* word size from esp, then store the value to that address. esp ends up pointing to what you just pushed
01:31:39 <ronsor> rsp/esp/sp -> stack pointer register; the stack grows downward
01:31:52 <yelhamer> ah yeah I always tend to confuse that
01:31:53 <doug16k> pop means, first load from the address in esp, then add word size to esp
01:32:09 <yelhamer> i see
01:32:12 <yelhamer> ronsor is that the BIOS?
01:32:18 <yelhamer> *the same as the bios?
01:33:04 <doug16k> yes, the bios is firmware
01:33:05 <ronsor> firmware? yes, on x86 that's the BIOS (or UEFI)
01:33:25 <doug16k> software that is baked into the hardware to some extent
01:34:21 <doug16k> it does the really ugly startup stuff, like training the memory controller and pci lanes
01:34:51 <doug16k> extremely hardware specific stuff
01:35:09 <ronsor> stuff you couldn't have your OS do for every platform
01:35:14 <yelhamer> uhm and bios is the piece of software (firmware) that loads first, and UEFI is a standard that specifies how the firmware should be, is that correct?
01:35:22 <yelhamer> oh i see
01:35:51 <ronsor> the UEFI is the new form of firmware that replaces the legacy BIOS
01:36:05 <ronsor> they perform the same duties, but in different ways
01:36:32 <ronsor> most UEFI still bundle a BIOS implementation in order to boot old OSes
01:37:08 <yelhamer> I see
01:37:19 <yelhamer> also in modern computers (i mean 64-bit or 32-bit) the computer first loads in 16-bit mode correct?
01:37:40 <doug16k> x86 starts up with the same ISA and memory model as 8088 yes
01:38:33 <yelhamer> and doesn't it use some sort of "paging" method to be able to access memory addresses of 20-bit width
01:38:36 <doug16k> you would setup a gdt and go into protected mode almost right away though
01:39:24 <yelhamer> (where i'm going with this is that i often confuse Operating system's paging with that type of paging)
01:39:32 <doug16k> in real mode, there is a segment and an offset
01:39:39 <doug16k> both are 16 bits
01:39:55 <doug16k> the address you access is calculated from (segment << 4) + offset
01:40:11 <doug16k> the total address space in real mode is 20 bits
01:40:27 <ronsor> or 1 MiB
01:40:45 <doug16k> that gives a 1MB reach. 0xA0000-0xFFFFF are reserved for hardware
01:41:01 <yelhamer> ah I was going to ask about that
01:41:04 <doug16k> that leaves, 0x00000-0x9FFFF, the 640KB limit
01:42:01 <doug16k> even the firmware would go almost straight into protected mode
01:42:18 <doug16k> it would setup 0 base, 4GB limit on all segments and just use 32 bit pointers to reach 4GB
01:42:40 <ronsor> "unreal mode"
01:43:03 <yelhamer> are addresses used to access hardware in modern cpu's? like from 0xa000...000 to 0xfff...fff is for hardware
01:43:05 <doug16k> unreal mode is pointless
01:43:24 <doug16k> unreal mode is pretend protected mode with tons of address size override prefixes
01:43:37 <doug16k> no benefit over actual protected mode
01:43:43 <doug16k> it's worse
01:44:02 <yelhamer> uhm with real mode (segment + offset) does this mean that an address is stored in 2 addresses,
01:44:03 <yelhamer> ?
01:44:10 <doug16k> normally the limit on a segment is 64KB in real mode
01:44:23 <doug16k> yelhamer, yeah, that's called a "far" pointer
01:44:35 <doug16k> 32 bits altogether. little endian order. offset then segment
01:45:02 <doug16k> and a pain
01:45:19 <yelhamer> "yeah, that's called a "far" pointer" is this directed at my first question or the second?
01:45:29 <doug16k> to use one you have to use a very expensive instruction to change a segment register
01:45:44 <ronsor> second question.
01:45:54 <yelhamer> ah I see
01:45:58 <doug16k> not sure what you mean by "addresses" it's two 16 bit values consecutively yeah
01:46:18 <doug16k> 4 bytes
01:46:58 <doug16k> and also, to access the memory in the other segment you usually have to also prefix that instruction to tell it to override which segment it uses
01:47:09 <doug16k> awful
01:48:09 <doug16k> so you might `les di,[some_far_ptr]` to load di with [some_far_ptr] and es with [some_far_ptr+2]
01:48:32 <doug16k> now pointer takes up two registers, and you hardly have any segment registers
01:48:58 <doug16k> and you have to use it like mov ax,[es:di]
01:49:10 <yelhamer> uhm is the far pointer the same as the segment value?
01:49:57 <doug16k> far pointer points to memory address [(segment << 4) + offset]
01:50:12 <doug16k> in real mode
01:50:39 <doug16k> this is the ancient mode you mentioned it wakes up in
01:50:41 <ronsor> far pointers are often written as [seg:off]
01:50:53 <deltab> yelhamer: it contains a segment value and an offset value
01:51:37 <yelhamer> isn't this still used nowadays? I often see something like inst op1, [fss:off]
01:51:46 <yelhamer> *in protected mode
01:51:57 <doug16k> worse than that: only bx, bp, di, si can be used for pointers, and you can only add one of bx or bp and si or di in address calculations
01:52:28 <ronsor> yelhamer: in userspace?
01:52:35 <yelhamer> ronsor yes
01:52:38 <doug16k> segment overrides are still a thing, one segment points to the TLS area
01:52:44 <yelhamer> i don't know if i'm rambling
01:53:05 <doug16k> TLS = thread local storage
01:53:14 <ronsor> yes
01:53:16 <ronsor> fs, gs
01:53:31 <ronsor> https://stackoverflow.com/questions/6611346/how-are-the-fs-gs-registers-used-in-linux-amd64
01:53:32 <bslsk05> ​stackoverflow.com: assembly - How are the fs/gs registers used in Linux AMD64? - Stack Overflow
01:53:41 <doug16k> 32 bit protected mode is drastically better than real mode
01:53:54 <ronsor> anything is better than real mode
01:54:24 <doug16k> it's hardly even the same instruction set in protected mode
01:54:35 <doug16k> addressing modes overhauled completely
01:54:47 <ronsor> real mode only exists for backward compatibility
01:55:48 <yelhamer> uhm so it is practically the same thing as something like mov op1, [seg:off] except this time this "trick" isn't used to access more addresses than the cpu can handle, but rather to sort stuff out, like move this thing at address that's offset away from value in fs register... and they use this to organize a program memory into segments?
01:56:10 <yelhamer> *...as something like mov op1, [seg:off] in real mode ....
01:56:41 <yelhamer> like at address in fs, is the text segment
01:56:47 <yelhamer> then at bs is the data segment
01:56:50 <yelhamer> and so on
01:57:56 <doug16k> each thread wants to be able to instantaneously access per-thread variables. if each thread's segment override changes the base address for the access, you can do the access with 0 extra instructions (instructions you would otherwise have spent figuring out where the TLS is located)
01:58:42 <doug16k> the environment arranges for each thread to have the appropriate segment "pointing" to that thread's TLS
02:00:08 <yelhamer> I see
02:01:37 <doug16k> so if the linker figured out that __thread int errno; is located at -8 offset in TLS area, then the code movl fs:errno@tpoff,%eax linker would put -8 in it: movl fs:-8,%eax
02:02:05 <doug16k> and it would magically use the correct TLS area
02:02:30 <doug16k> with 0 extra instructions
02:02:58 <doug16k> and without permanently taking up a general register
02:03:47 <doug16k> x86_64 got rid of almost all segmentation, but kept the fs: gs: thing for TLS, because it is very good
02:05:16 <doug16k> it's not only for user threads, you can use it for cpu-local data in the kernel
02:05:27 <doug16k> allows you to instantly get the current thread or cpu number or whatever
02:06:35 <yelhamer> ah I don't understand everything you said because I haven't really gotten into how threads work...
02:06:48 <yelhamer> still, I think I got the idea of it though
02:07:08 <doug16k> TLS ~= magical global variables that have a separate copy for each thread
02:07:54 <doug16k> errno is the classic example. can't have multiple threads trampling on the same errno. would not be able to know if that was your errno or the other cpu that just did something 10 nanoseconds ago
02:08:32 <Hash> how do threads work? that's a good question
02:09:26 <yelhamer> doug16k I see
02:10:14 <ronsor> Hash: on Linux, the same way processes do, except they can access the same address space. https://man7.org/linux/man-pages/man2/clone.2.html
02:10:15 <bslsk05> ​man7.org: clone(2) - Linux manual page
02:10:15 <doug16k> Hash, multiple separate threads have separate call stacks and state, and you can switch between them or even run them simultaneously on multiple cpus
02:11:11 <doug16k> it can stop what it's doing in one thread, and continue where it left off in another
02:11:12 <Hash> Thank you
02:12:27 <doug16k> if you understand processes, imagine threads as being multiple processes that are sharing the same memory space
02:13:41 <doug16k> or you might say a process has one or more threads if you are being less posixy
02:14:17 <doug16k> (not implementing POSIX specification)
02:14:27 <yelhamer> threads are posix right?
02:14:40 <doug16k> pthreads
02:14:54 <yelhamer> ah i see
02:14:56 <doug16k> it's just an API for creating, managing, and synchronizing threads
02:15:55 <doug16k> gives you some guarantees you can leverage to make robust multithreaded code
02:16:49 <doug16k> some architectures require you to do some special stuff to synchronize with other cpus. it takes care of that
02:17:47 <yelhamer> I see
02:41:29 <doug16k> I suppose all architectures require special stuff, come to think of it
02:43:16 <doug16k> what varies across architectures is how weird the behaviour is without proper synchronization
02:44:14 <doug16k> it might almost do what you expect, or do something baffling
02:49:58 <doug16k> the order that changes appear in memory to other cpus isn't necessarily in program order. when you load from memory it might occur far enough ahead of time and miss a change. you have to control that
02:50:22 <geist> indeed
02:54:39 <doug16k> everybody has to worry no matter what the architecture does, since the compiler is also allowed to reorder loads and stores unless you constrain it
02:54:45 <ronsor> `volatile`
02:54:57 <ronsor> prevents any compiler weirdness
02:55:10 <doug16k> not really
02:55:25 <doug16k> all it does is lock it so it never thinks it knows what value that holds
02:55:32 <doug16k> and not allowed to delay writing a change to it
02:56:15 <ronsor> oh, I should've mentioned on zig
02:56:17 <doug16k> i.e. if it just loaded from it and it was 42, if the next line uses it, it won't think it must still be 42, it will load from it again
02:57:16 <ronsor> on zig, `volatile` explicitly prevents any reordering
02:57:30 <doug16k> like java then
02:58:01 <doug16k> java compiles atomic instructions to load and store volatile things
02:59:00 <doug16k> i.e., storing to it guarantees that all prior stores are globally visible before allowing the store to the volatile to be globally visible
02:59:36 <geist> take off every zig
03:01:40 <doug16k> it guarantees that if you load from it, any subsequent loads will not be speculatively executed before the volatile load (IIRC)
03:04:56 <doug16k> so stores are essentially "don't let this happen before anything I just did" and loads are "make sure that when I do this load, I can see everything that happened before the store that changed it"
03:34:46 <geist> didn't know java had volatile keyword, but it makes sense
04:00:54 <doug16k> my bios bootloaders are growing beyond 64KB total. have to rearrange the linking to put some stuff below 64KB for 16 bit relocations, and let the 32 bit part extend past 64KB. will probably move the stack up another 64KB and have 1st 128KB for bootloader, 64KB stack, and ~448KB low heap
04:01:28 <doug16k> had to do some tricky adjustments when transitioning from ss=0x1000 to 32 bit protected mode
04:02:32 <doug16k> bios call now deduces real mode ss so there is almost 64KB of stack pointer range available (pointing to wherever esp was)
04:03:10 <doug16k> ss:sp pointing to wherever 0-base:esp was
04:03:36 <doug16k> with ss about 64KB below sp and large sp
04:03:57 <doug16k> er, ss about 64KB below flag esp
04:04:05 <doug16k> flat esp omg
04:05:23 <doug16k> I'm adding a text editor for the kernel parameters in my boot menu TUI
04:05:39 <doug16k> making things just big enough to screw up
04:06:02 <doug16k> screw up assumptions that things are below 64KB and you can fit a pointer in a 16 bit register
04:08:42 <doug16k> have to fix HD, CD, and PXE startup code and linking
04:11:22 <doug16k> I could link Doom into my EFI bootloader without lifting a finger :D
04:12:33 <doug16k> someone made efi doom right?
04:13:08 <kingoffrance> well yeah every os needs a process management gui https://www.cs.unm.edu/~dlchao/flake/doom/chi/chi.html
04:13:09 <bslsk05> ​www.cs.unm.edu: Doom as an Interface for Process Management
04:13:21 <doug16k> would be hilarious to have a boot menu option "Doom" that runs an EFI build of Doom
04:14:15 <doug16k> you shoot them to kill them? lol
04:14:26 <kingoffrance> god mode meets superuser
04:14:36 <kingoffrance> it was inevitable
04:15:15 <doug16k> it would be really funny if it took more shots to kill processes that are running as root
04:15:31 <doug16k> they definitely would have more health
04:15:40 <doug16k> boss process
04:19:01 <FreeFull> Your own character would be init
04:19:13 <FreeFull> So do your best not to die
04:49:24 <Kazinsal> fighting the urge to start working on my project again
04:49:54 <Kazinsal> damn pandemic making bad ideas look good
04:57:52 <klys> well okay then
04:58:46 <klys> have you tried qt creator
04:59:01 <klys> it put me to work a few weeks ago
05:24:09 <doug16k> should I take this opportunity to do this: shl $ 4,%esp
05:24:19 <doug16k> I may never get an opportunity to shift esp again :)
05:24:59 <Kazinsal> Make sure you add the comment /* no, really */
05:26:05 <doug16k> will do
05:26:20 <Kazinsal> :D
05:29:27 <ronsor> I found this in a Google search: https://www.skyenterprisesau.com/skyos/
05:29:28 <bslsk05> ​www.skyenterprisesau.com: SkyOS - Sky Enterprises
05:30:08 <Kazinsal> > SkyOS is still in early development with very little work being done on the operating system itself.
05:30:27 <doug16k> basically this code sequence (dx has 0x20 selector): movl %ss,%esp ; mov %dx,%ss ; shl $ 4,%esp
05:31:16 <doug16k> no wait
05:31:32 <Kazinsal> ah, setting up a 32-bit stack to start off where your 16-bit stack finished up?
05:32:06 <geist> doug16k: huh actually kinda surprised you can. for some reason i figured esp was fairly limited in what sort of direct math you can do against it
05:32:18 <geist> but i guess that's not really true, it's just various addressing modes that can't apply to it iirc
05:32:42 <doug16k> Kazinsal, ya
05:33:04 <doug16k> and my code is too big for stack to be at seg 0, have it at 128KB line now (ss=0x2000)
05:33:21 <doug16k> 0x20000-0x2FFFF
05:33:45 <doug16k> wait though, have to add 4x ss to esp
05:33:51 <doug16k> 16x
05:35:10 <doug16k> ah: movl %ss,%eax ; shl $ 4,%eax ; mov %dx,%ss ; add %eax,%esp
05:35:12 <doug16k> oh well
05:36:07 <doug16k> and it's sure esp 31:16 are zeros
05:37:33 <doug16k> geist, yeah, it's impossible to use esp as a scaled index, like [eax+esp*4]
05:38:13 <doug16k> and it must have an offset, i.e. it will be 0(%esp) if you say (%esp)
05:38:14 <zid> That's strange, the J and U versions of pokemon yellow use different chips on the cart
05:38:26 <zid> availability issues I guess
05:39:02 <zid> they're compatible enough the game doesn't care, v3 vs v5 of the mcu
05:39:47 <doug16k> they squeezed a couple of special meanings into the impossible encodings
06:01:03 <doug16k> Kazinsal, when it does the bios call it does the opposite transformation, translating %esp into ss:sp where ss is almost 64KB below %esp, and sp is calculated to line up with %esp from the new ss, so the bios call interrupt handler has tons and tons of stack space
06:01:30 <Kazinsal> doho, very clever
06:02:48 <doug16k> I ran into an issue where seabios was overwriting my stuff from too much stack use so I fought back by slamming stack to absolute max and have 64KB of stack now :D
06:03:45 <doug16k> but added a bunch of complexities like switching back and forth. before ss was "flat" already
06:04:08 <doug16k> pmode ss and real mode ss matched
06:04:19 <doug16k> (as far as addressing is concerned I mean)
06:05:09 <doug16k> fun though
06:05:18 <doug16k> I like doing awkward stuff in assembly
06:05:22 <doug16k> fun puzzle
06:07:02 <doug16k> so weird having an initial sp=0x0000
06:07:35 <doug16k> ss=0x2000,sp=0x0000 makes first push write to 0x2FFFE
06:08:57 <doug16k> I guess I need to treat sp as 0x10000 when it is zero eh?
06:09:13 <doug16k> when doing real mode ss:sp -> pmode esp
06:11:52 <doug16k> because ss=0x2000,sp=0 pushes to 0x2FFFE, to make the same thing happen in pmode, I have to put esp=0x30000
06:13:52 <doug16k> which is (ss<<4)+(sp?sp:0x10000)
07:10:30 <yawkat> with vt-d, is there a way to reduce the number of required translation structures for an identity mapping without using PT and without hardware support for large pages? can i reuse PTs somehow maybe?
07:18:50 <doug16k> is vt-d support even possible without support for large pages?
07:19:29 <yawkat> evidently. i have a chip here that reports 0 as SLLPS
07:19:30 <doug16k> in reality
07:19:46 <yawkat> i mean large pages in the iommu, the actual mmu supports large pages.
07:19:52 <doug16k> ah ok you mean large pages in IOMMU. gotcha
07:20:17 <yawkat> bit weird that one supports it and one doesnt tbh
07:20:32 <doug16k> why does page size make any difference?
07:20:43 <doug16k> was easier to do big identity map with large pages?
07:20:57 <yawkat> i can save the 512 PTs per PD
07:21:17 <yawkat> i have memory constraints and cant afford building a multi-mb paging tree
07:21:45 <doug16k> multi-mb?
07:21:52 <doug16k> how much physical RAM?
07:22:49 <yawkat> not entirely sure, but if im doing the math correctly id need about 2MiB in paging structures per GiB of physical memory.
07:23:02 <doug16k> it's 8 bytes per 4KB, plus 8 bytes per 2MB, plus 8 bytes per 1GB
07:23:24 <doug16k> plus constant factor
07:23:57 <yawkat> yea, then my math was right :)
07:24:00 <doug16k> 2/1024
07:24:16 <doug16k> 0.19%
07:25:09 <yawkat> ironically i am constrained to one large page of memory right now. more than enough for a few paging structure trees with large pages, not enough for even one tree without them
07:28:54 <doug16k> (8+8/512+8/512^2)/4096=0.195694715% overhead :D
07:32:02 <doug16k> I guess fpu would appreciate if I did (8/512^2 + 8/512 + 8) / 4096
07:34:13 <doug16k> (you can get more precision when summing numbers from smallest to largest)
07:34:26 <yawkat> the fpu is already happy with your powers of two.
07:34:32 <doug16k> yes it loves them
07:35:43 <doug16k> does all that crap to set two more mantissa bits and adjust the exponent a few notches
07:36:27 <doug16k> 9?
07:37:09 <doug16k> 12
07:37:27 <doug16k> goes to -9 I think
07:38:02 <doug16k> -9 plus the bias
07:48:08 <yawkat> it would be cool if there was some flag you could set that makes paging structure addresses relative instead of absolute. i.e. the physical address isnt the value in the paging structure, but rather the virtual address + that value. then you could reuse PTs for contiguous mappings
07:57:17 <doug16k> what exactly is the point of the iommu if you map everything?
07:58:09 <doug16k> it'd be like saying you have paging with a complete identity map only, and never change it
07:58:24 <yawkat> im eventually going to protect a few pages, but not many of them. it will be a "mostly identity" in the end, im just doing identity right now for testing
07:58:33 <doug16k> ah
07:58:46 <yawkat> thats why i cant use passthrough (ok i called that PT earlier, a bit confusing)
08:09:48 <veltas> Can you access a PCD entry in your UEFI system from a UEFI app, or is it something internal to the image?
08:11:37 <kingoffrance> ronsor, https://en.wikipedia.org/wiki/SkyOS that was totally unrelated, dead, dont think there was any source -- i never got it to run in vm or real hw to my knowledge https://web.archive.org/web/20110910053108/http://www.skyos.org/?q=node/647 just saying the name was already used is all
08:11:38 <bslsk05> ​en.wikipedia.org: SkyOS - Wikipedia
08:11:40 <bslsk05> ​web.archive.org: SkyOS development is currently halted | SkyOS
08:16:42 <doug16k> veltas, PCD?
08:22:53 <veltas> doug16k: Platform Configuration Database
08:23:07 <veltas> It's something that seems to come with EDKII
08:29:41 <veltas> Ah right I think this is what I'm looking for, never mind https://edk2-docs.gitbook.io/edk-ii-pcd-specification/2_pcd_protocol_definitions
08:29:43 <bslsk05> ​edk2-docs.gitbook.io: 2 PCD Protocol - EDK II PCD Specification
08:41:56 <j`ey> doug16k: while all that you said is true, it would still be fun to try (clearing bss w/o asm)
08:42:17 <zid> for(p = _bss_start; p < _bss_end; p++) *p = 0;
08:42:24 <zid> main();
08:42:42 <clever> but bss must be aligned to the width of an int
08:43:02 <clever> though, if you use uint8_t, it wont matter
08:43:08 <clever> but would take more cycles to clear
08:43:23 <clever> how did i do it most recently? ...
08:43:49 <j`ey> zid: exactly! I don't think it should be an issue
08:43:53 <Mutabah> just use `rep stosb`
08:43:57 <clever> ah, just plain old `bzero(&__bss_start, &__bss_end - &__bss_start);`
08:44:11 <zid> probs need to use a volatile pointer though
08:44:15 <zid> to stop it being optimized out
08:44:21 <zid> as it won't alias anything definitionally
08:44:28 <zid> and nothing will ever touch _bss_start again
08:45:38 <j`ey> but also it's only ~10lines of asm, so not a big deal
08:47:38 <clever> zid: which would you generally prefer, raw asm or bzero?, i like bzero more, because i dont have to rewrite it when changing arch
08:47:54 <zid> you made bzero up
08:48:20 <zid> and gcc has been more than capable of recognizing what that code does semantically for a very very long time
08:48:38 <zid> it will optimize your silly mess of a function that implements memset/memcpy into a straight rep instruction
08:49:15 <clever> yeah, i have noticed gcc inlining bzero and memset in the past
08:49:56 <Mutabah> zid: bzero is a (deprecated) POSIX call
08:50:44 <zid> if you're calling posix functions inside your kernel I am worried
08:52:00 <Mutabah> Well, it's a defined function name
08:52:36 <clever> zid: what if i link to newlib, or i re-implement them in my kernel?
08:54:19 <geist> yah you can use bzero, though memset is really just as good
08:54:28 <geist> and can implement bzero in terms of memset if you want
08:54:39 <clever> yep, and i hadnt heard about it being deprecated
08:54:58 <geist> only slightly nicer thing about bzero assembly is it's easier to synthesize the zero in most arches
08:55:22 <geist> yah dunno about that. i always just marked bzero/bcopy as BSD variants
08:55:27 <clever> i think the asm i saw in rpi-open-firmware for bzero and memset, uses some ugly fallthru
08:55:39 <clever> bzero is just a prefix of a few opcodes on the front of the memset function
08:55:44 <geist> yah i think i hve some ancient one in LK that does that too
08:55:48 <clever> and it lets fall-thru cause it to enter memset
08:56:12 <geist> and on some arches memset ends up with a special case path for 0
08:56:21 <clever> that kind of hack can only be done in asm, c wouldnt let you do that
08:56:48 <geist> yah though you can probably get more or less the same thing if you just call through. a good compiler will probably just tail call memset
08:57:07 <clever> but tail-call is still an extra opcode, vs fallthru
08:57:13 <geist> and on a good day you only get a straight jump or branch
08:57:15 <geist> yah
08:57:34 <Kazinsal> branch predictor will usually solve the cost of that
08:58:57 <geist> yah
08:59:03 <j`ey> Mutabah: where do you clear bss in rust_os, I cant find it
09:00:15 <Mutabah> Get loaded by an ELF loader
09:00:18 <Mutabah> so done for me ;)
09:00:36 <Mutabah> I _should_ have that in the ARM and uEFI loaders
09:01:02 <clever> not currently an option for me, i have an arm.bin file jammed into a section in a vc4.bin file
09:02:55 <j`ey> Mutabah: still cant find it :S
09:03:03 <clever> something id like to do, is to run a linker on .o files from 2 arches, and put them into a single address space
09:03:19 <clever> but i dont think thats going to be easy
09:03:32 <Mutabah> https://github.com/thepowersgang/rust_os/blob/master/Bootloaders/_common/elf.rs#L158
09:03:33 <bslsk05> ​github.com: rust_os/elf.rs at master · thepowersgang/rust_os · GitHub
09:03:35 <Mutabah> j`ey: ^
09:04:01 <Mutabah> The `.chain()` bit fills the difference between filesz and memsz with zeros
09:04:29 <j`ey> Mutabah: ah was looking for 'bss'
09:04:30 * Kazinsal rubs his eyes
09:04:59 <Kazinsal> I should probably sit down and actually figure out Rust one of these days if for no other reason than to be able to read it
09:05:22 <j`ey> I gotta look up what phent.p_type ==1 is
09:05:33 <clever> j`ey: dt_load i think
09:05:49 <Mutabah> `PT_LOAD`
09:06:36 <j`ey> ok nice, I get it now, ty
09:07:15 <clever> i was recently looking at the bcompiler project, to see how it works at a more in-depth level
09:07:42 <clever> it looks like a function body can either contain raw assembly (in hex form), or calls to other functions
09:08:06 <clever> and it cant deal with register allocation at all
09:08:21 <clever> so instead, it abuses the stack to make a stack based machine, and do RPN type calculations
09:08:43 <j`ey> Mutabah: I see you're already using the new asm!
09:08:47 <clever> https://github.com/certik/bcompiler/blob/master/header.bc#L311-L328
09:08:48 <bslsk05> ​github.com: bcompiler/header.bc at master · certik/bcompiler · GitHub
09:09:02 <clever> putc is a raw asm based function, while putchar is "compiled"
09:09:49 <Mutabah> j`ey: Yeah, transitioned when it was added
09:11:02 <j`ey> Mutabah: atm I just use a separate .s file, but it's probably silly when I want some single instruction functions
09:12:51 <Mutabah> I use a mix
09:13:05 <Mutabah> Inline for small things, separate file for the more complex stuff
09:13:22 <Mutabah> e.g. context switching and init are in a file, while CR3 manipulation is inline
09:18:46 <j`ey> Mutabah: looking, but do you use any of rusts fmt stuff, or did you write your own?
09:19:37 <froggey> rust has a new asm!?
09:19:57 <j`ey> yep, in nightly from a month or so ago
09:20:03 <j`ey> https://blog.rust-lang.org/inside-rust/2020/06/08/new-inline-asm.html
09:20:04 <bslsk05> ​blog.rust-lang.org: New inline assembly syntax available in nightly | Inside Rust Blog
09:21:12 <Mutabah> j`ey: yep - It's in libcore so no reason not to use it
09:22:14 <j`ey> Mutabah: last time I tried I was having issues with some of the trait object stuff, so Im using ufmt atm
09:39:39 <kingoffrance> clever, yeah, thats what bcompiler looked like to me: outputs raw machine code, there is perhaps no intermediate "assembly language" or assembler per se
09:40:15 <clever> kingoffrance: it doesnt even have a linker or elf support!
09:40:17 <clever> the elf header is just raw hex at the top of header.bc
09:41:01 <kingoffrance> i am curious about it, but i think any lang i might ever do some far off day will output assembler or otherwise be translated to
09:41:20 <clever> the reason i was looking into it, was bootstrapping
09:41:22 <kingoffrance> as that is "how to potentially use anything the processor offers"
09:41:26 <kingoffrance> yeah, i still think its cool
09:41:32 <clever> how can i go from source -> toolchain, while involving as few binaries as possible
09:41:46 <kingoffrance> and i dont think its either or: use the higher level b lang to write an assembler :)
09:41:50 <j`ey> Mutabah: btw have you tried a non---release build with -Z build-std?
09:41:50 * Mutabah is away (Hometime)
09:41:51 <clever> and just for fun, i made bcompiler bootstrap itself under the nix package manager
09:42:15 <clever> with nix, i'm able to bootstrap bcompiler with just: cat, cp, chmod, ash, hex1
09:42:31 <clever> and hex1 is available as a raw hex (in text form) file, so you can hexdump it, and confirm its not been trojaned
09:42:42 <clever> but that assumes you can trust your hexdump binary!
09:43:05 <kingoffrance> well i already stated my low priority on "reproducible builds" but if you are into that, yeah :)
09:43:14 <clever> cat/cp/chmod/ash are all provided by busybox, but i feel busybox is a bit heavy, and harder to confirm its "clean"
09:44:03 <bauen1> clever: not sure if you know about it, but http://bootstrappable.org/ has a very long "bootstrap" chain from a minimal almost all the way to modern gcc
09:44:03 <clever> the biggest problem i can see though, is how do i go from bcompiler to plain gcc?
09:44:08 <bslsk05> ​bootstrappable.org: Bootstrappable builds
09:44:48 <clever> bauen1: ah, hadnt seen that one yet, but i did find a wiki listing many different ways to bootstrap
09:44:50 <bauen1> clever: i've also looked into doing that using bcompiler for bootstrapping tinycc but ultimately decided that the language is too different from C
09:44:53 <j`ey> Guix too
09:45:12 <j`ey> https://guix.gnu.org/blog/2020/guix-further-reduces-bootstrap-seed-to-25/
09:45:13 <bslsk05> ​guix.gnu.org: Guix Further Reduces Bootstrap Seed to 25% — 2020 — Blog — GNU Guix
09:45:14 <clever> j`ey: yeah, ive heard what guix did, something -> scheme -> tinycc -> gcc
09:45:29 <bauen1> so i've resorted to making my kernel buildable using tinycc (and binutils, but that part can be replaced by a few changes to tinycc)
09:45:49 <bauen1> the end goal is to become self-hosting using just kernel + tinycc + musl (and a few tools to facilitate building)
09:45:53 <j`ey> guix uses something called Mes, not tinycc
09:46:11 <bauen1> they're also on freenode #bootstrappable
09:46:40 <j`ey> bauen1: Im using rust for my "kernel", but tcc makes me want to use C
09:46:43 <bauen1> iirc someone was also building a kernel / compiler combination that bootstraps from the mbr to something that can eventually build more complete binaries
09:47:09 <bauen1> j`ey: i originally chose it for its speed and low line count
10:02:21 <Mutabah> j`ey: No, I have not - but did see someone complaining about a crash...
10:07:45 <j`ey> not sure if it's an OOM issue
10:07:56 <j`ey> I could try give the VM more ram
10:13:34 <doug16k> qemu default is pitiful 16MB IIRC
10:14:13 <j`ey> oh, I meant the VM where Im building my kernel
10:14:23 <doug16k> assuming you didn't say how much ram
10:14:24 <doug16k> ah
10:31:15 <kingoffrance> well after looking at mes and stage0 all i can say is....gcc-2.95.3 even builds on nextstep; looks like mes + mescc compiles tinycc, which looks like it can then build gcc 2.95 which can build etc.
10:31:40 <kingoffrance> im not sure i follow stage0 -- runs inside a vm at first?
10:36:36 <kingoffrance> stage0 names their vm struct "lilith" which doesnt offend me, but id love to know the etymology/metaphor :/
10:40:37 <clever> kingoffrance: https://en.wikipedia.org/wiki/Lilith
10:40:38 <bslsk05> ​en.wikipedia.org: Lilith - Wikipedia
10:41:25 <kingoffrance> yeah but why does stage0 use it?
10:41:37 <kingoffrance> "early stage" is fine i guess, i just want details
10:41:56 <clever> thats what i would guess
10:42:38 <kingoffrance> otoh i called one my things stage0 so i appreciate 0-based indexing
11:08:21 <j`ey> Mutabah: already reported https://github.com/rust-lang/rust/issues/73677
11:08:22 <bslsk05> ​github.com: SIGSEGV when building core for aarch64-unknown-none-softfloat target · Issue #73677 · rust-lang/rust · GitHub
11:26:27 <kingoffrance> same guy as bcompiler 4 years later: https://web.archive.org/web/20160402225843/http://homepage.ntlworld.com/edmund.grimley-evans/cc500/
11:26:31 <bslsk05> ​web.archive.org: CC500: a tiny self-hosting C compiler
12:01:41 <bauen1> kingoffrance: iirc the primary bootstrap chain uses a tiny vm, but there is also a reimplementation for x86_64
12:12:40 <clever> kingoffrance: now that, has a chance of maybe being able to compile tinycc
12:13:30 <j`ey> there's also https://bellard.org/otcc/
12:13:30 <bslsk05> ​bellard.org: OTCC : Obfuscated Tiny C Compiler
12:14:06 <kingoffrance> i should point out mes seems to want to get away from gcc 2.95, so lest anyone take my comments as indicative of anything -- that just looks like a temporary thing until perhaps mescc can compile gcc 4.x
12:14:27 <j`ey> can only go up to 4.7
12:15:48 <kingoffrance> s/mes/guix/ "This amazing achievement is mirrored only by its terrible clumsiness. Is this really how we want to secure the bootstrap of our GNU system?"
12:27:13 <kingoffrance> well i said months ago itd be funny to have a distro whose sole purpose was to host a bunch of cross compilers...those type of projects might vaguely want something similar, but for different reasons: reproducibility/auditing/verification :)
12:28:12 <kingoffrance> s/host a bunch of .../& to build other os from/
13:02:42 <zid> erghh finally got a stupid change to my gameboy emulator working, holy shit
13:02:59 <zid> took me all morning because I found three different docs that described the thing I was trying to do poorly
13:03:12 <zid> and as soon as I fixed it someone linked me doc that had it correct :p
14:00:53 <bauen1> clever: you do kind of need a step between cc500 and tinycc that implements structs
14:02:22 <rain1> that would be awesome
14:02:32 <clever> bauen1: ah, but its more c based, and i could see adding struct support to cc500 being simple
14:02:40 <clever> far better than bcompiler
14:02:53 <rain1> there is a nice one somewhere on github
14:03:05 <zid> I like the idea of a makefile you run
14:03:13 <zid> that just keeps building slightly more advanced C compilers
14:03:16 <zid> in a big loop
14:03:23 <bauen1> true
14:03:33 <clever> zid: shake is a haskell framework for basically compiling a makefile like build file
14:03:55 <rain1> https://github.com/ras52/bootstrap
14:03:56 <bslsk05> ​ras52/bootstrap - Richard's compiler bootstrap experiment (3 forks/20 stargazers/GPL-3.0)
14:04:08 <bauen1> zid: i've toyed with the idea of a makefile that basically builds an entire linux system (without recursive invocations)
14:04:16 <bauen1> bootstrapping the initial c compiler is a problem
14:04:17 <zid> ..haskell
14:04:17 <rain1> I think this is the best one ive seen
14:04:21 <zid> who the hell wants to run haskell
14:04:29 <zid> bauen1: that's neat
14:04:41 <clever> zid: i thought you meant compile a makefile, jumped before you could finish typing
14:04:47 <rain1> you can start with an assembler written in hex
14:04:55 <bauen1> but if there is a clear path, like <minimal binary> -> compiler1 -> compiler2 -> compiler3 -> tinycc that would work
14:05:04 <zid> It's sort of like one of those incremental games
14:05:35 <bauen1> ideally it results in a complete dependency graph for *every* binary, a small change to a random header, make, and everything is updated
14:05:49 <clever> bauen1: thats exactly what the nix package manager does
14:05:55 <bauen1> technically make isn't the perfect tool for this, but it's the best i know of
14:06:04 <clever> bauen1: just expand that idea out to every single package and config file on the system
14:06:27 <bauen1> clever: does nix do dependencies based on packages or individual files ?
14:06:41 <clever> bauen1: when using nix, its a mix of make and nix files, nix just describes the project level deps, and make deals with making the project itself
14:06:49 <clever> bauen1: package level for nix
14:07:08 <clever> the main way it works, is by using a unique --prefix for every package
14:07:24 <bauen1> clever: i want to implement file based dependencies, that would also allow massive parallelism when building (across packages)
14:07:29 <clever> for example, my ls binary is at /nix/store/2y0772ldj0wmc9n38a19k4pqf7w47yg2-coreutils-8.31/bin/ls
14:07:43 <clever> and that depends on things like /nix/store/c2rlh7xa8fcgg7qz8pl76ipvvb172c6k-glibc-2.30
14:08:06 <clever> the path to coreutils is a hash over many things, including the path to glibc
14:08:22 <clever> so if i mutate glibc, then coreutils gets a new path, and can co-exist along side the old coreutils
14:08:33 <bauen1> interesting
14:08:47 <clever> the cpu arch is also involved in that hash
14:08:48 <bauen1> i played a bit with nix but i think i never really got past the initial install
14:08:54 <clever> so arm and x86 binaries can co-exist within /nix/store/
14:09:17 <clever> i once made an SD card, that had a full nixos install, for both arm7 and x86-64
14:09:26 <clever> it could boot on both my desktop and my pi
14:09:32 <clever> from the same rootfs, with the same config
14:09:46 <j`ey> just buy a 2nd sd card :P
14:10:06 <clever> j`ey: the point was to share the rootfs and everything from $HOME to /etc/
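clever's point that the coreutils store path is a hash over its inputs (the glibc path, the cpu arch, and so on) can be shown with a toy hash. This is only an illustration: real nix hashes the whole build recipe (derivation) with sha256 and renders it in base-32, nothing like the FNV-1a used here, and `store_hash` is a made-up name:

```c
#include <stdint.h>

/* FNV-1a, standing in for nix's real sha256-over-derivation hashing. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* A package's store path is a function of its name and of the store
 * paths of everything it depends on, so rebuilding glibc gives
 * coreutils a new path that co-exists with the old one. */
uint64_t store_hash(const char *name, const char *deps[], int ndeps)
{
    uint64_t h = fnv1a(0xcbf29ce484222325ULL, name);
    for (int i = 0; i < ndeps; i++)
        h = fnv1a(h, deps[i]);
    return h;
}
```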
14:13:47 <bauen1> iirc stage-5 of ras52/bootstrap can't quite compile tinycc
14:14:03 <bauen1> at least when i last tried it against mob HEAD of tinycc
14:14:15 <bauen1> it might be possible to compile an older version of tinycc with it
14:14:55 <rain1> oh yes it doesn't build tinycc sorry if i implied that
14:15:07 <rain1> there is a cool system which can build tinycc
14:15:13 <rain1> https://github.com/giomasce/asmc
14:15:14 <bslsk05> ​giomasce/asmc - None (0 forks/2 stargazers/NOASSERTION)
14:15:17 <bauen1> i didn't assume that, just wanted to point it out
14:15:27 <bauen1> but it gets quite close
14:15:46 <rain1> annoyingly the author never responded to me about licensing
14:16:02 <rain1> (ra52 that is)
14:16:04 <bauen1> i think https://gitlab.com/giomasce/asmc/ is a bit more up to date
14:16:06 <bslsk05> ​gitlab.com: Projects · Giovanni Mascellani / asmc · GitLab
14:16:31 <bauen1> oh wait it isn't
14:16:35 <rain1> oh what its GPL
16:36:26 <theseb> I noticed you can have a cpu with lots of instructions....or you can have minimal instructions but have an assembler with lots of macros....I'm writing an assembler, for a hypothetical cpu, and I'm amazed that with enough MACROS I can present a powerful easy to use assembly that gives you all the power of a cpu with countless instructions....is either way "better" than the other or does it not matter?
16:37:06 <theseb> I'm guessing this is basically the difference between CISC and RISC?
16:37:08 <j`ey> looks like youre discovering RISC and CISC :P
16:38:49 <theseb> j`ey: imho....RISC seems more elegant and wiser....hardware is tricky and can't be changed...so seems better to have a simple debugged hardware core...then everyone can do whatever software layers on top of that they want
16:39:48 <theseb> j`ey: only downside i can see is sometimes those macros get replaced with A LOT of instructions...so do you want to run a FEW instructions on a COMPLEX cpu or A LOT of instructions on a SIMPLE CPU?
16:40:01 <theseb> not sure who "wins" that race
16:40:27 <j`ey> note that even intel is a RISC underneath now
16:45:49 <heat> right now I think it's pretty clear that RISC doesn't work for relatively high performance computing
16:46:07 <heat> even ARM doesn't go full RISC
16:46:34 <heat> (and the desktop market is completely dominated by CISC CPUs)
16:47:15 <zid> and x86 isn't risc inside
16:47:37 <zid> someone somehow confused supercisc with risc
16:47:40 <theseb> heat: is there some fundamental reason RISC can't compete speedwise with CISC?
16:48:12 <theseb> heat: i can imagine a very carefully designed assembler/compiler that minimizes instructions required could be good
16:48:26 <heat> no fucking clue
16:48:44 <zid> "hardcode entire complex groups of instructions so they run in 1 cycle" is risc in.. not many reasonable ways :P
16:49:02 <heat> I imagine that some operations are really really much faster if they're able to be done as a whole, and able to be optimised
16:49:29 <heat> such that intel keeps coming up with instructions that do a lot of stuff at the same time(simply because people need them)
16:49:53 <zid> I think these people have the wrong end of the stick
16:50:03 <theseb> We were all taught that complexity is evil, especially in programming, so intuitively it would seem that over the long term a debugged simple solid core would win but what do i know
16:50:25 <heat> yeah but that's bullshit
16:50:30 <zid> "We have memcmp as a single instruction" is very very very cisc
16:50:35 <heat> complexity is needed
16:50:49 <theseb> x86 can keep bolting on new hacks for speed but that doesn't seem like it should be sustainable...although the evidence is that it is..so again..what do i know
16:51:19 <zid> I think as it turns out, it's very easy to just completely saturate your clockspeed with 1 cycle ops like adds and movs
16:51:24 <zid> and that's what risc was designed to achieve
16:51:28 <zid> so it's dead now
16:51:37 <theseb> you'd think by now x86 would have crashed and burned
16:51:38 <heat> your classic C for-loop'd memcpy is a bunch of times slower than a really really complex memcpy
16:51:52 <zid> (except rep movq is just plain faster on my cpu)
16:51:57 <heat> and a really really complex memcpy is, in most situations, slower than a simple rep movsb
16:52:10 <heat> guess what? the CPU kinda knows itself much better than you do
16:52:34 <zid> It's easy to see why you go risc in the 70s/80s
16:52:37 <heat> (and can do stuff like use SIMD-like loads and stores without you even knowing)
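heat's "classic C for-loop memcpy" looks like the sketch below; the claim in the log is that the library versions (or a bare `rep movsb` on recent x86) beat it by moving cache-line-sized chunks that a byte loop can't express. `naive_memcpy` is a made-up name:

```c
#include <stddef.h>

/* One byte per iteration, no alignment tricks, no wide loads: the
 * compiler may vectorize this, but written literally it is the slow
 * memcpy being compared against in the discussion. */
void *naive_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}
```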
16:52:59 <zid> cpus are crap, you can't even get good performance because you're hung up dealing with fetches and decoding
16:53:04 <zid> so you make everything super simple and regular
16:53:29 <theseb> zid: what..are you saying RISC as an idea is fading? if yes i find that very troubling
16:53:39 <zid> and just try and do as little complex stuff as possible to try get the instruction cycle length down
16:53:45 <heat> yes, that's very clear
16:53:52 <theseb> zid: but i thought RISC-V was the future n' stuff
16:53:58 <heat> lmao
16:54:00 <zid> now we have a billion spare transistors per opcode, fetch bandwidth is 1000x what it needs to be, etc
16:54:26 <heat> RISC-V is the rust of CPUs but without the benefits
16:54:35 <zid> risc-v is a teaching aide
16:54:51 <heat> if you want a RISC-ish CPU you can go ARM, but that's about it
16:54:54 <heat> it's not pure RISC
16:55:06 <zid> and then someone went "hmm, seeing as it's free.. why don't we actually make one and try make it somewhat useable?" and here we are :P
16:55:53 <theseb> zid: sigh...but elegance *should* be better...this news is really bugging me
16:56:41 <heat> welcome to the real world
16:56:52 <heat> elegance is sacrificed for usability and speed
16:57:32 <theseb> zid: ok..fine....if it is really true that CISC is faster/better...why not take that to its extreme and make things like hardware realizations of JVM, LLVM, Python VM etc...could that be the future....call it "ultra-CISC" or something
16:57:43 <theseb> ??
16:58:03 <zid> heat: Who are we talking to btw? :P
16:58:14 <theseb> zid: me...you're the expert
16:58:17 <theseb> i'm the newb
16:58:26 <zid> oh wait, I always forget this channel is logged
16:58:41 <j`ey> theseb: arm had a java extension in the past
16:58:53 <theseb> zid: no really...am i on to something?....if CISC is good wouldn't ultra-CISC be even better?
16:59:19 <theseb> do as much as possible in hardware basically
16:59:19 <heat> theseb: what makes you think the extreme is always the best?
16:59:40 <heat> only a sith deals in absolutes
17:00:02 <theseb> heat: well you said we like the cpu being able to do groups of errands all in one cycle...so if it can do 10x more per cycle...seems logical no?
17:00:20 <theseb> or 100x more
17:00:46 <theseb> heat: yea..that's one of the dumbest quotes from Star Wars
17:00:58 <bauen1> theseb: ASIC's could be called "ultra-CISC"
17:01:02 <bauen1> *ASICs
17:01:21 <theseb> heat: "Was Hitler absolute wrong to kill XX million Jews?"...."hmmm I'm not a Sith so I can't agree with that."
17:02:00 <theseb> bauen1: good point...fair enough
17:02:30 <heat> no, that's not the point of the quote
17:03:22 <heat> things are always relative, and dealing only with the absolutes is stupid
17:04:35 <heat> like if I said RISC is 100% crap, and you said RISC is perfect... it's just counterproductive and plain wrong
17:05:22 <heat> none of us would be in the right here because RISC's approach obviously has some positives and some negatives
17:06:00 <heat> (and being extremely-RISC and extremely-CISC is not very good)
17:07:19 <heat> lower-level abstractions tend to be faster if not done excessively
17:16:15 <theseb> Anyone ever heard of cpu having a register designated for return addresses of functions?
17:16:58 <heat> arm
17:17:15 <theseb> ah good thanks
17:19:17 <zid> (RISC is 100% crap)
17:20:24 <uplime> so one could say its a riscy processor to target?
17:22:20 <geist> theseb: yes most risc machines did exactly that
17:22:29 <geist> ra, lr, etc
17:22:51 <geist> in fact i'd say that pretty much all risc machines use some sort of 'link register' or 'return address register' for a call
17:23:21 <geist> since the alternative is generally automatically pushing things on a stack and automatic stack operations are also not something risc designs do
17:31:07 <theseb> geist: thanks...seems you need to reserve a register to hold return address and another one still to hold the return value
17:32:20 <j`ey> yup
17:33:02 <theseb> or you must pop the return data from the stack
17:33:10 <theseb> one or the other yes?
17:33:31 <geist> the former
17:33:42 <theseb> problem with putting return data in register is it is limited...putting return data on stack allows unlimited flexibility
17:33:50 <geist> yes and no
17:33:59 <geist> remember most risc machines also have lots of registers
17:34:13 <j`ey> theseb: big things will go on the stack
17:34:17 <geist> typically at least twice as many as an equivalent cisc machine
17:34:17 <theseb> geist: what is a typical reg count?
17:34:37 <theseb> i have no idea what "big" means here
17:34:37 <geist> usually 32, though there are smaller risc machines with less, like 16
17:34:48 <theseb> ah..ok thanks..that helps
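A C sketch of the register-vs-stack split theseb is asking about (function names are made up; the exact size cutoffs are ABI details): on arm64 a small scalar comes back in x0, while a struct larger than 16 bytes is written through a caller-supplied pointer passed in x8, i.e. the "return value" really does travel via memory.

```c
#include <string.h>

struct big {
    char data[64];  /* too large for the return registers */
};

/* Fits in a single register (x0 on arm64, rax on x86-64). */
int small(void)
{
    return 42;
}

/* The ABI rewrites this to take a hidden pointer to caller-provided
 * space, so the "return" is a store through that pointer. */
struct big large(void)
{
    struct big b;
    memset(b.data, 'x', sizeof b.data);
    return b;
}
```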
17:34:59 <geist> but when you get into 32 territory you can 'burn' more registers for specific tasks
17:35:10 <geist> but like i was saying risc machines usually dont mandate a stack
17:35:22 <geist> as in there can be a stack, but it's usually not 'built in' to the architecture
17:35:39 <geist> as much as it's an ABI convention that you use one (or more) registers for a stack
17:35:54 <geist> this is where they diverge a bit, some risc machines do, like arm64 for example
17:36:15 <geist> but still almost no risc machine has any operations that automatically push/pop things on the stack, because they dont assume there is one
17:37:04 <geist> that actually is maximally flexible for things like exceptions and calling convention because it's not baked into the architecture how the stack works. you can push/pop entirely in software so you have a lot of options for codegen
17:38:14 <zid> in exchange for it not being optimized by the silicon
17:39:10 <geist> this is precisely why arm64 has its own dedicated SP
17:39:15 <geist> outside of the usual register file
17:39:39 <geist> there's talk in the riscv architecture manual about using a specific register for your stack operations because the cpu can optimize it better
17:39:42 <j`ey> SP == XZR
17:39:43 <zid> more silicon optimization -> more baked in path
17:39:55 <zid> <-> I guess
17:40:10 <geist> indeed, but the key is from a software point of view you can still choose when to use the stack
17:40:20 <geist> so stuff like double faults and triple faults dont exist on most risc machines
17:40:26 <geist> because there's no 'illegal' state you can get the cpu into
17:40:47 <geist> ie, no situations where the cpu cant make forward progress because the stack is fucked up, or whatnot
17:42:16 <geist> the fact that riscv even acknowledges that future high performance designs will want to try to determine what you're doing by looking at which register you're using is pretty nice. former risc machines like mips would pretend none of that existed
17:53:42 <theseb> geist: a while back i was getting the sad news in here that the future is a CISC one because RISC is just too slow for high performance
17:53:50 <theseb> geist: do you agree?
17:53:58 <j`ey> this doesn't look right :( PC=0000000000000052
17:55:30 <geist> theseb: nope
17:55:48 <theseb> geist: interesting
17:55:53 <geist> j`ey: indeed, and on arm64 it's even worse because it's not aligned
17:56:03 <geist> theseb: in fact i'd argue that the ISA is not that important
17:56:19 <geist> it's like 10% of the picture, at least for high performance stuff
17:56:36 <theseb> geist: good...i was starting to think RISC-V was doomed
17:56:37 <geist> for very low end it starts to matter a lot, which is why risc machines dominate embedded space
17:56:43 <j`ey> I have singlestep on qemu, so I should be able to see the instruction before it.. but the instruction before is: adrp x8, 80003000
17:57:11 <geist> but in very high end i think you can design high end anything as long as it's an expressive enough ISA. but at that point it's all about money, market share, etc
17:57:42 <geist> j`ey: one hack i use sometimes is to run with mega tracing on: '-d cpu,exec,int'
17:57:59 <j`ey> oh exec, that's the one I was missing
17:58:03 <j`ey> I'll try that
17:58:11 <geist> it gives you a total firehose to stderr but if your code blows up fairly early you can usually see where it goes wrong
17:58:21 <j`ey> I need to remember to make more notes, before I leave a project for a year
17:58:22 <geist> then of course there's -singlestep vs not
17:58:55 <geist> which basically tells qemu to generate a translated single instruction or a block of instructions (the default)
17:59:08 <geist> when it's doing the latter it's harder to trace since you only see the entry and exit for a run of instructions
17:59:18 <geist> though most of the time that's sufficient too
17:59:43 <geist> theres some other flags in there too, like in_asm and out_asm
17:59:51 <geist> those are neat, shows you the input and output translated instructions
17:59:56 <j`ey> I have enabled -singlestep
18:00:00 <j`ey> in_asm sounds nice too
18:00:37 <geist> yah
18:01:06 <geist> though iirc it only prints it the first time it runs a block, when doing the translation. if the code that's blowing up is in a loop or some shared code you might not see what you think
18:01:16 <geist> because it'll only jit it once and then hold onto it
18:01:19 <siberianascii> would you trust a VPN that ask for a sudo permission with a stripped binary ?
18:01:35 <siberianascii> for installation
18:01:43 <geist> but you can usually reverse back in the spew and find the place that that one block was translated
18:01:47 <j`ey> geist: yeah, I can search for the address and see it further up at least
18:01:49 <j`ey> ^
18:02:12 <geist> yah in_asm and out_asm are entertaining to see what qemu is doing
18:02:27 <geist> there's another one in there that shows you the intermediate opcodes it translates to
18:02:40 <geist> not particularly useful, but really interesting if you're into that kind of thing
18:04:18 <j`ey> something still feels odd in the logs, ldr x8, [x1, #24], is the last 'chain' it shows
18:04:38 <geist> yah that's strange
18:04:48 <geist> what EL are you at? EL1?
18:04:52 <j`ey> yeah
18:05:13 <geist> so it's not likely to be something like it took an exception to 0 or whatnot
18:05:14 <j`ey> there is a blr to x8 in 2 instructions time
18:05:34 <geist> aaah well that's almost certainly it, but why is there a gap? when it crashes, what is in lr?
18:05:59 <j`ey> ah nice, yeah LR is the instruction after the br
18:06:00 <geist> possible it's not really single stepping, or it's chaining blocks together and not giving you an intermediate trace
18:06:13 <geist> and what's in x8 that'd be a smoking gun
18:06:23 <j`ey> it's the 0x52 :)
18:07:08 <geist> yah so it's at least behaving as it should. your problem is further upstream
18:10:00 <j`ey> so I'm pretty sure it's trying to access some static/global data
18:10:15 <geist> yah maybe some sort of vtable?
18:10:28 <geist> something like x1 being a vtable pointer and #24 being an offset into it
18:11:01 <j`ey> yep, it's exactly that
18:11:21 <j`ey> and I can see from objdump, what it should be
18:11:36 <geist> so might want to look at x1 and see if it's reasonable. if it is then i'd verify that the binary is at the right spot and you're running at the right spot
18:11:51 <geist> since arm64 is very PIC by default you can run at the wrong spot for quite some time
18:12:10 <geist> and you'll be doing pretty good until you end up with some sort of double indirect where it loads a hard coded absolute address
18:12:36 <j`ey> X1 is reasonable, 800037b8, which is the start of a symbol (from objdump)
18:12:59 <geist> yah then maybe validate that what is in memory is what you expect
18:13:20 <geist> incidentally, where is that located in ram?
18:13:26 <j`ey> yeah, will dump a: b ., as you say
18:13:29 <geist> is that 0x4000.0000 -> 0x8000.0000?
18:13:35 <j`ey> yeah
18:13:51 <j`ey> oh hmm
18:13:55 <geist> how are you loading the binary? via -kernel? is it a flat binary or an elf file?
18:14:18 <j`ey> -kernel, ELF
18:14:34 <geist> try this: move the link address out 512KB or 1MB
18:14:47 <geist> iirc i've bumped into issues where qemu likes to put stuff at the 'bottom' of memory
18:15:01 <j`ey> I had another thing that I just tried, I had my rodata in the same section as text
18:15:04 <geist> specifically that's usually where it puts the device tree. i'm fairly certain it's smart enough not to overwrite the kernel it just loaded
18:15:06 <j`ey> .. which had an AT(0x40000)
18:15:17 <j`ey> I wonder if that's the issue
18:15:25 <geist> yah that AT stuff is definitely not needed here
18:15:31 <geist> since you're running in a single segment
18:15:56 <j`ey> ok, moving rodata didnt fix it
18:16:03 <j`ey> I still have AT() for the .text
18:16:03 <geist> but yah while you're at it also move the kernel load address out some. at the minimum you'll want to do it later so you can access the FDT
18:16:23 <geist> yah you dont need the AT for sure. a plain ol binary linked to run at the final address is what you want
18:16:35 <j`ey> I dont need the AT() at all?
18:16:49 <geist> wait, hang on. lemme check
18:17:13 <j`ey> LK has it, I thought it was needed for the .text
18:17:19 <j`ey> this stuff is all... confusing :D
18:18:00 <geist> ah yeah you're right
18:18:02 <geist> i do
18:18:20 <geist> https://pastebin.com/gguHQ8sY that's why
18:18:21 <bslsk05> ​pastebin.com: build-qemu-virt-arm64-test/lk.elf: file format elf64-littleaarch64build-qe - Pastebin.com
18:18:37 <geist> see how it's vaddr is high (0xffff....) but the load addr is low (0x40....)
18:18:46 <j`ey> yeah
18:18:54 <j`ey> I also confirmed by removing it locally
18:19:01 <geist> but it's also now 0x40000 like you put it there
18:19:07 <geist> not
18:19:45 <geist> but, also see that 0x100000 offset? that's 1MB into dram space because qemu likes to put the FDT at 0 into dram
18:20:01 <j`ey> what command did you run for that output?
18:20:04 <geist> i am fairly certain it wont mess up the kernel but it's possible its doing something dumb like overwriting part of your code with the FDT
18:20:18 <geist> thats aarch64-elf-objdump -x <your binary>
18:20:28 <geist> -x dumps a whole page of stuff, but that's the header part
18:20:42 <geist> i always generate a dissassembly and a dump and a symbol list with every build
18:21:01 <j`ey> https://pasta.cx/C.txt
18:21:01 <bslsk05> ​pasta.cx: pkg-build/aarch64/release/rkernel: file format elf64-littleaarch64 pkg-build/aarch64/release/rke...
18:21:36 <j`ey> I'll try the higher address quickly, see if it is related
18:21:52 <geist> seems okay. the STACK thing is a little fishy
18:22:07 <geist> hopefully the rust stuff isn't using it and trying to use 0 as its stack or something
18:22:09 <j`ey> dunno what that's from
18:23:29 <geist> it's probably harmless, i forget exactly what STACK program headers are for, but i think it's something to do with creating a new thread in a process and having some sort of hint as to what the size should be or something
18:25:08 <froggey> to indicate if the stack needs to be executable or not
18:25:45 <geist> ah interesting, based on the flags of the STACK header? makes sense why there appears to be no meaningful other bits in there
18:25:51 <froggey> yeah
18:26:03 <geist> figured it might also have a size like 64k or something as a hint to the loader how big of a stack it should make for the initial one
18:29:05 <j`ey> ok, well I guess my page table stuff needs more work :)
18:30:04 <j`ey> (since now I get fetch aborts when trying with 0xfffff)
18:33:45 <j`ey> I need to read about this all again, I forgot 99% of the details
18:35:41 <j`ey> I wrote a bunch of notes... on paper, a year ago. and I moved since then
18:38:23 <geist> yah and sadly qemu does not have info mem or info tlb for ARM
18:38:31 <geist> so you're flying blind with the page tables
18:39:10 <geist> though you can try to manually verify things by stopping the cpu just after enabling paging and then using x and xp on the command line to see if things appear to be mapped properly
18:46:11 <j`ey> someone should write a page where you can click what settings you want, and it generates the asm/settings for you
18:55:26 <geist> heh, or maybe not? depends on if that's a net positive or negative
19:28:06 <j`ey> well I made some progress, I'm back to where I was before the 0x800.. -> 0xffff change
19:28:19 <geist> oh you bumped up to the high address? yay
19:28:30 <geist> i assume you noticed that there are two TTBRs and you had to use the other one for the kernel bits?
19:28:50 <geist> that's a really nice feature of arm64 vs x86. having a split page table system like that makes so much more sense
19:28:52 <j`ey> yeah, I added another table!
19:29:04 <j`ey> my table is a bit wonky though
19:29:20 <j`ey> as in, I dont feel confident in it at all
19:29:38 <geist> well the secondary tables should be basically the same
19:29:53 <geist> just remember, each L2 table corresponds to 512GB of address space
19:30:25 <geist> or 2^39 bytes
19:31:30 <j`ey> so I basically have a setup a bit like this: L1[0] = &L2, L2[0] = KERNEL_PHYS_ADDR
19:33:45 <geist> it's 48, 39, 30, 21, 12
19:34:01 <geist> note each shifts by 9 bits until you get to the final 12 bit page (4K)
19:35:48 <j`ey> so if I make a block entry at L2[0] to KERNEL_PHYS_ADDR, it then uses that + bottom 12 bits
19:37:18 <j`ey> if the address has 18 0s at the top
19:37:24 <j`ey> well from 48 downwards
19:39:48 <j`ey> (argh that was a terrible explanation)
19:43:57 <zid> It's just a trie
19:44:46 <geist> where it gets weird is when you fiddle with other base page granules on arm64 where the shift count is 11 bits and the final page is 16k
19:45:12 <geist> or 13 + 64k
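The 48/39/30/21/12 walk geist lists can be written down directly: each level consumes 9 bits of the virtual address and the final 12 bits are the offset within a 4K page. A minimal sketch for the 4K-granule, 48-bit-VA case (level numbering 0..3, as geist uses later):

```rust
// shift for each lookup level with a 4K granule and 48-bit VAs:
// L0 indexes bits [47:39], L1 [38:30], L2 [29:21], L3 [20:12]
const LEVEL_SHIFTS: [u32; 4] = [39, 30, 21, 12];

/// Extract the 9-bit table index a given level uses (512 entries per table).
fn table_index(va: u64, level: usize) -> u64 {
    (va >> LEVEL_SHIFTS[level]) & 0x1ff
}
```

For the 16K and 64K granules the shifts change (11 bits per level / 13 bits per level, as mentioned above), but the shape of the computation is the same.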
19:45:51 <j`ey> I have to do some understanding of the compiled output, it's trying to store to 0x9000000
19:48:09 <j`ey> oh! const QEMU_VIRT_PL011_BASE: usize = 0x9000000;
19:48:40 <geist> oh oops. yah may want to rewrite that in terms of some sort of KERNEL_BASE #define or something
19:49:27 <j`ey> hmm but I write to it before and after the physical/virtual switch
19:49:54 <j`ey> or for now Ill see if I can add something to my lower page table to allow that through
19:50:33 <clever> i remember linux having 2 config fields for such things, i think the physical address was used before virtual, and the virtual addr was used by early (pre-tty setup) stuff in virtual mode, but also by the page table generation
19:58:02 <geist> yah i have some #defines in LK for PHYS and VIRT addresses for various drivers
19:58:13 <geist> but usually i just immediately get into VIRT by the time the drivers are doing much
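The phys/virt driver-address split being discussed can be sketched like this. The PL011 base is the one from the log; `KERNEL_VIRT_BASE`, the function, and the runtime flag are hypothetical names for illustration, assuming the kernel maps MMIO at a fixed virtual offset:

```rust
// QEMU virt machine's PL011 UART, as seen in the log
const QEMU_VIRT_PL011_PHYS: usize = 0x0900_0000;
// assumed kernel-half offset (ffff-prefixed addresses go through TTBR1)
const KERNEL_VIRT_BASE: usize = 0xffff_0000_0000_0000;

/// Pick the UART base depending on whether we've switched to virtual addresses.
fn pl011_base(mmu_enabled: bool) -> usize {
    if mmu_enabled {
        KERNEL_VIRT_BASE + QEMU_VIRT_PL011_PHYS
    } else {
        QEMU_VIRT_PL011_PHYS
    }
}
```

This mirrors clever's description of Linux's two config fields: the physical address is used before the switch, the virtual one after (and by the page-table generation).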
19:58:16 <j`ey> geist: say it reaches a block entry at level 2... does it use the rest of the bits as the offset?
19:58:51 <geist> yah
19:59:16 <geist> that's exactly what it does. it terminates the walk and then the rest of the lower address bits are offset into that large page
19:59:24 <j`ey> cool
19:59:28 <clever> ive had problems on linux in the past (my own doing), where the very first printk would throw an exception
19:59:41 <clever> but the earlycon logging, doesnt use printk, so it worked
20:00:15 <clever> but the earlycon stuff only printed before activating virtual, once virtual was up, it would use printk and route that to the earlycon device
20:00:21 <j`ey> I think I understand this a lot more than my code works yet
20:00:59 <geist> yah i think the way the arm page tables are presented actually makes a little more sense than x86, even though its the same thing
20:01:05 <geist> the downside is the arm manual is Expert Mode stuff
20:01:13 <geist> like game+ mode
20:01:38 <geist> x86 spends too much time giving different names to different levels (pd, pdp, pml4) etc which seems to be just a distraction
20:01:53 <geist> i just like to think about it as L0-L3 page tables
20:01:55 <clever> ive still yet to make sense of arm paging
20:02:08 <zid> you're right geist
20:02:16 <zid> let's rename pml4 to pdpdpdt
20:02:20 <clever> x86 is just documented better, with graphics to show which addr bits become indexes into which tables
20:02:26 <geist> yah
20:02:43 <j`ey> im looking at some decent graphics
20:02:50 <j`ey> https://static.docs.arm.com/100940/0100/armv8_a_address%20translation_100940_0100_en.pdf
20:02:51 <clever> and arm64 has a weird split page table thing
20:02:51 <geist> since ARM is a lot more configurable with the way the page tables work they have like 4 versions of everything, and lots of floaty bits which does not help
20:03:00 <clever> where the upper and lower half of the virtual space, have separate paging tables
20:03:12 <geist> oh the split thing is simpler IMO, because now you dont have to sit around and try to grok the split thing that's on x86
20:03:15 <j`ey> you dont need to swap out the kernels page tables then!
20:03:24 <geist> that's *more* straightforward than x86
20:03:34 <geist> where you have to eventually grok that the single table is split in the middle
20:03:48 <geist> with arm it's just two sets of page tables, one for 0000 and one for ffff prefix. simple as can be
20:04:08 <clever> i think i saw something about being able to choose where the split point was?
20:04:16 <geist> well, arm64. arm32 had a split mechanism thats mega complicated. another one of those things that arm64 cleans up immensely
20:04:22 <geist> clever: that's arm32. and yes it's complicated
20:04:26 <geist> and stupid
20:04:35 <clever> ahhh, thats why i'm more confused
20:04:52 <geist> arm32 paging < x86 paging < arm64 paging
20:04:52 <clever> ive not tried fixing arm64 boot yet, so i'm still working in arm32 land
20:05:04 <geist> riscv paging is more or less == x86-64 paging
20:05:19 <geist> same splitting, same single page table root pointer, etc
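The arm64 two-root scheme geist praises can be sketched as a pure address check: with TCR_EL1.T0SZ = T1SZ = 16 (48-bit VAs, a common configuration and an assumption here), addresses with an all-zero top 16 bits walk TTBR0_EL1, all-ones walk TTBR1_EL1, and anything in between faults:

```rust
/// Which translation table base register serves this VA?
/// Some(false) = TTBR0 (user/low half), Some(true) = TTBR1 (kernel/high half),
/// None = non-canonical, generates a translation fault.
fn selects_ttbr1(va: u64) -> Option<bool> {
    match va >> 48 {
        0x0000 => Some(false),
        0xffff => Some(true),
        _ => None,
    }
}
```

This is j`ey's "you dont need to swap out the kernels page tables": a context switch only rewrites TTBR0, and TTBR1 stays put.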
20:10:19 <j`ey> ok, managed to add a new entry for that serial port address, and it worked, so it's good to know Im not just doing rubbish
20:11:43 <geist> side note you know that you need to map mmio addresses as device memory and whatnot right? possible that qemu wont care
20:11:48 <geist> but on real hardware you'll definitely need to do that
20:12:02 <geist> that's why the LK thing has the whole peripheral map thing which uses the different mapping flags
20:12:06 <j`ey> yeah I know, just hoping it works
20:12:15 <j`ey> (on qemu)
20:12:22 <geist> yah it's probably totally fine on qemu
20:12:30 <geist> even in KVM it'll still trap it i bet
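The "map mmio as device memory" point works through MAIR_EL1 on arm64: each of its eight byte-wide attribute slots holds a memory-type encoding, and a page-table entry's AttrIndx field picks a slot. A minimal sketch, using the architectural encodings for Device-nGnRnE and Normal write-back cacheable memory (the slot assignment itself is an arbitrary choice here):

```rust
// Architectural MAIR attribute encodings
const ATTR_DEVICE_NGNRNE: u64 = 0x00; // strongly-ordered device memory
const ATTR_NORMAL_WB: u64 = 0xff;     // normal, inner+outer write-back cacheable

/// Pack device memory into attribute slot 0 and normal memory into slot 1;
/// PTEs would then use AttrIndx=0 for MMIO and AttrIndx=1 for RAM.
fn mair_value() -> u64 {
    (ATTR_DEVICE_NGNRNE << (8 * 0)) | (ATTR_NORMAL_WB << (8 * 1))
}
```

As geist says, QEMU's TCG mostly won't care if you get this wrong, but real hardware will reorder or cache accesses to a UART mapped as normal memory.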
20:16:42 <j`ey> fixed another bug where I was just overwriting the L2[1] entry with 0 for seemingly no reason
20:30:13 <johnjay> hey geist i'm trying to remake my github account with a new email
20:30:24 <geist> grats
20:30:26 <johnjay> does it particularly matter what email you use? like does it have to be gmail or what
20:30:33 <geist> i dont think so, no
20:30:46 <geist> though i generally recommend using stable mail accounts
20:30:56 <uplime> you can have multiple emails too
20:31:05 <uplime> so you can keep the old one as a backup
20:31:12 <johnjay> ok. i have an old email i used for gaming that's like mozart2345 (a) hotmail
20:31:19 <johnjay> i was worried maybe it would get caught by spam filters
20:31:23 <geist> yah i have been committing stuff under multiple email accounts as i've gradually switched to using gmail more and more as my permanent one
20:31:23 <johnjay> or look weird
20:31:53 <johnjay> i have a gmail but i use it for separate things
20:32:02 <uplime> i use the same account for work and personal stuff (well back when i used github for personal stuff) so I've got my work email and personal email
20:32:03 <johnjay> dont' want everything to be under one provider
20:32:15 <uplime> get a personal email domain!
20:32:18 <uplime> its all the rage
20:32:36 <johnjay> heh well that's more like if you have a small business right
20:32:50 <johnjay> then you can be geist⊙gc
20:33:02 <uplime> meh, I have email setup for @naughtysysadmins.com and @securitea.app
20:33:11 <uplime> looks kind of cool on a resume
20:33:36 <johnjay> well i guess if that's the concern i can always add an email to github later
20:33:45 <johnjay> since you said it lets you use multiple ones
20:34:15 <uplime> yeah i've had maybe 4 or 5 at various points
20:34:32 <johnjay> i've just had some bad experiences in the past. like a site that only accepted gmail
20:35:00 <johnjay> so it's like... why is my yahoo email not ok here?? -_-
20:58:13 <heat> https://www.reddit.com/r/programming/comments/i4xxnk/20gb_leak_of_intel_data_whole_git_repositories/
20:58:17 <bslsk05> ​www.reddit.com: 20GB leak of Intel data: whole Git repositories, dev tools, backdoor mentions in source code : programming
20:58:23 <heat> Intel can't get a break :/
21:01:38 <clever> heat: nintendo has also been leaking like crazy lately, full source for entire games, verilog files for chunks of the wii console, and more
21:02:04 <heat> really?
21:02:45 <heat> although I kinda feel really sad for Intel these past 3 years... Lots of vulnerabilities, leaks :((
21:02:50 <clever> heat: https://hackaday.com/2020/05/21/no-the-nintendo-leak-wont-help-emulator-developers-and-heres-why/
21:02:51 <bslsk05> ​hackaday.com: No, The Nintendo Leak Won’t Help Emulator Developers, And Here’s Why | Hackaday
21:03:26 <clever> source for all 3 stages of the wii bootloader, and the entire bloody os!
21:03:45 <heat> no mario kart = no party
21:04:55 <clever> heat: other groups have already recreated one of the n64 mario games, via decompiling, but they made the source so bloody accurate, it compiles back into a bit identical recreation of the original
21:05:22 <clever> and then they fixed fps problems by just turning on gcc optimizations
21:13:57 <heat> really? that sounds broken as hell
21:21:30 <clever> heat: https://www.youtube.com/watch?v=NKlbE2eROC0
21:21:30 <bslsk05> ​'Did Nintendo really forget to Optimize Super Mario 64 ? | MVG' by Modern Vintage Gamer (00:13:14)
22:09:27 <heat> clever: the video seems BS
22:09:54 <heat> how would they get access to the original Makefile if they didn't have the source code?
22:10:15 <clever> heat: SDK's and tweaking compile flags until its bit-identical
22:11:04 <clever> there would be example Makefile's in the SDK, for anybody wanting to develop an official game
22:11:15 <heat> are they using the original compiler too?
22:11:23 <clever> i believe so, from the SDK
22:29:53 <geist> unclear what they'd be using for N64. probably some sort of SGI originated compiler
22:41:17 <j`ey> why does it seem like writes to the pl011 serial dont show up, until after the MMU is enabled?
22:41:33 <j`ey> but after it's enabled, things I wrote to it before *do* show up
22:42:41 <heat> caching?
22:42:58 <geist> address changed
22:42:58 <j`ey> I thought none of that happened before the MMU was on
22:43:23 <heat> oh right right, arm is like that
22:43:39 <j`ey> geist: what do you mean?
22:43:40 <geist> on qemu none of the caching stuff is emulated
22:43:49 <geist> what you describe doesn't make sense, so i suspect operator error
22:44:00 <geist> but... mmu off and on is everything in terms of accessing the registers
22:44:05 <geist> so obviously there's a big change there
22:45:34 <j`ey> I'll do some bisecting
22:47:09 <geist> it working on either side of the mmu enable makes sense but the cached bits dont, unless that's some sort of software queuing thing, but i'm guessing you wrote it
22:47:20 <geist> so you'd know if there was something there, unless rust has some sort of queue built in
22:56:22 <j`ey> I narrowed it down, ish, but it doesn't help with my understanding :)
22:56:45 <j`ey> if I print the address of the L1 page table, prints appear, if I dont print it, they dont
22:57:09 <geist> fun!
22:57:34 <j`ey> some freaky stuff going on!
23:03:11 <geist> since it's on qemu most of the real voodoo stuff is probably not there
23:04:57 <j`ey> oh, looks like it might be a bug in my clear bss code
23:05:22 <geist> haha see even after all that blabbing about it you still got it messed up
23:05:34 <geist> question: was it because of over optimization or just a straight bug
23:06:18 <j`ey> I have to dig into this more to see what the issue could be
23:06:37 <heat> does any other system except Linux use netlink?
23:06:50 <doug16k> really strange, almost-working behaviour could be a misaligned stack
23:07:13 <geist> note you can almost certainly just comment out the bss code and it'll still work because qemu will start memory off zeroed already
23:07:16 <j`ey> stack is baligned to 16
23:07:19 <geist> and there's no existing firmware running before your kernel
23:07:32 <j`ey> oh that's a good idea to try quickly
23:07:43 <geist> re: stack there's a SCTLR bit to enforce 16 byte alignment in both EL0 and EL1
23:07:46 <doug16k> balign means the stack is placed in an aligned place. that won't magically cause the actual stack pointer to be aligned
23:07:55 <geist> if you have it set then it by definition cannot be unaligned or it generates an exception
23:08:07 <geist> like the instant the bottom bits are set in the SP it faults
23:08:42 <doug16k> that's cool
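The SCTLR_EL1 alignment-check bits geist mentions are SA (bit 3, SP alignment checking at EL1) and SA0 (bit 4, at EL0). A sketch of setting them, with the register write itself left out since that needs an `msr` instruction:

```rust
// SCTLR_EL1 stack-alignment-check enables: with these set, any use of SP
// with the low 4 bits nonzero takes an SP-alignment fault immediately.
const SCTLR_SA: u64 = 1 << 3;  // EL1
const SCTLR_SA0: u64 = 1 << 4; // EL0

/// Return an SCTLR value with both alignment checks enabled
/// (the caller would write it back with `msr sctlr_el1, x`).
fn enable_sp_alignment_checks(sctlr: u64) -> u64 {
    sctlr | SCTLR_SA | SCTLR_SA0
}
```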
23:08:53 <j`ey> commenting out the bss clear makes things better
23:18:15 <j`ey> hm bss size is 0xc, and bss_start is at 5604, which doesnt seem 16 byte aligned!
23:22:24 <heat> huh, despite the RFC it doesn't look like netlink is used anywhere else
23:23:36 <heat> the question is whether it's "worth it" to copy linux and implement netlink as to avoid rewriting tools if they're too hard to rewrite
23:31:57 <j`ey> for some reason I was getting bss_size == 0xc with this https://pasta.cx/g.txt
23:31:57 <bslsk05> ​pasta.cx: .bss : ALIGN(16) { /*.bss : {*/ . = ALIGN(16); __bss_start = .; *(.bss*) *(COMMO...
23:32:21 <j`ey> well if you remove the : ALIGN(16) from the .bss line
23:33:52 <heat> j`ey: I'm guessing __bss_size only takes into account the size of the actual .bss
23:34:08 <heat> and disregards all sorts of alignment you do there at the end
23:34:45 <heat> if you do __bss_size = __bss_end - __bss_start it'll have an aligned size
23:34:55 <j`ey> yeah, I just switched to that
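heat's fix amounts to: take the size from the symbols, so alignment padding is included. A sketch of the clear loop in that spirit, written against a plain byte range so it can run anywhere; in the kernel, `start`/`end` would be the addresses of the linker-provided `__bss_start`/`__bss_end` symbols:

```rust
/// Byte count between the bss boundary symbols; computing it this way
/// (rather than from a linker-emitted size symbol) includes any padding
/// the section alignment added.
fn bss_size(start: usize, end: usize) -> usize {
    end - start
}

/// Zero a bss region given as a slice (in a kernel this would be built
/// unsafely from the raw symbol addresses before any Rust code that
/// assumes zeroed statics runs).
fn clear_bss(bss: &mut [u8]) {
    for b in bss.iter_mut() {
        *b = 0;
    }
}
```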
23:56:07 <doug16k> putting : ALIGN(N) in there puts a very powerful gun to its head. you will override the actual alignment if it is different
23:56:19 <doug16k> if it inferred something else
23:56:32 <doug16k> you can tell that is happening by looking at the program headers. see if the base disagrees with the align
23:57:05 <doug16k> hmm, for bss maybe not so clear
23:57:18 <doug16k> ah, it should show up in section headers
23:58:35 <doug16k> if the section contains a thing aligned to 16, you don't need to "help" ld by saying : ALIGN(16)
23:58:44 <doug16k> ld isn't that stupid
23:59:29 <doug16k> the less you strongarm ld, the sooner it will be all stable and working fine
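A hedged reconstruction of the kind of `.bss` stanza under discussion, following heat's end-minus-start suggestion (symbol names match the log; whether the output-section `ALIGN(16)` belongs there at all is exactly doug16k's point about not strongarming ld):

```
.bss : {
    . = ALIGN(16);
    __bss_start = .;
    *(.bss .bss.*)
    *(COMMON)
    . = ALIGN(16);
    __bss_end = .;
}
/* derive the size from the boundary symbols so padding is included */
__bss_size = __bss_end - __bss_start;
```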