Search logs: #osdev - 5 August 2020

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

http://bespin.org/~qz/search/?view=1&c=osdev&y=20&m=8&d=5

Wednesday, 5 August 2020

01:53:39 <unlord> OK, so tell me more
01:53:59 <unlord> If I use the DPMI host to map that memory, I can continue parsing these headers
01:55:40 <doug16k> unlord, about?
01:55:56 <unlord> So I still don't understand paging
01:56:30 <doug16k> i686 PAE?
01:57:07 <unlord> well, AFAICT this exists back to i386
01:57:38 <doug16k> you'd be lucky to have 1MB on i386
01:57:45 <unlord> doug16k: I have a physical 386 with 32MB
01:57:58 <doug16k> ya now
01:58:15 <doug16k> pae came out around ppro
01:58:16 <unlord> I own it now, but the parts are all 20+ years old
01:58:27 <geist> neat! that's a really beefy 386
01:58:33 <doug16k> nobody felt squeezed by 4GB until then
01:58:46 <ronsor> that's because Windows 95 couldn't even use 4GB
01:58:48 <geist> i have a 386 but it was a random desktop and only could take up to 4MB (4 1MB SIMMS)
01:58:51 <ronsor> in fact, it'd refuse to boot
01:59:08 <unlord> doug16k: ok, so I'm using the wrong terminology then
01:59:12 <geist> unlord: anyway, so what question do you have about paging then?
01:59:22 <doug16k> i686 pae means not x86_64 pae
01:59:24 <geist> you're really gonna need to understand it pretty well one way or another
01:59:31 <unlord> so I'm still trying to walk the ACPI tables, and I got as far as needing to map the physical address
01:59:49 <geist> wait. you have a 386 with ACPI?
01:59:53 <unlord> no
02:00:10 <unlord> I have a 386 with a linear framebuffer VLB though
02:00:25 <geist> so the ACPI machine is not a 386. are you using 32bit paging then?
02:00:31 <geist> straight 2 level page tables, no PAE?
02:00:51 <unlord> geist: I don't know how page tables work I think
02:01:22 <unlord> essentially, I have a selector with a base address and a 4GB limit, and I'm trying to understand why I cannot just index phys_addr - base_addr
02:01:50 <geist> well, have you tried reading any of the tutorials and manuals on the topic?
02:01:57 <geist> understanding paging is pretty essential
02:02:29 <geist> have you enabled paging?
02:02:39 <geist> or are you just trying to use 32bit segmentation to access the thing?
02:03:25 <unlord> geist: I'm in real protected mode, but right now I'm using a DPMI host to switch to pmode
02:03:35 <unlord> I have other code that does it w/o a DPMI host, but I have not tried with that
02:03:38 <geist> so 32bit protected mode?
02:03:41 <unlord> yes
02:03:46 <geist> but paging is not enabled?
02:03:58 <geist> or more specifically do you intend to completely 'take over' the system?
02:04:12 <geist> since DPMI implies you're coming out of some sort of DOS environment
02:04:29 <unlord> geist: I'd like to be able to return to the system when I'm done. It isn't strictly a requirement
02:04:49 <geist> that makes things *far* more complicated, and i really dont know how to do that
02:05:03 <geist> really coming out of dos is already a wrinkle. is that a requirement?
02:05:23 <unlord> assume I don't care about the current system then. I think I understand how that part will work
02:06:02 <geist> then you need to start reading the manual on how page tables work
02:06:19 <geist> it's not really that complicated, but there are some conceptual leaps to be made. how much have you learned so far?
02:06:44 <geist> what part is giving you trouble?
02:07:22 <unlord> http://dgql.org/~unlord/virtual_box_crash.png
02:08:28 <unlord> I guess, I don't understand what it means to have a 4GB limit on DS and ES selectors, if you cannot just access all of the memory
02:08:33 <unlord> why did I set a 4GB limit?
02:08:34 <adu> I have never heard of DPMI, is that like a BIOS interrupt thing?
02:08:39 <geist> DS and ES are not paging
02:08:41 <geist> that's segmentation
02:08:49 <ronsor> DPMI is the DOS Protected Mode Interface
02:08:51 <geist> that's a completely different and separate mechanism
02:09:17 <unlord> geist: I'm pretty sure this is not segmentation
02:09:20 <geist> unlord: but re DS and ES, what do you mean 'cannot just access all of the memory'?
02:09:21 <ronsor> It provides an API for running (mostly) 32-bit protected mode applications on DOS.
02:09:47 <unlord> ronsor: I have non DPMI based code, let me go rewrite this so it uses that
02:09:52 <unlord> sheesh :)
02:10:09 <geist> what is your question about the 4GB limit?
02:10:36 <geist> when you say you dont understand 'what it means to have a 4GB limit'. vs what? having > 4GB limit?
02:10:50 <unlord> geist: ES=009F (00012A40,FFFFFFFF,8FFe)
02:10:53 <unlord> err
02:10:55 <unlord> geist: ES=009F (00012A40,FFFFFFFF,8FF3)
02:11:03 <geist> kay
02:11:43 <unlord> that middle number is the 4GB limit, note how CS=0097 (00012A40,0000FFFF,40FB) means I only have 64kB access on that selector
02:12:02 <geist> okay.
02:12:07 <geist> so what is the question?
02:13:09 <unlord> if the XSDT is at 0x1ff0030 or just under 32MB, why can't I simply access it as mov edi, 0x1ff0030 - 0x12A40 \ mov al, [edi]
02:13:43 <geist> what is in DS?
02:13:53 <unlord> DS = ES
02:13:55 <unlord> see the png
02:14:11 <geist> i dunno then. possibly not mapped with paging
02:14:21 <geist> remember segmentation != paging. segmentation sits on *top* of paging
02:14:31 <doug16k> those segments have bases
02:14:43 <geist> yah that's the 0x12a40 they're trying to subtract
02:14:52 <geist> that seems sound, but if it's not mapped with paging then it still wont work
02:15:02 <doug16k> right
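The base arithmetic discussed above can be sketched in C. This is a minimal illustration, assuming an expand-up data segment and using the base and limit values quoted from the debugger dump; the struct and function names are hypothetical. It only shows why the subtraction is sound at the segmentation level. If the DPMI host has paging enabled, the resulting linear address still has to be mapped by the page tables, which this sketch does not model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Descriptor fields as printed in the dump, e.g. ES=009F (00012A40,FFFFFFFF,...).
 * Hypothetical helper type, not part of any DPMI API. */
struct segment {
    uint32_t base;   /* linear address that offset 0 refers to */
    uint32_t limit;  /* highest valid offset in the segment */
};

/* Offset to load into EDI so that seg:[EDI] reaches the given linear address.
 * Unsigned subtraction wraps modulo 2^32, matching 32-bit base + offset. */
uint32_t offset_for_linear(const struct segment *seg, uint32_t linear)
{
    return linear - seg->base;
}

/* Segment limit check only; says nothing about whether paging maps the page. */
bool offset_in_limit(const struct segment *seg, uint32_t offset)
{
    return offset <= seg->limit;
}
```

With the values from the log, the XSDT at 0x1ff0030 becomes offset 0x1fdd5f0 through ES or DS, which is inside the 4GB limit but far outside the 64kB limit of the CS selector.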
02:15:07 <unlord> can I print the page table?
02:15:18 <doug16k> what happened with that api call to map physical memory?
02:15:22 <doug16k> why not use that?
02:15:37 <unlord> doug16k: that was my next step, yes
02:15:54 <unlord> but that requires a DPMI host to be available, a requirement I was going to drop
02:16:01 <doug16k> no
02:16:12 <doug16k> if you happen to be in the dpmi scenario, you use that api
02:16:23 <doug16k> if you are not under a supervisor, you just go into pmode yourself
02:16:30 <unlord> yes, as I said before, I'm dangerously close to creating my own DPMI host
02:16:36 <doug16k> no
02:16:43 <geist> right. either you use DPMI to do what you want, or you drop it, or you use it to bootstrap yourself and then drop it, but either way you're gonna wanna just take ownership of the machine
02:16:51 <doug16k> you set up a gdt and do a small procedure and you are in pmode
02:16:57 <unlord> doug16k: I have code that does that
02:17:06 <geist> well then just do it
02:17:07 <unlord> it isn't integrated into my linker yet, but I have a POC that works
02:17:19 <geist> i guess to step back a bit, what are you trying to accomplish here?
02:17:39 <unlord> geist: I want to walk the ACPI tables and turn on all the cores
02:17:42 <geist> are you just inspecting the state of things and thus are okay keeping DPMI around, or are you trying to take ownership of the machine and using DOS as a loader?
02:17:55 <geist> and then what are you going to do with the cores?
02:18:02 <unlord> "compute" and then later turn them off
02:18:09 <geist> while staying in DOS?
02:18:11 <unlord> yes
02:18:11 <doug16k> geist, I believe he wants to do the absolute minimum code to run code on the APs from DOS
02:18:25 <unlord> doug16k: yep, exactly
02:18:34 <geist> okay, so you'll still need to understand how to bootstrap the APs up from scratch. are you intending to use paging on the APs?
02:19:21 <unlord> geist: I would like them all to execute in the same address space, does that answer the question?
02:19:49 <unlord> so I can do the minimum of message passing
02:20:08 <geist> probably run them in physical space without paging most likely
02:20:15 <doug16k> the paging related thing was that he has to physical map the LAPIC to access the command register
02:20:19 <ronsor> APs? Wait, is he trying to do SMP on MS-DOS?
02:20:39 <geist> possible you can also just point the APs at the same GDT as the main one
02:20:53 <geist> so that way at least you're still getting the same segments
02:21:03 <geist> but, also depends a lot on if DPMI is using paging to move things around
02:21:11 <doug16k> unlord, you don't need paging for message passing
02:21:12 <unlord> geist: let me take DPMI out of the picture
02:21:20 <geist> unclear. it may be mostly using it to just map everything 1:1 but my experience tracing it with qemu is it doesn
02:21:27 <doug16k> SMP - the memory is shared
02:21:54 <geist> okay good if you get DPMI out of the picture it's far more traditional
02:22:03 <geist> the boot cpu you get things to whatever state you want
02:22:11 <geist> paging on, paging off, GDT set up, IDT set up, etc
02:22:22 <geist> then you bootstrap each AP and they get the same thing set up
02:22:35 <geist> that's the definition of SMP basically. at that point all cpus are identical
02:22:54 <geist> they're all pointing at the same resources, and you start treating them as independent entities
02:23:21 <geist> trouble with using DPMI is it may have a bunch of hidden state that assumes a single cpu
02:23:32 <geist> so it might require digging around inside its inner data structures
02:24:09 <doug16k> oh ya, it is almost reckless to call any APIs on the APs
02:24:36 <doug16k> DOS is completely thread-unsafe code unless you go to great lengths to swap certain things
02:24:42 <geist> and even if you dont on the APs it may be that DPMI fiddles around with page tables and whatnot and since its not SMP doesn't do things like TLB shootdown
02:25:01 <geist> i know for example that emm386 uses paging to emulate expanded memory and some UMB stuff
02:25:07 <geist> i poked at the page tables with qemu
02:25:21 <geist> unlord: side note, if you use qemu you can easily dump the page tables for any cpu
02:25:29 <geist> 'info mem' and 'info tlb'
02:25:40 <unlord> geist: I've been using dosbox, virtualbox and now bochs
02:25:57 <unlord> bochs has a nice debugger, I'll try qemu
02:26:42 <unlord> this is the latest frankenstein program I have: http://paste.debian.net/1159120/
02:26:43 <bslsk05> paste.debian.net: debian Pastezone
02:27:20 <geist> yah bochs is nice, though the page table dumping stuff in qemu is really nice too
02:28:56 <geist> the big ones like virtualbox or vmware or hyperv i wouldn't recommend trying to do active development on
02:29:01 <geist> they're not as good for debugging
02:29:15 <unlord> bochs was a huge improvement
02:30:42 <geist> flip side is once you get past initial development bochs is too slow and limited
02:31:37 <ronsor> bochs is *very* slow
02:37:18 <unlord> heh, all of my stdlib code doesn't work since I cli
02:38:19 <unlord> or rather, since I mask off all but IRQ0
02:51:40 <unlord> doug16k: how do I find the LAPIC table?
02:58:35 <geist> strictly speaking it's pointed to by a variety of things
02:58:45 <geist> like ACPI in the MADT table, or the MP table
02:58:51 <geist> also says where the ioapic(s) are
02:59:02 <geist> in practice i think the LAPIC is pretty much always in exactly the same spot
02:59:11 <geist> iirc it's something like fee0.0000
03:01:08 <geist> ioapics can vary but i think the lapics (one per cpu) are pretty much always in one spot
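For reference, the local APIC's architectural default physical base is 0xFEE00000, and the current base can be read from the IA32_APIC_BASE MSR (0x1B). The sketch below covers only the pure bit manipulation: extracting the base from an MSR value that was already read, and encoding the ICR values typically written for INIT and Startup IPIs. The function names are hypothetical, and actually reading the MSR or writing the registers needs ring 0 and a mapping of the LAPIC page, which is not shown.

```c
#include <stdint.h>

#define IA32_APIC_BASE_MSR  0x1Bu                  /* MSR index holding the LAPIC base */
#define LAPIC_DEFAULT_BASE  UINT64_C(0xFEE00000)   /* architectural default */
#define LAPIC_ICR_LOW       0x300u                 /* register offsets from the base */
#define LAPIC_ICR_HIGH      0x310u

/* Physical base of the LAPIC from a raw IA32_APIC_BASE value
 * (low 12 bits are flags such as BSP and global enable). */
uint64_t lapic_base_from_msr(uint64_t msr_value)
{
    return msr_value & UINT64_C(0x000FFFFFFFFFF000);
}

/* ICR high dword: destination APIC ID in bits 31:24 (xAPIC mode). */
uint32_t icr_dest(uint8_t apic_id)
{
    return (uint32_t)apic_id << 24;
}

/* ICR low dword for an INIT IPI: delivery mode 101b, level assert. */
uint32_t icr_init(void)
{
    return 0x00004500u;
}

/* ICR low dword for a Startup IPI. The vector is the 4KB page number of
 * the real-mode entry point, so the trampoline must sit below 1MB. */
uint32_t icr_startup(uint32_t entry_phys)
{
    return 0x00004600u | ((entry_phys >> 12) & 0xFFu);
}
```

The usual sequence writes the destination to ICR high first, then the command to ICR low, since the write to ICR low is what sends the IPI.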
03:01:44 <unlord> geist: I'm installing qemu
03:21:06 <adu> I always got confused between ACPI and APIC
03:21:53 <adu> iirc, one was giant programming language, and the other was a physical chip with like 8 pins
03:27:40 <geist> yah i keep messing that up
03:28:01 <geist> a real APIC is a big chip too, i think it has like 40 pins or so
03:28:50 <geist> at least. i'd actually like to see a real one. wonder if you can find one on ebay or something
03:30:11 <unlord> < $10
06:49:51 <doug16k> ronsor, bochs is so slow it won't let you get away with slow code :D
06:50:33 <doug16k> seriously, testing in bochs has made me fix perf mistakes
06:50:35 <kingoffrance> there can be only one
06:52:27 <doug16k> when your kernel is a disaster and barely works, then screws up catastrophically, it can be easier with bochs (it tells you why it crashed, not SIGQUIT mystery), and you can do "trace on" at some point before it crashes and continue, then when it crashes work back to where it blew up
06:52:48 <doug16k> once it isn't such a trainwreck, gdb+qemu is better
06:53:53 <doug16k> trainwrecks happen because it's so much new error prone stuff at the beginning
06:58:54 <doug16k> qemu is nearly as good if you are an expert with the tracing output you can create
06:59:00 <doug16k> and way better in some ways
07:12:02 <bauen1> i recently learned about gdb's `l *<addr>` command to show me the line of every stack entry
07:12:17 <bauen1> makes debugging a lot more fun until i figure out how to do that step in kernel
07:12:43 <bauen1> still, hunting random memory corruption (or rather random memory zeroing) isn't that easy
07:13:04 <doug16k> data breakpoints
07:13:28 <doug16k> also, you can run qemu with -icount option, and it will be deterministic, every run will be identical sequence of code
07:13:41 <doug16k> if you can figure out a repro, it will happen exactly every time
07:14:07 <bauen1> oh nice
07:14:54 <doug16k> you can also use icount to simulate extremely high IRQ rate or slow cpu handling. you can simulate a very slow cpu (frequently interrupted by irqs) with -icount shift=6
07:15:27 <doug16k> makes time pass 2^6 times faster in the guest
07:15:54 <doug16k> you can make it skip over delays until the next wakeup with -icount sleep=off
07:16:43 <doug16k> if the code was going to halt and get awakened in 2 seconds, instead of 2 seconds going by it will just timewarp to the 2 seconds instantly elapsed and continue immediately
07:17:04 <doug16k> but the guest can't feel it
07:17:33 <doug16k> time advances 2 sec in their reference frame :)
07:22:42 <bauen1> hm, i'm not really sure how adding a watchpoint messed qemu up, but `qemu: fatal: Raised interrupt while not in I/O function`
07:29:40 <doug16k> try this: -accel tcg,thread=single
07:30:02 <doug16k> it's defaulting to multithreaded one now
07:30:44 <doug16k> need single for icount too iirc
07:33:03 <doug16k> icount is like bochs. each instruction is 2^icount_shift ns (roughly 2^-(30-icount_shift) seconds). without shift (the default), each instruction is 1ns, or 1GHz magic single cycle everything processor
07:34:05 <bauen1> some memcpy call is taking forever
07:34:36 <doug16k> it executes it as fast as it can, it just advances time 1ns per instruction
07:34:43 <doug16k> in the guest world
07:36:05 <doug16k> if you put sleep=off then your idle threads would cause days of guest time to elapse in minutes of real time, it just skips to the next time it would wake up and keep going full speed
07:36:24 <doug16k> I mean, -icount sleep=off
07:37:40 <doug16k> but, if some code loops for some amount of time in a spinloop, then it's awful, it has to execute N instructions where N=1 billion for each second
07:38:02 <doug16k> you can make it be less instructions per second like this: -icount shift=4
07:38:35 <doug16k> that would make it ~16M loops per second
07:38:56 <doug16k> sorry about 64M
07:39:24 <doug16k> but more shift doesn't mean more fast.
07:39:43 <doug16k> it causes more time to elapse per instruction so there are fewer instructions between timer irqs or whatever
07:39:57 <doug16k> it's a balance
07:40:57 <doug16k> 6 is really high
07:41:38 <doug16k> only code spinning on the passage of time is pessimized. good code that halts and wakes on irqs blazes through time
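The icount arithmetic above can be checked with a few lines of C. This is a sketch assuming a fixed numeric shift (not `shift=auto`), where each guest instruction advances virtual time by 2^shift ns; the function names are made up for illustration.

```c
#include <stdint.h>

/* Virtual nanoseconds charged per executed guest instruction. */
uint64_t icount_ns_per_insn(unsigned shift)
{
    return UINT64_C(1) << shift;
}

/* Instructions a spin loop must execute to burn one guest second. */
uint64_t icount_insns_per_guest_second(unsigned shift)
{
    return UINT64_C(1000000000) >> shift;
}
```

With shift=0 a one-second busy wait costs a billion instructions. With shift=4 it costs 62.5 million, which is the "about 64M" figure from the log, and with shift=6 about 15.6 million, at the price of fewer instructions executing between timer IRQs.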
07:45:29 <bauen1> whatever `gdb watch` is doing makes everything really slow
07:45:43 <doug16k> ah you probably didn't do a hardware watch
07:46:01 <doug16k> what did your watch command look like?
07:46:19 <doug16k> if you do a "complex" watch. it will just single step and check repeatedly
07:46:30 <bauen1> `watch init_file->inode->superblock->device->inode`
07:46:38 <doug16k> it will step, evaluate, step, evaluate, until the evaluated value changes
07:46:51 <doug16k> no that will be slow
07:46:55 <doug16k> probably anyway
07:47:18 <bauen1> thanks
07:47:29 <bauen1> putting in the raw address makes things _a lot_ quicker
07:47:41 <doug16k> do this: watch -l &init_file->inode->superblock->device->inode
07:47:52 <doug16k> I think
07:48:24 <doug16k> yeah you are telling it "watch such and such address" not "watch this awesome C expression have fun good luck"
07:54:34 <bauen1> thanks a lot
07:54:58 <bauen1> now i "just" need to figure out how `uintptr_t entry = 0` messes up my allocated struct ...
07:55:40 <bauen1> or rather why the struct is pointing to my stack, i don't think it should be doing that ...
07:58:19 <bauen1> oof allocating on the stack and then passing that pointer around isn't a good idea
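The bug class bauen1 describes looks roughly like the following. The struct and function names are hypothetical, and `malloc` stands in for whatever allocator the kernel actually uses. The broken variant is kept as a comment because its behavior is undefined.

```c
#include <stdint.h>
#include <stdlib.h>

struct inode {
    uint32_t number;
};

// BROKEN: tmp lives in this function's stack frame, so the returned
// pointer dangles and a later local such as `uintptr_t entry = 0`
// can overwrite the struct.
//
// struct inode *make_inode_broken(uint32_t n)
// {
//     struct inode tmp = { n };
//     return &tmp;
// }

/* Object outlives the call because it is allocated from the heap. */
struct inode *make_inode(uint32_t n)
{
    struct inode *ino = malloc(sizeof *ino);
    if (ino != NULL)
        ino->number = n;
    return ino;
}
```

A hardware watchpoint on the struct's address, as suggested above, is what exposes the overwrite: it fires at the store to the unrelated local.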
08:35:25 <yawkat> hey. im working with intel vt-d and attempting to program a dma remapping unit. it works fine with qemu, but even with a very simple all-passthrough setup i get faults of type 1 (present bit in root entry is 0) when running on real hardware. how can i debug this?
08:36:52 <yawkat> the rtaddr is 00 40 c6 00 00 00 00 00 and the root table is filled with 01 30 c6 00 00 00 00 00, so the present bit is certainly not 0...
08:46:41 <yawkat> hm when i fill the root table with 0xff it gives me fault 0xa which is root table entry has non-0 in reserved space. so the rtaddr is right, but i dont get why the hardware thinks 0130c6... would have the present bit not set
09:34:58 <yawkat> ohhh a wbinvd changes things
09:35:23 <zid> I'd have said that, but I didn't know that was the answer
09:37:41 <doug16k> yawkat, buggy PCI(e) devices can do glitchy nonsense DMA to strange addresses sometimes
09:38:24 <doug16k> usually address zero
09:38:57 <yawkat> id still not expect paging faults when passthrough is on :)
09:39:24 <doug16k> yeah just keep it in mind, since you are enabling iommu
09:39:52 <yawkat> will do
09:40:39 <doug16k> you should never have to wbinvd
09:41:35 <doug16k> everything snoops your caches and is as coherent as you are, unless it is a sophisticated device that has been programmed to send special no-snoop PCIe packets
09:43:19 <doug16k> the only way to even get to the ram is through the cpu's pcie complex - the memory controller is in the cpu
09:44:16 <yawkat> doug16k: i think the DMA memory controller may be separate. these are the structures that define page mappings, after all
09:44:45 <doug16k> separate as in, it mind reads the ram contents without going through the pcie complex?
09:45:04 <yawkat> lots of things point to the DMA memory controller being somewhat separate, such as the separate SLAT and the fact that faults are asynchronous
09:45:14 <doug16k> what "dma controller"
09:45:31 <yawkat> the controller on the northbridge / in the cpu that is responsible for processing dma requests
09:45:45 <doug16k> you mean the pcie complex
09:46:03 <yawkat> if thats what it is, sure
09:46:07 <doug16k> it's not like there is some chip in there and you program it for dma
09:46:24 <yawkat> it certainly looks that way from software :)
09:46:46 <doug16k> pci devices each request bus master. eventually granted, they do their thing
09:47:08 <doug16k> on pcie, it's all point to point, so the request is granted almost immediately
09:47:36 <doug16k> it snoops the dma and will provide coherent data from cache
09:48:07 <yawkat> im not talking about pcie devices. i am talking about the DMAR unit accessing its programmed page tables
09:48:18 <doug16k> yeah and how do you think that accesses the ram
09:48:50 <yawkat> from everything i can see so far, directly. without going through the main chip memory unit (with its own EPT and stuff)
09:49:30 <yawkat> i mean they obviously dont have two memory controllers sitting on the ram data lines, but they are almost separate, from what i have seen dealing with them so far
09:49:49 <doug16k> it gets the data the same way your sound card reads audio from a ring buffer
09:50:30 <doug16k> what it has a special snowflake connection to the memory controller that has to be arbitrated? what's the point?
09:51:02 <doug16k> to make it what, totally incoherent with the OS?
09:51:09 <doug16k> have to map the EPT uncached eh? no
09:51:25 <yawkat> well it doesnt respect EPT at all
09:51:39 <yawkat> just physical addresses everywhere
09:52:18 <doug16k> it has to read your ram to read the dma remapping page tables
09:52:24 <yawkat> yes
09:52:29 <doug16k> ya that
09:52:34 <doug16k> that snoops your cache obviously
09:52:44 <doug16k> everything else does
09:52:54 <yawkat> well evidently not, since wbinvd changes behavior
09:53:12 <doug16k> ah you think that is conclusive
09:53:16 <doug16k> lol
09:53:54 <doug16k> I wonder how many wbinvd I need to fix racy code
09:54:12 <yawkat> i only have one thread :)
09:54:59 <doug16k> you never need wbinvd
09:55:31 <doug16k> except perhaps when you are offlining a cpu and it is going to have its power cut
09:55:42 <doug16k> or otherwise won't snoop anymore
10:08:05 <opios> for pae kernel , do i need to use uint64_t for pointers and int variables that hold an address right?
10:12:24 <doug16k> opios, right
10:12:28 <doug16k> physical address
10:12:35 <doug16k> pointers are still 32 bits
10:13:09 <doug16k> so your physical allocator and page table update code can feel it, everything else is still 32 bits. 4GB mapped in at any one time
10:13:35 <doug16k> just physical addresses are 64 bit
10:15:12 <doug16k> technically, the architectural limit on physical addresses is 52 bits (16 times more than the entire x86_64 address space), so you are safe to use bits 63:52 for something if you must
10:15:29 <opios> hmm
10:15:38 <doug16k> not in the tables I mean
10:15:51 <doug16k> say some hacky lock free aba avoidance thing or whatever
10:16:13 <opios> yeah i see what you mean
10:17:07 <opios> i made mistake , when i tried to redo my physical allocator to support pae i declared everything as int32 :(
10:17:23 <doug16k> make a physaddr_t or something like that
10:17:36 <opios> i hope changing to int64 be not so pita
10:17:49 <doug16k> it should be effortless
10:17:54 <opios> doug16k: yeah good idea that will avoid future confusion
10:18:24 <doug16k> the change-the-type part I mean
10:18:35 <doug16k> tedious maybe
10:19:14 <opios> yep i will start by the init function if that works fine after the changes then everything else will be easy
10:19:31 <doug16k> if a physical address is part of the expression the promotion rules will cause things to be promoted and mostly just work
10:19:36 <opios> init function is where i read mmap from bootloader and allocate bitmaps for each region etc...
10:19:37 <doug16k> sometimes you have to help it though
10:20:54 <doug16k> if you did 1 << something, then you'd have to help it out with (physaddr_t)1 << something, or whatever
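A minimal sketch of the `physaddr_t` idea, assuming 4KB pages; the helper names are illustrative and not from opios's kernel. The casts matter because in a 32-bit build `1 << n` and `frame << 12` are evaluated in 32 bits unless one operand is already 64 bits wide.

```c
#include <stdint.h>

typedef uint64_t physaddr_t;   /* physical addresses are wider than pointers under PAE */

#define PAGE_SHIFT 12

/* Widen BEFORE shifting; `frame << PAGE_SHIFT` alone would drop the high bits. */
physaddr_t frame_to_phys(uint32_t frame)
{
    return (physaddr_t)frame << PAGE_SHIFT;
}

/* A 32-bit frame number covers physical addresses up to 2^44 bytes,
 * which is assumed to be enough for a 32-bit PAE kernel. */
uint32_t phys_to_frame(physaddr_t addr)
{
    return (uint32_t)(addr >> PAGE_SHIFT);
}

/* Same fix for single-bit masks: (physaddr_t)1 << n, not 1 << n. */
physaddr_t phys_bit(unsigned n)
{
    return (physaddr_t)1 << n;
}
```

For example, frame 0x100000 is the first page at 4GB, an address that cannot be represented if the shift is done in 32 bits.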
10:21:19 <opios> yeah all my bitmap set/read/clear functions
10:21:37 <doug16k> why would they be affected?
10:22:11 <doug16k> ah
10:22:27 <doug16k> you need to accept a 64 bit parameter when freeing or whatever
10:22:36 <opios> void clear_bita(uint32_t *bitmap, uint32_t addr)
10:22:36 <opios> {
10:22:36 <opios> bitmap[((addr >>= 12) >> 5) & 0x7FFF] &= ~(1 << (addr % 32));
10:22:36 <opios> }
10:23:26 <doug16k> % 32?
10:23:44 <opios> i have a bitmap of uint32
10:27:27 <doug16k> why not & 0x1f like the left hand side is doing
10:28:19 <doug16k> if you had a personal robot would you tell it to shoot you in the face just because you know it will optimize it into being really nice
10:28:49 <opios> hmm
10:28:58 <opios> how do you think i must change that then?
10:29:05 <doug16k> it's fine
10:29:23 <opios> i mean to avoid being shot by my robot :p
10:29:24 <doug16k> I just hate to see modulus ever used when they are praying the optimizer won't really do a divide
10:30:16 <doug16k> when I modulus, it's really modulus, non constant divisor
10:31:08 <doug16k> sometimes you need a log2 representation of a value
10:31:31 <doug16k> that allows you to do it with ultra efficient math and use masks and shifts
10:32:06 <doug16k> instead of dividing by whatever, you shift it right log2(whatever)
10:32:12 <doug16k> for nonconstant whatever
10:35:11 <doug16k> hey, why even have that 0x7FFF
10:35:46 <doug16k> want to hide out of bounds accesses by making them smash something at addr mod 32768?
10:37:27 <doug16k> why are you modifying addr in the middle of the expression
10:37:30 <doug16k> you crazy?
10:38:16 <doug16k> you want it shifted ahead of time then evaluate it? then say that
10:38:33 <doug16k> does the right hand side get the >> 12 value or no?
10:41:10 <doug16k> you should shift it ahead of time, and assert that it is within bounds, then do the &= thing
10:41:52 <doug16k> and don't do the >>= in the assert!
10:43:03 <doug16k> asserts could be compiled away to nothingness
10:50:27 <doug16k> % 32 could be replaced with & ~-32
10:52:18 <doug16k> it will be promoted as large as necessary before it negates it
10:54:27 <doug16k> -32 is all ones except 5 zero LSBs
10:54:56 <doug16k> tilde that is 5 ones at LSBs
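Putting doug16k's suggestions together, a reworked `clear_bita` might look like this. It is a sketch, not opios's actual code: the `words` parameter (bitmap length in 32-bit words) is an added assumption so the bounds can be asserted, and `addr` is taken to be a byte address relative to the start of the region the bitmap covers. The original expression both modified `addr` with `>>=` and read it on the other side of `&=`, and C leaves the order of those two operand evaluations unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t physaddr_t;

void clear_bita(uint32_t *bitmap, size_t words, physaddr_t addr)
{
    /* Shift ahead of time, in separate statements with no side effects later. */
    physaddr_t frame = addr >> 12;   /* 4KB page frame number */
    physaddr_t word  = frame >> 5;   /* 32 frames per bitmap word */

    /* Catch out-of-bounds accesses instead of masking them away. The assert
     * has no side effects, so compiling it out does not change behavior. */
    assert(word < words);

    /* frame & 31 selects the bit; 31 is the same value as ~-32. */
    bitmap[word] &= ~(UINT32_C(1) << (frame & 31));
}
```

Clearing the bit for address 0x21000 touches frame 33, which is bit 1 of word 1.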
11:31:58 <opios> doug16k> pointers are still 32 bits
11:32:39 <opios> what if when i have int64 that holds an address and now i want to cast that to a pointer?
11:39:38 <doug16k> you can't cast it to a pointer, that is a nonsensical operation
11:40:12 <doug16k> if you want to access the memory, you update a page table entry with that physical address, then access it at the page frame address in address space
11:45:34 <opios> uint32_t *bt_ptr = (uint32_t *)current_frame;
11:47:43 <opios> at this stage i dont have paging
11:48:34 <opios> i have a pointer to where the bitmap is allocated, that might be in a range higher than 32bit right?
11:48:51 <opios> so that pointer has to be int64?
11:51:04 <opios> though i allocate my bitmaps after the kernel_end and my kernel is mapped at 0x100000 so that address should never be higher than 32bit
11:58:50 <doug16k> do you understand that pointers are still 32 bit in pae?
11:59:51 <opios> hmm how thats possible? how you would address the addresses bigger than 32bit?
11:59:58 <doug16k> no matter what memory those bitmaps are in, they are accessed through one or more pages within a 32 bit address space
12:00:41 <doug16k> the bitmap's physical address could be above 4GB, but you can't access a physical address with an instruction
12:00:49 <doug16k> you can ONLY access virtual addresses
12:01:06 <opios> yes i think i got it now
12:01:24 <doug16k> to make your bitmap be at somewhere above 4GB, you would write a physical address above 4GB to somewhere in page tables
12:01:54 <opios> yep
12:02:03 <doug16k> but the page frame in the address space is below 4GB, the phys mem behind the page frame is maybe above 4GB
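The split between 64-bit physical addresses and 32-bit pointers shows up in how a PAE page table entry is built. A sketch, assuming 4KB pages and only the present and writable flags; the names are illustrative:

```c
#include <stdint.h>

typedef uint64_t physaddr_t;
typedef uint64_t pte_t;        /* PAE entries are 64 bits wide */

#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_WRITABLE  (UINT64_C(1) << 1)
#define PTE_ADDR_MASK UINT64_C(0x000FFFFFFFFFF000)   /* bits 51:12 */

/* The physical address goes INTO the entry; it is never dereferenced. */
pte_t make_pte(physaddr_t phys, uint64_t flags)
{
    return (phys & PTE_ADDR_MASK) | flags;
}

/* Recover the physical page address stored in an entry. */
physaddr_t pte_phys(pte_t pte)
{
    return pte & PTE_ADDR_MASK;
}
```

After storing such an entry in the slot for some 32-bit virtual page (and invalidating that page's TLB entry), code reads the memory through an ordinary 32-bit pointer to that virtual page, even when the physical page behind it is above 4GB.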
12:02:22 <opios> thats why i allocate them after the kernel_end, i just got a bit confused
12:02:47 <opios> let me go change my code and see what else will confuse me
12:02:56 <doug16k> yeah it is confusing
12:04:53 <doug16k> normally in userspace "addresses" are "pointers". not anymore :)
12:05:16 <doug16k> now you create the illusion
12:06:05 <opios> yeah
12:07:33 <immibis> ~-32 is a strange way to write 31
12:08:23 <doug16k> what if you have a variable, called "alignment"
12:08:37 <doug16k> minus one it?
12:08:50 <doug16k> I guess
12:11:07 <doug16k> the precedence of ~- is convenient. no parentheses needed
12:16:03 <doug16k> did you know that 32 bit pae page table walks only access two levels of the page tables?
12:16:33 <doug16k> the one CR3 points to only has 4 entries. the cpu just copies them into the cpu and never accesses that page again until it loads a new cr3
12:18:33 <opios> my page table has 64bit entries
12:18:57 <doug16k> it's 32 bit pae
12:19:07 <doug16k> 32 bits of virtual address space
12:19:09 <opios> yes
12:19:24 <opios> by two level of page table you mean like nested page table?
12:20:07 <immibis> have you used page tables? they are nested
12:20:14 <doug16k> no I mean how normal i386 paging is two levels (PD, PT), i686 pae is three levels (PDPT, PD, PT), and x86_64 is four (PML4, PDPT, PD, PT)
12:20:47 <doug16k> nested page tables means something else
12:21:06 <doug16k> for hardware virtualization
12:21:29 <opios> yes ok i get what you mean now
12:21:36 <doug16k> but yes the page tables are a tree
12:23:07 <doug16k> in i686 pae, when you load cr3, it proactively goes and loads the 4 PD pointers from the PDPT, then when it does a walk, it starts at the correct PD page, without accessing PDPT one
12:23:50 <doug16k> so it isn't actually more steps to do a page table walk
12:25:30 <opios> when it does reference PDPT then?
12:26:54 <doug16k> CR3 holds physical address of 4KB aligned PDPT page that has four entries, pointing to the 4 PD pages that each describe 1GB
12:28:08 <doug16k> the 4 PD pages have 512 slots, each one pointing to a page table that maps 2MB
12:29:24 <doug16k> 512 * 4KB == 2MB
12:31:50 <doug16k> page tables have 512 slots, each one pointing to a 4KB physical page
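The layout just described (4 PDPT entries, 512-slot PDs, 512-slot PTs, 4KB pages) corresponds to a 2 + 9 + 9 + 12 bit split of a 32-bit virtual address. A small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Bits 31:30 pick one of the 4 PDPT entries (1GB each). */
unsigned pae_pdpt_index(uint32_t va)
{
    return va >> 30;
}

/* Bits 29:21 pick one of 512 PD entries (2MB each). */
unsigned pae_pd_index(uint32_t va)
{
    return (va >> 21) & 0x1FFu;
}

/* Bits 20:12 pick one of 512 PT entries (4KB each). */
unsigned pae_pt_index(uint32_t va)
{
    return (va >> 12) & 0x1FFu;
}

/* Bits 11:0 are the byte offset inside the page. */
unsigned pae_page_offset(uint32_t va)
{
    return va & 0xFFFu;
}
```

For 0xC0201234 this gives PDPT entry 3, PD entry 1, PT entry 1, offset 0x234.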
12:35:13 <doug16k> opios, only when you load cr3
12:35:41 <doug16k> it assumes that highest level page never changes from then on
12:36:46 <opios> aha
12:37:11 <doug16k> easily accommodated. if you do 3GB user / 1GB kernel, then you just create the kernel PD and use it in every process's PDPT. user process gets 1st 3 entries of PDPT
12:37:27 <doug16k> kernel gets 4th entry
12:37:37 <doug16k> so it never will change
12:38:00 <doug16k> same with a user process. if you have their 3 PD pages stay put (why not?) then you never change them either
12:44:55 <doug16k> just proactively create your kernel PD, put its physical address somewhere (because each process will be putting a copy of it in entry [3] of their newly created PDPT page). from now on, the 1GB kernel address space is controlled by that PD page, consisting of 512 entries that each optionally point to a PT that maps 2MB
12:45:23 <doug16k> then make a PDPT with the physaddr of the kernel PD in entry [3], and point CR3 at that
12:45:51 <doug16k> 0xC0000000-0xFFFFFFFF will be controlled by the kernel PD
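The shared kernel PD arrangement can be sketched as follows. The type and function names are hypothetical, and allocation of the PD pages is assumed to happen elsewhere. In 32-bit PAE mode a PDPT entry carries little beyond the present bit and the PD's physical address, and the PDPT itself must sit below 4GB because CR3 is a 32-bit register.

```c
#include <stdint.h>

typedef uint64_t physaddr_t;

#define PDPTE_PRESENT UINT64_C(1)

/* The four top-level entries the CPU loads when CR3 is written. */
struct pdpt {
    uint64_t entry[4];
};

/* Entries 0..2 are per-process user PDs (0x00000000-0xBFFFFFFF);
 * entry 3 is the one kernel PD shared by every process. */
void pdpt_init(struct pdpt *pdpt, const physaddr_t user_pd[3], physaddr_t kernel_pd)
{
    for (int i = 0; i < 3; i++)
        pdpt->entry[i] = user_pd[i] | PDPTE_PRESENT;
    pdpt->entry[3] = kernel_pd | PDPTE_PRESENT;
}
```

Because every process's entry [3] points at the same PD page, a kernel mapping added through that PD becomes visible in all address spaces without touching any PDPT.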
12:49:00 <opios> copied all you said, im sure what you said was answer to my future questions hahaha
12:49:14 <doug16k> yes!
12:49:19 <opios> for now im trying to change my int32 to int64 and debug and make it work hahaha
12:57:38 <doug16k> no physaddr_t?
12:57:52 <opios> yeah that is what i meant
12:57:59 <doug16k> or paddr_t. that's even shorter than uint32_t
12:58:45 <doug16k> phys is a bit clearer
12:59:14 <opios> ep
12:59:21 <opios> yep
13:19:04 <b3n_> exirt
15:47:16 <Bitweasil> doug16k, neat, I didn't realize it cached those 4 entries like that.
15:47:49 <Bitweasil> But now we've got 5 level stuff coming out, woo?
16:11:45 <geist> yah page table style systems usually have a page table cache like that
16:11:59 <geist> it's invisible on intel x86, since it gets thrown out when you do an invlpg
16:12:23 <geist> AMD x86 has a feature bit that lets you take more finer grained control of the page table cache, but i dunno if any OS actually uses it
16:22:09 <opios> where is gog? :/
16:36:11 <geist> i dunno
16:51:47 <vendu> aha
16:51:50 <vendu> old anthrax
16:51:55 <vendu> among the living o/~
16:55:20 <opios> ?
17:00:48 <siberianascii> opios: idk
17:01:20 <siberianascii> i also miss him even though i shouldn't
17:08:38 <opios> why you shouldnt?
17:08:47 <opios> he is very smart and very nice person
17:09:08 <siberianascii> because he betrayed me more than once
17:09:32 <opios> what do you mean?
17:15:45 <siberianascii> + he putted me on ignore
17:24:09 <siberianascii> put ***
17:50:18 <geist> who gog?
17:50:54 <geist> oh yeah they put you on ignore because you were being annoying siberianascii
17:51:08 <geist> when you start mewing and petting people on irc that dont want it that tends to annoy people
17:52:38 <siberianascii> who
17:52:41 <siberianascii> 's they ?
17:53:00 <siberianascii> he and his cat ?
17:55:46 <adu> I don't remember any of that
17:58:21 * geist shrugs
17:58:26 <geist> maybe i'm misremembering it
17:58:49 <froggey> close enough
18:01:40 <siberianascii> anyway .. whoever feel the need to put me on ignore can do it .. IDGAF
18:02:18 <geist> remember i've kicked you a few times already, and you keep coming back
18:02:29 <geist> i keep giving you second chances, but you're on extremely thin ice
18:25:46 <siberianascii> geist: i dont know what you want from me :D
18:25:57 <siberianascii> i just told the people that they can do what they want
18:28:35 <geist> kay
18:32:37 <j`ey> in LK the virtual address is set to KERNEL_BASE and the load address to MEM_BASE, https://github.com/littlekernel/lk/blob/master/arch/arm64/system-onesegment.ld#L7
18:32:38 <bslsk05> github.com: lk/system-onesegment.ld at master · littlekernel/lk · GitHub
18:32:49 <j`ey> what address will this load https://github.com/littlekernel/lk/blob/master/arch/arm64/start.S#L50 ?
18:32:50 <bslsk05> github.com: lk/start.S at master · littlekernel/lk · GitHub
18:36:02 <geist> lets see
18:36:19 <j`ey> I think it's the physical, non-virtual address? The pagetable addresses must be physical
18:36:25 <geist> correct
18:36:38 <geist> it's subtle, but adrp is PC relative address calculation
18:36:50 <geist> so it always works relative to wherever the PC is at that point
18:37:01 <j`ey> ah yeah
18:37:13 <geist> that adrp + add sequence is computing the physical address of the symbol within the binary
18:37:21 <geist> which iirc is basically in bss
18:37:32 <geist> (though there's a hack there to keep it from getting zeroed out later)
18:37:33 <j`ey> some prebss thing I think
18:37:41 <geist> exactly
18:37:43 <j`ey> https://github.com/littlekernel/lk/blob/master/arch/arm64/mmu.c#L31
18:37:45 <bslsk05> github.com: lk/mmu.c at master · littlekernel/lk · GitHub
18:37:47 <siberianascii> someone said hack ?
18:37:54 <siberianascii> where is the hack ?
18:37:56 <geist> Pacific Northwest Title & Escrow, Silverdale
18:37:57 <geist> 2021 NW Myhre Rd Ste. 300
18:38:09 <j`ey> I just found my old (1 year old) arm64 stuff, that Im trying to mess around with
18:38:11 <geist> err, well that was a mispaste, but yah it's at mmu.c
18:38:30 <geist> yah the prebss thing is that. it's a linker script thing that puts it outside of the zeroed section
18:39:09 <siberianascii> geist: send me the link now and i bless you on the 6am news with only my underwear's on
18:39:13 <geist> j`ey: i *think* there may be a pseudo instruction that does the adrp/add thing
18:39:26 <geist> but i did it manually just to make it clear
18:40:00 <geist> in case you see it again the :lo12:<symbol> thing is an arm64 thing that resolves to the bottom 12 bits of the address of a symbol
18:40:14 <j`ey> yeah
18:40:18 <j`ey> adrp is 4kb aligned
18:40:21 <geist> yah
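The adrp + add :lo12: behaviour described above can be modelled in plain C. This is a sketch of the arithmetic only, not LK code: all function names are hypothetical, and it assumes the link base and load base differ by a multiple of 4 KiB, since adrp works in whole pages.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MASK 0xfffULL

/* Page delta the linker encodes into adrp, computed from link-time addresses.
 * Unsigned wraparound stands in for the signed immediate. */
static uint64_t adrp_page_delta(uint64_t insn_link, uint64_t sym_link) {
    return (sym_link & ~PAGE_MASK) - (insn_link & ~PAGE_MASK);
}

/* Value left in the register after adrp + add :lo12:, given the PC the
 * instruction actually runs at. */
static uint64_t adrp_add(uint64_t runtime_pc, uint64_t page_delta,
                         uint64_t sym_link) {
    uint64_t page = (runtime_pc & ~PAGE_MASK) + page_delta; /* adrp */
    return page + (sym_link & PAGE_MASK);                   /* add :lo12: */
}

/* Address the sequence yields when an image linked at link_base is
 * executing from load_base instead. */
static uint64_t symbol_addr_at_runtime(uint64_t link_base, uint64_t load_base,
                                       uint64_t insn_link, uint64_t sym_link) {
    /* adrp only stays correct if the relocation preserves page offsets. */
    assert(((link_base - load_base) & PAGE_MASK) == 0);
    uint64_t runtime_pc = insn_link - link_base + load_base;
    return adrp_add(runtime_pc, adrp_page_delta(insn_link, sym_link), sym_link);
}
```

With an image linked at 0xffff000000000000 but executing from 0x40000000, a symbol linked at offset 0x8123 comes out as 0x40008123: the physical address, which is what a page table base needs before the MMU is on.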
18:40:27 * siberianascii has lost him... geist is on -vvv
18:40:46 <geist> siberianascii: *this* is why i've kicked you in the past
18:40:48 <j`ey> I'm basically seeing how quickly I can jump out of assembly
18:40:49 <geist> you're being mega annoying
18:41:05 <siberianascii> ok ok ... chill bro
18:41:06 <geist> j`ey: sure. you can also not enable paging, but then you run with the data cache off
18:41:35 <j`ey> geist: well I want vmm too
18:41:44 <geist> nice thing on arm64 is 1GB pages are guaranteed to exist so you can pretty quickly identity map a ton of crap
18:41:52 <geist> ah sure thing
18:42:26 <geist> so i think basically what that code is doing is mapping the first N GB of ram at the base of the kernel, so that the kernel can continue to run where it was mapped, but bounced up high
18:42:49 <geist> it's a bit more complicated than what i did on the riscv port or whatnot, because it has this more flexible mmu_initial_mapping table, which is filled in in platform/* land
18:43:01 <geist> basically lets the platform more dynamically control how the initial kernel address space is set up
18:43:15 <geist> mostly because there are ARM platforms with wonky memory maps
18:43:24 <j`ey> https://github.com/littlekernel/lk/blob/89cdb26d5b27f8d494761f3875a94f72607e52e2/platform/qemu-virt-arm/platform.c#L36
18:43:26 <bslsk05> github.com: lk/platform.c at 89cdb26d5b27f8d494761f3875a94f72607e52e2 · littlekernel/lk · GitHub
18:43:58 <geist> exactly, that's the simple one, and actually has an offset built into it, since the qemu virt machine starts DRAM at 0x4000.0000 (1GB)
18:44:26 <geist> so that's basically mapping 0x4000.0000 (MEMORY_BASE_PHYS) to 0xffff.0000.0000.0000 (KERNEL_BASE)
18:44:50 <geist> for MEMORY_APERTURE_SIZE which is iirc 30GB for whatever reason
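The linear mapping described above boils down to adding or subtracting a constant offset. A minimal sketch, using the two addresses quoted in the chat; the helper names are hypothetical, and the aperture size is passed in as a parameter because its exact value is only recalled from memory above.

```c
#include <assert.h>
#include <stdint.h>

#define MEMORY_BASE_PHYS 0x40000000ULL         /* qemu virt DRAM base, per the chat */
#define KERNEL_BASE      0xffff000000000000ULL /* kernel virtual base, per the chat */

/* Translate a physical DRAM address into the kernel's linear mapping. */
static uint64_t phys_to_kernel_virt(uint64_t pa, uint64_t aperture_size) {
    assert(pa >= MEMORY_BASE_PHYS && pa - MEMORY_BASE_PHYS < aperture_size);
    return pa - MEMORY_BASE_PHYS + KERNEL_BASE;
}

/* Inverse: recover the physical address behind a linear-map virtual address. */
static uint64_t kernel_virt_to_phys(uint64_t va, uint64_t aperture_size) {
    assert(va >= KERNEL_BASE && va - KERNEL_BASE < aperture_size);
    return va - KERNEL_BASE + MEMORY_BASE_PHYS;
}
```

Because the offset is built into the mapping, the first byte of DRAM at 0x4000.0000 lands exactly at KERNEL_BASE rather than 1GB above it.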
18:47:05 <j`ey> so when I look at my objdump output I see: adrp x0, 80008000 <early_stack_bottom+0x3ac0>, where my KERNEL_BASE would be 0x800.. and BASE_PHYS 0x400..
18:47:09 <j`ey> is that right then?
18:47:38 <geist> well, is that where you want to map the kernel?
18:47:56 <geist> the objdump stuff will be filling in addresses assuming the code is running at the virtual address its linked to run at
18:47:56 <j`ey> I mean that I want the physical address for early_stack_bottom
18:48:05 <geist> but if it's running at a different physical address then it'll compute the right thing
18:48:08 <j`ey> ok
18:48:12 <geist> and it wont match up with what objdump thinks
18:48:51 <siberianascii> j`ey: that's our boy geist
18:48:52 <geist> so yeah objdump will appear 'wrong' when dealing with code that's running from addresses that dont line up with virtual
18:51:55 <siberianascii> geist: fuck objdump.. we are following your lead
18:53:20 <geist> note that on arm64 you probably want to use something like 0xffff.0000.0000.0000 or 0xffff.ffff.ffff.0000 to run the kernel at, since later on that'll give you two separate translation tables
18:53:30 <geist> ie, two separate cr3s
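As a rough illustration of the "two separate cr3s" point: on arm64 the upper bits of a virtual address select which translation table base register is used. The sketch below assumes a 48-bit virtual address configuration and ignores top-byte-ignore; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum ttbr_sel { USES_TTBR0, USES_TTBR1, VA_FAULT };

/* Assumption: 48-bit VAs in both halves, so bits 63:48 must be all zeros
 * (low half, TTBR0_EL1) or all ones (high half, TTBR1_EL1). */
static enum ttbr_sel ttbr_for_va(uint64_t va) {
    uint64_t top = va >> 48;
    if (top == 0x0000) return USES_TTBR0; /* typically user space */
    if (top == 0xffff) return USES_TTBR1; /* typically the kernel */
    return VA_FAULT;                      /* hole between the two halves */
}
```

A kernel linked at 0xffff.0000.0000.0000 therefore lives entirely behind TTBR1, leaving TTBR0 free to be switched per process.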
18:53:44 <siberianascii> objgeist
18:54:06 <geist> it's been a while since i thought about it, but there's a mcmodel=kernel thing i think for arm64 that likes to run in negative 2GB space i think
18:54:22 <geist> or 4GB space. one of those sign extension things
18:54:38 <geist> but ARM64 code is very position independent by default, so i think it doesn't get you very much at all
18:55:08 <j`ey> I gotta look up how to read the exception stuff again
18:57:26 <siberianascii> geist: i dont want you to put me on ignore i definitely gonna need you one day...
18:58:35 <siberianascii> wait ... am i already on ignore ?
19:08:26 <j`ey> Im not sure if it's because I gave the outer VM less memory, or because I changed the load address, but qemu takes longer to start now
19:11:56 <geist> interesting dunno
19:12:05 <geist> this is an arm64 qemu instance on a x86 host i assume?
19:12:11 <j`ey> yah
19:12:28 <j`ey> well arm64 qemu inside x86 virtual box inside macOS host
19:13:26 <geist> ah, well i guess it could be the memory allocation but it should start pretty quick
19:16:40 <j`ey> I think it's related to my bss clearing routine too, no worries I'll look into it
19:17:45 <geist> yes! i've been hosed by bss zeroing later too
19:18:09 <geist> another alternative is to zero the bss prior to setting up any mmu. you can zero in physical address space
19:18:27 <geist> only downside there is on real hardware you're running without the mmu enabled so the data cache isn't active
19:18:29 <j`ey> that's what Im currently doing
19:18:30 <geist> which is why i do it later
19:19:18 <geist> so might be something you want to fiddle with later, but it'll probably just be a few ms on real hardware anyway, depending on how big your bss is
19:21:38 <j`ey> well I was doing str xzr, [..], and then sub #1 :D so that's probably why it took quite a while
19:22:02 <j`ey> but im not sure why changing the load at address made that suddenly show up
19:22:20 <geist> ah possibly the unaligned emulation is slow?
19:22:29 <geist> but yah you'd also do 8 times as much work
19:22:48 <geist> side note if you write your bss code that way make sure you align the bss_start and _end symbols on 8 byte boundaries
19:23:00 <geist> i've also been burned by that where you end up in some builds where the loop runs forever
19:23:14 <geist> or messes up a few bytes on one side or another
19:24:20 <j`ey> I think it was meant to be 16 byte aligned in my code, but I think I should swap these two lines: __bss_start = .; . = ALIGN(16);
19:24:49 <geist> yah
19:25:22 <geist> and if you do it on __bss_end too your loop can be a simple cmp of the end address
19:26:05 <geist> or i guess a dec and bnz
19:26:57 <j`ey> this is it currently https://pasta.cx/t.S
19:26:58 <bslsk05> pasta.cx: .macro clear_bss adrp x1, __bss_start add x1, x1, #:lo12:__bss_start ldr x1, [x1] ...
19:28:36 <j`ey> (I'll eventually make it write some random non 0 byte, and check the memory with qemu)
19:28:40 <geist> and __bss_size is a symbol with the subtract already in it?
19:28:58 <j`ey> __bss_size = SIZEOF(.bss);
19:29:20 <geist> ah be careful there in case the .bss isn't a multiple of 8
19:29:29 <geist> ie, if the start and end of it aren't 8 aligned
19:29:43 <geist> but if you put those ALIGNs inside the bss then you're cool
19:29:53 <j`ey> just added one to the end too
19:29:55 <geist> if you do it on 16 you can use stp xzr, xzr, as well
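The alignment advice above can be sketched in C rather than assembly. This is an illustrative stand-in for a clear_bss loop, not j`ey's actual macro: `align_up` and `clear_bss_words` are hypothetical names, and the start and end pointers play the role of the __bss_start and __bss_end linker symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two), like ALIGN(a) in ld. */
static uintptr_t align_up(uintptr_t x, uintptr_t a) {
    return (x + a - 1) & ~(a - 1);
}

/* Zero [start, end) one 64-bit word at a time, comparing against the end
 * address. Both bounds must be 8-byte aligned: with byte addresses and an
 * 8-byte stride, an unaligned end is either stepped over (the loop never
 * terminates) or a few bytes on one side get missed or clobbered. */
static void clear_bss_words(uint64_t *start, uint64_t *end) {
    assert(((uintptr_t)start & 7) == 0);
    assert(((uintptr_t)end & 7) == 0);
    for (uint64_t *p = start; p != end; ++p) {
        *p = 0; /* the C analogue of str xzr, [p], #8 */
    }
}
```

Filling the region and a guard word on each side with a non-zero pattern first, then checking that only the interior was zeroed, catches the off-by-a-few-bytes cases that qemu's already-zero memory would otherwise hide.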
19:30:43 <j`ey> gonna have a nice souped up clear_bss routine, but nothing else working :P
19:33:38 <geist> haha totally
19:33:50 <j`ey> It prints out a lot of stuff, but I have no idea what's really working or not yet
19:33:53 <geist> plus on qemu it'll already be full of zeros, so it can be totally broken and you won't know
19:34:10 <geist> yah this is where i start doing the whole b . trick and then inspecting the state of the cpu to make sure its sound
19:34:32 <j`ey> yeah, that's why I said I'd replace xzr with #0xsomethingrandom later
19:35:04 <j`ey> https://pasta.cx/r.txt
19:35:04 <bslsk05> pasta.cx: build@debian:~$ qemu-system-aarch64 -kernel pkg-build/aarch64/release/rkernel -M virt -cpu cortex-a5...
19:39:11 <geist> oh cool, well that seems to be correct
19:40:03 <j`ey> all I do before jumping out of asm is clear bss and set a small stack pointer, could probably clear bss in non-asm too