channel logs for 2004 - 2010 are archived at http://tunes.org/~nef/logs/old/ (can't be searched)

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present


http://bespin.org/~qz/search/?view=1&c=osdev&y=19&m=2&d=18

Monday, 18 February 2019

12:08:56 <nyc> ppc64 hcall written by using hexdumps of a hello world string to generate immediates to load into the hcall argument registers.
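The hexdump trick nyc describes can be sketched as below. This is only an illustration: the helper name `string_to_immediates`, the two-register layout, and the big-endian packing are assumptions, not details taken from nyc's code.

```python
import struct

def string_to_immediates(text: bytes, regs: int = 2) -> list:
    """Pad text to regs*8 bytes and split it into big-endian 64-bit words."""
    size = regs * 8
    if len(text) > size:
        raise ValueError("string does not fit in the argument registers")
    padded = text.ljust(size, b"\0")
    return list(struct.unpack(f">{regs}Q", padded))

message = b"Hello, world!\n"
immediates = string_to_immediates(message)
for i, imm in enumerate(immediates):
    # Each value could be pasted into the assembly as a load-immediate operand.
    print(f"arg{i} = 0x{imm:016x}")
```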
12:12:27 <doug16k> graphitemaster, but surely the mouse button down event has coordinates associated with it? why not lookup the button from the mdown event?
12:13:22 <doug16k> it's as though they maintain a "last mousemove position" then check it in mousedown
12:14:12 <graphitemaster> yeah that's the easiest
12:15:52 <doug16k> I'm considering tagging it the other way around too. each native keyboard event has the mouse position at that moment in there :D
12:17:18 <doug16k> scared though because how do you put multitouch in that
12:17:51 <doug16k> variable size = gross
12:25:50 <nyc> ppc64 stack set up, time for RISC-V.
12:26:47 <doug16k> ah, yeah I can imagine game code originally having to read the controller and update and maintain the mouse position itself. if you take code like that and invert it so you are just told the mouse position at mouse events, you might not end up updating that internal mouse position at every place needed
12:26:57 <jmp9> ok i'm trying to jump into 64 bit segment
12:26:58 <jmp9> jmp 0x08:_kernel_64
12:27:02 <jmp9> and it crashes
12:30:55 <doug16k> jmp9, qemu?
12:31:01 <jmp9> yes
12:31:12 <doug16k> run it with this option: -d int
12:31:45 <doug16k> it will tell you about the interrupts, one of the last few will be your crash, then probably double fault then halt
12:32:15 <jmp9> SMM: enter
12:32:21 <doug16k> make sure you already have the -no-shutdown -no-reboot options
12:32:22 <jmp9> system management mode???
12:32:27 <doug16k> ignore the SMM, nothing to do with you
12:32:27 <doug16k> yes
12:32:56 <doug16k> the bios does stuff(tm)
12:33:00 <jmp9> check_exception old: 0x8 new 0xd
12:33:32 <doug16k> back one more
12:33:48 <jmp9> check_exception old: 0xd new 0xd 1: v=08 e=0000 i=0 cpl=0 IP=0008:0000000000101b93 pc=0000000000101b93 SP=00
12:34:20 <jmp9> looks like #GPF
12:34:23 <doug16k> ok so it had a #GP when trying to handle #GP, then did #DF (the 8)
12:34:38 <doug16k> probably bad GDT base or bad GDT
12:34:45 <jmp9> CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
12:34:56 <doug16k> what is GDT base?
12:35:01 <doug16k> near there
12:35:11 <jmp9> GDT= 00200000 00000017
12:35:36 <doug16k> do this: x/1gx 0x00200000+8
12:36:24 <jmp9> 0x0060840000000000
12:36:50 <nyc> Hmm, the UART base is 0x10000000 but the length is only 0x100.
12:37:40 <aalm> doesn't need to be any bigger
12:39:10 <nyc> The registers don't need that much room, but I may need to hunt down which registers go where because it must differ from the PC offsets.
12:39:31 <aalm> <<2
12:41:14 <doug16k> jmp9, 0x84 is hard to believe
12:41:20 <doug16k> the accessed bit isn't set?
12:41:42 <doug16k> you've never loaded that descriptor, so you must not be in long mode yet
12:42:09 <doug16k> actually, 0x84 isn't even a code segment
12:42:36 <nyc> I haven't bothered trying to load the ARM or the POWER executables. They're almost certainly going to need their link addresses fixed up.
12:43:52 <doug16k> you're going to want a limit that doesn't make the segment 1 byte in size too in that code segment, so it is sensible in compatibility mode
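A minimal sketch of how the descriptor jmp9 dumped (0x0060840000000000) breaks down, using the standard x86 segment descriptor bit layout. The function and key names are made up for illustration.

```python
def decode_descriptor(raw: int) -> dict:
    """Split a 64-bit GDT entry into its base, limit, access byte and flags."""
    access = (raw >> 40) & 0xFF
    flags = (raw >> 52) & 0xF
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)
    base = ((raw >> 16) & 0xFFFFFF) | (((raw >> 56) & 0xFF) << 24)
    return {
        "base": base,
        "limit": limit,
        "access": access,
        "present": bool(access & 0x80),
        "dpl": (access >> 5) & 0x3,
        "code_or_data": bool(access & 0x10),  # S bit; 0 means system descriptor
        "executable": bool(access & 0x08),    # only meaningful when S = 1
        "accessed": bool(access & 0x01),
        "long_mode": bool(flags & 0x2),       # L bit
        "default_32": bool(flags & 0x4),      # D/B bit
        "granularity_4k": bool(flags & 0x8),  # G bit
    }

bad = decode_descriptor(0x0060840000000000)
print(hex(bad["access"]), bad["code_or_data"], bad["limit"])
```

For this value the access byte is 0x84 with the S bit clear, so the CPU sees a system descriptor rather than a code segment, and the limit field is 0.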
12:44:49 <aalm> nyc, idk. which UART you're dealing with, but often it's simply shifting the known offset, some might require accessing those as 32bit instead of 8bit too
12:45:20 <nyc> aalm: https://github.com/riscv/riscv-qemu/blob/riscv-all/hw/riscv/virt.c#L46-L55 gives the qemu code.
12:46:13 <nyc> NS 16550 so it should be the same general set of registers and such at a different address.
12:46:42 <jmp9> amd64 tells that i can ignore access byte
12:46:59 <doug16k> jmp9, ya, AFTER you are in long mode
12:47:13 <jmp9> i've already working 32 bit GDT
12:47:19 <jmp9> i've no problems with that gdt
12:47:24 <doug16k> before then it is just as bitchy as a 386 about the GDT
12:47:27 <jmp9> i need to create 64 bit long mode gdt
12:47:47 <aalm> weird, that there's no "reg-shift"
12:47:55 <jmp9> my 32 bit gdt with access bytes and other shit works fine
12:47:56 <aalm> nyc, https://www.kernel.org/doc/Documentation/devicetree/bindings/serial/8250.txt
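The "<<2" hint amounts to scaling the classic 16550 register index by a per-platform shift. A rough sketch, assuming the base address nyc quoted; the helper name `reg_addr` is hypothetical, and whether a given board needs a shift of 0 or 2 has to come from its device tree or documentation.

```python
# Classic 16550 register indices (byte offsets on a PC).
UART_THR = 0  # transmit holding register
UART_IER = 1  # interrupt enable register
UART_LSR = 5  # line status register

def reg_addr(base: int, index: int, reg_shift: int = 0) -> int:
    """Address of a UART register when registers are spaced 1 << reg_shift bytes apart."""
    return base + (index << reg_shift)

BASE = 0x10000000  # UART base mentioned in the log

print(hex(reg_addr(BASE, UART_LSR)))               # byte-spaced registers
print(hex(reg_addr(BASE, UART_LSR, reg_shift=2)))  # registers on 4-byte strides
```

Even with a shift of 2, the eight registers span only 32 bytes, which fits comfortably in the 0x100 window.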
12:48:08 <doug16k> jmp9, no, just put a 4GB limit with 0 base on that one and make it a sensible segment type (code segment / data segment) and make sure L is set on the code segment
12:48:24 <doug16k> s/make it/make them/
12:48:45 <doug16k> you need code and data segments that are sensible in protected mode, they are used briefly during bootstrap
12:49:08 <nyc> aalm: Bit-banging a string that says "Hello, world!" shouldn't be too tough.
12:49:16 <doug16k> then, when you load cs with a selector that selects a descriptor that has the L bit set, all the long mode magic kicks in
12:49:31 <aalm> yep, go for it:]
12:50:44 <doug16k> magic = cs, ds, es, ss segment base and limit ignored, etc
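What doug16k describes (base 0, 4GB limit, sensible code/data types, L set on the code segment) could be encoded roughly as follows. The helper `make_descriptor` and the three-entry table are illustrative assumptions; the bit layout is the standard x86 one.

```python
def make_descriptor(base: int, limit: int, access: int, flags: int) -> int:
    """Pack base/limit/access/flags into a 64-bit GDT entry."""
    return ((limit & 0xFFFF)
            | ((base & 0xFFFFFF) << 16)
            | ((access & 0xFF) << 40)
            | (((limit >> 16) & 0xF) << 48)
            | ((flags & 0xF) << 52)
            | (((base >> 24) & 0xFF) << 56))

ACCESS_CODE = 0x9A  # present, DPL 0, code, readable
ACCESS_DATA = 0x92  # present, DPL 0, data, writable
FLAGS_CODE64 = 0xA  # G = 1, L = 1 (D must be 0 when L is set)
FLAGS_DATA = 0xC    # G = 1, D/B = 1

gdt = [
    0,                                                      # selector 0x00: null
    make_descriptor(0, 0xFFFFF, ACCESS_CODE, FLAGS_CODE64), # selector 0x08: 64-bit code
    make_descriptor(0, 0xFFFFF, ACCESS_DATA, FLAGS_DATA),   # selector 0x10: data
]
gdt_limit = len(gdt) * 8 - 1  # value for the limit field of the GDT pointer

for entry in gdt:
    print(f"dq 0x{entry:016X}")
```

With three entries the limit comes out as 0x17, the same value jmp9's `GDT= 00200000 00000017` line shows.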
12:50:45 <radens> Question: will I need a different physical page for each processor's APIC?
12:51:04 <doug16k> radens, no
12:51:27 <doug16k> it isn't memory, it's MMIO
12:51:48 <radens> yeah that's what I thought, I was just confused about a comment I saw about moving the APIC base
12:51:55 <Mutabah> And (?usually) the LAPIC address is the same for all cores, just each core accesses a different device
12:52:04 <doug16k> the device is local to that hardware thread
12:52:08 <radens> yup
12:53:21 <jmp9> okay it jumped
12:53:22 <jmp9> and stuck
12:55:10 <jmp9> it stuck at cli instruction
12:55:38 <doug16k> what is right before the cli
12:55:53 <jmp9> uh no
12:55:54 <jmp9> check_exception old: 0xd new 0xd
12:55:59 <jmp9> it's the same exception
12:57:53 <doug16k> the instruction is cli? are you in cpl=0?
12:58:15 <doug16k> the DPL in info registers on cs will tell you
12:59:01 <jmp9> okay i've copypasted GDT
12:59:04 <jmp9> and it doesn't work
01:02:18 <Mutabah> jmp9: 0xd = 13 = GPF
01:02:24 <jmp9> yes
01:02:29 <jmp9> something wrong with gdt
01:02:31 <Mutabah> GPF, and then another one trying to handle the GPF
01:04:15 <jmp9> amd64 spec tells me that GDT pointer must be 10 bytes long
01:04:23 <jmp9> 2 bytes limit 8 bytes base ptr
01:04:34 <doug16k> 2 byte limit followed by 8 byte base
01:04:39 <jmp9> yes
01:05:07 <doug16k> is it packed (or have 6 bytes before the limit to align it)?
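The packing question can be illustrated with Python's `struct` module: a packed layout gives the 10 bytes `lgdt` expects in long mode, while a naturally aligned layout inserts padding after the limit. The base and limit values are the ones from jmp9's QEMU dump; the variable names are illustrative.

```python
import struct

GDT_BASE = 0x00200000
GDT_LIMIT = 0x17

# "<" means little-endian with no padding: 2-byte limit followed by 8-byte base.
gdt_pointer = struct.pack("<HQ", GDT_LIMIT, GDT_BASE)
print(len(gdt_pointer), gdt_pointer.hex())

# Native alignment ("@") usually pads the limit out so the base is 8-byte aligned,
# which is the layout a non-packed C struct would produce.
print(struct.calcsize("@HQ"))
```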
01:05:41 <jmp9> it is in assembly
01:06:14 <doug16k> secret assembly apparently
01:06:41 <jmp9> https://hastebin.com/ohojoxefoj.nginx
01:07:17 <doug16k> what assembler?
01:07:34 <jmp9> nasm
01:07:58 <Mutabah> why are you hard-coding the GDT address?
01:08:12 <jmp9> because this is special region in memory for loader data
01:08:22 <jmp9> for GDT, ELF file header and page tables
01:08:31 <Mutabah> instead of doing `gdt_ptr: dw GDT_LEN; dq gdt gdt: dq ...`
01:08:33 <doug16k> no you are doing it quite wrong
01:08:43 <Mutabah> The GDT isn't loader data, it's runtime data
01:08:54 <jmp9> i mean my 32 bit loader
01:09:27 <doug16k> you don't just set a few bits and jump and bingo, long mode. you have to enable paging at the point you thought you were in use64, then far jump to reload cs and then you are in use64
01:09:56 <jmp9> reload paging where?
01:10:01 <jmp9> before jumping to 0x08 cs?
01:10:22 <doug16k> see line 37, where you thought you are gonna jump to long mode?
01:10:26 <doug16k> that's totally wrong, it won't
01:10:55 <jmp9> what I should do?
01:10:57 <doug16k> you need to enable paging (set CR0.PG=1), THEN do that, THEN you are in long mode
01:11:03 <jmp9> ooh
01:11:53 * doug16k is now braced for bugs in the page tables
01:13:06 <aalm> :]
01:13:17 <jmp9> https://hastebin.com/beyimazabe.rb
01:13:19 <jmp9> like this?
01:13:33 <jmp9> (oh i know that i did wrong in cr3)
01:16:05 <klys> doug16k, enable paging before going 64-bit? does this require pae?
01:16:21 <jmp9> uuuuuhmmmhmhm
01:16:25 <jmp9> cr3 is a 64 bit register
01:16:27 <jmp9> for paging
01:16:29 <jmp9> in x86-64
01:16:31 <zhiayang> paging needs to be disabled before setting LME
01:16:44 <doug16k> klys, yes long mode requires PAE
01:16:50 <jmp9> i disabled paging
01:16:52 <jmp9> enabled PAE
01:16:55 <jmp9> then long mode
01:16:58 <jmp9> enabled new paging
01:17:02 <jmp9> and far jump to cs 0x08
01:17:04 <jmp9> right?
01:17:11 <zhiayang> ya that should be the correct procedure... i think
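The ordering jmp9 lists can be modelled as a small state check. This is a toy model of the control bits only, not a substitute for the assembly; the bit positions are the architectural ones, and the function name is made up.

```python
CR0_PE = 1 << 0    # protected mode enable
CR0_PG = 1 << 31   # paging enable
CR4_PAE = 1 << 5   # physical address extension
EFER_LME = 1 << 8  # long mode enable (EFER is MSR 0xC0000080)

def long_mode_active(cr0: int, cr4: int, efer: int) -> bool:
    """Model of EFER.LMA: active only once LME, PAE and PG are all set."""
    return bool(cr0 & CR0_PG) and bool(cr4 & CR4_PAE) and bool(efer & EFER_LME)

cr0, cr4, efer = CR0_PE, 0, 0  # 32-bit protected mode, paging off
history = []

cr0 &= ~CR0_PG                 # 1. make sure paging is disabled
history.append(("paging off", long_mode_active(cr0, cr4, efer)))
cr4 |= CR4_PAE                 # 2. enable PAE
history.append(("PAE on", long_mode_active(cr0, cr4, efer)))
efer |= EFER_LME               # 3. set LME
history.append(("LME on", long_mode_active(cr0, cr4, efer)))
cr0 |= CR0_PG                  # 4. enable paging (CR3 must already point at a PML4)
history.append(("paging on", long_mode_active(cr0, cr4, efer)))

for step, active in history:
    print(f"{step:12s} long mode active: {active}")
```

After step 4 the CPU is in compatibility mode; the far jump to a selector whose descriptor has L set is what switches to 64-bit code.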
01:17:16 <doug16k> jmp9, looks like it should work. where do you put the gdt base and limit in?
01:17:28 <jmp9> limit is 23 (0x17)
01:17:29 <doug16k> nevermind, I see
01:17:34 <doug16k> ya that should work
01:17:35 <jmp9> base points to null segment
01:17:44 <doug16k> assuming gdt is right
01:18:00 <doug16k> base of 0 and "null segment" are two different things
01:18:10 <jmp9> yes
01:18:12 <jmp9> CR3=0000000000201000
01:18:24 <doug16k> what does info mem say?
01:18:38 <doug16k> right after you turn on paging
01:18:38 <jmp9> weird shit
01:19:22 <klys> poop.
01:19:46 <doug16k> that is what the cpu will be using if you let it keep going. it probably won't last long before #PF, #DF, dead
01:19:46 <jmp9> okay i got page fault
01:20:02 <jmp9> now it's something wrong with page tables
01:20:25 <jmp9> Page-Map Level-4 Table Base Address
01:20:32 <jmp9> that should cr3 contain?
01:20:33 <doug16k> can you print $cr3
01:21:09 <doug16k> probably not new enough. do info registers and get cr3, then do x/1gx thatAddressHere
01:21:10 <jmp9> Page-Map Level-4 Table Base Address
01:21:12 <jmp9> oh
01:21:18 <jmp9> CR3=0000000000201000
01:21:49 <doug16k> what does x/1gx 0x201000 say?
01:22:02 <jmp9> cannot access memory
01:22:05 <jmp9> after i enabled paging
01:22:23 <doug16k> oops
01:22:25 <zhiayang> use xp
01:22:26 <doug16k> xp not x
01:23:24 <doug16k> xp/1gx 0x201000
01:23:31 <jmp9> it says 0x202023
01:23:59 <doug16k> ok now: xp /1gx 0x202000
01:24:42 <doug16k> (zeroing out low 3 hex digits)
01:25:01 <doug16k> repeat with what you find there
01:25:25 <jmp9> 0x202023
01:25:38 <klys> infinite recursion?
01:25:44 <jmp9> it points to itself
01:25:45 <doug16k> ok that is recursing
01:25:48 <jmp9> lol
01:26:30 <jmp9> https://hastebin.com/temaqaxavu.php
01:27:05 <jmp9> oh
01:27:10 <jmp9> 03:26
01:27:10 <jmp9> am going sleep
01:28:20 <doug16k> recursing can work but probably not what you want at 0x202000
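The manual `xp` walk above can be sketched in a few lines. The fake memory contents are the two values jmp9 reported; the helper names and the "follow entry 0 at each level" simplification are assumptions for illustration.

```python
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51: physical address of the next table
FLAG_NAMES = ["P", "RW", "US", "PWT", "PCD", "A", "D", "PS"]  # bits 0..7

def decode_entry(entry: int):
    """Return (next table address, list of set flag names) for a paging entry."""
    flags = [name for bit, name in enumerate(FLAG_NAMES) if entry & (1 << bit)]
    return entry & ADDR_MASK, flags

def follow_first_entries(read_qword, cr3: int, levels: int = 4):
    """Follow entry 0 of each table from CR3 down; report whether a table repeats."""
    table = cr3 & ADDR_MASK
    path = [table]
    for _ in range(levels - 1):
        next_table, flags = decode_entry(read_qword(table))
        if "P" not in flags:
            break
        if next_table in path:
            path.append(next_table)
            return path, True
        path.append(next_table)
        table = next_table
    return path, False

# Physical memory as reported in the log: both tables hold 0x202023 in entry 0.
memory = {0x201000: 0x202023, 0x202000: 0x202023}
path, looped = follow_first_entries(memory.__getitem__, 0x201000)
print([hex(p) for p in path], "self-referencing" if looped else "ok")
```

The low bits 0x23 decode to present, writable and accessed; the address part is what "zeroing out the low 3 hex digits" extracts.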
01:32:59 <nyc> It's 20:32 (8:32 PM) here and I'm probably going to have to go to sleep instead of finishing the RISC-V hello code.
01:34:11 <mahackamus> why you writing so many hello's?
01:34:42 <bcos_> Should at least do one "Goodbye cruel world" to add variety..
01:35:30 <doug16k> "hello? hello? you're breaking up I can't quite... hello? oh I heard you for a sec... hello?"
01:35:44 <nyc> mahackamus: I'm outputting a little bit in asm before I call into the generic C (or whatever) code, so it has to be done once for each architecture. It isn't that much more work than setting up the C stack and calling into C anyway.
01:36:36 <zhiayang> doug16k: huh, when was qemu able to print control regs like that?
01:37:06 <aalm> nyc, don't forget to zero .bss:]
01:37:07 <nyc> mahackamus: Most of the work is dull stuff like getting the options to the assembler and linker right and setting the right addresses in the linker script and figuring out how to use qemu for the architecture and plugging the architecture code into the build system and so on.
01:38:11 <nyc> aalm: Actually, it doesn't touch anything besides the string and registers, so .bss isn't there yet.
01:38:39 <mahackamus> yeah i mean i guess if you're going to be writing it eventually anyway it makes sense, seems like a lot of work though tbh
01:38:43 <aalm> i wasn't serious, but your stack will likely be there?
01:39:18 <nyc> aalm: I just declared a sym in .data with .rept 16384 .byte 0 .endr and such.
01:40:56 <aalm> ic, no worries then i guess:]
01:40:58 <nyc> mahackamus: I'm bearing flexibility / portability / etc. for multiple platforms from the outset.
01:41:14 <doug16k> zhiayang, they just accepted my patch to add them :)
01:41:21 <zhiayang> doug16k: ah, neat
01:41:37 <doug16k> I added cr0/cr2/cr3/cr4/cr8/efer/fs_base/gs_base/k_gs_base
01:41:44 <zhiayang> oh niiice
01:41:50 <doug16k> you can set them too and they have the magical effects
01:41:59 <doug16k> setting PE=1 will really do the things
01:42:17 <doug16k> insane, but would do it
01:42:26 <doug16k> it decodes them too
01:42:35 <doug16k> if you print $cr0 you might see [PG PE]
01:42:37 <zhiayang> oh like the flags and stuff?
01:42:40 <doug16k> ya
01:42:41 <zhiayang> nice
01:42:48 <doug16k> efer and those are all decoded
01:43:17 <doug16k> print /x $cr0 will show the 0x80000001
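The decoding doug16k mentions boils down to mapping set bits to names. A small sketch for CR0 follows; the bit names are architectural, but the function and output order are illustrative and not taken from the QEMU patch.

```python
CR0_BITS = {
    31: "PG", 30: "CD", 29: "NW", 18: "AM", 16: "WP",
    5: "NE", 4: "ET", 3: "TS", 2: "EM", 1: "MP", 0: "PE",
}

def decode_cr0(value: int) -> list:
    """List the names of the CR0 flags set in value, highest bit first."""
    return [name for bit, name in sorted(CR0_BITS.items(), reverse=True)
            if value & (1 << bit)]

print("[" + " ".join(decode_cr0(0x80000001)) + "]")
```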
01:43:53 <doug16k> best thing: you can put a conditional breakpoint expression for cr3=some-value and make process-local breakpoints on raw machine debugger
01:44:08 <nyc> I'm not sure if Zig is ready to go on as many target architectures as I'm planning. C++ might be worth considering too given that RTTI isn't that big a deal and the stack unwinding for exceptions actually isn't either because things have to do that sort of nonsense for wchan and more anyway.
01:44:08 <doug16k> can now
01:44:23 <zhiayang> :o
01:44:56 <nyc> doug16k: Spiffy.
01:45:15 <doug16k> build qemu master to get it right now
01:45:32 <doug16k> it's amusing to watch swapgs swap them the first time
01:46:51 <nyc> I have the slight issue that I'm tired enough to make packing up and going back upstairs difficult. Checking out the latest qemu git snapshots will have to wait.
01:56:44 <doug16k> no pressure, it'll fall into most everyone's lap eventually :)
02:02:46 <nyc``> That's the great thing about getting code into mainline.
02:04:36 <doug16k> the non-obvious part of my patch is that the gdbstub was sending an invalid machine description, and gdb was silently(!) falling back to the default, so the qemu gdb stub was constrained to be the default, and wasn't done quite right either, the default has fs_base and they errored if gdb asked for it
02:05:03 <doug16k> so it was broken and they let me fix it and simultaneously add features in one go
02:05:24 <doug16k> most projects would be all whiny about that
02:05:39 <doug16k> qemu guys are solid
02:05:45 <nyc``> It sounds like gdb needs a patch, too.
02:06:27 <doug16k> ya the 'g' packet too large crap? you can just patch it to realloc and it works perfectly
02:06:47 <doug16k> too bad they wrecked the qemu side in x86_64 build. I'm un-wrecking it soon
02:07:02 <doug16k> if they accept it of course
02:07:21 <mrvn> nyc``: RTTI and exceptions are the two things best disabled to start with.
02:07:48 <doug16k> before they broke qemu side (to "fix" unpatched gdb) you could patch gdb and fluently switch 32/64 all day no problem
02:08:02 <doug16k> now that they make unpatched gdb never error out, you can't patch gdb to fix it
02:08:19 <doug16k> it's all screwy in real and protected mode
02:08:48 <doug16k> the raw debugging/registers/whatever is fine, but gdb's handling of symbols and source and stuff blows up if you use current x86_64 qemu and debug 32 bit code
02:09:09 <doug16k> because of the hack that forces it to stay 64 bit forever
02:09:51 <doug16k> for now I just force to use qemu-system-i386 if I want to debug real mode or bootloader code
02:10:07 <doug16k> intend to really fix soon though
02:11:38 <doug16k> uefi is fine though, it just goes to long mode long before my code runs, so normal debugging setup just works
02:16:45 <nyc``> I'm not terribly optimistic about actually getting anything done. Maybe the project will keep me from getting bored. Maybe my brain is too mushy after so many years and medical problems and not really being that good in the first place.
02:18:22 <doug16k> nyc``, start with something that seems like it would be really easy, then realize how not easy even that is :)
02:18:37 <doug16k> just until you become accustomed to the tools etc
02:20:12 <doug16k> several people have asked me what they should write as beginner programmers. I say "make a tic-tac-toe AI"... invariably the response is shock, way too hard
02:20:23 <nyc``> Not being able to get real hardware anymore does make me reliant on qemu.
02:20:25 <doug16k> not hard at all
02:21:01 <doug16k> point is, you're waaaaay past that point :D
02:21:34 <doug16k> nyc``, why must it NOT be x86?
02:21:45 <doug16k> is that just off the table altogether?
02:22:18 <doug16k> you'll like your thready scary code running a bazillion ops/sec on hardware virtualization, I think
02:23:05 <doug16k> well, when you have some scary code to test under extreme load that is
02:25:43 <doug16k> qemu is almost utterly deterministic. you get a hairline test coverage on it for temporal problems
02:25:57 <doug16k> the emulator I mean, is deterministic
02:26:08 <geist> yah, it's good for that
02:27:32 <nyc``> doug16k: Page size spectrums with actual coverage of sizes demonstrate algorithms working well in ways others can't.
02:28:26 <nyc``> doug16k: Gaps like x86 also have something to demonstrate.
02:29:15 <doug16k> x86 has 4KB pages and 4MB pages and 2MB pages and 1GB pages
02:29:36 <doug16k> so, 4KB, 2MB, 4MB, 1GB
02:29:47 <doug16k> if you don't mind 4MB pages being 32 bit code
02:29:59 <nyc``> doug16k: 4KB-2MB is a gap.
02:30:15 <doug16k> of course it is a gap
02:30:17 <doug16k> it's always a gap
02:30:19 <geist> so is needing to switch to another architecture
02:31:29 <doug16k> you could make it all 32 bit code (except 1GB)? it would be just PAE or not PAE. that's the switch from 2MB to 4MB large page
02:31:47 <doug16k> 1x to 2x too much "gap"?
02:31:59 <geist> or as i've said before, for the demonstration of an algorithm, simply pretend that there are larger pages at everywhere but the lowest level code
02:32:02 <nyc``> ppc64 has a bigger gap, so it's a better demonstration.
02:32:20 <doug16k> yes, what geist said, and I wanted to say too, make a simulator
02:32:39 <doug16k> just fill so many entries
02:32:56 <doug16k> how does that not demonstrate whatever your algo does?
02:33:36 <doug16k> you could just write against a facade that makes it look like you can make 64KB pages, it just puts 16 PTEs in
02:35:03 <doug16k> if you dirty one entry you might as well dirty a whole line. same cost
02:35:22 <doug16k> well same manipulation cost
02:36:18 <doug16k> 32KB would be the limit to only dirty 1 line in PAE, 64KB in not-PAE
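The arithmetic behind the 32KB/64KB figures, assuming a 64-byte cache line (an assumption; the log does not state the line size) and 4KB base pages:

```python
PAGE_SIZE = 4096   # architectural base page
CACHE_LINE = 64    # assumed cache line size in bytes

def ptes_per_line(pte_size: int) -> int:
    """How many page table entries share one cache line."""
    return CACHE_LINE // pte_size

def span_per_line(pte_size: int) -> int:
    """Largest emulated page whose PTEs all sit in a single cache line."""
    return ptes_per_line(pte_size) * PAGE_SIZE

def ptes_for_emulated_page(size: int) -> int:
    """Number of base-page PTEs a facade fills to fake one larger page."""
    return size // PAGE_SIZE

print("PAE (8-byte PTEs):    ", span_per_line(8) // 1024, "KB")
print("non-PAE (4-byte PTEs):", span_per_line(4) // 1024, "KB")
print("PTEs for a 64KB page: ", ptes_for_emulated_page(64 * 1024))
```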
02:37:34 <andrewrk> nyc``, what target architectures are you looking to support that zig does not (and therefore that LLVM does not either)?
02:37:36 <nyc``> Most of the benchmark sorts of affairs would be things like fragmentation statistics. Some real world performance data would show that issues like internal fragmentation burn things in odd places like page zeroing for buffered IO and ZFOD faults.
02:38:07 <geist> what sort of benchmarks are you expecting to run? running on multiple emulated architectures is largely irrelevant
02:38:12 <doug16k> that would all show up exactly right if you just made your paging use groups of architectural pages as individual pages
02:38:23 <geist> it wouldn't prove much because there are so many external variables out of your control
02:38:50 <geist> picking a single real world architecture that you can run on real hardware would let you at least do some sort of A/B testing and control your variables
02:40:10 <nyc``> I don't have any real world benchmarking capability of my own.
02:42:55 <nyc``> My laptop would not be adequate.
02:44:47 <geist> okay. so i'm not sure what that does or doesn't add to the discussion
02:49:05 <geist> huh. interesting. riscv has a similar global pointer to mips. turns out you just have to add the magic symbol to the linker script
02:49:16 <geist> and the linker is the one that can relax memory references to use it
02:51:41 <mahackamus> what's it point to?
02:52:28 <geist> wherever you want to put it, but basically if you stick it off in the middle of the data segment, then the linker will, if it senses that it can 'relax' a memory access to deref off of gp, will do so
02:52:55 <geist> https://gnu-mcu-eclipse.github.io/arch/riscv/programmer/ described at the bottom of the page
02:54:29 <geist> and sure enough it really did
02:55:11 <doug16k> intel, srsly? nobody cares about netburst anymore. can you please remove that crap from the primary x86 manual. thanks
02:55:28 <geist> doug16k: yeah no kidding
02:55:31 <zhiayang> reasons to use the amd manual:
02:55:40 <doug16k> zhiayang, ya!
02:56:56 <geist> https://github.com/littlekernel/lk/commit/dc2fe55fa90642706404a61283e5bbbcf18f8b1e
02:57:15 <geist> nyc``: that may be basically what you need to do on mips as well
02:57:17 <geist> for the GP stuff
02:58:17 <nyc``> Checking it out.
02:59:12 <geist> as a result of setting the gp, the linker then emits fun instructions like "sw a0,1384(gp) # 80000dc4 <src>"
02:59:17 <geist> cute!
03:01:18 <doug16k> called a "relaxation" I suppose? where it can change a thing to a better instruction?
03:02:05 <mahackamus> if i'm pickin up what they're puttin down it uses 12 bit addresses to offset from the global pointer
03:02:31 <mahackamus> to optimize accesses in 4096K region
03:02:39 <doug16k> geist, if I'm right -Wl,--no-relax breaks that and it will still do the old thing, assuming it is even possible on that arch
03:02:58 <doug16k> relaxations are awesome
03:04:03 <doug16k> I managed to get ld to completely relax my uefi code to not use the got
03:04:44 <doug16k> not needed there, they just spam my text and data with all the required relocations. the whole thing is so small I don't care
03:06:15 <doug16k> ah just noticed you mentioned relaxations in the commit message :)
03:06:21 <mahackamus> or is it 4097K region
03:08:00 <mahackamus> i think they mean +/- 2047
03:08:10 <ybyourmom> but it's 2019
03:08:48 <geist> doug16k: yeah
03:09:26 <geist> yah in this case it's only a 4K reach
03:09:46 <geist> generally converts a two instruction sequence to compute the address into one
03:11:11 <geist> otherwise riscv is about as powerful at synthesizing 1 or 2 instruction sequences to compute something relatively near PC
03:11:42 <geist> ARM64 in general doesn't do any of this fixed register biz, but a lot of the more pure risc machines still do it. mips, microblaze, riscv, etc
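The reach being discussed follows from the 12-bit signed immediate in RISC-V loads and stores. A sketch of the check the linker effectively performs; the function name is hypothetical, and the gp value is back-computed from geist's `sw a0,1384(gp) # 80000dc4` example.

```python
IMM_MIN, IMM_MAX = -2048, 2047  # range of a 12-bit signed immediate

def gp_offset(gp: int, addr: int):
    """Offset usable in a gp-relative access, or None if addr is out of reach."""
    offset = addr - gp
    return offset if IMM_MIN <= offset <= IMM_MAX else None

gp = 0x80000DC4 - 1384  # gp implied by the disassembly line in the log
print(hex(gp), gp_offset(gp, 0x80000DC4))
print(gp_offset(gp, gp + 4096))  # too far: needs the two-instruction sequence
```

The total window is 4096 bytes, which is why placing gp in the middle of the data that is accessed most often gets the most out of it.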
04:17:24 <GwenNelson> kernel development is fun
04:17:25 <GwenNelson> https://i.imgur.com/8UXozG3.png
04:18:40 <geist> nice!
04:28:53 <ybyourmom> Riveting
06:21:28 <doug16k> GwenNelson, funniest thing is that crap can be actually useful information to crazy people like us
06:22:05 <doug16k> crap = crash dump of random hex digits and funny names that end with x and s
06:23:26 <geist> yep!
06:23:33 <geist> i actually mean 'nice' in a good way
06:23:37 <geist> as in that's a cool crash
06:23:53 <geist> beats just a spontaneous reset, which are the worst of the crashes
06:33:41 <doug16k> oh I did too
06:33:51 <doug16k> looks like real exception handling behind that bug
06:34:17 <doug16k> it's trying to report it!
06:34:26 <geist> yah
07:38:45 <lsneff> Hi everyone
07:39:13 <geist> hi
07:39:30 <lsneff> Does anyone have some experience with exception handling on macos?
07:39:52 <lsneff> I'm trying to figure out if __register_frame from libunwind takes a compact unwind encoding or a dwarf eh frame
07:45:39 <lsneff> Do I have to convert the data in the __LD, __compact_unwind section to something that looks like the data in an __unwind_info section?
07:45:46 <lsneff> Anyway, this is weird stuff.
07:45:53 <lsneff> Working on exception support for a wasm runtime.
08:32:39 <doug16k> lsneff, I use __register_frame and it's in libgcc
08:33:04 <lsneff> And that takes a pointer to each fde on macos?
08:33:04 <doug16k> I pass the beginning of the .eh_frame section
08:33:31 <lsneff> I read somewhere that on linux, you pass a pointer to the section, and on macos, a pointer to each entry
08:33:34 <doug16k> no idea about macos. just saying in general strangely I do that too but it's in libgcc not libunwind
08:33:36 <lsneff> (by calling it multiple times)
08:33:41 <lsneff> Ah, okay, thanks
08:34:02 <doug16k> I have to manually register it so I can forced unwind in longjmp
08:34:24 <doug16k> normally the dynamic linker does it from .dynamic record (I think)
08:34:42 <lsneff> I'm jitting some code using llvm, so I need to register the exception frames
08:35:19 <doug16k> you should be able to just pass the beginning of your .eh_frame section (or its in memory equivalent) to __register_frame
08:35:33 <lsneff> It's so annoying how every platform does this differently and most of it isn't documented.
08:35:37 <lsneff> I'll give that a try.
08:35:54 <doug16k> I meant pass a pointer to, of course
08:36:16 <lsneff> The object file llvm is producing has a `__compact_unwind` section instead of `__eh_frame`, not sure why.
08:36:28 <doug16k> it might just work
08:36:30 <lsneff> Sometimes it outputs an `__eh_frame`, other times, the other one
08:36:33 <doug16k> __register_frame might "know" that format
08:36:53 <doug16k> ah, sorry, idk about that one or the other thing
08:36:55 <lsneff> Yeah, I guess I'll see.
08:37:06 <lsneff> Thanks for your help
08:39:28 <lsneff> llvm has been a huge pain for this project.
08:39:43 <lsneff> we're calling it from rust, so the interface is pretty poorly developed.
08:40:15 <lsneff> A lot of llvm features aren't supported, or straight-out don't work
08:44:20 <doug16k> sounds difficult
08:44:33 <doug16k> it's so frustrating when tools don't work
08:45:11 <lsneff> It really is
08:45:57 <lsneff> And when they're so complex, it's nearly impossible to know why.
08:47:43 <lsneff> Oh, and sometimes it just doesn't make sense.
08:48:08 <lsneff> Like llvm is emitting a mach-O object and then it says it's malformed when it tries to parse it again.
08:48:11 <lsneff> Seriously?
08:49:19 <doug16k> yeah. where do you begin to fix that?
08:49:52 <lsneff> you don't, just go write a new optimizing compiler
08:51:56 <lsneff> it's probably easier :P
08:57:26 <doug16k> I guess if you really had to fix that you could intercept that supposedly malformed object and see if and how it is invalid and start searching and debugging more specifically why some field is wrong
08:57:46 <lsneff> Well, it's not malformed
08:58:03 <lsneff> objdump and a couple other mach-o parsing libraries do just fine.
08:59:10 <doug16k> you'd start working on llvm at that point I guess
08:59:58 <doug16k> no small undertaking
09:00:36 <lsneff> even building it is quite a feat
09:01:57 <doug16k> is there any tracing infrastructure you can flick on and look at something?
09:02:05 <doug16k> maybe see what it was about to do when it failed?
09:02:52 <lsneff> Yeah, there probably is, but I honestly think it'll be easier just to write a mach-o object dynamic loader
09:03:01 <Prf_Jakob> lsneff: https://github.com/VoltLang/Volta/blob/master/rt/src/vrt/os/eh/unwind.volt
09:03:17 <Prf_Jakob> That works for me on MacOSX
09:04:15 <Prf_Jakob> If I understood correctly that you are trying to do EH on OSX. If not sorry for the noise.
09:07:39 <lsneff> Oh, thanks, I appreciate it
09:54:07 * klys finished another part of his allocator, twigs_alloc.
09:58:26 * geist figured out how to cold bringup the sifive_e board
09:58:40 <geist> the sifive folks have some 'freedom sdk' bare metal driver thing
09:58:50 <geist> but it's one of the most complex, unreadable things i've ever seen
09:59:11 <geist> plus it relies on tons of magic init sections and whatnot to initialize, so it's hard to see what order stuff is done in
09:59:50 <geist> it's all supposed to be just magic 'all you have to do is supply main' sort of thing, but that means 100KB of some random driver code initializes in front of you and does a bunch of stuff
10:00:10 <geist> hate that kind of stuff. just give me a bunch of leaf node driver functions, helper routines. i can take it from there
10:00:32 <klys> neat, so what have you been running on it?
10:00:56 <geist> oh just getting LK up and running. i've been running on the qemu emulation of it, but haven't gotten around to running on a real board
10:01:17 <geist> which requires figuring out how their SDK thing drives openocd, how to bring up the oscillators and plls and whatnot
10:01:36 <Prf_Jakob> geist: Does that board use DDR4 btw? I heard that an init code for DDR4 needs to be super hard locked down.
10:01:49 <geist> which ordinarily would be simple, but their example code is so opaque i was literally walking through the disassembly of it to see what things are being done
10:02:06 <geist> Prf_Jakob: nah, this is the 'e' version of the board. which is basically a little embedded thing
10:02:15 <Prf_Jakob> Ah ok
10:02:16 <geist> 512KB flash, 16KB sram, etc
10:02:23 <Prf_Jakob> *nods nods*
10:02:32 <geist> the 'u' version is the big board
10:02:50 <Prf_Jakob> Cool
10:03:17 <geist> this is combined with some extremely.... terse docs on the soc
10:03:31 <klys> well, if you know how to bring up the board, perhaps you've been making some notes?
10:03:35 <geist> the pll description for example is basically a list of the bits of a register and about 2 paragraphs
10:03:48 <Prf_Jakob> My main worry about RISC-V is that I keep seeing talks about it and where they go "Oh we will just add more drivers in the firmware" and the spec requiring you to call into firmware things that could be standardize.
10:04:04 <Prf_Jakob> Standardized*
10:04:08 <geist> yah
10:04:18 <geist> they definitely have a long way to go
10:04:27 <geist> and so far i'm not tremendously impressed with the 'freedom SDK'
10:04:37 <Prf_Jakob> Hehe
10:04:45 <geist> it's nifty, but i dont want nifty. i just want some plain ass code
10:04:59 <geist> like what ST and atmel and TI and whatnot provide for their cortex-ms
10:05:30 <klys> s/atmel/intel/
10:05:40 <Prf_Jakob> Oh yeah, I dislike code that tries to be nifty and clever, be smart and simple instead :)
10:06:08 <geist> Prf_Jakob: it's clear they're trying to provide an all in one environment, somewhat arduino like
10:06:15 <geist> which is fine if you're willing to work within that environment
10:06:22 <Prf_Jakob> Mmhhmm
10:06:30 <geist> if you want to do what i do, which is bring their drivers into my environment, then having a lot of cleverness works against it
10:07:24 <geist> but annnyway, this is an extremely simple SOC, so it's not like there's a lot of driver code to write
10:07:42 <geist> just a bummer that it's easier to basically rewrite from scratch than to try to figure out how to bring over a few thousand lines of code
10:07:53 <klys> does it have a jtag interface you could hook up to something?
10:08:11 <geist> yep, that's openocd i was talking about
10:08:26 <geist> had to extract how they're driving it out of their scripts
10:08:30 <geist> not too bad once you figure it out
10:08:31 <klys> is that how it talks to the programmer?
10:08:40 <geist> yes. it basically is the programmer
10:08:52 <geist> essentially openocd is bit banging the flash interface via jtag
10:09:02 <geist> fairly common on these sort of embedded boards
10:09:29 <geist> hijack the cpu and then reach in and directly drive the flash controller over jtag
10:09:46 <klys> what frequency is the clock line operating at?
10:09:59 <geist> the external oscillator is running at 16Mhz
10:10:23 <klys> slow enough for "simple" pci
10:10:28 <geist> was just screwing around with code to enable the xtal, let it stabilize, and then tell the main pll to run in bypass mode so that it clocks the cpu at 16Mhz
10:10:41 <geist> you can pull up to 200-300 Mhz i think
10:10:45 <geist> pll
10:11:07 <geist> i was just trying to get a stable clock so i can print something to the uart that isn't garbage
10:11:23 <geist> otherwise the default clock is an internal RC oscillator, but it's too crummy to get a good uart clock on
10:11:40 <geist> it has a variance of like 45%
10:12:15 <klys> yeah 200 baud was pretty slow
10:12:18 <klys> 300*
10:12:28 <geist> eh?
10:12:43 <klys> running up to 300 MHz to drive a uart?
10:12:59 <geist> no. i didn't explain well enough
10:13:08 <klys> kk
10:13:26 <geist> the default clock when the soc comes out of reset is an internal RC oscillator, which starts at 13.8Mhz. it's very poor though, says it can vary up to 45%
10:13:37 <geist> so it's not good enough to drive a uart off, because the clock frequency varies
10:13:38 <klys> wew
10:14:08 <geist> so, you bring up the external crystal, which is 16Mhz, then program the internal pll to run in bypass (1:1) mode, and now you're running at a solid 16Mhz
10:14:22 <geist> plenty good enough to derive a 115200 baud off of
10:14:29 <klys> neato
10:14:48 <geist> you *can* also run the pll up to 300Mhz i think, but that's another step
10:15:02 <geist> so now you're clocking up a 16Mhz input clock to 300Mhz
10:15:08 <geist> because that's what plls do
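The clock argument above can be checked with a little arithmetic. The following is a minimal sketch, assuming a hypothetical UART whose baud rate is simply the input clock divided by an integer divisor; real UARTs often oversample or store "divisor minus one", so the function names and the model here are illustrative only. It shows that a 16 MHz crystal lands well within 1% of 115200 baud, while a divisor computed for the nominal 13.8 MHz RC oscillator is badly off if that oscillator actually runs 45% fast.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical UART model: baud = input_clock_hz / divisor.
 * Real hardware may differ (oversampling, off-by-one registers). */
static uint32_t uart_divisor(uint32_t clock_hz, uint32_t baud)
{
    assert(baud != 0);
    return (clock_hz + baud / 2) / baud;   /* round to nearest integer */
}

/* Baud-rate error in parts per thousand when a divisor is used with the
 * clock that is actually present (which may not be the nominal one). */
static int32_t baud_error_permille(uint32_t actual_clock_hz,
                                   uint32_t divisor,
                                   uint32_t target_baud)
{
    assert(divisor != 0 && target_baud != 0);
    int64_t actual_baud = actual_clock_hz / divisor;
    return (int32_t)(((actual_baud - (int64_t)target_baud) * 1000)
                     / (int64_t)target_baud);
}
```

With the crystal, `uart_divisor(16000000, 115200)` gives 139 and the resulting error is under one part per thousand. With the RC oscillator, the divisor for 13.8 MHz is 120, and if the clock is really 20.01 MHz (13.8 MHz plus 45%) the line runs at roughly 1.45 times the intended rate, far outside what a receiver tolerates.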
10:15:49 <klys> makes me wonder how difficult it would be to simulate jtag with just a serial line
10:16:20 <klys> jtag has separate send and receive pins
10:16:27 <geist> yah you really cant
10:16:47 <geist> and a clock, it's closer to something like SPI
10:17:12 <geist> it's essentially not an asynchronous serial thing (UART). it's a synchronous protocol
10:17:21 <klys> yeah a clock line too, and another line to interrupt the process (?)
10:18:11 <klys> well, I've had it in the back of my mind, those little lichee devices that just have a jtag interface and that's all
10:18:44 <klys> mebby there's some way to affix an old bus to it
10:21:13 <klys> much better to use the jtag line to bus out than a serial line
10:22:58 <klys> just have to make sure the bus doesn't take over until the device is programmed and running.
10:24:22 <klys> pretty sure an eg. isa bus would have been running at 8 MHz
10:28:38 * geist nods
10:28:43 <geist> up to 8mhz yes
10:29:19 <klys> so, the jtag interface is fairly amenable to putting a few chips in series to form your circuit
03:04:20 <c10ud> hi, is there some nice select() simple implementation I could take a look at?
03:07:11 <renopt> maybe openbsd's, select() doesn't make a lot of sense out of context from the vfs implementation
03:09:42 <c10ud> ok I'll try to find it
03:09:44 <c10ud> thanks
03:15:50 <renopt> oh actually, does xv6 have a select() implementation...
03:18:38 <renopt> "first appeared in 4.2BSD", probably not, nevermind
04:43:08 <zhiayang> wtf
04:43:19 <zhiayang> rcx is randomly getting corrupted, leading to bad sysret
05:19:30 <zhiayang> is there anything wrong with this code? https://github.com/zhiayang/nx/blob/master/kernel/source/arch/x86_64/syscall.s#L66
05:19:38 <zhiayang> this is the entry point for x64 syscall
05:20:02 <zhiayang> barring inefficiencies
05:28:30 <program> are there big differences between instruction encodings in ARM families?
05:28:46 <program> for example armv5 and armv6 and armv7
05:29:10 <program> i could not find good reference
05:30:03 <nyc`> Thumb is a big difference.
05:36:49 <program> i'm writing a disassembler and i'm confused with what instruction set to start ; my target is armv5 but i would like to start in some uniform way
05:37:21 <program> like there is an practically unified instruction encoding for x86 then i wonder if ARM has it like that too
05:38:00 <program> basically if there is http://ref.x86asm.net but for ARM processors
05:49:18 <doug16k> zhiayang, is it possible to context switch in a syscall? of course, right?
05:49:46 <zhiayang> right
05:49:53 <doug16k> if so, the next syscall that runs on this cpu will clobber that stuff you restore at the end
05:50:04 <zhiayang> ohhhh
05:50:13 <zhiayang> damn, totally overlooked that
05:50:24 <zhiayang> ok so i should be saving the state on the kernel stack instead?
05:51:00 <doug16k> ya save it there
05:51:06 <doug16k> then it'll come back when you context switch back
05:51:55 <doug16k> if you're real tight on registers, you can do that stash in gs thing, just make sure you put it somewhere safe before it is possible to context switch
05:52:05 <doug16k> like the kernel stack
05:52:30 <zhiayang> ah, i see
05:52:39 <zhiayang> ok thanks for this great insight
06:13:20 <zhiayang> alright, things work again; thanks doug16k
06:13:35 <doug16k> nice
06:13:40 <zhiayang> obviously did not discover this till i started testing with > 1 user process
06:13:58 <doug16k> ya
06:14:07 <doug16k> grats on multitasking working btw :)
06:14:19 <zhiayang> \o/
06:14:29 <zhiayang> the real victory is multitasking without heisenbugs
06:15:01 <zhiayang> thankfully (with hand placed on wood) they have yet to crop up
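The bug doug16k describes can be modelled in plain C without any assembly. This is a simplified sketch, not zhiayang's actual code: the `struct cpu` scratch slot, the `struct task` save area, and all function names are hypothetical. Task A enters a syscall and is switched out, task B makes a syscall on the same CPU, and then A resumes and returns.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one scratch slot per CPU, one save area per task. */
struct cpu  { uint64_t scratch_rcx; };   /* shared by every task on this CPU */
struct task { uint64_t user_rcx; uint64_t kstack_saved_rcx; };

/* Buggy scheme: the user return address (rcx) lives only in the per-CPU slot. */
static void syscall_enter_percpu(struct cpu *c, const struct task *t)
{
    c->scratch_rcx = t->user_rcx;
}
static uint64_t syscall_exit_percpu(const struct cpu *c)
{
    return c->scratch_rcx;
}

/* Fixed scheme: rcx is copied to the task's own kernel stack on entry. */
static void syscall_enter_kstack(struct task *t)
{
    t->kstack_saved_rcx = t->user_rcx;
}
static uint64_t syscall_exit_kstack(const struct task *t)
{
    return t->kstack_saved_rcx;
}

/* Returns the rcx that task A would sysret with under the buggy scheme. */
static uint64_t run_percpu(void)
{
    struct cpu c = {0};
    struct task a = { .user_rcx = 0x1000 }, b = { .user_rcx = 0x2000 };
    syscall_enter_percpu(&c, &a);    /* A enters, then blocks */
    syscall_enter_percpu(&c, &b);    /* switch to B: slot is overwritten */
    (void)syscall_exit_percpu(&c);   /* B returns correctly */
    return syscall_exit_percpu(&c);  /* A resumes with B's value */
}

/* Same sequence with per-task saving: A gets its own value back. */
static uint64_t run_kstack(void)
{
    struct task a = { .user_rcx = 0x1000 }, b = { .user_rcx = 0x2000 };
    syscall_enter_kstack(&a);
    syscall_enter_kstack(&b);
    (void)syscall_exit_kstack(&b);
    return syscall_exit_kstack(&a);
}
```

In this model `run_percpu()` hands task A the value 0x2000, which is B's return address, while `run_kstack()` returns A's own 0x1000. That matches the symptom of only showing up once more than one user process is running.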
06:18:59 <program> where can i find instruction encoding for armv5?
06:31:55 <geist> program the arm manual... but they left
06:33:18 <mrvn> zhiayang: Personally I prefer saving stuff in the struct Task of the process and leave the kernel stack pristine on exit.
06:33:47 <mrvn> One stack to rule them all and in ring0 bind them.
06:34:03 <geist> yah i've done it either way. works pretty well either way
06:34:36 <geist> some arches its a little easier to push a lot of stuff to the stack (like x86-32), some it's easier to push to a part of the thread struct
06:35:17 <mrvn> saving all regs at least makes accessing user regs from syscalls easier. Is there much point in not doing that at all?
06:35:53 <zhiayang> do you typically need to access user regs in a syscall?
06:35:54 <geist> well, if you have a single kernel stack, like i think you do, then it's non optional
06:36:07 <mrvn> geist: I believe all the archs that can easily push to a thread struct can push to stack just the same, they simply don't differentiate the address registers. The opposite is not true.
06:36:21 <mrvn> I hate special purpose registers
06:36:25 <mrvn> :)
06:36:43 <geist> well, the only counter example i can think of right now is pusha
06:36:54 <geist> but that's not a great example since it over-saves, but it is at least convenient
06:36:56 <mrvn> zhiayang: int open("file"); Where does the syscall return the FD?
06:36:59 <geist> same with pushfl/popfl
06:37:13 <geist> etc but it's pretty much the same thing
06:37:33 <mrvn> geist: pusha pushes all reg or not?
06:38:43 <zhiayang> mrvn: right, but if in the entry point the syscall is called, then the return value would already be in the proper register, no?
06:38:49 <zhiayang> ofc this only applies for return values
06:40:59 <mrvn> zhiayang: If the syscall entry saves regs to the stack then poping them back will overwrite the return values. But you would probably exclude the return register there. But then what if you need two?
06:41:26 <mrvn> And now you have different save code for interrupts, syscalls and extra handling for task switches.
06:41:35 <zhiayang> hm, when would i need two?
06:41:36 <geist> you need that anyway
06:41:48 <geist> syscall does not work the same way as irq entry
06:42:00 <zhiayang> for syscalls and timer interrupts i just save the necessary registers
06:42:04 <geist> it has a different set of regs it trashes, etc. you have really no choice but to build a separate entry point
06:42:19 <mrvn> geist: looks pretty much the same to me
06:42:22 <geist> it's not.
06:42:49 <geist> just to be clear, we're talking about x86-64 here
06:42:56 <mrvn> geist: not exclusively.
06:43:01 <geist> syscall does not have the same entry mechanism as a regular irq
06:43:06 <mrvn> x86(-64) is horrible there
06:43:16 <geist> thus it needs a separate set of logic
06:43:31 <doug16k> in my x86-64 syscall entry point, it has the rules of elf x86-64 function calls, I clobber everything a regular function call would clobber and preserve and return the rest
06:43:31 <geist> mrvn: that's great, but right now zhiayang is talking about x86-64 (probably)
06:43:35 <mrvn> yeah, sysenter/sysexit have their own entry point there
06:43:53 <mrvn> doug16k: so you are leaking secret kernel infos.
06:44:03 <doug16k> reasoning is, that's almost certainly a libc wrapper making the call, so the compiler already is assuming clobbers are clobbered
06:44:11 <doug16k> no I zero out on the way back
06:44:11 <geist> so, back to a question that zhiayang had a while ago: do you need to inspect the user state on syscall entry? no, until you do
06:44:24 <zhiayang> geist: erm...
06:44:32 <doug16k> I zero out all clobbered things
06:44:43 <mrvn> doug16k: smart
06:44:46 <geist> later on if you build in debugging and user space inspection of state on a thread, you will need to save it
06:44:54 <geist> but thats way down the road
06:45:00 <zhiayang> oh, hmm
06:45:19 <zhiayang> doug16k: i should probably do that
06:45:21 <geist> and yes. what doug16k said. you must zero out all the temporary regs on the way out. basically you cannot leak any kernel stuff
06:45:57 <doug16k> can't leak juicy internal kernel pointers that easy, gotta make them work for it
06:46:03 <geist> yah
06:49:26 <lsneff> Well, seems like we're going to write some of our codebase in c++, so we can use more llvm jit features
06:53:04 <lsneff> It's annoying how much functionality isn't accessible from the c api.
06:53:36 <doug16k> lsneff, so make a c++ shim callable from rust then? that sounds like an alright solution
06:54:00 <lsneff> Yep, that's the plan.
06:54:11 <geist> yah we have a fairly complex system for that in fuchsia
06:54:14 <doug16k> would solve the brokenness I hope
06:54:37 <geist> the driver layer flattens down to a C abi, so it is stable, and then the different languages have their own bindings for it
06:54:39 * doug16k realizes he just casted a high spellpower jinx
06:55:14 <doug16k> jk
06:56:09 <lsneff> Our long-term plan is to write a custom compiler for this use case, but llvm is the poison we're drinking atm
06:56:45 <lsneff> Have to be careful to not link against multiple instances of llvm.
07:16:58 <mrvn> I'm always worried if I should make *.o depend on the sha256 of the compiler binary.
07:20:37 <rain1> mrvn: what do you mean?
07:20:53 <rain1> ensure the c compiler is exactly the same?
07:24:48 <mrvn> rain1: yep. and when it changes and you type "make" it recompiles all
07:25:23 <rain1> ah that makes sense if you are working on a toolchain as well as stuff built with it
07:25:36 <rain1> yeah good idea
07:25:42 <rain1> i'll have to keep that in mind
07:32:11 <doug16k> autotools seems to magically do that for me mrvn
07:32:14 * doug16k hides
07:32:29 <mrvn> it's magic.
07:32:52 <lsneff> I can't believe it's not autotools
07:38:05 <doug16k> to be completely clear, if I have a build directory configured to use a toolchain in ~/cross-elf, then I run my toolchain build and make install it, then just "make" my main project, it just knows to build everything. I think it puts the compiler(s) as part of the auto-dependencies
07:38:41 <doug16k> I did nothing to make that happen that I am aware of
07:48:37 <doug16k> I tend to call helpful things that happen by themselves "magic" :)
08:00:23 <nyc> doug16k: Bibbity, bobbity, boo.
08:06:26 <doug16k> depending on a hash is a tad overkill maybe, depending on its last modified is 99% as good and much lighter
08:07:11 <doug16k> if you have a way of making compilers with the same name and earlier dates and different content appear, then by all means hash it
08:08:48 <lsneff> ooo, I got a 600ms startup time down to 5ms
08:08:55 <lsneff> very happy with myself
08:13:44 <nyc> lsneff: Excelente!
08:14:45 <lsneff> caching ftw
08:15:04 <nyc> lsneff: Caching what in particular?
08:15:14 <lsneff> compiled code
08:15:27 <lsneff> caching the output of this jit runtime I've been working on
08:42:36 <mobile_c> does a JIT alter the expected execution of an application in comparison to a statically compiled one eg gcc
08:44:01 <lsneff> mobile_c: What do you mean?
08:44:45 <mobile_c> for example, does it process all #define's as it encounters them or does it find all #define's first like a standard compiler would
08:45:39 <mobile_c> and the same for function wrappers
08:45:40 <lsneff> Ah, well, it's more like a normal compiler.
08:45:54 <lsneff> It's really an AOT compiler that only runs in memory
08:47:10 <mobile_c> for example, /* assume this is pre defined in a separate library b() {return 0;} */ a() {b();} b() {return -1;} c() { b();}
08:47:39 <mobile_c> would a()'s b return 0 and c's b return -1 or would both a() and c()'s b return -1
08:49:24 <mobile_c> (the same can be done for macros, eg if a macro is defined after, say after a(), #define b a , would a be self recursive due to the definition taking effect ahead of time like a standard compiler or would it proceed as normal as a>b and c>a>b)
08:49:45 <lsneff> It compiles a bytecode to machine code, so it doesn't have to worry about issues of that nature.
09:18:05 <bcos_> mobile_c: If performance doesn't matter, there isn't any reason why a JIT couldn't start with "un-pre-processed text source". For performance reasons you want to do as much work as possible beforehand, but (for max. performance) that implies AOT compiling all the way and not using a JIT at all
09:25:58 <geist> yah. usually though they start with an intermediate bytecode of some type that's fairly generic
09:26:07 <geist> easy to transliterate to native code on demand
09:28:46 <nyc> https://en.wikipedia.org/wiki/Threaded_code
10:04:43 <geist> yah you see that sort of stuff for things like C compilers for 8 bit machines and whatnot. Just emit a series of calls to helper routines. not super fast, but its all native, and fairly compact code wise
10:18:47 <mobile_c> ok
10:19:52 <nyc> There's a sort of way to have these "virtual machine models" (a virtual machine in a different sense) representing the execution semantics of various programming languages and one of the steps is to translate from those things to normal code IR's by what are basically something akin to "assembly macros" that implement the "instructions" of the virtual machine models before going and trying to optimize the normal code IR's. GHC does this
10:19:53 <nyc> with what's called the STG (Spineless Tagless Goteborg) machine for LLVM. Normal things like C would probably use stack machine models. Anyway things would probably want to implement the moves of those virtual machine models with threaded code or things like "assembly macros" to get native code without much pain.
10:25:54 <nyc> Example move: pop two things off the stack, add them, and push the result on the stack.
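The "pop two things, add them, push the result" move is easy to sketch as a tiny interpreter. This is an illustrative toy, not any real VM: the opcode names, the `struct insn` layout, and the fixed 64-slot stack are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical three-instruction stack machine. */
enum op { OP_PUSH, OP_ADD, OP_HALT };
struct insn { enum op op; int64_t imm; };

/* Runs until OP_HALT and returns the value on top of the stack. */
static int64_t vm_run(const struct insn *code)
{
    int64_t stack[64];
    size_t sp = 0;                       /* index of the next free slot */

    for (;; code++) {
        switch (code->op) {
        case OP_PUSH:
            assert(sp < 64);
            stack[sp++] = code->imm;
            break;
        case OP_ADD: {                   /* the move described above */
            assert(sp >= 2);
            int64_t b = stack[--sp];     /* pop */
            int64_t a = stack[--sp];     /* pop */
            stack[sp++] = a + b;         /* push the sum */
            break;
        }
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}

/* Builds "push x; push y; add; halt" and runs it. */
static int64_t demo_add(int64_t x, int64_t y)
{
    const struct insn program[] = {
        { OP_PUSH, x }, { OP_PUSH, y }, { OP_ADD, 0 }, { OP_HALT, 0 },
    };
    return vm_run(program);
}
```

A threaded-code or "assembly macro" backend would replace each `case` body with a short native sequence or a call to a helper routine, rather than dispatching through a `switch` at run time.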
10:31:07 <geist> and of course this whole virtual machine emulation thing is not new at all, they were doing it way back in the 60s easily
10:31:34 <geist> especially when there was very little ISA standardization and having something run an abstract machine helped you port it between machines
10:32:10 <geist> it was definitely the case in the late 70s/early 80s personal computers where there are tons of different incompatible machine architectures floating around
10:32:57 <nyc> https://en.wikipedia.org/wiki/VM_(operating_system)
10:39:20 <nyc> IBM was still using VM/ESA and CP/CMS internally when I was there.
10:40:51 <geist> yah pretty forward thinking since you can still run that old stuff nowadays no sweat
10:41:59 <geist> IBM loves them some virtualization. you can even tell they were heavily involved in PS3 and cell, since that's super virtualized
10:42:06 <geist> and built on PPC
10:45:11 <nyc> I think there's even qemu-system-s390x beyond just hercules.
10:46:36 <geist> yah i looked at it thinking it'd be interesting to load up a linux s390x system, but apparently it's not really intended for full system emulation
10:46:44 <geist> i think it's only really used for user space emulation of the architecture
10:47:20 <geist> or at least ibm wont either describe all the hardware, release all the roms, or no one cares to do full system emulation work
10:47:54 <nyc> I think hercules does full system emulation.
10:48:32 <nyc> Or at least it's reputed to boot up Linux' S/390 port.
10:56:51 <jmp9> okay
10:56:57 <jmp9> before long mode i should enable paging
10:57:00 <jmp9> but which paging?
10:58:27 <bcos_> jmp9: There's only one choice that works for long mode ("long mode paging"!)
10:58:46 <jmp9> in compatibility mode CR3 is still 32 bit
10:58:50 <jmp9> so i can't do this
10:59:11 <jmp9> what i should put in CR3?
10:59:13 <jmp9> PML4 ptr?
10:59:15 <bcos_> CR3 is "32-bit physical address extended to 64-bit", so...
10:59:44 <bcos_> ..like "mov cr3,eax" is the same as "movzx rax,eax; mov cr3,rax"
11:00:41 <bcos_> Physical address of PML4 needs to go into CR3 (before enabling long mode)
11:01:34 <jmp9> mov eax,0x00201000 mov cr3,eax
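For reference, here is a small sketch of how a virtual address splits into table indices under x86-64 4-level paging, plus a check that a PML4 physical address can be loaded into CR3 from 32-bit code as bcos_ describes. The helper names are made up for illustration; the bit positions are the standard ones for 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

/* Each level of 4-level paging consumes 9 bits of the virtual address. */
static unsigned pml4_index(uint64_t va) { return (unsigned)((va >> 39) & 0x1FF); }
static unsigned pdpt_index(uint64_t va) { return (unsigned)((va >> 30) & 0x1FF); }
static unsigned pd_index(uint64_t va)   { return (unsigned)((va >> 21) & 0x1FF); }
static unsigned pt_index(uint64_t va)   { return (unsigned)((va >> 12) & 0x1FF); }

/* A PML4 loaded via "mov cr3, eax" must be 4 KiB aligned (the low 12 bits
 * of CR3 are not address bits) and must sit below 4 GiB, because the
 * 32-bit value is zero-extended. */
static int pml4_loadable_from_32bit(uint64_t pml4_phys)
{
    return (pml4_phys & 0xFFF) == 0 && pml4_phys <= 0xFFFFFFFFull;
}
```

The address 0x00201000 from the line above passes both checks. A page-table structure that is not 4 KiB aligned fails the first one.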
11:02:50 <jmp9> last question
11:02:56 <jmp9> https://hastebin.com/xirilahixa.rb
11:02:59 <jmp9> is that correct code?
11:03:17 <jmp9> if it's correct, bug is in page_map function
11:05:02 <doug16k> jmp9, what is line 43 for?
11:05:39 <jmp9> for debugging purposes
11:05:48 <doug16k> ah ok
11:05:50 <jmp9> it will stop if there is no page faults
11:05:55 <doug16k> right
11:06:34 <doug16k> jmp9, why line 44
11:06:53 <doug16k> doesn't use it
11:07:40 <jmp9> uhm
11:07:47 <jmp9> what i should use instead?
11:07:48 <jmp9> retf?
11:08:00 <doug16k> strangely loads 4 opcode bytes into eax
11:08:06 <doug16k> oh no, nasm syntax
11:08:24 <doug16k> ya pointlessly puts the address your next absolute far jump jumps to into eax
11:08:38 <bcos_> jmp9: The "jmp 0x08:_kernel_64" will work fine (without loading anything into EAX beforehand)
11:08:50 <jmp9> (without loading anything into EAX beforehand)
11:08:51 <jmp9> why?
11:09:05 <bcos_> jmp9: Can we assume it doesn't stop at line 43 (gets a page fault and triple faults instead)?
11:09:19 <doug16k> because jmp 0x08:_kernel_64 is encoded as <jmp opcodes> <offset (_kernel_64)> <0x08>
11:09:22 <doug16k> all the information is there
11:09:40 <doug16k> the cpu is ignoring eax
11:21:27 <jmp9> so
11:21:33 <jmp9> bug is in page_map
11:21:38 <jmp9> i'm sure that it's there
11:28:02 <uelen> I'm at the point of implementing context switches in my operating system and I'm having a bit of trouble. As soon as I try writing to the segment registers to restore them I get a page fault -> immediate triple fault
11:28:02 <uelen> it happens at `mov ds, word [rel segments.ds] ; restore ds`
11:28:02 <uelen> this happens even when I'm restoring the same values, and interestingly it just bypasses all of my interrupt handlers even though I have some set up for double and triple faults
11:28:02 <uelen> however, it only occurs when I change cr3. If I leave cr3 alone then everything's fine. I have no clue what the problem with my new page table is or why it causes a fault when I try to change a segment register
11:31:24 <doug16k> uelen, what RPL are you loading
11:31:58 <doug16k> the RPL is in the low two bits of the value you load into a segment register
11:32:22 <uelen> I've tried it with both 0 and 3, kernel and userspace
11:32:26 <uelen> both cause an exception
11:32:56 <doug16k> the GDT base is a virtual address right? not physical?
11:33:10 <doug16k> the GDT is mapped in when userspace is running right?
11:33:17 <uelen> wait it might be physical
11:33:22 <doug16k> gotta be virtual
11:33:26 <uelen> welp, thanks
11:33:31 <doug16k> linear technically
11:36:50 <doug16k> the cpu is going to remember the gdt base internally, then any time code (userspace or otherwise) loads a segment register it calculates the descriptor offset, adds that gdt base, and reads that virtual address to fetch descriptors
11:36:54 <doug16k> even in user mode
11:37:16 <doug16k> it allows supervisor pages to be read for that special purpose though
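The selector layout and the descriptor lookup doug16k describes can be written out as a few helpers. This is only a sketch of the address arithmetic with invented function names; the CPU does this internally, and it ignores the LDT case (TI = 1).

```c
#include <assert.h>
#include <stdint.h>

/* Segment selector layout: bits 0-1 RPL, bit 2 table indicator, bits 3-15 index. */
static unsigned selector_rpl(uint16_t sel)   { return sel & 3u; }
static unsigned selector_ti(uint16_t sel)    { return (sel >> 2) & 1u; }
static unsigned selector_index(uint16_t sel) { return sel >> 3; }

/* Linear (virtual) address the CPU reads when loading a GDT selector:
 * the GDT base from GDTR plus index * 8. With paging on, this address
 * goes through the current page tables like any other. */
static uint64_t descriptor_linear_address(uint64_t gdt_base, uint16_t sel)
{
    assert(selector_ti(sel) == 0);       /* GDT only in this sketch */
    return gdt_base + (uint64_t)selector_index(sel) * 8;
}
```

For example, selector 0x08 is index 1 with RPL 0, and 0x1B is index 3 with RPL 3. If `gdt_base` is a low or physical address that is not mapped under the new CR3, the read at `descriptor_linear_address(...)` page-faults, which is consistent with the fault uelen sees on `mov ds, ...`.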
11:38:56 <doug16k> paging could swap out the gdt (and ldt and idt for that matter), but you probably wouldn't do that
11:39:06 <uelen> should both the address to the gdtr and the address to the actual gdt be virtual?
11:39:24 <doug16k> yes
11:39:37 <doug16k> the only physical thing ever is cr3 and the entries in page tables
11:39:56 <doug16k> and drivers feed physical addresses into devices for dma of course
11:40:42 <uelen> huh, I probably did the same thing with the idt
11:41:15 <doug16k> ya if you are still using some early boot one in low virtual address, then it'd die when you switched into a user process and it all disappeared
11:41:58 <doug16k> you can put page table entries in to make those tables appear somewhere else, then change your gdtr, idtr, etc appropriately
11:42:33 <doug16k> map whatever physical pages they are now at some other address up in kernel space where it doesn't disappear and switch
11:44:29 <uelen> I mapped every physical address to address+0xFFFF800000000000 so it's not a painful fix
11:44:44 <uelen> thanks for pointing that out, I probably never would have realized on my own
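With a direct map like the one uelen mentions, converting the GDT and IDT bases is a single addition. A minimal sketch, assuming every physical address really is mapped at `phys + 0xFFFF800000000000`; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed direct-map offset, taken from the conversation above. */
#define DIRECT_MAP_BASE 0xFFFF800000000000ull

/* Physical address -> virtual address inside the direct map. */
static uint64_t phys_to_virt(uint64_t phys)
{
    return phys + DIRECT_MAP_BASE;
}

/* Direct-map virtual address -> physical address. */
static uint64_t virt_to_phys(uint64_t virt)
{
    assert(virt >= DIRECT_MAP_BASE);
    return virt - DIRECT_MAP_BASE;
}
```

The base handed to `lgdt`/`lidt` would then be `phys_to_virt(gdt_phys)`, while CR3 and page-table entries keep using the physical address, as doug16k notes.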
11:46:20 <jmp9> ok guys i figured out the bug
11:46:26 <jmp9> it's because of unaligned structure
11:46:29 <jmp9> which i use
11:46:35 <jmp9> how align structure to 4KB?
11:49:38 <jmp9> this bitch doesn't want align to 4 KB
11:50:32 <uelen> what language are you using?
11:50:40 <nyc> Okay, and there's the RISC-V version.
11:50:50 <jmp9> gcc
11:50:54 <jmp9> language C
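One common way to get 4 KiB alignment with gcc is the `aligned` attribute on the type. The sketch below uses a hypothetical `struct page_table`; it applies to objects with static or automatic storage, and memory obtained from an allocator needs an aligned allocation instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page-table type: 512 eight-byte entries, 4 KiB aligned.
 * The attribute is a gcc/clang extension. */
struct page_table {
    uint64_t entries[512];
} __attribute__((aligned(4096)));

/* A statically allocated instance inherits the type's alignment. */
static struct page_table example_pml4;

/* True if the low 12 bits of the address are zero. */
static int is_page_aligned(const void *p)
{
    return ((uintptr_t)p & 0xFFF) == 0;
}
```

Here `is_page_aligned(&example_pml4)` holds and `sizeof(struct page_table)` is exactly 4096. If the linker script places the section containing such an object at an unaligned address, or the structure is carved out of an unaligned buffer by hand, the attribute alone is not enough.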