Search logs: #osdev - 4 May 2019

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

http://bespin.org/~qz/search/?view=1&c=osdev&y=19&m=5&d=4

Saturday, 4 May 2019

12:04:31 <doug16k> zid, you can assign a name to a symbol in your linker script and use a symbol instead of hardcoded 0xffffffff80001000 -> https://github.com/zid/boros/blob/master/boot/long.asm#L44
12:07:04 <doug16k> if you do that then you can use a rip-relative lea and it will work if you randomize the load address
12:07:18 <doug16k> (depending on where this golong code is)
12:10:50 <zid> doug16k: Yea I don't intend to support kaslr, too lazy
12:10:59 <zid> if I did, I would be adding the kaslr base
12:21:40 <stisl> hi
12:22:39 <stisl> what could be the problem when "lgdt" triggers a reset?
12:22:56 <stisl> in another sourcecode from me it works
12:23:25 <zid> lgdt not where you said it was
12:23:30 <zid> lgdt contents make no sense
12:23:31 <eryjus> proper alignment of the structure is my guess
12:23:35 <zid> you immediately got an interrupt
12:23:37 <zid> lgdt alignment
12:23:59 <stisl> thx
12:24:19 <stisl> how should it be aligned, is page aligned ok?
12:25:05 <geist> it is but that's almost certainly way more than it needs to be
12:25:07 <eryjus> iirc 8-byte should suffice, but would have to look to confirm
12:25:17 <geist> if it needs it at all. you should consult the manual for those sort of things
12:25:40 <zid> possibly 16 on amd64, if it even needs alignment there
12:26:21 <stisl> thanks for the quick response I will test it
12:26:31 <geist> when in doubt assume it needs to be aligned on the unit of one of its elements
12:29:51 <stisl> ok, the 16 byte alignment of the table is now implemented but the restart is still there
12:30:02 <zid> Time to actually put sensible things in it then
12:30:56 <stisl> https://pastebin.com/6ZRZ5bYn
12:31:08 <stisl> it is very short just for a test
12:31:29 <zid> That's a gdt
12:32:36 <zid> and it's aligned to 8, not 16
12:32:49 <zid> so you failed on every front, good job, I guess
12:33:51 <stisl> I also have a bigger gdt, but this gdt also doesn't work on the current os
12:36:30 <eryjus> several things jump out at me: 1) are you getting a timer IRQ at sti? 2) you are not jumping properly to set the new GDT and fixing up the segment selectors. 3) what is this rip offset thingy with the lgdt opcode?
12:37:43 <zid> rip-rel
12:37:49 <zid> it's required for position independent code
12:38:01 <zid> without rip-rel you need awful 'base' registers
12:39:03 <stisl> I tried it also with fixing the segment registers
12:39:48 <stisl> and the 1) I didn't try - but I tried a keyboard irq
12:40:00 <eryjus> put the structure in the text section right after the ret and then you should be able to access it within a few bytes offset. I may be wrong, but the way I read that is you are adding rip to the address of the structure, which is not where it really lives
12:40:28 <stisl> ah,ok
12:40:50 <stisl> hubsan was also telling me sth. like this
12:45:51 <doug16k> lgdt has a misaligned layout if you are not careful. a naive structure will put a uint16 followed by pointer sized field
12:46:43 <doug16k> the compiler will insert 2 or 6 bytes of padding unless you make it packed or offset the limit field so the base field ends up aligned
12:47:26 <doug16k> stisl, ^
12:48:37 <doug16k> there is no alignment requirement on the gdt itself, but a sane implementation will at least 16-byte align it
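The padding problem doug16k describes can be checked directly. The following is a minimal sketch, assuming a GCC or Clang style `__attribute__((packed))` and a 64-bit `lgdt` operand (2-byte limit followed by an 8-byte base); the struct names and the `make_gdtr` helper are hypothetical and do not come from stisl's paste.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive layout: the compiler inserts padding after `limit` so that
 * `base` is naturally aligned. lgdt would then read garbage as the base. */
struct gdtr_naive {
    uint16_t limit;
    uint64_t base;
};

/* Packed layout: the 10 contiguous bytes that lgdt reads in 64-bit mode. */
struct gdtr_packed {
    uint16_t limit;  /* size of the GDT in bytes, minus 1 */
    uint64_t base;   /* linear address of the first descriptor */
} __attribute__((packed));

/* Hypothetical helper: build the operand for a table of `count`
 * 8-byte descriptors. */
static struct gdtr_packed make_gdtr(const uint64_t *table, size_t count)
{
    struct gdtr_packed r;
    r.limit = (uint16_t)(count * sizeof(uint64_t) - 1);
    r.base = (uint64_t)(uintptr_t)table;
    return r;
}
```

Comparing `offsetof(struct gdtr_naive, base)` with `offsetof(struct gdtr_packed, base)` shows the inserted padding; only the packed version places the base at byte offset 2.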
12:49:35 <stisl> what could be the problem, I have aligned everything
12:49:46 <stisl> I also removed the rip
12:49:53 <doug16k> the structure for lgdt, can you show that?
12:50:32 <stisl> doug16k, do you mean https://pastebin.com/6ZRZ5bYn?
12:51:43 <doug16k> .short is better than word. word means weird things
12:52:08 <stisl> ah,ok
12:53:31 <stisl> doesnt help
12:53:31 <doug16k> weird is no exaggeration either, see https://ftp.gnu.org/old-gnu/Manuals/gas-2.9.1/html_chapter/as_7.html#SEC136
12:53:58 <stisl> I see
12:54:37 <doug16k> it must crash at sti then?
12:55:06 <stisl> yes it crashes mostly directly after this
12:55:06 <doug16k> I guess I have to assume all those random numbers between 44 and 53 are correct
12:55:09 <doug16k> define constants
12:55:14 <stisl> I ok
12:55:29 <doug16k> 44 47 41 43 53 eh? oh I see your problem. no constants :P
12:56:22 <doug16k> 1 << 44 works? since when can int be shifted 44 bits?
12:56:45 <doug16k> 1LL maybe
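As a sketch of the "define constants" advice: assuming the numbers 41, 43, 44, 47 and 53 in the paste are the usual bit positions of a 64-bit code segment descriptor, they could be named as below. The macro names are made up for illustration. The `ULL` suffix matters because shifting a plain `int` by 44 bits is undefined in C.

```c
#include <stdint.h>

/* Bit positions inside an 8-byte segment descriptor. */
#define GDT_READABLE (1ULL << 41)  /* code segment may be read */
#define GDT_EXEC     (1ULL << 43)  /* executable (code) segment */
#define GDT_CODEDATA (1ULL << 44)  /* S bit: code/data, not a system segment */
#define GDT_PRESENT  (1ULL << 47)  /* segment is present */
#define GDT_LONGMODE (1ULL << 53)  /* L bit: 64-bit code segment */

/* Flat 64-bit kernel code descriptor built from the named bits. */
static uint64_t gdt_code64(void)
{
    return GDT_READABLE | GDT_EXEC | GDT_CODEDATA | GDT_PRESENT | GDT_LONGMODE;
}
```

With these definitions the descriptor works out to `0x00209A0000000000`, the value commonly seen in minimal long-mode GDTs.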
12:57:17 <zid> https://github.com/zid/boros/blob/master/boot/long.asm#L56 did someone say magic constants!?
12:58:37 <stisl> copy & paste failure
12:59:22 <geist> huh. gcc 9.1 released today
12:59:53 <zid> yea, it has some minor optimizations for switches and a couple of attributes
01:09:27 <doug16k> "Support for opening file streams with wide character paths on Windows" ... can someone find out if hell froze over?
01:10:49 <doug16k> there's a vega 10 AMD GCN arch now
01:12:00 <doug16k> "Support of Intel MPX (Memory Protection Extensions) has been removed."
01:12:45 <doug16k> broken like the broken transactional memory thing?
02:01:26 <bluezinc> zid: nice bunch of magic numbers you've got there.
02:02:00 <zid> Thanks, I grew them myself
02:08:26 <doug16k> think my gcc build will work just by changing the version number?
02:09:19 <zid> sure
02:10:13 <doug16k> laptop takes a few minutes
02:10:58 <geist> yah generally does. also usually pick up the latest binutils to match
02:14:26 <zid> I only just rebuilt all my crosscompilers though :(
02:18:00 <FreeFull> https://bugzilla.mozilla.org/show_bug.cgi?id=1548973 Mozilla done screwed up
02:18:26 <geist> oops
02:19:49 <zid> hah
02:20:11 <geist> wouldn't be the first though. i've seen lots of random sites lose their shit over expiring stuff
02:20:48 <doug16k> heard a little skip in the audio. ya still compiling. lol completely fair my ass
02:20:52 <FreeFull> Right, but this is addons, for all firefox users
02:21:07 <zid> Meanwhile over in pale moon land I am fine :P
02:21:19 <geist> yah
02:21:44 <geist> so much stuff relies on phoning home to some site though. i weep for our children
02:22:28 <zid> good news, building cross-mipsel-unknown-elf/gcc-9.1.0
02:22:29 <doug16k> there should be a demand for facades of things already. is there a thing to MITM that?
02:23:38 <zid> hmm something dicked my kernel source dir, nice
02:27:56 <doug16k> yep, built fine
02:28:46 <zid> ah I figured it out
02:29:00 <zid> I did some tricks to build it with extra config options which it has now forgotten I did, oopsie :D
02:32:00 <jmabsd> curious, PCI express board interrupts changed, for instance the "Intel 82571EB" NIC uses "INTX", while the "Intel 82575EB" uses "MSI". what's going on, what's the difference?
02:33:44 <doug16k> jmabsd, MSI is more efficient and has lower latency. older PCI devices used INT #A thru #D
02:34:03 <jmabsd> doug16k, interesting, and why is MSI more efficient?
02:34:29 <doug16k> msi does a memory write which is intercepted by the cpu and delivered that way
02:34:44 <geist> and it skips interrupt controllers from a software point of view
02:35:03 <doug16k> old pci IRQs tended to pile up devices on one or two of those IRQs. then the cpu has to check them all
02:35:47 <doug16k> for example, if the NIC and sound card end up on IRQ 10 then you have to poke at both pieces of hardware when IRQ 10 happens, in case it was both
02:36:05 <jmabsd> interesting, always learning something new. great. thanks!
02:36:09 <doug16k> MSI IRQs are dedicated. when that IRQ happens, you know exactly what it is
02:36:24 <jmabsd> geist, oh there was such a thing as a "PCI interrupt controller" chip, which the A-D would go through?
02:36:53 <jmabsd> cool.
02:36:56 <geist> well, not pci specific, yes
02:37:03 <geist> the old legacy PIC and later on the ioapic(s)
02:37:15 <geist> msi skips that and delivers an irq directly to one or more cpus
02:37:16 <jmabsd> geist, any hardware, so PCI + .. IOAPICs?
02:37:32 <jmabsd> ah, MSI is CPU core? CPU? specific
02:37:37 <geist> can be, yes
02:37:41 <jmabsd> cool.
02:37:49 <geist> you can also tell it to deliver to a bunch of cpus, though i forget the details on x86
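To make "msi does a memory write" concrete: on x86 the device is programmed with an address in the `0xFEExxxxx` range and a data word carrying the vector. This is a rough sketch of composing those two values for fixed delivery to one CPU; the function names are hypothetical, and the less common fields (redirection hint, destination mode, delivery mode, trigger mode) are simply left at zero.

```c
#include <stdint.h>

#define MSI_ADDR_BASE 0xFEE00000u  /* bits 31:20 of every x86 MSI address */

/* Message address: destination APIC ID goes in bits 19:12. */
static uint32_t msi_address(uint8_t dest_apic_id)
{
    return MSI_ADDR_BASE | ((uint32_t)dest_apic_id << 12);
}

/* Message data: interrupt vector in bits 7:0; zero elsewhere means
 * fixed delivery mode, edge triggered. */
static uint16_t msi_data(uint8_t vector)
{
    return vector;
}
```

A driver would write these two values into the device's MSI capability registers; when the device later writes the data word to that address, the target CPU receives the given vector without the PIC or IOAPIC being involved.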
02:37:55 <jmabsd> question, actually what IO facilities does a CPU/OS use to communicate with PCIe -
02:38:02 <jmabsd> first there's the PCIe enumeration which uses.. memory ??
02:38:16 <jmabsd> and then to interact with a device, you use.. memory only ??
02:38:17 <geist> correct. PCIe has a mmio version of the configuration space
02:38:19 <geist> called ECAM
02:38:30 <geist> not necessarily. you can still do IO port access to PCIe
02:38:32 <doug16k> Enhanced Configuration Access Method
02:38:41 <geist> though most newer devices use almost exclusively mmio
02:38:50 <jmabsd> geist, i never did "IO port access" myself, there are some special CPU instructions for those "ports" aren't there, like, "WRITE DWORD TO PORT N"?
02:39:04 <geist> correct. x86 specific
02:39:09 <geist> in/out instructions specifically
02:39:11 <doug16k> yes read or write 8, 16, or 32 bits (never 64)
02:39:31 <doug16k> from/to I/O space (as opposed to memory space)
02:39:47 <jmabsd> super slow and useless??
02:39:58 <geist> yes and not necessarily, but 'legacy'
02:40:00 <jmabsd> if it's not DMA, it's like that old IDE interface method, what was it called, PIE or something
02:40:14 <doug16k> somewhat slower but way more clunky to use in code than memory access
02:40:19 <jmabsd> what do you Need To Use "IO port access" for today?
02:40:28 <jmabsd> on a latest AMD64 or ARM64
02:40:41 <geist> at the end of the day it's just an older mechanism to access hardware registers. one that's almost exclusive to intel style cpus (8080, 8085, 8086, etc)
02:41:00 <geist> works the same on amd64. arm64 has no concept of io ports
02:41:07 <geist> nor does almost any other architecture
02:42:19 <doug16k> one of the other things pcie adds is expanding the config space for each device. it's max 256 bytes on old pci, ecam expands it a lot (I forget how far)
02:42:34 <doug16k> I doubt much uses that though, for back compat
02:42:37 <geist> 4K i believe
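The 4K figure follows from the ECAM address layout: each function gets a 4 KiB window, with bus, device and function numbers forming the upper address bits. A minimal sketch of the address calculation follows; `ecam_address` is a hypothetical helper, and the ECAM base is assumed to come from the ACPI MCFG table with a segment whose first bus is 0.

```c
#include <stdint.h>

/* Physical address of a config register under ECAM:
 *   base + (bus << 20) + (device << 15) + (function << 12) + offset
 * 256 buses x 32 devices x 8 functions x 4 KiB each. */
static uint64_t ecam_address(uint64_t ecam_base, uint8_t bus,
                             uint8_t dev, uint8_t func, uint16_t offset)
{
    return ecam_base
         + ((uint64_t)bus << 20)
         + ((uint64_t)(dev & 0x1F) << 15)
         + ((uint64_t)(func & 0x07) << 12)
         + (uint64_t)(offset & 0xFFF);
}
```

Offsets 0x000 to 0x0FF cover the legacy 256-byte PCI config space; 0x100 to 0xFFF is the extended space that only ECAM can reach.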
02:42:58 <jmabsd> cool!
02:43:12 <jmabsd> so on ARM64, SPARC64, MIPS64, it's ALL DMA?
02:43:14 <jmabsd> plus interrupt
02:43:19 <geist> well it's all MMIO
02:43:23 <doug16k> DMA is not MMIO
02:43:28 <geist> DMA and register access are two different things
02:43:31 <jmabsd> oh, i used to think MMIO is a kind of firewall
02:43:39 <geist> there's how does the cpu programmatically 'get to' registers
02:43:44 <jmabsd> geist, do ARM64 SPARC64 MIPS64 do "register access"
02:43:49 <doug16k> think of MMIO as registers masquerading as memory
02:43:49 <geist> correct. via mmio
02:43:54 <geist> correct
02:44:45 <doug16k> DMA is when a device does memory accesses that aren't part of the running program
02:45:12 <doug16k> a disk controller can read data from RAM to put on disk in a write command, or write to RAM with data coming from disk in a read command
02:45:25 <doug16k> that is DMA
02:46:05 <geist> what you were saying was PIE (it's PIO, programmed IO) is really not specific to io ports
02:46:26 <geist> it generally means 'the cpu must sit there in a loop and move data into or out of a device via a hardware register manually'
02:47:02 <geist> sometimes folks conflate it with using io ports (an x86 specific thing), but really PIO is more of a 'style' of hardware access. simple devices are usually PIO
02:47:23 <geist> like, say, a uart or a keyboard controller or whatnot
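The PIO "style" geist describes looks roughly like the loop below. This is only a sketch against a simulated device: `struct fake_uart`, its register layout and the `UART_TX_READY` bit are invented for illustration, whereas a real driver would point at actual MMIO registers or use port instructions.

```c
#include <stddef.h>
#include <stdint.h>

#define UART_TX_READY 0x01u  /* hypothetical "transmitter idle" status bit */

/* Stand-in for a device's register block. */
struct fake_uart {
    volatile uint8_t status;
    volatile uint8_t data;
    size_t bytes_sent;  /* bookkeeping for the simulation only */
};

/* PIO: the CPU polls the status register, then moves one byte itself. */
static void uart_putc(struct fake_uart *u, uint8_t c)
{
    while (!(u->status & UART_TX_READY)) {
        /* spin until the device can take another byte */
    }
    u->data = c;
    u->bytes_sent++;
}

static void uart_write(struct fake_uart *u, const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        uart_putc(u, (uint8_t)s[i]);
}
```

With DMA, by contrast, the CPU would hand the device a buffer address and length and be free to do other work until a completion interrupt arrives.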
02:47:47 <zid> Interesting, why do so many people have 'quassle' as their ident
02:47:50 <zid> err sel
02:47:58 <geist> presumably that's some sort of client
02:48:25 <zid> I had some guy in another channel ignored as *!quassel@* because he kept changing nicks, this guy has it too so I thought it was just him again, but this guy sounds not-retarded :P
02:48:53 <zid> yea, some gpl irc client
02:49:07 <geist> yah, so banning someone based on that is not a good idea then
02:49:23 <zid> tbf that client shouldn't be using its name in the ident field
02:50:13 <doug16k> what happened to liberal in what you accept, strict in what you transmit?
02:50:51 <doug16k> what name do you suggest ?
02:55:31 <doug16k> I'm changing my cpu context so it just saves and restores fsbase and gsbase with all the other registers
02:56:20 <doug16k> who is supposed to set that up? is tls supposed to be up and working from entry?
02:56:43 <doug16k> I have it so TLS is 100% up and running before it executes one instruction of the program
02:57:02 <doug16k> is that too good?
02:57:12 <geist> it should be the programs problem
02:57:36 <geist> i think that's too good. IMO the kernel doesn't care what user space does with them. different runtimes may decide to use TLS differently
02:57:59 <geist> so i'd simply give user space the ability to set fsgsbase to whatever they want (provided they're valid addresses) and then the kernel saves it with a context switch
02:58:33 <geist> if you want you can even give user space the ability to set it directly with the instructions (if present) so the kernel would always need to be able to deal with it being fiddled with without it knowing
02:58:37 <doug16k> I was considering just emulating rd/wr fs/gs base instructions in #UD if not supported
02:58:58 <doug16k> plus syscalls
02:59:15 <geist> might be one way to do it. consider non x86 architectures which have their own
02:59:31 <doug16k> ya that's why I must have syscall way
02:59:33 <geist> i'd say putting in a generic syscall that is pseudo arch specific is useful
02:59:36 <geist> right
03:00:04 <geist> since they dont get modified much (fsgsbase) i dont think there's a strong reason to emulate the instructions and/or make it fast for user to set it
03:00:42 <geist> there is an incentive to make the context switch fast. iirc reading/writing the fsgsbase MSRs is non trivially slow. tens of cycles and i think serializing
03:00:50 <geist> the fsgsbase instructions are far faster
03:01:11 <doug16k> yes my kernel tries hard to use them. if they #UD they are patched with call to msr code
03:01:18 <doug16k> I special case both and do one call
03:01:48 <doug16k> i.e. consecutive rd fs gs to r13 r14 are converted to call + long nop
03:01:51 <geist> eehhh not sure that's a good idea
03:02:08 <doug16k> why? happens on the first interrupt and never again
03:02:18 <doug16k> the #UD
03:02:18 <geist> you'll take almost assuredly far more cycles for that. i'd just reduce it to a single if (global bool) { this } else { that }
03:02:23 <geist> oh you patch the code?
03:02:39 <doug16k> yes the rd wr msr are 5 bytes. exactly same as call disp32
03:02:43 <geist> well, okay.
03:03:48 <doug16k> I special cased the #UD to look for rd of fs and gs into r13 and r14 as peepholed into calling a special version that does both
03:04:01 <doug16k> same for wr
03:04:12 <geist> sure. seems a bit overkill but sometimes that stuff is just fun to do
03:04:22 <doug16k> ya it's a tad overkill
03:04:24 <geist> :)
03:04:48 <doug16k> it did clean out a bunch of crap I had for patching it manually though
03:05:04 <doug16k> I had labels and tables and C++ code that dug through that and did cpuid and screwed around
03:05:10 <geist> yah. while you're at it you should probably design a system for patching up the ERMS stuff
03:05:14 <doug16k> now it just dynamically fixes up in the unlikely case it is old
03:05:33 <geist> yah that's why i was saying if anything just get the really critical stuff into a global bool
03:05:46 <geist> most likely the cpu will branch predict it, etc
03:06:26 <doug16k> enhanced rep move string?
03:06:56 <geist> yah. especially if you have a modern intel and AMD machine. can't optimize your memcpy/memsets for both without having multiple versions
03:07:14 <geist> so simplest and pretty effective solution is to have two variants of that stuff and patch it at boot
03:07:54 <zid> I should finish the 10 minutes of work I did on acpi
03:08:47 <zid> I needed to add a whopping if(*p) continue; to mmap
03:08:48 <geist> same. i've been meaning to slam together a little 'parse the few tables of acpi kernel needs'
03:09:06 <doug16k> I do that for fxsave/xsave/xsaveopt and fxrstor/xrstor
03:09:10 <zid> so that I didn't try to map the same shit multiple times if the acpi tables ended up in the same page (which I imagine is superlikely)
03:09:14 <elderK> What's the best way to optimize memset and memcpy anyway? My, probably naive, idea would simply be to try and transfer as much of the bytes as possible in larger units. So, do "slower transfers" until I'm suitably aligned, then transfer in the largest unit I can, slowly reducing unit size until I'm done
03:09:29 <zid> rep movsb :P
03:09:31 <doug16k> patched to go straight to the right code
03:09:44 <geist> well, that's the point. with modern intel (since a bit after sandy bridge) there's a cpuid bit called ERMS
03:09:51 <elderK> doug16k: Right, so that you use the MMX or SSE registers when available, right? But the core idea is the same: Transfer as much as you can using the largest units you can, as long as you can?
03:09:56 <doug16k> elderK, the cpu does that for you to some extent
03:10:01 <zid> (rep movsb is super fast, it isn't the *fastest* usually, but do you want to write avx into your kernel?)
03:10:09 <geist> it means 'you can basically just use rep movsb/stosb' and it's pretty fast
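A sketch of what "just use rep movsb" can look like in C. ERMS is reported in CPUID leaf 7, subleaf 0, EBX bit 9; the helper below only decodes a value that the caller has already read with `cpuid`. Both function names are hypothetical, the inline assembly assumes GCC or Clang syntax, and non-x86 builds fall back to `memcpy` so the sketch still compiles.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ERMS flag: CPUID.(EAX=7,ECX=0):EBX bit 9. */
static int ebx_has_erms(uint32_t leaf7_ebx)
{
    return (int)((leaf7_ebx >> 9) & 1u);
}

/* Byte-granular copy using rep movsb. The ABI guarantees the direction
 * flag is clear on function entry, so the copy runs forward. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    void *d = dst;
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);  /* portable fallback for other architectures */
#endif
    return dst;
}
```

A kernel following the approach discussed here would check the ERMS bit once at boot and then select or patch in this variant, keeping a different routine for CPUs without the flag.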
03:10:33 <doug16k> the ace up rep mov's sleeve is the ability for it to weakly order the stores and fence at the end
03:10:41 <zid> rep movsb has some weirdness on a couple of cpus where it's slightly slower than it needs to be, but it's in general really good
03:10:51 <geist> it's enough of a win, especially in the kernel since you usually dont have fpu active, to just replace all copies/sets with that
03:10:59 <zid> sse with prefetchws and stuff mixed in beats it on some cpus, can't figure out why
03:11:04 <elderK> geist: Neat. I was thinking that if you implemented the "get to aligned boundary, then copy in largest units, progressively using smaller units" would probably be more expensive for tiny transfers than a naive "byte by byte" copy.
03:11:05 <geist> but.... it's intel only. it definitely pessimizes AMD cpus
03:11:22 <doug16k> ryzen has good rep mov
03:11:27 <geist> so, you need both paths. hence what i was saying before about needing to patch up both the with-ERMS and without-ERMS versions
03:11:33 <geist> doug16k: good but not great
03:11:35 <elderK> What is ERMS?
03:11:43 <elderK> I'm unfamiliar with the acronym.
03:11:44 <zid> extra really microcoded scopies
03:11:57 <geist> enhances rep movs or something
03:11:59 <geist> enhanced
03:12:06 <doug16k> I said it: Enhanced Rep Move String
03:12:10 <geist> ah that
03:12:14 <elderK> Ah, sorry doug16k. I didn't see that.
03:12:25 <doug16k> np, it wasn't that obvious
03:12:25 <geist> the bit literally means 'for this cpu that's the best way to copy'
03:12:55 <geist> note that i think if you're moving craptons of data there are still more optimized AVX or SSE routines, but for everyday copies and sets those are fantastic (for cpus that are ERMS)
03:13:25 <zid> https://cdn.discordapp.com/attachments/417023075348119556/574071022337392670/unknown.png seems to be working
03:13:26 <geist> note that when you do it you even use the 'b' version, not even aligned 'd' or 'q' copies
03:14:02 <geist> my testing shows that sure enough rep movsb definitely pessimizes AMD cores, zen as well. it does an okayish job with it if everything is already aligned and whatnot
03:14:09 <geist> but zen doesn't really optimize generically for all scenarios
03:14:14 <geist> (using rep)
03:14:37 <doug16k> ah... I copy bytes until 8-byte aligned, then do 8-byte aligned rep xxx q
03:14:49 <geist> yah. that's still a fairly good solution for zen
03:14:51 <doug16k> seems pretty awesome
03:15:03 <geist> but then there's the whole source vs destination aligned, etc
03:15:11 <geist> since it's possible to align one and not the other
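The "bytes until aligned, then wide units, then the tail" idea from elderK and doug16k can be sketched in portable C. This version aligns the destination only, which is one possible choice given the source-versus-destination question raised here; the source may stay misaligned, so the 8-byte loads go through `memcpy`, which compilers typically lower to a single load. `copy_aligned_dst` is a hypothetical name, and the regions must not overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void *copy_aligned_dst(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: 8 bytes per iteration. memcpy avoids undefined behaviour
     * from dereferencing a misaligned uint64_t pointer on the source. */
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, s, sizeof w);
        memcpy(d, &w, sizeof w);
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: whatever is left, at most 7 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

For very short copies the head and tail loops dominate, which matches the concern raised elsewhere in the log that this scheme can lose to a plain byte loop on tiny transfers.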
03:15:19 <doug16k> it's not fair against intel. their load and store are 2x wider usually
03:15:31 <doug16k> zen2 fixes that
03:15:43 <geist> like i said, the ERMS stuff says dont bother at all, just rep movsb and get on with things
03:15:54 <geist> it's really nice when the compiler knows this and blats out a copy like that
03:16:04 <geist> one of the things you'll see done a lot if you are using a recent march
03:16:19 <zid> I tried recently to get gcc to emit one and couldn't make it do it
03:16:30 <zid> I have seen it do it, msvc too
03:16:40 <bluezinc> FreeFull: amusingly enough, I clicked your link one minute after encountering the bug.
03:17:03 <zid> my msvcrt memcpy gets 7.8GB/s, rep movsb gets 8.3GB/s
03:18:00 <zid> msvcrt memset is 19GB/s, notbad
03:18:20 <zid> I can run two of those on different cpus at the same time I think too
03:18:24 <geist> i think for sufficiently large copies there are AVX loads that work better, and _mjg was pointing out that for certain smallish copies it doesn't really do the best job
03:18:58 <geist> ie, it takes a good 32 or 64 bytes or so to really kick in, but it doesn't do *bad* in small copies, and the code complexity, especially if you inline it, is great
03:19:42 <geist> zid: that seems about right. is that a single channel about DDR4-2400 machine?
03:19:56 <geist> 19GB/sec is what i generally expect if you saturate it
03:19:56 <elderK> Actually writing a nice memcpy sounds quite difficult, if you don't want to rely on ERMS.
03:20:02 <zid> geist: DDR3
03:20:10 <elderK> Like you said, details about aligning the source and the destination.
03:20:14 <_mjg> memcpy for kernel or userspace?
03:20:16 <zid> 1800
03:20:34 <_mjg> most difficulty writing performant memcpy stems from fucking around with all the different microarchs
03:20:56 <geist> right. that's the real trick. ERMS at least is a step forward where the microarch says 'use algorithm A'
03:21:03 <elderK> _mjg: So that you can determine whether AVX, SSE or MMX is there? Or for other reasons, like caching?
03:21:12 <_mjg> all of the above
03:21:15 <_mjg> and more
03:21:24 <geist> otherwise generically the best solution is to align to ... i think source? and move as much as you can in large instructions
03:21:27 <elderK> :) I'd like to learn more :)
03:21:41 <zid> geist: I can post passmark numbers if you want to cry :P
03:21:44 <geist> ie, if you need to write a single generic C routine as your fallback, that's the algorithm
03:21:47 <_mjg> depends what specifically you want. if you want general info, see agner fog's manuals
03:22:07 <geist> and when i say generic i mean 'on all architectures'
03:22:13 <_mjg> if you want microarch-specific badness get glibc sources and check how they optimize routines based on what they found
03:22:47 <geist> but, since kernel usually doesn't have SSE/AVX available, you have to fallback to using simpler versions of stuff
03:22:48 <zid> https://media.discordapp.net/attachments/417023075348119556/512467796182171658/unknown.png I wish I had the screenshot of the 'memory threaded result'
03:22:55 <zid> I was about 10 football fields off the edge of the graph
03:23:02 <zid> because nobody had ever run my kit in quad channel
03:23:21 <geist> and of course short vs long copies and whatnot are all the thing. it's hard to optimize for all cases
03:23:49 <_mjg> from my own tests, although pardon me for not remembering which microarch is which, target alignment to 16, 32 and 64 bytes matters
03:24:02 <_mjg> played with skylake and haswell
03:24:08 <_mjg> and epyc
03:24:09 <geist> yah i think target may be the case
03:24:12 <zid> https://cdn.discordapp.com/attachments/417023075348119556/574073685279768606/unknown.png It's pretty cheap ram, one of the kits goes to 900MHz but the other doesn't like it sadly :(
03:25:08 <bluezinc> geist: depends on if you have any interest whatsoever in supporting pre-SSE/AVX machines.
03:25:13 <zid> I got the timings a lot tighter than that but when I backed the speed down I couldn't be bothered to redo the timings
03:25:16 <_mjg> also fun fact, i wrote memset et al with some branches to handle small sizes (as opposed to just ERMS) and it fucked hardcore with a fstat microbenchmark on ivy bridge
03:25:21 <_mjg> branch mispredictions
03:25:39 <zid> throw in a prefetchw :P
03:25:45 <bluezinc> AVX is almost 10 years old, SSE is older.
03:25:50 <geist> bluezinc: yep, there's always a fallback. AVX is the one that is a realistic line in the sand
03:25:52 <zid> sse is ancient at this point
03:26:02 <geist> at least for x86-64. thankfully basic SSE is implicit to that
03:26:14 <zid> except for like, those 2 intel core cpus right?
03:26:15 <geist> AVX however is sandy bridge so still relatively recent
03:26:24 <gog> doesn't x86_64 have sse2 at minimum?
03:26:26 <zid> yea I at least have AVX thankfully, I'd like BMI though
03:26:29 <geist> nope. intel core (prior to core2) was 32bit only
03:26:57 <geist> intel core 1 was really a rebranded pentium m. good cpu, but 32bit
03:26:59 <zid> https://cdn.discordapp.com/attachments/417023075348119556/574074448857006088/unknown.png yay avx :P
03:27:06 <gog> what about Prescott E0
03:27:17 <gog> i guess core 2 was out by then
03:27:29 <geist> beats me. you mean the first p4 with 64bit? it woulda had SSE2 as well
03:27:49 <gog> yeah you said intel wasn't 64-bit until core2
03:27:49 <geist> SSE2 is AFAIK implicit with amd64. the one that's not precisely (though you can generally ignore) is cmpxchg16b
03:28:07 <gog> but i can't remember when the first core2 stepping came out
03:28:10 <geist> well, i really meant 'things named core weren't 64bit until core2'
03:28:16 <gog> ohhhhhhh ok
03:28:35 <zid> I swear there are like 2 cpus out there with no sse2 but are 64bit
03:28:35 <geist> yah p4 got 64bit somewhere in there too, 2003-2006 is hazy
03:28:45 <gog> E0 was 2005
03:28:47 <gog> iirc
03:28:56 <geist> atom perhaps, but i think they've always been SSE2
03:29:01 <geist> just not SSE3+
03:29:13 <geist> bonnell and whatnot. but then optimizing for those is a nightmare
03:29:18 <zid> something weird like the final core or first core2 chip during a changeover or something has something really weird about it like that
03:29:21 <zid> I just can't remember what
03:29:30 <geist> the original K8 was missing some modern stuff. cmpxchg16b
03:29:35 <geist> also early core was missing NX
03:29:54 <gog> i think cmpxchg16b was on early cores but it caused a fault?
03:30:06 <gog> there's something with an instruction like that with a major errata
03:30:07 <zid> maybe I am just thinking of that then
03:30:12 <zid> My cpu has the best errata
03:30:20 <geist> yah. that's a generically useful instruction to add to your generic 64bit codegen, so generally K8 gets the axe
03:30:25 <zid> typo in the cpuid makes it an "E5-1620 0"
03:30:34 <gog> lmao whoops
03:30:35 <zid> someone wrote 0 not \0
03:31:03 <geist> hah you can imagine someone in the factory keying it into some industrial machine to fuse it out
03:31:08 <gog> my dev machine has some kind of tlb optimization errata and i got the laptop cheap
03:31:16 <geist> probably far post verilog and whatnot. near assembly
03:31:39 <geist> yah K10 had a really bad TLB errata. someone explained it to me at work the other day. it was a doozy
03:31:41 <zid> You can see it in the screenshot in its full glory ^
03:31:52 <geist> are you somehow accessing irc via some discord proxy?
03:31:56 <gog> yeah i think my A8 shares some silicon design with the K10
03:32:02 <gog> same error
03:32:09 <zid> no I just find it easier to dump images to a discord window and back
03:32:13 <geist> ah okay
03:32:13 <zid> than it is to upload to imgur or whatever
03:32:17 <gog> i almost bought the 6-core with the bad tlb
03:32:21 <zid> imgur has been spotty with my browser in the past
03:32:21 <gog> it was like half off lol
03:32:41 <geist> yah i forget the detail. it was some low level L1 cache/TLB/page table interaction
03:32:55 <zid> who was it here who made that cool TLB size graph btw
03:32:59 <geist> caused the page table walker to skip L1 cache when doing a D/A bit writeback, iirc
03:33:00 <zid> was it you geist
03:33:15 <geist> which could cause it to get the TLB all fucked up
03:33:19 <geist> zid: negative
03:33:20 <zid> 2M pages vs 4K graphs
03:33:23 <zid> I remember seeing it here
03:33:37 * geist throws zid a quarter
03:33:43 <zid> I'd be interested to see them redo it but on skylake
03:33:46 <geist> get a real microarch that doesn't split TLBs, kid
03:33:59 <zid> skylake has L2 on the tlb now right
03:34:15 <gog> idk my newest cpu is haswell
03:34:16 <geist> most x86 machines have had L1/L2 tlbs for years now
03:34:19 <geist> probably since early 2000s
03:34:24 <geist> multilevel TLBs that is
03:34:49 <zid> oh okay
03:34:52 <gog> haswell is excellent glad i spent the money on it
03:34:58 <zid> ah yea, reading the convo back now, says back to nehe
03:35:09 <geist> it's an interesting optimization to split TLBs for different sized pages
03:35:14 <zid> disregard haswell acquire sandy bridge EP
03:35:23 <geist> most other microarches dont do that (ARM, mips, etc)
03:35:24 <gog> does it fit in 1175
03:35:28 <zid> 2011
03:35:43 <geist> but presumably it's because they generally support more pages, so having more complex TLB match logic to support multiple page sizes is worth it
03:35:51 <geist> er support more page sizes
03:36:17 <gog> i don't have a motherboard for a sandy bridge e lol
03:36:18 <geist> since x86 really only has two i guess having a high level split is still a win
03:36:23 <zid> also errata: My multipliers don't read back correctly through I assume cpuid, maybe msrs?
03:36:28 <zid> It always reads 57
03:37:02 <zid> Internally it abides what you write in there in the bios, but it always claims to be able to go to 57, but 57 is actually the hard limit inside the cpu, not the actual current limit
03:37:31 <doug16k> you want to program the multiplier in your kernel?
03:37:38 <zid> no, just interesting that it does that
03:37:44 <zid> cpu-z etc are all 'wrong'
03:37:59 <doug16k> I'm pretty fearless but that one is maybe not the best idea :)
03:38:13 <zid> it'll thermal throttle or die if you turn it up too high so you're fine :P
03:38:37 <doug16k> it's the cpu's power regulator that I'm concerned about. you need a blowtorch to hurt an x86 nowadays
03:38:41 <geist> yah i dunno how to do it on intel, but the AMD P settings MSRs are pretty straightforward
03:38:47 <geist> i think you can just wail on them all you want
03:39:14 <zid> It's actually set to 12-42 afaik, but it always reads 12-57 no matter what, just a neat little thing that confused me when I was messing with my timings/clocks/etc
03:39:34 <doug16k> once you touch the multiplier then you want to change the voltage, then it's officially pretty crazy
03:39:46 <zid> I'm actually undervolted
03:40:37 <zid> I can't actually overclock it enough to get it to crash, it gets too hot first
03:40:52 <zid> I had to undervolt it so I could clock it higher without it catching on fire and it's *still* stable
03:41:00 <zid> top-binned beast of a thing
03:41:16 <geist> yah my 6700k doesn't have much headroom. i've pushed it to 4.4 or so, but can't go much farther.
03:41:29 <doug16k> see P = V^2 / R though if increasing voltage
03:41:29 <zid> What's the stock turbo on a 6700k?
03:41:29 <geist> but i do disable the multi core turbo slideback or whatnot
03:41:33 <geist> so it sits nicely at 4.2
03:41:43 <geist> stock single core turbo is 4.2. base is 4.0 i believe
03:41:52 <zid> yea, my base is 3.8 and I run it at 4.4 lol
03:41:54 <zid> undervolted
03:42:16 <geist> yah when i overclock i generally push it as far as i can go without fiddling with the voltage
03:42:21 * doug16k hopes zid's filesystem stays uncorrupted for a while at least
03:42:23 <zid> Did you know all sandy bridge xeons for some reason are unlocked? :)
03:42:24 <geist> i figure that's when things go pretty far
03:42:31 <bluezinc> I'll take that as a strong vote towards the 6700k.
03:42:34 <zid> doug16k: nah I didn't do that until I started trying to go for extreme ram timings :P
03:42:49 <geist> but the skylake i just leave at stock, i just unlock the turbo so it sits there most of the time
03:43:02 <zid> Did you know my 667Mhz ram kit will hit 850Mhz before it starts producing errors?
03:43:26 <doug16k> now ya
03:43:27 <bluezinc> zid: interestingly, I have a sandy bridge in my desktop.
03:43:43 <zid> bluezinc: unlocked? I recommend wasting a day trying to see how fast it'll go :P
03:43:46 <geist> what is an issue is even with liquid cooling if i start wailing on it with prime95 and really really light up the AVX units it'll eventually climb up to 70C or so
03:43:55 <geist> otherwise without the AVX push i can barely get it to break 50
03:44:02 <zid> geist: yea I'm only on air, else I'd be clocked higher
03:44:14 <geist> AVX on skylakes is srs bsns
03:44:14 <bluezinc> zid: I'll admit I never actually checked.
03:44:27 <zid> does avx on skylake disable turbo
03:44:41 <zid> first avx2 chips at least, maybe a gen or two more, definitely did
03:44:54 <doug16k> the universe rolls a 1 bazillion sided die for me with stock everything ecc 2400 miles below limit of cpu. when it rolls 42 I lose. you are rolling a 200,000 sided die
03:45:16 <zid> doug16k: Considering the headroom I still have, I'd say 1 quintrillion vs 1 quadrillion
03:45:17 <geist> the server skylakes definitely have some sort of more aggressive frequency rollback with the presence of AVX
03:45:33 <zid> I physically *cannot* get this cpu to crash lol
03:45:39 <geist> but like i said i did at least pin the cpu at full turbo
03:45:40 <zid> I've tried everything
03:45:45 <geist> so it's not exactly an overclock, but it's pushing it
03:46:03 <zid> the only way I can get it to crash is to give it so little voltage you'd struggle to power a pocket calculator
03:46:19 <doug16k> ya making the turbo have a higher power limit is legitimate. I think the cpu should stay stable
03:46:43 <zid> these cpus are all massively tdp limited rather than actual perf limited
03:46:50 <zid> because the IHS is using cheap paste instead of being soldered
03:46:54 <doug16k> you are basically telling the cpu, "I have a beefy regulator on the power pins, push it harder please, thanks"
03:46:56 <zid> there's a reason an 8086k is 5GHz stock
03:47:27 <doug16k> motherboards now have power far exceeding minimums
03:47:38 <zid> doug16k: mine's a special 128392 phase power design mobo
03:48:00 <doug16k> if they got them to work on those aluminum electrolytic crap caps, imagine how easy it is with low esr polymer caps
03:48:53 <zid> it even has 4 phase power for the ram lol
03:49:07 <doug16k> the universe in those inductors is not happy about all these crazy current changes
03:49:28 <zid> doug16k: At least, I didn't change the power state ramp times, you can be happy about that
03:49:56 <zid> my bios gives me full control over how many usec it delivers extra power for after a c-state change etc
03:50:17 <doug16k> what's the feedback loop at now? 300ps? it can count the electrons going by it is so fast now
03:50:53 <doug16k> lightspeed propagation of the current reading input and its input capacitance would be limiting factor
03:51:15 <doug16k> TLDR: it can regulate it flawlessly
03:51:44 <zid> You need a damn motherboard engineering degree to operate my bios screens
03:53:53 <zid> can't find a picture of it on google images :'(
03:55:59 <zid> https://www.pcper.com/files/imagecache/article_max_width/review/2017-04-30/32-170425222817.jpg Something like this, but this is the one for the ram
08:18:57 <geist> so it's not exactly an overclock, but it's pushing it
08:19:17 <geist> huh. must have been an up arrow flub
08:19:27 <geist> huh. must have been an up arrow flub
08:19:29 <geist> like that
10:52:09 <doug16k> I love gcc 9.1.0 so far. new warnings are nice
10:52:41 <sortie> doug16k: Any good examples?
10:53:03 <doug16k> taking address of packed structure member may result in unaligned pointer value
10:53:14 <sortie> Woot
10:53:19 <sortie> I've been worried about stuff like that
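The warning doug16k quotes appears to correspond to GCC's `-Waddress-of-packed-member`, which the GCC 9 documentation lists as enabled by default. A minimal sketch of the situation it guards against, and one common way around it, might look like this. The `record` struct and helper names are hypothetical, invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire-format record. The packed attribute removes padding,
   so `value` lands at offset 1 and is not 4-byte aligned. */
struct __attribute__((packed)) record {
    uint8_t  tag;
    uint32_t value;
};

static_assert(sizeof(struct record) == 5, "packed layout has no padding");

/* Writing `uint32_t *p = &r->value;` is what draws the warning: p claims
   4-byte alignment that the member does not have. Copying the bytes out
   avoids ever forming such a pointer. */
static uint32_t record_value(const struct record *r)
{
    uint32_t out;
    memcpy(&out, (const uint8_t *)r + offsetof(struct record, value), sizeof out);
    return out;
}

/* Small self-check helper: builds a record and reads it back. */
static uint32_t demo_packed_read(void)
{
    struct record r = { 7, 0x11223344u };
    return record_value(&r);
}
```

Reading `r->value` directly is also fine, because the compiler knows the struct is packed. The hazard is only in handing a plain `uint32_t *` to code that does not know.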
10:53:35 <sortie> Part of -Wall? Or -Wextra?
10:53:47 <doug16k> I have both on, not sure
10:54:04 <sortie> I always do too, just great to know it's in that set
10:54:14 <sortie> Hmm does -Wextra imply -Wall
10:54:27 <doug16k> ya I didn't add it on or anything
10:54:48 <sortie> Some of the optional warnings are cool too, I recall
10:55:01 <sortie> Can be worth it to read the list of non-default warnings
10:55:22 <doug16k> my favourite warning of all is -Werror=return-type
10:55:42 <doug16k> seriously you don't have to return a value in a function that returns non void? come on! (warning fixes language bug)
10:56:19 <doug16k> having codepaths that fail to return a value will be an error with that
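As a small illustration of what `-Werror=return-type` catches: in the sketch below, deleting the final `return 0;` leaves a code path that falls off the end of a non-void function. By default GCC only warns about that (`-Wreturn-type`), while the flag doug16k mentions turns it into a hard error. The function itself is a made-up example.

```c
/* Build with, for example: gcc -Wall -Wextra -Werror=return-type file.c */

/* Returns -1, 0 or 1 according to the sign of x. */
static int sign(int x)
{
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    return 0; /* remove this line and the x == 0 path returns nothing */
}
```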
11:00:16 <elderK> I usually use -Wall -Wextra -pedantic
11:00:22 <elderK> But I'd definitely be interested in learning lots more useful ones.
11:00:30 <elderK> Problem is, I often read then forget :P
11:00:53 <doug16k> I recall also "casting type A (with alignment X) to type B (with alignment Y) may result in misaligned pointer value. love it
11:01:52 <doug16k> s/ue./&"/
11:05:31 <elderK> I would love to have those warnings enabled :D
11:05:37 <elderK> I've never seen these. Are these new to 9.1?
11:07:01 <aalm> o_O
11:08:59 <aalm> .theo
11:09:17 <aalm> or not
11:18:38 <Vercas> God I migrated my GCC patches to 8.3.0, two days before 9.1 was released.
11:18:54 <Vercas> I'm absolutely not ready to deal with even more new errors and warnings. >:(
11:20:15 <aalm> fun, here, have a -Wno- :]
11:25:01 <aalm> almost wanted to suggest disabling all colorings in the term while building, and just fixing the fatals xD
11:53:47 <bauen1> Vercas: if your gcc patch is for an os-specific toolchain according to the osdev wiki you will only need to fix 'gcc/config.gcc' and replace NO_IMPLICIT_EXTERN_C with SYSTEM_IMPLICIT_EXTERN_C
11:57:35 <bauen1> it was actually pretty smooth
12:32:25 <stisl> sth. is wrong with my paging implementation, when I map for example PA 0x2000 to VA 0xffff80300000 - can't access via a pointer the mapped data sometimes, with the framebuffer it works
12:33:38 <stisl> and when I calculate the physical address via software I get the data
12:36:26 <stisl> do I have forgotten to activate sth.?
12:45:56 <stisl> maybe it has todo sth. with the gdt, because the gdt doesn't work at the moment and is deactivated
12:46:20 <stisl> no Idea
12:47:23 <stisl> I get only 0xff....
12:47:29 <bcos> Maybe you should fix GDT, then writes some exception handlers, so you can get useful information from page fault handler, so that the symptoms ("can't access") can be much more descriptive
12:47:45 <olsner> if the GDT "doesn't work", how could you set your segment registers?
12:48:47 <stisl> I don't set them actually manually
12:49:46 <stisl> but when I try to load a gdt with setting the segment registers I get a reboot
12:50:03 <stisl> I tried a lot of different gdt configurations and sources
12:51:11 <stisl> I just tried to fix sth. in the bootloader and now I have other problems
12:55:11 <stisl> without gdt no interrupts right?
12:58:34 <bcos> Right - an interrupt causes CPU to load CS and SS, and CPU needs GDT to do that
01:25:42 <Vercas> bauen1: It's more than that, lol.
01:26:48 <Vercas> It adds an option to set the TLS segment to GS.
01:37:50 <bauen1> stisl: when running qemu for osdev, always add `-no-reboot -no-shutdown`, `-d int` can be helpful for debugging issues with gdt, idt and paging if your exception handlers aren't (yet) working
01:39:14 <stisl> bauen1, thanks for the tip, I found some old version which is working fine, and then I will integrate step by step the new features I have already integrated in the other versions ;)
01:40:41 <bauen1> i should probably add these options to the wiki ...
01:40:55 <stisl> hehe
01:41:24 <bauen1> there are too many people that come here and one of the things they say is "qemu is rebooting"
01:41:38 <bauen1> i'm always wondering how they're reading debug info off the screen before it reboots
01:42:33 <stisl> the old version I have has a tiny asm/debugger integrated
01:43:27 <stisl> also a C subset shell
01:43:57 <bauen1> ._.
01:44:07 <bauen1> how does that work ?
01:44:32 <stisl> the assembler is working fine, but the C compiler has a lot of bugs
01:44:46 <stisl> with bison & yacc
01:45:08 <stisl> before I did it with a top down parser, but this is not so comfortable
01:46:12 <stisl> on an exception the disassembler will be called
01:53:13 <bauen1> has there been any effort on submitting the links on the osdev wiki to archive.org ?
02:01:46 <bauen1> oh and what happened to the wiki bot in here ?
04:58:00 <zenix_2k2> ok not sure if this is the right place to ask but, on a 64bit OS, it will use 8 bytes on the RAM to store an int right ?
04:58:07 <zenix_2k2> and 32bit = 4 bytes
04:58:24 <bcos> Maybe (but probably not)
04:58:53 <zenix_2k2> wait what
04:59:07 <bcos> For 64-bit 80x86 most (all?) C compilers say an int is 32 bits
04:59:21 <zid> I'd be comfortable saying all
04:59:38 <zid> IL32LLP64 or whatever
05:00:08 <vdamewood> Pretty much everyone uses 32-bit ints. It's the size of long that varies.
05:00:19 <bcos> I'd be more comfortable saying that if you write your own compiler you can decide that "int" is anything you feel like, regardless of target (e.g. 256-bit "int" on old 8-bit 6502 CPUs)
05:00:34 <vdamewood> On Windows, a long is 32 bits. You have to use long long to get 64.
05:00:38 <zid> well, it has to be at least -32767 32768
05:00:46 <zid> but other than that it can be what you want
05:01:25 <zid> LLP64 is inferior to LP64
05:01:29 <bcos> ..but, yeah, for 64-bit 80x86 when AMD added the extension they decided that everything that uses 64-bit will have a special "REX" prefix, so code that uses 64 bit integers is slower/larger than code that uses 32 bit integers; so compilers decided to use 32-bit for "int" because that's faster
05:01:30 <zid> idk why windows chose LLP64
05:01:36 <vdamewood> 32767 is the maximum
05:01:37 <vdamewood> err the minimum maximum
05:01:48 <zid> yes, hence 'at least'
05:02:20 <vdamewood> zid: I mean, you got the last digit wrong.
05:02:23 <zid> oh
05:02:34 <zid> right I swapped the ends
05:02:38 <zid> -8 to +7, not -7 to +8
05:02:41 <zid> bcos: it actually works out well, btw, the rex thing
05:02:45 <zenix_2k2> bcos: HHHmmm... ok but i kinda mean the OS's background services
05:02:55 <vdamewood> Naw, is 7 on both ends, because of 1's complement or sign/magnitude.
05:02:55 <zid> because you're almost never moving pointer constants to regs for example
05:02:56 <zenix_2k2> do they consider an int as 32bit if that is a 64bit OS ?
05:02:59 <vdamewood> err it's
05:03:05 <zid> vdamewood: oh right, as per the spec, gotcha, makes sense, good catch
05:03:23 <zid> the *common* case is still using the 32bit regs, even in amd64
05:04:11 <bcos> zenix_2k2: If you write your background services in Pascal (or assembly, or Rust or Fortran or Go or ...), there is no "int"
05:04:33 <bcos> ..and if you use C or C++ then it's "whatever the compiler felt like" (probably 32 bits)
05:05:16 <bcos> Of course for things like kernel API (where you need a specific standard thing) you'd use something like "uint32_t" to make sure you get a specific standard thing
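The points made so far can be sketched in C: the standard only promises a minimum range for `int`, while the `<stdint.h>` types pin the width down exactly. The `syscall_args` struct is a hypothetical kernel API argument block, not taken from any real OS.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* The C standard guarantees only that int covers -32767..32767;
   anything wider is the compiler's choice. */
static_assert(INT_MAX >= 32767 && INT_MIN <= -32767, "minimum int range");

/* Hypothetical argument block for a kernel API. Fixed-width members keep
   the layout the same whatever the compiler picks for int and long. */
struct syscall_args {
    uint32_t number;
    uint32_t flags;
    uint64_t pointer;
};

/* Size of the argument block in bytes. */
static size_t syscall_args_size(void)
{
    return sizeof(struct syscall_args);
}
```

On the common x86 ABIs this struct is 16 bytes, whereas a version written with `int` and `long` members would differ between LP64 and LLP64 targets.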
05:06:46 <zenix_2k2> ok let's choose a specific example, like Windows... when i install i usually see x86_64 Windows or x86 Windows, so when i install the x86_64 version ( i think it's 64bit ), it will consider int as 8 right ?
05:06:59 <bcos> No
05:07:37 <vdamewood> Windows, even the 64-bit versions, use 4-byte ints.
05:07:37 <zenix_2k2> so... still "whatever the compiler felt like" ?
05:08:02 <zenix_2k2> why don't they just use 8 byte ints ?
05:08:05 <zenix_2k2> i mean it's 64bit
05:08:14 <zid> and it uses 4byte longs too, presumably because they named their 32bit type LONG, as they had a 16bit background
05:08:22 <vdamewood> backwards compatibility and to save space.
05:08:51 <bcos> "sizeof(int)" is still a property of a compiler (a third-party app) and nothing to do with the OS; but if the kernel itself is written in C then (for the compiler that was used to compile the kernel and nothing else) it'll probably say "sizeof(int) == 4"
05:09:01 <vdamewood> There are very few cases where one actually needs more than 32-bits for a value.
05:09:36 <ybyourmom> Pi
05:09:44 <zid> vdamewood: yea, there's even an ILP32 version of amd64, I keep meaning to run it on my desktop
05:11:09 <bcos> Hrm
05:11:29 <bcos> vdamewood: Windows typically defines its own types (e.g. "Dword") for all its APIs, etc
05:11:48 <zid> bcos: you wish it used DWORD, it also uses LONG, seemingly at random :P
05:12:10 <vdamewood> LPCSTR
05:12:45 <zid> LPLONG pnCommentLen
05:12:46 <zid> etc
05:13:08 <zenix_2k2> vdamewood: will this slow down anything ? for now i think not, but maybe some cases ?
05:13:26 <Vercas> bcos: You mean DWORD And UINT and ULONG and ULONG32 and etc. etc.
05:13:27 <bcos> (e.g. https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-createprocessa )
05:13:38 <zenix_2k2> Hm nvm, it's gonna be faster if this runs on a 64bit CPU :P
05:13:40 <bcos> ^ not one "normal" C++ primitive type involed
05:13:42 <Vercas> They literally don't document half of the typedefs.
05:13:42 <bcos> *involved
05:13:48 <vdamewood> zenix_2k2: If you want speed, use fast ints.
05:14:09 <bcos> If you want speed, use 1-bit bools for everything!
05:14:23 <bcos> :-)
05:14:23 <vdamewood> Yeah, MS uses their own data types, partially because they defined them before C became common on MS platforms.
05:14:49 <vdamewood> So, Yeah, technically Windows doesn't use ints, it uses WORDs.
05:15:24 <zenix_2k2> i think that is C++
05:15:44 <vdamewood> zenix_2k2: Think what is C++?
05:15:59 <zenix_2k2> WORDs, it's a type on C++
05:16:14 <zenix_2k2> i usually code in C, but don't come across WORDs much, like never
05:16:16 <bcos> Ideally; for C or C++ people should be using things like "uint_fast32_t" to make sure it's the fastest size that fulfils their requirements
05:16:25 <bcos> ..and "int" should be banned
05:16:59 <bcos> It's just, nobody likes typing stuff like that everywhere and the standard library is ancient
05:17:21 <isaacwoods> Rust did a good job tbf, just has u8, u16, u32, u64, and usize for however big a pointer is
05:17:51 <zid> That precludes the optimization C allows with 'int' though
05:17:57 <vdamewood> zenix_2k2: No, It's not C++.
05:18:14 <bcos> isaacwoods: Sometimes you want "at least 32-bits (larger if it's faster)" and sometimes you want "exactly 32-bits"
05:18:38 <zenix_2k2> so i guess running a 64bit OS on a true 64bit CPU will be faster than having 32bit OS running on it ?
05:18:53 <bcos> Hrm. In fact; sometimes you want (e.g.) "exactly 32-bits and guaranteed little endian", so it can be used for file formats, networking protocols, etc
05:18:56 <zenix_2k2> cause both OS consider data types the same
05:19:10 <zenix_2k2> and a 64bit CPU's register can hold like 8 bytes int
05:19:13 <isaacwoods> bcos: true, don't think that case is covered
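For the "exactly 32 bits and guaranteed little endian" case bcos describes, C has no built-in type, so the usual approach is to serialize byte by byte. A minimal sketch under that assumption follows; the function names are illustrative.

```c
#include <stdint.h>

/* Writes value into out[0..3], least significant byte first,
   regardless of the host CPU's byte order. */
static void store_le32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value & 0xFFu);
    out[1] = (uint8_t)((value >> 8) & 0xFFu);
    out[2] = (uint8_t)((value >> 16) & 0xFFu);
    out[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Reads a little-endian 32-bit value from in[0..3]. */
static uint32_t load_le32(const uint8_t in[4])
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}

/* Helper for checking: byte `index` (0..3) of the encoded form. */
static uint8_t le32_byte(uint32_t value, unsigned index)
{
    uint8_t buf[4];
    store_le32(buf, value);
    return buf[index & 3u];
}

/* Helper for checking: encode then decode. */
static uint32_t le32_round_trip(uint32_t value)
{
    uint8_t buf[4];
    store_le32(buf, value);
    return load_le32(buf);
}
```

Because the code only uses shifts and masks, it gives the same bytes on big-endian and little-endian hosts, which is what a file format or network protocol needs.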
05:20:55 <bcos> zenix_2k2: For 80x86; typically 64-bit code is faster because there's more registers and less spilling to stack (and not because the CPU has built-in support for 64-bit integers, etc)
05:21:05 <zenix_2k2> vdamewood: so what is that, i mean i wrote some codes on Windows in C and i went over their documents all the time
05:21:58 <vdamewood> zenix_2k2: They're windows-specific types.
05:22:17 <zenix_2k2> that's weird... but ok then
05:23:46 <zenix_2k2> bcos: oh so a 64bit CPU means it has more registers than a 32bit one but its registers still hold the value the same ?
05:25:34 <bcos> For 64-bit 80x86; it means that there's twice as many general purposes registers and all of them are 64 bits (instead of 32 bits, or 16 bits); and almost all general purpose instructions can work with 64 bit integers
05:26:06 <bcos> ..but 64 bit integers are often unnecessary and the instructions are larger so..
05:30:21 <vdamewood> zenix_2k2: No.
05:30:35 <zenix_2k2> ookk bcos cleared it
05:30:46 <zenix_2k2> twice as many, and can still hold 64bit values
05:32:08 <vdamewood> zenix_2k2: It's specifically about x86 that the number of registers doubled.
05:37:41 <zenix_2k2> vdamewood: how about x86_64 ?
05:43:53 <zenix_2k2> wait, nvm
06:29:22 <IRCMonkey> Hello everyone !!
06:47:43 <jussihi_> Hi there
06:50:04 <jussihi_> What could be the problem when I create a TSS struct, add it to GDT and after it load the TSS, my OS crashes (and reboots immediately). Things I have checked: The struct is page aligned. The struct is correctly formatted and of right length (104 bytes). I give the right GDT offset to ltr. The "base" in GDT entry is the pointer pointing to the TSS struct.
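The 104-byte figure jussihi_ mentions matches the 32-bit TSS layout in the Intel manuals (26 dwords). A sketch of such a struct with compile-time checks might look like this. The field names are illustrative and not taken from the UranOS source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 32-bit Task State Segment. Selector fields are 16 bits wide in hardware;
   they are declared as 32-bit here so the reserved upper halves are covered. */
struct tss32 {
    uint32_t prev_task_link;
    uint32_t esp0, ss0;      /* stack used on a switch to ring 0 */
    uint32_t esp1, ss1;
    uint32_t esp2, ss2;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint32_t es, cs, ss, ds, fs, gs;
    uint32_t ldt_selector;
    uint16_t trap;           /* bit 0: debug trap on task switch */
    uint16_t iomap_base;     /* offset of the I/O permission bitmap */
};

static_assert(sizeof(struct tss32) == 104, "32-bit TSS must be 104 bytes");
static_assert(offsetof(struct tss32, iomap_base) == 102, "iomap_base is the last word");
```

Checks like these catch a mis-sized struct at build time, which is cheaper than finding it through a triple fault.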
06:50:41 <zid> well, you triple faulted
06:50:56 <jussihi_> I do not have any sort of interrupt handling in place yet, does the ltr command invoke some sort of interrupt that I'm not able to handle and therefore it crashes?
06:51:05 <bcos> jussihi_: Is busy flag clear?
06:51:15 <zid> well if you get a #GP, you'll then double fault and triple fault due to the lack of interrupts
06:51:28 <zid> that's why it *reboots*, but you need to solve why it faults
06:51:46 <zid> I don't remember the tss stuff off the top of my head, but you'd be best off showing your gdt + tss + whatever, along with the code on a paste website
06:51:47 <jussihi_> I've been trying to solve it, but cannot find any success
06:52:00 <jussihi_> Is github ok? I've put my project there
06:52:03 <zid> yea
06:52:06 <zid> either as source or make a gist
06:52:21 <jussihi_> https://github.com/jussihi/UranOS
06:52:35 <zid> with files and line number links thanks :P
06:52:50 <jussihi_> arch/x86/gdt.c and arch/x86/tss.c and arch/x86/include/tss.h
06:52:59 <jussihi_> Ok, sec
06:53:13 <zid> wtf is an i64
06:53:23 <jussihi_> It is an IDA64 file
06:53:27 <jussihi_> got there in an accident
06:53:37 <zid> yea that's what I was about to open it with lol
06:53:38 <jussihi_> I tried to statically analyse that the TSS struct is located correctly
06:53:39 <zid> just to see if it'd work
06:54:05 <zid> okay so you're building your gdt dynamically with insert_segment_descriptor
06:54:12 <jussihi_> yes
06:54:13 <zid> 9A, 92, FA, F2 etc all look good..
06:54:28 <zid> and all your segmentation otherwise works?
06:54:52 <jussihi_> Actually I haven't tested it at all
06:55:13 <zid> so how do you know it isn't ds/cs triggering the fault? heh
06:55:21 <jussihi_> As you can see there is not much code yet, so I don't have userspace etc programs at all
06:55:34 <zid> me either but I already have network drivers etc
06:55:43 <zid> no excuse not to know if your gdt works
06:55:55 <jussihi_> Okay, could you tell how to test it?
06:56:09 <zid> can you load 8 and 16 into cs and ds and not crash?
06:56:18 <bcos> jussihi_: Try setting the busy flag (in the TSS descriptor - e.g. use 0x8B instead of 0x89)
06:56:36 <jussihi_> bcos: will try, what is the idea behind it?
06:57:45 <bcos> CPU uses that flag to determine when the OS messed something up - e.g. switched to a task that is already running, or returned from a task that isn't running. The LTR instruction needs to load a "busy" descriptor (I think), or the CPU thinks something's wrong
06:58:27 <zid> table 7-1 should list that then?
06:58:30 <jussihi_> Aaah
06:58:51 <jussihi_> Well there's no luck with 0x8B either :/
06:59:08 <zid> go through the table on 7-1 and check all those lovely conditions :P
06:59:23 <jussihi_> 7-1?
06:59:26 <zid> yes
06:59:36 <bcos> D'oh - it's opposite of (busy flag needs to be clear, CPU sets it)
06:59:53 <jussihi_> bcos: yeah that's what I remember reading from osdev wiki too :)
06:59:55 <zid> iret will #TS if busy flag is set, I think
07:00:15 <zid> and #GP if set when call/int/except
07:01:21 <jussihi_> zid: do you mean just moving values to those registers? When you said loading 8 and 16
07:01:30 <zid> well you can't move to cs
07:01:42 <zid> you have to do a far jump
07:02:06 <jussihi_> Hmmm
07:02:41 <zid> (mov would make no sense, in practice)
07:03:08 * bcos is assuming "ltr" and no actual task switches
07:03:09 <jussihi_> For far jump I would need to map some program code somewhere
07:03:28 <jussihi_> sounds too complicated for now
07:03:35 <zid> bcos: I'm just puzzled why he's this deep into things like TSS but hasn't even gotten a gdt loaded
07:04:08 <jussihi_> zid: isn't it loaded in "lgdt"
07:04:28 <zid> yes, but if your cs/ds/ss aren't loaded with values from that table, what good is it
07:04:49 <zid> you'll be using the old cs value, your ds will point.. *somewhere*
07:08:46 <jussihi_> It is good for the future once I need to load a program/code to some other memory region
07:09:02 <jussihi_> But you're saying there's no way to check whether it works or not for now?
07:09:10 <jussihi_> well, clearly something is not working
07:09:22 <bcos> Does anyone remember GCC's inline assembly constraints?
07:09:24 <zid> well you could go down the 7-1 table and check all the conditions that generate a fault
07:09:29 <zid> bcos: vaguely
07:09:37 <zid> m is memory, +m is memory contents, r is register, etc
07:09:44 * bcos is looking at the ""q" (0x28)" and wondering what "q" is
07:10:33 <zid> Q is any register Rh... q is..
07:10:49 <zid> any register accessible as rl
07:11:07 <bcos> ..so, AL and not AX?
07:11:09 <zid> in 32bit that's abcd, in 64 it's any reg
07:11:35 <zid> are you looking at some mov ds, al or something
07:11:44 <zid> I think that needs to be ax, but it'd not assemble rather than not work
07:12:04 <jussihi_> maybe this __asm__ __volatile__ ("ltr %w0" : : "q" (0x28));
07:12:09 <zid> ah
07:12:14 <bcos> Yes
07:12:28 * bcos would be tempted to disassemble and see what happened there
07:12:32 <jussihi_> where the "q" is just any reg abcd if I understood correctly
07:12:45 <jussihi_> bcos: sec..
07:12:49 <bcos> ..like, if the code is "mov al,0x28; ltr ax"
07:13:34 <jussihi_> .text:080001A0 mov eax, 28h .text:080001A5 ltr ax
07:13:37 <zid> I wonder if there is an rx constraint
07:13:51 <zid> how does it know to use ax there and not eax
07:14:25 <jussihi_> Sorry for poor line formatting, but I think this assembled right?
07:14:32 <zid> yea looks fine
07:14:36 <zid> just wondering how it knows to use ax there..
07:15:15 <bcos> Not sure if source is "correct"; but resulting assembly isn't causing a problem
07:16:35 <jussihi_> Hmm, how would you write it down?
07:18:54 <bcos> Hrm. GCC doesn't have an "any 16-bit register" constraint
07:19:42 <zid> yea doesn't look like it to me either
07:19:54 <zid> so how did we end up with ax there then though
07:20:20 <bcos> I'd say that's what the "w" in "%w0" does
07:20:22 <jussihi_> Does it even need to be in the ax? If the gcc assembles it to move the 28h to ecx, then the ltr would also most likely load from cx
07:20:42 <zid> maybe it ended up as 66 ltr eax? or something?
07:21:13 <bcos> It doesn't need to be AX, but also doesn't need to be EAX (e.g. it could be "mov ax,28h" instead, which would be 1-byte shorter)
07:21:14 <jussihi_> yeah, w for word :)
07:21:34 <zid> oh there's a w
07:21:41 <zid> yea that works, cool
07:21:46 <jussihi_> hmm, but again, the resulting assembly is fine
07:21:56 <zid> yea, that's fine it was just a mystery to me
07:21:58 <jussihi_> should I just implement the interrupt handling next
07:22:21 <jussihi_> and try to catch what is going on when loading the ltr
07:22:22 * bcos would be more worried about whether it's legal for GCC to do "mov al,0x28" if it wants to (e.g. if it's using AH for something else)
07:22:37 <zid> I can't find where this w thing is documented
07:23:10 <zid> ahh got it
07:23:16 <zid> w Print the HImode name of the register. %w0 %ax ax
07:23:47 <jussihi_> https://locklessinc.com/articles/gcc_asm/
07:23:56 <zid> yea I'd rather read the docs
07:24:01 <jussihi_> There is some text about it
07:24:17 <zid> It was on the extended asm doc page, but wasn't on the constraints page, because it's a modifier to the asm not the constraints
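To make the `%w0` discussion concrete: the `w` operand modifier asks GCC to print the 16-bit name of whatever register it picked, independent of the constraint letter. The sketch below is an assumption-laden variant of the snippet under discussion. It passes a `uint16_t` with a plain `"r"` constraint instead of `"q"`, and adds a harmless `movw` demo that can run in user space. Both helper names are hypothetical, and the asm is guarded so non-x86 builds fall back to plain C.

```c
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
/* Loads the task register. LTR is privileged, so this only works in ring 0;
   it is shown here for the operand syntax, not to be called from user code. */
static inline void load_task_register(uint16_t selector)
{
    __asm__ __volatile__("ltr %w0" : : "r"(selector) : "memory");
}
#endif

/* Returns the low 16 bits of value, using %w to name the 16-bit halves
   of the registers GCC allocated for the two operands. */
static inline uint16_t low_word(uint32_t value)
{
#if defined(__i386__) || defined(__x86_64__)
    uint16_t out;
    __asm__("movw %w1, %w0" : "=r"(out) : "r"(value));
    return out;
#else
    return (uint16_t)value; /* portable fallback for other architectures */
#endif
}
```

Declaring the operand as a 16-bit type sidesteps bcos's worry about GCC treating the value as a byte, since the compiler then has to materialise all 16 bits.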
07:24:26 <zid> I didn't even scroll down far enough to see there was a table :P
07:24:38 <jussihi_> haha
07:25:06 <zid> z is a cool one, prints the size of the operand
07:25:20 <zid> so you can do dynamic AT&T
07:25:41 <jussihi_> I've just tried to stay as far from inline asm as I can
07:26:22 <zid> I've only got inb/outb which I am getting rid of soon once I stop probing pci via ISA
07:26:26 <zid> and load cr3
07:26:36 <zid> 2? 4? I can never remember
07:26:40 <zid> The page tables :P
07:26:50 <jussihi_> 3 iirc
07:28:00 <bcos> jussihi_: Do you know what kind of exception it causes?
07:28:09 <zid> no because he just triple faults
07:28:22 <zid> knowing if it's np gp or ts would be a start, at least
07:28:25 <bcos> We've mostly ruled out all of the cause of GPF, so I'm leaning towards PF
07:28:51 <zid> ltr can't cause a pf though?
07:29:01 <bcos> It can
07:29:12 <jussihi_> how can I debug it?
07:29:14 <zid> you get a #GP if the segment doesn't reference valid memory
07:29:23 <bcos> There's no segment
07:29:30 <zid> the one in ax, 0x28
07:29:34 <zid> segment selector
07:29:40 <zid> entry index thing
07:29:51 <jussihi_> entry offset
07:29:54 <zid> it checks that "Segment selector for a TSS descriptor references
07:29:54 <zid> the GDT and is within the limits of the table.
07:30:07 <zid> and causes #GP
07:30:24 <zid> #TS for limit >108, LDT is not valid is #TS, etc
07:30:30 <bcos> "check the limit" doesn't mean "check that the page is present"
07:30:32 <zid> nothing in the table generates PF
07:30:42 <jussihi_> If I just debug it with gdb, how should I proceed? target remote localhost:1234 ?
07:30:54 <zid> run it under qemu like a sane person?
07:30:56 <zid> or bochs
07:31:07 <jussihi_> I'm running it in QEMU
07:31:16 <bcos> jussihi_: Can you boot it in an emulator (Bochs, Qemu) and check the emulator's log? Will probably say something about what caused the triple fault
07:31:23 <jussihi_> Ok
07:31:34 <zid> failing that you can throw a shit load of debugging filters on, not sure if there are any good ones though
07:32:50 <jussihi_> how can I see the qemu log?
07:32:52 <bcos> Actually; starting to wonder if it's the "LTR" that causes the triple fault
07:33:04 <zid> bcos: That was my assumption the entire time
07:33:09 <zid> and it can't cause a PF so I was very confused
07:33:27 <zid> But he doesn't seem to want to actually single step or whatever to find out where so this is all guesses :P
07:33:35 <bcos> zid: What do you think happens if the page containing the GDT is "not present"?
07:33:47 <zid> gdt is linear? I'd have to had checked
07:33:52 <jussihi_> I can single step but I don't know how to single step the os! :D
07:34:00 <zid> wha
07:34:03 <zid> "I can x, but I can't x"
07:34:22 <jussihi_> Let me correct myself: I would love to, but I don't know how :)
07:34:42 <zid> bcos: yea I guess it could then, but the ltr logic itself doesn't, but I assume he lgdt'd at some point, but maybe not..
07:35:02 <zid> jussihi_: You told me it had remote gdb support? use that?
07:35:33 <bcos> jussihi_: https://unix.stackexchange.com/questions/237409/logging-and-debugging-for-qemu-virtual-machines
07:39:36 <doug16k> jussihi_, recompile everything with -g, run qemu with -s -S options added, then run gdb your-executable -ex 'target remote localhost:1234'
07:40:54 <doug16k> your qemu command line should also have -no-reboot -no-shutdown
07:41:18 <jussihi_> Thanks doug16k !
07:41:25 <doug16k> c to continue and it should run until it crashes.
07:41:38 <doug16k> it will pause there and you can look
07:42:51 <doug16k> for even better viewing results, note the instruction where it crashes, then restart the debug run and do b *0xwhatever at that address. continue and it will break right before it crashes
07:44:22 <jussihi_> doug16k: How can I start the qemu with gdb already running? I cannot start gdb quickly enough if I run qemu in another terminal. That is the problem
07:44:35 <doug16k> if your setup is a bit cumbersome, what you can do is `objdump -S your-executable | less` then use / command to find the crash, note the address, then you can know up front where to breakpoint
07:44:57 <doug16k> -s -S will pause and wait for you to attach
07:45:22 <doug16k> it won't even execute one instruction of the bios until attached debugger says go
07:45:48 <jussihi_> Ok thanks
07:47:31 <jussihi_> 0xc01002b9 in ?? ()
07:47:36 <jussihi_> Ooh
07:47:53 <doug16k> looks like it is in the kernel a bit
07:48:06 <doug16k> that's good
07:48:30 <doug16k> layout asm command will let you see better
07:48:35 <doug16k> or layout src
07:48:46 <jussihi_> .text:C01002B9 ltr ax
07:48:57 <zid> nice
07:49:09 <doug16k> print /x $ax
07:49:14 <zid> 0x28
07:49:26 <jussihi_> yes
07:49:28 <jussihi_> 0x28
07:49:30 <bcos> Can dump GDT?
07:49:46 <doug16k> monitor info registers
07:49:56 <zid> bochs has a significant advantage here, in that it can actually just dump the gdt and stuff in a nice format, but the rest of bochs is meh
07:49:59 <doug16k> but you probably need to leave TUI mode
07:50:11 <jussihi_> https://hastebin.com/lexurovote.ini
07:50:33 <doug16k> info registers will tell you GDTR base. then you can x /1gx whatever+0x28
07:50:58 <doug16k> that will show you the GDT entry it is going to use
07:51:45 <jussihi_> hmm
07:51:51 <jussihi_> there is no entry for GDTR
07:52:06 <doug16k> it would be: x /1gx 0xc0133080+0x28
07:52:08 <doug16k> GDT
07:52:35 <bcos> c0103080+0x28
07:52:36 <zid> GDT= 000...100a20 for me, left col, under the segments
07:52:37 <jussihi_> How can you see that?
07:52:43 <doug16k> line 12?
07:52:59 <jussihi_> ah lol
07:53:01 <jussihi_> I'm blind
07:53:19 <zid> That's a good point, I'll need to move my gdt at some point :P
07:54:01 <jussihi_> (gdb) x /1gx 0xc0103080+0x28 0xc01030a8: 0xc040891030000068
07:55:16 <doug16k> `x /4hx 0xc0103080+0x28` will be easier to interpret
07:55:40 <jussihi_> 0xc01030a8: 0x0068 0x3000 0x8910 0xc040
07:55:50 <jussihi_> Hmm
07:55:53 <jussihi_> Gotta check it
07:55:57 <doug16k> looks wrong
07:56:18 <doug16k> should be base, limit, permission stuff, stuff, limit
07:56:22 <doug16k> IIRC
07:56:27 <jussihi_> yeah
07:56:32 <doug16k> last thing is base actually
07:56:41 <jussihi_> it seems like they are in wrong order
07:56:51 <bcos> Looks right to me
07:57:16 <bcos> limit = 0x68, base = 0xC0403000
07:57:24 <jussihi_> nvm, I'm reading the table in wrong way
07:57:26 <jussihi_> from osdev
07:57:27 <bcos> D'oh, no
07:57:36 <doug16k> base is 1st no?
07:57:45 <jussihi_> limit is 0:15 bits
07:57:51 <jussihi_> base 16:31
07:58:03 <jussihi_> https://wiki.osdev.org/images/f/f3/GDT_Entry.png
07:58:05 <doug16k> jussihi_, the cpu is little endian
07:58:17 <bcos> limit = 0x68, base = 0xC0103000 ?
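The decoding being done by eye here can be sketched in C. Applied to the raw value from gdb, `0xc040891030000068`, it gives limit 0x68, base 0xC0103000 and access byte 0x89, matching bcos's second reading. The struct and function names are made up for illustration.

```c
#include <stdint.h>

/* Fields of a legacy 8-byte segment descriptor, reassembled from the
   scattered bit ranges. */
struct descriptor_fields {
    uint32_t base;    /* bits 16..39 and 56..63 */
    uint32_t limit;   /* bits 0..15 and 48..51 */
    uint8_t  access;  /* bits 40..47: present, DPL, type */
    uint8_t  flags;   /* bits 52..55: granularity and friends */
};

/* Splits a descriptor, given as the 64-bit value gdb prints with x/1gx. */
static struct descriptor_fields decode_descriptor(uint64_t raw)
{
    struct descriptor_fields f;
    f.limit  = (uint32_t)(raw & 0xFFFFu) | (uint32_t)((raw >> 48) & 0xFu) << 16;
    f.base   = (uint32_t)((raw >> 16) & 0xFFFFFFu) | (uint32_t)((raw >> 56) & 0xFFu) << 24;
    f.access = (uint8_t)((raw >> 40) & 0xFFu);
    f.flags  = (uint8_t)((raw >> 52) & 0xFu);
    return f;
}
```

The access byte 0x89 is a present, ring 0, available 32-bit TSS, which is the non-busy form that `ltr` expects.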
07:58:19 <doug16k> are you putting 7:0 byte, then 15:8 byte, then 23:16 ...
07:58:31 <jussihi_> oh gosh
07:58:41 <doug16k> the rightmost 8 bits is 1st byte
07:58:42 <zid> Thankfully, the cpu you used to construct the table is also little endian, so you can just do *(short *) = 7; or whatever
07:58:45 <doug16k> if it is a 32 bit row
07:58:50 <jussihi_> https://github.com/jussihi/UranOS/blob/master/arch/x86/gdt.c#L16
07:59:50 <doug16k> oh sorry. it is limit 1st
08:00:11 <jussihi_> I'm too confused now whether it is in the wrong order or not
08:00:39 <zid> There's always the manual
08:00:50 <doug16k> 0x68? that's a funny limit
08:00:58 <bcos> For TSS?
08:01:01 <jussihi_> I've had similar issues before when creating some data encryption algos for data that is sent over internet (big endian). This is a mother of mistakes for me
08:01:13 <doug16k> ah for tss it's fine sorry
08:01:15 <bcos> Fairly normal limit if there's no IO permission bitmap..
08:01:36 <jussihi_> doug16k: in OSDEV I found that I can use the length of the TSS struct, which was 0x68 (104 bytes)
08:01:47 <zid> https://cdn.discordapp.com/attachments/417023075348119556/574324802056355860/unknown.png
08:02:07 <doug16k> limit is inclusive though
08:02:34 <doug16k> 100 byte thing = limit 99
08:02:35 <jussihi_> hmm?
08:02:43 <jussihi_> Ahaaa....
08:02:50 <doug16k> but error in that direction is not too bad
08:03:44 <jussihi_> Well yeah I can set the limit to sizeof(..) -1 but I dont think it will clear this issue
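Since the limit is inclusive, a 104-byte TSS gets limit 0x67 rather than 0x68. A rough sketch of building such a descriptor follows; `encode_descriptor` and `TSS_SIZE` are illustrative names, and the base address is simply the one seen in the dump above.

```python
import struct

TSS_SIZE = 0x68  # 104 bytes, the 32-bit TSS size quoted in the discussion


def encode_descriptor(base, limit, access, flags):
    """Pack an x86 segment descriptor into its 8 in-memory bytes."""
    # Low dword: limit 15:0 in the low half, base 15:0 in the high half.
    low = (limit & 0xFFFF) | ((base & 0xFFFF) << 16)
    # High dword: base 23:16, access byte, limit 19:16, flags nibble, base 31:24.
    high = (
        ((base >> 16) & 0xFF)
        | ((access & 0xFF) << 8)
        | (((limit >> 16) & 0xF) << 16)
        | ((flags & 0xF) << 20)
        | (base & 0xFF000000)
    )
    # "<" means little endian, which matches how the CPU reads the table.
    return struct.pack("<II", low, high)


# The limit is the last valid offset, so it is the size minus one.
tss_entry = encode_descriptor(0xC0103000, TSS_SIZE - 1, access=0x89, flags=0x0)
print(tss_entry.hex())  # 67000030108900c0
```

As noted in the discussion, a limit that is one byte too large is a fairly harmless error, so this alone would not be expected to explain a crash.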
08:03:46 <doug16k> what exception occurs?
08:03:51 <jussihi_> how can I check it?
08:03:59 <jussihi_> I've not used gdb before
08:04:00 <zid> install interrupt handlers
08:04:05 <zid> or see if qemu can tell you
08:04:09 <doug16k> do you have any interrupt handlers?
08:04:13 <jussihi_> None
08:04:23 <jussihi_> Maybe I should implement that thing next
08:04:24 <bcos> For early dev, I just let it triple fault and check emulator log (but, Bochs)
08:04:25 <doug16k> kill qemu and add -d int
08:04:45 <jussihi_> doug16k: to qemu params?
08:04:57 <doug16k> you can skip the gdb stuff. omit -s -S and just full speed run qemu -d int -no-reboot -no-shutdown
08:05:04 <bcos> Remember the address of that "LTR" before you quit ;-)
08:05:15 <doug16k> it will print information about every interrupt, including exceptions
08:05:17 <jussihi_> Yeah :)
08:06:04 <doug16k> you can split the difference too. you can just use -s in qemu, do the run, and when it pauses at the crash, attach gdb then
08:06:40 <jussihi_> https://hastebin.com/ivuwetusoc
08:06:43 <jussihi_> does this help?
08:06:43 <doug16k> just -s instead of the full -s -S I mean
08:07:20 <doug16k> that looks like power on only
08:07:41 <jussihi_> XD
08:07:42 <doug16k> did you let it run to the crash
08:08:01 <jussihi_> qemu-system-i386 -kernel kernel.img -d int -no-reboot -no-shutdown
08:08:07 <jussihi_> I think it should've
08:08:22 <bcos> "check_exception old: 0xffffffff new 0xe"
08:08:36 <zid> 6 e e e 8
08:08:44 <bcos> My new theory is that there's nothing wrong with the TSS and it generates a page fault later on
08:09:01 <bcos> Ah
08:09:15 <doug16k> 6 is #UD
08:09:20 <jussihi_> But why doesnt it get to the printing part after initializing the gdt & loading tss?
08:09:37 <doug16k> jussihi_, are you compiling with -mgeneral-regs-only ?
08:09:37 <jussihi_> UD?
08:09:43 <doug16k> undefined opcode
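The vector numbers in the `-d int` output can be pulled out and named with a small script. This is only a sketch: the regular expression is based on the single `check_exception` line format quoted above, the sample text is built from the two lines mentioned in the discussion rather than from the real log, and the name table covers only a few vectors.

```python
import re

# A few x86 exception vectors; not a complete table.
EXCEPTION_NAMES = {
    0x6: "#UD (invalid opcode)",
    0x8: "#DF (double fault)",
    0xD: "#GP (general protection)",
    0xE: "#PF (page fault)",
}

CHECK_EXCEPTION = re.compile(
    r"check_exception old: 0x[0-9a-fA-F]+ new 0x([0-9a-fA-F]+)"
)


def exception_vectors(log_text):
    """Return the 'new' vector number of every check_exception line, in order."""
    return [int(match.group(1), 16) for match in CHECK_EXCEPTION.finditer(log_text)]


# Illustrative sample assembled from the lines quoted in the discussion.
sample = (
    "check_exception old: 0xffffffff new 0x6\n"
    "check_exception old: 0xffffffff new 0xe\n"
)

for vector in exception_vectors(sample):
    print(hex(vector), EXCEPTION_NAMES.get(vector, "unknown"))
```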
08:09:50 <jussihi_> no I am not
08:10:02 <doug16k> you should, if not the compiler will try to autovectorize
08:10:10 <jussihi_> A-ha
08:10:15 <doug16k> one sse instruction = undefined opcode, if you didn't configure sse yet
08:10:29 <zid> but we got a fault on the ltr though
08:10:36 <bcos> zid: Did we?
08:10:53 <jussihi_> at least it got stuck there last time
08:10:55 <zid> [20:47] <jussihi_> 0xc01002b9 in ?? () Oh was this a bp?
08:11:08 <zid> I assumed that was a faultiboi
08:11:18 <jussihi_> It was not a bp
08:11:22 <doug16k> this says pretty conclusively that it encountered an undefined opcode exception: check_exception old: 0xffffffff new 0x6
08:11:23 <jussihi_> https://github.com/jussihi/UranOS/blob/master/arch/x86/Makefile
08:11:44 <bcos> doug16k: How does it have "old = 0xffffffff" twice?
08:11:48 <doug16k> then next line it is a page fault
08:11:57 <jussihi_> Should I include that general-regs-only to CFLAGS only?
08:12:28 <doug16k> ah! looks like it tried to do #UD, but page faulted on the IDT
08:12:41 <jussihi_> Ofc, I dont have idt set up
08:12:52 <bcos> Ah - maybe #UD, then #PF caused by page containing the (old, dodgy) IDT not existing
08:13:02 <doug16k> jussihi_, then you must not let any exception occur
08:13:08 <jussihi_> :D
08:13:19 <jussihi_> I will just write the interrupt handler next...
08:13:44 <jussihi_> but about the "-mgeneral-regs-only", how should I know about this kind of stuff at all?
08:13:55 <zid> gcc manual?
08:14:28 <jussihi_> I did not want to read through every tool's manual prior to coding
08:14:32 <zid> I avoided using it for a long time via -mno-sse but attribute((interrupt)) requires it, that's how I learned about it
08:14:42 <zid> then don't be surprised if you don't know things then, lol
08:14:50 <jussihi_> :-)
08:15:03 <bcos> jussihi_: Did "-mgeneral-regs-only" fix the problem?
08:15:09 <jussihi_> Well, this was a nice experience anyway
08:15:12 <jussihi_> I'll test it right now
08:15:38 <bcos> ..because, "ltr" isn't likely to be an SSE instruction.. ;-)
08:15:52 <jussihi_> no, it still crashes
08:16:01 <doug16k> same -d int output with the 0x6
08:16:02 <doug16k> ?
08:16:08 <jussihi_> I'll check
08:16:24 <jussihi_> Yeah
08:16:27 <jussihi_> 0x6
08:16:29 <doug16k> weird that it is not dumping the registers at each interrupt
08:16:30 <jussihi_> then 0xe
08:16:40 <doug16k> when did they break that? :)
08:17:10 <jussihi_> Maybe it is some additional flag
08:18:15 <jussihi_> -mno-sse doesn't help either
08:18:16 <doug16k> in qemu monitor what does this say: info mem
08:18:27 <jussihi_> sec
08:19:30 <jussihi_> https://hastebin.com/sonusolaqo.nginx
08:19:49 <jussihi_> seems ok
08:19:52 <bcos> Does anyone else think that "CR0=80114010" (from GDB, earlier) seems a little borked?
08:20:18 <bcos> (like, not in protected mode)
08:20:47 <bcos> Oh my...
08:21:00 <bcos> LTR generates a #UD if you executed it in real mode
08:21:08 <bcos> :-)
08:21:13 <jussihi_> :D
08:21:19 <jussihi_> What the hell
08:21:31 <doug16k> how did you get to real mode?
08:21:48 <jussihi_> I think I have escaped it already in the bootloader
08:21:50 <bcos> That can't make sense (you can't be executing code at 0xC0XXXXXX in real mode)
08:21:52 <doug16k> more like unreal mode but ya
08:22:00 <jussihi_> lol
08:22:41 <doug16k> miracle. you didn't do a call or jmp yet
08:22:54 <jussihi_> I did
08:22:57 <doug16k> without an addr16 prefix it would truncate the instruction pointer
08:23:04 <jussihi_> From the bootloader
08:23:15 <doug16k> ah, maybe the cs32 in cs descriptor cache was set
08:24:50 <bcos> Ah - line 78 in "bootloader.asm" should be "mov ecx,cr0" and not "mov ecx,cr3"
08:25:52 <jussihi_> :)
08:25:56 <bcos> (it's doing "CR0 = address of page directory | PGE | WP")
08:26:00 <bcos> :-)
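Decoding the CR0 value reported earlier shows why it looked wrong. The sketch below uses the architecturally defined CR0 flag positions; `decode_cr0` is an illustrative helper, not something from the project.

```python
# Architecturally defined CR0 flag bits; every other bit is reserved.
CR0_BITS = {
    0: "PE", 1: "MP", 2: "EM", 3: "TS", 4: "ET", 5: "NE",
    16: "WP", 18: "AM", 29: "NW", 30: "CD", 31: "PG",
}


def decode_cr0(value):
    """Return (names of defined flags that are set, mask of reserved bits that are set)."""
    names = [name for bit, name in sorted(CR0_BITS.items()) if value & (1 << bit)]
    defined_mask = sum(1 << bit for bit in CR0_BITS)
    reserved = value & ~defined_mask & 0xFFFFFFFF
    return names, reserved


names, reserved = decode_cr0(0x80114010)
print(names, hex(reserved))  # ['ET', 'WP', 'PG'] 0x104000
```

PE (bit 0) is clear and some reserved bits are set, which is consistent with an address having been ORed with the paging flags and written to CR0.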
08:26:45 <jussihi_> boom - everything works
08:26:58 <bcos> That's freaky
08:27:02 <jussihi_> oh god, how did I even get this far with an error there
08:27:10 <bcos> (like, how it got so far without crashing)
08:27:15 <jussihi_> indeed
08:27:27 <doug16k> jussihi_, you will be amazed how it can work when something is catastrophically wrong
08:27:37 <doug16k> it's annoying
08:27:48 <jussihi_> Well, at least I learned something about debugging today
08:27:59 <jussihi_> Yeah, weird stuff
08:28:01 <stisl> hi
08:28:09 <bcos> jussihi_: While you've got "bootloader.asm" open; change the lines 52 and 54 to "jb" and "jae" for me
08:28:21 <jussihi_> bcos: why? :D
08:28:25 <bcos> (it's not a bug, but...)
08:29:05 <bcos> "less" and "greater" are for signed comparisons, and "below" and "above" are for unsigned
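The difference only shows up once the top bit of an operand is set, which is exactly the case for higher-half addresses. A small sketch modelling the two kinds of comparison on 32-bit values (the helper names are made up; real `jl`/`jb` test CPU flags after a `cmp`, which this only imitates):

```python
def as_signed32(value):
    """Reinterpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def jl_taken(a, b):
    """Models 'cmp a, b' followed by jl: signed 'less'."""
    return as_signed32(a) < as_signed32(b)


def jb_taken(a, b):
    """Models 'cmp a, b' followed by jb: unsigned 'below'."""
    return (a & 0xFFFFFFFF) < (b & 0xFFFFFFFF)


# A higher-half address compared against a low address.
a, b = 0xC0100000, 0x00100000
print(jl_taken(a, b))  # True: as a signed number, a is negative
print(jb_taken(a, b))  # False: as an unsigned number, a is larger
```

For operands below 0x80000000 both comparisons agree, which would explain why the original code was "not a bug".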
08:29:10 <jussihi_> Well, changed them for you :)
08:29:16 <bcos> Thanks :-)
08:29:18 <stisl> is there somebody who wants to program an OS together?
08:29:22 <jussihi_> thanks for the help guys!
08:29:52 <jussihi_> stisl: I think we are all interested in that topic :-)
08:30:16 <stisl> jussihi_, I mean more in general
08:30:26 <bcos> stisl: The normal problems start with everyone having different ideas about what it should be
08:31:08 <bcos> ..followed by "bikeshedding"
08:31:27 <stisl> bcos, that is not such a big problem for me; the only thing I want is a realtime os
08:31:28 <bcos> ..followed by the most enthusiastic/experienced person doing all the work and getting annoyed at everyone else
08:31:49 <bcos> stisl: Why realtime?
08:31:55 <jussihi_> :D
08:32:14 <jussihi_> You should just look for some open-source realtime OS source codes
08:32:15 <stisl> bcos, because I don't want poor embedded targets
08:32:27 <bcos> Erm
08:33:02 <stisl> the PC has a lot of memory and is fast
08:33:05 <bcos> "realtime" means "optimised to reduce worst case time at the expense of average time", which mostly just means slower
08:33:29 <doug16k> ya. "realtime" is more inherently worse than it is inherently better
08:33:52 <bcos> ..and if you need that, you probably also need a CPU that gives you guaranteed timing (e.g. little embedded CPUs that don't have SMM and power management)
08:34:33 <doug16k> it just means it guarantees that it will meet some deadline. it does not imply at all that it is so quick that it meets deadlines. more like someone exhaustively ensured that there are upper bounds in place to guarantee things
08:34:35 <stisl> I would deactivate it
08:34:37 <bcos> Although there's a major difference between "hard real-time" (guarantees) and "soft real-time" (meaningless marketing hype)
08:34:52 <stisl> I just want to have a low latency scheduler
08:35:28 <stisl> so that I can run my oscilloscope with recording on the PC
08:36:18 <doug16k> what sample rate are we talking about
08:36:34 <stisl> 5 MS/s
08:37:14 <stisl> currently the driver can not record for a long time
08:37:18 <stisl> only a snapshot
08:37:44 <doug16k> probably because the oscilloscope only takes a buffer full and drops everything between
08:38:00 <doug16k> it will also drop samples until the trigger point, do the capture
08:38:10 <doug16k> most usb scopes don't even try to stream
08:38:25 <zid> bcos: nvidia gpus guarantee 4 scanline latency tops, fwiw
08:38:31 <zid> one of their customers wants it, so they do it
08:38:51 <zid> You can pretty reliably implement 120fps split screen tearing :P
08:39:54 <doug16k> stisl, you don't need realtime to keep up with an isochronous usb transfer
08:40:35 <doug16k> one that is < 0.1% of memory bandwidth
08:40:47 <bcos> Probably chokes on USB bus bandwidth
08:40:49 <stisl> my USB can do realtime
08:41:04 <doug16k> 40Mbit?
08:41:06 <bcos> (e.g. sampling frequency > max. USB can handle)
08:41:08 <doug16k> is it usb 3?
08:41:12 <stisl> yes
08:41:35 <stisl> and I have a realtime device for it
08:41:50 <bcos> stisl: What's the highest frequency input signal the scope can handle?
08:42:16 <stisl> bcos, not more than 20 MHz I think
08:42:18 <doug16k> 2.5MHz @ 5Ms/s
08:42:24 <stisl> ok,less
08:43:38 <bcos> For 20 MHz you'd want to sample at 80+ MHz, so (assuming 16-bit samples) you'd be looking at 160 MB/s
08:43:40 <doug16k> inputs past Nyquist frequency will just alias and appear at multiples of the frequency
08:43:41 <bcos> (maybe)
08:44:09 <doug16k> 16 bit? ya right.
08:44:09 <stisl> I would also take a digital input
08:44:14 <doug16k> 12 bit is kick ass for a scope
08:44:19 <stisl> for reverse engineering my chips
08:44:58 <stisl> 16 bits I think yes
08:45:07 <doug16k> bullshit
08:45:15 <jussihi_> :D
08:45:19 * bcos just likes nice multiples of 8 bits :-)
08:45:21 <doug16k> guaranteed 8 bit
08:46:05 <doug16k> 20MHz 16 bit scope is a logical impossibility
08:46:17 <doug16k> cheap ass bandwidth with ultra premium digitizer eh? no
08:47:06 <bcos> Not impossible - probably just an overclocked sound card chip
08:47:24 <bcos> (probably works really well for 16-bit at 44 kHz ;-)
08:47:27 <stisl> my audio interface can handle 32 bits floating point
08:48:13 <stisl> and 192 kHz
08:48:52 <stisl> but then the CPU is sweating
08:49:34 <doug16k> https://www.keysight.com/en/pdx-2969966-pn-DSOX1204G/oscilloscope-70-100-200-mhz-4-analog-channels?pm=spc&nid=-32110.1258591&cc=CA&lc=eng <-- 2 gig sample per second 200MHz, 8 bit, ~$1500
08:50:36 <doug16k> here's a $20,000 one, 10 bit -> https://literature.cdn.keysight.com/litweb/pdf/5991-4087EN.pdf?id=2456396
08:51:06 <doug16k> in case you wonder why I said 16 bit == bullshit
08:52:18 <doug16k> 12 bit scopes exist in the > 1500 range, but they are totally outlier
08:52:55 <bcos> 8 channels at 10 bits per channel = 80 bits
08:53:15 <stisl> multiplexing
08:53:50 * bcos wonders how much "payload data" you can actually push through USB 3
08:54:48 <doug16k> 300MB/sec no problem
08:54:51 <doug16k> bytes
08:58:11 <bcos> Ok, so (for 8-bit per channel, single channel) it'd be limited to a 300 MHz sample rate
08:59:03 <bcos> (or 15 times max. scope input frequency)
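The figures traded above all come from the same two relations: the Nyquist limit is half the sample rate, and the stream rate is sample rate times sample width. A quick sketch to reproduce them, assuming uncompressed samples and taking the 300 MB/s USB 3 payload figure from the discussion at face value (the function names are illustrative):

```python
def nyquist_hz(sample_rate_hz):
    """Highest input frequency that can be represented without aliasing."""
    return sample_rate_hz / 2


def stream_bytes_per_s(sample_rate_hz, bits_per_sample, channels=1):
    """Raw data rate needed to stream every sample."""
    return sample_rate_hz * bits_per_sample * channels / 8


def max_sample_rate_hz(link_bytes_per_s, bits_per_sample, channels=1):
    """Highest sample rate a link of the given payload rate can carry."""
    return link_bytes_per_s * 8 / (bits_per_sample * channels)


print(nyquist_hz(5e6))                    # 2500000.0   (2.5 MHz at 5 MS/s)
print(stream_bytes_per_s(5e6, 8) * 8)     # 40000000.0  (40 Mbit/s for 8-bit samples)
print(stream_bytes_per_s(80e6, 16))       # 160000000.0 (160 MB/s)
print(max_sample_rate_hz(300e6, 8))       # 300000000.0 (300 MS/s)
print(max_sample_rate_hz(300e6, 8) / 20e6)  # 15.0 times a 20 MHz input
```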
08:59:23 <doug16k> a "real" scope will be doing gig samples per second and thousands of updates per second
08:59:32 * bcos nods
09:00:16 <bcos> My theory is that stisl's scope only works for a short while because it buffers the samples and can't send them to computer fast enough to avoid "internal buffer full"
09:00:45 <bcos> (in other words, the problem is a hardware/bandwidth problem that a real-time OS can't solve)
09:00:53 <doug16k> it's also a feature. the scope is helpfully not sending irrelevant data until it triggers
09:02:51 <doug16k> stisl, I started working on a similar project and realized it would be 90% getting the usb transfers working, 5% worrying about parasitic crap on the frontend, and 5% actually doing oscilloscope work :P
09:05:36 <doug16k> I think you'd have to get a usb 3 transceiver and do the usb quite manually to achieve the high transfer rates
09:10:25 <bcos> Hrm - seems to be a lot of "20 MHz with 48 MSa/s" USB scopes in the <$100 price range
09:19:51 <catern> what's a good short word to use for "thing which allows me, as its only/primary function, to read and write main memory from the CPU"?
09:20:50 <bcos> catern: maybe "memory monitor"
09:21:30 <catern> ideally it would be a word that doesn't include "memory" yet is still relatively unambiguous
09:21:43 <catern> because any of the names I've used that include "memory" are too long
09:21:49 <catern> and annoying to type everywhere
09:22:44 <bcos> Could abbreviate - e.g. call it "rm" (for "RAM monitor") so people could just "rm *" to monitor all their RAM
09:23:10 <bcos> ;-)
09:23:39 <catern> good idea, and I can support really-fast mode as well
09:24:05 <bcos> I like that idea - "rm -r *" for really fast mode
09:24:18 <catern> I was thinking -rf
09:24:35 <bcos> That'd work :-)
09:25:19 <doug16k> that is the best option for ram page functions
09:25:30 <bcos> Mostly; I think you'll end up having to compromise - name length vs. ambiguity/risk of name clashes
09:26:28 <catern> doug16k: er, to what do you refer? what's the best option?
09:26:41 <catern> bcos: Yeah I kind of like the idea of calling it "ram"
09:26:45 <catern> that's nice and short
09:27:13 <catern> a legitimately good idea :)
09:27:24 <bcos> Hrm - might be able to do a nice descriptive name with 2 characters if you switch to something like Japanese or Chinese
09:27:32 <doug16k> (I was kidding. ram page = "rampage")
09:27:51 <catern> hah
09:28:44 <doug16k> as in rampaging through the filesystem :D
09:30:29 <doug16k> raw reading/writing sounds more like granting debugging permission than anything else
09:31:29 <doug16k> infuse_with_omniscience(int pid) is too long eh?
09:32:36 <doug16k> grant_omnipotence() maybe?
09:33:21 <doug16k> :)
09:35:50 <doug16k> linux has a function to grant indirect omnipotence. it's called ioperm
09:36:10 <doug16k> on x86 anyway
09:36:36 <doug16k> just trick the hardware into DMA to what you want to hijack, and you're in
09:38:43 <doug16k> catern, is it for physical memory? device level DMA addresses kind of thing?
09:39:34 <doug16k> pm or vm would hint the use cases in the name
09:43:08 <catern> it's actually an RDMA thing, a capability to read/write anywhere in some specific remote virtual address space
09:43:20 <doug16k> ah, remote
09:44:36 <catern> (though it could also be a local virtual address space of another process)
09:45:37 <doug16k> you do a thing like cache coherency where you read for ownership?
09:46:04 <doug16k> ...to acquire write
09:46:28 <doug16k> paging is pretty much that if you don't prepage
09:47:59 <doug16k> catern, is that how it works? reading a page will request it and make it read only in both places, and if they write it they send you an invalidation?
09:49:04 <catern> mmm, no, nothing like cache coherency (except inasmuch as the hardware already does that), these reads/writes are just like normal user reads/writes
09:49:42 <catern> though maybe I should think about the interaction with cache coherency if I want to be able to extend it for better performance later
09:50:33 <doug16k> no I don't mean involving the actual cache coherency, I mean, taking the concept up to whole pages and using read-only-ness to know when to read for ownership
09:50:56 <catern> right, no, nothing like that, though that's an interesting idea
09:50:57 <doug16k> what granularity is it? page?
09:51:33 <doug16k> or is it an RPC where you send a little packet that says write value X of type Y to location Z
09:51:51 <catern> indeed it is just that RPC
09:51:58 <doug16k> ah ok
09:52:06 <catern> (though it's bytes and lengths, untyped at the RPC level)
09:52:34 <doug16k> so you assume you know the endianness of the other end
09:53:07 <catern> yeah... i'm totally neglecting portability for now, otherwise i'd never finish
09:53:28 <doug16k> as long as you are aware, it's fine
09:55:06 <catern> (I think I could maybe handle endianness relatively straightforwardly in userspace though, hmm. my struct-writing code all takes place in the context of a specific Task, and it could get the endianness/whatever portability information out of that Task at write time)
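The idea of taking the byte order from the Task at write time might look roughly like the sketch below. Everything here is hypothetical: the `Task` class, its `byteorder` field, and the wire format (64-bit address, 32-bit length, raw bytes) are invented for illustration and are not catern's actual design.

```python
import struct
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical handle for a remote address space."""
    byteorder: str  # "little" or "big", as reported by the remote side


def pack_u32_fields(task, values):
    """Serialize 32-bit fields in the byte order of the target Task."""
    prefix = "<" if task.byteorder == "little" else ">"
    return struct.pack(f"{prefix}{len(values)}I", *values)


def write_request(address, payload):
    """Hypothetical untyped RPC: address and length header, then raw bytes.
    The header uses network byte order; the payload is passed through as-is."""
    return struct.pack("!QI", address, len(payload)) + payload


remote = Task(byteorder="little")
payload = pack_u32_fields(remote, [1, 2])
request = write_request(0x1000, payload)
print(payload.hex())  # 0100000002000000
print(len(request))   # 20 (8-byte address + 4-byte length + 8 payload bytes)
```

Keeping the RPC itself untyped and doing the byte-order conversion in the struct-writing layer matches the split described above: the transport only ever sees bytes and lengths.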
09:56:08 <doug16k> nice, you have an avenue to fix it already, when needed
10:40:52 <nepugia> Hey, is there a practical reason that bootloaders do not set backlight via acpi? Or is it that just nobody did this yet?
10:41:30 <bcos> Bootloaders are used to start the 50 MiB of crap needed to handle ACPI properly
10:42:07 <aalm> bcos, hey hey, obsd monol-generic is still <20mb :p
10:42:15 <bcos> :-)
10:42:24 <nepugia> Well, assuming i know the specifics of the hardware beforehand, could i then not hardcode the acpi calls :)
10:43:07 <doug16k> calls?
10:43:08 <bcos> If you know the specifics of the hardware, you might be able to do it without ACPI
10:43:33 <jjuran> I thought 640K of crap was enough for anybody
10:43:54 <nepugia> hmm, i know that linux does some black magic in the gpu driver to support my backlight, would be interested to find out what that black magic is
10:44:04 <bcos> I think the rule is "every 3 years the amount of crap doubles".. ;-)
10:44:58 <nepugia> Do you think it would be feasible to do it without acpi too? as in if i boot on some amd64 machine can i figure this stuff out at runtime?
10:45:51 <bcos> For AMD64, I don't think it's practical to customise the boot loader for one specific computer and nothing else
10:46:26 <nepugia> practical?
10:46:55 <doug16k> probably smbus or i2c interface
10:47:13 <bcos> Beyond that; if the monitor is hard-wired into the computer (e.g. laptop) then you might be able to have a kind of "motherboard driver" that handles that one backlight (and no others)
10:48:01 <nepugia> that is the specific case i am thinking of
10:48:23 * bcos wonders if the old APM API supported backlights
10:48:24 <aalm> nepugia, i would mention about the efi too
10:49:11 <nepugia> did i not write efi here? :3 well, i wanted to do this as an efi chainloader
10:49:22 <bcos> Ah, OK
10:49:45 <nepugia> as in i start this efi executable, it does the black magic to set my backlight to a state that does not burn my eyes out, and then transfers control to another efi bootloader
10:50:42 <bcos> In that case; it might make more sense as an EFI application
10:51:44 <aalm> might be relevant: https://wiki.osdev.org/UEFI_Bare_Bones
10:54:02 <bcos> Hrm. Not sure it'd help though - firmware would burn your eyes out during power-on/POST; then you'd turn backlight down and 1 second later OS would boot and set backlight back to default
10:54:11 <nepugia> bcos: i am not that familiar with the terminology :3, ideally i would want my efi thingy to be able to call the bootloader itself so i may not have to add explicit support
10:54:38 <bcos> Which bootloader would you chainload?
10:54:53 <bcos> (what you're doing will ruin SecureBoot, etc)
10:55:25 <nepugia> hadn't thought about SecureBoot :/
10:55:45 <nepugia> something like Linux's efistub or FreeBSD's efi loader
10:55:54 <bcos> ..also the "etc" - it'd affect TPM too; so for some things on Windows it'd break DRM
10:57:28 <nepugia> perhaps i can build a kind of motherboard driver as you said indeed, maybe add support into refind or something then
10:57:42 <nepugia> i see i need to do a bit more reading on this subject anyhow :)
10:57:52 <bcos> If you're booting with something like GRUB; it might be a lot easier to setup a dark splash-screen
10:58:38 <nepugia> don't really like grub, but that would not really solve my issue with the backlight i have :/
10:59:04 <bcos> (e.g. dark grey picture of moon on black background; with dark grey text on top)
11:00:35 * bcos isn't sure if those other bootloaders support splash screens
11:00:43 <nepugia> Even with a totally black background it is too bright for me :/
11:01:12 <bcos> Medical condition, or grossly inadequate ambient light conditions?
11:01:13 <aalm> no 'bios' setting?
11:01:24 <nepugia> aalm: nope
11:01:48 <bcos> Hrm ... or trying to not be caught watching porn by parents in the middle of the night?
11:02:59 <nepugia> I'd say both, not really the porn thing though
11:05:47 <bcos> Both?
11:06:20 <bcos> (sometimes it's annoying how ambiguous English is - "caught by parents" and "porn by parents")
11:06:47 <aalm> or medical&ambient
11:06:53 <nepugia> Ceiling light is too bright, thus i turn it off, which then in turn gives me a room slightly too dark
11:07:27 <markweston> when it says intel will stop supporting BIOS in 2020 does it mean I won't be able to control the speaker by writing to 0x61 anymore?
11:07:31 <aalm> markweston, was 7.6 too small minor?
11:08:24 <markweston> aalm: i replaced everything with 6.4 and now everything works
11:08:39 <aalm> eww, you really did it :D
11:08:45 <Celelibi> b-but... the BIOS is fun to play with. :(
11:09:04 <bcos> nepugia: In that case I'd be tempted to try a less bright room light
11:09:22 <Celelibi> (I got highlighted by "porn", I'll go back to my usual idle self very soon.)
11:09:31 <nepugia> That solves one of two issues :D, still too bright anyhow
11:09:56 <nepugia> sadly no BSD, Illumos (or Haiku) knows how to control my backlight either >.>
11:10:38 <aalm> markweston, you could have just used ftp. directly instead of any outdated mirror you had installed your snap from, and get the libdrm in sync on -current. oh well:]
11:11:03 <bcos> markweston: I'd assume that in 2020 PIT and PC speaker will still exist, but I'd also expect it might not in 2021 (e.g. that PIT will get ripped out - it's a bit deprecated/superseded by HPET)
11:11:50 <bcos> nepugia: In theory maybe; they should also support some kind of command (that uses ACPI/ACPICA to control backlight)
11:12:21 <bcos> (like, in theory you should be able to add a command to their startup script)
11:12:47 <nepugia> Well I don't really know how to figure out what linux does to make my backlight work
11:12:56 <nepugia> I only know that the radeon kernel module is responsible
11:13:42 <bcos> I don't know, but I'd be tempted to assume radeon kernel module calls some kind of generic thing in ACPICA
11:14:22 <nepugia> If it does, then why does amdgpu not do the same :g
11:14:36 <markweston> Sad. :( To hell with UEFI.
11:14:36 <nepugia> loading the generic acpi stuff does not give me control on any BSD i tried anyhow
11:14:56 <bcos> https://www.freebsd.org/cgi/man.cgi?query=xbacklight&sektion=1&manpath=freebsd-release-ports
11:15:15 <bcos> Not sure if you're using X though
11:15:41 <nepugia> I do want to use X, but this problem isn't really X specific
11:15:44 <bcos> (or if it's part of "where supported")
11:16:50 <markweston> will VGA also be purged?
11:17:09 <aalm> nepugia, which laptop is it btw.?
11:17:33 <bcos> markweston: I hope/expect VGA will die a fast death
11:17:51 <nepugia> aalm: Lenovo Ideapad 110-15ACL (though i suspect that lenovo named several similar laptops like that)
11:17:54 <bcos> (and may have already gone - not sure)
11:18:02 <markweston> how will I do graphics then? VESA?
11:18:10 <nepugia> (also says 80TJ)
11:19:14 <bcos> nepugia: There might also be a low-tech solution (go to a shop that sells stuff for cars and get some film for tinting windows, and stick it on the screen) ;-)
11:19:40 <aalm> xD
11:20:01 <markweston> So much for the "IBM-compatible" PCs...
11:20:23 <bcos> markweston: For UEFI there's no VBE - you'd use UGA or GOP during boot to setup a framebuffer; then use the framebuffer (until/unless you have a native video driver)
11:20:34 <nepugia> I have sunglasses, with two pairs on it is almost bearable :P
11:21:13 <bcos> nepugia: Might need to get yourself tested for vampirism ;-)
11:22:49 <nepugia> I have sharp teeth, I am awake at night and i have the lightest skin color I know of, does this qualify?
11:23:01 <markweston> Ugh, whatever. I'll just buy an old PC. Real mode OSs rule!
11:25:11 <markweston> x86-64 will probably die soon enough anyway after ARM takes over
11:26:15 <bcos> nepugia: More seriously, there are medical conditions ( https://en.wikipedia.org/wiki/Photophobia )
11:29:37 <nepugia> I do know, but i doubt that there is anything i can do about it, other than try to beat my hardware to make it a bit easier on me, and wearing sunglasses outside :)
11:30:19 <bcos> markweston: Even if x86-64 dies (incredibly unlikely); all the ARM stuff will be essentially the same "UEFI + ACPI + PCI + USB" as PC
11:31:51 <bcos> nepugia: In that case, several layers of window tinting film doesn't sound too silly
11:32:40 <nepugia> Then i kind of lose the finer control the backlight could give me though :/
11:34:29 <epony> markweston don't count on it
11:51:28 <doug16k> nepugia, try not to get a stake through the heart, just in case