Search logs:

channel logs for 2004 - 2010 are archived at http://tunes.org/~nef/logs/old/ -- can't be searched

#osdev2 = #osdev @ Libera from 23may2021 to present

#osdev @ OPN/FreeNode from 3apr2001 to 23may2021

all other channels are on OPN/FreeNode from 2004 to present


http://bespin.org/~qz/search/?view=1&c=osdev&y=18&m=11&d=15

Thursday, 15 November 2018

12:00:15 <eryjus> mrvn: I am planning to model after my x86 setup -- with the upper TTL1 entries pointing to the same tables and the lower TTL1 being for user processes.
12:00:34 <eryjus> so user space will see all the kernel pages
12:01:09 <mrvn> eryjus: The 2 tables make that simpler. You can have a user page table size 128 byte - 8k iirc and a kernel page table of 8k for the higher half.
12:01:38 <mrvn> So no need to keep the upper half in sync between all processes. They have their own page table register.
12:02:15 <eryjus> is that laid out in the ARM ARM?
12:02:24 <mrvn> That also allows for tiny processes that have a single 4K page as page table. 1k for TTL1 and 3* 1K for TTL2.
12:02:45 <mrvn> eryjus: should be. Or the RPi specs from broadcom.
12:03:30 <eryjus> thx -- I may have follow-ups
12:06:19 <mrvn> geist: ARMv8 has 4 levels of 4k page tables, right?
12:07:53 <eryjus> mrvn: ARM ARM B4.7.1 -- if you wanted to keep track of it...
12:30:28 <geist> mrvn: correct. at least unless you reduce the address space size, which you can
12:30:41 <geist> but if you have a full 48 bit address space, it's 4 levels
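The TTBR0/TTBR1 split mrvn and eryjus are discussing can be sketched numerically. This assumes the ARMv7-A short-descriptor format (the ARM ARM sections cited above); the helper names are mine, not anything from the spec:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: ARMv7-A short-descriptor TTBR0/TTBR1 split. TTBCR.N selects
 * where the split falls: TTBR0 translates virtual addresses below
 * 2^(32-N), TTBR1 the rest. The TTBR0 (user) table shrinks as N grows:
 * 16 KiB >> N, which gives mrvn's "128 byte - 8k" range for N=1..7. */
static uint32_t ttbr0_table_bytes(unsigned n)   /* n = TTBCR.N, 0..7 */
{
    return 16384u >> n;
}

static uint32_t ttbr0_region_mib(unsigned n)    /* user region size in MiB */
{
    return 4096u >> n;                          /* n=1 -> 2048 MiB (lower half) */
}
```

With N=1 each process gets an 8 KiB table for the lower 2 GiB, and the kernel's higher-half table lives behind TTBR1 and never needs per-process syncing.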
12:44:42 <clever> mrvn: also note, the rpi has another mmu layer you may not be aware of
12:44:54 <clever> mrvn: there is an MMU between "arm physical addressing" and the real physical addresses
12:45:36 <clever> that was originally used to stop the ARM side from having r/w to the GPU firmware and HDMI encryption keys
12:45:50 <clever> but the rpi firmware just configures 1:1 paging tables and doesnt restrict anything
12:45:53 <geist> yeah this is why i hate that soc
12:46:13 <geist> there's something that still shows up, in that i think physical is mapped twice, once through a secondary cache and once without
12:46:23 <geist> which starts to kick in when you start using devices that DMA
12:46:49 <clever> ah yeah, i do remember that
12:46:58 <geist> when we still supported it, it alone caused a need to have some sort of physical dma offset in the driver layer
12:47:04 <clever> its abusing a spare address bit to enable/disable caching
12:47:16 <geist> ie, physical address is X, add Y to it before initiating DMA
12:48:00 <geist> or something like that. i dont know the details
01:01:32 <mrvn> geist: 4 times actually
01:01:52 <mrvn> The top two bits of "physical" addresses select the caching mode for the VC MMU.
01:04:46 <mrvn> MIPS had something similar without the extra VC layer. It's a cheap way to get extra control bits if you have less than 4GB of ram.
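clever's and mrvn's point about the VC MMU can be made concrete: on the BCM2835 the same SDRAM appears four times in the "bus address" space, with the top two bits selecting caching behavior. The alias meanings below follow my reading of the BCM2835 peripherals datasheet, so treat them as assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* BCM2835 bus-address aliases: the top two bits the VC MMU sees pick a
 * caching mode, so physical RAM shows up four times. DMA engines are
 * usually handed the 0xC... (uncached) alias - this is geist's
 * "add Y before initiating DMA" offset. */
#define BUS_ALIAS_L1_L2     0x00000000u  /* via L1 and L2 caches */
#define BUS_ALIAS_L2_COHER  0x40000000u  /* L2 coherent (non-allocating) */
#define BUS_ALIAS_L2_ONLY   0x80000000u  /* L2 only */
#define BUS_ALIAS_UNCACHED  0xC0000000u  /* direct SDRAM, what DMA wants */

static uint32_t arm_phys_to_bus(uint32_t phys, uint32_t alias)
{
    return (phys & 0x3FFFFFFFu) | alias;  /* replace the top two bits */
}
```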
01:08:32 <bcos_> Hrm.
01:08:41 <bcos_> Scenario: thread needs to acquire 5 contented mutexes, so it tries to get the first and can't and tells scheduler to block until mutex becomes available, then this happens 4 more times, adding up to 8 tasks switches (4 for blocking, 4 for unblocking) plus kernel<->user-space switching.
01:08:52 <bcos_> Why can't kernel provide an "acquire all the mutexes in this list" function (that's called when the first can't be acquired) to avoid all task switches, etc?
01:09:21 <bcos_> *contended
01:09:41 <Mutabah> complexity?
01:09:42 <geist> because that's functionally a wait on multiple
01:09:44 <geist> which is complicated
01:10:08 <izabera> why is sigxfsz even a thing?
01:10:12 <geist> well, it's a special form of it too, it's 'wait on all of these, but only in this order'
01:10:28 <izabera> write fails when you go beyond the limit, why do you need a signal too?
01:13:03 <nohop> What could cause a #GP when setting the OSXSAVE bit in cr4? OSFXSR-bit is already set.
01:14:04 <bcos_> Mutabah: I'm thinking it wouldn't be a lot more complex - a loop around the same "single mutex" code with some extra sanity/permission checking
01:15:10 <bcos_> ^ probably not worth it in the past (when SYSCALL was cheap), but now with security problems/mitigations making things slower..
01:15:49 <bcos_> (would be similar for the "release" - when user-space finds a waiter needs to be woken it'd tell kernel "do all these at once")
01:16:52 * bcos_ shrugs and adds it to his list of ideas to test/benchmark one day :-)
01:19:39 <geist> i think it substantially has a lot to do with how mutexes in particular are exposed
01:19:57 <geist> something like futexes, which seem to be the standard design that all OSes are following, i dont think it'd be reasonable
01:20:03 <geist> since most of the guts of the lock are in user space
01:21:41 <bcos_> There'd have to be an "if lock can't be acquired, tell scheduler I need to block" in user-space, followed by the kernel doing "if (lock was released since user-space tested) { return } else { block the thread }"
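The protocol bcos_ just described, with geist's point that most of a futex-style lock lives in user space, can be sketched with C11 atomics. `futex_wait` here is a no-op stand-in for the real syscall, not any particular kernel's API:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef _Atomic int lock_t;   /* 0 = free, 1 = held */

/* Stand-in for the real syscall: the kernel re-checks the word and only
 * sleeps the thread if it still equals `expected` - that re-check is the
 * "if (lock was released since user-space tested) { return }" part.
 * Modelled as a no-op so this sketch stays single-threaded. */
static void futex_wait(lock_t *l, int expected)
{
    (void)l; (void)expected;
}

static bool try_lock(lock_t *l)
{
    int expect = 0;
    return atomic_compare_exchange_strong(l, &expect, 1);  /* user-space fast path */
}

static void lock(lock_t *l)
{
    while (!try_lock(l))
        futex_wait(l, 1);   /* slow path: ask the kernel to block us */
}

static void unlock(lock_t *l)
{
    atomic_store(l, 0);
    /* a real implementation would futex_wake() one waiter here */
}
```

bcos_'s "acquire all the mutexes in this list" idea would amount to handing the kernel an array of such words to run the slow path over, in order, in one syscall.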
01:21:54 <Mutabah> Maybe some simple-ish kernel bytecode could solve that...
01:22:02 <clever> there is also the problem that if one thread grabs locks a, then b
01:22:03 <Mutabah> Userland picks the logic, and the kernel executes it
01:22:05 <ybyourmom> New round of meltdown and spectre attacks apparently
01:22:08 <clever> and another thread grabs locks b, then a
01:22:11 <clever> they could deadlock each other
01:22:27 <ybyourmom> https://arxiv.org/abs/1811.05441
01:23:00 <bcos_> clever: Yeah - the list would have to be ordered properly (and kernel would have to follow the order given), and ther's a few corner cases (e.g. if a thread deadlocks in kernel) that might be a little messier than the single mutex case
01:23:23 <clever> ybyourmom: on a related topic, i read a blog post about a spectre type problem happening in a game console ages ago
01:23:26 * bcos_ can't think of any deal breakers though
01:23:48 <clever> ybyourmom: for performance reasons, there was an opcode to read and bypass the cache
01:23:52 <bcos_> (other than APIs - something like pthreads isn't going to support it, but that's never been something I worry about)
01:24:05 <clever> but then it was causing too many cache coherency problems, so they turned it off with an if statement
01:24:27 <clever> ybyourmom: BUT!, the prefetcher, was prefetching, bypassing the cache, and then finding out it didnt need that data
01:24:43 <clever> so it was speculatively running an opcode, with harmful side effects on the cache
01:24:58 <Mutabah> Iirc it was a write?
01:25:02 <bcos_> "Meltdown-BR on Intel and AMD"?
01:25:12 <immibis> are we talking about the original meltdown/spectre or one of the newer variants?
01:25:22 <immibis> i believe it works with reads
01:25:26 <clever> Mutabah: i think it was a read, that put it into the cache, but didnt flag it as in the cache, so it wouldnt be updated when its value in-ram changed
01:25:31 <immibis> or maybe, ONLY works with reads, since the cache needs to be populated
01:25:37 <clever> Mutabah: so no write-back when it expires
01:25:52 <clever> Mutabah: but then the value in the cache didnt match ram, and all kinds of fun happens
01:26:18 <clever> coredumps also didnt agree with the crash, because the coredump came from ram
01:28:00 <immibis> that sounds new... where can I find some info on that vulnerability?
01:28:17 <clever> immibis: it was an old bug, on a game console
01:28:24 <clever> but i cant remember which blog it was on
01:28:29 <mischief> clever: wasnt that xbox?
01:28:33 <mrvn> bcos_: a wait on multiple easily deadlocks
01:28:35 <immibis> oh okay. I thought you were talking about meltdown and/or spectre
01:28:44 <immibis> old consoles are less interesting
01:28:54 <mrvn> bcos_: basically always deadlocks
01:29:01 <clever> immibis: it was a very similar bug, from many years ago
01:29:23 <bcos_> mrvn: A sequence of ("mis-ordered") individual mutexes also deadlocks in the same way
01:29:47 <mrvn> bcos_: the solution is to not do either. Try not to have wait-for-multiple at all
01:30:06 <bcos_> ..and not have individual mutexes either?
01:30:21 <clever> haskell STM solves that by just copying the counter on every mutex protected var it reads
01:30:23 <mrvn> bcos_: only to some extent
01:30:29 <immibis> is there something wrong with wait-for-multiple as a concept? or just the `for(Mutex m : mutexes) m.lock();` implementation?
01:30:39 <clever> and then at the end, it grabs a single lock, and atomicly checks for conflicts, and writes data back if there are none
01:30:49 <clever> if conflicts did occur, it throws the result out and just re-computes it
01:30:53 <mrvn> immibis: if it blocks on the second mutex for 5h then the first mutex is blocked for 5h too.
01:31:17 <immibis> mrvn: well a sensible implementation that *follows the specification* wouldn't acquire any mutex until they're all ready
01:31:25 <bcos_> immibis: There's a problem with locks in general - to avoid deadlocks you need to establish and adhere to a "global lock order". It's nothing new (and not a new problem with "multi-mutex")
01:31:44 <mrvn> immibis: then it might never acquire them because one of the 5 is always taken, just a different one every time.
01:32:02 <immibis> mrvn: so basically unfairness?
01:32:09 <mrvn> immibis: worse, starvation
01:32:19 <immibis> well, starvation is extreme unfairness AFAIK
01:32:27 <immibis> that is true
01:32:37 <mrvn> yeah. But some unfairness is fine. Starvation means you might never get any work done.
01:33:04 <bcos_> immibis: Standard description is something like "one thread tries to acquire locks A then B but only acquires A and waits for B; meanwhile another thread tries to acquires locks B then A but only acquires B and waits for A." - both threads wait for each other forever = deadlock
01:33:30 <mrvn> immibis: and then there is that thing with spaghetti and forks.
01:33:33 <bcos_> livelock is different (fair locks solve that)
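One standard way to realize the "global lock order" bcos_ mentions is to always acquire a pair of locks in address order, so the two threads in his A/B scenario agree on which lock comes first. A toy sketch; the spinlock type is mine, not anyone's real implementation:

```c
#include <assert.h>
#include <stdatomic.h>

typedef _Atomic int mutex_t;   /* toy spinlock: 0 = free, 1 = held */

static void acquire(mutex_t *m) { while (atomic_exchange(m, 1)) /* spin */; }
static void release(mutex_t *m) { atomic_store(m, 0); }

/* Take both locks in address order: threads wanting {A,B} and {B,A}
 * both grab the lower-addressed one first, so neither can end up
 * holding one half while waiting on the other. */
static void lock_pair(mutex_t *a, mutex_t *b)
{
    if (a > b) { mutex_t *t = a; a = b; b = t; }  /* canonical order */
    acquire(a);
    acquire(b);
}

static void unlock_pair(mutex_t *a, mutex_t *b)
{
    release(a);   /* release order doesn't matter for deadlock freedom */
    release(b);
}
```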
01:34:36 <bcos_> As an amateur philosopher, I eat spaghetti with my fingers - no problem!
01:34:43 <bcos_> :-)
01:35:28 <mrvn> Personally I think a lot of that locking stuff is over designed. I rather have more global locks and be done quickly than have millions of locks and then get problems with wait-for-multiple.
01:35:32 <immibis> bcos_: which I assume a sensible implementation of wait-for-multiple should not do, since it is specified to acquire all objects at once, not one at a time.
01:35:36 <geist> was gonna say philosophers aint good for nuthin, let em starve
01:35:58 <mrvn> Like a doubly linked list. If you lock every item in the list all that extra locking probably eats more than a global lock for the list hinders concurrency.
01:36:13 <mrvn> (if you have critical sections so task switch doesn't happen)
01:36:31 <bcos_> immibis: It'd be "acquire all locks one at a time in the order listed, without returning to user-space until all locks are acquired (to avoid unnecessary task switches)"
01:37:18 <bcos_> ..which is conceptually identical to "one at a time with task switches and kernel API overhead" but theoretically faster
01:37:43 <immibis> bcos_: well that's a lot more useless than I thought it was
01:38:00 <mrvn> bcos_: unless you have wait lists and make acquiring multiple locks atomic.
01:38:25 <geist> also i think in practice it wouldn't be that helpful. generally speaking if you're waiting on a series of locks its because you have a bunch of layers of code
01:38:38 <geist> to flatten it out you'd somehow need to look ahead into a bunch of layers that you dont know about yet
01:39:02 <geist> and then you'd end up hoisting more code under locks that maybe it doesn't need to be
01:39:16 <geist> since functionally you're moving all of the inner locks up to the outer lock's code block
01:39:39 <mrvn> geist: that depends how fine grained your locks are. E.g. a lock per linked list item where you often have to lock 3 at a time.
01:40:12 <bcos_> Realistically, if you have too many heavily contended locks you're doing it wrong; but it could be a simple as a two locks in a typical student "transfer $X from one account to another" exercise
01:40:29 <bcos_> *as simple as
01:41:06 <mrvn> bcos_: or process A sends a message to process B transferring a page of data.
01:41:44 <bcos_> mrvn: Let's not talk about messaging (messaging leads to actor model which leads to.. "why do you need locks in the first place"?)
01:41:56 <immibis> bcos_: to implement your messaging system, of course
01:41:59 <mrvn> I need to lock the address spaces of both processes
01:42:43 <bcos_> Heh - I cheat and only need to lock the receiver's "message queue" :-)
01:42:45 <mrvn> If A sends to B and B sends to A then both lock their own address space and then deadlock waiting for the target address space to become available.
01:43:25 <mrvn> bcos_: I cheat too. I remove the message from address space A, free the lock and then lock address space B and insert the page.
01:44:01 <bcos_> I split user-space into "process space" and "thread space", where message buffers are in thread space and can only be accessed by the thread, so there's no need to lock the address space
01:44:57 <mrvn> bcos_: threads? what are threads? :)
01:46:47 <mrvn> My kernel aims for async operations and message passing. So no threads, only processes. No shared memory either. All very KISS.
01:50:11 <bcos_> mrvn: Yours sounds more "actor model" than mine - shared nothing (vs. my "shared process space")
01:51:13 <geist> yah kinda what Go tries to do at a language thing
01:51:38 <immibis> bcos_: so how does the thread indicate that it's now safe to insert a message?
01:52:36 <bcos_> I remember reading something about erlang - the difficulty in figuring out which "processes" (functions from my perspective) should/shouldn't actually have separate address spaces
01:54:03 <mrvn> bcos_: separate address spaces meaning stack frames?
01:54:13 <bcos_> immibis: It's.. Each thread has 2 message buffers, each at a fixed address in "thread space". When it calls "send_message()" the kernel locks the receiver's queue, moves (not copies) data from the sender's buffer to the receiver's queue, then releases the receiver's queue
01:54:43 <immibis> bcos_: see, there's the lock
01:54:53 <immibis> i was wondering how you did it without a lock... now I know
01:54:56 <mrvn> bcos_: I don't have a send queue. But "moves (not copies) data" means moving it between the address spaces.
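bcos_'s send path (lock only the receiver's queue, empty the sender's buffer rather than share it) might look roughly like this single-threaded toy; every name and size here is made up for illustration, and the `lock` field stands in for a real kernel spinlock:

```c
#include <assert.h>
#include <string.h>

enum { MSG_BYTES = 64, QUEUE_SLOTS = 8 };

struct msg_queue {
    int  lock;                          /* stand-in for the kernel's real lock */
    int  head, count;                   /* simple ring buffer of messages */
    char slot[QUEUE_SLOTS][MSG_BYTES];
};

/* Only the receiver's queue is locked; the sender's fixed per-thread
 * buffer needs no lock because only that thread ever touches it. */
static int send_message(struct msg_queue *rx, char *sender_buf)
{
    rx->lock = 1;                       /* lock the receiver's queue only */
    if (rx->count == QUEUE_SLOTS) {
        rx->lock = 0;
        return -1;                      /* receiver backlogged */
    }
    int tail = (rx->head + rx->count) % QUEUE_SLOTS;
    memcpy(rx->slot[tail], sender_buf, MSG_BYTES);
    memset(sender_buf, 0, MSG_BYTES);   /* "move": sender's buffer is now empty */
    rx->count++;
    rx->lock = 0;
    return 0;
}
```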
01:55:01 <bcos_> mrvn: Each thread has its own value for CR3. Was originally designed for 32-bit where a process with 100 threads could have 1 GiB of process space + 100 threads with 2 GiB each = 201 GiB of space
01:55:51 <mrvn> bcos_: I don't like the idea of having more threads than cores. Always feels like a waste of resources to me.
01:56:18 <bcos_> mrvn: My theory is that no process should have more than "number of CPUs * number of priority levels used" threads
01:56:42 <bcos_> (e.g. if there's 4 CPUs, then 4 high priority threads and 4 low priority threads is probably fine)
01:57:16 <mrvn> bcos_: yeah. Like a GUI thread that reacts to input events and worker threads that run in the background.
01:57:19 <bcos_> Of course I break "should" whenever I feel like it..
01:57:21 <immibis> unpopular opinion: threads are an artificial abstraction, not something fundamental. What you really have is a queue of work to be done, in priority order
01:57:21 <bcos_> ;-)
01:57:31 <immibis> (i haven't yet got around to writing a kernel based on this concept)
01:57:49 <mrvn> immibis: that's where message passing and pipelining comes in
01:57:49 <immibis> when a work item is ready to be done, it can be performed on any CPU
01:58:14 <bcos_> immibis: NUMA says "Hi!" ;-)
01:58:27 <immibis> but the thing that makes them not threads is that no state is saved between work items except what you explicitly save
01:59:00 <immibis> I guess you couldn't have pre-emptive multitasking in that model which would be a serious problem - unless you also support traditional register-saving context switches
02:00:04 <mrvn> immibis: Having one producer and N consumers is a problem though. I work around that by having the worker thread request work from the producer and the producer replies with the next work package.
02:00:21 <mrvn> immibis: you totaly want preemptive multitasking
02:00:50 <bcos_> ^ isn't the traditional approach to have a kind of router - a single consumer that routes requests to workers?
02:01:29 <mrvn> bcos_: my message inboxes are unlimited. The producer would just produce and produce and produce and never block.
02:01:57 <bcos_> ..until the kernel starts running out of RAM and starts throttling the producer
02:02:05 <immibis> mrvn: what is a producer and a consumer?
02:02:19 <mrvn> bcos_: The router would have to check multiple processes' inboxes and block till they are small enough and then send the next message to the inbox with least backlog. A HUGE wait-for-multiple problem.
02:02:46 <mrvn> immibis: a producer is a process that creates work packages and a consumer is one that does the work (consumes the work packages).
02:02:51 <immibis> maybe i should call it an event queue. you process each event independently without having a WaitForEvent loop - the kernel starts a new "task" to process each event, and tasks are explicitly lightweight.
02:03:06 <immibis> as I said I haven't actually tried to create this system, so that will probably reveal all sorts of problems
02:03:25 <bcos_> mrvn: I'm thinking more like the router would only forward a request to a worker that is idle (and router would queue requests until a worker is idle)
02:03:29 <immibis> mrvn: what's a process, besides an address space?
02:03:48 <mrvn> immibis: I've thought about that too. "Tasklets" that only consume 4K kernel memory, have 1-2MB private memory and should be super cheap to create.
02:04:12 <mrvn> immibis: process == address space + register context at a minimum
02:04:28 <immibis> but the register context is not saved between tasklets. and tasklets are usually not preempted
02:04:43 <immibis> 1-2MB memory is too much, by the way. minimum one page. or less.
02:05:00 <immibis> of course if you *want* to have 1-2MB private memory you should be able to do so
02:05:20 <mrvn> immibis: Nah. On ARM you can map 1-2MB with a single 4K page for process infos and page tables.
02:05:28 <immibis> since the register context is only saved upon pre-emption, which usually does not happen, it doesn't make sense to say the register context is an intrinsic part of the process. it's just to support preemption.
02:06:37 <mrvn> immibis: processes are usually longer running things. That's why I named the small worker "processes" tasklets.
02:07:46 <mrvn> And with only 4k kernel memory per tasklet you could easily fire off 1024 tasklets and then wait for them to finish.
02:08:37 <mrvn> But in the end I decided to go with message passing instead.
02:09:00 <immibis> why 4k specifically?
02:09:13 <mrvn> immibis: it's a page. can't map less than that.
02:09:24 <immibis> there is no law of nature that says all allocated memory has to be on its own page.
02:09:45 <mrvn> but if you start allocating less it gets a lot more complex.
02:09:52 <immibis> 4k is a lot! 160 of those would have to be enough for anybody!
02:10:28 <mrvn> Those 4k would include 1k stack too
02:10:29 <immibis> and now I'll be back in 1.5-2 hours so I may not read your response
06:49:16 <introom> hi. i am running i3 in a virtual machine.
06:49:42 <introom> the problem is, when i switch to a host application and then switch back to the virtual machine
06:50:00 <introom> it seems i3 won't know that it grabs the focus again.
06:50:01 <geist> cool
06:50:06 <klange> and we care about this... why?
06:50:22 <introom> probably i3 never knows it has lost the focus?
06:50:33 <klange> are you in the wrong channel?
06:50:39 <geist> that's a reasonable hypothesis
08:52:50 <Shockk> just a brief question
08:52:57 <klys> yes
08:53:00 <Shockk> thanks
08:53:07 <klys> :)
08:53:38 <Shockk> wait actually nvm I got the answer anyway
08:53:45 <Shockk> but I appreciate your succinctness
08:54:30 <klange> now I want to know what the question was going to be :(
08:55:17 <Shockk> was just going to ask if I need to worry about crt0 / _start / any of that stuff, when lowering a high-level language down to LLVM IR (which is before link time of course)
08:55:38 <klys> o.o
08:55:56 <Shockk> but someone else told me the answer
08:57:58 <klys> well crt0 is needed if you have a system to interface binaries to your apps, such as setting up memory management, etc.
08:58:37 <klys> though if you don't have a c library with syscalls, then you're still freestanding.
08:59:00 <klange> some language runtimes may have required setup steps, or even teardown steps
09:04:13 <Shockk> just for reference; I was just wondering about that because I'm writing a compiler for my own language at the moment but suddenly wondered if I'll need to do anything myself like deal with _start, if I want to call C functions from it
09:05:20 <klys> if you're writing a compiler for an abi which uses existing syscalls, then yes.
09:05:25 <Shockk> I assume as long as I compile to the same calling convention though, then at link time _start will be linked in from crt0 and it'll just call the main from my language
09:08:45 <ybden> Shockk: well, you can make the linker choose any symbol as the entry point, so
09:08:50 <Shockk> oh
09:08:55 <ybden> calling it _start is just convention, really
09:09:05 <ybden> (it's the default that the linker will pick)
09:09:22 <ybden> but you can do, say, ENTRY(my_custom_entry_point_name) or so
09:09:32 <Shockk> true yes; I mean in order to be compatible with crt0s from existing libcs
09:09:48 <ybden> crt0 isn't part of the libc? it's the compiler
09:09:53 <Shockk> oh really?
09:09:56 <ybden> I think?
09:10:30 <ybden> well, I just grepped musl's source and it doesn't mention crt0 anywhere
09:10:38 <Shockk> huh
09:10:59 <Shockk> hmm I do remember writing a crt0 with a _start function when I was porting newlib
09:11:00 <ybden> nvm, there are files named such
09:11:04 <Shockk> oh right
09:11:05 <Shockk> okay
09:12:12 <ybden> well
09:12:17 <ybden> 'crt' stands for C runtime
09:12:27 <ybden> so you're not going to necessarily need that for your own language
09:12:43 <Shockk> ah right
09:12:45 <ybden> you'll want to do something similar as far as initialisation is concerned, probably
09:12:53 <Shockk> I'd need it in order to call C functions from my language though, right?
09:12:57 <klys> libgcc-8-dev: /usr/lib/x86_64-linux-gnu/8/crtbegin.o
09:13:15 <klys> erm
09:13:23 <klys> libgcc-8-dev: /usr/lib/gcc/x86_64-linux-gnu/8/crtbegin.o
09:13:26 <klys> yeah
09:14:09 <ybden> so, the crt0/crt1/whatever are just tiny things that handle initialisation of the C library
09:14:15 <ybden> crtbegin/end are provided by the compiler
09:14:19 <ybden> sorry for mixing those up
09:18:10 <klys> crt0.o was replaced with gcrt1.o at some point
09:18:32 <klys> it's in libc6-dev
09:18:35 <ybden> http://www.faqs.org/docs/Linux-HOWTO/Program-Library-HOWTO.html#INIT-AND-FINI-OBSOLETE
09:20:33 <klys> or crt1.o, it seems
09:20:52 <ybden> https://gcc.gnu.org/onlinedocs/gccint/Initialization.html ah here we go
09:20:58 <ybden> Shockk: ^ contains a fair bit of info
09:21:14 <ybden> but yes, this is compiler-specific
09:21:30 <klys> it's abi-specific
09:21:45 <klys> the .o files are provided
09:22:25 <klys> so if you link to ELF you can use them out of the box
09:22:49 <ybden> yeah
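The crt0 duties discussed above can be sketched in freestanding C. This assumes x86-64 Linux and GNU tools; `my_main` and the build flags are illustrative placeholders, not anything from the conversation:

```c
/* Roughly what crt0's _start provides, minus argc/argv handling.
 * Build (assumption): gcc -nostdlib -nostartfiles -static entry.c
 * To use a different entry symbol instead, a linker script line like
 * ENTRY(my_custom_entry_point_name) overrides the _start default. */

void my_main(void);                 /* hypothetical: your language's main */

void _start(void)
{
    /* a real crt0 would also clear .bss, set up the stack frame, run
     * runtime/library initialization, and read argc/argv off the stack */
    my_main();

    /* there is no caller to return to, so exit via raw syscall 60 */
    __asm__ volatile ("mov $60, %%eax\n\t"
                      "xor %%edi, %%edi\n\t"
                      "syscall"
                      : : : "rax", "rdi");
}
```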
01:52:42 <SopaXorzTaker> Is there any emulator which starts with A20 disabled?
01:52:54 <SopaXorzTaker> Like, every new one has working A20 line by default
01:54:27 <lkurusa> i am pretty sure i had to manually enable a20 on qemu
01:54:42 <lkurusa> alternatively, you can also just patch qemu sources to have it disabled by default
01:59:38 <bcos_> SopaXorzTaker: For a quick hack (for testing purposes), you can probably just slap a call to the "int 0x15, ax = 0x2401" BIOS function into your boot loader in less than 2 minutes
02:00:13 <bcos_> ( http://www.ctyme.com/intr/rb-1336.htm )
02:00:31 <SopaXorzTaker> bcos_, no, A20 appears to be ENABLED by default
02:00:45 <SopaXorzTaker> I want to DISABLE it to test whether my A20 detection algo behaves correctly
02:02:34 <bcos_> Ah - in that case you might need to configure the emulator to support A20 (it adds a little overhead, especially for software emulation, so often people don't include it)
02:02:44 <bcos_> ..alternatively there's: http://www.ctyme.com/intr/rb-1335.htm
02:29:00 <SopaXorzTaker> bcos_, https://ptpb.pw/FaY2.asm
02:29:19 <SopaXorzTaker> (assemble with NASM)
02:29:50 <SopaXorzTaker> It would be amazing if you ran this on an emulator + some real hardware to see what the actual state of Gate A20 is
02:30:00 <SopaXorzTaker> For me, it's disabled by default only on VirtualBox
02:30:21 <SopaXorzTaker> Enabled everywhere else, and int 15h only affects VirtualBox and QEMU
02:30:34 <SopaXorzTaker> real hardware ignores any attempts to toggle the line if it's enabled
02:30:46 <SopaXorzTaker> (in fact, it's probably physically absent)
02:34:43 <bcos_> SopaXorzTaker: Should do a "CLD" before using LODSB
02:35:14 <SopaXorzTaker> bcos_, if you encounter a BIOS that has DF=1 by default, I'll eat a paper towel on camera
02:35:49 <bcos_> There is no default - it's just random garbage left over
02:36:04 <SopaXorzTaker> bcos_, it happened to be zero on ALL hardware I checked against
02:36:17 <SopaXorzTaker> (3 real PCs, VirtualBox, QEMU and Bochs)
02:36:52 <SopaXorzTaker> I want to write my kernel with the easiest assumptions about compatibility
02:37:14 <SopaXorzTaker> if DF=1 or you have gate A20 enabled by default, then you're an old fart with an ancient machine; GTFO
02:37:19 <bcos_> So out of about 123 thousand computers you tested 0.0000% and to avoid 1 byte of code you're willing to risk spending weeks tearing your hair out because of bug reports saying "didn't boot" with no useful info?
02:37:51 <SopaXorzTaker> bcos_, what's the chance that my code is going to be run on those 123 thousand PCs?
02:38:24 <SopaXorzTaker> I want to avoid layers upon layers of checking for BIOS oddities or 40-year-old hacks
02:38:32 <bcos_> Depends - if you say "LOL, it'll be fine with no guarantee it will be fine" then it's likely that the it'll be run on zero computers
02:39:06 <SopaXorzTaker> bcos_, well, would anyone run any binaries coming from a SopaXorzTaker on ANY machine or even a VM?
02:39:57 <bcos_> Why are you writing an OS?
02:40:13 <SopaXorzTaker> Am I? :P
02:40:23 <bcos_> Why are you writing a boot loader?
02:40:31 <SopaXorzTaker> Because I'm bored?
02:40:54 <bcos_> Then, why am I checking the code?
02:41:08 <SopaXorzTaker> well, for the same reason, I presume?
02:41:18 <SopaXorzTaker> fine, I'll add a CLD to it if that makes you feel better
02:43:08 <SopaXorzTaker> bcos_, https://ptpb.pw/ohhw.asm
02:43:09 <SopaXorzTaker> enjoy
02:43:24 <bcos_> For the A20 test code; you should be able to do (e.g.) "mov word [fs:testAddress],0xA123" to avoid the need to load BX (remove 4 instructions)
02:44:39 <SopaXorzTaker> bcos_, I have trouble memorizing which segment registers and operands I can combine for mov
02:45:02 <SopaXorzTaker> like, [es:bx] works, but [ax] is invalid
02:45:36 * bcos_ nods - 32-bit is nicer (gets rid of the annoying rules in 16-bit)
02:46:27 <bcos_> Hrm
02:46:37 <SopaXorzTaker> reasons to stay in real mode: no paging, BIOS routines // reasons to go protected: BIOS is obsolete, paging is useful, real mode is for old farts
02:46:45 <bcos_> SopaXorzTaker: I can't see where CX is set before the "test cx,cx"
02:47:08 <SopaXorzTaker> in boot:
02:47:17 <SopaXorzTaker> >>
02:47:18 <SopaXorzTaker> ; Should we test the A20 gate again?
02:47:18 <SopaXorzTaker> mov cx, 1
02:47:29 <bcos_> Ah - OK
02:47:49 <lkurusa> reasons to avoid real mode: it sucks
02:48:03 * SopaXorzTaker wonders why Intel broke their naming convention with FS and GS, which don't stand for anything
02:48:25 <bcos_> cs, ds, es, fs, gs = cdefg
02:48:28 <SopaXorzTaker> I suppose FS is secretly a Flags Segment, an entire segment dedicated to storing EFLAGS [jk]
02:48:41 <lkurusa> i used to call fs *expletive* segment
02:48:46 <lkurusa> and gs garbage segment
02:48:55 <izabera> FS is the field separator
02:49:04 <lkurusa> the awker shows themselves!
02:49:10 <SopaXorzTaker> bcos_, Code, Data, Extra, Stack, F##king, Garbage
02:49:43 <izabera> SopaXorzTaker: y u no sort
02:50:48 <SopaXorzTaker> izabera, SS existed before 386
02:53:11 <bcos_> Hrm
02:53:49 <SopaXorzTaker> [insert hitler joke]
02:54:36 <bcos_> SopaXorzTaker: Also "CLI then HLT" doesn't necessarily stop the CPU forever (e.g. an NMI can cause CPU to continue executing after the HLT) - would be nicer to do a "stop: HLT; jmp stop" loop (without disabling IRQs)
02:55:02 <bcos_> Other than that it looks fine
02:55:06 <SopaXorzTaker> bcos_, that's the way it was initially done, TBH
02:55:12 <bcos_> (will test A20 how you want it to)
02:55:17 <SopaXorzTaker> I decided to change it for no reason
02:55:24 <bcos_> :-)
02:56:28 <SopaXorzTaker> done: https://ptpb.pw/ADkw.asm
02:56:47 <SopaXorzTaker> bcos_, also, notice how it's MBRey
02:56:56 <SopaXorzTaker> that's since I wanted to boot it from a flash drive
02:57:19 <SopaXorzTaker> Lenovo BIOS ignores MBRs without bootable partition entries for some reason
02:57:25 <SopaXorzTaker> so I made one to keep it happy
02:57:49 <bcos_> https://www.reddit.com/r/osdev/comments/9wy2lg/2_stage_bootloader/
02:58:09 <SopaXorzTaker> bcos_, I know, I could use GRUB, or syslinux, or whatever
02:58:20 <SopaXorzTaker> but why if I can make a tiny, 512-byte bootable MBR
02:58:46 <SopaXorzTaker> + also boots from floppies for especially masochist people
03:10:31 <CrystalMath> i remember the segment registers in their correct order: ES, CS, SS, DS, FS, GS
03:11:07 <CrystalMath> as well as the general purpose ones: AX, CX, DX, BX, SP, BP, SI, DI
03:11:20 <CrystalMath> "ax, bx, cx, dx" is a plague :P
03:15:08 <SopaXorzTaker> CrystalMath, BX stands for Base Extended
03:15:13 <SopaXorzTaker> and BP is Base Pointer
03:15:24 <SopaXorzTaker> so BP should always point to BX judging by its name :P
03:15:57 <CrystalMath> interestingly though, addressing through bp implies SS:
03:16:09 <CrystalMath> BX doesn't do that
03:16:26 <bcos_> Just set ds=es=ss and use BP the same as any other general purpose reg
03:16:53 <CrystalMath> but segmentation is fun
03:17:00 <CrystalMath> i love writing software for DOS
03:17:45 <SopaXorzTaker> bcos_, just make all the GDT entries point to a flat address space?
03:17:50 <SopaXorzTaker> Pages? Who needs pages?
03:18:01 <SopaXorzTaker> Malloc? Why when you can use *free_pointer++
03:18:03 <SopaXorzTaker> :P
03:18:13 <CrystalMath> GDT? just set CR0's PE bit to 0
03:18:16 <CrystalMath> problem solved
03:18:26 <CrystalMath> 640K is enough for everybody
03:18:44 <SopaXorzTaker> CrystalMath, triple faults? just cut the RESET# trace on the motherboard!
03:19:01 <CrystalMath> actually it will still reset
03:19:13 <SopaXorzTaker> but you won't notice!
03:19:17 <CrystalMath> it has an internal state that gets loaded with the initial reset state
03:19:24 <CrystalMath> so a triple fault would still restart normally
03:19:27 <CrystalMath> even without RESET#
03:19:53 <SopaXorzTaker> okay, how about trapping all the exception vectors then?
03:20:06 <SopaXorzTaker> make them all point to an IRET somewhere executable :P
03:20:09 <CrystalMath> maybe you could store the register state on every PIT tick somewhere and replace the BIOS boot code with code that restores the last known good state
03:20:16 <CrystalMath> :)
03:20:25 <SopaXorzTaker> CrystalMath, that's some M$-level advanced engineering
03:20:35 <SopaXorzTaker> try applying to their kernel team!
03:20:44 * CrystalMath works on ReactOS
03:20:48 <SopaXorzTaker> really?
03:20:50 <CrystalMath> yes
03:21:24 <SopaXorzTaker> CrystalMath, rate your masochism 0-10
03:21:26 <bcos_> That's like someone realising they forgot their parachute after jumping out of a plane, and resetting their state to 5 seconds before they land so they're in a continual "about to die" state
03:21:31 <CrystalMath> SopaXorzTaker: 11
03:22:03 <SopaXorzTaker> CrystalMath, no, the real answer would be -128
03:22:09 <SopaXorzTaker> it's ReactOS, remember?
03:23:43 <CrystalMath> yeah but since i also like writing DOS software in pure assembly, that number actually overflows twice and ends up at 11
03:24:00 <CrystalMath> it's really 267
03:37:23 <SopaXorzTaker> bcos_, hmm
03:37:32 <SopaXorzTaker> this might be a good base for a short story
03:37:45 <SopaXorzTaker> a rather unlucky jumper discovers he can bend spacetime
03:37:58 <SopaXorzTaker> so now he has to convince people to save him somehow, in those 5 seconds
03:38:12 <SopaXorzTaker> Can I steal your idea? :P
03:38:46 <bcos_> Sure :-)
03:52:37 <Vercas> Microsoft... Why do SCSI devices and serial ports have the same device ID?? D:
03:53:07 <bcos_> Serial Connected Serial Interface!
03:53:07 <Vercas> Spent about an hour today puzzled why my storport miniport driver is getting IOCTLS meant for serial ports.
03:53:47 <Vercas> Turns out, Windows was just asking me for SMART info...
03:54:05 <Vercas> Windows IOCTLs are so weird.
03:54:37 <Vercas> On top of the IOCTL control codes, which are 32-bit integers but only 12 bits are really the "function" part...
03:54:49 <Vercas> You also get an 8-character "signature".
03:54:58 <Vercas> And what the control code means depends on the signature!
03:55:12 <Vercas> The same control code queries both SMART info, and temperature.
03:56:09 <Vercas> Oh, and topology info.
03:56:23 <Vercas> (used for things like RAID adapters)
03:56:37 <Vercas> Shame, Microsoft.
03:57:57 <Vercas> Obviously these shenanigans are mostly undocumented too.
03:58:36 <Vercas> But hey, at least my driver works.
03:58:39 <Vercas> Sort of.
03:59:43 <Vercas> Windows seems to be fine with my storage driver. I downloaded a disk benchmarking utility, and it apparently sees no disk.
04:01:45 <klys> https://www.zdnet.com/article/researchers-discover-seven-new-meltdown-and-spectre-attacks/
04:09:03 <mrvn> ioctls on linux are a mess too. Totally driver specific.
04:09:22 <bcos_> klys: The good news is that the chart/diagram helps to keep track of them all :-)
04:14:35 <mrvn> What I'm getting from all those side channel attacks is that I want a CPU without caches, without branch prediction, without speculative and out-of-order execution and all that stuff that makes a cpu 1000x faster than olden times.
04:14:51 <mrvn> Or have one system per application.
04:15:26 <bcos_> I think we should also consider using hydraulics instead of electronics
04:15:35 <bcos_> ;-)
04:15:56 <mrvn> bcos_: that would allow an outside spy with a tape recorder to record the gurgling of the hydraulic fluid to reverse engineer the secrets.
04:16:05 <bcos_> (attacker can't detect electric fields that way)
04:16:31 <bcos_> Hydraulics can be silent
04:19:02 <bcos_> Realistically, the meltdowns are likely trivial to fix in silicon, and the spectres are probably going to end up being "flush all the things!" when returning from kernel to user-space
04:19:44 <mrvn> bcos_: so no more ASID extension.
04:20:14 <bcos_> KPTI is mostly a work-around for "unfixed in silicon"
04:20:49 <bcos_> ..but likely valuable for other reasons anyway, so if ASID/PCID makes it fast enough (dubious) then..
04:21:19 <mrvn> If you flush everything on return from kernel then what's the point of keeping an ASID/PCID?
04:23:11 <bcos_> KPTI defends against problems you don't know you have (that don't involve side channels)
04:24:53 <Vercas> mrvn: So you want... A microcontroller? :D
04:25:12 <mrvn> Isn't KPTI what linux32 personality on x86_64 does anyway?
04:25:25 <mrvn> Vercas: no, I want a million microcontrollers. :)
04:25:32 <Vercas> Aye, me too.
04:25:48 <Vercas> Until that team starts looking into microcontrollers too.
04:25:58 <Vercas> Then they're gonna ruin yet another thing.
04:26:01 <Vercas> Oh, and then GPUs...
04:26:30 <mrvn> Vercas: but there would be only one app on the microcontroller. And that app already knows all its secrets. So there is nothing to leak to.
04:26:43 <bcos_> mrvn wants this: http://www.greenarraychips.com/
04:26:51 <Vercas> Until you hit a conditional branch.
04:27:30 <bcos_> (144 single-core computers on a chip!)
04:28:13 <mrvn> bcos_: are you sure they don't have a shared cache?
04:28:35 <Vercas> You can get ARM chips with over 300 cores on them IIRC.
04:28:57 <bcos_> mrvn: I think it's "shared nothing" - probably like Cell with local memory for each core
04:29:18 <bcos_> (core/CPU/node/whatever)
04:30:21 <Vercas> Oh man, without a shared cache (and individual caches on top), any form of shared memory is gonna be painful.
04:30:39 <mrvn> Vercas: message passing
04:31:24 <Vercas> Have fun with that. :P
04:32:48 <mrvn> I do.
04:33:38 <mrvn> I'm tempted to even make my kernel run simply one instance per core and use message passing
04:33:40 <Vercas> My programming style fits shared memory far better.
07:45:03 <abysslurker> o/
07:45:19 <lkurusa_> yo
08:10:34 * abysslurker yawns
08:12:52 <jjuran> damn you
11:22:09 * geist yawns
11:22:41 <klys> g'day m8
11:26:20 <geist> hola
11:26:28 <klys> how have you been
11:26:57 <geist> not too bad. cant complain (that much)
11:27:40 <klys> yeah, I'm well today. mostly waiting for supper just now.
11:28:29 <geist> excellent. noms is good
11:29:47 <_mjg> man shitty tolling is shitty
11:29:54 <_mjg> toolling even
11:30:21 <geist> shitty tolling is bad too
11:30:48 <klange> It's cold outside.
11:30:57 <geist> how cold is cold?
11:31:03 <klys> yeah I have it at 55F
11:31:10 <geist> i like it cold, but not super cold
11:31:17 <klange> 9°C (48°F)
11:31:23 <geist> yah that's a nice cold. about what it is here too
11:32:02 <mroutis> geist: hola
11:39:33 <eryjus> klange: I'm so jealous!! I'll trade you my 80°F day... gimme 4 seasons!
11:41:13 <klange> come to japan, we love to talk about how we have four seasons