ocmchatlog1

ocmchatlog1 [2009/04/30 12:54] – created megadiscman
ocmchatlog1 [2009/04/30 12:59] (current) megadiscman
 +<code>
 +<ocm geek> I started documenting the bytecode opcodes in our wiki. 
 +<inquirer> nice 
 +<inquirer> I am still banging my head against the netmd blocks 
 +<inquirer> actually, not still, I just picked it up again 
 +<ocm geek> https://wiki.physik.fu-berlin.de/linux-minidisc/doku.php?id=ocmbytecode  
 +<inquirer> where/how are the entry points found? 
 +<ocm geek> The main entry point is at offset 10 in the ocm file. Marten's code should
 +           take care of that.
 +<ocm geek> Native modules are loaded with opcode 0x75. 
 +<ocm geek> Their entry point is stored in their header. 
 +<ocm geek> They are addressed by name, usually. The name is also stored in the header. 
 +<inquirer> yeah, I think netmd is a bad one to start with, no names to be found 
 +<inquirer> which one is less cryptic? 
 +<ocm geek> Bytecode blocks can be loaded as global code with opcode 0x7C 
 +<ocm geek> If you take apart init.ocm, which is a special .ocm file, you find the native
 +           modules making up the bytecode interpreter.
 +<ocm geek> init.ocm is made of a special loader bytecode. I don't know whether any of
 +           Marten's tools can parse that, but that doesn't really matter: You can do it
 +           by hand.
 +<inquirer> ah there is a ton of ocm files in OpenMG 
 +<ocm geek> I know. The only one I have looked at so far is init.ocm
 +<inquirer> I only looked at the ones in the ocm tar ball so far 
 +<ocm geek> First byte of init.ocm is "04" which means a blob follows. 
 +<inquirer> ah, sonicstage 3.4 doesn't have 0301 format 
 +<ocm geek> Does it have 0303? 
 +<ocm geek> As I said, init.ocm is special! netmd.ocm starts with 0301 in my SonicStage  
 +           3 (I think 3.4) installation. 
 +<inquirer> right 
 +<ocm geek> Let me finish explaining the special init.ocm format. 
 +<inquirer> sure 
 +<ocm geek> The blob following the 04 is preceded by its length.
 +<ocm geek> 83 means that the blob is longer than 127 bytes, and the length itself  
 +           takes 3 bytes. 
 +<inquirer> ASN.1 
 +<ocm geek> The following bytes are 066e29 in my installation. 
 +<ocm geek> Yeah, right. 
 +<ocm geek> The OCM stuff uses an ASN.1-like serialisation format.
 +<ocm geek> And after the length, voila, you find 0301! 
 +<ocm geek> Because that blob is a standard 0301 bytecode blob. 
 +<inquirer> After the length I have 03 01 07 00  
 +<inquirer> yeah 
 +<inquirer> ah 
 +<inquirer> my length is different, but that's fin 
 +<inquirer> fine 
 +<ocm geek> Past that block, I find 04 82 9c 91 
 +<ocm geek> This indicates a second blob, this time length 9c91 
 +<ocm geek> That block is a native code module, name "intrins"
 +<ocm geek> It contains the bytecode interpreter core. 
 +<ocm geek> After the second blob there are just two further bytes 06 and 08. 
 +<inquirer> I am not sure how I know the length of a 03 01 block 
 +<ocm geek> You find it beforehand.
 +<inquirer> oh 
 +<ocm geek> If your init.ocm is the same as mine, it's 0x66e29
 +<inquirer> right, the file is longer than that 
 +<inquirer> this is wiki stuff :) 
 +<ocm geek> The 06 equals the bytecode 75 and loads the last blob as native code module. 
 +<inquirer> my file is different (bytecode length is 07e834), but same layout 
 +<inquirer> incl the 06 08 at the end 
 +<ocm geek> The 08 equals the bytecode 77 and calls the startup function of that module. 
 +<ocm geek> As the 0301 blob is still on the stack of the boot interpreter, this blob  
 +           gets passed as parameter to the startup function. 
 +<ocm geek> As the startup function of "intrins" is the bytecode interpreter, it
 +           interprets the big bytecode blob.
 +<ocm geek> This big bytecode blob contains further native modules. 
 +<inquirer> this is because 04 is ipush_str4 
 +<inquirer> I see 
 +<ocm geek> Be careful. Boot bytecode is not completely equivalent to standard bytecode. 
 +<ocm geek> It happens to be the same for opcodes 01..04 
 +<inquirer> ah yes the 06 08  
 +<inquirer> I just got lucky here with 04 
 +<ocm geek> Probably that's on purpose, because the codes 01 to 04 are used for  
 +           serialization of internal values. 
 +<inquirer> at this stage in the game, is it still useful for me to get my own  
 +           disassembled bytecode interpreter? 
 +<ocm geek> Probably not. 
 +<ocm geek> I have it, and am currently transferring that knowledge to the Wiki. 
 +<inquirer> good 
 +<inquirer> I guess I will focus on the netmd then 
 +<ocm geek> I just pointed you to the init.ocm because you asked for something that  
 +           might be less cryptic. 
 +<inquirer> for the wiki, could you keep the mnemonic in the title? ie. Opcode 02:  
 +           Immediate BigInt (ipush_str4) 
 +<inquirer> no big deal of course 
 +<inquirer> yep, great stuff with the init 
 +<inquirer> it will help me to recognize the extension format header 
 +<ocm geek> I didn't really look at Marten's opcode names, but I can put them in.
 +<inquirer> they can be added later of course 
 +<inquirer> the doc is more important ;) 
 +<ocm geek> You can compare with Marten's scanner.c 
 +<ocm geek> But beware. The comments indicating indices are *decimal*, while everything  
 +           I write is *hexadecimal*. 
 +<inquirer> I realized that, so I switched to opcodes.h 
 +<inquirer> ha, I should have looked at the perl unpack syntax earlier 
 +<ocm geek> Yeah. opcodes.h is hexadecimal. 
 +<ocm geek> But some mnemonics seem to not match my findings. 
 +<ocm geek> For example "allocMem" in Marten's code is "Store to User Dictionary"
 +           according to my analysis.
 +<ocm geek> Might be that Sony changed the meaning of the code, or one of us is wrong. 
 +<inquirer> yeah
 +<inquirer> ok, codeblockparsed the binary blob 
 +<inquirer> how do you invoke gas?  can I use i596-mingw32msvc-as? 
 +<inquirer> i586... 
 +<ocm geek> yes. 
 +<ocm geek> I used that one. 
 +[...] 
 +<inquirer> anything better than as' ing the codeblockparser output into a COFF  
 +           executable? 
 +<inquirer> i guess it doesn't actually matter what format the asm is wrapped in 
 +<ocm geek> Yeah. Must be a format IDA is able to read. 
 +<ocm geek> And IDA Freeware only reads COFF objects. 
 +<ocm geek> (and Windows/DOS EXE files and drivers) 
 +<inquirer> cool 
 +<ocm geek> You might want to try loading the object at a different offset than 0, to  
 +           help IDA distinguish offsets from numbers. Somehow IDA is unable to know  
 +           that with objects, *every* offset is tagged as offset. 
 +<ocm geek> In completely linked executables without reloc info, it is not that ease. 
 +<ocm geek> easy. 
 +<ocm geek> You will need info about the import functions to make sense of it. 
 +<ocm geek> Just a second... 
 +<inquirer> sonicstage 3.4 netmd.ocm is half as big as the one from maarten 
 +<ocm geek> https://wiki.physik.fu-berlin.de/linux-minidisc/doku.php?id=ocmsalwrapexports 
 +<ocm geek> Maarten had a much older sonic stage. 
 +<ocm geek> Maybe they moved parts out of netmd.ocm into standard DLLs. 
 +<ocm geek> Or they have rewritten parts from bytecode to native code. 
 +<inquirer> yeah 
 +<ocm geek> It was known that the OpenMG virtualization/crypto stuff was very heavy on
 +           processing power in early sonic stage versions.
 +<inquirer> it's a bit scary how much you know about this VM 
 +<ocm geek> I talked to Marten on MSN. 
 +<inquirer> ah, ok :) 
 +<ocm geek> And I reversed salwrap.dll myself. 
 +<ocm geek> Not that I ever got completely through it even once.
 +<inquirer> at least it starts making sense 
 +<ocm geek> I will go to sleep now. 
 +<inquirer> mh ok 
 +<ocm geek> See you tomorrow. 
 +<inquirer> or do you have 3 minutes for me? 
 +<ocm geek> OK. 
 +<inquirer> let's see if this is something simple 
 +<inquirer> between the bytecodeblocks there are 63xx0f instructions 
 +<inquirer> what's their significance? 
 +<inquirer> it's always 66 BIGLENGTH BYTECODE... 63 XX 0F  and again 66... 
 +<ocm geek> Ah. I see. 63 is bipush with an encrypted operand.
 +<ocm geek> But Marten's decoder already decrypts it for you.
 +<ocm geek> Likewise, 66 is just ipush_str4 with an encrypted operand, which Marten's
 +           decoder also decodes.
 +<inquirer> yep 
 +<ocm geek> 0F is store to dictionary. 
 +<inquirer> so, it keeps pushing stuff 
 +<ocm geek> store to dictionary pops it. 
 +<inquirer> I think I am missing the big picture here, is this for constructing a  
 +           symbol table or something? 
 +<ocm geek> Every instruction pops the operands it used. 
 +<ocm geek> Kind of. 
 +<inquirer> mmmh, ok 
 +<ocm geek> There are two dictionaries. 
 +<ocm geek> The system dictionary has 256 entries addressed by smallints between 0 and  
 +           255. 
 +<ocm geek> What you see here is bytecode blobs stored into the system dictionary. 
 +<inquirer> cool 
 +<inquirer> for now that would be enough if you want to leave me now ;) 
 +<inquirer> I'll go to bed soon, too 
 +<ocm geek> So that's a way of exporting them to other OCM modules or perhaps even to
 +           salwrap
 +<inquirer> but some more info on this big picture would be cool to have in the wiki  
 +           *hint hint* 
 +<inquirer> yeah 
 +<inquirer> makes total sense 
 +<inquirer> it's kinda weird to have such a dynamic format 
 +<ocm geek> Someone at some other point might decide to "call the bytecode in system  
 +           dict at index 77" 
 +<inquirer> right 
 +<ocm geek> The system dict probably is quite fixed in purpose. 
 +<ocm geek> There are magic entries near the end of the dictionary, for example 0xfd  
 +           points to a blob that represents the jump table of the bytecode interpreter. 
 +<ocm geek> Probably you won't encounter any access to it, unless you look at init.ocm. 
 +<inquirer> sweet 
 +<inquirer> this is really helpful 
 +<ocm geek> The extension modules loaded in the bytecode part of init.ocm are  
 +           hotpatching their byte code instructions into the jump table. 
 +<ocm geek> But after init.ocm is done, the jump table is full, so no sense in  
 +           accessing it. 
 +<inquirer> yeah, well, I still don't know how addressing works in this system 
 +<inquirer> but the opcode description may shed light on that 
 +<ocm geek> What do you mean by "addressing"?
 +<inquirer> things like jumps 
 +<inquirer> branching 
 +<ocm geek> There are no jumps and branches in the bytecode. 
 +<inquirer> oh cool 
 +<ocm geek> Have you ever programmed PostScript? 
 +<inquirer> that's easy ;) 
 +<inquirer> nope 
 +<inquirer> but I know fortran 
 +<inquirer> long time ago though 
 +<ocm geek> OK, but fortran does have jumps. 
 +<ocm geek> This byte code is much more structured. 
 +<inquirer> I meant forth 
 +<inquirer> sorry 
 +<inquirer> stack based, I forgot about that.  makes sense now 
 +<ocm geek> Ah. That's something completely different to fortran. I don't really know  
 +           it, but it might have similar properties to PostScript or this byte code  
 +           (both stack-based, too) 
 +<inquirer> it's coming together now 
 +<ocm geek> Be careful with Marten's CALL_IF instruction (0x33). That is a misnomer. 
 +<ocm geek> It should be CALL_WHILE. 
 +<inquirer> ok 
 +<inquirer> I really have an itch to improve the output of the program, it's quite a  
 +           mess 
 +<inquirer> but I have to understand more first, and it might be a waste 
 +<inquirer> thanks a lot, again 
 +<ocm geek> Probably he didn't notice that CALL_IF is wrong. The idea is that  
 +           CALL_WHILE returns to the CALL_WHILE instruction after running the code  
 +           block, so it gets executed again and again, until top-of-stack is zero. 
 +<inquirer> makes sense 
 +<ocm geek> If the return address stored in the interpreter were the next
 +           instruction (as in CALL and CALL_IF_ELSE) it would really be CALL_IF.
 +<ocm geek> I had also started a bytecode parsing program before I got in contact with
 +           Marten, but that one is even rougher.
 +<ocm geek> I used Haskell for it. 
 +<inquirer> there is a problem for me of course because netmd has some recursive  
 +           decryption 
 +<inquirer> seems difficult to me to make a static analysis here 
 +<ocm geek> What do you mean by that? 
 +<inquirer> as I said, the decrypted bytecode contains encrypted bytecode
 +<inquirer> so, you would need to recursively decrypt 
 +<ocm geek> That seems to be standard practice in OCM bytecode modules, but Marten's
 +           dumper doesn't support it currently.
 +<inquirer> but for that you need to "run" the bytecode 
 +<inquirer> yeah 
 +<ocm geek> Marten's decoder "run"s the crypto setup instruction for the main block. 
 +<inquirer> yeah 
 +<ocm geek> So the code to do that is already there. 
 +<inquirer> yup 
 +<inquirer> I already modified it to decrypt it, but not all of it 
 +<ocm geek> But I don't know how well-designed his code is, and how easily you could add
 +           decryption of sub-blocks.
 +<inquirer> it's modular enough 
 +<ocm geek> Oh, you already started :) 
 +<ocm geek> Nice. 
 +<inquirer> yeah, but only very simple.  I don't catch encryption that isn't at offset  
 +           0 of a bytecodeblock 
 +<inquirer> there are some of those 
 +<ocm geek> Probably they set up other stuff first. 
 +<ocm geek> You might need to run that too. 
 +<inquirer> crunching away at it very slowly 
 +<inquirer> things like 30 80 04 07 02 6c 50 73  
 +<ocm geek> Strange. 
 +<ocm geek> 30 is "compare DWORDS for equality"
 +<ocm geek> Why would a subblock start with it? 
 +<inquirer> well, we don't know how the block is used, do we? 
 +<inquirer> at least not yet 
 +<ocm geek> Yeah. Maybe it's not a bytecode-containing block after all.
 +<inquirer> there are others exactly like that 
 +<inquirer> but the encryption! 
 +<inquirer> 026c5073 
 +<inquirer> but yeah, maybe it's processed first 
 +<ocm geek> Oh, you are right. 
 +<inquirer> it's a different keyindex though than usual in netmd which is 6c50 
 +<inquirer> so, who knows 
 +<inquirer> I also got a block: 
 +<inquirer> 30 80 02 04 08 20 00 20 02 04 08 20 80 00 02 03 00 80 20 02 01 00  etc 
 +<inquirer> going on and on in that manner 
 +<inquirer> with no 02xxyy73 in that block 
 +<ocm geek> Argh, wait a moment. 
 +<ocm geek> That might be ASN.1 encoded sequences. 
 +<ocm geek> If you decode them, you get an array of bytecodes. 
 +<ocm geek> 30 80 is the ASN tag for sequence of undetermined length. 
 +<inquirer> good one 
 + * inquirer makes a mental note not to forget ASN.1 
 +<inquirer> cool 
 +<ocm geek> So your 30 80 04 07 is a sequence, whose first element is a 7 byte blob. 
 +<inquirer> so, it is code within a data structure 
 +<ocm geek> the first 4 bytes are setting up cryptography, so just 3 bytes remaining. 
 +<ocm geek> Exactly. 
 +<inquirer> you rock 
 +<ocm geek> That's the nice thing about languages where code blocks are first class  
 +           data objects: You can put them into any data structure you like. 
 +<ocm geek> OK. Good night for real now. 
 +<inquirer> like lisp 
 +<inquirer> gn! 
 +<ocm geek> Another hint for the sequences: They are really ASN.1 (including  
 +           the tags) 
 +<ocm geek> Normally you have the bytecode followed by untagged, ASN.1-like encoded
 +           data, but inside the sequence the real ASN.1 tags are used instead of bytecodes.
 +<ocm geek> That means specifically: All numbers (small numbers and arbitrary precision  
 +           integers) are encoded with ASN.1 tag 2 (INTEGER). 
 +<ocm geek> While the bytecode 02 is "16 bit constant", the ASN.1 tag 2 is  
 +           length-prefixed arbitrary precision integer. 
 +<inquirer> ok 
 +<ocm geek> Byte blocks are encoded with ASN.1 tag 4 (OCTET STRING) that happens to  
 +           coincide with bytecode 4. 
 +<ocm geek> Nested sequences are encoded as ASN.1 sequences like the top sequence. 
 +<ocm geek> BTW: your 30 80 02 04 08 20 00 20 02 04 08 20 80 00 02 03... constant you  
 +           quoted looks like an array Sony's DES implementation uses. 
 +<ocm geek> I will cross-check. 
 +<inquirer> this nested encryption is starting to annoy me 
 +<ocm geek> OK. It doesn't match any of the arrays in the HiMD Transfer Tool for Mac  
 +           used for DES encryption. 
 +<inquirer> does this ring  a bell?  4e 20 1d 3f ... 
 +<inquirer> when I try to disassemble it, I get garbage 
 +<inquirer> but it is CALL'ed, and I don't know how that works 
 +<ocm geek> Sorry. I did not yet look at calling bytecode. 
 +<ocm geek> But you are right. That doesn't seem like executable code. 
 +<ocm geek> The code in the interpreter looks like it would execute it as is. 
 +<ocm geek> Could you have something messed up with decryption? 
 +</code>
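
The init.ocm layout described in the log (a 04 tag, a length field in which 83 announces three length bytes, the blob itself, then the same again for the second blob, followed by the two boot opcodes 06 08) can be sketched in Python. This is only a minimal sketch, not the loader used by Sony or by Marten's tools. The function names read_length and split_init_ocm are hypothetical, the length rule is assumed to follow ASN.1 BER definite lengths as the chat suggests, and the sample bytes are synthetic rather than taken from a real init.ocm.

```python
def read_length(data, pos):
    """Read a BER-style definite length at data[pos].

    Returns (length, position of the first byte after the length field).
    Assumption: lengths below 0x80 are stored in one byte; otherwise the
    low 7 bits give the number of big-endian length bytes that follow
    (so 0x83 means "three length bytes follow").
    """
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    count = first & 0x7F
    if count == 0:
        raise ValueError("indefinite length not expected for a top-level blob")
    end = pos + 1 + count
    if end > len(data):
        raise ValueError("truncated length field")
    return int.from_bytes(data[pos + 1:end], "big"), end


def split_init_ocm(data):
    """Split an init.ocm-style image into its 04-tagged blobs and the trailer.

    Returns (list of blob payloads, remaining bytes). In the chat, the
    remaining bytes are the boot opcodes 06 08.
    """
    pos = 0
    blobs = []
    while pos < len(data) and data[pos] == 0x04:
        length, start = read_length(data, pos + 1)
        if start + length > len(data):
            raise ValueError("blob runs past end of file")
        blobs.append(data[start:start + length])
        pos = start + length
    return blobs, data[pos:]


# Synthetic example (NOT real init.ocm content): a 4-byte blob starting
# with 03 01, a 3-byte stand-in for the native module, then 06 08.
sample = bytes.fromhex("04 83 00 00 04 03 01 07 00 04 82 00 03 61 62 63 06 08")
blobs, trailer = split_init_ocm(sample)
```

On a real file one would read the bytes with open(path, "rb").read() and expect two blobs, the first starting with 03 01, although that expectation rests only on the two installations compared in the log.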
ocmchatlog1.1241096098.txt.gz · Last modified: 2009/04/30 12:54 by megadiscman