# DokuWiki

ocmchatlog1
<ocm geek> I started documenting the bytecode opcodes in our wiki.
<inquirer> nice
<inquirer> I am still banging my head against the netmd blocks
<inquirer> actually, not still, I just picked it up again
<ocm geek> https://wiki.physik.fu-berlin.de/linux-minidisc/doku.php?id=ocmbytecode
<inquirer> where/how are the entry points found?
<ocm geek> The main entry point is at offset 10 in the ocm file. Marten's code should
take care of that.
<ocm geek> Native modules are loaded with opcode 0x75.
<ocm geek> Their entry point is stored in their header.
<ocm geek> They are addressed by name, usually. The name is also stored in the header.
<inquirer> yeah, I think netmd is a bad one to start with, no names to be found
<inquirer> which one is less cryptic?
<ocm geek> Bytecode blocks can be loaded as global code with opcode 0x7C
<ocm geek> If you take apart init.ocm, that's a special .ocm file, you find the native
modules making up the bytecode interpreter.
<ocm geek> init.ocm is made of a special loader bytecode. I don't know whether any of
Marten's tools can parse that, but that doesn't really matter: You can do it
by hand.
<inquirer> ah there is a ton of ocm files in OpenMG
<ocm geek> I know. The only one I looked at yet is init.ocm
<inquirer> I only looked at the ones in the ocm tar ball so far
<ocm geek> First byte of init.ocm is "04" which means a blob follows.
<inquirer> ah, sonicstage 3.4 doesn't have 0301 format
<ocm geek> Does it have 0303?
<ocm geek> As I said, init.ocm is special! netmd.ocm starts with 0301 in my SonicStage
3 (I think 3.4) installation.
<inquirer> right
<ocm geek> Let me finish explaining the special init.ocm format.
<inquirer> sure
<ocm geek> The blob following the 04 is preceded by its length.
<ocm geek> 83 means that the blob is longer than 127 bytes, and the length itself
takes 3 bytes.
<inquirer> ASN.1
<ocm geek> The following bytes are 066e29 in my installation.
<ocm geek> Yeah, right.
<ocm geek> The OCM stuff uses ASN.1 like serialisation format.
<ocm geek> And after the length, voila, you find 0301!
<ocm geek> Because that blob is a standard 0301 bytecode blob.
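
The length prefix described above can be sketched in a few lines of Python. This is only an illustration of the ASN.1-style rule from the log (a first byte below 0x80 is the length itself, otherwise its low 7 bits say how many length bytes follow); the function name `read_length` is made up for this sketch and is not part of any existing tool.

```python
def read_length(data, pos):
    """Decode an ASN.1-style length field starting at data[pos].

    Returns (length, position of the first byte after the length field).
    """
    first = data[pos]
    if first < 0x80:
        # Short form: the byte itself is the length (0..127).
        return first, pos + 1
    count = first & 0x7F  # long form: low 7 bits = number of length bytes
    if count == 0:
        raise ValueError("indefinite length (0x80) is not handled in this sketch")
    length = int.from_bytes(data[pos + 1:pos + 1 + count], "big")
    return length, pos + 1 + count


# The first five bytes quoted in the log: tag 04, then 83 06 6e 29.
header = bytes.fromhex("0483066e29")
length, body_start = read_length(header, 1)
print(hex(length), body_start)
```

With the bytes quoted in the log, the blob length comes out as 0x66e29 and the blob body starts at offset 5, which is where the 0301 header is expected.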
<inquirer> After the length I have 03 01 07 00
<inquirer> yeah
<inquirer> ah
<inquirer> my length is different, but that's fin
<inquirer> fine
<ocm geek> Past that block, I find 04 82 9c 91
<ocm geek> This indicates a second blob, this time length 9c91
<ocm geek> That block is a native code module, name "intrins".
<ocm geek> It contains the bytecode interpreter core.
<ocm geek> After the second blob there are just two further bytes 06 and 08.
<inquirer> I am not sure how I know the length of a 03 01 block
<ocm geek> You find it beforehand.
<inquirer> oh
<ocm geek> If your init.ocm is the same as mine, it's 0x66e29
<inquirer> right, the file is longer than that
<inquirer> this is wiki stuff :)
<ocm geek> The 06 equals the bytecode 75 and loads the last blob as native code module.
<inquirer> my file is different (bytecode length is 07e834), but same layout
<inquirer> incl the 06 08 at the end
<ocm geek> The 08 equals the bytecode 77 and calls the startup function of that module.
<ocm geek> As the 0301 blob is still on the stack of the boot interpreter, this blob
gets passed as parameter to the startup function.
<ocm geek> As the startup function of "intrins" is the bytecode interpreter, it
interprets the big bytecode blob.
<ocm geek> This big bytecode blob contains further native modules.
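
The boot sequence described so far (04 blob, 04 blob, 06, 08) can be traced with a small walker. This is a rough sketch under several assumptions: only the three boot opcodes named in the log are handled, opcode 06 is modelled as simply consuming the topmost blob, and the sample input is a tiny synthetic stand-in rather than a real init.ocm. The names `walk_boot_code` and `read_length` are invented for illustration.

```python
def read_length(data, pos):
    """ASN.1-style length: short form below 0x80, else N length bytes follow."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    count = first & 0x7F
    return int.from_bytes(data[pos + 1:pos + 1 + count], "big"), pos + 1 + count


def walk_boot_code(data):
    """Trace the boot opcodes 04, 06 and 08 over `data` and return the events."""
    stack, trace, pos = [], [], 0
    while pos < len(data):
        op = data[pos]
        pos += 1
        if op == 0x04:
            # Push a length-prefixed blob onto the boot interpreter's stack.
            length, pos = read_length(data, pos)
            stack.append(data[pos:pos + length])
            pos += length
            trace.append(("push_blob", length))
        elif op == 0x06:
            # Load the last blob as a native code module (assumed to pop it).
            module = stack.pop()
            trace.append(("load_native", len(module)))
        elif op == 0x08:
            # Call the module's startup function; what is still on the
            # stack (the 0301 blob) is what gets passed as parameter.
            trace.append(("call_startup", [len(blob) for blob in stack]))
        else:
            raise ValueError(f"unknown boot opcode {op:#04x} at offset {pos - 1}")
    return trace


# Synthetic stand-in: a 4-byte "0301" blob, a 2-byte "native" blob, then 06 08.
sample = bytes.fromhex("04" "04" "03010700" "04" "02" "aabb" "06" "08")
for event in walk_boot_code(sample):
    print(event)
```

On a real init.ocm the two blobs would use the long length form (83 and 82), which `read_length` handles the same way.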
<inquirer> this is because 04 is ipush_str4
<inquirer> I see
<ocm geek> Be careful. Boot bytecode is not completely equivalent to standard bytecode.
<ocm geek> It happens to be the same for opcodes 01..04
<inquirer> ah yes the 06 08
<inquirer> I just got lucky here with 04
<ocm geek> Probably that's on purpose, because the codes 01 to 04 are used for
serialization of internal values.
<inquirer> at this stage in the game, is it still useful for me to get my own
disassembled bytecode interpreter?
<ocm geek> Probably not.
<ocm geek> I have it, and am currently transferring that knowledge to the Wiki.
<inquirer> good
<inquirer> I guess I will focus on the netmd then
<ocm geek> I just pointed you to the init.ocm because you asked for something that
might be less cryptic.
<inquirer> for the wiki, could you keep the mnemonic in the title? ie. Opcode 02:
Immediate BigInt (ipush_str4)
<inquirer> no big deal of course
<inquirer> yep, great stuff with the init
<inquirer> it will help me to recognize the extension format header
<ocm geek> I didn't really look at Marten's opcode names, but I can put them in.
<inquirer> they can be added later of course
<inquirer> the doc is more important ;)
<ocm geek> You can compare with Marten's scanner.c
<ocm geek> But beware. The comments indicating indices are *decimal*, while everything
<inquirer> I realized that, so I switched to opcodes.h
<inquirer> ha, I should have looked at the perl unpack syntax earlier
<ocm geek> Yeah. opcodes.h is hexadecimal.
<ocm geek> But some mnemonics seem to not match my findings.
<ocm geek> For example "allocMem" in Marten's code is "Store to User Dictionary"
according to my analysis.
<ocm geek> Might be that Sony changed the meaning of the code, or one of us is wrong.
<inquirer> yeah
<inquirer> ok, codeblockparsed the binary blob
<inquirer> how do you invoke gas?  can I use i596-mingw32msvc-as?
<inquirer> i586...
<ocm geek> yes.
<ocm geek> I used that one.
[...]
<inquirer> anything better than as' ing the codeblockparser output into a COFF
executable?
<inquirer> i guess it doesn't actually matter what format the asm is wrapped in
<ocm geek> Yeah. Must be a format IDA is able to read.
<ocm geek> And IDA Freeware only reads COFF objects.
<ocm geek> (and Windows/DOS EXE files and drivers)
<inquirer> cool
<ocm geek> You might want to try loading the object at a different offset than 0, to
help IDA distinguish offsets from numbers. Somehow IDA is unable to know
that with objects, *every* offset is tagged as offset.
<ocm geek> In completely linked executables without reloc info, it is not that ease.
<ocm geek> easy.
<ocm geek> You will need info about the import functions to make sense of it.
<ocm geek> Just a second...
<inquirer> sonicstage 3.4 netmd.ocm is half as big as the one from maarten
<ocm geek> https://wiki.physik.fu-berlin.de/linux-minidisc/doku.php?id=ocmsalwrapexports
<ocm geek> Maarten had a much older sonic stage.
<ocm geek> Maybe they moved parts out of netmd.ocm into standard DLLs.
<ocm geek> Or they have rewritten parts from bytecode to native code.
<inquirer> yeah
<ocm geek> It was known that the OpenMG virtualization/crypto stuff was very heavy on
processing power in early sonic stage versions.
<ocm geek> I talked to Marten on MSN.
<inquirer> ah, ok :)
<ocm geek> And I reversed salwrap.dll myself.
<ocm geek> Not that I ever got once completely through it.
<inquirer> at least it starts making sense
<ocm geek> I will go to sleep now.
<inquirer> mh ok
<ocm geek> See you tomorrow.
<inquirer> or do you have 3 minutes for me?
<ocm geek> OK.
<inquirer> let's see if this is something simple
<inquirer> between the bytecodeblocks there are 63xx0f instructions
<inquirer> what's their significance?
<inquirer> it's always 66 BIGLENGTH BYTECODE... 63 XX 0F  and again 66...
<ocm geek> Ah. I see. 63 is bipush with encrypted operand
<ocm geek> But Marten's decoder already decrypts it for you.
<ocm geek> As 66 is just ipush_str4 with encrypted operand, which Marten's decoder
decodes.
<inquirer> yep
<ocm geek> 0F is store to dictionary.
<inquirer> so, it keeps pushing stuff
<ocm geek> store to dictionary pops it.
<inquirer> I think I am missing the big picture here, is this for constructing a
symbol table or something?
<ocm geek> Every instruction pops the operands it used.
<ocm geek> Kind of.
<inquirer> mmmh, ok
<ocm geek> There are two dictionaries.
<ocm geek> The system dictionary has 256 entries addressed by smallints between 0 and
255.
<ocm geek> What you see here is bytecode blobs stored into the system dictionary.
<inquirer> cool
<inquirer> for now that would be enough if you want to leave me now ;)
<inquirer> I'll go to bed soon, too
<ocm geek> So that's a way of exporting them to other OCM modules or perhaps even to
salwrap
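
The "66 ... 63 XX 0F" pattern discussed above can be modelled as a toy stack machine. This is a sketch under assumptions: the operands are taken as already decrypted, the instruction names are symbolic stand-ins for 66, 63 and 0F, and the pop order (index first, then blob) is inferred from the push order in the log, not confirmed.

```python
def run_stores(program):
    """Run a list of symbolic instructions and return (system_dict, stack).

    Instructions (hypothetical names for this sketch):
      ("push_blob", bytes)  stands in for 66 with a decrypted operand
      ("push_byte", int)    stands in for 63 with a decrypted operand
      ("store",)            stands in for 0F, store to dictionary
    """
    stack = []
    system_dict = [None] * 256  # 256 entries, addressed by 0..255
    for instr in program:
        name = instr[0]
        if name in ("push_blob", "push_byte"):
            stack.append(instr[1])
        elif name == "store":
            index = stack.pop()  # pushed last, by the 63 XX
            value = stack.pop()  # pushed first, by the 66 ...
            system_dict[index] = value
        else:
            raise ValueError(f"unknown instruction {name!r}")
    return system_dict, stack


program = [
    ("push_blob", b"\x01\x02"), ("push_byte", 77), ("store",),
    ("push_blob", b"\x03"), ("push_byte", 16), ("store",),
]
system_dict, stack = run_stores(program)
print(system_dict[77], system_dict[16], stack)
```

After each store the stack is empty again, which matches the remark that store to dictionary pops what was pushed.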
<inquirer> but some more info on this big picture would be cool to have in the wiki
*hint hint*
<inquirer> yeah
<inquirer> makes total sense
<inquirer> it's kinda weird to have such a dynamic format
<ocm geek> Someone at some other point might decide to "call the bytecode in system
dict at index 77"
<inquirer> right
<ocm geek> The system dict probably is quite fixed in purpose.
<ocm geek> There are magic entries near the end of the dictionary, for example 0xfd
points to a blob that represents the jump table of the bytecode interpreter.
<ocm geek> Probably you won't encounter any access to it, unless you look at init.ocm.
<inquirer> sweet
<ocm geek> The extension modules loaded in the bytecode part of init.ocm are
hotpatching their byte code instructions into the jump table.
<ocm geek> But after init.ocm is done, the jump table is full, so no sense in
accessing it.
<inquirer> yeah, well, I still don't know how addressing works in this system
<inquirer> but the opcode description may shed light on that
<ocm geek> What do you mean by "addressing"?
<inquirer> things like jumps
<inquirer> branching
<ocm geek> There are no jumps and branches in the bytecode.
<inquirer> oh cool
<ocm geek> Have you ever programmed PostScript?
<inquirer> that's easy ;)
<inquirer> nope
<inquirer> but I know fortran
<inquirer> long time ago though
<ocm geek> OK, but fortran does have jumps.
<ocm geek> This byte code is much more structured.
<inquirer> I meant forth
<inquirer> sorry
<inquirer> stack based, I forgot about that.  makes sense now
<ocm geek> Ah. That's something completely different to fortran. I don't really know
it, but it might have similar properties to PostScript or this byte code
(both stack-based, too)
<inquirer> it's coming together now
<ocm geek> Be careful with Marten's CALL_IF instruction (0x33). That is a misnomer.
<ocm geek> It should be CALL_WHILE.
<inquirer> ok
<inquirer> I really have an itch to improve the output of the program, it's quite a
mess
<inquirer> but I have to understand more first, and it might be a waste
<inquirer> thanks a lot, again
<ocm geek> Probably he didn't notice that CALL_IF is wrong. The idea is that
CALL_WHILE returns to the CALL_WHILE instruction after running the code
block, so it gets executed again and again, until top-of-stack is zero.
<inquirer> makes sense
<ocm geek> If the return address stored in the interpreter would be the next
instruction (as in CALL and CALL_IF_ELSE) it would really be CALL_IF.
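
The difference between CALL_IF and CALL_WHILE comes down to the return address, which a toy interpreter can make visible. The following is a sketch only: the instruction set ("push", "add", "dup") is invented, and passing the code block as an argument of the call instruction is an assumption made to keep the example short; the real operand layout is not described in the log.

```python
def run(program, stack):
    """Toy stack interpreter; `program` is a list of tuples, `stack` a list."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        if op == "push":
            stack.append(args[0])
            pc += 1
        elif op == "add":
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == "dup":
            stack.append(stack[-1])
            pc += 1
        elif op == "call_if":
            if stack.pop() != 0:
                run(args[0], stack)
            pc += 1  # return address is the next instruction
        elif op == "call_while":
            if stack.pop() != 0:
                run(args[0], stack)
                # Return address is this same instruction: pc stays put,
                # so the condition is tested again after the block ran.
            else:
                pc += 1
        else:
            raise ValueError(f"unknown op {op!r}")
    return stack


# Block: decrement the counter and leave a copy of it as the next condition.
body = [("push", -1), ("add",), ("dup",)]

print(run([("push", 3), ("dup",), ("call_while", body)], []))
print(run([("push", 3), ("dup",), ("call_if", body)], []))
```

With call_while the block runs until the counter reaches zero; with call_if it runs exactly once, which is why a disassembly labelled CALL_IF can look like it is missing its loop.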
<ocm geek> I also have started a bytecode parsing program before I got in contact with
Marten, but that is even more rough.
<ocm geek> I used Haskell for it.
<inquirer> there is a problem for me of course because netmd has some recursive
decryption
<inquirer> seems difficult to me to make a static analysis here
<ocm geek> What do you mean by that?
<inquirer> as I said, the decrypted bytecode contains encrypted bytecode
<inquirer> so, you would need to recursively decrypt
<ocm geek> That seems to be standard practice in OCM bytecode modules, but Marten's
dumper doesn't support it currently.
<inquirer> but for that you need to "run" the bytecode
<inquirer> yeah
<ocm geek> Marten's decoder "run"s the crypto setup instruction for the main block.
<inquirer> yeah
<ocm geek> So the code to do that is already there.
<inquirer> yup
<inquirer> I already modified it to decrypt it, but not all of it
<ocm geek> But I don't know how well-designed his code is, and how easy you could add
decryption of sub-blocks.
<inquirer> it's modular enough
<ocm geek> Oh, you already started :)
<ocm geek> Nice.
<inquirer> yeah, but only very simple.  I don't catch encryption that isn't at offset
0 of a bytecodeblock
<inquirer> there are some of those
<ocm geek> Probably they set up other stuff first.
<ocm geek> You might need to run that too.
<inquirer> crunching away at it very slowly
<inquirer> things like 30 80 04 07 02 6c 50 73
<ocm geek> Strange.
<ocm geek> 30 is "compare DWORDS for equality".
<inquirer> well, we don't know how the block is used, do we?
<inquirer> at least not yet
<ocm geek> Yeah. Maybe it's not a bytecode containing block after all.
<inquirer> there are others exactly like that
<inquirer> but the encryption!
<inquirer> 026c5073
<inquirer> but yeah, maybe it's processed first
<ocm geek> Oh, you are right.
<inquirer> it's a different keyindex though than usual in netmd which is 6c50
<inquirer> so, who knows
<inquirer> I also got a block:
<inquirer> 30 80 02 04 08 20 00 20 02 04 08 20 80 00 02 03 00 80 20 02 01 00  etc
<inquirer> going on and on in that manner
<inquirer> with no 02xxyy73 in that block
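
Looking for the "02xxyy73" pattern anywhere in a block, not only at offset 0, could be sketched as a plain byte scan. This is a heuristic under assumptions: 02 is read as a 16-bit constant and 73 as the instruction that consumes it for crypto setup, the two operand bytes are combined big-endian, and a raw scan like this can produce false hits inside data. The name `find_crypto_setups` is made up for this sketch.

```python
def find_crypto_setups(block):
    """Return (offset, key_index) for every 02 xx yy 73 pattern in `block`."""
    hits = []
    for offset in range(len(block) - 3):
        if block[offset] == 0x02 and block[offset + 3] == 0x73:
            # Assumed big-endian 16-bit operand between the two opcodes.
            key_index = int.from_bytes(block[offset + 1:offset + 3], "big")
            hits.append((offset, key_index))
    return hits


first = bytes.fromhex("30800407026c5073")
second = bytes.fromhex("30800204082000200204082080000203008020020100")
print(find_crypto_setups(first))
print(find_crypto_setups(second))
```

On the two excerpts quoted above, the first yields one hit at offset 4 and the second yields none, in line with what was observed by hand.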
<ocm geek> Argh, wait a moment.
<ocm geek> That might be ASN.1 encoded sequences.
<ocm geek> If you decode them, you get an array of bytecodes.
<ocm geek> 30 80 is the ASN tag for sequence of undetermined length.
<inquirer> good one
* inquirer makes a mental note not to forget ASN.1
<inquirer> cool
<ocm geek> So your 30 80 04 07 is a sequence, whose first element is a 7 byte blob.
<inquirer> so, it is code within a data structure
<ocm geek> the first 4 bytes are setting up cryptography, so just 3 bytes remaining.
<ocm geek> Exactly.
<inquirer> you rock
<ocm geek> That's the nice thing about languages where code blocks are first class
data objects: You can put them into any data structure you like.
<ocm geek> OK. Good night for real now.
<inquirer> like lisp
<inquirer> gn!
<ocm geek> Another hint for the sequences: They are really ASN.1 (including
the tags)
<ocm geek> While normally, you have the bytecode and then untagged ASN.1 like encoded
data, inside the sequence instead of bytecodes the real ASN.1 tags are used.
<ocm geek> That means specifically: All numbers (small numbers and arbitrary precision
integers) are encoded with ASN.1 tag 2 (INTEGER).
<ocm geek> While the bytecode 02 is "16 bit constant", the ASN.1 tag 2 is
length-prefixed arbitrary precision integer.
<inquirer> ok
<ocm geek> Byte blocks are encoded with ASN.1 tag 4 (OCTET STRING) that happens to
coincide with bytecode 4.
<ocm geek> Nested sequences are encoded as ASN.1 sequences like the top sequence.
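
The tagged sequences described above can be decoded with a small recursive parser. This is a minimal sketch, assuming only the three tags mentioned in the log (02 INTEGER, 04 OCTET STRING, 30 SEQUENCE) and the standard BER convention that an indefinite-length sequence (30 80) ends with the two bytes 00 00. The log excerpts are truncated, so the sample below appends that terminator by hand; function names are invented for illustration.

```python
def read_length(data, pos):
    """Return (length or None for indefinite, position after the length field)."""
    first = data[pos]
    if first < 0x80:
        return first, pos + 1
    count = first & 0x7F
    if count == 0:
        return None, pos + 1  # 0x80: indefinite length
    return int.from_bytes(data[pos + 1:pos + 1 + count], "big"), pos + 1 + count


def parse_element(data, pos=0):
    """Parse one tagged element at `pos`; return (value, next position)."""
    tag = data[pos]
    length, pos = read_length(data, pos + 1)
    if tag == 0x30:  # SEQUENCE, possibly nested
        items = []
        if length is None:
            # Indefinite length: read elements until the 00 00 marker.
            while pos < len(data) and data[pos:pos + 2] != b"\x00\x00":
                item, pos = parse_element(data, pos)
                items.append(item)
            if data[pos:pos + 2] == b"\x00\x00":
                pos += 2  # skip the end-of-contents marker
        else:
            end = pos + length
            while pos < end:
                item, pos = parse_element(data, pos)
                items.append(item)
        return items, pos
    if length is None:
        raise ValueError("indefinite length is only handled for sequences here")
    body = data[pos:pos + length]
    if tag == 0x02:  # INTEGER: big-endian two's complement
        return int.from_bytes(body, "big", signed=True), pos + length
    if tag == 0x04:  # OCTET STRING: raw bytes, e.g. a bytecode blob
        return bytes(body), pos + length
    raise ValueError(f"unsupported tag {tag:#04x}")


# Excerpt quoted in the log, with a 00 00 terminator appended for the sketch.
excerpt = bytes.fromhex("3080 02040820 0020 020408208000 0203008020 020100 0000")
values, end = parse_element(excerpt)
print([hex(v) for v in values])
```

The same function returns the 7-byte blob as the first element of a sequence starting with 30 80 04 07, so code and data items can be told apart by their tags.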
<ocm geek> BTW: your 30 80 02 04 08 20 00 20 02 04 08 20 80 00 02 03... constant you
quoted looks like an array Sony's DES implementation uses.
<ocm geek> I will cross-check.
<inquirer> this nested encryption is starting to annoy me
<ocm geek> OK. It doesn't match any of the arrays in the HiMD Transfer Tool for Mac
used for DES encryption.
<inquirer> does this ring  a bell?  4e 20 1d 3f ...
<inquirer> when I try to disassemble it, I get garbage
<inquirer> but it is CALL'ed, and I don't know how that works
<ocm geek> Sorry. I did not yet look at calling bytecode.
<ocm geek> But you are right. That doesn't seem like executable code.
<ocm geek> The code in the interpreter looks like it would execute it as is.
<ocm geek> Could you have something messed up with decryption?