ocmchatlog1
ocmchatlog1 [2009/04/30 12:54] – created megadiscman | ocmchatlog1 [2009/04/30 12:59] (current) – megadiscman | ||
- | dummy | + | < |
+ | <ocm geek> I started documenting the bytecode opcodes in our wiki. | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> https:// | ||
+ | < | ||
+ | <ocm geek> The main entry point is at offset 10 in the ocm file. Marten's code should | |
+ | take care of that. | |
+ | <ocm geek> Native modules are loaded with opcode 0x75. | ||
+ | <ocm geek> Their entry point is stored in their header. | ||
+ | <ocm geek> They are addressed by name, usually. The name is also stored in the header. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Bytecode blocks can be loaded as global code with opcode 0x7C | ||
+ | <ocm geek> If you take apart init.ocm, which is a special .ocm file, you find the native | |
+ | | ||
+ | <ocm geek> init.ocm is made of a special loader bytecode. I don't know whether any of | ||
+ | | ||
+ | by hand. | ||
+ | < | ||
+ | <ocm geek> I know. The only one I have looked at so far is init.ocm. | |
+ | < | ||
+ | <ocm geek> First byte of init.ocm is " | ||
+ | < | ||
+ | <ocm geek> Does it have 0303? | ||
+ | <ocm geek> As I said, init.ocm is special! netmd.ocm starts with 0301 in my SonicStage | ||
+ | 3 (I think 3.4) installation. | ||
+ | < | ||
+ | <ocm geek> Let me finish explaining the special init.ocm format. | ||
+ | < | ||
+ | <ocm geek> The blob following the 04 is preceded by its length. | |
+ | <ocm geek> 83 means that the blob is longer than 127 bytes, and the length itself | ||
+ | takes 3 bytes. | ||
+ | < | ||
+ | <ocm geek> The following bytes are 066e29 in my installation. | ||
+ | <ocm geek> Yeah, right. | ||
+ | <ocm geek> The OCM stuff uses an ASN.1-like serialisation format. | |
+ | <ocm geek> And after the length, voila, you find 0301! | ||
+ | <ocm geek> Because that blob is a standard 0301 bytecode blob. | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Past that block, I find 04 82 9c 91 | ||
+ | <ocm geek> This indicates a second blob, this time length 9c91 | ||
+ | <ocm geek> That block is a native code module, name " | ||
+ | <ocm geek> It contains the bytecode interpreter core. | ||
+ | <ocm geek> After the second blob there are just two further bytes 06 and 08. | ||
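The layout described so far (tag 04, a length, the blob, repeated, then a few trailing boot opcodes) can be sketched in Python. This is only an illustration of the description in the chat, not code from any OCM tool. The names `read_length` and `split_init_ocm` are made up for this sketch, and it assumes the length bytes follow the usual ASN.1 BER definite-length rules, which matches the `83 06 6e 29` and `82 9c 91` examples above.

```python
def read_length(data, pos):
    """Decode a BER-style definite length starting at data[pos].

    Returns (length, position of the first byte after the length field).
    """
    first = data[pos]
    if first < 0x80:
        # Short form: the byte itself is the length (0..127).
        return first, pos + 1
    count = first & 0x7F  # e.g. 0x83 -> the length occupies 3 bytes
    if count == 0:
        raise ValueError("indefinite length is not handled in this sketch")
    length = int.from_bytes(data[pos + 1:pos + 1 + count], "big")
    return length, pos + 1 + count


def split_init_ocm(data):
    """Split init.ocm-like data into its tag-04 blobs and the trailing bytes.

    Hypothetical helper: assumes the file is a run of 04-tagged blobs
    followed by a short tail of boot opcodes, as described in the chat.
    """
    pos = 0
    blobs = []
    while pos < len(data) and data[pos] == 0x04:
        length, start = read_length(data, pos + 1)
        if start + length > len(data):
            raise ValueError("blob runs past the end of the data")
        blobs.append(data[start:start + length])
        pos = start + length
    return blobs, data[pos:]
```

Applied to a real init.ocm, `split_init_ocm` should return the 0301 bytecode blob, the native module blob, and the two trailing bytes, provided the file really has the shape described here.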
+ | < | ||
+ | <ocm geek> You find it beforehand. | |
+ | < | ||
+ | <ocm geek> If your init.ocm is the same as mine, it's 0x66e29 | |
+ | < | ||
+ | < | ||
+ | <ocm geek> The 06 equals the bytecode 75 and loads the last blob as native code module. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> The 08 equals the bytecode 77 and calls the startup function of that module. | ||
+ | <ocm geek> As the 0301 blob is still on the stack of the boot interpreter, | ||
+ | gets passed as parameter to the startup function. | ||
+ | <ocm geek> As the startup function of " | ||
+ | | ||
+ | <ocm geek> This big bytecode blob contains further native modules. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Be careful. Boot bytecode is not completely equivalent to standard bytecode. | ||
+ | <ocm geek> It happens to be the same for opcodes 01..04 | ||
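The boot-to-standard opcode correspondences mentioned in the chat can be collected in a small lookup table. This is a sketch that contains only the pairs stated above. `BOOT_TO_STANDARD` and `to_standard` are names invented for this example, and all values are hexadecimal.

```python
# Boot opcode -> standard opcode. Only the pairs named in the chat are listed;
# every other boot opcode is unknown here and deliberately left out.
BOOT_TO_STANDARD = {
    0x01: 0x01,
    0x02: 0x02,
    0x03: 0x03,
    0x04: 0x04,
    0x06: 0x75,  # load the last blob as a native code module
    0x08: 0x77,  # call the startup function of that module
}


def to_standard(boot_opcode):
    """Return the standard opcode for a boot opcode, or None if unknown."""
    return BOOT_TO_STANDARD.get(boot_opcode)
```

Returning `None` for unlisted opcodes keeps the table honest: it avoids guessing at mappings nobody has confirmed yet.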
+ | < | ||
+ | < | ||
+ | <ocm geek> Probably that's on purpose, because the codes 01 to 04 are used for | ||
+ | | ||
+ | < | ||
+ | | ||
+ | <ocm geek> Probably not. | ||
+ | <ocm geek> I have it, and am currently transferring that knowledge to the Wiki. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> I just pointed you to the init.ocm because you asked for something that | ||
+ | might be less cryptic. | ||
+ | < | ||
+ | | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> I didn't really look at Marten's opcode names, but I can put them in. | |
+ | < | ||
+ | < | ||
+ | <ocm geek> You can compare with Marten' | ||
+ | <ocm geek> But beware. The comments indicating indices are *decimal*, while everything | ||
+ | I write is *hexadecimal*. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Yeah. opcodes.h is hexadecimal. | ||
+ | <ocm geek> But some mnemonics seem to not match my findings. | ||
+ | <ocm geek> For example " | ||
+ | | ||
+ | <ocm geek> Might be that Sony changed the meaning of the code, or one of us is wrong. | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> yes. | ||
+ | <ocm geek> I used that one. | ||
+ | [...] | ||
+ | < | ||
+ | | ||
+ | < | ||
+ | <ocm geek> Yeah. Must be a format IDA is able to read. | ||
+ | <ocm geek> And IDA Freeware only reads COFF objects. | ||
+ | <ocm geek> (and Windows/DOS EXE files and drivers) | ||
+ | < | ||
+ | <ocm geek> You might want to try loading the object at a different offset than 0, to | ||
+ | help IDA distinguish offsets from numbers. Somehow IDA is unable to know | ||
+ | that with objects, *every* offset is tagged as offset. | ||
+ | <ocm geek> In completely linked executables without reloc info, it is not that easy. | |
+ | <ocm geek> You will need info about the import functions to make sense of it. | ||
+ | <ocm geek> Just a second... | ||
+ | < | ||
+ | <ocm geek> https:// | ||
+ | <ocm geek> Maarten had a much older SonicStage. | |
+ | <ocm geek> Maybe they moved parts out of netmd.ocm into standard DLLs. | ||
+ | <ocm geek> Or they have rewritten parts from bytecode to native code. | ||
+ | < | ||
+ | <ocm geek> It was known that the OpenMG virtualization/ | |
+ | | ||
+ | < | ||
+ | <ocm geek> I talked to Marten on MSN. | ||
+ | < | ||
+ | <ocm geek> And I reversed salwrap.dll myself. | ||
+ | <ocm geek> Not that I ever got completely through it even once. | |
+ | < | ||
+ | <ocm geek> I will go to sleep now. | ||
+ | < | ||
+ | <ocm geek> See you tomorrow. | ||
+ | < | ||
+ | <ocm geek> OK. | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Ah. I see. 63 is bipush with encrypted operand | ||
+ | <ocm geek> But Marten's decoder already decrypts it for you. | |
+ | <ocm geek> As 66 is just ipush_str4 with encrypted operand, that Marten's decoder | |
+ | | ||
+ | < | ||
+ | <ocm geek> 0F is store to dictionary. | ||
+ | < | ||
+ | <ocm geek> store to dictionary pops it. | ||
+ | < | ||
+ | | ||
+ | <ocm geek> Every instruction pops the operands it used. | ||
+ | <ocm geek> Kind of. | ||
+ | < | ||
+ | <ocm geek> There are two dictionaries. | ||
+ | <ocm geek> The system dictionary has 256 entries addressed by smallints between 0 and | ||
+ | | ||
+ | <ocm geek> What you see here is bytecode blobs stored into the system dictionary. | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> So that's a way of exporting them to other OCM modules or perhaps even to | |
+ | | ||
+ | < | ||
+ | *hint hint* | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Someone at some other point might decide to "call the bytecode in system | ||
+ | dict at index 77" | ||
+ | < | ||
+ | <ocm geek> The system dict probably is quite fixed in purpose. | ||
+ | <ocm geek> There are magic entries near the end of the dictionary, for example 0xfd | ||
+ | | ||
+ | <ocm geek> Probably you won't encounter any access to it, unless you look at init.ocm. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> The extension modules loaded in the bytecode part of init.ocm are | ||
+ | | ||
+ | <ocm geek> But after init.ocm is done, the jump table is full, so no sense in | ||
+ | | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> What do you mean by " | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> There are no jumps and branches in the bytecode. | ||
+ | < | ||
+ | <ocm geek> Have you ever programmed PostScript? | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> OK, but Fortran does have jumps. | |
+ | <ocm geek> This bytecode is much more structured. | |
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Ah. That's something completely different from Fortran. I don't really know | |
+ | it, but it might have similar properties to PostScript or this bytecode | |
+ | (both stack-based, | ||
+ | < | ||
+ | <ocm geek> Be careful with Marten' | ||
+ | <ocm geek> It should be CALL_WHILE. | ||
+ | < | ||
+ | < | ||
+ | | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Probably he didn't notice that CALL_IF is wrong. The idea is that | ||
+ | | ||
+ | | ||
+ | < | ||
+ | <ocm geek> If the return address stored in the interpreter would be the next | ||
+ | | ||
+ | <ocm geek> I also have started a bytecode parsing program before I got in contact with | ||
+ | | ||
+ | <ocm geek> I used Haskell for it. | ||
+ | < | ||
+ | | ||
+ | < | ||
+ | <ocm geek> What do you mean by that? | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> That seems to be standard practice in OCM bytecode modules, but Marten's | |
+ | | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Marten' | ||
+ | < | ||
+ | <ocm geek> So the code to do that is already there. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> But I don't know how well-designed his code is, and how easily you could add | |
+ | | ||
+ | < | ||
+ | <ocm geek> Oh, you already started :) | ||
+ | <ocm geek> Nice. | ||
+ | < | ||
+ | 0 of a bytecodeblock | ||
+ | < | ||
+ | <ocm geek> Probably they set up other stuff first. | ||
+ | <ocm geek> You might need to run that too. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Strange. | ||
+ | <ocm geek> 30 is " | ||
+ | <ocm geek> Why would a subblock start with it? | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Yeah. Maybe it's not a bytecode-containing block after all. | |
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Oh, you are right. | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Argh, wait a moment. | ||
+ | <ocm geek> That might be ASN.1 encoded sequences. | ||
+ | <ocm geek> If you decode them, you get an array of bytecodes. | ||
+ | <ocm geek> 30 80 is the ASN.1 encoding for a sequence of indefinite length (tag 30, length byte 80). | |
+ | < | ||
+ | * inquirer makes a mental note not to forget ASN.1 | ||
+ | < | ||
+ | <ocm geek> So your 30 80 04 07 is a sequence, whose first element is a 7 byte blob. | ||
+ | < | ||
+ | <ocm geek> the first 4 bytes are setting up cryptography, | ||
+ | <ocm geek> Exactly. | ||
+ | < | ||
+ | <ocm geek> That's the nice thing about languages where code blocks are first class | ||
+ | data objects: You can put them into any data structure you like. | ||
+ | <ocm geek> OK. Good night for real now. | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Another hint for the sequences: They are really ASN.1 (including | ||
+ | the tags) | ||
+ | <ocm geek> Normally you have the bytecode followed by untagged, ASN.1-like encoded | |
+ | data. Inside a sequence, real ASN.1 tags are used instead of bytecodes. | |
+ | <ocm geek> That means specifically: | ||
+ | | ||
+ | <ocm geek> While the bytecode 02 is "16 bit constant", | ||
+ | | ||
+ | < | ||
+ | <ocm geek> Byte blocks are encoded with ASN.1 tag 4 (OCTET STRING) that happens to | ||
+ | | ||
+ | <ocm geek> Nested sequences are encoded as ASN.1 sequences like the top sequence. | ||
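The sequence encoding described in these hints can be sketched as a small recursive decoder. It makes several assumptions: the data follows plain ASN.1 BER, an indefinite-length sequence ends with the standard end-of-contents bytes `00 00`, and only the three tags named in the chat occur (02 INTEGER, 04 OCTET STRING, 30 SEQUENCE). `parse_element` is a hypothetical name. Whether the OCM interpreter does exactly this has not been verified.

```python
def parse_element(data, pos=0):
    """Decode one BER element at data[pos]; return (value, next position).

    INTEGER -> int, OCTET STRING -> bytes, SEQUENCE -> list of decoded items.
    """
    tag = data[pos]
    first = data[pos + 1]
    pos += 2
    if first == 0x80:
        length = None  # indefinite length, terminated by 00 00
    elif first < 0x80:
        length = first  # short form
    else:
        count = first & 0x7F  # long form: next `count` bytes hold the length
        length = int.from_bytes(data[pos:pos + count], "big")
        pos += count

    if tag == 0x30:  # SEQUENCE, possibly nested
        items = []
        if length is None:
            while data[pos:pos + 2] != b"\x00\x00":
                if pos >= len(data):
                    raise ValueError("missing end-of-contents marker")
                item, pos = parse_element(data, pos)
                items.append(item)
            return items, pos + 2  # skip the 00 00 terminator
        end = pos + length
        while pos < end:
            item, pos = parse_element(data, pos)
            items.append(item)
        return items, pos

    if length is None:
        raise ValueError("indefinite length is only handled for sequences")
    body = data[pos:pos + length]
    if tag == 0x02:  # INTEGER, big-endian two's complement
        return int.from_bytes(body, "big", signed=True), pos + length
    if tag == 0x04:  # OCTET STRING, i.e. a byte block
        return bytes(body), pos + length
    raise ValueError(f"unsupported tag {tag:#04x}")
```

For example, data beginning `30 80 02 04 08 20 00 20` would decode to a list whose first item is the integer 0x08200020. Because code blocks are just byte blocks here, any OCTET STRING item can afterwards be handed to a bytecode decoder.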
+ | <ocm geek> BTW: your 30 80 02 04 08 20 00 20 02 04 08 20 80 00 02 03... constant you | ||
+ | | ||
+ | <ocm geek> I will cross-check. | ||
+ | < | ||
+ | <ocm geek> OK. It doesn' | ||
+ | used for DES encryption. | ||
+ | < | ||
+ | < | ||
+ | < | ||
+ | <ocm geek> Sorry. I did not yet look at calling bytecode. | ||
+ | <ocm geek> But you are right. That doesn' | ||
+ | <ocm geek> The code in the interpreter looks like it would execute it as is. | ||
+ | <ocm geek> Could you have messed something up with decryption? | |
+ | </ |
ocmchatlog1.1241096098.txt.gz · Last modified: 2009/04/30 12:54 by megadiscman