C++ Question Hooking N->1/Hooking Modules

meson800 · Aug 16, 2015

So I've been working some more on catch-ctd...

Now I want to hook every module that Orbiter loads.

If the module uses the new module method (inheriting from oapi::Module and registering that), hooking is really easy. I just simply create a module proxy to forward the events to the real module.

The issue is when a module is using the deprecated method, overloading the opc callbacks.

The problem is that, unlike vessels, where the vessel DLL is loaded through LoadLibrary just prior to vessel creation, I don't have any context here.

I need to, if they exist, hook the following methods:

Code:

opcOpenRenderViewport
opcCloseRenderViewport 
opcPreStep
opcPostStep
opcFocusChanged
opcTimeAccChanged
opcPause
opcDeleteVessel

in every DLL. However, if I only use one function (so redirect all the opcPreStep functions to MyOpcPreStep), I don't know which original function to call!

The replacement function has to have the same signature of the original, so I can't pass additional data to MyOpcPreStep, and Orbiter doesn't reload the module DLL, so I can't use the "hint" method I used for vessel hooking.

I thought of a possible solution, which would be to define a MyOpcPreStep with an additional argument, designating which original function to call. Then, I would create a bunch of stubs that would look like this:

Code:

MyOpcPreStep_helper1(original_prestep_args)
{
  MyOpcPreStep(1, original_prestep_args);
}
MyOpcPreStep_helper2(original_prestep_args)
{
  MyOpcPreStep(2, original_prestep_args);
}

Then, for each new module, hook the original functions for that module to an unused stub.

The downside to this is I potentially need to create a lot of stubs. Even if I created 1000 stubs, it's still possible for catch-ctd to stop working with an obnoxious amount of addons installed.
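As a rough sketch of the static-stub idea, the stubs do not have to be typed out by hand: a function template can stamp out one stub per index at compile time. Everything here is hypothetical (the handler body, `kStubCount`, `stubFor`), and the fixed-size table still has the "runs out of stubs" limit described above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Signature of the callback being hooked (matches opcPreStep's arguments).
using PreStepFn = void (*)(double simt, double simdt, double mjd);

// Hypothetical shared handler: here it only records which stub fired.
unsigned g_lastIndex = ~0u;
void MyOpcPreStep(unsigned index, double, double, double) { g_lastIndex = index; }

// One stub per index; each has the original signature and forwards its index.
template <unsigned Index>
void PreStepStub(double simt, double simdt, double mjd) {
    MyOpcPreStep(Index, simt, simdt, mjd);
}

// Expand the indices 0..N-1 into a table of stub pointers.
template <std::size_t... Is>
constexpr std::array<PreStepFn, sizeof...(Is)> makeStubTable(std::index_sequence<Is...>) {
    return {{&PreStepStub<static_cast<unsigned>(Is)>...}};
}

constexpr std::size_t kStubCount = 64;  // arbitrary limit for the sketch
constexpr auto kPreStepStubs = makeStubTable(std::make_index_sequence<kStubCount>{});

// Stub to use as the hook destination for the module with this index.
PreStepFn stubFor(std::size_t moduleIndex) {
    assert(moduleIndex < kStubCount);
    return kPreStepStubs[moduleIndex];
}
```

Calling `stubFor(5)(simt, simdt, mjd)` ends up in `MyOpcPreStep` with index 5, so the handler can look up the matching original function.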

As I was thinking about the stubs, they are all basically the same. Is there a way to create them at runtime? I know I can allocate memory, write bytes in that represent those functions, and mark it executable, get a pointer to the memory, and cast it to a function pointer. This should be "technically" doable, I just don't know how to start.

As to the calling convention (my knowledge is very sketchy), would the following work?

1. Write a MyOpcPreStep, with an additional first argument as an index to some vector that stores the original functions.
2. Write a naked stub function that pushes an integer to the stack, then jumps to MyOpcPreStep
3. Somehow convert that to bytes and malloc memory for the stub function, set it executable, and set it as the hook destination.

Because the naked function doesn't screw the stack, and the opcPreStep should (I think) use the _cdecl calling convention, arguments are pushed in reverse order.

The stub's push/jump to MyOpcPreStep would then effectively insert the additional info needed for MyOpcPreStep!

If that sounds legit, how do I do number 3?

---------- Post added at 04:38 PM ---------- Previous post was at 04:33 PM ----------

Here's how I imagine that would work:

Detours inserts a hook to the generated stub
Orbiter calls some module's opcPreStep
The detour, which sits at the very start of the original function before it can touch the stack, does a straight JMP to the stub
The stub is also a naked function, doesn't screw the stack. That function pushes an unsigned int to the stack, then JMPs to MyOpcPreStep
Because the signature on MyOpcPreStep is (unsigned int, original_prestep_args), it calls correctly

Face · Aug 17, 2015

meson800 said:
I need to, if they exist, hook the following methods:

Code:

opcOpenRenderViewport
opcCloseRenderViewport
opcPreStep
opcPostStep
opcFocusChanged
opcTimeAccChanged
opcPause
opcDeleteVessel

in every DLL. However, if I only use one function (so redirect all the opcPreStep functions to MyOpcPreStep), I don't know which original function to call!

The replacement function has to have the same signature of the original, so I can't pass additional data to MyOpcPreStep, and Orbiter doesn't reload the module DLL, so I can't use the "hint" method I used for vessel hooking.

In OMP, I had a similar need to hook these callbacks. There I just hooked the vessel object's virtual table functions, while saving the original pointer for every class instance (which means that all instances of a class will have the same value). In the replacement functions, I used a global table to get the original function for this, then inline assembly to call that original function.

The hooking for OMP was used in order to forward these events to remote machines. For that, I've put the calls into a queue. This queue was processed asynchronously in a thread and sent to the remote machine. The performance impact of this is close to zero, although every call was logged synchronously via fprintf().

Repo's down for the moment, but if you want the code, just drop me a note and I will send it to you.

---------- Post added at 09:20 ---------- Previous post was at 09:11 ----------

Ah, wait, that's about the really old method of getting events... my answer was for hooking the VESSEL:: callbacks instead. :facepalm:

Sorry for the noise.

---------- Post added at 15:30 ---------- Previous post was at 09:20 ----------

I don't know how detours works, but I would expect it to call the stub with the original function address already.

If so, you can always hold a global table that matches original function to DLL, so you know where it comes from.

meson800 · Aug 17, 2015

Face said:
Repo's down for the moment, but if you want the code, just drop me a note and I will send it to you.

I already have that hooking code from when you sent me it for Simpit-Controller :lol:

Face said:
I don't know how detours works, but I would expect it to call the stub with the original function address already.

If so, you can always hold a global table that matches original function to DLL, so you know where it comes from.

That would be really nice, but it doesn't.

Effectively, you pass detours a reference of a function pointer of the function to hook, and a function pointer of your function.

When Detours returns from the hooking call, the reference you passed of the original function gets overwritten to the "new" location to the original function.

You have to hold onto that pointer yourself, your replacement function doesn't get it.

It works kind of like:

Code:

void * originalFunction = oapiRegisterModule;

//somewhere
DetourAttach(&originalFunction, MyRegisterModule);
//now, use originalFunction to access original function

MyRegisterModule(signature)
{
  //logging code
  originalFunction(signature args);
}

---------- Post added at 11:41 AM ---------- Previous post was at 11:33 AM ----------

That is what I'm trying to do though, make a function wrapper, at runtime, that passes the original function pointer to the replacement function.

Does anyone have experience writing small ASM functions at runtime?

Face · Aug 17, 2015

meson800 said:
I already have that hooking code from when you sent me it for Simpit-Controller :lol:

Ups, sorry, did not remember that I already talked about that with you. :tiphat:

meson800 said:
When Detours returns from the hooking call, the reference you passed of the original function gets overwritten to the "new" location to the original function.

I see. I take it the "new" location is a trampoline that gets the overwritten instructions incorporated together with a jump back to the rest of the original function.

What about hooking the IAT instead of inline rewriting? Use a CALL there instead of a traditional JMP, then - in the stub - unbalance the stack by popping that return address. This way, you'll have a pointer you can use as key in a global table to get the original function.

It would not be as comfortable as calling DetourAttach, though. BTW: isn't there something like DetourAttachEx, too? There you can get the trampoline, right? Perhaps you can do a CALL in that trampoline, too.

meson800 · Aug 17, 2015

More specifically, how must I clean the stack in my stub function?

Detours calls my stub function, expecting that it has N arguments, thinking that it is a _cdecl function. The stub is actually a naked function which pushes an extra variable, and calls the real function.

The real function is also a _cdecl function, so it doesn't clean the stack.

If I just return out of the stub, there will be stack corruption, as there is an unexpected "extra" argument on the stack.

Would a pop at the end of the stub before the return be enough to clean the stack?

Here's what my revised view of the naked stub would be:

Code:

PUSH original_function_id
CALL MyReplacementFunction
POP
RETURN

Would that work?

---------- Post added at 11:55 AM ---------- Previous post was at 11:51 AM ----------

Face said:
I see. I take it the "new" location is a trampoline that gets the overwritten instructions incorporated together with a jump back to the rest of the original function.

Yes.

Face said:
What about hooking the IAT instead of inline rewriting? Use a CALL there instead of a traditional JMP, then - in the stub - unbalance the stack by popping that return address. This way, you'll have a pointer you can use as key in a global table to get the original function.

You would have to enlighten me on that. The IAT is just a table of function pointers that the DLL exports, right?

Either way, you have to have a stub that gives the function additional context. It would be cleaner than DetourAttach...

I think the "unbalancing the stack" was what I was thinking about before you :ninja:

Face said:
It would not be as comfortable as calling DetourAttach, though. BTW: isn't there something like DetourAttachEx, too? There you can get the trampoline, right? Perhaps you can do a CALL in that trampoline, too.

You lost me there. Yes, you can get the trampoline address too, I'm not sure what "doing a call in that" would entail.

Face · Aug 17, 2015

meson800 said:
You would have to enlighten me on that. The IAT is just a table of function pointers that the DLL exports, right?

Either way, you have to have a stub that gives the function additional context. It would be cleaner than DetourAttach...

I think the "unbalancing the stack" was what I was thinking about before you :ninja:

The Import Table is just a list of pointers, but the Import Address Table (the thing that is in memory) is a list of JMP commands AFAIK. Hooking this, you normally overwrite the JMP address with a new one. But nothing stops you from also overwriting the JMP command itself with a CALL additionally. Then you implicitly put the table's next memory slot address on the stack, which you can use as a key to determine which DLL was called.

However, in order to unbalance the stack (and effectively make the CALL a JMP again), you have to remove the "superfluous" argument from there before you do the usual calling convention operations. A plain cdecl function might perhaps already interfere with this before it gets to the first compiled instruction.
I guess that it will already work if you have a cdecl stub that simply adds the key pointer as first argument in its signature, pops the stack, calls the stdcall function (MyOpc), then JMPs to the address returned by a global table which uses the key pointer to get the original JMP address. Not sure, though, would have to test that first.

Just don't kill me if it crashes your Orbiter.

meson800 · Aug 17, 2015

Face said:
However, in order to unbalance the stack (and effectively make the CALL a JMP again), you have to remove the "superfluous" argument from there before you do the usual calling convention operations. A plain cdecl function might perhaps already interfere with this before it gets to the first compiled instruction.
I guess that it will already work if you have a cdecl stub that simply adds the key pointer as first argument in its signature, pops the stack, calls the stdcall function (MyOpc), then JMPs to the address returned by a global table which uses the key pointer to get the original JMP address. Not sure, though, would have to test that first.

Just don't kill me if it crashes your Orbiter.

Ok, I think I have my head wrapped around that :uhh:

calls the stdcall function (MyOpc)

Are the Orbiter callbacks stdcall? I thought they were cdecl (because no calling convention is defined for them)

Let me try screwing around with a static ASM stub before I try implementing it at runtime.

Face · Aug 17, 2015

meson800 said:
Are the Orbiter callbacks stdcall? I thought they were cdecl (because no calling convention is defined for them)

Indeed, you are right. I've just checked the project setting for my addon projects, and they have "/Gd" as convention parameter setting, making it default to cdecl.

---------- Post added at 19:38 ---------- Previous post was at 19:04 ----------

I've just checked the following:

Get a DLL with LoadLibraryA. I used Meshdebug.dll, as it has an opcPreStep.
Get a function pointer with GetProcAddress.
Call the function.
Check disassembly.

Unfortunately, the proposed IAT approach is not working, because there is no IAT involved at all with this. With GetProcAddress, you'll get the function address directly, no IAT stub involved. So it has to be inline, anyway.

meson800 · Aug 17, 2015

Hmmm I just tried the following, with inline hooking. It gets to the main function, but crashes with invalid memory access to 0x0000000

Which means I'm not clearing stack correctly or something.

Here's my preStep functions:

Code:

__declspec(naked) void MyPreStepHelperStub0(double simt, double simdt, double mjd)
{
    __asm
    {
        push 0
        call MyPreStep
        ret 4
    }
}

void MyPreStep(unsigned int originalPointerIndex, double simt, double simdt, double mjd)
{
    Log::writeToLogDameon("Got original pointer index:", originalPointerIndex, ", simt:", simt, ", simdt:", simdt, ", mjd:", mjd, "\r\n");
}

The hook is installed with DetourAttach, with the original function location obtained from GetProcAddress, and the destination being the stub function.

It gets to the writeToLog function, as the following is logged right before the crash:

Code:

Got original pointer index:0, simt:1.99916e+037, simdt:1.99916e+037, mjd:-6.40469e+180

So the 0 gets correctly passed to the function, and simt/simdt/mjd looks good.

Maybe the 0 is being used as the stack frame for the ret?

---------- Post added at 02:03 PM ---------- Previous post was at 01:55 PM ----------

Never mind, evidently the ret 4 didn't clean up correctly.

I edited it to be:

Code:

            push 0
            call MyPreStep
            add ESP, 4
            ret

and it worked.

Is that the correct way to do it? It doesn't seem to crash now, but who knows...
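A toy model of the two stub endings may help show why `ret 4` lands on address 0 while `add esp, 4` followed by `ret` returns to the caller. This is only a sketch assuming 32-bit x86 and cdecl; `Stack32`, `kReturnToOrbiter` and the function names are made up for illustration, and the three doubles below the return address are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy 32-bit stack: words.back() is the word at [esp].
struct Stack32 {
    std::vector<std::uint32_t> words;
    void push(std::uint32_t v) { words.push_back(v); }
    std::uint32_t pop() {
        assert(!words.empty());
        std::uint32_t v = words.back();
        words.pop_back();
        return v;
    }
    void addEsp(std::uint32_t bytes) {  // models `add esp, bytes`
        for (std::uint32_t i = 0; i < bytes / 4; ++i) pop();
    }
};

constexpr std::uint32_t kReturnToOrbiter = 0x00401000;  // made-up address
constexpr std::uint32_t kIndex = 0;                     // the `push 0`

// Stack as the stub sees it right after `call MyPreStep` has returned:
// the callee's own `ret` already removed the stub's return address, and a
// cdecl callee leaves its arguments (here the index) on the stack.
Stack32 afterCallReturns() {
    Stack32 s;
    s.push(kReturnToOrbiter);  // pushed by the caller's call into the stub
    s.push(kIndex);            // pushed by the stub
    return s;
}

// `ret 4`: pop the return target FIRST, then discard 4 more bytes.
std::uint32_t retThenDiscard4(Stack32 s) {
    std::uint32_t target = s.pop();
    s.addEsp(4);
    return target;
}

// `add esp, 4` then `ret`: discard the index, then pop the return target.
std::uint32_t discard4ThenRet(Stack32 s) {
    s.addEsp(4);
    return s.pop();
}
```

In this model `retThenDiscard4` returns 0 (the pushed index is taken as the return address), which fits the crash at 0x00000000, while `discard4ThenRet` returns `kReturnToOrbiter`. In the same model, the stub's own `call` leaves the caller's return address sitting between the index and the three doubles when `MyPreStep` starts, which is worth checking against the logged values.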

---------- Post added at 02:05 PM ---------- Previous post was at 02:03 PM ----------

Now I just need to figure out how to write functions like that at runtime.

My plan is just to look at the disassembly, and copy the bytes into an array, replacing the byte representing 0 with the current index (so MyOpcPreStep can find the correct preStep)

Then malloc some memory, copy the bytes in. My knowledge is sketchy on how to make that executable and get a function pointer to it. I'll try to and report back.

---------- Post added at 02:35 PM ---------- Previous post was at 02:05 PM ----------

Found a link to how to do it.

jedidia · Aug 17, 2015

Admit it, you are creating skynet! (and we lack a tinfoil hat emoticon, evidently)

meson800 · Aug 18, 2015

jedidia said:
Admit it, you are creating skynet! (and we lack a tinfoil hat emoticon, evidently)

You caught me :lol:

Just wait until I combine it with my neural net project

---------- Post added at 10:44 PM ---------- Previous post was at 04:01 PM ----------

So I'm working on the solution, except it turns out the VirtualAlloc only allocates in 4KB pages.

Each of my stub functions is around 6 bytes long :uhh:

Looks like I'll be writing my own memory allocator so I don't eat tons of memory

orb · Aug 18, 2015

You don't need to allocate a separate page for each stub. You can fit 512 6-byte stubs in a single page (with alignment, or 682 without alignment).

If you need to change access mode to only a part of the page you can use VirtualProtect function.

meson800 · Aug 19, 2015

orb said:
You don't need to allocate a separate page for each stub. You can fit 512 6-byte stubs in a single page (with alignment, or 682 without alignment).

If you need to change access mode to only a part of the page you can use VirtualProtect function.

Thanks, that's the plan. I still have to track how much of a page I've consumed and allocate a new one if needed.
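The bookkeeping for that could look roughly like the sketch below: hand out fixed-size slots from the current page and start a new page when the next slot would not fit. `StubArena` is a hypothetical name, and a plain heap buffer stands in for the `VirtualAlloc` call so the example stays portable; it does not produce executable memory.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class StubArena {
public:
    StubArena(std::size_t pageSize, std::size_t slotSize)
        : pageSize_(pageSize), slotSize_(slotSize), used_(pageSize) {
        assert(slotSize > 0 && slotSize <= pageSize);
    }

    // Returns the next free slot, starting a new page when the current one
    // cannot hold another slot (used_ starts "full" so the first call allocates).
    unsigned char* allocate() {
        if (used_ + slotSize_ > pageSize_) {
            // Stand-in for VirtualAlloc(NULL, pageSize, MEM_RESERVE | MEM_COMMIT, ...).
            pages_.push_back(std::make_unique<unsigned char[]>(pageSize_));
            used_ = 0;
        }
        unsigned char* slot = pages_.back().get() + used_;
        used_ += slotSize_;
        return slot;
    }

    std::size_t pageCount() const { return pages_.size(); }

private:
    std::size_t pageSize_;
    std::size_t slotSize_;
    std::size_t used_;  // bytes consumed in the current page
    std::vector<std::unique_ptr<unsigned char[]>> pages_;
};
```

With a 4096-byte page, 8-byte slots give 512 stubs per page and 6-byte slots give 682, matching the numbers quoted above.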

I'm just mostly complaining to complain :lol:

---------- Post added at 07:29 PM ---------- Previous post was at 10:44 AM ----------

Ok, I've run into a problem.

I'm having trouble with VirtualProtect.

The following does not work:

Code:

struct Page
{
    //page auto-allocates itself
    Page(unsigned int pageSize);
    //page deallocates itself when deleted
    ~Page();
    void* startingAddress;
    void* nextFreeAddress;
    std::vector<AllocatedFunction> functions;
};

Where

Code:

Page::Page(unsigned int pageSize)
{
    void * startingLocation = VirtualAlloc(NULL, pageSize, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
    if (startingLocation == 0)
        throw std::runtime_error("Couldn't allocate new page");
    startingAddress = startingLocation;
    nextFreeAddress = startingLocation;
}

Later:

Code:

vector.push_back(Page(4096));
DWORD oldProtectSetting;
VirtualProtect(vector.back().startingAddress, 1, PAGE_READWRITE, &oldProtectSetting);

The virtual protect call fails with error 487: invalid address.

However, if I put the following in the constructor, right after the VirtualAlloc call:

Code:

DWORD oldProtectSetting;
VirtualProtect(startingAddress, 1, PAGE_READWRITE, &oldProtectSetting);

the VirtualProtect call works.

I have verified that, in both positions, startingAddress is used and is the same.

In any case, if I leave the call to VirtualProtect in the constructor and try to write into the memory location, I get an access violation.

I think that I must be transforming the memory address into something invalid, but by checking with the debugger, the address is the same :uhh:

---------- Post added at 08:05 PM ---------- Previous post was at 07:29 PM ----------

Ok, I'm working on that problem. It's something in my memory allocator. Fixed, I was copying the Page object in the push_back call.
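The copy problem presumably comes from the temporary `Page` releasing its memory in its destructor while the copy stored in the vector keeps the stale address. One common way to rule that out is to make the owning type move-only. The sketch below is an assumption about the shape of such a fix, not the actual catch-ctd code, and it uses `std::malloc`/`std::free` as portable stand-ins for `VirtualAlloc`/`VirtualFree`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

struct Page {
    explicit Page(std::size_t size) : start(std::malloc(size)) {  // stand-in for VirtualAlloc
        if (!start) throw std::bad_alloc();
    }
    ~Page() { std::free(start); }  // stand-in for VirtualFree; free(nullptr) is a no-op

    // Copying would leave two owners of one allocation, so forbid it.
    Page(const Page&) = delete;
    Page& operator=(const Page&) = delete;

    // Moving transfers ownership and leaves the source empty.
    Page(Page&& other) noexcept : start(std::exchange(other.start, nullptr)) {}
    Page& operator=(Page&& other) noexcept {
        if (this != &other) {
            std::free(start);
            start = std::exchange(other.start, nullptr);
        }
        return *this;
    }

    void* start;
};

static_assert(!std::is_copy_constructible<Page>::value, "Page must not be copyable");
```

With this, `pages.emplace_back(4096)` constructs the page in place, and vector growth moves pages instead of copying them, so the stored address stays valid.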

Anyway, I'm having issues with the call function.

Modeling my assembly off of the disassembled stub:

Code:

            push 0
0367C5A0 6A 00                push        0  
            call MyPreStep
0367C5A2 E8 2F 50 FE FF       call        MyPreStep (036615D6h)  
            add ESP, 4
0367C5A7 83 C4 04             add         esp,4  
            ret
0367C5AA C3                   ret

I have the following assembly code:

Code:

unsigned char preStepStub[14] = 
//push  ----------guid--------
 {0x68, 0x00, 0x00, 0x00, 0x00, 
//call  -----memory location--  
  0xE8, 0x00, 0x00, 0x00, 0x00, 
//add   esp , 4
  0x83, 0xC4, 0x04, 
//ret
  0xC3};

Opcode 0xFF is supposedly an absolute call, but that errors if I simply replace opcode E8 with FF.

I then copy in the GUID, and copy in the function pointer of the MyPreStep function.
However, the memory location the stub CALLS to is invalid.

Code:

        memcpy(preStepStub + 1, &(info.guid), sizeof(info.guid));
        //set memory location
        memcpy(preStepStub + 6, MyPreStep, sizeof((void*)MyPreStep));

Do I have to calculate an offset between the two functions or something?
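For the `E8` form the operand is indeed an offset: a signed 32-bit displacement measured from the end of the 5-byte call instruction to the target. A minimal sketch of that calculation, assuming 32-bit addresses (`encodeRelativeCall` is a made-up helper name):

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encodes `call target` as E8 rel32 for a call instruction located at callAddress.
// rel32 = target - (address of the instruction following the call).
std::array<std::uint8_t, 5> encodeRelativeCall(std::uint32_t callAddress,
                                               std::uint32_t targetAddress) {
    // Unsigned arithmetic wraps modulo 2^32, which yields the two's-complement
    // displacement for backward calls as well.
    const std::uint32_t rel32 = targetAddress - (callAddress + 5u);
    std::array<std::uint8_t, 5> bytes{{0xE8, 0, 0, 0, 0}};
    for (int i = 0; i < 4; ++i) {
        bytes[1 + i] = static_cast<std::uint8_t>(rel32 >> (8 * i));  // little-endian
    }
    return bytes;
}
```

Fed with the addresses from the disassembly above (call at 0367C5A2, MyPreStep at 036615D6), this gives `E8 2F 50 FE FF`, the same bytes the compiler emitted. For a generated stub, `callAddress` has to be where the call instruction finally lives in the executable page, not its position in the template array.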

---------- Post added at 11:05 PM ---------- Previous post was at 08:05 PM ----------

I tried opcode 0x9A, which is apparently also a far call, but that expects a six byte operand.

My pointer to MyPreStep is only four bytes...

orb · Aug 19, 2015

meson800 said:
Opcode 0xFF is supposedly an absolute call, but that errors if I simply replace opcode E8 with FF.

0xFF isn't a simple single byte instruction opcode. It's additionally using Mod-Reg-R/M byte to select the appropriate instruction.

If you don't want to calculate relative address for the 0xE8 call and instead you want to use the 0xFF call, middle part of the 2nd byte is used for the instruction's opcode (the instruction is unidirectional, so Reg field in the Mod-Reg-R/M byte is used for extending the opcode instead) and remaining part of it is used for Mod-R/M fields.

Here's binary structure of that CALL instruction's opcode (for call within the same segment):

         byte 1          ||         byte 2
 7  6  5  4  3  2  1  0  ||  7  6  5  4  3  2  1  0
 1  1  1  1  1  1  1  1  ||  x  x  0  1  0  y  y  y

Hexadecimal:
      F     |     F      ||  1,5,9,D  |    0-7

xx = mod, yyy = r/m

mod:
00 - register indirect, or SIB when r/m == 100, or displacement only when r/m == 101
01 - 8-bit signed displacement, register, or SIB when r/m == 100
10 - 32-bit signed displacement, register, or SIB when r/m == 100
11 - register direct

r/m:
000 - eax
001 - ecx
010 - edx
011 - ebx
100 - esp or SIB when mod != 11
101 - ebp or displacement only when mod == 00
110 - esi
111 - edi

SIB = Scaled Index Byte
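The bit layout above can be packed with a small helper, shown here as a sketch (`modrm` is a made-up name). For this near-call form of opcode FF, the middle (reg) field is fixed at 010:

```cpp
#include <cassert>
#include <cstdint>

// Packs the three Mod-Reg-R/M fields into one byte: mod (2 bits), reg (3 bits), r/m (3 bits).
constexpr std::uint8_t modrm(unsigned mod, unsigned reg, unsigned rm) {
    return static_cast<std::uint8_t>(((mod & 3u) << 6) | ((reg & 7u) << 3) | (rm & 7u));
}

// Opcode extension in the reg field for the near indirect CALL form of FF.
constexpr unsigned kCallExtension = 2;  // binary 010

// mod = 11 (register direct), r/m = 000 (eax): FF D0 is `call eax`.
static_assert(modrm(3, kCallExtension, 0) == 0xD0, "call eax");
// mod = 00, r/m = 101 (displacement only): FF 15 <addr> is `call dword [addr]`.
static_assert(modrm(0, kCallExtension, 5) == 0x15, "call dword [disp32]");
```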

If you want to use memory, the operand isn't direct address of the function, but pointer to address of the function.

I.e.

Code:

FF 15 67 45 23 01

would be

Code:

call dword [01234567h]

and the address of the function should be placed in a variable (or a constant) which is at address 0x01234567.

If you want to put directly absolute address of the function in the code, you can for example use a register and store the address there and call the function pointed by that register (e.g. EAX) this way:

Code:

B8 10 32 54 06
FF D0

Code:

mov eax, 06543210h ; address of the function
call eax
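
Put together with the earlier stub, the byte sequence for one generated stub could be assembled roughly as follows. This is a sketch under assumptions: 32-bit x86, a cdecl handler (so clobbering EAX is acceptable and the stub removes the pushed id itself), and `buildStub` as a made-up helper name. It only builds the bytes; copying them into executable memory is a separate step.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Layout (16 bytes):
//   68 <id>        push id
//   B8 <handler>   mov  eax, handler
//   FF D0          call eax
//   83 C4 04       add  esp, 4
//   C3             ret
std::array<std::uint8_t, 16> buildStub(std::uint32_t id, std::uint32_t handlerAddress) {
    std::array<std::uint8_t, 16> code{};
    std::size_t pos = 0;

    // Appends a 32-bit immediate in little-endian byte order.
    auto emit32 = [&](std::uint32_t value) {
        for (int i = 0; i < 4; ++i) {
            code[pos++] = static_cast<std::uint8_t>(value >> (8 * i));
        }
    };

    code[pos++] = 0x68;  // push imm32
    emit32(id);
    code[pos++] = 0xB8;  // mov eax, imm32
    emit32(handlerAddress);
    code[pos++] = 0xFF;  // call eax
    code[pos++] = 0xD0;
    code[pos++] = 0x83;  // add esp, 4 (drop the pushed id)
    code[pos++] = 0xC4;
    code[pos++] = 0x04;
    code[pos++] = 0xC3;  // ret
    return code;
}
```

Because the handler address is an absolute immediate here, the same bytes work wherever the stub is placed, unlike the relative `E8` call.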

meson800 said:
I tried opcode 0x9A, which is apparently also a far call, but that expects a six byte operand.

I haven't used far calls in assembly since programming in 16-bit, so I don't know what you'd need to put for the segment part of the 0x9A call as segment values aren't directly used in programming in 32-bit flat addressing model, but most likely the value of the CS register. You'd however also need to use retf (0xCB/0xCA) to return from that called function instead of retn (0xC3/0xC2).

meson800 · Aug 19, 2015

I wish I could thank a post more than once :tiphat:

That did the trick, copying the address into eax and then calling that. If only I had known that opcode setup before :lol:

All of the functions work beautifully now, and the stubs are automatically generated as module DLLs are loaded

Orb, what do you use as a reference for ASM? I've never really done ASM programming before, so I was searching for some website that presented info in a nice form like you did, but couldn't find one.

orb · Aug 19, 2015

meson800 said:
Orb, what do you use as a reference for ASM? I've never really done ASM programming before, so I was searching for some website that presented info in a nice form like you did, but couldn't find one.

At the time I was actively programming in assembly I used a book as I didn't have any access to Internet from home. I don't know the English title of the book - it could be translated as "Programming in assembly language". That book included x86 instruction reference up to the set included in 80486 processors. It had similar tables showing opcodes both in binary and hexadecimal form and additionally number of cycles the instructions executed, what flags they changed.

Currently I rather just search Internet every time I need it instead of using one of bookmarked sources, although I should have a few sources bookmarked somewhere (at least in some of old backups). I generally remember how x86 instructions are constructed (at least non FPU instructions), so I know what to search for at the time when I search.
