About 2-3 weeks ago, I talked about how I made a Dolphin memory watcher and said I was ready to start work on the scanner.  Well, as expected, it was much easier and I even released the first beta of the RAM search a couple of days ago!

Yes, you can get the sources and the binaries from the GitHub page: https://github.com/aldelaro5/Dolphin-memory-engine

This is going to be a shorter post, but since doing the scanner was still quite the adventure, it's worth a post about how exactly something as complex as a scanner is done and how I managed to make it work like the Cheat Engine one with similar or even better performance.

How memory is searched at runtime

Imagine memory as a typical database with millions of entries, each containing a value, where the keys (the unique identifiers) are the addresses.  At the most basic level, this is how memory is treated when you want to search through it, except it has many particularities.

The first is that because the memory is analysed at runtime, you have to deal with constantly changing entries in that database I mentioned.  Something you queried a second ago might have gone through 60 changes since, so you constantly have to update your copy of the RAM by fetching it again.  The second is how abstract memory is: as I said in my memory watch post, memory has no meaning until you choose to give it one, yet every interpretation is in theory correct, it's just that only one of them makes the most sense.  You could read a string of text as floating point numbers and the result would make no sense, but by definition, you COULD do this.  Because of this, when you do a query on the memory, you HAVE to specify how you will choose to interpret it this time around.  Even better, if you think about it, this entirely changes what is considered equal, bigger or smaller; the results of the 3 most basic comparison operations depend on which interpretation you chose.
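
To make this concrete, here is a minimal standalone sketch (not taken from the project) where the exact same 4 bytes give 2 completely different but equally valid values depending on the interpretation:

#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
  // The exact same 4 bytes...
  const char bytes[4] = {0x42, 0x28, 0x00, 0x00};

  // ...interpreted as a 32 bit unsigned integer...
  uint32_t asInteger;
  std::memcpy(&asInteger, bytes, sizeof(asInteger));

  // ...or as a 32 bit floating point number.
  float asFloat;
  std::memcpy(&asFloat, bytes, sizeof(asFloat));

  // Both are "correct", they just answer wildly different things
  // (read big endian, as the GameCube/Wii store it, the float is 42.0f).
  std::cout << asInteger << " vs " << asFloat << std::endl;
  return 0;
}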

So I would describe searching through memory as querying a constantly changing, polymorphic database: you decide how you interpret it, and every query is only valid for a very short time.

Translating this to code

This is where C++ shines; it has features that made this FAR FAR FAR easier than in many other languages you could use because it lets you explicitly allocate memory, play with pointers and manage that memory however you want.  In fact, one of the biggest reasons I chose C++ to begin with is its advanced memory and typing capabilities; they are so powerful and this was definitely a use case that benefited from them.

Being able to allocate memory however I want made this scanner much more specialised towards Dolphin's inner workings.  For example, whether you scan a GameCube game, which uses only one region of memory, or a Wii game, which uses 2 separate regions, it works because I can decide at runtime how many bytes I need.  It turns out that in both cases I know exactly how much I need, but I can do even better: in the Wii case, I can put these regions right next to each other in the cache the scanner goes through, which reduces RAM usage and prevents errors when trying to read in between.  I just do 2 reads: the common region both consoles use goes at the beginning, and the Wii-only region goes right after where the first one ends.

  if (DolphinComm::DolphinAccessor::isMem2Enabled())
  {
    // Wii: cache MEM1 and MEM2 back to back so the scanner sees one
    // contiguous buffer; MEM2 is read into the part right after MEM1.
    ramSize = Common::MEM1_SIZE + Common::MEM2_SIZE;
    m_scanRAMCache = new char[ramSize];
    if (!DolphinComm::DolphinAccessor::readFromRAM(Common::dolphinAddrToOffset(Common::MEM2_START),
                                                   m_scanRAMCache + Common::MEM1_SIZE,
                                                   Common::MEM2_SIZE, false))
    {
      delete[] m_scanRAMCache;
      return Common::MemOperationReturnCode::operationFailed;
    }
  }
  else
  {
    // GameCube: only MEM1 exists.
    ramSize = Common::MEM1_SIZE;
    m_scanRAMCache = new char[ramSize];
  }

  // MEM1 is common to both consoles and always goes at the start of the cache.
  if (!DolphinComm::DolphinAccessor::readFromRAM(Common::dolphinAddrToOffset(Common::MEM1_START),
                                                 m_scanRAMCache, Common::MEM1_SIZE, false))
  {
    delete[] m_scanRAMCache;
    return Common::MemOperationReturnCode::operationFailed;
  }

Now, the one impact this has is that it's going to be pretty fast, because I am fetching the data into RAM, which is already quite fast for this.  I studied the code of Cheat Engine a lot throughout this (more on that later) and I found that, interestingly, they actually fetch the data into temporary files and perform their scans on those.  Because they use the disk, it's much slower, but they likely did this because Cheat Engine is made to hack PC games, and we know modern games can take gigabytes of RAM because their assets take a lot of space, so obviously they don't want to saturate the RAM with such a high quantity.  I, however, can permit myself to use the RAM: at most, a Wii scan would take 300MB of memory, which is not too bad considering Dolphin doesn't take NEARLY as much RAM (its recommended specs imply you would have AT THE VERY LEAST 4GB anyway, which is still pretty low).  Because of this, it can be way faster than CE, especially when doing the first scan through the entire memory.

As for the advanced typing, take a look at this piece of code which, honestly, I am very proud I got to work:

  // Reinterpret raw memory as a T, optionally negating it (this lets the
  // comparison subtract the user's offset instead of adding it).
  template <typename T>
  inline T convertMemoryToType(const char* memory, bool invert) const
  {
    T theType;
    std::memcpy(&theType, memory, sizeof(T));
    if (invert)
      theType *= -1;
    return theType;
  }

  template <typename T>
  inline CompareResult compareMemoryAsNumbersWithType(const char* first, const char* second,
                                                      const char* offset, bool offsetInvert,
                                                      bool bswapSecond) const
  {
    T firstByte;
    T secondByte;
    std::memcpy(&firstByte, first, sizeof(T));
    std::memcpy(&secondByte, second, sizeof(T));
    size_t size = sizeof(T);
    // The GameCube/Wii store values big endian; swap the first operand
    // (and optionally the second) to the host byte order before comparing.
    switch (size)
    {
    case 2:
    {
      u16 firstHalfword = 0;
      std::memcpy(&firstHalfword, &firstByte, sizeof(u16));
      firstHalfword = Common::bSwap16(firstHalfword);
      std::memcpy(&firstByte, &firstHalfword, sizeof(u16));
      if (bswapSecond)
      {
        std::memcpy(&firstHalfword, &secondByte, sizeof(u16));
        firstHalfword = Common::bSwap16(firstHalfword);
        std::memcpy(&secondByte, &firstHalfword, sizeof(u16));
      }
      break;
    }
    case 4:
    {
      u32 firstWord = 0;
      std::memcpy(&firstWord, &firstByte, sizeof(u32));
      firstWord = Common::bSwap32(firstWord);
      std::memcpy(&firstByte, &firstWord, sizeof(u32));
      if (bswapSecond)
      {
        std::memcpy(&firstWord, &secondByte, sizeof(u32));
        firstWord = Common::bSwap32(firstWord);
        std::memcpy(&secondByte, &firstWord, sizeof(u32));
      }
      break;
    }
    case 8:
    {
      u64 firstDoubleword = 0;
      std::memcpy(&firstDoubleword, &firstByte, sizeof(u64));
      firstDoubleword = Common::bSwap64(firstDoubleword);
      std::memcpy(&firstByte, &firstDoubleword, sizeof(u64));
      if (bswapSecond)
      {
        std::memcpy(&firstDoubleword, &secondByte, sizeof(u64));
        firstDoubleword = Common::bSwap64(firstDoubleword);
        std::memcpy(&secondByte, &firstDoubleword, sizeof(u64));
      }
      break;
    }
    }

    // A NaN never compares equal to anything, including itself.
    if (firstByte != firstByte)
      return CompareResult::nan;

    if (firstByte < (secondByte + convertMemoryToType<T>(offset, offsetInvert)))
      return CompareResult::smaller;
    else if (firstByte > (secondByte + convertMemoryToType<T>(offset, offsetInvert)))
      return CompareResult::bigger;
    else
      return CompareResult::equal;
  }

This (except for 2 types, string and array of bytes) is the ENTIRE comparison logic, no joke; it covers EVERY number type.  The idea is that at the end of the day, I want to know the comparison result between 2 pieces of memory interpreted using a type (like 32 bit unsigned integer, for example): they are equal, the first is smaller, the first is bigger, or the second was a NaN and I should straight up ignore it (fun fact: a NaN cannot even compare equal to itself, which explains the weird if condition).  To determine the type, I use another function that dispatches the appropriate template parameter depending on the scanner's type.  This saves probably hundreds of lines of code and makes everything easy because it works and it's very maintainable; I don't have to change 15 functions to fix a bug, I change it in one place.  Honestly, this is the piece of code I am the proudest of; it just shows how powerful templates can be, and until now, I hadn't found any great reason to use them.
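
That dispatch function isn't shown here, but the idea looks roughly like this (a sketch only; the enum and its values are made up for illustration, the real ones live in the scanner's type definitions):

  // Illustrative only: one entry per interpretation the scanner supports.
  enum class ScanType
  {
    unsignedByte,
    unsignedHalfword,
    unsignedWord,
    float32
    // ...and so on for every supported number type
  };

  inline CompareResult compareMemoryAsNumbers(ScanType type, const char* first,
                                              const char* second, const char* offset,
                                              bool offsetInvert, bool bswapSecond) const
  {
    // One switch picks the template parameter; the compiler then generates
    // the comparison code for every type from the single template above.
    switch (type)
    {
    case ScanType::unsignedByte:
      return compareMemoryAsNumbersWithType<u8>(first, second, offset, offsetInvert, bswapSecond);
    case ScanType::unsignedHalfword:
      return compareMemoryAsNumbersWithType<u16>(first, second, offset, offsetInvert, bswapSecond);
    case ScanType::unsignedWord:
      return compareMemoryAsNumbersWithType<u32>(first, second, offset, offsetInvert, bswapSecond);
    case ScanType::float32:
      return compareMemoryAsNumbersWithType<float>(first, second, offset, offsetInvert, bswapSecond);
    default:
      return CompareResult::equal;
    }
  }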

So, once this worked, I simply call it on every memory entry I find, which brings me to another very important point that, honestly, I was worried about.

Faster, FASTEEEEER!

This was the most performance-critical point of the entire project because I do SO MANY operations in such a short amount of time, and I obviously want this to go fast; even a second seems a bit long because I expect the user to run a lot of scans during a single search.

At the beginning, it worked, but with naive approaches like doing an allocation inside a 26 million iteration loop; this is where I learned that allocating memory is very slow.  No joke, I think I cut 90% of the time a scan took JUST by allocating as little as I could.  Then it got weird: apparently, putting the const keyword on parameters you don't change in the function is not only a protection for the developer, it also allows the compiler to optimise the function even further.  I had already taken precautions like inlining functions (an inlined function isn't actually called, its body is copy pasted wherever it's used so you don't have the overhead of a call), but even adding const made it better.
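
To illustrate the allocation problem, here is a simplified sketch (not the actual scan loop) of what the fix amounts to:

  // Naive: one heap allocation (and free) per iteration, millions of times.
  for (u32 i = 0; i < ramSize; i += sizeof(u32))
  {
    char* buffer = new char[sizeof(u32)];
    std::memcpy(buffer, m_scanRAMCache + i, sizeof(u32));
    // ...compare...
    delete[] buffer;
  }

  // Better: no allocation at all inside the loop, reuse one buffer.
  char buffer[sizeof(u32)];
  for (u32 i = 0; i < ramSize; i += sizeof(u32))
  {
    std::memcpy(buffer, m_scanRAMCache + i, sizeof(u32));
    // ...compare...
  }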

After that, and I honestly do not get this, but apparently GCC is a godlike compiler.  I used std::chrono on both MSVC 2017 and GCC to compare the performance with optimisations enabled and surprisingly, GCC was 10 TIMES FASTER!  I really was surprised; the exact same code behaved so differently on these 2 compilers, it's something I never thought would happen.  It was still very fast on both, but it's a difference of 1ms vs 0.1ms, which is crazy fast; not even CE could get this.
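
The measurement itself was nothing fancy, just std::chrono around the scan; a minimal sketch of the idea:

#include <chrono>
#include <iostream>

void timedScan()
{
  const auto start = std::chrono::high_resolution_clock::now();
  // ...run the scan here...
  const auto end = std::chrono::high_resolution_clock::now();
  const auto micros = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  std::cout << "Scan took " << micros.count() << " microseconds" << std::endl;
}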

About CE though, there's one thing I had to do to be sure I was doing the right things.  CE's sources are public and it was in my interest to know what they did, because my goal was to be at least as fast as CE, which from my years of experience was pretty darn fast with Dolphin.  Well, the problem is, to know this I had to read tons of Pascal code…..which sucked, but even worse, it's impossible to profile it without actually building it with a measurement of some kind.  So……I had to download Lazarus, a Free Pascal IDE and…..OH MY ARCEUS!

[Screenshot: the Cheat Engine project opened in the Lazarus IDE]
On the bottom is where you write your code; the UI is……everything else, and the thing I wanted to add is the little label saying the time to profile.  Can you find it?  It's like playing Where's Charlie 🙂

Okay, let me say this: I don't know how the heck such a big project can be managed with this, like what the heck is this mess?  Note, this is me JUST OPENING the project; I don't know what's going on with Lazarus, but this is by far the worst way to present an IDE I have EVER seen.  EVEN A COBOL IDE I USED AT COLLEGE LOOKED MORE PLEASING THAN THIS!

After I spent some time just freaking getting through this mess, I was annoyed to have to go through the mainUnit code, which is like…..9000 lines long, what?  ALL I needed to do was add a freaking time counter to the UI and that was it.  Luckily, it was simple, but man, I respect the dedication the author of CE has put into this over all these years; it works, sure, but maintainability definitely goes out the window.  By the way, the only reason I was even able to do this is that the college where I learned computer science made us learn Delphi, which is a derivative of Pascal, and……honestly, I hope you won't have to deal with it in college, it's not a good language tbh.

Anyway, the conclusion was clear: CE is surprisingly performant for what it's doing with the disk.  Even against my RAM search, written in a more optimised language and using RAM instead of the disk, CE was sometimes faster within the margin of error.  As expected, however, most of the time my scanner was slightly faster or equal, so it's a success!

Figuring out updates

This is where I encountered the most bugs.  I am messing with the memory here, so much CAN go wrong that a lot DID go wrong, and it's very complex code, so it's not that easy to debug.  I probably got the most segfaults I've ever had in such a short time lol.  The basic idea is that I learned that doing one massive 24MB read is MUCH faster than doing several small reads.  The problem is, because of what I said above about not wanting to allocate memory inside the scan loop, I only store the addresses.  This means that to fetch their current values, and even more to keep updating them after the scan, I have to fetch the entire new RAM into a cache, and later on, the UI queries the values from that already fetched cache.  These reads are so massive that I couldn't actually get the result list update timer down to 10ms like the others, so I made it 100ms, which, fun fact, after more research into the CE code, is the same timer they use in their memory viewer, indicating I was on the right track here.
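
The update cycle then looks roughly like this (a sketch of the idea only; the loop and member names here are simplified, and the real code handles both memory regions and error cases):

  // Every 100ms: refresh the whole cache with one big read...
  DolphinComm::DolphinAccessor::readFromRAM(Common::dolphinAddrToOffset(Common::MEM1_START),
                                            m_scanRAMCache, Common::MEM1_SIZE, false);

  // ...then the UI pulls every result's current value straight from the
  // cache, without doing one small read per address.
  for (const u32 address : m_resultAddresses)
  {
    const char* currentValue = m_scanRAMCache + Common::dolphinAddrToOffset(address);
    // ...format currentValue with the scan's type and show it in the list...
  }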

The result: you get a result list that appears once 1000 or fewer results are found.  For each of them, the address, the scanned value and the current value are displayed, and the current value is always kept up to date so the user can figure out, before adding the address to the watch list, which one likely makes the most sense.  I really wanted this to work like CE because it was one of the best features of the CE scanner.

And there we go!  That's how I was able to do a memory scanner.  It was much easier than the watcher, but it was surely an adventure 🙂

Going further with the project

I released a beta of the project, and from the comments I received from people who tried it, I consider this release a success.  People seem to really enjoy the reduced setup hassle, and of course, the multilevel pointer support is by far the most hyped feature.  So overall, I would say this project was, and will continue to be, definitely worth my time, and getting this far makes me even more motivated to continue.

If you are interested in the project, the GitHub page I linked has binary releases in the releases tab, and I am open to bug reports and feature requests, which you can file in the issues tab.  By the way, if you plan to propose a feature request, read the roadmap first; it lists all the features I already have planned.

With that, I am happy that a GOOD RAM search is finally coming to Dolphin; I can't wait to see the findings made with it 🙂

 
