ReadProcessMemory vs MiniDumpWriteDump - c#

I noticed that if I try to read the entirety of the process with ReadProcessMemory it takes VERY long. However, when doing a MiniDumpWriteDump, it happens in about 1 second.
Also, for some reason the byte array becomes corrupt when I store the entire process with ReadProcessMemory, while MiniDumpWriteDump's output doesn't.
The only problem is that with MiniDumpWriteDump I can't match the addresses/values in something like Cheat Engine. For example, a byte array search returns a different address.
MiniDumpWriteDump(pHandle, procID, fsToDump.SafeFileHandle.DangerousGetHandle(), 0x00000002, IntPtr.Zero, IntPtr.Zero, IntPtr.Zero);
ReadProcessMemory(pHandle, (UIntPtr)0, test, (UIntPtr)procs.PrivateMemorySize, IntPtr.Zero);
ReadProcessMemory Length = 597577728
Dump Length = 372053153

if I try to read the entirety of the process with ReadProcessMemory it takes VERY long.
MiniDumpWriteDump is fast because it's a highly optimized function written by Microsoft themselves.
A proper pattern scan function that checks each page's state and protection type via VirtualQueryEx(), with a limited number of wildcards, won't take more than 10 seconds, and in most cases will take less than 2 seconds.
This is C++ code, but the logic is the same in C#:
#include <iostream>
#include <windows.h>

int main()
{
    HANDLE hProc = OpenProcess(PROCESS_ALL_ACCESS, FALSE, GetCurrentProcessId());
    MEMORY_BASIC_INFORMATION mbi;
    char* addr = 0;
    while (VirtualQueryEx(hProc, addr, &mbi, sizeof(mbi)))
    {
        // only committed, accessible memory is worth scanning
        if (mbi.State == MEM_COMMIT && mbi.Protect != PAGE_NOACCESS)
        {
            char* buffer = new char[mbi.RegionSize];
            ReadProcessMemory(hProc, addr, buffer, mbi.RegionSize, nullptr);
            // run your pattern scan against buffer here
            delete[] buffer;
        }
        addr += mbi.RegionSize;
    }
    CloseHandle(hProc);
    return 0;
}
Notice we check for MEM_COMMIT: if the memory isn't committed, it's invalid. Similarly, if the protection is PAGE_NOACCESS we discard that memory as well. This simple technique yields only memory worth scanning, resulting in a fast scan. After you read each region into the local buffer, run your pattern scan code against it. Lastly, resolve the offset from the beginning of the region to the absolute address in the target process.
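For reference, a minimal C# sketch of the same technique (region walk plus a wildcard pattern scan) might look like the following. This is an illustration, not the answerer's code: the P/Invoke declarations are the common pinvoke.net-style ones, hProc is assumed to be opened with at least PROCESS_QUERY_INFORMATION | PROCESS_VM_READ, and the mask convention ('x' = compare, '?' = wildcard) is an assumption.
using System;
using System.Runtime.InteropServices;

class RegionScanner
{
    const uint MEM_COMMIT = 0x1000;
    const uint PAGE_NOACCESS = 0x01;

    // sequential layout; the CLR inserts the 64-bit alignment padding automatically
    [StructLayout(LayoutKind.Sequential)]
    struct MEMORY_BASIC_INFORMATION
    {
        public IntPtr BaseAddress;
        public IntPtr AllocationBase;
        public uint AllocationProtect;
        public IntPtr RegionSize;
        public uint State;
        public uint Protect;
        public uint Type;
    }

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr VirtualQueryEx(IntPtr hProcess, IntPtr lpAddress,
        out MEMORY_BASIC_INFORMATION lpBuffer, IntPtr dwLength);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool ReadProcessMemory(IntPtr hProcess, IntPtr lpBaseAddress,
        byte[] lpBuffer, IntPtr nSize, out IntPtr lpNumberOfBytesRead);

    // compare 'pattern' where mask[j] == 'x', skip where mask[j] == '?'
    static int FindPattern(byte[] data, byte[] pattern, string mask)
    {
        for (int i = 0; i + pattern.Length <= data.Length; i++)
        {
            bool found = true;
            for (int j = 0; j < pattern.Length; j++)
                if (mask[j] == 'x' && data[i + j] != pattern[j]) { found = false; break; }
            if (found) return i;
        }
        return -1;
    }

    public static IntPtr Scan(IntPtr hProc, byte[] pattern, string mask)
    {
        IntPtr addr = IntPtr.Zero;
        MEMORY_BASIC_INFORMATION mbi;
        int mbiSize = Marshal.SizeOf(typeof(MEMORY_BASIC_INFORMATION));
        while (VirtualQueryEx(hProc, addr, out mbi, (IntPtr)mbiSize) != IntPtr.Zero)
        {
            if (mbi.State == MEM_COMMIT && mbi.Protect != PAGE_NOACCESS)
            {
                var buffer = new byte[(long)mbi.RegionSize];
                IntPtr bytesRead;
                if (ReadProcessMemory(hProc, mbi.BaseAddress, buffer, mbi.RegionSize, out bytesRead))
                {
                    int hit = FindPattern(buffer, pattern, mask);
                    if (hit >= 0) // resolve the region offset to an absolute address
                        return (IntPtr)((long)mbi.BaseAddress + hit);
                }
            }
            addr = (IntPtr)((long)mbi.BaseAddress + (long)mbi.RegionSize);
        }
        return IntPtr.Zero;
    }
}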

Related

System.AccessViolationException when copying data to portable device after installing windows 10 creators update 1703

I am attempting to transfer content to a portable device using this code snippet:
IPortableDeviceValues values =
GetRequiredPropertiesForContentType(fileName, parentObjectId);
IStream tempStream;
uint optimalTransferSizeBytes = 0;
content.CreateObjectWithPropertiesAndData(
values,
out tempStream,
ref optimalTransferSizeBytes,
null);
System.Runtime.InteropServices.ComTypes.IStream targetStream =
(System.Runtime.InteropServices.ComTypes.IStream)tempStream;
try
{
using (var sourceStream =
new FileStream(fileName, FileMode.Open, FileAccess.Read))
{
var buffer = new byte[optimalTransferSizeBytes];
int bytesRead;
do
{
bytesRead = sourceStream.Read(
buffer, 0, (int)optimalTransferSizeBytes);
IntPtr pcbWritten = IntPtr.Zero;
if (bytesRead < (int)optimalTransferSizeBytes)
{
targetStream.Write(buffer, bytesRead, pcbWritten);
}
else
{
targetStream.Write(buffer, (int)optimalTransferSizeBytes, pcbWritten);
}
} while (bytesRead > 0);
}
targetStream.Commit(0);
}
finally
{
System.Runtime.InteropServices.Marshal.ReleaseComObject(tempStream);
}
When trying to execute targetStream.Write, a System.AccessViolationException occurred.
This is reproducible only on Windows 10 Creators Update (1703).
Could you please tell me what I am doing wrong?
Thanks in advance.
-- Skip to the 'Proposed Solution' below if you just want a fix!
Investigating further, the issue is in the low-level native API ISequentialStream::Write (which IStream inherits). The MSDN page for this is: https://msdn.microsoft.com/en-us/library/windows/desktop/aa380014%28v=vs.85%29.aspx?f=255&MSPPError=-2147217396
Note the text for the pcbWritten [out] argument:
'A pointer to a ULONG variable where this method writes the actual number of bytes written to the stream object. The caller can set this pointer to NULL, in which case this method does not provide the actual number of bytes written.'
MS introduced a bug into this API (as called in PortableDeviceApi.dll): it no longer checks whether pcbWritten is NULL before reading/writing that variable. Rather, it ALWAYS attempts to write the number of bytes written into it, so if the API is called with this pointer set to NULL, it barfs. I have proven this is the case in pure native code by changing the way the API is called:
Works:
DWORD bytesWritten;
IStream->Write(objectData, bytesRead, &bytesWritten);
Fails:
IStream->Write(objectData, bytesRead, NULL); // <-- Note the NULL
Up in .Net, IStream.Write is marshalled thus:
void Write(byte[] pv, int cb, IntPtr pcbWritten)
and the demo code (from where we all likely got our implementations!) was:
IntPtr pcbWritten = IntPtr.Zero; // <-- i.e. NULL
IStream.Write(buffer, bytesRead, pcbWritten);
Proposed Solution - Change code to:
IntPtr pcbWritten = Marshal.AllocHGlobal(4);
IStream.Write(buffer, bytesRead, pcbWritten);
Marshal.FreeHGlobal(pcbWritten);
This works around the issue - hopefully MS will fix it, to avoid us all having to redistribute our products! The entire code path is contained in PortableDeviceApi.dll (including the stream handling), which would explain why the entire world is not moaning about this issue and why it was not found during testing. NB: for multi-block copies in a while loop, I suspect the alloc/free can be done outside the while without issue. It should also be safe if MS does fix the bug.
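Putting that together with the question's loop, a sketch of the full workaround (allocation hoisted outside the while and freed in a finally; variable names follow the question's snippet):
IntPtr pcbWritten = Marshal.AllocHGlobal(4); // room for the ULONG the API writes; never IntPtr.Zero
try
{
    int bytesRead;
    while ((bytesRead = sourceStream.Read(buffer, 0, buffer.Length)) > 0)
    {
        targetStream.Write(buffer, bytesRead, pcbWritten);
    }
    targetStream.Commit(0);
}
finally
{
    Marshal.FreeHGlobal(pcbWritten);
}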
Credit: Alistair Brown (in our office) for fishing about, and allocating what should not need to be allocated and thus finding the issue.
Nigel
After banging my head on this for days, calling this on another thread solved the problem:
var t = Task.Run(() => { device.TransferContentToDevice(fileName, parent.Id); });
t.Wait();
I have this same issue also; it looks to be a bug/feature in the Creators Update build 1703 of Windows 10.
The issue has been reported via the 'Visual Studio and .Net Framework' stream on Microsoft Connect: https://connect.microsoft.com/VisualStudio/Feedback/Details/3135829
(there is no specific area for Windows 10)
I would encourage anyone with this issue to head to the above and indicate that you can reproduce it also.
I have a simple repeater for it which may be downloaded here: http://scratch.veletron.com/WPD_FAIL_REPEATER.ZIP

SetConsoleActiveScreenBuffer does not display screen buffer

I am currently trying to write a console application in C# with two screen buffers, which should be swapped back and forth (much like VSync on a modern GPU). Since the System.Console class does not provide a way to switch buffers, I had to P/Invoke several methods from kernel32.dll.
This is my current code, grossly simplified:
static void Main(string[] args)
{
IntPtr oldBuffer = GetStdHandle(-11); //Gets the handle for the default console buffer
IntPtr newBuffer = CreateConsoleScreenBuffer(0, 0x00000001, IntPtr.Zero, 1, 0); //Creates a new console buffer
/* Write data to newBuffer */
SetConsoleActiveScreenBuffer(newBuffer);
}
The following things occurred:
The screen remains empty, even though it should be displaying newBuffer.
When writing to oldBuffer instead of newBuffer, the data appears immediately. Thus, my way of writing into the buffer should be correct.
Upon calling SetConsoleActiveScreenBuffer(newBuffer), the error code is 6, which means invalid handle. This is strange, as the handle is not -1, which is what the documentation describes as invalid.
I should note that I very rarely worked with the Win32 API directly and have very little understanding of common Win32-related problems. I would appreciate any sort of help.
As IInspectable points out in the comments, you're setting dwDesiredAccess to zero. That gives you a handle with no access permissions. There are some edge cases where such a handle is useful, but this isn't one of them.
The only slight oddity is that you're getting "invalid handle" rather than "access denied". I'm guessing you're running Windows 7, so the handle is a user-mode object (a "pseudohandle") rather than a kernel handle.
At any rate, you need to set dwDesiredAccess to GENERIC_READ | GENERIC_WRITE as shown in the sample code.
Also, as Hans pointed out in the comments, the declaration on pinvoke.net was incorrect, specifying the last argument as a four-byte integer rather than a pointer-sized integer. I believe the correct declaration is
[DllImport("kernel32.dll", SetLastError = true)]
static extern IntPtr CreateConsoleScreenBuffer(
uint dwDesiredAccess,
uint dwShareMode,
IntPtr lpSecurityAttributes,
uint dwFlags,
IntPtr lpScreenBufferData
);
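With that declaration, the call from the question would then look something like this (the constants are the standard Win32 values; a sketch rather than tested code):
const uint GENERIC_READ = 0x80000000;
const uint GENERIC_WRITE = 0x40000000;
const uint FILE_SHARE_READ = 0x00000001;
const uint CONSOLE_TEXTMODE_BUFFER = 0x00000001;

IntPtr newBuffer = CreateConsoleScreenBuffer(
    GENERIC_READ | GENERIC_WRITE, // real access rights instead of zero
    FILE_SHARE_READ,
    IntPtr.Zero,                  // no security attributes
    CONSOLE_TEXTMODE_BUFFER,
    IntPtr.Zero);                 // lpScreenBufferData is reserved and must be NULL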

Setting Timestamps on files/directories is extremely slow

I'm working on a project, which requires to copy a lot of files and directories, while preserving their original timestamps. So I need to make many calls to the target's SetCreationTime(), SetLastWriteTime() and SetLastAccessTime() methods in order to copy the original values from source to target. As the screenshot below shows, these simple operations take up to 42% of the total computation time.
Since this is limiting my whole application's performance tremendously, I'd like to speed things up. I assume that each of these calls requires opening and closing a new stream to the file/directory. If that's the reason, I'd like to leave this stream open until I've finished writing all attributes. How do I accomplish this? I guess this would require the use of some P/Invoke.
Update:
I followed Lukas' advice to use the WinAPI method CreateFile(..) with the FILE_WRITE_ATTRIBUTES flag. In order to P/Invoke the mentioned method I created the following wrapper:
public class Win32ApiWrapper
{
[DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
private static extern SafeFileHandle CreateFile(string lpFileName,
[MarshalAs(UnmanagedType.U4)] FileAccess dwDesiredAccess,
[MarshalAs(UnmanagedType.U4)] FileShare dwShareMode,
IntPtr lpSecurityAttributes,
[MarshalAs(UnmanagedType.U4)] FileMode dwCreationDisposition,
[MarshalAs(UnmanagedType.U4)] FileAttributes dwFlagsAndAttributes,
IntPtr hTemplateFile);
public static SafeFileHandle CreateFileGetHandle(string path, int fileAttributes)
{
return CreateFile(path,
(FileAccess)(EFileAccess.FILE_WRITE_ATTRIBUTES | EFileAccess.FILE_WRITE_DATA),
0,
IntPtr.Zero,
FileMode.Create,
(FileAttributes)fileAttributes,
IntPtr.Zero);
}
}
The enums I used can be found here. This allowed me to do everything while opening the file only once: create the file, apply all attributes, set the timestamps, and copy the actual content from the original file.
FileInfo targetFile;
int fileAttributes;
IDictionary<string, long> timeStamps;
using (var hFile = Win32ApiWrapper.CreateFileGetHandle(targetFile.FullName, fileAttributes))
using (var targetStream = new FileStream(hFile, FileAccess.Write))
{
// copy file
Win32ApiWrapper.SetFileTime(hFile, timeStamps);
}
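The SetFileTime wrapper used above isn't shown in the update; a minimal sketch of what it could look like follows. The dictionary key names and the assumption that the stored values are FILETIME ticks (100 ns intervals since 1601-01-01 UTC) are mine, not the original poster's:
[DllImport("kernel32.dll", SetLastError = true)]
private static extern bool SetFileTime(SafeFileHandle hFile,
    ref long lpCreationTime, ref long lpLastAccessTime, ref long lpLastWriteTime);

public static void SetFileTime(SafeFileHandle hFile, IDictionary<string, long> timeStamps)
{
    // FILETIME is an 8-byte value, so marshalling each one as ref long works
    long creation = timeStamps["creation"];     // hypothetical key names
    long lastAccess = timeStamps["lastAccess"];
    long lastWrite = timeStamps["lastWrite"];
    if (!SetFileTime(hFile, ref creation, ref lastAccess, ref lastWrite))
        throw new System.ComponentModel.Win32Exception();
}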
Was it worth the effort? YES. It reduced computation time by ~40% from 86s to 51s.
(Screenshots: profiler results before and after optimization.)
I'm not a C# programmer and I don't know how those System.IO.FileSystemInfo methods are implemented. But I've made a few tests with the WIN32 API function SetFileTime(..) which will be called by C# at some point.
Here is the code snippet of my benchmark-loop:
#define NO_OF_ITERATIONS 10000
int iteration;
DWORD tStart;
SYSTEMTIME tSys;
FILETIME tFile;
HANDLE hFile;
DWORD tEllapsed;
iteration = NO_OF_ITERATIONS;
GetLocalTime(&tSys);
tStart = GetTickCount();
while (iteration)
{
tSys.wYear++;
if (tSys.wYear > 2020)
{
tSys.wYear = 2000;
}
SystemTimeToFileTime(&tSys, &tFile);
hFile = CreateFile("test.dat",
GENERIC_WRITE, // FILE_WRITE_ATTRIBUTES
0,
NULL,
OPEN_EXISTING,
FILE_ATTRIBUTE_NORMAL,
NULL);
if (hFile == INVALID_HANDLE_VALUE)
{
printf("CreateFile(..) failed (error: %d)\n", GetLastError());
break;
}
SetFileTime(hFile, &tFile, &tFile, &tFile);
CloseHandle(hFile);
iteration--;
}
tEllapsed = GetTickCount() - tStart;
I've seen that the expensive part of setting the file times is the opening/closing of the file. About 60% of the time is used to open the file and about 40% to close it (which needs to flush the modifications to disk). The above loop took about 9s for 10000 iterations.
A little research showed that calling CreateFile(..) with FILE_WRITE_ATTRIBUTES (instead of GENERIC_WRITE) is sufficient to change the time attributes of a file.
This modification speeds things up significantly! Now the same loop finishes within 2s for 10000 iterations. Since the number of iterations is quite small, I made a second run with 100000 iterations to get a more reliable time measurement:
FILE_WRITE_ATTRIBUTES: 5 runs with 100000 iterations: 12.7-13.2s
GENERIC_WRITE: 5 runs with 100000 iterations: 63.2-72.5s
Based on the above numbers, my guess is that the C# methods use the wrong access mode when opening the file to change the file times. Or some other C# behavior slows things down...
So maybe a solution to your speed issue is to implement a DLL which exports a C function which changes the file times using SetFileTime(..)? Or maybe you can even import the functions CreateFile(..), SetFileTime(..) and CloseHandle(..) directly to avoid calling the C# methods?
Good luck!

Why does a recursive call cause StackOverflow at different stack depths?

I was trying to figure out hands-on how tail calls are handled by the C# compiler.
(Answer: They're not. But the 64bit JIT(s) WILL do TCE (tail call elimination). Restrictions apply.)
So I wrote a small test using a recursive call that prints how many times it gets called before the StackOverflowException kills the process.
class Program
{
static void Main(string[] args)
{
Rec();
}
static int sz = 0;
static Random r = new Random();
static void Rec()
{
sz++;
//uncomment for faster, more imprecise runs
//if (sz % 100 == 0)
{
//some code to keep this method from being inlined
var zz = r.Next();
Console.Write("{0} Random: {1}\r", sz, zz);
}
//uncommenting this stops TCE from happening
//else
//{
// Console.Write("{0}\r", sz);
//}
Rec();
}
Right on cue, the program ends with SO Exception on any of:
'Optimize build' OFF (either Debug or Release)
Target: x86
Target: AnyCPU + "Prefer 32 bit" (this is new in VS 2012 and the first time I saw it. More here.)
Some seemingly innocuous branch in the code (see commented 'else' branch).
Conversely, using 'Optimize build' ON + (Target = x64 or AnyCPU with 'Prefer 32bit' OFF (on a 64bit CPU)), TCE happens and the counter keeps spinning up forever (ok, it arguably spins down each time its value overflows).
But I noticed a behaviour I can't explain in the StackOverflowException case: it never (?) happens at exactly the same stack depth. Here are the outputs of a few 32-bit runs, Release build:
51600 Random: 1778264579
Process is terminated due to StackOverflowException.
51599 Random: 1515673450
Process is terminated due to StackOverflowException.
51602 Random: 1567871768
Process is terminated due to StackOverflowException.
51535 Random: 2760045665
Process is terminated due to StackOverflowException.
And Debug build:
28641 Random: 4435795885
Process is terminated due to StackOverflowException.
28641 Random: 4873901326 //never say never
Process is terminated due to StackOverflowException.
28623 Random: 7255802746
Process is terminated due to StackOverflowException.
28669 Random: 1613806023
Process is terminated due to StackOverflowException.
The stack size is constant (defaults to 1 MB). The stack frames' sizes are constant.
So then, what can account for the (sometimes non-trivial) variation of stack depth when the StackOverflowException hits?
UPDATE
Hans Passant raises the issue of Console.WriteLine touching P/Invoke, interop and possibly non-deterministic locking.
So I simplified the code to this:
class Program
{
static void Main(string[] args)
{
Rec();
}
static int sz = 0;
static void Rec()
{
sz++;
Rec();
}
}
I ran it in Release/32bit/Optimization ON without a debugger. When the program crashes, I attach the debugger and check the value of the counter.
And it still isn't the same on several runs. (Or my test is flawed.)
UPDATE: Closure
As suggested by fejesjoco, I looked into ASLR (Address space layout randomization).
It's a security technique that makes it hard for buffer overflow attacks to find the precise location of (e.g.) specific system calls, by randomizing various things in the process address space, including the stack position and, apparently, its size.
The theory sounds good. Let's put it into practice!
In order to test this, I used a Microsoft tool specific for the task: EMET or The Enhanced Mitigation Experience Toolkit. It allows setting the ASLR flag (and a lot more) on a system- or process-level.
(There is also a system-wide, registry hacking alternative that I didn't try)
In order to verify the effectiveness of the tool, I also discovered that Process Explorer duly reports the status of the ASLR flag in the 'Properties' page of the process. Never saw that until today :)
Theoretically, EMET can (re)set the ASLR flag for a single process. In practice, it didn't seem to change anything (see above image).
However, I disabled ASLR for the entire system and (one reboot later) I could finally verify that indeed, the SO exception now always happens at the same stack depth.
BONUS
ASLR-related, in older news: How Chrome got pwned
I think it may be ASLR at work. You can turn off DEP to test this theory.
See here for a C# utility class to check memory information: https://stackoverflow.com/a/8716410/552139
By the way, with this tool, I found that the difference between the maximum and minimum stack size is around 2 KiB, which is half a page. That's weird.
Update: OK, now I know I'm right. I followed up on the half-page theory, and found this doc that examines the ASLR implementation in Windows: http://www.symantec.com/avcenter/reference/Address_Space_Layout_Randomization.pdf
Quote:
Once the stack has been placed, the initial stack pointer is further
randomized by a random decremental amount. The initial offset is
selected to be up to half a page (2,048 bytes)
And this is the answer to your question. ASLR takes away between 0 and 2048 bytes of your initial stack randomly.
This C++11 code prints the offset of the stack within the start page:
#include <Windows.h>
#include <iostream>
#include <atomic>
using namespace std;
#if !defined(__llvm__)
#pragma warning(disable: 6387) // handle could be NULL
#pragma warning(disable: 6001) // using uninitialized memory
#endif
int main()
{
SYSTEM_INFO si;
GetSystemInfo( &si );
static atomic<size_t> aPageSize( si.dwPageSize );
auto theThread = []( LPVOID ) -> DWORD
{
size_t pageSize = aPageSize.load( memory_order_relaxed );
return (DWORD)(pageSize - ((size_t)&pageSize & pageSize - 1));
};
constexpr unsigned ROUNDS = 10;
for( unsigned r = ROUNDS; r--; )
{
HANDLE hThread = CreateThread( nullptr, 0, theThread, nullptr, 0, nullptr );
WaitForSingleObject( hThread, INFINITE );
DWORD dwExit;
GetExitCodeThread( hThread, &dwExit );
CloseHandle( hThread );
cout << dwExit << endl;
}
}
Linux doesn't randomize the lower 12 bits by default:
#include <iostream>
#include <atomic>
#include <pthread.h>
#include <unistd.h>
using namespace std;
int main()
{
static atomic<size_t> aPageSize( sysconf( _SC_PAGESIZE ) );
auto theThread = []( void *threadParam ) -> void *
{
size_t pageSize = aPageSize.load( memory_order_relaxed );
return (void *)(pageSize - ((size_t)&pageSize & pageSize - 1));
};
constexpr unsigned ROUNDS = 10;
for( unsigned r = ROUNDS; r--; )
{
pthread_t pThread;
pthread_create( &pThread, nullptr, theThread, nullptr );
void *retVal;
pthread_join( pThread, &retVal );
cout << (size_t)retVal << endl;
}
}
The issue here is that randomizing the thread stack's starting address within a page doesn't make sense from a security standpoint. The issue is simply that when you have a 64 bit system with 47 bit userspace (on newer Intel-CPUs you even have 55 bit userspace) you have still 35 bits to randomize, i.e. about 34 billion placements of a stack. And it doesn't make sense from a performance standpoint either since cacheline aliasing on SMT-systems can't happen because caches have enough associativity today.
Change r.Next() to r.Next(10). The StackOverflowException should then occur at the same depth.
The generated strings then consume the same memory because they have the same size: r.Next(10).ToString().Length == 1 always, whereas r.Next().ToString().Length is variable.
The same applies if you use r.Next(100, 1000).

How to disable disk cache in C# by invoking the Win32 CreateFile API with FILE_FLAG_NO_BUFFERING

Everyone, I have a lot of files to write to disk every second, and I want to disable the disk cache to improve performance. A Google search turned up a solution: the Win32 CreateFile method with FILE_FLAG_NO_BUFFERING, and How to empty/flush Windows READ disk cache in C#?.
I wrote a little code to test whether it would work:
const int FILE_FLAG_NO_BUFFERING = unchecked((int)0x20000000);
[DllImport("KERNEL32", SetLastError = true, CharSet = CharSet.Auto, BestFitMapping = false)]
static extern SafeFileHandle CreateFile(
String fileName,
int desiredAccess,
System.IO.FileShare shareMode,
IntPtr securityAttrs,
System.IO.FileMode creationDisposition,
int flagsAndAttributes,
IntPtr templateFile);
static void Main(string[] args)
{
var handler = CreateFile(@"d:\temp.bin", (int)FileAccess.Write, FileShare.None, IntPtr.Zero, FileMode.Create, FILE_FLAG_NO_BUFFERING, IntPtr.Zero);
var stream = new FileStream(handler, FileAccess.Write, BlockSize); // BlockSize = 4096
byte[] array = Encoding.UTF8.GetBytes("hello,world");
stream.Write(array, 0, array.Length);
stream.Close();
}
When running this program, the application gets an exception: 'IO operation will not work. Most likely the file will become too long or the handle was not opened to support synchronous IO operations.'
Later, I found the article 'When you create an object with constraints, you have to make sure everybody who uses the object understands those constraints', but I couldn't fully understand it, so I changed my code to test:
var stream = new FileStream(handler, FileAccess.Write, 4096);
byte[] ioBuffer = new byte[4096];
byte[] array = Encoding.UTF8.GetBytes("hello,world");
Array.Copy(array, ioBuffer, array.Length);
stream.Write(ioBuffer, 0, ioBuffer.Length);
stream.Close();
It runs OK, but I just want the "hello,world" bytes, not the whole buffer. Trying to change the block size to 1 or another integer (not a multiple of 512) gives the same error. I also tried the Win32 WriteFile API and got the same error. Can someone help me?
The CreateFile() function in No Buffering mode imposes strict requirements on what may and may not be done. Having a buffer whose size is a multiple of the device sector size is one of them.
Now, you can improve file writes this way only if you use buffering in your code. If you want to write 10 bytes without buffering, No Buffering mode won't help you.
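To illustrate, here is a sketch of the usual unbuffered-write pattern, reusing the CreateFile import from the question. The 512-byte sector size is an assumption (a real implementation should query the volume), and strictly the buffer's memory address should be sector-aligned as well, which a managed byte[] does not guarantee. The idea: pad the data to a sector multiple, write it, then reopen the file without the flag and trim the padding.
const int SectorSize = 512; // assumption; query the volume for the real value

static void WriteUnbuffered(string path, byte[] data)
{
    int padded = (data.Length + SectorSize - 1) / SectorSize * SectorSize;
    var ioBuffer = new byte[padded]; // zero-padded to a sector multiple
    Array.Copy(data, ioBuffer, data.Length);

    using (var handle = CreateFile(path, (int)FileAccess.Write, FileShare.None,
               IntPtr.Zero, FileMode.Create, FILE_FLAG_NO_BUFFERING, IntPtr.Zero))
    using (var stream = new FileStream(handle, FileAccess.Write, SectorSize))
    {
        stream.Write(ioBuffer, 0, ioBuffer.Length);
    }

    // reopen without FILE_FLAG_NO_BUFFERING to trim the zero padding back to
    // the logical length (unbuffered writes can only end on a sector boundary)
    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
    {
        fs.SetLength(data.Length);
    }
}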
If I understood your requirements correctly, this is what I'd try first:
Create a queue with objects that have the data in memory and the target file on the disk.
You start writing the files first just into memory, and then on another thread start going through the queue, opening IO-completion-port-based FileStream handles (isAsync=True) - just don't open too many of them, as at some point you'll probably start losing performance due to cache thrashing etc. You need to profile to see what amount is optimal for your system and SSDs.
After each open, you can use the async FileStream Begin... methods to start writing data from memory to the files. isAsync imposes some requirements, so this may not be as easy to get working in every corner case as using FileStream normally.
Whether there will be any improvement from using one thread to create the files and another to write to them with the async API - that might only be the case if there is a possibility that creating/opening the files would block. SSDs perform various things internally to keep access to data fast, so when you start doing this sort of extreme performance work, there may be pronounced differences between SSD controllers. It's also possible that if the controller drivers aren't well implemented, the OS/Windows may start to feel sluggish or freeze. Hardware benchmark sites do not really stress this particular scenario (e.g. create and write x KB into a million files asap), and no doubt there are drivers out there that are slower than others.
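A rough sketch of that queue idea (illustrative names, not the answerer's code; WriteAsync stands in for the older Begin.../End... methods):
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Text;
using System.Threading.Tasks;

class QueuedWriter
{
    static void Main()
    {
        var queue = new BlockingCollection<Tuple<string, byte[]>>();

        // one consumer drains the queue using IOCP-backed (useAsync: true) streams
        var writer = Task.Run(async () =>
        {
            foreach (var item in queue.GetConsumingEnumerable())
            {
                using (var fs = new FileStream(item.Item1, FileMode.Create, FileAccess.Write,
                           FileShare.None, bufferSize: 4096, useAsync: true))
                {
                    await fs.WriteAsync(item.Item2, 0, item.Item2.Length);
                }
            }
        });

        // producers just enqueue data that is already in memory
        queue.Add(Tuple.Create(@"d:\temp.bin", Encoding.UTF8.GetBytes("hello,world")));
        queue.CompleteAdding();
        writer.Wait();
    }
}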
