Is there any way to enumerate a process with a given PID in Windows and get a list of all its open handles (locked files, etc.)?
EDIT: I don't care about the language. If it is in .NET, I'd be glad; if it is in WinAPI (C), that won't hurt. If it is in something else, I think I can rewrite it :-)
I searched extensively and found this article.
The article links to downloadable source code:
I tried the method in NtSystemInfoTest.cpp (from the downloaded source code) and it worked superbly.
void ListHandles( DWORD processID, LPCTSTR lpFilter )
The code has the following disclaimer:
// Written by Zoltan Csizmadia, zoltan_csizmadia#yahoo.com
// For companies(Austin,TX): If you would like to get my resume, send an email.
//
// The source is free, but if you want to use it, mention my name and e-mail address
//
//////////////////////////////////////////////////////////////////////////////////////
//
I hope this helps you.
The command-line 'Handle' tool from Sysinternals does this, if you just want a tool. This won't help you if you're looking for a code solution, though.
Here is an example using ZwQuerySystemInformation from the DDK. The DDK is now known as the "WDK" and is available with MSDN. If you don't have MSDN, apparently you can also get it from here.
I haven't tried it, I just googled your question.
#include "ntdll.h"
#include <stdlib.h>
#include <stdio.h>
#include "ntddk.h"
#define DUPLICATE_SAME_ATTRIBUTES 0x00000004
#pragma comment(lib,"ntdll.lib")
BOOL EnablePrivilege(PCSTR name)
{
TOKEN_PRIVILEGES priv = {1, {0, 0, SE_PRIVILEGE_ENABLED}};
LookupPrivilegeValue(0, name, &priv.Privileges[0].Luid);
HANDLE hToken;
OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES, &hToken);
AdjustTokenPrivileges(hToken, FALSE, &priv, sizeof priv, 0, 0);
BOOL rv = GetLastError() == ERROR_SUCCESS;
CloseHandle(hToken);
return rv;
}
int main(int argc, char *argv[])
{
if (argc == 1) return 0;
ULONG pid = strtoul(argv[1], 0, 0);
EnablePrivilege(SE_DEBUG_NAME);
HANDLE hProcess = OpenProcess(PROCESS_DUP_HANDLE, FALSE, pid);
ULONG n = 0x1000;
PULONG p = new ULONG[n];
while (NT::ZwQuerySystemInformation(NT::SystemHandleInformation, p, n * sizeof *p, 0)
== STATUS_INFO_LENGTH_MISMATCH)
delete [] p, p = new ULONG[n *= 2];
NT::PSYSTEM_HANDLE_INFORMATION h = NT::PSYSTEM_HANDLE_INFORMATION(p + 1);
for (ULONG i = 0; i < *p; i++) {
if (h[i].ProcessId == pid) {
HANDLE hObject;
if (NT::ZwDuplicateObject(hProcess, HANDLE(h[i].Handle), NtCurrentProcess(), &hObject,
0, 0, DUPLICATE_SAME_ATTRIBUTES)
!= STATUS_SUCCESS) continue;
NT::OBJECT_BASIC_INFORMATION obi;
NT::ZwQueryObject(hObject, NT::ObjectBasicInformation, &obi, sizeof obi, &n);
printf("%p %04hx %6lx %2x %3lx %3ld %4ld ",
h[i].Object, h[i].Handle, h[i].GrantedAccess,
int(h[i].Flags), obi.Attributes,
obi.HandleCount - 1, obi.PointerCount - 2);
n = obi.TypeInformationLength + 2;
NT::POBJECT_TYPE_INFORMATION oti = NT::POBJECT_TYPE_INFORMATION(new CHAR[n]);
NT::ZwQueryObject(hObject, NT::ObjectTypeInformation, oti, n, &n);
printf("%-14.*ws ", oti[0].Name.Length / 2, oti[0].Name.Buffer);
n = obi.NameInformationLength == 0
? MAX_PATH * sizeof (WCHAR) : obi.NameInformationLength;
NT::POBJECT_NAME_INFORMATION oni = NT::POBJECT_NAME_INFORMATION(new CHAR[n]);
NTSTATUS rv = NT::ZwQueryObject(hObject, NT::ObjectNameInformation, oni, n, &n);
if (NT_SUCCESS(rv))
printf("%.*ws", oni[0].Name.Length / 2, oni[0].Name.Buffer);
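printf("\n");
// Free the per-handle buffers allocated with new CHAR[n] above.
delete [] reinterpret_cast<CHAR*>(oti);
delete [] reinterpret_cast<CHAR*>(oni);
CloseHandle(hObject);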
}
}
delete [] p;
CloseHandle(hProcess);
return 0;
}
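The sizing loop in the example above follows a common pattern: call the query with a buffer, and if it reports that the buffer is too small, double the buffer and retry. Below is a portable sketch of just that pattern. `fakeQuery` and `kRequiredBytes` are hypothetical stand-ins for `ZwQuerySystemInformation` and the real data size; they are not Windows APIs.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a "fill my buffer" system call: it succeeds only
// when the caller's buffer is at least kRequiredBytes long.
static const std::size_t kRequiredBytes = 5000;

static bool fakeQuery(unsigned char* buffer, std::size_t size) {
    if (size < kRequiredBytes) return false;    // plays the role of STATUS_INFO_LENGTH_MISMATCH
    std::memset(buffer, 0xAB, kRequiredBytes);  // pretend to write the result
    return true;
}

// Doubles the buffer until the query succeeds. std::vector releases the old
// storage itself, so there is no manual delete[] to forget.
std::vector<unsigned char> queryWithGrowingBuffer(std::size_t initialSize) {
    std::vector<unsigned char> buffer(initialSize ? initialSize : 1);
    while (!fakeQuery(buffer.data(), buffer.size())) {
        buffer.resize(buffer.size() * 2);
    }
    return buffer;
}
```

Starting from 1024 bytes, the sketch grows the buffer to 2048, 4096 and finally 8192 bytes before the query succeeds.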
When I start my application, I try to figure out if there is another process of the application. I also try to figure out if it runs in a different user session.
So far so good, that's what it looks like in C#:
private static bool isThereAnotherInstance() {
string name = Path.GetFileNameWithoutExtension(Application.ExecutablePath);
Process[] pAll = Process.GetProcessesByName(name);
Process pCurrent = Process.GetCurrentProcess();
foreach (Process p in pAll) {
if (p.Id == pCurrent.Id) continue;
if (p.SessionId != pCurrent.SessionId) continue;
return true;
}
return false;
}
But the requirements have changed: I need this piece of code in C++ using the plain WinAPI.
So far, I'm able to find a process that has the same executable path by using CreateToolhelp32Snapshot, OpenProcess, etc.
The missing part is how to get the session ID of a process (the current process and other processes, in the current session and in other sessions).
How can I do this?
The ProcessIdToSessionId function maps a process ID to a session ID.
You note that this seems to require excessive permissions that aren't needed by .Net.
.Net does get some of its process data from HKEY_PERFORMANCE_DATA in the registry, but this doesn't include the session ID. The session ID is obtained using NtQuerySystemInformation to return an array of SYSTEM_PROCESS_INFORMATION structures. This structure is not well documented, but the session ID immediately follows the handle count (i.e. it is the field currently declared as BYTE Reserved4[4];). Microsoft do not guarantee that this will continue to be true in future versions of Windows.
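The buffer that NtQuerySystemInformation fills is not a plain fixed-stride array: each record stores the byte offset to the next one, and an offset of zero marks the last record. Here is a portable sketch of that traversal. It uses a hypothetical, simplified record (`FakeRecord`) rather than the real SYSTEM_PROCESS_INFORMATION layout, and `makeSampleBuffer` only exists to produce demonstration data.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical, simplified record: the real structure has many more fields.
struct FakeRecord {
    std::uint32_t nextEntryOffset;  // bytes from this record to the next; 0 = last
    std::uint32_t processId;
    std::uint32_t sessionId;
};

// Collects the process IDs of all records that belong to the given session.
std::vector<std::uint32_t> pidsInSession(const unsigned char* buffer,
                                         std::uint32_t sessionId) {
    std::vector<std::uint32_t> result;
    const unsigned char* current = buffer;
    for (;;) {
        FakeRecord rec;
        std::memcpy(&rec, current, sizeof rec);  // copy out to avoid unaligned reads
        if (rec.sessionId == sessionId) result.push_back(rec.processId);
        if (rec.nextEntryOffset == 0) break;     // last record reached
        current += rec.nextEntryOffset;
    }
    return result;
}

// Builds a buffer with differently sized gaps between records (demo data only).
std::vector<unsigned char> makeSampleBuffer() {
    const FakeRecord records[] = {{16, 100, 0}, {24, 200, 1}, {0, 300, 1}};
    std::vector<unsigned char> buffer(16 + 24 + sizeof(FakeRecord));
    std::size_t offset = 0;
    for (const FakeRecord& r : records) {
        std::memcpy(buffer.data() + offset, &r, sizeof r);
        offset += r.nextEntryOffset;
    }
    return buffer;
}
```

Calling `pidsInSession(makeSampleBuffer().data(), 1)` visits all three records and keeps the two whose session ID is 1.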
As mentioned by arx, ProcessIdToSessionId should do the job.
But unfortunately, in my case it tells me ACCESS_DENIED for the processes I'm interested in.
It DOES its job for the current process.
So here's my solution, using NtQuerySystemInformation.
.NET's Process class uses the same function internally.
typedef struct _SYSTEM_PROCESS_INFORMATION_BUG {
//...
} SYSTEM_PROCESS_INFORMATION_BUG;
typedef NTSTATUS (WINAPI *PNtQuerySystemInformation) (
IN SYSTEM_INFORMATION_CLASS SystemInformationClass,
OUT PVOID SystemInformation,
IN ULONG SystemInformationLength,
OUT PULONG ReturnLength OPTIONAL
);
#ifndef NT_ERROR
#define NT_ERROR(Status) ((ULONG)(Status) >> 30 == 3)
#endif
#define PROCESSINFO_BUFFERSIZE (256*1024)
DLL_EXPORT int GetProcessIdFromPath2(char *exePath, int flags) {
char exe[MAX_PATH], *exeName, file[MAX_PATH], *fileName;
DWORD pidCurrent, sessionIdCurrent;
int ret=-1;
strcpy(exe, exePath);
strupr(exe);
exeName=getFileName(exe);
pidCurrent = GetCurrentProcessId();
if (!ProcessIdToSessionId(pidCurrent, &sessionIdCurrent)) sessionIdCurrent=0;
HMODULE hNT = LoadLibrary("Ntdll.dll");
if (hNT) {
PNtQuerySystemInformation pNtQuerySystemInformation = (PNtQuerySystemInformation)GetProcAddress(hNT, "NtQuerySystemInformation");
if (pNtQuerySystemInformation) {
SYSTEM_PROCESS_INFORMATION_BUG* processInfo;
char *buffer = (char*)malloc(PROCESSINFO_BUFFERSIZE);
if (!buffer) {
ret=-3;
}
else {
char *current=buffer;
DWORD len;
int count=0;
NTSTATUS s = pNtQuerySystemInformation(SystemProcessInformation, buffer, PROCESSINFO_BUFFERSIZE, &len);
if (NT_ERROR(s)) {
ret=-2;
}
else {
ret=0;
while(1) {
processInfo = (SYSTEM_PROCESS_INFORMATION_BUG*)current;
if (processInfo->ImageName.Buffer!=NULL){
wcstombs(file, processInfo->ImageName.Buffer, MAX_PATH-1);
strupr(file);
fileName=getFileName(file);
if (strcmp(fileName, exeName)==0) {
if (processInfo->UniqueProcessId!=pidCurrent) {
if (processInfo->SessionId==sessionIdCurrent) {
ret = processInfo->UniqueProcessId;
}
}
}
}
if (processInfo->NextEntryOffset==0) break;
current+=processInfo->NextEntryOffset;
count++;
}
}
free(buffer);
buffer=NULL;
}
}
FreeLibrary(hNT);
}
return ret;
}
Code for listing all PIDs, session IDs and EXE names (à la Task Manager, sort of).
Works for me (Windows 7 64-bit, VS2012 Express).
#include <stdio.h>
#include <tchar.h>
#include <Windows.h>
#include <Winternl.h>
#pragma comment( lib, "ntdll.lib" )
typedef LONG KPRIORITY; // Thread priority
typedef struct _SYSTEM_PROCESS_INFORMATION_DETAILD {
ULONG NextEntryOffset;
ULONG NumberOfThreads;
LARGE_INTEGER SpareLi1;
LARGE_INTEGER SpareLi2;
LARGE_INTEGER SpareLi3;
LARGE_INTEGER CreateTime;
LARGE_INTEGER UserTime;
LARGE_INTEGER KernelTime;
UNICODE_STRING ImageName;
KPRIORITY BasePriority;
HANDLE UniqueProcessId;
HANDLE InheritedFromUniqueProcessId; // pointer-sized, like UniqueProcessId, so the later offsets also hold in 64-bit builds
ULONG HandleCount;
BYTE Reserved4[4];
PVOID Reserved5[11];
SIZE_T PeakPagefileUsage;
SIZE_T PrivatePageCount;
LARGE_INTEGER Reserved6[6];
} SYSTEM_PROCESS_INFORMATION_DETAILD, *PSYSTEM_PROCESS_INFORMATION_DETAILD;
int _tmain(int argc, _TCHAR* argv[]) {
SYSTEM_PROCESS_INFORMATION aSPI[ 1024 ];
// could ask for actual needed size size and malloc (with few extra new processes bonus...)
NTSTATUS nts = NtQuerySystemInformation( SystemProcessInformation, aSPI, sizeof( aSPI ), NULL );
if ( NT_ERROR( nts ) ) return -1;
char * pSPI = reinterpret_cast<char*>( &aSPI[ 0 ] );
while ( true ) {
SYSTEM_PROCESS_INFORMATION_DETAILD * pOneSPI = reinterpret_cast<SYSTEM_PROCESS_INFORMATION_DETAILD*>( pSPI );
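const WCHAR * pwch = pOneSPI->ImageName.Buffer;
if ( pwch == 0 || pOneSPI->ImageName.Length == 0 ) pwch = L"Unknown";
// ImageName is always a wide string, so print it with wprintf and %ls regardless of the UNICODE setting.
wprintf( L"PID %lu - SID %ld EXE %ls\n", static_cast<ULONG>( reinterpret_cast<ULONG_PTR>( pOneSPI->UniqueProcessId ) ), *reinterpret_cast<LONG*>( &pOneSPI->Reserved4 ), pwch );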
if ( pOneSPI->NextEntryOffset ) pSPI += pOneSPI->NextEntryOffset;
else break;
}
return 0;
}
Many thanks to #Oleg for documentation of the SPI structure on SO here
Sorry for what is probably a simple question, but I'm new to shared memory and trying to learn by example.
On the C++ side I receive a pair like this: const unsigned char * value, size_t length
On the C# side I need a regular C# string. What is the best way to do that using shared memory?
Using the string directly is not that easy.
If it were me, I'd try one of these approaches:
1. Simply make a copy of the string. System.Text.Encoding.Default.GetString can convert a byte array to a string.
In an unsafe code block (so that you can use pointer types), you could do the following:
(1) Create a byte array whose size is your "length":
byte[] buf = new byte[length];
(2) Copy your data into the array:
for(int i = 0; i < length; ++i) buf[i] = value[i];
(3) Get the string:
string what_you_want = System.Text.Encoding.Default.GetString(buf);
2. Write a class with a property "string what_you_want" that performs the above steps each time you access it.
Before any of this, you first need to get the value of that pair, for example via P/Invoke.
Edit: here is an example.
C++ code:
struct Pair {
int length;
unsigned char value[1024];
};
#include <windows.h>
#include <stdio.h>
int main()
{
const char* s = "hahaha";
HANDLE handle = CreateFileMappingW(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE, 0, sizeof(Pair), L"MySharedMemory");
struct Pair* p = (struct Pair*) MapViewOfFile(handle, FILE_MAP_READ|FILE_MAP_WRITE, 0, 0, sizeof(Pair));
if (p != 0) {
p->length = lstrlenA(s);
lstrcpyA((char*)p->value, s);
puts("plz start c# program");
getchar();
} else
puts("create shared memory error");
if (handle != NULL)
CloseHandle(handle);
return 0;
}
and C# code:
using System;
using System.IO.MemoryMappedFiles;
class Program
{
static void Main(string[] args)
{
MemoryMappedFile mmf = MemoryMappedFile.OpenExisting("MySharedMemory");
MemoryMappedViewStream mmfvs = mmf.CreateViewStream();
byte[] blen = new byte[4];
mmfvs.Read(blen, 0, 4);
int len = blen[0] + blen[1] * 256 + blen[2] * 65536 + blen[3] * 16777216;
byte[] strbuf = new byte[len];
mmfvs.Read(strbuf, 0, len);
string s = System.Text.Encoding.Default.GetString(strbuf);
Console.WriteLine(s);
}
}
This is just an example.
You should also add error checking.
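The C# reader above relies on the layout of Pair: a 4-byte length in little-endian byte order (as on x86 Windows), followed by the string bytes. Below is a portable C++ sketch of decoding that layout from a raw byte buffer. `decodeLengthPrefixed` is a hypothetical helper name, and the bounds checks are an addition that the example above does not have.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Reads a 4-byte little-endian length followed by that many bytes of text.
std::string decodeLengthPrefixed(const unsigned char* data, std::size_t size) {
    if (size < 4) throw std::runtime_error("buffer too small for length prefix");
    std::uint32_t len = static_cast<std::uint32_t>(data[0])
                      | static_cast<std::uint32_t>(data[1]) << 8
                      | static_cast<std::uint32_t>(data[2]) << 16
                      | static_cast<std::uint32_t>(data[3]) << 24;
    // Never trust the prefix: it must fit in what is left of the buffer.
    if (len > size - 4) throw std::runtime_error("length prefix exceeds buffer");
    return std::string(reinterpret_cast<const char*>(data + 4), len);
}
```

Validating the length before copying matters with shared memory, because the reader cannot assume that the writer has finished (or ever started) filling the region.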
In Delphi there is a function ExpandUNCFileName that takes in a filename and converts it into the UNC equivalent. It expands mapped drives and skips local and already expanded locations.
Samples
C:\Folder\Text.txt -> C:\Folder\Text.txt
L:\Folder\Sample.txt -> \\server\Folder1\Folder\Sample.txt Where L: is mapped to \\server\Folder1\
\\server\Folder\Sample.odf -> \\server\Folder\Sample.odf
Is there a simple way to do this in C#, or will I have to call the Windows API function WNetGetConnection and then manually check the ones that wouldn't get mapped?
Here's some C# code with a wrapper function LocalToUNC, which seems to work OK, though I haven't tested it extensively.
[DllImport("mpr.dll")]
static extern int WNetGetUniversalNameA(
string lpLocalPath, int dwInfoLevel, IntPtr lpBuffer, ref int lpBufferSize
);
// I think max length for UNC is actually 32,767
static string LocalToUNC(string localPath, int maxLen = 2000)
{
IntPtr lpBuff;
// Allocate the memory
try
{
lpBuff = Marshal.AllocHGlobal(maxLen);
}
catch (OutOfMemoryException)
{
return null;
}
try
{
int res = WNetGetUniversalNameA(localPath, 1, lpBuff, ref maxLen);
if (res != 0)
return null;
// lpbuff is a structure, whose first element is a pointer to the UNC name (just going to be lpBuff + sizeof(int))
return Marshal.PtrToStringAnsi(Marshal.ReadIntPtr(lpBuff));
}
catch (Exception)
{
return null;
}
finally
{
Marshal.FreeHGlobal(lpBuff);
}
}
P/Invoke WNetGetUniversalName().
I did it by modifying this code from www.pinvoke.net.
There is no built-in function in the BCL which will do the equivalent. I think the best option you have is pInvoking into WNetGetConnection as you suggested.
Try this code. It is written in Delphi .NET, so you will need to translate it to C#.
function WNetGetUniversalName; external;
[SuppressUnmanagedCodeSecurity, DllImport(mpr, CharSet = CharSet.Ansi, SetLastError = True, EntryPoint = 'WNetGetUniversalNameA')]
function ExpandUNCFileName(const FileName: string): string;
function GetUniversalName(const FileName: string): string;
const
UNIVERSAL_NAME_INFO_LEVEL = 1;
var
Buffer: IntPtr;
BufSize: DWORD;
begin
Result := FileName;
BufSize := 1024;
Buffer := Marshal.AllocHGlobal(BufSize);
try
if WNetGetUniversalName(FileName, UNIVERSAL_NAME_INFO_LEVEL,
Buffer, BufSize) <> NO_ERROR then Exit;
Result := TUniversalNameInfo(Marshal.PtrToStructure(Buffer,
TypeOf(TUniversalNameInfo))).lpUniversalName;
finally
Marshal.FreeHGlobal(Buffer);
end;
end;
begin
Result :=System.IO.Path.GetFullPath(FileName);
if (Length(Result) >= 3) and (Result[2] = ':') and (Upcase(Result[1]) >= 'A')
and (Upcase(Result[1]) <= 'Z') then
Result := GetUniversalName(Result);
end;
Bye.
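The Delphi routine above only calls WNetGetUniversalName when the full path starts with a drive letter followed by a colon; UNC paths and anything else are returned unchanged. Here is a sketch of that check in C++. `startsWithDriveLetter` is a hypothetical name, and plain ASCII paths are assumed.

```cpp
#include <cctype>
#include <string>

// True for paths such as "L:\Folder\Sample.txt": at least three characters,
// a letter A-Z (either case) first, and a colon second.
bool startsWithDriveLetter(const std::string& path) {
    if (path.size() < 3 || path[1] != ':') return false;
    const int upper = std::toupper(static_cast<unsigned char>(path[0]));
    return upper >= 'A' && upper <= 'Z';
}
```

Only paths that pass this check are candidates for mapped-drive expansion; local drives then simply make the WNet call fail, and the original name is kept.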
I'm working on porting some old ALP user accounts to a new ASP.NET solution, and I would like the users to be able to keep using their old passwords.
However, in order for that to work, I need to be able to compare the old hashes to a newly calculated one, based on a newly typed password.
I searched around, and found this as the implementation of crypt() called by PHP:
char *
crypt_md5(const char *pw, const char *salt)
{
MD5_CTX ctx,ctx1;
unsigned long l;
int sl, pl;
u_int i;
u_char final[MD5_SIZE];
static const char *sp, *ep;
static char passwd[120], *p;
static const char *magic = "$1$";
/* Refine the Salt first */
sp = salt;
/* If it starts with the magic string, then skip that */
if(!strncmp(sp, magic, strlen(magic)))
sp += strlen(magic);
/* It stops at the first '$', max 8 chars */
for(ep = sp; *ep && *ep != '$' && ep < (sp + 8); ep++)
continue;
/* get the length of the true salt */
sl = ep - sp;
MD5Init(&ctx);
/* The password first, since that is what is most unknown */
MD5Update(&ctx, (const u_char *)pw, strlen(pw));
/* Then our magic string */
MD5Update(&ctx, (const u_char *)magic, strlen(magic));
/* Then the raw salt */
MD5Update(&ctx, (const u_char *)sp, (u_int)sl);
/* Then just as many characters of the MD5(pw,salt,pw) */
MD5Init(&ctx1);
MD5Update(&ctx1, (const u_char *)pw, strlen(pw));
MD5Update(&ctx1, (const u_char *)sp, (u_int)sl);
MD5Update(&ctx1, (const u_char *)pw, strlen(pw));
MD5Final(final, &ctx1);
for(pl = (int)strlen(pw); pl > 0; pl -= MD5_SIZE)
MD5Update(&ctx, (const u_char *)final,
(u_int)(pl > MD5_SIZE ? MD5_SIZE : pl));
/* Don't leave anything around in vm they could use. */
memset(final, 0, sizeof(final));
/* Then something really weird... */
for (i = strlen(pw); i; i >>= 1)
if(i & 1)
MD5Update(&ctx, (const u_char *)final, 1);
else
MD5Update(&ctx, (const u_char *)pw, 1);
/* Now make the output string */
strcpy(passwd, magic);
strncat(passwd, sp, (u_int)sl);
strcat(passwd, "$");
MD5Final(final, &ctx);
/*
* and now, just to make sure things don't run too fast
* On a 60 Mhz Pentium this takes 34 msec, so you would
* need 30 seconds to build a 1000 entry dictionary...
*/
for(i = 0; i < 1000; i++) {
MD5Init(&ctx1);
if(i & 1)
MD5Update(&ctx1, (const u_char *)pw, strlen(pw));
else
MD5Update(&ctx1, (const u_char *)final, MD5_SIZE);
if(i % 3)
MD5Update(&ctx1, (const u_char *)sp, (u_int)sl);
if(i % 7)
MD5Update(&ctx1, (const u_char *)pw, strlen(pw));
if(i & 1)
MD5Update(&ctx1, (const u_char *)final, MD5_SIZE);
else
MD5Update(&ctx1, (const u_char *)pw, strlen(pw));
MD5Final(final, &ctx1);
}
p = passwd + strlen(passwd);
l = (final[ 0]<<16) | (final[ 6]<<8) | final[12];
_crypt_to64(p, l, 4); p += 4;
l = (final[ 1]<<16) | (final[ 7]<<8) | final[13];
_crypt_to64(p, l, 4); p += 4;
l = (final[ 2]<<16) | (final[ 8]<<8) | final[14];
_crypt_to64(p, l, 4); p += 4;
l = (final[ 3]<<16) | (final[ 9]<<8) | final[15];
_crypt_to64(p, l, 4); p += 4;
l = (final[ 4]<<16) | (final[10]<<8) | final[ 5];
_crypt_to64(p, l, 4); p += 4;
l = final[11];
_crypt_to64(p, l, 2); p += 2;
*p = '\0';
/* Don't leave anything around in vm they could use. */
memset(final, 0, sizeof(final));
return (passwd);
}
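The listing above calls _crypt_to64 without showing it. Judging by the C# port in this question, it emits n characters from a 64-character alphabet, consuming the value 6 bits at a time starting with the lowest bits. Here is a C++ sketch under that assumption. `cryptTo64` is a hypothetical name, and it returns a string instead of writing through a pointer.

```cpp
#include <string>

// Encodes the low 6*count bits of value, least significant group first.
std::string cryptTo64(unsigned long value, int count) {
    static const char alphabet[] =
        "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    std::string out;
    while (--count >= 0) {
        out += alphabet[value & 0x3f];  // lowest 6 bits select one character
        value >>= 6;
    }
    return out;
}
```

For example, `cryptTo64(1, 2)` yields "/." because the low 6 bits select index 1 ('/') and the remaining bits are zero ('.').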
And, here is my version in C#, along with an expected match.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Security.Cryptography;
using System.IO;
using System.Management;
namespace Test
{
class Program
{
static void Main(string[] args)
{
byte[] salt = Encoding.ASCII.GetBytes("$1$ls3xPLpO$Wu/FQ.PtP2XBCqrM.w847/");
Console.WriteLine("Hash: " + Encoding.ASCII.GetString(salt));
byte[] passkey = Encoding.ASCII.GetBytes("suckit");
byte[] newhash = md5_crypt(passkey, salt);
Console.WriteLine("Hash2: " + Encoding.ASCII.GetString(newhash));
byte[] newhash2 = md5_crypt(passkey, newhash);
Console.WriteLine("Hash3: " + Encoding.ASCII.GetString(newhash2));
Console.ReadKey(true);
}
public static byte[] md5_crypt(byte[] pw, byte[] salt)
{
MemoryStream ctx, ctx1;
ulong l;
int sl, pl;
int i;
byte[] final;
int sp, ep; //** changed pointers to array indices
MemoryStream passwd = new MemoryStream();
byte[] magic = Encoding.ASCII.GetBytes("$1$");
// Refine the salt first
sp = 0; //** Changed to an array index, rather than a pointer.
// If it starts with the magic string, then skip that
if (salt[0] == magic[0] &&
salt[1] == magic[1] &&
salt[2] == magic[2])
{
sp += magic.Length;
}
// It stops at the first '$', max 8 chars
for (ep = sp;
(ep < salt.Length) && //** Converted to array indices; checks for the end of the array rather than for null termination.
salt[ep] != (byte)'$' &&
ep < (sp + 8);
ep++)
continue;
// Get the length of the true salt
sl = ep - sp;
ctx = MD5Init();
// The password first, since that is what is most unknown
MD5Update(ctx, pw, pw.Length);
// Then our magic string
MD5Update(ctx, magic, magic.Length);
// Then the raw salt
MD5Update(ctx, salt, sp, sl);
// Then just as many characters of the MD5(pw,salt,pw)
ctx1 = MD5Init();
MD5Update(ctx1, pw, pw.Length);
MD5Update(ctx1, salt, sp, sl);
MD5Update(ctx1, pw, pw.Length);
final = MD5Final(ctx1);
for(pl = pw.Length; pl > 0; pl -= final.Length)
MD5Update(ctx, final,
(pl > final.Length ? final.Length : pl));
// Don't leave anything around in vm they could use.
for (i = 0; i < final.Length; i++) final[i] = 0;
// Then something really weird...
for (i = pw.Length; i != 0; i >>= 1)
if((i & 1) != 0)
MD5Update(ctx, final, 1);
else
MD5Update(ctx, pw, 1);
// Now make the output string
passwd.Write(magic, 0, magic.Length);
passwd.Write(salt, sp, sl);
passwd.WriteByte((byte)'$');
final = MD5Final(ctx);
// and now, just to make sure things don't run too fast
// On a 60 Mhz Pentium this takes 34 msec, so you would
// need 30 seconds to build a 1000 entry dictionary...
for(i = 0; i < 1000; i++)
{
ctx1 = MD5Init();
if((i & 1) != 0)
MD5Update(ctx1, pw, pw.Length);
else
MD5Update(ctx1, final, final.Length);
if((i % 3) != 0)
MD5Update(ctx1, salt, sp, sl);
if((i % 7) != 0)
MD5Update(ctx1, pw, pw.Length);
if((i & 1) != 0)
MD5Update(ctx1, final, final.Length);
else
MD5Update(ctx1, pw, pw.Length);
final = MD5Final(ctx1);
}
//** Section changed to use a memory stream, rather than a byte array.
l = (((ulong)final[0]) << 16) | (((ulong)final[6]) << 8) | ((ulong)final[12]);
_crypt_to64(passwd, l, 4);
l = (((ulong)final[1]) << 16) | (((ulong)final[7]) << 8) | ((ulong)final[13]);
_crypt_to64(passwd, l, 4);
l = (((ulong)final[2]) << 16) | (((ulong)final[8]) << 8) | ((ulong)final[14]);
_crypt_to64(passwd, l, 4);
l = (((ulong)final[3]) << 16) | (((ulong)final[9]) << 8) | ((ulong)final[15]);
_crypt_to64(passwd, l, 4);
l = (((ulong)final[4]) << 16) | (((ulong)final[10]) << 8) | ((ulong)final[5]);
_crypt_to64(passwd, l, 4);
l = final[11];
_crypt_to64(passwd, l, 2);
byte[] buffer = new byte[passwd.Length];
passwd.Seek(0, SeekOrigin.Begin);
passwd.Read(buffer, 0, buffer.Length);
return buffer;
}
public static MemoryStream MD5Init()
{
return new MemoryStream();
}
public static void MD5Update(MemoryStream context, byte[] source, int length)
{
context.Write(source, 0, length);
}
public static void MD5Update(MemoryStream context, byte[] source, int offset, int length)
{
context.Write(source, offset, length);
}
public static byte[] MD5Final(MemoryStream context)
{
long location = context.Position;
byte[] buffer = new byte[context.Length];
context.Seek(0, SeekOrigin.Begin);
context.Read(buffer, 0, (int)context.Length);
context.Seek(location, SeekOrigin.Begin);
return MD5.Create().ComputeHash(buffer);
}
// Changed to use a memory stream rather than a character array.
public static void _crypt_to64(MemoryStream s, ulong v, int n)
{
char[] _crypt_a64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".ToCharArray();
while (--n >= 0)
{
s.WriteByte((byte)_crypt_a64[v & 0x3f]);
v >>= 6;
}
}
}
}
What am I doing wrong? I am making some big assumptions about the workings of the MD5xxxx functions in the FreeBSD version, but it seems to work.
Is this not the actual version used by PHP? Does anyone have any insight?
EDIT:
I downloaded a copy of PHP's source code, and found that it uses the glibc library. So, I downloaded a copy of glibc's source code, found the __md5_crypt_r function, duplicated its functionality, and it came back with the EXACT same hashes as the FreeBSD version.
Now, I am pretty much stumped. Did PHP 4 use a different method than PHP 5? What is going on?
Alright, so here is the answer:
PHP uses the glibc implementation of the crypt function. (attached: C# implementation)
The reason my old passwords are not matching the hash is because the Linux box my old website (hosted by GoDaddy) sat on had a non-standard hashing algorithm. (Possibly to fix some of the WEIRD stuff done in the algorithm.)
However, I have tested the following implementation against glibc's unit tests and against a windows install of PHP. Both tests were passed 100%.
EDIT
Here is the link: (moved to a Github Gist)
https://gist.github.com/1092558
The crypt() function in PHP uses whatever hash algorithm the underlying operating system provides for hashing the data; have a look at its documentation. So the first step should be to find out how the data was hashed (which algorithm was used). Once you know that, it should be trivial to find the same algorithm for C#.
You can always system() (or whatever the C# static function is called) out to a php command-line script that does the crypt for you.
I would recommend forcing a password change though after successful login. Then you can have a flag that indicates if the user has changed. Once everyone has changed you can dump the php call.
Just reuse the php implementation... Make sure php's crypt libraries are in your system environment path...
You may need to update your interop method to make sure your string marshaling/charset is correct... you can then use the original hashing algorithm.
[DllImport("crypt.dll", CharSet=CharSet.Ansi)]
private static extern string crypt(string password, string salt);
public bool ValidLogin(string username, string password)
{
string hash = crypt(password, null);
...
}
It does not look trivial.
UPDATE: Originally I wrote: "The PHP Crypt function does not look like a standard hash. Why not? Who knows." As pointed out in the comments, the PHP crypt() is the same as the one used in BSD for passwd crypt. I don't know if that is a de jure standard, but it is a de facto standard. So.
I stand by my position that it does not appear to be trivial.
Rather than porting the code, you might consider keeping the old PHP running, and use it strictly for password validation of old passwords. As users change their passwords, use a new hashing algorithm, something a little more "open". You would have to store the hash, as well as the "flavor of hash" for each user.
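The "store the flavor of hash per user and upgrade as people log in" idea can be sketched as follows. Everything here is hypothetical: `UserRecord`, `HashFlavor`, `legacyHash` and `currentHash` are illustration names, and the two hashers are placeholders built on std::hash, which is NOT suitable for passwords. Real code would call the legacy crypt() and a modern password-hashing function instead.

```cpp
#include <functional>
#include <string>

enum class HashFlavor { Legacy, Current };

struct UserRecord {
    std::string hash;
    HashFlavor flavor;
};

// Placeholder hashers for illustration only; std::hash is not a password hash.
static std::string legacyHash(const std::string& pw) {
    return "L" + std::to_string(std::hash<std::string>{}(pw));
}
static std::string currentHash(const std::string& pw) {
    return "C" + std::to_string(std::hash<std::string>{}(pw + "#"));
}

// Verifies against whichever flavor is stored. After a successful legacy
// login the record is rewritten with the current scheme.
bool loginAndUpgrade(UserRecord& user, const std::string& password) {
    if (user.flavor == HashFlavor::Current)
        return user.hash == currentHash(password);
    if (user.hash != legacyHash(password)) return false;
    user.hash = currentHash(password);
    user.flavor = HashFlavor::Current;
    return true;
}
```

Once no record with the legacy flavor remains, the old verification path (and the PHP installation behind it) can be retired.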
I need to maintain a roster of connected clients that are very short-lived and frequently go up and down. Due to the potential number of clients, I need a collection that supports fast insert/delete. Suggestions?
C5 Generic Collection Library
The best implementations I have found in C# and C++ are these -- for C#/CLI:
http://www.itu.dk/research/c5/Release1.1/ITU-TR-2006-76.pdf
http://www.itu.dk/research/c5/
It's well researched, has extensible unit tests, and since February they also have implemented the common interfaces in .Net which makes it a lot easier to work with the collections. They were featured on Channel9 and they've done extensive performance testing on the collections.
If you are using data structures anyway, these researchers have a red-black tree implementation in their library, similar to what you find if you fire up Lutz Roeder's Reflector and have a look at System.Data's internal structures :p. Insert complexity: O(log(n)).
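To illustrate the O(log n) insert cost of a balanced tree, here is a roster sketched in C++ with std::set, which is typically implemented as a red-black tree. `ClientRoster` and the integer client IDs are hypothetical and not part of C5.

```cpp
#include <cstddef>
#include <set>

class ClientRoster {
public:
    // Returns false if the client was already present.
    bool connect(int clientId) { return clients_.insert(clientId).second; }
    // Returns false if the client was not present.
    bool disconnect(int clientId) { return clients_.erase(clientId) == 1; }
    bool isConnected(int clientId) const { return clients_.count(clientId) != 0; }
    std::size_t size() const { return clients_.size(); }

private:
    std::set<int> clients_;  // insert, erase and lookup are all O(log n)
};
```

If ordering is not needed, swapping std::set for std::unordered_set gives average O(1) operations with the same interface.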
Lock-free C++ collections
Then, if you can allow for some C++ interop and you absolutely need the speed and want as little overhead as possible, then these lock-free ADTs from Dmitriy V'jukov are probably the best you can get in this world, outperforming Intel's concurrent library of ADTs.
http://groups.google.com/group/lock-free
I've read some of the code and it's really the makings of someone well versed in how these things are put together. VC++ can do native C++ interop without annoying boundaries. http://www.swig.org/ can otherwise help you wrap C++ interfaces for consumption in .Net, or you can do it yourself through P/Invoke.
Microsoft's Take
They have written tutorials, this one implementing a rather unpolished skip-list in C#, and discussing other types of data structures. (There's a better SkipList at CodeProject, which is very polished and implements the interfaces in a well-behaved manner.) They also have a few data structures bundled with .NET, namely the HashTable/Dictionary<,> and HashSet. Of course there's the "ResizeArray"/List type as well, together with a stack and a queue, but they are all "linear" on search.
Google's perf-tools
If you wish to speed up the time it takes for memory-allocation you can use google's perf-tools. They are available at google code and they contain a very interesting multi-threaded malloc-implementation (TCMalloc) which shows much more consistent timing than the normal malloc does. You could use this together with the lock-free structures above to really go crazy with performance.
Improving response times with memoization
You can also use memoization on functions to improve performance through caching, something interesting if you're using e.g. F#. F# also allows C++ interop, so you're OK there.
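As a minimal illustration of memoization (in C++ rather than F#, and with a toy function chosen only for the example), the following caches each result the first time it is computed, so repeated calls become a hash lookup:

```cpp
#include <cstdint>
#include <unordered_map>

// Memoized Fibonacci: each value is computed once and then served from the cache.
// Not thread-safe as written; a shared cache would need locking.
std::uint64_t fibonacci(unsigned n) {
    static std::unordered_map<unsigned, std::uint64_t> cache;
    if (n < 2) return n;
    auto found = cache.find(n);
    if (found != cache.end()) return found->second;
    std::uint64_t value = fibonacci(n - 1) + fibonacci(n - 2);
    cache[n] = value;
    return value;
}
```

Without the cache this recursion takes exponential time; with it, each n is evaluated only once.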
O(k)
There's also the possibility of building something on your own using the research that has been done on Bloom filters, which allow O(k) lookup complexity, where k is a constant equal to the number of hash functions you have implemented. Google's BigTable makes use of Bloom filters. A Bloom filter tells you either that an element is definitely not in the set or that it probably is; with a low probability it reports a key that was never added (see the graph at Wikipedia -- it's approaching P(wrong_key) -> 0.01 as size is around 10000 elements), but you can get around this by implementing further hash functions or reducing the set.
I haven't searched for .Net implementations of this, but since the hashing calculations are independent you can use MS's performance team's implementation of Tasks to speed that up.
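A minimal Bloom filter with k = 2 can be sketched as follows. `TinyBloomFilter` is a hypothetical name; the two string hashes are the same multiply-by-33 and shift-based hashes used in the BloomFilter code later in this answer, here simply reduced modulo the bit count. It cannot delete keys.

```cpp
#include <cstddef>
#include <string>
#include <vector>

class TinyBloomFilter {
public:
    explicit TinyBloomFilter(std::size_t bitCount)
        : bits_(bitCount == 0 ? 1 : bitCount, false) {}

    // Sets one bit per hash function: O(k) with k = 2.
    void add(const std::string& key) {
        bits_[hashA(key) % bits_.size()] = true;
        bits_[hashB(key) % bits_.size()] = true;
    }

    // false: definitely never added. true: probably added (may be a false positive).
    bool mightContain(const std::string& key) const {
        return bits_[hashA(key) % bits_.size()] && bits_[hashB(key) % bits_.size()];
    }

private:
    static unsigned long hashA(const std::string& key) {
        unsigned long hash = 5381;
        for (unsigned char c : key) hash = hash * 33 + c;
        return hash;
    }
    static unsigned long hashB(const std::string& key) {
        unsigned long hash = 0;
        for (unsigned char c : key) hash = c + (hash << 6) + (hash << 16) - hash;
        return hash;
    }
    std::vector<bool> bits_;
};
```

A key that was added is always reported as present; the only possible error is a false positive, which becomes more likely as the bit array fills up.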
"My" take -- randomize to reach average O(log n)
As it happens I just did a coursework involving data-structures. In this case we used C++, but it's very easy to translate to C#. We built three different data-structures; a bloom-filter, a skip-list and random binary search tree.
See the code and analysis after the last paragraph.
Hardware-based "collections"
Finally, to make my answer "complete": if you truly need speed, you can use something like routing tables or content-addressable memory. In principle this gives you a very fast, O(1) "hash"-to-value lookup of your data.
Random Binary Search Tree/Bloom Filter C++ code
I would really appreciate feedback if you find mistakes in the code, or just pointers on how I can do it better (or with better usage of templates). Note that the bloom filter isn't like it would be in real life; normally you don't have to be able to delete from it, and it is then much more space-efficient than the hack I did to allow the delete to be tested.
DataStructure.h
#ifndef DATASTRUCTURE_H_
#define DATASTRUCTURE_H_
class DataStructure
{
public:
DataStructure() {countAdd=0; countDelete=0;countFind=0;}
virtual ~DataStructure() {}
void resetCountAdd() {countAdd=0;}
void resetCountFind() {countFind=0;}
void resetCountDelete() {countDelete=0;}
unsigned int getCountAdd(){return countAdd;}
unsigned int getCountDelete(){return countDelete;}
unsigned int getCountFind(){return countFind;}
protected:
unsigned int countAdd;
unsigned int countDelete;
unsigned int countFind;
};
#endif /*DATASTRUCTURE_H_*/
Key.h
#ifndef KEY_H_
#define KEY_H_
#include <string>
using namespace std;
const int keyLength = 128;
class Key : public string
{
public:
Key():string(keyLength, ' ') {}
Key(const char in[]): string(in){}
Key(const string& in): string(in){}
bool operator<(const string& other);
bool operator>(const string& other);
bool operator==(const string& other);
virtual ~Key() {}
};
#endif /*KEY_H_*/
Key.cpp
#include "Key.h"
bool Key::operator<(const string& other)
{
return compare(other) < 0;
};
bool Key::operator>(const string& other)
{
return compare(other) > 0;
};
bool Key::operator==(const string& other)
{
return compare(other) == 0;
}
BloomFilter.h
#ifndef BLOOMFILTER_H_
#define BLOOMFILTER_H_
#include <iostream>
#include <assert.h>
#include <vector>
#include <math.h>
#include "Key.h"
#include "DataStructure.h"
#define LONG_BIT 32
#define bitmask(val) (1UL << (LONG_BIT - ((val) % LONG_BIT) - 1))
// TODO: Implement RW-locking on the reads/writes to the bitmap.
class BloomFilter : public DataStructure
{
public:
BloomFilter(){}
BloomFilter(unsigned long length){init(length);}
virtual ~BloomFilter(){}
void init(unsigned long length);
void dump();
void add(const Key& key);
void del(const Key& key);
/**
* Returns true if the key IS BELIEVED to exist, false if it absolutely doesn't.
*/
bool testExist(const Key& key, bool v = false);
private:
unsigned long hash1(const Key& key);
unsigned long hash2(const Key& key);
bool exist(const Key& key);
void getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1, int& i2, const Key& key);
void getCountIndicies(const int i1, const unsigned long h1,
const int i2, const unsigned long h2, int& i1_c, int& i2_c);
vector<unsigned long> m_tickBook;
vector<unsigned int> m_useCounts;
unsigned long m_length; // number of bits in the bloom filter
unsigned long m_pockets; //the number of pockets
static const unsigned long m_pocketSize; //bits in each pocket
};
#endif /*BLOOMFILTER_H_*/
BloomFilter.cpp
#include "BloomFilter.h"
const unsigned long BloomFilter::m_pocketSize = LONG_BIT;
void BloomFilter::init(unsigned long length)
{
//m_length = length;
m_length = (unsigned long)((2.0*length)/log(2.0))+1;
m_pockets = (unsigned long)(ceil(double(m_length)/m_pocketSize));
m_tickBook.resize(m_pockets);
// my own (allocate nr bits possible to store in the other vector)
m_useCounts.resize(m_pockets * m_pocketSize);
unsigned long i; for(i=0; i< m_pockets; i++) m_tickBook[i] = 0;
for (i = 0; i < m_useCounts.size(); i++) m_useCounts[i] = 0; // my own
}
unsigned long BloomFilter::hash1(const Key& key)
{
unsigned long hash = 5381;
unsigned int i=0; for (i=0; i< key.length(); i++){
hash = ((hash << 5) + hash) + key.c_str()[i]; /* hash * 33 + c */
}
double d_hash = (double) hash;
d_hash *= (0.5*(sqrt(5)-1));
d_hash -= floor(d_hash);
d_hash *= (double)m_length;
return (unsigned long)floor(d_hash);
}
unsigned long BloomFilter::hash2(const Key& key)
{
unsigned long hash = 0;
unsigned int i=0; for (i=0; i< key.length(); i++){
hash = key.c_str()[i] + (hash << 6) + (hash << 16) - hash;
}
double d_hash = (double) hash;
d_hash *= (0.5*(sqrt(5)-1));
d_hash -= floor(d_hash);
d_hash *= (double)m_length;
return (unsigned long)floor(d_hash);
}
bool BloomFilter::testExist(const Key& key, bool v){
if(exist(key)) {
if(v) cout<<"Key "<< key<<" is in the set"<<endl;
return true;
}else {
if(v) cout<<"Key "<< key<<" is not in the set"<<endl;
return false;
}
}
void BloomFilter::dump()
{
cout<<m_pockets<<" Pockets: ";
// Print each pocket in hex; %lx matches the unsigned long element type.
unsigned long i; for(i=0; i< m_pockets; i++) printf("0x%lx ", m_tickBook[i]);
cout<<endl;
}
void BloomFilter::add(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
int i1_c, i2_c;
// tested!
getHashAndIndicies(h1, h2, i1, i2, key);
getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);
m_tickBook[i1] = m_tickBook[i1] | bitmask(h1);
m_tickBook[i2] = m_tickBook[i2] | bitmask(h2);
m_useCounts[i1_c] = m_useCounts[i1_c] + 1;
m_useCounts[i2_c] = m_useCounts[i2_c] + 1;
countAdd++;
}
void BloomFilter::del(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
int i1_c, i2_c;
if (!exist(key)) throw "You can't delete keys which are not in the bloom filter!";
// First we need the indicies into m_tickBook and the
// hashes.
getHashAndIndicies(h1, h2, i1, i2, key);
// The index of the counter is the index into the bitvector
// times the number of bits per vector item plus the offset into
// that same vector item.
getCountIndicies(i1, h1, i2, h2, i1_c, i2_c);
// We need to update the value in the bitvector in order to
// delete the key.
m_useCounts[i1_c] = (m_useCounts[i1_c] == 1 ? 0 : m_useCounts[i1_c] - 1);
m_useCounts[i2_c] = (m_useCounts[i2_c] == 1 ? 0 : m_useCounts[i2_c] - 1);
// Now, if we depleted the count for a specific bit, then set it to
// zero, by anding the complete unsigned long with the notted bitmask
// of the hash value
if (m_useCounts[i1_c] == 0)
m_tickBook[i1] = m_tickBook[i1] & ~(bitmask(h1));
if (m_useCounts[i2_c] == 0)
m_tickBook[i2] = m_tickBook[i2] & ~(bitmask(h2));
countDelete++;
}
bool BloomFilter::exist(const Key& key)
{
unsigned long h1, h2;
int i1, i2;
countFind++;
getHashAndIndicies(h1, h2, i1, i2, key);
return ((m_tickBook[i1] & bitmask(h1)) > 0) &&
((m_tickBook[i2] & bitmask(h2)) > 0);
}
/*
* Gets the values of the indicies for two hashes and places them in
* the passed parameters. The index is into m_tickBook.
*/
void BloomFilter::getHashAndIndicies(unsigned long& h1, unsigned long& h2, int& i1,
int& i2, const Key& key)
{
h1 = hash1(key);
h2 = hash2(key);
// Divide first, then cast: casting h1 itself to int would mangle large hash values.
i1 = (int)(h1/m_pocketSize);
i2 = (int)(h2/m_pocketSize);
}
/*
* Gets the values of the indicies into the count vector, which keeps
* track of how many times a specific bit-position has been used.
*/
void BloomFilter::getCountIndicies(const int i1, const unsigned long h1,
const int i2, const unsigned long h2, int& i1_c, int& i2_c)
{
i1_c = i1*m_pocketSize + h1%m_pocketSize;
i2_c = i2*m_pocketSize + h2%m_pocketSize;
}
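As a rough check on the sizing used in BloomFilter::init above (2n/ln(2) + 1 bits, two hash functions), the expected false-positive rate can be estimated with the standard approximation p = (1 - e^(-k*n/m))^k. This is only a sketch: it assumes the two hashes behave like independent uniform hashes, and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Expected false-positive rate of a Bloom filter with `bits` bits,
// `hashes` hash functions and `keys` inserted keys:
// p = (1 - e^(-hashes * keys / bits))^hashes  (standard approximation).
double falsePositiveRate(double bits, double hashes, double keys)
{
    return std::pow(1.0 - std::exp(-hashes * keys / bits), hashes);
}

// Bit count chosen the same way as BloomFilter::init: 2n/ln(2) + 1.
unsigned long bitsFor(unsigned long keys)
{
    return (unsigned long)((2.0 * keys) / std::log(2.0)) + 1;
}
```

For example, falsePositiveRate(bitsFor(1000), 2, 1000) comes out at roughly 0.25, so at the design capacity about one lookup in four for an absent key would be expected to report "believed to exist".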
** RBST.h **
#ifndef RBST_H_
#define RBST_H_
#include <iostream>
#include <assert.h>
#include <vector>
#include <math.h>
#include <stdio.h>  // printf
#include <stdlib.h> // rand, srand
#include <time.h>   // time
#include "Key.h"
#include "DataStructure.h"
#define BUG(str) printf("%s:%d FAILED SIZE INVARIANT: %s\n", __FILE__, __LINE__, str)
using namespace std;
class RBSTNode;
class RBSTNode: public Key
{
public:
RBSTNode(const Key& key):Key(key)
{
m_left =NULL;
m_right = NULL;
m_size = 1U; // the size of one node is 1.
}
virtual ~RBSTNode(){}
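// Assign the new key to this node (the original only returned a copy of the argument).
string setKey(const Key& key){ Key::operator=(key); return *this; }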
RBSTNode* left(){return m_left; }
RBSTNode* right(){return m_right;}
RBSTNode* setLeft(RBSTNode* left) { m_left = left; return this; }
RBSTNode* setRight(RBSTNode* right) { m_right =right; return this; }
#ifdef DEBUG
ostream& print(ostream& out)
{
out << "Key(" << *this << ", m_size: " << m_size << ")";
return out;
}
#endif
unsigned int size() { return m_size; }
void setSize(unsigned int val)
{
#ifdef DEBUG
this->print(cout);
cout << "::setSize(" << val << ") called." << endl;
#endif
if (val == 0) throw "Cannot set the size below 1, then just delete this node.";
m_size = val;
}
void incSize() {
#ifdef DEBUG
this->print(cout);
cout << "::incSize() called" << endl;
#endif
m_size++;
}
void decrSize()
{
#ifdef DEBUG
this->print(cout);
cout << "::decrSize() called" << endl;
#endif
if (m_size == 1) throw "Cannot decrement size below 1, then just delete this node.";
m_size--;
}
#ifdef DEBUG
unsigned int size(RBSTNode* x);
#endif
private:
RBSTNode(){}
RBSTNode* m_left;
RBSTNode* m_right;
unsigned int m_size;
};
class RBST : public DataStructure
{
public:
RBST() {
m_size = 0;
m_head = NULL;
srand(time(0));
};
virtual ~RBST() {};
/**
* Tries to add key into the tree and will return
* true for a new item added
* false if the key already is in the tree.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool add(const Key& key, bool v=false);
/**
* Same semantics as other add function, but takes a string,
* but diff name, because that'll cause an ambiguity because of inheritance.
*/
bool addString(const string& key);
/**
* Deletes a key from the tree if that key is in the tree.
* Will return
* true for success and
* false for failure.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool del(const Key& key, bool v=false);
/**
* Tries to find the key in the tree and will return
* true if the key is in the tree and
* false if the key is not.
*
* Will also have the side-effect of printing to the console if v=true.
*/
bool find(const Key& key, bool v = false);
unsigned int count() { return m_size; }
#ifdef DEBUG
int dump(char sep = ' ');
int dump(RBSTNode* target, char sep);
unsigned int size(RBSTNode* x);
#endif
private:
RBSTNode* randomAdd(RBSTNode* target, const Key& key);
RBSTNode* addRoot(RBSTNode* target, const Key& key);
RBSTNode* rightRotate(RBSTNode* target);
RBSTNode* leftRotate(RBSTNode* target);
RBSTNode* del(RBSTNode* target, const Key& key);
RBSTNode* join(RBSTNode* left, RBSTNode* right);
RBSTNode* find(RBSTNode* target, const Key& key);
RBSTNode* m_head;
unsigned int m_size;
};
#endif /*RBST_H_*/
** RBST.cpp **
#include "RBST.h"
bool RBST::add(const Key& key, bool v){
unsigned int oldSize = m_size;
m_head = randomAdd(m_head, key);
if (m_size > oldSize){
if(v) cout<<"Node "<<key<< " is added into the tree."<<endl;
return true;
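}else {
// Note: randomAdd() never rejects duplicates, so m_size always grows and
// this branch is not reached as the code stands.
if(v) cout<<"Node "<<key<< " is already in the tree."<<endl;
return false;
}
}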
bool RBST::addString(const string& key) {
return add(Key(key), false);
}
bool RBST::del(const Key& key, bool v){
unsigned oldSize= m_size;
m_head = del(m_head, key);
if (m_size < oldSize) {
if(v) cout<<"Node "<<key<< " is deleted from the tree."<<endl;
return true;
}
else {
if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
return false;
}
};
bool RBST::find(const Key& key, bool v){
RBSTNode* ret = find(m_head, key);
if (ret == NULL){
if(v) cout<< "Node "<<key<< " is not in the tree."<<endl;
return false;
}else {
if(v) cout<<"Node "<<key<< " is in the tree."<<endl;
return true;
}
};
#ifdef DEBUG
int RBST::dump(char sep){
int ret = dump(m_head, sep);
cout<<"SIZE: " <<ret<<endl;
return ret;
};
int RBST::dump(RBSTNode* target, char sep){
if (target == NULL) return 0;
int ret = dump(target->left(), sep);
cout<< *target<<sep;
ret ++;
ret += dump(target->right(), sep);
return ret;
};
#endif
/**
* Rotates the tree around target, so that target's left
* is the new root of the tree/subtree and updates the subtree sizes.
*
*(target) b (l) a
* / \ right / \
* a ? ----> ? b
* / \ / \
* ? x x ?
*
*/
RBSTNode* RBST::rightRotate(RBSTNode* target) // private
{
if (target == NULL) throw "Invariant failure, target is null"; // Note: may be removed once tested.
if (target->left() == NULL) throw "You cannot rotate right around a target whose left node is NULL!";
#ifdef DEBUG
cout <<"Right-rotating b-node ";
target->print(cout);
cout << " for a-node ";
target->left()->print(cout);
cout << "." << endl;
#endif
RBSTNode* l = target->left();
int as0 = l->size();
// re-order the sizes
l->setSize( l->size() + (target->right() == NULL ? 0 : target->right()->size()) + 1); // a.size += b.right.size + 1; where b.right may be null.
target->setSize( target->size() -as0 + (l->right() == NULL ? 0 : l->right()->size()) ); // b.size += -a_0_size + x.size where x may be null.
// swap b's left (for a)
target->setLeft(l->right());
// and a's right (for b's left)
l->setRight(target);
#ifdef DEBUG
cout << "A-node size: " << l->size() << ", b-node size: " << target->size() << "." << endl;
#endif
// return the new root, a.
return l;
};
/**
* Like rightRotate, but the other way. See docs for rightRotate(RBSTNode*)
*/
RBSTNode* RBST::leftRotate(RBSTNode* target)
{
if (target == NULL) throw "Invariant failure, target is null";
if (target->right() == NULL) throw "You cannot rotate left around a target whose right node is NULL!";
#ifdef DEBUG
cout <<"Left-rotating a-node ";
target->print(cout);
cout << " for b-node ";
target->right()->print(cout);
cout << "." << endl;
#endif
RBSTNode* r = target->right();
int bs0 = r->size();
// re-order the sizes
r->setSize(r->size() + (target->left() == NULL ? 0 : target->left()->size()) + 1);
target->setSize(target->size() -bs0 + (r->left() == NULL ? 0 : r->left()->size()));
// swap a's right (for b's left)
target->setRight(r->left());
// swap b's left (for a)
r->setLeft(target);
#ifdef DEBUG
cout << "Left-rotation done: a-node size: " << target->size() << ", b-node size: " << r->size() << "." << endl;
#endif
return r;
};
/**
* Adds a key as the new root of the subtree at target: the node is inserted
* at a leaf and rotated up one level at a time. Returns the new root.
* Does not check for an existing key and does not touch m_size;
* the caller (randomAdd) increments m_size.
*
* This function is not called from public methods, it's a helper function.
*/
RBSTNode* RBST::addRoot(RBSTNode* target, const Key& key)
{
countAdd++;
if (target == NULL) return new RBSTNode(key);
#ifdef DEBUG
cout << "addRoot(";
cout.flush();
target->print(cout) << "," << key << ") called." << endl;
#endif
if (*target < key)
{
target->setRight( addRoot(target->right(), key) );
target->incSize(); // count the new node first; the rotation's size arithmetic relies on it.
RBSTNode* res = leftRotate(target);
#ifdef DEBUG
if (target->size() != size(target))
BUG("in addRoot 1");
#endif
return res;
}
target->setLeft( addRoot(target->left(), key) );
target->incSize(); // count the new node first; the rotation's size arithmetic relies on it.
RBSTNode* res = rightRotate(target);
#ifdef DEBUG
if (target->size() != size(target))
BUG("in addRoot 2");
#endif
return res;
};
/**
* This function is called from the public add(key) function,
* and returns the new root node.
*/
RBSTNode* RBST::randomAdd(RBSTNode* target, const Key& key)
{
countAdd++;
if (target == NULL)
{
m_size++;
return new RBSTNode(key);
}
#ifdef DEBUG
cout << "randomAdd(";
target->print(cout) << ", \"" << key << "\") called." << endl;
#endif
int r = (rand() % target->size()) + 1;
// here is where we add the target as root!
if (r == 1)
{
m_size++; // TODO: Need to lock.
return addRoot(target, key);
}
#ifdef DEBUG
printf("randomAdd recursion part, ");
#endif
// otherwise, continue recursing!
if (*target <= key)
{
#ifdef DEBUG
printf("target <= key\n");
#endif
target->setRight( randomAdd(target->right(), key) );
target->incSize(); // TODO: Need to lock.
#ifdef DEBUG
if (target->right()->size() != size(target->right()))
BUG("in randomAdd 1");
#endif
}
else
{
#ifdef DEBUG
printf("target > key\n");
#endif
target->setLeft( randomAdd(target->left(), key) );
target->incSize(); // TODO: Need to lock.
#ifdef DEBUG
if (target->left()->size() != size(target->left()))
BUG("in randomAdd 2");
#endif
}
#ifdef DEBUG
printf("randomAdd return part\n");
#endif
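// No m_size++ here: it was already incremented once, where the node was created
// or added as root. Incrementing again would count the node once per recursion level.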
return target;
};
/////////////////////////////////////////////////////////////
///////////////////// DEL FUNCTIONS ////////////////////////
/////////////////////////////////////////////////////////////
/**
* Deletes a node with the passed key.
* Returns the root node.
* Decrements m_size if something was deleted.
*/
RBSTNode* RBST::del(RBSTNode* target, const Key& key)
{
countDelete++;
if (target == NULL) return NULL;
#ifdef DEBUG
cout << "del(";
target->print(cout) << ", \"" << key << "\") called." << endl;
#endif
RBSTNode* ret = NULL;
// found the node to delete
if (*target == key)
{
ret = join(target->left(), target->right());
m_size--;
delete target;
return ret; // return the newly built joined subtree!
}
// store a temporary size before recursive deletion.
unsigned int size = m_size;
if (*target < key) target->setRight( del(target->right(), key) );
else target->setLeft( del(target->left(), key) );
// if the previous recursion changed the size, we need to decrement the size of this target too.
if (m_size < size) target->decrSize();
#ifdef DEBUG
if (RBST::size(target) != target->size())
BUG("in del");
#endif
return target;
};
/**
* Joins the two subtrees represented by left and right
* by randomly choosing which to make the root, weighted on the
* size of the sub-tree.
*/
RBSTNode* RBST::join(RBSTNode* left, RBSTNode* right)
{
if (left == NULL) return right;
if (right == NULL) return left;
#ifdef DEBUG
cout << "join(";
left->print(cout);
cout << ",";
right->print(cout) << ") called." << endl;
#endif
// Find the chance that we use the left tree, based on its size over the total tree size.
// 3 s.d. randomness :-p e.g. 60.3% chance.
bool useLeft = ((rand()%1000) < (signed)((float)left->size()/(float)(left->size() + right->size()) * 1000.0));
RBSTNode* subtree = NULL;
if (useLeft)
{
subtree = join(left->right(), right);
left->setRight(subtree)
->setSize((left->left() == NULL ? 0 : left->left()->size())
+ subtree->size() + 1 );
#ifdef DEBUG
if (size(left) != left->size())
BUG("in join 1");
#endif
return left;
}
// Keep the argument order (smaller keys first), otherwise the BST ordering breaks.
subtree = join(left, right->left());
right->setLeft(subtree)
->setSize((right->right() == NULL ? 0 : right->right()->size())
+ subtree->size() + 1);
#ifdef DEBUG
if (size(right) != right->size())
BUG("in join 2");
#endif
return right;
};
/////////////////////////////////////////////////////////////
///////////////////// FIND FUNCTIONS ///////////////////////
/////////////////////////////////////////////////////////////
/**
* Tries to find the key in the tree starting
* search from target.
*
* Returns NULL if it was not found.
*/
RBSTNode* RBST::find(RBSTNode* target, const Key& key)
{
countFind++; // Could use private method only counting the first call.
if (target == NULL) return NULL; // not found.
if (*target == key) return target; // found (does string override ==?)
if (*target < key) return find(target->right(), key); // search for gt to the right.
return find(target->left(), key); // search for lt to the left.
};
#ifdef DEBUG
unsigned int RBST::size(RBSTNode* x)
{
if (x == NULL) return 0;
return 1 + size(x->left()) + size(x->right());
}
#endif
I'll save the SkipList for another time, since good SkipList implementations are already easy to find via the links and my version wasn't much different.
The graphs generated from the test-file are as follows:
Graph showing time taken to add new items for BloomFilter, RBST and SkipList.
graph http://haf.se/content/dl/addtimer.png
Graph showing time taken to find items for BloomFilter, RBST and SkipList
graph http://haf.se/content/dl/findtimer.png
Graph showing time taken to delete items for BloomFilter, RBST and SkipList
graph http://haf.se/content/dl/deltimer.png
As you can see, the randomized binary search tree was considerably faster than the SkipList, and the bloom filter lives up to its O(k) cost per operation, where k is the number of hash functions (two here).
Consider the hash-based collections for this, e.g. HashSet, Dictionary or Hashtable, which add and remove elements in constant time on average.
More information from the .NET Framework Developer's Guide:
Hashtable and Dictionary Collection Types
HashSet Collection Type
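To keep the examples in one language with the code earlier in the thread, here is a minimal C++ sketch of the same idea, using std::unordered_set as the counterpart of .NET's HashSet. The ClientRegistry class and the use of a name string as the client identity are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical registry of connected clients, identified by name.
class ClientRegistry
{
public:
    // Returns true if the name was newly added, false if it was already present.
    bool add(const std::string& name) { return m_names.insert(name).second; }

    // Returns true if the name was present and has been removed.
    bool remove(const std::string& name) { return m_names.erase(name) == 1; }

    bool contains(const std::string& name) const { return m_names.count(name) != 0; }

    std::size_t size() const { return m_names.size(); }

private:
    std::unordered_set<std::string> m_names; // average O(1) insert, erase and lookup
};
```

Adding the same name twice leaves a single entry, and neither add nor remove has to walk the collection.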
Well, how much do you need to query it? A linked list has fast insert/delete (at any position), but isn't as quick to search as (for example) a dictionary or sorted list. Alternatively, use a plain list whose cells each hold a flag/value pair, the flag meaning "still has a value". Just re-use logically empty cells before appending. Delete just clears the cell.
For reference types, "null" would do here. For value-types, Nullable<T>.
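The cell re-use idea could look roughly like the following C++ sketch. It is one possible reading, not a definitive implementation: the SlotList name is made up, and keeping a separate list of free indices (rather than scanning for an empty cell) is an added assumption to keep add at constant time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical slot list: each cell is a (used, value) pair.
template <typename T>
class SlotList
{
public:
    // Stores value in a cleared cell if one exists, otherwise appends.
    // Returns the index of the cell that now holds the value.
    std::size_t add(const T& value)
    {
        if (!m_free.empty()) {
            std::size_t index = m_free.back();
            m_free.pop_back();
            m_cells[index].used = true;
            m_cells[index].value = value;
            return index;
        }
        Cell cell = { true, value };
        m_cells.push_back(cell);
        return m_cells.size() - 1;
    }

    // Clears the cell without shifting anything; false if it was not in use.
    bool remove(std::size_t index)
    {
        if (index >= m_cells.size() || !m_cells[index].used) return false;
        m_cells[index].used = false;
        m_cells[index].value = T(); // drop the old value
        m_free.push_back(index);
        return true;
    }

    bool has(std::size_t index) const
    {
        return index < m_cells.size() && m_cells[index].used;
    }

    // Caller is expected to check has(index) first.
    const T& get(std::size_t index) const { return m_cells[index].value; }

    std::size_t capacity() const { return m_cells.size(); }

private:
    struct Cell { bool used; T value; };
    std::vector<Cell> m_cells;
    std::vector<std::size_t> m_free; // indices of cleared cells
};
```

The trade-off is that indices stay stable and delete is cheap, but finding a value by anything other than its index still means scanning the cells.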
You could use a Hashtable or a strongly typed Dictionary<TKey, TValue> with Client as the key type. The Client class can override GetHashCode (together with Equals) to provide faster hash code generation, or if using Hashtable you can optionally use an IHashCodeProvider.
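For comparison, the C++ counterpart of overriding GetHashCode is to hand the hash container a custom hash and equality functor. The sketch below assumes a made-up Client type identified by a numeric id; all names in it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical client type, identified by a numeric id.
struct Client
{
    int id;
    std::string name;
};

// Equality and hashing look only at the id, so hashing stays cheap
// and two Client values with the same id are treated as the same key.
struct ClientIdEqual
{
    bool operator()(const Client& a, const Client& b) const { return a.id == b.id; }
};

struct ClientIdHash
{
    std::size_t operator()(const Client& c) const { return std::hash<int>()(c.id); }
};

// Maps a client to some per-client state, here just a status string.
typedef std::unordered_map<Client, std::string, ClientIdHash, ClientIdEqual> ClientMap;
```

As with GetHashCode and Equals in .NET, the two functors have to agree: keys that compare equal must produce the same hash.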
How do you need to find the clients? Is a Tuple/Dictionary necessary? You're more than likely to find something that solves your problem in Jeffrey Richter's Power Collections library, which has lists, trees and most other data structures you can think of.
I was very impressed by the Channel9 interview with Peter Sestoft:
channel9.msdn.com/shows/Going+Deep/Peter-Sestoft-C5-Generic-Collection-Library-for-C-and-CLI/
He is a professor at the IT University of Copenhagen who helped to create the C5 Generic Collection Library:
www.itu.dk/research/c5/
It might be overkill or it might be just the speedy collection you were looking for ...
hth,
-Mike