January 6, 2012

Here’s how I created a more secure bitcoin wallet with paper keys

by admin — Categories: Uncategorized — 5 Comments

So I tested making a secure paper bitcoin key-pair today using some of this information as a starting point.  A key-pair contains a public key and a private key which are mathematically linked.  I wanted to do this because the private half of the key-pair needs high security.  If you are really going to move significant value into bitcoin, security of the key-pair is important.  Ironically, paper seems to be one of the most secure ways to prevent the theft of the key.  A printout can be physically secured, and getting to it takes a lot more work than breaking into a computer and stealing a file, which seems easy for a hacker.

A bitcoin key-pair is a way to create an “account” from which you can spend bitcoins and to which you can receive them.  The private key is required in order to spend money from the account.  The public key is required in order to send money to the account.  Generally one wants to keep the private key very secure so that other people can’t spend your bitcoins: like cash that you put in the mail, once you send bitcoin you can’t easily get it back.

A “wallet” contains zero or more key-pairs.

It is also important that you don’t lose your private key.  That is considerably easier to manage as you can simply create several paper copies and distribute them in diverse secure locations.

To create a key-pair which is secure against theft and in particular spyware I did the following:

  • I used VirtualBox for Mac and created a new virtual machine on my computer that was not connected to the Internet.
  • I installed Ubuntu 11.10 on it from an .iso image.
  • I downloaded a copy of bitaddress.org, a client-side bitcoin key-pair generator onto a thumb drive
  • I mounted the thumb drive in the virtual machine
  • I opened the offline copy of bitaddress in the virtual machine
  • I generated a key-pair in a browser
  • I printed it to a .pdf on the thumb drive
  • I mounted the thumb drive on the host Mac and printed out the paper wallet.
  • Then I used the “shred” utility to destroy the pdf file.
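
A minimal sketch of that last shredding step, assuming GNU shred is available. The function name, the pass count and the example path are illustrative assumptions, not a record of what was actually typed. Keep in mind that overwriting in place is not guaranteed to destroy the data on flash media or journaling filesystems.

```shell
#!/bin/sh
# Sketch: overwrite a file and then unlink it with GNU shred.
# Assumes GNU coreutils "shred" is on the PATH; three passes is an arbitrary choice.
secure_delete() {
    target="$1"
    if [ ! -f "$target" ]; then
        echo "secure_delete: no such file: $target" >&2
        return 1
    fi
    # -n 3: three overwrite passes, -z: final pass of zeros, -u: remove the file afterwards
    shred -n 3 -z -u -- "$target"
}

# Usage (hypothetical path to the PDF on the thumb drive):
# secure_delete /Volumes/THUMB/paper-wallet.pdf
```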

There are some places where this could still be insecure.  My key would be compromised:

  • … if my host computer had some kind of spyware on it that was doing a live screen capture and sending it somewhere.  The screen capture would have the private key on it in the pixels that were shown from the virtual machine.  If a bad guy got the screen capture she could spend my bitcoin.
  • … if the code that I downloaded from bitaddress was compromised to give me a non-new or non-random key-pair.  Then whoever wrote the code could just try the key-pair they knew I was going to generate to spend my bitcoin.
  • … if I didn’t delete the pdf with the bitcoin address on it securely.  Then the bad guy could recover the document from my Trash can or forensically from the hard drive.
  • … if Time Machine captured the pdf and stored it for automatic backup.  (I didn’t move it off the thumb drive to avoid this.)  Then the bad guy could find the key in my backed up hard drive data.
  • … if the print process were compromised (e.g., as described here).  This is similar to the screen capture attack but with printers.
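
One way to reduce the risk of a tampered download is to compare the file's hash with a value obtained from a source you trust. This is a sketch only: I did not do this in the steps above, it assumes a known-good SHA-256 value is published somewhere trustworthy, and the function names and file name are hypothetical.

```shell
#!/bin/sh
# Sketch: compare a file's SHA-256 digest with an expected value.
# Uses sha256sum where present (typical on Linux), otherwise shasum -a 256 (typical on macOS).
sha256_of() {
    if command -v sha256sum >/dev/null 2>&1; then
        sha256sum "$1" | awk '{print $1}'
    else
        shasum -a 256 "$1" | awk '{print $1}'
    fi
}

verify_sha256() {
    file="$1"
    expected="$2"   # lowercase hex digest from a trusted source (assumed to exist)
    actual=$(sha256_of "$file") || return 1
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Usage (hypothetical file name and digest):
# verify_sha256 bitaddress.org.html <expected-digest>
```

A matching hash only shows that the file is the one the publisher described. It says nothing about whether the published code itself is honest.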

Finally to see that my system worked, I sent a small amount of bitcoin to the public key on my paper wallet.

  • In the process of doing this I realized that the QR codes that I printed out on my paper wallet were too small to scan.  This is actually really important because when you send coins, you don’t want to mistype the address you are sending them to.  You can never get them back.  So it’s better to automatically enter them.
  • I found that, so far, it looks like only Mt. Gox will let you enter a private key to retrieve funds.  They treat it like you are redeeming a gift card, which I think is clever.  At first it wasn’t clear that it worked, though, so I started looking for another option and found pywallet.  While I was messing with pywallet, Mt. Gox posted the balance from the private key and transferred the money from it to my Mt. Gox account.  This is great as long as you trust Mt. Gox, because once they have the private key they can send the bitcoins anywhere they like.
  • I also used the pywallet program.  This Python script lets you add a private key to your local bitcoin wallet.  In the process I edited my wallet while my bitcoin client was running (despite the warnings) and corrupted my wallet.  I lost an hour restoring my backed up wallet.  (Thanks Time Machine!)  In this process I learned that pywallet doesn’t yet work with encrypted wallets.  So to recover funds from a private key, start a new unencrypted wallet, import your private key with pywallet, and then immediately transfer the bitcoins from the private key to a key in an encrypted wallet.
  • I also ended up with a transaction in my bitcoin client that I didn’t understand.  That made me nervous.

I wasn’t very happy with how roughly the first paper wallet went, so I tried it again using the lessons learned and it went much better:

  • I made a new paper key-pair in a virtual machine.
  • I printed it out as a pdf on the thumb drive
  • I moved the thumb drive to my Mac and verified that I could scan the codes when I printed it out at a higher scale.
  • I sent some bitcoin to the public key of the paper key-pair.
  • Once the transaction was confirmed I shut down the bitcoin client and moved my normal encrypted wallet.dat file away.
  • I restarted the Bitcoin client which created a new unencrypted wallet.dat to manage the paper key-pair.
  • I shut down the Bitcoin client
  • I loaded the paper private-key into the unencrypted wallet.dat using pywallet.py
  • I started the Bitcoin client with the unencrypted wallet.dat and sent the balance from the paper key-pair back to my normal wallet.
  • I shut down the Bitcoin client
  • I replaced the original encrypted wallet.dat file and verified that bitcoin went back to my original wallet.
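
The wallet swap in the steps above can be sketched as two small helpers. The backup file names are my own invention, and the default data directory is the usual macOS location, which is an assumption about your setup. Both helpers should only be run while the Bitcoin client is shut down.

```shell
#!/bin/sh
# Sketch: move the normal encrypted wallet aside, then put it back later.
# DATADIR defaults to the usual macOS location; override it if yours differs.
DATADIR="${DATADIR:-$HOME/Library/Application Support/Bitcoin}"

stash_wallet() {
    # Refuse to overwrite an earlier stash.
    if [ -e "$DATADIR/wallet.dat.encrypted-backup" ]; then
        echo "a stashed wallet already exists" >&2
        return 1
    fi
    mv "$DATADIR/wallet.dat" "$DATADIR/wallet.dat.encrypted-backup"
}

restore_wallet() {
    # Keep the temporary unencrypted wallet under another name instead of deleting it.
    # It still holds the paper private key, so destroy it once the funds have moved.
    if [ -e "$DATADIR/wallet.dat" ]; then
        mv "$DATADIR/wallet.dat" "$DATADIR/wallet.dat.paper-import"
    fi
    mv "$DATADIR/wallet.dat.encrypted-backup" "$DATADIR/wallet.dat"
}
```

Call stash_wallet before starting the client with a fresh wallet, and restore_wallet after the balance has been sent back.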

Mission Accomplished!

January 5, 2012

Building the current Bitcoin client on Lion

by admin — Categories: Uncategorized

So I went to update my bitcoin source code and found that it had significantly changed since the last time I posted about how to build it.  The most obvious change was the use of the Qt framework for the interface.  So now installing completely from source is beyond what I think is reasonable.  Instead there are some good instructions for how to build it from source plus MacPorts located on StackOverflow here, in particular the post by gavinanderson.

It’s much much simpler.  Also it appears the bitcoin client is in MacPorts as well now.  I still want the ability to monkey with the code for now.

August 8, 2011

Building the Bitcoin Client from Source on OS X Lion

by admin — Categories: Uncategorized — 2 Comments

by Don Patterson (donald.j.patterson.iii@gmail.com)

8/8/2011

These are the instructions for building a bitcoin client from source code on OS X Lion.

I’m sure I did not do everything the right way.   By that I mean, I don’t understand the logic of all the build systems involved, I don’t understand exactly what is strictly necessary, and I don’t know what is the fastest way to compile and run the bitcoin software.  But I spent 4 days trying to get this to work, so I’m hoping that I can save someone else the time and maybe it will help to adjust the build scripts by the people that maintain them.

Overview:

  • This was done on a 2.3 GHz Intel Core i7 MacBook Pro running Mac OS X 10.7
  • First you download the source code for bitcoin and other software libraries that it depends on.
  • Then you compile the libraries and create a binary with both i386 and x86_64 architectures supported.  Finally you compile the bitcoin client itself and optionally make it into a package.

Preliminaries:

  • First I will assume that you are building the code in a directory called <mybuilddir> and that bitcoin and the dependent projects are all siblings in the directory
  • This is not a line by line what-command-do-I-run-now kind of document.  You need to have some clue before trying to do this.
  • I put the compile jobs into script files so that I could repeat them during the trial and error process that I underwent.
  • If you see a “-j8” option, that makes the make process run 8 jobs at once.  Use a smaller number if you don’t have 8 cores.
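
Rather than hard-coding 8, the job count can be derived from the machine. This is a sketch; the helper name is made up, and it falls back to a single job if the core count cannot be determined.

```shell
#!/bin/sh
# Sketch: pick a value for make's -j option from the number of online CPUs.
# getconf _NPROCESSORS_ONLN works on both Linux and OS X;
# "sysctl -n hw.ncpu" is an OS X-specific alternative.
job_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=""
    case "$n" in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer: be conservative
    esac
    echo "$n"
}

# Usage:
# make -j"$(job_count)"
```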

Bitcoin source:

  • Get the main bitcoin source from https://github.com/bitcoin/bitcoin
    • You need to use git to do this.  A tutorial on git is beyond the scope of these instructions, but besides the command line options I like GitX
    • git clone https://github.com/bitcoin/bitcoin.git bitcoin
  • If you downloaded an archive instead of cloning, uncompress it into <mybuilddir>
  • Create a directory called <mybuilddir>/bitcoin/deps
    • <mybuilddir>/bitcoin should already be there

Dependencies:

boost 1.47.0

  • Download, build and install the boost C++ libraries
    • Get version 1.47.0 from http://www.boost.org/users/download/
    • Uncompress the files so that you have a directory called
      • <mybuilddir>/boost_1_47_0
    • run this bootstrap command:
      • ./bootstrap.sh --prefix=<mybuilddir>/bitcoin/deps/
    • I put these commands into a script and ran them:
      • export CFLAGS="-arch i386 -arch x86_64 -O3"
        export SRCDIR=""
        export BUILDDIR=""
        
        ./bjam --clean
        ./bjam architecture=x86 address-model=32_64 macosx-version=10.6 macosx-version-min=10.6 link=static runtime-link=static --toolset=darwin --prefix=<mybuilddir>/bitcoin/deps -j8 --variant=release -a -q install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
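
The scripts in this post clear CFLAGS and friends by hand at the end. An alternative, sketched here with a hypothetical helper name, is to set the flags inside a subshell so that nothing leaks into the calling shell. The flag values are the single-architecture ones used in the per-architecture scripts of this post.

```shell
#!/bin/sh
# Sketch: run one command with per-architecture compiler flags.
# The function body is a subshell, so the exports vanish when it returns.
with_arch_flags() (
    arch="$1"
    shift
    export CFLAGS="-arch $arch -O3"
    export LDFLAGS="-arch $arch -O3"
    "$@"
)

# Usage:
# with_arch_flags i386 make -j8
# with_arch_flags x86_64 make -j8
```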

openssl

  • Download, build and install the openssl libraries
    • Get version 1.0.0d from http://www.openssl.org/source/
    • Uncompress the files twice and rename the directories one at a time so that you have two directories with the same code in them called
      • <mybuilddir>/openssl-1.0.0d-i386
      • <mybuilddir>/openssl-1.0.0d-x86_64
    • Then build this project twice once for each architecture
    • I put these commands into a script and ran them for i386:
      • export CFLAGS="-arch i386 -O3"
        export LDFLAGS="-arch i386 -O3"
        export PREFIX="<mybuilddir>/bitcoin/deps"
        export SRCDIR=""
        export BUILDDIR=""
        
        make clean
        ./Configure \
        --prefix=<mybuilddir>/bitcoin/deps \
        --openssldir=<mybuilddir>/bitcoin/deps/openssl \
        darwin-i386-cc
        make -j8
        make install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
    • I put these commands into a script and ran them for x86_64:
      • export CFLAGS="-arch x86_64 -O3"
        export LDFLAGS="-arch x86_64 -O3"
        export PREFIX="<mybuilddir>/bitcoin/deps"
        export SRCDIR=""
        export BUILDDIR=""
        
        make clean
        ./Configure \
        --prefix=<mybuilddir>/bitcoin/deps \
        --openssldir=<mybuilddir>/bitcoin/deps/openssl \
        darwin64-x86_64-cc
        make -j8
        make install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
    • Then you need to merge the binaries into one library file
      • cd <mybuilddir>/bitcoin/deps
      • for i in libcrypto.a libssl.a; do lipo -arch i386 ../../openssl-1.0.0d-i386/$i -arch x86_64 ../../openssl-1.0.0d-x86_64/$i -o lib/$i -create; file lib/$i; done;
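
The merge step can also be wrapped in a small function that checks its inputs first. This is a sketch, not the command I ran: the function name is made up, and it uses the "lipo -create ... -output" form and lets lipo detect each input's architecture instead of naming them with -arch.

```shell
#!/bin/sh
# Sketch: merge one static library built for two architectures into a universal file.
merge_fat_lib() {
    name="$1"; i386_dir="$2"; x86_64_dir="$3"; out_dir="$4"
    for f in "$i386_dir/$name" "$x86_64_dir/$name"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
    mkdir -p "$out_dir" &&
    lipo -create "$i386_dir/$name" "$x86_64_dir/$name" -output "$out_dir/$name" &&
    lipo -info "$out_dir/$name"    # prints the architectures in the merged file
}

# Usage, run from <mybuilddir>/bitcoin/deps:
# merge_fat_lib libssl.a ../../openssl-1.0.0d-i386 ../../openssl-1.0.0d-x86_64 lib
```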

miniupnpc

  • Download, build and install the miniupnpc libraries
    • Get version 1.5 (exactly) from http://miniupnp.tuxfamily.org/files/
    • Uncompress the files twice and rename the directories one at a time so that you have two directories with the same code in them called
      • <mybuilddir>/miniupnpc-1.5-i386
      • <mybuilddir>/miniupnpc-1.5-x86_64
    • Then build this project twice once for each architecture
    • I put these commands into a script and ran them for i386:
      • export CFLAGS="-arch i386 -O3"
        export LDFLAGS="-arch i386 -O3"
        export PREFIX="<mybuilddir>/bitcoin/deps"
        export SRCDIR=""
        export BUILDDIR=""
        
        make clean
        make -j8
        make install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
    • I put these commands into a script and ran them for x86_64:
      • export CFLAGS="-arch x86_64 -O3"
        export LDFLAGS="-arch x86_64 -O3"
        export PREFIX="<mybuilddir>/bitcoin/deps"
        export SRCDIR=""
        export BUILDDIR=""
        
        make clean
        make -j8
        make install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
    • Then I had to move stuff from deps/usr/* to deps/*
      • cd <mybuilddir>/bitcoin/deps
      • mv usr/bin/* bin
      • rm -r -f include/miniupnpc
      • mv usr/include/miniupnpc include/
      • mv usr/lib/* lib/
      • rm -r -f usr
    • Then you need to merge the binaries into one library file
      • cd to <mybuilddir>/bitcoin/deps
      • for i in libminiupnpc.a; do lipo -arch i386 ../../miniupnpc-1.5-i386/$i -arch x86_64 ../../miniupnpc-1.5-x86_64/$i -o lib/$i -create; file lib/$i; done;

Berkeley DB

  • Download, build and install the Berkeley DB libraries
    • Get version 4.8.30 from http://freshmeat.net/projects/berkeleydb/
    • Uncompress the files twice and rename the directories one at a time so that you have two directories with the same code in them called
      • <mybuilddir>/db-4.8.30-i386
      • <mybuilddir>/db-4.8.30-x86_64
    • Then build this project twice once for each architecture
    • I put these commands into a script and ran them for i386 from the build_unix subdirectory:
      • export CFLAGS="-arch i386 -O3"
        export LDFLAGS="-arch i386 -O3"
        export PREFIX="<mybuilddir>/bitcoin/deps"
        export SRCDIR=""
        export BUILDDIR=""
        
        make clean
        ../dist/configure \
        --prefix=<mybuilddir>/bitcoin/deps/ \
        --enable-cxx \
        --enable-stl 
        
        make -j8
        make install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
    • I put these commands into a script and ran them for x86_64 from the build_unix subdirectory:
      • export CFLAGS="-arch x86_64 -O3"
        export LDFLAGS="-arch x86_64 -O3"
        export PREFIX="<mybuilddir>/bitcoin/deps"
        export SRCDIR=""
        export BUILDDIR=""
        
        make clean
        ../dist/configure \
        --prefix=<mybuilddir>/bitcoin/deps/ \
        --enable-cxx \
        --enable-stl 
        
        make -j8
        make install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
    • Then you need to merge the binaries into one library file
      • cd to <mybuilddir>/bitcoin/deps
      • for i in libdb_cxx.a libdb.a; do lipo -arch i386 ../../db-4.8.30-i386/build_unix/$i -arch x86_64 ../../db-4.8.30-x86_64/build_unix/$i -o lib/$i -create; file lib/$i; done;
      • for i in libdb_cxx-4.8.a; do lipo -arch i386 ../../db-4.8.30-i386/build_unix/.libs/$i -arch x86_64 ../../db-4.8.30-x86_64/build_unix/.libs/$i -o lib/$i -create; file lib/$i; done;

The last dependency is wxWidgets, a GUI framework.  This was hard because you have to change the code to get it to compile.

  • Download, build and install the wxWidgets libraries
    • Get version 2.9.2 from http://sourceforge.net/projects/wxwindows/files/2.9.2/
    • Uncompress the files and edit them before copying them into two directories.  You have to edit 3 files:
      • dataview.mm
      • utils_osx.cpp
      • window_osx.cpp
    • The corrections below are a unified diff of the changes.  I am not arguing that these are the right bits of code, just that it will compile and work for you afterwards.
      • --- wxWidgets-2.9.2/src/osx/cocoa/dataview.mm    2011-07-04 14:26:11.000000000 -0700
        +++ wxWidgets-2.9.2-i386/src/osx/cocoa/dataview.mm    2011-08-04 17:32:22.000000000 -0700
        @@ -664,7 +664,7 @@
        
        wxCHECK_MSG( model, nil, "Valid model in data source does not exist." );
        
        -    wxDataViewColumn* col(static_cast<wxDataViewColumn*>([[tableColumn identifier] pointer]));
        +    wxDataViewColumn* col(static_cast<wxDataViewColumn*>(static_cast<void*> ([[tableColumn identifier] pointer])));
        const unsigned colIdx = col->GetModelColumn();
        
        wxDataViewItem dataViewItem(wxDataViewItemFromItem(item));
        @@ -687,7 +687,7 @@
        {
        wxUnusedVar(outlineView);
        
        -    wxDataViewColumn* col(static_cast<wxDataViewColumn*>([[tableColumn identifier] pointer]));
        +    wxDataViewColumn* col(static_cast<wxDataViewColumn*>(static_cast<void *>([[tableColumn identifier] pointer])));
        
        col->GetRenderer()->
        OSXOnCellChanged(object, wxDataViewItemFromItem(item), col->GetModelColumn());
        @@ -1624,7 +1624,8 @@
        //
        -(void) outlineView:(NSOutlineView*)outlineView mouseDownInHeaderOfTableColumn:(NSTableColumn*)tableColumn
        {
        -    wxDataViewColumn* const col(static_cast<wxDataViewColumn*>([[tableColumn identifier] pointer]));
        +    wxDataViewColumn* const col(static_cast<wxDataViewColumn*>(static_cast<void *>([[tableColumn identifier] pointer])));
        +
        
        wxDataViewCtrl* const dvc = implementation->GetDataViewCtrl();
        
        @@ -1720,9 +1721,9 @@
        wxDataViewModel * const model = dvc->GetModel();
        
        wxDataViewColumn * const
        -        dvCol(static_cast<wxDataViewColumn*>(
        -                    [[tableColumn identifier] pointer]
        -                    )
        +          dvCol(static_cast<wxDataViewColumn*>(static_cast<void *>(
        +                     [[tableColumn identifier] pointer]
        +                    ))
        );
        const unsigned colIdx = dvCol->GetModelColumn();
        
        @@ -1760,7 +1761,7 @@
        {
        int const newColumnPosition = [[[notification userInfo] objectForKey:@"NSNewColumn"] intValue];
        
        -    wxDataViewColumn* const col(static_cast<wxDataViewColumn*>([[[[self tableColumns] objectAtIndex:newColumnPosition] identifier] pointer]));
        +    wxDataViewColumn* const col(static_cast<wxDataViewColumn*>(static_cast<void *>([[[[self tableColumns] objectAtIndex:newColumnPosition] identifier] pointer])));
        
        wxDataViewCtrl* const dvc = implementation->GetDataViewCtrl();
        
        @@ -1828,8 +1829,8 @@
        currentlyEditedRow = [self editedRow];
        
        wxDataViewColumn* const col =
        -        static_cast<wxDataViewColumn*>(
        -                [[[[self tableColumns] objectAtIndex:currentlyEditedColumn] identifier] pointer]);
        +        static_cast<wxDataViewColumn*>(static_cast<void *>(
        +                [[[[self tableColumns] objectAtIndex:currentlyEditedColumn] identifier] pointer]));
        
        wxDataViewCtrl* const dvc = implementation->GetDataViewCtrl();
        
        @@ -1866,8 +1867,8 @@
        if ( currentlyEditedColumn != -1 && currentlyEditedRow != -1 )
        {
        wxDataViewColumn* const col =
        -            static_cast<wxDataViewColumn*>(
        -                    [[[[self tableColumns] objectAtIndex:currentlyEditedColumn] identifier] pointer]);
        +            static_cast<wxDataViewColumn*>(static_cast<void *>(
        +                    [[[[self tableColumns] objectAtIndex:currentlyEditedColumn] identifier] pointer]));
        
        wxDataViewCtrl* const dvc = implementation->GetDataViewCtrl();
        
        @@ -1979,7 +1980,7 @@
        
        wxDataViewColumn* wxCocoaDataViewControl::GetColumn(unsigned int pos) const
        {
        -    return static_cast<wxDataViewColumn*>([[[[m_OutlineView tableColumns] objectAtIndex:pos] identifier] pointer]);
        +    return static_cast<wxDataViewColumn*>(static_cast<void *>([[[[m_OutlineView tableColumns] objectAtIndex:pos] identifier] pointer]));
        }
        
        int wxCocoaDataViewControl::GetColumnPosition(const wxDataViewColumn *columnPtr) const
        @@ -2323,7 +2324,7 @@
        
        for (UInt32 i=0; i<noOfColumns; ++i)
        if ([[columns objectAtIndex:i] sortDescriptorPrototype] != nil)
        -            return static_cast<wxDataViewColumn*>([[[columns objectAtIndex:i] identifier] pointer]);
        +            return static_cast<wxDataViewColumn*>(static_cast<void *>([[[columns objectAtIndex:i] identifier] pointer]));
        return NULL;
        }
        
        @@ -2358,7 +2359,7 @@
        indexRow    = [m_OutlineView rowAtPoint:   nativePoint];
        if ((indexColumn >= 0) && (indexRow >= 0))
        {
        -        columnPtr = static_cast<wxDataViewColumn*>([[[[m_OutlineView tableColumns] objectAtIndex:indexColumn] identifier] pointer]);
        +        columnPtr = static_cast<wxDataViewColumn*>(static_cast<void *>([[[[m_OutlineView tableColumns] objectAtIndex:indexColumn] identifier] pointer]));
        item      = wxDataViewItem([[m_OutlineView itemAtRow:indexRow] pointer]);
        }
        else
        
        ************************************************************************************************************************************
        --- wxWidgets-2.9.2/src/osx/utils_osx.cpp    2011-07-04 14:26:11.000000000 -0700
        +++ wxWidgets-2.9.2-i386/src/osx/utils_osx.cpp    2011-08-04 17:21:34.000000000 -0700
        @@ -66,10 +66,24 @@
        }
        
        #if wxOSX_USE_COCOA_OR_CARBON
        +
        // Returns depth of screen
        int wxDisplayDepth()
        {
        -    int theDepth = (int) CGDisplayBitsPerPixel(CGMainDisplayID());
        +    CGDirectDisplayID displayId = CGMainDisplayID();
        +    CGDisplayModeRef mode = CGDisplayCopyDisplayMode(displayId);
        +    size_t depth = 0;
        +
        +    CFStringRef pixEnc = CGDisplayModeCopyPixelEncoding(mode);
        +    if(CFStringCompare(pixEnc, CFSTR(IO32BitDirectPixels), kCFCompareCaseInsensitive) == kCFCompareEqualTo)
        +        depth = 32;
        +    else if(CFStringCompare(pixEnc, CFSTR(IO16BitDirectPixels), kCFCompareCaseInsensitive) == kCFCompareEqualTo)
        +        depth = 16;
        +    else if(CFStringCompare(pixEnc, CFSTR(IO8BitIndexedPixels), kCFCompareCaseInsensitive) == kCFCompareEqualTo)
        +        depth = 8;
        +
        +    //int theDepth = (int) CGDisplayBitsPerPixel(CGMainDisplayID());
        +    int theDepth = (int) depth;
        return theDepth;
        }
        
        ************************************************************************************************************************************
        --- wxWidgets-2.9.2/src/osx/window_osx.cpp    2011-07-04 14:26:11.000000000 -0700
        +++ wxWidgets-2.9.2-i386/src/osx/window_osx.cpp    2011-08-06 17:11:36.000000000 -0700
        @@ -1602,7 +1602,11 @@
        
        #if wxOSX_USE_COCOA_OR_CARBON
        
        -    InsetRect( &rect, -1 , -1 ) ;
        +    //InsetRect( &rect, -1 , -1 ) ;
        +    rect.top--;
        +    rect.bottom++;
        +    rect.left--;
        +    rect.right++;
        
        {
        CGRect cgrect = CGRectMake( rect.left , rect.top , rect.right - rect.left ,
    • Now copy that directory and rename the copies so that you have two directories with the same code in them called
      • <mybuilddir>/wxWidgets-2.9.2-i386
      • <mybuilddir>/wxWidgets-2.9.2-x86_64
    • Then build this project twice once for each architecture
    • I put these commands into a script and ran them for i386:
      • export CFLAGS="-arch i386 -O3"
        export LDFLAGS="-arch i386 -O3"
        export PREFIX="<mybuilddir>/bitcoin/deps"
        export SRCDIR="<mybuilddir>/wxWidgets-2.9.2-i386"
        export BUILDDIR="$SRCDIR/macbuild"
        
        cd "$SRCDIR" &&
        [ -f include/wx/hashmap.h.orig ] || cp include/wx/hashmap.h include/wx/hashmap.h.orig &&
        sed 's/if wxUSE_STL/if 0 \&\& wxUSE_STL/g' < include/wx/hashmap.h.orig > include/wx/hashmap.h &&
        [ -f include/wx/hashset.h.orig ] || cp include/wx/hashset.h include/wx/hashset.h.orig &&
        sed 's/if wxUSE_STL/if 0 \&\& wxUSE_STL/g' < include/wx/hashset.h.orig > include/wx/hashset.h 
        
        make clean
        rm -vrf "$BUILDDIR" && mkdir "$BUILDDIR" && cd "$BUILDDIR" &&
        ../configure --prefix="$PREFIX" \
        --with-osx_cocoa \
        --disable-shared \
        --disable-debug_flag \
        --with-macosx-version-min=10.5 \
        --enable-macosx_arch=i386 \
        --enable-stl \
        --enable-utf8 \
        --with-libjpeg=builtin \
        --with-libpng=builtin \
        --with-regex=builtin \
        --with-libtiff=builtin \
        --with-zlib=builtin \
        --with-expat=builtin \
        --with-macosx-sdk=/Developer/SDKs/MacOSX10.6.sdk 
        
        make -j8
        make install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
    • I put these commands into a script and ran them for x86_64:
      • export CFLAGS="-arch x86_64 -O3"
        export LDFLAGS="-arch x86_64 -O3"
        export PREFIX="<mybuilddir>/bitcoin/deps"
        export SRCDIR="<mybuilddir>/wxWidgets-2.9.2-x86_64"
        export BUILDDIR="$SRCDIR/macbuild"
        
        cd "$SRCDIR" &&
        [ -f include/wx/hashmap.h.orig ] || cp include/wx/hashmap.h include/wx/hashmap.h.orig &&
        sed 's/if wxUSE_STL/if 0 \&\& wxUSE_STL/g' < include/wx/hashmap.h.orig > include/wx/hashmap.h &&
        [ -f include/wx/hashset.h.orig ] || cp include/wx/hashset.h include/wx/hashset.h.orig &&
        sed 's/if wxUSE_STL/if 0 \&\& wxUSE_STL/g' < include/wx/hashset.h.orig > include/wx/hashset.h 
        
        make clean
        rm -vrf "$BUILDDIR" && mkdir "$BUILDDIR" && cd "$BUILDDIR" &&
        ../configure --prefix="$PREFIX" \
        --with-osx_cocoa \
        --disable-shared \
        --disable-debug_flag \
        --with-macosx-version-min=10.5 \
        --enable-macosx_arch=x86_64 \
        --enable-stl \
        --enable-utf8 \
        --with-libjpeg=builtin \
        --with-libpng=builtin \
        --with-regex=builtin \
        --with-libtiff=builtin \
        --with-zlib=builtin \
        --with-expat=builtin \
        --with-macosx-sdk=/Developer/SDKs/MacOSX10.6.sdk 
        
        make -j8
        make install
        
        export CFLAGS=""
        export LDFLAGS=""
        export PREFIX=""
        export SRCDIR=""
        export BUILDDIR=""
    • Then you need to merge the binaries into one library file
      • cd to <mybuilddir>/bitcoin/deps
      • for i in libwx_baseu-2.9.a libwx_baseu_net-2.9.a libwx_baseu_xml-2.9.a libwx_osx_cocoau_adv-2.9.a libwx_osx_cocoau_aui-2.9.a libwx_osx_cocoau_core-2.9.a libwx_osx_cocoau_gl-2.9.a libwx_osx_cocoau_html-2.9.a libwx_osx_cocoau_media-2.9.a libwx_osx_cocoau_propgrid-2.9.a libwx_osx_cocoau_qa-2.9.a libwx_osx_cocoau_ribbon-2.9.a libwx_osx_cocoau_richtext-2.9.a libwx_osx_cocoau_stc-2.9.a libwx_osx_cocoau_xrc-2.9.a libwxexpat-2.9.a libwxjpeg-2.9.a libwxpng-2.9.a libwxregexu-2.9.a libwxscintilla-2.9.a libwxtiff-2.9.a libwxzlib-2.9.a; do lipo -arch i386 ../../wxWidgets-2.9.2-i386/macbuild/lib/$i -arch x86_64 ../../wxWidgets-2.9.2-x86_64/macbuild/lib/$i -o lib/$i -create; file lib/$i; done;
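
Since the list of wxWidgets libraries is long, a loop over whatever the i386 build produced avoids typing each name. This is a sketch with a hypothetical function name. It assumes both builds produced the same file names, and it skips any library that has no x86_64 counterpart.

```shell
#!/bin/sh
# Sketch: merge every static library in the i386 lib directory with its x86_64 twin.
merge_all_libs() {
    i386_lib="$1"; x86_64_lib="$2"; out_lib="$3"
    for path in "$i386_lib"/*.a; do
        [ -f "$path" ] || continue    # the glob matched nothing
        name=$(basename "$path")
        if [ ! -f "$x86_64_lib/$name" ]; then
            echo "skipping $name: no x86_64 counterpart" >&2
            continue
        fi
        lipo -create "$path" "$x86_64_lib/$name" -output "$out_lib/$name" || return 1
    done
}

# Usage, run from <mybuilddir>/bitcoin/deps:
# merge_all_libs ../../wxWidgets-2.9.2-i386/macbuild/lib ../../wxWidgets-2.9.2-x86_64/macbuild/lib lib
```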

Finally bitcoin itself

  • Change a line in the makefile.osx file
    • DEPSDIR=<mybuilddir>/bitcoin/deps
  • I put these commands into a script and ran them for x86_64.  Please note that this will create a Bitcoin.app on your Desktop with the compiled binary in it.
    • export CFLAGS="-arch x86_64 -O3"
      export LDFLAGS="-arch x86_64 -O3"
      export PREFIX=""
      export SRCDIR="<mybuilddir>/bitcoin"
      export BUILDDIR=""
      
      make -f makefile.osx clean
      make -f makefile.osx -j8
      
      cp -pR $SRCDIR/contrib/Bitcoin.app ~/Desktop
      cp $SRCDIR/src/bitcoin ~/Desktop/Bitcoin.app/Contents/MacOS/bitcoin
      
      export CFLAGS=""
      export LDFLAGS=""
      export PREFIX=""
      export SRCDIR=""
      export BUILDDIR=""

So thanks for reading.  If you have any suggestions for changes or note any errors, let me know in the comments.  I assume that much of this will be added to the existing build scripts before too long.

References

Thanks to these posts for shoulders to stand on as well as the docs in the bitcoin/docs and bitcoin/contrib directories:

  1. wxWidgets:
    1. https://trac.macports.org/ticket/30272
    2. https://groups.google.com/d/topic/wx-dev/lVRXRw6y3ag/discussion

July 17, 2010

Resizing a VMWare LVM partition

by djp3 — Categories: General Info — 1 Comment
disk drive

Photo courtesy of Tazintosh

My problem was that I was running a virtual machine in VMWare that didn’t have enough hard drive space. The underlying host machine did have more hard drive space. I wanted to increase the amount of disk space on the hosted virtual machine. This was hard because Linux tools (the main one being gparted) did not support increasing the size of an LVM partition.

I understand how to do this now, but figuring out the right sequence of things to do required me to learn about LVM (logical volume management) and was a bear.

For starters:

The host machine was running Ubuntu Linux server edition (kernel 2.6.28).

The virtual machine was running Ubuntu Linux desktop edition (kernel 2.6.24).

The first thing to do was to increase the size of the disk that VMWare was giving to the virtual machine. I did that in the VMWare control panel while the virtual machine was shut down. Alternatively I could have created a new hard disk for the virtual machine. It would have ended up being the same, although I don’t know if there is a performance difference between the two.

Next I booted the virtual machine off of a live Ubuntu 10.04 iso image. This gave me a gui environment running on the virtual machine, but didn’t mount the disk.

If gparted supported resizing an LVM partition I would have been nearly done. I could have opened gparted and resized the partition into the newly available space. gparted did not support resizing LVMs, so I had to learn more.

What I needed to do next was explained by this image taken from here.

lvm_df_2.jpg

From the GUI, I opened a terminal window and got root access with “sudo su”.

I created a new partition in the empty space using “fdisk”. I set the partition type to “8e” which is “LVM”. Then I wrote out the changes. At this point I had to reboot my virtual machine back into the live CD environment in order to get the changes to be seen when I ran “fdisk -l”.

Next I made the new partition into a physical volume with the command “pvcreate /dev/sda3” (I ran “apt-get install lvm2” first). I verified that it worked with “pvdisplay”.

Then I added the new physical volume to the existing volume group with “vgextend Ubuntu /dev/sda3”. I verified that it worked with “vgdisplay”.

Then I extended the logical volume to include the new space with “lvextend -l +100%FREE /dev/Ubuntu/root”. I verified that it worked with “lvdisplay”.

Then I extended the underlying filesystem into the new space after cleaning it up with “e2fsck -f /dev/Ubuntu/root” and then “resize2fs -f /dev/Ubuntu/root”.

Then I ejected the .iso image from the virtual machine, rebooted it and the new disk was bigger! Yay! (In reality this took me a lot longer to figure out.)

One other useful command was “vgchange -a y”, which activated the volume group so that the Live CD Linux version could mount /dev/Ubuntu/root.
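
For reference, here is a minimal sketch of the command sequence above collected into one script. It assumes the same names used in this post (/dev/sda3, the volume group Ubuntu, the logical volume root). The run helper and the DRY_RUN switch are my own additions for illustration: by default the script only prints what it would do, because these commands change disk layout. The interactive fdisk step is not included.

```shell
#!/bin/sh
# Sketch of the LVM grow sequence described above.
# Prints the commands by default; set DRY_RUN=0 (as root, from the live CD) to run them.
DRY_RUN="${DRY_RUN:-1}"
DISK_PART="${DISK_PART:-/dev/sda3}"   # new partition of type 8e created with fdisk
VG="${VG:-Ubuntu}"                    # existing volume group
LV="${LV:-root}"                      # logical volume to grow

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

grow_lvm() {
    run vgchange -a y                          # activate volume groups under the live CD
    run pvcreate "$DISK_PART"                  # turn the partition into a physical volume
    run vgextend "$VG" "$DISK_PART"            # add it to the volume group
    run lvextend -l +100%FREE "/dev/$VG/$LV"   # grow the logical volume into the free space
    run e2fsck -f "/dev/$VG/$LV"               # check the filesystem before resizing
    run resize2fs -f "/dev/$VG/$LV"            # grow the filesystem to fill the volume
}

grow_lvm
```

Running it unchanged just lists the six commands, which is a cheap way to double check the device names before doing anything for real.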

Thanks to this website for some insight into this.

Along the way I also shrank the LVM partition. It’s about the same thing in reverse, but you have to make sure that you don’t shrink things so much that you lose data. There are options in the tools that find out exactly how far you can go, which was helpful. man pages are your friends.

May 31, 2010

Cell-Phone Activity For Predicting Earthquakes

by djp3 — Categories: Paper Review
haiti earthquake

Photo courtesy of United Nations Development Programme

An organization called Artificial Intelligence for Development (AI-D) recently had a workshop at Stanford which looked at ways that data driven statistical modeling and prediction could be used to help under-served populations in the developing world.

One of the papers that came out of that conference was called “People, Quakes, and Communications: Inferences from Call Dynamics about a Seismic Event and its Influences on a Population”. It was written by researchers at Microsoft and at the Santa Fe Institute, Ashish Kapoor, Eric Horvitz, and Nathan Eagle. They are all terrific researchers who have a much better track record than I do.

The basic foundation for the paper was a large dataset of calls that were made in Rwanda in 2008. During this time frame an earthquake with a 5.9 magnitude occurred. The paper examined ways of determining:

1. Whether an earthquake happened.

2. Where it happened.

3. Where you need more information to reduce uncertainty.

In terms of determining whether an earthquake happened, the paper actually recognized that something unusual happened that affected call patterns. It would not be correct to say that it recognizes earthquakes, but when applied to the data that contained an earthquake, it did flag the earthquake as out of the ordinary.

In terms of determining where it happened, the paper assumed a model in which the degree of unusualness at a call tower fell off with its distance from the proposed location of the earthquake. The problem of finding where the earthquake occurred then reduced to a search over all locations for the location that best explained the observed disruption. I think this approach would work for a brief moment after an earthquake occurs, when call pattern disruption is dominated by calls in and out of the immediately affected area, but I think that very quickly calls would start being made all over the place as word spread regionally. This approach also assumes that an earthquake only really affects one location, that locations are defined by the cell phone towers, and that there is a cell phone tower where the earthquake occurs.
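
To make that search concrete, here is one way it could be written down. This is my own simplified sketch, not the paper's formulation, and every symbol is an assumption: a_t is the measured unusualness at tower t, x_t is that tower's position, T is the set of towers, and f is some decreasing function of distance.

```latex
\[
\hat{x} \;=\; \arg\min_{x} \sum_{t \in T} \Bigl( a_t - f\bigl(\lVert x - x_t \rVert\bigr) \Bigr)^{2}
\]
```

The candidate location whose predicted fall-off best matches what the towers actually saw wins, which is also why the approach struggles once disruption stops being concentrated around a single point.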

In terms of the final contribution, I didn’t understand the paper’s approach. At a high level it was a decision theoretic approach that argued for surveying unknown locations that reduce uncertainty the most. But in terms of cell-phone data, how is it that you don’t have the data already? What does it mean to go survey an unknown location for cell-phone activity? The math was pretty complex when you start to apply the DT approach, so I may have missed something here. Nevertheless the motivation was cool. Wouldn’t it be nice to have a disaster management computer that told you where you would benefit the most by sending scouts?

Despite my critique, this paper was exciting because it contained real meaningful data and took a well-grounded stab at solving an important and hard problem.

March 20, 2010

Ripping a TurboTax DVD image to an .iso file on a Mac

by djp3 — Categories: General Info

I have a copy of TurboTax that I want to install on a Windows virtual machine. In order to do that I need to insert the DVD in the underlying host hardware first. Unfortunately that requires physical access to the host hardware, which I don’t have right now. Instead what I want to do is to send a software image (.iso file) of the DVD to the host machine over the network, have the host machine mount the .iso file, and then my virtual machine will think the DVD is inserted in its hardware. Further complicating matters is the fact that I have a MacBook Pro running OSX 10.5.8.

Since the Mac mounts the disk in a different way from the way Windows mounts the disk, I first need to make my Mac mount the disk like a Windows box would. Here is what I did based on this hint:

  1. I inserted the disk into my Mac
  2. executed: mount, to figure out which disk I was working with
  3. executed: sudo umount /Volumes/TurboTax\ Premier\ 2009, to disconnect from the disk without ejecting it
  4. executed: mkdir /Volumes/Turbo.win, as a new mount point
  5. executed: mount_cd9660 -er /dev/disk2 /Volumes/Turbo.win, to mount the disk as a Windows box would.

Now I can cd to Turbo.win and see all the Windows files that were hidden from me when I used the natural Mac mounting technique of inserting-the-disk-in-the-drive.

So now I need to rip the DVD into an .iso file, which I did based on this hint:

  1. execute: dd if=/dev/disk2 of=myFile.iso

Voilà, all done. Then I compressed the file, sent it across the network, decompressed it, mounted it, and installed TurboTax on my virtual machine.

A few cleanup steps are to eject the disk on my Mac with diskutil eject /dev/disk2, and to erase the mount point with rmdir /Volumes/Turbo.win
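
Here is the whole sequence as a rough script, assuming the same disk (/dev/disk2), volume name and mount point as above. The run helper and the DRY_RUN switch are hypothetical additions of mine so that the script only prints the commands unless you ask it to execute them. Check the output of mount first, since your disk may not be disk2.

```shell
#!/bin/sh
# Sketch of the DVD-to-.iso steps from this post.
# Prints the commands by default; set DRY_RUN=0 on the Mac to actually run them.
DRY_RUN="${DRY_RUN:-1}"
DISK="${DISK:-/dev/disk2}"                              # device reported by `mount`
MAC_MOUNT="${MAC_MOUNT:-/Volumes/TurboTax Premier 2009}"
WIN_MOUNT="${WIN_MOUNT:-/Volumes/Turbo.win}"
ISO="${ISO:-myFile.iso}"

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rip_dvd() {
    run sudo umount "$MAC_MOUNT"               # disconnect without ejecting
    run mkdir "$WIN_MOUNT"                     # new mount point
    run mount_cd9660 -er "$DISK" "$WIN_MOUNT"  # mount the way a Windows box would see it
    run dd "if=$DISK" "of=$ISO"                # copy the raw disk into an .iso file
    run diskutil eject "$DISK"                 # cleanup: eject the disk
    run rmdir "$WIN_MOUNT"                     # cleanup: remove the mount point
}

rip_dvd
```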

March 11, 2010

Fixing a broken built-in iSight on a MacBook Pro running 10.5.8

by djp3 — Categories: General Info

Props to the telephone tech support at Apple, who treated me like I had a brain and quickly helped me fix the broken built-in iSight on my MacBook Pro. He said that the following procedure fixes about 40% of built-in iSight failure cases.

  1. Shut down your computer
  2. Disconnect the power adaptor
  3. Remove the battery
  4. Press the power button for 10 seconds. Nothing visible will happen, but this is resetting something inside the computer
  5. Put the battery back in
  6. Attach the power adaptor again
  7. Hold down the following four keys, Option-Command-P-R, and hit the power button
  8. Wait until you hear the second boot-up chime noise
  9. Let go of the four keys
  10. Log into your user account as normal
  11. Test to see if the camera works by running PhotoBooth

Thank you Apple, you did a great job!

March 2, 2010

Real-Time is Prime-Time for Scams

by djp3 — Categories: Musings
bridge accident

Photo courtesy of th.omas

I had a brief panic earlier this week. For a few hours I thought the Internet
was on the verge of collapse. The sudden concern was brought about by trying to
find season 1 episode 1 of “Glee” online somewhere.

My first thought was to go to Hulu.com as this has become my new authority for
legal online television. It turns out that only the last 5 episodes of Glee are
available on Hulu. A fascinating conversation with a Fox executive taught me
about how that is the result of a legacy method of licensing the content
for television programs that has been shoehorned into the Internet TV age. It
used to be that shows followed a path from brand-new to syndication in a
well-ordered manner which doesn’t match well with the expectations the
public have of finding all shows archived on the Internet somewhere.
That, however, is tangential to my panic.

Unsuccessful on Hulu, I started doing just a general Google search and turned up
many, many, many pages that purported to be Glee Episode 1 Season 1 but were really
just gateway videos to … you guessed it … porn.

I immediately related this experience to the aftermath of the Haiti
and now Chile earthquakes. Immediately following both disasters,
internet sites sprang up which fraudulently offered to take donations on behalf
of victims, or which redirected you to their issue or product of primary concern, which was rarely related to the disaster.

A final example of this effect comes from Twitter. In the Twitterverse whenever
a meme is created, usually with a hashtag, it is not long after that the
griefers and scammers show up. They post their VIAGRA ad with the hottest
twitter meme hashtag and destroy the conversation for everyone else.

My panic peaked at this point. How has the internet survived so long in the face of this stuff? Has it just grown to the point where this is now economically feasible? Are we in a new era of the web which looks like the spam-era of email? As part of the
work that I’ve been doing on Information Retrieval I was able to consider how
powerful the signal from PageRank must be to overcome this, and to have been overcoming it for so long.

PageRank is a
technique in which links from one page to another confer authority on the
destination page. The paths that people can take through the Internet by following links therefore reveal a
great deal about where the good content is and where the bad content is. The
links represent the efforts of human curation on the Internet. Every link that
you put on your web page helps PageRank sift the garbage from the gold.
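
For the curious, the usual textbook form of the PageRank score looks roughly like this. The symbols are the standard ones rather than anything specific to this post: N is the number of pages, d is a damping factor (commonly set around 0.85), M(p_i) is the set of pages linking to p_i, and L(p_j) is the number of outbound links on p_j.

```latex
\[
PR(p_i) \;=\; \frac{1 - d}{N} \;+\; d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}
\]
```

Each page passes its score along its links, split evenly among them, so a link from a well-regarded page is worth more than a link from a page nobody points to.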

However, this doesn’t work with real-time information because PageRank is pretty
slow. It takes time for people to add those links. It takes time to figure
out the shape of the Internet and it takes time to report the results back to
Internet searchers. Apparently it takes more than half a season of Glee,
because I can only find garbage today.

So PageRank works well for archived data. What can work for real-time data?
Maybe social networks can. If you can leverage social networks to immediately
vote on the content being created by the real-time web, then perhaps the social
network can replace PageRank for ephemeral data. All that remains is a way of
figuring out what people think is good or bad in the same way that looking at a
link tells you whether people think content is good or bad.

So what started out as a panic that the Internet was about to collapse, really gave me a new appreciation for PageRank. In some ways the link structure of the Internet is the social network that we have been leveraging all along. My panic also made me realize that we need a new signal for real-time ephemeral data – like news and tweets – to sift the good from the bad. My panic has subsided now that I know the shape of the problem a little better. I think the problem is large, but it would be cool to solve it.

February 26, 2010

Twitter During Emergencies

by djp3 — Categories: Paper Review
people looking at the fargo flood


Photo courtesy of DahKohlmeyer

A few months ago on the University of California, Irvine campus we had an
incident in which a student wearing camouflage was seen walking onto campus from
a secondary road coming from a somewhat remote area (grain of salt: we are in Irvine of course). Given the stories about
violence on campus that are consistently told in the mainstream media, this
caused people who saw him to become alarmed. The fear was that his intention was
to massacre a number of students in a Virginia Tech style rampage.

What followed was an explosion of real-time information sharing. It involved an
extremely heterogeneous mix of media, however. One system that
was involved is called ZotAlert. It is a terrific text-message based system
that is used by the UCI Police to send text messages about emerging danger to
the entire University community. It has been used to warn about violent
incidents occurring in and around campus as well as burglaries and other crimes
in a very short window of time after they are reported.

Of course, individual text messaging, Facebook, Twitter, email and a variety of
less well known social media were involved in the explosion of
information. But so were face to face conversations and standard phone calls.

I remember that the first information that I received was from my wife who was
talking to a friend who had received a text message from another friend relaying
information from a mobile phone call with her daughter! The message that was relayed was
that they were in lock-down at the pool and there was a guy running around with a gun
shooting people.

Then I read a Facebook report that the guy had a Nerf-Gun. I synthesized and retweeted those two bits of
information and was emailed by a reporter from the O.C. Register who wanted to
confirm my update that he saw on Facebook. I, of course, couldn’t confirm it as I was just passing
along information that I had heard.

Later, a student wearing camouflage was arrested in the student union which was
also in lock down at the time. It eventually turned out it was the wrong guy who made an unfortunate fashion choice that morning.

Hours later the information flood settled down and it was revealed that the guy with
the gun was a student with a paint-ball gun who was shooting paint balls in the
field and was going home. He was mostly oblivious to the craziness going on
around him because he wasn’t online and he wasn’t apprehended in any reasonable
amount of time. He eventually apologized profusely for being dumb enough to carry a paint ball gun onto a college campus and being seen.

I had a couple of observations about this event. The first was that the
ZotAlert system was very authoritative and, not surprisingly, slower to send
out the facts. This was, hopefully, because they were trying to make sure they
actually had facts.

Another observation was that social media was extremely effective at getting
out the word that something was going on. The subject and accuracy of
the something varied to a great degree. For a potentially dangerous
situation like this could have been, I think that this was a success. Even if
you can’t communicate the right information, you would like everyone to be on
guard and in the right frame of mind to respond appropriately when they get
first hand information.

Lastly it was interesting to see how much this media space is fractured. Every
social media tool I was involved with was lighting up. No tool had a monopoly
on the communication. Each was used for its individual strengths and to
communicate to particular people. It appears that our community has a pretty
good sense of which tools different people pay attention to. So when I want to
reach my wife I text message, but if I want to reach my department I send an
email.

In the paper “Chatter on the Red: What Hazards Threat Reveals about the Social
Life of Microblogged Information”
by Starbird, Palen, Hughes and Vieweg and
published in CSCW 2010, the authors look formally at some of these effects.

Their data source was tweets that were sent out around the time of a 2009 flood
in the Red River Valley on the U.S./Canada border. This event lasted for several
months, so the nature of the information was much less about being individually safe for the
next few hours and much more about being safe as a community for weeks.

They commented on the fractured and heterogeneous nature of social media:

“Collection and analysis of large data sets generated from
CMC [Computer Mediated Communication] during newsworthy events first reveals an utterly
unsurprising observation: that publicly available CMC is
heterogeneous and unwieldy. …
Our tweet-by-tweet analysis of the Local-
Individual Users Streams indicates that most are
broadcasting autobiographical information in narrative
form, though many contain elements of commentary and
the sharing of higher-level information as well. Even as
some Twitterers shift focus to the flood, most continue
tweeting within their established Twitter persona.”

Although they had contradictory comments about what Twitter was, this
quote reinforced my view that as a technology, Twitter is an infrastructure for low-bandwidth multicast:

“Twitter, a new incarnation of computer mediated chat, is a
platform without formal curation mechanisms for the
massive amount of information generated by its
(burgeoning) user base. There is no rating or
recommendation system support—key features of
commerce sites like Amazon and information aggregators
like Digg. Nor is there a complex system of validation that,
for example, Wikipedia has implemented. Also unlike
Wikipedia, content passed through Twitter is short-lived,
and therefore cannot be discussed, verified and edited.
While most social media have “places” for interaction,
interaction in Twitter occurs in and on the data itself,
through its distribution, manipulation, and redistribution.
Without regular retransmission, communications quickly
get lost in the noise and eventually die off.”

Another difference between UCI and this flood, was that the time scale allowed people to do
more self-organization and to create more digital tools to help manage the
information flow than would be normal over the course of hours.

The authors reinforced a belief about geo-located data that I previously blogged about,
which is that there is nothing about Twitter which should make you think that
localized data is really more local. It is a means to broadcast and subscribe
and what you do on top of that is communicate. Just like all communication it
is human centered and not easily parsable by a machine. So the researchers
spent a lot of effort curating a set of tweets that were related to the
flooding. There was one caveat which I’ll mention below.

Some interesting facts that emerged from their study were that only about 10% of
the tweets that were in the dataset about the floods were original. That
10% was split between autobiographical narrative format and knowledge
introduction. This is the same pattern of use seen in “Is It Really About Me?
Message Content in Social Awareness Streams”
between Meformers and Informers.

A curious note about this dataset though was that the localized data was three
times as likely to be original. This is a reasonable expectation given the
dataset but speaks to a place in which the merging of local and localized data
does occur.

Those that weren’t original were sometimes synthesizing other tweets. This
included editing, curating and synthesizing the data from others. Then another
group of people were posting educational tweets related to the events unfolding.

A final interesting behavior that was observed was the sensor-stream-to-Twitter-account
phenomenon, in which some talented folks connected a sensor measuring
flood levels to a Twitter stream which periodically tweeted data. This is
something I would like to explore in much greater depth.

February 25, 2010

Using social networks to guide recommendation systems

by djp3 — Categories: Paper Review
punk bffs

Photo courtesy of Walt Jabsco

The problem of trying to incorporate social networks into collaborative filtering
recommendations seems to be a hot research topic right now. The basic idea of
this problem is that one may have a dataset consisting of many different ratings
by a user of a thing, like a movie or product, where each rating is a number from 0 to
1. What we would like to do is to predict how much a given user will like
something which they have never rated before.

In collaborative filtering, the approaches vary along two axes: user versus item similarity,
and memory-based versus model-based approaches.

The first axis describes what kind of similarity you are leveraging in order to
make your recommendations. User similarity asserts that people who have rated
things similarly to you in the past are likely to rate a new thing in the same
way. Item similarity asserts that you are likely to rate a new thing the same way
that you have rated similar things in the past. The first requires a way of
determining whether users are similar, the latter a way of determining whether
items are similar. In either case you can just use your ratings themselves as
the basis for similarity or you can use some external knowledge to judge
similarity.

The second axis describes how you store the information from which you base your
decisions. In a memory based approach, you just keep all of your ratings
around and when it comes time to make a new rating on an unseen item, you go to
your data and do your analysis. In a model-based approach, every time a new
rating is observed, a new model is constructed for a user. When it is time to
rate a new unseen item, then the model is consulted for a rating.
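
As a concrete instance of the user-similarity, memory-based corner of those two axes, here is the classic neighborhood prediction rule. This is a generic textbook sketch rather than something taken from the papers below, and the symbols are assumptions: r_{v,i} is user v's rating of item i, \bar{r}_v is user v's average rating, N(u) is the set of users similar to u who have rated i, and sim(u,v) is whatever similarity measure you choose.

```latex
\[
\hat{r}_{u,i} \;=\; \bar{r}_u \;+\;
\frac{\sum_{v \in N(u)} \mathrm{sim}(u,v)\,\bigl(r_{v,i} - \bar{r}_v\bigr)}
     {\sum_{v \in N(u)} \lvert \mathrm{sim}(u,v) \rvert}
\]
```

Nothing is precomputed here. All of the stored ratings are consulted at prediction time, which is exactly what makes it a memory-based approach.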

Two recent papers that explore these issues are “On Social Networks and
Collaborative Recommendation” (http://doi.acm.org/10.1145/1571941.1571977) by Konstas, Stathopoulos, and Jose and “Learning
to Recommend with Social Trust Ensemble” (http://doi.acm.org/10.1145/1571941.1571978) by Ma, King and Lyu, both from
SIGIR 2009.

The first paper, “On Social…” undertook the task of trying to create a list of
songs that a user would like based on their previous history of listening to
songs in Last.fm, who their friends are (and their history), and then a
collection of tags which applied to users and music. Their approach was an
attempt to merge both user and item similarity with a social network in a memory
based approach. Because they ultimately created a play list sorted by
most-likely-to-be-liked, they had a hard time comparing their results to
traditional collaborative filtering systems, which produce a hypothetical
how-much-do-you-like-a-song rating for every song given a user.

Their approach was very interesting to me because they basically created a graph
in which users, songs and tags were nodes and the relations between them were
represented as weighted edges. Then they ran the
PageRank algorithm over the personalized graph and pulled out the songs in the graph that were most highly rated, and that was the recommendation. The weights on the edges of the graph required some black magic tuning.
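
A personalized PageRank of this kind is often written as a random walk with restart. The following is a generic sketch under my own notation, not necessarily the exact formulation in the paper: S is the column-normalized weighted adjacency matrix of the user/song/tag graph, q is a vector that is 1 at the node for the user we are recommending to and 0 elsewhere, and a is the probability of jumping back to that user at each step.

```latex
\[
\mathbf{p}^{(t+1)} \;=\; (1 - a)\, S\, \mathbf{p}^{(t)} \;+\; a\, \mathbf{q}
\]
```

Iterating until p stops changing gives a score for every node, and the highest scoring song nodes become the play list. Because the iteration has to be rerun for each user, it also illustrates where the computational cost comes from.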

It appears that any memory-based system that has significant computation, like
the previous one, suffers from a real-time response challenge. If you are
creating a system like Pandora, it’s not really a problem because you have quite
a bit of time in which to pick the next song: about as long as the current song takes to play.

But if your system is more of a query-response system where you ask “Will I like
song X?” then you really only have milliseconds to get an answer. This suggests
to me that a model based approach is nearly required in fast-response systems.

The second paper, “Learning to Recommend…” was similar in spirit but very
different in execution. The goal of the authors of this paper was to create a
recommendation system of products based on the reviews in Epinions.com. The key
feature that the authors wanted to include was a social-recommendation
component, and the basic assumption they were exploring was that what you like is based on a
combination of your own tastes and the tastes of your social network. When it
comes to Epinions this work shows that to be true in a 40/60 split respectively.

So they cast the problem of collaborative filtering as a graphical
model and used results that showed how the matrix manipulations associated with
collaborative filtering can be solved as just such a model. Then they showed how
a social graph can also be cast as a graphical model. So the first graphical
model says how likely you are to like something based on your previous history
of ratings, and the second says how likely your social network is
to like something. Then they combined the graphical models and derived
an optimization technique for finding a locally optimal solution to the matrix
problem in the beginning. The result was a model that performed a query on an
item quickly and did better than simply looking at a user’s tastes by themselves,
or a social network’s tastes by themselves.
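
One simplified way to picture that blend is as a weighted average of two predictions. This is my own linear sketch and not the paper's exact model, so treat every symbol as an assumption: \hat{r}^{own}_{u,i} is the rating predicted from a user's own history, N(u) is the set of people u trusts, w_{u,v} are trust weights that sum to 1, and \alpha controls the mix.

```latex
\[
\hat{r}_{u,i} \;=\; \alpha\, \hat{r}^{\,\mathrm{own}}_{u,i}
\;+\; (1 - \alpha) \sum_{v \in N(u)} w_{u,v}\, \hat{r}^{\,\mathrm{own}}_{v,i},
\qquad \sum_{v \in N(u)} w_{u,v} = 1
\]
```

In this notation the 40/60 split mentioned above corresponds to \alpha = 0.4, with \alpha = 1 ignoring your friends entirely and \alpha = 0 ignoring your own history.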

© 2012 Codex Caelestis All rights reserved