I, Hacker

Hungry hungry macros 

Superpacking JS Demos

Background

The techniques I'm going to be describing here were created and/or implemented to pack my entry to the Mozilla Demoparty: Magister. Huge thanks to my friend Nicolás Alvarez for helping squeeze every last byte out of this.

Getting started

So you have a demo in JS. It's pretty. It's perfect. It's 3k in a 1k competition. Well, damn.

You start with the obvious: run it through a minifier and shorten all your variable names to a single character, and you get it to 2500 bytes. Great, that's progress, but you've still got 1476 bytes to go. You merge some functions together, fold a couple loops into each other, and soon you're at 2200 bytes. Long way to go.

Rule #1 of shrinking demos: removing a byte becomes more difficult every time you remove a byte. It starts off trivial and runs very quickly into a brick wall.

Quick wins

These are a couple of useful techniques to trim the fat at this point.

var a = g.getAttribLocation(x, 'pos');
g.uniform1f(g.getUniformLocation(x, 'time'), t);
g.vertexAttribPointer(a, 2, g.FLOAT, 0, 0, 0);
g.enableVertexAttribArray(a);

What's wrong with this picture? Well, if we ignore whitespace (minification takes care of that): we're using var (pfft, correctness), we have a shader attribute with a 3-byte name and a uniform with a 4-byte name, we're using g.FLOAT (more on this in a moment), and we've got a lot of zeros.

g.vertexAttribPointer(
    a = g.getAttribLocation(x, 'p'), 
    2, 
    5126, 
    g.uniform1f(g.getUniformLocation(x, 't'), z), 
    0, 
    0
)

Much better! We killed var, removing 4 bytes; we removed 2 bytes by inlining the assignment to a in the first use and killing the semicolon; we eliminated a zero (the return of uniform1f is treated as a zero by the WebGL API in this case) and a semicolon for a savings of 2 bytes; we changed our shader attributes/uniforms to have 1-byte names (saving 5 bytes); and we inlined the FLOAT constant for a savings of 3 bytes. That's 16 bytes removed from a block of 151 (after whitespace removal), or a reduction of about 10.6%! Let's do it again.
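If you want to double-check that accounting, here's a quick Python sketch. The labels and the placeholder expressions E and U (standing in for the getAttribLocation and uniform1f calls) are mine, purely for illustration; each saving is just the length difference between the before and after text.

```python
# (label, text before, text after); the saving is the length difference.
# E and U are placeholders for the getAttribLocation(...) and uniform1f(...)
# expressions, which are the same length before and after.
changes = [
    ("drop var", "var ", ""),
    ("inline the assignment to a", "a=E;a", "a=E"),
    ("use uniform1f's return value as a zero", "U;0", "U"),
    ("one-byte shader names", "'pos''time'", "'p''t'"),
    ("inline g.FLOAT", "g.FLOAT", "5126"),
]

savings = {label: len(before) - len(after) for label, before, after in changes}
total = sum(savings.values())
percent = 100 * total / 151  # 151 = size of the original block, minus whitespace

for label, saved in savings.items():
    print(f"{label}: {saved} bytes")
print(f"total: {total} bytes, {percent:.2f}% of the block")
```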

Before we move on, I just want to make a note that WebGL constants really are just that. You can inline them and trust them not to change.

Abusing globals

Look at the latest code above. Count the instances of 'g.'. Four instances just in that tiny little snippet. If we could get rid of those in a small way, this could be a huge win, so let's do it.

What is the global namespace in JS? Normally, it comes from the window object. So when you call foo(), it's going to look in window if you haven't declared it locally. So why don't we shove our methods into the window object? Or even better, into top (in our case, they're identical)?

Right now the declaration of g is as such:

g = z.getContext('experimental-webgl')

But let's change it to:

for(k in g = z.getContext('experimental-webgl'))
    top[k] = g[k].bind && g[k].bind(g);

So we walk over every key in the WebGL context and put it into top, but only if it has the bind method and we can bind it to the WebGL context. Without that, it tries to treat the window as a WebGL context. The window doesn't like this.

At this point, we just cut all instances of 'g.' out of the picture. This was a win of around 30 bytes (counting the cost of the globalization code) in my demo's case, but we can go much, much further.

What's in a name?

Let's look at a list of all the WebGL methods I called in my demo:

attachShader
createProgram
linkProgram
createBuffer
bindBuffer
bufferData
viewport
createShader
shaderSource
compileShader
useProgram
getAttribLocation
getUniformLocation
uniform1f
uniform2f
vertexAttribPointer
enableVertexAttribArray
drawArrays

Holy moly, that's a lot of bytes! But these are defined in WebGL, how can we change this?

Welllll... let's look back at our globalization code. We get a name, k, and then bind g[k] into top[k]. But we control what k goes into top. Having a map of long name into short name would be expensive, but what about a regex? I'm going to spare you the gory details, but after a while of tinkering with it, I determined that the optimal code for this is:

t[k.slice(1, -5).replace(/[ntalruicoh]/ig, '')] = t[k] = g[k].bind && g[k].bind(g);

In this case, t is top (because hey, 4 bytes!). We still assign the original name into t, but we also put the sliced and diced name into it. Why? Because the mangling collapses two names we need to tell apart, uniform1f and uniform2f, into the same short name, so those two still get called by their full names. But that's fine.

So what do the names look like after this?

S
eeP
kP
eeB
dB
ffe
e
eeS
deS
mpeS
seP
eb
efm
f
f
eexbP
beVeexb
w

Night and day difference, as you can see. You'll obviously want to change the regex if you do this yourself, since your code will have different balances.
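If you're tuning the regex for your own demo, it helps to preview the short names offline and spot collisions before they bite. Here's a rough Python sketch of the same slice-and-strip transform; the function names (mangle, collisions) are mine, and Python's name[1:-5] is standing in for JS's k.slice(1, -5).

```python
import re
from collections import defaultdict

WEBGL_METHODS = [
    "attachShader", "createProgram", "linkProgram", "createBuffer",
    "bindBuffer", "bufferData", "viewport", "createShader", "shaderSource",
    "compileShader", "useProgram", "getAttribLocation", "getUniformLocation",
    "uniform1f", "uniform2f", "vertexAttribPointer",
    "enableVertexAttribArray", "drawArrays",
]

def mangle(name: str) -> str:
    """Mirror k.slice(1, -5).replace(/[ntalruicoh]/ig, '')."""
    return re.sub(r"[ntalruicoh]", "", name[1:-5], flags=re.IGNORECASE)

def collisions(names):
    """Map each short name that is shared by more than one method."""
    groups = defaultdict(list)
    for name in names:
        groups[mangle(name)].append(name)
    return {short: full for short, full in groups.items() if len(full) > 1}

for name in WEBGL_METHODS:
    print(f"{name} -> {mangle(name)}")
print("collisions:", collisions(WEBGL_METHODS))
```

Anything that shows up in the collision map has to be called by its original name, which is why the bootstrap keeps both assignments.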

Other random wins

Arrays of digits tend to be pretty wasteful:

new Float32Array([0,0,2,0,0,2,2,0,2,2,0,2])

What about this instead?

new Float32Array('002002202202'.split(''))

The more digits you have, the bigger the win from this replacement is. Until we get to compression and everything changes.
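To see where the break-even sits, here's a small Python sketch that just measures both spellings as text. The helper names are mine, and it assumes every element is a single digit, as in the example above.

```python
def literal_form(digits: str) -> str:
    """Array literal spelling: [0,0,2,...]"""
    return "[" + ",".join(digits) + "]"

def split_form(digits: str) -> str:
    """String spelling: '002...'.split('')"""
    return "'" + digits + "'.split('')"

def bytes_saved(digits: str) -> int:
    # The literal costs 2n+1 bytes and the split form n+12, so this is n-11.
    return len(literal_form(digits)) - len(split_form(digits))

digits = "002002202202"
print(literal_form(digits), len(literal_form(digits)))
print(split_form(digits), len(split_form(digits)))
print("saved:", bytes_saved(digits))
```

So for single digits the trick only starts paying off once you have 12 or more elements.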

Compression

Ok, we all know HTTP requests are trivially compressible, and this can improve load times and blah blah blah. Doesn't matter when it comes to demos: if the size on the wire is 1k and the size on disk is 3k, your demo is 3k.

In the past, people have compressed their demos into images. This is an easy way to get a good size reduction -- PNGs are just zlib'd data with a little header, basically. But these demos all had a PNG and an HTML+JS file that would load the PNG, draw it to a canvas, pull the pixels out of the canvas as a string, and eval that string. How can you make this into a single file demo?

Introduction to PNGs

The PNG file has an 8-byte signature followed by a series of simple chunks. The chunk format is as follows:

4 byte length
4 byte chunk type
<length> byte chunk data
4 byte CRC

Most of these are pretty clear, except chunk type. Chunk type is a FourCC, e.g. 'IHDR', which is mostly generic, with some exceptions which I'll talk about shortly.

I'm not going to go into detail on the IHDR chunk (we can't really mess with it, but I will say I use greyscale for simplicity purposes), but here's what we start with.

8 byte signature
13 byte IHDR with 12 byte chunk header
X byte IDAT (covered below) with a 12 byte chunk header

The IDAT format is dead simple: the chunk data is a zlib stream, and once inflated, each scanline is a 1 byte filtering method followed by your pixel data.
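For reference, here's a minimal Python sketch of that layout: one-row, 8-bit greyscale, with the filter byte sitting at the start of the scanline before it gets deflated. The helper names (chunk, grayscale_png) are mine, this is not the packer I actually used, and it adds the closing IEND chunk that a well-formed PNG ends with.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the 8-byte signature

def chunk(chunk_type: bytes, data: bytes) -> bytes:
    """4 byte length + 4 byte type + data + 4 byte CRC (over type and data)."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def grayscale_png(row: bytes) -> bytes:
    """Wrap one row of bytes as a len(row) x 1, 8-bit greyscale PNG."""
    # IHDR: width, height, bit depth 8, colour type 0 (greyscale),
    # compression 0, filter 0, interlace 0 -> 13 bytes.
    ihdr = struct.pack(">IIBBBBB", len(row), 1, 8, 0, 0, 0, 0)
    # Filter type 0 ("none") prefixes the scanline, then the lot is deflated.
    idat = zlib.compress(b"\x00" + row, 9)
    return (PNG_SIGNATURE + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", idat) + chunk(b"IEND", b""))

png = grayscale_png(b"alert(1)")
print(len(png), "bytes")
```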

Abusing PNGs

We start by defining our own chunk type (we're allowed to do that!) before the IDAT chunk.

4 byte length
4 byte "jawh" (Just Another WebGL Hacker (TM))
X byte bootstrap
4 byte CRC

What is the bootstrap? Well, it's what turns our PNG into code and runs it. Here's the one I use:

<img onload=with(document.createElement('canvas'))p=width=4968,(c=getContext('2d')).drawImage(this,e='',0);
while(p)e+=String.fromCharCode(c.getImageData(0,0,p,1).data[p-=4]);
(t=top).eval(e) src=#>

The 4968 here is really the size of the decompressed data in bytes times 4 -- there are 4 components to each pixel (red, green, blue, alpha) but we're only using grayscale, so we need to offset that. If you look carefully, you'll also notice that it walks backwards across the data, which should create a string in reverse, but I compensate for that when I compress the data, and reverse it before doing so. This saves a couple bytes. The image source is also important: rather than hard-coding the image filename and wasting space, it uses #, enabling it to treat itself as a PNG.
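Here's a rough Python model of what that loop does, in case the backwards walk is hard to picture. The function names are mine, and the canvas is simulated as a flat RGBA list the way getImageData's data array is laid out; with p = 4968 that works out to 4968 / 4 = 1242 bytes of source.

```python
def pack(source: str) -> bytes:
    """What the packer stores as pixel values: the source, reversed."""
    return source[::-1].encode("latin-1")

def to_rgba(gray: bytes) -> list:
    """Simulate canvas pixels for a greyscale row: R = G = B = value, A = 255."""
    rgba = []
    for value in gray:
        rgba += [value, value, value, 255]
    return rgba

def bootstrap_decode(rgba: list, p: int) -> str:
    """Mirror: while(p) e += String.fromCharCode(data[p -= 4])"""
    e = ""
    while p:
        p -= 4              # step back one pixel (4 components)
        e += chr(rgba[p])   # read that pixel's red component
    return e

source = "alert(1)"
pixels = to_rgba(pack(source))
print(bootstrap_decode(pixels, len(source) * 4))
```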

So what do we have now? We have a PNG containing some HTML. The browser first opens it as HTML (file extension is important here), then sniffs the MIME type and figures out it's a PNG when it gets loaded into the image tag. We have a self-extracting PNG. But we can do better.

Why specs don't matter

We're currently using a chunk type of "jawh". Clever and a nice little insider reference, but that's 4 wasted bytes! The chunk type comes right before the data, so why don't we make the chunk type into <img instead?

The spec says on the subject:

Four bits of the type code, namely bit 5 (value 32) of each byte, are used 
to convey chunk properties. This choice means that a human can read off the 
assigned properties according to whether each letter of the type code is 
uppercase (bit 5 is 0) or lowercase (bit 5 is 1). However, decoders should 
test the properties of an unknown chunk by numerically testing the specified 
bits; testing whether a character is uppercase or lowercase is inefficient, 
and even incorrect if a locale-specific case definition is used.

It then goes on to tell you that bit 5 of each of the bytes is:

First byte: 0 = critical, 1 = ancillary
Second byte: 0 = public, 1 = private
Third byte: reserved
Fourth byte: 0 = unsafe to copy, 1 = safe to copy

Since we could call it <iMg or <imG or any other combination of uppercase and lowercase, clearly we only care about the impact of < on the first byte. But if we look at bit 5 of <, we see that it's already set high, so it's an ancillary chunk -- we're in the clear and just saved 4 bytes!
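Checking those property bits by hand gets old quickly, so here's a tiny Python sketch that tests bit 5 of each type byte numerically, the way the spec asks decoders to. The function name is mine.

```python
def chunk_properties(chunk_type: bytes) -> dict:
    """Read bit 5 (value 32) of each of the four chunk type bytes."""
    if len(chunk_type) != 4:
        raise ValueError("a chunk type is exactly 4 bytes")
    names = ("ancillary", "private", "reserved", "safe_to_copy")
    return {name: bool(byte & 0x20) for name, byte in zip(names, chunk_type)}

for chunk_type in (b"IHDR", b"IDAT", b"jawh", b"<img"):
    print(chunk_type, chunk_properties(chunk_type))
```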

Wait, no, neither Chrome nor Firefox load the image anymore. What the hell? This is the point where you realize: the spec doesn't matter. No browser out there follows the PNG spec, whatsoever. Perfectly to-the-spec images won't work, and horribly broken images will work. So let's horribly break it. Note: Both Chrome and Firefox use libpng for image parsing, so you could go through the code and look for ways in which it's mishandling data and go down that path, but for my purposes I found that just experimentally changing things and seeing how they break was Good Enough (TM).

Breaking PNG for fun and profit

Messing with the chunk type is out due to browser incompatibility, so what other options do we have? Well, who cares about CRCs.

Set the CRC to c=#>, save, refresh. Hey, it works in Chrome and Firefox! 4 bytes saved.

Well, we can't push back towards the front due to the chunk type, and we can't push forwards towards the IDAT chunk because of the size. But how do browsers handle chunk size mismatches?

If we set the size of the IDAT chunk to c=#>, that unpacks to a size of 1664951102 (PNG lengths are big-endian; read little-endian, the same bytes give 1042496867), which is obviously more than the size of the IDAT chunk. We shift ) sr into the CRC and set the IDAT size to c=#>, save, refresh. All browsers will set size = min(reportedSize, endOfFile - startOfChunk)! That saves us another 4 bytes.
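To make the overlap concrete, here's a short Python sketch that splits the last 8 bytes of the bootstrap across the two fields and decodes the length. The effective_size helper is only my model of the clamping behaviour described above (observed experimentally, not taken from any browser's source), and its name and parameters are hypothetical.

```python
import struct

tail = b") src=#>"                 # last 8 bytes of the bootstrap
crc_field, length_field = tail[:4], tail[4:]

# PNG stores chunk lengths big-endian.
big_endian = struct.unpack(">I", length_field)[0]
# The same four bytes read little-endian, for comparison.
little_endian = struct.unpack("<I", length_field)[0]

def effective_size(reported: int, start_of_chunk: int, end_of_file: int) -> int:
    """Model of the observed behaviour: clamp the length to what is left."""
    return min(reported, end_of_file - start_of_chunk)

print(crc_field, length_field, big_endian, little_endian)
print(effective_size(big_endian, start_of_chunk=200, end_of_file=1024))
```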

So sure, we couldn't get rid of jawh, but who cares? We just saved 8 bytes!

Compression caveats

Now you know how to make the PNG container small, but this says nothing about actually making your demo small. A few of the tips above are actually harmful in the context of compression. Why? Because repetitive data compresses really, really well. For instance, the Float32Array trick is nice when you're dealing with uncompressed data, but ends up eating a couple extra bytes after compression. Reusing function and variable names is also very, very important. The more repetition you can introduce, the better off you'll generally be.
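The only reliable way to know whether a trick helps is to measure the deflated size with and without it. Here's a minimal Python sketch for doing that; the helper names are mine, and zlib at level 9 is an assumption about the packer's settings. Two tiny snippets compressed in isolation won't tell you much, so feed it your whole demo source with each variant swapped in.

```python
import zlib

def deflated_size(source: str) -> int:
    """Size in bytes of the zlib stream for a piece of demo source."""
    return len(zlib.compress(source.encode("latin-1"), 9))

def compare(variants: dict) -> dict:
    """Map each label to (raw size, deflated size)."""
    return {label: (len(src), deflated_size(src)) for label, src in variants.items()}

variants = {
    "array literal": "new Float32Array([0,0,2,0,0,2,2,0,2,2,0,2])",
    "split string": "new Float32Array('002002202202'.split(''))",
}
for label, (raw, packed) in compare(variants).items():
    print(f"{label}: {raw} bytes raw, {packed} bytes deflated")
```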

Wrapping up

All in all, these tips (and general insanity) will let you build really small demos. My demo, Magister, went from around 3k down to just over 900 bytes when all was said and done. I ended up adding a message in the demo ("1k should be enough for anybody.") to get it up to exactly 1024 bytes, which was my goal. Feels good to have to work up rather than down.

I hope this helped shine some light on the subject.

Happy Hacking,
- Cody Brocious (Daeken)


Further Freedom Attacks

Please, redistribute this far and wide. Share it, repost it, print it out and hand it to people you see on the street.
Further Freedom Attacks by Cody Brocious is licensed under a Creative Commons Attribution 3.0 Unported License.

Today I've been vindicated; everything I've said for the last 5 years has come true. If you think I'm happy, you couldn't be further from the truth. As of this writing, the US government -- specifically Immigration and Customs Enforcement, a division of the Department of Homeland Security -- has seized 70 domain names from private citizens and corporations, not as part of legal proceedings, but on the basis of suspected illegality. The content of the sites, mostly owned by companies suspected of selling knockoff items, is inconsequential; even if their actions are illegal (which hasn't been proven in court, as they're not in court), they have the same right to publish their thoughts and feelings on the subject as anyone else does, due to the freedom of speech enabled by the First Amendment.

Background

In 1998, President Bill Clinton signed into law a bill called the DMCA, or the Digital Millennium Copyright Act. The DMCA, among other things, was intended to: make it easier to deal with copyright-infringing content; make the distribution of tools for circumventing copy protection illegal; make it illegal to link to sites that distribute infringing content. Now, this may seem innocuous, but there are a few very scary things here.

First and foremost, let's look at the last bit of that: due to the DMCA, it is illegal to tell someone where to get content that's thought to infringe copyright law. Note: this has nothing to do with actually giving someone illegal content, but just telling them where they can get it.

More relevant to myself is this: the anti-circumvention provisions of the DMCA make it illegal to tell someone how to use a piece of music in a way that the distributor doesn't want you to. If you purchased a song on iTunes in the past, you could only listen to it using iTunes or an iPod. People who made it possible to listen to your own purchased music on other devices were attacked on the grounds of the DMCA. This hits home for me, as one of my previous companies was shut down by Apple in this way, despite simply helping legitimate users.

Freedom of speech

There are two sides to the freedom of speech issue here: the DMCA's anti-circumvention provisions and their relation to source code freedom, and the direct freedom of written speech.

Source code

Source code, used to write programs, is a mechanism for communicating knowledge, much like normal speech is. For this reason, many people have made the suggestion that source code ought to be a protected form of speech. Courts in the US have been having this battle for ages, and they tend to lean heavily towards the argument that source code is more utility than artistic form, and thus does not get the same protection that speech does.

However, think about this from a slightly higher level: it's legally protected speech for me to tell you how to do X in steps the exact same way that the source code for doing X would. However, as soon as you translate that into something the computer can directly understand -- regardless of its ability to be understood by people as well -- it's no longer constitutionally protected.

If you take the stance that source code is -- or ought to be -- protected speech, then the use of the DMCA to stifle the distribution of code is a clear violation of the First Amendment.

Written speech

The other big side to this is that, in seizing these domains, they have not just taken down the potentially illegal content, but the owners' own words. That is, they didn't just take down what they felt might be illegal, but everything written on these sites. This is such a ham-fisted, unconstitutional approach to the problem, I can't even begin to express my sadness. Whether or not you agree with what any one of these sites was doing, everyone has the right to speak their mind. With this action today, we've seen another in a long, sad trend towards restrictions on "bad" speech. This absolutely cannot be allowed to stand.

What can we do?

First and foremost, donate to the EFF. The EFF, or Electronic Frontier Foundation, is the foremost organization working to defend our digital rights. They've done more for the cause than anyone else out there, and they need our help.

Secondly, write to your elected officials, at the Local, State, and Federal levels. While only the Federal level is directly involved here, the more people who understand the issues and concerns, the better. This is absolutely crucial.

Lastly, spread the word. Share this article, news articles on the subject (a list of references is below), and any other relevant materials with people around you. Make sure that everyone you know knows about these issues.

We cannot take this sitting down. This must not be allowed to stand.

Stay free,
- Cody Brocious (Daeken)

References


New Job

After a lot of interviewing and a lot of deliberation, I decided to take a job. Starting Monday, I'm the new senior security analyst for Matasano Security. I'm really looking forward to working with these guys, and I hope to post some fun stuff on the Chargen soon!

Thanks to everyone who contacted me in response to my post about looking for work -- there were a lot of interesting options, and it took a while to narrow it down.

Happy hacking,
- Cody Brocious (Daeken)


Emokit status, JsDataFlowEditor, and looking for work

Before discussing new stuff, a little update on the progress of Emokit. Emokit is now completely able to parse the EEG data coming from the Consumer and Developer headsets, and thanks to skadge now contains proper Linux support, the beginnings of a C library, and quite a few other fun features. At this point, the only things left to reverse-engineer are the battery meter and contact qualities. Huge thanks to everyone who's supported us -- it's been invaluable!

Now for new stuff. I've just released JsDataFlowEditor, a JavaScript library (based on Raphael and jQuery) for building and manipulating dataflow graphs. This can be tremendously useful for everything from image compositing to animations to audio work to visual programming. I'm really excited to see what people do with it. It's still at a fairly early stage, but it works well as it stands.

Super simple example use: http://daeken.github.com/JsDataFlowEditor/Examples/Alyn/. If you build something with it, shoot me an email at cody.brocious@gmail.com and let me know -- I'll link to it from the GitHub repo.

On another note, I've decided to get out of the full-time startup thing for a while. If you know of anyone looking for a developer/reverse-engineer in the NYC area, shoot them a link to http://daeken.com/looking-for-work please.

Happy hacking,
- Cody Brocious (Daeken)

Filed under  //   emokit javascript work  


Looking for work

Update: I've taken a job, but I'm going to leave this post here for historical reasons.

I've decided to put myself on the market for full-time work. I'm primarily looking for work in New York City, but other locations (or telecommuting) will be considered.

Most of you reading this will have a general idea of the work I do from other posts on my blog, but for those not in the know: I'm primarily a reverse-engineer and compiler developer, but I work on anything that happens to catch my interest. I'm always looking for a new challenge and new learning experiences.

My resume is available here, in PDF or in HTML format. It's very reversing-heavy as that's what the majority of my interesting commercial work is, but I've done everything from game development to kernel hacking to web development.

If I sound like the right person for the job, let's talk. Give me a call at +1.678.636.9323 or shoot me an email at cody.brocious@gmail.com.

Happy Hacking,
- Cody Brocious (Daeken)


The Hardware Hacker Manifesto

Edit: I've decided to release this post under a CC Attribution license. Share it, copy it, edit it, whatever you feel like, just attribute it back to me.
The Hardware Hacker Manifesto by Cody Brocious is licensed under a Creative Commons Attribution 3.0 Unported License.

This post is a long time in coming. For years we've been seeing an active fight against the right to utilize hardware the way the owner wishes to use it, but it wasn't until this week that it got personal, driving me to write this. Please, share this far and wide; hardware hacking is essential and we're losing ground to those who would love to see it done away with.

The Hardware Hacker Manifesto

My name is Cody and I'm a hardware hacker. It started at the age of five, taking apart a toy computer to figure out how it worked. I live for that thrill of discovery and rush of power that I feel when I figure out what makes something tick, then figure out how to bend it to my will. This has led to me hacking everything from game consoles to phones.

It used to be that this was what people did: if something was wrong with a device, it was acceptable to take it apart, figure out how it worked, and fix whatever was wrong with it. That's no longer the case; we're still there -- in growing numbers, to boot -- but what's changed is that it's no longer acceptable. As companies have made devices more and more locked down, making hardware hacking even more important than ever, there's a growing segment of the population that believes we're pirates. Who are we to modify these devices against the company's will?

It all comes down to one simple question: once you've purchased something, do you own it? While this may seem like a silly question, it's the entire crux of the argument for hardware hacking. If you believe that the purchaser owns the good, then they have the right to do with it what they want.

I exercise that right on a daily basis, whether with my jailbroken phone, my Wii running homebrew media player software, or -- now -- my hacked brain-computer interface. The last case is interesting, because it's the first time I've ever been called a pirate by a representative of the company producing the hardware I hacked:

Piracy is a vexed question but in its worst form it is still basically taking what someone has spent a lot of time and money on, and denying them some or all of the rewards for doing it. If the developer is being reasonable about it then it's tough to justify piracy. It costs a lot to get something developed and into the market, and next to nothing to copy or crack it. It discourages people from taking the risks in the first place, and we're all the poorer for the things that didn't get done because they would be too easy to steal.

In this case, I purchased a brain-computer interface outright, then proceeded to reverse-engineer it and release details of how to communicate with it. In the week since I released this, I've been called a selfish pirate more than I'd like to recall. All of this because I decided to exercise my right to use my hardware the way I want.

Why should we have to ask permission to use what we've spent our money on? Let's see an absurd extension of this logic: Why should Ford lose out on the rewards of building the car, when you don't go to an authorized service station to get your oil changed?

Let me make this crystal clear: once you sell me something, I will do whatever I want with it. Period. I'll take it apart, I'll patch it, I'll make it do things you never imagined, and I'll tell everyone who will listen exactly how to do the same. It's mine, and every device you've purchased is yours too; don't let anyone tell you otherwise.

I am a hardware hacker and this is my manifesto. We've always been here and we will always be here; you can fight to keep us out, but we'll fight even harder to get back in. I assure you we'll win.

Happy hacking,
- Cody Brocious (Daeken)

Filed under  //   hacking hardware  


Emokit: Hacking the Emotiv EPOC Brain-Computer Interface

Update

Emokit fully supports the Emotiv headset and has functional Python and C interfaces. It's now maintained as part of OpenYou under the loving care of Kyle Machulis. The official repo is now at https://github.com/qdot/emokit/. Many thanks to everyone who helped out with the project.

Intro

I've been interested in the Emotiv EPOC headset for a while; a $300 14-sensor EEG. It's intended for gaming, but it's quite high quality. There's a research SDK available for $750, but it's Windows-only and totally proprietary. I decided to hack it, and open the consumer headset up to development. Thanks to donations I got some hardware in hand this weekend.

I'm happy to announce the Emokit project, an open source interface to the EPOC. The goal is to open it up to development and enable new products and research. For the first time, we have access to a high-quality EEG for $300 -- this is huge.

Code

The code is available on GitHub in the daeken/Emokit repository. There's a Python library for interacting with the EPOC, as well as a renderer that will graph the sensor data.

Graph in debug mode

Where things are right now

You can access raw EEG data from the Emotiv EPOC on Windows, Linux, and OS X from Python. While it's not known exactly which sensors are which in the data (read below for more info), it's close to being useful already. Word of warning: this project is less than 48 hours old (I just got hardware in my hands Saturday night) and has only been run by me on Windows due to a dead Linux box. It's very much alpha quality right now -- don't trust it.

How it happened

The first step was to figure out how exactly the PC communicates with it. This part was straightforward; it's a USB device with VID=21A1, PID=0001 (note: from walking through the device enum code in the EDK, it seems that PID=0002 might be the development headset, but that's totally unverified). It presents two HID interfaces, "EPOC BCI" and "Brain Waves". Reading data off the "Brain Waves" interface gives you reports of 32 bytes at a time; "EPOC BCI" I'm unsure about.

Next step was to read some data off the wire and figure out what's going on. I utilized the pywinusb.hid library for this. It was immediately apparent that it's encrypted, so figuring out what the crypto was became the top priority. This took a couple hours due to a few red herrings and failed approaches, but here's what it boiled down to:

  • Throw EmotivControlPanel.exe into IDA.
  • Throw EmotivControlPanel.exe into PeID and run the Krypto Analyzer plugin on it.
  • You'll see a Rijndael S-Box (used for AES encryption and key schedule initialization) come up from KAnal.
  • Using IDA, go to the S-Box address.
  • You'll see a single function that references the S-Box -- this is the key initialization code (not encryption, as I originally thought).
  • Use a debugger (I used the debugger built into IDA for simplicity's sake) and attach to the beginning of the key init function.
  • You'll see two arguments: a 16-byte key and an integer containing 16.

So that you don't have to do that yourself, here's the key: 31003554381037423100354838003750 or 1\x005T8\x107B1\x005H8\x007P. Given that, decrypting the data is trivial: it's simply 128-bit AES in ECB mode, block size of 16 bytes.
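As a rough sketch of what the decryption looks like in Python: the key below is the one given above, but the function name is mine, and using PyCryptodome's Crypto.Cipher.AES is my assumption for illustration (any AES-128-ECB implementation would do). The import is deferred into the function so the rest runs without the third-party package.

```python
# The 16-byte key from above, as hex and as a byte string.
KEY = bytes.fromhex("31003554381037423100354838003750")
assert KEY == b"1\x005T8\x107B1\x005H8\x007P"

def decrypt_report(report: bytes, key: bytes = KEY) -> bytes:
    """Decrypt one 32-byte HID report (two 16-byte AES blocks, ECB mode)."""
    # Assumption: PyCryptodome is installed (pip install pycryptodome).
    from Crypto.Cipher import AES

    if len(report) != 32:
        raise ValueError("expected a 32-byte report")
    return AES.new(key, AES.MODE_ECB).decrypt(report)
```

Since ECB has no chaining or IV, each report decrypts on its own and nothing needs to be carried over between reports.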

The first byte of each report is a counter that goes from 0-127 then to 233, then cycles back to 0. Once this was determined, I figured out the gyro data. To do that, I broke out pygame and wrote a simple app that drew a rectangle at the X and Y coords coming from two bytes of the records. I pretty quickly figured out that the X coord from the gyro is byte 29 and the Y coord is byte 30. The EPOC has some sort of logic in it to reset the gyro baseline levels, but I'm not sure on the details there; the baseline I'm seeing generally (not perfect) is roughly 102 for X and 204 for Y. This lets you get control from the gyro fairly easily.

That accounts for 3 bytes of the packet, but we have 14 sensors. If you assume that each sensor is represented by 2 bytes of data, that gives us 28 bytes for sensor data. 32 - 28 == 4, and only 3 of those are accounted for, so what's the extra byte? Looking at byte 15, it's pretty clear that it's (almost) always zero -- the only time it's non-zero is the very first report from the device. I have absolutely no idea what this is.
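Here's a minimal Python sketch of pulling out the fields identified so far from a decrypted report. The function and field names are mine, the baselines are just the rough values quoted above (so they're parameters), and the sensor bytes are left alone since their layout isn't known yet.

```python
def parse_report(plain: bytes, x_baseline: int = 102, y_baseline: int = 204) -> dict:
    """Extract the known fields from one decrypted 32-byte report."""
    if len(plain) != 32:
        raise ValueError("expected a 32-byte report")
    return {
        "counter": plain[0],                 # 0-127, then 233, then back to 0
        "gyro_x": plain[29] - x_baseline,    # offset from the rough X baseline
        "gyro_y": plain[30] - y_baseline,    # offset from the rough Y baseline
        "byte_15": plain[15],                # unknown; (almost) always zero
    }

sample = bytearray(32)
sample[0], sample[29], sample[30] = 5, 110, 200
print(parse_report(bytes(sample)))
```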

From here, all we have is data from the sensors. Another quick script with pygame and boom, we have a graph renderer for this data.

However, here's where it gets tough. Figuring out which bytes correspond to which sensors is difficult, because effectively all the signal processing and filtering happens on the PC side, meaning it's not in this library yet. Figuring out the high bytes (which are less noisy and change less frequently) isn't terribly difficult, and I've identified a few of them, but there's a lot of work to be done still.

What needs to be done

Reversing-wise:

  • Determine which bytes correspond to which signals -- I'm sure someone more knowledgeable than myself can do this no problem
  • Figure out how the sensor quality is transmitted -- according to some data on the research SDK, there are 4 bits per sensor that give you the signal quality (0=none, 1=very poor, 2=poor, 3=decent, 4=good, 5=very good)
  • Figure out how to read the battery meter

Emokit-wise:

  • Linux and OS X support haven't been tested at all, but they should be good to go
  • Build a C library for working with the EPOC
  • Build an acquisition module for OpenViBE

Get involved

Contact us

I've started the #emokit channel on Freenode and I'm idling there (nick=Daeken).

How you can help

I'm about to get started on an acquisition module for OpenViBE, but someone more knowledgeable than myself could probably do this far more quickly. However, the reversing side of things -- particularly figuring out the sensor bytes -- would be much more useful.

Summary

I hope that the Emokit project will open new research that was never possible before, and I can't wait to see what people do with this. Let me know if you have any questions or comments.

Happy Hacking,
- Cody Brocious (Daeken)

Filed under  //   emotiv   hacking  


Hacking the Belkin Network USB Hub

I've recently been hacking the Belkin Network USB Hub. It allows you to connect USB devices to the network and share them between your computers. The problem? It's Windows-only and totally undocumented. The solution? Reverse-engineering and a Python library, of course. Btw, if you pick one up, consider getting it from Amazon using my referral link: Belkin F5L009 5-Port Network USB Hub

You can find all of the source and some logs in the pybelkusb repository on GitHub. Note that this post is largely stream of consciousness, having been written mostly during the reverse-engineering process, in an effort to show how I actually went about it, and as such there are certainly inaccuracies and unknowns. If anyone knows what some of these bits are or manages to find a case that breaks my assumptions, though, drop me a line.

I got it hooked up to my PC via a crossover cable, plugged in some devices (for testing I used an FTDI USB-RS485 cable, a generic flash drive, and a device with a Prolific PL2303 in it), and installed the Belkin software. Once I'd played around with it a bit -- it works remarkably well, for the record, although the default software is fairly buggy -- I dug in to start hacking it.

I started by breaking out Wireshark and refreshed the device list in the Belkin software, looking to see some basics: protocol (UDP), port number (19540), whether or not encryption/compression is used (unlikely or used sparingly -- plenty of plaintext). Now, I could've continued using Wireshark, but when I'm reverse-engineering I tend to write my own sniffer using the awesome SharpPcap wrapper for libpcap; this lets me annotate the logs, as well as quickly write parsers so I can validate my assumptions on the fly. For those of you playing along at home, it's in the Logger directory of the pybelkusb-reverse repo.

So the first thing to figure out was how the Belkin software discovered hubs on the network. This occurs via a broadcast UDP packet on port 19540:

0000 | 00 04 95 06 00 00 00 00  00 00 00 00 00 00 00 00  | ..?..... ........ 
0010 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0020 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0030 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0040 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0050 | 00 00 00 00 00 00 00 00                           | ........ 
0058

This, as far as I'm aware, is 100% constant. What the values mean, though, I really can't say. The hub responds by sending a UDP packet to the source of the broadcast:

0000 | 00 02 95 06 00 00 00 00  00 00 00 00 00 00 00 00  | ..?..... ........ 
0010 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0020 | 00 00 00 00 4e 65 74 77  6f 72 6b 20 55 53 42 20  | ....Netw ork.USB. 
0030 | 48 75 62 00 00 00 00 00  00 1c df cc 15 dc 00 00  | Hub..... ..??.?.. 
0040 | 00 00 c0 a8 89 2c 00 00  56 65 72 20 32 2e 32 2e  | ..???,.. Ver.2.2. 
0050 | 30 00 80 80 00 00 00 00  00 00 00 00 00 00 00 00  | 0.??.... ........ 
0060 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0070 | 00 00 00 00 42 4b 2d 4e  55 48 43 43 31 35 44 43  | ....BK-N UHCC15DC 
0080 | 00 00 00 00 31 2e 32 2e  30 00 00 00 00 00 00 00  | ....1.2. 0....... 
0090 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00a0 | 00 00 00 00 00 00                                 | ......
00a6

I then worked through the data to figure out the following bits:

  • 0x24 [16 bytes] -- Name (null term)
  • 0x38 [6 bytes] -- MAC address
  • 0x42 [4 bytes] -- IP address
  • 0x48 [10 bytes] -- Some version string (null term)
  • 0x74 [12 bytes] -- Serial number
  • 0x84 [6 bytes] -- Some version string (null term)
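To make the offsets above concrete, here's a minimal sketch of discovery in Python. The function and key names are made up for illustration and aren't taken from pybelkusb. It assumes the probe really is constant and that replies always carry the fields at the offsets listed above.

```python
import socket

DISCOVERY_PORT = 19540
# 0x58-byte probe: 00 04 95 06 followed by nulls, as in the capture above.
DISCOVERY_REQUEST = bytes.fromhex("00049506") + bytes(0x58 - 4)


def cstring(buf, offset, length):
    """Read a fixed-width, null-padded ASCII field."""
    return buf[offset:offset + length].split(b"\x00", 1)[0].decode("ascii")


def parse_discovery_response(buf):
    """Decode the fields identified so far; everything else is ignored."""
    if len(buf) < 0x8A:
        raise ValueError("discovery response too short")
    return {
        "name": cstring(buf, 0x24, 16),
        "mac": ":".join("%02x" % b for b in buf[0x38:0x3E]),
        "ip": socket.inet_ntoa(buf[0x42:0x46]),
        "version_a": cstring(buf, 0x48, 10),  # meaning unknown
        "serial": cstring(buf, 0x74, 12),
        "version_b": cstring(buf, 0x84, 6),   # meaning unknown
    }


def discover(timeout=1.0):
    """Broadcast the probe and collect replies until the timeout expires."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.settimeout(timeout)
    hubs = []
    try:
        sock.sendto(DISCOVERY_REQUEST, ("255.255.255.255", DISCOVERY_PORT))
        while True:
            try:
                data, _addr = sock.recvfrom(4096)
            except socket.timeout:
                break
            if len(data) >= 0x8A:  # skip anything too short to be a reply
                hubs.append(parse_discovery_response(data))
    finally:
        sock.close()
    return hubs
```

Calling discover() on a network with a hub should return one dictionary per hub; the parser can also be run on its own against a logged packet.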

Once a hub is discovered, the PC opens a connection (as much as you can with UDP, of course) on port 19540 and sends the following packet to enumerate devices:

0000 | 00 01 00 01 47 45 54 20  55 53 42 5f 44 45 56 49  | ....GET. USB_DEVI 
0010 | 43 45 5f 4c 49 53 54 00  00 00 00 00 00 00 00 00  | CE_LIST. ........ 
0020 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0030 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0040 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0050 | 00 00 00 00 00 00 02 00                           | ........ 
0058

The structure of that message is pretty clear, but the response less so. Here are a few responses with varying numbers of devices:

No devices:
0000 | 00 01 00 00 00 00 00 01  00 00 00 00 00 00 00 00  | ........ ........ 
0010 | 00 00                                             | ..
0012

1 device:
0000 | 00 01 00 00 00 00 00 39  01 64 65 76 30 31 00 55  | .......9 .dev01.U 
0010 | 53 42 32 2e 30 20 46 6c  61 73 68 20 44 69 73 6b  | SB2.0.Fl ash.Disk 
0020 | 00 00 00 00 00 80 81 12  21 32 34 48 31 34 00 00  | .....??. !24H14.. 
0030 | 00 00 00 00 00 00 00 00  00 00 00 00 08 00 00 00  | ........ ........ 
0040 | 00                                                | .
0041

2 devices:
0000 | 00 01 00 00 00 00 00 74  02 64 65 76 30 31 00 55  | .......t .dev01.U
0010 | 53 42 32 2e 30 20 46 6c  61 73 68 20 44 69 73 6b  | SB2.0.Fl ash.Disk
0020 | 00 00 00 00 00 80 81 12  21 32 34 48 31 34 00 00  | .....??. !24H14..
0030 | 00 00 00 00 00 00 00 00  00 00 00 00 08 00 00 00  | ........ ........
0040 | 00 64 65 76 30 32 00 46  54 44 49 20 55 53 42 2d  | .dev02.F TDI.USB-
0050 | 52 53 34 38 35 20 43 61  62 6c 65 00 00 00 00 00  | RS485.Ca ble.....
0060 | 80 81 04 03 60 01 46 32  33 00 00 00 00 00 00 00  | ??..`.F2 3.......
0070 | 00 00 00 00 00 00 00 ff  00 00 00 00              | .......? ....
007c

At first I was a bit thrown off by the varying packet size, until I realized that it actually just uses null-terminated strings. The structure from there is pretty straightforward, although certain bits are still unknown/assumed constant:

  • 0x00 -- Header (00 01 00 00 00 00 00)
  • 0x07 [1 byte] -- Packet size - 8
  • 0x08 [1 byte] -- Number of devices
  • For each device
    • Device node, null terminated string (devXX)
    • Device name, null terminated string
    • 4 byte IP of computer connected to device, or nulls if it's open
    • 80 81
    • 2 bytes Vendor ID
    • 2 bytes Product ID
    • 22 bytes of unknown data
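The enumeration exchange can be sketched like so. This is a rough illustration rather than the pybelkusb code: the function names are hypothetical, the vendor and product IDs are read big-endian because that is how they appear in the captures (04 03 60 01 for the FTDI cable), and the byte at 0x07 is used only to trim trailing padding.

```python
import socket
import struct


def build_list_request():
    """0x58-byte request: 00 01 00 01, the command text, nulls, 02 at 0x56."""
    packet = bytearray(0x58)
    packet[0:4] = b"\x00\x01\x00\x01"
    command = b"GET USB_DEVICE_LIST"
    packet[4:4 + len(command)] = command
    packet[0x56] = 0x02
    return bytes(packet)


def read_cstring(buf, pos):
    """Return the null-terminated string at pos and the offset just past it."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("ascii"), end + 1


def parse_device_list(buf):
    """Decode a device list response into a list of dictionaries."""
    buf = buf[:8 + buf[7]]  # byte 0x07 is the packet size minus 8
    count = buf[8]
    pos = 9
    devices = []
    for _ in range(count):
        node, pos = read_cstring(buf, pos)
        name, pos = read_cstring(buf, pos)
        owner = buf[pos:pos + 4]
        # Bytes pos+4 and pos+5 are the 80 81 marker; the IDs follow it.
        vid, pid = struct.unpack_from(">HH", buf, pos + 6)
        devices.append({
            "node": node,
            "name": name,
            "in_use_by": None if owner == bytes(4) else socket.inet_ntoa(owner),
            "vendor_id": vid,
            "product_id": pid,
            "unknown": buf[pos + 10:pos + 32],  # the 22 unexplained bytes
        })
        pos += 32  # 4 (IP) + 2 (80 81) + 2 (VID) + 2 (PID) + 22 (unknown)
    return devices
```

Run against the two-device capture above, this yields dev01 (vendor 0x1221, product 0x3234) and dev02 (vendor 0x0403, product 0x6001), both with no owner.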

After a short break to implement hub discovery and device enumeration (quite simple with Python), I started capturing data for device connection. The following is a request to connect to the flash drive, on device node dev01.

0000 | 00 01 00 01 45 58 45 43  20 53 43 41 4e 4e 45 52  | ....EXEC .SCANNER 
0010 | 20 43 4f 4e 4e 45 43 54  3a 01 06 00 1c df cc 15  | .CONNECT :....??. 
0020 | dc c0 a8 89 01 64 65 76  30 31 00 4c 54 00 00 00  | ????.dev 01.LT... 
0030 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0040 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0050 | 00 00 00 00 00 00 02 00  00 00 00 00 00 00 00 00  | ........ ........ 
0060 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0070 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0080 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0090 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00f0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0100 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0110 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0120 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0130 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0140 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0150 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0160 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0170 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0180 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0190 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01f0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00        | ........ ......
01fe

Holy nulls, batman! As you can see, most of this packet is unused, and it's almost entirely constant:

  • 0x00 [4 bytes] -- 00 01 00 01
  • 0x04 [21 bytes] -- EXEC SCANNER CONNECT:
  • 0x19 [2 bytes] -- 01 06 (06 could be the length of the following field, a MAC address)
  • 0x1B [6 bytes] -- MAC address of the hub
  • 0x21 [4 bytes] -- IP address of the computer
  • 0x25 [6 bytes] -- Device node string, null terminated
  • 0x2B [3 bytes] -- LT followed by a null
  • 0x2E [40 bytes] -- Nulls
  • 0x56 [1 byte] -- 02
  • 0x57 [423 bytes] -- Nulls
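Building this packet is mostly a matter of dropping a few fields into a big buffer of nulls. The sketch below is an illustration with a hypothetical function name, not the pybelkusb implementation. The offsets are read straight off the hex dump, where the 01 06 pair sits at 0x19 and the MAC address starts at 0x1B. It also assumes device nodes are always five characters (devXX).

```python
import socket


def build_connect_request(hub_mac, pc_ip, node):
    """Build the 0x1FE-byte EXEC SCANNER CONNECT packet.

    hub_mac is a string like "00:1c:df:cc:15:dc", pc_ip a dotted quad,
    node a device node such as "dev01".
    """
    mac = bytes.fromhex(hub_mac.replace(":", ""))
    node_bytes = node.encode("ascii")
    if len(mac) != 6 or len(node_bytes) != 5:
        raise ValueError("expected a 6-byte MAC and a 5-character node")

    packet = bytearray(0x1FE)                      # everything defaults to null
    packet[0x00:0x04] = b"\x00\x01\x00\x01"
    packet[0x04:0x19] = b"EXEC SCANNER CONNECT:"
    packet[0x19:0x1B] = b"\x01\x06"                # 06 may be the MAC length
    packet[0x1B:0x21] = mac
    packet[0x21:0x25] = socket.inet_aton(pc_ip)
    packet[0x25:0x2A] = node_bytes                 # null terminator at 0x2A
    packet[0x2B:0x2D] = b"LT"                      # null terminator at 0x2D
    packet[0x56] = 0x02
    return bytes(packet)
```

With the values from the capture (hub 00:1c:df:cc:15:dc, PC 192.168.137.1, node dev01) this reproduces the logged packet byte for byte.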

The hub responds as follows for success:

0000 | 00 01 00 00 00 00 00 03  4f 4b 00 76 30 31 00 55  | ........ OK.v01.U 
0010 | 53 42                                             | SB
0012

Or for failure:

0000 | 00 01 00 00 00 00 00 03  4e 47 00 76 30 31 00 55  | ........ NG.v01.U 
0010 | 53 42                                             | SB
0012

This one should be pretty self-explanatory, but regardless:

  • 0x00 [4 bytes] -- 00 01 00 00
  • 0x04 [4 bytes] -- 00 00 00 03 Potentially the size of the following string
  • 0x08 [3 bytes] -- OK/NG followed by a null
  • 0x0B [4 bytes] -- v01 followed by a null
  • 0x0F [3 bytes] -- USB
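Checking the reply takes only a few lines. This sketch leans on the guess above that the four bytes at 0x04 are a big-endian length for the status string, which is an assumption; the function name is hypothetical.

```python
def connect_succeeded(buf):
    """Return True for an OK reply, False for NG, and raise on anything else."""
    if len(buf) < 8 or buf[:4] != b"\x00\x01\x00\x00":
        raise ValueError("not a connect response")
    length = int.from_bytes(buf[4:8], "big")   # assumed: size of the status string
    status = buf[8:8 + length].rstrip(b"\x00")
    if status == b"OK":
        return True
    if status == b"NG":
        return False
    raise ValueError("unexpected status %r" % status)
```

Anything past the status string is ignored, since its purpose isn't known.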

Now it gets a bit tricky. You send the connection packet a second time, and the hub sends two UDP packets to port 19540 on the PC. Note: this is not the port you sent the broadcasts from; it is a port you have to hold open explicitly for device connections/disconnections. Also, a few things to be aware of if you're talking to the hub from Windows: the SXUPTP driver that handles the hub communication holds this port at all times, and if you connect to a device from, e.g., pybelkusb, and don't have the Belkin GUI up, the driver will crash and cause a kernel panic. This bit me in the ass repeatedly. Anyway, here are the packets it sends:

0000 | 29 08 02 05 c0 a8 89 2c  30 32 30 31 00 00 00 00  | )...???, 0201.... 
0010 | 01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0020 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0030 | 46 32 33 00 00 00 00 00  00 00 00 00 00 00 00 00  | F23..... ........ 
0040 | 12 01 02 00 00 00 00 08  04 03 60 01 06 00 01 02  | ........ ..`..... 
0050 | 03 01 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0060 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0070 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0080 | 09 02 00 20 01 01 00 80  96 09 04 00 00 02 ff ff  | .......? ?.....?? 
0090 | ff 02 07 05 81 02 00 40  00 07 05 02 02 00 40 00  | ?...?..@ ......@. 
00a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
00f0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0100 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0110 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0120 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0130 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0140 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0150 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0160 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0170 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0180 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0190 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
01f0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0200 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0210 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0220 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0230 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0240 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0250 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0260 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0270 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0280 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0290 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
02a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
02b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
02c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
02d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
02e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
02f0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0300 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0310 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0320 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0330 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0340 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0350 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0360 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0370 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0380 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0390 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
03a0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
03b0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
03c0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
03d0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
03e0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
03f0 | 00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  | ........ ........ 
0400

0000 | 00 03 00 00 00 00 00 00  30 32 30 31 00 00 00 00  | ........ 0201.... 
0010 | 01 00                                             | ..
0012

I never really figured out the meaning of either of these packets. The one thing I do know is that the fourth byte of the first packet there is some sort of device ID for the duration of the connection. You'll see it in use shortly.

Once these packets are received, you can make an actual connection to the device. You do this by connecting to TCP port 19540 on the hub. No initialization is required on the connection, and you can start sending commands. Let's see a bulk write:

TCP -> hub
0000 | 80 03 03 10 00 02 05 02  02 01 80 40 00 00 00 00  | ?....... ..?@.... 
0010 | 00 10 66 6f 6f 20 62 61  72 20 62 61 7a 20 68 61  | ..foo.ba r.baz.ha 
0020 | 78 21                                             | x!
0022

The data foo bar baz hax! was sent to endpoint 0x02 in this case, and the structure is as follows:

  • 0x04 [2 bytes] -- Endpoint number (think this is two bytes, might be 1 at 0x05)
  • 0x10 [2 bytes] -- Size of data
  • 0x12 -- Data to write

Now, here I'll break from the narrative to explain the rest of the structure and a bit about the TCP "packets" themselves. Each TCP packet begins with two bytes indicating its type, e.g. 80 03 == bulk write. This is followed by a two-byte sequence, which starts at zero. Figuring out the rest of it took writing a parser that was capable of churning through a dozen logs I took over a period of a week. There are still unknown bits, but here's the best I managed to do:

  • 0x04 [2 bytes] -- Unknown (I always send 00 03)
  • 0x06 [1 byte] -- Device ID referred to above
  • 0x07 [1 byte] -- Constant 02
  • 0x08 [1 byte] -- Endpoint number
  • 0x09 [1 byte] -- Constant 01
  • 0x0A [2 bytes] -- Unknown (I always send 80 40)
  • 0x0C [4 bytes] -- Constant 00 00 00 00
  • 0x10 [2 bytes] -- Size of data
  • 0x12 -- Data
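Here's a minimal sketch of a bulk write builder that follows the table above field by field. The function name is hypothetical and the multi-byte fields are packed big-endian, which matches the capture but is otherwise an assumption. The unknown field at 0x04 is left as a parameter because the capture shows 00 02 there rather than 00 03.

```python
import struct


def build_bulk_write(seq, device_id, endpoint, data, unknown=0x0003):
    """Build a type 80 03 packet carrying data for the given endpoint."""
    header = struct.pack(
        ">HHHBBBBHIH",
        0x8003,      # 0x00: packet type, bulk write
        seq,         # 0x02: sequence number
        unknown,     # 0x04: unknown
        device_id,   # 0x06: device ID from the UDP connection packet
        0x02,        # 0x07: constant
        endpoint,    # 0x08: endpoint number
        0x01,        # 0x09: constant
        0x8040,      # 0x0A: unknown
        0,           # 0x0C: constant nulls
        len(data),   # 0x10: size of data
    )
    return header + data
```

Calling build_bulk_write(0x0310, 0x05, 0x02, b"foo bar baz hax!", unknown=0x0002) reproduces the captured packet.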

The hub responds:

TCP <- hub
0000 | 80 03 03 10 00 00 00 00  00 00                    | ?....... ..
000a

Note that the type (80 03) and sequence number (03 10) match up with the packet sent to the device. The rest of the packet is null in every case I've seen. This may be used for errors in some cases, but I've yet to actually see that.
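Matching a reply to its request can be sketched as a simple comparison of the first four bytes. The helper name is made up, and treating any non-null trailing byte as "not a plain acknowledgement" is an assumption, since no error replies have been captured.

```python
def is_ack_for(request, response):
    """True if response echoes request's type and sequence and is otherwise null."""
    if len(response) != 10:
        return False
    return response[:4] == request[:4] and response[4:] == bytes(6)
```

For the bulk write above, the request starts with 80 03 03 10 and so does the ten-byte reply, so this returns True.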

I implemented this, then realized I needed control writes to actually initialize the device I intended to speak to, so that was up next. Here's an example control write:

TCP -> hub
0000 | 80 01 03 11 00 02 05 00  02 01 00 08 40 01 01 00  | ?....... ....@... 
0010 | 00 00 00 00 00 00                                 | ......
0016

Here you can see the type is 80 01. The packet structure is:

  • 0x04 [2 bytes] -- Unknown (I always send 00 03)
  • 0x06 [1 byte] -- Device ID referred to above
  • 0x07 [1 byte] -- Null
  • 0x08 [4 bytes] -- Constant 02 01 00 08
  • 0x0C [1 byte] -- Request type
  • 0x0D [1 byte] -- Request
  • 0x0E [2 bytes] -- Value
  • 0x10 [2 bytes] -- Index
  • 0x12 [2 bytes] -- Null
  • 0x14 [2 bytes] -- Size of data
  • 0x16 -- Data
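The same approach works for control writes. This is a sketch with a hypothetical function name. Value, index, and size are packed big-endian here, which is an assumption: it reproduces the capture, but note that it is the opposite byte order from what USB itself uses on the wire. As with bulk writes, the unknown field at 0x04 is a parameter because the capture shows 00 02.

```python
import struct


def build_control_write(seq, device_id, request_type, request, value, index,
                        data=b"", unknown=0x0003):
    """Build a type 80 01 packet for a USB control transfer with optional data."""
    header = struct.pack(
        ">HHHBB4sBBHHHH",
        0x8001,               # 0x00: packet type, control write
        seq,                  # 0x02: sequence number
        unknown,              # 0x04: unknown
        device_id,            # 0x06: device ID
        0x00,                 # 0x07: null
        b"\x02\x01\x00\x08",  # 0x08: constant
        request_type,         # 0x0C: bmRequestType
        request,              # 0x0D: bRequest
        value,                # 0x0E: wValue (assumed big-endian)
        index,                # 0x10: wIndex (assumed big-endian)
        0,                    # 0x12: null
        len(data),            # 0x14: size of data
    )
    return header + data
```

Calling build_control_write(0x0311, 0x05, 0x40, 0x01, 0x0100, 0x0000, unknown=0x0002) gives the 22 bytes shown in the capture.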

This one was particularly difficult to track down, as a lot of the values look the same and I had no idea which was which. It took installing a USB sniffer on my computer and capturing the actual device initialization and going over it field-by-field, matching the locations. I won't bore you with the details of that -- if you're interested, grab the demo of USBTrace and give it a shot; it's not hard.

The hub responds:

TCP <- hub
0000 | 80 01 03 11 00 00 00 00  00 00                    | ?....... ..
000a

Much like bulk writes, it's an empty packet with matching type/sequence.

With bulk writes and control writes out of the way, the matching two were dead simple, as almost everything is identical. A control read:

TCP -> hub
0000 | 00 01 00 05 00 02 05 00  02 01 00 08 80 06 03 02  | ........ ....?... 
0010 | 04 09 00 00 00 ff                                 | .....?
0016

The structure is identical to the control write message, with the exception of byte 0x5. In this case, I send 1 constantly, but I don't really think it matters. Of course, there's no data following this packet, since the size indicates the number of bytes to read.

Response:

TCP <- hub
0000 | 00 01 00 05 00 00 00 00  00 20 20 03 55 00 53 00  | ........ ....U.S. 
0010 | 42 00 2d 00 52 00 53 00  34 00 38 00 35 00 20 00  | B.-.R.S. 4.8.5... 
0020 | 43 00 61 00 62 00 6c 00  65 00                    | C.a.b.l. e.
002a

The structure here is dead simple, and you most likely could figure it out in under a minute, having made it this far:

  • 0x04 [4 bytes] -- Nulls
  • 0x08 [2 bytes] -- Size read
  • 0x0A -- Data
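As a quick sanity check on that layout, here is a sketch that splits a read response into its parts and decodes the payload from the capture, which is a standard USB string descriptor (a length byte, a type byte of 0x03, then UTF-16LE text). The function names are hypothetical.

```python
import struct


def parse_read_response(buf):
    """Return (packet type, sequence number, data) from a read response."""
    ptype, seq, _nulls, size = struct.unpack_from(">HHIH", buf, 0)
    data = buf[10:10 + size]
    if len(data) != size:
        raise ValueError("response shorter than its size field claims")
    return ptype, seq, data


def decode_string_descriptor(desc):
    """Decode a USB string descriptor into a Python string."""
    length, descriptor_type = desc[0], desc[1]
    if descriptor_type != 0x03:
        raise ValueError("not a string descriptor")
    return desc[2:length].decode("utf-16-le")
```

Feeding the captured response through both functions gives type 0x0001, sequence 0x0005, and the string "USB-RS485 Cable".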

Bulk reads are likewise simple as hell:

TCP -> hub
0000 | 00 03 03 6e 00 01 04 81  02 01 82 00 00 00 00 00  | ...n...? ..?..... 
0010 | 02 00                                             | ..
0012

Raise your hand if you're surprised that this is identical to the bulk write. None of you? Thought so. I won't insult your intelligence by breaking this down.

The response:

TCP <- hub
0000 | 00 03 03 6e 00 00 00 00  02 00 eb 3c 90 4d 53 57  | ...n.... ..?<?MSW 
0010 | 49 4e 34 2e 31 00 02 40  06 00 02 00 7e 00 00 f8  | IN4.1..@ ....~..? 
0020 | fd 00 3f 00 ff 00 00 00  00 00 00 20 3f 00 00 00  | ?.?.?... ....?... 
0030 | 29 22 96 1b 00 4e 4f 20  4e 41 4d 45 20 20 20 20  | )"?..NO. NAME.... 
0040 | 46 41 54 31 36 20 20 20  fa 33 c0 8e d0 bc 00 7c  | FAT16... ?3????.| 
0050 | 16 07 bb 78 00 36 c5 37  1e 56 16 53 bf 3e 7c b9  | ..?x.6?7 .V.S?>|? 
0060 | 0b 00 fc f3 a4 06 1f c6  45 fe 0f 8b 0e 18 7c 88  | ..???..? E?.?..|? 
0070 | 4d f9 89 47 02 c7 07 3e  7c fb cd 13 72 79 33 c0  | M??G.?.> |??.ry3? 
0080 | 39 06 13 7c 74 08 8b 0e  13 7c 89 0e 20 7c a0 10  | 9..|t.?. .|?..|?. 
0090 | 7c f7 26 16 7c 03 06 1c  7c 13 16 1e 7c 03 06 0e  | |?&.|... |...|... 
00a0 | 7c 83 d2 00 a3 50 7c 89  16 52 7c a3 49 7c 89 16  | |??.?P|? .R|?I|?. 
00b0 | 4b 7c b8 20 00 f7 26 11  7c 8b 1e 0b 7c 03 c3 48  | K|?..?&. |?..|.?H 
00c0 | f7 f3 01 06 49 7c 83 16  4b 7c 00 bb 00 05 8b 16  | ??..I|?. K|.?..?. 
00d0 | 52 7c a1 50 7c e8 92 00  72 1d b0 01 e8 ac 00 72  | R|?P|??. r.?.??.r 
00e0 | 16 8b fb b9 0b 00 be e6  7d f3 a6 75 0a 8d 7f 20  | .???..?? }??u.?.. 
00f0 | b9 0b 00 f3 a6 74 18 be  9e 7d e8 5f 00 33 c0 cd  | ?..??t.? ?}?_.3?? 
0100 | 16 5e 1f 8f 04 8f 44 02  cd 19 58 58 58 eb e8 8b  | .^.?.?D. ?.XXX??? 
0110 | 47 1a 48 48 8a 1e 0d 7c  32 ff f7 e3 03 06 49 7c  | G.HH?..| 2???..I| 
0120 | 13 16 4b 7c bb 00 07 b9  03 00 50 52 51 e8 3a 00  | ..K|?..? ..PRQ?:. 
0130 | 72 d8 b0 01 e8 54 00 59  5a 58 72 bb 05 01 00 83  | r??.?T.Y ZXr?...? 
0140 | d2 00 03 1e 0b 7c e2 e2  8a 2e 15 7c 8a 16 24 7c  | ?....|?? ?..|?.$| 
0150 | 8b 1e 49 7c a1 4b 7c ea  00 00 70 00 ac 0a c0 74  | ?.I|?K|? ..p.?.?t 
0160 | 29 b4 0e bb 07 00 cd 10  eb f2 3b 16 18 7c 73 19  | )?.?..?. ??;..|s. 
0170 | f7 36 18 7c fe c2 88 16  4f 7c 33 d2 f7 36 1a 7c  | ?6.|???. O|3??6.| 
0180 | 88 16 25 7c a3 4d 7c f8  c3 f9 c3 b4 02 8b 16 4d  | ?.%|?M|? ????.?.M 
0190 | 7c b1 06 d2 e6 0a 36 4f  7c 8b ca 86 e9 8a 16 24  | |?.??.6O |?????.$ 
01a0 | 7c 8a 36 25 7c cd 13 c3  0d 0a 4e 6f 6e 2d 53 79  | |?6%|?.? ..Non-Sy 
01b0 | 73 74 65 6d 20 64 69 73  6b 20 6f 72 20 64 69 73  | stem.dis k.or.dis 
01c0 | 6b 20 65 72 72 6f 72 0d  0a 52 65 70 6c 61 63 65  | k.error. .Replace 
01d0 | 20 61 6e 64 20 70 72 65  73 73 20 61 6e 79 20 6b  | .and.pre ss.any.k 
01e0 | 65 79 20 77 68 65 6e 20  72 65 61 64 79 0d 0a 00  | ey.when. ready... 
01f0 | 49 4f 20 20 20 20 20 20  53 59 53 4d 53 44 4f 53  | IO...... SYSMSDOS 
0200 | 20 20 20 53 59 53 00 00  55 aa                    | ...SYS.. U?
020a

Again, the structure screams out at this point.

And really, that's about it. I didn't cover disconnection because it's boring and dead simple (see the source in pybelkusb if you're really curious, or just log the disconnection yourself and look at it), and I haven't yet looked into error handling or timeouts whatsoever (they'll make their way into pybelkusb soon enough), but I think this gives you a good idea of how the protocol works and how easy it really is to reverse-engineer protocols like this.

Happy Hacking,
- Cody Brocious (Daeken)


Dotpack Beta 1

Edit: Due to a high troll rate on this post, I've temporarily disabled commenting.  If you'd like to leave feedback, please email me at cody.brocious@gmail.com or drop a comment on the Hacker News thread http://news.ycombinator.com/item?id=1254678 .

I'm proud to announce the first beta of Dotpack, a packer for .NET executables. I've been working on this for the last week or so and finally have something fit for public eyes. It started out of my desire to build 64k demos on .NET and for that reason it's very small -- as of this writing, it's sitting at 5331 bytes overhead. It also achieves high compression ratios due to its use of LZMA; average size reduction for the files I've tested has been 60-80%.

At the moment it's fairly straightforward, not tampering with the original binary at all, but future versions will bring obfuscation and an array of code transformations to drop the filesize even further.

This version has several known issues I simply didn't have time to deal with. Silverlight packing is there (you can pass it a .xap and get one back), but it's very finicky and not particularly good so far. Packed binaries that use System.Reflection.Assembly.GetExecutingAssembly().Name will get back an empty string due to the way assemblies are loaded after unpacking, which can cause major issues. I'm going to fix these for the next release.

Other future features which will be coming, in no particular order:

  • Merging of assemblies. I was originally planning on releasing with ILMerge support as a stopgap until I finished my own prelinker, but licensing issues and a generally poor API made that less appealing. I'm working on a prelinker which, in addition to just merging assemblies, will perform dead code analysis to strip unused portions of the code away.
  • Obfuscation. Not only will this make it more difficult to analyze your binaries, but you'll get the benefit of less space being taken up by names. This can be quite substantial in a large binary.
  • Visual Studio integration. You'll be able to easily tie Dotpack into your Visual Studio workflow to produce packed binaries from your release builds.

Dotpack is freely distributable, but is under a non-commercial license. If you're using it in a commercial environment, even if you're not distributing your binaries, please purchase a commercial license. In addition to supporting Dotpack's development, you will also get builds ahead of the non-commercial users.

You can get the current beta build of Dotpack here: http://straylight-software.s3.amazonaws.com/Dotpack1.0b1.zip. To use it, simply run: dotpack.exe input.exe/.xap output.exe/.xap and you're off.

During the beta, a one-year single-user license is available at a discounted price of $250; once this goes gold, the price will go up to $500. Note that the year doesn't begin until the final release, so you're getting quite a deal. If you'd like to purchase a volume license, please contact me at cody.brocious@gmail.com. (Before the final release, I plan on getting a site up, but you know what they say about minimum viable products.)

Try it out, let me know how it works for you, and let me know if there are any features you'd like to see.

Happy Hacking,
- Cody Brocious (Daeken)


Demo a week, week 2: Armitage

Well, this demo is coming a bit late. After days of fighting with OpenGL and GLSL, I finally have a demo that's fairly presentable. It's still not great (by any means), but it's an improvement over last week's demo and that's the goal. Sadly, this week's demo has compatibility issues out the ass -- I suddenly understand why everyone goes D3D for the demoscene. My next demo will almost definitely be D3D10, due to familiarity and geometry shaders. We'll see.

Anyway, you can get the build at http://pouet.net/prod.php?which=54524. If you give it a shot, let me know what video card you're using and whether it worked or not. I'll be releasing the source in the next day or two if anyone wants to play with it, and as always, the NFO is below.

Oh, and note: I couldn't make it less dependent on framerate this time around, but it's a ton faster than the last demo, so I'm not going to beat myself up over it. Next demo will be framerate-independent, I promise...

Happy Hacking,
- Cody Brocious (Daeken)

Armitage by Straylight
                    - April 2, 2010

Week 2 of my demo-a-week project is upon us.  I tend to think this is a much
better demo than the last one, even if it is short.  It has known compatibility
issues (older ATI cards, like the one in my desktop, seem to dislike my vertex
shaders...) and it's fairly short, but I like it a lot more than Waveride.

I was planning on making this one cross-platform, but after dealing with tons
of compatibility issues, I figure it's better for my mental health if I didn't.
The source will be made available in case anyone feels like doing something fun
with it.

Waveride used Miriel by Nightbeat as its music, on suggestion from a friend of 
mine.  Oddly enough, Nightbeat produced a song called Straylight -- I got the 
name from Neuromancer, but I can only take that as a good sign.  Anyway, enough
rambling.

Compatibility:
Tested on a GeForce 9400M, but should work with most modern cards.  Requires
SM3.0 (I believe) and rendering to a floating point framebuffer object.  If it
dies on start or you don't see the pretty spheres, one of those things is 
likely the issue.

Credits:
Code and art by Daeken
Music by Novoxide

Greets to:
ASD
SVatG
Conspiracy
Fairlight
Kewlers

Until next time,
- Daeken
