What is the data that you're censoring?

In its zeal to censor the Internet, the government appears to have paid little attention to the question of what we actually mean when we say that one or other piece of data is unacceptable. There seems to be an unchallenged assumption that a set of data necessarily contains some specific content that can be tagged as acceptable or not.

My purpose here is to show that the idea that a set of data contains particular content is simplistic, and that it doesn't have to be true.

Suppose that we want to make some pictures available. For the sake of simplicity, assume that each picture requires 1MB of data. Suppose we have picture files named a, b, c, d and e. We proceed as follows:

1. Construct a file consisting of 1MB of random data. Call this A.
2. Perform byte-wise exclusive ORs of A and the picture file a, to produce the file B.
3. Now do the same with B, and the picture file b, to produce the file C.
4. Continue with this scheme to produce files D, E and F.

Now, clearly we can get back any of the pictures by obtaining two consecutive files in the set, but no individual file can be said to contain a picture. Further, all the files have the property that they appear to consist of random data. File A (our original) is not qualitatively different from the others.
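
The procedure above can be sketched in Python. This is only an illustration under stated assumptions: the function names xor_bytes, build_chain and recover are made up for this example, and short 16-byte placeholder strings stand in for the 1MB picture files.

```python
import os


def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Byte-wise exclusive OR of two equal-length byte strings."""
    if len(x) != len(y):
        raise ValueError("inputs must be the same length")
    return bytes(p ^ q for p, q in zip(x, y))


def build_chain(pictures):
    """Return the files A, B, C, ... built from the pictures a, b, c, ..."""
    size = len(pictures[0])
    chain = [os.urandom(size)]  # file A: random data
    for picture in pictures:
        # Each new file is the previous file XORed with the next picture.
        chain.append(xor_bytes(chain[-1], picture))
    return chain


def recover(chain, index):
    """Rebuild picture number `index` from two consecutive files."""
    return xor_bytes(chain[index], chain[index + 1])


# Hypothetical stand-ins for the picture files a..e (16 bytes each).
pictures = [bytes([i]) * 16 for i in range(1, 6)]
chain = build_chain(pictures)  # six files, A..F

# Any two consecutive files give back the picture between them.
recovered = [recover(chain, i) for i in range(len(pictures))]
```

Because exclusive OR undoes itself, recover(chain, 0) returns picture a from files A and B, recover(chain, 1) returns picture b from files B and C, and so on.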

Of course, we don't have to stick to this linear scheme. We could derive other sequences of files from any of the new files (including A). Given such a set of files, it would be impossible to determine the starting point.

Having created the files, we would upload them to different web sites. In such a scenario, it is not sensible to assert that a particular site holds some impugned content, or that someone was downloading it. A person wanting some particular content would just need the identities of the two files used to create it. The content only exists when they combine the two files.

Posted by Sylvia Else, Wednesday, 24 December 2008 6:03:52 PM

It might be that it's 0030 and I just got home from work but I have NO IDEA what WTF you're on about.

I'll again in the morning.

Posted by StG, Saturday, 27 December 2008 12:32:54 AM

*I'll try again...

Posted by StG, Saturday, 27 December 2008 12:33:49 AM

StG, I'm not sure if this is where Sylvia is heading, but what I got out of the opening post was some thinking about the point at which data becomes "censorable".

Sylvia has deliberately abstracted the discussion, so I hope I'm not messing up what she is doing here. My interpretation is:
1/ Take an image which in its own right is deemed to be inappropriate.
2/ Split the data which makes up that image to create two or more new images which don't individually fit whatever definition of inappropriate has been set. If necessary, repeat the process a number of times until that result is obtained.
3/ If you are really keen, add in some additional data to the resultant images/files which might give them some "approved" content. This content will be filtered out when the images are recombined.
4/ Publish the created images but not the original. The images can be published across multiple sites or via multiple means. Somewhere there is a piece of software and/or a set of instructions on how to merge the published images, remove any added content and reproduce the original image.

I'm assuming that all of the above is relatively easily done if you have some suitable software.

So what then are you censoring? Can we censor files which don't individually breach any definitions of unacceptability? Can we censor software which might be capable of merging images together and which might be in wide use for "acceptable" uses? Can we censor lists of file names of images which don't individually breach the rules? If we can then can we get around modifications to those lists which turn them into something else?

There is a more fundamental question about trying to censor bits of data: a stream of 0s and 1s which only creates something deemed to be unacceptable when combined in the right way.

R0bert

Posted by R0bert, Saturday, 27 December 2008 7:57:39 AM

StG, "exclusive OR" is a simple computation involving two bits.

0 eor 0 = 0
0 eor 1 = 1
1 eor 0 = 1
1 eor 1 = 0

The useful thing about it is that if you take a bit (the first bit), and exclusive OR it with a second bit, then exclusive OR the result with the second bit again, you get back to the first bit.
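
As a small check of that property, here is a Python sketch. The function name xor_round_trip is invented for this example; Python writes exclusive OR as the ^ operator.

```python
def xor_round_trip(first: int, second: int) -> int:
    """XOR `first` with `second`, then XOR the result with `second` again."""
    combined = first ^ second
    return combined ^ second


# Try every pair of single bits; each round trip ends at the first bit.
bit_results = {
    (first, second): xor_round_trip(first, second)
    for first in (0, 1)
    for second in (0, 1)
}

# The same holds for whole bytes, since ^ works on each bit position separately.
byte_result = xor_round_trip(0xA5, 0x3C)
```

Every entry in bit_results equals the first bit of its pair, and byte_result equals the starting value 0xA5.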

Of course, this is by no means the only computation on data that behaves this way. It's merely the simplest for the purpose of my example.

R0bert,

Yes, you've pretty much summed it up. I was trying to provide a specific mechanism so as to make it clear that this was an idea that could work, rather than a simple wish list item.

I didn't have space in the original item to explain my motive for posting. Clearly the mechanism I propose would be useful to those wishing to distribute child pornography, but it was not their interests I was seeking to protect. I originally came up with this idea about eight years ago when the USA was looking to censor Internet content. I've resurrected it now because our government is doing the same. I regard the benefits to child pornographers as collateral damage in the war against censorship. Governments need to understand that when they act to censor the Internet it has unintended consequences, one of which can be that schemes are devised that are of assistance to the very people the government *claims* to be aiming its censorship at.

They'd do better to leave well alone.

Posted by Sylvia Else, Saturday, 27 December 2008 8:48:32 AM

The technical bits of this go straight over my head, but the general message is clear. Electronic documents of any kind are nothing like the sorts of materials governments have been easily able to censor in the past. The documents themselves are not static or stable, and neither is the technological environment. The social environment on the internet makes it that much more chaotic.

This technology has a very short history, but what history it does have shows very clearly that it's self-healing. Censorship or any other kind of control is little more than a technological challenge, usually overcome before the clampdown is even in place.

The really tragic thing in this case is that the real offenders would already be building more complexities into their documents.

Posted by chainsmoker, Saturday, 27 December 2008 9:42:39 AM
