[asterisk-biz] Anonymous statistics collection tool for Asterisk servers?
Trixter aka Bret McDanel
trixter at 0xdecafbad.com
Fri May 16 14:06:27 CDT 2008
On Fri, 2008-05-16 at 14:48 -0400, Peter Beckman wrote:
> On Fri, 16 May 2008, Dean Collins wrote:
>
> > It would be voluntary to download the module in the first place so not
> > necessary to be able to 'turn off'.
>
> Even "scrubbed" anonymous data has been able to be used in ways that
> really don't make it anonymous. When Netflix offered $1m to anyone who
> could improve their movie recommendations, they released a large amount of
> what they believed was scrubbed, anonymous data. Turns out, it wasn't so
> anonymous.
They didn't scrub it properly. What they effectively did was replace the
name with a token that represents that person. What researchers then
did was filter the data, removing (in the Netflix case) the most
commonly viewed films and looking at the less common ones, then match a
token to someone they had targeted. IIRC they couldn't just guess which
token went to which person; they had to target a person first and then
find their token.
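A rough sketch of that kind of matching, in Python. This is not the
researchers' actual method, just the idea as I described it; the function
names, the max_viewers cutoff and the sample data are all made up for
illustration.

```python
from collections import Counter

def rare_titles(ratings, max_viewers=1):
    """Titles seen by at most max_viewers tokens (hypothetical cutoff)."""
    counts = Counter(title for _, title in set(ratings))
    return {title for title, n in counts.items() if n <= max_viewers}

def candidate_tokens(ratings, known_titles, max_viewers=1):
    """Tokens whose history contains every rare title we know the target saw."""
    clues = set(known_titles) & rare_titles(ratings, max_viewers)
    if not clues:
        return set()  # only popular titles known: nothing to match on
    seen = {}
    for token, title in ratings:
        seen.setdefault(token, set()).add(title)
    return {token for token, titles in seen.items() if clues <= titles}

# Made-up "scrubbed" release: (token, title) pairs.
ratings = [
    ("t1", "Popular Film"), ("t2", "Popular Film"), ("t3", "Popular Film"),
    ("t1", "Obscure Documentary"),
    ("t2", "Rare Short"),
]

# We target one person and happen to know two films they watched.
matches = candidate_tokens(ratings, ["Popular Film", "Obscure Documentary"])
```

The popular title is useless as a clue because every token has it; the
obscure one narrows the candidates down to a single token.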
That does still present some issues, and the more semi-scrubbed
aggregate data that is published, the easier it becomes to cross
reference it to get even more information. Another paper discussed this
possibility, which can even lead to the discovery of identities where no
other method was previously available.
So to correct you on what you said "Turns out, it wasn't so SCRUBBED" :)
How one would properly scrub depends largely on the data in question,
who it's released to, etc. The more it's scrubbed, though, the less
valuable it becomes to many.
For example, if you were an ITSP publishing call figures, you could do
like Netflix and just replace the customer acct number with some token,
but that lets anyone see that token X called all these numbers, and lo
and behold, if you got a call from X and your number is listed, you now
know who X is.
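To make the token problem concrete, here is a minimal sketch in Python. The
salted-hash token scheme, the function names and the phone numbers are
assumptions for illustration only, not how any particular ITSP does it.

```python
import hashlib

def pseudonymize(records, salt="example-salt"):
    """Swap each account number for a stable token (the weak scrubbing above)."""
    published = []
    for acct, dialed in records:
        token = hashlib.sha256((salt + acct).encode()).hexdigest()[:8]
        published.append((token, dialed))
    return published

def tokens_that_called(published, my_number):
    """Tokens that appear next to a number the reader controls."""
    return {token for token, dialed in published if dialed == my_number}

def calls_by_token(published, token):
    """Everything the release reveals about one token."""
    return sorted(dialed for t, dialed in published if t == token)

# Made-up call records: (account number, dialed number).
records = [
    ("acct-1001", "+15550100"),
    ("acct-1001", "+15550123"),
    ("acct-1002", "+15550199"),
    ("acct-1001", "+15550142"),
]
published = pseudonymize(records)

# I own +15550123 and I know who called me, so I now know their token...
suspects = tokens_that_called(published, "+15550123")
# ...and with it every other number that customer dialed.
history = calls_by_token(published, next(iter(suspects)))
```

The token never reveals the account number itself, but one known call is
enough to attach a name to the whole call history.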
But if you released figures that broke the data down into simple stats
(customer X did Y minutes to country Z, A% of traffic was during these
hours, etc.), it isn't as useful to many, making the effort less rewarding.
You could of course do well to mask all the numbers in this particular
example, maybe listing just the region each one is in (a US state, for
example) and not even the city. In that way you could reduce the
information more and more but still keep some value.
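A sketch of what publishing only the reduced figures might look like, again
in Python. The record layout (destination region, hour of day, minutes) and
the aggregate() helper are assumptions for the example; a real release would
need more care, such as suppressing very small groups.

```python
from collections import defaultdict

def aggregate(calls):
    """calls: iterable of (dest_region, hour_of_day, minutes).

    Returns total minutes per region and percent of traffic per hour.
    No dialed numbers or customer identifiers survive the aggregation.
    """
    minutes_by_region = defaultdict(float)
    minutes_by_hour = defaultdict(float)
    total = 0.0
    for region, hour, minutes in calls:
        minutes_by_region[region] += minutes
        minutes_by_hour[hour] += minutes
        total += minutes
    share_by_hour = {}
    if total > 0:
        share_by_hour = {h: 100.0 * m / total for h, m in minutes_by_hour.items()}
    return dict(minutes_by_region), share_by_hour

# Made-up records, already reduced to US state instead of number or city.
calls = [
    ("NY", 9, 12.0),
    ("NY", 14, 3.0),
    ("CA", 9, 5.0),
]
by_region, share = aggregate(calls)
```

Nothing in the output lets a reader tie a call to a person, which is
exactly why it is also less interesting to some.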
There will of course be those who don't want to participate, and I
generally think that tying aggregate data collection to the use of a
product is a bad idea and that it should be optional. At the very least
it should be clearly disclosed that this is going on, especially since
some places don't allow this without explicit, not tacit, agreement.
There is the potential in some places (and it would take only the
customer being there, not necessarily the business) for a civil or
criminal charge to occur, and if it's criminal it can get ugly with
extradition and all that. Data privacy is a touchy thing in some parts
of the world.
--
Trixter http://www.0xdecafbad.com Bret McDanel
Belfast +44 28 9099 6461 US +1 516 687 5200
http://www.trxtel.com the phone company that pays you!