Ticket #467 (new enhancement)

Opened 11 years ago

Last modified 11 years ago

How do you control license violations?

Reported by: kmaclean Owned by: kmaclean
Priority: major Milestone: Acoustic Model 0.1.2
Component: Acoustic Model Version: Acoustic Model 0.1.1
Keywords: Cc:

Description (last modified by kmaclean)

from this post:

Came across an interesting thread on one of the Debian mailing lists (legal questions regarding machine learning models: msg 09321), where Mathieu Blondel asks:

[...]

For example, in speech recognition, speech models are trained from databases of speech and their corresponding annotated text. The models can then be used to recognize speech. To summarize the "training" procedure with a black box:

input: data => [ training algorithm] => output: model

As can be seen from the arrows, this is a "one way" transformation, i.e. it is possible to transform the data into a model but it's not possible to transform the model back into exactly the same data. The only possibility for someone to find whether his/her data were used to create the model is to reproduce exactly the same training conditions and train the data again to see if the resulting model is the same. However, two implementations of the same algorithm may differ due to design choices and algorithms themselves can have several parameters, so it's not easy to reproduce the exact same training conditions. Even then, there's no proof that some other data cannot lead to the same model in some other training conditions.
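To illustrate the point in the quoted passage above, here is a minimal sketch in which the "training algorithm" is a deliberately trivial stand-in (taking the mean of the data). The function name `train` and both data sets are made up for illustration and have nothing to do with real acoustic model training. It only shows that two different data sets can yield the same model, so the model alone does not identify its training data.

```python
from statistics import mean


def train(samples):
    """Toy stand-in for a training algorithm: the 'model' is the mean of the data."""
    return mean(samples)


# Two different (hypothetical) data sets.
data_a = [1.0, 2.0, 3.0]
data_b = [0.0, 2.0, 4.0]

model_a = train(data_a)
model_b = train(data_b)

# The models are identical even though the data sets differ,
# so the data cannot be recovered or identified from the model alone.
print(data_a == data_b)    # False
print(model_a == model_b)  # True
```

Real training procedures are far more complex, but the same many-to-one property is what makes it hard to prove which data went in.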

[...]

My second question is: Given the difficulty to prove what data were actually used to train a model, how can we prevent non-free software to use free data such as those of Voxforge?

Josselin Mouette provides a possible solution:

A widely-used technique is to cleverly hide some minor bugs in the data. If a non-free model shows the same bugs, you can prove the data was used illegally. Of course this only works if you manage to keep the bugs secret.
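A rough sketch of how the suggestion above might look for transcription text, assuming the "minor bugs" are a few deliberate, secret misspellings. The names `plant_canaries`, `canary_hits` and `SECRET_CANARIES`, and the sample sentences, are all hypothetical. It also assumes the suspect system's output reproduces spellings from its training transcripts, which may not hold for every kind of model.

```python
def plant_canaries(transcripts, canaries):
    """Return a copy of the transcripts with each chosen word swapped for its secret misspelling."""
    planted = []
    for line in transcripts:
        words = [canaries.get(word, word) for word in line.split()]
        planted.append(" ".join(words))
    return planted


def canary_hits(suspect_output, canaries):
    """Return the planted misspellings that show up in a suspect system's output."""
    seen = set()
    for line in suspect_output:
        seen.update(line.split())
    return sorted(m for m in canaries.values() if m in seen)


# Hypothetical secret mapping: correct word -> deliberate misspelling.
SECRET_CANARIES = {"colour": "coluor", "harbour": "harbuor"}

transcripts = ["the colour of the harbour", "a quiet harbour town"]
released = plant_canaries(transcripts, SECRET_CANARIES)

# Hypothetical outputs from two systems under test.
suspect_output = ["the coluor of the sea"]
clean_output = ["the colour of the sea"]

print(released)
print(canary_hits(suspect_output, SECRET_CANARIES))
print(canary_hits(clean_output, SECRET_CANARIES))
```

A hit is evidence rather than certainty: the more distinct planted errors a system reproduces, the less plausible a coincidence becomes. As the quoted reply notes, this only works while the list of planted errors stays secret.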

Ken

Change History

comment:1 Changed 11 years ago by kmaclean

  • Description modified