Performance issue and shaky basis for fake detection ?

Performance issue and shaky basis for fake detection ?

kinkin2
Hello,

I have a large collection of audio files and was advised to use Fakin' The Funk to maintain it.

I tried running Fakin' The Funk on a small subset of the collection: a 150 GB directory of around 16k files.

Since Fakin' The Funk has no Linux version and Windows cannot read Linux filesystems natively, I had to install Fakin' The Funk on Windows and mount the Linux filesystem there over sshfs, on a 1 Gbit LAN connection. I understand this costs some performance, but it mirrors my actual use and I am not aware of a better way to work around the limitations. It takes a little less than 1 hour to transfer the whole collection, so it should not be that bad.
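As a rough sanity check on those numbers, here is a back-of-the-envelope sketch. It assumes that "the whole collection" refers to the 150 GB test subset, that GB means 10^9 bytes, and that the copy took a full hour (the post says slightly less); none of this is measured.

```python
# Back-of-the-envelope figures; every input is an assumption taken from the post.
SUBSET_BYTES = 150e9            # 150 GB test subset, decimal gigabytes
LINK_BITS_PER_SECOND = 1e9      # nominal 1 Gbit LAN, ignoring protocol overhead
OBSERVED_COPY_SECONDS = 3600    # "a little less than 1 hour", rounded up

# Best case if the link were fully saturated.
ideal_copy_minutes = SUBSET_BYTES * 8 / LINK_BITS_PER_SECOND / 60

# Effective throughput actually seen over sshfs, in MB/s.
effective_mb_per_s = SUBSET_BYTES / OBSERVED_COPY_SECONDS / 1e6

print(f"ideal copy time: {ideal_copy_minutes:.0f} min")
print(f"effective sshfs throughput: {effective_mb_per_s:.1f} MB/s")
```

Under these assumptions, reading every byte once over the network takes on the order of an hour, so the link alone would not explain a scan that runs for days.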

However, Fakin' The Funk has been running around the clock for over 5 days, and the progress bar has been stuck at the point where the file tree pane meets the file list. The remaining-time estimate is plainly wrong and seems poorly thought out, as it is fixed to an HH:MM:SS format; the elapsed-time indicator seems to have the same problem.

So I decided to end the experiment and cancel the operation, which led to Fakin' The Funk becoming unresponsive. 12 hours later I simply killed it from Task Manager, and ended up with a database error when trying to start Fakin' The Funk again.

At this point I started again from scratch: with a fresh install I repeated the experiment on a much smaller subset of around 100 files, which ran to completion with no visible issue.

To my surprise, it had flagged many more files as fakes than I expected. On closer inspection, about 40% of those should not be fakes, as they were bought from online shops: not only would it be odd for a reputable shop and artist to sell fake files, but why would the rest of the tracks from the same set not be flagged as well? Another 40% are definitely not fakes, as I ripped and encoded them myself from my CDs.
The last 20% could be actual fakes, as I do not know where they come from, but I do not understand why Fakin' The Funk flagged most of them. The only thing it offers is to show the frequency spectrum; no other explanation is available. So I tried that, and was surprised that the spectrum was only calculated at that point rather than during the scan. The resulting image only features a horizontal line, which seems to point to a possible frequency cutoff.

From what I understand, Fakin' The Funk seems to base its detection on the wrong assumption that a filter cutoff means the file has been upscaled.

Let me quote someone explaining why this is wrong:

>you cannot detect "fake" lossless with a spectograph.

>only some lossy codecs use a low-pass filter, not all.

>and only some lossless/uncompressed files use the entire available frequency band, not all.

>so if you try to identify "fakes" by spotting a bandlimiting filter in a spectrograph, you will get false positives easily and you will also get false negatives easily. in other words, you will not be identifying shit.


In other words, if Fakin' The Funk relies on the frequency cutoff, it will catch some of the obvious transcodes, miss those that have no frequency cutoff, and flag other kinds of files as false positives.
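To make the argument concrete, here is a minimal sketch of a cutoff-based check. It is not Fakin' The Funk's actual algorithm, which is not described in this thread; `synth`, `estimate_cutoff_hz`, `flagged_as_fake` and the 19 kHz threshold are all hypothetical names and values chosen for illustration, and the "audio" is just a few synthetic sine tones.

```python
import cmath
import math

SAMPLE_RATE = 44100
N = 441  # 100 Hz per DFT bin, so the tones below fall exactly on bins


def synth(tone_freqs_hz):
    """Sum of equal-amplitude sine tones; a stand-in for real audio content."""
    return [
        sum(math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f in tone_freqs_hz)
        for i in range(N)
    ]


def estimate_cutoff_hz(samples, floor_db=-60.0):
    """Highest frequency whose DFT magnitude is within floor_db of the peak."""
    n = len(samples)
    # Naive O(n^2) DFT over bins 0..n/2; fine for a short demo signal.
    mags = [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]
    threshold = max(mags) * 10 ** (floor_db / 20)
    highest_bin = max(k for k, m in enumerate(mags) if m >= threshold)
    return highest_bin * SAMPLE_RATE / n


def flagged_as_fake(samples, min_cutoff_hz=19000):
    """Naive rule: no content above min_cutoff_hz means 'fake'."""
    return estimate_cutoff_hz(samples) < min_cutoff_hz


# Hypothetical cases. The first two have the same spectrum on purpose:
# the spectrum alone cannot tell them apart.
genuine_band_limited = synth([1000, 5000, 16000])       # real lossless, content ends at 16 kHz
transcode_with_lowpass = synth([1000, 5000, 16000])     # lossy encoder that cut at 16 kHz
transcode_without_lowpass = synth([1000, 5000, 20000])  # lossy encoder with no low-pass

for name, signal in [
    ("genuine, band-limited", genuine_band_limited),
    ("transcode with low-pass", transcode_with_lowpass),
    ("transcode without low-pass", transcode_without_lowpass),
]:
    print(name, "-> flagged:", flagged_as_fake(signal))
```

With these synthetic inputs the first two signals are flagged and the third is not: one false positive, one correct hit and one false negative, which is exactly the failure pattern described in the quote.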

That said, despite this limitation, Fakin' The Funk can still be useful. One simply has to be aware that it is only reliable up to a point and will not sanitize a whole collection.

IMHO it is too bad that Fakin' The Funk is not yet up to par for commercially sold software, as there is a market for such software. At the same time, developing such software requires funding.

Re: Performance issue and shaky basis for fake detection ?

Fake No Funk
Administrator
Thanks for the long post :)

The way "Fakin' The Funk" works and its limitations are described in the user guide, available at https://fakinthefunk.net/userguide

You are right -> it's technically not possible to calculate the 100% accurate bitrate from the spectrum, but as you mentioned, it is a strong indicator if there are cutoffs or other anomalies.

You can set the option "allow cutoffs above xxx Hz" to reduce false positives.

About the performance issue, I'm really surprised that it runs so long. I would have to analyse the logfile, maybe it gives a hint. The log can be accessed in the About dialog by double-clicking the version number in the bottom left corner.

CU
Ulrich


Re: Performance issue and shaky basis for fake detection ?

kinkin3
Looking at the logs, a clear pattern seems to emerge.

Here is a log for a full day of scanning: https://pastebin.com/AXH2096f

Actual file names have been replaced with "filepath" for privacy reasons; before the replacement, the log showed each file name in sequence.

It seems to me that it tries to scan each file for 10 minutes, then gives up due to some kind of timeout, then moves on to the next file, where the same thing happens again. So it goes through the queue at a very slow pace of 6 files per hour, only to eventually fail with an error.
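As a rough check of what that pace implies, the sketch below assumes every file hits the same 10-minute timeout and uses the figure of around 16k files from the first post; both are simplifications, not values read from the log.

```python
# Rough throughput estimate; assumes every file hits the 10-minute timeout.
TOTAL_FILES = 16_000          # "around 16k files" in the 150 GB test directory
SECONDS_PER_FILE = 10 * 60    # observed timeout per file

files_per_hour = 3600 / SECONDS_PER_FILE
files_after_5_days = files_per_hour * 24 * 5
progress_after_5_days = files_after_5_days / TOTAL_FILES
days_for_full_subset = TOTAL_FILES / files_per_hour / 24

print(f"{files_per_hour:.0f} files/hour")
print(f"{files_after_5_days:.0f} files after 5 days ({progress_after_5_days:.1%})")
print(f"{days_for_full_subset:.0f} days for the whole subset")
```

At 6 files per hour, 5 days of scanning covers only a few hundred files, which would be consistent with a progress bar that barely moves.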

All files exist, are accessible at this path, and are valid audio files that can be loaded and played in foobar2000 or any other media player.