Recent comments in /f/MachineLearning
BlazeObsidian t1_j6xbu8f wrote
Reply to [D] PC takes a long time to execute code, possibility to use a cloud/external device? by Emergency-Dig-5262
You can try Kaggle notebooks and Google Colab notebooks, but sessions don't persist for long; they typically shut down after about 6 hours. You'll have to save your best model/hyperparameters periodically, but that might be a viable free option.
Google Colab also has a paid option where you can upgrade the RAM, GPU, etc. to meet your needs.
But I am curious as to why it's taking 21 hours. Have you checked your course forums/discussions for the expected runtime?
starstruckmon t1_j6xbhe1 wrote
Reply to comment by SuddenlyBANANAS in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
>find exact duplicates of images which were not in the training data, you'd have a point
The process isn't exactly the same, but isn't this how all the diffusion based editing techniques work?
djc1000 t1_j6xb9r4 wrote
Reply to [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
It’s really interesting to see how companies are trying to productize ai. The teams features seem both powerful, and a total waste of a billion dollar language model. I hope we start to see better.
znihilist t1_j6xa0o3 wrote
Reply to comment by Ronny_Jotten in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
My point is more that f(x) doesn't have 3.95 in it anywhere. Another option would be to write f(x) as -(x-2)(x-3)(x-4)*1/6 + (x-1)(x-3)(x-4)*3.95/2 - (x-1)(x-2)(x-4)*9.05/2 + (x-1)(x-2)(x-3)*16.001/6. This recreates the original points: plug in 1 and you get -(-1)(-2)(-3)*1/6 + (0)(-2)(-3)*3.95/2 - (0)(-1)(-3)*9.05/2 + (0)(-1)(-2)*16.001/6, which is just 1.
This version of f(x) has "memorized" the inputs and is written as a direct function of them, whereas x^2 has nothing in it that traces back to the original inputs. Both functions can recreate the original inputs, one to infinite precision (RMSE = 0) and the other to an RMSE of ~0.035.
I think we intuitively recognize that these two functions are not the same, even beyond their obvious differences (the first is a cubic polynomial, the other a quadratic). The point is, while "memorize" is applicable in both cases, one stores a copy and the other recreates from scratch, and I believe those mean different things in their legal implications.
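A quick check of the two fits described above: the interpolating polynomial built directly from the data points reproduces them exactly, while plain x^2 lands at the stated RMSE. (Pure-Python sketch; the points are the ones from the comment.)

```python
points = [(1, 1.0), (2, 3.95), (3, 9.05), (4, 16.001)]

def lagrange(x):
    # Interpolating polynomial written directly in terms of the data ("memorized")
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if i != j:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# The "memorizing" fit hits every point exactly (RMSE = 0)
for xi, yi in points:
    assert abs(lagrange(xi) - yi) < 1e-9

# The simple fit x^2 misses slightly
rmse = (sum((xi ** 2 - yi) ** 2 for xi, yi in points) / len(points)) ** 0.5
print(round(rmse, 3))  # 0.035
```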
Also, I think the divide on this is very interesting from a philosophical point of view, and with the genie out of the bottle, short of strong societal change and pressure, it is never going back in.
Single_Blueberry t1_j6x9v3s wrote
>Does this mean that only well-funded corporations will be able to train general-purpose LLM
No, they are just always a couple of years ahead.
That's not just a thing with language models, or even ML; it's like that with many technologies.
Miguel33Angel t1_j6x9bnf wrote
Reply to comment by I_Am_The_Sevit in [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
Yeah, you would just need to add something to remove filler words as well.
Doing it with Whisper, given a list of filler words, would be easy enough, I think.
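A minimal sketch of that idea, assuming word-level timestamps like the ones Whisper can emit with its `word_timestamps` option (the filler list and the word dicts here are illustrative, not Whisper's actual output for any real audio):

```python
# Given word-level timestamps, drop filler words and emit the audio spans to keep.
FILLERS = {"um", "uh", "erm"}

def keep_spans(words, min_gap=0.2):
    """Merge the timestamps of non-filler words into [start, end] spans."""
    spans = []
    for w in words:
        if w["word"].strip(" .,").lower() in FILLERS:
            continue  # skip filler words entirely
        if spans and w["start"] - spans[-1][1] <= min_gap:
            spans[-1][1] = w["end"]   # extend the current span
        else:
            spans.append([w["start"], w["end"]])
    return spans

words = [
    {"word": "So", "start": 0.0, "end": 0.3},
    {"word": "um", "start": 0.35, "end": 0.6},
    {"word": "this", "start": 0.65, "end": 0.9},
    {"word": "works", "start": 0.95, "end": 1.3},
]
print(keep_spans(words))  # [[0.0, 0.3], [0.65, 1.3]]
```

The kept spans could then be cut and concatenated with something like ffmpeg.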
BCBCC t1_j6x9axw wrote
Reply to comment by TrevorIRL in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
I know what the Pareto principle is, and I don't think 20% of users will pay this subscription fee; that's a pretty wild assumption.
new_name_who_dis_ t1_j6x95yq wrote
Reply to comment by frequenttimetraveler in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
What do you mean by that?
Senior1292 t1_j6x92pj wrote
Reply to comment by frequenttimetraveler in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Using fancy words while being factually incorrect?
visarga t1_j6x8zna wrote
I think open source implementations will eventually get there. They probably need much more multi-task and RLHF data, or they had too little code in the initial pre-training. Training GPT-3.5-like models is like following a recipe, and the formula + ingredients are gradually becoming available.
Jean-Porte t1_j6x8oyx wrote
Reply to comment by alpha-meta in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Yes, but the LM has to take many steps to produce the text. We need to train the LM to maximize a far-away reward, and we need RL to do that.
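As a toy illustration of that point, here is a pure-Python REINFORCE sketch on a one-parameter policy (nothing to do with a real LM): the reward appears only after the full sequence is emitted, and that single far-away reward scales the gradient of every step's log-probability.

```python
import math
import random

random.seed(0)
theta = 0.0   # logit of emitting token 1 at every step (a one-parameter "LM")

def policy_p():
    return 1.0 / (1.0 + math.exp(-theta))

lr, seq_len = 0.5, 5
for _ in range(500):
    p = policy_p()
    seq = [1 if random.random() < p else 0 for _ in range(seq_len)]
    reward = float(seq == [1] * seq_len)   # reward only at the very end of the sequence
    grad = sum(t - p for t in seq)         # d/dtheta of the sequence log-probability
    theta += lr * reward * grad            # REINFORCE: reward-weighted log-prob gradient

print(round(policy_p(), 2))
```

No per-token label exists here; the policy only ever sees the sequence-level reward, which is exactly the situation RLHF handles for text.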
schwagggg t1_j6x7eh7 wrote
Reply to [D] Normalizing Flows in 2023? by wellfriedbeans
i recently found a paper from Blei's lab that uses NFs to learn KL(p||q) instead of KL(q||p) variational inference (might be what the other commenter is referring to), but i'm afraid that's not what u r interested in.
apart from that, the last application-wise SOTA i can remember was Glow.
TrevorIRL t1_j6x5uer wrote
So it costs them $100,000/day to run.
30 days * $100,000/day = $3,000,000/month in costs.
10 million users * 20% who will buy (Pareto principle) = 2 million users who buy a subscription.
2 million * $20/month = $40,000,000/month in revenue.
Assuming I did my math right, that’s some pretty amazing margins and it’s only going to get better!
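The arithmetic above as a quick sanity check (the 20% conversion rate is the comment's own assumption, not a known figure):

```python
# Back-of-envelope check of the comment's numbers.
daily_cost = 100_000          # $/day to run (figure quoted in the comment)
users = 10_000_000            # user count assumed in the comment
conversion = 0.20             # the comment's "Pareto" assumption
price = 20                    # $/user/month

monthly_cost = 30 * daily_cost
monthly_revenue = int(users * conversion * price)
print(monthly_cost, monthly_revenue)  # 3000000 40000000
```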
znihilist t1_j6x5c0y wrote
Reply to comment by maxToTheJ in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
An MP3 can recreate only the original version. It can't recreate other songs that have never been created or thought of; compression relates exactly one input to one output. As such, the comparison falls apart when you apply it to these models.
frequenttimetraveler t1_j6x5bdf wrote
Reply to [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Oh well, now every employee can talk like a manager
maxToTheJ t1_j6x4vrz wrote
Reply to comment by Argamanthys in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
> That's pretty much what's going on here.
No, it's not. We wouldn't need training sets if that were the case, as in the described scenario where you can generate the dataset using a known algorithm.
bpooqd t1_j6x4p5d wrote
Cool, but I wish it included an API and integration with other messengers like Signal. I would still sign up for it for sure, though, as long as it's reasonably priced (<$20/month).
maxToTheJ t1_j6x4dc8 wrote
Reply to comment by znihilist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
That's a bad argument. MP3s are compressed versions of the original files for many songs, so the original isn't exactly in the MP3 until decompression is applied. Would anybody argue that, since a transformation is applied in the form of a decompression algorithm, Napster was actually in the clear legally?
[deleted] t1_j6x3qal wrote
>With this method, the authors were able to find samples from Stable Diffusion and Imagen corresponding to copyrighted training images.
Well this will split the room.
nombinoms t1_j6x36k6 wrote
Reply to comment by fraktall in [R] On the Expressive Power of Geometric Graph Neural Networks by chaitjo
Well when you consider the fact that every transformer is based on self-attention, which is a type of GNN, I'd say they are getting quite a bit of attention (no pun intended).
koolaidman123 t1_j6x2b05 wrote
Reply to comment by [deleted] in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
sure? you can have multiple ways of ranking, but:
- the instructGPT paper strictly uses pairwise ranking
- asking annotators to rank however many passages 1-k in 1 shot is much more difficult and subject to noise than asking for pairwise comparisons
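The first bullet can be sketched concretely: a single 1..k ranking expands into k*(k-1)/2 pairwise preferences, which is the form the InstructGPT reward model consumes (the response names here are hypothetical):

```python
from itertools import combinations

# A ranking ordered best-to-worst implies a preference for every pair.
ranking = ["resp_a", "resp_c", "resp_b"]   # hypothetical responses, best first

pairs = list(combinations(ranking, 2))     # each tuple is (preferred, dispreferred)
print(pairs)  # [('resp_a', 'resp_c'), ('resp_a', 'resp_b'), ('resp_c', 'resp_b')]
```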
visarga t1_j6x1uwy wrote
Reply to comment by Ronny_Jotten in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
> The extent to which something is memorized ... is certainly something to be discussed.
A one-in-a-million chance of memorisation, even when you're actively looking for it, is not worth discussing.
> We select the 350,000 most-duplicated examples from the training dataset and generate 500 candidate images for each of these prompts (totaling 175 million generated images). We find 109 images are near-copies of training examples.
On the other hand, these models compress billions of images into a few GB. That's less than 1 byte on average per input example; there's no space for significant memorisation, which is probably why only 109 memorised images were found.
I would say I am impressed there were so few of them; if you use a blacklist for these images, you can be confident the model is not regurgitating training data verbatim.
I would suggest the model developers remove these images from the training set and replace them with variations generated by the previous model, so it only learns the style and not the exact composition of the original. Replacing originals with variations (same style, different composition) would be a legitimate way to avoid close duplication.
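The rates quoted above work out roughly as follows (the model and dataset sizes are order-of-magnitude assumptions, not exact figures for any particular model):

```python
# Arithmetic behind the "one in a million" and "< 1 byte per image" claims.
generated = 350_000 * 500             # candidate images sampled in the paper's search
memorized = 109                       # near-copies found
print(generated)                      # 175000000
print(round(generated / memorized))   # roughly one memorised image per 1.6M generations

model_bytes = 4 * 10**9               # assumed model size: a few GB
train_images = 5 * 10**9              # assumed LAION-scale training set
print(model_bytes / train_images)     # under 1 byte per training image
```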
alpha-meta OP t1_j6x1r2j wrote
Reply to comment by Jean-Porte in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
But isn't that only the case if you train it on the next-word-prediction loss (negative log-likelihood), i.e., what they do during pretraining?
If you instead use the ranks (from having users rank the outputs) to compute the loss, rather than the words as labels, would that still be the case?
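For what it's worth, InstructGPT does use the ranks as supervised labels, but for a separate reward model trained with a pairwise loss; RL only enters afterwards, to optimize the policy against that reward model. A minimal sketch of the pairwise loss (the reward values are made-up numbers for illustration):

```python
import math

def pairwise_loss(r_preferred, r_dispreferred):
    """-log(sigmoid(r_preferred - r_dispreferred)): the reward-model ranking loss."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_preferred - r_dispreferred))))

print(round(pairwise_loss(2.0, 0.5), 4))  # small loss: reward model agrees with the rank
print(round(pairwise_loss(0.5, 2.0), 4))  # large loss: reward model disagrees
```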
bojohnsonyadig t1_j6x1hud wrote
Reply to comment by cachemonet0x0cf6619 in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
It’s not up to date for any libraries past it’s training date, so are you using it as a rough answer or are your questions not generally library specific?