Recent comments in /f/MachineLearning

TheTwigMaster t1_j5097m6 wrote

Using open source models might be good for quickly experimenting and getting a feel/sense of the value of an approach for a particular problem. But at a company (especially big tech companies), there are many more things to consider:

  • How do I scale this to my particular dataset? It’s a bigger pain to change my data to fit a given model than to change the model to fit my data
  • How can I integrate my company’s infrastructure/tooling/monitoring to this? Often it ends up being simpler to revisit the implementation from scratch
  • How easy is it to experiment with adjustments to this? Often we don’t want to pick a single architecture forever, so we want to be able to adjust and modify easily. Open source models may not always accommodate this.

At the risk of being flippant/dismissive: coding up a model/architecture is one of the easiest and fastest parts of the problem. So if you can make other things easier by writing a model implementation from scratch, it makes sense to just do that.

12

CurrentlyJoblessFML OP t1_j508inw wrote

Hi! Thanks for the response. I’ll try my luck by just concatenating my noisy input with yt along the channel dimension and see if that works. In the SR3 paper, the authors also mention that they tried using a different way to condition the model but they found that simply concatenating it gave them the same generation quality so they just stuck with that.
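
A minimal sketch of the channel-wise concatenation described above, using NumPy rather than a deep learning framework. The function name `concat_condition`, the (batch, channels, height, width) layout, and the array sizes are assumptions made for illustration; they are not taken from the SR3 paper or any repository.

```python
import numpy as np


def concat_condition(y_t, cond):
    """Stack a conditioning image and a noisy sample along the channel axis.

    Both arrays are assumed to be (batch, channels, height, width).
    The conditioning image must already match the noisy sample spatially
    (e.g. a low-resolution image upsampled beforehand).
    """
    if y_t.shape[0] != cond.shape[0] or y_t.shape[2:] != cond.shape[2:]:
        raise ValueError("batch and spatial dimensions must match")
    # axis=1 is the channel axis in the assumed layout
    return np.concatenate([cond, y_t], axis=1)


rng = np.random.default_rng(0)
cond = rng.standard_normal((2, 3, 16, 16))  # hypothetical conditioning images
y_t = rng.standard_normal((2, 3, 16, 16))   # hypothetical noisy samples at step t
model_input = concat_condition(y_t, cond)
print(model_input.shape)  # (2, 6, 16, 16)
```

With this setup the denoising network's first layer would need to accept 6 input channels instead of 3, while the output stays at 3 channels.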

Good luck with your project and HMU if you ever want to discuss this. I’ve been breaking my head on these diffusion models for the past couple of days so I feel your struggle.

3

Naive-Progress4549 t1_j507af0 wrote

I think that if you look in the guided_diffusion repository you can see that the super-resolution network conditions the output by concatenating the low-resolution image. There are also other ways to condition, such as using gradients during sampling.

I have been trying to adapt the guided_diffusion repository for another task for a couple of months now... I have to say I am facing quite a few difficulties overall!

I hope this helps

1

starstruckmon OP t1_j501y7y wrote

From the paper

>One natural avenue for future work would be to investigate fine-tuning mechanisms for such large-scale models, which would allow further accuracy recovery. We conjecture that this should be possible, and that probably at least 80-90% sparsity can be achieved with progressive pruning and fine-tuning.

So, that comes next. Though I doubt the 80-90% guesstimate.

1

LanverYT t1_j501vdn wrote

That's a really interesting question, and I've been wondering about the same thing. I've never been able to figure it out, but I would love to see what others have to say about it. It sounds like you have a solid approach and understanding of the concept, so I'm curious to see how it turns out. Good luck with your experimentation and let us know how it goes

−1

Leptino t1_j4zxkyn wrote

The only people who have a prayer of doing this are OpenAI themselves. They can likely insert an undetectable watermark into sufficiently generic text output, given sufficiently many words, without distorting the meaning or quality appreciably.

However, there is almost no way this can survive subsequent rewrites of the text, like 'rewrite the previous paragraph with three new random words that don't change the meaning' or 'change all the nouns/verbs into synonyms that preserve the meaning of the paragraph'.

I strongly suspect (and might one day try my hand at the math) that there can be no such system that works in general against this sort of attack.

2

IntelArtiGen t1_j4zr3iq wrote

Yeah, that's also what I would say. I doubt it's anything revolutionary, as that's likely not necessary. It might be an innovative use of embeddings of a conversation, but I wouldn't qualify that as "revolutionary".

They probably don't use only one embedding for the whole conversation. Perhaps they use one embedding per prompt and/or keep some tokens in memory.

1

Czl2 t1_j4zqan4 wrote

Ask model to summarize whatever is about to be cut off as you slide the token window and replace what is lost with that summary? In this way your token window always has a summarized version of what is missing attached?
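
A rough sketch of that idea, assuming a whitespace word count stands in for a real tokenizer and a truncating function stands in for the model's summary. `RollingContext`, `count_tokens`, and `truncate_summary` are hypothetical names for illustration only; in practice the `summarize` callable would be a call to the language model.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def truncate_summary(text: str, max_words: int = 20) -> str:
    """Placeholder summarizer: keeps only the first max_words words."""
    return " ".join(text.split()[:max_words])


class RollingContext:
    """Keeps recent turns verbatim and folds evicted turns into a summary."""

    def __init__(self, max_tokens: int, summarize: Callable[[str], str]):
        self.max_tokens = max_tokens
        self.summarize = summarize
        self.summary = ""
        self.turns: List[str] = []

    def size(self) -> int:
        # The budget counts the summary plus all verbatim turns.
        return count_tokens(self.summary) + sum(count_tokens(t) for t in self.turns)

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict the oldest turns into the summary until the window fits.
        # The newest turn is always kept verbatim.
        while self.size() > self.max_tokens and len(self.turns) > 1:
            oldest = self.turns.pop(0)
            self.summary = self.summarize((self.summary + " " + oldest).strip())

    def prompt(self) -> str:
        parts = [self.summary] if self.summary else []
        return "\n".join(parts + self.turns)


ctx = RollingContext(max_tokens=12, summarize=lambda t: truncate_summary(t, 4))
for turn in ["a b c d e f", "g h i j k l", "m n o p q r"]:
    ctx.add(turn)
print(ctx.prompt())
```

One caveat with this design is that each eviction re-summarizes the previous summary, so details from early turns fade with every pass.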

8

Daos-Lies t1_j4zpwjr wrote

This is just a suspicion, but I think it's just a matter of embedding the conversation and using that embedding as an input, in addition to your most recent question. (Which is just classic recurrence really).

I'm relatively confident that the mechanism would be something along those lines because they made a relatively big fuss about their new embedding service around the same time that chatgpt was released. (tho obviously that didn't get as much attention as chatgpt itself).

(and in response to u/DaLameLama asking if chatGPT goes past the token limit: Yes. it deffo can go past 8000 tokens, I have had some v v v long conversations with it.)

21

DaLameLama t1_j4zhqqj wrote

Does ChatGPT actually get past the token limit? Codex supports ~8000 tokens. You might underestimate how much this is. Has anyone tested the limits?

Unfortunately, OpenAI aren't serious about publishing technical reports anymore.

30

mtocrat t1_j4zecpm wrote

What you're describing is a general approach to RL that is used in different forms in many methods: sample actions, weight or rank them in some way by the estimated return, then regress to the weighted actions. So you're not suggesting something other than RL; you're suggesting replacing one RL approach with a different RL approach.
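
To make the sample / weight / regress pattern concrete, here is a toy sketch in the style of reward-weighted regression on a one-dimensional problem. Everything in it is an assumption for illustration: the quadratic `reward`, the fixed-width Gaussian policy, the exponential weighting, and the temperature are not from the thread. With a fixed standard deviation, the weighted regression of the policy mean reduces to a weighted average of the sampled actions.

```python
import math
import random


def reward(action: float) -> float:
    """Toy return, highest at action = 2.0 (hypothetical)."""
    return -(action - 2.0) ** 2


def rwr_step(mean, std, n_samples, temperature, rng):
    # 1. Sample actions from the current Gaussian policy.
    actions = [rng.gauss(mean, std) for _ in range(n_samples)]
    # 2. Weight them by exponentiated return (shifted by the max for stability).
    returns = [reward(a) for a in actions]
    best = max(returns)
    weights = [math.exp((r - best) / temperature) for r in returns]
    # 3. Regress to the weighted actions: for a fixed-std Gaussian policy,
    #    the weighted maximum-likelihood mean is the weighted average.
    return sum(w * a for w, a in zip(weights, actions)) / sum(weights)


def train(steps: int = 50, seed: int = 0) -> float:
    rng = random.Random(seed)
    mean = -1.0  # deliberately poor starting policy
    for _ in range(steps):
        mean = rwr_step(mean, std=0.5, n_samples=64, temperature=0.5, rng=rng)
    return mean


print(train())  # should end up close to the optimum at 2.0
```

Swapping the exponential weights for a hard ranking (keep only the top few actions) would give a cross-entropy-method style variant of the same loop.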

2