Recent comments in /f/MachineLearning

mostlyhydrogen OP t1_j7238p8 wrote

As you probably know, ANN search often returns irrelevant results. How might I iteratively refine the search with human feedback, marking samples as "relevant" or "irrelevant" and repeating the search?

I've done a lit search and haven't found anything, maybe because I am using the wrong keywords.
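Concretely, I'm imagining something like Rocchio-style relevance feedback. Here's a rough sketch of the loop I have in mind (the weights and vectors are illustrative; the marked vectors would come from human judgments on ANN results from whatever index you use):

```python
import numpy as np

def refine_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the mean of relevant vectors and away from
    the mean of irrelevant ones (alpha/beta/gamma are conventional
    Rocchio weights; the values here are illustrative)."""
    q = alpha * query
    if len(relevant) > 0:
        q += beta * relevant.mean(axis=0)
    if len(irrelevant) > 0:
        q -= gamma * irrelevant.mean(axis=0)
    return q

# Toy example of one feedback round; in practice the refined query would be
# fed back into the ANN index for the next search, and the loop repeated.
query = np.array([1.0, 0.0])
relevant = np.array([[0.9, 0.1], [0.8, 0.2]])
irrelevant = np.array([[0.0, 1.0]])
print(refine_query(query, relevant, irrelevant))
```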

1

asarig_ OP t1_j71wbqs wrote

Reply to comment by SatoshiNotMe in [R] Graph Mixer Networks by asarig_

Of course. The MLP-Mixer is a fairly new approach, first developed for image classification; it was proposed independently by Google and Oxford researchers in May 2021.

The MLP-Mixer, also known simply as the "Mixer", is a vision architecture that doesn't use convolutions or self-attention. Instead, it relies solely on multi-layer perceptrons (MLPs) that are applied repeatedly, either across spatial locations or across feature channels.
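For intuition, here's a minimal PyTorch sketch of one Mixer block (layer sizes are illustrative, not the exact configuration from the paper):

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    """One Mixer block: a token-mixing MLP applied across spatial locations,
    then a channel-mixing MLP applied across feature channels."""

    def __init__(self, num_tokens, dim, token_hidden=256, channel_hidden=512):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, token_hidden),
            nn.GELU(),
            nn.Linear(token_hidden, num_tokens),
        )
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, channel_hidden),
            nn.GELU(),
            nn.Linear(channel_hidden, dim),
        )

    def forward(self, x):  # x: (batch, num_tokens, dim)
        # Token mixing: transpose so the MLP acts along the token axis.
        y = self.norm1(x).transpose(1, 2)            # (batch, dim, num_tokens)
        x = x + self.token_mlp(y).transpose(1, 2)    # residual connection
        # Channel mixing: the MLP acts along the feature axis, per token.
        return x + self.channel_mlp(self.norm2(x))

x = torch.randn(2, 196, 512)          # e.g. 14x14 patches, 512 channels
print(MixerBlock(196, 512)(x).shape)  # torch.Size([2, 196, 512])
```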

Instead of Transformers, which are normally applied to graphs, in this work I tried to use Mixers as a new kernel method on graphs, to see how they perform with linear complexity, avoiding the O(n^2) complexity of Transformers.

5

atharvat80 t1_j71u3oa wrote

If you want to take the top-down approach, I'd recommend starting by learning what transformers are. Transformers were originally intended for language modelling, so if you look up an NLP lecture series like Stanford CS224n, they cover that in detail from an NLP perspective; it should be helpful regardless. Or you can check out CS231n, which has a whole lecture on attention, transformers, and ViT. Start there and look up whatever is unclear.

Lmk if you'd like me to link any other resources, I'll edit this later. Happy learning!

13

prototypist t1_j71p3d6 wrote

You can fine-tune language models on a dataset, and that's essentially how people have typically been doing NLP with transformer models. It's only more recently that research has had success with RL for these kinds of tasks. So whatever rationale and answers you get here, the main reason is that people were doing supervised learning before, and then the RL people started getting better results.

1

cunth t1_j71ovks wrote

Getting a good dataset to train a model is usually the most time-consuming task. You need breadth and depth of content so your model doesn't overfit and only work for a handful of narrow use cases.

Supervised learning algorithms need labeled data (e.g. classification tags), and this labeling is traditionally done by people. If it can be done with AI instead, you can complete it 100x faster and probably more accurately.

1

Jurph t1_j71nymu wrote

I recommend diving in, but getting out a notepad and writing down any term you don't understand. So if you get two paragraphs in and someone says "this simply replaces back-propagation, making the updated weights sufficient for the skip-layer convolution", and you realize that you don't understand back-prop or weights or skip-layer convolution ... then you probably need to stop, go learn those ideas, and then go back and try again.

For deep neural nets, back-propagation, etc., there will be a point where a full understanding requires calculus or other strong mathematical foundations. For example, you can't accurately explain why back-prop works without a basic intuition for the Chain Rule. Similarly, activation functions like ReLU and sigmoid require a solid algebraic background for their graphs to be a useful shorthand. But you can "take it on faith" that it works, treat that part of the system like a black box, and revisit it once you understand what it's doing.
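To make the Chain Rule point concrete, here's a toy example with a single sigmoid unit and a squared-error loss (chosen purely for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, w, target = 2.0, 0.5, 1.0
z = w * x                      # pre-activation
y = sigmoid(z)                 # activation
loss = (y - target) ** 2       # squared error

# Chain Rule: dloss/dw = dloss/dy * dy/dz * dz/dw
dloss_dy = 2 * (y - target)
dy_dz = y * (1 - y)            # derivative of the sigmoid
dz_dw = x
print("gradient:", dloss_dy * dy_dz * dz_dw)
```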

I would say the biggest piece of foundational knowledge is the idea of "functions", their role in mappings and transforms, and how methods like Newton's Method arrive at approximate solutions over several steps. A lot of machine learning is based on the idea of expressing the problem as a composed set of mathematical expressions that can be solved iteratively. Grasping the idea of a "loss function" that can be minimized is core to the entire discipline.
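As a tiny, self-contained illustration of that last idea (the quadratic loss is purely illustrative):

```python
def loss(w):
    return (w - 3.0) ** 2      # toy loss, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)     # its derivative

w = 0.0                        # arbitrary starting point
for step in range(50):
    w -= 0.1 * grad(w)         # step downhill along the gradient
print(w, loss(w))              # w is now very close to 3.0
```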

18

ooonurse t1_j71mt0q wrote

In fairness, every single time I've seen someone use Grammarly, they were extremely intelligent people with English as their second or third language. I also know one person who uses it because of dyslexia, which has nothing to do with intelligence. Be careful about shaming people for using software that is commonly used for accessibility.

1

mongoosefist t1_j71dbhq wrote

Differential privacy methods already work in a way that's quite similar to the denoising process of diffusion models. The problem is that most differential privacy methods rely on the data being discrete. The latent space of a diffusion model is completely continuous, so there is no way to tell the difference between similar images, and thus you can't tell which ones, if any, come from the training data.

For example, suppose you're pretty sure the diffusion model has memorized an oil painting of Kermit the Frog. There is no way to say with any reasonable certainty whether the images you denoise that turn out to be oil paintings of Kermit come from actual training images, or from the region of latent space where the distribution of oil paintings overlaps the distribution of Kermit pictures. There is no hard point where one transitions into the other, and no meaningful difference in density between the distributions.

2

jimmymvp t1_j71cgkw wrote

There is a trick for getting away with gradually expanding your latent dimension with normalising flows: if you assume the dimensions are independent up to a certain point, you can sample the extra dimensions from a base distribution and concatenate them in the middle of the flow (see the sketch below).
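To make that concrete, here's a minimal sketch of the trick (the two flow halves are stand-ins for whatever invertible blocks you use, e.g. coupling layers):

```python
import torch

def sample(flow_part1, flow_part2, n, d_small, d_extra):
    z = torch.randn(n, d_small)          # base sample at the small dimension
    h = flow_part1(z)                    # first chunk of the flow
    z_extra = torch.randn(n, d_extra)    # extra dims, assumed independent
    h = torch.cat([h, z_extra], dim=-1)  # latent dimension expands mid-flow
    return flow_part2(h)                 # rest of the flow at full dimension

# Identity "flows" just to show the shapes; real flow_part1/flow_part2 would
# be invertible transforms such as stacks of coupling layers.
x = sample(torch.nn.Identity(), torch.nn.Identity(), n=4, d_small=8, d_extra=8)
print(x.shape)  # torch.Size([4, 16])
```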

Again, MCMC sampling and simulation-based inference are examples. Imagine you have an energy function that describes the distribution (you don't have data): how do you sample from it? You would do some MCMC. And how would you arrive at a good proposal distribution to make the MCMC algorithm more efficient? You would fit the proposal based on some limited data that you have, or on inductive biases such as known invariances.
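For instance, here's a minimal Metropolis-Hastings sketch that samples from a distribution defined only by an energy function (the Gaussian proposal and its width are illustrative; in practice you'd fit or adapt the proposal as described):

```python
import math
import random

def energy(x):
    return 0.5 * x * x  # toy energy: p(x) ∝ exp(-E(x)), a standard normal

def metropolis_hastings(n_steps=10000, step_size=1.0):
    x, samples = 0.0, []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, step_size)  # symmetric Gaussian proposal
        # Accept with probability min(1, exp(E(x) - E(proposal)))
        if random.random() < math.exp(min(0.0, energy(x) - energy(proposal))):
            x = proposal
        samples.append(x)
    return samples

samples = metropolis_hastings()
print(sum(samples) / len(samples))  # near 0 for this target
```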

3