Recent comments in /f/MachineLearning

cccntu OP t1_j9hsouu wrote

Theirs requires you to rewrite the whole model and replace every layer you want to apply LoRA to with its LoRA counterpart, or use monkey-patching. Mine uses PyTorch parametrizations to inject the LoRA logic into existing models. If your model has nn.Linear, you can call add_lora(model) to add LoRA to all the linear layers. And it's not limited to Linear; you can see how I extended it to Embedding and Conv2d in a couple of lines of code. https://github.com/cccntu/minLoRA/blob/main/minlora/model.py
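For anyone unfamiliar with parametrizations, here's a minimal sketch of the idea (simplified, not the exact code from the repo): torch.nn.utils.parametrize lets you register a module that rewrites a parameter on the fly, so the LoRA update gets added to the weight without touching the host model's code.

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParametrization(nn.Module):
    """Rewrites a weight on the fly: W -> W + scale * (B @ A)."""
    def __init__(self, fan_out, fan_in, rank=4, alpha=1.0):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(rank, fan_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(fan_out, rank))  # zero init: no-op at start
        self.scale = alpha / rank

    def forward(self, W):
        return W + self.scale * (self.lora_B @ self.lora_A)

def add_lora(model, rank=4):
    # list() so we don't mutate the module tree while iterating over it
    for module in list(model.modules()):
        if isinstance(module, nn.Linear):
            fan_out, fan_in = module.weight.shape
            parametrize.register_parametrization(
                module, "weight", LoRAParametrization(fan_out, fan_in, rank)
            )

model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))
add_lora(model)
print(model(torch.randn(2, 16)).shape)  # torch.Size([2, 4])
```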

9

GaseousOrchid t1_j9hhxdm wrote

yeah, this has been my experience -- i'm working with a lot of custom data, and even though some of it is CV-adjacent, it doesn't fit exactly (e.g., ~40 channels instead of 3 like RGB). would be nice, especially for research purposes, to have something plug-and-play that just worked.
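To illustrate the kind of manual adaptation that's usually needed (a sketch, assuming a torchvision ResNet and 40-channel input; reusing the RGB filters for the first three channels is just one common heuristic):

```python
import torch
import torch.nn as nn
from torchvision import models

# Swap the stem conv of an ImageNet ResNet to accept 40 channels.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
old = model.conv1  # Conv2d(3, 64, kernel_size=7, stride=2, padding=3)
model.conv1 = nn.Conv2d(40, old.out_channels, kernel_size=old.kernel_size,
                        stride=old.stride, padding=old.padding, bias=False)
with torch.no_grad():
    model.conv1.weight[:, :3] = old.weight  # extra 37 channels stay randomly initialized

print(model(torch.randn(1, 40, 224, 224)).shape)  # torch.Size([1, 1000])
```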

1

ParanoidTire t1_j9hdztb wrote

No idea what NMF is, but normalization is usually a critical step for any ML algo. Min-max normalization is common, as is z-normalization. If your data needs to be positive, subtracting the minimum is indeed a way to guarantee this.
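Concretely, a quick NumPy sketch of the three options:

```python
import numpy as np

x = np.array([-3.0, 0.0, 2.0, 5.0])

x_minmax = (x - x.min()) / (x.max() - x.min())  # rescale to [0, 1]
x_z = (x - x.mean()) / x.std()                  # zero mean, unit variance
x_pos = x - x.min()                             # shift so everything is >= 0
```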

1

ParanoidTire t1_j9hd4r5 wrote

Welcome to the world of research. You can find all that stuff in so-called "papers", i.e. publications. To get started I would suggest having a look at one of the most influential architectures: ResNet. Just Google "resnet paper" and you're good to go (too lazy to fetch the citation, but it's by He et al.)

1

ParanoidTire t1_j9hc4uq wrote

My journey started years ago with wanting to understand the DQN paper. Hinton's Coursera course was a nice start, and after that it was just going down the rabbit hole that is citations. It takes a lot of effort in the beginning because every single sentence you read will introduce topics you've never heard of before. But after a while these become second nature and you won't give them a second thought. It just takes perseverance and will, imo.

1

ParanoidTire t1_j9hbd77 wrote

I have years of grievances with I/O. It's really difficult to build something that is flexible, performant, and can scale to terabytes of data with complex structure. As soon as you leave the nice CV or NLP domains you are on your own. Raw C-style arrays loaded manually from disk in a separate CUDA stream can sometimes really be your best shot.
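A rough sketch of that pattern (hypothetical file name and shapes; assumes PyTorch with CUDA available): memory-map the raw array and stage host-to-device copies on a dedicated stream:

```python
import numpy as np
import torch

# Hypothetical layout: "data.bin" is a flat binary file of float32 samples.
N, SHAPE = 100_000, (40, 64, 64)
data = np.memmap("data.bin", dtype=np.float32, mode="r", shape=(N, *SHAPE))

copy_stream = torch.cuda.Stream()

def load_batch(indices):
    # pinned host memory lets the host-to-device copy run asynchronously
    batch = torch.from_numpy(np.ascontiguousarray(data[indices])).pin_memory()
    with torch.cuda.stream(copy_stream):
        return batch.to("cuda", non_blocking=True)
```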

1

KakaTraining OP t1_j9hayq4 wrote

Oh, I mean kind of... There is a lot of material here for papers: a web-connected ChatGPT opens up a lot of new research directions in information security.

User A can publish prompt-injection content that misleads User B through NewBing.

Will there be a lot of injection spam on the Internet in the future, like SEO spam?

1

buyIdris666 t1_j9h8zqj wrote

I like to think of networks with residual connections as an ensemble of small models cascaded together. Residual connections were created to avoid vanishing/exploding gradients in deep networks.

It's important to realize that each residual connection exposes the layer to the entire input. I don't like the name "residual" because it implies only a small amount of data is carried across. Nope, it's the entire input.
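In code, the skip path literally adds the block's full input back in; a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        # the skip path carries the *entire* input, not a small residue
        return torch.relu(self.body(x) + x)
```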

Latent information the model has learned along the way passes through the bottleneck, which supposedly forces it to keep only the most important information. But the explanation above about receptive fields is also pertinent.

Beyond the vanishing gradient problem that affects all deep networks, one of the biggest challenges with image models is getting them to understand the big picture. Pixels close to each other are very strongly related, so the network preferentially learns these local relations. The bottleneck can be seen as forcing the model to learn global properties of the image, since resolution is usually halved at each stage. A form of image compression, if you want to think of it that way.
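A toy encoder makes the resolution-halving concrete (illustrative channel counts and sizes only):

```python
import torch
import torch.nn as nn

# Each stage halves spatial resolution, so later layers see
# progressively more global structure per activation.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1),    # 64x64 -> 32x32
    nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1),   # 32x32 -> 16x16
    nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1),  # 16x16 -> 8x8
)
print(encoder(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 128, 8, 8])
```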

So the residual connections keep the model from forgetting what it started with, and the bottleneck forces it to learn more than just the relations between close pixels. Similar to the attention mechanism used in Transformers.

CNNs tend to work better than Transformers for images because their receptive fields bake in the assumption that nearby pixels affect each other more. This makes them easier to train on images. Whether Transformers would work equally well with more training is an open question.

For a model similar to U-Net and other "bottlenecked" CNN architectures, check out denoising autoencoders: https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798

There is currently speculation that diffusion models are simply a return of the ancient (7-year-old) denoising autoencoder with a fancy training schedule tacked on.
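The denoising autoencoder objective itself is tiny; a sketch (the model and noise level are placeholders): corrupt the input, train the network to reconstruct the clean version.

```python
import torch
import torch.nn.functional as F

def dae_loss(model, x, noise_std=0.1):
    noisy = x + noise_std * torch.randn_like(x)  # corrupt the input
    return F.mse_loss(model(noisy), x)           # reconstruct the clean version
```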

This explanation is entirely my intuition because nobody really knows how this stuff works lol

6

the_architect_ai t1_j9h7but wrote

Use binning/quantisation to reduce image size. Look into voxelisation.

Transformers can capture long-range spatial interactions, but the computation is hefty. You might have to downsize first.

In ViT, tokenization is applied to patches. You might need a 3D CNN to extract voxel tokens.

There are many ways to reduce the computational cost of attention. In the Perceiver IO paper by DeepMind, a bottleneck cross-attention layer is applied.
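To make the bottleneck idea concrete, here's a rough Perceiver-style sketch (my illustration, not DeepMind's implementation): a small set of learned latents cross-attends to the long input sequence, so the cost scales with num_latents × seq_len rather than seq_len².

```python
import torch
import torch.nn as nn

class LatentBottleneck(nn.Module):
    """A few learned latents cross-attend to a long input sequence,
    so cost is O(num_latents * seq_len) instead of O(seq_len ** 2)."""
    def __init__(self, dim=256, num_latents=64, heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        q = self.latents.expand(x.size(0), -1, -1)
        out, _ = self.attn(q, x, x)              # (batch, num_latents, dim)
        return out

block = LatentBottleneck()
print(block(torch.randn(2, 10_000, 256)).shape)  # torch.Size([2, 64, 256])
```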

6

limpbizkit4prez t1_j9h3nbm wrote

If you don't know how to code, then regardless of how you interface it's going to be difficult to execute. If you do know how to code, then you'll probably want better encapsulation. I guess what I'm most curious about is whether the code examples they give in the paper can actually be run, i.e. whether those libraries are really that easy to use.

3

morecoffeemore t1_j9h1m0n wrote

How does the facial search engine pimeyes work?

It's frighteningly accurate. I've tested it on pictures of people I know, and it accurately found other pictures of them online, which leads to their identity. The false positive rate is pretty low.

I have a technical background (although not CS), so please provide more than a simple response if you can.

Thanks.

2