Recent comments in /f/MachineLearning
cccntu OP t1_j9hsouu wrote
Reply to comment by brucebay in [P] minLoRA: An Easy-to-Use PyTorch Library for Applying LoRA to PyTorch Models by cccntu
Theirs requires you to rewrite the whole model and replace every layer you want to apply LoRA to with its LoRA counterpart, or use monkey-patching. Mine uses PyTorch parametrizations to inject the LoRA logic into existing models. If your model has nn.Linear, you can call add_lora(model) to add LoRA to all the linear layers. And it's not limited to Linear; you can see how I extended it to Embedding and Conv2d in a couple lines of code. https://github.com/cccntu/minLoRA/blob/main/minlora/model.py
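A minimal sketch of the parametrization idea (this is not minLoRA's exact API; the class and helper names here are illustrative). `torch.nn.utils.parametrize.register_parametrization` wraps a parameter so that every access goes through a function, which lets you add the low-rank update without touching the model's code:

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParametrization(nn.Module):
    """Adds a trainable low-rank update B @ A on top of a frozen weight."""
    def __init__(self, fan_out, fan_in, rank=4):
        super().__init__()
        self.lora_A = nn.Parameter(torch.randn(rank, fan_in) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(fan_out, rank))  # zero init: no-op at start

    def forward(self, W):
        # Called on every access to module.weight
        return W + self.lora_B @ self.lora_A

def add_lora(model, rank=4):
    # Materialize the module list first so registration doesn't
    # mutate the tree mid-iteration
    for module in list(model.modules()):
        if isinstance(module, nn.Linear):
            fan_out, fan_in = module.weight.shape
            parametrize.register_parametrization(
                module, "weight", LoRAParametrization(fan_out, fan_in, rank)
            )

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
add_lora(model)
out = model(torch.randn(2, 8))  # forward pass now routes through the LoRA update
```

Because `lora_B` starts at zero, the parametrized model initially computes exactly what the base model did.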
nikola-b t1_j9hk5q4 wrote
Reply to comment by tyras_ in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Free for now; we haven't added the payment workflow yet. In the future you will be billed only for inference time, so with 1h you should be able to generate lots of tokens. Also, I added EleutherAI/gpt-neo-2.7B and EleutherAI/gpt-j-6B if the OP wants to try them.
Mad-Independence t1_j9hik5x wrote
Reply to comment by ParanoidTire in [D] Simple Questions Thread by AutoModerator
Hello, yeah I stared at it super long and hard and went through it. I just ended up using the Excel way instead 😂
GaseousOrchid t1_j9hhxdm wrote
Reply to comment by ParanoidTire in [D] Simple Questions Thread by AutoModerator
Yeah, this has been my experience -- I'm working with a lot of custom data, and even though some of it is CV-adjacent, it doesn't fit exactly (e.g., ~40 channels instead of 3 like RGB). It would be nice, especially for research purposes, to have something plug-and-play that just worked.
geeky_username t1_j9hhkqz wrote
Very cool
Bookmarked and subscribed
jaeja_helvitid_thitt t1_j9hfvi0 wrote
I don't think the existing animations are strictly wrong, they just don't show the last dimension.
Mnbvcx0001 t1_j9hfr7h wrote
Reply to comment by ParanoidTire in [D] Simple Questions Thread by AutoModerator
Thanks for sharing.
ParanoidTire t1_j9hdztb wrote
Reply to comment by Khal_Doggo in [D] Simple Questions Thread by AutoModerator
No idea what NMF is, but normalization is usually a critical step for any ML algorithm. Min-max normalization is common, as is z-normalization. If your data needs to be positive, shifting it by the minimum is indeed a way to guarantee this.
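The three normalizations mentioned above, sketched in NumPy (values here are just example data):

```python
import numpy as np

x = np.array([2.0, -1.0, 5.0, 0.0])

# Min-max normalization: rescale to [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-normalization: zero mean, unit variance
x_z = (x - x.mean()) / x.std()

# Shift by the minimum to guarantee non-negative values
# (e.g. required by non-negative matrix factorization)
x_nonneg = x - x.min()
```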
ParanoidTire t1_j9hdrte wrote
Reply to comment by Mad-Independence in [D] Simple Questions Thread by AutoModerator
Have you tried doing what the message suggested, i.e. checking the logs? Otherwise I would suggest contacting their support directly.
ParanoidTire t1_j9hdc4h wrote
Reply to comment by slickvaguely in [D] Simple Questions Thread by AutoModerator
Yes, this is actually commonly done and a core ingredient in object detection. Look up Faster R-CNN.
ParanoidTire t1_j9hd4r5 wrote
Reply to comment by TheGamingPhoenix_000 in [D] Simple Questions Thread by AutoModerator
Welcome to the world of research. You can find all that stuff in so-called "papers", i.e. publications. To get started, I would suggest having a look at one of the most influential architectures: ResNet. Just Google "resnet paper" and you're good to go (too lazy to fetch the citation, but it's by He et al.).
ParanoidTire t1_j9hc4uq wrote
Reply to comment by Mnbvcx0001 in [D] Simple Questions Thread by AutoModerator
My journey started years ago with wanting to understand the DQN paper. Hinton's Coursera course was a nice start, and after that it was just going down the rabbit hole that is citations. It takes a lot of effort in the beginning because every single sentence you read will introduce new topics you've never heard of before. But after a while these become second nature and you won't give them a second thought. It just takes perseverance and will, imo.
ParanoidTire t1_j9hbd77 wrote
Reply to comment by GaseousOrchid in [D] Simple Questions Thread by AutoModerator
I have years of grievances with I/O. It's really difficult to build something that is flexible, performant, and able to scale to terabytes of data with complex structure. As soon as you leave the nice CV or NLP domain, you are on your own. Raw C-type arrays loaded manually from disk in a separate CUDA stream can sometimes really be your best bet.
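One common version of the raw-array approach is a memory-mapped file: write the array to disk once in raw C order, then map it back so the OS pages in only the slices you touch. A small sketch (file path and shape are made up for illustration):

```python
import os
import tempfile
import numpy as np

path = os.path.join(tempfile.gettempdir(), "sample.bin")

# One-time preprocessing: dump a raw C-order float32 array to disk
data = np.arange(12, dtype=np.float32).reshape(3, 4)
data.tofile(path)

# Memory-map it back: terabyte-scale files never need to fit in RAM,
# since only the accessed pages are loaded
mmapped = np.memmap(path, dtype=np.float32, mode="r", shape=(3, 4))
row = np.array(mmapped[1])  # copy just one row into memory
```

From there, a data loader worker can slice the mapped array and hand batches to the GPU on a separate CUDA stream.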
KakaTraining OP t1_j9hayq4 wrote
Reply to comment by master3243 in [D] Maybe a new prompt injection method against newBing or ChatGPT? Is this kind of research worth writing a paper? by KakaTraining
Oh, I mean kind of... There is a lot of work to do before writing papers. Connecting ChatGPT to the web will open up many research areas in information security.
User A can publish prompt-injection content to mislead User B through NewBing.
Will there be a lot of injection spam on the Internet in the future, like SEO spam?
keepthepace t1_j9haufb wrote
It looks gorgeous!
ParanoidTire t1_j9hatq5 wrote
Reply to comment by Rubberdiver in [D] Simple Questions Thread by AutoModerator
Leave it be? Or put in the effort and learn programming and numeric methods in general, e.g. HMMs. "I'm far from a professional mountain biker but I want to race down this difficult trail"
buyIdris666 t1_j9h8zqj wrote
Reply to comment by _Arsenie_Boca_ in [D] Bottleneck Layers: What's your intuition? by _Arsenie_Boca_
I like to think of networks with residual connections as an ensemble of small models cascaded together. The residual connections were created to avoid vanishing/exploding gradient with deep networks.
It's important to realize that each residual connection exposes the layer to the entire input. I don't like the name "residual" because it implies a small amount of data is transferred. Nope, it's the entire input.
Latent information the model has learned along the way passes through the bottleneck, which supposedly forces it to keep only the most important information. But the explanation above about receptive fields is also pertinent.
Beyond vanishing gradient problem that affects all deep networks, one of the biggest challenges with image models is getting them to understand the big picture. Pixels close to each other are very strongly related, so the network preferentially learns these close relations. The bottleneck can be seen as forcing the model to learn global things about the image as resolution is usually halved each layer. A form of image compression if you want to think of it that way.
So the residual connections keep the model from forgetting what it started with, and the bottleneck forces it to learn more than just the relations between close pixels. Similar to the attention mechanism used in Transformers.
CNNs tend to work better than Transformers for images because their receptive fields naturally assume that nearby pixels affect each other more. This makes them easier to train on images. Whether Transformers would work equally well with more training is an open question.
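The residual-plus-bottleneck pattern described above can be sketched in a few lines of PyTorch (channel sizes are arbitrary; this is the ResNet-style idea, not any specific paper's exact block):

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Squeeze channels down, process, expand back, then add the
    untouched input via the residual connection."""
    def __init__(self, channels, bottleneck=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, kernel_size=1),  # squeeze
            nn.ReLU(),
            nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(bottleneck, channels, kernel_size=1),  # expand
        )

    def forward(self, x):
        # The skip connection passes the *entire* input through unchanged,
        # while the bottleneck path must compress what it learns
        return x + self.body(x)

block = BottleneckBlock(channels=64)
y = block(torch.randn(1, 64, 8, 8))  # output shape matches the input
```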
For a similar model to U-Net and other "bottlenecked" CNN architectures check out denoising auto-encoders https://towardsdatascience.com/applied-deep-learning-part-3-autoencoders-1c083af4d798
There is currently speculation that diffusion models are simply a return of the ancient (7-year-old) denoising autoencoder with a fancy training schedule tacked on.
This explanation is entirely my intuition because nobody really knows how this stuff works lol
lucidraisin t1_j9h8fu4 wrote
Reply to comment by Animated-AI in [P] The First Depthwise-separable Convolution Animation by Animated-AI
one for transformers, or even just multi head attention would be amazing! do you have a patreon?
the_architect_ai t1_j9h7but wrote
Use binning/quantisation to reduce image size. Look into voxelisation.
Transformers can capture long-range spatial interactions, but the computation is hefty. You might have to downsize first.
In ViT, tokenization is applied to patches. You might need a 3D CNN to extract voxel tokens.
There are many ways to reduce the computational cost of attention. In the Perceiver IO paper by DeepMind, a bottleneck cross-attention layer is applied.
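Extracting voxel tokens with a 3D convolution is the volumetric analogue of ViT's patch embedding. A small sketch (volume size, patch size, and embedding dimension are made-up examples):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: a 32^3 single-channel volume split into 8^3 voxel
# patches, each embedded into a 128-d token
volume = torch.randn(1, 1, 32, 32, 32)      # (batch, channels, D, H, W)
tokenizer = nn.Conv3d(1, 128, kernel_size=8, stride=8)

tokens = tokenizer(volume)                  # (1, 128, 4, 4, 4)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 64, 128): 64 voxel tokens
```

The resulting (batch, tokens, dim) tensor can then feed a Transformer, or serve as the input array to a Perceiver-style cross-attention bottleneck.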
ginsunuva t1_j9h467a wrote
Reply to comment by EightEqualsEqualsDe in [D] Maybe a new prompt injection method against newBing or ChatGPT? Is this kind of research worth writing a paper? by KakaTraining
Vulnerability for a few-day-old prototype?
limpbizkit4prez t1_j9h3nbm wrote
Reply to comment by blueSGL in [R] ChatGPT for Robotics: Design Principles and Model Abilities by CheapBreakfast9
If you don't know how to code, then regardless of how you interface, it's going to be difficult to execute. If you do know how to code, then you'll probably want better encapsulation. I guess what I'm most curious about is whether the code examples they give in their paper can actually be run, i.e. are those libraries that easy to use?
OneDollarToMillion t1_j9h2we1 wrote
Reply to comment by master3243 in [D] Maybe a new prompt injection method against newBing or ChatGPT? Is this kind of research worth writing a paper? by KakaTraining
There is a scientific contribution for sociology and political science.
You basically research what kind of people are at the helm.
morecoffeemore t1_j9h1m0n wrote
Reply to [D] Simple Questions Thread by AutoModerator
How does the facial search engine pimeyes work?
It's frighteningly accurate. I've tested on pictures of people I know and it's accurately found other pictures of them online which leads to their identity. The false positive rate is pretty low.
I have a technical background (although not CS), so please provide more than a simple response if you can.
Thanks.
k___k___ t1_j9gyx7m wrote
Reply to comment by londons_explorer in [D] Maybe a new prompt injection method against newBing or ChatGPT? Is this kind of research worth writing a paper? by KakaTraining
this is also why Microsoft now limits the conversation depth to 5 interactions per session
zds-nlp t1_j9hulkz wrote
Reply to [P] The First Depthwise-separable Convolution Animation by Animated-AI
This is brilliant, thanks for sharing