Avelina9X OP t1_j4vngcs wrote on January 18, 2023 at 4:09 PM

Reply to comment by C0hentheBarbarian in [D] Has any work been done on VQ-VAE Language Models? by Avelina9X

Ahah! It seems like the reason I couldn't find anything is because I was being too specific about text seq models and I was disregarding the domain of audio. Thank you!

Avelina9X OP t1_j4vn6su wrote on January 18, 2023 at 4:07 PM

Reply to comment by gunshoes in [D] Has any work been done on VQ-VAE Language Models? by Avelina9X

Ahhhh! So it seems like this is something that's been explored in the slightly parallel domain of TTS and ASR rather than in pure text LMs, thanks for pointing me in this direction!

Avelina9X OP t1_j4vn244 wrote on January 18, 2023 at 4:06 PM

Reply to comment by dojoteef in [D] Has any work been done on VQ-VAE Language Models? by Avelina9X

Thank you for the resource! I'll have a deep dive into this!

jostmey t1_j4vmqcs wrote on January 18, 2023 at 4:04 PM

Reply to [D] Are there any results on convergence guarantees when optimizing NNs? by Dartagnjan

A deep neural network trained by backpropagation will converge to a local minimum if you use gradient descent or even stochastic gradient descent. However, many components added to a deep neural network, like dropout and batch normalization, do not come with convergence guarantees as far as I know.

There are no guarantees about finding a global minimum.

[deleted] t1_j4vmj5c wrote on January 18, 2023 at 4:03 PM

Reply to [D] Are there any results on convergence guarantees when optimizing NNs? by Dartagnjan

[deleted]

femboyxx98 t1_j4vlsfj wrote on January 18, 2023 at 3:58 PM

Reply to [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng

Have you compared it against modern transformer implementations e.g. with FlashAttention, which can provide 3x-5x speed up by itself?

tdgros t1_j4vipol wrote on January 18, 2023 at 3:39 PM

Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa

If the two cameras are rigidly fixed, then you can calibrate them the way one calibrates a stereo pair, and at least align the orientation and intrinsics. Points very far from the cameras will be well aligned; points very close will remain unaligned.

The calibration process will involve marking positions by hand, but the maths for the correction is very simple after that.
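
A minimal sketch of that correction, assuming pinhole intrinsics (fx, fy, cx, cy) for each camera and a 3x3 rotation matrix obtained from calibration. The function name `map_pixel` and every number in the usage line are hypothetical. Translation between the cameras is ignored, which is why only distant points end up aligned:

```python
def map_pixel(u, v, intr_a, intr_b, rot):
    """Map pixel (u, v) from camera A into camera B, ignoring translation.

    intr_a and intr_b are (fx, fy, cx, cy) tuples; rot is a 3x3 rotation
    matrix (nested lists) taking camera A's frame to camera B's frame.
    All of these are assumed to come from a prior calibration step.
    """
    fx_a, fy_a, cx_a, cy_a = intr_a
    fx_b, fy_b, cx_b, cy_b = intr_b
    # Back-project the pixel to a viewing ray in camera A's frame.
    ray = ((u - cx_a) / fx_a, (v - cy_a) / fy_a, 1.0)
    # Rotate the ray into camera B's frame.
    x, y, z = (sum(rot[i][j] * ray[j] for j in range(3)) for i in range(3))
    # Project the rotated ray onto camera B's image plane.
    return fx_b * x / z + cx_b, fy_b * y / z + cy_b


# Hypothetical example: identical orientation, different intrinsics.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
u_b, v_b = map_pixel(420, 240, (1000, 1000, 320, 240), (500, 500, 160, 120), identity)
```

Applying this mapping to every pixel gives the warp that lines up orientation and intrinsics; nearby objects would still show parallax.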

nmkd t1_j4vh7cl wrote on January 18, 2023 at 3:29 PM

Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa

Okay, in that case, I'll try to be a bit more helpful lol.

I think you absolutely need to use something like YOLO for object identification/classification.

Humans and animals are warmer than the environment
Cars and other vehicles are warmer than the environment
Glass blocks IR but not visible light

You could get the overall "look" with just image-based networks, but to make it really convincing (more like COD's thermal vision) you need classification in order to make objects look hot that are supposed to be hot.

kingdroopa OP t1_j4vh0sq wrote on January 18, 2023 at 3:28 PM

Reply to comment by Anjum48 in [D] Suggestion for approaching img-to-img? by kingdroopa

Ahh, I see. Thanks! I'll write it down in my TODO list. Might have to investigate seg masks a bit more :)

kingdroopa OP t1_j4vguxt wrote on January 18, 2023 at 3:26 PM

Reply to comment by ML4Bratwurst in [D] Suggestion for approaching img-to-img? by kingdroopa

The GAN models I've tested are based on the 'unaligned' approach (e.g. CycleGAN). I still haven't tried cropping and resizing the images to make them show the same region. My immediate thought is that the top and bottom of both images might disappear, but perhaps that's still OK?

kingdroopa OP t1_j4vgmvb wrote on January 18, 2023 at 3:25 PM

Reply to comment by ML4Bratwurst in [D] Suggestion for approaching img-to-img? by kingdroopa

Interesting! I will for sure write that down in my TODO list, thanks!

kingdroopa OP t1_j4vgho9 wrote on January 18, 2023 at 3:24 PM

Reply to comment by nmkd in [D] Suggestion for approaching img-to-img? by kingdroopa

Correct, it's not physically possible. This is a research project to find to what degree it IS possible :)

nmkd t1_j4vg72g wrote on January 18, 2023 at 3:22 PM

Reply to [D] Suggestion for approaching img-to-img? by kingdroopa

You cannot just translate visible light to IR. No matter what machine learning you use, this is physically impossible.

ML4Bratwurst t1_j4vfm36 wrote on January 18, 2023 at 3:18 PM

Reply to [D] Suggestion for approaching img-to-img? by kingdroopa

Maybe you could also turn the RGB image into grayscale and use it as an additional supervised loss for regularization and maybe more stable training.
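
One possible reading of this suggestion, as a rough sketch: penalize the predicted IR image for drifting away from the grayscale version of the RGB input, and add that term to the main loss with a small weight. The function names, the nested-list image layout, and the 0.1 weight are all assumptions for illustration; the luma weights are the standard ITU-R BT.601 ones.

```python
def rgb_to_gray(pixel):
    """Convert one (r, g, b) pixel in [0, 1] to luma (ITU-R BT.601 weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def grayscale_l1(pred_ir, rgb_image):
    """Mean absolute difference between predicted IR and grayscale RGB."""
    total, count = 0.0, 0
    for pred_row, rgb_row in zip(pred_ir, rgb_image):
        for pred, pixel in zip(pred_row, rgb_row):
            total += abs(pred - rgb_to_gray(pixel))
            count += 1
    return total / count


def total_loss(main_loss, pred_ir, rgb_image, weight=0.1):
    """Main loss plus the weighted grayscale regularizer (weight is a guess)."""
    return main_loss + weight * grayscale_l1(pred_ir, rgb_image)


# Hypothetical 1x2 image: predictions vs. a mid-gray and a white pixel.
pred = [[0.5, 0.5]]
rgb = [[(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]]
aux = grayscale_l1(pred, rgb)
```

In a real training loop the same idea would be written with the framework's tensor ops so that gradients flow through it.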

ML4Bratwurst t1_j4vfax7 wrote on January 18, 2023 at 3:16 PM

Reply to [D] Suggestion for approaching img-to-img? by kingdroopa

I think one important part here is the "misalignment" of the images. Have you tried to crop and resize the images so that they show the same region? You don't need a GAN then.

Anjum48 t1_j4vc9kp wrote on January 18, 2023 at 2:56 PM

Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa

The Unet I described will output a continuous number for each pixel between 0 & 1, which you can use as a proxy for your IR image.

People often apply a threshold to this image (e.g. 0.5) to create a mask, which might be where you are getting confused.
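
To make the distinction concrete, here is a small sketch with made-up values (the `to_mask` helper is hypothetical, not part of any package): the continuous output is what you would keep as the IR proxy, and the mask only appears once you threshold it.

```python
def to_mask(pred, threshold=0.5):
    """Binarize a 2D grid of sigmoid outputs into a 0/1 mask."""
    return [[1 if p >= threshold else 0 for p in row] for row in pred]


# Hypothetical 2x2 network output after the sigmoid.
pred = [[0.12, 0.87], [0.50, 0.49]]
mask = to_mask(pred)  # the segmentation-style mask
ir_proxy = pred       # the continuous values, used directly as pseudo-IR
```

Skipping the threshold step is all it takes to treat the same model as a regressor rather than a segmenter.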

BlazeObsidian t1_j4vc61q wrote on January 18, 2023 at 2:55 PM

Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa

I think that depends on how badly the pixel information is misaligned. If cropping your images is not a solution and a large portion of your images have this issue, the model wouldn't be able to generate the right pixel information for the misaligned sections. But it's worth giving Palette a try if the misalignment is not significant.

YouDamnHotdog t1_j4vbd1a wrote on January 18, 2023 at 2:49 PM

Reply to [D] Any model like VALL-E available currently? by CeFurkan

Voice.ai

YouDamnHotdog t1_j4vbbf4 wrote on January 18, 2023 at 2:49 PM

Reply to comment by mamafied in [D] Any model like VALL-E available currently? by CeFurkan

Man, that doesn't work at aaaaaall. Sounds like the worst robot and nothing like me

kingdroopa OP t1_j4vbaxk wrote on January 18, 2023 at 2:49 PM

Reply to comment by BlazeObsidian in [D] Suggestion for approaching img-to-img? by kingdroopa

Thanks :) I noticed Palette uses paired images, whilst mine are a bit unaligned. Would you consider it a paired image set, or unpaired? They look very similar, but don't share pixel information in the top/bottom of the images.

fullouterjoin t1_j4vbawe wrote on January 18, 2023 at 2:49 PM

Reply to comment by thedabking123 in [D] Bitter lesson 2.0? by Tea_Pearce

> requirements for explainability

We have to start pushing for this legislation now. If you leave it up to the market, Equifax will just make a magic Credit Score model that will be like huffing tea leaves.

BlazeObsidian t1_j4var74 wrote on January 18, 2023 at 2:45 PM

Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa

Sorry, I was wrong. Modern deep VAEs can match SOTA GAN performance for image super-resolution (https://arxiv.org/abs/2203.09445), but I don't have evidence for recoloring.

But diffusion models have been shown to outperform GANs on multiple img-to-img translation tasks, e.g. https://deepai.org/publication/palette-image-to-image-diffusion-models

You could probably reframe your problem as an image colorization task (https://paperswithcode.com/task/colorization), and the SOTA is still Palette, linked above.

kingdroopa OP t1_j4vafrc wrote on January 18, 2023 at 2:43 PM

Reply to comment by Anjum48 in [D] Suggestion for approaching img-to-img? by kingdroopa

Thanks a lot! Will look into it, but it seems like the U-Net outputs are segmentation masks, whilst I want it to actually output (generate) IR image equivalents of the RGB image. Is there some idea that I'm missing, perhaps?

JacksOngoingPresence t1_j4v9jh4 wrote on January 18, 2023 at 2:37 PM

Reply to [D] RLHF - What type of rewards to use? by JClub

There isn't much difference between "Simply 👍/👎" and "scale of 1-5". They will probably give roughly the same results. I read the first one as {0, 1} and the second as {0, ..., 1}, so it's just a question of resolution. The 1-5 scale will most likely give you faster convergence, but it can also mess you up if some of your data gets mislabeled, since it's easier to make mistakes at higher resolution.

But in the limit, if you take 1 million different people and ask them to assess your model in a binary fashion, or on a scale of 1 to 10, and then average out the results, you will get the same thing. It's just that, from a human perspective, it's easier to assess things as yes-no (e.g., "did you like this new movie?" vs "how would you rate this movie on a scale from 1 to 10?"). But from the computer's perspective, ML wants that label to be as close to its true value as possible.
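
A quick sketch of what "same thing, different resolution" could look like in code: both kinds of feedback get mapped onto [0, 1] and then averaged across annotators. The function names and the sample labels are made up for illustration.

```python
def thumbs_to_reward(thumbs_up):
    """Binary feedback: thumbs up -> 1.0, thumbs down -> 0.0."""
    return 1.0 if thumbs_up else 0.0


def rating_to_reward(rating, low=1, high=5):
    """Rescale a rating on [low, high] linearly onto [0, 1]."""
    if not low <= rating <= high:
        raise ValueError("rating outside the scale")
    return (rating - low) / (high - low)


def mean_reward(rewards):
    """Average the per-annotator rewards for one model output."""
    return sum(rewards) / len(rewards)


# Hypothetical labels for the same output from two annotator pools.
binary_avg = mean_reward([thumbs_to_reward(v) for v in (True, True, False, True)])
scale_avg = mean_reward([rating_to_reward(r) for r in (4, 5, 2, 4)])
```

Both averages live on the same scale, so the reward model can be trained the same way either way; the scale version just carries more information per label, and more room for a label to be off.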

Anjum48 t1_j4v8mpm wrote on January 18, 2023 at 2:30 PM

Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa

+1 for UNets. Since IR will be a single channel you could use a single class semantic segmentation-type model (i.e. a UNet with a 1-channel output passed through a sigmoid). Something like this would get you started:

import segmentation_models as sm
model = sm.Unet('resnet34', classes=1, activation='sigmoid')

Edit: Forgot the link for the package I'm referencing: https://github.com/qubvel/segmentation_models

Many of the most popular encoders/backbones are implemented in that package

Edit 2: Is the FOV important? If you could resize the images so that the RGB & IR FOV are equivalent then that would make things a lot simpler
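
If the FOVs are known, a center crop of the wider image is one way to do that. This is a rough sketch that assumes pinhole cameras with aligned optical axes and square pixels; the function name and the example numbers are hypothetical.

```python
import math


def center_crop_box(width, height, fov_wide_deg, fov_narrow_deg):
    """Return (left, top, right, bottom) of the region of the wide-FOV image
    that covers the same horizontal field of view as the narrow-FOV camera."""
    # For a pinhole camera, image extent scales with tan(FOV / 2).
    scale = math.tan(math.radians(fov_narrow_deg) / 2) / math.tan(
        math.radians(fov_wide_deg) / 2
    )
    crop_w = round(width * scale)
    crop_h = round(height * scale)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# Hypothetical 640x480 RGB frame at 90 degrees vs. an IR camera at ~53.13 degrees.
box = center_crop_box(640, 480, 90.0, 53.13010235415598)
```

After cropping to that box, the RGB image would be resized to the IR resolution so the two line up pixel for pixel (up to any remaining misalignment).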

Recent comments in /f/MachineLearning