Recent comments in /f/MachineLearning
Avelina9X OP t1_j4vn6su wrote
Reply to comment by gunshoes in [D] Has any work been done on VQ-VAE Language Models? by Avelina9X
Ahhhh! So it seems like this is something that's been explored in the slightly parallel domain of TTS and ASR rather than in pure text LMs. Thanks for pointing me in this direction!
Avelina9X OP t1_j4vn244 wrote
Reply to comment by dojoteef in [D] Has any work been done on VQ-VAE Language Models? by Avelina9X
Thank you for the resource! I'll have a deep dive into this!
jostmey t1_j4vmqcs wrote
A deep neural network trained by backpropagation will generally converge to a local minimum (more precisely, a stationary point) if you use gradient descent or even stochastic gradient descent with a suitable step size. However, there are many components added to a deep neural network, like dropout and batch normalization, which, as far as I know, do not come with convergence guarantees.
There are no guarantees about finding a global minimum.
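To make the local-versus-global point concrete, here is a minimal sketch on a toy 1-D non-convex function rather than a neural network. The function, learning rate, and step count are illustrative assumptions, not anything from a real training setup. Plain gradient descent settles into a different minimum depending on where it starts:

```python
def f(x):
    # Toy non-convex objective with two minima (an assumption for illustration).
    return x**4 - 3 * x**2 + x


def grad_f(x):
    # Analytic derivative of f.
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0, lr=0.01, steps=2000):
    # Plain gradient descent with a fixed step size.
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


x_right = gradient_descent(2.0)   # starts right of the hump
x_left = gradient_descent(-2.0)   # starts left of the hump
print(x_right, f(x_right))
print(x_left, f(x_left))
```

Both runs end where the gradient is essentially zero, but only the run started on the left reaches the lower of the two minima.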
[deleted] t1_j4vmj5c wrote
[deleted]
femboyxx98 t1_j4vlsfj wrote
Reply to [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
Have you compared it against modern transformer implementations e.g. with FlashAttention, which can provide 3x-5x speed up by itself?
tdgros t1_j4vipol wrote
Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa
If the two cameras are rigidly fixed, then you can calibrate them like one calibrates a stereo pair, and at least align the orientation and intrinsics. The points very far from the camera will be well aligned; the ones very close will remain unaligned.
The calibration process will involve you pointing positions by hand, but the maths for the correction is very very simple after that.
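As a rough sketch of that correction, assuming a pinhole model: for points far from the cameras the translation between them can be ignored, and an RGB pixel maps to an IR pixel through the homography H = K_ir · R · K_rgb⁻¹. All focal lengths, principal points, and the 1.5 degree rotation below are hypothetical stand-ins for whatever the calibration actually gives you.

```python
import math


def intrinsics(f, cx, cy):
    # Pinhole intrinsic matrix: focal length f (pixels), principal point (cx, cy).
    return [[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]]


def intrinsics_inv(f, cx, cy):
    # Closed-form inverse of the matrix built by intrinsics().
    return [[1.0 / f, 0.0, -cx / f], [0.0, 1.0 / f, -cy / f], [0.0, 0.0, 1.0]]


def rotation_y(angle_rad):
    # Rotation about the vertical (y) axis.
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def matmul3(a, b):
    # Product of two 3x3 matrices stored as nested lists.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_homography(k_src_inv, rotation, k_dst):
    # H = K_dst * R * K_src^-1, valid for points far from both cameras.
    return matmul3(matmul3(k_dst, rotation), k_src_inv)


def warp_point(h, x, y):
    # Apply homography h to pixel (x, y) and normalize the homogeneous result.
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w


# Hypothetical calibration results (not from the thread).
K_RGB_INV = intrinsics_inv(800.0, 640.0, 512.0)
K_IR = intrinsics(400.0, 320.0, 256.0)
R_RGB_TO_IR = rotation_y(math.radians(1.5))

H = rotation_homography(K_RGB_INV, R_RGB_TO_IR, K_IR)
print(warp_point(H, 640.0, 512.0))  # where the RGB image center lands in the IR image
```

Nearby objects will still show parallax, because the ignored translation matters for them.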
nmkd t1_j4vh7cl wrote
Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa
Okay, in that case, I'll try to be a bit more helpful lol.
I think you absolutely need to use something like YOLO for object identification/classification.
- Humans and animals are warmer than the environment
- Cars and other vehicles are warmer than the environment
- Glass blocks IR but not visible light
You could get the overall "look" with just image-based networks, but to make it really convincing (more like COD's thermal vision) you need classification in order to make objects look hot that are supposed to be hot.
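One way to picture how detections could feed in: turn the detector's boxes into a per-pixel "heat prior" that the image network can take as an extra input channel. This is only a sketch. The class names, warmth values, ambient level, and box format are all hypothetical, and the detector itself (YOLO or anything else) is left out.

```python
# HYPOTHETICAL relative warmth per class on a 0..1 scale (made-up values).
HYPOTHETICAL_WARMTH = {"person": 1.0, "dog": 0.9, "car": 0.7}
AMBIENT = 0.2  # hypothetical background level


def heat_prior(width, height, detections):
    # detections: list of (label, x0, y0, x1, y1) with x1/y1 exclusive.
    heat = [[AMBIENT] * width for _ in range(height)]
    for label, x0, y0, x1, y1 in detections:
        warmth = HYPOTHETICAL_WARMTH.get(label)
        if warmth is None:
            continue  # classes without a warmth entry stay at ambient
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                # Where boxes overlap, keep the warmest value.
                heat[y][x] = max(heat[y][x], warmth)
    return heat


prior = heat_prior(4, 4, [("person", 1, 1, 3, 3), ("tree", 0, 0, 4, 4)])
for row in prior:
    print(row)
```

Glass is not handled here; it would need its own rule, since it hides whatever is behind it in IR.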
kingdroopa OP t1_j4vh0sq wrote
Reply to comment by Anjum48 in [D] Suggestion for approaching img-to-img? by kingdroopa
Ahh, I see. Thanks! I'll write it down in my TODO list. Might have to investigate seg masks a bit more :)
kingdroopa OP t1_j4vguxt wrote
Reply to comment by ML4Bratwurst in [D] Suggestion for approaching img-to-img? by kingdroopa
The GAN models I've tested are based on the 'unaligned' approach (e.g. CycleGAN). I have not yet tried cropping and resizing the images to make them show the same region. My immediate thought is that the top and bottom of both images might disappear, but perhaps that's still OK?
kingdroopa OP t1_j4vgmvb wrote
Reply to comment by ML4Bratwurst in [D] Suggestion for approaching img-to-img? by kingdroopa
Interesting! I will for sure write that down in my TODO list, thanks!
kingdroopa OP t1_j4vgho9 wrote
Reply to comment by nmkd in [D] Suggestion for approaching img-to-img? by kingdroopa
Correct, it's not physically possible. This is a research project to find to what degree it IS possible :)
nmkd t1_j4vg72g wrote
Reply to [D] Suggestion for approaching img-to-img? by kingdroopa
You cannot just translate visible light to IR. No matter what machine learning you use, this is physically impossible.
ML4Bratwurst t1_j4vfm36 wrote
Reply to [D] Suggestion for approaching img-to-img? by kingdroopa
Maybe you could also turn the RGB image into grayscale and use it as an additional supervised loss for regularization and maybe more stable training.
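A minimal sketch of one reading of this idea, in plain Python so the arithmetic is visible: convert the RGB input to grayscale with the standard Rec. 601 luma weights and add a weighted L1 term between the predicted image and that grayscale image. The loss weight and the use of L1 are assumptions, and in practice this would be written with your framework's tensor ops.

```python
# Rec. 601 luma weights for R, G, B.
LUMA = (0.299, 0.587, 0.114)


def to_gray(rgb_image):
    # rgb_image: rows of (r, g, b) tuples with values in [0, 1].
    return [[sum(w * c for w, c in zip(LUMA, px)) for px in row]
            for row in rgb_image]


def l1(a, b):
    # Mean absolute difference between two single-channel images.
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for va, vb in zip(row_a, row_b):
            total += abs(va - vb)
            count += 1
    return total / count


def total_loss(main_loss, pred_ir, rgb_image, weight=0.1):
    # main_loss is whatever loss you already train with; weight is a guess.
    return main_loss + weight * l1(pred_ir, to_gray(rgb_image))


rgb = [[(1.0, 1.0, 1.0), (0.0, 0.0, 0.0)]]
print(total_loss(0.5, [[0.0, 0.0]], rgb))
```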
ML4Bratwurst t1_j4vfax7 wrote
Reply to [D] Suggestion for approaching img-to-img? by kingdroopa
I think one important part here is the "misalignment" of the images. Have you tried to cut and resize the images, so that they show the same region? You don't need a GAN then
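If the mismatch is mostly a field-of-view difference, the crop can be estimated per axis. The sketch below assumes pinhole cameras that share an optical center and axis, so the narrower view covers a centered fraction tan(fov_narrow/2) / tan(fov_wide/2) of the wider image. The function name and the example numbers are hypothetical.

```python
import math


def crop_extent(size_px, fov_wide_deg, fov_narrow_deg):
    # Return (start, stop) pixel indices of the centered crop along one axis
    # of the wider image that matches the narrower camera's field of view.
    scale = (math.tan(math.radians(fov_narrow_deg) / 2)
             / math.tan(math.radians(fov_wide_deg) / 2))
    crop = round(size_px * scale)
    start = (size_px - crop) // 2
    return start, start + crop


# Hypothetical example: 1000 px axis, 90 degree wide view, 60 degree narrow view.
print(crop_extent(1000, 90.0, 60.0))
```

Run it once for the horizontal axis and once for the vertical axis, then resize the crop to the other image's resolution.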
Anjum48 t1_j4vc9kp wrote
Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa
The Unet I described will output a continuous number for each pixel between 0 & 1, which you can use as a proxy for your IR image.
People often apply a threshold to this image (e.g. 0.5) to create a mask, which might be where you are getting confused.
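A tiny sketch of the difference, with made-up logits standing in for the network's raw output: the sigmoid values are the continuous image, and the mask only appears once you threshold them.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up raw network outputs for a 2x2 image.
logits = [[-2.0, 0.0], [1.5, 4.0]]

# Continuous values in (0, 1): this is what you would use as the IR proxy.
ir_proxy = [[sigmoid(z) for z in row] for row in logits]

# Scaling to 8-bit gives an ordinary grayscale image.
ir_8bit = [[round(p * 255) for p in row] for row in ir_proxy]

# Thresholding at 0.5 is the extra step that turns it into a binary mask.
mask = [[1 if p > 0.5 else 0 for p in row] for row in ir_proxy]

print(ir_8bit)
print(mask)
```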
BlazeObsidian t1_j4vc61q wrote
Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa
That depends on the extent to which the pixel information is misaligned I think. If cropping your images is not a solution and a large portion of your images have this issue, the model wouldn't be able to generate the right pixel information for the misaligned sections. But it's worth giving a try with Palette if the misalignment is not significant.
YouDamnHotdog t1_j4vbd1a wrote
Voice.ai
YouDamnHotdog t1_j4vbbf4 wrote
Reply to comment by mamafied in [D] Any model like VALL-E available currently? by CeFurkan
Man, that doesn't work at aaaaaall. Sounds like the worst robot and nothing like me
kingdroopa OP t1_j4vbaxk wrote
Reply to comment by BlazeObsidian in [D] Suggestion for approaching img-to-img? by kingdroopa
Thanks :) I noticed Palette uses paired images, whilst mine are a bit unaligned. Would you consider it a paired image set, or unpaired? They look very similar, but don't share pixel information at the top/bottom of the images.
fullouterjoin t1_j4vbawe wrote
Reply to comment by thedabking123 in [D] Bitter lesson 2.0? by Tea_Pearce
> requirements for explainability
We have to start pushing for this legislation now. If you leave it up to the market, Equifax will just make a magic Credit Score model that will be like huffing tea leaves.
BlazeObsidian t1_j4var74 wrote
Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa
Sorry, I was wrong. Modern deep VAEs can match SOTA GAN performance for image super-resolution (https://arxiv.org/abs/2203.09445), but I don't have evidence for recoloring.
But diffusion models have been shown to outperform GANs on multiple img-to-img translation tasks, e.g. https://deepai.org/publication/palette-image-to-image-diffusion-models
You could probably reframe your problem as an image colorization task (https://paperswithcode.com/task/colorization), and the SOTA is still Palette, linked above.
kingdroopa OP t1_j4vafrc wrote
Reply to comment by Anjum48 in [D] Suggestion for approaching img-to-img? by kingdroopa
Thanks a lot! Will look into it, but seems like the U-NET outputs are segmentation masks, whilst I want it to actually output (generate) IR image equivalents of the RGB image. Is there some idea that I'm missing, perhaps?
JacksOngoingPresence t1_j4v9jh4 wrote
Reply to [D] RLHF - What type of rewards to use? by JClub
There isn't much difference between "Simply 👍/👎" and "scale of 1-5". They will probably give roughly the same results. I understand the first one as {0, 1} and the second as {0, ..., 1}, i.e. a finer grid of values between 0 and 1. It's just a question of resolution. The 1-5 scale will most likely give you faster convergence, but it can also mess you up if some of your data gets mislabeled, since it's easier to make mistakes at a higher resolution.
But in the limit, if you take 1 million different people and ask them to assess your model in a binary fashion, or on a scale of 1 to 10, and then average out the results, you will get the same thing. It's just that, from a human perspective, it's easier to assess things as yes/no (e.g., "did you like this new movie?" vs "how would you rate this movie on a scale from 1 to 10?"). But from the computer's perspective, ML wants that label to be as close to its true value as possible.
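For what it's worth, here is how I'd put both kinds of feedback on the same [0, 1] range before averaging. The function names and the sample ratings are made up for illustration.

```python
def thumbs_to_reward(thumbs_up):
    # Binary feedback lives on {0, 1}.
    return 1.0 if thumbs_up else 0.0


def scale_to_reward(rating, low=1, high=5):
    # Map an integer rating in [low, high] linearly onto [0, 1].
    if not low <= rating <= high:
        raise ValueError("rating outside the allowed scale")
    return (rating - low) / (high - low)


def mean_reward(rewards):
    # Average reward over many raters.
    return sum(rewards) / len(rewards)


# Made-up ratings from four raters on a 1-5 scale.
print(mean_reward([scale_to_reward(r) for r in (5, 4, 2, 5)]))
# Made-up thumbs from the same four raters.
print(mean_reward([thumbs_to_reward(t) for t in (True, True, False, True)]))
```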
Anjum48 t1_j4v8mpm wrote
Reply to comment by kingdroopa in [D] Suggestion for approaching img-to-img? by kingdroopa
+1 for UNets. Since IR will be a single channel you could use a single class semantic segmentation-type model (i.e. a UNet with a 1-channel output passed through a sigmoid). Something like this would get you started:
import segmentation_models as sm
model = sm.Unet('resnet34', classes=1, activation='sigmoid')
Edit: Forgot the link for the package I'm referencing: https://github.com/qubvel/segmentation_models
Many of the most popular encoders/backbones are implemented in that package
Edit 2: Is the FOV important? If you could resize the images so that the RGB & IR FOV are equivalent then that would make things a lot simpler
Avelina9X OP t1_j4vngcs wrote
Reply to comment by C0hentheBarbarian in [D] Has any work been done on VQ-VAE Language Models? by Avelina9X
Ahah! It seems like the reason I couldn't find anything is because I was being too specific about text seq models and I was disregarding the domain of audio. Thank you!