Recent comments in /f/MachineLearning

canbooo t1_j8mfn6z wrote

Since you have been waiting for six hours without any response, let me share my five cents. You are probably inspired by ChatGPT and the success of RLHF, so why not start there: https://openreview.net/forum?id=20-xDadEYeU

But the idea itself is not novel, only its application to NLP; it has been applied to other domains such as games and autonomous driving. They use PPO, which is to me the most robust on-policy algorithm, but any other on-policy algorithm could have been used instead, and off-policy methods like SAC could improve sample efficiency at the risk of convergence problems. You could also be more general and train the experience/value model independently of a specific language model, which would allow reusing it to fine-tune other LMs, though that might require much more data to reach similar performance. In any case, the application of RL to NLP (except for language-based games) is quite new, and many questions remain open.
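For concreteness, here is a minimal sketch of the clipped surrogate loss that PPO optimizes; all tensor names are placeholders, not tied to any particular LM fine-tuning library.

```python
import torch

def ppo_policy_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the behavior policy.
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective (Schulman et al., 2017), negated for descent.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Toy usage with random tensors, just to show the shapes involved.
lp = torch.randn(8, requires_grad=True)
loss = ppo_policy_loss(lp, lp.detach() + 0.1, torch.randn(8))
loss.backward()
```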

3

soviet69er t1_j8mbzpm wrote

Hello! I'm currently a second-year data science student interested in machine learning engineering as a career, and I'm wondering what skills I should learn on my own besides Python ML frameworks and data engineering frameworks such as PySpark. I was considering learning Java, but I'm not sure whether I'd be better off investing my time in something else.

1

Oripy t1_j8m8ejv wrote

I have a question related to the Actor-Critic method described in the Keras example here: https://keras.io/examples/rl/actor_critic_cartpole/

I looked at the code for the training part, and I think I understand what each line is supposed to do and why it is there. However, I don't think I understand what role the critic plays in improving the agent. To me, the critic is just a value that predicts the future reward, but I don't see it being fed back into the system so the agent can select better actions and improve its reward.
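For concreteness, the update step I'm looking at is roughly this (paraphrased from the linked example with dummy rollout data, not verbatim):

```python
import tensorflow as tf

huber_loss = tf.keras.losses.Huber()
actor_losses, critic_losses = [], []

# Dummy per-timestep rollout data standing in for one episode.
action_log_probs = [tf.constant(-0.5), tf.constant(-1.2)]
critic_values = [tf.constant(3.0), tf.constant(2.0)]
returns = [tf.constant(4.0), tf.constant(1.5)]

for log_prob, value, ret in zip(action_log_probs, critic_values, returns):
    advantage = ret - value                      # actual return vs. critic's prediction
    actor_losses.append(-log_prob * advantage)   # the critic's value enters the actor loss here
    critic_losses.append(huber_loss(tf.expand_dims(value, 0), tf.expand_dims(ret, 0)))
```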

Do I have a good understanding? Is the critic just a "bonus" output? Are the two unrelated, such that the exact same performance could be achieved by removing the critic output altogether? Or is the critic output used to speed up learning in some way I fail to see?

Thank you.

1

bushrod t1_j8m68xt wrote

This technique is similar to data augmentation, but with a specific focus on important samples. There may not be a specific name for this technique, but it could be considered a form of "strategic oversampling" or "strategic repetition" of important samples. By repeating these important samples in every batch, you are increasing their impact on the training process and potentially helping the neural network to converge to a better solution that takes these samples into account.

It's worth noting that this technique may not always be appropriate or necessary, and it could potentially lead to overfitting if not used carefully. However, in cases where there are a small number of important samples that have a disproportionate impact on the end application, repeating them in every batch can be a useful approach to ensure that the neural network learns to incorporate their information effectively.
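As a minimal sketch (assuming a PyTorch setup; all data and names here are made up), "repeating in every batch" could look like appending a fixed set of important samples to each sampled batch:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy data; in practice these would be the real dataset and the handful
# of important samples that should be seen in every batch.
dataset = TensorDataset(torch.randn(1000, 16), torch.randn(1000, 1))
important_x, important_y = torch.randn(8, 16), torch.randn(8, 1)

model = torch.nn.Linear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for x, y in DataLoader(dataset, batch_size=56, shuffle=True):
    # Strategic repetition: the important samples join every batch.
    xb = torch.cat([x, important_x])
    yb = torch.cat([y, important_y])
    loss = torch.nn.functional.mse_loss(model(xb), yb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```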

:-P

1

specializedboy t1_j8lwdgu wrote

Does anyone know of any study groups or resources targeted at learning causal inference in machine learning? I have recently started learning causal inference. Please ping me if anyone is interested in forming a study group.

2

SnooStories4137 t1_j8lrsug wrote

Some reinforcement-learning-like algorithm seems like a really interesting next step here. Observation = the task (e.g., QA or mask filling); actions = API calls whose output updates the observation via concatenation, as in the paper; the environment is the APIs, database, Python installation, etc.; the state is the network weights; the reward is the loss function before vs. after the update to the observation.

I feel like even if the only API is the model generating text to update its own observation ("to help itself think"), that could intuitively help for some things. Rather than trying to fill in the mask right away, it might recognize that it is better to first "think a little" to update its working memory (which is of course the observation here).
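Roughly, the environment loop I'm describing might look like the sketch below; every name in it (ToolUseEnv, call_api, scorer) is hypothetical, not from the paper:

```python
def call_api(query: str) -> str:
    # Stub standing in for a real tool: search, a calculator, the LM itself, ...
    return f" [result of {query}]"

class ToolUseEnv:
    def __init__(self, task_prompt: str, scorer):
        self.obs = task_prompt    # observation = the task text so far
        self.scorer = scorer      # e.g., the LM's loss on the target answer

    def step(self, api_call: str):
        before = self.scorer(self.obs)
        self.obs += call_api(api_call)   # concatenate tool output into the observation
        after = self.scorer(self.obs)
        return self.obs, before - after  # reward = improvement in the score

# Toy usage with a dummy scorer (lower is better, like a loss).
env = ToolUseEnv("Q: 7 * 9 = [MASK]", scorer=lambda obs: 1.0 / (1 + len(obs)))
obs, reward = env.step("calculator(7 * 9)")
```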

1

Maleficent_Stay_7737 OP t1_j8ktr7x wrote

Not exactly. Both are formulated as inverse problems in image processing. Super-resolution addresses the case where information is lost due to downscaling, whereas deblurring focuses on blurry input (e.g., from low-pass filtering). However, they have similar properties, and deep-learning-based methods can be applied to both. In this survey, we didn't go deeper into the deblurring topic.
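As a toy illustration of the two degradation models (the code and names are mine, not from the survey):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

x = np.random.rand(64, 64)   # stand-in for a ground-truth image

# Super-resolution setting: pixels are lost through downscaling.
y_sr = x[::4, ::4]           # 4x decimation -> a 16x16 observation

# Deblurring setting: same resolution, but high frequencies suppressed.
y_blur = gaussian_filter(x, sigma=2.0)
```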

6

Maleficent_Stay_7737 OP t1_j8kezw2 wrote

Thank you very much for your comment. It is a valuable and important note for the community, as blind SR is a super important aspect of image SR. We touch on this topic in the Unsupervised SR section (Section 8) but did not have the space to go into more detail, which doesn't mean it doesn't deserve attention. To fill this gap, we reference the 2022 survey by Liu et al. ("Blind Image Super-Resolution: A Survey and Beyond", https://arxiv.org/abs/2107.03055), which also covers KernelGAN and related methods and which we find to be an informative source on blind SR in general.

9