Recent comments in /f/MachineLearning

mycall t1_j51xq1r wrote

Loss/cost functions are used to optimize the model during training. The objective is almost always to minimize the loss function. Generally, the lower the loss, the better the model. Cross-entropy loss is one of the most important cost functions. It is used to optimize classification models. Understanding cross-entropy depends on understanding the softmax activation function.
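
To make the softmax/cross-entropy link concrete, here is a minimal sketch in plain Python. The function names and example logits are made up for illustration, and it handles a single example with an integer class label; real frameworks do this in batches.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    # Negative log-probability that softmax assigns to the true class.
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical logits for a 3-class problem.
logits = [4.0, 0.5, 0.1]
loss_when_right = cross_entropy(logits, target_index=0)  # true class has the largest logit
loss_when_wrong = cross_entropy(logits, target_index=2)  # true class has the smallest logit
print(loss_when_right, loss_when_wrong)
```

The loss is small when the model puts most of its probability on the true class and large when it does not, which is why minimizing it pushes the softmax output toward the correct label.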

2

EmmyNoetherRing t1_j51x98z wrote

> As an alternative evaluation, we measure cross-entropy loss, which is used in scaling laws for pre-training, for the six emergent BIG-Bench tasks, as detailed in Appendix A. This analysis follows the same experimental setup from BIG-Bench (2022) and affirms their conclusions for the six emergent tasks we consider. Namely, cross-entropy loss improves even for small model scales where the downstream metrics (exact match, BLEU, and accuracy) are close to random and do not improve, which shows that improvements in the log-likelihood of the target sequence can be masked by such downstream metrics. However, this analysis does not explain why downstream metrics are emergent or enable us to predict the scale at which emergence occurs. Overall, more work is needed to tease apart what enables scale to unlock emergent abilities.

Don't suppose you know what cross-entropy is?

1

tennismlandguitar OP t1_j51rizd wrote

Haha I guess I just haven't seen it in my experience.

I think for research scientists, it becomes far easier to implement improvements to the existing SOTA models if they don't have to try implementing them from scratch.

For MLEs, definitely makes sense that it needs to be in the context of an application. Is that hard enough to drive people away from trying in your experience?

1

tennismlandguitar OP t1_j51r1ao wrote

Wow, thanks for the response, that was really enlightening. I never thought about the monitoring needed to support these models.
Have you noticed one of these problems to be the biggest issue in industry?
With regard to your last point, that definitely makes sense in the case of a simple CNN or deep network, but sometimes there are more complicated RL algorithms or transformers that become a bit difficult and time-intensive to implement. In those cases, I would suspect that it would be easier to use something open-source?

2

lukaszluk t1_j51m1sf wrote

Hello!
Does anyone know of a dataset of 2-D floor plan images with labeled furniture?
I couldn't find anything suitable (what I found was either low quality or had very few examples).
Some of the places I tried:
SESYD - OK quality, but few examples
HouseExpo - JSON datasets; the quality is good, but there is no labeled furniture
FloorPlanCAD Dataset - the data quality is low
Furnishing dataset - does not contain whole rooms, only furniture
SFPI dataset ("Towards Robust Object Detection in Floor Plan Images: A Data Augmentation Approach") - 10k images (this could be a good dataset if the quality is good; still downloading, though)
Any other datasets I should check out?

1

aminostfx t1_j51kpiw wrote

This is awesome. But I have an idea to make it even better. What if we train an RL agent to write code without errors and actually make sure there are no bugs in the code? The environment used to train the RL agent would be the compiler. We can start by supporting Python only and add other languages later on. DM me if you're interested in collaborating on this project.

1

JClub OP t1_j51h8up wrote

> Shouldn't it be the opposite ?

Yes, that makes more sense. Will change!

> How is that different from min(ratio * R, 1.2 * R) ? Does 0.8 have any influence at all ?

Maybe I did not explain properly what the clip is doing. If you have ratio=0.6, then it becomes 0.8, and if it is > 1.2, it becomes 1.2.
Does that make more sense? Regarding the min operation, it's just a heuristic to choose the smaller update, tbh.
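
For anyone following along, here is a rough sketch of the clip and min described above, assuming the usual PPO-style clipped objective with eps=0.2 (so the bounds are 0.8 and 1.2). The function names are hypothetical, and `reward` stands in for R from the question.

```python
def clip(x, low, high):
    # Clamp x into the closed interval [low, high].
    return max(low, min(x, high))

def clipped_objective(ratio, reward, eps=0.2):
    # Clip the ratio into [1 - eps, 1 + eps], then keep the smaller
    # of the unclipped and clipped terms.
    clipped_ratio = clip(ratio, 1.0 - eps, 1.0 + eps)
    return min(ratio * reward, clipped_ratio * reward)

def upper_only(ratio, reward):
    # The variant asked about in the quote: min(ratio * R, 1.2 * R).
    return min(ratio * reward, 1.2 * reward)

# Positive R: the 0.8 bound never wins the min, so both versions agree.
print(clipped_objective(0.6, 1.0), upper_only(0.6, 1.0))

# Negative R: the 0.8 bound is the one that limits the update.
print(clipped_objective(0.6, -1.0), upper_only(0.6, -1.0))
```

So under this assumption the 0.8 bound does matter, but only when R is negative; for positive R the expression behaves like min(ratio * R, 1.2 * R).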

2

TastyOs t1_j5129q7 wrote

I assume you're doing something like minimizing MSE between inputs and reconstructions. Instead of calculating one MSE over all 21 columns, you split it into two parts: an MSE for the important columns and an MSE for the unimportant columns. Then weight the important MSE higher than the unimportant MSE.

So something like

loss = 0.9 * MSE_important + 0.1 * MSE_unimportant
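
A minimal plain-Python sketch of that idea for a single row, assuming you know the indices of the important columns. The function names, the 4-column toy row, and the default weights are placeholders; in a real training loop you would compute the same thing with your framework's tensors so gradients flow through it.

```python
def mse(targets, predictions):
    # Mean squared error over paired values.
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def weighted_reconstruction_loss(row, reconstruction, important_idx,
                                 w_important=0.9, w_unimportant=0.1):
    # Split column indices into important and unimportant groups.
    important = set(important_idx)
    imp = [i for i in range(len(row)) if i in important]
    unimp = [i for i in range(len(row)) if i not in important]

    # One MSE per group, then a weighted sum of the two.
    mse_important = mse([row[i] for i in imp], [reconstruction[i] for i in imp])
    mse_unimportant = mse([row[i] for i in unimp], [reconstruction[i] for i in unimp])
    return w_important * mse_important + w_unimportant * mse_unimportant

# Toy example: 4 columns, the first two are "important".
row = [1.0, 2.0, 3.0, 4.0]
error_in_important = weighted_reconstruction_loss(row, [2.0, 2.0, 3.0, 4.0], [0, 1])
error_in_unimportant = weighted_reconstruction_loss(row, [1.0, 2.0, 4.0, 4.0], [0, 1])
print(error_in_important, error_in_unimportant)
```

The same-sized reconstruction error costs more when it lands in an important column, which is the behavior you want the autoencoder to learn from.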

2