mycall t1_j51xq1r wrote on January 19, 2023 at 8:48 PM

Reply to comment by EmmyNoetherRing in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon

Loss (cost) functions are used to optimize the model during training. The objective is almost always to minimize the loss: the lower the loss, the better the model fits. Cross-entropy loss is one of the most important cost functions and is used to optimize classification models. Understanding cross-entropy depends on understanding the softmax activation function.
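
To make the softmax connection concrete, here is a minimal sketch in plain Python (not taken from the paper or any particular library). The function names and example logits are made up for illustration. Softmax turns raw scores into probabilities, and cross-entropy is the negative log of the probability assigned to the correct class.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability; the probabilities are unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    # Negative log-probability the model assigns to the correct class.
    probs = softmax(logits)
    return -math.log(probs[target_index])

# Hypothetical scores for a 3-class problem.
logits = [4.0, 0.5, 0.1]
loss_if_class_0_is_correct = cross_entropy(logits, target_index=0)
loss_if_class_2_is_correct = cross_entropy(logits, target_index=2)
```

The loss is small when the model puts most of its probability on the correct class and grows as that probability shrinks, which is why lower cross-entropy means a better classifier.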

EmmyNoetherRing t1_j51x98z wrote on January 19, 2023 at 8:45 PM

Reply to comment by mycall in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon

> As an alternative evaluation, we measure cross-entropy loss, which is used in scaling laws for pre-training, for the six emergent BIG-Bench tasks, as detailed in Appendix A. This analysis follows the same experimental setup from BIG-Bench (2022) and affirms their conclusions for the six emergent tasks we consider. Namely, cross-entropy loss improves even for small model scales where the downstream metrics (exact match, BLEU, and accuracy) are close to random and do not improve, which shows that improvements in the log-likelihood of the target sequence can be masked by such downstream metrics. However, this analysis does not explain why downstream metrics are emergent or enable us to predict the scale at which emergence occurs. Overall, more work is needed to tease apart what enables scale to unlock emergent abilities.

Don't suppose you know what cross-entropy is?

blimpyway t1_j51wv3h wrote on January 19, 2023 at 8:43 PM

Reply to [D] is it time to investigate retrieval language models? by hapliniste

Retrieval should also work on the entire interaction history with a particular user: not only tracking beyond the token window, but also having all the "interesting stuff" available from the user's perspective.

[deleted] t1_j51wpic wrote on January 19, 2023 at 8:42 PM

Reply to comment by DaLameLama in [D] Inner workings of the chatgpt memory by terserterseness

[deleted]

EmmyNoetherRing t1_j51wpgz wrote on January 19, 2023 at 8:42 PM

Reply to comment by mycall in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon

thank you, I've been looking for something along these lines.

mycall t1_j51wahq wrote on January 19, 2023 at 8:40 PM

Reply to comment by EmmyNoetherRing in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon

I'm not exactly sure what it is or how it would manifest, but perhaps it is related to Emergent Abilities of Large Language Models

BenjaminJamesBush t1_j51w4g7 wrote on January 19, 2023 at 8:39 PM

Reply to [D] Inner workings of the chatgpt memory by terserterseness

https://dagster.io/blog/chatgpt-langchain

tennismlandguitar OP t1_j51sbtt wrote on January 19, 2023 at 8:16 PM

Reply to comment by junetwentyfirst2020 in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

HAHA no worries, sent you a DM about this stuff in general, answer with what you're comfortable with!

tennismlandguitar OP t1_j51rxa3 wrote on January 19, 2023 at 8:13 PM

Reply to comment by MrAcurite in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

Thanks! Sent you a DM to follow up with this :)

tennismlandguitar OP t1_j51rtwb wrote on January 19, 2023 at 8:13 PM

Reply to comment by PredictorX1 in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

Definitely! An example could be the use of https://github.com/AI4Finance-Foundation/FinRL in quant-firms and fintech.

tennismlandguitar OP t1_j51rizd wrote on January 19, 2023 at 8:11 PM

Reply to comment by [deleted] in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

Haha I guess I just haven't seen it in my experience.

I think for research scientists, it becomes far easier to implement improvements to the existing SOTA models if they don't have to try implementing them from scratch.

For MLEs, definitely makes sense that it needs to be in the context of an application. Is that hard enough to drive people away from trying in your experience?

tennismlandguitar OP t1_j51r6gb wrote on January 19, 2023 at 8:09 PM

Reply to comment by z_fi in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

Definitely agree with the first point here. Could you expand a bit more on the second? Why is it of limited usefulness with transfer learning and fine-tuning today?

junetwentyfirst2020 t1_j51r600 wrote on January 19, 2023 at 8:09 PM

Reply to comment by tennismlandguitar in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

I refuse to answer on the grounds that I may perjure myself

tennismlandguitar OP t1_j51r1ao wrote on January 19, 2023 at 8:08 PM

Reply to comment by TheTwigMaster in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

Wow, thanks for the response, that was really enlightening-- I never thought about monitoring to support these models.
Have you noticed one of these problems to be the biggest issue in industry?
W/ regards to your last point, that definitely makes sense in case of a simple CNN or deep network, but sometimes there are more complicated RL algorithms or transformers that become a bit difficult and time-intensive to implement. In these cases, I would suspect that it would be easier to use something open-sourced?

tennismlandguitar OP t1_j51q970 wrote on January 19, 2023 at 8:03 PM

Reply to comment by LcuBeatsWorking in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

Totally agree! I've found this problem to be a big issue myself, so I assumed that was the main issue. Sent you a DM to talk a bit more about this :)

tennismlandguitar OP t1_j51prk0 wrote on January 19, 2023 at 8:00 PM

Reply to comment by junetwentyfirst2020 in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar

I suppose it's just the teams I've been on, then!

Do you see mostly research teams use these? Or have you also seen software teams use some ML engineers to integrate these open-source models into their products? (Where licensing is not an issue)

lukaszluk t1_j51m1sf wrote on January 19, 2023 at 7:38 PM

Reply to [D] Simple Questions Thread by AutoModerator

Hello!
Does anyone know of a dataset with 2-D floor plan images with labeled furniture?
Couldn't find anything interesting (bad quality or very few examples).
Some of the places I tried:
SESYD - OK quality dataset (but few examples)
HouseExpo - JSON datasets - the quality is good, but no labeled furniture.
FloorPlanCAD Dataset - the quality of data is low
Furnishing dataset - does not contain whole rooms, only furniture
SFPI dataset (Towards Robust Object Detection in Floor Plan Images: A Data Augmentation Approach) - 10k images (this could be a good dataset if quality is good, still downloading though)
Any other datasets I should check out?

aminostfx t1_j51kpiw wrote on January 19, 2023 at 7:29 PM

Reply to [P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3 by jsonathan

This is awesome. But I have an idea to make it even better: what if we train an RL agent to write code without errors and actually make sure there are no bugs in the code? The environment used to train the RL agent would be the compiler. We could start by supporting Python only and add other languages later on. DM me if you're interested in collaborating on this project.

JClub OP t1_j51h8up wrote on January 19, 2023 at 7:08 PM

Reply to comment by Ouitos in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub

> Shouldn't it be the opposite ?

Yes, that makes more sense. Will change!

> How is that different from min(ratio * R, 1.2 * R) ? Does 0.8 have any influence at all ?

Maybe I did not explain properly what the clip is doing. If you have ratio=0.6, it becomes 0.8, and if it is > 1.2, it becomes 1.2.
Does that make more sense? Regarding the min operation, it's just a heuristic to choose the smaller update tbh
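
A small sketch of the clip-then-min step, assuming R stands for the reward/advantage term and the clip range is [0.8, 1.2] as above. The function names are made up for illustration, and this is plain Python rather than any RL library's API. It also suggests where the 0.8 bound shows up: only when R is negative.

```python
def clip(x, low, high):
    # Force x into the interval [low, high].
    return max(low, min(x, high))

def clipped_objective(ratio, reward, low=0.8, high=1.2):
    # Take the smaller of the unclipped and clipped terms.
    unclipped = ratio * reward
    clipped = clip(ratio, low, high) * reward
    return min(unclipped, clipped)

# Positive reward: only the upper bound (1.2) changes the result.
pos_low_ratio = clipped_objective(0.6, 1.0)    # min(0.6, 0.8)
pos_high_ratio = clipped_objective(1.5, 1.0)   # min(1.5, 1.2)

# Negative reward: only the lower bound (0.8) changes the result.
neg_low_ratio = clipped_objective(0.6, -1.0)   # min(-0.6, -0.8)
neg_high_ratio = clipped_objective(1.5, -1.0)  # min(-1.5, -1.2)
```

So with a positive R the expression does behave like min(ratio * R, 1.2 * R), while with a negative R it is the 0.8 bound that takes effect.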

ramya_1995 OP t1_j51h0ka wrote on January 19, 2023 at 7:07 PM

Reply to comment by laaweel in [D] GCN datasets by ramya_1995

Thank you u/laaweel!

EmmyNoetherRing t1_j51cvjh wrote on January 19, 2023 at 6:42 PM

Reply to comment by DaLameLama in [D] Inner workings of the chatgpt memory by terserterseness

yeah, as an uninformed guess it seems like IJCAI or NeurIPS would be a more natural home, but AAAI is actually in DC, which seems helpful for some categories of conversation. if the right people attend.

DaLameLama t1_j519tns wrote on January 19, 2023 at 6:23 PM

Reply to comment by EmmyNoetherRing in [D] Inner workings of the chatgpt memory by terserterseness

There was an OpenAI party at NeurIPS, but I wasn't there. No clue about AAAI :)

retarded_user t1_j518o13 wrote on January 19, 2023 at 6:16 PM

Reply to [D] Simple Questions Thread by AutoModerator

Should the learning rate be changed to a smaller value (such as 1e-4) when working with scaled data (range [0,1] or [-1,1])?

I'm using Adam with Keras/TensorFlow.

MrSpotgold OP t1_j5148cv wrote on January 19, 2023 at 5:50 PM

Reply to comment by BrotherAmazing in [D] Can ChatGPT flag it's own writings? by MrSpotgold

I appreciate your comments. We don't have to agree. Moreover, I could be wrong.

TastyOs t1_j5129q7 wrote on January 19, 2023 at 5:38 PM

Reply to comment by Iljaaaa in [D] Simple Questions Thread by AutoModerator

I assume you're doing something like minimizing MSE between inputs and reconstructions. Instead of calculating one MSE over all 21 columns, split it into two parts: an MSE for the important columns and an MSE for the unimportant columns. Then weight the important MSE higher than the unimportant MSE.

So something like

loss = 0.9 * MSE_important + 0.1 * MSE_unimportant
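
Spelled out as a rough plain-Python sketch for a single row (the helper names, the column indices, and the 0.9 default are only illustrative; in an actual autoencoder you would do the same arithmetic with your framework's tensor ops inside a custom loss):

```python
def column_mse(inputs, reconstructions, columns):
    # Mean squared error restricted to the given column indices.
    errors = [(inputs[c] - reconstructions[c]) ** 2 for c in columns]
    return sum(errors) / len(errors)

def weighted_loss(inputs, reconstructions, important, w_important=0.9):
    # Every column not listed as important counts as unimportant.
    unimportant = [c for c in range(len(inputs)) if c not in important]
    mse_important = column_mse(inputs, reconstructions, important)
    mse_unimportant = column_mse(inputs, reconstructions, unimportant)
    return w_important * mse_important + (1.0 - w_important) * mse_unimportant

# Toy row with 4 columns; the reconstruction is wrong only in column 3.
row = [1.0, 2.0, 3.0, 4.0]
recon = [1.0, 2.0, 3.0, 5.0]
loss_when_col3_important = weighted_loss(row, recon, important=[3])
loss_when_col3_unimportant = weighted_loss(row, recon, important=[0])
```

The same reconstruction error costs much more when it lands in an important column, which is what pushes the model to get those columns right first.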

Recent comments in /f/MachineLearning