Recent comments in /f/MachineLearning

friend_of_kalman t1_jb9nfzj wrote

I'm working with a small group at a big local university hospital. We have a huge dataset of patient data from the Neurological ICU and are currently applying AI for risk detection. For example, we are doing time series forecasting on all sorts of medical indicators (vital signs, blood gas analysis, etc.). Adoption is slow though, and most hospitals don't have the proper infrastructure in place for this.
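To give a flavor of the forecasting part, here's a toy sketch (synthetic signal, all names and values made up, not our actual pipeline): predict the next reading of a vital sign from a sliding window of its recent history.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic "heart rate" signal standing in for real ICU measurements
rng = np.random.default_rng(0)
heart_rate = 80 + 5 * np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 1, 500)

window = 30  # use the last 30 measurements as features
X = np.stack([heart_rate[i : i + window] for i in range(len(heart_rate) - window)])
y = heart_rate[window:]

# Train on everything but the last 100 points, evaluate on the held-out tail
model = GradientBoostingRegressor().fit(X[:-100], y[:-100])
print("held-out MAE:", np.abs(model.predict(X[-100:]) - y[-100:]).mean())
```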

1

enjakuro t1_jb9l86l wrote

Ah, it was the rare-text thing, I believe. Now that I'm more awake, I also realized that they copied the source to the target, i.e., used the same sentence as both source and target, while keeping the rest of the data bilingual. If I recall correctly, you can have up to 50% copied data, which makes the training set much bigger. I guess if the images aren't exactly the same this would have the same effect. Basically training a language model.
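Something like this, if I sketch it (sentences and the ratio made up for illustration):

```python
import random

# Copied-data trick: add (source, source) pairs so the same sentence
# appears on both sides, here making up 50% of the final training set.
bitext = [
    ("der Hund bellt", "the dog barks"),
    ("die Katze schläft", "the cat sleeps"),
    ("es regnet", "it is raining"),
]

copied = [(src, src) for src, _ in bitext]  # source copied to target
train_set = bitext + copied                 # half the pairs are now copies
random.shuffle(train_set)
```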

2

alterframe t1_jb9i70h wrote

Interesting. With many probabilistic approaches, where we have some intermediate variable in a graph like X -> Z -> Y, we need to introduce sampling on Z to prevent mode collapse. Then we also decay the entropy of this sampler with a temperature schedule.
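Concretely, something like a Gumbel-softmax sampler on a discrete Z with an annealed temperature (a toy sketch, the schedule is made up):

```python
import torch
import torch.nn.functional as F

# Sample a discrete Z via Gumbel-softmax; annealing tau toward 0
# decays the sampler's entropy over training.
logits = torch.randn(4, 10, requires_grad=True)  # scores over 10 values of Z

for step in range(3000):
    tau = max(0.1, 0.999 ** step)                      # temperature decay
    z = F.gumbel_softmax(logits, tau=tau, hard=False)  # soft sample of Z
    # ... z would feed into the Y head of the X -> Z -> Y model ...
```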

This is quite similar to this early dropout idea, because there we also have a sampling process that effectively works only at the beginning of training. However, in those other scenarios we usually attribute it to something like exploration vs. exploitation.

If we had an agent that almost immediately assigned very high probability to bad initial actions, it might never be able to find a proper solution. On a loss landscape, in the worst case, we can also end up in a local minimum very early on, so we use a higher learning rate at the beginning to make that less likely.

Maybe, in general, random sampling could be safer than using a higher learning rate? A high learning rate can still fail for some models. If, by analogy, we do it just to boost early exploration, then maybe randomness could be a good alternative. That would kind of counter all the claims based on analysis of convex functions...
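A minimal version of what I mean by "sampling that only works early" (the cutoff step is made up):

```python
import torch
import torch.nn as nn

# Early dropout: the sampling noise is only active for the first part
# of training, then switched off for the rest.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                      nn.Dropout(p=0.1), nn.Linear(64, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.01)
early_steps = 1000  # hypothetical cutoff

for step in range(5000):
    if step == early_steps:
        for m in model.modules():
            if isinstance(m, nn.Dropout):
                m.p = 0.0  # exploration phase over: no more sampling
    x = torch.randn(16, 32)
    loss = ((model(x) - x.sum(dim=1, keepdim=True)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```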

2

ReginaldIII t1_jb9goco wrote

Link to your code? It needs to be GPLv3 to be compliant with LLaMA's licensing.

How are you finding the quality of the output? I've had a little play around with the model but wasn't overly impressed. That said, a big parameter set like this makes a nice test bed for things like pruning methods.

−4

bo_peng OP t1_jb9bdw3 wrote

Directly from the RWKV-LM GitHub:

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it combines the best of RNNs and transformers: great performance, fast inference, VRAM savings, fast training, "infinite" ctx_len, and free sentence embeddings.
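Roughly, the RNN-mode inference boils down to a per-channel recurrence like this (a toy version; the real code adds a "bonus" term for the current token plus numerical-stability tricks):

```python
import numpy as np

# Simplified per-channel WKV recurrence: a running exp-weighted
# numerator/denominator that decays by exp(-w) each step.
def wkv(k, v, w):
    num = den = 0.0
    out = []
    for k_t, v_t in zip(k, v):
        num = np.exp(-w) * num + np.exp(k_t) * v_t
        den = np.exp(-w) * den + np.exp(k_t)
        out.append(num / den)
    return out  # state is just (num, den): constant memory, any ctx_len

print(wkv(k=[0.1, 0.5, -0.2], v=[1.0, 2.0, 3.0], w=0.9))
```

That constant-size state is where the "infinite" ctx_len and VRAM savings come from, while training still parallelizes over the time dimension like a GPT.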

1

graphicteadatasci t1_jb9afw5 wrote

Really? Because copying all your data once is the same as running your dataset twice per epoch instead of once. That doesn't sound right. Unless your test data is drawn from the same dataset and the duplication happens before splitting, in which case you would certainly expect metric improvements. Or was this a case of duplicating rare text, in which case it's the opposite of having duplicate images in LAION?
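The leakage case is easy to demo (toy numbers):

```python
import random

# Duplicate *before* splitting and identical rows land in both train and
# test, so metrics improve for free.
random.seed(0)
data = list(range(1000))      # stand-ins for unique examples
duplicated = data + data      # copy the whole dataset once
random.shuffle(duplicated)
train, test = duplicated[:1600], duplicated[1600:]
overlap = len(set(train) & set(test))
print(f"{overlap} of {len(set(test))} unique test examples also appear in train")
```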

1

BogBodySalad t1_jb96vut wrote

I'm in the same boat as you (MLE, lots of work exp incl. in a US AI startup, just started my freelancing journey). Here are some more client acquisition/marketing ideas:

  • Have a popular open-source project (ML-related) on GitHub (takes time)
  • Content marketing: write articles/blog posts and promote them on LN
  • Give talks at conferences/meetups
  • Find established freelancers and ask to be a subcontractor
  • Chase prospects on LN (e.g. identify a niche like "YC startup founders", follow them, and engage in a conversation)

everyone: pm me if you want to connect on LN (or if you have a project for me 🤗)

1

etesian_dusk t1_jb94rak wrote

OK, that doesn't sound like much. I don't understand why I should abandon standard, verified tools for this.

On top of that, the whole "George Hotz Twitter internship" thing was just embarrassing. I trust him to jailbreak PlayStations, but that's the end of it.

7