Recent comments in /f/MachineLearning
gtancev t1_j4rrpm6 wrote
Double descent may also be of interest.
MrSpotgold OP t1_j4rqx4q wrote
Reply to comment by BrotherAmazing in [D] Can ChatGPT flag it's own writings? by MrSpotgold
This software is a nightmare for anyone in the teaching business (whether secondary school or higher education) where assessment is based on essays. I'm not kidding: a nightmare. We are going to bring up kids who will not be able to write a comprehensive text simply because we lack the means to check that they wrote it themselves, and therefore we must abandon the assessment method altogether. It's that bad.
GPT-5entient t1_j4rop7m wrote
Reply to comment by --algo in [D] Microsoft ChatGPT investment isn't about Bing but about Cortana by fintechSGNYC
Nope. CoPilot is Codex and ChatGPT is Da Vinci.
[deleted] t1_j4rjw0s wrote
Reply to comment by __lawless in [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
[deleted]
mrconter1 OP t1_j4rintd wrote
Reply to comment by navillusr in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
Yeah, you're right. My approach seems to be a bit more general and should be less work.
bo_peng OP t1_j4rht4i wrote
Reply to comment by currentscurrents in [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
RWKV is an RNN that also works as a linear transformer (or we may say it's a linear transformer that also works as an RNN). So it has both a parallel and a serial mode, and you get the best of both worlds (fast and saves VRAM).
Almost all such "linear transformers" are bad at language modeling, but RWKV is the exception. The basic idea is a bit similar to https://arxiv.org/abs/2105.14103. Then I added lots of new ideas :)
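To illustrate what "both parallel & serial mode" can mean, here is a toy sketch. It is not RWKV's actual formula: it assumes a simplified, scalar, attention-free weighted average with an exponential time decay `w`, and the names `parallel_mode` and `serial_mode` are made up for this example. The point is only that the same outputs can be computed either directly from the whole sequence or step by step with a small recurrent state.

```python
import math

def parallel_mode(k, v, w):
    """Compute each output directly from the full sequence.
    Every position t is independent, so this form can be parallelized over t."""
    out = []
    for t in range(len(k)):
        # Weight of past token i at time t: exp(k_i) decayed by exp(-(t - i) * w).
        weights = [math.exp(-(t - i) * w + k[i]) for i in range(t + 1)]
        num = sum(wt * v[i] for i, wt in enumerate(weights))
        out.append(num / sum(weights))
    return out

def serial_mode(k, v, w):
    """Compute the same outputs as an RNN, carrying only two state values."""
    num, den = 0.0, 0.0
    decay = math.exp(-w)
    out = []
    for k_t, v_t in zip(k, v):
        # Decay the old state, then add the current token's contribution.
        num = decay * num + math.exp(k_t) * v_t
        den = decay * den + math.exp(k_t)
        out.append(num / den)
    return out

# Hypothetical toy inputs (scalars per timestep), chosen only for illustration.
k = [0.1, -0.3, 0.7, 0.2]
v = [1.0, 2.0, -1.0, 0.5]
w = 0.5

p = parallel_mode(k, v, w)
s = serial_mode(k, v, w)
print(all(abs(a - b) < 1e-9 for a, b in zip(p, s)))
```

The serial form needs constant memory per step, which is roughly where the VRAM saving at inference time would come from under these assumptions.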
navillusr t1_j4rhitt wrote
Reply to comment by mrconter1 in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
The distinctions you’re drawing, pixels vs selenium output and browser vs OS, are far less significant than the complexity of the tasks (step-by-step vs entire processes). What they’ve achieved is strictly harder for humans than what you are testing. We can argue whether perception or planning is harder for current technology (computer vision is far more developed than AI planning right now), but I think you need to reconsider the formulation of your tasks. It seems like they are designed to be easy enough for modern methods to solve.
On another note, most interesting tasks can’t be completed with just an x,y mouse location output. Why did you decide to restrict the benchmark to such a limited set of tasks?
navillusr t1_j4rexbm wrote
Reply to comment by mrconter1 in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
This is wrong, WoB/MiniWoB++ has a 160 x 210 px observation. Also, some OSes (Chrome OS) are almost entirely web based, so this distinction is minimal.
mrconter1 OP t1_j4rdm59 wrote
Reply to comment by navillusr in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
Adept AI is restricted to the web and also does not use raw pixels as input...
mrconter1 OP t1_j4rdf3t wrote
Reply to comment by navillusr in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
MiniWoB++ is restricted to website-related things, not the OS. Also, it does not take raw pixels as input.
currentscurrents t1_j4rcc3e wrote
Reply to [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
Interesting! I haven't heard of RWKV before.
Getting rid of attention seems like a good way to increase training speed (since training all those attention heads at once is slow), but how can it work so well without attention?
Also aren't RNNs usually slower than transformers because they can't be parallelized?
__lawless t1_j4r9ebs wrote
Reply to comment by monkeysingmonkeynew in [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
Ok let me elaborate a bit. Imagine the old model is called m_0. Your newly obtained training data is X, y, features and labels, respectively. Now calculate the residual error which is the difference between y and prediction of m_0: dy = y - m_0(X).
Now train a new model m_1. Its features and labels are X and dy, respectively. Finally, at inference time the prediction is the sum of the two models: y_pred = m_0(X_new) + m_1(X_new).
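The residual-fitting steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a definitive recipe: `m_0` is a hypothetical stand-in for the already-trained model, `fit_stump` is a made-up one-split regressor standing in for whatever learner you would actually use (for example a random forest regressor), and the data is invented. It also assumes a regression-style target, so that y - m_0(X) is meaningful.

```python
def fit_stump(X, y):
    """Fit a one-split regression stump on 1-D features.
    Hypothetical stand-in for any regressor trained on (X, y)."""
    best = None
    for threshold in sorted(set(X)):
        left = [yi for xi, yi in zip(X, y) if xi <= threshold]
        right = [yi for xi, yi in zip(X, y) if xi > threshold]
        if not right:
            continue  # a split must leave points on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((yi - left_mean) ** 2 for yi in left)
               + sum((yi - right_mean) ** 2 for yi in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all feature values identical: predict the mean
        mean = sum(y) / len(y)
        return lambda x: mean
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

# m_0: the old model, trained earlier (here just a fixed function, an assumption).
def m_0(x):
    return 2.0 * x

# Newly obtained data (invented): targets drift upward for x >= 2.
X = [0.0, 1.0, 2.0, 3.0]
y = [0.0, 2.0, 5.0, 7.0]

# Residual error of the old model on the new data: dy = y - m_0(X).
dy = [yi - m_0(xi) for xi, yi in zip(X, y)]

# m_1 is trained with features X and labels dy.
m_1 = fit_stump(X, dy)

def predict(x_new):
    """Inference: sum of the old model and the residual model."""
    return m_0(x_new) + m_1(x_new)

print(predict(2.5))
```

Only m_1 is trained on the new data, so m_0 never has to be refit on the full history.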
lostmsu t1_j4r942j wrote
Reply to comment by timdettmers in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27
Do you mind if I use your data to make a webpage similar to https://diskprices.com/ ?
mrconter1 OP t1_j4r8o27 wrote
Reply to comment by Dendriform1491 in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
No it's not.
mrconter1 OP t1_j4r8miw wrote
Reply to comment by navillusr in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
- An LLM test does not require reasoning because it generates one word at a time?
- It can't.
- This might be interesting though.
monkeysingmonkeynew OP t1_j4r6lwj wrote
Reply to comment by __lawless in [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
This sounds pretty cool, but I don't follow every step. By "calculate the errors" do you mean, for example, subtract the predicted probabilities from the actual outcome?
Also, I didn't get your last part about inference, what exactly are you referring to there?
trnka t1_j4r661s wrote
Reply to comment by all_is_love6667 in [D] Simple Questions Thread by AutoModerator
Think about it more like autocomplete. It's able to complete thoughts coherently enough to fool some people, when provided enough input to complete from. It's often incorrect with very technical facts though.
It's really about how you make use of it. In scientific work, you could present your idea and ask for pros and cons of the idea, or to write a story about how the idea might fail horribly. That can be useful at times. Or to explain basic ideas from other fields.
It's kinda like posing a question to Reddit except that ChatGPT generally isn't mean.
There are other approaches like Elicit or Consensus that use LLMs more for literature review which is probably more helpful.
niclas_wue OP t1_j4r5wb1 wrote
Reply to comment by Yidam in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
Yes, it can be applied to any document. A book would be more expensive because it has more text and thus more input tokens. The PDF needs to be converted to text because the API only accepts text. Equations that can be written in Unicode are passed directly to the network, which can understand them; other equations are currently skipped. So far I have spent almost $100 on tokens to summarize the papers, so there will need to be some paid features in the near future, or a reduction in the number of papers.
monkeysingmonkeynew OP t1_j4r539v wrote
Reply to comment by BenoitParis in [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
Yes, it's OK if I run it once a day, but I need to backtest two years of data, so it's not feasible on a laptop or affordable on a GPU.
TrueBirch t1_j4r4u2n wrote
Reply to comment by Haunting-Ad-5191 in [D] Unlocking the Potential of ChatGPT: A Community Discussion by North-Ad6756
>combining these technologies is a no brainer
Agreed. I look at the GPT family of models as infrastructure. The real potential comes from layering specific applications on top of it. Imagine every random high school baseball game got a writeup on the local news website. You'd need to ingest sports data and do other pipeline work, but the result could be profitable.
serge_mamian t1_j4r4soo wrote
Reply to comment by Epitaque in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27
In the US, I check Newegg regularly and it's always out of stock. How do you catch it?
lorenzo1384 t1_j4r47aw wrote
Reply to [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
Can I try this on Colab?
ChangingHats t1_j4r2hxx wrote
Reply to [D] Simple Questions Thread by AutoModerator
I am trying to utilize tensorflow's MultiHeadAttention to do regression on time series data for forecasting of a `(batch, horizon, features)` tensor.
During training, I have `inputs ~> (1, 10, 1)` and `targets ~> (1, 10, 1)`. `targets` is a horizon-shifted output of `inputs`.
During inference, `targets` is just a zeros tensor of the same shape.
What's the best way to run attention such that the output utilizes all timesteps in `inputs` as well as each subsequent timestep of the resulting attention output, instead of ONLY the timesteps of the inputs?
Another problem I see is that attention is run between Q and K, and during inference, Q = K, so that will affect the output differently, no?
iyeva t1_j4qx9aa wrote
Reply to comment by cblume in [P] featureimpact: A Python package for estimating the impact of features on ML models by cblume
Oh, okay, that's interesting! Thanks for sharing!
GPT-5entient t1_j4rrs42 wrote
Reply to comment by barrelroll126 in [D] Is MusicGPT a viable possibility? by markhachman
RIAA vs Big Tech. Both are business interests...