UnusualClimberBear t1_jbngux4 wrote on March 10, 2023 at 8:23 AM

Reply to [D] Is it possible to train LLaMa? by New_Yak1645

Training from scratch required 2048 A100 for 21 days. And it seems only to be the final run.

I guess you can start to fine-tune it with much lower resources, 16 A100 seems reasonable as going lower will require quantization or partial loadings for the model.

WH7EVR t1_jbngk56 wrote on March 10, 2023 at 8:19 AM

Reply to [D] chatGPT and AI ethics by [deleted]

It took about 120 GPU-years (A100 80GB) to train LLaMA. If you want to train it from scratch, it'll cost you a ton of money and/or time. That said, you can fine-tune llama as-is. No real point is recreating it.

ch9ki7 t1_jbneot3 wrote on March 10, 2023 at 7:54 AM

Reply to [D] Is it possible to train LLaMa? by New_Yak1645

I would start searching on huggingface.co

CKtalon t1_jbnccl7 wrote on March 10, 2023 at 7:25 AM

Reply to [D] Is it possible to train LLaMa? by New_Yak1645

If you have a few thousand A100s, sure? The dataset is fairly easily obtainable.

The next difficulty is the technical knowhow to train such LLMs.

currentscurrents t1_jbnandw wrote on March 10, 2023 at 7:04 AM

Reply to comment by harharveryfunny in [D] Why are so many tokens needed to train large language models? by blacklemon67

I think this is the wrong way to think about what LLMs are doing. They aren't modeling the world; they're modeling human intelligence.

The point of generative AI is to model the function that created the data. For language, that's us. You need all these tokens and parameters because modeling how humans think is very hard.

As LLMs get bigger, they can model us more accurately, and that's where all these human-like emergent abilities come from. They build a world model because it's useful for predicting text written by humans who have a world model. Same thing for why they're good at RL and task decomposition, can convincingly fake emotions, and inherit our biases.

Dendriform1491 t1_jbn9r9j wrote on March 10, 2023 at 6:53 AM

Reply to [D] chatGPT and AI ethics by [deleted]

Define "friendly".

People are not friendly towards each other, and being friendly towards one person can result in being hostile against another, or even cross moral or legal boundaries. A person may use a LLM with hostile objectives in mind. Such as facilitating scams, academic cheating, impersonations, misinformation, harassment, etc.

ChatGPT is unethical, because it can always be tricked to do the wrong thing despite any instruction it is given to it.

WikiSummarizerBot t1_jbn6sxs wrote on March 10, 2023 at 6:18 AM

Reply to comment by czl in [D] chatGPT and AI ethics by [deleted]

Tit for tat

>Tit for tat is an English saying meaning "equivalent retaliation". It developed from "tip for tap", first recorded in 1558. It is also a highly effective strategy in game theory. An agent using this strategy will first cooperate, then subsequently replicate an opponent's previous action.

^([ )^(F.A.Q)^( | )^(Opt Out)^( | )^(Opt Out Of Subreddit)^( | )^(GitHub)^( ] Downvote to remove | v1.5)

czl t1_jbn6rys wrote on March 10, 2023 at 6:18 AM

Reply to comment by currentscurrents in [D] chatGPT and AI ethics by [deleted]

> What would a better ethics system even mean?

You ask a good question. Much like language fosters communication to my non expert eyes ethics is an ideology with a protocol for behavior the purpose of which is to foster “group cohesion” / cooperation / trust / lower social transaction costs / reduction of exploitation / …

A langauge is best when communication is best yet there are many languages possible and what is most important that your language matches the language of your group and that when langauge changes that the changes are gradual so that langauge continues to be useful. I belive similar principles apply to ethics for the purpose ethics service.

Thus a better ethical system will be one that serves its purpose better. Machines can help us discover improvements to ethics because using machines we can simulate payoffs for various behavior strategies and these simulations can teach us valuable lessons. For example the discovery of:

>> Tit-for-tat has been very successfully used as a strategy for the iterated prisoner's dilemma. The strategy was first introduced by Anatol Rapoport in Robert Axelrod's two tournaments,[2] held around 1980. Notably, it was (on both occasions) both the simplest strategy and the most successful in direct competition.

From https://en.wikipedia.org/wiki/Tit_for_tat

Moreover since machines enable all to study ethcial protocols all can see which strategies work and which do not work and what the consequences are so there is the rational convergence towards what works as tends to happen in science vs natural fragmentation and polarization as trends to happen with non-science based beliefs (and their ethical systems).

I expect experts of ethics to challenge this non expert view so please do not hold back your criticism — but speak as if to a dummy so keep the jargon back and your explanations simple. I am here to be educated. Thank you!

visarga t1_jbn5g3w wrote on March 10, 2023 at 6:03 AM

Reply to comment by harharveryfunny in [D] Why are so many tokens needed to train large language models? by blacklemon67

On the other hand LLM has broad knowledge about all topics, a true dilettante. We can't keep up on that level.

Origin_of_Mind t1_jbn2m6d wrote on March 10, 2023 at 5:32 AM

Reply to [D] Why are so many tokens needed to train large language models? by blacklemon67

If you look at the studies of how children acquire language, for example "First verbs" by Michael Tomasello, the gist is that children understand quite a bit in their daily routine and actively participate in it -- well before they begin to understand and produce language. The language acquisition in children occurs in an already very capable nervous system, which "gets" a lot of stuff going on around it. Language gets tied into all that.

Our artificial neural networks do not have anything comparable. So, to use extremely simple architectures, we have to constrain them with super-human amount of input, to allow them by simple statistics to converge on interesting machinery which also to some extent "gets" not just the surface of language but discovers some of the deeper connections. Multi-modal systems should be able to see even more of the relevant underlying structure of the world, getting one step closer to what humans do.

WikiSummarizerBot t1_jbn0th4 wrote on March 10, 2023 at 5:14 AM

Reply to comment by currentscurrents in [D] chatGPT and AI ethics by [deleted]

Is–ought problem

>The is–ought problem, as articulated by the Scottish philosopher and historian David Hume, arises when one makes claims about what ought to be that are based solely on statements about what is. Hume found that there seems to be a significant difference between descriptive or positive statements (about what is) and prescriptive or normative statements (about what ought to be), and that it is not obvious how one can coherently move from descriptive statements to prescriptive ones.

^([ )^(F.A.Q)^( | )^(Opt Out)^( | )^(Opt Out Of Subreddit)^( | )^(GitHub)^( ] Downvote to remove | v1.5)

currentscurrents t1_jbn0sbf wrote on March 10, 2023 at 5:13 AM

Reply to comment by czl in [D] chatGPT and AI ethics by [deleted]

What would a better ethics system even mean?

In order to say one ethics system is better than another, you would have to look at its impact on the world and decide whether the outcomes are good or bad. But "good and bad" are ethical concepts themselves, so you've just shifted the problem up to meta-ethics.

It's the is-ought problem. Intelligence is solidly on the side of "is" - it figures out how to solve problems to accomplish its goals. Ethics is about how you set those goals, and it's on the "ought" side of the fence.

currentscurrents t1_jbmzwxo wrote on March 10, 2023 at 5:05 AM

Reply to [D] chatGPT and AI ethics by [deleted]

There's two big problems:

Nobody has a solid handle on how to control the end-user's interaction with the LLM. RLHF seems brittle and hard to scale. Programmed-in rules are too small to contain a flexible thing like a neural network. Bing gives high-level rules in plain english and hopes the LLM will understand them, but it doesn't always prioritize them over user input.
Nobody agrees on what is ethical. For example, is it good to automate jobs? I think yes, but go out into any sub on the front page and you will find plenty of people who disagree with me.

#1 is probably solvable. In fact it's gonna have to be solved for LLMs to be useful; imagine if you called your bank and told the rep to pretend to be DAN.

I think #2 is intractable. People have already been arguing about ethics for millenia, and the existence of AI doesn't make it any easier.

czl t1_jbmys4r wrote on March 10, 2023 at 4:53 AM

Reply to [D] chatGPT and AI ethics by [deleted]

Ethics is not static. Human ethics vary culture to culture and evolve over time. If AI can help us develop better strategies for games why would AI not also help us develop better ethical (and legal) systems? And yes at some point our AI will lead. Machines already do most of our physical work why would we not use machines for mental work as much as we can as well?

monouns t1_jbmy24f wrote on March 10, 2023 at 4:46 AM

Reply to [D] chatGPT and AI ethics by [deleted]

But I'm not quite sure how GPT and PPO get trained from feedback relative to each other.

CaptainLocoMoco t1_jbmeg1l wrote on March 10, 2023 at 2:03 AM

Reply to [D] Why are so many tokens needed to train large language models? by blacklemon67

You shouldn't directly compare LLM training to human learning. LLM's are spawned with totally random weights, apart from the design choices of the architecture, the only learning signal they ever receive is from the training data. Human's are born with billions of years of information baked into them due to evolution. Comparing the two doesn't really make sense. I think this becomes way more obvious when you think about fine motor control instead of language modeling. I.e. a robot isn't going to learn how to walk as well as a human after the same amount of "training" time.

endless_sea_of_stars t1_jbmda5p wrote on March 10, 2023 at 1:53 AM

Reply to comment by bivouac0 in [D] Why are so many tokens needed to train large language models? by blacklemon67

> develop a method to separate knowledge retention and language pattern modeling. Think about learning the state capitals. A person quickly learns to say "the capital of X is Y" and then can substitute in different memorized facts. AI learns the facts and the sentence patterns all in the same manner.

This sounds like a problem Toolformer is supposed to address. Instead of learning all the state capitals learn to call. "The capital of Indiana is [QA(Indiana, capital)]."

Psychological-Ear896 t1_jbmbse3 wrote on March 10, 2023 at 1:42 AM

Reply to comment by z_fi in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places looking for freelance / contract work for ML? by doctorjuice

why a firm no to w2? what's the key difference other than 'who pays my medicare and withholding my tax"

farmingvillein t1_jblnh6d wrote on March 9, 2023 at 10:44 PM

Reply to comment by mckirkus in [D] Why are so many tokens needed to train large language models? by blacklemon67

But she still had feedback loops.

frequenttimetraveler t1_jbljl97 wrote on March 9, 2023 at 10:17 PM

Reply to [D] Why are so many tokens needed to train large language models? by blacklemon67

Have they tried to train the same model with half the tokens?

SpaceCockatoo t1_jblj2so wrote on March 9, 2023 at 10:13 PM

Reply to comment by ortegaalfredo in [R] Created a Discord server with LLaMA 13B by ortegaalfredo

4bit quant already out

Aran_Komatsuzaki t1_jbkyegs wrote on March 9, 2023 at 8:01 PM

Reply to comment by LetterRip in [D] Why isn't everyone using RWKV if it's so much better than transformers? by ThePerson654321

> Thanks for sharing your results. It is being tuned to longer context lengths, current is

I tried the one w/ context length = 4096 for RWKV :)

> Could you clarify - was one of those meant to be former and the other late

Sorry for the typo. The latter 'former' is meant to be the 'latter'.

mckirkus t1_jbkx54l wrote on March 9, 2023 at 7:54 PM

Reply to comment by harharveryfunny in [D] Why are so many tokens needed to train large language models? by blacklemon67

Hellen Keller is an interesting example of what we are capable of without visual or aural inputs.

farmingvillein t1_jbkx0co wrote on March 9, 2023 at 7:53 PM

Reply to comment by LetterRip in [D] Why isn't everyone using RWKV if it's so much better than transformers? by ThePerson654321

I don't understand the relevance here--tape-RNNs != RWKV, unless I misunderstand the RWKV architecture (certainly possible).

farmingvillein t1_jbkwkgl wrote on March 9, 2023 at 7:50 PM

Reply to comment by ThePerson654321 in [D] Why isn't everyone using RWKV if it's so much better than transformers? by ThePerson654321

> most extraordinary claim I got stuck up on was "infinite" ctx_len.

All RNNs have that capability, on paper. But the question is how well does the model actually remember and utilize things that happened a long time ago (things that happened beyond the the window that a transformer has, e.g.). In simpler RNN models, the answer is usually "not very".

Which doesn't mean that there can't be real upside here--just that it is not a clear slam-dunk, and that it has not been well-studied/ablated. And obviously there has been a lot of work in extending transformer windows, too.

Recent comments in /f/MachineLearning