Recent comments in /f/MachineLearning
thatphotoguy89 t1_j64q5hm wrote
Reply to comment by TankAttack in [D] Best large language model for Named Entity Extraction? by TankAttack
You can try extractive QA if you don't want to fine-tune it. Basically, create a QA pipeline and ask the same questions over different texts.
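Rough sketch of that pattern (the `toy_qa` stand-in and the questions are made up for illustration; in practice you'd swap in something like `transformers.pipeline("question-answering")`, which takes the same `question=`/`context=` keywords):

```python
def extract_entities(qa, text, questions):
    """Run the same extractive-QA questions over one text."""
    return {q: qa(question=q, context=text) for q in questions}

# Trivial stand-in for a real QA model, just so this runs standalone:
# it "answers" with the last word of the context.
def toy_qa(question, context):
    return context.split()[-1]

questions = [
    "What company does the person work for?",
    "What is the person's job title?",
]
texts = [
    "Jane is a data scientist at Acme",
    "Bob works as an engineer at Initech",
]

for t in texts:
    print(extract_entities(toy_qa, t, questions))
```

Same loop, same questions, different texts; only the QA callable changes when you move to a real model.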
TankAttack OP t1_j64p2oj wrote
Reply to comment by thatphotoguy89 in [D] Best large language model for Named Entity Extraction? by TankAttack
At this point I would like to imitate the example with position and company. It was taken from GPT-J, btw. NeoX is 3 times bigger, so I tried that first. I'll run GPT-J now and compare the results.
Thank you
mil24havoc t1_j64ogl0 wrote
Reply to comment by romantimm25 in [P] Using algorithms or models from papers for commercial use by romantimm25
It basically means you read the paper and write the code to do what the paper describes yourself.
If you start with their code base, then your work is derivative of that copyrighted work and the question becomes a bit more complicated.
Yes, the line is fuzzy. However, it's typically very easy to stay on the "not copyright or license infringing" side of the line if you make an honest effort to rewrite the code from scratch and simply use their code base to check your understanding of the algorithm.
Again, IANAL but changing a for loop to a while loop is probably not sufficient to distinguish between their work and yours. Rewriting the code in another language may be. Rewriting it in the same language but making substantial changes to (for example) user interface, data preprocessing, training data, hyperparameters, etc... may be.
Edit: courts and lawyers usually aren't too concerned with technical details. Think of it like a book. The same story gets told over and over again by different authors who use different words to tell it. Your implementation needs to tell the same story but in different words, basically.
lookatmetype t1_j64nstm wrote
Reply to comment by CKtalon in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
To be fair, most of the weights in every "Foundation" model are useless.
romantimm25 OP t1_j64mzgp wrote
Reply to comment by mil24havoc in [P] Using algorithms or models from papers for commercial use by romantimm25
What I never quite understand is what it means to "reimplement" the algorithm.
I mean, where does the line lie between being too similar to the original and being completely different?
Of course there are obvious cases, like changing a "for loop" to a "while loop". But does swapping out a library that the paper's code depends on make the implementation different enough?
mil24havoc t1_j64lhxk wrote
IANAL but the copyright protects the paper's text, data, and the code. Algorithms themselves can't be copyrighted. If you reimplement the algorithm, you can do whatever you want with it.
Edit to add: licenses on (trained) models haven't been tested in court as far as I'm aware. I can imagine this being very complicated. Can you copyright and license a linear regression fit to simple economic data? For example: log(gdp) = alpha + beta×population? That seems silly. So why would a Transformer (e.g.) be any different? If you add Gaussian noise to every weight in a Transformer, is the license still valid?
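To make the silliness concrete: for that regression, the entire "trained model" is two floats. A sketch with synthetic data (numpy assumed, numbers made up):

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.uniform(1, 100, size=200)                  # synthetic, in millions
log_gdp = 2.0 + 0.03 * population + rng.normal(0, 0.1, 200)  # true alpha=2.0, beta=0.03

# Fit log(gdp) = alpha + beta * population by ordinary least squares.
X = np.column_stack([np.ones_like(population), population])
alpha, beta = np.linalg.lstsq(X, log_gdp, rcond=None)[0]

print(alpha, beta)  # the whole "model" you'd supposedly license: two numbers
```

Hard to see what a license on `(alpha, beta)` would even protect, which is the point.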
Late-Associate8835 t1_j64l7pd wrote
Reply to comment by Icries4frenchfries in Apple AI Residency 2023 [R] by Extension-Reward5756
Same here, I got the interview email just now.
TurnipAppropriate360 t1_j64kfeb wrote
Reply to [D] Special-purpose "neuromorphic" chips for AI - current state of the art? by currentscurrents
Go straight to BrainChip's website and look at their AKIDA NSoC and IP - the tech is there and they're already beginning to commercialise.
AI will be as big for investors in the next 2-5 years as the internet was in the '90s.
EarthquakeBass t1_j64jhk3 wrote
Reply to comment by currentscurrents in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
https://en.m.wikipedia.org/wiki/Huang%27s_law
A bit of marketing flair for sure, but I think at the crossroads of hardware improvements, ensembling, clever optimizations, etc., we will keep improving models at a pretty darn fast pace. GPT-3 alone has dramatically improved the productivity of engineers, I'm sure of it.
vwings t1_j64itph wrote
Reply to comment by Thanos_nap in [P] Building a LSTM based model for binary classification by Thanos_nap
The batch dimension is the different customers. You have N customers, across T weeks and K possible actions. This should give you a sparse tensor of dimensions [N, T, K] that you can easily plug into any LSTM....
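Something like this, if you're on PyTorch (sizes made up; note `nn.LSTM` wants a dense tensor, so "sparse" here just means mostly zeros):

```python
import torch
import torch.nn as nn

N, T, K = 8, 52, 10  # customers, weeks, action types (illustrative sizes)

# Action indicators per customer per week; in practice you'd fill this
# from your event log, and it will be mostly zeros.
x = torch.zeros(N, T, K)
x[0, 3, 2] = 1.0  # e.g. customer 0 performed action 2 in week 3

lstm = nn.LSTM(input_size=K, hidden_size=32, batch_first=True)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([8, 52, 32]) -- one hidden state per week
```

For classification you'd typically put a linear head on `output[:, -1, :]` (the last week's hidden state).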
Acceptable-Cress-374 t1_j64f9mi wrote
Reply to comment by bhendel in [D] MusicLM: Generating Music From Text by carlthome
It's GPTs all the way down...
WarProfessional3278 t1_j649od6 wrote
Reply to [D] Simple Questions Thread by AutoModerator
Does anyone know of any good AI-generated text detectors? I know there's GPTZero but it's not very good in my experience.
My research led me to Hive AI, but I'm sure there are better alternatives out there: Hive claims very good results (99.9% accuracy) yet still produced a lot of false positives in my tests.
SimonJDPrince OP t1_j648umm wrote
Reply to comment by NeoKov in [P] New textbook: Understanding Deep Learning by SimonJDPrince
Thanks! Definitely a mistake. If you send your real name to the e-mail address on the website, I'll add you to the acknowledgements in the book.
Let me know if you find any more.
SimonJDPrince OP t1_j648ce9 wrote
Reply to comment by NeoKov in [P] New textbook: Understanding Deep Learning by SimonJDPrince
GitHub or e-mail are better. Only occasionally on Reddit.
HateRedditCantQuitit t1_j647xm6 wrote
Reply to comment by madmax_br5 in [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
I'm not sure how long you've been around, but before BPE came along, large vocabularies were actually quite a pain in the ass. You can find lots of literature around it before maybe 2016 (can't remember exact dates to look and I'm feeling lazy).
IIRC, a big issue was the final prediction layer. Say you're predicting a sequence 4k tokens long. Then you have 4k times vocab-size predictions. With a 50k token vocab, that's 200M predictions in memory (roughly 1 gig with floats). Let's say we want to equally compress 20x more languages, so we get 1M tokens (speaking super duper roughly), which means nearly 20GB just to represent the logits. If we wanted to handle a 40k long sequence, it's the difference between 20GB and 200GB of logits.
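Back-of-the-envelope check of those numbers (float32 assumed):

```python
seq_len = 4_000
bytes_per_float = 4  # float32

for vocab in (50_000, 1_000_000):
    logits = seq_len * vocab
    gb = logits * bytes_per_float / 1e9
    print(f"vocab={vocab:>9,}: {logits:,} logits ~= {gb:.1f} GB")
# vocab=   50,000: 200,000,000 logits ~= 0.8 GB
# vocab=1,000,000: 4,000,000,000 logits ~= 16.0 GB
```

So "roughly 1 gig" and "nearly 20GB" above are the right order of magnitude, and a 10x longer sequence scales both linearly.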
That said, BPE just takes in sequences of simpler tokens. If you want to feed it Unicode, go ahead. If you want to feed it something else, that will work too. It seems like you're mostly frustrated that LLM investment is focused on English right now, which is valid. Tech investment in general has a strong Silicon Valley bias, and a zillion people want to recreate that elsewhere. But that's a very hard economic question.
WikiSummarizerBot t1_j646sbr wrote
Reply to comment by john_the_jedi in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
>The Ship of Theseus is a thought experiment about whether an object that has had all of its original components replaced remains the same object. According to legend, Theseus, the mythical Greek founder-king of Athens, had rescued the children of Athens from King Minos after slaying the minotaur and then escaped on a ship to Delos. Every year, the Athenians commemorated this legend by taking the ship on a pilgrimage to Delos to honor Apollo. The question was raised by ancient philosophers: After several centuries of maintenance, if every part of the Ship of Theseus had been replaced, one at a time, was it still the same ship?
john_the_jedi t1_j646qh1 wrote
Reply to [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
Hey everyone, I'm the first author of this preprint paper
"A Watermark For Large Language Models": https://arxiv.org/abs/2301.10226
I thought I'd jump in with a few relevant comments about some questions in this thread, especially relating to our approach.
- Our watermark is mathematically constructed to minimize false positives (accusing human text of being machine generated), even if it costs us a few detections of actual machine-generated text. At any sufficient length of text, say 100-200 words, there is a near-zero chance of a false positive. This is obviously the type of error we'd all like to avoid as much as possible.
- We are not anti-LLMs in any general way, these are amazing tools for everyone to use! Rather, we think that it's much better to have a new tool, watermarks, embedded in these models sooner rather than later. A world in which we have limited (currently zero really) ways of distinguishing AI and human generated content is likely to have some difficult to wrestle with consequences. We're concerned with bot farms and accidentally retraining "GPT-10" on tons of old GPT-3 outputs by accident.
- On removing the watermark, we don't claim it is not removable, we just have constructed the watermark procedure so that it is difficult, and comes with a cost to the quality of the output. The fact that many people suggest that they'll just use another LM to paraphrase the output, or that they'll just paraphrase it themselves, gets at a philosophical point we couldn't spend too much time talking about in the paper (though we run some attack experiments trying to remove the watermark). À la the Ship of Theseus, if you sufficiently re-write the watermark out of the text, well, it's no longer the original text anyway, even though it feels conceptually similar. Rewriting and rephrasing a paragraph from a textbook, but in your own words, and then putting it in a term paper, has always been a way to try and pass off the thoughts and ideas of others as your own. This fact of the world is unchanged.
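A toy sketch of the detection idea, for intuition: pseudo-randomly split the vocabulary into a "green list" seeded by the previous token, count how many tokens land on it, and compare to chance. The hash and parameters here are simplified placeholders, not the exact construction from the paper:

```python
import hashlib
import math

GAMMA = 0.5  # fraction of the vocab on the "green list" (illustrative)
VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-random green-list membership, seeded by the previous token
    (a simplified stand-in for the paper's hashing scheme)."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return h[0] < 256 * GAMMA

def z_score(tokens):
    """How far the green-token count deviates from what unwatermarked
    text would produce by chance."""
    T = len(tokens) - 1
    greens = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return (greens - GAMMA * T) / math.sqrt(T * GAMMA * (1 - GAMMA))

# "Watermarked" text: a toy generator that only ever emits green tokens.
wm = ["tok0"]
for _ in range(200):
    wm.append(next(t for t in VOCAB if is_green(wm[-1], t)))

print(z_score(wm))  # large positive z -> flagged as watermarked
```

Unwatermarked text picks green tokens only about GAMMA of the time, so its z hovers near 0; the large-z threshold is what drives the false-positive rate toward zero as the text gets longer.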
mycall t1_j643o1d wrote
Reply to [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
It's unknown whether this affects emergent abilities as the model scales up. Correct?
MadScientist-1214 t1_j6433qc wrote
Reply to [D] ImageNet2012 Advice by MyActualUserName99
At my institute, nobody trained on ImageNet, so I had to figure it out myself too. If you train architectures like VGG, it does not take long: less than 2 days on a single A100, at most 5 days on a worse GPU. The most important thing is to use an SSD; this saves around 2 days. A good learning-rate scheduler is really important. Most researchers ignore the test set and use only the validation set. Also important: use mixed precision. You should really tune the training speed if you need to do a lot of experiments.
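For mixed precision, the usual PyTorch recipe looks roughly like this (toy model and shapes; falls back to bfloat16 autocast when there's no GPU):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Loss scaling only matters for fp16 on GPU; disabled elsewhere it's a no-op.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 3, 32, 32, device=device)      # dummy batch
y = torch.randint(0, 10, (8,), device=device)

with torch.autocast(device_type=device, dtype=amp_dtype):
    loss = nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
print(loss.item())
```

On an A100 the forward/backward matmuls run in half precision, which is where most of the speedup comes from.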
DirectionAggressive1 t1_j642vj7 wrote
Reply to comment by Icries4frenchfries in Apple AI Residency 2023 [R] by Extension-Reward5756
Hi! Do you know the difference between the Apple AI/ML rotation program and the Apple AI/ML residency program? They seem to be two different programs.
weightedflowtime t1_j642g8x wrote
Reply to [D] Meta AI Residency 2023 by BeautyInUgly
You tried. That's what counts!:)
Quaxi_ t1_j6421fo wrote
Reply to comment by HateRedditCantQuitit in [D] Why are GANs worse than (Latent) Diffusion Models for text2img generation? by TheCockatoo
And while being easier to train, they give better results.
Diffusion models are also so much more versatile in their application because of their iterative process.
You can do inpainting or img-to-img for example by just conditioning the noise in different ways. You would have to retrain the whole GAN to achieve that.
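A toy sketch of that inpainting-by-conditioning trick (numpy, with a dummy denoiser standing in for the trained model; the noise schedule and blending are heavily simplified):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 8))                # "known" image (toy 8x8)
mask = np.zeros((8, 8))
mask[:, :4] = 1.0                           # 1 = keep these known pixels

steps = 50
alpha_bar = np.linspace(0.01, 1.0, steps)   # toy schedule: noisy -> clean

def denoise_step(x, t):
    """Dummy denoiser standing in for the trained model."""
    return 0.9 * x

x = rng.normal(size=(8, 8))                 # start from pure noise
for t in range(steps):
    x = denoise_step(x, t)
    # Re-inject the known region at the matching noise level, so the
    # generated half stays consistent with the known half.
    noised_known = (np.sqrt(alpha_bar[t]) * x0
                    + np.sqrt(1 - alpha_bar[t]) * rng.normal(size=(8, 8)))
    x = mask * noised_known + (1 - mask) * x

# At the last step alpha_bar = 1, so the masked region is exactly x0.
```

The denoiser itself never changes; only the per-step masking does, which is why the same diffusion model handles inpainting, img-to-img, etc. without retraining.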
thatphotoguy89 t1_j63zz5q wrote
GPT-J is supposed to be quite good. Do you have a list of the types of entities you'd like to detect?
cthorrez t1_j63uc5a wrote
Reply to [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
I have an issue with the experiments.
> For ICL, we fix the number of demonstration examples to 32 and tune the random seed for each task to find a set of demonstration examples that achieves the best validation performance. For finetuning, we use the same demonstration examples for ICL as the training examples and use SGD as the optimizer
They go through a set of random seeds to pick the "best" possible samples for in context learning, and then use the same set of examples for fine tuning. I think this biases the results in favor of in context learning.
A fairer way to do this would be to use a truly random set of examples, or to use the same approach and tune the seed to find the "best" set of examples for finetuning as well.
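A quick toy simulation shows how much best-of-k seed selection can inflate a score (numbers made up, numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_seeds = 1000, 20

# Toy model: each seed's validation accuracy is noise around a true skill of 0.6.
scores = rng.normal(loc=0.6, scale=0.05, size=(n_tasks, n_seeds))

honest = scores[:, 0].mean()        # one random seed per task
tuned = scores.max(axis=1).mean()   # best-of-20 seeds per task

print(f"single seed: {honest:.3f}, best-of-{n_seeds}: {tuned:.3f}")
```

Even with zero real difference between seeds, taking the max over 20 of them adds several points of apparent accuracy, which is exactly the bias in question.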
starfries t1_j64qhqa wrote
Reply to comment by anony_sci_guy in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Can you elaborate on this? I'm trying something similar, so I'm curious what your results were and if you ran across any literature about this idea.