Recent comments in /f/MachineLearning

Ok-Cartoonist8114 t1_j54l5yh wrote

Your pipeline is fine! Cherche is not fancy; it just lets you create hybrid pipelines that rely on both language models and lexical matching, which can help a lot. Also, Cherche is primarily designed for computing embeddings with Sentence Transformers, which have a better ratio of precision to number of parameters.
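
As a rough illustration of the hybrid idea, here is a minimal sketch that merges a lexical ranking with a semantic ranking using reciprocal rank fusion. This is a generic fusion technique swapped in for illustration, not Cherche's actual API; `lexical_rank`, `reciprocal_rank_fusion`, the toy documents, and the hard-coded `semantic` ranking (standing in for the output of an embedding retriever) are all assumptions.

```python
from collections import defaultdict


def lexical_rank(query, docs):
    """Toy lexical matcher: rank doc ids by number of shared query tokens."""
    q = set(query.lower().split())
    scores = {doc_id: len(q & set(text.lower().split())) for doc_id, text in docs.items()}
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    # Documents sharing no token with the query are not retrieved at all.
    return [doc_id for doc_id, score in ranked if score > 0]


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists: each list adds 1 / (k + rank) per document."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


docs = {
    "a": "paris is the capital of france",
    "b": "the eiffel tower is in paris",
    "c": "berlin is the capital of germany",
}

lexical = lexical_rank("capital of france", docs)
# Hypothetical ranking, standing in for what an embedding retriever might return.
semantic = ["b", "a", "c"]
hybrid = reciprocal_rank_fusion([lexical, semantic])
print(hybrid)
```

Documents that both retrievers agree on rise to the top, while a document found by only one retriever is still kept in the merged list.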

3

stardust-sandwich t1_j54em1w wrote

I want to pull data from an API (done) and use NLP to categorize that information. Then I want to push those results into a webpage or GUI tool that highlights the text and asks, "Is this correct?" That way I can use the GUI to "teach" the model how to classify text.

e.g.

Category 1 - words 1, words 2, words 3 and similar

Category 2 - words 4, words 5, words 6 and so on

Then it will go and try that, come back, and ask me to tune it again; rinse and repeat. Once the model is trained, I want to use it later in a different script: point it at a news article, for example, and have it pull out the data I need.

How can I achieve this, please? What are the best tools and services to get this done? Ideally open source, but I'm happy to use a commercial service if it's cheap, as this is just a personal project of mine.

Thanks in advance.

1

IntrepidTieKnot t1_j547lq2 wrote

I made a tool that chops documents into chunks, creates embeddings for the chunks via GPT-3, and stores the embeddings in a Redis database. When I make a query, I create an embedding for it and look up my stored embeddings via cosine similarity.
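
The chunk, embed, and lookup flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_embed` is a bag-of-words stand-in for a real embedding model such as GPT-3, the in-memory `store` list stands in for the Redis database, and all function names are hypothetical.

```python
import math
from collections import Counter


def chunk(text, size=6):
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def toy_embed(text):
    """Stand-in for a real embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, store, top_k=1):
    """Embed the query and return the top_k most similar stored chunks."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


document = ("Redis is an in-memory data store. "
            "Cosine similarity compares the angle between two vectors.")

# In-memory stand-in for the database: a list of (chunk text, embedding) pairs.
store = [(c, toy_embed(c)) for c in chunk(document)]
best = search("how does cosine similarity work", store)
print(best)
```

Swapping `toy_embed` for a dense embedding model changes the vector type but not the overall shape of the lookup.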

My question is: isn't that the same as what your tool does? In other words, what can you do with Cherche that I cannot do with the setup I described? Is it that I don't need GPT-3 for the same result? Or what is it?

2

Ok-Cartoonist8114 t1_j52mjrw wrote

Here is a great paper from IBM following the retriever-reader paradigm. Love those "light" models that can be specialized by switching the index.

IMO the loss of ChatGPT is still interesting for retriever-reader approaches to generate either human-like or structured answers from input documents.

Here is a tool I made to create retriever-reader pipelines in a minute: Cherche. I would also recommend Haystack on GitHub!

7

currentscurrents t1_j525hto wrote

Retrieval language models do have some downsides. Keeping a copy of the training data around is suboptimal for a couple reasons:

  • Training data is huge. Retro's retrieval database is 1.75 trillion tokens. This isn't a very efficient way of storing knowledge, since a lot of the text is irrelevant or redundant.

  • Training data is still a mix of knowledge and language. You haven't achieved separation of the two types of information, so it doesn't help you perform logic on ideas and concepts.

  • Most training data is copyrighted. It's currently legal to train a model on copyrighted data, but distributing a copy of the training data with the model puts you on much less firm ground.

Ideally I think you want to condense the knowledge from the training data down into a structured representation, perhaps a knowledge graph. Knowledge graphs are easy to perform logic on and can be human-editable. There's also already an entire sub-field studying them.
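
To make the point about logic and editability concrete, here is a minimal sketch of a knowledge graph stored as (subject, relation, object) triples, with a simple transitive query over it. The facts, the `located_in` relation name, and the helper function are all illustrative assumptions, not taken from any particular system.

```python
# A tiny knowledge graph as a set of (subject, relation, object) triples.
triples = {
    ("Eiffel Tower", "located_in", "Paris"),
    ("Paris", "located_in", "France"),
    ("France", "located_in", "Europe"),
}


def located_in(entity, facts):
    """Follow located_in edges transitively; return every containing place."""
    found = set()
    frontier = [entity]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in facts:
            if subj == current and rel == "located_in" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


places = located_in("Eiffel Tower", triples)
print(sorted(places))

# Human edit: adding one fact immediately changes what can be inferred.
edited = triples | {("Europe", "located_in", "Earth")}
places_after_edit = located_in("Eiffel Tower", edited)
print(sorted(places_after_edit))
```

The inference here is a plain graph traversal, and a single added triple updates the answer without retraining anything.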

19

EmmyNoetherRing t1_j5253a8 wrote

>Softmax activation function

Ok, got it. Huh (on reviewing Wikipedia). So, to rephrase the quoted paragraph: they find that the divergence between the training and testing distributions (between the compressed versions of the training and testing data sets, in my analogy) starts decreasing smoothly as the scale of the model increases, long before the actual final task performance locks into place.

Hm. That says something more about task complexity, I think, rather than imagination (maybe in some computability sense, a fundamental task complexity that we don't have well defined for those types of tasks yet?). But I'm still with you on imagination being a factor, and of course the paper and the blog post both leave the cliff problem unsolved. Possibly there's a definition of imagination such that we can say degree X of it is needed to successfully complete those tasks.

1

emreddit0r t1_j523nlc wrote

One thing I find glossed over/lacking in the diffusion model materials is the contribution of the UNet.

Coming from someone who is just trying to catch up on what's going on, the UNet seems to play a huge role (if I understand right, this is where the convolutional neural networks are discovering 2D features).

Relatively speaking, CNNs are kind of old news... but they're a big deal. Unless I have something wrong? Do you know where I can learn more about how the UNet aspect works in depth?

1