Recent comments in /f/MachineLearning

marr75 t1_j7ksi6o wrote

They should be. I think LLMs will totally upset how content is indexed and accessed. It's one of the easiest and lowest stakes use cases for them, really.

Unfortunately, Google has such a huge incumbent advantage that they could produce the 5th or 6th best search specialized LLM and still be the #1 search provider.

1

dreternal OP t1_j7krh8s wrote

No. Just a composer with a very large catalog, 85-90% of which I haven't had time to properly tag and describe for sales (using just song names or file names isn't enough; you need very, very specific tags covering the mood, tempo, audience, beats per minute, instrument list, and on and on for every song in order to have proper exposure in the various online music libraries). I've been too busy over the last 30 years writing stuff to bother with adding all this data. So with the advent of these machine learning tools, I'm hoping they can help.
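The tags described above could be captured in a simple per-song schema. This is only an illustrative sketch; every field name here is an assumption, not any real music-library format:

```python
from dataclasses import dataclass, field

# Illustrative per-song metadata schema; all field names are
# hypothetical, chosen to match the tags the comment lists.
@dataclass
class SongTags:
    title: str
    mood: str
    tempo: str
    audience: str
    bpm: int
    instruments: list = field(default_factory=list)

track = SongTags(title="Night Drive", mood="brooding", tempo="mid",
                 audience="film/TV", bpm=96, instruments=["synth", "drums"])
```

An ML tagger would then only need to fill in these fields from the audio, rather than a human doing it for thousands of tracks.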

1

aicharades OP t1_j7kpzzn wrote

Of course! Step 1 breaks up your document and runs the prompt on each section. Try it with the Map section vs. Map Reduce (the main page).

Here's an example flow for Map:

  1. Input a Book PDF
  2. Convert the PDF to Text
  3. Split the Book into Chunks: Book[pg1, pg2, pg3] -> pg1, pg2, pg3
  4. Run the Prompt on Each Chunk: pg1, pg2, pg3 -> prompt(pg1), prompt(pg2), etc.
  5. Output the Summarized Chunks
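Steps 3-5 can be sketched in a few lines. `run_prompt` here is a placeholder stand-in for the real LLM call, not an actual API:

```python
# Sketch of the Map flow above; `run_prompt` is a hypothetical
# placeholder for the real LLM call.
def run_prompt(chunk):
    # A real implementation would send `chunk` to a model here.
    return f"summary of: {chunk[:20]}"

def split_into_chunks(pages, chunk_size=1):
    # Step 3: Book[pg1, pg2, pg3] -> [pg1], [pg2], [pg3]
    return [pages[i:i + chunk_size] for i in range(0, len(pages), chunk_size)]

def map_summarize(pages):
    # Step 4: run the prompt on each chunk independently.
    chunks = split_into_chunks(pages)
    return [run_prompt(" ".join(chunk)) for chunk in chunks]

# Step 5: one summary per chunk.
summaries = map_summarize(["pg1 text", "pg2 text", "pg3 text"])
```

The key property of Map is that each chunk is processed independently, so the document can be arbitrarily long without exceeding the model's context window.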

Here's a prompt you could use (lots of room for improvement!):

The words in <<*>> are comments; please remove them from the final prompt.

Goal: I'm trying to perform a content analysis of a document with 7 chapters and identify 10-15 core themes in each chapter.

Sample Map Prompt:

'INSTRUCTIONS': You are a writer <<BEST ROLE FIT??>> performing a content analysis of a document <<DOCUMENT TYPE??>>. You have been given a section of a larger document. You will identify up to 10-15 core themes in each chapter and output the themes.

'INPUT': {text}

'OUTPUT':

Sample Reduce Prompt:
'INSTRUCTIONS': You are a copyeditor. You will merge a list of summaries into one. Please combine the input, merge any duplicate core themes, and maintain the context of the document.
'INPUT': {text}
'OUTPUT':

Sample Input: document
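If the tool does simple template substitution (an assumption; the exact mechanism isn't stated here), the `{text}` slot in the prompts above can be filled per chunk like so:

```python
# Sketch of filling the {text} placeholder in the Map prompt;
# the chapter text is made-up sample input.
map_prompt = (
    "'INSTRUCTIONS': You are a writer performing a content analysis of a "
    "document. You have been given a section of a larger document. "
    "Identify up to 10-15 core themes and output the themes.\n"
    "'INPUT': {text}\n"
    "'OUTPUT':"
)

# One filled prompt per chunk, ready to send to the model.
filled = map_prompt.format(text="Chapter 1 discusses supply chains...")
```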

2

emerging-tech-reader t1_j7kptn9 wrote

I got a demo of some of the stuff happening.

The most impressive one is that they have GPT watching a meeting, taking minutes, and even crafting action items, emails, etc., all ready for you when you leave the meeting.

It will also offer suggestions to follow up on while the meetings are ongoing.

Google has become the AltaVista.

2

Acceptable-Fudge-816 t1_j7kpbkm wrote

The real world can also have thousands of dimensions: time, color, hatred, tension in the room, air currents, and anything else you can possibly attribute to a position/thing.

At the end of the day they're just words, and their meaning depends on agreements. When we speak of the 3 dimensions, we mean the 3 dimensions of the physical world that we decided to define with 3 coordinates that help us know the position of something. We might as well have used complex numbers and kept it to 2 coordinates, or decided time should be included as part of the concept of position. So when you talk about "dimensions" in general, it may as well mean anything.
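As a toy illustration of the complex-number point: a 2D position can be packed into a single complex coordinate instead of an (x, y) pair, and the usual geometry still works:

```python
# A 2D position as one complex number rather than two real coordinates.
pos = complex(3, 4)                 # x=3, y=4 packed into a single value
translated = pos + complex(1, -2)   # move by (+1, -2)
distance_from_origin = abs(pos)     # Euclidean norm: sqrt(3^2 + 4^2)
```

Same information, different bookkeeping, which is the comment's point: the "number of dimensions" is partly a choice of representation.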

1

Feeling_Card_4162 t1_j7kp7rq wrote

This is a good way to get an idea of the financial benefit but it’s also important to think about the knowledge you’ll gain and how much other people would benefit from it when deciding whether to continue or not. There is more to determining if something is worth your time than just money.

2

dfcHeadChair t1_j7ko3pf wrote

  1. What is your best guess at how much money you'll make?
  2. Divide that by your best guess at the amount of time, money, and effort it will take you to compile the dataset.
  3. Ask yourself if the resulting ratio is worth it.
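The three steps above are just a back-of-envelope ratio. A minimal sketch, with entirely made-up dollar figures for illustration:

```python
# Back-of-envelope version of the three steps; all numbers are
# placeholder guesses, not real estimates.
expected_revenue = 20_000   # step 1: best-guess earnings ($)
compilation_cost = 5_000    # step 2: time + money + effort, monetized ($)
ratio = expected_revenue / compilation_cost
worth_it = ratio > 1.0      # step 3: is the return better than break-even?
```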

The hard math is going to get you your answer. You may be able to do some fancy correlation mapping depending on the models you think will solve the problem and what data you will need. The trouble with the "shortcut" route is two-fold:

  1. It may take you longer than just doing the three steps above.
  2. You might not get an accurate answer.

1

harharveryfunny t1_j7kmbzr wrote

OpenAI just got a second-round $10B investment from Microsoft, so that goes a ways... They are selling API access to GPT for other companies to use however they like; Microsoft has integrated Copilot (also GPT-based, fine-tuned for code generation) into its dev tools and is also integrating OpenAI's LLM tech into Bing. While OpenAI is also selling access to ChatGPT to end users, I doubt that will really be a focus for them or a major source of revenue.

1

mostlyhydrogen OP t1_j7km5j2 wrote

Thanks for the offer! This is a work project, though. I'm working with images. I can't give too many details due to confidentiality, but we're sub-billion images scale.

Usability is determined by trained annotators. If they find an object of interest and want to harvest more training data, they do a reverse image search across the whole training data and tag true matches.
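The reverse image search described above is commonly done by ranking precomputed image embeddings by similarity to a query embedding. This is only a sketch of that general pattern, with tiny made-up 3-D vectors standing in for real embeddings; the actual pipeline isn't described in the comment:

```python
import math

# Hypothetical reverse image search over precomputed embeddings;
# the vectors below are toy stand-ins for real image embeddings.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def reverse_search(query, corpus, top_k=2):
    # Rank every training image by similarity to the query embedding.
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

corpus = {"img_a": [1.0, 0.0, 0.0],
          "img_b": [0.9, 0.1, 0.0],
          "img_c": [0.0, 1.0, 0.0]}
matches = reverse_search([1.0, 0.05, 0.0], corpus)
```

An annotator would then review the top-k matches and tag the true positives as new training data, as described above.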

1