Recent comments in /f/MachineLearning

shaner92 t1_j7kfyfj wrote

You should be thinking about what's most widely used. What will your coworkers be able to collaborate in? Where will you be able to get the most support (forums, tutorials, even libraries)? This should be the only thing that matters for your first language, and in this case it's clearly Python.

I think people spend way too much time worrying about the 'best', which makes sense because it's a lot of work to learn your first language. It gets easier to switch, though, so it's better to just jump into the easiest and best-supported one.

1

aicharades OP t1_j7kf4or wrote

Here's how it summarizes big documents:

Map Reduce

This method involves running an initial prompt on each chunk of data (for summarization tasks, this could be a summary of that chunk; for question-answering tasks, it could be an answer based solely on that chunk). Then a different prompt is run to combine all the initial outputs. This is implemented in LangChain as the MapReduceDocumentsChain.

Pros: Can scale to larger documents (and more documents) than StuffDocumentsChain. The calls to the LLM on individual documents are independent and can therefore be parallelized.

Cons: Requires many more calls to the LLM than StuffDocumentsChain. Loses some information during the final combining call.
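
To make the map and reduce steps concrete, here's a minimal sketch in plain Python. It is not LangChain's MapReduceDocumentsChain, just the same idea under some assumptions: `llm` is any function that takes a prompt string and returns text, the prompt wording is made up, and chunks are split by character count. `fake_llm` is a hypothetical stand-in so the sketch runs without an API key.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def map_reduce_summarize(
    text: str,
    llm: Callable[[str], str],
    chunk_size: int = 1000,
    max_workers: int = 4,
) -> str:
    """Summarize each chunk on its own, then combine the partial summaries."""
    chunks = split_into_chunks(text, chunk_size)
    map_prompts = [f"Summarize this passage:\n\n{chunk}" for chunk in chunks]

    # Map step: the per-chunk calls are independent, so they can run in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partial_summaries = list(pool.map(llm, map_prompts))

    # Reduce step: one extra call that only sees the partial summaries,
    # which is where some detail from the original text can get lost.
    combined = "\n".join(partial_summaries)
    return llm(f"Combine these summaries into one summary:\n\n{combined}")


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call: echoes a short prefix."""
    body = prompt.split("\n\n", 1)[-1]
    return body[:40]


if __name__ == "__main__":
    document = "LangChain chains prompts together. " * 100
    print(map_reduce_summarize(document, fake_llm, chunk_size=500))
```

A document split into N chunks costs N + 1 calls here, which is the "many more calls" downside from the cons above.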

Here's how a separate Wikipedia agent chain works:

Agents use an LLM to determine which actions to take and in what order. An action can either be using a tool and observing its output, or returning to the user.

It uses agent-based modeling, basically asking itself a series of questions until it gets to the right answer. It sorta looks like Wikipedia racing.
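
Roughly, that loop can be sketched like this. It's a simplified illustration, not LangChain's AgentExecutor: `llm` is assumed to be any function from prompt text to model output, `tools` is a plain dict of hypothetical tool functions, and the `Name[argument]` action format with a special `Finish[...]` action is an assumption about how the model replies.

```python
import re
from typing import Callable, Dict

# Matches lines such as "Action 1: Search[some query]".
ACTION_RE = re.compile(r"Action \d+: (\w+)\[(.*)\]")


def run_agent(
    question: str,
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Let the model pick actions until it emits Finish[answer]."""
    transcript = f"Question: {question}\n"
    for step in range(1, max_steps + 1):
        # Ask the model for its next thought and action, given everything so far.
        output = llm(transcript)
        transcript += output + "\n"

        match = ACTION_RE.search(output)
        if match is None:
            raise ValueError(f"No action found in model output: {output!r}")
        name, argument = match.groups()

        # Finish means "return to the user" with the given answer.
        if name == "Finish":
            return argument
        if name not in tools:
            raise KeyError(f"Model asked for an unknown tool: {name}")

        # Otherwise run the tool and feed its output back as an observation.
        observation = tools[name](argument)
        transcript += f"Observation {step}: {observation}\n"
    raise RuntimeError("Agent did not finish within max_steps")
```

The model never calls anything itself; the loop parses its text, runs the tool, and appends the result so the next call can see it.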

Here's a sample log from a LangChain agent using ReAct (reason + act) modeling on Wikipedia/TMDB:

Log

Entering new AgentExecutor chain...

Thought 1: I need to search for a romantic comedy film about a Senatorial candidate falling for a hotel maid, thinking she is a socialite, when he sees her trying on a wealthy woman's dress.

Action 1: Search[romantic comedy film Senatorial candidate hotel maid socialite wealthy woman's dress]

Observation 1: Maid in Manhattan is a 2002 American romantic comedy-drama film directed by Wayne Wang and based on a story by John Hughes, who is credited using a pseudonym. It stars Jennifer Lopez, Ralph Fiennes, and Natasha Richardson. In the film, a hotel maid and a high-profile politician fall in love. The film was released on December 13, 2002, by Columbia Pictures and was a box office success, grossing $154 million against its $55 million budget, while receiving mixed reviews.

Thought 2: Maid in Manhattan is a romantic comedy film about a Senatorial candidate falling for a hotel maid, thinking she is a socialite, when he sees her trying on a wealthy woman's dress.

Action 2: Finish[Maid in Manhattan]

From: https://langchain.readthedocs.io/en/latest/modules/agents/getting_started.html

Here's the sample notebook for the Wikipedia agent: https://langchain.readthedocs.io/en/latest/modules/agents/implementations/react.html

1

aicharades OP t1_j7ke5cl wrote

Try mine out at www.wrotescan.com! You can use the site for free if you pay for the API calls by providing a temporary OpenAI key. I wanted to share the tech with a demo. Remember to delete the key once you're done using it.

When you sign up for OpenAI, you get $18 of free credits.

You can also create it locally using LangChain.

0

2blazen t1_j7kc4t6 wrote

Definitely Python; that's what all the major companies support too. However, it's not the bytecode cache that makes the difference but the fact that the machine learning libraries are written in C++, so you're not sacrificing performance by scripting in Python.

These kinds of questions are better suited to r/learndatascience, though.

2