Recent comments in /f/MachineLearning

Kingstudly t1_j4xygdq wrote

Not really. It's taking an input and providing the statistically most likely string of words that are associated with it. There's far more to NLP than that. Think about how a human can see a word they've never seen before and infer its meaning based on context clues. I'm not sure any publicly available system can do that.

New words are entering every language constantly. There's no way to train such a massive model to keep up as fast as a human or purpose-built system can.

1

TheFlyingDrildo t1_j4xw5oi wrote

Susan Athey and Stefan Wager's push towards generalized random forests is a major step forward in opening up the type of estimation tasks random forests are useful for, while simultaneously providing the theory for large-sample inference.

An underlying perspective in their research (and most modern random forest theoretical research) is that random forests are effectively kernel regressors, with the forest construction adaptively and implicitly defining the kernel. The component that ends up influencing the adaptivity of the kernel the most is what defines how two child nodes are formed from a parent node.
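
To make the kernel view concrete, here is a minimal sketch, not the GRF implementation. It assumes a toy "purely random" forest on [0, 1) where each tree is just a set of random cut points, so the splits never look at the data (no CART adaptivity, and honesty holds trivially). All function names are hypothetical. The weight of training point i for a query x is the average over trees of 1{X_i falls in the query's leaf} divided by that leaf's size, and the prediction is the weighted average of the responses.

```python
import random

def build_random_partition(rng, n_cuts):
    """One toy 'tree' on [0, 1): sorted random cut points define its leaves."""
    return sorted(rng.random() for _ in range(n_cuts))

def leaf_index(cuts, x):
    """Index of the leaf (interval between consecutive cuts) containing x."""
    return sum(1 for c in cuts if c <= x)

def forest_kernel_weights(forest, xs_train, x_query):
    """Average over trees of 1{X_i shares the query's leaf} / leaf size."""
    weights = [0.0] * len(xs_train)
    used = 0
    for cuts in forest:
        target = leaf_index(cuts, x_query)
        members = [i for i, x in enumerate(xs_train)
                   if leaf_index(cuts, x) == target]
        if not members:
            continue  # empty leaf: this tree contributes nothing
        used += 1
        for i in members:
            weights[i] += 1.0 / len(members)
    return [w / used for w in weights] if used else weights

rng = random.Random(0)
xs = [rng.random() for _ in range(200)]
ys = [2.0 * x for x in xs]  # noiseless linear signal, purely for illustration
forest = [build_random_partition(rng, n_cuts=8) for _ in range(100)]

w = forest_kernel_weights(forest, xs, 0.5)
prediction = sum(wi * yi for wi, yi in zip(w, ys))  # kernel-weighted average
```

The weights are non-negative and sum to one, which is what makes the forest read as a kernel regressor. In a real forest the splitting rule would shape those weights adaptively, which is the part this toy version leaves out.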

In the way we implement things right now we've chosen a few techniques for computational ease: random subspacing (controlled by an mtry hyperparameter), axis-aligned splits, and standard CART splitting rules. I think there is still a lot of work to be done here. An example of an interesting direction with respect to splitting rules is the Distributional Random Forests paper.

Edit: In terms of other hyperparameters that people care about, I have a few comments. The depth of the forest should be controlled by a min_samples_leaf parameter, which controls the local vs global trade-off in the kernel. It should pretty much always be selected in a problem-specific manner with a hyperparameter search, but generally should be quite small. Its choice is closely related to the n_trees hyperparameter, which should always be as large as you can afford computationally. An interesting research direction, however, may be how to adaptively figure out what value of n_trees is "good enough" - which there has been some work on through the analysis of the Purely Uniform Random Forests model.

Lastly, bootstrapping or alternatively subsampling percentage. I believe random forests should always have the honesty property, which naturally pushes us towards subsampling for the extra flexibility in the percentage of data points in the leaves. There could be work done here to determine the appropriate percentage for the split, likely based on convergence rates in learning the tree vs estimating the leaves. Definitely a strong interaction here with the min_samples_leaf hyperparameter. However, the extra variability induced by bootstrapping (and using out-of-bag for honesty) may have desirable properties for the kernel learning, though I believe it is subsampling that makes the large-sample inference theory tractable within our current understanding. Another worthy area of research.

12

londons_explorer t1_j4xknrt wrote

If you want to make the assumption that most buildings don't have any curves in their roofs...

Then take your point cloud, extract the largest polygons... There are classical algorithms for such things.

From the polygons, turning that into a plan should be quite straightforward.

While ML could be applied... I think you'll get better results quicker with classical methods.
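
One classical option for the plane-extraction step is RANSAC. Naming it is an assumption on my part, since the comment above does not name a specific algorithm. The sketch below fits the single dominant plane in a point cloud; the threshold, iteration count and synthetic roof data are all made up for illustration. Repeating it on the remaining points would give the next-largest plane, and the inliers of each plane could then be turned into polygons.

```python
import random

def plane_from_points(p, q, r):
    """Return (unit normal, d) of the plane through p, q, r; None if collinear."""
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = sum(c * c for c in n) ** 0.5
    if norm < 1e-12:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d

def ransac_plane(points, threshold=0.05, iterations=200, seed=0):
    """Return the largest set of points within `threshold` of a sampled plane."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        n, d = plane
        inliers = [pt for pt in points
                   if abs(sum(n[i] * pt[i] for i in range(3)) + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Synthetic data: one sloped roof plane plus random clutter points.
rng = random.Random(1)
roof = []
for _ in range(300):
    x, y = rng.uniform(0, 10), rng.uniform(0, 10)
    roof.append((x, y, 0.5 * x + 3.0 + rng.uniform(-0.005, 0.005)))
clutter = [(rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(0, 10))
           for _ in range(100)]
points = roof + clutter
inliers = ransac_plane(points)
```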

3

SearchAtlantis t1_j4xjn4r wrote

Cynthia Rudin at Duke? Just want to clarify, because when I see Rudin I think Walter Rudin, à la Baby Rudin for Analysis.

Wow just looked her up. I know it wasn't practical for me to do a 2y MS at that point in my life but really wishing I'd gone to Duke now. Interpretable learning is one of my favorite things - and operations research was a passion in undergrad. Those INFORMS papers.

6

mildresponse t1_j4xjmvw wrote

My interpretation is that the words should have different embedding values when they have different positions (context) in the input. Without a positional embedding, the learned word embeddings will be forced into some kind of positional average. The positional offsets give the model more flexibility to resolve differently in different contexts.

Because the embeddings are high dimensional vectors of floats, I'd guess the risk of degeneracy (i.e. that the embeddings could start to overlap with one another) is virtually 0.

1

mildresponse t1_j4xhvkg wrote

Are there any easy and straightforward methods for moving ML models across different frameworks? Does it come down to just manually translating the parameters?

For instance, I am looking at a transformer model in PyTorch, whose parameters are stored within a series of nested objects of various types in an OrderedDict. I would like to extract all of these parameter tensors for use in a similar architecture constructed in TensorFlow or JAX. The naive method of manually collecting the parameters themselves into a new dict seems tedious. And if the target is something like Haiku in JAX, the corresponding model will initialize its parameters into a new nested dict with some default naming structure, which will then have to be connected to the interim dict created from PyTorch. Are there any better ways of moving the parameters or models around?
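
For reference, the manual route can be kept fairly small with an explicit name map. The sketch below is only an illustration in plain Python: nested lists stand in for tensors, and the parameter names, the `rename` table and the `to_two_level` helper are all hypothetical. It assumes the source is a flat dict with dotted keys (the shape `state_dict()` gives in PyTorch) and the target is a two-level `{module: {param: value}}` dict like the one Haiku uses. The transpose step reflects that `torch.nn.Linear` stores weights as (out, in) while `hk.Linear` expects (in, out).

```python
def transpose(matrix):
    """Transpose a 2-D nested list (stand-in for a tensor transpose)."""
    return [list(row) for row in zip(*matrix)]

def to_two_level(flat_params, rename, transpose_keys=()):
    """Map flat dotted names to a {module: {param: value}} dict."""
    nested = {}
    for src_name, value in flat_params.items():
        module, param = rename[src_name]  # a KeyError flags an unmapped parameter
        if src_name in transpose_keys:
            value = transpose(value)
        nested.setdefault(module, {})[param] = value
    return nested

# Hypothetical source parameters, flat with dotted names.
flat = {
    "encoder.fc.weight": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],  # (out=2, in=3)
    "encoder.fc.bias": [0.1, 0.2],
}
# Hypothetical mapping to target module and parameter names.
rename = {
    "encoder.fc.weight": ("encoder/linear", "w"),
    "encoder.fc.bias": ("encoder/linear", "b"),
}
converted = to_two_level(flat, rename, transpose_keys={"encoder.fc.weight"})
```

Comparing the shapes in `converted` against the dict the target model initializes on its own is a cheap way to catch a wrong mapping before loading anything.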

1

JClub OP t1_j4xgp2x wrote

You're not the first person to ask me that question! I need to add a more detailed explanation for that :)

The reward is non-differentiable because it was produced with a reward model, and this reward model takes text as input. This text was obtained by decoding the log probabilities of the output of your model. This decoding process is non-differentiable, so we lose the gradient link between the LM and the reward model.

Does this make sense? Also, if the reward is given directly by a human, instead of a reward model, it's clearer that this reward is non-differentiable.

RL helps transform this non-differentiable reward into a differentiable loss :)
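
A toy illustration of that last point, using plain REINFORCE (the simplest policy-gradient estimator, chosen here for brevity and not necessarily the algorithm used in the post). The three-token "vocabulary", the `black_box_reward` function and the hyperparameters are all made up. The reward is only ever evaluated on a sampled token, yet the update is still a gradient step, because the gradient of the expected reward equals the expectation of reward times the gradient of log pi(token).

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def black_box_reward(token):
    """Stand-in for a reward model or human rating; no gradient flows through it."""
    return 1.0 if token == 2 else 0.0

def reinforce(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]  # the "policy" parameters
    for _ in range(steps):
        probs = softmax(logits)
        # Sampling (decoding) is the non-differentiable step.
        token = rng.choices(range(3), weights=probs)[0]
        reward = black_box_reward(token)
        # d log pi(token) / d logits[k] = 1{k == token} - probs[k]
        for k in range(3):
            grad_log_prob = (1.0 if k == token else 0.0) - probs[k]
            logits[k] += lr * reward * grad_log_prob  # gradient ascent on E[reward]
    return softmax(logits)

final_probs = reinforce()  # probability mass should concentrate on token 2
```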

5

MajorValue1094 t1_j4xdtl7 wrote

Agreed, the design of GPT is to be indistinguishable from real text, hence you’re fighting a losing battle (unless you have millions to train a rival network). The only key may be in the way GPT interprets language: we are all aware that it does not understand what it says. If you can find a way to target a pre-trained network at that, you may have a chance, but in theory, by the nature of GPT, you will lose.

5

FastestLearner OP t1_j4x7rvp wrote

Yes. Your first point is something that I would happily engage in as well. I have no problems contributing to the community. Moreover, the extension can have several additional options like:

(i) Do not perform any kind of inference on the client, i.e. always query existing timestamps from the online database. This will be helpful for users with low-power devices like laptops.

or

(ii) Perform inference (only) for the video that the client wants. This is, of course, necessary if the video does not have any timestamps on the server. It does the inference and uploads the results on the central server.

or

(iii) Keep performing inference for new videos (even ones that are not watched by the particular user) - Some folks who run powerful enough hardware and are eager to donate their computation time can choose this option. I am pretty sure some folks will emerge who are willing to do this. The LeelaChessZero project banked entirely on this particular idea. For this option, there could be a slider to let the user control how much of the resources to keep actively engaged (maybe by limiting thread count).

The second point that you mentioned could be implemented with a peer-to-peer communication protocol, but if the neural network's weights don't change, then there would be no difference between the most recent and stale timestamps. Also, in P2P you'd still need trackers to keep track of peers, which could be a central server or be decentralized and serverless depending on the implementation. One potential problem could be latency though.

1

ThrillHouseofMirth t1_j4x7o9e wrote

I don't think that there's any way to do so at this point and eventually someone will prove it. "Original" language virtually always is a recombination of previous language of sufficient complexity and uniqueness.

A possible solution to this is for AI language model providers to provide APIs that allow people to check content against an archive of text that they generated.

Any solution needs to be monitoring- and telemetry-based; the days of algorithmic checking are definitively over.

26

dangerhexagon t1_j4x2yrp wrote

8

KerbalsFTW t1_j4wtp6e wrote

> Is it fair to say that AI and ML are synonymous now in 2023? Or are there people who are still actively working on non-ML techniques for building AI?

AI means "I am a lay person or a media person talking to lay people".

ML means "I know what I'm talking about".

AGI means "I know what I'm talking about but I don't know what it is or how to build it".

The term 'AI' followed a previous hype-then-disappointment curve and got a bad name. Researchers restricted themselves to "things that worked" and called it Machine Learning to imply that we are teaching models and they are learning which is obviously true, rather than implying "this thing is intelligent" which it probably isn't.

Side topic: humans keep moving the bar on what counts as intelligent. It used to be "can play chess", then it was "can play Go and hold a conversation", and then it was "can draw and show creativity". Humans will keep moving the bar on the definition of intelligence for as long as (humanly) possible.

1

KerbalsFTW t1_j4wschy wrote

Ex-software freelancer here.

> First thing is that as a freelancer you're not part of "the team". This can be good or bad for you, I think it's fantastic.

Agreed, but with a few caveats:

  • You always need a plan for your contract to end, including early. (Never happened to me, but I always planned for it).

  • Companies will eventually try to treat you like staff: assuming you'll always be there and they can tell you what to do rather than asking if you'll do something. At this point you need to start telling them about the break from them you are about to be taking.

> In my experience most small companies won't have use for you. For one, you'll be more expensive than their employed staff, but they also want to keep that know how in house.

Disagree here: small companies struggle to get a wide enough set of skills, and they also have projects that need finishing without expanding their committed outgoings.

There are two major downsides to freelancing:

  • Location. If you are not in a very big tech city you will have to frequently relocate, or work primarily from home (in which case you are competing with very, very cheap people).

  • Skills. Companies do not give you time to learn the next big thing. You are expected to turn a profit for them from day 1. If they are going to be investing in their staff learning new things, it will be with staff they expect to stick around.

> Another thing to keep in mind: Do not go into this for the money. If you factor everything in: Vacation, sick days, hardware, licenses, pension/retirement (rule of thumb: 30% of your net income) etc. it doesn't come out that far apart.

Agreed... it depends how much you value flexibility and time to work on your own projects.

As regards finding work: agencies are essential at first. Tell everyone you meet you are a freelance software guy (keep it vague: they'll probe if they need someone). Friends and contacts work great, but not at first. Try to find a "social technology hub" in your city: these are clubs frequented by people who work at the big tech places and socialise. This might be a hackerspace or an exercise club. They are not always easy to find.

2