FastestLearner OP t1_j4y3zw2 wrote on January 19, 2023 at 1:46 AM

Reply to [D] Idea: SponsorBlock with a neural net as backend by FastestLearner

Moderators, why did you delete the post? We were having such a good discussion.

lightofaman t1_j4y2y0o wrote on January 19, 2023 at 1:39 AM

Reply to [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

The direction of the gradient

SearchAtlantis t1_j4y02yt wrote on January 19, 2023 at 1:18 AM

Reply to comment by bitchslayer78 in [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

Lol I never even took analysis and I still know Little/Baby Rudin by osmosis.

Kingstudly t1_j4xygdq wrote on January 19, 2023 at 1:06 AM

Reply to comment by singularpanda in [D] Will NLP Researchers Lose Our Jobs after ChatGPT? by singularpanda

Not really. It's taking an input and providing the statistically most likely string of words that are associated with it. There's far more to NLP than that. Think about how a human can see a word they've never seen before and infer its meaning based on context clues. I'm not sure any publicly available system can do that.

New words are entering every language constantly. There's no way to train such a massive model to keep up as fast as a human or purpose built system can.

TheFlyingDrildo t1_j4xw5oi wrote on January 19, 2023 at 12:50 AM

Reply to [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

Susan Athey and Stefan Wager's push towards generalized random forests is a major step forward in opening up the type of estimation tasks random forests are useful for, while simultaneously providing the theory for large-sample inference.

An underlying perspective in their research (and most modern random forest theoretical research) is that random forests are effectively kernel regressors, with the forest construction adaptively and implicitly defining the kernel. The component that ends up influencing the adaptivity of the kernel the most is what defines how two child nodes are formed from a parent node.
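To make the kernel view concrete, here is a minimal sketch of the usual weighting: each training point gets weight equal to the average, over trees, of "shares a leaf with the query" divided by the leaf size. The leaf-id arrays and the function name are hypothetical stand-ins, not any particular library's API.

```python
import numpy as np

def forest_kernel_weights(train_leaves, query_leaves):
    """Weight the forest puts on each training point for a single query.

    train_leaves: (n_trees, n_train) int array, leaf id of each training point per tree
    query_leaves: (n_trees,) int array, leaf id of the query point per tree
    """
    same_leaf = train_leaves == query_leaves[:, None]   # (n_trees, n_train) co-membership
    leaf_sizes = same_leaf.sum(axis=1, keepdims=True)   # training points sharing the query's leaf
    per_tree = same_leaf / np.maximum(leaf_sizes, 1)    # uniform weight inside each leaf
    return per_tree.mean(axis=0)                        # average over trees

# Hypothetical leaf assignments: 2 trees, 4 training points.
train_leaves = np.array([[0, 0, 1, 1],
                         [0, 1, 1, 1]])
query_leaves = np.array([0, 1])
y = np.array([1.0, 2.0, 3.0, 4.0])

alpha = forest_kernel_weights(train_leaves, query_leaves)
prediction = alpha @ y   # same as averaging the per-tree leaf means
```

Changing the splitting rule changes which points share leaves, which is exactly how it reshapes the kernel.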

In the way we implement things right now we've chosen a few techniques for computational ease: random subspacing (controlled by an mtry hyperparameter), axis-aligned splits, and standard CART splitting rules. I think there is still a lot of work to be done here. An example of an interesting direction with respect to splitting rules is the Distributional Random Forests paper.

Edit: In terms of other hyperparameters that people care about, I have a few comments. The depth of the forest should be controlled by a min_samples_leaf parameter, which controls the local vs global trade-off in the kernel. It should pretty much always be selected in a problem-specific manner with a hyperparameter search, but generally should be quite small. Its choice is closely related to the n_trees hyperparameter, which should always be as large as you can afford computationally. An interesting research direction however may be how to adaptively figure out what value of n_trees is "good enough" - which there has been some work on through the analysis of the Purely Uniform Random Forests model.

Lastly, bootstrapping or alternatively subsampling percentage. I believe random forests should always have the honesty property, which naturally pushes us towards subsampling for the extra flexibility in the percentage of data points in the leaves. There could be work done here to determine the appropriate percentage for the split, likely based on convergence rates in learning the tree vs estimating the leaves. Definitely a strong interaction here with the min_samples_leaf hyperparameter. However, the extra variability induced by bootstrapping (and using out-of-bag for honesty) may have desirable properties for the kernel learning, though I believe it is subsampling that makes the large-sample inference theory tractable within our current understanding. Another worthy area of research.
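A rough sketch of what honesty with subsampling looks like per tree: draw a subsample without replacement, then use one disjoint part to learn the splits and the other to estimate leaf values. The names subsample_frac and structure_frac are hypothetical; the second one is the "percentage for the split" discussed above.

```python
import numpy as np

def honest_subsample(n, subsample_frac, structure_frac, rng):
    """Index sets for one honest tree: disjoint 'structure' and 'estimation' parts."""
    m = int(n * subsample_frac)
    idx = rng.choice(n, size=m, replace=False)   # subsampling, not bootstrapping
    k = int(m * structure_frac)
    # First part would be used to choose splits, second part to fill in leaf estimates.
    return idx[:k], idx[k:]

rng = np.random.default_rng(0)
structure_idx, estimation_idx = honest_subsample(
    n=1000, subsample_frac=0.5, structure_frac=0.5, rng=rng
)
```

With a bootstrap instead, the out-of-bag points would play the role of the estimation part, so the split percentage is no longer a free choice.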

bitchslayer78 t1_j4xv6k1 wrote on January 19, 2023 at 12:43 AM

Reply to comment by SearchAtlantis in [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

Shout out to ‘principles of mathematical analysis’, one of the best analysis books ever

No_Goat277 t1_j4xm4hr wrote on January 18, 2023 at 11:40 PM

Reply to [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice

I have work for you, PM me. We are establishing a startup and need more brains and manpower.

londons_explorer t1_j4xknrt wrote on January 18, 2023 at 11:30 PM

Reply to [D] Automated Extraction of Building Geometry by EducationalLayer1051

If you want to make the assumption that most buildings don't have any curves in their roofs...

Then take your point cloud, extract the largest polygons... There are classical algorithms for such things.

From the polygons, turning that into a plan should be quite straightforward.

While ML could be applied... I think you'll get better results quicker with classical methods.
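As one example of a classical method for the flat-surface step, here is a small RANSAC plane fit on a point cloud. RANSAC is my choice for illustration (the comment above does not name a specific algorithm), and the synthetic roof data, threshold, and iteration count are assumptions.

```python
import numpy as np

def ransac_plane(points, n_iters=200, threshold=0.05, rng=None):
    """Fit the dominant plane in an (N, 3) point cloud; returns ((normal, point), inlier mask)."""
    rng = np.random.default_rng() if rng is None else rng
    best_inliers = np.zeros(len(points), dtype=bool)
    best_plane = None
    for _ in range(n_iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                              # skip (nearly) collinear samples
            continue
        normal = normal / norm
        dist = np.abs((points - p0) @ normal)        # point-to-plane distances
        inliers = dist < threshold
        if inliers.sum() > best_inliers.sum():       # keep the plane with the most support
            best_inliers, best_plane = inliers, (normal, p0)
    return best_plane, best_inliers

# Synthetic data: a flat roof at height 2 plus uniformly scattered clutter.
rng = np.random.default_rng(0)
roof = np.column_stack([rng.uniform(0, 10, 200),
                        rng.uniform(0, 10, 200),
                        2 + rng.normal(0, 0.01, 200)])
clutter = rng.uniform(0, 10, size=(50, 3))
points = np.vstack([roof, clutter])

(normal, point_on_plane), inliers = ransac_plane(points, rng=rng)
```

Removing the inliers and repeating gives the next-largest face; the boundary of each inlier set (e.g. its convex hull in the plane) would then give a polygon.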

SearchAtlantis t1_j4xjn4r wrote on January 18, 2023 at 11:23 PM

Reply to comment by notdelet in [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

Cynthia Rudin at Duke? Just want to clarify because when I see Rudin I think Walter Rudin à la Baby Rudin for Analysis.

Wow just looked her up. I know it wasn't practical for me to do a 2y MS at that point in my life but really wishing I'd gone to Duke now. Interpretable learning is one of my favorite things - and operations research was a passion in undergrad. Those INFORMS papers.

mildresponse t1_j4xjmvw wrote on January 18, 2023 at 11:23 PM

Reply to comment by inquisitor49 in [D] Simple Questions Thread by AutoModerator

My interpretation is that the words should have different embedding values when they have different positions (context) in the input. Without a positional embedding, the learned word embeddings will be forced into some kind of positional average. The positional offsets give the model more flexibility to resolve differently in different contexts.

Because the embeddings are high dimensional vectors of floats, I'd guess the risk of degeneracy (i.e. that the embeddings could start to overlap with one another) is virtually 0.

mildresponse t1_j4xhvkg wrote on January 18, 2023 at 11:11 PM

Reply to [D] Simple Questions Thread by AutoModerator

Are there any easy and straightforward methods for moving ML models across different frameworks? Does it come down to just manually translating the parameters?

For instance, I am looking at a transformer model in PyTorch, whose parameters are stored within a series of nested objects of various types in an OrderedDict. I would like to extract all of these parameter tensors for use in a similar architecture constructed in Tensorflow or JAX. The naive method of manually collecting the parameters themselves into a new dict seems tedious. And if the target is something like Haiku in JAX, the corresponding model will initialize its parameters into a new nested dict with some default naming structure, which will then have to be connected to the interim dict created from PyTorch. Are there any better ways of moving the parameters or models around?
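For reference, the manual route I have in mind looks roughly like the sketch below: flatten the PyTorch state_dict to NumPy arrays, then regroup the dotted names into a two-level dict. The "/" module paths and the "w"/"b" names are my assumption about the target layout; the real names would have to be read off the initialized Haiku params. The blanket transpose of 2-D weights is only right for linear layers (PyTorch stores them as (out, in)), not for embeddings.

```python
import numpy as np

def to_haiku_style(flat_params, transpose_linear=True):
    """Regroup {'a.b.weight': array} into {'a/b': {'w': array}} (assumed target layout)."""
    rename = {"weight": "w", "bias": "b"}
    nested = {}
    for name, value in flat_params.items():
        module_path, _, leaf = name.rpartition(".")
        value = np.asarray(value)
        if transpose_linear and leaf == "weight" and value.ndim == 2:
            value = value.T   # (out, in) -> (in, out); wrong for embedding tables
        module_key = module_path.replace(".", "/")
        nested.setdefault(module_key, {})[rename.get(leaf, leaf)] = value
    return nested

# Stand-in for: {k: v.detach().cpu().numpy() for k, v in model.state_dict().items()}
flat = {
    "encoder.layer0.weight": np.zeros((8, 4)),
    "encoder.layer0.bias": np.zeros(8),
}
nested = to_haiku_style(flat)
```

Comparing the shapes in this dict against the shapes in the freshly initialized target params is a cheap way to catch a wrong name or a missing transpose.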

JClub OP t1_j4xgp2x wrote on January 18, 2023 at 11:03 PM

Reply to comment by dataslacker in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub

You're not the first person to ask me that question! I need to add a more detailed explanation for that :)

The reward is non-differentiable because it was produced with a reward model, and this reward model takes text as input. This text was obtained by decoding the log probabilities of the output of your model. This decoding process is non-differentiable and we lose the gradient link between the LM model and the reward model.

Does this make sense? Also, if the reward is given directly by a human, instead of a reward model, it's clearer that this reward is non-differentiable.

RL helps transform this non-differentiable reward into a differentiable loss :)
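A toy sketch of that idea, using plain REINFORCE rather than the PPO setup used in actual RLHF: the reward function is a black box, yet we still get a gradient because we differentiate the log-probability of the sampled token, not the reward. The 4-token vocabulary, the reward rule, and the learning rate are all made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def black_box_reward(token):
    # Stand-in for a reward model or a human rating: no gradient flows through it.
    return 1.0 if token == 2 else 0.0

rng = np.random.default_rng(0)
logits = np.zeros(4)     # toy "policy" over a 4-token vocabulary
lr = 0.5

for _ in range(500):
    probs = softmax(logits)
    token = rng.choice(4, p=probs)        # sampling/decoding: the non-differentiable step
    reward = black_box_reward(token)
    baseline = probs[2]                   # expected reward under the current policy
    grad_log_prob = -probs                # d log p(token) / d logits = onehot(token) - probs
    grad_log_prob[token] += 1.0
    logits += lr * (reward - baseline) * grad_log_prob   # ascent on expected reward

final_probs = softmax(logits)
```

After training, the policy puts most of its probability on the rewarded token, even though the reward itself was never differentiated.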

edunuke t1_j4xg06j wrote on January 18, 2023 at 10:59 PM

Reply to [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

Skimmed through a book about "fast and frugal decision trees" found it interesting. I don't think it is something new per se, but I found the concept useful in terms of data efficiency and explainability.

MajorValue1094 t1_j4xdtl7 wrote on January 18, 2023 at 10:44 PM

Reply to comment by ThrillHouseofMirth in [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116

Agreed, the design of GPT is to be indistinguishable from real text, hence you’re fighting a losing battle (unless you have millions to train a rival network). The only key may be in the way GPT interprets language; we are all aware that it does not understand what it says. If you can find a way to target a pre-trained network at that you may have a chance, but in theory, by the nature of GPT, you will lose.

dataslacker t1_j4xd5aj wrote on January 18, 2023 at 10:40 PM

Reply to [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub

That’s a nice explanation but I’m still unclear as to the motivation for RL. You say the reward isn’t differentiable but since it’s just a label that tells us which of the outputs is best why not simply use that output with supervised training?

hjmb t1_j4xbt1f wrote on January 18, 2023 at 10:31 PM

Reply to [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116

Take a look at Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods by Crothers, Japkowicz, and Viktor (open access preprint on the arXiv, from October 2022)

FastestLearner OP t1_j4x7rvp wrote on January 18, 2023 at 10:05 PM

Reply to comment by float16 in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner

Yes. Your first point is something that I would happily engage in as well. I have no problems contributing to the community. Moreover, the extension can have several additional options like:

(i) Do not perform any kind of inference on the client, i.e. always query existing timestamps from the online database. This will be helpful for users with low-power devices like laptops.

or

(ii) Perform inference (only) for the video that the client wants. This is, of course, necessary if the video does not have any timestamps on the server. It does the inference and uploads the results on the central server.

or

(iii) Keep performing inference for new videos (even ones that are not watched by the particular user) - Some folks who run powerful enough hardware and are eager to donate their computation time can choose this option. I am pretty sure some folks will emerge who are willing to do this. The LeelaChessZero project banked entirely on this particular idea. For this option, there could be a slider to let the user control how much of the resources to keep actively engaged (maybe by limiting thread count).

The second point that you mentioned could be implemented with a peer-to-peer communication protocol, but if the neural network's weights don't change, then there would be no difference between the most recent and stale timestamps. Also, in P2P you'd still need trackers to keep track of peers, which could be a central server or be decentralized and serverless depending on the implementation. One potential problem could be latency though.

ThrillHouseofMirth t1_j4x7o9e wrote on January 18, 2023 at 10:04 PM

Reply to [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116

I don't think that there's any way to do so at this point and eventually someone will prove it. "Original" language virtually always is a recombination of previous language of sufficient complexity and uniqueness.

A possible solution to this is for AI language model providers to provide APIs that allow people to check content against an archive of text that the model generated.

Any solution needs to be monitoring- and telemetry-based; the days of algorithmic checking are definitively over.

leocus4 t1_j4x5krq wrote on January 18, 2023 at 9:51 PM

Reply to [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

I worked on Interpretable RL with trees (e.g., https://ieeexplore.ieee.org/document/10015004 ). If you want, I can send you more references to related work.

dangerhexagon t1_j4x2yrp wrote on January 18, 2023 at 9:35 PM

Reply to [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

There's some papers on applying transformers to trees: https://arxiv.org/abs/1909.06639 , https://arxiv.org/abs/1911.09983 , https://papers.nips.cc/paper/2019/hash/6e0917469214d8fbd8c517dcdc6b8dcf-Abstract.html

And some recent work on tree extraction: https://arxiv.org/abs/2301.00447

There's also this paper which recovers a tree by observing the leaf nodes: https://arxiv.org/abs/2208.14924

SnooHesitations8849 t1_j4wznru wrote on January 18, 2023 at 9:04 PM

Reply to [R] Researchers out there: which are current research directions for tree-based models? by BenXavier

deep forest model. used in many financial applications in China.

KerbalsFTW t1_j4wtp6e wrote on January 18, 2023 at 8:28 PM

Reply to [D] Has ML become synonymous with AI? by Valachio

> Is it fair to say that AI and ML are synonymous now in 2023? Or are there people who are still actively working on non-ML techniques for building AI?

AI means "I am a lay person or a media person talking to lay people".

ML means "I know what I'm talking about".

AGI means "I know what I'm talking about but I don't know what it is or how to build it".

The term 'AI' followed a previous hype-then-disappointment curve and got a bad name. Researchers restricted themselves to "things that worked" and called it Machine Learning to imply that we are teaching models and they are learning which is obviously true, rather than implying "this thing is intelligent" which it probably isn't.

Side topic: humans keep moving the bar on what counts as intelligent. It used to be "can play chess" and it then was "can play Go and hold a conversation" and then it was "can draw and show creativity". Humans will keep moving the bar on the definition of intelligence for as long as (humanly) possible.

CaptainDifferent3116 OP t1_j4wsqjz wrote on January 18, 2023 at 8:22 PM

Reply to comment by Anjum48 in [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116

They don't offer a free trial. Who the hell does that! I won't pay $20 just to see the perf.

KerbalsFTW t1_j4wschy wrote on January 18, 2023 at 8:20 PM

Reply to comment by farox in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice

Ex-software freelancer here.

> First thing is that as a freelancer you're not part of "the team". This can be good or bad for you, I think it's fantastic.

Agreed, but with a few caveats:

You always need a plan for your contract to end, including early. (Never happened to me, but I always planned for it).
Companies will eventually try to treat you like staff: assuming you'll always be there and they can tell you what to do rather than asking if you'll do something. At this point you need to start telling them about the break from them you are about to be taking.

> In my experience most small companies won't have use for you. For one, you'll be more expensive than their employed staff, but they also want to keep that know how in house.

Disagree here: small companies struggle to get a wide enough set of skills, and they also have projects that need finishing without expanding their committed outgoings.

There are two major downsides to freelancing:

Location. If you are not in a very big tech city you will have to frequently relocate, or work primarily from home (in which case you are competing with very, very cheap people).
Skills. Companies do not give you time to learn the next big thing. You are expected to turn a profit for them from day 1. If they are going to be investing in their staff learning new things, it will be with staff they expect to stick around.

> Another thing to keep in mind: Do not go into this for the money. If you factor everything in: Vacation, sick days, hardware, licenses, pension/retirement (rule of thumb: 30% of your net income) etc. it doesn't come out that far apart.

Agreed.... depends how much you value flexibility and time to work on your own projects.

As regards finding work: agencies are essential at first. Tell everyone you meet you are a freelance software guy (keep it vague: they'll probe if they need someone). Friends and contacts work great, but not at first. Try to find a "social technology hub" in your city. These are clubs that are frequented by people who work at the big tech places and socialise; this might be a hackerspace or an exercise club. They are not always easy to find.

CaptainDifferent3116 OP t1_j4wsanw wrote on January 18, 2023 at 8:19 PM

Reply to comment by stablebrick in [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116

I tried that but it didn't work very well
