Recent comments in /f/MachineLearning

pawsibility t1_jaep5s5 wrote

> The MLLM component has 24 layers with 2,048 hidden dimensions, 8,192 FFN intermediate size, and 32 attention heads, resulting in about 1.3B parameters. We use Magneto’s initialization for optimization stability. For faster convergence, the image representation is obtained from a pretrained CLIP ViT-L/14 model with 1,024 feature dimensions. The images are preprocessed into 224×224 resolution during training. We freeze the parameters of the CLIP model except for the last layer during training. The total number of parameters of KOSMOS-1 is about 1.6B.

If they use CLIP to generate the image representations/embeddings that feed into their model, isn't that kind of cheating when reporting parameter counts? Or is CLIP sufficiently small, and is that how they jumped from 1.3B to 1.6B?
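
A quick back-of-envelope check (my own sketch, ignoring embeddings, layer norms, and biases) suggests it's the latter: the gap is roughly the size of CLIP ViT-L/14's vision tower.

```python
# Rough parameter count for the MLLM described in the quote above
# (ignores embeddings, layer norms, and biases)
d_model, d_ffn, n_layers = 2048, 8192, 24

attn = 4 * d_model**2        # Q, K, V, and output projections
ffn = 2 * d_model * d_ffn    # up- and down-projections
per_layer = attn + ffn       # ~50.3M parameters per layer

print(f"transformer body: ~{n_layers * per_layer / 1e9:.2f}B")  # ~1.21B
# CLIP ViT-L/14's vision tower is ~0.3B parameters, which takes
# ~1.3B (with embeddings) up to the reported ~1.6B total
```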

6

7734128 t1_jaemc4b wrote

Doesn't really change anything, does it? A zero still affects the output, so it has to be there. I assume you mean it could use less memory? But is that technically feasible in a practical way? I can't imagine storing a tensor of split-precision weights without ruinous reprocessing every time the weights are used.
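
To make the storage question concrete, here's a minimal sketch (my own illustration, not from the thread) of how a sparse format like CSR drops the zeros, at the cost of index arrays and extra work on every access:

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[0.5, 0.0, 0.0],
                  [0.0, 0.0, 1.2],
                  [0.0, 0.3, 0.0]], dtype=np.float32)
sparse = csr_matrix(dense)

print(sparse.data)     # [0.5 1.2 0.3] -- only the non-zeros are stored
print(sparse.indices)  # column index of each stored value
print(sparse.indptr)   # row boundaries into data/indices

# The zeros still take effect in arithmetic, just implicitly:
x = np.ones(3, dtype=np.float32)
print(sparse @ x)      # same result as dense @ x
```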

6

Kinferatu t1_jaedgwb wrote

AutoML has a significant limitation when it comes to time series analysis: the temporal ordering of the data makes it hard to obtain clean validation signals that extrapolate to test results. This issue is often overlooked, and it leads to inaccurate predictions and unreliable results.
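
For what it's worth, a walk-forward split at least respects the temporal order. A minimal sketch with scikit-learn (my own illustration):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)  # stand-in for a time-ordered series
tscv = TimeSeriesSplit(n_splits=3)
for train_idx, val_idx in tscv.split(X):
    # validation indices always come strictly after the training indices
    print(f"train: {train_idx}, validate: {val_idx}")
```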

26

SherbertTiny2366 t1_jaebzb0 wrote

From what I get, that is also the advantage of Fugue. From their webpage:
> FugueSQL is designed for heavy SQL users to extend the boundaries of traditional SQL workflows. FugueSQL allows the expression of logic for end-to-end distributed computing workflows. It can also be combined with Python code to use custom functions alongside the SQL commands. It provides a unified interface, allowing the same SQL code to run on Pandas, Dask, and Spark.

https://github.com/fugue-project/fugue
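
A minimal sketch of what that looks like in practice (my own example; the import path may differ across Fugue versions):

```python
import pandas as pd
from fugue_sql import fsql  # import location may vary by Fugue version

df = pd.DataFrame({"price": [1.0, 2.0, 3.0], "qty": [10, 20, 30]})

# The same SQL can run on Pandas, Dask, or Spark; only the engine changes
fsql("""
SELECT price, qty, price * qty AS revenue FROM df
PRINT
""").run()  # local Pandas by default; .run("dask") or .run("spark") to distribute
```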

5

ForceBru t1_jae3ugb wrote

Basically, k-means is an algorithm. You give it data and the number of clusters you want to find in the data. It finds these clusters and returns their centers (known as centroids) and possibly assigns each data point to a cluster. "Optimizing" a k-means algorithm doesn't make much sense, IMO. What you probably want to say is that you ran the algorithm and got some centroids.

If you run k-means with new data but tell it to use particular centroids (that you got from a previous run of k-means), then it'll use these centroids as starting points and update them to match the new data.

  1. Feed the algorithm some data.
  2. It internally chooses initial centroids. How to choose these very first centroids isn't a simple problem. They're usually chosen "randomly". For example, you can pick K distinct points from your dataset.
  3. K-means then does its thing and adjusts these initial centroids to best fit your data. This happens in several iterations.
  4. Finally, these adjusted centroids are returned.
  5. Now you put in new data and the centroids from the previous step.
    1. If the number of iterations is zero, there's nothing to be done, so the centroids remain unchanged.
    2. If the number of iterations is greater than zero, K-means performs these iterations and adjusts these centroids to better fit the new data.
  6. The new, potentially adjusted centroids are returned.

Basically, k-means will adjust the centroids you give it in such a way that these centroids define clusters that describe your data well enough. When you run k-means with no centroids, it'll generate random centroids for you.
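
Here's a minimal sketch of that workflow with scikit-learn (my own example; note sklearn's KMeans always runs at least one iteration, so the "zero iterations" case is just prediction with the old centroids):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X_old = rng.random((500, 2))
X_new = rng.random((300, 2))

# Steps 1-4: fit on the old data; k-means++ picks the initial centroids
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_old)
centroids = km.cluster_centers_

# Steps 5-6: warm-start on the new data from the previous centroids
km2 = KMeans(n_clusters=3, init=centroids, n_init=1).fit(X_new)
print(km2.cluster_centers_)  # adjusted to fit the new data

# "Zero iterations": just assign new points to the old, unchanged centroids
labels = km.predict(X_new)
```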

3

CyberPun-K t1_jae0m46 wrote

AutoML is a powerful tool, but it's not something most people actually use. Personally, I wouldn't pay thousands of dollars for fancy hyperparameter optimization; in most cases the improvements are marginal.

One of the cool features of BigQuery is that everything runs through SQL queries, which makes data analysis much easier.
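
For example, BigQuery ML lets you train a model with plain SQL via the official Python client. A sketch (the dataset, table, and column names below are made up):

```python
from google.cloud import bigquery

client = bigquery.Client()  # picks up default GCP credentials

client.query("""
CREATE OR REPLACE MODEL `my_dataset.sales_model`
OPTIONS(model_type='linear_reg', input_label_cols=['sales']) AS
SELECT day_of_week, price, sales
FROM `my_dataset.daily_sales`
""").result()  # blocks until the training job completes
```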

7

More-Horse-3281 t1_jadyg0x wrote

I have no experience with GCP AutoML, but I have experienced heavy overfitting when using FLAML and auto-sklearn. Did you experience the same? (I.e., AutoML outperforming the open-source algos on training data?) I have the feeling that a lot of AutoML solutions "cherry-pick" models that just happened to shine on the training data.
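
The check I usually run is just comparing train and holdout scores after the fit. A minimal sketch with FLAML (my own example):

```python
from flaml import AutoML
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

automl = AutoML()
automl.fit(X_train=X_train, y_train=y_train, task="regression", time_budget=60)

# A large gap between these two scores is the overfitting symptom above
print("train R2:", r2_score(y_train, automl.predict(X_train)))
print("test  R2:", r2_score(y_test, automl.predict(X_test)))
```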

12

ForceBru t1_jaduboq wrote

What do you mean by "trained k-means algorithm"? K-means is an algorithm; there's nothing to "train" there. I guess you could fine-tune the number of iterations and the number of clusters somehow. Is this what you mean?

What do you mean by "training seeds"? Are these cluster centroids obtained after clustering training data?

2