Recent comments in /f/MachineLearning
pawsibility t1_jaep5s5 wrote
Reply to comment by abnormal_human in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
> The MLLM component has 24 layers with 2,048 hidden dimensions, 8,192 FFN intermediate size, and 32 attention heads, resulting in about 1.3B parameters. We use Magneto’s initialization for optimization stability. For faster convergence, the image representation is obtained from a pretrained CLIP ViT-L/14 model with 1,024 feature dimensions. The images are preprocessed into 224×224 resolution during training. We freeze the parameters of the CLIP model except for the last layer during training. The total number of parameters of KOSMOS-1 is about 1.6B.
If they use CLIP to generate image representations/embeddings as input to their model, isn't that kind of cheating when reporting the number of parameters? Or is CLIP sufficiently small, and that's how they jumped from 1.3B to 1.6B?
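For reference, a rough back-of-envelope calculation (my own sketch, not from the paper) shows how the quoted dimensions add up to roughly 1.3B, and why a CLIP ViT-L/14 image tower (~0.3B parameters) would account for the jump to 1.6B. The vocabulary size here is a guess, not a figure from the paper:

```python
# Rough parameter count for the quoted MLLM config (my own estimate).
layers, d_model, d_ffn = 24, 2048, 8192

# Per layer: attention (Q, K, V, output projections) + FFN (up + down).
attn = 4 * d_model * d_model          # ~16.8M
ffn = 2 * d_model * d_ffn             # ~33.6M
per_layer = attn + ffn                # ~50.3M

backbone = layers * per_layer         # ~1.21B, before embeddings/norms
# Adding token embeddings brings this near the quoted 1.3B
# (the 64k vocabulary size is an assumption):
embeddings = 64_000 * d_model         # ~0.13B
print(f"~{(backbone + embeddings) / 1e9:.2f}B parameters")
```

So the language backbone alone lands near 1.3B, and bolting on a mostly-frozen ~0.3B CLIP encoder plausibly explains the 1.6B total.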
karius85 t1_jaeoyq7 wrote
Reply to comment by 7734128 in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
Sparse matrices, but you would need quite a lot of zeros.
7734128 t1_jaemc4b wrote
Reply to comment by RetroPenguin_ in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
Doesn't really change anything, does it? A zero still has an effect, so it has to be there. I assume you mean it could use less memory? But is that technically feasible in a practical manner? I can't imagine a practical way to store a tensor of mixed-precision weights without ruinous reprocessing every time the weights are used.
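As a toy illustration of the parent comment's point (nothing specific to this model): storing only the nonzero entries plus their indices beats dense storage only once the matrix is mostly zeros, since every stored value drags its index arrays along with it.

```python
import numpy as np

rng = np.random.default_rng(0)
dense = rng.standard_normal((1000, 1000)).astype(np.float32)
dense[rng.random(dense.shape) < 0.95] = 0.0   # zero out ~95% of entries

# Store only the nonzero entries (a minimal COO-style layout).
rows, cols = np.nonzero(dense)
values = dense[rows, cols]

dense_bytes = dense.nbytes                                # 4 MB for float32
sparse_bytes = values.nbytes + rows.nbytes + cols.nbytes  # values + 2 index arrays
print(dense_bytes, sparse_bytes)
```

At ~95% sparsity the index overhead still leaves sparse storage well ahead, but at typical dense-network sparsity levels it would actually be larger than the dense array, which is the "you would need quite a lot of zeros" caveat.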
mangotheblackcat89 t1_jaef0kf wrote
Very interesting results. The reduction in time and cost is definitely worth checking out in more detail.
Kinferatu t1_jaedgwb wrote
AutoML has a significant limitation when it comes to time series analysis - the inherent nature of time series data makes it challenging to obtain clean validation signals that can extrapolate to test results. This issue is often overlooked, and it can lead to inaccurate predictions and unreliable results.
[deleted] t1_jaed52q wrote
[deleted]
cristianic18 t1_jaec55g wrote
Very interesting comparison. Do you know why BigQuery takes so much longer to run if it's using ARIMA?
SherbertTiny2366 t1_jaebzb0 wrote
Reply to comment by CyberPun-K in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar
From what I get, that is also the advantage of Fugue. From their webpage:
> FugueSQL is designed for heavy SQL users to extend the boundaries of traditional SQL workflows. FugueSQL allows the expression of logic for end-to-end distributed computing workflows. It can also be combined with Python code to use custom functions alongside the SQL commands. It provides a unified interface, allowing the same SQL code to run on Pandas, Dask, and Spark.
_throw_hawaii OP t1_jae9zxu wrote
Reply to comment by ForceBru in [D] Running a trained k-means clustering on new data with maximum number of iterations equal to zero or not? by _throw_hawaii
Thank you so much, so clear and helpful!!!🙏🙏🙏
ReasonablyBadass t1_jae7zhu wrote
Reply to [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
Can't read the paper right now, can someone summarize: is it a new model or "just" the standard transformer but used on multimodal data? If it is new, what are the structural changes?
More-Horse-3281 t1_jae4fj3 wrote
Reply to comment by No_Yogurtcloset_5639 in [Discussion] Open Source beats Google's AutoML for Time series by fedegarzar
GCP's AutoML is part of GCP Vertex AI.
ForceBru t1_jae3ugb wrote
Reply to comment by _throw_hawaii in [D] Running a trained k-means clustering on new data with maximum number of iterations equal to zero or not? by _throw_hawaii
Basically, k-means is an algorithm. You give it data and the number of clusters you want to find in the data. It finds these clusters and returns their centers (known as centroids) and possibly assigns each data point to a cluster. "Optimizing" a k-means algorithm doesn't make much sense, IMO. What you probably want to say is that you ran the algorithm and got some centroids.
If you run k-means with new data but tell it to use particular centroids (that you got from a previous run of k-means), then it'll use these centroids as starting points and update them to match the new data.
- Feed the algorithm some data.
- It internally chooses initial centroids. How to choose these very first centroids isn't a simple problem. They're usually chosen "randomly". For example, you can pick K distinct points from your dataset.
- K-means then does its thing and adjusts these initial centroids to best fit your data. This happens in several iterations.
- Finally, these adjusted centroids are returned.
- Now you put in new data and the centroids from the previous step.
- If the number of iterations is zero, there's nothing to be done, so the centroids remain unchanged.
- If the number of iterations is greater than zero, K-means performs these iterations and adjusts these centroids to better fit the new data.
- The new, potentially adjusted centroids are returned.
Basically, k-means will adjust the centroids you give it in such a way that these centroids define clusters that describe your data well enough. When you run k-means with no centroids, it'll generate random centroids for you.
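The zero-iteration behaviour described above can be made concrete with a small NumPy sketch (my own minimal implementation of Lloyd's algorithm, not any particular library's API):

```python
import numpy as np

def kmeans(X, centroids, max_iter):
    """Run Lloyd's algorithm starting from the given centroids."""
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points.
        new = np.array([X[labels == k].mean(axis=0)
                        for k in range(len(centroids))])
        if np.allclose(new, centroids):
            break  # converged
        centroids = new
    return centroids

# Centroids from a previous run, then some new data.
old_centroids = np.array([[0.0, 0.0], [5.0, 5.0]])
new_data = np.array([[0.5, 0.4], [0.6, 0.2], [4.0, 4.2], [4.5, 3.8]])

frozen = kmeans(new_data, old_centroids, max_iter=0)    # unchanged
updated = kmeans(new_data, old_centroids, max_iter=10)  # adjusted to new data
```

With `max_iter=0` the loop body never runs, so the old centroids come back untouched; if you still want cluster labels for the new points, you'd do the nearest-centroid assignment step (the `argmin`) on its own.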
_throw_hawaii OP t1_jae20rx wrote
Reply to comment by PredictorX1 in [D] Running a trained k-means clustering on new data with maximum number of iterations equal to zero or not? by _throw_hawaii
Yes, exactly. The maximum number of iterations is a parameter that can usually be set in library functions (in programming languages). So when I had to apply the k-means model to new data, I was told to set that number to zero.
_throw_hawaii OP t1_jae19kp wrote
Reply to comment by ForceBru in [D] Running a trained k-means clustering on new data with maximum number of iterations equal to zero or not? by _throw_hawaii
Yes, sorry, you're right. I meant that the k-means was originally applied (and optimized) to an initial dataset. Then those data have been updated, but the structure of the model has to stay the same (except for some parameters in the code).
CyberPun-K t1_jae0m46 wrote
While AutoML is a powerful tool for automated machine learning, it's not widely used by most people. Personally, I wouldn't pay thousands of dollars for fancy hyperparameter optimization. In most cases improvements are marginal.
One of the cool features of Big Query is its seamless integration with SQL queries, which makes data analysis much easier.
lateautumntear t1_jadykla wrote
Reply to comment by AvailablePresent1113 in [D] CVPR Rebuttal scores are out! by ElPelana
Maybe giving 1-2 reviews to do instead of 7 would help.
More-Horse-3281 t1_jadyg0x wrote
I have no experience with GCP AutoML, but I have experienced heavy overfitting when using FLAML and auto-sklearn. Did you experience the same? (I.e. AutoML outperforming the open source algos on training data?) I have the feeling that a lot of AutoML solutions „cherry-pick“ models that just happened to shine on the training data.
No_Yogurtcloset_5639 t1_jadxmg7 wrote
What about Vertex AI? Is it any better?
PredictorX1 t1_jadudqt wrote
Reply to [D] Running a trained k-means clustering on new data with maximum number of iterations equal to zero or not? by _throw_hawaii
The result of k-means clustering is a set of cluster centers. Usually, I would think "running" it over new data would mean assigning each observation in the new set to one of those clusters. I'm not sure what the rest of your question is getting at.
ForceBru t1_jaduboq wrote
Reply to [D] Running a trained k-means clustering on new data with maximum number of iterations equal to zero or not? by _throw_hawaii
What do you mean by "trained k-means algorithm"? K-means is an algorithm, there's nothing to "train" there. I guess you could fine-tune the number of iterations and the number of clusters somehow. Is this what you mean?
What do you mean by "training seeds"? Are these cluster centroids obtained after clustering training data?
bluebolt789 t1_jadtupr wrote
Reply to comment by Donno_Nemore in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]
Yes, that’s what I was thinking too.
Thank you for your input!
Donno_Nemore t1_jaepw86 wrote
Reply to comment by _throw_hawaii in [D] Running a trained k-means clustering on new data with maximum number of iterations equal to zero or not? by _throw_hawaii
Using known centroids should be more stable. K-means initialization of the starting clusters can be stochastic, so the result for the same data can vary between runs.