Quantum cluster algorithm for data classification

We present a quantum algorithm for data classification based on the nearest-neighbor learning algorithm. The classification algorithm is divided into two steps. First, data in the same class are divided into smaller groups with sublabels, which assist in building boundaries between data with different labels. Second, we construct a quantum circuit for classification that contains multi-control gates. The algorithm is easy to implement and efficient in predicting the labels of test data. To illustrate the power and efficiency of this approach, we construct the phase transition diagram for the metal-insulator transition of VO$_2$ using a limited set of experimental training data. VO$_2$ is a typical strongly correlated electron material, and its metal-insulator phase transition has drawn much attention in condensed matter physics. Moreover, we demonstrate our algorithm on the classification of randomly generated data and on the classification of entanglement for various Werner states, where the training sets cannot be divided by a single curve; instead, more than one curve is required to separate them perfectly. Our preliminary results show considerable potential for various classification problems, particularly for constructing different phases in materials.


Introduction
Machine learning techniques have demonstrated remarkable success in numerous topics in science and engineering, including artificial intelligence [1,2], molecular dynamics [3], light-harvesting systems [4], molecular electronic properties [5], surface reaction networks [6], density functional models [7], phase classification, and quantum simulations [8,9,10,11,12,13,14]. In addition, modern machine learning techniques have been applied to the state space of complex condensed-matter systems for their ability to analyze exponentially large data sets [9], to speed up searches for novel energy generation and storage materials [15,16], and to classify entanglement [17]. With the rapid development of quantum computers [18,19,20,21,22], recognizing patterns with quantum computers has become a new frontier. Considering recent advancements in both quantum computing and machine learning, the combination of the two techniques, quantum machine learning, is expected to be a promising application of quantum computers in the near future. Many quantum machine learning algorithms have been proposed in the past few years [23,24,25,26,27]. Moreover, researchers have succeeded in applying quantum machine learning algorithms to various systems such as superconducting circuits [28] and photonic systems [29], which has led to enormous enthusiasm for applying quantum algorithms in various areas [30,31,32,33,34,35].
There is no doubt that we are now in the age of big data, and there is an urgent need for quantum algorithms that perform machine learning tasks on large-scale scientific datasets for industrial and technological applications based on optimization. As a proof of concept, Du and coworkers have successfully distinguished the handwritten digits '6' and '9' with a quantum support vector machine [36]. However, it can be difficult to deal with more challenging problems, especially when the training data cannot be divided by a single curve and more than one curve, or even closed curves, are required to separate them. Another remarkable development is the application of quantum machine learning to variational circuits [37,38], which in theory can always classify data with a complex distribution. Yet these algorithms generally rely heavily on gradient-based systematic optimization of parameters [39,40]. On the other hand, the quantum nearest-neighbor algorithm [41] offers another option for classifying data without a gradient-based optimization process. In brief, the core of the nearest-neighbor classification algorithm is to assign the training vectors to classes such that the vectors in each class are close to each other.
In this study, we propose a quantum classification algorithm with which we can build a quantum circuit that classifies artificially generated data, where all parameters in the circuit are obtained without relying on a gradient-based optimization process. For this purpose, we introduce 'sublabels' to assist in classifying data with an intricate distribution. A 'sublabel' is a minor label subordinate to the main one; it is also called a 'subclass'. There are two main tasks in our algorithm: how to find the appropriate sublabels, and how to build the quantum classification circuit with these sublabels. Using numerical simulation, we demonstrate the application to various classification problems, especially to constructing different phases of materials. First, in section 1, we present the basic elements of the algorithm. Then, we apply the algorithm to several systems: classification of the metallic and insulating phases in the phase diagram of VO$_2$, classification of entanglement in Werner states, and classification of randomly generated data. Finally, we present a scaling analysis and discuss the generalization of the quantum classification algorithm to higher-dimensional spaces. In addition, the supplementary materials give the details of the quantum classification algorithm with examples.

Algorithm design
Consider the training data set $\{x_i, y_i\}$, where $x_i$ is a vector in $\mathbb{R}^d$, $d$ is the dimension, and $y_i$ is the label with possible values $\{l_1, l_2, \ldots, l_M\}$. Our goal is to build a quantum classification circuit that can be used to predict the label of new vectors $\{x_t\}$. The classification algorithm is divided into two steps. The first step is a learning process, in which one needs to find the "sublabels" for each class of the training data. Then, based on the information obtained in this learning process, we construct a quantum classification circuit that contains multi-control quantum gates.
In the learning process, we first apply Lloyd's algorithm [42] for unsupervised machine learning, which assigns each training vector to the class with the closest mean vector. However, the results of Lloyd's algorithm cannot be used directly, as there might be redundant sublabels, or not enough sublabels to reconstruct the initial distribution. To address this issue, in addition to the clustering algorithm, we propose two adjusting algorithms: one to reduce excessive sublabels, and the other to make sure that there is no overlap between sublabels.
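The initial clustering step can be illustrated with a small classical simulation. The sketch below is not Algorithm (S1) itself, whose listing is in the supplementary materials; it is a simplified single-pass variant that assumes the single-qubit encoding $\cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$, uses the squared inner product as the closeness measure, and opens a new sublabel whenever no centroid exceeds a threshold `D`. The names `qubit_state`, `overlap` and `cluster_one_label` are hypothetical.

```python
import numpy as np

def qubit_state(theta, phi):
    """Single-qubit state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def overlap(a, b):
    """Squared inner product |<psi(a)|psi(b)>|^2 of two (theta, phi) pairs."""
    return abs(np.vdot(qubit_state(*a), qubit_state(*b))) ** 2

def cluster_one_label(angles, D=0.99):
    """Greedy threshold clustering of vectors that share one prior label.

    Each subgroup is stored as [theta_m, phi_m, N]: the mean angles of the
    subgroup and the number of vectors assigned to it.
    """
    groups = []
    for point in angles:
        # Look for the closest existing centroid whose overlap exceeds D.
        best, best_ov = None, D
        for g in groups:
            ov = overlap(point, (g[0], g[1]))
            if ov > best_ov:
                best, best_ov = g, ov
        if best is None:
            groups.append([point[0], point[1], 1])  # open a new sublabel
        else:
            n = best[2]
            best[0] += (point[0] - best[0]) / (n + 1)  # running mean of theta
            best[1] += (point[1] - best[1]) / (n + 1)  # running mean of phi
            best[2] = n + 1
    return groups

# Illustrative input only: two tight groups of angles that are far apart.
groups = cluster_one_label([(0.10, 0.0), (0.12, 0.0), (3.00, 0.0), (3.05, 0.0)])
```

A larger `D` produces more and smaller subgroups, which is why a redundancy-reduction pass is needed afterwards.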
For each sublabel, we need to store the information and build a quantum circuit to estimate the inner product, as shown in Fig. (1). When the $x_i$ are vectors in two-dimensional space, each data point (or sublabel) can be represented by a single qubit. An arbitrary state of a single qubit can be written as

$$|\psi(\theta,\phi)\rangle = e^{i\alpha}\left(\cos\frac{\theta}{2}\,|0\rangle + e^{i\phi}\sin\frac{\theta}{2}\,|1\rangle\right),$$

where $\alpha$ is the global phase, and the vector $x_i = (x_{1,i}, x_{2,i})$ is mapped as

$$\theta_i = 2\pi\,\frac{x_{1,i}-\min\{x_1\}}{\max\{x_1\}-\min\{x_1\}},\qquad \phi_i = 2\pi\,\frac{x_{2,i}-\min\{x_2\}}{\max\{x_2\}-\min\{x_2\}}.$$

Here $\max\{x_1\}$ and $\max\{x_2\}$ are the maximum values of all $x_{1,i}$ and $x_{2,i}$ respectively, while $\min\{x_1\}$ and $\min\{x_2\}$ are the corresponding minimum values. We then need a measure of the distance between two states, where the 'distance' might be the Euclidean distance between the two vectors [41] or the inner product of their two corresponding quantum states.
Here, we choose to calculate the inner product, as calculating the Euclidean distance is more time- and resource-consuming. An arbitrary state of a single qubit, as shown in Fig. (1), can be prepared by three rotational gates. The circuit that estimates the inner product between $|\psi(\theta_1,\phi_1)\rangle$ and $|\psi(\theta_2,\phi_2)\rangle$ therefore contains six rotational gates, as shown in Fig. (1). After a measurement of the final state in the Z-basis, the probability of obtaining the state $|0\rangle$ is an estimate of the inner product $\langle\psi(\theta_1,\phi_1)|\psi(\theta_2,\phi_2)\rangle$.

Fig. 1: Sketch of the quantum circuit for estimating the inner product. The circuit that estimates the inner product of two-dimensional vectors contains six rotational gates.

Additionally, we need a memory to store the information of this group, which can be represented by $\theta_m$, $\phi_m$, and an integer $N$, the total number of data points in this group.
Moreover, for every sublabel we need to store two floating-point numbers, $\theta_m$ and $\phi_m$, that represent the centroid vector of the subgroup, and an integer $N$ that gives the total number of data points in the subgroup.
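The encoding and the inner-product estimate can be checked numerically. The following is a minimal sketch under stated assumptions: each coordinate is rescaled linearly to an angle in $[0, 2\pi]$ using the minimum and maximum of that coordinate, and the quantity returned by `overlap` is the squared magnitude of the inner product, which is what a measurement probability can estimate. The function names and the sample data are hypothetical, and the rescaling assumes that each coordinate takes at least two distinct values.

```python
import numpy as np

def to_angles(X):
    """Rescale each column of X linearly to [0, 2*pi] (min -> 0, max -> 2*pi)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return 2 * np.pi * (X - lo) / (hi - lo)

def qubit_state(theta, phi):
    """Single-qubit state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def overlap(a, b):
    """|<psi(a)|psi(b)>|^2 for two (theta, phi) pairs."""
    return abs(np.vdot(qubit_state(*a), qubit_state(*b))) ** 2

# Illustrative two-dimensional data: columns are x_1 and x_2.
angles = to_angles([[0.0, 10.0], [1.0, 20.0], [4.0, 30.0]])

p_same = overlap(angles[1], angles[1])        # identical states
p_far = overlap((0.0, 0.0), (np.pi, 0.0))     # |0> versus |1>
```

Only the centroid angles and the count per subgroup need to be kept after this step, which matches the memory requirement described above.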
In the learning process, three basic algorithms are applied to obtain "sublabels" from the given training data. Algorithm (S1) is designed for an initial clustering of the training data. Algorithm (S1) follows Lloyd's algorithm [42], in which each vector is assigned to the cluster with the closest mean and the centroids of the new clusters are then recalculated. Algorithm (S1) divides the training data with the same prior label into several subgroups. Algorithm (S2) reduces redundancy, and Algorithm (S3) makes sure that there is no overlap between any two remaining sublabels of different prior labels. The goal is to keep only the minimal set of sublabels without losing important information about the training data. After applying Algorithm (S1) and repeating Algorithm (S2) and Algorithm (S3) a number of times, we obtain a minimal set of sublabels together with the centroid vector of each subgroup (details of these three algorithms are in the supplementary materials).

The next step is to build the quantum classification circuit from the information obtained above. Consider sublabel-controlled operations $U(L_i)$, which are chosen with the aim of reaching the final state $|\Psi^f_{L_i}\rangle$. If one wants to set all vectors that belong to the prior label $L_i$ close to the final state $|\Psi^f_{L_i}\rangle$, then $U(L_i)$ is chosen accordingly. To classify a test vector with an unknown label, we prefer to rely on a single quantum classification circuit instead of repeatedly comparing inner products with the training data. The basic and intuitive idea is to measure the inner product between the new vector and the centroid vectors of all subgroups. The classification circuit consists of two parts: the control qubits, which represent the sublabels, and the remaining qubits, which represent the given new vector. Fig. (2) is a sketch of the structure of the main circuit. First, one maps the test data $x_t$ into the prepared circuit, where the $|l_i\rangle$ are the orthogonal eigenstates of the control qubits that represent the sublabels $l_i$, and there are $n$ sublabels in total. In the mapping process, we apply Hadamard gates to the L-qubits (representing the sublabels) and the operator $T(x_t)$ to the V-qubits (representing the test data), where $|\psi(\theta,\phi)\rangle = T(x_t)|0\rangle$. In the quantum classification circuit, $N$ qubits are used to represent the sublabels, and there are $n$ sublabels in total. By applying a measurement with eigenstate $|l_k\rangle|\Psi^f_{L_k}\rangle$, the inner product $\langle\psi(-\theta^{l_k}_m, -\phi^{l_k}_m)|\psi(\theta,\phi)\rangle$ can be estimated. Fig. (2) is a sketch of the main circuit, where $T(x_t)$ represents the given test data, $l$ is the sublabel and $L$ is the prior label.

Fig. 2: Sketch of the main circuit. Qubits in the main circuit are divided into two groups, both initialized in $|0\rangle$: the L-qubits, which represent the sublabels and play the role of control bits, and the V-qubits, which represent a given vector. Initially, the L-qubits are prepared in a uniform superposition state by Hadamard gates, where $N$ qubits represent the sublabels. The minimum is $N = \lceil \log_2 n \rceil$, where $n$ is the total number of sublabels. In total, $n$ controlled rotation gates are needed.
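The action of the main circuit can be reproduced with a few lines of linear algebra. The sketch below is a classical statevector illustration for a single data qubit, not the authors' implementation. It assumes a uniform superposition on the label register, takes the final state to be $|0\rangle$, and uses as the controlled gate for sublabel $k$ the inverse of the preparation unitary of its centroid. The helper names `T` and `classify`, the centroid angles and the labels are all hypothetical.

```python
import numpy as np

def T(theta, phi):
    """Unitary with T|0> = cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -np.exp(-1j * phi) * s],
                     [np.exp(1j * phi) * s, c]])

def classify(test, centroids, prior_labels):
    """Return the predicted prior label and the joint probabilities P(k, outcome)."""
    K = len(centroids)
    N = max(1, (K - 1).bit_length())   # label qubits, ceil(log2 K)
    dim = 2 ** N
    data = T(*test)[:, 0]              # data-qubit state T(x_t)|0>
    joint = np.zeros((dim, 2))
    for k in range(dim):
        # Assumed controlled gate: U_k = T(centroid_k)^dagger sends the centroid
        # state to |0>. Label states without a sublabel leave the data untouched.
        U = T(*centroids[k]).conj().T if k < K else np.eye(2)
        joint[k] = np.abs(U @ data) ** 2 / dim   # uniform weight 1/2^N per label
    best = int(np.argmax(joint[:K, 0]))
    return prior_labels[best], joint

# Illustrative centroids (theta_m, phi_m) and their prior labels.
centroids = [(0.5, 0.2), (2.5, 1.0), (1.5, 3.0)]
labels = ["red", "blue", "blue"]
predicted, joint = classify((0.55, 0.25), centroids, labels)
```

The probability of finding label state $k$ together with the data qubit in $|0\rangle$ is proportional to the squared overlap between the test state and centroid $k$, so the largest entry identifies the nearest subgroup.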
When predicting the label of a new test vector, we assume that every possible sublabel has the same probability. Based on this assumption, we apply $N$ Hadamard gates to the label qubits, preparing them in a uniform superposition state. In fact, one can always adjust the probabilities of the sublabels based on the training result and keep only the states that represent 'valid' sublabels. Assume that we finally derive $K$ sublabels and that $N$ qubits are used as label qubits, where $2^{N-1} < K \le 2^N$. For a training data set with two labels $\{x_i, y_i\}$, $y_i \in \{0, 1\}$, $K_0$ sublabels are assigned the label $y = 0$, while the other $K_1$ sublabels are assigned $y = 1$, with $K_0 + K_1 = K$. The Hadamard gates $H^{\otimes N}$ can then be replaced by an operation that prepares the label qubits in a state with weights $p_n \ge 0$. For sublabels assigned the label $y = 0$, the first label qubit is set to the state $|0\rangle$, and for the others the first qubit is set to $|1\rangle$. For new test data, one only needs to measure the first label qubit and the data qubits. By comparing $P(q_1 = |y\rangle, V = |\Psi^f_y\rangle)$ for $y \in \{0, 1\}$, the new test vector is assigned the label that corresponds to the larger $P$, where $P(q_1 = |y\rangle, V = |\Psi^f_y\rangle)$ is the probability of finding the first label qubit $q_1$ in the state $|y\rangle$ and, at the same time, the data qubits in the state $|\Psi^f_y\rangle$. The $p_n$ are chosen to maximize an objective in which $\lambda \ge 0$ is the penalty.
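The condition $2^{N-1} < K \le 2^N$ fixes the number of label qubits once the number of sublabels is known. The short sketch below evaluates it for the sublabel counts reported in this paper; the function names are hypothetical, and the number of data qubits is passed in as stated in each example.

```python
def label_qubits(K):
    """Smallest N with K <= 2**N (and at least one qubit)."""
    if K < 1:
        raise ValueError("need at least one sublabel")
    return max(1, (K - 1).bit_length())

def circuit_qubits(K, data_qubits):
    """Total qubits: label register plus the qubits that encode the vector."""
    return label_qubits(K) + data_qubits

# Sublabel counts quoted in the text, with the stated number of data qubits.
vo2 = circuit_qubits(7 + 8, 1)         # VO2: 15 sublabels, 1 data qubit
random_data = circuit_qubits(54, 1)    # random data: 54 sublabels
werner_a = circuit_qubits(32, 2)       # Werner states, panel (a)
werner_b = circuit_qubits(64, 2)       # Werner states, panels (b) and (c)
```

Using integer bit lengths avoids floating-point rounding in the logarithm when $K$ is an exact power of two.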

Classification of metallic-insulating phases of vanadium dioxide
Strongly correlated electron materials and their phase transitions have attracted great interest, both for device applications and in condensed matter physics [43,44]. Recently, the metal-insulator phase transition of vanadium dioxide (VO$_2$), a prototype of strongly correlated electronic materials, has attracted experimental and theoretical attention for its distinct structures and electronic properties [45,46].
Here, we apply the developed quantum classification algorithm to distinguish the metallic state from the insulating state of VO$_2$. The data used in this section are based on the experimental results reported in Ref. [47]. VO$_2$ exhibits several distinct structures under different temperatures and pressures. Fig. (3) shows the initial clustering results after applying Algorithm (S1) once. In Algorithm (S1) one must choose the parameter $D \in (0, 1]$, which is the minimum inner product between the centroid vector and any vector with the same sublabel. In other words, a vector can be assigned to a certain sublabel only when the inner product between itself and the centroid vector of that sublabel is larger than $D$. Here we set $D = 0.99$. We then repeat Algorithm (S2) and Algorithm (S3) several times to reduce the excessive sublabels. In the resulting panels of Fig. (3), blue and red spheres represent data with the same sublabel, where the center of a sphere represents the average vector and its radius represents the number of vectors that belong to that sublabel. In Fig. (3(d)), we show the prediction results for arbitrary given data. States in the yellow regions are predicted to be metallic, and those in the blue regions are predicted to be insulating. In the training process, we used 100 data vectors for the metallic states and 100 data vectors for the insulating states, out of a total of 1100 data vectors.
The simulation results show that our quantum algorithm can efficiently classify the metallic and insulating states of VO$_2$. In the end we need 7 sublabels for the insulating states and 8 sublabels for the metallic states, which means that the classification circuit consists of 5 qubits (4 control qubits for the sublabels and 1 for the data).
It is important to note that when the pressure ($P$) or the temperature ($T$) is small, the prediction can be incorrect. The error appears because few vectors in these areas were used in the training process. Moreover, when we convert a vector $(P, T)$ into a quantum state, we map it to angles $\theta, \phi \in [0, 2\pi)$. Although classically $T = 0\,^\circ$C and $T = 120\,^\circ$C are extremely different, once they are converted to angles, $\theta = 0$ and $\theta = 2\pi$ are nearly the same. One might notice a slim yellow line around $P = 20$ GPa in our prediction. Ref. [47] reports a structural transition between the states $M_1$ and $M_1'$. However, this is not a metal-insulator transition. In our classification algorithm, we did not expect to predict this transition, and the slim line shows up coincidentally. To get better prediction results, one option is to map both the training data and the test data into a different range of angles. However, here we focus neither on the performance at low temperature nor on the phase transition near $P = 20$ GPa. As a result, mapping the data into $[0, 2\pi] \times [0, 2\pi]$ is still an acceptable choice. More simulation results for the phase transition with different training data are given in the supplementary materials.

In the prediction panel of Fig. (3), a new vector in the blue region is recognized with the label 'insulating', and a new vector in the yellow region is predicted as 'metallic'. Blue and red dots are the initial data.
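The wrap-around effect described above is easy to verify numerically. The sketch below assumes the single-qubit encoding $\cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$ and a linear rescaling of temperature to an angle; the alternative range $[0, \pi]$ is only one hypothetical choice used for comparison, not necessarily the mapping intended in the text. All function names are illustrative.

```python
import numpy as np

def qubit_state(theta, phi):
    """Single-qubit state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>."""
    return np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

def overlap(a, b):
    """|<psi(a)|psi(b)>|^2 for two (theta, phi) pairs."""
    return abs(np.vdot(qubit_state(*a), qubit_state(*b))) ** 2

def to_angle(value, lo, hi, span):
    """Linear map of value in [lo, hi] to an angle in [0, span]."""
    return span * (value - lo) / (hi - lo)

T_MIN, T_MAX = 0.0, 120.0   # temperature range in deg C, as quoted in the text

# Full range [0, 2*pi]: the two extreme temperatures give the same state
# up to a global phase, so their overlap is 1.
full = overlap((to_angle(T_MIN, T_MIN, T_MAX, 2 * np.pi), 0.0),
               (to_angle(T_MAX, T_MIN, T_MAX, 2 * np.pi), 0.0))

# Hypothetical range [0, pi]: the extremes become orthogonal states.
half = overlap((to_angle(T_MIN, T_MIN, T_MAX, np.pi), 0.0),
               (to_angle(T_MAX, T_MIN, T_MAX, np.pi), 0.0))
```

With the full range, points at the two ends of the temperature axis are indistinguishable to the inner-product test, which is consistent with the incorrect predictions at low temperature.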

Classification of randomly generated data
Here we show another classification example, in which the distribution of the training data is generated artificially at random. Unlike the VO$_2$ example, here the two groups cannot be divided by a simple single boundary. We generated 1100 red points and 1100 blue points at random, from which 100 red points and 100 blue points were picked at random as training data; the others are used as test data. All test points are shown in Fig. (4). The distribution of the initial data makes it more challenging to classify the test data. After an appropriate learning process, 54 sublabels (22 for red and 32 for blue) are obtained from the training data. A classification circuit can then be built with 7 qubits, 6 as control qubits and 1 for the given data. The predicted labels of the test data are shown in Fig. (4(c)), where light blue dots are test points predicted as 'blue', yellow dots are test points predicted as 'red', and red and blue crosses represent the training data. In total, 878 red test points and 836 blue test points are classified correctly, so the matching rates for red and blue points are 87.8% and 83.6% respectively (the training data are excluded when calculating the matching rate).

Entanglement classification in Werner states
Further, our method can also be applied to entanglement classification. Consider the following scenario: Alice wants to send a message to Bob, in which she sends photon pairs in entangled states as digit 1 and photon pairs in unentangled states as digit 0. Initially, Alice sends some photon pairs in various states for training and informs Bob which pairs represent 1 and which represent 0. Later she uses the photon-pair states for communication. The widely used CHSH inequality [48] can detect entanglement, since a violation of the CHSH inequality guarantees the existence of entanglement; however, no conclusion can be drawn if the inequality is not violated. To address this issue, consider Werner states in density matrix form:

$$\rho_W(p,\phi) = p\,|\Psi_B(\phi)\rangle\langle\Psi_B(\phi)| + \frac{1-p}{4}\,I,$$

where $I$ is the $4 \times 4$ identity matrix, the parameter satisfies $0 < p < 1$, and $|\Psi_B(\phi)\rangle$ is a Bell state parameterized by $\phi$. Assume that Alice uses Werner states to transport information, while Bob carries out the Bell test experiment with the following measurements: $Z$, $X$; $\frac{Z+X}{\sqrt{2}}$, and $\frac{Z-X}{\sqrt{2}}$. From the measurement results, Bob calculates four correlation functions $E$, one for each pair of settings, from the coincidence counts, where $N_{++}$ is the number of photon pairs whose measurement results are both $+1$ in the two channels. If Alice sets $\phi = 0, \pi$, Bob observes a violation of the CHSH inequality for $p > \frac{1}{\sqrt{2}}$. If Alice sets $\phi = \pm\frac{\pi}{2}$, Bob can never observe a violation of the CHSH inequality. However, $\rho_W(p, \phi)$ is an entangled state when $p > \frac{1}{3}$. Consequently, the CHSH inequality is not a good classifier for Bob. Instead, if Bob sets up machine learning based on a neural network, he can 'decode' Alice's information with a much higher matching rate [17]. Here, we show that our quantum classification algorithm can classify entangled states among such Werner states. We take as input the 4-dimensional vectors formed by $E(Z, \frac{Z+X}{\sqrt{2}})$, $E(Z, \frac{Z-X}{\sqrt{2}})$, $E(X, \frac{Z+X}{\sqrt{2}})$, and $E(X, \frac{Z-X}{\sqrt{2}})$, calculated from the measurement results. By changing $\phi$ and $p$ in Eq. (9), we can generate different entangled or unentangled states. We prepared 200 entangled states and 200 unentangled states as the test data. Moreover, we generated several different training data sets; in each set, half of the states are entangled and the other half are unentangled. After learning from the different training sets, we build quantum classification circuits to detect entanglement in the test data, and the simulation results are shown in Fig. (5). In (a), the training set contains only 32 points, and we keep them all as sublabels, so there are 7 qubits in the classification circuit (5 for sublabels and 2 for test data); 12 points are predicted with the wrong label. In (b), the training set contains 64 points, all of which are kept as sublabels, and there are 8 qubits in the classification circuit; 8 points are predicted with the wrong label. In (c), there are 128 points in the learning set and we derive 64 sublabels. As in (b), we need 8 qubits to build the classification circuit, and only 6 points are predicted with the wrong label. However, in (c) we do not plot the sublabels, because our classification algorithm yields only the parameters $E$ for every sublabel rather than $r, \phi$. In these figures the points $(r, \phi)$ represent Werner states. Notice that $r, \phi$ are not used in the learning or classification process, as Bob does not know the exact $r, \phi$ either. The supplementary materials provide details of the simulation.
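The quoted thresholds can be checked with a small numerical sketch. It assumes the standard Werner form $p\,|\Psi_B\rangle\langle\Psi_B| + (1-p)I/4$ and, as a labeled assumption, the Bell state $(|00\rangle + e^{i\phi}|11\rangle)/\sqrt{2}$, since the explicit form is not given here. Correlation functions are computed as exact expectation values rather than from photon counts, the CHSH value is maximized over the placement of the minus sign, and entanglement is tested with the partial-transpose criterion. All function names are hypothetical.

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def werner(p, phi):
    """Werner state built on the assumed Bell state (|00> + e^{i phi}|11>)/sqrt(2)."""
    bell = np.array([1.0, 0.0, 0.0, np.exp(1j * phi)]) / np.sqrt(2)
    return p * np.outer(bell, bell.conj()) + (1 - p) * np.eye(4) / 4

def features(rho):
    """Four correlation functions E(A, B) = Tr[rho (A x B)] used as input vector."""
    alice = [Z, X]
    bob = [(Z + X) / np.sqrt(2), (Z - X) / np.sqrt(2)]
    return np.array([np.real(np.trace(rho @ np.kron(a, b)))
                     for a in alice for b in bob])

def chsh(E):
    """Largest CHSH value over the four placements of the minus sign."""
    return max(abs(E.sum() - 2 * E[i]) for i in range(4))

def is_entangled(rho):
    """Partial-transpose test: a negative eigenvalue signals entanglement."""
    pt = rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
    return np.linalg.eigvalsh(pt).min() < -1e-12

s_violating = chsh(features(werner(0.8, 0.0)))        # p > 1/sqrt(2), phi = 0
s_silent = chsh(features(werner(0.99, np.pi / 2)))    # phi = pi/2
```

Under these assumptions, a state such as `werner(0.4, np.pi / 2)` is entangled but gives no CHSH violation with the fixed settings, which is the regime where a learned classifier on the four correlation functions is useful.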
In the above discussion we assume that, in the communication between Alice and Bob, all measurement results are discrete, so that one can easily calculate the parameters $E$. However, in chemical reactions the measurement results are often continuous, and one obtains particular distributions after measurement. Recently, Zare and coworkers [49] reported the rotationally inelastic collisions of deuterium hydride (HD, prepared in certain quantum states) with H$_2$ and D$_2$ at extremely low temperature (mean collision energy around 1 K), and they found that the orientation of the HD molecules leads to different distributions of the scattering angle [49]. If the scattering experiment is used as the measurement to detect entanglement, it would be nearly impossible to derive information about entanglement directly from the intricate raw data. In such situations, special methods, as we discussed in Ref. [50], are required. With the assistance of auxiliary functions it is possible to obtain the parameters $E$ from raw measurement results, after which we can build a quantum circuit for classification by the same procedure.

Further Discussions
So far, our classification has been restricted to two- and four-dimensional vectors. Here, we discuss how to use it to classify vectors in higher-dimensional spaces. Depending on the connectivity of the qubits, two different mapping methods can be used.

Mapping method I: An arbitrary quantum state of $N$ qubits can be described by its amplitudes $c_i$, which can be rewritten as functions of a parameter set $\Theta$. Then there exists a mapping operator $T(\Theta)$, with $|\Psi(\Theta)\rangle = T(\Theta)|0\rangle$, by which one can map a vector in $(2^N - 1)$-dimensional space into a quantum state of $N$ qubits. For this mapping method, the main structure of the circuit is still like the one shown in Fig. (2). If all qubits in the main circuit are connected with each other, so that we can build arbitrary quantum gates between any qubits in the main circuit, then we can apply mapping method I. However, the connectivity of the machine sometimes does not meet this demand, in which case the second mapping method might be more suitable.

Mapping method II: An arbitrary unentangled (product) quantum state of $N$ qubits is described by the states of the individual qubits, and we can map the vector into such an unentangled quantum state. Method II requires a circuit in which the qubits that represent the sublabels are connected with all the qubits that represent our vector, while connections between the 'data qubits' are not required. A sketch of the main circuit using method II can be found in Fig. (6).

Fig. 6: Sketch of the main circuit for Method II. Qubits in the main circuit are divided into two groups. One group represents the sublabels and plays the role of control bits (the L part in the figure). The other represents the given vector (the V part in the circuit). Furthermore, the qubits that represent the vectors are divided into a few groups (2 groups in this figure), and the sublabel qubits control them separately. We need to measure all of them to get our results. Connections between the V qubits are not required in this circuit.
For the complexity analysis, let us assume that $M$ $d$-dimensional vectors are given as training data, so that $\log_2 d$ qubits are required to represent the vectors. When measuring the inner product of two vectors, $O(\exp(\log_2 d))$ measurements are required. In the learning process that obtains the sublabels, as discussed in Sec. I, we need to repeat the inner-product calculation $O(M^2)$ times. In total, the time complexity of obtaining the sublabels is $O(M^2 d)$. Assume that we finally obtain $L$ sublabels; then $\log_2 L$ qubits are required to represent the sublabels, and the quantum circuit that predicts the labels of test data contains $L$ multi-control gates. If we prepare the label qubits in a uniform superposition state, then all qubits must be measured at the end. For the label qubits, $O(\exp(\log_2 L))$ measurements are required, while for the data qubits $O(\exp(\log_2 d))$ measurements are required. As a result, we need to repeat the measurement $O(Ld)$ times. However, if we prepare the label qubits in the state of Eq. (7), then we only need to measure the first label qubit, and the required number of measurements is $O(d)$.

Conclusion
In summary, we developed a quantum classification algorithm in which the training data are first clustered and assigned various sublabels, and the quantum circuit for classification is then built from these sublabels. We applied this method to classify the metal-insulator transition in VO$_2$, to detect entanglement in Werner states, and to classify randomly generated data. The numerical simulation results show that our algorithm is capable of handling various classification problems, especially the study of phase transitions in materials.
Algorithm S2: Sketch of the basic algorithm to reduce redundancy (its inner loop runs over all other sublabels with the same prior label).

One might notice that in Algorithm (S2) we did not consider the overlap between subgroups with different sublabels. In theory, one could always avoid such overlap by choosing a strict threshold $D$, and therefore small subgroups, at the start. If we choose $D = 1$ at the start, there is no overlap between these subgroups. However, to keep the algorithm efficient, the threshold should not be too strict at the beginning. Here we recommend applying an additional algorithm to deal with subgroups that have different sublabels, as shown in Algorithm (S3), which shares some similarity with Algorithm (S2). In fact, one could combine them to reduce redundancy and overlap at the same time. The fundamental idea of Algorithm (S3) is to divide a subgroup that overlaps with others into a few smaller ones. Moreover, one could always rewrite line 15 in terms of a function $f(d^{l_j}_i)$. If $f(d^{l_j}_i)$ is too close to 0, we might need to repeat Algorithm (S3) many times to make sure that subgroups with different prior labels are separated, while $f(d^{l_j}_i)$ being too close to 1 leads to redundancy once again.
Algorithm S3: Sketch of the basic algorithm to reduce overlap between groups with different prior labels (choose an arbitrary vector in the overlapping sublabel as the new average $\theta_{m,i}, \phi_{m,i}$; loop over the other data with this sublabel; data that belong to a new sublabel are stored in a new device $T + 1$).

After repeating Algorithms (S2, S3) several times, all the training vectors are divided into several subgroups with unique sublabels, and redundancy is reduced to a minimum while subgroups with different prior labels remain distinguishable. We have also calculated their centroid vectors, which are stored in the devices as $\theta^{l_j}_m, \phi^{l_j}_m$. The next step is to build the classification circuit from this information, as discussed in the main article.

Detailed example of the quantum classification algorithm
In this section, we give an example to demonstrate how Algorithms (S1, S2, S3) work. To get the training data, we generated some random vectors in 2-dimensional space. These vectors are labeled 'red' or 'blue', and the color is the prior label. The distribution of these data is shown in Fig. (S1a). Note that the red ones can be divided into a few subgroups, and so can the blue ones. First, Algorithm (S1) is applied; as shown in Fig. (S1b), the training data are divided into a great number of small subgroups. Redundancy appears because we choose a large $D = 0.9$. To reduce the excessive sublabels, the next step is to repeat Algorithms (S2, S3) several times, after which we get the results shown in Fig. (S1c). Now only five sublabels are left. In Fig. (S1b,c), we use blue or red spheres to represent data in different subgroups, where the center of a sphere represents the average vector of a subgroup and the radius of the sphere represents the number of vectors in the subgroup.
In Fig. (S1c), we notice that the number of sublabels cannot be reduced further, and at least 5 sublabels are required to describe the distribution of the training data. As $2^2 < 5 < 2^3$, 3 qubits are required as "label qubits", which play the role of control qubits. In this case all training vectors are 2-dimensional, so we need one qubit to represent a given vector. Since the training vectors are finally divided into 5 subgroups, the main circuit contains 5 "Control-Control-Control-U" (CCCU) gates. Based on the remaining sublabels and their centroid vectors, the classification circuit for new vectors can be built as in Fig. (S2).
Fig. S2: Sketch of the circuit for classification. $T(x_t)$ converts the state $|0\rangle$ into a quantum state that represents the given test data $x_t$. $q_{1,2,3}$ are label qubits, and $q_4$ represents the given vector. The $U$ are controlled single-qubit gates, and their parameters are derived in the previous steps.
If one tries to simulate this main circuit on an IBMQ 5-qubit machine, an auxiliary qubit is required, as not all qubits in IBM's machine are connected with each other. Assume that we can build arbitrary gates among $q_{1,2,3}$ or among $q_{3,4,5}$, while $q_{1,2}$ and $q_{4,5}$ are not directly connected with each other; then Fig. (S3) is a sketch of the 'CCCU' gates.
Fig. S3: Structure of the 'CCCU' gate in the main circuit. Here we show how to build the 'CCCU' gate (specifically, CCCU$_{010}$) in the main circuit from basic quantum gates; the Toffoli gates can in turn be decomposed into a few CNOT gates and T gates ($\pi/8$ gates). As not all qubits in the IBM Q machine are connected with each other, we introduce an auxiliary qubit ($q_3$) to help build the gate. $q_1$, $q_2$ and $q_4$ represent the sublabel, and $q_5$ represents the given vector. $V$ is derived from the operator $U_{010}$, with $U_{010} = V^2$. The X gates are used because we assume that this sublabel is represented by the state $|010\rangle$. The H gates prepare the sublabel register, and the operator $T(x_t)$ represents the mapping process.
Here we set all $|\Psi^f_L\rangle$ to $|0\rangle$. The parameters of the gates $U$ are then chosen so that each gate maps $|\Psi_{m,i}\rangle$, the state that represents the centroid of all vectors with sublabel $i$, onto this final state. As an example, if a new vector $x_t = (0.5\pi, 0.8\pi)$ is given as the test vector, we get the following results from IBM's simulator:

q1 q2 q4    000       001       010       100       101
q5 = 0      9.668%    10.84%    0.293%    3.125%    2.832%

In our simulation, 000 is the sublabel of the central red subgroup, 001 represents the red subgroup at the bottom left, and 010 represents the red subgroup at the upper right. 100 represents the blue subgroup at the upper left, and 101 represents the blue subgroup at the bottom right. So only the states shown in the table matter. For a sublabel $q_1 q_2 q_4$, we can calculate $P(q_1 q_2 q_4; q_5 = 0) - P(q_1 q_2 q_4; q_5 = 1)$. The maximum of $P(q_1 q_2 q_4; q_5 = 0) - P(q_1 q_2 q_4; q_5 = 1)$ corresponds to the sublabel of this test vector. From the results in the table, the test vector is closer to subgroup 001 than to any other subgroup, so it is "red".
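The decoding rule can be applied directly to the tabulated probabilities. The sketch below uses only the $q_5 = 0$ values quoted above and makes one idealizing assumption: each of the $2^3$ label states occurs with probability $1/8$, so that $P(l; q_5 = 1) = 1/8 - P(l; q_5 = 0)$ and ranking by the difference is equivalent to ranking by $P(l; q_5 = 0)$. The dictionary and function names are hypothetical.

```python
# Probabilities P(q1 q2 q4; q5 = 0) quoted in the table.
p_q5_zero = {"000": 0.09668, "001": 0.1084, "010": 0.00293,
             "100": 0.03125, "101": 0.02832}

# Prior label of each sublabel, as described in the text.
prior_label = {"000": "red", "001": "red", "010": "red",
               "100": "blue", "101": "blue"}

def decode(p0, n_label_qubits=3):
    """Pick the sublabel with the largest P(l; 0) - P(l; 1).

    Assumes an ideal uniform label register, so P(l; 1) = 2**-n - P(l; 0).
    """
    p_label = 1.0 / 2 ** n_label_qubits
    score = {label: p - (p_label - p) for label, p in p0.items()}
    best = max(score, key=score.get)
    return best, score

best_sublabel, scores = decode(p_q5_zero)
predicted_color = prior_label[best_sublabel]
```

With measured counts for both outcomes of $q_5$, the idealized line for `score` would be replaced by the difference of the two measured probabilities.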
Using the same method, we can predict the label of an arbitrary vector in [0, 2π] × [0, 2π]. The result is shown in fig.(S1d), where new vectors in the light blue region will be recognised with the label 'blue', and those in the light yellow region will be predicted with the label 'red'.

Classification study of the molecular HD scattering experiment
In the recent scattering experiment of the HD molecule with H2 clusters, states in which the orientation of the HD molecules is parallel to the propagation direction, |H⟩, and states in which the orientation is perpendicular to the propagation direction, |V⟩, lead to two different scattering results. As shown in fig.(S4(a)), the blue and red curves represent the distributions of scattering angles corresponding to the states |H⟩ and |V⟩, respectively [49]. We can regard the initial state of the HD molecules as the label, and assume that we are provided with the standard results for the states |H⟩ and |V⟩. Then, if one carries out the same scattering experiment without knowing the initial state of HD, it is possible to use our quantum classification algorithm to determine the initial state of the HD molecule.
In our simulation, we used 6 qubits to build the quantum circuit for this classification: 1 qubit represents the label and the other 5 qubits describe the distribution of scattering angles, where the scattering angle is divided into 2^5 = 32 slots. For instance, assume that there are 1000 HD molecules in the state |H⟩ scattered by H2 clusters, and that the distribution of these 1000 scattered particles is as shown in fig.(S4(b)). After counting the particles in each slot, we can map the measurement result f(θ) to the state ψ[f(θ)], and the input state of the quantum circuit can be chosen as [expression not recovered]. When only a few particles are scattered, it is impossible to distinguish the initial state, and the matching rates for both |H⟩ and |V⟩ are quite small. When more molecules are scattered, the pattern in the distribution becomes clear, and the matching rate for |H⟩ increases rapidly until it is close to 0.5. It cannot exceed 0.5, however, because of fluctuations. On the other hand, the matching rate for |V⟩ increases at first and finally stabilizes around 0.12, which reflects the overlap between the standard distributions of the states |H⟩ and |V⟩.
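The binning step described above (scattering angles counted into 2^5 = 32 slots and mapped to a 5-qubit state) can be sketched as follows. This is only an illustration under stated assumptions: the map ψ[f(θ)] used in the paper is not reproduced here, so a square-root amplitude encoding is assumed, the angular range [0, π] is an assumption, and the uniformly sampled angles are synthetic placeholders rather than experimental data.

```python
import numpy as np

def angles_to_amplitudes(angles, n_qubits=5, angle_range=(0.0, np.pi)):
    """Histogram angles into 2**n_qubits slots and return a unit vector.

    Assumed encoding: amplitude of slot k is sqrt(count_k / total), so that
    measuring the register reproduces the observed angular distribution.
    """
    counts, _ = np.histogram(angles, bins=2 ** n_qubits, range=angle_range)
    total = counts.sum()
    if total == 0:
        raise ValueError("no scattering events inside the angular range")
    return np.sqrt(counts / total)

# Synthetic placeholder data: 1000 angles drawn uniformly (not experimental).
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, np.pi, size=1000)

amps = angles_to_amplitudes(angles)
print(amps.shape)                         # one amplitude per slot
print(np.isclose(np.sum(amps ** 2), 1.0))  # state is normalized
```

With few events most slots are empty and the vector is dominated by counting noise, which is consistent with the statement that the initial state cannot be distinguished when only a few particles are scattered.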
Simulation results for phase transition in VO2
For all four training data sets, our algorithm can still classify the different states efficiently. In fig.(S5a) the training vectors are chosen artificially instead of randomly, and the predicted boundary is no longer close to the black curve. When we start choosing data randomly, as in fig.(S5b,c,d), the predicted boundary between metallic and insulating states stays around the black curve. Further, when we use data closer to the black curve, the prediction becomes more accurate: the result shown in fig.(S5d) is better than those in fig.(S5b,c).

Simulation results for randomly generated data
Intermediate simulation results are shown in fig.(S6). (a) shows the results after applying the cluster algorithm, and (b) shows the sublabels left after repeating the adjust algorithms to reduce redundancy. As mentioned in the main text, 22 red sublabels and 32 blue sublabels are finally left. Though we can reduce the number of sublabels further, we cannot reduce the total number of sublabels to 32 or fewer without losing

Fig.(3(a)), the red dots represent the metallic state, the blue dots represent the insulating state, and the black solid line represents the phase transition line. Note that our training data were chosen far from the phase transition line in order to test the classification power of the designed quantum algorithm. Fig.(3(b)) Fig.(3(c)). After repeating Algorithm (S2) and Algorithm (S3) three times, the number of classes cannot be reduced further, as shown in Fig.(3(c)). In both Fig.(3(b)) and Fig.(3(c))

Fig. 3: Classification of metallic and insulating states of VO2: (a) Initial data used for classification. Red dots represent the metallic state, and blue ones represent the insulating state. The phase transition line is indicated by the black solid curve. (b) Subgroups formed after applying Algorithm (S1) once. In (b) and (c), blue or red spheres represent data with the same sublabel, where the center of each sphere is the average vector and the radius represents the number of vectors belonging to this sublabel. (c) Results after repeating Algorithms (S2, S3) three times. (d) Prediction of new data. New vectors in the blue region will be recognized with the label 'insulating', and new vectors in the yellow region will be predicted as 'metallic'. Blue and red dots are still the initial data.
Fig.(4(a)), and the training points in Fig.(4(b)). The two isolated groups of red points in Fig.(4(a)) are scattered along with the blue ones covering the whole area, as shown in Fig.(4(b))

Fig. 4: Classification of randomly generated data: (a) Generated data. In total, we generated 1100 blue points and 1100 red points with the same generating function. 100 red points and 100 blue points are chosen randomly as training data, and the remaining points are used as test data. (b) Training data, consisting of 100 red points and 100 blue points. (c) Prediction for the labels of the test data. Light blue points are test data predicted as 'blue', and yellow points are predicted as 'red'. Red and blue crosses represent training data.

Fig. 5: Entanglement classification for Werner states: In the plots, the radius represents the parameter p, and the angle represents φ. Every single dot represents a Werner state. Yellow dots represent test data predicted as 'untangled', and light blue dots represent test data predicted as 'entangled'. Crosses represent training data: red crosses for untangled states and blue crosses for entangled states. In all three figures we used the same 400 test data, half of which are entangled and the other half untangled. Half of the training data are entangled states and the other half are untangled states. (a) 32 training data are used, and all are kept as sublabels. (b) 64 training data are used, and all are kept as sublabels. (c) 128 training data are used, and only 64 of them are kept as sublabels.

[Algorithm listing, partially recovered]
... represent the average of the data with sublabel i and prior label l_j
2 New Parameters: For each subgroup, set a new parameter d_i^{l_j} = D. Each ...
... All other sublabels with different prior label (θ_k^{l_n} ...
3 FOR ...
4 FOR ...


Fig. S1: Training data and simulation of the learning process: (a) The training vectors. Vectors in different colors have different prior labels ('red' or 'blue'). (b) Subgroups obtained after applying Algorithm (S1). In fig.(S1b,c), blue or red spheres represent data with the same sublabel, where the center of each sphere is the average vector and the radius represents the number of vectors belonging to this sublabel. (c) Result after repeating Algorithms (S2, S3) a few times. (d) Prediction of new vectors. New vectors in the blue region will be recognised with the label 'blue', and new vectors in the yellow region will be predicted as 'red'.

Fig. S4: Simulation results for the scattering experiment: (a) Standard distribution of the scattering angle for the initial states |H⟩ (blue) and |V⟩ (red). This figure is plotted from the experimental results in ref. [49]. (b) Distribution of the scattering angle for the initial state |H⟩ when only 1000 molecules are scattered. (c) Relationship between the matching rate and the number of scattered particles. All HD molecules are initially prepared in the state |H⟩.

Fig. S5: Simulation results for classification of the phase transition in VO2: Dots in different colors are used as training data. Red ones are in metallic states, and blue ones are in insulating states. (a) Instead of choosing training data randomly, here we picked the training vectors artificially. For metallic states, more data on the right side are used and only a few on the left side. For insulating states, we avoid using data around P = 20 GPa. (b,c) Training data are randomly picked over a larger range (more vectors are very far away from the black curve). In (c), we use data a little closer to the black curve compared with (a) and (b). (d) Training data are randomly picked over a larger range, and now we also use data very close to the black curve; some points lie exactly on the curve.