danubePrediction

Once a rules-set is defined, you can start using the danubePrediction endpoint of the danube.ai cloud API. danubePrediction sends your dataset to danube.ai and returns the evaluation results, including new scores for columns and rows and a list of derived percentage matches per row.

Request

The danubePrediction request has the following structure:

body: {
  "query": String,
  "variables": {
    "data": PredictionInputData
  }
}

The query parameter specifies which GraphQL endpoint to call (see example).
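As a minimal sketch of the structure above, the request body could be assembled in Python as follows. The helper name build_request_body and the placeholder query text are assumptions made for illustration; the actual GraphQL document has to be taken from the danube.ai example.

```python
import json


def build_request_body(query: str, prediction_input: dict) -> str:
    """Wrap a GraphQL query and a PredictionInputData dict into the request body."""
    body = {
        "query": query,
        "variables": {"data": prediction_input},
    }
    return json.dumps(body)


# Placeholder only: substitute the real GraphQL document for danubePrediction.
PLACEHOLDER_QUERY = "<GraphQL document calling danubePrediction>"

# "my-rules-id" is a made-up id used purely for illustration.
example_body = build_request_body(PLACEHOLDER_QUERY, {"rulesId": "my-rules-id"})
```

The resulting string is what would be sent as the request body; transport details such as the URL and authentication are deliberately left out here.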

The PredictionInputData has the following structure:

type PredictionInputData: {
  "rulesId": String,
  "data": String,
  "searchData": String,
  "initialColumnScores": [ColumnScore],
  "strategy": String,
  "mixFactor": Float,
  "impact": Float
}
  • rulesId: The id of a rules-set.
  • data: A stringified JSON array holding all data elements (rows) as objects. You can see an example of how to encode your data as a JSON array here.
  • searchData: A stringified JSON object with the same structure as a data element.
  • initialColumnScores: A list of initial Column Scores.
  • strategy: Defines the way data attributes are treated by the algorithm. The following options are available:
    • "exclusive": Rare data attributes tend to obtain more weight.
    • "fair": Overly rare or overly frequent data attributes lose weight.
    • "mixed": Mixes "exclusive" and "fair" strategy (see mixFactor).
    • "inverse": Inverse behavior of "fair" strategy.
  • mixFactor: The factor used to mix the exclusive and fair strategies (Float between 0 and 1; only used with the "mixed" strategy):
    0 (= exclusive) ----------x---------- 1 (= fair)
  • impact: Determines how strongly the initial column values are changed. With impact n, n=1 means one run of the algorithm with small changes to the initial values. Higher values of n mean iterative runs of the algorithm with stronger changes.
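The fields above could be put together as in the following sketch. It is only an illustration: the function name build_prediction_input, the property names "price" and "rating", and all example values are assumptions, and the validation simply mirrors the constraints listed above (four strategy names, mixFactor between 0 and 1). Note that data and searchData are passed as JSON strings, not as nested objects.

```python
import json

# The four strategy names listed in the documentation above.
STRATEGIES = {"exclusive", "fair", "mixed", "inverse"}


def build_prediction_input(rules_id, rows, search_row, initial_column_scores,
                           strategy, mix_factor, impact):
    """Build a PredictionInputData dict, stringifying data and searchData."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not 0 <= mix_factor <= 1:
        raise ValueError("mixFactor must be between 0 and 1")
    return {
        "rulesId": rules_id,
        "data": json.dumps(rows),              # stringified JSON array of rows
        "searchData": json.dumps(search_row),  # stringified JSON object
        "initialColumnScores": initial_column_scores,
        "strategy": strategy,
        "mixFactor": float(mix_factor),
        "impact": float(impact),
    }


# Made-up example data; property names are hypothetical.
rows = [{"price": 10, "rating": 4.5}, {"price": 25, "rating": 3.9}]
example_input = build_prediction_input(
    rules_id="my-rules-id",
    rows=rows,
    search_row={"price": 15, "rating": 5},
    initial_column_scores=[
        {"property": "price", "score": 1.0},
        {"property": "rating", "score": 1.0},
    ],
    strategy="mixed",
    mix_factor=0.5,
    impact=1,
)
```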

A ColumnScore has the following structure:

type ColumnScore: {
  "property": String,
  "score": Float
}
  • property: The name of a property.
  • score: The property's score.

Response

A response from the danubePrediction endpoint has the following structure:

{
  "data": {
    "danubePrediction": {
      "newColumnScores": [ColumnScore],
      "rowScores": [Float],
      "rowMatches": [[Float]]
    }
  }
}
  • newColumnScores: A list of new Column Scores.
  • rowScores: A list of row scores (same ordering as the data elements in the request), each determined by danube.ai. The row scores define the sorting (highest score = best row).
  • rowMatches: A list of row matches (same ordering as the data elements in the request), each being an array of percentage values describing how well each property value matches the best data value in its column.
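Because rowScores and rowMatches follow the ordering of the data elements in the request, a response can be joined back to the original rows by position. The sketch below shows one way to do this; the function name rank_rows and the sample response values are made up for illustration and are not real API output.

```python
def rank_rows(response: dict, rows: list) -> list:
    """Pair each request row with its score and matches, best score first."""
    prediction = response["data"]["danubePrediction"]
    combined = zip(rows, prediction["rowScores"], prediction["rowMatches"])
    # Highest score = best row, so sort in descending order of score.
    return sorted(combined, key=lambda item: item[1], reverse=True)


# Made-up rows and response, shaped like the structure documented above.
sample_rows = [{"name": "A"}, {"name": "B"}, {"name": "C"}]
sample_response = {
    "data": {
        "danubePrediction": {
            "newColumnScores": [{"property": "name", "score": 1.0}],
            "rowScores": [0.2, 0.9, 0.5],
            "rowMatches": [[0.1], [1.0], [0.6]],
        }
    }
}

ranked = rank_rows(sample_response, sample_rows)
best_row, best_score, best_matches = ranked[0]
```

Each element of ranked is a (row, score, matches) tuple, so the original row objects stay available after sorting.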