npm-health: Implement data collection and metrics computation pipeline(s) #2154

Description

@pombredanne

I want to collect specific data with GrimoireLab, driven by a ScanCode.io pipeline.

The outcome should be new ScanCode.io pipeline(s) to download and/or clone code and collect metrics, by orchestrating the execution of AboutCode, GrimoireLab, and other open source tools. Collected data is stored for further metric computations.

The high level flow would be:

  1. Through a PurlDB API endpoint, the user requests metric scoring for a PURL
  2. If the package has been analyzed already, the data is retrieved from the PurlDB DB and returned
  3. Otherwise, PurlDB queues (or runs) a ScanCode.io data collection/metric computation scoring pipeline
  4. The pipeline collects the source/binary/git repos for the PURL
  5. Then it does its magic in GrimoireLab
  6. The GrimoireLab analysis is returned somehow (webhook? polling? direct code integration in SCIO?)
  7. PurlDB gets the data back, saves it in its DB and returns the results
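
The flow above could be sketched roughly as follows. This is only an illustration of the cache-or-compute shape of steps 1 to 7, not actual PurlDB or ScanCode.io code: `MetricsStore`, `request_metrics`, and the `run_pipeline` callable are all hypothetical names, and the in-memory dict stands in for the PurlDB DB. The pipeline itself (steps 4 to 6) is left as a callable because how the GrimoireLab results come back is still an open point.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MetricsStore:
    """Hypothetical stand-in for the PurlDB table holding metric results."""

    _results: dict = field(default_factory=dict)

    def get(self, purl: str) -> Optional[dict]:
        # Step 2: look up a previous analysis for this PURL, if any.
        return self._results.get(purl)

    def save(self, purl: str, metrics: dict) -> None:
        # Step 7: persist what the pipeline returned.
        self._results[purl] = metrics


def request_metrics(
    purl: str,
    store: MetricsStore,
    run_pipeline: Callable[[str], dict],
) -> dict:
    """Return metrics for a PURL, running the pipeline only when needed."""
    cached = store.get(purl)
    if cached is not None:
        return {"purl": purl, "status": "cached", "metrics": cached}

    # Steps 3 to 6: assumed to be wrapped in a single callable here.
    # A real implementation would queue a ScanCode.io pipeline instead.
    metrics = run_pipeline(purl)
    store.save(purl, metrics)
    return {"purl": purl, "status": "computed", "metrics": metrics}
```

Calling `request_metrics` twice with the same PURL would run the pipeline once and serve the second call from the store.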

I suggest we implement a middle-out strategy: starting with GrimoireLab, then ScanCode.io, then PurlDB.

Questions:

  • Does the PurlDB API call return immediately (with the analysis running in the background) or wait synchronously? (NB: We already have a similar pattern for the on-demand ScanCode scans in the API)
  • What if the analysis is stale, e.g. 3 months old?
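
To make the two questions concrete, here is one possible answer sketched in Python; nothing here is decided. It assumes a non-blocking API that replies with HTTP 202 (Accepted) whenever work is queued, and it treats "3 months" as 90 days. `MAX_AGE`, `is_stale`, `respond`, and the record layout (`collected_at`, `metrics`) are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumption: "3 months" is approximated as 90 days.
MAX_AGE = timedelta(days=90)


def is_stale(
    collected_at: datetime,
    now: Optional[datetime] = None,
    max_age: timedelta = MAX_AGE,
) -> bool:
    """Return True when the stored analysis is older than max_age."""
    now = now or datetime.now(timezone.utc)
    return now - collected_at > max_age


def respond(record: Optional[dict], now: Optional[datetime] = None) -> Tuple[int, dict]:
    """Pick an HTTP status and body without blocking on the pipeline."""
    if record is None:
        # Nothing stored yet: a pipeline run would be queued here.
        return 202, {"status": "queued"}
    if is_stale(record["collected_at"], now):
        # Serve the old data right away and refresh in the background.
        return 202, {"status": "refreshing", "metrics": record["metrics"]}
    return 200, {"status": "fresh", "metrics": record["metrics"]}
```

Serving stale data while a refresh is queued is only one option; rejecting stale data outright, or blocking until the new run completes, would fit the same structure.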

    Status
    In progress
