Data Engineering

Hello,

I have a question. I have calculated pairwise differences between game genres. Each difference is a positive float: the bigger the number, the more the two genres differ.
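For reference, here is a minimal sketch of the input shape the code below appears to expect. The keys `weightsPair` and `difference` are taken from that code. The sample values are invented, apart from the difference of 9 for "immersive sim" and "rhythm" mentioned later, and "stealth" is just a placeholder genre.

```python
import json

# Hypothetical sample of genres_weights.json; genre names and numbers are
# placeholders, only the key names come from the plotting code.
sample_json = """
[
  {"weightsPair": ["immersive sim", "rhythm"], "difference": 9.0},
  {"weightsPair": ["immersive sim", "stealth"], "difference": 0.5}
]
"""

data = json.loads(sample_json)
for item in data:
    node1, node2 = item["weightsPair"]
    print(f"{node1} <-> {node2}: {item['difference']}")
```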

I want to visualise the differences, and I have the following code:

import json
import networkx as nx
import matplotlib.pyplot as plt

# Load the pairwise genre differences
with open('genres_weights.json', 'r') as file:
    data = json.load(file)

G = nx.Graph()
# Largest difference in the data (computed for reference; not used below)
max_diff = max(item['difference'] for item in data) if data else 1.0

for item in data:
    node1, node2 = item['weightsPair']
    difference = item['difference']
    # Shift every difference by 0.25 to get the edge weight
    weight = difference + 0.25

    G.add_edge(node1, node2, weight=weight, original_diff=difference)

plt.figure(figsize=(40, 20))
# Kamada-Kawai layout, with the 'weight' attribute used as edge length
pos = nx.kamada_kawai_layout(G, weight='weight')

# Draw only nodes and labels (edges are not drawn)
nx.draw_networkx_nodes(G, pos, node_size=2000, node_color='#2b83ba', alpha=0.9)

nx.draw_networkx_labels(G, pos, font_size=7, font_family='sans-serif')

plt.show()

That gives the following result for my data:

A lot of things look great, and overall the graph represents the data correctly (I think). But there is one problem: in the bottom-left part of the graph there are two bubbles, "immersive sim" and "rhythm". The layout places them close together, as if they were very similar (like some other pairs of genres that really do have a very low difference). In reality they are not: they have a difference of 9, which is a lot (the maximum difference between genres is around 14), so I would expect them to be on opposite sides of the graph, not next to each other.
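One possible explanation, offered as an assumption rather than a confirmed diagnosis: when no `dist` argument is passed, `nx.kamada_kawai_layout` takes its target distances from weighted shortest-path lengths, not from the direct edge weight. If some third genre is close to both "immersive sim" and "rhythm", the path through it can be much shorter than 9, and the layout will pull the two together. The sketch below uses invented differences ("puzzle" is a placeholder genre) and a hypothetical helper, `shortcut_pairs`, to list edges where that happens.

```python
import networkx as nx

# Invented records for illustration; only the 9.0 comes from the question.
records = [
    {"weightsPair": ["immersive sim", "rhythm"], "difference": 9.0},
    {"weightsPair": ["immersive sim", "puzzle"], "difference": 0.5},
    {"weightsPair": ["puzzle", "rhythm"], "difference": 0.75},
]

G = nx.Graph()
for rec in records:
    a, b = rec["weightsPair"]
    # Same weighting as in the question: difference + 0.25
    G.add_edge(a, b, weight=rec["difference"] + 0.25,
               original_diff=rec["difference"])


def shortcut_pairs(graph, weight="weight", tol=1e-9):
    """Return (u, v, direct, shortest) for edges where a path through
    other nodes is shorter than the direct edge weight."""
    sp = dict(nx.shortest_path_length(graph, weight=weight))
    found = []
    for u, v, w in graph.edges(data=weight):
        if sp[u][v] < w - tol:
            found.append((u, v, w, sp[u][v]))
    return found


for u, v, direct, shortest in shortcut_pairs(G):
    print(f"{u} / {v}: direct={direct}, shortest path={shortest}")
```

Running the same helper on the real graph would show whether "immersive sim" and "rhythm" are affected by such a shortcut.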

I'm not sure where the problem is. Can someone please help me?
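A possible workaround, sketched under the assumption that the layout's shortest-path distances are the cause: `nx.kamada_kawai_layout` accepts a `dist` argument (a dict of dicts of target distances), so the direct differences can be passed in explicitly. The helper names `direct_distances` and `layout_from_direct` and the sample graph are hypothetical, and the layout call needs SciPy installed.

```python
import networkx as nx


def direct_distances(graph, weight="weight"):
    """Build a dict-of-dicts of target distances for the layout.

    Starts from weighted shortest-path lengths, so pairs without an edge
    still get a value, then overwrites every existing edge with its own
    direct weight.
    """
    sp = dict(nx.shortest_path_length(graph, weight=weight))
    dist = {u: dict(lengths) for u, lengths in sp.items()}
    for u, v, w in graph.edges(data=weight):
        dist[u][v] = w
        dist[v][u] = w
    return dist


def layout_from_direct(graph, weight="weight"):
    """Kamada-Kawai layout driven by direct pairwise distances."""
    return nx.kamada_kawai_layout(graph, dist=direct_distances(graph, weight))


# Invented sample graph, weights already include the +0.25 offset
G = nx.Graph()
G.add_edge("immersive sim", "rhythm", weight=9.25)
G.add_edge("immersive sim", "puzzle", weight=0.75)
G.add_edge("puzzle", "rhythm", weight=1.0)

dist = direct_distances(G)
print(dist["immersive sim"]["rhythm"])
```

With the real data, `pos = layout_from_direct(G)` would replace the existing `kamada_kawai_layout` call. A 2D layout cannot always honour every pairwise distance at once, so some distortion may remain even with this change.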
